WebCrawlerAPI Skill
Use WebCrawlerAPI to get page content as markdown (single page scrape) or crawl entire websites (multi-page).
Setup — API Key
The API key must be set as an environment variable before running any curl commands:
export WEBCRAWLERAPI_API_KEY="your_api_key"
Get your key:
- Go to https://webcrawlerapi.com/
- Sign up at https://dash.webcrawlerapi.com/
- Visit https://dash.webcrawlerapi.com/access
- Copy your API key
If WEBCRAWLERAPI_API_KEY is not set, stop and ask the user to set it before proceeding.
Decision: Scrape vs Crawl
Scrape (single page) — default when user asks for:
- Content/markdown of a specific page or URL
- "Get me this page", "scrape this URL", "what does this page say"
- No mention of "website", "full site", "all pages", "crawl"
Crawl (multi-page) — when user asks for:
- "Crawl this website", "get all pages from", "full website content"
- Mentions a domain broadly (not a specific path)
- Wants multiple pages
Scrape — Single Page
Use POST /v2/scrape. Synchronous — result is returned immediately.
curl --fail --silent --show-error \
--request POST \
--url "https://api.webcrawlerapi.com/v2/scrape" \
--header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
--header "Content-Type: application/json" \
--data '{
"url": "<URL>",
"output_formats": ["markdown"]
}'
Scrape Response
The response contains markdown field directly — no polling needed:
{
"success": true,
"status": "done",
"markdown": "## Page Title\n\nPage content...",
"page_status_code": 200,
"page_title": "Page Title"
}
On success
Output the markdown content directly to the user. No need to save to files for scrape.
On failure
If success is false, show the error_code and error_message to the user.
Crawl — Full Website
Use POST /v1/crawl. Asynchronous — returns a job ID, then poll for results.
Step 1: Start the crawl
curl --fail --silent --show-error \
--request POST \
--url "https://api.webcrawlerapi.com/v1/crawl" \
--header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
--header "Content-Type: application/json" \
--data '{
"url": "<URL>",
"items_limit": 25,
"output_formats": ["markdown"]
}'
Response:
{ "id": "<JOB_ID>" }
Step 2: Poll job status (background loop)
Use a background Bash job to poll every 10 seconds until status is done or error:
JOB_ID="<JOB_ID>"
while true; do
RESULT=$(curl --fail --silent --show-error \
--request GET \
--url "https://api.webcrawlerapi.com/v1/job/${JOB_ID}" \
--header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}")
STATUS=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
echo "Job status: $STATUS"
if [ "$STATUS" = "done" ] || [ "$STATUS" = "error" ]; then
echo "$RESULT"
break
fi
sleep 10
done
Step 3: Download and save results
When job is done, for each job_item with status: done:
- Fetch the content from
markdown_content_url - Save to
.webcrawlerapi/<hostname>/<sanitized-path>.md
mkdir -p ".webcrawlerapi/<hostname>"
# For each job_item, fetch markdown_content_url and save:
curl --silent "<markdown_content_url>" \
--output ".webcrawlerapi/<hostname>/<sanitized-filename>.md"
Sanitize filenames: replace ://, /, ?, #, : with _. Trim leading underscores.
Step 4: Report to user
After saving, tell the user:
- Total pages crawled
- How many succeeded vs failed
- Where files were saved:
.webcrawlerapi/<hostname>/ - List the saved files
Notes
- Default
items_limitfor crawl: 25 (ask user if they want more) - For scrape, just output the markdown — don't save to disk
- For crawl, always save to
.webcrawlerapi/directory in current working dir - If the job returns
errorstatus, showlast_errorfrom job items and the job-level error if present - Never hardcode the API key — always use
${WEBCRAWLERAPI_API_KEY}
微信扫一扫