Crawler
Reference documentation for web crawling and scraping: the robots.txt protocol, the Scrapy framework, anti-bot detection, headless browsers, and legal considerations. No API keys or credentials are required; the commands only output reference documentation.
Commands
| Command | Description |
|---------|-------------|
| intro | Crawling vs scraping, robots.txt, sitemap |
| standards | HTTP caching, structured data, meta tags |
| troubleshooting | Anti-bot detection, JS rendering, encoding |
| performance | Concurrency, deduplication, incremental crawling, distributed crawling |
| security | Legal landscape, ethical guidelines, proxies |
| migration | BeautifulSoup to Scrapy, requests to Playwright |
| cheatsheet | Scrapy commands, CSS/XPath, curl, user-agents |
| faq | Legality, JS pages, blocking, storage |
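
Several of these topics (`intro`, `security`, `faq`) come back to whether a crawler is permitted to fetch a given URL. As a rough illustration of what a robots.txt check involves, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It is not part of this tool's output; the robots.txt content, the `example-bot` user agent, the `example.com` URLs, and the `build_parser` helper are all assumptions made up for the example. The rules are parsed from a string, so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (hypothetical helper for this sketch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "example-bot" is an assumed user-agent name; it falls under the "*" rules.
allowed = parser.can_fetch("example-bot", "https://example.com/public/page.html")
blocked = parser.can_fetch("example-bot", "https://example.com/private/data.html")
delay = parser.crawl_delay("example-bot")

print(f"public page allowed:  {allowed}")
print(f"private page allowed: {blocked}")
print(f"crawl delay (s):      {delay}")
```

In a real crawler the same parser would typically be pointed at a site's live robots.txt with `set_url()` and `read()`, and the reported crawl delay would be honored between requests.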
Output Format
Each command prints plain-text reference documentation from a heredoc. The commands make no external API calls, need no credentials, and require no network access.
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com