OnionClaw — Tor / Dark Web OSINT
v2.1.13 · by JacobJandon · MIT-0 License · github.com/JacobJandon/OnionClaw
OnionClaw routes all requests through the Tor network. It queries 12 verified
dark web search engines simultaneously, fetches .onion hidden-service pages,
rotates Tor circuits, schedules recurring watch/alert jobs, and produces
structured OSINT reports (Markdown, JSON, STIX, MISP, CSV) using the Robin
investigation pipeline.
Setup (run once after install)
# 1. Install Python dependencies
pip3 install "requests[socks]" beautifulsoup4 python-dotenv stem
# 2. Interactive first-run wizard (sets up .env, torrc, and Tor in one step)
python3 {baseDir}/setup.py
# — OR — manual setup:
cp {baseDir}/.env.example {baseDir}/.env
# Edit {baseDir}/.env — add your LLM key (search + fetch work without one)
Start Tor (required before any command):
# Linux:
sudo apt install tor && sudo systemctl start tor
# macOS:
brew install tor && brew services start tor
# Custom (no root needed — setup.py can do this automatically):
tor -f /tmp/sicry_tor.conf &
# torrc: SocksPort 9050 / ControlPort 9051 / CookieAuthentication 1 / DataDirectory /tmp/tor_data
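The four torrc directives in the comment above can also be written out from Python. This is a minimal sketch: `render_torrc` is a hypothetical helper and not part of OnionClaw, and setup.py may generate its config differently.

```python
def render_torrc(socks_port=9050, control_port=9051, data_dir="/tmp/tor_data"):
    """Return torrc text containing the four directives listed above."""
    lines = [
        f"SocksPort {socks_port}",
        f"ControlPort {control_port}",
        "CookieAuthentication 1",
        f"DataDirectory {data_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config; redirect it to /tmp/sicry_tor.conf to use with `tor -f`.
    print(render_torrc(), end="")
```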
Enable circuit rotation (required for renew.py and --daemon-poll):
Add to /etc/tor/torrc:
ControlPort 9051
CookieAuthentication 1
Then: sudo systemctl restart tor
setup.py does this automatically.
Commands
Check Tor is running
Always run this first before any dark web operation.
python3 {baseDir}/check_tor.py
Returns your exit IP and tor_active: true/false. If false, tell the user to
start Tor before continuing.
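A quick local pre-check can tell whether anything is listening on the SOCKS port before check_tor.py is run. This is a sketch only: `socks_port_open` is a hypothetical helper, the default port 9050 is taken from the setup section, and an open port does not prove the listener is Tor, so check_tor.py remains the real test.

```python
import socket


def socks_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("SOCKS port open:", socks_port_open())
```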
Rotate Tor identity
Get a fresh exit node and a new three-hop circuit. Use between sessions or whenever a new IP is needed.
python3 {baseDir}/renew.py
Returns success: true/false. If false, ensure ControlPort 9051 is enabled
and TOR_DATA_DIR is set in .env (or use setup.py).
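For context, a new identity is requested by sending `SIGNAL NEWNYM` to Tor's control port after authenticating. The sketch below shows that exchange with cookie authentication, assuming the cookie is the `control_auth_cookie` file inside Tor's DataDirectory. The function names are hypothetical, and renew.py's actual implementation is not shown in this document.

```python
import socket


def newnym_commands(cookie: bytes):
    """Control-port lines: authenticate with the hex-encoded cookie, then request a new identity."""
    return [f"AUTHENTICATE {cookie.hex()}", "SIGNAL NEWNYM", "QUIT"]


def renew_identity(cookie_path, host="127.0.0.1", port=9051, timeout=10.0):
    """Send the commands to the control port; return True if every reply starts with 250."""
    with open(cookie_path, "rb") as fh:
        cookie = fh.read()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="ascii") as reader:
            for line in newnym_commands(cookie):
                sock.sendall((line + "\r\n").encode("ascii"))
                reply = reader.readline().strip()
                if not reply.startswith("250"):
                    return False  # e.g. authentication failed
    return True
```

Usage would look like `renew_identity("/tmp/tor_data/control_auth_cookie")`. The stem library installed during setup offers the same operation through `Controller.signal(Signal.NEWNYM)`.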
Check which search engines are alive
Ping all 12 engines via Tor and return latency + up/down for each.
python3 {baseDir}/check_engines.py
Run before a large search session; pass the alive engine names to --engines
to skip dead ones and save time.
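Turning the engine check into an `--engines` argument list could look like the sketch below. The record shape (`name`, `up`, `latency_ms`) is an assumption made for illustration; adapt the keys to whatever check_engines.py actually prints.

```python
def alive_engine_args(statuses, max_latency_ms=None):
    """Return ["--engines", name, ...] for engines reported up, fastest first."""
    alive = [s for s in statuses if s.get("up")]
    if max_latency_ms is not None:
        alive = [s for s in alive if s.get("latency_ms", 0) <= max_latency_ms]
    alive.sort(key=lambda s: s.get("latency_ms", 0))
    names = [s["name"] for s in alive]
    # An empty list means "no restriction", so search.py falls back to all engines.
    return ["--engines", *names] if names else []


if __name__ == "__main__":
    sample = [  # hypothetical check results
        {"name": "Ahmia", "up": True, "latency_ms": 900},
        {"name": "Torgol", "up": False, "latency_ms": 0},
        {"name": "Tor66", "up": True, "latency_ms": 400},
    ]
    print(alive_engine_args(sample))
```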
Search the dark web
Query all 12 dark web engines simultaneously. Returns deduplicated
{title, url, engine} results.
# Basic:
python3 {baseDir}/search.py --query "SEARCH_TERM"
# Limit results:
python3 {baseDir}/search.py --query "SEARCH_TERM" --max 30
# Specific engines:
python3 {baseDir}/search.py --query "SEARCH_TERM" --engines Ahmia Tor66 Ahmia-clearnet
Available engines: Ahmia, OnionLand, Amnesia, Torland, Excavator, Onionway, Tor66, OSS, Torgol, TheDeepSearches, DuckDuckGo-Tor, Ahmia-clearnet
Tip: Use short keyword queries (≤5 words). Dark web indexes respond far better to focused keywords than natural-language questions.
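To illustrate what deduplicating {title, url, engine} results can mean when several engines return the same page, here is a minimal sketch. The normalization rule (ignore scheme, host case, and a trailing slash) is an assumption chosen for the example and is not necessarily the rule search.py applies.

```python
from urllib.parse import urlsplit


def dedupe_results(results):
    """Keep the first result per URL, ignoring scheme, host case, and a trailing slash."""
    seen = set()
    unique = []
    for item in results:
        parts = urlsplit(item["url"])
        key = (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique
```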
Fetch a .onion page
Read the full text of any .onion URL (or clearnet URL) through Tor.
python3 {baseDir}/fetch.py --url "http://SOME.onion/path"
Returns: {title, text (first 3000 chars), links, status, error}.
If status: 0 is returned or error is set, the hidden service is unreachable. Hidden
services go down frequently, so try a different result from search.py.
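The "try a different result" advice can be automated by walking the search results until one page loads. In this sketch `fetch` is any callable that returns a dict with the `status` and `error` keys described above, for example a wrapper around fetch.py. Both that wrapper and the assumption that the output parses into a dict are hypothetical.

```python
def fetch_with_fallback(urls, fetch):
    """Try each URL in order; return (url, page) for the first usable page, else (None, None)."""
    for url in urls:
        page = fetch(url)
        offline = page.get("status", 0) == 0 or bool(page.get("error"))
        if not offline:
            return url, page
    return None, None
```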
OSINT analysis
Analyse raw dark web text with an LLM and produce a structured sectioned report.
# From a string:
python3 {baseDir}/ask.py --query "QUERY" --mode MODE --content "RAW_TEXT"
# From a file:
python3 {baseDir}/ask.py --query "QUERY" --mode MODE --file /path/to/content.txt
# From stdin (pipe):
echo "CONTENT" | python3 {baseDir}/ask.py --query "QUERY" --mode MODE
Analysis modes:
| Mode | Use for |
|---|---|
| threat_intel | General OSINT (default) — artifacts, insights, next steps |
| ransomware | Malware / C2 / MITRE ATT&CK TTPs, victim orgs, indicators |
| personal_identity | PII / breach exposure, severity, protective actions |
| corporate | Leaked credentials / code / internal docs, IR steps |
# With custom focus appended to the prompt:
python3 {baseDir}/ask.py --query "QUERY" --mode threat_intel \
--custom "Focus on cryptocurrency wallet addresses"
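When ask.py is driven from another script, the stdin form above maps onto `subprocess.run`. The sketch below uses only the flags documented in this section; `ask_command` and `run_ask` are hypothetical wrapper names, and the mode list is copied from the table above.

```python
import subprocess

MODES = ("threat_intel", "ransomware", "personal_identity", "corporate")


def ask_command(base_dir, query, mode="threat_intel", custom=None):
    """Build the ask.py argument list, rejecting modes not listed in the table."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = ["python3", f"{base_dir}/ask.py", "--query", query, "--mode", mode]
    if custom:
        cmd += ["--custom", custom]
    return cmd


def run_ask(base_dir, query, content, **kwargs):
    """Pipe content to ask.py on stdin and return whatever it prints."""
    proc = subprocess.run(
        ask_command(base_dir, query, **kwargs),
        input=content, capture_output=True, text=True, check=True,
    )
    return proc.stdout
```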
Full OSINT pipeline (single command)
Runs the complete Robin pipeline: refine query → check live engines → search → filter best results → batch scrape → OSINT analysis → save report
python3 {baseDir}/pipeline.py --query "INVESTIGATION_QUERY" --mode MODE
Essential flags:
| Flag | Default | Description |
|---|---|---|
| --query TEXT | required | Investigation topic (natural language OK — refined automatically) |
| --mode MODE | threat_intel | threat_intel / ransomware / personal_identity / corporate |
| --max N | 30 | Max raw results from search |
| --scrape N | 8 | Pages to batch-fetch (use 0 to skip scraping and get results-only report) |
| --custom TEXT | | Extra LLM instructions appended to the mode prompt |
| --out FILE | | Save report to file (exits 1 on permission error) |
| --format FMT | md | Output format: md / json / csv / stix / misp |
| --no-llm | | Skip all LLM steps — dump raw results / entity extraction only |
| --confidence | | Show BM25 confidence score per result |
| --engines NAME… | | Restrict to specific engines (skip dead ones) |
| --no-cache | | Bypass query/page cache for this run |
| --clear-cache | | Flush the result cache, then run |
| --resume JOB_ID | | Resume a checkpointed pipeline run by job ID |
| --interactive | | After the report, open a follow-up REPL for drill-down |
| --output-dir DIR | | Write <job_id>.<ext> into DIR (batch pipeline friendly) |
| --modes | | List all modes and their engine routing, then exit |
| --engine-stats | | Print per-engine reliability / latency table, then exit |
| --check-update | | Check for a newer OnionClaw release and exit |
| --version | | Print version and exit |
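For readers unfamiliar with the BM25 score that --confidence reports, the sketch below implements the standard Okapi BM25 formula over result titles. The tokenizer (lowercased whitespace split) and the parameters k1=1.5 and b=0.75 are common defaults chosen for illustration. This document does not say how OnionClaw tokenizes or tunes its score.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A title matching both query terms scores higher than one matching a single common term, and a title matching none scores 0.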
MISP-specific flags:
| Flag | Default | Description |
|---|---|---|
| --misp-threat-level N | 2 | MISP threat level 1–4 (1=high, 4=undefined) |
| --misp-distribution N | 0 | MISP distribution (0=your organisation only, 1=this community only, 2=connected communities, 3=all communities) |
Watch / alert flags:
| Flag | Description |
|---|---|
| --watch | Register this query as a recurring watch job and exit |
| --interval HOURS | Re-run interval in hours for --watch (default 6) |
| --watch-check | Run all due watch jobs now and print alerts |
| --watch-check --output-dir DIR | Same but write each job's JSON to DIR (exits 1 on write error) |
| --watch-list | List all active watch jobs |
| --watch-disable JOB_ID | Disable a watch job by ID |
| --watch-clear-all | Disable ALL active watch jobs at once |
| --watch-daemon | (deprecated alias) Run as a blocking daemon loop |
| --daemon-poll SECONDS | Run --watch-check every N seconds in a daemon loop |
Daemon mode (continuous monitoring)
Keep OnionClaw running and poll watch jobs at a fixed interval:
python3 {baseDir}/pipeline.py --daemon-poll 3600 # check every hour
Scheduling watch jobs
Register a query as a recurring alert:
# Register (runs every 6 hours by default):
python3 {baseDir}/pipeline.py --query "ransomware hospital 2026" --watch --interval 6
# List all active jobs:
python3 {baseDir}/pipeline.py --watch-list
# Check due jobs now and write JSON files for each:
python3 {baseDir}/pipeline.py --watch-check --output-dir /tmp/alerts/
# Disable one job:
python3 {baseDir}/pipeline.py --watch-disable <JOB_ID>
# Clear all:
python3 {baseDir}/pipeline.py --watch-clear-all
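The idea of a "due" watch job reduces to comparing the last run time against the interval. This is a minimal sketch of that check: `is_due` and its arguments are hypothetical, and how OnionClaw stores job timestamps is not described here.

```python
from datetime import datetime, timedelta, timezone


def is_due(last_run, interval_hours=6, now=None):
    """True if the job has never run or at least interval_hours have passed since last_run."""
    if last_run is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_run >= timedelta(hours=interval_hours)
```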
Typical investigation flows
"Search the dark web for X"
1. python3 {baseDir}/check_tor.py (verify connected)
2. python3 {baseDir}/search.py --query "X" (search all 12 engines)
3. python3 {baseDir}/fetch.py --url "URL" (read top 2–3 results)
4. python3 {baseDir}/ask.py --mode threat_intel --query "X" --content "..." (generate report)
"Has company.com appeared in dark web leaks?"
1. python3 {baseDir}/check_tor.py
2. python3 {baseDir}/pipeline.py --query "company.com credentials leak" --mode corporate
3. Present the structured report
"Investigate ransomware group X"
1. python3 {baseDir}/check_tor.py
2. python3 {baseDir}/pipeline.py --query "GROUP_NAME ransomware" --mode ransomware
"Write a STIX bundle for this investigation"
python3 {baseDir}/pipeline.py \
--query "QUERY" --mode threat_intel \
--format stix --out bundle.json
"Fetch this .onion URL"
1. python3 {baseDir}/check_tor.py
2. python3 {baseDir}/fetch.py --url "URL"
3. Show the user the title + text content
"Monitor for new leaks mentioning acme.com, alert me daily"
python3 {baseDir}/pipeline.py \
--query "acme.com leak credentials" --watch --interval 24
# Later, in a cron job or daemon:
python3 {baseDir}/pipeline.py --watch-check --output-dir /tmp/acme-alerts/
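A cron job or downstream script can then pick up the per-job files from the output directory. The sketch below assumes the files are named `<job_id>.json`, following the --output-dir description earlier. The structure inside each file is not documented here, so the loader returns the parsed JSON untouched. `load_alerts` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_alerts(directory):
    """Map job ID (the file stem) to parsed JSON for every *.json file in directory."""
    alerts = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            alerts[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt files
    return alerts
```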
Output formats
| Format | Flag | Use for |
|---|---|---|
| Markdown | --format md (default) | Human-readable reports, --out report.md |
| JSON | --format json | Structured machine-readable, automation |
| CSV | --format csv | Spreadsheet import, result lists |
| STIX 2.1 | --format stix | Threat-intel platforms (MISP, OpenCTI, Splunk ES) |
| MISP | --format misp | Direct MISP event import |
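Before importing a `--format stix` file into another platform, a quick sanity check is often worthwhile. The sketch below relies only on the STIX 2.1 bundle envelope (`type` of `bundle`, an `id` starting with `bundle--`, and an `objects` list). Which object types OnionClaw emits is not stated here, so the function simply counts whatever it finds. `summarize_stix_bundle` is a hypothetical helper.

```python
import json


def summarize_stix_bundle(text):
    """Parse a STIX bundle and count its objects by type; raise ValueError if it is not a bundle."""
    data = json.loads(text)
    if data.get("type") != "bundle" or not str(data.get("id", "")).startswith("bundle--"):
        raise ValueError("not a STIX bundle")
    counts = {}
    for obj in data.get("objects", []):
        kind = obj.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```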
Important notes
- All traffic routes through Tor — tell the user this when relevant.
- .onion hidden services go offline frequently. status: 0 means the site is temporarily unreachable; try a different result from search.py.
- Dark web search indexes go down often. Run check_engines.py first and pass only alive engine names with --engines.
- LLM tools (ask.py, pipeline steps 3/5/7) require an API key in {baseDir}/.env. Set LLM_PROVIDER=ollama for fully local inference with no key. search.py, fetch.py, check_tor.py, renew.py, and check_engines.py work with no key at all.
- --scrape 0 skips page fetching. The pipeline still runs step 7 (LLM analysis on search-result metadata only) and writes --out/--output-dir normally. A WARN: --scrape 0 notice is printed to stderr.
- Use responsibly and lawfully: OSINT, security research, and threat intelligence only.
Maintenance
Update the bundled sicry.py engine
OnionClaw bundles sicry.py from the upstream SICRY™ repo.
After a new SICRY™ release, sync the bundled copy:
# Pull latest:
python3 {baseDir}/sync_sicry.py
# Pull a specific release tag:
python3 {baseDir}/sync_sicry.py --tag v2.1.13
# Preview without writing:
python3 {baseDir}/sync_sicry.py --dry-run
Checking for OnionClaw updates
OnionClaw checks the GitHub Releases API (published releases only — not plain git tags) for newer versions. A one-line notice is printed automatically at pipeline startup when an update is available.
# On-demand update check:
python3 {baseDir}/pipeline.py --check-update
# Programmatic:
import sicry
r = sicry.check_update()
if not r["up_to_date"]:
    print(f"Update: {r['current']} → {r['latest']} {r['url']}")
# Upgrade:
git -C {baseDir} pull
python3 {baseDir}/sync_sicry.py
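Comparing release tags as plain strings gives wrong answers once a component reaches two digits ("v2.1.9" sorts after "v2.1.13"). A sketch of a numeric comparison, assuming tags are purely numeric like v2.1.13; both function names are hypothetical and this is not necessarily how sicry.check_update() compares versions.

```python
def parse_version(tag):
    """Turn 'v2.1.13' or '2.1.13' into (2, 1, 13) for numeric comparison."""
    return tuple(int(part) for part in tag.lstrip("vV").split("."))


def is_newer(latest, current):
    """True if latest is a strictly higher version than current."""
    return parse_version(latest) > parse_version(current)
```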