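X Tweet Auto-Scraper
Scrapes a user's tweets from x.com, generates a bilingual (Chinese-English) daily report, publishes it as a Feishu cloud document, and pushes the link to WeChat.
Full Workflow:
1. Launch Chrome with CDP → 2. Scrape tweets → 3. Translate and format → 4. Push to Feishu → 5. Send the link to WeChat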
Two time-range modes are supported:
- Default mode: the previous day, 00:00 to 23:59 Beijing time (one full day)
- Custom mode: set TIME_START / TIME_END to any start and end time (ISO 8601 format)
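
The default window can be sketched as follows. This is a minimal illustration, not the logic of scrape_tweets.js itself: the function name `default_window` is hypothetical, and ending the window at exactly 23:59:00 (rather than 23:59:59) is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is a fixed UTC+8 offset with no daylight saving.
BEIJING = timezone(timedelta(hours=8))


def default_window(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) covering the previous calendar day in Beijing time.

    `now` must be timezone-aware. The 23:59:00 end is an assumption.
    """
    local_now = now.astimezone(BEIJING)
    today_midnight = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = today_midnight - timedelta(days=1)    # previous day 00:00
    end = today_midnight - timedelta(minutes=1)   # previous day 23:59
    return start, end


if __name__ == "__main__":
    start, end = default_window(datetime.now(timezone.utc))
    print(start.isoformat(), end.isoformat())
```

Converting to Beijing time before truncating to midnight matters: a run at 01:30 UTC is already 09:30 on the same date in Beijing, so "the previous day" is taken relative to the Beijing date, not the UTC date.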
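Trigger Criteria
Use this skill when the user asks to:
- Scrape an x.com user's tweets and push them to Feishu
- Generate a "daily report" (日报) from x.com content
- Fetch a user's tweets from 00:00 to 24:00 of the previous day (Beijing time)
- Send the Feishu daily-report link to WeChat
- "Scrape tweets from x.com and push to Feishu, send link to WeChat"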
Workflow
Phase 1 — Launch Chrome with CDP
The scraping script requires a logged-in Chrome instance with DevTools Protocol enabled. The user's normal Chrome cannot be reused directly (sandbox restrictions); a temporary profile must be created.
1. Kill existing Chrome:

    pkill -9 -f "Google Chrome"

2. Copy essential session files to a temp profile (do NOT copy the full profile; it is tens of GB and will hang):

    rm -rf /tmp/chrome-debug-profile
    mkdir -p /tmp/chrome-debug-profile/Default
    for f in "Cookies" "Cookies-journal" "Login Data" "Login Data-journal" \
             "Network" "Preferences" "Web Data" "Web Data-journal"; do
      src="$HOME/Library/Application Support/Google/Chrome/Default/$f"
      [ -e "$src" ] && cp -r "$src" /tmp/chrome-debug-profile/Default/ 2>/dev/null
    done
    # Also copy top-level files
    for f in "Local State" "Last Version"; do
      src="$HOME/Library/Application Support/Google/Chrome/$f"
      [ -e "$src" ] && cp "$src" /tmp/chrome-debug-profile/ 2>/dev/null
    done

3. Launch Chrome with the temp profile and debugging port:

    nohup /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
      --remote-debugging-port=9222 \
      --user-data-dir=/tmp/chrome-debug-profile \
      --no-first-run \
      --no-default-browser-check \
      > /tmp/chrome-cdp.log 2>&1 &
    sleep 6

4. Verify CDP is reachable:

    curl -s --connect-timeout 5 http://127.0.0.1:9222/json/version

   If this returns JSON with a Browser field, Chrome is ready.
All of the above commands MUST run with dangerouslyDisableSandbox: true; otherwise the macOS sandbox blocks process management of the system Chrome.
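
The fixed `sleep 6` followed by one `curl` can be replaced by a short polling loop. The sketch below is an illustrative alternative, not part of the bundle: `is_cdp_ready` and `wait_for_cdp` are hypothetical helper names, and the retry count and delay are arbitrary assumptions. Only the endpoint and the Browser-field check come from the steps above.

```python
import http.client
import json
import time
import urllib.request


def is_cdp_ready(payload: str) -> bool:
    """True if the /json/version response body is a JSON object with a 'Browser' field."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "Browser" in data


def wait_for_cdp(url: str = "http://127.0.0.1:9222/json/version",
                 attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the CDP endpoint until it answers or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_cdp_ready(resp.read().decode("utf-8")):
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused or timed out: Chrome is probably still starting.
            pass
        time.sleep(delay)
    return False
```

Calling `wait_for_cdp()` right after the launch command returns as soon as Chrome answers, and returns False if the port never opens, which corresponds to the "CDP connection refused" case.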
Phase 2 — Scrape Tweets
Run the bundled scraping script.
Default time range (previous day, 00:00 to 23:59 Beijing time):
cd <workspace> && \
TARGET_USERNAME="DeItaone" \
OUTPUT_DIR="<workspace>" \
NODE_OPTIONS="" \
NODE_PATH=<workspace_node_modules> \
<node_path> <skill_dir>/scripts/scrape_tweets.js
Custom time range:
cd <workspace> && \
TARGET_USERNAME="DeItaone" \
TIME_START="2026-05-24T09:00:00+08:00" \
TIME_END="2026-05-26T09:00:00+08:00" \
OUTPUT_DIR="<workspace>" \
NODE_OPTIONS="" \
NODE_PATH=<workspace_node_modules> \
<node_path> <skill_dir>/scripts/scrape_tweets.js
Environment variables:
| Variable | Description | Default |
|---|---|---|
| TARGET_USERNAME | x.com username (without the @) | DeItaone |
| TIME_START | Start of the time window (ISO 8601), e.g. 2026-05-24T09:00:00+08:00 | Computed automatically (previous day 00:00 Beijing time) |
| TIME_END | End of the time window (ISO 8601), e.g. 2026-05-26T09:00:00+08:00 | Computed automatically (previous day 23:59 Beijing time) |
| OUTPUT_DIR | Output directory for tweets_raw.json | Current working directory |
| CDP_URL | Chrome DevTools Protocol address | http://127.0.0.1:9222 |
Time format:
- Timezone offsets are supported: 2026-05-24T09:00:00+08:00 (Beijing time), 2026-05-24T01:00:00Z (UTC)
- TIME_START and TIME_END must be provided together or omitted together
- TIME_START must be earlier than TIME_END
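
These rules can be checked before launching the scraper. The following is a sketch under stated assumptions rather than the script's own validation: `parse_window` is a hypothetical helper, and rejecting timestamps without a timezone offset is an added assumption (the rules above only say offsets are supported).

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_window(time_start: Optional[str],
                 time_end: Optional[str]) -> Optional[Tuple[datetime, datetime]]:
    """Validate a TIME_START / TIME_END pair.

    Returns None when both are omitted (default mode), otherwise (start, end).
    Raises ValueError when only one is given, an offset is missing, or start >= end.
    """
    if time_start is None and time_end is None:
        return None
    if time_start is None or time_end is None:
        raise ValueError("TIME_START and TIME_END must be provided together")
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11 on,
    # so it is normalized to an explicit offset here.
    start = datetime.fromisoformat(time_start.replace("Z", "+00:00"))
    end = datetime.fromisoformat(time_end.replace("Z", "+00:00"))
    if start.tzinfo is None or end.tzinfo is None:
        # Assumption: naive timestamps are rejected to avoid ambiguity.
        raise ValueError("timestamps must carry a timezone offset")
    if start >= end:
        raise ValueError("TIME_START must be earlier than TIME_END")
    return start, end
```

Because both values are parsed into timezone-aware datetimes, 2026-05-24T09:00:00+08:00 and 2026-05-24T01:00:00Z compare as the same instant, so mixing offsets in the two variables is safe.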
The script:
If the script reports "Not logged in", the temp profile did not retain the session. In that case, ask the user to log into x.com in the debug Chrome window and re-run.
Phase 3 — Translate & Format
1. Read tweets_raw.json and translate each tweet into Chinese. Keep ticker tags ($NVDA, $TSLA) in the translation. Use financial-news terminology.

2. Write translations as a JSON mapping file translations.json:

    { "ENGLISH PREFIX TEXT...": "中文翻译...", ... }

   Use the first 60–80 characters of each English tweet as the key.

3. Run the formatting script:

    TRANSLATIONS_PATH=<workspace>/translations.json \
    python3 <skill_dir>/scripts/format_for_feishu.py \
      <workspace>/tweets_raw.json \
      <workspace>/report.md \
      --title "Title 日报" \
      --author @username

4. The script produces a markdown file with the structure defined in references/feishu_format.md. It matches translations by longest prefix match and inserts placeholders for any unmatched tweets.

5. Scan the generated markdown for (翻译待补充) ("translation pending") placeholders. For any that remain, add the translations manually by editing the markdown file.
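
To make the matching rule concrete, here is a minimal sketch of longest-prefix matching with a placeholder fallback. It illustrates the technique only; how format_for_feishu.py actually normalizes text (whitespace, case, truncation) is not stated above, so `match_translation` and `translate_or_placeholder` are hypothetical names and exact string prefixes are an assumption.

```python
from typing import Dict, Optional

PLACEHOLDER = "(翻译待补充)"  # placeholder text named in step 5


def match_translation(tweet_text: str, translations: Dict[str, str]) -> Optional[str]:
    """Return the translation whose key is the longest prefix of tweet_text, or None."""
    best_key = None
    for key in translations:
        if tweet_text.startswith(key) and (best_key is None or len(key) > len(best_key)):
            best_key = key
    return translations[best_key] if best_key is not None else None


def translate_or_placeholder(tweet_text: str, translations: Dict[str, str]) -> str:
    """Fall back to the placeholder when no key is a prefix of the tweet."""
    result = match_translation(tweet_text, translations)
    return result if result is not None else PLACEHOLDER
```

Preferring the longest key is what keeps two tweets with a shared opening (for example, consecutive headlines about the same speaker) from receiving each other's translation, which is also why keys of 60 to 80 characters are recommended over shorter ones.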
Phase 4 — Push to Feishu
Use lark-cli to create a Feishu cloud document from the markdown:
LARK_CLI="<path-to-lark-cli>"
NODE_OPTIONS="" "$LARK_CLI" docs +create \
--api-version v2 \
--as user \
--doc-format markdown \
--content "@<path-to-report.md>" \
--title "Title 日报"
- Always use --api-version v2 and --doc-format markdown.
- The @ prefix on --content signals a file path.
- Run from the directory containing the markdown, or use an absolute path.
- The command returns a Feishu document URL. Save this URL; it is used in Phase 5 to send to WeChat. Also display it to the user.
Phase 5 — Push Link to WeChat
After the Feishu document is created, send the link to the user's WeChat via the WorkBuddy Mini Program (a WeChat Mini Program) using deliver_attachments.
1. Create a simple summary file containing the Feishu link:

    cat > <workspace>/feishu_link.md << 'EOF'
    # 📊 推文日报已生成

    **飞书文档链接 / Feishu Doc Link:** <FEISHU_DOC_URL>

    **博主 / Author:** @<TARGET_USERNAME>

    **时间范围 / Time Range:** <time_range>

    **推文数量 / Tweet Count:** <count>

    ---

    点击上方链接查看完整日报 / Click the link above to view the full report.
    EOF

2. Use deliver_attachments to push the summary to WeChat:

    deliver_attachments({
      attachments: ["<workspace>/feishu_link.md"],
      explanation: "推送飞书日报链接到微信小程序"
    })
Note: This requires the user to have the "产物回传到小程序" (Deliver Artifacts to Mini Program) toggle enabled in the WorkBuddy Mini Program connection settings.
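
If the summary file is generated programmatically instead of with a heredoc, the placeholders can be filled from the values collected in earlier phases. This is a sketch only: `render_summary` and `write_summary` are hypothetical helpers, and the blank lines between fields are an assumption about the intended layout.

```python
from pathlib import Path

# Same fields as the heredoc in step 1; blank-line layout is assumed.
TEMPLATE = """# 📊 推文日报已生成

**飞书文档链接 / Feishu Doc Link:** {url}

**博主 / Author:** @{username}

**时间范围 / Time Range:** {time_range}

**推文数量 / Tweet Count:** {count}

---

点击上方链接查看完整日报 / Click the link above to view the full report.
"""


def render_summary(url: str, username: str, time_range: str, count: int) -> str:
    """Fill the summary template; a leading '@' on username is dropped to avoid '@@'."""
    return TEMPLATE.format(url=url, username=username.lstrip("@"),
                           time_range=time_range, count=count)


def write_summary(workspace: str, url: str, username: str,
                  time_range: str, count: int) -> Path:
    """Write feishu_link.md into the workspace and return its path."""
    path = Path(workspace) / "feishu_link.md"
    path.write_text(render_summary(url, username, time_range, count), encoding="utf-8")
    return path
```

The returned path is what would then be passed in the attachments list of deliver_attachments.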
Phase 6 — Cleanup
- Close the debug Chrome:

    pkill -f "chrome-debug-profile"

- The tweets_raw.json, translations.json, report.md, and feishu_link.md files in the workspace are intermediate artifacts. Keep them for traceability.
Key Pitfalls
| Problem | Cause | Fix |
|---------|-------|-----|
| CDP connection refused | Chrome not running with --remote-debugging-port | Re-launch Chrome per Phase 1 |
| "Not logged in" in script output | Temp profile missing cookies | Ask user to log in via the debug Chrome window |
| Full profile copy hangs | Chrome profile is 10–50 GB | Only copy the files listed in Phase 1 step 2 |
| xcancel.com pagination blocked | Anti-bot verification | Never use xcancel/Nitter — always use x.com via CDP |
| lark-cli auth expired | Token TTL | Re-run — the CLI auto-refreshes |
Bundle Contents
- scripts/scrape_tweets.js: Playwright CDP scraper for x.com timelines
- scripts/format_for_feishu.py: Generates bilingual markdown from raw tweets
- references/feishu_format.md: Document structure and Feishu CLI reference