Web Collection
Use this skill for browser-extension collection tasks on:
- Douyin
- TikTok
- Xiaohongshu
- Amazon
- Bilibili
Online Documentation Routing
The online documents are the source of truth for user-facing guidance. Do not duplicate their long-form onboarding, paid-access, UI operation, or QA content in this skill. When a user asks for guidance, read the relevant online document if tools can access it; if not, give the user the link and say the online document should be used as the current guide.
- Quick start / first-time use / UI operation guide:
- https://vcn5grhrq8y0.feishu.cn/wiki/MoXrwwUN7iiUFkk8eWycs9JqntA?from=from_copylink
- Use when the user asks how to use the plugin, how to start, how to configure it for the first time, where to click, how paid access works, or needs step-by-step human instructions.
- QA / troubleshooting guide:
- https://vcn5grhrq8y0.feishu.cn/wiki/F83hw2w6Xi7EOFkIScrccrQYnnd?from=from_copylink
- Use when the user asks about errors, failed export, missing data, connector/cloud/local issues, access problems, supported scenarios, or other common questions.
Decision rule:
- If the user wants to learn, install, buy, activate, configure for the first time, or understand the UI, do not start collection. Send the quick-start link first and answer from that document when accessible.
- If the user reports a problem, asks whether a behavior is normal, or asks how to recover a failed run/export, send the QA link first and answer from that document when accessible.
- If the user asks to collect data now, continue with the execution contract below.
For collection execution, keep using this SKILL.md and the bundled scripts as the agent contract. For complex or ambiguous execution requests, read references/learning-guide.md as an offline routing and recovery summary after checking whether the QA guide applies.
Core Rules
- Use the user's normal Chrome environment, not the isolated
openclawbrowser profile. - Prefer the cloud connector flow over local connector mode unless the user explicitly asks for local mode or cloud credentials are unavailable.
- Never ask for configuration that is already present in environment variables.
- Local and cloud use the same recommended defaults and overall collection flow.
- Cloud adds only two extra required values:
idandtoken. - Default to synchronous closed-loop execution.
- Do not reply before the collection script finishes.
- Choose one execution mode first:
local: talk to the local bridge directly and only run the local send-command scriptcloud: call the cloud connector dispatch API and only run the cloud send-command script
- In
cloudmode, do not rewrite the collection payload. Only wrap it in:device_idactionpayload
- For personal bitable export, do not send
personalSmartfrom this skill. Sendpersonalplusdeduplication.enabled=true; the new plugin switches to its smart personal export internally.
First-Time Setup
This skill uses one preferences file:
$OPENCLAW_STATE_DIR/skill-state/web-collection/preferences.json
Fallback:
$HOME/.openclaw/skill-state/web-collection/preferences.json
Helper script:
bash {baseDir}/scripts/export_preference.sh show
bash {baseDir}/scripts/export_preference.sh check
bash {baseDir}/scripts/export_preference.sh apply-recommended
bash {baseDir}/scripts/export_preference.sh set-key defaultConnectionMode cloud
bash {baseDir}/scripts/export_preference.sh set-key defaultCloudDeviceId desktop-local-smoke-fix
bash {baseDir}/scripts/export_preference.sh set-key defaultCloudToken <user_api_key>
bash {baseDir}/scripts/export_preference.sh set-key defaultExportMode csv
bash {baseDir}/scripts/export_preference.sh set-key defaultDeduplicationEnabled true
bash {baseDir}/scripts/export_preference.sh set-key defaultDeduplicationStrategy keepOld
Required defaults:
defaultExportModedefaultMaxItemsdefaultFetchDetaildefaultDetailSpeed
Optional defaults with built-in fallback:
defaultDeduplicationEnableddefaults totruedefaultDeduplicationStrategydefaults tokeepOld
Mode-specific defaults:
local- no extra required key if the default local bridge URL works
cloud- cloud base URL is fixed to
https://i-sync.cnby default defaultCloudDeviceIdonly when it is not already provided by environment variablesdefaultCloudTokenonly when it is not already provided by environment variables
- cloud base URL is fixed to
run.sh enforces this. If these are incomplete, collection must not start.
First-run flow
On first use:
- Determine the execution mode first. Default to
cloudwhen the user does not specify a mode. - If the mode is
cloud, collect these values only when they are not already available from environment variables or stored preferences:defaultCloudDeviceIddefaultCloudToken
- Then handle the common defaults:
- 导出方式
- 默认采集条数
- 是否默认采集详情
- 默认采集速度
- 是否开启导出去重
- 去重保留策略
- Ask only one question for the common defaults:
推荐配置自己配置
- If the user chooses
推荐配置, run:
bash {baseDir}/scripts/export_preference.sh apply-recommended
- If the user chooses
自己配置, ask for all common values in one message, not one by one. - Only continue when
checkpasses.
Preferred cloud prompt:
检测到你要走云端分发,还需要这两个配置:
- device_id
- API token
请一次性发给我。
Preferred quick-reply prompt for common defaults:
常用配置还需要确认一次。
这些配置包括:
- 导出方式
- 默认采集条数
- 是否默认采集详情
- 默认采集速度
- 是否开启导出去重
- 去重保留策略
你可以直接用推荐配置,也可以自己配置。
[[quick_replies: 推荐配置, 自己配置]]
Preferred custom-config prompt:
好,我们一次性把默认配置定好。请直接按下面格式回复:
导出方式:CSV / 多维表格
默认采集条数:10 / 20 / 50 / 100
是否默认采集详情:是 / 否
默认采集速度:fast / medium / slow
是否开启导出去重:是 / 否
去重保留策略:保留原始数据 / 保留新数据
说明:
- 多维表格:适合查看、筛选、分享
- CSV:适合本地保存
- 采集详情:开启后结果更完整,但一般更慢
- 采集速度:推荐 fast
- 导出去重:推荐开启
- 保留原始数据:重复数据保留第一次导入版本
Recommended defaults:
- 运行位置:
cloud - 导出方式:
多维表格 - 采集条数:
20 - 采集详情:
true - 采集速度:
fast - 导出去重:
true - 去重保留策略:
keepOld(保留原始数据)
Cloud Mode
Use cloud mode when the collection request should be sent to the platform backend first, and then dispatched to the user's connected local connector.
Cloud responsibilities:
- call
/api/v1/connector/cloud/dispatch - authenticate with
Authorization: Bearer <user_api_key> - include
device_id - keep the collection body unchanged inside
payload - enforce a strict cloud payload template before dispatch to avoid missing fields
- default fallback when missing:
maxItems=20,mode=search,interval=300,fetchDetail=true,detailSpeed=fast
- default fallback when missing:
- poll
/api/v1/connector/cloud/commands/{command_id}for final status and result - if single-command query is unavailable, fallback to
/api/v1/connector/cloud/commands?device_id=... - treat
result+task_updatesas the source of completion snapshot
Do not:
- call the user's local
19820port from the cloud path - rewrite
payloadsemantics - mix local admin token logic into cloud requests
Connector Command Ladder
When collection fails, parameters look incomplete, or status is unclear, run connector checks in this order instead of guessing.
Layer 1: capability
GET /api/helpGET /api/routesGET /api/filters(or platform/method scoped)
Layer 2: diagnostics
GET /api/statusGET /api/platform-stateGET /api/cloud/statusPOST /api/preflightwith the final request body
Layer 3: execution and tracking
POST /api/collectGET /api/tasks/:id(local mode)GET /api/v1/connector/cloud/commands/{command_id}(cloud mode, preferred)GET /api/v1/connector/cloud/commands?device_id=...(cloud fallback)POST /api/stoporPOST /api/resetwhen stuck
Local command template (admin token required):
TOKEN="$(cat ~/.meixi-connector/bridge-admin-token.txt)"
curl -s -H "x-connector-admin-token: $TOKEN" "http://127.0.0.1:19820/api/status"
Cloud command template (async result):
curl -s -H "Authorization: Bearer <token_or_api_key>" \
"https://i-sync.cn/api/v1/connector/cloud/commands/<command_id>"
Export Behavior
bitable- run with
--export-target bitable - expect
export.tableUrlon success - keep the public connector mode as
personal - default payload includes
export.deduplication.enabled=trueandstrategy=keepOld - this intentionally triggers the new plugin's internal smart personal export path
- run with
csv- run with
--export-target csv - do not require a table link in the final reply
- do not attach deduplication settings
- run with
Deduplication fields are owned by the connector/plugin. Do not ask the user to choose a field and do not pass a field from this skill.
Smart Export Conversation Rule
For bitable export, this skill must treat personal export as a smart-export conversation, not as a plain success-or-fail step.
If collection succeeded but export failed:
- do not present the result as a full task failure first
- explicitly say collection completed and the failure happened while exporting to the original bitable table
- explain the likely causes in user-facing language
- then ask the user to choose one recovery path:
新建表继续导出导出为CSV
Recommended failure explanation:
采集已经完成,失败发生在导出到原多维表格这一步。
原表导出失败通常有几种原因:
- 原目标表不存在
- 原目标表字段结构与本次数据不一致
- 导出服务临时异常
为了避免这次结果丢失,我现在可以继续帮你导出。你可以选择:
- 新建表继续导出:在当前多维表格中创建一个新的数据表后继续导出
- 导出为CSV:直接把这次结果保存成 CSV 文件
[[quick_replies: 新建表继续导出, 导出为CSV]]
If the user chooses 导出为CSV, switch the follow-up run to CSV export.
If the user chooses 新建表继续导出 or 导出为CSV after a failed export and the previous result includes a connector taskId, do not start a new collection. Re-export from the cached task records:
bash {baseDir}/scripts/reexport_task.sh --task-id "<taskId>" --export-target csv
bash {baseDir}/scripts/reexport_task.sh --task-id "<taskId>" --export-target bitable --new-table
This uses POST /api/tasks/:id/export and must be described as reusing the previous task's cached records. If no task id is available, be honest that the skill cannot guarantee a no-recollect retry.
Expected bitable connector payload shape:
{
"autoExport": true,
"exportMode": "personal",
"export": {
"enabled": true,
"mode": "personal",
"deduplication": {
"enabled": true,
"strategy": "keepOld"
}
}
}
Do not use this shape:
{
"exportMode": "personalSmart"
}
Entry Point
Preferred wrapper:
bash {baseDir}/scripts/run.sh ...
The wrapper:
- runs
scripts/preflight_check.shfirst - applies stored preferences
- enforces required setup
- defaults to cloud dispatch mode when no mode is specified
- cloud mode dispatches only through
scripts/cloud_dispatch_loop.sh - local mode dispatches only through
scripts/collect_and_export_loop.sh - never mixes the local and cloud send-command scripts
Bundled Resources
scripts/preflight_check.sh- validates required defaults before dispatch
- treats environment-variable configuration as already satisfied and never asks for it
scripts/run.sh- chooses exactly one dispatch path based on
connection-mode - never mixes local and cloud send-command scripts
- chooses exactly one dispatch path based on
scripts/collect_and_export_loop.sh- local-only send-command script
scripts/cloud_dispatch_loop.sh- cloud-only send-command script
scripts/reexport_task.sh- re-exports cached records from a previous connector task id
- never starts a new collection
scripts/export_preference.sh- stores reusable defaults and masks cloud token in human-readable output
references/learning-guide.md- compact guidance for complex requests and asking rules
Common Commands
Douyin keyword search:
bash {baseDir}/scripts/run.sh \
--platform douyin \
--keyword "AI" \
--ensure-bridge
Douyin keyword search via cloud dispatch:
bash {baseDir}/scripts/run.sh \
--connection-mode cloud \
--cloud-device-id desktop-local-smoke-fix \
--cloud-token '<user_api_key>' \
--platform douyin \
--keyword "AI员工"
Amazon keyword search:
bash {baseDir}/scripts/run.sh \
--platform amazon \
--keyword "Chinese porcelain" \
--ensure-bridge
Bilibili keyword search:
bash {baseDir}/scripts/run.sh \
--platform bilibili \
--keyword "古董" \
--ensure-bridge
Platform Defaults
Wrapper defaults:
douyin=>videoKeywordtiktok=>keywordSearchxiaohongshu=>keywordSearchamazon=>keywordSearchbilibili=>keywordSearch
Supported methods:
douyin:videoKeyword,creatorKeyword,creatorLink,creatorVideo,videoComment,videoInfo,videoLinktiktok:keywordSearch,userVideo,tiktokComment,tiktokCreatorKeyword,tiktokCreatorLinkxiaohongshu:keywordSearch,creatorNote,creatorLink,creatorKeyword,noteLink,noteCommentamazon:keywordSearch,productLink,productReviewbilibili:keywordSearch,videoInfo,creatorVideo,bilibiliComment
Closed Loop
local mode:
- verify
pluginConnected=true - wait for idle state
- start
/api/collect - handle
TASK_RUNNINGviastop -> wait idle -> retry - poll
/api/tasks/<taskId>untilcompletedorerror - if export is required, verify the expected export result
cloud mode:
- query
/api/v1/connector/cloud/status?device_id=... - dispatch
action=collectto/api/v1/connector/cloud/dispatch - keep querying command result (
/api/v1/connector/cloud/commands/{command_id}preferred) - after each poll, refresh current collection state from command status
- wait for
completedor terminal error state - on completion, read
resultandtask_updatesfor records/count/export snapshot and include key fields in the final reply
Quick query examples:
curl -H "Authorization: Bearer <token_or_api_key>" \
"https://i-sync.cn/api/v1/connector/cloud/commands?device_id=<device_id>"
curl -H "Authorization: Bearer <token_or_api_key>" \
"https://i-sync.cn/api/v1/connector/cloud/commands/<command_id>"
Final Reply
When successful:
- Mention whether the run used
localorcloudmode. - If
cloudmode was used, include the command status. - If export mode is
bitableandexport.tableUrlexists, include the table link first. - If export mode is
csv, explicitly say export mode is CSV. - Then include:
- status
- export status
- collected count
- short analysis
When bitable export is expected but no table link exists, explicitly say export did not finish correctly.
Troubleshooting
pluginConnected=false- Chrome/plugin is not connected to the bridge
- bridge/status mismatch
- in
localmode, ensure collect, status, and stop all use the same local base URL
- in
- cloud dispatch auth failed
- check
defaultCloudToken
- check
- cloud dispatch could not reach device
- check
defaultCloudDeviceIdand whether the local connector is online
- check
TASK_RUNNING- use stop + retry, or
--force-stop-before-start
- use stop + retry, or
- long record output hiding key fields
- trust the connector loop's compact summary output rather than raw task JSON
微信扫一扫