Category: Data & Analysis. No API key required.

roadshow-capture-skill

Automates capturing roadshow and investor presentations as PDFs. Driven by Playwright; supports NetRoadShow and DealRoadShow.

Author: nikker1974 (clawhub)

RoadShow Capture

Automatically opens roadshow links, handles authentication, captures every page as a screenshot, and compiles them into a single PDF. Supports NetRoadShow and DealRoadShow.

Principle: Use Playwright directly, NOT Camofox. Camofox has serious limitations on roadshow platforms (hash routing encoding, popup blocking, reCAPTCHA, Angular sandbox). Playwright handles everything Camofox can't.


Unified Entry Point

scripts/roadshow-capture.py auto-routes by URL domain:

# Auto-detects platform — no need to pick the right script
python roadshow-capture.py --url "https://www.netroadshow.com/nrs/home/#!/?show=SHOW_ID"
python roadshow-capture.py --url "https://dealroadshow.com/e/MTNA2026"

Supported: netroadshow.com → NetRoadShow flow, dealroadshow.com → DealRoadShow flow.
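
The domain-based routing described above could look roughly like the following. This is a minimal sketch, not the actual contents of roadshow-capture.py: the function name `pick_script` and the `PLATFORM_SCRIPTS` mapping are hypothetical, and only the two script names and domains come from this document.

```python
from urllib.parse import urlparse

# Domain -> per-platform script, using the script names given in this document.
PLATFORM_SCRIPTS = {
    "netroadshow.com": "netroadshow-capture.py",
    "dealroadshow.com": "dealroadshow-capture.py",
}


def pick_script(url: str) -> str:
    """Return the capture script for a roadshow URL, matching on the host."""
    host = (urlparse(url).hostname or "").lower()
    for domain, script in PLATFORM_SCRIPTS.items():
        # Accept the bare domain and any subdomain such as "www.".
        if host == domain or host.endswith("." + domain):
            return script
    raise ValueError(f"Unsupported roadshow host: {host!r}")
```

Matching on the parsed hostname rather than on a substring of the whole URL keeps a link such as `https://example.com/?next=dealroadshow.com` from being routed by accident.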


First-Time Setup (Email Configuration)

  1. Tell the agent your roadshow email
  2. It writes to scripts/.env:
    NRS_EMAIL=your-email@your-company.com
    
  3. No further prompts needed

If NRS_EMAIL is unset, the agent will prompt once and save it to .env. No need to set environment variables manually.
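
One way the "read once, save to .env" behaviour might be implemented is sketched below. The helper names `load_email` and `save_email` are hypothetical, and letting the process environment take precedence over the file is an assumption; the document only states that NRS_EMAIL ends up in scripts/.env.

```python
import os
from pathlib import Path
from typing import Optional


def load_email(env_path: Path) -> Optional[str]:
    """Return NRS_EMAIL from the environment, else from the .env file."""
    value = os.environ.get("NRS_EMAIL")  # assumption: environment wins over file
    if value:
        return value
    if env_path.exists():
        for line in env_path.read_text(encoding="utf-8").splitlines():
            key, sep, val = line.strip().partition("=")
            if sep and key.strip() == "NRS_EMAIL":
                # Tolerate optional quotes around the value.
                return val.strip().strip("\"'") or None
    return None


def save_email(env_path: Path, email: str) -> None:
    """Write NRS_EMAIL to the .env file, replacing any existing entry."""
    kept = []
    if env_path.exists():
        kept = [
            line
            for line in env_path.read_text(encoding="utf-8").splitlines()
            if not line.strip().startswith("NRS_EMAIL=")
        ]
    kept.append(f"NRS_EMAIL={email}")
    env_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
```

A caller would try `load_email(Path("scripts/.env"))` first and only prompt, then call `save_email`, when it returns `None`.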


NetRoadShow Workflow

Script: scripts/netroadshow-capture.py
URL format: https://www.netroadshow.com/nrs/home/#!/?show=SHOW_ID

Verified Flow (KODIT Audio Roadshow Plus, 32 slides, 2026-05-12)

1. browser.goto(show_url)
   → Angular SPA routing works natively, no hash fix needed
   → URL auto-rewrites to /nrs/home/?show=SHOW_ID

2. page.locator("#homeEmailInput").first.fill(email)
   with page.expect_popup() as info:
       page.get_by_text("Launch Show").click()
   pp = info.value  ← MUST use expect_popup(), NOT page.on("popup")

3. Popup → /presentation/v2/{id}/disclaimer/
   Agree button is <div class="disclaimer-btn btn-agree">, NOT <button>
   ✅ pp.evaluate('document.querySelector(".btn-agree").click()')
   ❌ pp.get_by_text("Agree").click()  ← doesn't trigger the handler

4. "Resume previous session" / "Start from beginning" prompt
   Always pick "Start from beginning" (hard rule, no user prompt)

5. URL → /presentation/v2/{id}/MediaSlides

6. ArrowRight × (N-1) → screenshot → Pillow compose PDF (must use format='PDF')
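
Steps 2 and 3 of the flow above can be sketched as a single function. This is an illustration under assumptions rather than the code in netroadshow-capture.py: the function name is hypothetical, `page` is expected to be a Playwright sync-API Page already on the show URL, and the `wait_for_selector` call is an added safeguard that the document does not mention. Step 4 is left out because the document gives no selector for the "Start from beginning" control.

```python
def open_netroadshow_viewer(page, email):
    """Fill the email, launch the show in a popup, and accept the disclaimer.

    Returns the popup page, which is where the slides are shown.
    """
    page.locator("#homeEmailInput").first.fill(email)

    # expect_popup() must wrap the click so the new window is not missed.
    with page.expect_popup() as popup_info:
        page.get_by_text("Launch Show").click()
    popup = popup_info.value

    popup.wait_for_load_state("domcontentloaded")
    # Added safeguard (assumption): make sure the disclaimer has rendered.
    popup.wait_for_selector(".btn-agree")
    # The Agree control is a <div>, so click it through the DOM.
    popup.evaluate('document.querySelector(".btn-agree").click()')
    return popup
```

All later steps (the resume prompt, ArrowRight navigation, screenshots) operate on the returned popup, not on the original page.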

URL State Machine

/home/#!/?show=SHOW_ID    →  Email input page
   ↓ Fill email → Launch Show (expect_popup)
/home/?show=SHOW_ID        →  Popup opens
   ↓ Popup
/presentation/v2/{id}/disclaimer/  →  Legal disclaimer (div button)
   ↓ Agree (JS evaluate)
/presentation/v2/{id}/disclaimer/  →  Resume / Start from beginning
   ↓ Start from beginning (always)
/presentation/v2/{id}/MediaSlides  →  Slide viewer
   ↓ ArrowRight × capture
   PDF compose
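
For logging or recovery it can help to map a URL onto the states above. The sketch below is one possible classifier; the function name and the state labels are hypothetical. Note that the disclaimer and the resume prompt share a URL, so the URL alone cannot tell them apart.

```python
import re
from urllib.parse import urlparse


def netroadshow_state(url: str) -> str:
    """Classify a NetRoadShow URL as 'home', 'disclaimer', 'viewer' or 'unknown'."""
    path = urlparse(url).path
    if re.search(r"/presentation/v2/[^/]+/MediaSlides", path):
        return "viewer"
    if re.search(r"/presentation/v2/[^/]+/disclaimer", path):
        # Also covers the "Resume / Start from beginning" prompt.
        return "disclaimer"
    if path.rstrip("/").endswith("/home"):
        # Matches both the #!/?show= form and the rewritten ?show= form,
        # because the hash fragment is not part of the path.
        return "home"
    return "unknown"
```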

NetRoadShow-Specific Pitfalls

| Pitfall | Solution |
|---------|----------|
| Popup/new window | expect_popup() blocking wait |
| div button (not button) | pp.evaluate('document.querySelector(".btn-agree").click()') |
| Hash routing #!/?show=X | Playwright handles natively |
| Session taint after failure | Fresh browser context: homepage → fake entry code → Continue → recover |
| Password login blocked | Email-Only flow (corporate email domain verified) |


DealRoadShow Workflow

Script: scripts/dealroadshow-capture.py
URL format: https://dealroadshow.com/e/XXXXXXXX

Verified Flow (ArcelorMittal USD Bond, 32 slides, 2026-05-13)

1. browser.goto(deal_url)
   → dealroadshow.com 302 redirects to finsight.com/login/investor/e/XXXX
   → May show Cookie banner → page.get_by_role("button", name="Ok").click()

2. Fill email + Launch Deal Roadshow
   page.locator("input[type='email']").first.fill(email)
   page.get_by_text("Launch Deal Roadshow").click()
   → ~3-4s later form submits, navigates from /login/ to /e/XXXX (disclaimer)
   ⚠ During these 3-4s, DO NOT call page.evaluate() (context gets destroyed)

3. I Agree — MUST use Playwright locator, NOT evaluate
   ✅ page.locator("button:has-text('I Agree')").first.click(timeout=10000)
   ❌ page.evaluate('...click()')  ← clicks but doesn't trigger navigation

4. Wait for slides to load (fixed sleep, NOT networkidle)
   Current page: input[data-test="currentSlideInput"] value attribute
   Total pages: "of N" in body_text (r'\bof\s*(\d{1,4})\b')

5. ArrowRight × screenshot → poll input value for increment → no change = last page
   Pillow compose PDF (must use format='PDF')
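
Steps 1 to 3 of the flow above might be written as follows. This is a hedged sketch, not the code in dealroadshow-capture.py: the function name and the `settle_seconds` parameter (default 5 s, chosen to cover the 3-4 s delay) are assumptions, and `page` is expected to be a Playwright sync-API Page that has already navigated to the deal URL.

```python
import time


def open_dealroadshow_viewer(page, email, settle_seconds=5.0):
    """Dismiss the cookie banner, submit the email, and accept the disclaimer."""
    # The cookie banner is optional, so only click "Ok" when it is present.
    ok_button = page.get_by_role("button", name="Ok")
    if ok_button.count() > 0:
        ok_button.first.click()

    page.locator("input[type='email']").first.fill(email)
    page.get_by_text("Launch Deal Roadshow").click()

    # The form submits a few seconds later and the context is replaced,
    # so sleep through that window without calling page.evaluate().
    time.sleep(settle_seconds)
    page.wait_for_load_state("domcontentloaded")

    # A real locator click is needed here; a JS click does not navigate.
    page.locator("button:has-text('I Agree')").first.click(timeout=10000)

    # Fixed sleep instead of networkidle while the slide viewer loads.
    time.sleep(settle_seconds)
```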

DealRoadShow Key Points

  • No popup: All navigation stays in the same page, no expect_popup() needed
  • I Agree button: Must use Playwright locator .click(), NOT page.evaluate() (JS click doesn't trigger navigation)
  • Page detection: Read current slide from input[data-test="currentSlideInput"] value; extract total from "of N" text
  • URL format varies: Sometimes /e/XXXX/1 (with page number), sometimes /e/XXXX (without) — can't rely on URL for end-of-slides detection
  • End detection: Poll the input value after each ArrowRight — if it doesn't increment, you're on the last page
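
The page-detection rules above can be sketched as two small helpers. The helper names are hypothetical; the selector and the regular expression are the ones quoted in this document. Because the pattern takes the first "of N" it finds, unrelated text such as "as of 2026" earlier in the page would be picked up instead, so treat the total as a hint rather than a guarantee.

```python
import re

# Pattern quoted in this document for the "of N" slide counter.
TOTAL_RE = re.compile(r"\bof\s*(\d{1,4})\b")


def parse_total_slides(body_text):
    """Return N from the first 'of N' in the page text, or None if absent."""
    match = TOTAL_RE.search(body_text)
    return int(match.group(1)) if match else None


def read_slide_position(page):
    """Return (current, total) for a Playwright sync-API Page on the viewer."""
    current = page.locator('input[data-test="currentSlideInput"]').input_value()
    total = parse_total_slides(page.inner_text("body"))
    return int(current), total
```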

Platform Comparison

| Dimension | NetRoadShow | DealRoadShow |
|-----------|-------------|--------------|
| Popup | expect_popup() new window | Same-page navigation |
| Disclaimer button | <div>, JS evaluate | Standard <button>, locator click (evaluate doesn't trigger nav) |
| URL format | SPA hash routing | Inconsistent: sometimes /e/XXXX/1, sometimes /e/XXXX |
| URL changes after nav | No (SPA) | Maybe, maybe not |
| Page detection | Slide count text | input[data-test="currentSlideInput"] value |
| Total page detection | "All N Slides" text | "of N" text (r'\bof\s*(\d{1,4})\b') |
| Cookie banner | None | Possible, "Ok" button |
| Wait strategy | Fixed sleep + networkidle | Fixed sleep + domcontentloaded (networkidle hangs on WebSocket) |
| Post-launch navigation | Instant (expect_popup blocks) | Delayed 3-4s (POST form submit), don't evaluate during this period |


Common Pitfalls

Pillow PDF Requires format='PDF'

# ✅ MUST specify format='PDF'
imgs[0].save(path, save_all=True, append_images=imgs[1:], format='PDF', resolution=150)

Playwright headless flag

Both platforms need --no-sandbox:

browser = p.chromium.launch(headless=True, args=["--no-sandbox"])

Headless trust

Mac Safari UA + Playwright headless Chromium works on both platforms.
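
A launch helper combining the `--no-sandbox` flag with a Safari User-Agent might look like this. It is a sketch under assumptions: the exact UA string is illustrative (the document does not give the one the scripts use), the 1920×1080 viewport is inferred from the screenshot size listed in the Notes, and the function name is hypothetical. The `playwright` argument is the object yielded by `sync_playwright()`.

```python
# Assumed UA string; any recent macOS Safari UA should serve the same purpose.
SAFARI_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.0 Safari/605.1.15"
)
# Assumed viewport, matching the 1920x1080 screenshot size in the Notes.
VIEWPORT = {"width": 1920, "height": 1080}


def open_context(playwright):
    """Launch headless Chromium and return (browser, context)."""
    browser = playwright.chromium.launch(headless=True, args=["--no-sandbox"])
    context = browser.new_context(user_agent=SAFARI_UA, viewport=VIEWPORT)
    return browser, context
```

Typical use would be `with sync_playwright() as p: browser, context = open_context(p)`, followed by `context.new_page()`. Creating a fresh context per run also fits the "fresh browser context" recovery advice in the NetRoadShow pitfalls.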

Shared email config

Both platforms share the NRS_EMAIL environment variable. No separate config needed.

Wait strategy (DealRoadShow only)

  • Don't use wait_for_url: the page navigates with pushState, so wait_for_url and wait_for_function miss the events
  • Don't use networkidle — audio player has WebSocket persistent connections that hang
  • Use fixed sleep + domcontentloaded instead

Page detection (DealRoadShow only)

import time

# Read current page (works regardless of URL format)
cur = page.evaluate('document.querySelector("[data-test=currentSlideInput]").value')
# Navigate to the next slide
page.keyboard.press("ArrowRight")
time.sleep(1)
# Poll for up to ~5 s until the page number increments;
# if it never does, the current slide is the last one
for _ in range(10):
    new_cur = page.evaluate('document.querySelector("[data-test=currentSlideInput]").value')
    if int(new_cur) > int(cur):
        break
    time.sleep(0.5)
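
Wrapped into a full capture loop, the same polling idea might look like the sketch below. The function names, the `slide_0001.png` file naming, and the `max_slides` safety cap are hypothetical; the selector, the ArrowRight key, and the "no increment means last page" rule come from this document.

```python
import time
from pathlib import Path

SLIDE_INPUT_JS = 'document.querySelector("[data-test=currentSlideInput]").value'


def advance(page, polls=10, interval=0.5):
    """Press ArrowRight; return True if the slide number went up."""
    before = int(page.evaluate(SLIDE_INPUT_JS))
    page.keyboard.press("ArrowRight")
    for _ in range(polls):
        time.sleep(interval)
        if int(page.evaluate(SLIDE_INPUT_JS)) > before:
            return True
    return False  # no increment: we were already on the last slide


def capture_slides(page, out_dir, max_slides=1000, polls=10, interval=0.5):
    """Screenshot every slide and return the list of PNG paths in order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(1, max_slides + 1):  # cap guards against endless loops
        path = out / f"slide_{index:04d}.png"
        page.screenshot(path=str(path))
        paths.append(path)
        if not advance(page, polls, interval):
            break
    return paths
```

Zero-padded file names keep the slides in order when sorted lexically, which matters when the PNGs are later handed to Pillow for PDF assembly.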

Notes

  • User-Agent set to macOS Safari for better server-side trust
  • Navigation uses page.keyboard.press("ArrowRight") for both platforms
  • Screenshot size: 1920×1080, roughly 350 KB to 900 KB each
  • Output PDF: ~4-5 MB (32 slides)
  • Reference: references/netroadshow-practice.md (operational notes with Mermaid flowcharts)
  • GitHub: https://github.com/nikker1974/roadshow-capture-skill (Hermes + OpenClaw compatible)