返回 Skill 列表
extension
分类: 内容与媒体无需 API Key

agent-eyes

为AI代理提供视觉上下文分析。提供网页的屏幕截图、可访问性扫描、DOM快照以及元素描述。当您需要查看网页外观、分析可访问性问题、检查DOM结构或获取详细的元素信息时使用。在收到如“截个图”、“检查可访问性”、“这个页面看起来怎么样”、“分析用户界面”、“检查此元素”或任何视觉/用户界面分析任务请求时触发。

person作者: jakexiaohubgithub

Agent Eyes

Visual context analyzer for web pages. Provides AI agents with the ability to "see" web applications through screenshots, accessibility scans, DOM snapshots, and element descriptions.

Prerequisites

  • Python 3.10+
  • uv package manager (recommended)
  • Playwright browsers installed: playwright install chromium

Compact Mode (Token-Efficient Output)

All commands support --compact / -c flag for token-efficient output:

| Mode | Screenshot | DOM | A11y | Total Tokens | |------|------------|-----|------|--------------| | Standard | Base64 inline | depth=5, 20 children | Full violations | ~500K+ | | Compact | File path only | depth=3, 10 children | Summary only | ~3-5K |

Use compact mode when context window size is a concern (which is most of the time).

# Compact context - reduces ~500K tokens to ~3-5K tokens
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --compact

# Compact screenshot - always saves to file, never returns base64
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --compact

# Compact a11y - returns summary + top N issues only
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --compact

# Compact DOM - stricter limits on depth and children
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --compact

Commands

All commands use uv run for automatic dependency management:

SKILL_DIR=".claude/skills/agent-eyes/scripts"

Screenshot

Capture full page or element screenshots:

# Full page screenshot (saves to .canvas/screenshots/)
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000

# Element screenshot
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --selector ".hero"

# Save to specific path
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --output ./tmp/page.png

# Get as base64 (for inline context) - NOT recommended, use --compact instead
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --base64

# RECOMMENDED: Compact mode - always saves to file, never returns base64
uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000 --compact

Accessibility Scan

Run axe-core accessibility analysis:

# Full page scan (WCAG 2.1 AA)
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000

# Scoped to element
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --selector "main"

# WCAG AAA level
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --level AAA

# RECOMMENDED: Compact mode - summary + top issues only (~1-2K tokens vs 100K+)
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --compact
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000 --compact --max-issues 5

DOM Snapshot

Get simplified DOM tree:

# Full page DOM
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000

# Subtree only
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --selector ".content"

# Control depth and children
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --depth 3 --max-children 10

# RECOMMENDED: Compact mode - depth=3, max-children=10, text=50 chars
uv run $SKILL_DIR/agent_eyes.py dom http://localhost:3000 --compact

Describe Element

Get detailed element information (styles, bounding box, attributes):

uv run $SKILL_DIR/agent_eyes.py describe http://localhost:3000 --selector ".hero-button"

Full Context

Get comprehensive context bundle (screenshot + a11y + DOM + description):

# Full context for page
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000

# Focused on element
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --selector ".hero"

# Without screenshot (smaller output)
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --no-screenshot

# RECOMMENDED: Compact mode - file paths only, limited DOM/a11y (~3-5K tokens)
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --compact

# Compact with custom limits
uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --compact \
  --dom-depth 2 --max-children 5 --max-issues 5

Output Format

All commands return JSON to stdout:

{
  "ok": true,
  "...": "command-specific fields"
}

On error:

{
  "ok": false,
  "error": "Error description"
}

Compact Mode Output Examples

Compact context output (~3-5K tokens instead of ~500K):

{
  "ok": true,
  "url": "http://localhost:3000",
  "title": "My App",
  "timestamp": "2026-01-22T10-30-00-000Z",
  "compact": true,
  "screenshot_path": ".canvas/screenshots/2026-01-22T10-30-00-000Z.png",
  "screenshot_size": 443281,
  "dom": {
    "tag": "body",
    "children": [...]
  },
  "a11y_summary": {
    "total_violations": 5,
    "by_severity": {"critical": 1, "serious": 2, "moderate": 2, "minor": 0},
    "top_issues": [
      {"id": "color-contrast", "impact": "serious", "affected_count": 3}
    ]
  }
}

Compact a11y output (~1-2K tokens instead of ~100K):

{
  "ok": true,
  "total_violations": 15,
  "by_severity": {"critical": 2, "serious": 5, "moderate": 6, "minor": 2},
  "by_category": {"color": 3, "aria": 5, "keyboard": 2},
  "top_issues": [
    {
      "id": "color-contrast",
      "impact": "serious",
      "description": "Elements must have sufficient color contrast...",
      "affected_count": 3,
      "help_url": "https://dequeuniversity.com/rules/axe/..."
    }
  ],
  "passes": 42,
  "incomplete": 3
}

Typical Agent Workflow

  1. Start dev server (if not running):

    npm run dev &
    
  2. Take initial screenshot to see current state:

    uv run $SKILL_DIR/agent_eyes.py screenshot http://localhost:3000
    
  3. Run accessibility scan to find issues:

    uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000
    
  4. Inspect specific element for details:

    uv run $SKILL_DIR/agent_eyes.py describe http://localhost:3000 --selector ".problematic-button"
    
  5. Get full context for comprehensive analysis:

    uv run $SKILL_DIR/agent_eyes.py context http://localhost:3000 --selector ".hero"
    

Example: Analyze and Fix A11y Issues

# 1. Get accessibility violations
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000

# Output shows violations like:
# {
#   "ok": true,
#   "violations": [
#     {
#       "id": "color-contrast",
#       "impact": "serious",
#       "description": "Elements must have sufficient color contrast",
#       "nodes": [{"html": "<button class='cta'>..."}]
#     }
#   ]
# }

# 2. Describe the element to understand current styles
uv run $SKILL_DIR/agent_eyes.py describe http://localhost:3000 --selector ".cta"

# 3. Make code changes to fix the contrast issue

# 4. Re-run a11y to verify fix
uv run $SKILL_DIR/agent_eyes.py a11y http://localhost:3000

Notes

  • Screenshots are saved to .canvas/screenshots/ by default with ISO timestamps
  • The tool runs headless Chromium via Playwright
  • All commands wait for networkidle before capturing
  • DOM snapshots are simplified to reduce output size
  • A11y scans use axe-core, the industry standard accessibility testing engine

Token Budget Guide

| Operation | Standard Mode | Compact Mode | |-----------|---------------|--------------| | Screenshot | ~100-470K tokens (base64) | ~50 tokens (path only) | | DOM Snapshot | ~50-150K tokens | ~2-3K tokens | | A11y Scan | ~50-100K tokens | ~500-1K tokens | | Full Context | ~500K+ tokens | ~3-5K tokens |

Recommendation: Always use --compact flag unless you specifically need base64 data for inline image processing. The compact mode reduces token usage by 99% while preserving all essential information.

When to Use Each Mode

| Mode | Use Case | |------|----------| | Standard | Debugging, when you need full HTML snippets, when feeding to vision model | | Compact | Most agent workflows, design reviews, accessibility audits, CI/CD pipelines |