Category: Other, no API key required

Image Reader

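Extracts text from images using OCR (optical character recognition). Use this skill when you need to read the text content of an image, screenshot, photo, or other picture.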

Author: rendaixue-bytehubclawhub

Image Reader - OCR Text Extraction

A high-performance OCR skill for extracting text from images. Powered by RapidOCR with PP-OCRv4 models, supporting Chinese and English text recognition.

Features

  • Multi-language: Chinese (simplified/traditional), English, and mixed text
  • High accuracy: PP-OCRv4 model with >95% accuracy on typical screenshots
  • Structured output: Text with confidence scores and bounding boxes
  • Image info: Dimensions, format, and color mode included
  • Fast: CPU-only, no GPU required

Quick Start

python scripts/read_image.py /path/to/image.jpg

Usage Examples

Extract text from a screenshot

python scripts/read_image.py screenshot.png
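
To call the script from another Python program, one option is to run it as a subprocess and parse what it prints. The sketch below assumes the script writes a single UTF-8 JSON document to standard output and nothing else; `run_ocr` and its `command` parameter are hypothetical names introduced here, not part of the skill.

```python
import json
import subprocess
import sys


def run_ocr(image_path, command=(sys.executable, "scripts/read_image.py")):
    """Run the OCR script on one image and return its parsed JSON output.

    Assumption: the script prints exactly one JSON document to stdout,
    encoded as UTF-8.
    """
    completed = subprocess.run(
        [*command, str(image_path)],
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

With the skill installed, `run_ocr("screenshot.png")["text"]` would then hold the full extracted text, provided the stdout assumption holds.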

JSON Output

The script outputs structured JSON:

{
  "success": true,
  "text": "Full extracted text",
  "lines": [
    {
      "text": "Individual line",
      "confidence": 0.98,
      "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    }
  ],
  "line_count": 5,
  "image_info": {
    "format": "PNG",
    "size": [1920, 1080],
    "mode": "RGB"
  }
}
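
As a rough illustration of consuming this structure, the sketch below drops low-confidence lines and converts a four-point `box` into an axis-aligned rectangle. The sample data, the 0.9 threshold, and the helper names `confident_lines` and `bounding_rect` are assumptions made for the example.

```python
import json

# Hypothetical sample in the shape shown above (values are made up).
sample = json.loads("""
{
  "success": true,
  "text": "Settings\\nSrttings",
  "lines": [
    {"text": "Settings", "confidence": 0.98,
     "box": [[10, 12], [110, 14], [109, 42], [9, 40]]},
    {"text": "Srttings", "confidence": 0.61,
     "box": [[10, 60], [110, 60], [110, 90], [10, 90]]}
  ],
  "line_count": 2,
  "image_info": {"format": "PNG", "size": [1920, 1080], "mode": "RGB"}
}
""")


def confident_lines(result, threshold=0.9):
    """Return only the lines whose confidence is at least `threshold`."""
    return [line for line in result["lines"] if line["confidence"] >= threshold]


def bounding_rect(box):
    """Convert a four-point box into (left, top, right, bottom)."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


for line in confident_lines(sample):
    print(line["text"], bounding_rect(line["box"]))
```

The four-point form can describe slightly rotated text, so taking the minimum and maximum of each coordinate gives the smallest upright rectangle that contains the line.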

Requirements

pip install rapidocr onnxruntime pillow

The first run downloads the OCR models (~50 MB) automatically.
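
Before running the script, it can help to confirm that the dependencies are importable. The following is a minimal sketch; the mapping from pip package name to import name is an assumption (Pillow is imported as `PIL`, and `rapidocr` is assumed to be importable under the same name), and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# pip package name -> top-level module name (assumed mapping).
REQUIRED_MODULES = {
    "rapidocr": "rapidocr",
    "onnxruntime": "onnxruntime",
    "pillow": "PIL",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All OCR dependencies are importable.")
```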

Common Use Cases

  • UI Screenshots: Extract text from app/website screenshots
  • Document Photos: Read text from photographed documents
  • Diagrams: Extract labels and annotations
  • Receipts: Parse receipt/invoice data
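
For receipts and similar layouts, the per-line bounding boxes can be used to rebuild rows, for example to pair an item name with the price printed on the same line. The sketch below is one possible approach, not part of the skill: it groups entries whose vertical centers are within a pixel tolerance and orders each row left to right. The function name `group_into_rows` and the default tolerance of 10 pixels are assumptions.

```python
def group_into_rows(lines, tolerance=10):
    """Group OCR lines into visual rows using their box coordinates.

    `lines` is a list of dicts with "text" and a four-point "box",
    as in the JSON output. Returns a list of rows, each a list of
    texts ordered left to right.
    """
    def center_y(line):
        ys = [point[1] for point in line["box"]]
        return sum(ys) / len(ys)

    def left_x(line):
        return min(point[0] for point in line["box"])

    rows = []
    for line in sorted(lines, key=center_y):
        # Join the current row if this entry sits at about the same height
        # as the most recently added entry; otherwise start a new row.
        if rows and abs(center_y(line) - center_y(rows[-1][-1])) <= tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return [[item["text"] for item in sorted(row, key=left_x)] for row in rows]
```

A fixed pixel tolerance is a simplification; for photos with varying text sizes, a tolerance derived from the box height may work better.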

Output Fields

| Field | Type | Description |
|-------|------|-------------|
| success | bool | Whether OCR succeeded |
| text | string | All extracted text |
| lines | array | Individual text lines with metadata |
| line_count | int | Number of text lines detected |
| image_info | object | Image metadata |
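
The fields above map naturally onto typed records. The sketch below shows one way to load the output into dataclasses; the class names `OcrLine` and `OcrResult` are hypothetical, and the defaults used when a field is absent are an assumption, since the source does not document which fields are present when `success` is false.

```python
from dataclasses import dataclass, field


@dataclass
class OcrLine:
    text: str
    confidence: float
    box: list  # four [x, y] corner points


@dataclass
class OcrResult:
    success: bool
    text: str = ""
    lines: list = field(default_factory=list)
    line_count: int = 0
    image_info: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        """Build an OcrResult from the parsed JSON output."""
        lines = [
            OcrLine(item["text"], float(item["confidence"]), item["box"])
            for item in data.get("lines", [])
        ]
        return cls(
            success=bool(data["success"]),
            text=data.get("text", ""),
            lines=lines,
            # Assumed fallback: count the parsed lines if the field is absent.
            line_count=data.get("line_count", len(lines)),
            image_info=data.get("image_info", {}),
        )
```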

Technical Details

  • Engine: RapidOCR (ONNX Runtime backend)
  • Models: PP-OCRv4 (detection + recognition)
  • Languages: Chinese, English (auto-detected)
  • Performance: ~1-2 seconds per image on CPU

License

MIT License

Third-party dependencies:

  • RapidOCR - Apache 2.0 License
  • ONNX Runtime - MIT License