Zhipu Web Page Reader

Fetch and parse web page content via Zhipu AI's Reader API (/paas/v4/reader), using lightweight cURL. Returns parsed page content in Markdown or plain text format, along with metadata like title and description.

Quick Start

Basic cURL Usage

curl --request POST \
  --url https://open.bigmodel.cn/api/paas/v4/reader \
  --header "Authorization: Bearer $ZHIPU_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://www.example.com"
  }'

Script Usage

A wrapper shell script is provided for convenience.

# Basic fetch (returns Markdown by default)
bash scripts/zhipu_fetch.sh --url "https://www.example.com"

# Fetch as plain text, no cache
bash scripts/zhipu_fetch.sh \
  --url "https://docs.python.org/3/" \
  --format text \
  --no-cache

# Fetch with image and link summaries
bash scripts/zhipu_fetch.sh \
  --url "https://news.example.com/article" \
  --images-summary \
  --links-summary

# Fetch without images, disable GFM
bash scripts/zhipu_fetch.sh \
  --url "https://blog.example.com/post" \
  --no-images \
  --no-gfm

API Parameter Reference

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | string | ✅ | - | URL of the web page to fetch |
| timeout | integer | - | 20 | Request timeout in seconds |
| no_cache | boolean | - | false | Disable caching (true/false) |
| return_format | string | - | markdown | Return format: markdown or text |
| retain_images | boolean | - | true | Retain images in output (true/false) |
| no_gfm | boolean | - | false | Disable GitHub Flavored Markdown (true/false) |
| keep_img_data_url | boolean | - | false | Keep image data URLs (true/false) |
| with_images_summary | boolean | - | false | Include images summary (true/false) |
| with_links_summary | boolean | - | false | Include links summary (true/false) |
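
The parameters listed above can be combined in a single request body. The following is a minimal sketch rather than part of the provided scripts: the `reader_request` function name and the chosen values are assumptions for illustration, while the endpoint and headers are taken from the basic cURL example.

```shell
# Request body using several optional parameters from the reference table.
# The values are illustrative, not recommendations.
payload='{
  "url": "https://www.example.com",
  "return_format": "text",
  "timeout": 60,
  "no_cache": true,
  "retain_images": false,
  "with_links_summary": true
}'

# Hypothetical wrapper (not part of the provided scripts): POSTs a JSON body
# to the Reader endpoint and prints the raw JSON response.
# --fail makes curl exit non-zero on HTTP error status codes.
reader_request() {
  curl --silent --show-error --fail \
    --request POST \
    --url https://open.bigmodel.cn/api/paas/v4/reader \
    --header "Authorization: Bearer $ZHIPU_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$1"
}

# Usage: reader_request "$payload"
```

Wrapping the call in a function keeps the payload separate from the transport, so the same body can be inspected or reused before anything is sent.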

Response Structure

The API returns JSON with the parsed page content.

{
  "id": "task-id",
  "created": 1704067200,
  "request_id": "request-id",
  "model": "model-name",
  "reader_result": {
    "title": "Page Title",
    "description": "Brief page description",
    "url": "https://www.example.com",
    "content": "Parsed page content (Markdown or text)",
    "external": {
      "stylesheet": {}
    },
    "metadata": {
      "keywords": "page, keywords",
      "viewport": "width=device-width",
      "description": "Meta description",
      "format-detection": "telephone=no"
    }
  }
}

Key Response Fields

| Field | Description |
|-------|-------------|
| reader_result.content | Main parsed content (body text, images, links) |
| reader_result.title | Page title |
| reader_result.description | Brief page description |
| reader_result.url | Original page URL |
| reader_result.metadata | Page metadata (keywords, viewport, etc.) |
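
To pull a single field out of a response, a JSON processor such as jq can be used. The sketch below assumes jq is installed (it is not among the stated requirements) and that the response has the shape shown under Response Structure; `reader_field` is a hypothetical helper name, and the sample is a trimmed copy of that example response.

```shell
# Hypothetical helper: reads a Reader API JSON response on stdin and prints
# one field from under reader_result (for example title, content, or url).
# Prints nothing if the field is missing or null.
reader_field() {
  jq -r --arg field "$1" '.reader_result[$field] // empty'
}

# Trimmed sample response, based on the example structure in this document.
sample='{"reader_result":{"title":"Page Title","url":"https://www.example.com","content":"Parsed page content (Markdown or text)"}}'

# Usage:
#   printf '%s' "$sample" | reader_field title
#   printf '%s' "$sample" | reader_field content > page.md
```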

Common Use Cases

| Scenario | Command |
|----------|---------|
| Read a documentation page | --url <doc_url> |
| Extract text only (no images) | --url <url> --no-images --format text |
| Force fresh fetch (bypass cache) | --url <url> --no-cache |
| Get content with all summaries | --url <url> --images-summary --links-summary |
| Long page with extended timeout | --url <url> --timeout 60 |

Environment Requirements

The ZHIPU_API_KEY environment variable must be set to your Zhipu API key.
The curl command must be available on your system PATH.
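
Both requirements can be verified before sending a request. The following is a small preflight sketch; the `check_reader_env` function name is hypothetical and it is not part of the provided scripts.

```shell
# Hypothetical preflight check: returns 0 only if ZHIPU_API_KEY is set
# and non-empty and curl can be found on PATH. Errors go to stderr.
check_reader_env() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "error: ZHIPU_API_KEY is not set" >&2
    return 1
  fi
  if ! command -v curl >/dev/null 2>&1; then
    echo "error: curl not found in PATH" >&2
    return 1
  fi
  return 0
}

# Usage: check_reader_env && bash scripts/zhipu_fetch.sh --url "https://www.example.com"
```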