Smart Charts

Upload a data file → get interactive charts and analysis reports, automatically.

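Quick Start

Generate charts in 3 steps:

1. Upload data: drag a CSV / Excel / JSON file into the chat box
2. Confirm the analysis direction: review the data summary and confirm the recommended chart types
3. View the results: interactive charts (HTML) + analysis report (Markdown)

Example:

User: Please analyze this sales data [uploads sales_2024.csv]

AI:  Loaded sales_2024.csv (120 rows × 8 columns)
     Key fields: date, region, product, revenue, profit, quantity

     Recommended analyses:
     1. Revenue comparison by region → bar chart
     2. Monthly revenue trend        → line chart
     3. Profit share by product      → pie chart

     Start generating after you confirm?

User: Confirm

AI:  [generates interactive charts] [generates analysis report]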

Activation Triggers

Load this skill when any of the following is met:

User mentions: "analyze data", "generate chart", "data visualization", "chart", "visualization", or the Chinese equivalents 「分析数据」「生成图表」「数据可视化」
User provides a data file and asks for analysis or visualization
User asks to generate charts or a report from tabular data

Capability Boundaries

Supported

| Category | Details |
|----------|---------|
| File formats | CSV (.csv/.tsv), Excel (.xlsx/.xls), JSON (.json), Plain text (.txt) |
| Chart types | 16 types: line, bar, pie, scatter, area, radar, heatmap, treemap, graph, boxplot, waterfall, gauge, sankey, funnel, sunburst, wordcloud |
| Multi-file | Up to ~10 files; auto-merge when schemas match |
| Data size | Single file ≤ 100 MB; recommended ≤ 50 MB for smooth rendering |
| Encoding | Auto-detects UTF-8, GBK, GB2312 |
| Report templates | Markdown, Word (.docx), PDF, Plain text |

Limitations

| Limitation | Details | Workaround |
|------------|---------|------------|
| No database support | Cannot connect to MySQL/PostgreSQL etc. | Export to CSV first, then upload |
| No real-time data | Cannot fetch live APIs or streaming data | Prepare a static data file |
| No geo maps | No map-based visualization (choropleth, etc.) | Use external tools for geo data |
| Large datasets | >100 MB files are rejected; >50 MB may render slowly | Filter/split data before uploading |
| Complex joins | Auto-merge requires ≥50% column overlap | Specify join keys manually |
| Nested JSON | Only 1-level nested objects supported | Flatten nested structures before upload |
| Non-tabular data | Images, audio, video not supported | Convert to tabular format first |
| Transform code | LLM-generated code is sandboxed; no file I/O, imports, or network access | Pre-process data externally if needed |
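For the nested JSON workaround, the flattening step can be done before upload with a few lines of standard-library Python. The sketch below is only one way to do it; the `flatten` helper and the dotted key naming are illustrative assumptions, not part of the skill.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys.

    Hypothetical helper for pre-processing; lists are left untouched.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


records = [
    {"id": 1, "customer": {"name": "Ann", "address": {"city": "Paris"}}},
    {"id": 2, "customer": {"name": "Bo", "address": {"city": "Oslo"}}},
]
flat_records = [flatten(r) for r in records]
print(json.dumps(flat_records[0]))
# {"id": 1, "customer.name": "Ann", "customer.address.city": "Paris"}
```

Writing `flat_records` back out with `json.dump` yields a file with no nesting, which stays within the 1-level limit.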

Installation

# Standard install (with hash verification to ensure package integrity)
pip install -r requirements.txt --require-hashes

# To update dependency versions, regenerate the hashes first
python {skill_base}/core/generate_hashes.py

Dependencies

| Package | Version | Required | Description |
|---------|---------|----------|-------------|
| pandas | ==3.0.1 | Yes | Data parsing (CSV, Excel, JSON) |
| numpy | ==2.4.3 | Yes | Numerical computations |
| openpyxl | ==3.1.5 | Yes | Excel file engine |
| PyPDF2 | ==3.0.1 | Optional | PDF template extraction |
| python-docx | ==1.1.2 | Optional | Word template processing |

All versions are pinned with == and verified with SHA256 hashes to prevent supply-chain attacks. ECharts is loaded via CDN (jsdelivr) — no local installation required.

Security

LLM-generated transform code is executed with multiple safety layers:

| Layer | What it does | Why it matters |
|-------|-------------|----------------|
| Keyword blacklist | Scans for dangerous keywords (exec, eval, open, import, os.system, etc.) before execution | Prevents file I/O, network access, and system commands that could harm your machine |
| AST whitelist | Parses code into an Abstract Syntax Tree; only allows safe node types (assignments, calls, loops, comprehensions) | Blocks code that tries to define classes, import modules, or use advanced Python features that aren't needed for data transformation |
| User confirmation | Shows code preview and asks for your approval before executing | In interactive mode, no code runs without your consent; in programmatic mode (auto_confirm=True, e.g. batch generation), confirmation is skipped but blacklist and AST checks still apply |
| Sandbox builtins | Only safe built-in functions (len, range, sorted, etc.) are available; open/exec/eval/import are removed | Even if blacklist/AST checks are bypassed, the sandbox prevents access to dangerous functions |

What happens when code is blocked:

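CodeValidationError: Code contains a dangerous keyword and was blocked: open(
  Reason: These keywords can be used for file operations, which are not needed for data transformation.
  If you really need them, check whether the data should be pre-processed first.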

The error message explains why the code was blocked and how to resolve it.
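To make the first two layers concrete, here is a minimal sketch of a keyword scan followed by an AST whitelist check, using only the standard `ast` module. The keyword list, the set of allowed node types, and the function name `find_violation` are all assumptions for illustration; the skill's actual lists are not shown in this document. The sketch assumes Python 3.9 or later.

```python
import ast

# Hypothetical keyword list; the skill's real blacklist is not shown here.
BLOCKED_KEYWORDS = ("exec", "eval", "open(", "import", "os.system", "subprocess")

# Hypothetical whitelist of node types that simple pandas transforms need.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Attribute, ast.Call, ast.keyword, ast.Constant, ast.Subscript,
    ast.Slice, ast.List, ast.Tuple, ast.Dict, ast.Compare, ast.Eq,
    ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.BinOp, ast.Add,
    ast.Sub, ast.Mult, ast.Div, ast.UnaryOp, ast.USub, ast.BoolOp,
    ast.And, ast.Or, ast.BitAnd, ast.BitOr, ast.Invert,
)


def find_violation(code):
    """Return a description of the first problem found, or None if the code passes."""
    # Layer 1: plain substring scan for blocked keywords.
    for keyword in BLOCKED_KEYWORDS:
        if keyword in code:
            return f"dangerous keyword: {keyword}"
    # Layer 2: every node in the parsed tree must be on the whitelist.
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return f"disallowed syntax: {type(node).__name__}"
    return None


print(find_violation("result = df.groupby('product')['sales'].sum().reset_index()"))
print(find_violation("data = open('/etc/passwd').read()"))
print(find_violation("class Evil:\n    pass"))
```

A substring scan like this is deliberately blunt: it would also reject a harmless column named "important". That is one reason the AST check and the restricted builtins exist as separate layers.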

User Guidance

When no data file is provided

Prompt the user:

Please upload the data file(s) you want to analyze. Supported formats:

CSV (.csv / .tsv / .txt)

Excel (.xlsx / .xls)

JSON (.json)

You can drag files directly into the chat box. Multiple files are supported.

When data files are provided

Step 1 — Parse and display a unified summary:

Files loaded: 3

| File | Rows | Cols | Key Fields |
|------|------|------|------------|
| east_sales.csv | 120 | 8 | date, revenue, profit… |
| south_sales.csv | 98 | 8 | date, revenue, profit… |
| products.xlsx | 45 | 5 | name, category, price… |

Step 2 — Infer file relationships and recommend an analysis strategy:

| Situation | Recommendation |
|-----------|----------------|
| Same schema across files | Merge and compare |
| Shared common column(s) | Join on the common key |
| Unrelated schemas | Analyze each file separately |
| Single file | Analyze directly |

Step 3 — Execute after user confirmation.

Error handling

Each error message explains why it happened and what to do about it:

| Error | User message | Why it happens | How to fix |
|-------|-------------|----------------|------------|
| File not found | "File not found. Please verify the path or drag the file into the chat." | The file path doesn't point to an existing file. | Check for typos in the path, or drag the file directly into the chat. |
| Unsupported format | "Unsupported file format (.abc). Only CSV, Excel, and JSON are supported. Please convert your file and retry." | The file extension is not in the supported list. | Open the file in its original application and "Save As" CSV or Excel. |
| File > 100 MB | "File too large (150 MB). The limit is 100 MB because large files cause slow parsing and rendering. Try filtering rows or splitting into smaller files." | Large files exceed memory/time limits for in-browser rendering. | Filter to relevant rows/columns, or split by date/category before uploading. |
| Empty file | "The file appears to be empty (0 rows). This usually means the file has headers but no data rows. Please check that it contains valid data." | The file was parsed successfully but yielded 0 data rows. | Open the file and verify it has data rows below the header. |
| Encoding error | "Encoding issue detected (not UTF-8/GBK/GB2312). Try re-saving the file as CSV with UTF-8 encoding: in Excel, use Save As → CSV UTF-8." | The file uses an encoding that the parser cannot auto-detect. | Re-save the file with UTF-8 encoding. Most spreadsheet apps have a "CSV UTF-8" export option. |
| Cannot auto-merge | "Files have different column structures and cannot be auto-merged. For example, file A has [date, revenue] but file B has [name, score]. You can: (1) analyze them separately, or (2) specify a common column to join on." | The files share less than 50% of their columns, making automatic joining unreliable. | Either analyze each file separately, or tell us which column to use as the join key. |
| Code blocked | "Transform code was blocked for security: contains 'open('. This keyword is used for file operations, which aren't needed for data transformation. If your data needs pre-processing that requires file access, please do that step before uploading." | The LLM-generated code contains a dangerous keyword or unsupported syntax. | Pre-process the data externally, or simplify the transform to use only pandas operations. |

Execution Workflow

1. Obtain data file(s)
   └─ User uploads file(s) directly (primary method)
   └─ Or user provides file path(s)

2. Parse data
   └─ Call data_parser.py on all files
   └─ Single file  → parse directly
   └─ Multiple files → parse each, assess merge feasibility

3. Confirm & recommend
   └─ Display a summary table for all files
   └─ Recommend: merge / separate / join
   └─ Recommend chart type(s) based on data characteristics

3.5 Data transform (when needed)
   └─ Compare original data structure with the target chart's input format
   └─ If they match → skip transform, proceed to Step 4
   └─ If they don't match → LLM generates pandas transform code
   └─ Security check: keyword blacklist + AST whitelist validation
   └─ User confirmation: show code preview, wait for approval (skipped when auto_confirm=True in programmatic calls)
   └─ Execute in sandbox → producing a standardized DataFrame
   └─ If transform fails → feed error back to LLM for retry (max 2 attempts)
   └─ If retry still fails → fall back to original data + _prepare_axes auto-detection

4. Generate charts
   └─ Call chart_generator.py → produces ECharts HTML
   └─ Merged data  → cross-group comparison charts
   └─ Separate data → independent charts per file
   └─ Chart type is chosen by the LLM based on data shape

5. Check for a report template
   └─ Scan the templates/ subdirectory under the skill base
   └─ Read each meta.json; let the LLM judge relevance
   └─ No matching template → skip to free-form generation

6. Generate analysis report
   └─ Matching template found → fill template.md with data insights
   └─ No matching template    → LLM generates report freely

7. Present results
   └─ Interactive charts: use preview_url (HTML)
   └─ Markdown report:    use open_result_view
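
The retry-and-fallback behavior in step 3.5 could be sketched as the loop below. The function name and the two callables (`generate_code`, `execute_code`) are hypothetical stand-ins for the LLM call and the sandboxed executor. "Max 2 attempts" is read here as up to two retries after the first try, which is an assumption.

```python
def run_transform_with_retry(generate_code, execute_code, max_retries=2):
    """Return the transformed data, or None to signal fallback to the original data.

    generate_code(last_error) -> str : hypothetical LLM call; last_error is None on the first try
    execute_code(code) -> object     : hypothetical sandboxed executor; raises on failure
    """
    last_error = None
    for _ in range(1 + max_retries):
        code = generate_code(last_error)
        try:
            return execute_code(code)
        except Exception as exc:
            # Feed the failure back so the next generation can correct it.
            last_error = f"{type(exc).__name__}: {exc}"
    return None  # caller falls back to the original data and axis auto-detection


# Demo with fake callables: the first attempt fails, the retry succeeds.
attempts = []


def fake_generate(error):
    attempts.append(error)
    return "bad" if error is None else "good"


def fake_execute(code):
    if code == "bad":
        raise KeyError("sales")
    return {"status": "transformed"}


outcome = run_transform_with_retry(fake_generate, fake_execute)
print(outcome, len(attempts))
```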

Configuration

output_dir:    output directory (optional; default: ./smart_charts_output)
templates_dir: report template directory (optional; default: ./templates)

Important: Never hard-code absolute paths. All paths must be provided by the user or resolved dynamically from the working directory.
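
One way to honor this rule is to resolve relative settings against the current working directory at runtime. The sketch below uses `pathlib`; `resolve_output_dir` is a hypothetical helper name, and only the default directory name comes from the configuration above.

```python
from pathlib import Path

DEFAULT_OUTPUT_DIR = "smart_charts_output"  # default from the configuration above


def resolve_output_dir(user_value=None, cwd=None):
    """Resolve the output directory without hard-coding an absolute path.

    Hypothetical helper: relative values are anchored at the working directory,
    and absolute values supplied by the user are used as given.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    target = Path(user_value).expanduser() if user_value else Path(DEFAULT_OUTPUT_DIR)
    if not target.is_absolute():
        target = base / target
    return target.resolve()


print(resolve_output_dir())           # <working directory>/smart_charts_output
print(resolve_output_dir("charts"))   # <working directory>/charts
```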

Data Parsing — CLI Reference

Usage

Note: {skill_base} refers to the root directory of this skill (the directory containing SKILL.md). Replace it with the actual path when running commands manually.

# Single file
python {skill_base}/core/data_parser.py <file_path> [--summary]

# Multiple files
python {skill_base}/core/data_parser.py <file1> <file2> ... [--summary]

# Multiple files with auto-merge
python {skill_base}/core/data_parser.py <file1> <file2> ... [--merge] [--summary]

Merge behavior:

| Condition | Result |
|-----------|--------|
| Identical column names | Vertical concat; a source_file column is added |
| Shared columns exist (≥50% overlap) | Horizontal join on shared key |
| No common structure | Error; advise analyzing each file separately |
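
The decision table above can be sketched as a small function over column names. How the 50% overlap is measured is not specified in this document; the sketch assumes shared columns divided by the size of the smaller schema, and `merge_strategy` is a hypothetical name.

```python
def merge_strategy(columns_a, columns_b, min_overlap=0.5):
    """Pick "concat", "join", or "separate" for two files based on their column names.

    Assumption: overlap = shared columns / columns in the smaller schema.
    """
    a, b = set(columns_a), set(columns_b)
    if a and a == b:
        return "concat"  # identical schemas: stack rows vertically
    shared = a & b
    overlap = len(shared) / min(len(a), len(b)) if a and b else 0.0
    if shared and overlap >= min_overlap:
        return "join"  # enough common columns to join on a shared key
    return "separate"  # no common structure: analyze each file on its own


print(merge_strategy(["date", "revenue"], ["revenue", "date"]))
print(merge_strategy(["date", "revenue", "profit"], ["date", "revenue", "region", "cost"]))
print(merge_strategy(["date", "revenue"], ["name", "score"]))
```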

Supported formats

| Format | Extensions | Notes |
|--------|-----------|-------|
| CSV | .csv, .tsv | Auto-detects delimiter and encoding (UTF-8 / GBK / GB2312) |
| Plain text | .txt | Auto-detects delimiter (comma / tab / semicolon / pipe) |
| Excel | .xlsx, .xls | Reads first non-empty sheet |
| JSON | .json | Supports array format and 1-level nested objects |
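
As an illustration of what encoding and delimiter auto-detection can look like, the sketch below tries the three listed encodings in order and then picks the candidate delimiter that appears a consistent number of times on every line. This is an assumed heuristic, not the parser's actual logic; `decode_bytes` and `guess_delimiter` are hypothetical names, and quoted fields that contain a delimiter are not handled.

```python
CANDIDATE_ENCODINGS = ("utf-8", "gbk", "gb2312")
CANDIDATE_DELIMITERS = (",", "\t", ";", "|")


def decode_bytes(raw):
    """Return (text, encoding) using the first candidate encoding that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("Encoding issue detected (not UTF-8/GBK/GB2312)")


def guess_delimiter(text, max_lines=20):
    """Pick the delimiter whose per-line count is identical across the sampled lines."""
    lines = [line for line in text.splitlines() if line.strip()][:max_lines]
    best, best_count = ",", 0
    for delimiter in CANDIDATE_DELIMITERS:
        counts = {line.count(delimiter) for line in lines}
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = delimiter, count
    return best


raw = "日期|营收\n2024-01|100\n2024-02|120\n".encode("gbk")
text, encoding = decode_bytes(raw)
print(encoding, repr(guess_delimiter(text)))
```

Because GB2312 is a subset of GBK, the third candidate is rarely reached in practice; it is listed only to mirror the table.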

Chart Generation — CLI Reference

Usage

python {skill_base}/core/chart_generator.py \
  <file_path> <chart_type> \
  --title "Chart Title" \
  --x-axis "date" \
  --y-axis "revenue profit" \
  --output-dir "./output"

Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| file_path | Yes | Path to the data file |
| chart_type | Yes | Chart type identifier (see table below) |
| --title | No | Chart title; default: "Data Chart" |
| --x-axis | No | X-axis field; auto-detected if omitted |
| --y-axis | No | Y-axis field(s), space-separated; defaults to first 5 numeric columns |
| --transform-code | No | LLM-generated pandas transform code string; validated and executed before chart rendering |
| --output-dir | No | Output directory; default: ./smart_charts_output |

Supported chart types

| ID | Name | Best For | Data Shape |
|----|------|----------|------------|
| line | Line chart | Time-series trends, continuous data | 1 category + 1~N numeric columns |
| bar | Bar chart | Category comparison, ranked data | 1 category + 1~N numeric columns |
| pie | Pie chart | Composition, share distribution | 1 name + 1 value column |
| scatter | Scatter plot | Correlation, density | 2 numeric columns |
| area | Area chart | Cumulative change, trend | 1 category + 1~N numeric columns |
| radar | Radar chart | Multi-dimension comparison | 1 indicator + N numeric columns |
| heatmap | Heatmap | Density, cross-tabulation | 2 category + 1 numeric column |
| treemap | Treemap | Hierarchical proportion | 1 name + 1 value column |
| graph | Network graph | Entity relationships | source + target (+ value) |
| boxplot | Box plot | Distribution, outliers | N numeric columns |
| waterfall | Waterfall chart | Incremental change | 1 category + 1 numeric column |
| gauge | Gauge chart | KPI progress | 1 numeric column |
| sankey | Sankey diagram | Flow transfer | source + target + value |
| funnel | Funnel chart | Conversion rate | 1 name + 1 value column |
| sunburst | Sunburst chart | Multi-level composition | 1 name + 1 value column |
| wordcloud | Word cloud | Frequency, keywords | 1 name + 1 value column |

Report Templates

Users can store custom report templates under the templates_dir directory.

Directory structure

templates/
├── _template_index.json         # Auto-generated metadata index
└── <template_id>/               # Each template has its own directory
    ├── meta.json                # Template metadata card
    ├── template.md              # Template content
    └── original.docx            # Source file (optional)

meta.json schema

{
  "id": "<auto_generated>",
  "name": "Monthly Sales Report",
  "description": "For monthly sales summaries: revenue trend, top products, regional breakdown.",
  "scenarios": ["monthly sales report", "sales performance review", "quarterly comparison"],
  "variables": ["period", "revenue", "profit", "order_count", "mom_growth", "yoy_growth"],
  "categories": ["sales", "finance", "business analysis"],
  "format": "markdown",
  "created_time": "<auto_generated>",
  "modified_time": "<auto_generated>"
}

LLM-driven template matching

Template matching is performed by the LLM, not by hard-coded algorithms.

Discover — template_manager.get_all_templates_summary() collects metadata
Analyze — LLM reasons about the user task and data characteristics
Select — LLM picks the best-matching template (or none)
Fill — Load template content and fill with data insights
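
The Discover step amounts to reading each template's metadata card and handing a compact summary to the LLM. The sketch below shows one way to do that with the directory layout described earlier. It is not the implementation of template_manager.get_all_templates_summary(); `collect_template_summaries` and `SUMMARY_KEYS` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

# Assumed subset of meta.json fields that is useful for matching.
SUMMARY_KEYS = ("id", "name", "description", "scenarios", "variables")


def collect_template_summaries(templates_dir):
    """Read every <template_id>/meta.json and keep only the fields used for matching."""
    summaries = []
    for meta_path in sorted(Path(templates_dir).glob("*/meta.json")):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed metadata cards
        summaries.append({key: meta.get(key) for key in SUMMARY_KEYS})
    return summaries


# Demo with a throwaway template library containing one card.
with tempfile.TemporaryDirectory() as tmp:
    card = Path(tmp) / "monthly_sales" / "meta.json"
    card.parent.mkdir()
    card.write_text(
        json.dumps({
            "id": "monthly_sales",
            "name": "Monthly Sales Report",
            "scenarios": ["monthly sales report"],
        }),
        encoding="utf-8",
    )
    summaries = collect_template_summaries(tmp)

print(summaries)
```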

Fallback behavior:

| Scenario | Behavior |
|----------|----------|
| No suitable template | LLM generates report freely |
| Partial match | LLM uses template structure as reference, generates the rest |
| Empty template library | LLM creates a professional report from scratch |

Template variable syntax (auto-detected)

| Format | Example |
|--------|---------|
| Single braces | {variable_name} |
| Double braces | {{variable_name}} |
| Square brackets | [variable_name] |
| Percent signs | %variable_name% |
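
Detecting these four placeholder styles can be approximated with regular expressions, as in the sketch below. The patterns and the name `detect_variables` are assumptions for illustration. Square brackets in particular are ambiguous in Markdown, so the sketch skips brackets that are immediately followed by a link target.

```python
import re

# Hypothetical patterns, one per placeholder style in the table above.
PLACEHOLDER_PATTERNS = {
    "double_braces": re.compile(r"\{\{\s*(\w+)\s*\}\}"),
    "single_braces": re.compile(r"(?<!\{)\{\s*(\w+)\s*\}(?!\})"),
    "square_brackets": re.compile(r"\[(\w+)\](?!\()"),  # ignore [text](url) links
    "percent_signs": re.compile(r"%(\w+)%"),
}


def detect_variables(text):
    """Return placeholder names in first-seen order, grouped by pattern, without duplicates."""
    found = []
    for pattern in PLACEHOLDER_PATTERNS.values():
        for name in pattern.findall(text):
            if name not in found:
                found.append(name)
    return found


template = (
    "Period: {{period}}\n"
    "Revenue: {revenue}\n"
    "Profit: [profit]\n"
    "Growth: %mom_growth%\n"
    "See [docs](https://example.com)\n"
)
print(detect_variables(template))
# ['period', 'revenue', 'profit', 'mom_growth']
```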

Template Management

Supported template formats

| Format | Extension | Processing |
|--------|-----------|-----------|
| Markdown | .md, .markdown | Native support |
| Word | .docx | Extracts text and preserves formatting |
| PDF | .pdf | Extracts text and structure |
| Plain text | .txt | Simple template parsing |

Operations

Upload / Save a template

Triggers: upload template, add template, save template

User: Save this sales report as a template.
AI:   Template saved: "Sales Report" (Markdown, 8 variables detected)

View template library

Triggers: my templates, template list, show templates

User: Show my templates.
AI:   Your templates (3):
      1. Monthly Sales Report (Markdown) — monthly sales analysis
      2. Project Progress (Word) — project tracking
      3. Financial Report (PDF) — financial analysis

Auto-matching (seamless)

User: Analyze this month's sales data.
AI:   Matched template: "Monthly Sales Report"
      Auto-filling variables: revenue, profit, growth rate
      Generating professional report…

Template management error handling

| Error | User message | Why | How to fix |
|-------|-------------|-----|------------|
| Unsupported format | "Template format '.abc' is not supported. Only PDF, Word (.docx), and Markdown are accepted." | The file extension is not in the supported list. | Convert the file to PDF, DOCX, or Markdown and retry. |
| Template already exists | "Template 'Sales Report' already exists. This happens when a template with the same name was saved earlier." | Duplicate template name. | Choose: overwrite the existing one, rename the new template, or cancel. |
| No match found | "No template matches your task. This is normal if you haven't saved any relevant templates yet." | No template's scenarios/variables align with the current task. | The LLM will generate a report from scratch, or you can save a template for future use. |
| Missing variables | "Data missing for variables: revenue, profit. The template expects these fields but they weren't found in your data." | The template requires variables that don't exist in the current dataset. | Check that your data file contains the expected columns, or use a different template. |

Key Principles

Multi-file first: Users often upload multiple files. Guide proactively; handle batches gracefully.
Confirm before executing: Always show a data summary and the recommended analysis direction, then wait for the user's confirmation before running anything.
LLM chooses chart types: Recommend based on data semantics; never hard-code mapping rules.
Template-first report generation: Use a saved template when a good match exists; fall back to free-form only when necessary.
Dynamic path resolution: Absolute paths must never be hard-coded; resolve all paths at runtime.
Immediate result presentation: Charts via preview_url; Markdown reports via open_result_view.
Adapt data first: When the data structure doesn't match the chart's input format, the LLM should generate transform code rather than force-render the raw data.
Security by default: All LLM-generated code must pass blacklist + AST validation; user confirmation is required in interactive mode (skipped when auto_confirm=True in programmatic calls, but security checks still apply).

Chart Input Format Spec

Each chart type expects the DataFrame in a specific shape. The LLM must check whether the raw data matches that shape and, if it does not, generate transform code.

| Chart Type | Required DataFrame Format | Example Columns |
|------------|--------------------------|-----------------|
| line | 1 category/time column + 1~N numeric columns | month, productA, productB |
| bar | 1 category column + 1~N numeric columns | city, revenue, profit |
| area | 1 category/time column + 1~N numeric columns | date, uv, pv |
| pie | 1 name column + 1 value column | category, share |
| scatter | 2 numeric columns, or 1 category + 1 numeric | height, weight |
| radar | 1 indicator column + N numeric columns | metric, productA, productB |
| heatmap | 2 category columns + 1 numeric column | row, col, value |
| treemap | 1 name column + 1 value column | category, sales |
| graph | source + target columns (+ optional value) | from, to, weight |
| boxplot | N numeric columns | math, chinese, english |
| waterfall | 1 category column + 1 numeric column (increments) | month, profit_delta |
| gauge | 1 numeric column (mean used) | completion_rate |
| sankey | source + target + value columns | origin, destination, amount |
| funnel | 1 name column + 1 value column | stage, count |
| sunburst | 1 name column + 1 value column | category, value |
| wordcloud | 1 name column + 1 value column | word, frequency |
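
In this skill the match check is left to the LLM, but a rough count-based version helps show what "matching the shape" means. The sketch below encodes a few rows of the table as (category columns, minimum numeric, maximum numeric) and is a deliberate simplification; `CHART_SPECS` and `needs_transform` are hypothetical names.

```python
# Hypothetical, partial encoding of the spec table:
# chart type -> (required category columns, min numeric columns, max numeric columns or None)
CHART_SPECS = {
    "line": (1, 1, None),
    "bar": (1, 1, None),
    "pie": (1, 1, 1),
    "heatmap": (2, 1, 1),
    "boxplot": (0, 1, None),
    "gauge": (0, 1, 1),
}


def needs_transform(column_kinds, chart_type):
    """Return True when the column counts do not fit the chart's expected shape.

    column_kinds maps each column name to "category" or "numeric".
    """
    required_categories, min_numeric, max_numeric = CHART_SPECS[chart_type]
    n_category = sum(1 for kind in column_kinds.values() if kind == "category")
    n_numeric = sum(1 for kind in column_kinds.values() if kind == "numeric")
    fits = (
        n_category == required_categories
        and n_numeric >= min_numeric
        and (max_numeric is None or n_numeric <= max_numeric)
    )
    return not fits


# Long-format data has two category columns, so a line chart needs a pivot first.
print(needs_transform({"month": "category", "region": "category", "sales": "numeric"}, "line"))
# Wide data with one category column is already in bar-chart shape.
print(needs_transform({"city": "category", "revenue": "numeric", "profit": "numeric"}, "bar"))
```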

Transform Code Generation Prompt

When the raw data structure doesn't match the target chart's input format, use the following prompt template to generate transform code:

Known information:
- Raw data columns: {columns_with_dtypes}
- Data sample (first 5 rows): {sample}
- Target chart type: {chart_type}
- Required format for this chart: {chart_input_spec}

Generate a pandas code snippet that transforms df into a result DataFrame matching the chart's input format.

Rules:
1. Only use variables: df, pd, np
2. Must produce a variable named result (pd.DataFrame)
3. Do not modify df in-place; use df.copy() or chain operations
4. Keep code concise; prefer pandas built-in methods (pivot_table, melt, groupby, rename, etc.)
5. If raw data already matches the required format, output an empty string
6. Do NOT use: import, open, exec, eval, os, sys, subprocess, file I/O, network calls

Output format:
```python
# {one-line description of what the transform does}
{transform_code}
```

### Common transform examples

| Scenario | Raw Data | Chart | Transform Code |
|----------|----------|-------|----------------|
| Long→multi-series line | `month, region, sales` | line | `result = df.pivot_table(index='month', columns='region', values='sales', aggfunc='sum').reset_index()` |
| Long→radar | `city, metric, value` | radar | `result = df.pivot_table(index='city', columns='metric', values='value').reset_index()` |
| Long→pie (filter) | `category, metric_name, metric_value` | pie | `result = df[df['metric_name']=='revenue'][['category','metric_value']].rename(columns={'category':'name','metric_value':'value'})` |
| Rename→sankey | `来源, 去向, 金额` | sankey | `result = df.rename(columns={'来源':'source','去向':'target','金额':'value'})` |
| Wide→long | `date, productA, productB` | pie | `result = df.melt(id_vars=['date'], var_name='name', value_name='value')` |
| Compute delta→waterfall | `month, profit` | waterfall | `tmp = df.copy(); tmp['delta'] = tmp['profit'].diff().fillna(tmp['profit'].iloc[0]); result = tmp[['month','delta']]` |
| Aggregate→bar | `date, product, sales` | bar | `result = df.groupby('product')['sales'].sum().reset_index()` |
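
To see what the first row does to real data, here is the long→multi-series pivot run on a four-row sample. The sample values are made up for illustration, and clearing `columns.name` is an optional tidy-up that is not part of the table entry.

```python
import pandas as pd

# Made-up long-format sample: one row per (month, region) pair.
df = pd.DataFrame({
    "month": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "region": ["East", "West", "East", "West"],
    "sales": [100, 80, 120, 90],
})

# Pivot so each region becomes its own numeric column, as the line chart expects.
result = (
    df.pivot_table(index="month", columns="region", values="sales", aggfunc="sum")
    .reset_index()
)
result.columns.name = None  # drop the leftover "region" label on the column index

print(result)
```

The result has one category column (month) and one numeric column per region, which is the shape the line chart expects.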
