Category: Data & Analysis. No API key required.

Data Analysis

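Analyze CSV/Excel files to extract insights, generate statistics, create charts, and output summaries. Use it when the user needs to upload or analyze a spreadsheet.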

Author: di5cip1e (clawhub)

Data Analysis Skill

Analyze data files (CSV, Excel) and produce actionable insights.

Quick Start

  1. Read the file - Use appropriate library:

    • CSV: csv module or pandas.read_csv()
    • Excel: pandas.read_excel() with openpyxl engine
  2. Explore the data - Get shape, columns, dtypes, missing values

  3. Generate insights - Calculate:

    • Descriptive stats (mean, median, mode, std, min, max)
    • Correlations between numeric columns
    • Value counts for categorical columns
    • Trends over time if date column exists
  4. Create visualizations - Use matplotlib:

    • Bar charts for categorical data
    • Line charts for time series
    • Histograms for distributions
    • Scatter plots for correlations
  5. Summarize - Write findings in plain English
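The exploration and insight steps above can be sketched as a single helper. This is a minimal illustration, not the skill's own implementation: the function name explore and the sample columns (product, amount, units) are hypothetical, and the data is built inline rather than read from a file.

```python
import pandas as pd


def explore(df: pd.DataFrame) -> dict:
    """Collect structure and summary statistics for a DataFrame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    # Treat everything that is neither numeric nor datetime as categorical
    categorical = df.select_dtypes(exclude=["number", "datetime"])
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        # describe() reports mean, std, min, max; its 50% row is the median
        "stats": numeric.describe(),
        "correlations": numeric.corr(),
        "value_counts": {col: df[col].value_counts() for col in categorical.columns},
    }


# Hypothetical sample standing in for a loaded CSV or Excel file
sample = pd.DataFrame({
    "product": ["A", "B", "A", "C"],
    "amount": [10.0, 20.0, None, 40.0],
    "units": [1, 2, 3, 4],
})
report = explore(sample)
print(report["shape"])
print(report["missing"])
```

Returning a plain dict keeps each piece easy to inspect or drop into the written summary later.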

Common Patterns

Sales Data

import pandas as pd

df = pd.read_csv('sales.csv')
summary = {
    'total_revenue': df['amount'].sum(),
    'avg_order': df['amount'].mean(),
    'top_products': df['product'].value_counts().head(5),
    # Group by year-month so the same month in different years is not merged
    'monthly_trend': df.groupby(pd.to_datetime(df['date']).dt.to_period('M'))['amount'].sum()
}

Customer Data

# Assumes df is already loaded with 'segment', 'age', 'income', and 'id' columns
demographics = df.groupby('segment').agg({
    'age': ['mean', 'median'],
    'income': ['mean', 'std'],
    'id': 'count'
})

Time Series

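# Parse dates first; resample needs a datetime column or index
df['date'] = pd.to_datetime(df['date'])
# 'ME' is the month-end alias in pandas >= 2.2; older versions use 'M'
monthly = df.resample('ME', on='date')['value'].sum()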
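A monthly series like the one above is a natural fit for the line chart recommended in Quick Start. The sketch below is only illustrative: the data is synthetic, the output file name monthly_trend.png is hypothetical, and the month-start alias 'MS' is used purely because it behaves the same across pandas versions.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic daily data; the 'date' and 'value' column names follow the snippet above
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=90, freq="D"),
    "value": range(90),
})

# Monthly totals, labelled by the first day of each month
monthly = df.set_index("date")["value"].resample("MS").sum()

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly.index, monthly.values, marker="o")
ax.set_title("Monthly total")
ax.set_xlabel("Month")
ax.set_ylabel("Value")
fig.autofmt_xdate()  # rotate date labels so they do not overlap

out_path = Path("monthly_trend.png")  # hypothetical output location
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

Saving to a file rather than calling plt.show() lets the chart be attached to the final summary.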

Output Format

Always include:

  1. Overview - What the data contains (rows, columns, date range)
  2. Key Metrics - Top 5-10 actionable numbers
  3. Insights - 3-5 bullet points of what the data reveals
  4. Visualizations - At least 2 charts for any dataset with 100+ rows
  5. Recommendations - Suggested next steps based on findings
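One possible way to assemble these sections into plain text is sketched below. The function build_report, its parameters, and the sample data are all hypothetical; charts are left out here and would be referenced by file path in a fuller version.

```python
import pandas as pd


def build_report(df, key_metrics, insights, recommendations, date_col=None):
    """Assemble a plain-text report following the section order above (hypothetical helper)."""
    lines = ["Overview", f"  Rows: {len(df)}, Columns: {df.shape[1]}"]
    if date_col is not None:
        # Unparseable dates become NaT and are ignored for the range
        dates = pd.to_datetime(df[date_col], errors="coerce").dropna()
        if not dates.empty:
            lines.append(f"  Date range: {dates.min().date()} to {dates.max().date()}")
    lines.append("Key Metrics")
    lines += [f"  {name}: {value:,.2f}" for name, value in key_metrics.items()]
    lines.append("Insights")
    lines += [f"  - {text}" for text in insights]
    lines.append("Recommendations")
    lines += [f"  - {text}" for text in recommendations]
    return "\n".join(lines)


# Hypothetical sample data
df = pd.DataFrame({
    "date": ["2024-01-05", "2024-03-01", "2024-02-10"],
    "amount": [10.0, 20.0, 30.0],
})
report_text = build_report(
    df,
    key_metrics={"total_revenue": df["amount"].sum(), "avg_order": df["amount"].mean()},
    insights=["Order size varies across the sample period."],
    recommendations=["Collect more months of data before drawing conclusions."],
    date_col="date",
)
print(report_text)
```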

Error Handling

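  • Handle missing values: df.dropna() to drop incomplete rows, or df.fillna(value) with a value that suits the column (filling numeric columns with 0 skews means and other statistics)
  • Handle date parsing: Use pd.to_datetime(..., errors='coerce') so unparseable values become NaT instead of raising
  • Handle large files: Process in chunks for files >100MB, for example with pd.read_csv(..., chunksize=...)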
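The date-parsing and chunking advice above can be combined in one pass over a file. This is a small sketch under stated assumptions: the CSV content is inline and hypothetical, and the chunk size of 2 is chosen only to make the chunking visible; a real file over 100MB would use a much larger value.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a large file on disk
raw = "date,amount\n2024-01-05,10\nnot-a-date,20\n2024-02-10,\n2024-02-11,5\n"

total = 0.0
bad_dates = 0
for chunk in pd.read_csv(io.StringIO(raw), chunksize=2):
    # Invalid dates become NaT instead of raising an exception
    chunk["date"] = pd.to_datetime(chunk["date"], errors="coerce")
    bad_dates += int(chunk["date"].isna().sum())
    # Series.sum() skips NaN by default, so missing amounts do not break the total
    total += float(chunk["amount"].sum())

print(total, bad_dates)
```

Accumulating running totals per chunk keeps memory use bounded by the chunk size rather than the file size.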