Category: Other (no API key required)

Automatic video subject analysis and keyframe selection

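Upload a video file and a vision model analyzes the video's theme, generates an analysis report, and returns the video's highlight frames. The number of highlight frames is configurable (3 by default). Suitable for advertising, entertainment, gaming, and similar industries.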

Author: u_c3c8f0a0hubenterprise

Video Analyzer

Overview

Extract keyframes from videos, analyze content with vision models, and generate comprehensive reports with 3 representative screenshots. Optimized for token efficiency using I-frame detection.

Workflow

Video Input → Extract Keyframes → Vision Analysis → Select Top 3 → Generate Report → Send Output

Step-by-Step Process

1. Download Video (if from Feishu)

When a user sends a video via Feishu, the file is automatically saved to:

~/.openclaw/media/inbound/<filename>.mp4
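To pick up the most recent upload from that directory, a minimal sketch might look like the following. The function name `latest_inbound_video` is hypothetical, and the sketch assumes filenames contain no newline characters.

```shell
# Hypothetical helper: print the newest .mp4 in the inbound directory.
# Assumes filenames contain no newline characters.
latest_inbound_video() {
    dir="${1:-$HOME/.openclaw/media/inbound}"
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/*.mp4 2>/dev/null | head -n 1
}
```

Calling `latest_inbound_video` with no argument uses the default inbound path; passing a directory overrides it. It prints nothing when no `.mp4` file is present.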

2. Extract Video Metadata

ffmpeg -i <video_path> 2>&1 | grep -E "(Duration|Video)"

Returns: duration, resolution, bitrate, codec info.
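If later steps need the duration as a number (for example, to anticipate how many keyframes to expect), the `Duration:` line can be converted to seconds. The sketch below is one way to do it; `duration_seconds` is a hypothetical name, and it assumes ffmpeg's usual `Duration: HH:MM:SS.xx,` banner format.

```shell
# Hypothetical helper: read ffmpeg's banner on stdin and print the
# duration in seconds (e.g. "00:01:15.50" -> 75.50).
duration_seconds() {
    # Find the HH:MM:SS.xx field that follows "Duration:" and drop the comma.
    awk '/Duration:/ {
        for (i = 1; i <= NF; i++) if ($i == "Duration:") { t = $(i + 1); break }
        sub(/,$/, "", t)
        split(t, p, ":")
        printf "%.2f\n", p[1] * 3600 + p[2] * 60 + p[3]
        exit
    }'
}

# Usage: ffmpeg -i <video_path> 2>&1 | duration_seconds
```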

3. Extract Keyframes

Use the provided script for optimal keyframe extraction:

bash ~/.openclaw/workspace/skills/video-analyzer/scripts/extract_keyframes.sh <video_path> [output_dir]

Parameters:

  • video_path: Path to video file (required)
  • output_dir: Output directory (optional, defaults to ~/.openclaw/media/keyframes/)

Output: JPEG images at 640px width, named keyframe_XX.jpg

Token efficiency: Uses I-frame detection to extract only meaningful frames, reducing token consumption by ~7% vs uniform sampling.
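The contents of `extract_keyframes.sh` are not shown here, so the following is only a sketch of how I-frame extraction at 640px width could be done with ffmpeg's `select` filter. The function name `extract_iframes`, the `DRY_RUN` switch, and the JPEG quality value are assumptions. `-fps_mode vfr` requires ffmpeg 5.1 or later; older builds use `-vsync vfr` instead.

```shell
# Hypothetical sketch of I-frame extraction; not the bundled script.
# Set DRY_RUN=1 to print the ffmpeg command instead of running it.
extract_iframes() {
    video="$1"
    out_dir="${2:-$HOME/.openclaw/media/keyframes}"
    # Keep only I-frames, then scale to 640 px wide (height kept even).
    set -- ffmpeg -hide_banner -i "$video" \
        -vf "select='eq(pict_type,I)',scale=640:-2" \
        -fps_mode vfr -q:v 3 "$out_dir/keyframe_%02d.jpg"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
        return 0
    fi
    mkdir -p "$out_dir" && "$@"
}
```

Running it with `DRY_RUN=1` first is a cheap way to inspect the command before any frames are written.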

4. Analyze with Vision Model

Use the image tool with all extracted keyframes:

prompt: "Analyze these keyframes from a video. Please:
1. Describe the video's theme and content
2. Select 3 most representative frames (explain why)"

5. Generate Report

Structure the analysis report:

## 📌 Video Theme
[Description]

## 🖼️ Representative Screenshots
| Frame | Reason |
|-------|--------|
| keyframe_XX | [Why representative] |
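The report structure above can be filled in mechanically once the vision model has returned a theme and its chosen frames. A minimal sketch, assuming each selection is passed as a `frame|reason` argument; `render_report` is a hypothetical helper name.

```shell
# Hypothetical helper: print the report for a theme plus "frame|reason" pairs.
render_report() {
    theme="$1"
    shift
    printf '## 📌 Video Theme\n%s\n\n' "$theme"
    printf '## 🖼️ Representative Screenshots\n'
    printf '| Frame | Reason |\n|-------|--------|\n'
    for pair in "$@"; do
        # Split each argument at the first "|": frame name, then reason.
        printf '| %s | %s |\n' "${pair%%|*}" "${pair#*|}"
    done
}
```

For example, `render_report "Product demo" "keyframe_03|Shows the product close-up"` prints the two headings followed by a one-row table.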

6. Send Output

Send via Feishu:

  1. Analysis report (text message)
  2. 3 representative screenshots (image messages)

Token Consumption Reference

| Video Length | Keyframes | Estimated Tokens |
|--------------|-----------|------------------|
| 5 seconds | 5-8 | ~8,000-14,000 |
| 15 seconds | 12-16 | ~20,000-28,000 |
| 30 seconds | 20-30 | ~35,000-50,000 |

Optimization tips:

  • Images account for 95%+ of tokens
  • Shorter videos = fewer tokens
  • Low-motion videos produce fewer keyframes
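Dividing the token figures in the reference table by their keyframe counts gives roughly 1,600 to 1,750 tokens per keyframe. Under that assumption (inferred from the table, not measured), a rough budget for a given frame count can be sketched as follows; `estimate_tokens` is a hypothetical name.

```shell
# Rough estimate only: per-frame costs are inferred from the reference
# table (about 1,600 to 1,750 tokens per keyframe), not measured.
estimate_tokens() {
    frames="$1"
    echo "$((frames * 1600))-$((frames * 1750))"
}
```

For instance, `estimate_tokens 8` prints `12800-14000`, which is in line with the upper end of the 5-second row.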

Resources

scripts/

  • extract_keyframes.sh - Extract keyframes using ffmpeg I-frame detection

references/

  • ffmpeg_reference.md - Advanced ffmpeg commands for video processing