data-pipeline-builder
Build data pipelines without framework expertise. Extract from a supported source, transform with built-in steps or custom code, and load to a supported destination, all through natural language commands.
What It Does
- Extract data — From databases, APIs, files, S3, GCS, Kafka
- Transform — Filters, mappings, aggregations, joins, custom code
- Load — To databases, data warehouses, files, APIs
- Schedule — Cron-based or event-triggered execution
- Monitor — Pipeline status, throughput, error rates
- Validate — Schema checks, data quality rules
Quick Start
# 1. Create a simple pipeline
create pipeline from mysql users to postgres users_backup
# 2. Add transformation
add transform to users-backup: filter where active = true
# 3. Schedule it
schedule users-backup daily at 2:00 AM
# 4. Run and monitor
run pipeline users-backup
check pipeline status
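The schedule in step 3 corresponds to a standard five-field cron entry. The sketch below shows that mapping in Python; `daily_to_cron` is a hypothetical helper written for illustration, and how the tool actually stores schedules is not documented here.

```python
import re


def daily_to_cron(time_text: str) -> str:
    """Convert a 12-hour time such as '2:00 AM' into a daily cron expression."""
    match = re.fullmatch(r"\s*(\d{1,2}):(\d{2})\s*([AaPp][Mm])\s*", time_text)
    if not match:
        raise ValueError(f"unrecognized time: {time_text!r}")
    hour = int(match.group(1))
    minute = int(match.group(2))
    meridiem = match.group(3).upper()
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        raise ValueError(f"time out of range: {time_text!r}")
    # 12 AM maps to hour 0, 12 PM stays 12, other PM hours add 12.
    hour = hour % 12 + (12 if meridiem == "PM" else 0)
    # Cron field order: minute, hour, day of month, month, day of week.
    return f"{minute} {hour} * * *"


# An hourly schedule, run at minute 0 of every hour.
HOURLY_CRON = "0 * * * *"

print(daily_to_cron("2:00 AM"))
```

With the Quick Start schedule, `daily_to_cron("2:00 AM")` yields `0 2 * * *`.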
Common Use Cases
🔄 Database Synchronization
# Sync production to analytics warehouse
create pipeline from mysql production.orders \
  to bigquery analytics.orders
# Run incremental sync every hour
schedule orders-sync hourly
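An incremental sync is commonly implemented with a high-water mark: each run copies only rows changed since the last run. The sketch below illustrates the idea with two in-memory SQLite databases standing in for MySQL and BigQuery. The `orders` columns (`id`, `total`, `updated_at`) and both helper functions are assumptions for illustration, not the tool's actual implementation.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)"
    )
    return conn


def incremental_sync(
    src: sqlite3.Connection, dst: sqlite3.Connection, watermark: str
) -> str:
    """Copy rows changed after `watermark` and return the new watermark."""
    rows = src.execute(
        "SELECT id, total, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE overwrites rows whose primary key already exists,
    # so re-synced updates do not create duplicates.
    dst.executemany(
        "INSERT OR REPLACE INTO orders (id, total, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    dst.commit()
    # Keep the old watermark when nothing changed.
    return rows[-1][2] if rows else watermark
```

Each hourly run would pass in the watermark returned by the previous run. Note that a strict `>` comparison can miss rows written later with the same timestamp as the watermark; a production sync would need to handle that case.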
📊 API Data Extraction
# Pull data from REST API
create pipeline from api https://api.shop.com/orders \
  to postgres analytics.orders
# Add authentication
set source auth: bearer token xxx
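Bearer authentication conventionally means sending the token in an `Authorization` request header. The sketch below builds such a request with Python's standard library; `build_request` is a hypothetical helper, and whether the tool sends exactly these headers is an assumption.

```python
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries a bearer token."""
    return urllib.request.Request(
        url,
        headers={
            # Standard bearer scheme: "Authorization: Bearer <token>".
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )


# Usage (not executed here): pass the request to urllib.request.urlopen.
# Load the real token from an environment variable or secret store
# rather than writing it into a command or script.
```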
🧹 Data Cleaning
# Clean and transform data
create pipeline from csv raw_data.csv to postgres clean_data
add transform: \
  remove duplicates on email \
  fill nulls in age with 0 \
  validate email format
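The three cleaning steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the tool's implementation: emails are compared after trimming and lowercasing, the first occurrence of a duplicate is kept, non-empty ages are assumed to be numeric, the email pattern is a deliberately loose check, and rows that fail validation are set aside rather than dropped silently.

```python
import re

# Loose format check: something@something.something, with no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def clean_rows(rows):
    """Return (clean, rejected) lists after dedupe, null fill, and validation."""
    seen = set()
    clean, rejected = [], []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        # Step 1: remove duplicates on email, keeping the first occurrence.
        if email in seen:
            continue
        seen.add(email)
        # Step 2: fill missing ages with 0.
        age = row.get("age")
        filled = {**row, "email": email, "age": 0 if age in (None, "") else int(age)}
        # Step 3: validate the email format.
        if EMAIL_RE.fullmatch(email):
            clean.append(filled)
        else:
            rejected.append(filled)
    return clean, rejected
```

Rows read with `csv.DictReader` can be passed in directly, since it yields one dictionary per row with string values.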
📈 Analytics Preparation
# Aggregate for dashboards
create pipeline from postgres transactions \
  to postgres daily_summary
add transform: \
  group by date, product \
  aggregate sum(revenue), count(*) \
  where date >= yesterday
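The aggregation above is equivalent to a SQL `GROUP BY date, product` with a `WHERE` filter, and the filter is applied to rows before they are grouped. A minimal Python sketch of the same logic follows; `daily_summary` and the row field names are hypothetical, and "yesterday" is computed relative to a `today` argument so the result is reproducible.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_summary(transactions, today: date):
    """Sum revenue and count rows per (date, product) from yesterday onward."""
    cutoff = today - timedelta(days=1)
    totals = defaultdict(lambda: [0.0, 0])  # key -> [revenue sum, row count]
    for tx in transactions:
        # Filter first, as a SQL WHERE clause would.
        if tx["date"] < cutoff:
            continue
        bucket = totals[(tx["date"], tx["product"])]
        bucket[0] += tx["revenue"]
        bucket[1] += 1
    return [
        {"date": d, "product": p, "revenue": revenue, "count": count}
        for (d, p), (revenue, count) in sorted(totals.items())
    ]
```

The output has one row per date and product, which matches the shape a `daily_summary` table would need for a dashboard.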
All Commands
| Command | Purpose |
|---------|---------|
| create pipeline from <src> to <dst> | Define new pipeline |
| add transform [to <pipeline>]: <steps> | Add transformation step |
| schedule <pipeline> <when> | Set run schedule |
| run pipeline <name> | Execute immediately |
| check pipeline status | View running pipelines |
| pause pipeline <name> | Stop scheduled runs |
| view logs <pipeline> | See execution history |
| validate <pipeline> | Test without executing |
Supported Sources & Destinations
Databases: MySQL, PostgreSQL, MongoDB, Redis, SQLite
Cloud Storage: S3, GCS, Azure Blob
Data Warehouses: BigQuery, Snowflake, Redshift
Streaming: Kafka, Kinesis, Pub/Sub
Files: CSV, JSON, Parquet, Excel
Requirements
- Node.js 18+ or Python 3.8+
- Source/destination connectors (auto-installed)
- Optional: Airflow or Dagster for orchestration