Fireworks AI Skill
Fast, cost-effective access to 100+ open-source models with OpenAI-compatible APIs, LoRA fine-tuning, and advanced deployment options.
When to Use This Skill
| Scenario | Example | Relevant Section |
|----------|---------|------------------|
| Query text models | "Chat completion with Llama" | Quick Reference → Chat Completion |
| Fine-tune a model | "Train model on my data" | Fine-Tuning Overview |
| Deploy custom model | "On-demand GPU deployment" | Deployments |
| Migrate from OpenAI | "Use OpenAI SDK with Fireworks" | OpenAI Compatibility |
| Batch processing | "Process 10K prompts offline" | Batch Inference |
| Image generation | "FLUX Kontext image editing" | Image Generation |
| Embeddings/RAG | "Generate embeddings for search" | Embeddings & Reranking |
| CLI operations | "firectl commands" | firectl Reference |
Quick Reference
Chat Completion (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<YOUR_FIREWORKS_API_KEY>",
)
chat_completion = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say this is a test"},
    ],
)
print(chat_completion.choices[0].message.content)
Chat Completion (curl)
curl --request POST \
--url https://api.fireworks.ai/inference/v1/chat/completions \
--header "accept: application/json" \
--header "authorization: Bearer $FIREWORKS_API_KEY" \
--header "content-type: application/json" \
--data '{
"model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'
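Where the OpenAI SDK is not installed, the same request can be assembled with the Python standard library alone. The sketch below builds the request object without sending it. The helper name build_chat_request is hypothetical; the URL, headers, and JSON body mirror the curl call above.

```python
import json
import urllib.request

FIREWORKS_CHAT_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a chat completion request.

    Hypothetical helper: it mirrors the headers and JSON body of the curl example.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        FIREWORKS_CHAT_URL,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    api_key="<YOUR_FIREWORKS_API_KEY>",
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.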
Supervised Fine-Tuning Job
firectl supervised-fine-tuning-job create \
--base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
--dataset my-training-dataset \
--output-model my-fine-tuned-model \
--epochs 3 \
--learning-rate 1e-4 \
--lora-rank 8
Create Dataset for Fine-Tuning
from fireworks.client import Dataset
dataset = Dataset.from_file(
    "path/to/training_data.jsonl",
    name="my-training-dataset",
)
# Dataset is now available on Fireworks for fine-tuning
Monitor Training Progress
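import time

# `job` is assumed to be the fine-tuning job object created earlier
while not job.is_completed:
    job.raise_if_bad_state()  # Stop early if the job has failed
    print(f"Training state: {job.state}")
    time.sleep(10)
    job = job.get()  # Refresh the job state
print(f"Training completed! New model: {job.output_model}")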
Deploy Fine-Tuned Model (Multi-LoRA)
from fireworks import LLM
base_model = LLM(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    deployment_type="on-demand",
    id="shared-base-deployment",
    enable_addons=True,
)
Generate Embeddings
from openai import OpenAI
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<YOUR_FIREWORKS_API_KEY>",
)
response = client.embeddings.create(
    model="fireworks/qwen3-embedding-8b",
    input="Your text to embed",
)
embeddings = response.data[0].embedding
Export Billing Metrics
firectl billing export-metrics \
--start-time "2025-01-01" \
--end-time "2025-01-31" \
--filename january_metrics.csv
Create Deployment
firectl deployment create accounts/fireworks/models/deepseek-v3 \
--deployment-shape throughput
Key Concepts
Fine-Tuning Methods
| Method | Use Case | When to Use |
|--------|----------|-------------|
| SFT (Supervised) | Classification, extraction | Large labeled dataset (~1000+ examples) |
| RFT (Reinforcement) | Complex reasoning, agents | Small dataset, verifiable outputs, multi-step tasks |
| DPO (Preference) | Alignment, style | Pairwise preference comparisons |
Decision Tree:
- Have 1000+ labeled examples? → SFT
- Task is verifiable but lacks golden outputs? → RFT
- Want to align with preferences? → DPO
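The decision tree can be written down as a small function, which is useful when the choice has to be made inside a script. This is a minimal sketch: the function name and its three parameters are hypothetical, and checking the rules top to bottom (so SFT wins when several apply) is an assumption about how to read the list above.

```python
def choose_fine_tuning_method(
    labeled_examples,
    verifiable_without_golden_outputs,
    has_preference_pairs,
):
    """Return "SFT", "RFT", or "DPO" by checking the decision tree top to bottom."""
    if labeled_examples >= 1000:
        return "SFT"
    if verifiable_without_golden_outputs:
        return "RFT"
    if has_preference_pairs:
        return "DPO"
    return None  # No rule matched; the decision tree gives no recommendation


# A task with few labels whose outputs can be checked programmatically
method = choose_fine_tuning_method(
    labeled_examples=200,
    verifiable_without_golden_outputs=True,
    has_preference_pairs=False,
)
print(method)
```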
LoRA (Low-Rank Adaptation)
Fireworks uses LoRA for efficient fine-tuning:
- Faster & cheaper - Train in hours, not days
- Easy to deploy - Instant deployment on Fireworks
- Flexible - Run multiple LoRAs on single base deployment
Deployment Types
| Type | Use Case | Scaling |
|------|----------|---------|
| Serverless | Variable traffic, cost optimization | Auto-scale to zero |
| On-Demand | Consistent performance, high throughput | Dedicated GPUs |
| Reserved | Predictable workloads, discounts | Pre-purchased capacity |
Agent Tracing (RFT)
For reinforcement fine-tuning with agents:
- Use model_base_url from trainer (points to tracing.fireworks.ai)
- Attach FireworksTracingHttpHandler for structured logging
- Log Status.rollout_finished() or Status.rollout_error() on completion
- Trainer joins traces + logs via rollout_id
API Compatibility
Fireworks is OpenAI-compatible. Key differences:
| Feature | OpenAI | Fireworks |
|---------|--------|-----------|
| max_tokens overflow | Error | Auto-truncate (configurable) |
| Streaming usage stats | Not returned by default (opt in with stream_options.include_usage) | Returned in final chunk |
| Model names | gpt-4 | accounts/fireworks/models/llama-v3p1-8b-instruct |
Set context_length_exceeded_behavior: "error" for OpenAI-like behavior.
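As a rough illustration of the setting above, the sketch below builds a chat completion body that asks for an error instead of truncation. The helper name build_strict_payload is hypothetical, and placing context_length_exceeded_behavior as a top-level request field is an assumption; check the API reference before relying on it.

```python
import json


def build_strict_payload(model, messages, max_tokens):
    """Chat completion body that requests an error rather than auto-truncation."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        # Assumption: the option is sent as a top-level field of the request body
        "context_length_exceeded_behavior": "error",
    }


payload = build_strict_payload(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(json.dumps(payload, indent=2))
```

With the OpenAI SDK, fields that are not part of the OpenAI schema can be passed through the extra_body argument of client.chat.completions.create.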
firectl CLI Quick Reference
# Authentication
firectl login
# Account operations
firectl account list
# Dataset operations
firectl dataset download <dataset-id>
firectl dataset list
# Fine-tuning jobs
firectl supervised-fine-tuning-job create --help
firectl supervised-fine-tuning-job list
firectl dpo-job resume <job-id>
# Deployments
firectl deployment create <model> --deployment-shape <shape>
firectl deployment scale <deployment-id> --replicas <n>
# Evaluators
firectl evaluator-revision get <evaluator-id>
# Billing
firectl billing export-metrics
Available Models (Highlights)
Text Models:
- DeepSeek V3, DeepSeek R1
- Llama 3.1/3.2/3.3 (8B, 70B, 405B)
- Qwen 2.5 family
- Kimi K2
Embedding Models:
- fireworks/qwen3-embedding-8b (serverless)
- fireworks/qwen3-embedding-4b
- nomic-ai/nomic-embed-text-v1.5
Reranking Models:
- fireworks/qwen3-reranker-8b (serverless)
Image Models:
- FLUX Kontext Pro/Max
- SDXL ControlNet
Browse all: https://fireworks.ai/models
Reference Files
| File | Content | Use For |
|------|---------|---------|
| references/llms-txt.md | Complete API reference (410 pages) | Detailed API docs, all CLI commands, parameters |
Navigation tips:
- Search for specific CLI commands: firectl <command>
- API endpoints follow pattern: /v1/accounts/{account_id}/<resource>
- Fine-tuning docs under #fine-tuning-* sections
- Deployment docs under #deployment-* sections
Working with This Skill
For Beginners
- Start with Chat Completion example above
- Get API key from https://app.fireworks.ai
- Use OpenAI SDK (familiar interface)
- Try serverless models first (no deployment needed)
For Fine-Tuning
- Prepare JSONL dataset with messages format
- Upload with Dataset.from_file() or firectl
- Choose fine-tuning method (SFT/RFT/DPO)
- Monitor with firectl supervised-fine-tuning-job list
- Deploy LoRA or merge into base model
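The first step, preparing a JSONL dataset in messages format, can be sketched with the standard library. The example assumes each line is a JSON object with a messages list shaped like the Chat Completion example (role plus content). The helper names, the allowed role set, and the validation rules are illustrative assumptions, not Fireworks' official schema checks.

```python
import json

# Assumption: the roles used in the Chat Completion example, plus "assistant"
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example):
    """Return a list of problems found in one training example (empty if none)."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: must be an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
    return problems


def to_jsonl(examples):
    """Serialize examples as JSONL: one JSON object per line."""
    return "".join(json.dumps(e, ensure_ascii=False) + "\n" for e in examples)


examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say this is a test"},
            {"role": "assistant", "content": "This is a test."},
        ]
    },
]
for example in examples:
    problems = validate_example(example)
    if problems:
        raise ValueError(problems)

jsonl_text = to_jsonl(examples)
# Write jsonl_text to path/to/training_data.jsonl before uploading it.
```

Validating locally before upload catches malformed rows early, which is cheaper than discovering them after a training job has been queued.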
For Production
- Consider on-demand deployments for consistent performance
- Enable prompt caching for repeated prefixes
- Use batch inference for offline processing
- Monitor usage via billing export or dashboard
- Set up service accounts for CI/CD
Common Patterns
Streaming with Usage Stats
# `request` is a hypothetical dict holding the usual model and messages arguments
for chunk in client.chat.completions.create(stream=True, **request):
    if chunk.usage:  # Available in final chunk
        print(f"Tokens: {chunk.usage.total_tokens}")
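In practice the streamed text and the usage stats are usually collected in the same loop. The sketch below shows one way to do that, run against stand-in chunks so it works offline. The fake_chunk helper is hypothetical and only imitates the shape of an OpenAI SDK stream chunk (choices[0].delta.content and usage); it also assumes the final usage chunk may carry no choices, so that case is guarded.

```python
from types import SimpleNamespace


def consume_stream(chunks):
    """Collect streamed text and the usage object from an iterable of chunks."""
    parts, usage = [], None
    for chunk in chunks:
        # Guard against chunks with no choices (assumed possible for the usage chunk)
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
        if getattr(chunk, "usage", None):
            usage = chunk.usage
    return "".join(parts), usage


def fake_chunk(text=None, total_tokens=None):
    """Hypothetical stand-in shaped like an OpenAI SDK stream chunk."""
    choices = []
    if text is not None:
        choices = [SimpleNamespace(delta=SimpleNamespace(content=text))]
    usage = None
    if total_tokens is not None:
        usage = SimpleNamespace(total_tokens=total_tokens)
    return SimpleNamespace(choices=choices, usage=usage)


text, usage = consume_stream(
    [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(total_tokens=12)]
)
print(text, usage.total_tokens)
```

With a real client, the iterable returned by client.chat.completions.create(stream=True, ...) would take the place of the stand-in list.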
Variable-Length Embeddings
response = client.embeddings.create(
    model="fireworks/qwen3-embedding-8b",
    input="Your text",
    dimensions=128,  # Reduce from default for faster similarity
)
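Shorter vectors speed up similarity search because each comparison costs time proportional to the vector length. The sketch below shows a plain cosine-similarity ranking. The three-dimensional toy vectors stand in for real embeddings, and both helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; O(len(a)) work per call."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, doc) for doc in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for embeddings returned by the API
query = [1.0, 0.0, 0.0]
documents = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
]
ranking = rank_by_similarity(query, documents)
print(ranking)
```

The query and the documents must be embedded with the same dimensions value, otherwise the length check raises.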
Reranking Documents
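import httpx

# Using /rerank endpoint via the OpenAI client's generic post() method
response = client.post(
    "/rerank",
    cast_to=httpx.Response,  # Return the raw HTTP response
    body={
        "model": "fireworks/qwen3-reranker-8b",
        "query": "search query",
        "documents": ["doc1", "doc2", "doc3"],
    },
)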
Resources
- Model Library: https://fireworks.ai/models
- Playground: https://app.fireworks.ai/playground
- Usage Dashboard: https://app.fireworks.ai/account/usage
- API Reference: https://docs.fireworks.ai/api-reference
- firectl Docs: https://docs.fireworks.ai/tools-sdks/firectl
Notes
- Generated from official Fireworks AI documentation (410 pages)
- OpenAI SDK examples work directly with Fireworks
- Model names use accounts/fireworks/models/<model-name> format
- Fine-tuning uses LoRA by default (set --lora-rank 0 for full parameter)