Multi-Agent Integration (v3.0)
Overview
| Item | Details |
|------|---------|
| Date | 2024-12-29 |
| Goal | Integrate Claude agents for AI-assisted trading decisions |
| Branch | feature/v3.0-multi-agent |
| New Code | ~4,744 lines across 7 files |
| Status | Ready for integration |
Agent Architecture
                   UnifiedOrchestrator (Opus)
                (Synthesizes all recommendations)
                                |
          +---------------------+---------------------+
          |                     |                     |
    Training (5)           Trading (5)           Selection (4)
    - Hyperparameter       - Signal Analyst      - Symbol Scorer
    - Risk Analyst         - Risk Guardian       - Weight Optimizer
    - Reward Engineer      - Position Sizer      - Compatibility Auditor
    - Data Monitor         - Execution Timer     - Portfolio Coherence
    - Backtest Validator   - Exit Strategist
                                |
                       SharedAgentContext
                     (Thread-safe singleton)
                                |
                       AgentSafetyWrapper
                   (Bounds, rate limits, veto)
Agent Roles by System
Training Agents (5)
| Agent | Model | Interval | Purpose |
|-------|-------|----------|---------|
| Hyperparameter Tuner | Sonnet | 5 cycles | Optimizes LR, entropy coef, batch size |
| Risk Analyst | Sonnet | 3 cycles | Monitors drawdown, stability, overfitting |
| Reward Engineer | Sonnet | 10 cycles | Detects HOLD bias, reward collapse |
| Data Monitor | Haiku | 20 cycles | NaN/Inf detection, normalization drift |
| Backtest Validator | Sonnet | 15 cycles | Out-of-sample validation |
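The intervals in the table above imply a simple schedule. The sketch below assumes an agent is consulted whenever the validation cycle number is a multiple of its interval; that rule, and the names `AGENT_INTERVALS` and `agents_due`, are illustrative assumptions rather than the project's actual scheduler.

```python
# Hypothetical schedule; the interval values are copied from the table above.
AGENT_INTERVALS = {
    "hyperparameter_tuner": 5,
    "risk_analyst": 3,
    "reward_engineer": 10,
    "data_monitor": 20,
    "backtest_validator": 15,
}


def agents_due(cycle, intervals=AGENT_INTERVALS):
    """Return the agents whose interval evenly divides the validation cycle."""
    if cycle <= 0:
        return []
    return [name for name, every in intervals.items() if cycle % every == 0]


if __name__ == "__main__":
    for cycle in (3, 15, 60):
        print(cycle, agents_due(cycle))
```

Under this assumption all five agents coincide every 60 cycles, which is worth keeping in mind when budgeting consultations per run.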
Live Trading Agents (5)
| Agent | Model | Purpose |
|-------|-------|---------|
| Signal Analyst | Sonnet | Adjusts confidence (0.7-1.3x multiplier) |
| Risk Guardian | Sonnet | VETO AUTHORITY - enforces risk limits |
| Position Sizer | Haiku | Scales positions (0.3-2.0x) |
| Execution Timer | Haiku | MARKET vs LIMIT order selection |
| Exit Strategist | Sonnet | Stop/TP adjustments, exit timing |
Selection Agents (4)
| Agent | Model | Purpose |
|-------|-------|---------|
| Symbol Scorer | Sonnet | Validates metrics, flags anomalies |
| Weight Optimizer | Sonnet | Adjusts scoring weights for regime |
| Compatibility Auditor | Haiku | Catches scoring errors |
| Portfolio Coherence | Sonnet | Validates diversity, suggests swaps |
Safety Mechanisms
Forbidden Actions (agents can NEVER take these)
- bypass_all_gates
- disable_risk_controls
- unlimited_position
- ignore_drawdown
- force_margin_call
- delete_models
- modify_api_keys
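A deny-list like this is usually enforced before any agent action is dispatched. The following is a minimal sketch, assuming actions are identified by string name; `FORBIDDEN_ACTIONS`, `ForbiddenActionError`, and `check_action` are hypothetical names, not the API of the project's safety module.

```python
# Action names copied from the list above; everything else is illustrative.
FORBIDDEN_ACTIONS = frozenset({
    "bypass_all_gates",
    "disable_risk_controls",
    "unlimited_position",
    "ignore_drawdown",
    "force_margin_call",
    "delete_models",
    "modify_api_keys",
})


class ForbiddenActionError(Exception):
    """Raised when an agent requests an action on the deny-list."""


def check_action(action):
    """Return the normalized action name, or raise if it is forbidden."""
    normalized = action.strip().lower()
    if normalized in FORBIDDEN_ACTIONS:
        raise ForbiddenActionError(f"agent requested forbidden action: {normalized}")
    return normalized
```

Normalizing case and whitespace before the lookup keeps trivially reformatted names from slipping past the check.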
Safety Bounds (All values clamped)
| Parameter | Bounds | Use Case |
|-----------|--------|----------|
| lr_multiplier | 0.1 - 3.0 | Training LR adjustment |
| entropy_multiplier | 0.1 - 5.0 | Exploration coefficient |
| position_scale | 0.3 - 2.0 | Position sizing |
| confidence_multiplier | 0.7 - 1.3 | Signal adjustment |
| stop_distance_pct | 0.5% - 15% | Stop-loss range |
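Clamping to these bounds can be sketched in a few lines. The bounds are taken from the table above; the names `SAFETY_BOUNDS` and `clamp_parameter`, and the choice to reject NaN rather than clamp it, are assumptions made for illustration.

```python
import math

# Bounds copied from the table above. stop_distance_pct is expressed in
# percent units here (0.5 means 0.5%), which is an assumption of this sketch.
SAFETY_BOUNDS = {
    "lr_multiplier": (0.1, 3.0),
    "entropy_multiplier": (0.1, 5.0),
    "position_scale": (0.3, 2.0),
    "confidence_multiplier": (0.7, 1.3),
    "stop_distance_pct": (0.5, 15.0),
}


def clamp_parameter(name, value):
    """Clamp an agent-proposed value into the safe range for `name`."""
    low, high = SAFETY_BOUNDS[name]  # KeyError for unknown parameters
    if math.isnan(value):
        # NaN compares false against everything, so min/max would pass it through.
        raise ValueError(f"{name}: NaN is not a valid value")
    return min(max(value, low), high)


if __name__ == "__main__":
    print(clamp_parameter("position_scale", 4.2))
    print(clamp_parameter("lr_multiplier", 0.01))
```

Unknown parameter names raise instead of passing through unclamped, so a new agent output cannot bypass the bounds by accident.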
Rate Limiting
| Limit | Value |
|-------|-------|
| Max consultations/hour | 100 |
| Max actions/hour | 50 |
| Min interval between adjustments | 60 seconds |
| Max critical actions/day | 10 |
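One common way to enforce per-hour and per-day limits like these is a sliding-window counter. The sketch below is an assumption about how such a limiter could look, not the project's implementation; `SlidingWindowLimiter` and its injectable `clock` parameter are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_events` events within any `window_seconds` span."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._events = deque()

    def try_acquire(self):
        """Record an event and return True, or return False if the limit is hit."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) >= self.max_events:
            return False
        self._events.append(now)
        return True


# Limits taken from the table above.
consultations = SlidingWindowLimiter(max_events=100, window_seconds=3600)
actions = SlidingWindowLimiter(max_events=50, window_seconds=3600)
critical_actions = SlidingWindowLimiter(max_events=10, window_seconds=86400)
min_adjust_gap = SlidingWindowLimiter(max_events=1, window_seconds=60)
```

The 60-second minimum interval between adjustments falls out of the same class as a window that admits a single event.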
Veto Authority
- Risk Guardian has veto power for safety-critical decisions
- Veto reasoning logged and auditable
- Orchestrator can override only with strong evidence
Operating Modes
| Mode | Behavior |
|------|----------|
| ADVISORY (default) | Agents recommend, humans decide |
| SUPERVISED | Agents recommend with approval required |
| AUTONOMOUS | Agents execute recommendations |
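The difference between the three modes comes down to who is allowed to trigger execution. The sketch below is one possible reading of the table: `Mode` is a hypothetical stand-in for the project's mode enum, and `should_execute` is an illustrative helper, not part of the documented API.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for the operating modes in the table above."""
    ADVISORY = "advisory"
    SUPERVISED = "supervised"
    AUTONOMOUS = "autonomous"


def should_execute(mode, human_approved=False):
    """Decide whether the system itself may act on an agent recommendation."""
    if mode is Mode.AUTONOMOUS:
        return True
    if mode is Mode.SUPERVISED:
        # The recommendation is executed only after explicit approval.
        return human_approved
    # ADVISORY: the recommendation is only surfaced; a human acts on it manually.
    return False
```

In this reading, ADVISORY never executes anything on its own, even if an approval flag is passed by mistake.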
Integration: Training
Configuration
from alpaca_trading.training import MultiAgentTrainer, MultiAgentConfig
agent_config = MultiAgentConfig(
    # Enable/disable agents
    enable_hyperparameter_tuner=True,
    enable_risk_analyst=True,
    enable_reward_engineer=True,
    enable_data_monitor=False,  # Optional
    enable_backtest_validator=False,  # Expensive
    # Consultation intervals (validation cycles)
    hyperparam_interval=5,
    risk_interval=3,
    reward_interval=10,
    # Safety bounds
    max_lr_multiplier=2.0,
    min_lr_multiplier=0.1,
    # Cost control
    max_consultations_per_run=100,
    # Logging
    log_agent_responses=True,
)
Usage
# Create multi-agent trainer
trainer = MultiAgentTrainer(env, ppo_config, agent_config)
# Train with agent guidance
import asyncio
results = asyncio.run(trainer.train_with_guidance())
# Save agent logs
trainer.save_agent_logs("agent_logs.json")
Integration: Live Trading
Configuration
from alpaca_trading.agents import (
    UnifiedOrchestrator,
    OrchestratorConfig,
    OrchestratorMode,
)

orchestrator = UnifiedOrchestrator(
    config=OrchestratorConfig(
        mode=OrchestratorMode.ADVISORY,  # Start safe
        enable_trading_agents=True,
        enable_selection_agents=False,
        min_consensus_for_action=0.66,  # 2/3 agreement
        require_unanimous_for_critical=True,
        max_total_consultations_per_hour=200,
    )
)
Signal Evaluation
# Consult Signal Analyst
# (await must run inside an async function; trading_agents setup is not shown here)
analysis = await trading_agents.evaluate_signal(
    symbol="AAPL",
    direction=1,
    confidence=0.72,
    magnitude=0.015,
    regime="bull",
)
adjusted_confidence = analysis['adjusted_confidence']
recommendation = analysis['recommendation'] # PROCEED, REDUCE_SIZE, SKIP
Gate Check with Risk Guardian
# Risk Guardian has veto authority
risk_assessment = await trading_agents.evaluate_gates(
    symbol=symbol,
    gate_results=gate_results,
    win_rate=0.62,
    loss_streak=1,
    drawdown=0.032,
    exposure=0.45,
)
if risk_assessment['veto']:
    logger.warning(f"Risk Guardian veto: {risk_assessment['veto_reason']}")
    return False
Position Sizing
# Consult Position Sizer
sizing = await trading_agents.recommend_position_scale(
    signal_strength=0.75,
    confidence=0.72,
    volatility=0.24,
    drawdown=0.032,
    exposure=0.45,
    win_rate=0.62,
)
scale = sizing['scale_multiplier'] # 0.3 - 2.0
final_qty = base_qty * scale
Shared Context
from alpaca_trading.agents import get_shared_context
context = get_shared_context()
# Update portfolio state
context.update_portfolio_state(
    total_equity=105000,
    cash_available=45000,
    daily_pnl=320,
    current_drawdown=0.032,
    win_rate=0.625,
)
# Update market state
context.update_market_regime("bull", volatility=0.24)
context.update_trading_session("crypto_only")
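The architecture diagram describes the shared context as a thread-safe singleton. A minimal sketch of that pattern follows, assuming double-checked locking for creation and a re-entrant lock around state access; `SharedContextSketch` and its `snapshot` method are hypothetical and only mirror the shape of the calls above.

```python
import threading


class SharedContextSketch:
    """Illustrative thread-safe singleton holding portfolio state."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self):
        self._lock = threading.RLock()  # guards all reads and writes of state
        self._portfolio = {}

    @classmethod
    def get(cls):
        """Return the single shared instance, creating it on first use."""
        if cls._instance is None:
            with cls._instance_lock:
                # Re-check inside the lock so two threads cannot both create it.
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def update_portfolio_state(self, **fields):
        with self._lock:
            self._portfolio.update(fields)

    def snapshot(self):
        """Return a copy so callers cannot mutate shared state by accident."""
        with self._lock:
            return dict(self._portfolio)
```

Returning copies from `snapshot` means an agent that reads the state mid-update sees either the old or the new values, never a half-written mix.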
Cost Estimates
| Agent Type | Model | Est. Cost/Run |
|------------|-------|---------------|
| Orchestrator | Opus | ~$1.50 |
| Hyperparameter Tuner | Sonnet | ~$0.60 |
| Risk Analyst | Sonnet | ~$0.90 |
| Reward Engineer | Sonnet | ~$0.30 |
| Data Monitor | Haiku | ~$0.10 |
| Total/Training Run | - | ~$3.50 |
Annual estimate: ~$350-$700 (100 training runs)
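The arithmetic behind these figures can be checked directly. In the sketch below the per-agent costs come from the table; the itemized rows sum to $3.40, which the table rounds to ~$3.50. The source does not say how the $700 upper bound is derived, so treating it as double the run count is purely an assumption of this example.

```python
# Per-run costs from the table above, in cents to avoid float rounding.
COST_CENTS = {
    "orchestrator": 150,
    "hyperparameter_tuner": 60,
    "risk_analyst": 90,
    "reward_engineer": 30,
    "data_monitor": 10,
}
ROUNDED_TOTAL_CENTS = 350  # the table's "~$3.50" per training run
RUNS_PER_YEAR = 100


def annual_cost_usd(per_run_cents, runs):
    """Annual cost in dollars for a given per-run cost and run count."""
    return per_run_cents * runs / 100


itemized_cents = sum(COST_CENTS.values())  # 340, i.e. $3.40

if __name__ == "__main__":
    print("itemized per run:", itemized_cents / 100)
    print("annual, 100 runs:", annual_cost_usd(ROUNDED_TOTAL_CENTS, RUNS_PER_YEAR))
    # Assumption: the upper estimate corresponds to twice as many runs.
    print("annual, 200 runs:", annual_cost_usd(ROUNDED_TOTAL_CENTS, 2 * RUNS_PER_YEAR))
```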
Requirements
- ANTHROPIC_API_KEY environment variable
- anthropic package (pip install anthropic)
- Optional: nest_asyncio for Colab compatibility
Files Location
| File | Lines | Purpose |
|------|-------|---------|
| alpaca_trading/agents/__init__.py | 142 | Module exports |
| alpaca_trading/agents/orchestrator.py | 813 | Coordination |
| alpaca_trading/agents/live_trading.py | 761 | Trading agents |
| alpaca_trading/agents/selection.py | 677 | Selection agents |
| alpaca_trading/agents/safety.py | 373 | Safety guardrails |
| alpaca_trading/agents/shared_context.py | 490 | Shared state |
| alpaca_trading/training/multi_agent.py | 1,065 | Training agents |
Rollout Strategy
| Phase | Description | Risk |
|-------|-------------|------|
| 1 | Merge branch, run tests | Low |
| 2 | Training integration (notebook) | Low |
| 3 | Live trading advisory mode | Low |
| 4 | Live trading autonomous mode | Medium |
Recommendation: Run in advisory mode for 2-4 weeks to validate agent quality before enabling autonomous mode.
Failed Attempts (Critical)
| Attempt | Why it Failed | Lesson Learned |
|---------|---------------|----------------|
| Direct training loop modification | Breaks NativePPOTrainer | Use wrapper pattern instead |
| Synchronous agent calls | Blocks training loop | Use asyncio for parallel calls |
| No rate limiting | Runaway API costs | Always set max_consultations |
| No safety bounds | Agents recommended extreme values | Clamp all numeric outputs |
| Autonomous mode first | Untested agents made bad trades | Always start advisory |
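The "use asyncio for parallel calls" lesson can be illustrated with `asyncio.gather`. In this sketch `consult_agent` is a placeholder that sleeps instead of calling an API, and `consult_all` is a hypothetical helper; the per-agent timeout is an added assumption so that one slow agent cannot stall the rest.

```python
import asyncio


async def consult_agent(name, delay):
    """Placeholder for a real agent call; sleeps to simulate API latency."""
    await asyncio.sleep(delay)
    return {"agent": name, "recommendation": "PROCEED"}


async def consult_all(agents, timeout=5.0):
    """Consult every agent concurrently; failed or slow agents map to None."""
    tasks = [
        asyncio.wait_for(consult_agent(name, delay), timeout)
        for name, delay in agents.items()
    ]
    # return_exceptions=True keeps one failure from cancelling the others.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return {
        name: (None if isinstance(result, Exception) else result)
        for name, result in zip(agents, results)
    }


if __name__ == "__main__":
    simulated = {"risk_analyst": 0.05, "reward_engineer": 0.05, "slow_agent": 2.0}
    print(asyncio.run(consult_all(simulated, timeout=0.5)))
```

Total wall time is bounded by the slowest agent or the timeout, whichever is shorter, rather than by the sum of all calls.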
Key Principles
- Advisory by Default - Start safe, validate before autonomous
- Wrapper Pattern - Don't modify core training/trading code
- Consensus Required - 2/3 agreement for action, unanimous for critical
- Veto Authority - Risk agents can block unsafe actions
- Bounds Enforcement - All numeric values clamped to safe ranges
- Rate Limiting - Prevent API cost explosion
- Full Audit Trail - Every decision logged with reasoning
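The consensus and veto principles combine into a small decision rule. The sketch below assumes each agent casts one approve/reject vote, uses the 0.66 threshold from the orchestrator configuration, and treats any veto as final; the `Vote` type and `decide` function are hypothetical, and the orchestrator's evidence-based veto override is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    agent: str
    approve: bool
    veto: bool = False  # only risk agents would set this


def decide(votes, critical=False, min_consensus=0.66):
    """Return True if the action may proceed under the consensus rules."""
    if not votes:
        return False  # no opinions means no action
    if any(v.veto for v in votes):
        return False  # a veto blocks the action regardless of the tally
    approvals = sum(1 for v in votes if v.approve)
    if critical:
        return approvals == len(votes)  # unanimous for critical actions
    return approvals / len(votes) >= min_consensus


if __name__ == "__main__":
    votes = [
        Vote("signal_analyst", True),
        Vote("position_sizer", True),
        Vote("risk_guardian", False),
    ]
    print(decide(votes))
    print(decide(votes, critical=True))
```

Two approvals out of three clear the 0.66 threshold for ordinary actions, while the same tally fails the unanimity requirement for critical ones.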
References
- docs/reference/AI_AGENT_REFERENCE.md - Comprehensive guide
- examples/multi_agent_training_example.py - Working example
- Branch: feature/v3.0-multi-agent