Project Phoenix Domain

Optiver Data

Kaggle Trading at the Close · 480k Rows · 200 Stocks · 17 Columns

Dataset Coverage

480K rows · 200 stocks · 481 trading days · 17 columns

Source: ~/Python/Optiver/OptFeatureViz/train.csv, the Kaggle Trading at the Close competition dataset. Each row is a snapshot of one stock's closing auction on one trading day, taken at 10-second intervals.
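The coverage figures above can be recomputed from the CSV itself. This is a minimal standard-library sketch. It assumes `train.csv` has a header row containing the `stock_id` and `date_id` columns from the schema. `coverage_summary` is a hypothetical helper, not a function in the codebase. It streams rows instead of loading the whole file into memory.

```python
import csv
import io
from typing import TextIO


def coverage_summary(handle: TextIO) -> dict:
    """Count rows, distinct stocks, distinct trading days and columns in a CSV.

    Hypothetical helper: streams the file row by row, so only the sets of
    distinct IDs are kept in memory.
    """
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    stocks: set[str] = set()
    dates: set[str] = set()
    rows = 0
    for row in reader:
        rows += 1
        stocks.add(row["stock_id"])
        dates.add(row["date_id"])
    return {
        "rows": rows,
        "stocks": len(stocks),
        "trading_days": len(dates),
        "columns": len(columns),
    }


# Tiny in-memory sample with the same column names as train.csv.
sample = io.StringIO("stock_id,date_id,seconds_in_bucket\n0,0,0\n1,0,0\n0,1,10\n")
print(coverage_summary(sample))  # {'rows': 3, 'stocks': 2, 'trading_days': 2, 'columns': 3}
```

To run it against the real file, expand the path with `os.path.expanduser`, open it with `newline=""`, and pass the handle to `coverage_summary`.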

Schema Reference

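Column                   Type   Description
stock_id                 int    Stock identifier (0–199)
date_id                  int    Trading day identifier (0–480)
seconds_in_bucket        int    Seconds elapsed since the start of the closing auction (0–540, in 10-second steps)
imbalance_size           float  Amount unmatched at the current reference price (USD)
imbalance_buy_sell_flag  int    Imbalance direction: buy (1), sell (-1), or none (0)
reference_price          float  Price that maximizes paired shares, then minimizes the imbalance, then minimizes the distance from the bid-ask midpoint
matched_size             float  Amount that can be matched at the current reference price (USD)
far_price                float  Crossing price that maximizes matched shares based on auction orders only
near_price               float  Crossing price that maximizes matched shares based on auction and continuous market orders
bid_price                float  Best bid in the continuous (non-auction) order book
bid_size                 float  Dollar notional at the best bid
ask_price                float  Best ask in the continuous (non-auction) order book
ask_size                 float  Dollar notional at the best ask
wap                      float  Weighted average price of the continuous book; each side's price is weighted by the opposite side's size
target                   float  60-second future move in the stock's WAP minus that of a synthetic index, in basis points (prediction target)
time_id                  int    Unique time bucket identifier
row_id                   str    Unique row identifier ({date_id}_{seconds_in_bucket}_{stock_id})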
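Two small functions make the `wap` and `target` rows concrete. This sketch assumes the definitions published with the Kaggle competition. WAP weights the bid price by the ask size and the ask price by the bid size. The target is the stock's 60-second WAP return minus the synthetic index's 60-second return, scaled by 10,000 to give basis points. The index WAP is not a column in `train.csv`, so it is passed in as an input. Both function names are hypothetical.

```python
def wap(bid_price: float, bid_size: float, ask_price: float, ask_size: float) -> float:
    """Size-weighted price: each side's price is weighted by the opposite side's size."""
    total = bid_size + ask_size
    if total <= 0:
        raise ValueError("bid_size + ask_size must be positive")
    return (bid_price * ask_size + ask_price * bid_size) / total


def target_bps(stock_wap_t: float, stock_wap_t60: float,
               index_wap_t: float, index_wap_t60: float) -> float:
    """Stock's 60-second WAP return minus the index's 60-second return, in basis points.

    Assumes the Kaggle definition; the index WAP values must come from elsewhere.
    """
    stock_move = stock_wap_t60 / stock_wap_t
    index_move = index_wap_t60 / index_wap_t
    return (stock_move - index_move) * 10_000


# A deep bid (300) against a thin ask (100) pulls the WAP toward the ask price.
print(wap(100.0, 300.0, 100.2, 100.0))  # ≈ 100.15
```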

System Architecture

unified_agent/
├── data_client.py          # OptiverDataClient: data access layer
├── tool_registry.py        # ToolRegistry: 84 tools by category
├── execution_context.py    # ExecutionContext: shared pipeline state
├── plan_builder.py         # PlanBuilder: regex → execution plan
├── agentic_engine.py       # State machine orchestrator
├── feature_engineering.py  # V2 microstructure feature generation
├── model_pipeline.py       # LightGBM training with time-series CV
├── optuna_tuner.py         # Hyperparameter optimization (TPE)
├── walk_forward.py         # Walk-forward backtesting
├── feature_drift.py        # Feature drift detection (PSI, KS, JS)
├── target_drift.py         # Target/concept drift detection
├── data_quality.py         # Data quality monitoring
├── monitoring_alerts.py    # Alert routing
├── smart_tools.py          # Meta-tools (SmartAnalyze, SmartMonitor)
└── report_templates.py     # Structured report outputs
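`feature_drift.py` lists PSI among its drift tests. Below is a minimal Population Stability Index sketch. It computes PSI = Σ (aᵢ − eᵢ)·ln(aᵢ/eᵢ) over bins cut at quantiles of a reference window, where eᵢ and aᵢ are the reference and current shares in bin i. The bin count, the quantile binning and the `eps` floor for empty bins are assumptions. They are not necessarily what `feature_drift.py` does.

```python
import math
from bisect import bisect_right
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference`.

    Bin edges are interior quantiles of the reference sample, so each bin holds
    roughly the same share of reference values. `eps` floors empty-bin shares
    to avoid log(0) and division by zero.
    """
    ref_sorted = sorted(reference)
    n = len(ref_sorted)
    # Interior cut points at the 1/bins, 2/bins, ... quantiles of the reference.
    edges = [ref_sorted[int(n * k / bins)] for k in range(1, bins)]

    def shares(sample: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1  # bin index in 0..bins-1
        return [max(c / len(sample), eps) for c in counts]

    expected = shares(reference)
    actual = shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Alert thresholds remain a per-project choice.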

Execution Flow

CLI + REPL

cli/main.py supports both an interactive REPL and one-shot queries passed as command-line arguments. Natural-language queries are mapped to tool execution plans.
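The two CLI modes can share a single dispatcher. If arguments are present, they form a one-shot query; otherwise input lines are read as a REPL. This is a hypothetical sketch, not the contents of `cli/main.py`. `handle_query` stands in for whatever plans and executes a query, and the `exit`/`quit` commands are assumptions.

```python
import sys
from typing import Callable, Iterable, Iterator


def run_cli(argv: list[str], handle_query: Callable[[str], str],
            lines: Iterable[str] = sys.stdin) -> Iterator[str]:
    """Yield one result per query: one-shot if argv is non-empty, REPL otherwise."""
    if argv:
        # One-shot mode: all arguments form a single natural-language query.
        yield handle_query(" ".join(argv))
        return
    for line in lines:
        query = line.strip()
        if query in {"exit", "quit"}:  # hypothetical exit commands
            break
        if query:  # skip blank lines
            yield handle_query(query)
```

A `main.py` built this way would run `for out in run_cli(sys.argv[1:], handle_query): print(out)`.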

Planner

PlanBuilder matches each query against compiled regex patterns and turns the match into an ordered list of execution steps. When several patterns match, the more specific one takes priority.
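A regex-driven planner with specificity ordering might look like the following. It is a hypothetical re-implementation, not the project's `PlanBuilder`. Each rule carries an explicit `priority`, and higher-priority rules (the more specific ones) are tried first. Named groups in the pattern become step arguments. The tool names are invented for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass(order=True)
class Rule:
    priority: int  # higher = more specific, tried first; the only field used for sorting
    pattern: re.Pattern = field(compare=False)
    steps: tuple[str, ...] = field(compare=False)


class PlanBuilder:
    """Hypothetical sketch: map a query to ordered (tool, args) steps via regex rules."""

    def __init__(self) -> None:
        self._rules: list[Rule] = []

    def add(self, regex: str, steps: list[str], priority: int = 0) -> None:
        self._rules.append(Rule(priority, re.compile(regex, re.IGNORECASE), tuple(steps)))
        # Stable sort: rules with equal priority keep their insertion order.
        self._rules.sort(reverse=True)

    def build(self, query: str) -> list[tuple[str, dict]]:
        for rule in self._rules:
            match = rule.pattern.search(query)
            if match:
                args = match.groupdict()  # named groups become step arguments
                return [(step, args) for step in rule.steps]
        return []  # no rule matched
```

Suppose a general `analyze` rule sits at priority 0 and an `analyze stock (?P<stock_id>\d+)` rule at priority 10. The query `analyze stock 5` then hits the specific rule and yields steps carrying `{"stock_id": "5"}`. The query `analyze the market` falls through to the general rule.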

Engine

AgenticEngine executes plan steps, updates ExecutionContext, and surfaces results.
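The engine's role can be shown as a small state machine. It walks the plan, records each tool's output in a shared context, and stops at the first failure. The `ExecutionContext` here is a toy stand-in named after the document's class. Its fields, the `State` values and the `run_plan` function are assumptions, not the real `AgenticEngine` API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Callable


class State(Enum):
    EXECUTING = auto()
    DONE = auto()
    FAILED = auto()


@dataclass
class ExecutionContext:
    """Toy shared state passed between steps (hypothetical fields)."""
    results: dict[str, Any] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)


Tool = Callable[[ExecutionContext, dict], Any]


def run_plan(steps: list[tuple[str, dict]],
             tools: dict[str, Tool]) -> tuple[State, ExecutionContext]:
    """Execute steps in order; move to FAILED on the first error, else DONE."""
    ctx = ExecutionContext()
    state = State.EXECUTING
    for name, args in steps:
        tool = tools.get(name)
        if tool is None:
            ctx.errors.append(f"unknown tool: {name}")
            state = State.FAILED
            break
        try:
            # Tools can read earlier outputs from ctx; each output is stored under the tool's name.
            ctx.results[name] = tool(ctx, args)
        except Exception as exc:
            ctx.errors.append(f"{name}: {exc}")
            state = State.FAILED
            break
    if state is State.EXECUTING:
        state = State.DONE
    return state, ctx
```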

Reports

report_templates.py generates analysis, model, and monitoring summaries with structured fields.
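A structured report can be modelled as a dataclass whose fields render to text and also export as a dict. `ModelReport` and its fields are hypothetical. They are chosen to fit the LightGBM and time-series CV workflow listed under System Architecture, and the real templates may carry different fields.

```python
import math
from dataclasses import asdict, dataclass, field


@dataclass
class ModelReport:
    """Hypothetical structured model summary; field names are illustrative."""
    model: str
    cv_folds: int
    metric: str
    fold_scores: list[float] = field(default_factory=list)

    @property
    def mean_score(self) -> float:
        # NaN rather than an exception when no folds have been scored yet.
        return sum(self.fold_scores) / len(self.fold_scores) if self.fold_scores else math.nan

    def render(self) -> str:
        """Plain-text rendering; asdict(report) gives the same fields as a dict."""
        return "\n".join([
            f"Model: {self.model}",
            f"CV folds: {self.cv_folds}",
            f"{self.metric} per fold: " + ", ".join(f"{s:.4f}" for s in self.fold_scores),
            f"Mean {self.metric}: {self.mean_score:.4f}",
        ])
```

Keeping the fields typed means the same object can feed both the text summary and a machine-readable export via `asdict`.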

Quick Start

# Run interactive REPL
cd domains/Optiver/cli && python main.py
# One-shot query (from domains/Optiver/cli)
python main.py "analyze stock 5"
# Full synthesis workflow (from domains/Optiver/cli)
python main.py "run synthesis workflow standard"
# Run domain test suite (step up from cli/ to domains/Optiver)
cd .. && python test_domain.py