Project Phoenix Domain

Optiver

Trading at the Close · ML Pipeline + Drift Monitoring

84 Tools · 6 Variations · 5.9M Price Rows · 200 Stocks

Domain Overview

Optiver is a deterministic ML pipeline domain built for the Kaggle Trading at the Close competition. It orchestrates data ingestion, feature engineering, model training, and drift monitoring through a structured tool registry.

Each stage is a narrow, verifiable tool: load data, engineer features, train LightGBM with walk-forward cross-validation (CV), then gate deployment on drift thresholds.

Phoenix Principles Applied

Write-Then-Verify

Model artifacts and drift reports are written to disk, then explicitly verified before the deployment gate clears.
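As a rough illustration of the write-then-verify idea, the sketch below writes a JSON report, then re-reads it from disk and checks both its digest and its required fields before anything downstream is allowed to proceed. The function names, the JSON format, and the SHA-256 check are assumptions made for this example; the domain's actual verification steps are not described here.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_artifact(path: Path, payload: dict) -> str:
    """Write a JSON artifact and return the SHA-256 digest of the bytes written."""
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    path.write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def verify_artifact(path: Path, expected_sha256: str, required_fields: tuple) -> bool:
    """Re-read the artifact from disk and check its digest and required fields."""
    if not path.exists():
        return False
    data = path.read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return False
    try:
        report = json.loads(data)
    except json.JSONDecodeError:
        return False
    return all(field in report for field in required_fields)


# Hypothetical usage: the gate only proceeds once the report verifies.
with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "drift_report.json"
    digest = write_artifact(report_path, {"feature": "wap", "psi": 0.04})
    gate_may_proceed = verify_artifact(report_path, digest, ("feature", "psi"))
```

Verifying from disk rather than from the in-memory object is the point of the pattern: it catches partial writes and wrong paths that an in-memory check would miss.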

Abstraction Ladder

Variations progress from raw data loading through feature engineering, model CV, drift detection, and full synthesis.

Domain Doctor Tools

Each tool owns one operation: load, engineer, train, or monitor. No tool crosses pipeline stages.
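One way to picture a registry in which every tool belongs to exactly one stage is sketched below. The decorator, the stage names, and the registry layout are hypothetical and only illustrate the one-tool-one-operation rule; `load_trading_data` is borrowed from the sample queries, and its body here is a placeholder.

```python
from typing import Callable, Dict, List

# Assumed stage names, taken from the four operations listed above.
STAGES = ("load", "engineer", "train", "monitor")
_REGISTRY: Dict[str, dict] = {}


def register_tool(name: str, stage: str) -> Callable:
    """Register a tool under exactly one pipeline stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")

    def decorator(func: Callable) -> Callable:
        if name in _REGISTRY:
            raise ValueError(f"tool already registered: {name!r}")
        _REGISTRY[name] = {"stage": stage, "func": func}
        return func

    return decorator


def tools_for_stage(stage: str) -> List[str]:
    """Return the sorted names of all tools registered for a stage."""
    return sorted(n for n, entry in _REGISTRY.items() if entry["stage"] == stage)


@register_tool("load_trading_data", stage="load")
def load_trading_data(source: str, split: str) -> dict:
    # Placeholder body: a real tool would read the dataset here.
    return {"source": source, "split": split}
```

Because a tool name maps to a single stage, a tool that tried to span two stages would have to be registered twice and would be rejected.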

Anti-Hallucination Templates

Feature drift reports and model summaries use structured templates with stable fields and sourced values.

Test-and-Prove

Smoke tests cover all 6 variations. The deployment gate requires PSI to be below the threshold before a model is promoted.

Pipeline Coverage

Stage                 Module                  Status
Data Ingestion        data_client.py          Online
Feature Engineering   feature_engineering.py  Online
Model CV              model_pipeline.py       Ready
Drift Monitoring      feature_drift.py        Watching
Deployment Gate       deployment_gate.py      Armed

Module Breakdown

Data Client

Fast access to price, imbalance, and temporal signals from the Kaggle dataset.

5.9M rows · 200 stocks

Feature Forge

V2 microstructure features: weighted average price (WAP), bid-ask spread, order imbalance, and temporal lags.

Walk-forward safe
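The features named above can be sketched in plain Python as follows. The WAP formula is the standard size-weighted one used for this competition's data. The order-imbalance definition (a normalized difference of bid and ask sizes) and the `Quote` and `lag` helpers are assumptions for illustration and may differ from what feature_engineering.py computes.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Quote:
    bid_price: float
    ask_price: float
    bid_size: float
    ask_size: float


def wap(q: Quote) -> float:
    """Weighted average price: each side's price weighted by the opposite size."""
    return (q.bid_price * q.ask_size + q.ask_price * q.bid_size) / (q.bid_size + q.ask_size)


def bid_ask_spread(q: Quote) -> float:
    """Absolute spread between best ask and best bid."""
    return q.ask_price - q.bid_price


def order_imbalance(q: Quote) -> float:
    """Assumed definition: (bid_size - ask_size) / (bid_size + ask_size), in [-1, 1]."""
    return (q.bid_size - q.ask_size) / (q.bid_size + q.ask_size)


def lag(values: Sequence[float], k: int) -> List[Optional[float]]:
    """Shift a series back by k steps, padding with None, so row t sees only t-k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(values)
    k = min(k, n)
    return [None] * k + list(values[: n - k])


quote = Quote(bid_price=99.0, ask_price=101.0, bid_size=300.0, ask_size=100.0)
features = {
    "wap": wap(quote),                          # 100.5
    "spread": bid_ask_spread(quote),            # 2.0
    "order_imbalance": order_imbalance(quote),  # 0.5
}
```

Lags look only backward in time, which is what keeps a feature safe to use under walk-forward validation: no value from a later step reaches an earlier row.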

Model Lab

LightGBM with Optuna hyperparameter tuning and time-series cross-validation.

MAE-optimized
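A minimal sketch of walk-forward (expanding-window) splitting and the MAE metric is shown below. The fold layout, the `min_train` parameter, and the function names are assumptions for this example; the LightGBM training and Optuna tuning themselves are omitted, and model_pipeline.py may split the data differently.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    n_periods: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, valid) period indices; validation always follows training."""
    if n_folds < 1 or min_train < 1 or n_periods - min_train < n_folds:
        raise ValueError("not enough periods for the requested folds")
    fold_size = (n_periods - min_train) // n_folds
    for fold in range(n_folds):
        train_end = min_train + fold * fold_size
        # The last fold absorbs any remainder so every period is used.
        valid_end = train_end + fold_size if fold < n_folds - 1 else n_periods
        yield list(range(train_end)), list(range(train_end, valid_end))


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Ten periods (for example, trading days), three folds, four periods of initial training.
splits = list(walk_forward_splits(n_periods=10, n_folds=3, min_train=4))
```

Each fold trains on everything before its validation window and never on anything after it, which is the property that distinguishes walk-forward CV from shuffled k-fold. scikit-learn's `TimeSeriesSplit` offers a similar expanding-window scheme.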

Drift Radar

Population Stability Index (PSI), Kolmogorov-Smirnov (KS) test, and Jensen-Shannon (JS) divergence computed across feature distributions, with dashboard output.

Gates deployment
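To make the PSI gate concrete, here is a small sketch of PSI between a baseline sample and a current sample of one feature, followed by a threshold check. The equal-width binning on the baseline range, the `eps` floor for empty bins, and the function names are assumptions for this example; the 0.2 threshold is the value used in the sample queries. The KS test and JS divergence are not shown.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], n_bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI = sum((a_i - e_i) * ln(a_i / e_i)) over bins built from the baseline range."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # degenerate baseline: all values land in the first bin

    def proportions(sample: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # Floor each proportion at eps so empty bins do not produce log(0).
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def passes_drift_gate(psi_value: float, threshold: float = 0.2) -> bool:
    """The gate clears only when PSI is strictly below the threshold."""
    return psi_value < threshold


baseline = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in baseline]
stable_ok = passes_drift_gate(population_stability_index(baseline, baseline))
shifted_ok = passes_drift_gate(population_stability_index(baseline, shifted))
```

With identical samples the PSI is zero and the gate clears; shifting the sample by half its range empties the lower bins and pushes PSI well above 0.2, so the gate blocks promotion.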

Synthesis

Full pipeline run: load → engineer → train → monitor → gate. Single command.

Variation 6

Sample Queries

# Load competition data
load_trading_data(source="kaggle", split="train")

# Engineer V2 microstructure features
engineer_features(version=2, include_temporal=True)

# Run walk-forward CV with LightGBM
run_model_cv(model="lgbm", folds=5, metric="mae")

# Check feature drift before deployment
run_drift_monitor(method="psi", threshold=0.2)

# Full synthesis workflow
run_synthesis_workflow(profile="standard")