Trading at the Close · ML Pipeline + Drift Monitoring
Optiver is a deterministic ML pipeline domain built for the Kaggle Trading at the Close competition. It orchestrates data ingestion, feature engineering, model training, and drift monitoring through a structured tool registry.
Each stage is a narrow, verifiable tool. Load data, engineer features, train LightGBM with walk-forward CV, then gate deployment on drift thresholds.
Model artifacts and drift reports are written to disk, then explicitly verified before the deployment gate clears.
Variations progress from raw data loading through feature engineering, model CV, drift detection, and full synthesis.
Each tool owns one operation: load, engineer, train, or monitor. No tool crosses pipeline stages.
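The one-operation-per-tool rule can be pictured as a small registry that accepts exactly one callable per stage and runs them in a fixed order. This is a minimal sketch, not the project's actual registry: `ToolRegistry`, `STAGES`, and the dict-based state passed between tools are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# Hypothetical stage names, taken from the four operations listed above.
STAGES = ("load", "engineer", "train", "monitor")

Tool = Callable[[dict], dict]


class ToolRegistry:
    """Holds at most one tool per pipeline stage and runs them in order."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, stage: str, fn: Tool) -> None:
        # Reject unknown stages so a tool cannot invent its own stage.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        # Reject duplicates so each stage has a single owner.
        if stage in self._tools:
            raise ValueError(f"stage already has a tool: {stage}")
        self._tools[stage] = fn

    def run(self, state: dict) -> dict:
        # Fixed iteration order keeps the run deterministic.
        for stage in STAGES:
            state = self._tools[stage](state)
        return state


# Toy tools that only record which stage ran (placeholders, not real logic).
registry = ToolRegistry()
for name in STAGES:
    registry.register(
        name, lambda state, name=name: {**state, "trace": state["trace"] + [name]}
    )

result = registry.run({"trace": []})
```

Registering a second tool for the same stage raises an error in this sketch, which is one simple way to enforce that no tool crosses stages.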
Feature drift reports and model summaries use structured templates with stable fields and sourced values.
Smoke tests cover all 6 variations. Deployment gate requires PSI below threshold before promotion.
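A gate of this kind can be sketched as a pure function over per-feature PSI values. The names `GateDecision` and `evaluate_gate` are hypothetical, and the default threshold of 0.2 is only a common rule of thumb assumed here; the project's real threshold is not stated above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GateDecision:
    promote: bool
    blocking_features: List[str]


def evaluate_gate(psi_by_feature: Dict[str, float], threshold: float = 0.2) -> GateDecision:
    """Block promotion if any feature's PSI is at or above the threshold.

    The 0.2 default is an assumption for illustration, not the project's setting.
    """
    blocking = sorted(f for f, value in psi_by_feature.items() if value >= threshold)
    return GateDecision(promote=not blocking, blocking_features=blocking)


# Illustrative values only; these are not measured results.
decision = evaluate_gate({"wap": 0.05, "spread": 0.31})
```

Returning the list of blocking features alongside the boolean keeps the decision auditable, which fits the structured-report approach described above.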
| Stage | Module | Status |
|---|---|---|
| Data Ingestion | data_client.py | Online |
| Feature Engineering | feature_engineering.py | Online |
| Model CV | model_pipeline.py | Ready |
| Drift Monitoring | feature_drift.py | Watching |
| Deployment Gate | deployment_gate.py | Armed |
Fast access to price, imbalance, and temporal signals from the Kaggle dataset.
V2 microstructure features: WAP, bid-ask spread, order imbalance, temporal lags.
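The features named above can be illustrated with their standard textbook definitions. This is a sketch under assumptions: the exact formulas in `feature_engineering.py` are not shown here, the function names are hypothetical, and the argument names mirror what are assumed to be the dataset's `bid_price`, `ask_price`, `bid_size`, and `ask_size` columns.

```python
from typing import List, Optional, Sequence


def wap(bid_price: float, ask_price: float, bid_size: float, ask_size: float) -> float:
    """Weighted average price: each side's price is weighted by the opposite side's size.

    Assumes bid_size + ask_size > 0.
    """
    return (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)


def bid_ask_spread(bid_price: float, ask_price: float) -> float:
    """Absolute spread between best ask and best bid."""
    return ask_price - bid_price


def order_imbalance(bid_size: float, ask_size: float) -> float:
    """Signed size imbalance in [-1, 1]; positive means more size on the bid.

    Assumes bid_size + ask_size > 0.
    """
    return (bid_size - ask_size) / (bid_size + ask_size)


def lag(values: Sequence[float], k: int) -> List[Optional[float]]:
    """Shift a per-stock, time-ordered series back by k steps, padding with None."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(values)
    return [None] * min(k, n) + list(values[: max(n - k, 0)])


# Illustrative numbers only.
example_wap = wap(bid_price=0.999, ask_price=1.001, bid_size=100, ask_size=300)
example_lag = lag([1.0, 2.0, 3.0], 1)
```

Lags should be computed within a single stock and a single day, in time order, so that a lagged value never leaks information from a later bucket.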
LightGBM with Optuna hyperparameter tuning and time-series cross-validation.
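Time-series cross-validation can be sketched as an expanding-window split over trading days, where each fold validates on days strictly after its training days. This sketch is model-agnostic and omits the LightGBM and Optuna calls; `walk_forward_splits` is a hypothetical helper, and grouping by a `date_id` column is an assumption about the dataset layout.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    date_ids: Sequence[int], n_folds: int, val_days: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) row-index pairs with an expanding training window.

    The last n_folds * val_days distinct days are used for validation,
    one block of val_days per fold, in chronological order.
    """
    days = sorted(set(date_ids))
    needed = n_folds * val_days
    if len(days) <= needed:
        raise ValueError("not enough distinct days for the requested folds")
    first_val = len(days) - needed
    for fold in range(n_folds):
        start = first_val + fold * val_days
        train_days = set(days[:start])                    # everything before the block
        fold_val_days = set(days[start:start + val_days])  # the validation block itself
        train_idx = [i for i, d in enumerate(date_ids) if d in train_days]
        val_idx = [i for i, d in enumerate(date_ids) if d in fold_val_days]
        yield train_idx, val_idx


# Two rows per day for five days, split into two single-day validation folds.
folds = list(walk_forward_splits([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], n_folds=2, val_days=1))
```

Splitting on whole days rather than on rows keeps every row of a given day on the same side of the split, which avoids training on part of a day and validating on the rest.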
PSI, KS test, JS divergence across feature distributions. Dashboard output.
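Of the three metrics, PSI is the one the deployment gate is described as checking, so it is the one sketched here. The version below assumes equal-width bins derived from the reference sample and a small floor on empty bins; the binning scheme, bin count, and the name `psi` are assumptions, not details taken from `feature_drift.py`. The KS test and JS divergence are not shown.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the reference `expected`.

    Bins are equal-width over the range of `expected`; values of `actual`
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant reference

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data for illustration: a reference sample and a shifted copy.
reference = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in reference]
psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
```

Identical distributions give a PSI of zero, and the value grows as mass moves between bins, which is what makes it usable as a single number for a threshold check.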
Full pipeline run: load → engineer → train → monitor → gate. Single command.