Five Layers · Load → Engineer → Train → Monitor → Deploy
Each module is a narrow, verifiable layer, and no module spans more than one pipeline stage. Every stage writes its outputs to disk before downstream modules begin.
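The disk handoff between stages can be sketched as follows. This is a minimal illustration only, assuming JSON artifacts and two hypothetical helpers, `run_stage` and `load_artifact`; the real modules may persist other formats (for example Parquet) and use different names.

```python
import json
import tempfile
from pathlib import Path


def run_stage(name, func, inputs, out_dir):
    """Run one stage (hypothetical helper) and persist its output before returning.

    Downstream stages receive only the artifact path, so they cannot start
    on partially computed, in-memory results.
    """
    result = func(inputs)
    path = Path(out_dir) / f"{name}.json"
    # Write to a temporary file, then rename, so a crash never leaves a
    # half-written artifact that a later stage might pick up.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(result))
    tmp.replace(path)
    return path


def load_artifact(path):
    """Read a stage artifact back from disk (hypothetical helper)."""
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as out_dir:
    loaded = run_stage("load", lambda _: [1.0, 2.0, 3.0], None, out_dir)
    # The next stage gets the path, not the in-memory list.
    engineered = run_stage(
        "engineer", lambda p: [x * 2 for x in load_artifact(p)], loaded, out_dir
    )
    print(load_artifact(engineered))
```

Passing paths rather than objects is what makes each layer verifiable in isolation: any artifact can be inspected or replayed without rerunning upstream stages.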
Load: OptiverDataClient wraps the Kaggle dataset and provides fast access to price, imbalance, and temporal signals, sliced by stock, date, or feature subset.
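Slicing by stock, date, and feature subset can be illustrated with a small stand-in class. `MiniDataClient` and its `slice` method are hypothetical and are not the OptiverDataClient API; the sketch assumes rows are dicts keyed by the Kaggle column names (`stock_id`, `date_id`, `seconds_in_bucket`, and feature columns such as `wap`).

```python
from dataclasses import dataclass, field

# Key columns kept in every slice so slices can be rejoined later.
KEY_COLUMNS = ("stock_id", "date_id", "seconds_in_bucket")


@dataclass
class MiniDataClient:
    """Hypothetical stand-in for OptiverDataClient; not its real API."""

    rows: list = field(default_factory=list)

    def slice(self, stock_ids=None, date_ids=None, features=None):
        """Filter rows by stock and date, then keep only the requested columns."""
        stock_set = set(stock_ids) if stock_ids is not None else None
        date_set = set(date_ids) if date_ids is not None else None
        out = []
        for row in self.rows:
            if stock_set is not None and row["stock_id"] not in stock_set:
                continue
            if date_set is not None and row["date_id"] not in date_set:
                continue
            if features is None:
                out.append(dict(row))
            else:
                wanted = (*KEY_COLUMNS, *features)
                out.append({k: row[k] for k in wanted if k in row})
        return out
```

For example, `client.slice(stock_ids=[0], features=["wap"])` would return only stock 0's rows, each reduced to the key columns plus `wap`.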
Engineer: V1 baseline features plus V2 Numba-accelerated microstructure features. Together they cover WAP, bid-ask spread, triplet imbalance, temporal lags, and global stock statistics.
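The core microstructure formulas can be written in plain Python. The definitions below are the forms commonly used in public work on this dataset and are assumed here, not confirmed as the project's exact implementation. In V2 the same per-row arithmetic would be compiled with Numba; this version favors readability. The feature-name suffix `_triplet_imb` is hypothetical.

```python
import math
from itertools import combinations


def wap(bid_price, ask_price, bid_size, ask_size):
    """Weighted average price: each side's price is weighted by the opposite side's size."""
    return (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)


def spread(bid_price, ask_price):
    """Absolute bid-ask spread."""
    return ask_price - bid_price


def pair_imbalance(a, b):
    """Normalized difference; lies in [-1, 1] for non-negative inputs."""
    return (a - b) / (a + b) if (a + b) != 0 else math.nan


def triplet_imbalance(a, b, c):
    """(max - mid) / (mid - min); NaN when the middle value equals the minimum."""
    hi, lo = max(a, b, c), min(a, b, c)
    mid = a + b + c - hi - lo
    return math.nan if mid == lo else (hi - mid) / (mid - lo)


def triplet_features(row, cols):
    """Triplet imbalance for every 3-column combination of `cols`."""
    return {
        f"{a}_{b}_{c}_triplet_imb": triplet_imbalance(row[a], row[b], row[c])
        for a, b, c in combinations(cols, 3)
    }
```

Applied to price columns such as `ask_price`, `bid_price`, and `wap`, `triplet_features` yields one feature per column triple, which is why the count grows quickly and acceleration becomes worthwhile.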
Train: LightGBM with time-series cross-validation, walk-forward backtesting, and Optuna hyperparameter tuning, plus SHAP explainability and stacking ensembles.
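Time-series CV and walk-forward backtesting both depend on splits that never train on the future. The sketch below assumes an expanding training window over unique `date_id` values with an optional gap; the project's actual fold count, window type, and gap are not specified, so `walk_forward_splits` and its parameters are hypothetical. A LightGBM model would be fit on each training set and scored on the matching test dates.

```python
def walk_forward_splits(date_ids, n_folds, test_days, gap_days=0):
    """Yield (train_dates, test_dates) with an expanding training window.

    Folds cover the last n_folds * test_days unique dates. Each fold tests on
    the next `test_days` dates; `gap_days` dates between train and test are
    skipped to limit leakage from features that look back over recent days.
    """
    days = sorted(set(date_ids))
    span = n_folds * test_days
    if span + gap_days >= len(days):
        raise ValueError("not enough dates for the requested folds")
    first_test = len(days) - span
    for k in range(n_folds):
        test_start = first_test + k * test_days
        train_end = test_start - gap_days
        yield days[:train_end], days[test_start:test_start + test_days]


for train, test in walk_forward_splits(range(10), n_folds=3, test_days=2, gap_days=1):
    print(train, test)
```

An expanding window mirrors production, where each retrain sees all history to date; a rolling window (dropping the oldest dates) is a common alternative when older regimes are thought to be stale.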
Monitor: PSI, KS test, and JS divergence for feature drift, plus concept drift, covariate shift, and structural break detection. The resulting dashboard output gates deployment.
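The three feature-drift statistics can be computed from a reference sample (for example, training data) and a current sample. This is a minimal plain-Python sketch; the binning scheme (deciles of the reference sample), the `eps` floor for empty bins, and the natural-log base are assumptions, not the project's documented choices. In practice, `scipy.stats.ks_2samp` returns the same two-sample KS statistic together with a p-value.

```python
import bisect
import math


def _bin_edges(reference, n_bins):
    """Interior quantile edges taken from the reference sample."""
    ref = sorted(reference)
    return [ref[int(len(ref) * q / n_bins)] for q in range(1, n_bins)]


def _proportions(sample, edges, eps=1e-6):
    """Share of the sample in each bin, floored at eps so logs stay finite."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect.bisect_right(edges, x)] += 1
    return [max(c / len(sample), eps) for c in counts]


def _kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))


def psi(reference, current, n_bins=10):
    """Population Stability Index over reference-quantile bins."""
    edges = _bin_edges(reference, n_bins)
    p, q = _proportions(reference, edges), _proportions(current, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def js_divergence(reference, current, n_bins=10):
    """Jensen-Shannon divergence on the same bins (natural log, at most ln 2)."""
    edges = _bin_edges(reference, n_bins)
    p, q = _proportions(reference, edges), _proportions(current, edges)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    gap = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap
```

PSI and JS depend on the binning, while KS works on raw values; reporting all three gives some robustness to a poor choice of bins.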
Deploy: full pipeline orchestration (load → engineer → train → monitor → gate → report) from a single command, with profile-driven depth.
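The gate step can be illustrated as a threshold check over monitoring outputs. Everything here is hypothetical: `deployment_gate`, `GateResult`, and the thresholds are assumptions. The 0.10 and 0.25 PSI cut-offs follow a widely used rule of thumb (stable, moderate shift, significant shift), and the 20% missing-value limit is purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical thresholds; not values taken from this project.
PSI_WARN, PSI_BLOCK = 0.10, 0.25


@dataclass
class GateResult:
    passed: bool
    warnings: list
    blockers: list


def deployment_gate(psi_by_feature, missing_rate_by_feature, max_missing=0.2):
    """Turn drift and data-quality metrics into a pass/fail deployment decision."""
    warnings, blockers = [], []
    for name, value in sorted(psi_by_feature.items()):
        if value >= PSI_BLOCK:
            blockers.append(f"{name}: PSI {value:.3f} >= {PSI_BLOCK}")
        elif value >= PSI_WARN:
            warnings.append(f"{name}: PSI {value:.3f} >= {PSI_WARN}")
    for name, rate in sorted(missing_rate_by_feature.items()):
        if rate > max_missing:
            blockers.append(f"{name}: missing rate {rate:.1%} > {max_missing:.0%}")
    # Warnings are reported but do not block; any blocker fails the gate.
    return GateResult(passed=not blockers, warnings=warnings, blockers=blockers)
```

Separating warnings from blockers lets moderate drift reach the report without halting deployment, while severe drift or broken inputs stop the pipeline.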
| Module | File | Responsibility |
|---|---|---|
| Data Client | data_client.py | Dataset loading, slicing, and summary statistics |
| Feature Forge V1 | feature_engineering.py | Imbalance, price, and temporal baseline features |
| Feature Forge V2 | feature_engineering.py | Numba-accelerated microstructure features |
| Model Lab | model_pipeline.py | LightGBM CV, evaluation, and zero-sum adjustment |
| Optuna Tuner | optuna_tuner.py | TPE Bayesian hyperparameter search |
| Walk-Forward | walk_forward.py | Walk-forward backtesting that simulates production |
| Drift Radar | feature_drift.py | Feature drift monitoring with PSI, KS, and JS divergence |
| Concept Drift | target_drift.py | Drift in the target and in feature-target relationships |
| Data Quality | data_quality.py | Missing values, outlier frequency, and structural breaks |
| Alerts | monitoring_alerts.py | Alert routing from drift and quality signals |
| Synthesis | agentic_engine.py | End-to-end orchestration and deployment gate |
| Reports | report_templates.py | Structured analysis, model, and monitoring reports |
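The Model Lab's zero-sum adjustment is not specified further. Assuming it refers to the common post-processing for this dataset, where each stock's target is measured relative to a synthetic index, predictions at one time step are shifted so their (optionally weighted) mean is zero. `zero_sum_adjust` and its `weights` argument are hypothetical; with `weights=None` it subtracts the plain per-group mean.

```python
from collections import defaultdict


def zero_sum_adjust(preds, groups, weights=None):
    """Shift predictions so their weighted mean is zero within each group.

    groups:  one key per prediction, e.g. (date_id, seconds_in_bucket).
    weights: optional per-prediction weights (hypothetical; equal if None).
    """
    if weights is None:
        weights = [1.0] * len(preds)
    num, den = defaultdict(float), defaultdict(float)
    for p, g, w in zip(preds, groups, weights):
        num[g] += w * p
        den[g] += w
    return [p - num[g] / den[g] for p, g in zip(preds, groups)]


preds = [1.0, 3.0, -2.0, 4.0]
groups = [(0, 0), (0, 0), (0, 10), (0, 10)]
print(zero_sum_adjust(preds, groups))
```

Because the adjustment is a per-group shift, it preserves the ranking of stocks within each time step and only removes the common component that the relative target cannot contain.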
Explore mode: load, summary, target analysis, report. Fast turnaround for initial data review.
Model mode: features, LightGBM CV, drift check, basic report. Production-ready validation.
Full mode: parallel features, CV, walk-forward backtest, full monitoring suite, complete report.
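Profile-driven depth amounts to mapping each mode to an ordered list of stages. In this sketch the stage names mirror the three mode descriptions above, while `PROFILES`, `plan`, and `run` are hypothetical and not the project's actual orchestration API.

```python
# Hypothetical stage identifiers mirroring the Explore, Model, and Full modes.
PROFILES = {
    "explore": ["load", "summary", "target_analysis", "report"],
    "model": ["features", "cv", "drift_check", "report"],
    "full": ["features_parallel", "cv", "walk_forward", "monitoring_suite", "report"],
}


def plan(profile):
    """Return the ordered stage list for a profile, rejecting unknown names."""
    try:
        return list(PROFILES[profile])
    except KeyError:
        raise ValueError(
            f"unknown profile {profile!r}; choose from {sorted(PROFILES)}"
        ) from None


def run(profile, stages):
    """Execute the planned stages in order; `stages` maps stage name -> callable."""
    return [(name, stages[name]()) for name in plan(profile)]
```

Keeping the profile table as data means a single command can switch depth by name, and adding a mode requires no change to the orchestration code.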