Modules | Optiver

Module Overview

Each module is a narrow, verifiable layer. No module crosses pipeline stages. Outputs are written to disk before downstream modules begin.
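
The disk handoff between stages can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `write_then_verify` and `run_downstream`, the JSON format, and the SHA-256 checksum are all assumptions chosen for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_then_verify(payload: dict, path: Path) -> str:
    """Write payload as JSON, then re-read the file and compare checksums.

    Hypothetical helper: the real modules may use another format or check.
    """
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    expected = hashlib.sha256(data).hexdigest()
    path.write_bytes(data)
    # Verification step: hash what actually landed on disk.
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual != expected:
        raise IOError(f"verification failed for {path}")
    return expected


def run_downstream(path: Path) -> dict:
    """A downstream stage reads only what the upstream stage persisted."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "features.json"
    digest = write_then_verify({"rows": 3, "columns": ["wap", "spread"]}, out)
    loaded = run_downstream(out)
```

Because the downstream stage only ever reads the verified file, a failed write stops the pipeline at the stage that produced it.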

Data Client

OptiverDataClient wraps the Kaggle dataset. Fast access to price, imbalance, and temporal signals. Slice by stock, date, or feature subset.

5 tools · 480k rows · 200 stocks

Feature Forge

V1 baseline features and V2 Numba-accelerated microstructure features. WAP, bid-ask spread, triplet imbalance, temporal lags, global stock statistics.

16 tools · walk-forward safe
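
As a rough illustration of the features named in this card, the sketch below computes them for a single order-book snapshot in plain Python. The function and argument names are illustrative, not the module's API. The triplet imbalance form used here, (max - mid) / (mid - min), is one definition seen in public notebooks for this dataset and is an assumption about what Feature Forge computes.

```python
import math


def wap(bid_price, ask_price, bid_size, ask_size):
    """Weighted average price: each side's price weighted by the opposite size."""
    total = bid_size + ask_size
    if total == 0:
        return math.nan
    return (bid_price * ask_size + ask_price * bid_size) / total


def bid_ask_spread(bid_price, ask_price):
    """Absolute spread between best ask and best bid."""
    return ask_price - bid_price


def size_imbalance(bid_size, ask_size):
    """Signed size imbalance in [-1, 1]; positive means more bid size."""
    total = bid_size + ask_size
    if total == 0:
        return math.nan
    return (bid_size - ask_size) / total


def triplet_imbalance(a, b, c):
    """(max - mid) / (mid - min) over three values (assumed definition)."""
    lo, mid, hi = sorted((a, b, c))
    if mid == lo:
        return math.nan  # undefined when the two smallest values coincide
    return (hi - mid) / (mid - lo)


snapshot_wap = wap(bid_price=1.0, ask_price=1.2, bid_size=300, ask_size=100)
```

A vectorized or Numba-compiled version would apply the same arithmetic column-wise; the scalar form is shown only to make the formulas explicit.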

Model Lab

LightGBM with time-series CV, walk-forward backtesting, and Optuna hyperparameter tuning. SHAP explainability and stacking ensembles.

19 tools · MAE-optimized
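
Time-series CV with a purge gap can be sketched without any ML library. The helper below is hypothetical: it assumes each row carries an integer date identifier, uses expanding training windows, and drops `purge_gap` dates between the end of training and the start of validation. Model Lab's actual fold layout may differ.

```python
from typing import Iterator, List, Sequence, Tuple


def purged_time_series_splits(
    date_ids: Sequence[int], n_splits: int, purge_gap: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) row indices with a gap of purged dates."""
    unique_dates = sorted(set(date_ids))
    fold_size = len(unique_dates) // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough distinct dates for the requested splits")
    for k in range(1, n_splits + 1):
        val_start = k * fold_size
        # The last fold absorbs any leftover dates.
        val_end = val_start + fold_size if k < n_splits else len(unique_dates)
        train_cutoff = val_start - purge_gap
        if train_cutoff <= 0:
            continue  # the gap would consume the whole training window
        train_dates = set(unique_dates[:train_cutoff])
        val_dates = set(unique_dates[val_start:val_end])
        train_idx = [i for i, d in enumerate(date_ids) if d in train_dates]
        val_idx = [i for i, d in enumerate(date_ids) if d in val_dates]
        yield train_idx, val_idx


# Toy data: 12 dates with two rows per date.
date_ids = [d for d in range(12) for _ in range(2)]
splits = list(purged_time_series_splits(date_ids, n_splits=3, purge_gap=1))
```

Every training window ends at least `purge_gap` dates before its validation window starts, so lagged features built near the boundary cannot see validation-period rows.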

Drift Radar

PSI, KS test, JS divergence, concept drift, covariate shift, and structural break detection. Dashboard output gates deployment.

16 tools · deployment gate
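
For reference, PSI compares the binned distribution of a feature in a reference sample against a newer sample. The sketch below is a minimal version under stated assumptions: equal-width bins over the reference range, out-of-range values clipped into the edge bins, and a small `eps` floor to avoid taking the log of zero. The bin count and function names are illustrative, not Drift Radar's settings.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum((a - e) * ln(a / e)) over bins."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # degenerate reference sample: everything falls in bin 0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clip to edge bins
        return [max(c / len(values), eps) for c in counts]

    e_props = proportions(expected)
    a_props = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_props, a_props))


reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]
psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
```

Identical samples give a PSI of zero, while the shifted sample leaves half the reference bins empty and produces a value well above the usual drift thresholds.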

Synthesis

Full pipeline orchestration: load → engineer → train → monitor → gate → report. Single command, profile-driven depth.

5 tools · V6 capstone

Module File Map

Module	File	Responsibility
Data Client	`data_client.py`	Dataset loading, slicing, and summary statistics
Feature Forge V1	`feature_engineering.py`	Imbalance, price, and temporal baseline features
Feature Forge V2	`feature_engineering.py`	Numba-accelerated microstructure features
Model Lab	`model_pipeline.py`	LightGBM CV, evaluation, zero-sum adjustment
Optuna Tuner	`optuna_tuner.py`	TPE Bayesian hyperparameter search
Walk-Forward	`walk_forward.py`	Production-simulating backtesting
Drift Radar	`feature_drift.py`	PSI, KS, JS divergence feature monitoring
Concept Drift	`target_drift.py`	Target and feature-target relationship drift
Data Quality	`data_quality.py`	Missing values, outlier frequency, structural breaks
Alerts	`monitoring_alerts.py`	Alert routing from drift and quality signals
Synthesis	`agentic_engine.py`	End-to-end orchestration and deployment gate
Reports	`report_templates.py`	Structured analysis, model, and monitoring reports

Pipeline Execution Rules

Data must be loaded before any feature calculation or analysis.
V2 features require V1 features to be computed first.
Model CV requires a time-series purge gap to prevent leakage.
Export and verification pairs implement Write-Then-Verify.
Zero-sum adjustment is applied after predictions for market neutrality.
PSI < 0.1 = stable · PSI > 0.25 = significant drift, which blocks deployment.
Deployment decisions: deploy, defer, reject, or retrain.
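
The zero-sum rule above can be illustrated with a small sketch. It assumes the adjustment is applied across all stocks within one time bucket and that optional per-stock weights (for example, volumes) decide how the correction is shared. With equal weights it reduces to subtracting the cross-sectional mean. The function name and the weighting scheme are assumptions, not the documented behavior of `model_pipeline.py`.

```python
from typing import List, Optional, Sequence


def zero_sum_adjust(predictions: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> List[float]:
    """Shift predictions so they sum to zero across one time bucket.

    adjusted_i = p_i - w_i * sum(p) / sum(w)
    """
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights: mean subtraction
    if len(weights) != len(predictions):
        raise ValueError("weights and predictions must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    scale = sum(predictions) / total_weight
    return [p - w * scale for p, w in zip(predictions, weights)]


equal = zero_sum_adjust([1.0, 2.0, 6.0])
weighted = zero_sum_adjust([1.0, 2.0, 6.0], weights=[1.0, 1.0, 2.0])
```

Both variants return predictions that sum to zero; they differ only in which stocks absorb more of the correction.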

Synthesis Profiles

Quick

Explore mode: load, summary, target analysis, report. Fast turnaround for initial data review.

~2 min

Standard

Model mode: features, LightGBM CV, drift check, basic report. Production-ready validation.

~8 min

Comprehensive

Full mode: parallel features, CV, walk-forward backtest, full monitoring suite, complete report.

~20 min
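
One way to picture profile-driven depth is a mapping from profile name to an ordered list of stages. The sketch below is hypothetical: the stage identifiers, the `PROFILES` table, and the `plan` and `run` helpers are invented for illustration. A leading `load` stage is assumed for every profile, following the rule that data must be loaded before any feature calculation or analysis.

```python
from typing import Callable, Dict, List

# Hypothetical stage names derived from the profile descriptions above.
PROFILES: Dict[str, List[str]] = {
    "quick": ["load", "summary", "target_analysis", "report"],
    "standard": ["load", "features", "cv", "drift_check", "report"],
    "comprehensive": [
        "load", "features", "cv", "walk_forward", "monitoring", "report",
    ],
}


def plan(profile: str) -> List[str]:
    """Return the ordered stages for a profile, validating the load rule."""
    try:
        stages = PROFILES[profile.lower()]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
    if stages[0] != "load":
        raise ValueError("data must be loaded before any other stage")
    return list(stages)


def run(profile: str, handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Execute each stage handler in order and return the completed stages."""
    completed = []
    for stage in plan(profile):
        handlers[stage]()
        completed.append(stage)
    return completed


log: List[str] = []
all_stages = {s for stages in PROFILES.values() for s in stages}
handlers = {name: (lambda n=name: log.append(n)) for name in all_stages}
done = run("Quick", handlers)
```

Keeping the profiles as data makes it easy to see that each deeper profile reuses the earlier stages and adds more validation on top.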