Variations | Optiver

Variation Overview

Each variation adds a complete new capability layer on top of the previous one. V6 is the full synthesis capstone: one command runs the entire pipeline.

V1

Core Data + Features

Data loading, baseline features, analysis

V2

Advanced Features

Numba-accelerated microstructure features

V3

Explainability + Ensembles

SHAP, Optuna, stacking, walk-forward

V4

Drift + Monitoring

Drift detection, data quality, alerting

V5

Parallel Execution

Parallelized CV, features, tuning

V6

Synthesis Capstone

Full pipeline, deployment gate, reports

V1 — Core Data + Features

Establishes the base system: CLI, tool registry, and data client. V1 baseline features compute imbalance, price, and temporal signals.

Data loading and slicing via OptiverDataClient
Stock and target analysis tools
V1 baseline feature pipeline: imbalance, price, temporal
Feature correlation and cross-stock comparison

# V1 baseline workflow
load_data() -> analyze_stock(stock_id=5) -> calculate_all_features()
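To make the three baseline signal families concrete, here is a minimal sketch of one imbalance, two price, and one temporal feature computed from a single order-book row. The column names (`bid_size`, `ask_size`, `bid_price`, `ask_price`, `seconds_in_bucket`), the function name `baseline_features`, and the `window_seconds` default are assumptions for illustration; the project's actual `calculate_all_features()` is not shown in this overview and may differ.

```python
from typing import Dict

Row = Dict[str, float]


def baseline_features(row: Row, window_seconds: float = 540.0) -> Row:
    """Compute illustrative imbalance, price, and temporal features for one row.

    All column names and the window length are assumptions, not the project's API.
    """
    bid_size, ask_size = row["bid_size"], row["ask_size"]
    bid_price, ask_price = row["bid_price"], row["ask_price"]
    total_size = bid_size + ask_size
    return {
        # Imbalance: signed share of resting size on the bid side, in [-1, 1].
        "size_imbalance": (bid_size - ask_size) / total_size if total_size else 0.0,
        # Price: mid price and quoted spread.
        "mid_price": (bid_price + ask_price) / 2,
        "spread": ask_price - bid_price,
        # Temporal: fraction of the (assumed) window that has elapsed.
        "time_fraction": row["seconds_in_bucket"] / window_seconds,
    }
```

A real pipeline would apply the same arithmetic column-wise over a DataFrame rather than row by row; the per-row form is used here only to keep the sketch dependency-free.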

V2 — Advanced Microstructure Features

Numba-accelerated V2 feature generation. Triplet and pairwise imbalance, microstructure signals, global stock statistics, and temporal window features. Walk-forward safe.
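The exact formulas behind the pairwise and triplet imbalance features are not given here. The sketch below assumes two formulations that are common for order-book data: a pairwise ratio `(a - b) / (a + b)` and a triplet ratio `(max - mid) / (mid - min)` over every combination of the chosen columns. The function names, output key suffixes, and plain-Python loops are illustrative stand-ins; the project's Numba-compiled versions may be defined differently.

```python
from itertools import combinations
from typing import Dict, Sequence


def pairwise_imbalance(row: Dict[str, float], columns: Sequence[str]) -> Dict[str, float]:
    """Assumed formulation: (a - b) / (a + b) for every pair of columns."""
    out = {}
    for a, b in combinations(columns, 2):
        denom = row[a] + row[b]
        out[f"{a}_{b}_imb"] = (row[a] - row[b]) / denom if denom != 0 else float("nan")
    return out


def triplet_imbalance(row: Dict[str, float], columns: Sequence[str]) -> Dict[str, float]:
    """Assumed formulation: (max - mid) / (mid - min) for every column triplet."""
    out = {}
    for a, b, c in combinations(columns, 3):
        low, mid, high = sorted((row[a], row[b], row[c]))
        denom = mid - low
        # A zero denominator (mid equals min) yields NaN rather than raising.
        out[f"{a}_{b}_{c}_imb2"] = (high - mid) / denom if denom != 0 else float("nan")
    return out
```

The nested loops over combinations are the kind of code that benefits from Numba compilation once they run over full arrays instead of a single row.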

calculate_triplet_imbalance and calculate_pairwise_imbalance
calculate_microstructure_features and calculate_global_stock_features
Temporal shift, return, and diff windows
Zero-sum adjustment and write-then-verify export

# V2 feature pipeline
load_data() -> generate_all_v2_features() -> export_features() -> verify_export()
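The write-then-verify step can be sketched as follows: write the feature rows to disk, then re-read the file and confirm that columns, row count, and values survived the round trip. The helpers `write_features_csv` and `verify_features_csv` are hypothetical stand-ins, not the project's `export_features()` and `verify_export()`, and CSV is an assumed format.

```python
import csv
import os
import tempfile
from typing import Dict, List


def write_features_csv(rows: List[Dict[str, float]], path: str) -> None:
    """Write a non-empty list of feature rows to a CSV file."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def verify_features_csv(rows: List[Dict[str, float]], path: str) -> bool:
    """Re-read the file and confirm columns, row count, and values match."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        read_back = list(reader)
        columns = reader.fieldnames
    if columns != list(rows[0].keys()) or len(read_back) != len(rows):
        return False
    return all(
        float(got[key]) == value
        for original, got in zip(rows, read_back)
        for key, value in original.items()
    )


if __name__ == "__main__":
    rows = [{"stock_id": 1.0, "imb": 0.25}, {"stock_id": 2.0, "imb": -0.5}]
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "features.csv")
        write_features_csv(rows, path)
        print(verify_features_csv(rows, path))
```

The value comparison uses exact float equality, so NaN features would fail verification and need separate handling in a real export.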

V3 — Explainability · Optuna · Ensembles

Adds SHAP explainability, Bayesian hyperparameter search, stacking ensembles, and walk-forward backtesting.

SHAP values require a trained model and are computed on a held-out test set, not on the training data
SHAP importance can differ from gain-based feature importance
Optuna uses TPE Bayesian optimization, ideally with CV evaluation
Stacking ensembles use out-of-fold predictions to prevent leakage
Walk-forward backtesting simulates production inference conditions
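The leakage point in the last two items can be illustrated with a small sketch: an expanding-window walk-forward splitter, plus a collector that stores each prediction only from a model fitted on earlier periods. The function names, the expanding-window scheme, and the mean-of-history baseline model are assumptions for illustration; the project's own backtesting and stacking tools are not shown here.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def walk_forward_splits(n_periods: int, min_train: int, test_size: int) -> Iterator[Split]:
    """Yield (train, test) index lists; training always ends before testing begins."""
    start = min_train
    while start + test_size <= n_periods:
        yield list(range(start)), list(range(start, start + test_size))
        start += test_size


def mean_baseline(train_targets: Sequence[float], n_predictions: int) -> List[float]:
    """Toy model: predict the mean of the training targets for every test row."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * n_predictions


def out_of_fold_predictions(
    targets: Sequence[float],
    splits: Iterator[Split],
    fit_predict: Callable[[Sequence[float], int], List[float]],
) -> List[Optional[float]]:
    """Collect predictions made only by models that never saw the predicted rows."""
    oof: List[Optional[float]] = [None] * len(targets)
    for train_idx, test_idx in splits:
        preds = fit_predict([targets[i] for i in train_idx], len(test_idx))
        for i, pred in zip(test_idx, preds):
            oof[i] = pred
    return oof
```

Rows in the initial training window never receive an out-of-fold prediction and stay `None`; a meta-learner in a stacking ensemble would be trained only on the rows that have one.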

V4 — Drift Detection + Data Quality

Full monitoring suite. PSI, KS test, JS divergence, concept drift, covariate shift, structural break detection, and regime classification.

Signal	Method	Threshold
Feature drift	PSI	< 0.1 stable · > 0.25 significant
Distribution shift	KS test	alpha = 0.05
Divergence	JS divergence	domain-specific
Covariate shift	Domain classifier	AUC > 0.6 indicates shift
Structural breaks	CUSUM / Chow	change-point detection
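As an illustration of the first row of the table, the Population Stability Index compares binned proportions of a reference sample and a current sample: PSI = Σ (a_i − e_i) · ln(a_i / e_i). The sketch below assumes equal-width bins taken from the reference sample and a small floor on empty bins; the bin count, the floor `eps`, and the "moderate" label for the middle band are assumptions, and the project's own drift tool may bin differently.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index using equal-width bins from the reference sample."""
    low, high = min(expected), max(expected)
    width = (high - low) / n_bins
    if width == 0:
        width = 1.0  # degenerate reference sample: all values fall in one bin

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for value in values:
            index = int((value - low) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(index, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(count / len(values), eps) for count in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def classify_psi(value: float) -> str:
    """Apply the table's thresholds; the middle label is an assumption."""
    if value < 0.1:
        return "stable"
    if value > 0.25:
        return "significant"
    return "moderate"
```

Quantile-based bins are a common alternative to equal-width bins and are less sensitive to outliers in the reference sample.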

V5 — Parallel Execution

ThreadPoolExecutor-backed parallel versions of the most compute-intensive operations. max_workers should not exceed available CPU cores.

Parallel Tools

generate_all_v2_features_parallel
train_lightgbm_cv_parallel
tune_hyperparameters_parallel
run_walk_forward_parallel
track_feature_drift_parallel
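The max_workers guidance can be sketched with the standard library alone: cap the requested worker count at the number of CPU cores before creating the pool. The helpers `capped_workers` and `parallel_map` are hypothetical names for illustration and are not part of the tool list above.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def capped_workers(requested: int) -> int:
    """Clamp the requested worker count to the range [1, available CPU cores]."""
    return max(1, min(requested, os.cpu_count() or 1))


def parallel_map(func: Callable[[T], R], items: Iterable[T], max_workers: int = 4) -> List[R]:
    """Run func over items in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=capped_workers(max_workers)) as pool:
        return list(pool.map(func, items))
```

Threads speed up CPU-bound work only when the underlying code releases the GIL, as native libraries and suitably compiled Numba functions can; pure-Python functions would see little benefit from this pattern.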

V6 — Synthesis Capstone

Full pipeline orchestration with profile-driven depth, deployment readiness gate, and structured reports.

# Synthesis profiles
run_synthesis_workflow(profile="quick")          # explore only
run_synthesis_workflow(profile="standard")       # features + model
run_synthesis_workflow(profile="comprehensive")  # full pipeline

# Gate deployment on drift + quality criteria
check_deployment_readiness()  # → deploy | defer | reject | retrain

# Structured reports
create_analysis_report()
create_model_report()
create_monitoring_report()

Five deployment criteria: data quality, feature drift, CV MAE, Sharpe ratio, and active alerts. All five must clear before deployment is approved.
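The all-five-must-clear rule can be sketched as a simple conjunction over named checks. The criterion keys, the `GateResult` type, and the `evaluate_gate` function are hypothetical; how `check_deployment_readiness()` chooses between defer, reject, and retrain when a criterion fails is not described here, so the sketch reports only approval and the failing criteria.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed keys for the five criteria named in the text.
CRITERIA: Tuple[str, ...] = (
    "data_quality",
    "feature_drift",
    "cv_mae",
    "sharpe_ratio",
    "active_alerts",
)


@dataclass
class GateResult:
    approved: bool
    failed: List[str]


def evaluate_gate(checks: Dict[str, bool]) -> GateResult:
    """Approve only when every criterion is present and has passed."""
    missing = [name for name in CRITERIA if name not in checks]
    if missing:
        raise KeyError(f"missing criteria: {missing}")
    failed = [name for name in CRITERIA if not checks[name]]
    return GateResult(approved=not failed, failed=failed)
```

Raising on a missing criterion, rather than treating it as a pass, keeps an incomplete evaluation from being mistaken for approval.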