Project Phoenix Domain

Optiver Variations

6 Configurations · Data Baseline → Full Synthesis

Variation Overview

Each variation adds a complete new capability layer on top of the previous one. V6 is the full synthesis capstone: one command runs the entire pipeline.

V1

Core Data + Features

Data loading, baseline features, analysis

V2

Advanced Features

Numba-accelerated microstructure features

V3

Explainability + Ensembles

SHAP, Optuna, stacking, walk-forward

V4

Drift + Monitoring

Drift detection, data quality, alerting

V5

Parallel Execution

Parallelized CV, features, tuning

V6

Synthesis Capstone

Full pipeline, deployment gate, reports

V1 — Core Data + Features

Establishes the base system: CLI, tool registry, and data client. V1 baseline features compute imbalance, price, and temporal signals.

# V1 baseline workflow
load_data() -> analyze_stock(stock_id=5) -> calculate_all_features()
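To make the three signal families concrete, here is a minimal sketch of what imbalance, price, and temporal features could look like for a single order-book snapshot. The field names (`bid_size`, `ask_size`, `bid_price`, `ask_price`, `seconds_in_bucket`), the helper `baseline_features`, and the 600-second window are assumptions made for illustration; this is not the project's `calculate_all_features()`.

```python
def baseline_features(row, window_seconds=600):
    """Illustrative imbalance, price, and temporal signals for one snapshot.

    `row` is a dict with hypothetical field names; `window_seconds` is an
    assumed auction-window length, not a value taken from the project.
    """
    bid_size, ask_size = row["bid_size"], row["ask_size"]
    bid_price, ask_price = row["bid_price"], row["ask_price"]
    total_size = bid_size + ask_size
    return {
        # Imbalance signal: signed share of resting size, in [-1, 1].
        "size_imbalance": (bid_size - ask_size) / total_size if total_size else 0.0,
        # Price signals: mid price and bid-ask spread.
        "mid_price": (bid_price + ask_price) / 2,
        "spread": ask_price - bid_price,
        # Temporal signal: fraction of the window that has elapsed.
        "time_fraction": row["seconds_in_bucket"] / window_seconds,
    }


snapshot = {
    "bid_size": 300.0,
    "ask_size": 100.0,
    "bid_price": 0.9995,
    "ask_price": 1.0005,
    "seconds_in_bucket": 300,
}
features = baseline_features(snapshot)
```

Each feature reads only from the snapshot it describes, so nothing here looks ahead in time.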

V2 — Advanced Microstructure Features

Numba-accelerated V2 feature generation. Triplet and pairwise imbalance, microstructure signals, global stock statistics, and temporal window features. Walk-forward safe.

# V2 feature pipeline
load_data() -> generate_all_v2_features() -> export_features() -> verify_export()
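As an illustration of the triplet imbalance idea, the sketch below uses one common definition, (max - mid) / (mid - min), applied to every combination of three columns. The column names and both helper functions are hypothetical, and the kernel falls back to plain Python when Numba is not installed; the project's `generate_all_v2_features()` may differ.

```python
from itertools import combinations

try:
    from numba import njit  # optional: compiles the kernel when Numba is installed
except ImportError:
    def njit(func):
        # Fallback decorator so the sketch still runs as plain Python.
        return func

NAN = float("nan")


@njit
def triplet_imbalance(a, b, c):
    """Return (max - mid) / (mid - min) for three values, or nan if mid == min."""
    hi = max(a, b, c)
    lo = min(a, b, c)
    # Exact median of three values without sorting.
    mid = max(min(a, b), min(max(a, b), c))
    if mid == lo:
        return NAN
    return (hi - mid) / (mid - lo)


def triplet_features(row, columns):
    """Compute the triplet imbalance for every 3-column combination (hypothetical helper)."""
    return {
        f"{x}_{y}_{z}_imb": triplet_imbalance(row[x], row[y], row[z])
        for x, y, z in combinations(columns, 3)
    }


# Hypothetical column names and toy values, chosen for easy arithmetic.
row = {"ask_price": 4.0, "bid_price": 1.0, "wap": 2.0, "reference_price": 3.0}
feats = triplet_features(row, ["ask_price", "bid_price", "wap", "reference_price"])
```

Four columns yield four triplets. Because each value depends only on the current row, a feature of this shape is compatible with walk-forward evaluation.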

V3 — Explainability · Optuna · Ensembles

Adds SHAP explainability, Bayesian hyperparameter search, stacking ensembles, and walk-forward backtesting.
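Walk-forward backtesting trains on a window of past periods and evaluates on the periods that immediately follow, then rolls the window forward. The generator below is a minimal sketch of that splitting logic with a rolling training window; `walk_forward_splits` and its parameters are hypothetical names, not the project's API.

```python
def walk_forward_splits(n_periods, n_train, n_test, step=None):
    """Yield (train_indices, test_indices) over time-ordered periods.

    Every test index is strictly later than every train index in the same
    split, so no future information reaches the model being evaluated.
    """
    step = step or n_test  # by default, advance by one test block
    start = 0
    while start + n_train + n_test <= n_periods:
        train = list(range(start, start + n_train))
        test = list(range(start + n_train, start + n_train + n_test))
        yield train, test
        start += step


splits = list(walk_forward_splits(n_periods=10, n_train=4, n_test=2))
for train, test in splits:
    print(f"train={train} test={test}")
```

An expanding-window variant would keep the training start fixed at 0 and grow the window instead of rolling it.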

V4 — Drift Detection + Data Quality

Full monitoring suite. PSI, KS test, JS divergence, concept drift, covariate shift, structural break detection, and regime classification.

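| Signal             | Method            | Threshold                           |
|--------------------|-------------------|-------------------------------------|
| Feature drift      | PSI               | < 0.1 stable · > 0.25 significant   |
| Distribution shift | KS test           | alpha = 0.05                        |
| Divergence         | JS divergence     | domain-specific                     |
| Covariate shift    | Domain classifier | AUC > 0.6 indicates shift           |
| Structural breaks  | CUSUM / Chow      | change-point detection              |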
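The PSI row can be illustrated with a small standard-library sketch. It bins a baseline sample into equal-width bins, compares bin shares against a current sample, and applies the thresholds from the table. The function names, the equal-width binning, the `eps` floor, and the "moderate" label for values between 0.1 and 0.25 are assumptions for illustration, not the project's implementation.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a current sample (equal-width bins)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def classify_psi(psi):
    """Apply the table's thresholds; the 'moderate' label is an assumption."""
    if psi < 0.1:
        return "stable"
    if psi > 0.25:
        return "significant"
    return "moderate"


baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]  # mass moved into the upper half

same_label = classify_psi(population_stability_index(baseline, baseline))
shifted_label = classify_psi(population_stability_index(baseline, shifted))
```

Quantile bins taken from the baseline are a common alternative to equal-width bins when a feature is heavily skewed.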

V5 — Parallel Execution

ThreadPoolExecutor-backed parallel versions of the most compute-intensive operations. max_workers should not exceed the number of available CPU cores.

Parallel Tools

generate_all_v2_features_parallel
train_lightgbm_cv_parallel
tune_hyperparameters_parallel
run_walk_forward_parallel
track_feature_drift_parallel
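The `max_workers` guidance can be enforced in code rather than left to the caller. The sketch below caps the requested worker count at `os.cpu_count()` before creating the pool. `capped_workers`, `run_parallel`, and `fold_score` are hypothetical names standing in for the parallel tools listed above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def capped_workers(requested):
    """Clamp the requested worker count to the range [1, available CPU cores]."""
    available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(requested, available))


def run_parallel(func, items, max_workers=4):
    """Apply func to each item on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=capped_workers(max_workers)) as pool:
        return list(pool.map(func, items))


def fold_score(fold_id):
    # Placeholder for a compute-heavy task such as scoring one CV fold.
    return fold_id * fold_id


scores = run_parallel(fold_score, range(5), max_workers=64)
```

Threads only speed up CPU-bound work when the underlying code releases Python's global interpreter lock, as natively compiled routines can. Pure-Python workloads would need a process pool instead.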

V6 — Synthesis Capstone

Full pipeline orchestration with profile-driven depth, deployment readiness gate, and structured reports.

# Synthesis profiles
run_synthesis_workflow(profile="quick")          # explore only
run_synthesis_workflow(profile="standard")       # features + model
run_synthesis_workflow(profile="comprehensive")  # full pipeline

# Gate deployment on drift + quality criteria
check_deployment_readiness()  # → deploy | defer | reject | retrain

# Structured reports
create_analysis_report()
create_model_report()
create_monitoring_report()

Five deployment criteria: data quality, feature drift, CV MAE, Sharpe ratio, and active alerts. All five must clear before deployment is approved.
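The all-five-must-clear rule can be sketched as a simple conjunction of checks. Everything named here (`GateInputs`, `evaluate_gate`, the metric fields, and every threshold value) is hypothetical and chosen only for illustration. The sketch also does not attempt to reproduce how `check_deployment_readiness()` chooses between defer, reject, and retrain.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    """Hypothetical summary metrics, one per deployment criterion."""
    data_quality_score: float
    max_feature_psi: float
    cv_mae: float
    sharpe_ratio: float
    active_alerts: int


def evaluate_gate(m, min_quality=0.95, max_psi=0.25, max_mae=6.0, min_sharpe=1.0):
    """Return (approved, failed_criteria); thresholds are illustrative only."""
    checks = {
        "data_quality": m.data_quality_score >= min_quality,
        "feature_drift": m.max_feature_psi <= max_psi,
        "cv_mae": m.cv_mae <= max_mae,
        "sharpe_ratio": m.sharpe_ratio >= min_sharpe,
        "active_alerts": m.active_alerts == 0,
    }
    failed = [name for name, ok in checks.items() if not ok]
    # Deployment is approved only when every criterion clears.
    return len(failed) == 0, failed


healthy = GateInputs(0.99, 0.05, 5.2, 1.4, 0)
drifting = GateInputs(0.99, 0.40, 5.2, 1.4, 2)

healthy_result = evaluate_gate(healthy)
drifting_result = evaluate_gate(drifting)
```

Returning the list of failed criteria alongside the verdict makes a rejected deployment easier to diagnose than a bare boolean would.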