Project Phoenix Domain

Optiver Tools

84 Tools · Data · Features · Modeling · Monitoring · Synthesis

Registry Overview

Total Tools: 84
Data Tools: 5
Feature Tools: 16
Model + ML Tools: 19
Drift + Monitor Tools: 16

Each tool owns one operation. No tool crosses pipeline stages. All outputs are verifiable before the next stage begins.

Data Tools

Tool                Purpose
load_data           Load the Kaggle "Optiver - Trading at the Close" dataset from disk.
get_stock_info      Return metadata for a specific stock ID (0–199).
get_date_range      Return the range of date IDs present in the dataset.
get_data_summary    Return a statistical summary of rows, columns, and the target distribution.
filter_data         Slice the dataset by stock ID, date range, or feature subset.
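A minimal sketch of the slicing `filter_data` describes. It assumes the dataset is held as a list of row dicts keyed by the Kaggle column names (`stock_id`, `date_id`, feature columns). The keyword arguments are hypothetical, and the real tool may operate on a pandas DataFrame instead.

```python
from typing import Iterable, Optional, Sequence, Tuple


def filter_data(
    rows: Iterable[dict],
    stock_ids: Optional[Sequence[int]] = None,
    date_range: Optional[Tuple[int, int]] = None,
    features: Optional[Sequence[str]] = None,
) -> list:
    """Hypothetical filter: keep rows for the given stocks and inclusive date range,
    then project onto the requested feature columns."""
    wanted = set(stock_ids) if stock_ids is not None else None
    out = []
    for row in rows:
        if wanted is not None and row["stock_id"] not in wanted:
            continue
        if date_range is not None and not date_range[0] <= row["date_id"] <= date_range[1]:
            continue
        if features is not None:
            # Always keep the key columns so slices can be re-joined later.
            keep = ("stock_id", "date_id", *features)
            row = {k: row[k] for k in keep if k in row}
        out.append(row)
    return out
```

Date bounds are treated as inclusive here. That is an assumption, not documented behavior.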

Feature Tools — V1 Baseline

Imbalance Features

Order-book imbalance signals derived from bid/ask sizes and prices.

calculate_imbalance_features
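To make "imbalance signal" concrete, here is a sketch of three ratios such a tool might compute. It assumes the Kaggle column names `bid_size`, `ask_size`, `imbalance_size`, `matched_size`, and `imbalance_buy_sell_flag`. The feature names and the exact feature set are hypothetical, because the registry does not list what `calculate_imbalance_features` returns.

```python
import math


def imbalance_features(row: dict) -> dict:
    """Illustrative imbalance ratios for one row (assumed feature set)."""
    bid, ask = row["bid_size"], row["ask_size"]
    total = bid + ask
    matched = row["matched_size"]
    return {
        # Book-size imbalance in [-1, 1]: positive when bid size outweighs ask size.
        "book_imbalance": (bid - ask) / total if total else math.nan,
        # Auction imbalance relative to the volume already matched.
        "imbalance_ratio": row["imbalance_size"] / matched if matched else math.nan,
        # Auction imbalance signed by its side (+1 buy, -1 sell, 0 none).
        "signed_imbalance": row["imbalance_size"] * row["imbalance_buy_sell_flag"],
    }
```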

Price Features

Features derived from the weighted average price (WAP), the reference price, and the bid-ask spread.

calculate_price_features
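A sketch of the price derivations. It uses the standard size-weighted WAP formula, where each side's price is weighted by the opposite side's size. This is the same formula the Kaggle data description gives for the `wap` column. The relative-spread and WAP-versus-reference features are illustrative assumptions, not the tool's documented output.

```python
def price_features(bid_price: float, ask_price: float, bid_size: float,
                   ask_size: float, reference_price: float) -> dict:
    """Illustrative price features (assumed set, not the tool's exact output)."""
    # Weight each price by the opposite side's size: heavy bids pull WAP toward the ask.
    wap = (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)
    mid = (bid_price + ask_price) / 2
    return {
        "wap": wap,
        "spread": ask_price - bid_price,
        "relative_spread": (ask_price - bid_price) / mid,
        "wap_vs_reference": wap / reference_price - 1,
    }
```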

Temporal Features

Seconds-in-bucket and auction-proximity signals.

calculate_temporal_features
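A sketch of auction-proximity signals. It assumes `seconds_in_bucket` counts seconds since the start of the 10-minute closing-auction window that the Kaggle data covers. The 600-second constant, the feature names, and the 300-second phase cut are assumptions for illustration.

```python
AUCTION_LENGTH_S = 600  # assumption: 10-minute closing-auction window


def temporal_features(seconds_in_bucket: int) -> dict:
    """Illustrative auction-proximity signals (assumed definitions)."""
    return {
        "seconds_in_bucket": seconds_in_bucket,
        "seconds_to_close": AUCTION_LENGTH_S - seconds_in_bucket,
        # Fraction of the auction window already elapsed.
        "auction_progress": seconds_in_bucket / AUCTION_LENGTH_S,
        # Hypothetical phase flag: 1 in the final five minutes, else 0.
        "late_phase": int(seconds_in_bucket >= 300),
    }
```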

Composite + Info

A combined pipeline that computes every V1 feature, plus a feature-registry lookup.

calculate_all_features · get_feature_info

Feature Tools — V2 Microstructure

Numba-accelerated V2 features for walk-forward-safe pipelines.

Triplet + Pairwise Imbalance

calculate_triplet_imbalance · calculate_pairwise_imbalance
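A pure-Python sketch of the two imbalance operators. It assumes the formulation popularized in public notebooks for this competition: pairwise `(a - b) / (a + b)` and triplet `(max - mid) / (mid - min)`. Whether the V2 tools use exactly these definitions is an assumption. The Numba acceleration is omitted so the logic stays readable, and `all_pairwise` is a hypothetical helper.

```python
import itertools
import math


def pairwise_imbalance(a: float, b: float) -> float:
    """Signed imbalance between two prices or sizes: (a - b) / (a + b)."""
    s = a + b
    return (a - b) / s if s else math.nan


def triplet_imbalance(a: float, b: float, c: float) -> float:
    """(max - mid) / (mid - min): where the middle value sits within the triplet."""
    hi, lo = max(a, b, c), min(a, b, c)
    mid = a + b + c - hi - lo
    return (hi - mid) / (mid - lo) if mid != lo else math.nan


def all_pairwise(values: dict) -> dict:
    """Pairwise imbalance for every pair of named columns (hypothetical helper)."""
    return {f"{x}_{y}_imb": pairwise_imbalance(values[x], values[y])
            for x, y in itertools.combinations(values, 2)}
```

Typical inputs would be price columns such as `ask_price`, `bid_price`, and `wap`, or size columns such as `matched_size`, `bid_size`, and `ask_size`.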

Microstructure + Global

calculate_microstructure_features · calculate_global_stock_features

Temporal Windows

calculate_temporal_shift · calculate_temporal_return · calculate_temporal_diff · generate_all_v2_features
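One way to keep temporal windows walk-forward-safe is to allow only positive lags, so every value at snapshot `i` depends only on snapshots `<= i`. The sketch below assumes each input list is one stock's series within a single `date_id`, ordered by `seconds_in_bucket`. The function names echo the registry, but the signatures are hypothetical.

```python
from typing import List, Optional, Sequence


def _check_lag(k: int) -> None:
    # Only positive lags: the output at step i never uses data after step i.
    if k < 1:
        raise ValueError("lag must be >= 1 to avoid look-ahead")


def temporal_shift(xs: Sequence[float], k: int = 1) -> List[Optional[float]]:
    """Value observed k snapshots earlier; None where no history exists yet."""
    _check_lag(k)
    return [xs[i - k] if i >= k else None for i in range(len(xs))]


def temporal_diff(xs: Sequence[float], k: int = 1) -> List[Optional[float]]:
    """x[i] - x[i-k]."""
    _check_lag(k)
    return [xs[i] - xs[i - k] if i >= k else None for i in range(len(xs))]


def temporal_return(xs: Sequence[float], k: int = 1) -> List[Optional[float]]:
    """x[i] / x[i-k] - 1; None when the lagged value is missing or zero."""
    _check_lag(k)
    return [xs[i] / xs[i - k] - 1 if i >= k and xs[i - k] else None
            for i in range(len(xs))]
```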

Exports + Verification

apply_zero_sum_adjustment · export_features · verify_export
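A sketch of zero-sum adjustment. It assumes the adjustment subtracts the cross-sectional (optionally weighted) mean of predictions within each timestamp group, so each group's weighted sum becomes zero. The group key, for example `(date_id, seconds_in_bucket)`, and the equal default weights are assumptions. The market-neutral `apply_zero_sum_prediction` step in the model tools would follow the same idea.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def zero_sum_adjust(preds: Sequence[float], groups: Sequence[Hashable],
                    weights: Optional[Sequence[float]] = None) -> List[float]:
    """Subtract each group's weighted mean so the weighted sum per group is zero."""
    if weights is None:
        weights = [1.0] * len(preds)  # assumption: equal weights by default
    num: dict = defaultdict(float)
    den: dict = defaultdict(float)
    for p, g, w in zip(preds, groups, weights):
        num[g] += w * p
        den[g] += w
    return [p - num[g] / den[g] for p, g in zip(preds, groups)]
```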

Analysis Tools

Tool                        Purpose
analyze_stock               Distribution, trend, and target summary for a stock ID.
analyze_target              Target variable statistics and skew analysis.
calculate_correlation       Feature-to-target and feature-to-feature correlation matrices.
compare_stocks              Side-by-side stock behavior comparison.
analyze_temporal_patterns   Intraday and cross-day temporal signal patterns.
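A stdlib sketch of two of the analysis computations. The first is Pearson feature-to-target correlation ranked by magnitude, the kind of output `calculate_correlation` describes. The second is the skewness statistic `analyze_target` mentions. Choosing Pearson over a rank correlation, and the population (biased) skewness estimator, are both assumptions.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def feature_target_correlation(columns, target):
    """Correlation of each feature column with the target, strongest |r| first."""
    corr = {name: pearson(col, target) for name, col in columns.items()}
    return dict(sorted(corr.items(), key=lambda kv: abs(kv[1]), reverse=True))


def target_skew(values):
    """Population skewness: the mean of cubed z-scores."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)
```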

Model Tools

Tool                        Purpose
prepare_train_test          Time-series-aware train/test split with a purge gap.
train_baseline              Baseline LightGBM fit with default parameters.
evaluate_model              MAE, RMSE, and zero-sum-adjusted evaluation.
get_feature_importance      Gain-based feature importance from a trained model.
create_cv_splits            Walk-forward cross-validation split generator.
train_lightgbm_fold         Single-fold LightGBM training with early stopping.
train_lightgbm_cv           Full walk-forward CV with out-of-fold (OOF) predictions.
predict_ensemble            Ensemble prediction from the CV fold models.
apply_zero_sum_prediction   Market-neutral normalization of predictions.
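A sketch of walk-forward splitting with a purge gap, as `create_cv_splits` and `prepare_train_test` describe. It assumes splits are made over unique `date_id` values with an expanding training window. The parameter names and defaults are hypothetical.

```python
from typing import Iterable, List, Tuple


def create_cv_splits(date_ids: Iterable[int], n_splits: int = 3,
                     purge_gap: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Expanding-window walk-forward splits over unique date IDs.

    Validation blocks are consecutive and non-overlapping. The `purge_gap` days
    immediately before each block are excluded from that fold's training set.
    """
    days = sorted(set(date_ids))
    block = len(days) // (n_splits + 1)
    if block == 0:
        raise ValueError("not enough distinct days for the requested number of splits")
    splits = []
    for k in range(1, n_splits + 1):
        val_start = k * block
        # The last fold absorbs any remainder days.
        val_end = len(days) if k == n_splits else val_start + block
        train_end = val_start - purge_gap
        if train_end <= 0:
            raise ValueError(f"purge_gap leaves no training days for fold {k}")
        splits.append((days[:train_end], days[val_start:val_end]))
    return splits
```

Dropping the days just before each validation block means features computed over windows that straddle the boundary cannot carry validation-period information into training.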

Advanced ML — SHAP · Optuna · Ensembles

Explainability

compute_shap_values · get_shap_importance · explain_single_prediction
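SHAP values for the project's LightGBM models would typically come from a tree explainer in the `shap` library. As a dependency-free illustration of what the numbers mean, this sketch uses the closed form for a linear model with independent features, `phi_i = w_i * (x_i - E[x_i])`. The linear model stands in for the real one, and `linear_shap` is a hypothetical name. The property it demonstrates holds for any SHAP explanation: the base value plus the sum of SHAP values equals the prediction.

```python
from typing import List, Sequence, Tuple


def linear_shap(weights: Sequence[float], bias: float,
                background: Sequence[Sequence[float]],
                x: Sequence[float]) -> Tuple[float, List[float]]:
    """Exact SHAP values for f(x) = bias + sum(w_i * x_i), assuming independent features."""
    n = len(background)
    # Feature means over the background sample play the role of E[x_i].
    means = [sum(row[i] for row in background) / n for i in range(len(weights))]
    base_value = bias + sum(w * m for w, m in zip(weights, means))
    phis = [w * (xi - m) for w, xi, m in zip(weights, x, means)]
    return base_value, phis
```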

Tuning

tune_hyperparameters · get_best_params · visualize_optimization

Ensembles + Walk-Forward

create_stacking_ensemble · train_stacking_meta · run_walk_forward · analyze_regime

Drift + Quality + Alerts

Thresholds: PSI < 0.1 = stable · PSI > 0.25 = significant drift · KS test significance level α = 0.05
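A sketch of PSI as commonly defined: `sum((a - e) * ln(a / e))` over bins, with bin edges taken from quantiles of the reference sample. The bin count and the `eps` floor for empty bins are assumptions. The result is read against the thresholds above.

```python
import bisect
import math


def calculate_psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    ref = sorted(reference)
    # Interior cut points at reference quantiles; set() drops duplicates from ties.
    cuts = sorted({ref[len(ref) * k // n_bins] for k in range(1, n_bins)})

    def shares(sample):
        counts = [0] * (len(cuts) + 1)
        for v in sample:
            counts[bisect.bisect_right(cuts, v)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(reference), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```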

Statistical Drift

detect_ks_drift · calculate_psi · calculate_js_divergence · track_feature_drift · detect_multivariate_drift
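A sketch of a two-sample KS drift check at the documented `alpha = 0.05`. Instead of an exact p-value, it uses the large-sample critical value `c(alpha) * sqrt((n + m) / (n * m))` with `c(alpha) = sqrt(-ln(alpha / 2) / 2)`. In practice `scipy.stats.ks_2samp` would return the statistic and p-value directly. The function name echoes the registry, but its return format here is hypothetical.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The gap only changes at observed values, so checking those is sufficient.
    for v in sa + sb:
        fa = bisect.bisect_right(sa, v) / len(sa)
        fb = bisect.bisect_right(sb, v) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def detect_ks_drift(reference, current, alpha=0.05):
    """Flag drift when D exceeds the asymptotic critical value at level alpha."""
    n, m = len(reference), len(current)
    d = ks_statistic(reference, current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return {"statistic": d, "critical_value": critical, "drift": d > critical}
```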

Concept + Covariate

detect_target_drift · detect_concept_drift · detect_covariate_shift · analyze_prediction_drift

Quality + Structural

track_missing_values · track_outlier_frequency · detect_structural_breaks_cusum · detect_structural_breaks_chow · fit_regime_hmm
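A sketch of single-break CUSUM detection. It accumulates deviations from the overall mean, and the cumulative sum peaks in magnitude where the mean shifts. The return format and the standardized strength score are assumptions, because the registry does not specify what `detect_structural_breaks_cusum` reports or whether it handles multiple breaks.

```python
import math
import statistics


def cusum_break(series):
    """Estimate the location of a single mean shift with a CUSUM of deviations.

    Returns (index, strength). `index` is the position of the first observation
    after the estimated break (0 if no deviation accumulates). `strength` is
    max |S_k| / (sigma * sqrt(n)), a scale-free size of the shift.
    """
    n = len(series)
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    s, best, best_k = 0.0, 0.0, 0
    for k, x in enumerate(series, start=1):
        s += x - mu  # S_k = sum of (x_i - mean) over the first k observations
        if abs(s) > best:
            best, best_k = abs(s), k
    strength = best / (sigma * math.sqrt(n)) if sigma > 0 else 0.0
    return best_k, strength
```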

Alerts + Dashboard

generate_drift_alerts · get_monitoring_dashboard_data
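A sketch of alert generation. It assumes each feature arrives with a PSI value and a KS p-value, and that the thresholds at the top of this section determine severity. The input schema, the `DriftAlert` record, and the two severity levels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DriftAlert:
    feature: str
    severity: str
    reason: str


def generate_drift_alerts(metrics: Dict[str, dict], psi_significant: float = 0.25,
                          psi_stable: float = 0.1, ks_alpha: float = 0.05) -> List[DriftAlert]:
    """Turn per-feature drift metrics into alerts, most severe first."""
    alerts = []
    for feature, m in metrics.items():
        psi, p = m["psi"], m["ks_pvalue"]
        if psi > psi_significant:
            alerts.append(DriftAlert(feature, "critical", f"PSI {psi:.3f} > {psi_significant}"))
        elif psi >= psi_stable or p < ks_alpha:
            # Not stable by PSI, or the KS test rejects at the configured alpha.
            alerts.append(DriftAlert(feature, "warning", f"PSI {psi:.3f}, KS p={p:.3g}"))
    order = {"critical": 0, "warning": 1}
    return sorted(alerts, key=lambda a: (order[a.severity], a.feature))
```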

Parallel Execution + Synthesis

Parallel Tools

ThreadPoolExecutor-backed variants of feature generation, CV training, hyperparameter tuning, walk-forward validation, and drift monitoring.

generate_all_v2_features_parallel · train_lightgbm_cv_parallel · tune_hyperparameters_parallel · run_walk_forward_parallel · track_feature_drift_parallel
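The parallel variants presumably fan independent units of work (folds, trials, features) out to a `ThreadPoolExecutor`. Below is a minimal sketch of that pattern with a hypothetical helper name. `Executor.map` returns results in input order, so fold outputs line up with their inputs regardless of which finishes first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_folds_parallel(fold_fn: Callable[[T], R], folds: Iterable[T],
                       max_workers: int = 4) -> List[R]:
    """Run `fold_fn` on every fold concurrently; results come back in fold order.

    An exception raised in any fold is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fold_fn, folds))
```

Threads pay off mainly when the per-unit work releases the GIL, as LightGBM's native training calls generally do and as Numba functions compiled with `nogil=True` can. When folds run concurrently, capping LightGBM's `num_threads` per fold helps avoid oversubscribing cores.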

Synthesis Capstone

End-to-end orchestration with a deployment-readiness gate and structured reports.

run_synthesis_workflow · check_deployment_readiness · create_analysis_report · create_model_report · create_monitoring_report
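A sketch of a deployment gate in the spirit of `check_deployment_readiness`. The three criteria are: CV MAE beats the baseline, no feature exceeds the significant-drift PSI cutoff, and export verification passed. These criteria and the signature are hypothetical, because the registry does not list the actual gate conditions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GateResult:
    ready: bool
    failures: List[str] = field(default_factory=list)


def check_deployment_readiness(cv_mae: float, baseline_mae: float, max_psi: float,
                               export_verified: bool,
                               max_psi_allowed: float = 0.25) -> GateResult:
    """Hypothetical gate: every check must pass; each failure is reported by name."""
    failures = []
    if not cv_mae < baseline_mae:
        failures.append(f"CV MAE {cv_mae:.4f} does not beat baseline {baseline_mae:.4f}")
    if max_psi > max_psi_allowed:
        failures.append(f"max feature PSI {max_psi:.3f} exceeds {max_psi_allowed}")
    if not export_verified:
        failures.append("feature export failed verification")
    return GateResult(ready=not failures, failures=failures)
```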