84 Tools · Data · Features · Modeling · Monitoring · Synthesis
Each tool owns one operation. No tool crosses pipeline stages. All outputs are verifiable before the next stage begins.
| Tool | Purpose |
|---|---|
| load_data | Load Kaggle Trading at the Close dataset from disk. |
| get_stock_info | Return metadata for a specific stock ID (0–199). |
| get_date_range | Return the date ID range present in the dataset. |
| get_data_summary | Statistical summary of rows, columns, and target distribution. |
| filter_data | Slice dataset by stock ID, date range, or feature subset. |
Order book imbalance signals from bid/ask size and price.
WAP, reference price, and bid-ask spread derivations.
Seconds-in-bucket and auction-proximity signals.
All-feature pipeline and registry lookup.
Numba-accelerated V2 features for walk-forward-safe pipelines.
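
The order book signals described above can be sketched for a single row as follows. This is a minimal illustration, not the project's implementation: the function name `order_book_features` is hypothetical, and the formulas are the commonly used definitions of size imbalance, weighted average price (WAP), and bid-ask spread, assumed here rather than taken from the tool source.

```python
def order_book_features(bid_price, ask_price, bid_size, ask_size):
    """Derive basic order book signals for one row (hypothetical helper).

    Assumed definitions:
      size imbalance = (bid_size - ask_size) / (bid_size + ask_size)
      WAP            = (bid_price * ask_size + ask_price * bid_size)
                       / (bid_size + ask_size)
      spread         = ask_price - bid_price
    """
    total_size = bid_size + ask_size
    if total_size <= 0:
        raise ValueError("bid_size + ask_size must be positive")

    return {
        # Ranges from -1 (all size on the ask) to +1 (all size on the bid).
        "size_imbalance": (bid_size - ask_size) / total_size,
        # Each price is weighted by the size on the opposite side of the book.
        "wap": (bid_price * ask_size + ask_price * bid_size) / total_size,
        "spread": ask_price - bid_price,
    }


features = order_book_features(
    bid_price=0.999, ask_price=1.001, bid_size=300.0, ask_size=100.0
)
```

With more size on the bid than on the ask, the imbalance is positive and the WAP sits closer to the ask price.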
| Tool | Purpose |
|---|---|
| analyze_stock | Distribution, trend, and target summary for a stock ID. |
| analyze_target | Target variable statistics and skew analysis. |
| calculate_correlation | Feature-to-target and feature-to-feature correlation matrices. |
| compare_stocks | Side-by-side stock behavior comparison. |
| analyze_temporal_patterns | Intraday and cross-day temporal signal patterns. |
| Tool | Purpose |
|---|---|
| prepare_train_test | Time-series-aware train/test split with purge gap. |
| train_baseline | Baseline LightGBM fit with default parameters. |
| evaluate_model | MAE, RMSE, and zero-sum-adjusted evaluation. |
| get_feature_importance | Gain-based feature importance from the trained model. |
| create_cv_splits | Walk-forward cross-validation split generator. |
| train_lightgbm_fold | Single-fold LightGBM training with early stopping. |
| train_lightgbm_cv | Full walk-forward CV with OOF predictions. |
| predict_ensemble | Ensemble prediction from CV fold models. |
| apply_zero_sum_prediction | Market-neutral prediction normalization. |
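
Two ideas from the table above, walk-forward splitting with a purge gap and zero-sum normalization, can be sketched in plain Python. Both functions are hypothetical illustrations rather than the tools' actual code. The split assumes folds are cut on distinct date IDs with an expanding training window. The normalization assumes the simplest variant, subtracting the unweighted mean of predictions made for the same time bucket.

```python
def walk_forward_splits(date_ids, n_folds, purge_gap):
    """Yield (train_days, val_days) with an expanding training window.

    The last `purge_gap` days before each validation block are dropped
    from training so that the two sets are never adjacent in time.
    """
    days = sorted(set(date_ids))
    fold_size = len(days) // (n_folds + 1)
    if fold_size <= purge_gap:
        raise ValueError("not enough days per fold for this purge gap")

    for k in range(1, n_folds + 1):
        val_start = k * fold_size
        # The final fold absorbs any leftover days.
        val_end = len(days) if k == n_folds else val_start + fold_size
        train_days = days[: val_start - purge_gap]
        val_days = days[val_start:val_end]
        yield train_days, val_days


def zero_sum_adjust(predictions):
    """Shift predictions for one time bucket so that they sum to zero."""
    mean = sum(predictions) / len(predictions)
    return [p - mean for p in predictions]


splits = list(walk_forward_splits(range(12), n_folds=3, purge_gap=1))
adjusted = zero_sum_adjust([1.0, 2.0, 3.0])
```

Every training window ends at least `purge_gap` days before its validation block begins, which is the property that keeps lagged features from leaking validation information into training.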
PSI < 0.1 = stable · PSI > 0.25 = significant drift · KS alpha = 0.05
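
To show how the PSI thresholds above might be applied, here is a minimal sketch using the standard definition PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ) over bins. The function names, the equal-width binning on the reference range, and the small `eps` floor for empty bins are all assumptions made for illustration. The source does not label the band between 0.1 and 0.25, so the sketch reports it neutrally.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a new sample (hypothetical helper)."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / n_bins

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the log ratio stays finite.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def classify_drift(psi):
    """Map a PSI value onto the thresholds stated above."""
    if psi < 0.1:
        return "stable"
    if psi > 0.25:
        return "significant drift"
    return "between thresholds"


reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]  # concentrated in the upper half
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

The KS check is not shown. One common way to implement it is to compare the p-value of a two-sample test such as `scipy.stats.ks_2samp` against the alpha of 0.05.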
ThreadPoolExecutor-backed variants of CV, feature generation, tuning, and monitoring.
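
As a rough sketch of what a ThreadPoolExecutor-backed variant could look like, the snippet below fans a per-fold function out across threads. The names `run_folds_parallel` and `fake_train_fold` are hypothetical stand-ins. A real fold trainer would fit a model instead of returning a placeholder score.

```python
from concurrent.futures import ThreadPoolExecutor


def run_folds_parallel(train_fold, folds, max_workers=4):
    """Run `train_fold` on each fold in a thread pool.

    `Executor.map` returns results in the order of the inputs, so the
    i-th result always corresponds to the i-th fold.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_fold, folds))


def fake_train_fold(fold):
    """Placeholder for real per-fold training (illustration only)."""
    fold_id, train_days, val_days = fold
    return {"fold": fold_id, "n_train": len(train_days), "n_val": len(val_days)}


folds = [
    (0, [0, 1], [3, 4, 5]),
    (1, [0, 1, 2, 3, 4], [6, 7, 8]),
]
results = run_folds_parallel(fake_train_fold, folds, max_workers=2)
```

Threads mainly pay off when the per-fold work releases the GIL, as native training code often does. For pure-Python workloads a process pool may be the better fit.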
End-to-end orchestration with deployment gate and structured reports.