7 Data Science Projects Spanning the Full ML Lifecycle

| Project | Domain | Key Techniques | Data / Task Type |
|---|---|---|---|
| Proj2 | Real Estate (Buenos Aires) | Linear Regression, Feature Engineering | Tabular/CSV |
| Proj3 | Air Quality (Nairobi) | Time Series, AR/ARMA Models | Time Series |
| Proj4 | Earthquake Damage (Nepal) | Logistic Regression, Decision Trees | Classification |
| Proj5 | Bankruptcy (Taiwan) | Random Forest, Gradient Boosting | Imbalanced Classification |
| Proj6 | Consumer Finance (SCF) | K-Means Clustering, PCA | Clustering |
| Proj7 | DS Lab Applicants | Chi-Square, A/B Testing, ETL | Hypothesis Testing |
| Proj8 | Stock Volatility (MTN) | GARCH, API Integration, TDD | Time Series Forecasting |

Each project contains 4-5 lessons with predefined workflows. Example from Proj2:

| Lesson | Topic | Model Pipeline |
|---|---|---|
| 2.1 | Price and Size | Size -> LinearRegression -> Price |
| 2.2 | Price and Location | Location -> Imputer + LinearRegression -> Price |
| 2.3 | Price and Neighborhood | Neighborhood -> OneHotEncoder + Ridge -> Price |
| 2.4 | Price and Everything | All Features Combined -> Full Pipeline -> Price |
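
As a rough illustration of how one row of this table maps to code, the sketch below builds the lesson 2.3 pipeline (Neighborhood -> OneHotEncoder + Ridge -> Price). It assumes scikit-learn's `OneHotEncoder`, `Ridge`, and `make_pipeline`; the project itself may use a different encoder library. The neighborhood names, prices, and the `alpha` value are invented toy inputs, not project data.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data, invented for illustration only (not taken from the project dataset).
# Each sample has a single categorical feature: the neighborhood.
neighborhoods = [
    ["Palermo"], ["Recoleta"], ["Palermo"],
    ["Belgrano"], ["Recoleta"], ["Belgrano"],
]
prices = [150_000, 200_000, 160_000, 120_000, 210_000, 125_000]

# Assumed pipeline: one-hot encode the neighborhood, then fit a ridge regression.
# handle_unknown="ignore" encodes unseen neighborhoods as all zeros at predict time.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    Ridge(alpha=1.0),  # alpha chosen arbitrarily for this sketch
)
model.fit(neighborhoods, prices)

# Predict prices for two neighborhoods seen during training.
predictions = model.predict([["Palermo"], ["Recoleta"]])
```

The other lessons would follow the same shape under these assumptions: swap the encoder for an imputer and `Ridge` for `LinearRegression` to approximate lesson 2.2, or chain several transformers for lesson 2.4.
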

The unified agent supports queries that span multiple projects: