WQ v1.1

Project Catalog

7 Data Science Projects Spanning the Full ML Lifecycle

Project Overview

Project  Domain                      Key Techniques                          Data / Task Type
Proj2    Real Estate (Buenos Aires)  Linear Regression, Feature Engineering  Tabular/CSV
Proj3    Air Quality (Nairobi)       Time Series, AR/ARMA Models             Time Series
Proj4    Earthquake Damage (Nepal)   Logistic Regression, Decision Trees     Classification
Proj5    Bankruptcy (Taiwan)         Random Forest, Gradient Boosting        Imbalanced Classification
Proj6    Consumer Finance (SCF)      K-Means Clustering, PCA                 Clustering
Proj7    DS Lab Applicants           Chi-Square, A/B Testing, ETL            Hypothesis Testing
Proj8    Stock Volatility (MTN)      GARCH, API Integration, TDD             Time Series Forecasting

Lesson Workflows

Each project contains four to five lessons, each with a predefined workflow. The Proj2 lessons, for example:

Lesson  Topic                   Model Pipeline
2.1     Price and Size          Size -> LinearRegression -> Price
2.2     Price and Location      Location -> Imputer + LinearRegression -> Price
2.3     Price and Neighborhood  Neighborhood -> OneHotEncoder + Ridge -> Price
2.4     Price and Everything    All Features Combined -> Full Pipeline -> Price
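The lesson 2.3 row can be read as a two-step estimator chain. The following is a minimal sketch, assuming scikit-learn's `OneHotEncoder`, `Ridge`, and `make_pipeline`; the lessons themselves may use different encoder or imputer classes. The neighborhood names and prices are made-up toy values, not taken from the Buenos Aires CSVs.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): one categorical feature and a numeric target.
X_train = [["Palermo"], ["Palermo"], ["Boedo"], ["Boedo"]]
y_train = [200_000, 220_000, 100_000, 120_000]

# Neighborhood -> OneHotEncoder + Ridge -> Price
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),  # unseen neighborhoods encode as all zeros
    Ridge(alpha=1.0),                        # L2-regularized linear regression
)
model.fit(X_train, y_train)

# Predict one price per neighborhood.
pred_palermo, pred_boedo = model.predict([["Palermo"], ["Boedo"]])
```

Bundling the encoder and the regressor in one pipeline object means the same encoding learned during `fit` is applied again at `predict` time.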

Data Sources

WQ/
├── Proj2/data/    # Buenos Aires & Mexico City CSVs
│   ├── buenos-aires--*.csv
│   └── mexico-city--*.csv
├── Proj3/data/    # Nairobi air quality JSON
│   └── nairobi.json
├── Proj4/data/    # Nepal earthquake SQLite
├── Proj5/data/    # Taiwan bankruptcy JSON
├── Proj6/data/    # SCF consumer finance CSV
├── Proj7/data/    # DS Lab applicants CSV
└── Proj8/data/    # Stock data via API simulation
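Given this layout, the files for one project can be located with a glob. The sketch below uses only the standard library; `list_data_files` is a hypothetical helper introduced for illustration and is not part of the repository.

```python
from pathlib import Path


def list_data_files(root, project, pattern="*"):
    """Return sorted file names under <root>/<project>/data matching a glob pattern."""
    data_dir = Path(root) / project / "data"
    return sorted(p.name for p in data_dir.glob(pattern) if p.is_file())
```

For example, `list_data_files("WQ", "Proj2", "buenos-aires--*.csv")` would return the Buenos Aires CSV names, assuming the script runs from the directory that contains `WQ/`.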

Cross-Project Analysis

The unified agent accepts natural-language queries that span projects:

# Apply technique from one project to another
"Apply GARCH from project 8 to air quality data from project 3"

# Compare models across domains
"Which model works best for classification: Proj4 or Proj5?"

# Educational integration
"Explain how logistic regression works and show example from Proj4"
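Queries like these name projects in more than one form ("project 8", "Proj4"). The source does not describe how the agent resolves them, so the following is only an illustrative sketch of one possible first step: extracting the referenced project numbers with a regular expression. `referenced_projects` is a hypothetical helper, not the agent's actual API.

```python
import re

# Matches "project 8", "Project 3", "Proj4", "proj 5" (case-insensitive).
_PROJECT_REF = re.compile(r"\bproj(?:ect)?\s*(\d+)\b", re.IGNORECASE)


def referenced_projects(query):
    """Return the project numbers mentioned in a query, in order of first appearance."""
    seen = []
    for match in _PROJECT_REF.finditer(query):
        number = int(match.group(1))
        if number not in seen:  # keep each project once
            seen.append(number)
    return seen
```

A router built on this could then load the data directories and lesson pipelines for just the projects a query mentions.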