Naive Bayes & Data Piling
Homework 4 covers Naive Bayes classification for spam detection and explores data piling, a phenomenon that appears in high-dimensional settings. It examines when the independence assumption helps and what challenges arise when the number of features p far exceeds the number of observations n (p ≫ n).
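For reference, the Naive Bayes model assumes that, within each class $k$, the $p$ features are independent. The class-conditional density therefore factors into one-dimensional marginals $f_{kj}$, and with $\pi_k$ denoting the prior probability of class $k$, the classifier picks the class with the largest log posterior score:

$$
f_k(x) = \prod_{j=1}^{p} f_{kj}(x_j),
\qquad
\hat{k}(x) = \arg\max_{k}\Big[\log \pi_k + \sum_{j=1}^{p} \log f_{kj}(x_j)\Big].
$$

Each $f_{kj}$ is a univariate density (for example, a normal density for a continuous feature), so the model estimates on the order of $Kp$ one-dimensional quantities for $K$ classes rather than a full $p$-variate density per class. This is one reason it stays usable when $p \gg n$.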
Learning Objectives
- Implement Naive Bayes with different class-conditional feature distributions
- Understand the conditional independence assumption and when it works well
- Explore data piling in high-dimensional classification
- Analyze classifier behavior when the number of dimensions exceeds the number of observations
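As a minimal sketch of Naive Bayes with different distributions, the NumPy code below implements two common variants from scratch: a multinomial model for word counts, as used in spam filtering, and a Gaussian model for continuous features. The function names, the `alpha` smoothing parameter, and the `var_floor` variance floor are illustrative assumptions; this is not the code in `spam_NB_app.py`.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes for non-negative count features such as word counts.

    X: (n, p) counts, y: (n,) class labels, alpha: Laplace (add-alpha) smoothing.
    """
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Smoothed per-class feature totals -> log P(feature j | class k)
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_lik

def predict_multinomial_nb(model, X):
    classes, log_prior, log_lik = model
    # Independence assumption: per-feature log-likelihoods simply add up.
    # The multinomial coefficient is identical for every class, so it is dropped.
    scores = X @ log_lik.T + log_prior                      # shape (n, K)
    return classes[np.argmax(scores, axis=1)]

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Gaussian Naive Bayes: one univariate normal per (class, feature) pair."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    means = np.array([X[y == c].mean(axis=0) for c in classes])                  # (K, p)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + var_floor   # (K, p)
    return classes, log_prior, means, variances

def predict_gaussian_nb(model, X):
    classes, log_prior, means, variances = model
    diff = X[:, None, :] - means[None, :, :]                                     # (n, K, p)
    # Sum of univariate Gaussian log-densities over features (independence again)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return classes[np.argmax(log_lik + log_prior, axis=1)]

# Toy usage: columns are counts of three hypothetical words; label 1 = spam
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 2, 1], [0, 3, 0]])
y = np.array([1, 1, 0, 0])
model = fit_multinomial_nb(X_counts, y)
print(predict_multinomial_nb(model, np.array([[2, 0, 0], [0, 1, 0]])))  # [1 0]
```

Both variants share the same prediction rule (log prior plus a sum of per-feature log-likelihoods); only the per-feature distribution changes, which is exactly where the independence assumption enters.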
ESLII Reference
This homework draws on Chapter 6 (Kernel Smoothing Methods), which introduces the naive Bayes classifier in Section 6.6.3, and Chapter 18 (High-Dimensional Problems: p ≫ N), which treats classification when the features outnumber the observations.
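Data piling refers to the fact that when p ≥ n, there are directions onto which the projections of each class's training data collapse to a single point. A minimal sketch, assuming labels coded as -1/+1 and a hypothetical two-Gaussian setup (not the setup used in `data_piling_sim.py`), uses the minimum-norm least-squares direction, which for generic data with p ≥ n interpolates the labels exactly:

```python
import numpy as np

def piling_direction(X, y):
    """Minimum-norm least-squares direction w with X @ w = y (exact when p >= n).

    X: (n, p) training data, y: (n,) labels coded -1/+1.
    For generic X of full row rank, the pseudoinverse solution interpolates y,
    so every training point projects onto its own label: the data pile at two values.
    """
    return np.linalg.pinv(X) @ y

def within_class_spread(proj, y):
    """Largest within-class standard deviation of the projected values."""
    return max(proj[y == c].std() for c in np.unique(y))

rng = np.random.default_rng(0)
n, p = 20, 200                      # p >> n (hypothetical sizes)
y = np.repeat([-1.0, 1.0], n // 2)
# Hypothetical setup: two Gaussian classes whose means differ by 0.5 in every coordinate
X_train = rng.standard_normal((n, p)) + 0.25 * y[:, None]
X_test = rng.standard_normal((n, p)) + 0.25 * y[:, None]

w = piling_direction(X_train, y)
train_proj = X_train @ w
test_proj = X_test @ w

print("train spread:", within_class_spread(train_proj, y))  # round-off level: training data pile up
print("test spread: ", within_class_spread(test_proj, y))   # clearly nonzero for new data
```

The collapse is a property of the training sample in high dimensions rather than of the underlying classes: fresh test points projected onto the same direction spread out again.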
Available Scripts
| Script | Description | Subdirectory |
|---|---|---|
| `spam_NB_app.py` | Naive Bayes spam classifier application | root |
| `pdf_spam_NB_app.py` | PDF-based Naive Bayes implementation | root |
| `data_piling_sim.py` | High-dimensional data piling simulation | `18.9/` |
Quick Start
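```bash
# CLI exploration (run from the directory that contains domains/)
cd domains/Stan/cli
python main.py "homework 4"

# Cockpit GUI (run from the directory that contains domains/)
cd domains/Stan/cockpit
python stan_cockpit.py
# In the GUI, enter: "explore naive bayes data piling"
```

```python
# Direct tool access
from unified_agent import StanDataClient, ToolRegistry

client = StanDataClient()
tools = ToolRegistry(client)
# Look up the tool by name and call it with an empty argument dict
result = tools.get_tool('load_hmk4_info')({})
```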
Related Tools
| Tool | Description |
|---|---|
| `load_hmk4_info` | Get Homework 4 metadata and available scripts |
| `list_hmk4_scripts` | List all Python scripts in Hmk4 and `18.9/` |
| `find_by_chapter` | Find homeworks by ESLII chapter |