Homework 4: Naive Bayes & Data Piling

Homework 4 covers Naive Bayes classification for spam detection and explores data piling in high-dimensional settings. The goal is to understand when the independence assumption helps and what goes wrong when the number of features p far exceeds the number of observations n (p >> n).

Tags: Naive Bayes, Data Piling, High-Dimensional, ESLII Chapter 6, ESLII Chapter 18

Learning Objectives

Implement Naive Bayes with different distributions
Understand the independence assumption and when it works
Explore data piling in high-dimensional classification
Analyze classifier behavior when the number of dimensions exceeds the number of observations
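
To make the first two objectives concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. It is not taken from `spam_NB_app.py`; the function names `fit_gaussian_nb` and `predict_gaussian_nb`, the `var_floor` parameter, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior and per-feature Gaussian parameters per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (hypothetical parameter) avoids division by zero
        # when a feature is constant within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(params):
        log_prior, means, variances = params
        # Independence assumption: the joint log-likelihood is a sum of
        # one-dimensional Gaussian log-densities, one per feature.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))

# Toy data, made up for illustration: two features, two classes.
X = [[1.0, 2.1], [0.9, 1.9], [1.2, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y = ["ham", "ham", "ham", "spam", "spam", "spam"]
model = fit_gaussian_nb(X, y)
prediction_near_ham = predict_gaussian_nb(model, [1.0, 2.0])
prediction_near_spam = predict_gaussian_nb(model, [3.1, 0.5])
```

Replacing the Gaussian log-density with a Bernoulli or multinomial term changes the distribution while leaving the rest of the structure intact, which is one way to approach the "different distributions" objective.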

ESLII Reference

This homework draws on ESLII Chapter 6 (Kernel Smoothing Methods), which introduces the Naive Bayes classifier, and Chapter 18 (High-Dimensional Problems: p >> N), which covers classification when features outnumber observations.
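
The high-dimensional side can be illustrated with a small simulation. The sketch below is not the contents of `data_piling_sim.py`; it assumes pure-noise features, hypothetical sizes `n = 10` and `p = 200`, and uses the minimum-norm least-squares fit to labels coded as +1 and -1 as one example of a direction onto which the training data pile. The helpers `solve_linear_system` and `dot` are written out only to keep the example free of dependencies.

```python
import random

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(0)
n, p = 10, 200  # hypothetical sizes chosen so that p >> n
# Pure-noise features: the class labels carry no real signal.
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1.0] * (n // 2) + [-1.0] * (n // 2)

# Minimum-norm direction w = X^T (X X^T)^{-1} y, which satisfies X w = y.
gram = [[dot(xi, xj) for xj in X] for xi in X]
alpha = solve_linear_system(gram, y)
w = [sum(alpha[i] * X[i][j] for i in range(n)) for j in range(p)]

# Every training point projects onto its own label: two piles.
train_projections = [dot(xi, w) for xi in X]
# Fresh noise points are not constrained to land on either pile.
test_projections = [
    dot([rng.gauss(0.0, 1.0) for _ in range(p)], w) for _ in range(5)
]
```

The training projections match the labels up to floating-point error even though the features are noise, so perfect separation of the training set says nothing here about performance on new data.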

Available Scripts

Script	Description	Subdirectory
`spam_NB_app.py`	Naive Bayes spam classifier application	root
`pdf_spam_NB_app.py`	PDF-based Naive Bayes implementation	root
`data_piling_sim.py`	High-dimensional data piling simulation	18.9/

Quick Start

# CLI exploration
cd domains/Stan/cli
python main.py "homework 4"

# Cockpit GUI
cd domains/Stan/cockpit
python stan_cockpit.py
# Enter: "explore naive bayes data piling"

# Direct tool access (Python, not shell)
from unified_agent import StanDataClient, ToolRegistry

client = StanDataClient()
tools = ToolRegistry(client)
# Look up the tool by name, then call it with an empty argument dict
result = tools.get_tool('load_hmk4_info')({})

Related Tools

Tool	Description
`load_hmk4_info`	Get Homework 4 metadata and available scripts
`list_hmk4_scripts`	List all Python scripts in Hmk4 and 18.9/
`find_by_chapter`	Find homeworks by ESLII chapter