Architecture

Vision

A single agentic interface that spans ALL 7 WQU Data Science projects, enabling:

Cross-project analysis ("Apply GARCH to air quality data")
Technique comparison ("Which model works best for classification?")
Educational integration (Textbook reference on demand)
Unified data exploration across all domains

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                       WQ Unified Cockpit                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   ┌─────────┐│
│  │  Blueprint  │  │  Execution  │  │  Inspector  │   │Artifacts││
│  │    Panel    │  │    Trace    │  │    Panel    │   │Workspace││
│  └─────────────┘  └─────────────┘  └─────────────┘   └─────────┘│
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                           PlanBuilder                           │
│    Query Pattern Matching (NLI) → Execution Plan Generation     │
│      _plan_proj2_*   _plan_proj3_*   _plan_cross_project_*      │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                          AgenticEngine                          │
│  State Machine: IDLE → PLANNING → AWAITING_APPROVAL → RUNNING   │
│  Parameter Resolution: $step_N_result.field                     │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                     ToolRegistry (41 tools)                     │
│       PROJECT TOOLS │ CROSS-PROJECT │ TEXTBOOK │ UTILITY        │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                 ExecutionContext (3-Tier Cache)                 │
│             step_results │ data_cache │ query_cache             │
└─────────────────────────────────────────────────────────────────┘
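
The `$step_N_result.field` notation in the AgenticEngine layer suggests that a step's parameters can refer to fields of earlier step results. The following is a minimal sketch of how such references might be resolved, assuming results are kept in a dict keyed by step number and that fields are dict keys. The function name `resolve_params` and the storage layout are hypothetical, not the project's actual API.

```python
import re
from typing import Any

# Matches references such as "$step_1_result.series" (syntax assumed from the diagram).
_REF = re.compile(r"^\$step_(\d+)_result(?:\.(\w+))?$")


def resolve_params(params: dict[str, Any], step_results: dict[int, Any]) -> dict[str, Any]:
    """Replace "$step_N_result.field" strings with values from earlier steps."""
    resolved = {}
    for name, value in params.items():
        match = _REF.match(value) if isinstance(value, str) else None
        if match is None:
            resolved[name] = value  # literal parameter, passed through unchanged
            continue
        step, field = int(match.group(1)), match.group(2)
        result = step_results[step]  # raises KeyError if that step has not run yet
        resolved[name] = result[field] if field else result
    return resolved


# Hypothetical example: step 2 consumes the series produced by step 1.
step_results = {1: {"series": [0.1, 0.4, 0.2], "n_rows": 3}}
params = {"data": "$step_1_result.series", "p": 1, "q": 1}
print(resolve_params(params, step_results))
# {'data': [0.1, 0.4, 0.2], 'p': 1, 'q': 1}
```

Resolving references just before each step runs, rather than at planning time, would let a plan be shown for approval before any data exists.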

Per-Project Cockpit Structure

File	Purpose
`agentic_engine.py`	Plan execution orchestrator with state machine
`plan_builder.py`	NLI parser converting queries to ExecutionPlans
`data_client.py`	Project-specific data access with 3-tier caching
`tool_registry.py`	Tool catalog (data, model, visualization, utility)
`execution_context.py`	Session state and cache management
`ui_components/`	tkinter UI panels (blueprint, trace, inspector)

State Machine

IDLE
  │
  ▼
PLANNING ───────────────────────────────┐
  │                                     │
  ▼                                     │
AWAITING_APPROVAL ──[cancel]────────────┤
  │                                     │
[approve]                               │
  │                                     │
  ▼                                     │
RUNNING ──[pause]──► PAUSED ──[resume]──┤
  │                    │                │
[complete]          [cancel]            │
  │                    │                │
  ▼                    ▼                │
COMPLETED             IDLE ◄────────────┘
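
The diagram above can be read as a transition table. The sketch below is one possible encoding, not the engine's actual code. The event names `submit`, `plan_ready`, and `abort` are placeholders for arrows the diagram leaves unlabeled, and treating `resume` as a return to RUNNING is an assumption about the intended behavior.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PLANNING = auto()
    AWAITING_APPROVAL = auto()
    RUNNING = auto()
    PAUSED = auto()
    COMPLETED = auto()


# (current state, event) -> next state.
# "submit", "plan_ready" and "abort" are hypothetical labels for unlabeled arrows.
TRANSITIONS = {
    (State.IDLE, "submit"): State.PLANNING,
    (State.PLANNING, "plan_ready"): State.AWAITING_APPROVAL,
    (State.PLANNING, "abort"): State.IDLE,
    (State.AWAITING_APPROVAL, "approve"): State.RUNNING,
    (State.AWAITING_APPROVAL, "cancel"): State.IDLE,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.RUNNING, "complete"): State.COMPLETED,
    (State.PAUSED, "resume"): State.RUNNING,  # assumption: resume continues the run
    (State.PAUSED, "cancel"): State.IDLE,
}


class EngineStateMachine:
    """Tracks the current state and rejects events the table does not allow."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def fire(self, event: str) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state


machine = EngineStateMachine()
for event in ("submit", "plan_ready", "approve", "pause", "resume", "complete"):
    machine.fire(event)
print(machine.state.name)  # COMPLETED
```

Keeping the transitions in a single table makes illegal moves, such as approving a plan that is already running, fail loudly instead of being silently ignored.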

Three-Tier Caching

1. Memory Cache

In-memory dictionary for current session data. Fastest access, cleared on restart.

2. Disk Cache

Parquet files in data/cache/. Persists across sessions.

3. URL Fetch

Falls back to sample-data URLs when local files are unavailable.
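
The three tiers read naturally as a lookup chain: memory first, then disk, then a remote fetch. The sketch below illustrates that order under a few assumptions: the class name `TieredCache` is hypothetical, the file reader, writer, and fetcher are injected as callables, and fetched data is written back to disk (the text above does not say whether the real client does this). The demo uses JSON files so that it runs with the standard library alone.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class TieredCache:
    """Look up a dataset in memory, then on disk, then via a remote fetch."""

    def __init__(
        self,
        cache_dir: Path,
        read_file: Callable[[Path], Any],
        write_file: Callable[[Any, Path], None],
        fetch: Callable[[str], Any],
        suffix: str = ".parquet",
    ) -> None:
        self._memory: dict[str, Any] = {}
        self._cache_dir = cache_dir
        self._read_file = read_file
        self._write_file = write_file
        self._fetch = fetch
        self._suffix = suffix

    def get(self, key: str) -> Any:
        # Tier 1: in-memory dictionary, lost when the process restarts.
        if key in self._memory:
            return self._memory[key]

        # Tier 2: file under the cache directory, survives restarts.
        path = self._cache_dir / f"{key}{self._suffix}"
        if path.exists():
            data = self._read_file(path)
        else:
            # Tier 3: fetch from the remote source, then persist (assumed behavior).
            data = self._fetch(key)
            self._cache_dir.mkdir(parents=True, exist_ok=True)
            self._write_file(data, path)

        self._memory[key] = data  # promote so the next call hits tier 1
        return data


fetch_log = []


def fetch_sample(key: str) -> dict:
    fetch_log.append(key)  # record every remote fetch
    return {"name": key, "rows": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    def make_cache() -> TieredCache:
        return TieredCache(
            Path(tmp),
            read_file=lambda p: json.loads(p.read_text()),
            write_file=lambda d, p: p.write_text(json.dumps(d)),
            fetch=fetch_sample,
            suffix=".json",
        )

    first = make_cache()
    first.get("air_quality")   # tier 3: fetched, then written to disk
    first.get("air_quality")   # tier 1: served from memory
    second = make_cache()      # simulates a restart with empty memory
    second.get("air_quality")  # tier 2: read back from disk
    print(fetch_log)           # ['air_quality']
```

For Parquet files, `read_file` and `write_file` could wrap `pandas.read_parquet` and `DataFrame.to_parquet`, assuming pandas and a Parquet engine are installed.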

Pattern Matching Hierarchy

PlanBuilder tries patterns in order from most specific to least specific, and the first match wins:

Exact lesson references: "lesson 3.3", "run lesson 2.1"
Cross-project patterns: "apply X from projA to projB"
Technique patterns: "cluster", "classify", "forecast"
Domain patterns: "real estate", "air quality", "earthquake"
General fallbacks: "help", "show capabilities"
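
The hierarchy above amounts to an ordered list of patterns that is scanned until one matches. The sketch below shows that idea with regular expressions. The regexes and category labels are illustrative guesses built from the example phrases, and `match_query` is a hypothetical name, not PlanBuilder's actual method.

```python
import re
from typing import Optional

# Ordered from most specific to least specific; the first match wins.
# These regexes are illustrative, not the project's actual patterns.
PATTERNS = [
    ("lesson", re.compile(r"\blesson\s+(\d+)\.(\d+)\b")),
    ("cross_project", re.compile(r"\bapply\s+(.+?)\s+from\s+(.+?)\s+to\s+(.+)")),
    ("technique", re.compile(r"\b(cluster|classify|forecast)\w*")),
    ("domain", re.compile(r"\b(real estate|air quality|earthquake)\b")),
    ("general", re.compile(r"\b(help|show capabilities)\b")),
]


def match_query(query: str) -> Optional[tuple[str, tuple[str, ...]]]:
    """Return (category, captured groups) for the first matching pattern."""
    text = query.lower().strip()
    for category, pattern in PATTERNS:
        found = pattern.search(text)
        if found:
            return category, found.groups()
    return None  # no pattern matched; a caller might fall back to "help"


print(match_query("Run lesson 3.3"))
# ('lesson', ('3', '3'))
print(match_query("Forecast air quality"))
# ('technique', ('forecast',))
```

The second query mentions both a technique and a domain; because technique patterns come earlier in the list, the technique match takes precedence.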

File Structure

domains/WQ/
├── unified_agent/
│   ├── data_client.py      # UnifiedDataClient
│   ├── tool_registry.py    # 41 tools
│   ├── plan_builder.py     # Query patterns
│   ├── agentic_engine.py   # Orchestration
│   └── wq_cockpit.py       # Unified GUI
├── Proj2-8/                # Project data sources
├── textbook_client.py      # PDF parsing
├── textbook_tools.py       # Reference tools
└── Textbook/               # 22 markdown chapters