**Author:** blue-az **Status:** Active draft **Paper group:** Sensor-to-Simulation Engineering **Predecessor:** Paper 1.27 (WHITE_PAPER_WEARABLE_SENSOR_CORPUS.md) — broader corpus framing this paper applies to one bout **Data sources:** Zepp2 session 55 (Sensors/2Zepp/ZeppWrangle/ZeppWrangle.csv); Babolat aggregate row in tennis_unified.db matches table for session 560; rendered Zepp dashboard at domains/SensorAgents/TennisAgent/data/reports/zepp2_impact_dashboard_2023-05-14.html **Anonymization note:** the opponent is referred to as "the opponent" throughout. The opponent's name appears in upstream raw data and in public USTA tournament records but does not appear in this paper's rendered prose, queries, or figure captions.
---
A single competitive USTA Round-of-16 match (Scottsdale Open, 2023-05-14, score 4-6 2-6, loss), instrumented by dual wearable sensors (Zepp2 and Babolat POP) alongside passive step monitoring, generates a dense but bounded data pool at three distinct fidelity tiers. Zepp2 (the older Zepp Tennis racquet sensor) captured 352 shots over 92 minutes with per-shot XY impact location, spin, ball speed, and stroke classification. A Babolat POP captured 284 shots in a narrower 58-minute window with per-stroke-type breakdown (FOREHAND_FLAT, FOREHAND_LIFTED, FOREHAND_SLICED, BACKHAND variants, SMASH variants, VOLLEY, and a PASS non-stroke category) extracted from the Babolat app log. A Mi Fit watch was worn passively but the workout-recording mode was not engaged, leaving only a day-aggregate cardio summary (max HR 182 bpm, 67 active-zone minutes, 5,952 steps). The Zepp2 dataset supports several falsifiable diagnostic claims about how the match played out (sweet-spot rate 31.5%, forehand-side dominance at 64%, FLAT-dominant stroke composition at 42%) and refuses to support others (no opponent-side data, no point-outcome ground truth per shot). One claim flips on personal-baseline comparison: 31.5% sweet-spot rate sits well *above* the player's median across 37 sessions (19.0%), not below — the match was ranked 31st of 37 in ball-striking quality, in the top six. The loss therefore cannot be attributed to off-center contact. The paper's thesis is methodological in three parts: (1) at this fidelity, a single match is sufficient for *pattern description* and *self-coaching hypothesis generation*, but not for *causal attribution* of the loss without personal-baseline context; (2) the apparent 19% shot-count gap between Zepp2 (352) and Babolat (284) is mostly a 13/21-minute recording-window mismatch — true window-matched agreement is 84.3%, which is consistent with the broader Babolat-to-Zepp2 agreement distribution observed across the legacy portion of the corpus; direct timestamp-level pairing (reported in §7.3) successfully matches 100% of the 284 Babolat shots to Zepp2 swings within a 5-second window (median clock offset 1.4s), resolving the counting disagreement and confirming high SERVE-to-SERVE classification alignment (94.7%) alongside minor discrepancies in volley and groundstroke categorizations; and (3) the three-tier fidelity split was not driven by hardware capability but by a different gate at each tier — auto-detection (Zepp), recording-window discipline (Babolat), and **recording-intent** (Mi Fit, the manual "Start Workout" tap that was not performed). The recording-intent gate is the most fragile and the most likely source of data-tier degradation in field-deployed instrumentation.
---
Paper 1.27 (WHITE_PAPER_WEARABLE_SENSOR_CORPUS.md) established the landscape of consumer wearable sport sensors — what they record, at what fidelity, with what gaps and incompatibilities. The corpus paper's thesis is general: it characterizes the sensor population.
This paper inverts the scope. One match. One sensor pair. Maximum depth.
The match chosen is a Round-of-16 loss at a USTA tournament. The loss matters structurally — a loss is more analyzable than a win because it presents a falsifiable question ("why did this go this way?"). The tournament context matters because competitive play with real stakes produces different sensor signatures than casual practice. The dual-sensor instrumentation matters because Paper 1.27 named cross-sensor agreement as a methodological frontier; this paper tests that frontier on a real bout.
The paper's secondary contribution, which became its load-bearing contribution during drafting, is **infrastructure inventory**. Substantial analysis already exists for this match: a 1.6 MB self-contained Plotly dashboard with 72 pre-rendered figure slices, a registered tool in the TennisAgent cockpit that treats this match as the canonical dual-sensor example, and per-sensor wrangling pipelines maintained at the project root in Dash/LinkBab/ and Dash/LinkZepp/. The paper documents what was already learned rather than commissioning fresh analysis.
---
ZeppWrangle.csv row L_PLAY_SESSION_ID=55.tennis_unified.db references babolat_session_id=560 with 284 shots and session_score=80.0. The underlying per-shot rows are materialized in the local babpop.db database from app log files.---
The match ran 92.2 minutes from first to last Zepp-recorded shot. At 352 shots over 92.2 minutes, the per-minute shot tempo was **3.82 shots/min** — corresponding roughly to one stroke every 16 seconds when amortized across the full match (changeovers included). Across the 14 games played, this works out to ~25 shots/game.
Score progression — set 1 was 4-6, set 2 was 2-6 — shows the loss widened rather than tightened, consistent with the typical pattern where the trailing side either solves the puzzle or runs out of options. Without per-game shot-timing data extracted from the CSV, the paper cannot directly test whether shot tempo, sweet-spot rate, or stroke-type composition shifted between sets; this is a known gap that a more careful per-game re-analysis would close.
---
The Zepp swing-type classifier produced this distribution across the 352 shots:
| Swing type | Count | % of shots |
|---|---|---|
| FLAT | 148 | 42.0% |
| SERVE | 55 | 15.6% |
| TOPSPIN | 54 | 15.3% |
| SLICE | 48 | 13.6% |
| VOLLEY | 46 | 13.1% |
| SMASH | 1 | 0.3% |
By hand:
| Hand | Count | % |
|---|---|---|
| Forehand | 225 | 63.9% |
| Backhand | 127 | 36.1% |
Three findings stand out from this distribution:
---
The distinctive contribution of Zepp2 over a wrist-only sensor is per-shot impact location on the racquet face. The 352 shots have POSITION_X ∈ [2, 16] and POSITION_Y ∈ [1, 14], mean X = 9.7, mean Y = 8.8 — slightly off-center toward the upper-right quadrant of the nominal 16 × 14 grid. The Zepp "sweet spot" classifier identifies **111 of 352 shots (31.5%)** as sweet-spot hits.
A 31.5% sweet-spot rate is one of the most directly load-bearing single numbers this paper reports. In match-relevant terms, the player struck the racquet's optimal contact zone on slightly under 1 in 3 shots. The remaining 68.5% were spread across the upper-half, lower-half, left-edge, and right-edge zones in the Zepp 5-zone classification. Whether 31.5% is "high" or "low" depends on the personal-baseline comparison developed in Section 5.2.

The overlay shows the contact distribution is centered slightly off-axis but not catastrophically: sweet-spot contacts cluster near the geometric center while off-frame contacts populate the periphery. There is no obvious systematic bias toward any one edge of the racquet face. The forehand/backhand split panel shows the two stroke sides occupy similar XY regions — neither side is concentrated in a different part of the racquet face — suggesting consistent contact mechanics across the match's stroke types.
To resolve the visual overlap of overlapping scatter points, a static density heatmap of all 352 shots is plotted on the 16×14 grid below (Figure 3):

The density heatmap confirms a tight concentration of hits at the central region, particularly around $X \in [8, 11]$ and $Y \in [7, 10]$, with density tailing off toward the edges.
This density pattern is further broken down by stroke type (Forehand ground/volley, Backhand ground/volley, and Serve) in Figure 4:

The stroke-type breakdown shows:
Per-stroke XY heatmaps, hit-frame-classified scatter plots, and zone-count breakdowns across the 3 (frame) × 4 (stroke type) × 3 (metric) cube are pre-rendered in domains/SensorAgents/TennisAgent/data/reports/zepp2_impact_dashboard_2023-05-14.html — 72 distinct figure slices. Readers wanting the interactive per-cell view should consult that dashboard directly.
A raw sweet-spot rate is uninterpretable without a baseline. The user's full Zepp recording history in Sensors/2Zepp/ZeppWrangle/ZeppWrangle.csv spans 47 sessions and 8,610 shots; filtered to sessions with at least 50 shots for statistical meaningfulness, the personal-baseline corpus is **37 sessions**.
| Personal-baseline statistic | Value |
|---|---|
| Mean sweet-spot rate across 37 sessions | 18.2% |
| Median | 19.0% |
| Standard deviation | 13.0% |
| Range | 0.0% – 48.3% |
| **Match pool (session 55) rate** | **31.5%** |
| **Match pool rank** | **31st of 37 (top 6)** |

This is the finding that reverses an obvious intuition. **A 31.5% sweet-spot rate looks modest in absolute terms** — fewer than one shot in three hit the optimal contact zone — but relative to the player's own typical performance across 37 recorded sessions, this match was a **top-six ball-striking performance** at +12.5 percentage points above personal median. The match was not lost on contact quality. Whatever cost the player the loss is not in the ball-striking column of the dataset.
This finding is itself a methodological lesson: **single-match numbers without personal-baseline context routinely lead to wrong diagnostic conclusions**. The naïve reading of 31.5% ("substantially below clean striking") is the reading the data refutes. The data-pool view from a single match is informative only when the dataset has enough prior bouts to establish the personal baseline that contextualizes it.
| Metric | Mean | Median | Max |
|---|---|---|---|
| Ball speed (mph) | 86.7 | 85.2 | 136.7 |
| Spin (RPM) | 1616 | — | 3891 |
The 136.7 mph peak suggests a hard serve; the 86.7 mph average across all shot types is consistent with a competitive baseline rally pace. The 3891 RPM peak spin is in the range of moderate-to-heavy topspin.
---
The Babolat POP wrist-and-racquet sensor was active during the match window, but with a narrower recording window than Zepp2: **09:08:02 → 10:06:08 (58 minutes)**, starting ~13 minutes after Zepp2 began capturing and stopping ~21 minutes before Zepp2 stopped. The Babolat session metadata in the raw log file (log_18.04.26_221401.txt) reports effectiveTime = 14.3 min, where "effective time" is Babolat's internal measure of active play excluding pauses.
The per-shot data was extracted from the raw log file (log_18.04.26_221401.txt) and materialized into the local babpop.db database. In addition, the log file contains a session-summary record with a detailed per-stroke-type breakdown. The session-summary breakdown is:
| Stroke type | Count |
|---|---|
| FOREHAND_FLAT | 72 |
| FOREHAND_LIFTED | 33 |
| FOREHAND_SLICED | 7 |
| BACKHAND_FLAT | 38 |
| BACKHAND_LIFTED | 27 |
| BACKHAND_SLICED | 18 |
| SMASH_NO_EFFECT | 10 |
| SMASH_WITH_EFFECT | 47 |
| VOLLEY_FOREHAND | 8 |
| VOLLEY_BACKHAND | 5 |
| PASS (non-stroke racquet motion) | 19 |
| **Total** | **284** |
However, the per-shot database rows (motions table in babpop.db) use simpler category names and group strokes differently. The 284 per-shot rows show:
Comparing these two internal records reveals how the Babolat system processes and maps stroke categories:
SERVE match the session-summary total of 57 SMASH shots (SMASH_NO_EFFECT = 10 and SMASH_WITH_EFFECT = 47). This confirms that Babolat's internal per-shot classifier detects and labels these physical events as serves, but the app's summary interface maps them under the "SMASH" category.BACKHAND and 13 VOLLEY per-shot rows match their respective session-summary totals exactly (BACKHAND_FLAT + LIFTED + SLICED = 83; VOLLEY_FOREHAND + BACKHAND = 13). However, the 131 FOREHAND per-shot rows represent the 112 summary forehands (FOREHAND_FLAT + LIFTED + SLICED = 112) plus the 19 PASS (non-stroke) shots. This indicates that the per-shot logs fold non-stroke PASS motions into the generic FOREHAND category.The Babolat session composite "PIQ score" is **7,444** (the metadata field; the value in tennis_unified.db was 80.0, which is a scaled-down derived value). Session type was classified as "UNQUALIFIED" by the Babolat app's session-type heuristic.
---
The match window was passively monitored by a Mi Fit / Zepp Life watch (Huami band, package com.xiaomi.hm.health on a rooted Android device). The day-aggregate record in tennis_unified.db for 2023-05-14 shows:
| Metric | Value |
|---|---|
| Resting HR | 84 bpm |
| Max HR (entire day) | 182 bpm |
| HR-zone minutes (low / medium / high) | 6 / 42 / 19 (67 active) |
| Total steps | 5,952 |
| Distance | 4,017 m |
| Calories | 674 |
Max HR of 182 bpm against a theoretical age-predicted max of ~174 bpm (220 − 46 = 174) confirms intense exertion during the day — though the 67 active-zone minutes are upper-bounded by the day's other activity (an earlier match, warm-up, post-match walking). Per-minute attribution to the match window is not derivable from the day-aggregate alone.
**Per-minute HR is not available for this match.** The Mi Fit workout-recording feature requires the user to tap "Start Workout" on the watch face at session start; without that action, the watch captures only background sampling and a day-aggregate summary. The 2023-05-14 match was not recorded as a workout session, and the per-minute trace was never persisted to local storage. A search across the rooted Avant's full filesystem (78 originDetail JSON files), a separate 40-file zip archive in Downloads, and the on-device Mi Fit user database confirmed no per-minute record for the match window survives.
The session-history pattern is itself a methodological data point. Across **68 distinct Mi Fit workout sessions** spanning 2021-09 through 2023-10, the recording behavior splits into two regimes:
| Period | Workout sessions captured |
|---|---|
| 2021 (Sep–Dec, 4 months) | 59 |
| 2022 (full year) | 1 |
| 2023 (Jan–Oct, 10 months) | 8 |
The 2021 "Sep–Dec burst" represents an initial period of high enthusiasm for tapping the manual workout-start gesture. Behavior changed in 2022; the 2023-05-14 match fell squarely in the lower-frequency regime where the tap-to-record action was rarely performed. The data tier this leaves is therefore not a function of the hardware's capability (the watch could have recorded per-minute HR) but of the user's recording-intent at the moment.
**The recording-intent gate.** This is the third fidelity tier the paper has now encountered, distinct from the others:
| Sensor | Data tier | Gate that determined the tier |
|---|---|---|
| Zepp2 (§3–§5) | Per-shot, dense (352 shots, full XY) | Auto-detection on racquet impact — no user action required |
| Babolat POP (§6) | Per-stroke-type, narrower window (284 shots) | Recording window was 13 min shorter at the start and 21 min shorter at the end than Zepp2's; per-shot data lives in app log files |
| Mi Fit (§6.5) | Per-day aggregate only | Requires manual "Start Workout" tap; was not performed for this match |
The methodological lesson: instrumentation richness is jointly determined by *device capability*, *downstream data retention*, and *recording-intent at capture time*. Each of the three sensors in this match was capable of higher fidelity than survives in the operator's dataset, but for different reasons. Zepp's auto-detection makes it the most reliable contributor to the data pool because it requires no in-the-moment action. Mi Fit's tap-to-record gate is the most fragile because human attention at the moment of session start is a recurring failure point — even when the watch is worn, the data isn't captured at the resolution the hardware could deliver.
For future instrumentation: the corpus's most reliable per-minute-resolution layer would be sensors with **auto-start workout detection** (some newer Garmin, Apple Watch, and Whoop devices do this). For Mi Fit specifically, the absence of per-match cardio detail in this paper is a recording-discipline gap, not a hardware gap.
---
The raw shot counts (Zepp2 = 352, Babolat = 284) appear to show a 19% sensor disagreement, but **most of that gap is the recording window, not the sensors**:
| Comparison | Zepp2 | Babolat | Gap |
|---|---|---|---|
| Full match windows | 352 | 284 | -19.3% (misleading) |
| **Window-matched (09:08–10:06)** | **337** | **284** | **-15.7%** |
| Window-matched, Babolat excl. PASS category | 337 | 265 | -21.4% |
Babolat started 13 minutes after Zepp2 (missing pre-game / early-warm-up Zepp2 swings) and stopped 21 minutes before Zepp2 (missing late swings). The honest window-matched cross-sensor gap is **15.7%** of Zepp2's count, or **84.3% agreement on shot count.**
When the per-stroke-type breakdowns are aligned, the picture is more nuanced than a single shot-count gap:
| Stroke category | Zepp2 | Babolat (Summary) | Babolat (Per-Shot) | Notes |
|---|---|---|---|---|
| Serves | 55 (SERVE) | 57 (SMASH_* total) | 57 (SERVE) | Direct mapping confirmed by timestamp pairing |
| Forehand groundstrokes | 225 (FH total) | 112 (FOREHAND_*) | 131 (FOREHAND) | Zepp2 FH includes net play; Babolat per-shot folds PASS |
| Backhand groundstrokes | 127 (BH total) | 83 (BACKHAND_*) | 83 (BACKHAND) | Class alignment matches exactly |
| Volleys | 46 (VOLLEY) | 13 (VOLLEY_FH+BH) | 13 (VOLLEY) | Significant classifier threshold differences |
| Other / non-stroke | 1 (SMASH) | 19 (PASS) | (folded into FH) | Babolat summary has explicit non-stroke category |
Two distinct issues are visible in the breakdown:
SERVE. Direct timestamp pairing (see §7.3) confirms that 54 of the 57 Babolat serves map directly to Zepp2 serves, showing that serve detection is highly consistent between the two sensors.PASS swings), a portion represents genuine counting disagreements (swings recorded by one sensor but ignored by the other's accelerometer thresholds).The cleanest defensible statement is: the two sensors produce overlapping but non-identical accounts of the same match, and both count and classification differences contribute to the gap. A definitive partition between "count disagreement" and "classification disagreement" is reported in §7.3 using direct timestamp pairing.
This match was instrumented using the older **Zepp2** sensor rather than the newer **Zepp Universal** sensor. Cross-sensor agreement distributions across the entire corpus are characterized separately (e.g., in memory or a forthcoming corpus paper).
With per-shot datasets available for both sensors in the matched window (09:08:02 → 10:06:08), a direct chronological pairing was conducted. Using a maximum alignment window of $\pm 5.0$ seconds, every single one of the **284 Babolat motions** was successfully paired with a corresponding Zepp2 swing (**100% pairing rate**).
This pairing reveals the following details:
FLAT, 1 SMASH, and 1 VOLLEY. - **Groundstrokes:** Of the 131 Babolat forehands, 124 (94.7%) paired with Zepp2 forehands, and 7 paired with Zepp2 backhands. Of the 83 Babolat backhands, 70 (84.3%) paired with Zepp2 backhands, 13 paired with Zepp2 forehands (reflecting rare hand-type classifier confusion). - **Volleys:** 9 of the 13 Babolat volleys (69.2%) paired with Zepp2 volleys, while 4 paired with Zepp2 groundstrokes (3 FLAT, 1 SLICE).---
This section is the falsifiable diagnostic claim and is therefore the part of the paper most at risk of over-interpretation. Bounded conclusions, evidence first.
**The data can support**:
**The data cannot support**:
**What a player would actually do with this**: redirect the follow-up question from "why was ball-striking bad" (it wasn't) to "what was bad if ball-striking wasn't?" Candidate next-investigation paths the data points toward but cannot answer: (a) shot selection and tactical pattern analysis — were the well-struck shots being directed to wrong court locations? (b) point construction — were the high-quality strikes being wasted on neutral exchanges while the critical points went to lower-quality shots? (c) per-set degradation — does the second-set 2-6 reflect a drop in ball-striking quality late in the match, or were the sweet-spot rates roughly constant across sets? Per-game CSV timestamp extraction would resolve (c) without new data capture.
**The methodological lesson worth carrying forward**: a single match's sensor data, viewed in isolation, can support an obvious-seeming but wrong diagnostic. The same 31.5% sweet-spot rate that reads as "below clean striking" without baseline reads as "top-six performance" with baseline. The data pool from one match is *information-dense* but *interpretation-fragile* without the context of prior matches.
---
---
At single-match granularity, dual-sensor instrumentation (supplemented by passive daily tracking) of a competitive tennis bout produces a data pool that supports pattern description (sweet-spot rate, stroke composition, racquet-face contact distribution) but refuses causal attribution (why the loss happened, what the opponent did, whether fatigue mattered) — and even pattern description requires personal-baseline context to be interpreted correctly. The three most empirically informative findings of this paper are methodological:
The match's load-bearing analytic artifact is not this paper. It is the 1.6 MB self-contained Plotly dashboard at data/reports/zepp2_impact_dashboard_2023-05-14.html, which contains 72 figure slices organized as a stroke × frame × metric cube. This paper is the narrative companion that names what's in the dashboard, embeds the two most diagnostic overlays (§5.1), and bounds what the dataset can support.
If the project produces additional single-match deep dives in the future, the methodological lessons from this one would let the next paper carry a sharper claim:
---
domains/SensorAgents/TennisAgent/data/reports/zepp2_impact_dashboard_2023-05-14.html — 72-figure Plotly dashboard, canonical artifactdomains/SensorAgents/TennisAgent/data/unified/tennis_unified.db — matches table row id=261 for the match metadatadomains/SensorAgents/TennisAgent/data/unified/match_history.csv — source CSV for the matches table/home/blueaz/Python/Sensors/2Zepp/ZeppWrangle/ZeppWrangle.csv — Zepp per-shot data; filter L_PLAY_SESSION_ID = 55 for the 352 match shots; aggregate across sessions with ≥50 shots (n=37) for the personal baseline in §5.2domains/SensorAgents/TennisAgent/data/papers/match_pool_2023/fig01_xy_sweet_spot.png — embedded figure for §5.1domains/SensorAgents/TennisAgent/data/papers/match_pool_2023/fig02_personal_baseline.png — embedded figure for §5.2domains/SensorAgents/TennisAgent/data/papers/match_pool_2023/fig03_density_heatmap.png — embedded figure for §5.1 (all shots density)domains/SensorAgents/TennisAgent/data/papers/match_pool_2023/fig04_stroke_heatmaps.png — embedded figure for §5.1 (density by stroke type)/home/blueaz/Python/Dash/LinkBab/, /home/blueaz/Python/Dash/LinkZepp/ — Streamlit dashboards for cross-linked sensor viewsdomains/SensorAgents/TennisAgent/cockpit_poc/agent/core/tool_registry.py:955 — the registered dual-sensor match-view tool that treats this match as canonical exampledomains/SensorAgents/TennisAgent/cockpit_poc/agent/core/plan_builder.py:3499 — _plan_match_piq_by_opponent planner that routes opponent-name queries through this toolA reproducing analyst should be able to start at the unified DB matches table, find row id=261, follow the zepp_session_id link into ZeppWrangle.csv filtered to session 55, and reproduce all numbers in Sections 3–5 directly from that filter.
---
---
**Anonymization protocol applied during authoring**:
**Voice-signature flags applied during authoring** (per AGENT_AUDIT_PROTOCOL.md):
The thesis is calibrated. The data picture is real but bounded. The insights are descriptive, not predictive. Use accordingly.