Paper · frozen

Please Is Sand Off A Beach

Argues that courtesy tokens are negligible compared with structural token waste such as giant context dumps, repeated scaffolding, retries, and missing decomposition.

Canonical source: docs/PLEASE_IS_SAND_OFF_A_BEACH.md

Status

Frozen (as of 2026-04-30).

This is a published paper, promoted from stub following the capture of the 2026-04-30 evidence packet.

Evidence Packet: docs/PLEASE_IS_SAND_OFF_A_BEACH_PACKET_2026-04-30.md

It is the second companion line in the local-LLM operator-judgment cluster:

The common thread is operator judgment. This paper is about identifying where token-cost attention belongs, and where it does not.

Claim

The narrow form:

The broader form:

Why This Paper Exists

The prompt-cost conversation keeps drifting toward symbolic token thrift:

That conversation is almost always pointed at the wrong scale.

Project Phoenix has measured the real cost centers. Our evidence packet shows:

This paper exists to separate ritual token anxiety from structural token economics.

The Token-Cost Spectrum

We organize token cost on a spectrum from negligible to load-bearing, backed by empirical runs on gemma4:26b.

1. Sand off a beach

Examples:

These are real tokens, but for a serious system they are trivial. In our A/B test, adding "Please... thank you" to a standard probe added exactly **8 tokens** (+1.5% overhead). They are only interesting if the rest of the workflow is already perfect.
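The overhead arithmetic behind that figure can be sketched as follows. This is a minimal illustration, not the measurement harness: the function names are hypothetical, and the base probe size is back-calculated from the two reported numbers (8 tokens, +1.5%) rather than taken from the evidence packet.

```python
def overhead_pct(added_tokens: int, base_tokens: float) -> float:
    """Return added tokens as a percentage of the base prompt size."""
    if base_tokens <= 0:
        raise ValueError("base_tokens must be positive")
    return 100.0 * added_tokens / base_tokens


def implied_base_tokens(added_tokens: int, overhead_percent: float) -> float:
    """Back-calculate the base prompt size from an added-token count and its overhead."""
    if overhead_percent <= 0:
        raise ValueError("overhead_percent must be positive")
    return added_tokens / (overhead_percent / 100.0)


# Reported above: 8 added tokens at +1.5% overhead.
# The base size below is derived, not a figure from the evidence packet.
base = implied_base_tokens(8, 1.5)
print(f"implied base probe: ~{base:.0f} tokens")
print(f"round-trip overhead: {overhead_pct(8, base):.1f}%")
```

Under these assumptions the probe is on the order of a few hundred tokens, so the courtesy wrapper is a rounding error next to the prompt it decorates.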

2. Low-grade prompt ritual

Examples:

This is no longer "manners." It is unexamined ritual. Comparing our production strict-protocol prompt against a stripped functional prompt revealed **393 tokens** of pure ritual waste, nearly 80% of the prompt budget.

3. Medium structural waste

Examples:

At this rung, the cost starts to matter operationally. The minimum viable schema definition for our tools still cost **113 tokens** on every single call.
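Because the schema cost is paid on every call, it scales with call volume. A rough sketch of that accumulation, set against the 8-token courtesy overhead from the A/B test, might look like this. The call count is a hypothetical workload chosen only for illustration, and `cumulative_overhead` is an illustrative helper rather than anything from the Project Phoenix codebase.

```python
SCHEMA_TOKENS_PER_CALL = 113   # reported above: minimum viable schema definition
COURTESY_TOKENS_PER_CALL = 8   # reported in the courtesy A/B test


def cumulative_overhead(tokens_per_call: int, calls: int) -> int:
    """Total fixed-cost tokens paid across a number of calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return tokens_per_call * calls


calls = 1_000  # hypothetical workload size, not a measured figure
schema_total = cumulative_overhead(SCHEMA_TOKENS_PER_CALL, calls)
courtesy_total = cumulative_overhead(COURTESY_TOKENS_PER_CALL, calls)

print(f"schema overhead over {calls} calls: {schema_total} tokens")
print(f"courtesy overhead over {calls} calls: {courtesy_total} tokens")
print(f"ratio: {schema_total / courtesy_total:.1f}x")
```

The ratio is independent of the call count: whatever the volume, the fixed schema cost is roughly fourteen times the courtesy cost per call.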

4. High structural waste

Examples:

This is where meaningful latency and reliability damage live.

5. Beach itself

Examples:

At this level, token cost is evidence that the system is structurally wrong. In the PPR Lane 2 evidence runs, validator-rejected probes drove a **2.03× retry multiplier**. For every token of expected work, the system paid for roughly two, because of architectural and validation failures.
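To put the retry multiplier on the same scale as the courtesy overhead, both can be expressed as a fraction of expected work. The sketch below uses only the two reported figures (the 2.03× multiplier and the +1.5% courtesy overhead); the function names and the 1,000-token expected workload are illustrative assumptions.

```python
RETRY_MULTIPLIER = 2.03      # reported above for the PPR Lane 2 evidence runs
COURTESY_OVERHEAD = 0.015    # +1.5% from the courtesy A/B test


def effective_tokens(expected_tokens: float, retry_multiplier: float) -> float:
    """Tokens actually paid once retries are included."""
    if retry_multiplier < 1.0:
        raise ValueError("retry_multiplier cannot be below 1.0")
    return expected_tokens * retry_multiplier


def overhead_fraction(retry_multiplier: float) -> float:
    """Extra tokens paid per token of expected work."""
    return retry_multiplier - 1.0


expected = 1_000  # hypothetical expected workload, in tokens
paid = effective_tokens(expected, RETRY_MULTIPLIER)
retry_overhead = overhead_fraction(RETRY_MULTIPLIER)

print(f"expected {expected} tokens, paid ~{paid:.0f}")
print(f"retry overhead: {retry_overhead:.0%}")
print(f"retry overhead vs courtesy overhead: ~{retry_overhead / COURTESY_OVERHEAD:.0f}x")
```

On these numbers the retry overhead is well over an order of magnitude larger than the courtesy overhead, which is the scale gap the spectrum is meant to make visible.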

How We Arrived Here

1. The Sam Altman manners anecdote

In April 2025, Sam Altman replied on X that polite prompt tokens cost "tens of millions of dollars" and were "well spent." The quote is useful not because it settles anything, but because it reveals where public attention goes first: the tiny human-visible tokens.

The Project Phoenix answer is narrower and more operational:

2. The context-window prestige problem

The broader market bundles context size with quality:

This paper acts as a corrective to that framing. Some context is essential. Context maximalism is not.

Where Courtesy Might Actually Matter

This paper does not overclaim.

There are at least three cases where a little natural-language overhead may be worthwhile:

The claim is not "never optimize tokens." The claim is "optimize the right tokens first."

Where Token Thrift Is Real

Token thrift is absolutely real in:

But the right response is architectural:

It is not primarily social:

Published as part of the Bulkhead τ release line. Paper inventory: /papers/.