Paper · frozen

Please Is Sand Off A Beach

Argues that courtesy tokens are negligible compared with structural token waste such as giant context dumps, repeated scaffolding, retries, and missing decomposition.

Canonical source: docs/PLEASE_IS_SAND_OFF_A_BEACH.md

Status

Frozen (as of 2026-04-30).

This is a published paper, promoted from stub following the capture of the 2026-04-30 evidence packet.

Evidence Packet: docs/PLEASE_IS_SAND_OFF_A_BEACH_PACKET_2026-04-30.md

It is the second companion line in the local-LLM operator-judgment cluster:

GEMMA_4_IS_SMARTER_GEMMA_3_IS_SAFER.md — handoff-discipline doctrine; model-behavior conflation
PLEASE_IS_SAND_OFF_A_BEACH.md — token-cost attribution; ritual-vs-structural waste conflation
SLOW_IS_NOT_SMART.md — size / quality conflation
PRIVACY_IS_WORTH_PAYING_FOR.md — cost / privacy conflation

The common thread is operator judgment. This paper is about identifying where token-cost attention belongs, and where it does not.

Claim

The narrow form:

courtesy tokens like please and thank you are empirically negligible in comparison to the structural token waste common in badly-designed LLM workflows.
huge context windows, repeated schema scaffolding, raw PDF pastes, and retry-heavy lanes completely dominate the cost picture.
operators must optimize structural waste before they moralize about polite prompt phrasing.

The broader form:

AI cost talk often fixates on the most visible, most human, and least consequential tokens.
the real cost drivers are architectural rather than interpersonal.
"stop saying please" is an attractive efficiency slogan because it feels concrete, but for Bulkhead τ we have proven it is sand off a beach.

Why This Paper Exists

The prompt-cost conversation keeps drifting toward symbolic token thrift:

should users stop saying please?
should users stop saying thank you?
should users shorten natural language into clipped machine shorthand?

That conversation is almost always pointed at the wrong scale.

Bulkhead τ has measured the real cost centers. Our evidence packet shows:

Sand: Courtesy tokens cost ~8 tokens.
Ritual: Repeated unexamined prompt rituals cost ~390 tokens.
Storm: Retry amplification multiplies the cost of the entire workload by ~2.0×.

This paper exists to separate ritual token anxiety from structural token economics.

The Token-Cost Spectrum

We organize token cost on a spectrum from negligible to load-bearing, backed by empirical runs on gemma4:26b.

1. Sand off a beach

Examples:

please
thank you

These are real tokens, but for a serious system they are trivial. In our A/B test, adding "Please... thank you" to a standard probe added exactly 8 tokens (+1.5% overhead). They are only interesting if the rest of the workflow is already perfect.

2. Low-grade prompt ritual

Examples:

repeated polite framing that does not change the contract
redundant "be careful / be accurate / think step by step" language copied from prompt folklore

This is no longer "manners." It is unexamined ritual. Comparing our production strict-protocol prompt against a stripped functional prompt revealed 393 tokens of pure ritual waste—nearly 80% of the prompt budget.

3. Medium structural waste

Examples:

repeating large schema instructions every turn when the task could be decomposed
oversized system prompts used as a substitute for typed validation

At this rung, the cost starts to matter operationally. The minimum viable schema definition for our tools still cost 113 tokens on every single call.

4. High structural waste

Examples:

pasting whole PDFs or raw meeting documents into a model when the task only needs a few extracted fields
re-running expensive prompts after predictable parse/schema failures

This is where meaningful latency and reliability damage live.

5. Beach itself

Examples:

workflows whose design forces repeated long-context inference
no deterministic substrate where one should exist
no receive-side validator, causing retries or human cleanup

At this level, token cost is evidence that the system is structurally wrong. In the PPR Lane 2 evidence runs, validator-rejected probes drove a 2.03× retry multiplier. For every 1 token of expected work, the system paid for ~2 tokens because of architectural and validation failures.

How We Arrived Here

1. The Sam Altman manners anecdote

In April 2025, Sam Altman replied on X that polite prompt tokens cost "tens of millions of dollars" and were "well spent." The quote is useful not because it settles anything, but because it reveals where public attention goes first: the tiny human-visible tokens.

The Bulkhead τ answer is narrower and more operational:

maybe those tokens cost real money at platform scale.
but they are not where an operator should look first.
the structural token errors are two orders of magnitude larger, and usually come bundled with reliability errors too.

2. The context-window prestige problem

The broader market bundles context size with quality:

huge context windows are treated as inherently better.
"just paste the PDF" is treated as convenience rather than as a failure to structure the problem.
giant prompt stuffing often hides missing architecture.

This paper acts as a corrective to that framing. Some context is essential. Context maximalism is not.

Where Courtesy Might Actually Matter

This paper does not overclaim.

There are at least three cases where a little natural language overhead may be worthwhile:

human-facing chat UX, where tone is part of the product.
prompts where courteous phrasing genuinely stabilizes style or cooperation.
lightweight consumer use where operator time matters more than token thrift.

The claim is not "never optimize tokens." The claim is "optimize the right tokens first."

Where Token Thrift Is Real

Token thrift is absolutely real in:

very high-volume API systems
repeated long-context workflows
batch inference where the same prompt pattern runs thousands of times
expensive frontier lanes with per-token pricing
local lanes where latency and retry cost dominate operator experience

But the right response is architectural:

decompose
extract
validate
route
keep only load-bearing context

It is not primarily social:

stop being polite
stop writing like a human

Published as part of the Bulkhead τ release line. Paper inventory: /papers/.