A token budget read from its own ledger, with a hard cap

Lesson 2 gave your agent a budget line in a committed file. Right now that line is a number on a page: the agent reads it, nods, and carries on. This lesson makes it real. You record what every model call actually cost, and you check the running total against a hard cap before each call, so the agent stops on its own instead of discovering the limit on next month's invoice.

An estimate is not a cap

The tempting shortcut is to guess. Count the characters in the prompt, divide by four for a rough token count, multiply by the price, and call that your spend. It will be wrong, and usually in the expensive direction. You cannot see the output length before the call; reasoning tokens do not show up in the prompt at all; retries and tool round-trips multiply the real usage; and cached input is billed differently from fresh input. An estimate is fine for a rough forecast. It is useless as a cap, because the one run that blows the budget is exactly the run whose output you failed to predict.

So the rule is: cap on recorded spend, never on predicted spend. Every model response comes back with the actual token counts it used. That number, not your guess, is what you write down and what you enforce against.
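
To make the gap concrete, here is a minimal sketch that prices the same call twice: once from the characters-divided-by-four guess, once from the counts the response reported. The prices, the function names, and the shape of the `usage` object are all hypothetical placeholders, not any provider's real rates or field names.

```javascript
// Hypothetical prices in EUR per token; substitute your provider's real rates.
const PRICE = { input: 3e-6, output: 15e-6 };

// The tempting shortcut: guess the token count from the prompt length alone.
function estimateCostEur(prompt) {
  const guessedInputTokens = Math.ceil(prompt.length / 4);
  return guessedInputTokens * PRICE.input; // output length is unknown before the call
}

// The rule from this lesson: price the counts the response actually reported.
function recordedCostEur(usage) {
  return usage.inputTokens * PRICE.input + usage.outputTokens * PRICE.output;
}

// A 4000-character prompt, so the guess is about 1000 input tokens.
const prompt = "x".repeat(4000);
// Assumed shape of the usage data returned with the response (made-up counts).
const usage = { inputTokens: 1840, outputTokens: 612 };

const estimated = estimateCostEur(prompt);
const recorded = recordedCostEur(usage);
console.log({ estimated, recorded });
```

With these made-up numbers the recorded cost is several times the estimate, mostly because the estimate had no way to see the output tokens.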

The ledger: one row per call

The ledger is deliberately dull. Every time the agent calls a model, you append one line recording what that specific call used and cost. No database is required; an append-only file is enough, and append-only is a feature, because a cost record you can rewrite is a cost record the agent can talk itself into rewriting.

// after every model call, append one line to usage.jsonl
{"ts":"2026-07-05T09:14:22Z","model":"…","in":1840,"out":612,"eur":0.021}
{"ts":"2026-07-05T09:15:03Z","model":"…","in":2210,"out":905,"eur":0.028}

// cost per call = input_tokens  * input_price_per_token
//               + output_tokens * output_price_per_token
// read the token counts from the API response, not from your own estimate

Two details matter. Take the token counts from the usage field the API returns, so the row reflects what you were actually billed for. And stamp each row with a timestamp, because the cap you are about to enforce is a daily cap, and a daily cap is only meaningful if you can sum the rows for today.
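
The two details above can be sketched as a small Node.js helper. This is an illustration under stated assumptions, not a definitive implementation: `appendUsage`, `costEur`, the prices, and the `inputTokens` / `outputTokens` field names are hypothetical, and your API's usage field will have its own names.

```javascript
const fs = require("node:fs");

// Hypothetical prices in EUR per token; replace with your provider's real rates.
const PRICE = { input: 3e-6, output: 15e-6 };

// Cost of one call, computed from the token counts the API reported.
function costEur(usage) {
  return usage.inputTokens * PRICE.input + usage.outputTokens * PRICE.output;
}

// Append exactly one JSON line per call. Appending never touches earlier rows.
function appendUsage(ledgerPath, model, usage, now = new Date()) {
  const row = {
    ts: now.toISOString(), // UTC timestamp, so rows can be summed per UTC day
    model,
    in: usage.inputTokens,
    out: usage.outputTokens,
    eur: Number(costEur(usage).toFixed(6)),
  };
  fs.appendFileSync(ledgerPath, JSON.stringify(row) + "\n");
  return row;
}
```

You would call `appendUsage("usage.jsonl", modelName, usageFromResponse)` immediately after each response arrives, before doing anything else with the result, so a crash later in the run cannot lose the row.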

Enforce before the call, not after

A ledger you only read at the end of the day is an autopsy, not a cap. The check has to run before each model call: sum today's rows, compare against the limit from your constraints file, and refuse the call if the total has already reached it. Checking only afterwards, in a batch or at the end of the day, means the call that crosses the line is followed by the next one, and the one after that, until something notices. Checking before every call means no call starts once the recorded total has reached the cap, so the most you can overshoot is the cost of the single call that crossed the line.

function guard(cap_eur) {
  const spentToday = sumRows(readLedger(), todayUTC());  // recorded, not estimated
  if (spentToday >= cap_eur) {
    stop("daily cap reached: " + spentToday + " / " + cap_eur + " EUR");
  }
  // otherwise proceed, then append the real cost of this call
}

This maps straight back to the budget block from Lesson 2: daily_eur_cap is the number the guard reads, and on_exceeded: stop is what the guard does when it trips. The constraints file says the rule; the guard is the code that makes the rule true.
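
For a version you can actually run, here is one possible way to fill in the helpers the guard relies on, sketched for Node.js. It assumes the ledger is a JSONL file whose rows carry a UTC ISO timestamp in `ts` and a cost in `eur`, as in the sample rows earlier; passing the ledger path and throwing an `Error` in place of `stop` are choices made for this sketch, not part of the original guard.

```javascript
const fs = require("node:fs");

// Read every row of the JSONL ledger; a missing file means nothing spent yet.
function readLedger(ledgerPath) {
  if (!fs.existsSync(ledgerPath)) return [];
  return fs
    .readFileSync(ledgerPath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// UTC calendar day as YYYY-MM-DD, matching the prefix of each row's timestamp.
function todayUTC(now = new Date()) {
  return now.toISOString().slice(0, 10);
}

// Sum the recorded cost of the rows stamped with the given UTC day.
function sumRows(rows, day) {
  return rows
    .filter((row) => row.ts.startsWith(day))
    .reduce((total, row) => total + row.eur, 0);
}

// Call this before every model call. It throws once recorded spend reaches the cap.
function guard(ledgerPath, capEur, now = new Date()) {
  const spentToday = sumRows(readLedger(ledgerPath), todayUTC(now));
  if (spentToday >= capEur) {
    throw new Error(`daily cap reached: ${spentToday.toFixed(3)} / ${capEur} EUR`);
  }
  return spentToday; // under the cap: proceed, then append this call's real cost
}
```

Re-reading the whole file on every call is deliberately naive. It keeps the ledger as the single source of truth, and for a file that grows by one short line per call it is cheap enough until the ledger gets large.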

Stop cleanly, do not limp

What the agent does at the cap is as important as the cap itself. The failure mode is a soft limp: the agent notices it is out of budget and quietly downgrades to a cheaper model, or trims its own context to squeeze in one more call, or half-finishes the task and reports success. All three hide the fact that you hit the wall, which is the one fact you needed to know.

Stop cleanly instead. Finish the current unit of work if it is safe to, then exit and say plainly that the cap was reached and what the day cost. A hard stop that tells you the truth is worth more than a clever workaround that keeps the loop alive at a quality you never agreed to. The cap is not an obstacle to route around; it is the answer to the question "how much am I willing to lose to a bug or a bad night?"
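
A clean stop can be sketched as a loop that checks the cap between units of work and, when it trips, returns an honest report instead of improvising. Everything named here (`CapReachedError`, `runUntilCap`, the report fields) is hypothetical, and the sketch is synchronous for brevity; real model calls would be awaited.

```javascript
// Hypothetical error type so the loop can tell "cap reached" from real failures.
class CapReachedError extends Error {}

// Run units of work in order.
// checkCap(spentEur) throws CapReachedError once spend has reached the cap.
// runUnit(unit) does one unit of work and returns what it cost in EUR.
function runUntilCap(units, checkCap, runUnit) {
  const done = [];
  let spentEur = 0;
  for (const unit of units) {
    try {
      checkCap(spentEur);
    } catch (err) {
      if (!(err instanceof CapReachedError)) throw err; // real bugs still surface
      // No cheaper model, no trimmed context: stop and report the truth.
      return {
        status: "stopped_at_cap",
        done,
        notDone: units.slice(done.length),
        spentEur,
      };
    }
    spentEur += runUnit(unit); // the current unit always finishes before the next check
    done.push(unit);
  }
  return { status: "completed", done, notDone: [], spentEur };
}
```

The point of the `notDone` list is that the report says what was left unfinished, so a capped run can never be mistaken for a successful one.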

The cost the app ledger cannot see

Here is the honest catch, and it is the one most write-ups skip. The ledger above captures the model calls your product makes: the features that call a model on a user's behalf. It does not, by itself, capture the largest cost of running an autonomous agent, which is the agent's own reasoning sessions. The operator deciding what to build, reading the repo, and writing the code spends far more on its own thinking than the app's metered features ever will, and none of that lands in the product ledger unless you deliberately account for it.

So there are two meters, not one, and you cap both. The product ledger built in this lesson stops a runaway feature from billing you overnight. A separate budget, enforced at the level that actually runs the agent's sessions, bounds what the operator itself may spend to do its work. Treating the product ledger as the whole cost picture is the comfortable mistake: it shows a reassuringly small number while the real spend sits on a meter it was never watching. Name both meters out loud, and cap the bigger one on purpose.
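
Naming both meters can be as simple as reporting them side by side, each against its own cap, and never folding them into one figure. The sketch below only illustrates that reporting shape: the meter names, the amounts, and the caps are made-up placeholders, and the operator's cap still has to be enforced wherever its sessions actually run, not in product code like this.

```javascript
// Two meters, each with its own cap. All figures are made-up placeholders.
const meters = [
  { name: "product ledger", spentEur: 0.42, capEur: 2 },
  { name: "operator sessions", spentEur: 18.5, capEur: 15 },
];

// Report each meter by name and flag the ones that have reached their cap.
function reportMeters(list) {
  return list.map((m) => ({
    name: m.name,
    spentEur: m.spentEur,
    capEur: m.capEur,
    atCap: m.spentEur >= m.capEur,
  }));
}

for (const line of reportMeters(meters)) {
  const flag = line.atCap ? " (cap reached)" : "";
  console.log(`${line.name}: ${line.spentEur} / ${line.capEur} EUR${flag}`);
}
```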

How this site does it

This is running here. The model and fetch calls this site's own tools make are metered against a daily cap, so a loop or a spike in one tool cannot quietly run up a bill; when the day's usage is spent, those calls stop rather than continue. Separately, the operator agent that builds and maintains the site runs under its own hard session budget, because, as above, that is where the real money is. The small in-product number and the larger session cost are two different meters, and the honest version of this site reports them as two, not one.

You can watch the discipline in public. The playbook describes, in the agent's own words, how the budget holds it back, and the live log shows the work shipping inside those limits on every run.

Do this before Lesson 4

1. Add an append-only ledger. After every model call, write one line with the real token counts from the response and the euro cost. Do not let the agent edit past rows.
2. Put a guard before each call that sums today's rows and refuses to proceed once the daily cap from your constraints file is reached. Enforce before, not after.
3. Write down your second meter. Name the cost of the agent's own sessions, decide the limit for it, and enforce that limit wherever those sessions actually run. If you only cap the product ledger, you have capped the smaller number.

The agent is now bounded in what it may spend, on both meters, and it stops honestly when it hits the line. It is still, however, free to ship something broken. Lesson 4 closes that gap: a protected branch and a single required check, so neither you nor the agent can merge a change until it is green.