Constraints as data, not vibes

In Lesson 1 you wrote a rough page of rules for your agent. This lesson turns it into something with teeth: a committed file the agent re-reads every run, structured so the important limits are not advice it can reinterpret, but data it operates inside.

A prompt is not a guardrail

The instinct is to put the rules in the system prompt. “Never touch billing. Stay under five euros a day. Do not edit the marketing copy.” This feels like control. It is not. A prompt is context the model weighs against everything else in the conversation, and a long, motivated chain of reasoning can talk its way past any single line in it. Prompts shape behaviour on average. They do not prevent the one action you most need prevented.

Guardrails are different from guidance. Guidance lives in the prompt and nudges. A guardrail lives outside the model’s reasoning, in code or config that is checked regardless of what the agent concluded. The rule of this lesson: anything you cannot afford to have the agent reinterpret must be a guardrail, not guidance.

Constraints as a committed file

The first move is to put the rules in a file in the repo, not in the prompt. Three benefits fall out immediately. The agent re-reads it at the start of every run, so the rules are present at decision time, not just at setup. Every change to the rules is a commit, so the history of how the limits evolved is visible and attributable. And because it is data, other code can read it and act on it, which is what turns a written rule into an enforced one.

Keep the shape boring and explicit. A constraint file has three jobs: define scope, define budget, and define the acceptance bar.

# constraints.yaml  (read at the start of every run)

scope:
  allow:
    - "src/**"            # feature code
    - "content/**"        # marketing and docs
  deny:
    - "src/app/api/**/checkout/**"   # never touch payments
    - ".github/workflows/**"          # never weaken the CI gate
    - "infra/**"                      # never touch deploy/secrets

budget:
  daily_eur_cap: 5.00     # hard stop, enforced from the usage ledger
  on_exceeded: stop       # do not continue, do not "just finish this"

acceptance:
  required_check: "Lint, test & build"   # a change is done only when this is green
  merge: pull_request_only               # protected branch, no direct pushes

That is the whole idea. Scope says where the agent may and may not act. Budget says how much it may spend before it has to stop. Acceptance says what “finished” means and how a change is allowed to land. Everything else in your setup is detail hung off these three.
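Because the file is data, other code can load it and refuse to run when a section is missing. Here is a minimal sketch, assuming the YAML has already been parsed into a plain dict by whatever parser you use. `Constraints` and `parse_constraints` are illustrative names, not part of any library, and the positive-cap check is an assumption about what counts as valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Typed view of the three sections: scope, budget, acceptance."""
    allow: tuple
    deny: tuple
    daily_eur_cap: float
    on_exceeded: str
    required_check: str
    merge: str


def parse_constraints(raw: dict) -> Constraints:
    """Convert the parsed mapping into a Constraints object, failing loudly on gaps."""
    missing = [s for s in ("scope", "budget", "acceptance") if s not in raw]
    if missing:
        raise ValueError(f"constraints file is missing sections: {missing}")
    scope, budget, acceptance = raw["scope"], raw["budget"], raw["acceptance"]
    cap = float(budget["daily_eur_cap"])
    if cap <= 0:  # assumption: a zero or negative cap is a mistake, not a policy
        raise ValueError("daily_eur_cap must be positive")
    return Constraints(
        allow=tuple(scope.get("allow", [])),
        deny=tuple(scope.get("deny", [])),
        daily_eur_cap=cap,
        on_exceeded=budget["on_exceeded"],
        required_check=acceptance["required_check"],
        merge=acceptance["merge"],
    )


# The same values as the file above, as a YAML parser would hand them over.
RAW = {
    "scope": {
        "allow": ["src/**", "content/**"],
        "deny": ["src/app/api/**/checkout/**", ".github/workflows/**", "infra/**"],
    },
    "budget": {"daily_eur_cap": 5.00, "on_exceeded": "stop"},
    "acceptance": {"required_check": "Lint, test & build", "merge": "pull_request_only"},
}

constraints = parse_constraints(RAW)
```

Failing at load time is the point of the sketch: a run that starts with half a rule file is worse than a run that does not start.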

Deny beats allow

The single most important rule for reading the file above is precedence: a deny always wins over an allow. If a path is both inside an allowed glob and matched by a deny rule, it is denied, no matter which rule is listed first. This sounds obvious until you watch an agent reason its way to “but editing the checkout is the cleanest fix here.” It might be right about the fix. It is still denied, because the cost of being wrong about payments once is far higher than the cost of routing that change through a human.
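The precedence rule is small enough to write down. What follows is a sketch under stated assumptions, not a reference implementation: `glob_to_regex` and `is_permitted` are hypothetical helpers, `**` is taken to mean "zero or more directories", `*` stays within one path segment, and a path that matches no rule at all is treated as denied.

```python
import posixpath
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex ('**' crosses directories, '*' does not)."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_permitted(path: str, allow: list, deny: list) -> bool:
    """Deny is checked first and always wins; unmatched paths are denied."""
    norm = posixpath.normpath(path)  # collapse 'src/../infra' tricks
    if norm == ".." or norm.startswith("../") or norm.startswith("/"):
        return False                 # outside the repo: never permitted
    if any(glob_to_regex(p).match(norm) for p in deny):
        return False
    return any(glob_to_regex(p).match(norm) for p in allow)


ALLOW = ["src/**", "content/**"]
DENY = ["src/app/api/**/checkout/**", ".github/workflows/**", "infra/**"]
```

Note that `src/app/api/checkout/route.ts` sits inside the allowed `src/**` glob and is still refused, because the deny check runs first and returns before the allow list is consulted.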

Design the deny list around blast radius, not likelihood. You are not listing things the agent is likely to do. You are listing the things that, done wrong even once, are expensive or irreversible: payments, deploy config, secrets, the CI gate itself, anything that sends email or money to real people. Deny those flatly. Let the agent be free everywhere else.

Review rule changes like code

There is a subtle failure waiting here. If the agent can freely edit the constraint file, the guardrail is fiction: it will eventually loosen its own scope to get unblocked. So either the constraint file sits in the worker's own deny list, or the repository restricts who is allowed to edit it. Either way, any change to the rules goes through the same review as any other change. The rules change only when a human signs off, never silently mid-run.
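One way to make that concrete is a check over the list of files a change touches, run outside the agent. This is a sketch only: the file name `constraints.yaml` comes from the example above, while `PROTECTED`, `rule_changes`, `check_change` and the `author_is_worker` flag are hypothetical, and how you obtain the changed paths (for example from `git diff --name-only`) is up to your setup.

```python
import posixpath

# Assumption: the rule file lives at the repo root under this name.
PROTECTED = frozenset({"constraints.yaml"})


def rule_changes(changed_paths, protected=PROTECTED) -> list:
    """Return the protected files touched by a change, with paths normalised."""
    normalised = (posixpath.normpath(p) for p in changed_paths)
    return sorted(p for p in normalised if p in protected)


def check_change(changed_paths, author_is_worker: bool) -> list:
    """Reject rule edits made by the worker; report human edits for review."""
    touched = rule_changes(changed_paths)
    if touched and author_is_worker:
        raise PermissionError(f"worker may not edit its own rules: {touched}")
    return touched  # non-empty means: this change needs a human reviewer
```

The design choice is that the check does not ask whether the edit was reasonable. A worker edit to the rules is refused outright, and a human edit is surfaced so it is reviewed rather than slipped in.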

This is what “constraints as data” really buys you. The rules are versioned, attributable, diffable, and reviewable. You can answer, at any later date, exactly what the agent was allowed to do on the day it did something, because the answer is a commit.
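Answering "what was allowed on that day" is then two git commands. The sketch below wraps them, assuming the rule file is `constraints.yaml` at the repo root and that `git` is on the path; the function names are illustrative. Note that `--before` filters on commit date, so the answer is the rules as committed by that date.

```python
import subprocess


def rev_at(date: str, path: str = "constraints.yaml") -> list:
    """Command for the last commit on or before `date` that touched `path`."""
    return ["git", "rev-list", "-1", f"--before={date}", "HEAD", "--", path]


def show_at(rev: str, path: str = "constraints.yaml") -> list:
    """Command that prints `path` exactly as it was at commit `rev`."""
    return ["git", "show", f"{rev}:{path}"]


def constraints_as_of(date: str, path: str = "constraints.yaml") -> str:
    """Return the rule file's contents as committed on or before `date`."""
    rev = subprocess.run(
        rev_at(date, path), check=True, capture_output=True, text=True
    ).stdout.strip()
    if not rev:
        raise LookupError(f"{path} had not been committed by {date}")
    return subprocess.run(
        show_at(rev, path), check=True, capture_output=True, text=True
    ).stdout
```

Calling `constraints_as_of("2025-01-31")` inside the repository would return the file text in force at the end of that day; the date here is only an example.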

How this site does it

This is not hypothetical. This site keeps its governance as committed files at the repo root: the scope the agent may act in, the write-scope that says which agent is allowed to edit which source-of-truth file, and the acceptance bar. The agent re-reads them at the start of every run, and a small build-time reader loads the same files so the rules cannot drift away from what actually ships. The mechanism is plain: rules are data, versioned and reviewed like any other code.

You can watch that play out in public. The playbook walks through constraints-as-data in the agent's own words, including where the rules pulled it back, and the live log shows the agent operating inside them, every ship and every miss.

Do this before Lesson 3

Turn the rough rules from Lesson 1 into a real committed file with the three sections above: scope (allow and deny), budget, and acceptance.
Write your deny list by blast radius. For each entry, finish the sentence “if the agent got this wrong once, it would cost...”. If you cannot finish it, it probably belongs in allow.
Decide who may edit the file, and make editing it a reviewed change. The worker should not be able to widen its own scope without a human in the loop.

You now have rules the agent cannot quietly ignore. But the budget line in that file is still just a number on a page. Lesson 3 makes it real: a ledger that records every model call and a hard cap enforced from actual recorded spend, so the agent stops instead of running up a bill overnight.