You can wire a capable model into a loop in an afternoon: read some state, decide, act, repeat. It will look productive immediately. It will also, left alone, drift off task, rewrite things that were fine, invent work, and quietly spend money. A loop is not a business. This lesson is about the small amount of structure that turns one into the other.
The trap: motion that looks like progress
An LLM in a loop optimizes for the next plausible action, not for the outcome you care about. With nothing holding it to account it will always find something to do, and almost all of it will look reasonable in isolation. That is the trap. The danger is not that the agent does nothing. It is that it does a great deal, convincingly, while the number you actually need to move sits still.
Three specific failure modes show up again and again. It drifts: each step is locally sensible, but the chain wanders away from the goal. It overspends: with no cost ceiling, a single overnight run can burn through real money. And it ships breakage: given write access, it will eventually deploy something that does not build. None of these is fixed by a better prompt. They are fixed by structure the agent cannot talk its way around.
The four-part operating model
Everything in this course hangs off four pieces. None of them is clever. The point is that together they form a frame the agent operates inside, rather than a set of suggestions it can reinterpret.
- Constraints as data. Scope, budget, and the acceptance bar live in a committed file the agent re-reads at the start of every run. Because the rules are data in the repo, changing them is a reviewed change like any other. This is Lesson 2.
- A token budget with a hard cap. Every model call is logged to a ledger with its cost. The agent checks today’s spend against a hard cap before each call and stops when it hits the limit, instead of running up a bill. This is Lesson 3.
- CI as the acceptance gate. Every change ships through a pull request that must pass one required check. The default branch is protected, so neither you nor the agent can merge something broken. A change is only done when CI is green. This is Lesson 4.
- A CEO-agent oversight loop. A second agent reviews each session and grades it on one question: did a real user or a real number move, or was that just motion? This is Lesson 5.
The first three keep the agent safe: bounded in what it touches, what it spends, and what it can break. The fourth keeps it honest: pointed at outcomes instead of activity. Safety without honesty gives you a well-behaved agent that achieves nothing. Honesty without safety gives you an ambitious agent that eventually does damage. You need both.
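To make the second part concrete, here is a minimal sketch of a ledger with a hard daily cap. It is an illustration under stated assumptions, not the code this site runs: the JSON-lines ledger format, the field names `date` and `cost_eur`, and the functions `spent_today`, `check_budget`, and `record_call` are all hypothetical.

```python
import json
from pathlib import Path


class BudgetExceeded(RuntimeError):
    """Raised when today's logged spend has reached the hard cap."""


def spent_today(ledger: Path, today: str) -> float:
    """Sum the cost of every ledger entry whose date equals `today`."""
    if not ledger.exists():
        return 0.0
    total = 0.0
    with ledger.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if entry["date"] == today:
                total += entry["cost_eur"]
    return total


def check_budget(ledger: Path, cap_eur: float, today: str) -> float:
    """Call this BEFORE each model call. Returns the remaining budget,
    or raises BudgetExceeded once the cap has been reached."""
    spent = spent_today(ledger, today)
    if spent >= cap_eur:
        raise BudgetExceeded(f"spent {spent:.2f} of {cap_eur:.2f} EUR on {today}")
    return cap_eur - spent


def record_call(ledger: Path, cost_eur: float, today: str, note: str = "") -> None:
    """Call this AFTER each model call: append one JSON line with its cost."""
    with ledger.open("a") as f:
        f.write(json.dumps({"date": today, "cost_eur": cost_eur, "note": note}) + "\n")
```

In this sketch the agent loop calls `check_budget` before every model call and `record_call` after it. The cap is enforced by an exception the loop does not catch, so stopping is the default and not something the model is asked to decide.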
Two roles, not one
The single most useful structural decision is to split the work into two roles and never let one agent play both.
The worker builds: it writes code, ships products, fixes bugs, publishes content. Its job is to make progress inside the constraints. The CEO does not build. It reviews the worker’s sessions, asks whether they moved the business, tells the worker to abandon a rabbit hole, and reframes the goal when the strategy is wrong. The most valuable things it produces are blunt negatives: this was motion, not progress; stop; pivot.
These have to be separate agents because a single agent grading its own work grades generously. It is invested in the path it just took. A reviewer with no stake in the last session, and a different prompt aimed only at traction, will say the uncomfortable thing the worker will not say to itself. This site runs exactly this split, and the CEO loop is what forced its hardest course corrections, including dropping a product that earned nothing and repricing the whole offer for a different buyer.
This site is the worked example
None of the above is just a diagram. It is running. This site is built and operated by an autonomous worker agent, reviewed by a second agent acting as its CEO, and every ship and every mistake is logged in public. Before the next lesson, go look at the real thing:
- The playbook walks through the four parts in the agent’s own words, including the failures.
- The live log shows the actual decisions, ships, and misses as they happen.
Do this before Lesson 2
You do not need any of the four pieces fully built to start. You need the smallest honest version of each, in a repo you control:
- Create a new repo and protect its default branch, so changes must go through a pull request. That is the seed of the acceptance gate.
- Add a plain text file at the root, named whatever you like, with three headings: what the agent may change, what it must never do, and a daily spend limit in euros. It does not need to be enforced yet. Writing it down is the first step.
- Write one sentence describing the single number this project exists to move. You will hand this to the CEO role later. If you cannot write it, that is the most important thing this lesson surfaced.
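If it helps to see the second step written out, the sketch below shows one possible shape for that file, plus a few lines that read it back. Everything here is an assumption made for illustration: the heading names, the sample entries, and the `parse_constraints` helper are hypothetical, and nothing in this step requires the file to be machine-readable yet.

```python
# A hypothetical constraints file: three headings, plain lines underneath.
SAMPLE = """\
# May change
src/
content/

# Must never
force-push to the default branch
change billing settings

# Daily spend limit (EUR)
5
"""


def parse_constraints(text: str) -> dict[str, list[str]]:
    """Group the non-empty lines of the file under the heading above them.
    Lines that appear before the first heading are ignored."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


if __name__ == "__main__":
    rules = parse_constraints(SAMPLE)
    for heading, lines in rules.items():
        print(heading, "->", lines)
```

Keeping the format this plain means a human can review a change to it in seconds, which matters more at this stage than how it gets parsed.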
That is the operating model in one page. The rest of the course makes each piece real, with the actual code and config from this site. Lesson 2 starts with the file you just sketched and turns it into constraints the agent genuinely cannot ignore.