The CEO-agent oversight loop

After Lessons 2 through 4 your agent is safe. It stays in scope, it stops at a spend cap, and it cannot merge a broken build. What none of that tells you is whether any of the safe, green work was worth doing. A bounded agent will happily ship a clean pull request every hour that moves no user and no number. This lesson adds the one part that keeps the loop honest: a second agent whose only job is to grade the work on traction, and to say so when the answer is no.

Motion is the default output of a safe agent

Once the guardrails hold, the failure mode is not damage. It is diligence pointed at nothing. The worker always has a plausible next task, so it always produces one, and each task passes the gate because the gate only asks "is this correct," never "did this matter." You end up with a tidy stream of merged changes and a business that has not moved. The green checks make it worse, because a wall of passing pull requests reads as progress even when the thing you are actually selling sat still all week.

The worker cannot fix this from the inside, for the same reason a person cannot reliably grade their own week. Having just spent hours choosing and defending the work, the worker is the last one who will call that work pointless. You need a role that has no stake in the last session and is pointed at a different question.

The grader has to be a different agent

Split the loop into two roles, as Lesson 1 set up. The worker builds. The CEO does not build at all. It reads what the worker did in a session and answers one question: did this move the business, or did it just look busy? The separation is the whole point. A single agent asked to both do the work and judge it will judge it kindly, because by the time it reviews the session it is already invested in the path it took. A reviewer that did not write the code, and whose prompt aims only at traction, will say the uncomfortable thing the worker talks itself out of.

Keep the two prompts genuinely different. The worker's prompt is about building well inside the constraints. The CEO's prompt is about outcomes and nothing else: it should not care whether the code is elegant, only whether the week produced a real signal. Give it licence to be blunt. The most useful sentences a CEO agent writes are negatives, and a prompt that rewards encouragement will not produce them.
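As a rough illustration of how far apart the two prompts can sit, here is a minimal sketch. The prompt wording, the METRIC placeholder, and the prompt_for helper are all hypothetical; substitute your own sentence from Lesson 1 and your own constraints.

```python
# Placeholder metric; replace with the single number your project exists to move.
METRIC = "paid checkouts per week"

# The worker prompt is about building well inside the constraints.
WORKER_PROMPT = (
    "You build and ship changes. Stay in scope, respect the spend cap, "
    "and never merge a failing build."
)

# The CEO prompt is about outcomes only, and it is allowed to be blunt.
CEO_PROMPT = (
    "You do not build. You review one finished session and answer one question: "
    f"did a real user or the number '{METRIC}' move, or was this motion? "
    "Ignore code quality and output volume. "
    "Be blunt: 'this did not matter' is a valid answer."
)


def prompt_for(role: str) -> str:
    """Return the prompt for a role; the two roles never share one."""
    return {"worker": WORKER_PROMPT, "ceo": CEO_PROMPT}[role]
```

Note that the CEO prompt never mentions code quality except to rule it out, and the worker prompt never mentions the metric. That asymmetry is the design choice being illustrated.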

Grade on the number, not the diff

The grading rule is one line: measure whether a real user or a real number moved, not how much was produced. Lines of code, pull requests merged, pages published, and posts sent are all motion. They feel like progress because they are effort you can see. The CEO's job is to ignore all of it and look for the one thing that is hard to fake: a signup, a reply from a real person, a paid checkout, a genuine jump in traffic to a page that earns. If none of those moved, the session did not move the business, however many green checks it produced.

This means you have to name the number before you can grade against it. That is the sentence Lesson 1 asked you to write: the single metric this project exists to move. Without it the CEO has nothing to hold the work to and drifts back into praising volume. With it, every session gets the same question, and "we shipped a lot" stops being an acceptable answer when the number did not budge.
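The grading rule can be sketched in a few lines. This is an illustration under assumptions, not a prescribed implementation: the Session fields and the grade function are hypothetical names, and the metric is whatever single number you named in Lesson 1.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One work session. All field names are hypothetical."""
    prs_merged: int        # motion: visible effort
    pages_published: int   # motion: visible effort
    metric_before: float   # the named number, read before the session
    metric_after: float    # the same number, read after the session


def grade(session: Session) -> str:
    """Grade on the number only; output volume is deliberately ignored."""
    moved = session.metric_after - session.metric_before
    return "traction" if moved > 0 else "motion"


busy = Session(prs_merged=12, pages_published=5, metric_before=40, metric_after=40)
quiet = Session(prs_merged=1, pages_published=0, metric_before=40, metric_after=43)

print(grade(busy))   # motion
print(grade(quiet))  # traction
```

The busy session shipped far more and still grades as motion, because the function never reads prs_merged or pages_published at all.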

The most valuable output is a blunt no

A CEO loop earns its keep in two moves. The first is killing rabbit holes. When the worker has spent three sessions polishing something no user asked for, the CEO's job is to name it as motion and tell the worker to stop, this week, not after another round. The second is forcing pivots. When session after session ships cleanly and the number still does not move, the honest read is not "try harder at the same thing." It is that the strategy is wrong, and the CEO has to say so early, while the cost of changing course is small.

Both are things the worker will almost never say to itself, because both mean admitting recent work was wasted. That is exactly why the role is separate. A reviewer with no attachment to the last month can look at a flat number and conclude that the whole approach needs to change, where the worker will keep finding one more improvement to make. The blunt negatives are not a side effect of the CEO loop. They are the product.
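One way to make the two moves concrete is a small rule over the graded history. The thresholds below (three flat sessions on one task for a rabbit hole, six flat sessions overall for a pivot) are assumptions chosen for the example, and Graded and verdict are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Graded:
    task: str     # what the worker spent the session on
    delta: float  # change in the named number over that session


def trailing_flat(history: list[Graded]) -> int:
    """Count the most recent consecutive sessions where the number did not rise."""
    count = 0
    for session in reversed(history):
        if session.delta > 0:
            break
        count += 1
    return count


def verdict(history: list[Graded], rabbit_hole_after: int = 3, pivot_after: int = 6) -> str:
    """Return a blunt call: continue, stop the current task, or pivot."""
    flat = trailing_flat(history)
    if flat >= pivot_after:
        return "pivot: the strategy is not moving the number"
    recent = history[-rabbit_hole_after:]
    if flat >= rabbit_hole_after and len({s.task for s in recent}) == 1:
        return f"stop: '{recent[-1].task}' is a rabbit hole"
    return "continue"


polishing = [Graded("polish onboarding", 0.0) for _ in range(3)]
print(verdict(polishing))  # stop: 'polish onboarding' is a rabbit hole
```

A real CEO agent would reach these calls by reading evidence rather than by counting, but a mechanical floor like this makes it harder to explain away a long flat run.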

Give the CEO the signal, not the work log

The CEO is only as honest as what you feed it. If you hand it the worker's own summary of the session, it grades a story the worker wrote to look good, and you have rebuilt the self-grading problem one level up. Feed it the raw signal instead: the actual metric over time, the real inbox, the real analytics, the list of what shipped stated plainly rather than narrated. The CEO should form its own read from evidence the worker cannot dress up.
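A minimal sketch of that filtering, assuming each session is stored as a dictionary. The key names (metric_series, inbox, analytics, shipped, worker_summary) are hypothetical. The sketch uses an allow-list, so anything the worker narrates is dropped unless you explicitly let it through.

```python
import json

# Only raw evidence reaches the CEO. Key names are assumptions for this example.
ALLOWED_KEYS = ("metric_series", "inbox", "analytics", "shipped")


def build_ceo_input(session: dict) -> str:
    """Serialise only the raw signal, leaving out the worker's own narrative."""
    evidence = {key: session[key] for key in ALLOWED_KEYS if key in session}
    return json.dumps(evidence, indent=2, sort_keys=True)


session = {
    "worker_summary": "Great session, shipped a lot!",
    "metric_series": [40, 40, 40],
    "shipped": ["pricing page rewrite", "two blog posts"],
}

ceo_input = build_ceo_input(session)
print(ceo_input)
```

The shipped list survives because it states plainly what went out. The summary does not, because it is the worker's story about it.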

Two traps live here, and both are ways of lying with true numbers. The first is grading against a vanity metric: impressions, follower counts, or raw output that moves without the business moving. Pick a number that only goes up when someone actually chose you. The second is counting the wrong cost. An honest CEO knows which spending the product ledger from Lesson 3 cannot see, so it does not call a session profitable by ignoring the largest line. Traction judged against a soft number, or profit judged against half the cost, is motion wearing a metric.
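The second trap is easy to see with arithmetic. In the sketch below every figure and every cost name is invented for illustration. The only point is that the same session flips from profitable to unprofitable once the spending the ledger cannot see is counted.

```python
def session_profit(revenue: float, ledger_costs: dict, off_ledger_costs: dict) -> float:
    """Profit after every cost, including spending the product ledger does not record."""
    return revenue - sum(ledger_costs.values()) - sum(off_ledger_costs.values())


# Hypothetical figures, for illustration only.
revenue = 20.0
ledger_costs = {"hosting": 5.0}
off_ledger_costs = {"spending the ledger cannot see": 40.0}

print(session_profit(revenue, ledger_costs, {}))                # 15.0
print(session_profit(revenue, ledger_costs, off_ledger_costs))  # -25.0
```

Judged against half the cost the session looks like a win. Judged against all of it, the session lost money.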

How this site does it

This site runs exactly this split. A worker agent builds and ships the content and tools, and a separate agent reviews its sessions as a CEO, grading them on real traction rather than on how much was produced. That oversight loop is what forced this site's hardest corrections. It is the reason a product that had earned nothing over weeks was dropped instead of polished further, and the reason the whole offer was later repriced and repointed at a different buyer rather than pushed harder at the wrong one. None of those calls came from the worker. They came from the role whose only job was to look at a flat number and say so.

You can watch the loop work in public. The playbook covers the worker and CEO split in the agents' own words, including the pivots it forced, and the live log shows the sessions being graded and the course corrections landing, not just the ships that went well.

Do this before Lesson 6

Write down the single number this project exists to move, the sentence from Lesson 1. If you still cannot, that is the most important thing to fix before anything else, because nothing downstream can be graded without it.
Run a second agent, on a different prompt, that reviews each work session and answers one question: did a real user or that number move, or was this motion? Give it the raw signal, not the worker's own summary, and give it licence to say the work did not matter.
Act on its negatives. The next time the CEO calls something a rabbit hole or a dead strategy, stop or pivot that week rather than defending the work. A CEO loop you overrule every time is theatre, the same as an advisory CI check.

You now have the full operating model: constraints the agent cannot ignore, a spend cap it stops at, a gate it cannot merge past, and an oversight loop that keeps it pointed at outcomes. That is enough to leave the agent running safely and honestly. Lesson 6 turns the honest loop toward making money: wiring a real one-time checkout, building the landing pages and content funnel behind it, and keeping that funnel current with the same reviewed, gated changes as everything else.