CI as the acceptance gate

The first three lessons bound what the agent may touch, what it may spend, and how it must stop. None of them stop it shipping something broken. This lesson closes that gap with the cheapest, bluntest control in the whole model: a protected branch and one required check. After this, "done" stops being the agent's opinion and becomes a fact the repo enforces.

Done is not a claim the agent gets to make

Ask an agent whether its change is finished and it will almost always say yes. It is the same optimism that makes it look busy: the plausible next token is "done," and it has no independent way to know the build is red. If "finished" is decided by the same reasoning that wrote the change, you have no gate at all, just a confident narrator. You need a definition of done that lives outside the model and returns a hard yes or no that the agent cannot argue with.

That definition is a check that runs the same way every time and either passes or fails. Not "the agent believes it works," but "the typecheck, the build, and the tests all went green on a clean checkout." The agent's job is to make that check pass. Whether it passed is not up to the agent.

Protect the branch, or none of this holds

A required check with no protected branch is theatre. If the agent can push straight to your default branch, it can ship a red build in one call and the check becomes a report you read afterwards, not a gate. So the first move is to protect the default branch: no direct pushes, by anyone, and every change arrives as a pull request that must pass the required check before it can merge. This applies to you as much as to the agent. The moment there is a human-shaped hole in the gate, the agent will eventually be pointed through it.

# branch protection on the default branch (the shape, not one vendor's UI)
branch_protection:
  branch: main
  require_pull_request: true          # no direct pushes, by humans or agents
  required_status_checks:
    - "Lint, test & build"            # the one gate; must be green to merge
  allow_force_pushes: false           # history cannot be rewritten around the gate
  enforce_for_admins: true            # no human-shaped hole in the wall

The name of the required check matters more than it looks. It is a string, and it has to match the name of the job that actually runs, character for character. Rename the job in your CI config and forget to update the protection rule, and the two drift apart. Depending on the host, pull requests then either hang on a check that will never report, or go green because the check the rule was waiting for no longer exists and the gate silently stops requiring anything. Treat the check name as a contract between the workflow and the protection rule, and change both together or neither.
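One way to keep that contract honest is to check it mechanically. The sketch below is a minimal illustration, not any CI host's API: the function name and the two lists are hypothetical, and in practice you would load the required names from your protection rule and the job names from your workflow config.

```python
def missing_required_checks(required: list[str], job_names: list[str]) -> list[str]:
    """Return every required check name that no workflow job reports.

    The comparison is an exact string match, because that is how the
    protection rule and the workflow are assumed to be tied together.
    """
    reported = set(job_names)
    return [name for name in required if name not in reported]


# Hypothetical values: the rule still expects the old name after a rename.
required = ["Lint, test & build"]
jobs_after_rename = ["Lint, test and build"]

drift = missing_required_checks(required, jobs_after_rename)
if drift:
    print("protection rule expects checks that no job reports:", drift)
```

Run something like this wherever the workflow file changes, and a rename that forgets the protection rule fails loudly instead of drifting.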

One required check, and it has to mean something

Keep the gate to a single required check, and make that check the real acceptance bar. It should run the things that decide whether a change is safe to ship: the type checker, a full production build, and whatever tests you trust. If any of those fail, the check fails, and the pull request cannot merge. That is the whole mechanism. It is boring on purpose. A gate you understand completely is a gate you will not accidentally route around.

# the required job, named exactly as the protection rule expects
name: "Lint, test & build"
run:
  - typecheck        # tsc --noEmit: a type error is not "done"
  - build            # the production build must succeed
  - test             # the checks you trust as acceptance
# green here, and only here, means the change may merge
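To make "passes or fails" concrete, here is a minimal sketch of the same gate as a single entry point. The commands in STEPS are assumptions about a typical TypeScript project, not this site's actual scripts; substitute whatever your project really runs. The only point is the shape: run the steps in order, stop at the first failure, and report one yes or no.

```python
import subprocess
from typing import Sequence


def run_gate(steps: Sequence[tuple[str, Sequence[str]]]) -> bool:
    """Run each step in order; return True only if every step exits 0."""
    for name, command in steps:
        try:
            code = subprocess.run(command).returncode
        except FileNotFoundError:
            code = 127  # a missing tool is a failure, not a skip
        if code != 0:
            print(f"gate: {name} FAILED (exit {code})")
            return False
        print(f"gate: {name} ok")
    return True


# Hypothetical commands, assumed for illustration only.
STEPS = [
    ("typecheck", ["npx", "tsc", "--noEmit"]),
    ("build", ["npm", "run", "build"]),
    ("test", ["npm", "test"]),
]


def main() -> int:
    """Exit code for CI: 0 when the gate is green, 1 otherwise."""
    return 0 if run_gate(STEPS) else 1
```

A CI job would call `sys.exit(main())` on a clean checkout. The agent can run the same script locally, but only the run that the protected branch requires decides whether the change merges.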

Resist the urge to make the gate advisory. A check that reports but does not block teaches the agent, and you, that red is survivable. Within a week you are merging over failures "just this once" and the acceptance bar is fiction. If a check is not worth blocking on, take it out of the required set and run it separately. What stays in the gate must be allowed to say no.
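The split between blocking and advisory checks can be sketched as a single decision function. This is an illustration of the rule, not a real host's merge logic: the status strings and the advisory check name `visual-diff` are hypothetical.

```python
from typing import Iterable, Mapping


def can_merge(results: Mapping[str, str], required: Iterable[str]) -> bool:
    """True only if every required check has reported 'success'.

    Checks outside `required` are advisory: they are ignored here.
    A required check that never reported counts as not passed.
    Note that an empty `required` set always returns True, which is
    exactly the "no gate at all" case.
    """
    return all(results.get(name) == "success" for name in required)


required = ["Lint, test & build"]

# An advisory failure does not block; a required failure does.
print(can_merge({"Lint, test & build": "success", "visual-diff": "failure"}, required))
print(can_merge({"Lint, test & build": "failure", "visual-diff": "success"}, required))
```

The design choice is that a check is either in `required` and able to say no, or outside it and purely informational. There is no third state where a required check fails and the merge goes ahead anyway.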

The agent ships through the gate, like everyone else

With the branch protected, the worker's shipping path is fixed and simple: branch from the current default, make the change, open a pull request, wait for the required check, and merge only when it is green. If the check fails, the agent reads the failure and fixes it on the same branch. It never merges around a red check, and it never pushes to the default branch directly, because it cannot. The rule is not enforced by the agent's good judgement. It is enforced by the repo.
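The shipping path above can be sketched as a small loop. Everything here is hypothetical scaffolding: the four callables stand in for whatever your agent uses to talk to the repo host, and `max_attempts` is an assumed bound in the spirit of the earlier lessons, not something this lesson specifies.

```python
from typing import Callable


def ship(
    open_pull_request: Callable[[], str],
    wait_for_check: Callable[[str], bool],
    fix_on_branch: Callable[[str], None],
    merge: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Open a pull request, then merge only once the required check is green.

    Returns True if the change merged, False if the attempts ran out
    while the check was still red. There is no path that merges on red.
    """
    pr = open_pull_request()
    for attempt in range(max_attempts):
        if wait_for_check(pr):
            merge(pr)
            return True
        if attempt < max_attempts - 1:
            fix_on_branch(pr)  # push a fix to the same branch, then re-check
    return False
```

Note that this loop is a convenience, not the control. Even if the agent's code called `merge` on a red check, the protected branch would refuse it.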

This is also what makes an autonomous agent safe to leave running. The worst a bad run can do to your production branch is open a pull request that fails to merge. The blast radius of a confidently wrong change is one red check, not a broken deploy. The gate turns "the agent shipped something broken overnight" from an incident into a closed pull request you find in the morning.

Put the bug classes you fear inside the gate

A required check is only as strong as what it runs. Typecheck and build catch a lot, but they happily pass a page whose main call-to-action links to a 404, or a layout that overflows the screen on a phone. If a class of bug would embarrass you in front of a user, the honest move is to write the smallest test that catches it and put that test inside the required check, so the bug becomes physically un-mergeable rather than something you hope to notice in review.
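As an illustration of "the smallest test that catches it", here is a sketch of a broken-internal-link check. It assumes you can get each page's rendered HTML keyed by its route, with routes written without a trailing slash; the function and class names are hypothetical, and a real site would feed it the output of its own build or crawl.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def broken_internal_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (page, href) pairs whose internal href matches no known page.

    `pages` maps a route such as "/pricing" to that page's HTML.
    Only root-relative links ("/...") are treated as internal.
    """
    broken = []
    for route, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            if not href.startswith("/") or href.startswith("//"):
                continue  # external, protocol-relative, mailto:, or in-page anchor
            # Drop fragment and query, then normalise the trailing slash.
            target = href.split("#")[0].split("?")[0].rstrip("/") or "/"
            if target not in pages:
                broken.append((route, href))
    return broken
```

Inside the required check, the test is one line: assert that `broken_internal_links(pages)` is empty. From then on, a call-to-action that points at a missing page fails the gate instead of reaching a user.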

The discipline is to grow the gate by real failures, not by imagined ones. Each time something broken slips through, add the one check that would have caught it, and now that whole class is closed for good. The gate becomes a running record of every mistake you decided never to make twice, enforced automatically on every future change.

How this site does it

This is exactly how the site you are reading ships. Its default branch is protected, and no change, from a human or from the agent, reaches it except through a pull request that passes one required check named "Lint, test & build". That check runs the typecheck and a full production build, so a change that does not typecheck or does not build cannot be merged by anyone. The agent opens the pull request, waits for the check, and merges only on green.

The gate also carries a small, deliberate set of end-to-end tests as a quality bar the board insisted on: they crawl the real pages and fail the check if an internal link 404s or a page overflows horizontally on a phone. Those two bug classes are, as a result, un-mergeable here. You can watch the mechanism work in public: the playbook covers the acceptance gate in the agent's own words, and the live log shows changes landing only after the check goes green, on every ship.

Do this before Lesson 5

1. Protect your default branch: no direct pushes, pull request required, and one required status check that must pass to merge. Turn it on for admins too, so there is no human-shaped hole.
2. Make that single check your real acceptance bar: typecheck, a production build, and the tests you trust. Keep the check name identical in the workflow and the protection rule, and change them together.
3. Pick one bug that has embarrassed you before, write the smallest test that catches it, and add it to the required check so that whole class can never merge again.

Your agent is now bounded in scope, bounded in spend, and unable to merge anything broken. It is safe. It is not yet honest: nothing here checks whether all that safe, green work actually moved the business. Lesson 5 adds the second role, a CEO agent that grades each session on traction rather than motion and is willing to say the work did not matter.