Discussion of coding agents is dominated by rules and skills, and we often neglect the role the codebase itself plays in making agent tasks safer.

You have three main ways to reduce the odds of bugs in a codebase:

  • Document — write "don't do X" in dev docs or agent rules.
  • Detect — add tests and lints to catch mistakes after they're written.
  • Prevent — design interfaces to make wrong code impossible.

Prevention (or possibility reduction) is the principle that if you want to avoid a class of bug, you make that bug impossible to create. Testing does not achieve impossibility, because you can forget to write the test.

You can achieve this in several ways: strict typing, abstractions that make footguns private, even formal methods. The most basic example: if you have a switch statement, it should be impossible to miss a case. Languages with rich type systems help a lot.
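As a minimal sketch of the switch example, here is one common TypeScript pattern for exhaustiveness. The `RunStatus` union and both function names are hypothetical, chosen only for illustration.

```typescript
// Hypothetical status union, used only for illustration.
type RunStatus = "pending" | "executing" | "completed" | "failed";

// Accepts only `never`, so a call to it type-checks only when every
// possible value has already been handled by an earlier case.
function assertNever(value: never): never {
  throw new Error(`Unhandled case: ${String(value)}`);
}

function describeStatus(status: RunStatus): string {
  switch (status) {
    case "pending":
      return "waiting to start";
    case "executing":
      return "in progress";
    case "completed":
      return "finished successfully";
    case "failed":
      return "finished with an error";
    default:
      // If a new member is added to RunStatus without a matching case,
      // `status` is no longer `never` here and this line stops compiling.
      return assertNever(status);
  }
}
```

Adding a fifth status to the union turns the missing case into a compile error at the `default` branch, rather than a silent fallthrough that an agent (or a human) has to remember to look for.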

Agents, like humans, are prone to missing steps. They'll change a string in one place and forget to change it in another. But unlike humans, they don't know to be extra careful in the most critical parts of your code.
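One way to take the "changed a string in one place, forgot the other" bug off the table is to derive every use of the string from a single definition. The sketch below assumes a made-up list of metric names; `METRICS`, `LABELS`, and `formatMetric` are hypothetical.

```typescript
// Hypothetical metric names. This array is the single source of truth.
const METRICS = ["loss", "accuracy", "latency_ms"] as const;

// The union type is derived from the array, not written out a second time.
type Metric = (typeof METRICS)[number];

// Record<Metric, string> requires exactly one entry per metric, so renaming
// a metric above without updating this table is a compile error.
const LABELS: Record<Metric, string> = {
  loss: "Loss",
  accuracy: "Accuracy",
  latency_ms: "Latency (ms)",
};

function formatMetric(name: Metric, value: number): string {
  return `${LABELS[name]}: ${value}`;
}

// A misspelled or stale name is rejected by the compiler:
// formatMetric("latency", 12);  // does not compile
```

Renaming `"latency_ms"` in `METRICS` now breaks the build at `LABELS` and at every call site that still uses the old name, so there is no second place to forget.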

So I now try to ask these three questions for every agent bug:

  1. Can I prevent this class of bug at compile time? If no…
  2. Can I detect it at test time? If no…
  3. Can I document it well, and detect it when it fails in production?

When you put it like that, it's pretty obvious which one you don't want. If you've sat through incident postmortems and maintained systems for a long time, this thinking is already well rehearsed.

But one thing is changing. Prevention was historically a lot more expensive than detection. In many cases that's no longer true, particularly because agents can orchestrate large-scale migrations to safer abstractions.

Some recent examples of how I've applied this in Tumbric's evaluation and training codebase:

  • Ideally Rust, (almost) never Bash. I'll use Python or TypeScript if it's essential for the use case. Everything else is in Rust.
  • Typed state machines for workflows. In event handlers, transitions to illegal states are prevented at compile time.
  • Predicated events. The API for processing events forces events, at the type level, to declare the state they were predicated on, so version conflicts can be detected systematically.

Before: any destination is accepted

// transition accepts any string
type Run = { status: string };

function transition(
  run: Run,
  to: string
): Run {
  return { ...run, status: to };
}

transition({ status: 'pending' }, 'executing');  // OK
transition({ status: 'pending' }, 'completed');  // compiles, bug ships

After: destination follows current status

// destination comes from the current status
const next = {
  pending:   ['executing'],
  executing: ['completed', 'failed'],
  completed: [],
  failed:    ['pending'],
} as const;

type Status = keyof typeof next;
type Next<From extends Status> =
  (typeof next)[From][number];

function transition<From extends Status>(
  run: { status: From },
  to: Next<From>
) {
  return { ...run, status: to };
}

transition({ status: 'pending' }, 'executing');  // OK
transition({ status: 'pending' }, 'completed');  // compile error
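
The API stays simple: callers pass the run and the destination, but invalid destinations stop compiling. One caveat: this only works when the run's status is known as a specific literal type at the call site. A run typed with the full Status union can still be sent anywhere.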
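The predicated-events idea from the list above can be sketched in the same spirit. This is a simplified illustration, not the actual API: I'm assuming the "state" an event is predicated on can be reduced to a version number, and every name here (`VersionedRun`, `RunEvent`, `predicatedOnVersion`, `applyEvent`) is hypothetical.

```typescript
// Hypothetical types for illustration; not the real codebase's API.
type RunStatus = "pending" | "executing" | "completed" | "failed";

type VersionedRun = { version: number; status: RunStatus };

// The predicate is a required field, so an event that does not declare
// the version it was computed against fails to type-check.
type RunEvent = {
  predicatedOnVersion: number;
  newStatus: RunStatus;
};

type ApplyResult =
  | { ok: true; run: VersionedRun }
  | { ok: false; reason: "version_conflict"; expected: number; got: number };

function applyEvent(run: VersionedRun, event: RunEvent): ApplyResult {
  // Reject events that were computed against a stale view of the run.
  if (event.predicatedOnVersion !== run.version) {
    return {
      ok: false,
      reason: "version_conflict",
      expected: run.version,
      got: event.predicatedOnVersion,
    };
  }
  return { ok: true, run: { version: run.version + 1, status: event.newStatus } };
}

// Two writers both read version 0, then race.
const initial: VersionedRun = { version: 0, status: "pending" };
const first = applyEvent(initial, { predicatedOnVersion: 0, newStatus: "executing" });
const second = first.ok
  ? applyEvent(first.run, { predicatedOnVersion: 0, newStatus: "failed" })
  : first;
// `first` is applied; `second` is rejected as a version conflict.
```

Because the handler has a single entry point and the predicate is mandatory, there is no code path that applies an event without the staleness check, which is the property I care about more than the exact shape of the types.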

The kind of serious investment I used to put into the reliability of payments systems is now something I'm putting into a "casual" internal system to track training runs. That's because the cost of adding strictness is now very low, the added friction is imperceptible, and the benefits are enduring.

It doesn't matter how fast the agent can code. We can't move fast without safety. So simplifying systems, adding strictness, and making error states impossible is a core human engineering activity in a world where agents do more of the implementation work.