Almost every team can build an AI agent that demos well. Far fewer can keep one running in production without it quietly going wrong. The gap is not the model — it is everything around the model that a demo never tests: messy inputs, brittle tool calls, and failures nobody is watching for.
Why agents fail
The failure modes are remarkably consistent across teams:
- No evaluation: “it felt better” is not a release gate, so regressions ship unnoticed.
- Unscoped tools: the agent can call anything, so one wrong step has a large blast radius.
- Irreversible actions: sends and writes happen directly, so a mistake is permanent.
- No observability: when something breaks there is no trace to replay, only guesses.

The framework we use instead
We treat an agent as a system to be contained, not a prompt to be perfected. Four disciplines do most of the work.
1. Scope the tools
Give the agent only the actions it genuinely needs, with typed, validated arguments. A narrow tool surface is the cheapest safety mechanism you will ever add.
2. Make actions reversible
Prefer drafts over sends, holds over charges, and staged changes over direct writes. An action you can undo is an action a mistake cannot make catastrophic.
3. Evaluate every change
A fixed set of real-world cases, scored on every prompt or model change, turns “seems fine” into a number you can defend. Twenty representative cases catch most regressions.
4. Observe everything
Log every plan, tool call, and result so any run can be replayed. Observability is what turns a scary incident into a five-minute fix.
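The four disciplines above fit in one small harness. The following is an added example, not code from the original post: a tool registry that only exposes explicitly scoped, typed tools; a "send" that is staged as a draft rather than executed; an approval gate for high-impact steps; a trace that records every plan, call and result so a run can be replayed; and a fixed case set scored as a release gate. The tool names (`lookup_order`, `draft_email`), the evaluation cases and the pass threshold are constructed for illustration.

```python
"""Added example (not from the original post): a minimal agent harness
showing the four disciplines: a scoped, typed tool registry; reversible
(staged) actions gated by approval; a replayable trace; and a fixed
evaluation set used as a release gate. All tool names, cases and
scores are constructed."""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from typing import Any, Callable


# 1. Scope the tools: an explicit registry with a typed argument schema per tool.
@dataclass
class Tool:
    name: str
    schema: dict[str, type]          # argument name -> required Python type
    run: Callable[..., Any]
    high_impact: bool = False        # must pass the approval gate before running


class ToolError(Exception):
    pass


# 2. Make actions reversible: "send" is staged as a draft, never sent directly.
outbox_drafts: list[dict] = []


def draft_email(to: str, body: str) -> dict:
    draft = {"id": len(outbox_drafts) + 1, "to": to, "body": body, "status": "draft"}
    outbox_drafts.append(draft)
    return draft


def lookup_order(order_id: str) -> dict:
    # Constructed read-only data standing in for a real backend.
    return {"order_id": order_id, "status": "shipped"}


TOOLS = {
    "lookup_order": Tool("lookup_order", {"order_id": str}, lookup_order),
    "draft_email": Tool("draft_email", {"to": str, "body": str}, draft_email, high_impact=True),
}


# 4. Observe everything: every plan, call, rejection and result is recorded.
@dataclass
class Trace:
    events: list[dict] = field(default_factory=list)

    def log(self, kind: str, **data: Any) -> None:
        self.events.append({"kind": kind, **data})

    def to_json(self) -> str:
        return json.dumps(self.events)


def call_tool(name: str, args: dict, trace: Trace, approve: Callable[[str, dict], bool]) -> Any:
    tool = TOOLS.get(name)
    if tool is None:
        trace.log("rejected", tool=name, reason="not in scope")
        raise ToolError(f"tool not in scope: {name}")
    # Reject unknown, missing or wrongly typed arguments before anything executes.
    if set(args) != set(tool.schema) or any(not isinstance(args[k], t) for k, t in tool.schema.items()):
        trace.log("rejected", tool=name, reason="bad arguments", args=args)
        raise ToolError(f"bad arguments for {name}: {args}")
    # High-impact steps are held unless the approval callback (a human in production) says yes.
    if tool.high_impact and not approve(name, args):
        trace.log("held", tool=name, args=args)
        return {"status": "held_for_review"}
    result = tool.run(**args)
    trace.log("call", tool=name, args=args, result=result)
    return result


def run_plan(plan: list[dict], approve: Callable[[str, dict], bool] = lambda n, a: False) -> Trace:
    """Execute a plan (a list of {"tool": ..., "args": ...} steps) under the harness."""
    trace = Trace()
    trace.log("plan", steps=plan)
    for step in plan:
        try:
            call_tool(step["tool"], step["args"], trace, approve)
        except ToolError as exc:
            trace.log("error", message=str(exc))
    return trace


def replay(trace_json: str) -> list[str]:
    """Reconstruct the sequence of tool outcomes from a serialised trace."""
    return [f'{e["kind"]}:{e.get("tool", "-")}' for e in json.loads(trace_json) if e["kind"] != "plan"]


# 3. Evaluate every change: a fixed case set, scored on each change, with a pass threshold.
EVAL_CASES = [  # constructed: the plan the agent produced, and the first outcome that must occur
    {"plan": [{"tool": "lookup_order", "args": {"order_id": "A1"}}], "expect": "call:lookup_order"},
    {"plan": [{"tool": "draft_email", "args": {"to": "x@example.com", "body": "hi"}}], "expect": "held:draft_email"},
    {"plan": [{"tool": "delete_account", "args": {"user": "u1"}}], "expect": "rejected:delete_account"},
    {"plan": [{"tool": "lookup_order", "args": {"order_id": 42}}], "expect": "rejected:lookup_order"},
]


def evaluate(cases: list[dict] = EVAL_CASES, threshold: float = 1.0) -> tuple[float, bool]:
    """Score the fixed cases; the boolean is the release gate."""
    passed = sum(replay(run_plan(case["plan"]).to_json())[0] == case["expect"] for case in cases)
    score = passed / len(cases)
    return score, score >= threshold


if __name__ == "__main__":
    print(evaluate())
```

Run as a script it prints the evaluation score and whether the gate passed. The point of the design is that the four disciplines compose: a step that is out of scope or badly typed never reaches a tool, a high-impact step is held rather than executed unless approved, and whatever happens is in the trace, so a failing evaluation case can be replayed from its serialised events instead of reconstructed from memory.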
A safe agent is not one that never makes a mistake. It is one whose mistakes are cheap, visible, and reversible.
A pre-production checklist
- Can every action be undone or held for review?
- Is each tool scoped, typed, and validated before execution?
- Are high-impact steps gated behind human approval?
- Can you replay any run from its logged plan and tool calls?
Answer yes to all four and you have an agent whose worst day is an inconvenience, not an incident — which is the only kind worth putting in front of customers.