We're still learning in real time what safe agent deployment requires. That knowledge exists in pockets but hasn't propagated across the industry or most practitioner communities. This is my working model.
Most agent failures aren't algorithmic. The agent did exactly what it was told, against definitions that had already drifted from reality. No one had a ceremony in place to catch it.
This is a framework for knowing which threshold you've crossed and what infrastructure each side requires.
Assist, Propose, and Execute are distinct risk classifications with distinct organizational prerequisites. Most teams treat the move from one to the next as incremental. It is a threshold, and the consequences of misclassifying which side you're on don't announce themselves immediately.
Assist: The agent observes, surfaces patterns, and presents options. Humans decide. No write access. Blast radius stays local.
Propose: The agent generates a specific recommendation. A human approves before anything executes. The approval gate is only as reliable as the semantic coherence at the boundary. Most approval processes do not evaluate this.
Execute: The agent acts without a human gate. Blast radius is bounded by organizational infrastructure, not intent. Execute mode granted without semantic infrastructure is Propose mode without the review.
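The three modes above can be read as a gate on write actions. The following is a minimal sketch of that reading, not a reference implementation; the names `Mode` and `may_act` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Mode(Enum):
    ASSIST = "assist"    # observes and presents options; no write access
    PROPOSE = "propose"  # recommends; a human approves before anything executes
    EXECUTE = "execute"  # acts without a human gate


def may_act(mode: Mode, human_approved: bool = False) -> bool:
    """Return True if the agent may perform a write action in this mode."""
    if mode is Mode.ASSIST:
        return False           # humans decide and act; the agent never writes
    if mode is Mode.PROPOSE:
        return human_approved  # nothing executes without explicit approval
    return True                # Execute: no gate, so safety rests on infrastructure
```

Note what the gate cannot see: in Propose mode it checks that an approval happened, not that the approver and the agent share the same definitions.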
The most common failure pattern is misclassification: teams operating in Execute mode without the infrastructure Execute mode requires. The agent is succeeding at the wrong task. The failure occurred when someone decided it was ready.
Five organizational ceremonies determine whether autonomous execution is safe, and one of them is the precondition for all the others. Most organizations have never run it.
Semantic Maintenance: Glossary review, vocabulary drift audits, cross-team definition synchronization. This is the mechanism that keeps every other ceremony evaluating against reality. Without it, the other four operate against definitions that may have already drifted. Most organizations have not named this ceremony. Those that skip it do not notice until the gap becomes a failure.
Backlog grooming, dependency mapping, sprint synthesis. Definitions are stable. Agents operate without human gates and the assumptions hold, provided Semantic Maintenance is running underneath.
Incident synthesis, retrospective analysis, test coverage mapping. Pattern recognition at scale: the highest signal-to-noise advantage agents have over human review. Stable context, bounded scope.
Deployment automation, PR review, rollback sequencing. Blast radius is high enough that human approval remains necessary at key gates, even with strong semantic foundations in place.
OKR synthesis, roadmap decisions, cross-org prioritization. Definitions are too unstable and stakes too high for autonomous action. This is where premature execution causes the most damage, and where that damage is hardest to attribute to the agent.
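One way to picture the relationship between the five ceremonies is a lookup table with a single precondition. The sketch below is an interpretation rather than the framework's official matrix: the labels `planning`, `analysis`, `delivery`, and `strategy` are hypothetical stand-ins for the four dependent ceremonies, and the ceiling assigned to each is my reading of the descriptions above.

```python
from enum import IntEnum


class Mode(IntEnum):
    ASSIST = 1
    PROPOSE = 2
    EXECUTE = 3


# Hypothetical labels; each ceiling is an assumption drawn from the descriptions above.
CEILING = {
    "planning": Mode.EXECUTE,  # backlog grooming, dependency mapping, sprint synthesis
    "analysis": Mode.EXECUTE,  # incident synthesis, retrospectives, test coverage mapping
    "delivery": Mode.PROPOSE,  # deployment automation, PR review, rollback sequencing
    "strategy": Mode.ASSIST,   # OKR synthesis, roadmaps (assumed: no autonomous action)
}


def allowed_mode(ceremony: str, semantic_maintenance_running: bool) -> Mode:
    """Highest mode an agent should hold for a ceremony, under the assumptions above."""
    if not semantic_maintenance_running:
        # Without the precondition, only the mode that needs no
        # semantic infrastructure remains safe.
        return Mode.ASSIST
    return CEILING[ceremony]
```

The design choice worth noticing is that Semantic Maintenance is not a row in the table. It is the argument that decides whether the table can be trusted at all.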
An agent that executes perfectly against a stale directive is succeeding at the wrong task. The failure occurred when someone decided the directive was still valid, with no ceremony in place to verify that assumption.
Wall-E compacts trash for 700 years after "clean up Earth" stopped being an appropriate directive. The algorithm is intact; the world changed. No one maintained the definition of what the task was for. This is contextual mismatch at scale: organizational, not algorithmic.
Wall-E: Executes reliably within bounded, stable contexts. Safe to automate fully.
EVE: Navigates emerging context. Proposes and does not act until confirmed.
Captain: Requires semantic investment before autonomous execution is safe.
These personas map directly to the capability matrix. Wall-E = Execute-ready domains. EVE = Propose + review. Captain = requires semantic investment before autonomy is safe.
Most organizations that believe they are running Execute-mode agents are running Execute-mode permissions on Propose-mode infrastructure, with no ceremony in place to detect the gap.
No glossary review. No drift audit. Terms like "production-ready" are inconsistent across team boundaries. The agent executes correctly against definitions that are already wrong. This is the Wall-E pattern: the hardest failure to catch precisely because everything appears to be working.
The agent proposes, a human approves. The approval process does not evaluate semantic coherence. Humans are signing off on recommendations without shared definitions of what those recommendations mean. Vocabulary drift compounds beneath the surface.
Summarization, synthesis, recommendations with human sign-off. This is the only classification that does not require semantic infrastructure to be safe. The risk is treating it as a temporary stop rather than the foundation every other mode depends on.
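The gap between granted permissions and supporting infrastructure can be made explicit with a small check. This is a sketch under simplifying assumptions: the two boolean fields on `Infrastructure` and the functions `supported_mode` and `classification_gap` are hypothetical, and a real assessment would involve far more than two flags.

```python
from dataclasses import dataclass
from typing import Optional

MODES = ["assist", "propose", "execute"]  # ordered from least to most autonomous


@dataclass
class Infrastructure:
    semantic_maintenance: bool       # glossary review and drift audits actually run
    approval_checks_semantics: bool  # approvers verify shared definitions, not only the change


def supported_mode(infra: Infrastructure) -> str:
    """Highest mode the infrastructure can carry (assumed rule, for illustration)."""
    if infra.semantic_maintenance:
        return "execute"
    if infra.approval_checks_semantics:
        return "propose"
    return "assist"  # the only mode that needs no semantic infrastructure


def classification_gap(granted: str, infra: Infrastructure) -> Optional[str]:
    """Describe the mismatch if permissions exceed what the infrastructure supports."""
    supported = supported_mode(infra)
    if MODES.index(granted) > MODES.index(supported):
        return f"{granted}-mode permissions on {supported}-mode infrastructure"
    return None
```

For example, `classification_gap("execute", Infrastructure(False, True))` returns `"execute-mode permissions on propose-mode infrastructure"`, which is the misclassification described above.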
The question that determines which scenario you're in: If your agent executes a task correctly against your current definitions, and those definitions have already drifted from what your teams actually mean, how would you know? What ceremony do you have that would catch it?
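A drift audit can start very small. The sketch below assumes each team keeps a written glossary as a simple term-to-definition mapping, which many teams do not; `drift_audit` is a hypothetical helper, and plain string comparison only catches divergence that has been written down.

```python
def drift_audit(glossaries: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return the terms whose definitions differ across teams.

    glossaries maps team name -> {term: definition}. The result maps each
    drifted term -> {team: normalized definition}.
    """
    by_term: dict[str, dict[str, str]] = {}
    for team, glossary in glossaries.items():
        for term, definition in glossary.items():
            # Normalize whitespace and case so trivial differences are not flagged.
            by_term.setdefault(term, {})[team] = definition.strip().lower()
    return {
        term: definitions
        for term, definitions in by_term.items()
        if len(set(definitions.values())) > 1
    }
```

Run against two teams that each define "production-ready" differently, it returns that term with both definitions side by side. The output is a starting agenda for a human glossary review, not a replacement for one.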
The framework above defines the threshold. Each page below applies it.