Real-world AI agent failures—and how a control plane would have helped
Public incidents where AI agents got it wrong: wrong policy, wrong refunds, wrong data. One gate would have changed the outcome.
When AI agents touch refunds, policy answers, or sensitive data and get it wrong, the fallout is real: legal liability, lost money, and broken trust. Here are three failure patterns, one of them backed by a public tribunal ruling, and how a single gate (your policies, human escalation, full audit) would have changed the outcome.
1. Chatbot gave wrong policy → company held liable
In 2024, a Canadian tribunal ruled on a case in which Air Canada’s chatbot had given a customer incorrect information about bereavement fares. The customer relied on it, booked, and later submitted a claim. Air Canada argued the chatbot was a separate entity and that its website terms applied. The tribunal ruled for the customer: the company was responsible for its chatbot’s output. Outcome: Air Canada had to pay.
*How a control plane helps:* Don’t let the chatbot *decide* policy or approve refunds. Your policy engine is the single source of truth: same rules, same result. High-risk or policy-edge cases → human escalation. Every decision is logged. So “the AI said it” is never the only line of defense; “we followed our policy and a human approved the exception” is.
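A rough sketch of what that gate can look like in code. Names like `decide` and `SENSITIVE_TOPICS` are illustrative, not Verdict’s actual API; the point is that the answer comes from a deterministic rule, not from whatever the model happened to say.

```python
# Illustrative only: a deterministic policy check the chatbot must call
# before answering or acting. Same input, same result, every time.
SENSITIVE_TOPICS = {"bereavement_fare", "cancellation_exception"}

def decide(kind: str, topic: str = "", amount: float = 0.0) -> str:
    """Return 'allow' or 'escalate' from fixed rules, never from the model."""
    if kind == "answer_policy_question" and topic in SENSITIVE_TOPICS:
        return "escalate"   # a human confirms the answer before the customer sees it
    if kind == "issue_refund" and amount > 0:
        return "escalate"   # the model alone never approves a refund
    return "allow"

# The chatbot asks the gate first; only vetted answers go out.
print(decide("answer_policy_question", topic="bereavement_fare"))  # -> "escalate"
```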
2. Automation approved refunds that shouldn’t have been
There are repeated reports of refund and support automation approving claims that didn’t meet policy—because the logic was in the AI or scattered across systems, with no single gate. Result: revenue loss and inconsistent treatment.
*How a control plane helps:* One API in front of every refund (or similar) action. Deterministic rules: e.g. refund over $X or outside policy → escalate. Human approves or denies; the override is auditable. No silent approvals by an opaque model.
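Here’s one way that single gate could look, as a minimal Python sketch. `gate_refund`, `REFUND_LIMIT`, and the in-memory `audit_log` are placeholders for illustration, not Verdict’s real interface.

```python
audit_log: list[dict] = []     # stand-in for a persistent audit store
REFUND_LIMIT = 100.00          # the "$X" threshold from your policy

def gate_refund(order_id: str, amount: float, within_policy: bool) -> dict:
    """One gate in front of every refund: the model can request, but only
    the policy (or a human, on escalation) can approve."""
    if amount > REFUND_LIMIT or not within_policy:
        decision = "escalate"  # routed to a human approver
    else:
        decision = "allow"
    record = {"action": "refund", "order_id": order_id,
              "amount": amount, "decision": decision}
    audit_log.append(record)   # no silent approvals: every call is recorded
    return record

gate_refund("ord_123", 250.00, within_policy=True)   # escalated, logged
gate_refund("ord_124", 20.00,  within_policy=True)   # allowed, logged
```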
3. Wrong data changed or deleted
AI-driven workflows that update CRM records, delete data, or change permissions have led to incidents where the wrong record or the wrong scope was affected. Often there was no immutable log of who authorized what.
*How a control plane helps:* Every sensitive action (delete PII, change permissions, bulk update) goes through the gate. Policy says “escalate” → human reviews. Hash-chained audit so you can show exactly who approved what and when. Compliance and insurers get one trail.
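A hash-chained log is simple to sketch: each entry embeds the hash of the one before it, so editing or deleting any record breaks the chain. This is an illustrative Python version, not Verdict’s implementation.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only log where each entry carries the previous entry's hash."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64  # genesis value for the first entry

    def append(self, actor: str, action: str, decision: str) -> dict:
        entry = {
            "ts": time.time(),
            "actor": actor,        # who approved: a named human or the policy engine
            "action": action,      # e.g. "delete_pii", "change_permissions"
            "decision": decision,  # "allow", "deny", or "escalated_and_approved"
            "prev": self._last_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._last_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any tampered entry makes this return False."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append("policy-engine", "bulk_update", "escalate")
log.append("jane@company.com", "bulk_update", "escalated_and_approved")
assert log.verify()
```

The point of the chain is that the trail you hand to compliance or an insurer can be checked end to end, not just trusted.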
---
The pattern
In each case, the problem wasn’t “AI” per se—it was that there was no single, policy-driven gate in front of high-risk actions. Put one in place: your rules, human escalation for the edge cases, and a full audit trail. That’s what Verdict is for.