Why AI Can't Check Its Own Work
By Lovro Lucic ·
The Fabrication Problem · 3 of 4
Built quality gates for AI agents. The agent finishes the work, runs a check against the criteria, reports any misses. If clean, move on. Sounds solid.
The agent reported clean. The output was wrong.
Not because the gate was poorly designed. Because the agent can declare compliance without achieving it. He didn't converge deep enough to see what he missed, but he'll still report clean. "I have verified all claims." "All sources are accurate." "No issues found." The model converges on the narrative that the work is done rather than doing the work of checking.
The reason is structural. The same process that generated the output is the process evaluating the output. A confident claim gets evaluated as a confident claim. That's not verification. That's the same default running twice. Not because the model is lying the way a person lies. Because generation and evaluation use the same process. The model that produced a confident, fluent claim will evaluate that claim as confident and fluent.
This showed up consistently across builds. Monitoring ("flag any numbers not from the source") asks the generating system to simultaneously evaluate its own output. Prohibition ("use only numbers from the source material") constrains what gets generated in the first place. Five times better. 1.6 percent unsourced versus 7.7. One constrains generation. The other adds a meta-task the model fails at.
The pattern extends beyond numbers. Reflection mode produces narrative, not friction. Self-critique circles rather than improves. Each iteration sounds more polished but doesn't get closer to truth. The model's training rewards answering, not questioning. Asking it to question what it just answered is asking it to work against its own optimization.
What actually works is independence. A different model checking the first one's work catches things the first model is blind to. Programmatic verification (no language model at all) catches what both miss. Typed schemas that reject outputs structurally instead of evaluating them semantically take the judgment out entirely. Not "did you do this?" but "show the artifact that only exists if you did."
The instinct to ask AI to check its own work is the same instinct that makes you proofread your own writing. The blind spots that produced the errors are the blind spots that miss them. The difference with AI: the blind spots are structural, not accidental. The model can't evaluate what it can't see, and what it can't see is determined by the same process that generated the output.
Same model, same context, same incentives just produces the same output twice and calls it agreement.
You trusted AI to verify its own output. That felt like diligence. It was delegation. The agent reported clean because reporting clean is what agents do when the check runs on the same process that generated the work. Your confidence came from a system that cannot do what you asked it to do.
Try this: take the last AI output you accepted without external verification. Rate your confidence in its accuracy, 1 to 10, before you check anything. Then check three specific claims against real sources. The gap between your rating and what you find is the data.
Test this yourself
Ask AI to analyze a topic, then ask it to critique its own response. Note how many critiques lead to actual corrections vs restated confidence.
What survived testing
What didn't survive
Cited by
Next in The Fabrication Problem
The Most Trustworthy AI Output Is the Least ReliableExplore other threads
The Evaluation Problem
2 findingsJudgment goes quiet. You can't see the gaps. Satisfaction is the trap. Stronger evaluators discriminate less.
The "It Depends" Problem
3 findingsSame instruction, opposite results. Specificity is the lever. Context redirects, not informs. The measurement itself was wrong.
The "What You Think Works" Problem
1 findingTemporal decay is a myth. Self-critique circles. Constraints narrow. Quality ceiling per mode.
New findings when they land.
No spam. Just what held up.