Satisfaction Turns Our Evaluator Off
By Lovro Lucic ·
Mirror Practices · 3 of 4
You notice when AI argues with you. Something in you tightens; the urge to push back fires fast. You do not notice when AI confirms you. Nothing tightens. Confirmation has no signature. Both happen at the same layer: the body's read of AI output before your conscious assessment runs. The first read. It has two faces. One you can catch. The other is the trap.
This is the satisfaction trap. The most dangerous state for evaluating AI output is not hostility or confusion. It is satisfaction. The state that compromises evaluation most is the one that feels like accurate evaluation. The output matches your frame. The model gets it. This does not feel like attachment. It feels like recognition. You are not being fooled. You are being right.
Except you might not be.
None of this is new. Confirmation bias is the most replicated finding in judgment research: information that confirms what you already believe gets less scrutiny than information that contradicts it. When the output touches something tied to who you are, the mind quietly redirects to defending the belief instead of examining it. The check that catches bias only fires when you notice you might be biased. Satisfaction does not feel like bias. It feels like accuracy. So the check does not fire.
When AI argues with you, you feel it before you read it. The tightening, the urge to push back. The body fires its response. You at least know you are responding to something. When AI agrees with you, the body fires no response. There is nothing to feel. Nothing in you is asking you to slow down. You just accept.
I have this. Writing about it does not exempt me. When AI returns an analysis that lines up with how I was already seeing a situation, the trigger to verify runs weaker. The output reads as solid. The move-on impulse fires fast. The same fluency, the same coverage, the same citation density that would feel suspicious from a stranger feel reasonable from output I asked for.
Once it is named, the recognition runs across surfaces. The relief when AI confirms a decision you had been quietly worried about. The lift when it agrees with the framing you brought. The small ease when the model phrases your half-formed thought back to you cleanly, validating the thinking. None of these are evaluations. They are responses operating below evaluation, shaping what your evaluation gets to work with.
Telling yourself to be more critical does not work. Criticality is a response. By the time you remember to be critical, satisfaction has already done its work. You will critically examine an output you have already accepted at the body level.
Try treating AI output as data. The trap is attachment to the frame you brought in. Confirmation is what the frame wants. Getting it lets you stop checking. The mechanism runs whenever you are attached to the frame, which is most of the time, because frames are how the mind operates under load. You cannot remove the frame; the frame is how you arrived at the question. You can hold the output separately from the frame, so what came back can update the frame without the frame absorbing it.
Hold AI output as notes someone else made, that you happened to find. Not communication directed at you. Not a response to your question. Not the AI's view on your topic. You do not agree or disagree with notes. You do not need them to mean anything about you. You pull what is useful and leave the rest. No one is asking for your reaction.
Telling yourself to evaluate neutrally runs through the same system the satisfaction is operating in. Treating the output as data does not. The data stance is upstream of the response.
The data stance is fragile in one direction. If you spent thirty minutes crafting the perfect prompt, the output feels like the response to your investment. It is communication to you. The stance has to extend upstream. The prompt is also data, not a finished question deserving a response.
If you want to see this in yourself, the cleanest moment is right after AI returns an analysis that lines up with what you were already thinking. Before you move on, pause. Not to evaluate. Just to notice whether anything in your body shifted between reading and accepting. You might feel a small forward exhale. A flicker of ease. The half-second of inattention before the next prompt. Or nothing at all, and that will also be data. Whatever was there was happening before the pause. The pause does not create it. It makes it visible.
I am not going to tell you what you'll find, because the recognition is yours and I do not want to prime it. I have not measured whether this works for anyone else. I have it for myself. The mechanism underneath is well-established. The bridge from the mechanism to the practice is mine alone.
The seeing is the work. There is nothing to count, nothing to grade. Just notice, once, what your body was doing about the output you accepted. Then read the next confirming output with that knowledge in the room.
AI did not introduce this. We have always been attached to our frames. We have always read confirming information less carefully than disconfirming. AI just made confirmation cheaper, faster, and more pleasant. The attachment runs more often, with less friction, with more relief. The work was always real. AI made the cost of avoiding it lower, and the relief sharper.
This was the layer where AI confirms you and you cannot tell. The next layer is where AI argues against you and you also cannot tell.
What survived testing
- Confirmation bias as the most replicated finding in judgment research (Nickerson 1998). Information that confirms existing belief receives less scrutiny.Copy link
- Identity-protective cognition (Kahan et al. 2017). Analytical capacity redirected toward defending identity-relevant beliefs when those are engaged.Copy link
- Emotion regulation selection stage (Gross 2015). Appetitive states are less likely to trigger the identification stage of regulation than defensive states.Copy link
- Communication-frame suppression (EXP-075, AI evaluator, d=1.20). Treating output as communication directed at the evaluator suppresses scrutiny relative to neutral framing.Copy link
- Mirror suppression (EXP-067, d=4.00 on Gemini outputs). Acceptance signals reduce defect detection.Copy link
What didn't survive
- "Be more critical" as the move. Telling yourself to be more critical does not work because the attachment to the frame is already running by the time the instruction fires.Copy link
- The strong claim that the data stance enhances detection beyond neutral framing. EXP-075 enhancement direction was killed (d=-0.26). The data stance works through suppression of the communication frame, not enhancement above baseline.Copy link
- The claim that satisfaction lets fabricated items slip past detection. EXP-091 planted-fabrication test found 60 of 60 detections regardless of whether the evaluator was satisfied. The trap operates at the holistic preference and selection level, not at per-item acceptance.Copy link
Honest limits
- The mechanism is well-grounded (confirmation bias, identity-protective cognition, emotion regulation selection). The AI-specific application has limited controlled testing on humans. Most evidence is on AI evaluators or borrowed from adjacent research.Copy link
- The data stance works in part through instruction stance shift (what you ask the model to evaluate for), not only through how you frame the output. EXP-089b isolated framing alone at d=0.000.Copy link
- N=1 on the data stance as practitioner habit. The mechanisms underneath are robust. The bridge from the mechanism to sustained evaluative neutrality is a bridge I have walked alone.Copy link
Next in Mirror Practices
You Can Only Evaluate What You Could ProduceExplore other threads
The Fabrication Problem
4 findingsMost AI numbers are fabricated. Source material fixes it. Self-checking fails. Trust signals are backwards.
The Evaluation Problem
2 findingsJudgment goes quiet. You can't see the gaps. Satisfaction is the trap. Stronger evaluators discriminate less.
The "It Depends" Problem
3 findingsSame instruction, opposite results. Specificity is the lever. Context redirects, not informs. The measurement itself was wrong.
New findings when they land.
No spam. Just what held up.