MechanismTested March 2026 · Multiple models· 7 min

Stop Calling It Hallucination

By Lovro Lucic · Apr 25, 2026

A model cites "Smith et al. 2019, page 47." Confident, specific, and entirely invented. That fake citation is one of six different things people file under "hallucination," and each one needs a different fix.

The single fix the one word implies lands on one of the six and misses the other five. A more useful way to see them: a model with no tools is always doing the same thing, sampling the next token from what its training and the prompt make likely. There is no separate "making it up" mode for a fix to switch off. What gets called a hallucination is that ordinary process producing something ungrounded, and ungrounded output comes in shapes. The shape is what tells you which fix has a chance. Six are worth being able to name.

Confabulation. The model invents specific facts. A date, a statistic, a citation that doesn't exist. There was nothing to ground it against, so what came out is the most likely-looking date or citation, not a checked one. The output sounds confident because confidence is the default format, not because the model checked anything. The fix: paste source material before the instruction. One line: "Use only data from the source above." Fabrication drops from majority rates to single digits in the source-grounding tests, a direction that replicates across three model families. Not because the model switches into a lookup mode (with no tools it has no fact database to consult, no step where it checks a source and succeeds) but because the real number, now the most relevant thing in the window, becomes the most probable thing to write. You changed what it samples from. Not moderation. Replacement.

Contamination. The output sounds like a template. Phrasing that isn't yours, structure that feels borrowed. With nothing specific from you to condition on, the most probable continuation is the most generic one, and that is what you get. Asking for the opposite does little: "be original" carries no content for the model to be original with, and raising the temperature just samples the same distribution less tightly. The lever that has a chance is the one you use everywhere else, conditioning on something specific: your own material, or a concrete voice to imitate (paste two paragraphs you want it to sound like). One honesty flag: unlike the confabulation fix, this one isn't measured here. It is the principle (condition on what you supply) applied to style, not a tested result.

Drift. Quality degrades as the output gets longer. The first paragraph is sharp. The fifth is filler. Each token conditions the next, so small errors can compound over a long generation. Early anchors fade as the context window fills with the model's own output. The fix: shorter outputs with periodic re-anchoring. Break a long generation into stages. Restate the key constraint at each stage. Long outputs are structurally riskier than short ones regardless of the model.

Interference. Related concepts bleed into each other. You asked about Python the language, and snake metaphors appeared. You asked about a company's Q3 results, and Q2 data contaminated the analysis. The likely mechanism: features in the model's representation share dimensions, so related concepts partially overlap when they activate. That is inferred from the architecture, not directly observed. The fix: disambiguate in the prompt. "Python programming language, not the animal." "Q3 FY2026 only, not prior quarters." The more specific the boundary, the less bleed.

Wrong task. The format is right, the content is wrong. You asked for a critical analysis and got a summary. You asked for a poem and got an explanation of poetry. The prompt carried more than one task pattern and the model ran with the wrong one. The fix: frame the task type explicitly. "This is a critical analysis, not a summary. Evaluate, do not describe." The clearer the task frame, the less ambiguity about which pattern the model follows.

Over-refusal. The model refuses a perfectly valid request. You asked how encryption works for a security course. The model pattern-matched "explain how to [something]" to its refusal training and declined. The fix: reframe with benign intent. "This is for educational purposes in a graduate-level security course." Or reformulate the topic to avoid the pattern the refusal was trained on. The model isn't weighing your intent and withholding. It pattern-matched the request to its refusal training, even though the request is legitimate.

Six shapes, and the fixes do not transfer between them. Add source material and invented numbers collapse, but the borrowed phrasing of a contaminated output survives every line of it, because source material was never what produced the phrasing. Contamination, if it has a fix, needs a different input than source data: a specific voice to condition on, and unlike the source fix that one isn't measured here. "Fix hallucination" is not an instruction. "Add source material to kill invented numbers" is, and it addresses exactly one of the six.

The diagnosis takes 5 seconds. Invented facts: confabulation. Template language: contamination. Quality degrades with length: drift. Related concepts mixing: interference. Right format, wrong task: wrong task. Refuses valid request: over-refusal. Name it, then fix it.

One word, one fix. It lands on one of the six, so the other five look like the model failing. They aren't. The word is doing the damage: a single label hands over a single tool, and five problems walk straight past it. The model was never the variable. The vocabulary was.

Your challenge: take the last AI output that fell flat. Which of the six was it? Name the shape before reaching for a fix. If it doesn't have a name in five seconds, any fix is a guess. The naming is the whole game.

The six types are a lens on what these models produce, not a map of six separate machines inside them. They earn their keep by pointing you at a fix, not by explaining the architecture. The confabulation fix builds on the source conditioning recipe described in Source Conditioning. The attribution errors underneath all six types, the reason the model takes the blame instead of the word, are the same attribution errors explored in The Attribution Error and The System Layer. The word "hallucination" is an attribution error applied to the model's output. The real error isn't the model, and it isn't you. It's the word.

Evidence: Ji et al. 2023 (NLG hallucination survey), Elhage et al. 2022 (superposition), OR-Bench 2024 (over-refusal benchmark), plus 8 source grounding sub-experiments across 3 model families.

What survived testing

Confabulation fix (source grounding + prohibition) replicated across 3 model familiesCopy link
Over-refusal documented in 3 independent research programsCopy link
Interference consistent with superposition (Elhage et al. 2022, a toy-model result)Copy link

What didn't survive

"Hallucination" as a useful diagnostic category. The word conflates failures that need opposite interventions.Copy link
The assumption that one fix addresses all failure types.Copy link

Honest limits

Confabulation is the best-measured type. The other five have published evidence for the mechanism but less controlled measurement of the intervention effectiveness.Copy link
Drift is argued mechanistically (each token conditions the next), not from a controlled length-quality measurement.Copy link
The boundary between types is not always clean. A long output might show drift AND confabulation. The dominant type determines the primary fix.Copy link
These categories describe April 2026 model behavior. New architectures may introduce new failure types or resolve existing ones.Copy link
## Run the demos yourselfCopy link
The replication kit at [/receipts/stop-calling-it-hallucination](/receipts/stop-calling-it-hallucination) has verbatim prompts, both fictional datasets, actual outputs from a fresh run, and the design history of why each demo is shaped the way it is. Three of the six types above (role framing, template contamination, confabulation with and without prohibition) have full replication beats.Copy link
If you'd rather watch first: a 3-minute video walks through the three demos.Copy link

Receipts

Next in The Model Mechanics

How AI Makes You More Wrong With More Analysis

Explore other threads

New findings when they land.

No spam. Just what held up.

Stop Calling It Hallucination

What survived testing

What didn't survive

Honest limits

The Fabrication Problem

The Evaluation Problem

The "It Depends" Problem

New findings when they land.