You Can Only Evaluate What You Could Produce
By Lovro Lucic
Mirror Practices · 4 of 4
Twice in two days last week, I caught myself defending a position I could not rebuild. The framing came from AI. The path was never built.
This is borrowed certainty. The kind of conclusion you defend not because you worked it through, but because by the time anyone asks, it sits in your head as your position. It is the difference between builders who grow with AI and builders who plateau. The plateau is not about how much AI we use. It is about how we use our own cognition.
Take something AI generated for you this week. An analysis. A recommendation. A strategy document. A product spec. Something you used for real work, not a demo. Explain it. Out loud, walking, to yourself. To the AI in a fresh session, without leaning on the document. To a colleague if you have one. Why this approach and not another. What the key trade-offs were. What would change the conclusion.
Where you produce clearly, the thinking is yours. Where you reach for the original, or say "the analysis showed that..." or "AI recommended...", you are carrying borrowed understanding. The words were delivered. The thinking wasn't.
The construction trace is why. When you generate, you build a mental model as you go. You feel the hard parts. You notice the gaps. You know what good looks like because you struggled to produce it. When AI generates, you skip the struggle. The output arrives fully formed. The deep check that catches bad reasoning under pressure requires that mental model, and only generation builds it.
The wrong move after seeing this is to try to own everything yourself. Cognition is finite. Choose.
Own what compounds. Taste. Judgment. Problem framing. Direction. The call about what matters in this specific situation. None of it is verifiable from outside, which is exactly why it needs you. Every session in which you spend cognition here trains the skill, and the gains compound.
Delegate what does not compound. Syntax. Grammar. Mechanical execution. Well-defined transformations. Anything cheap to verify against ground truth. AI does these, often better, definitely faster. Cognition spent here trains nothing that will not be cheap to verify next year.
The pattern most of us fall into is the inverse. We let AI decide what matters and spend our cognition checking the punctuation. We delegate the direction and keep the mechanical. The thinking goes generic. The decisions go average. Ship rate goes up. Growth rate does not. The why is usually invisible from inside the pattern.
Pick three AI outputs from this week. Run the explanation pass. For every claim you stall on, decide: is this a part I want my mental model on, or a part I am happy to carry as borrowed? When you use a borrowed conclusion next, mark it borrowed in real time, even just to yourself. The discrimination is the practice. With reps, you stop trying to own everything and stop letting AI decide everything. It becomes instinct.
Within two weeks the work shifts. Strategy docs you can defend without reaching for the source. Product decisions where the framing is yours and the execution is delegated and labeled. Build cycles where you put cognition on the problem and let AI handle the well-defined execution. Community decisions where the direction is yours and the wordcraft is delegated. Meetings where you say "I am carrying this from AI. Here is my own reasoning on the part I worked through, and here is the part I have not," and the conversation moves forward. Calls you used to lose by reversal now hold up, because you only commit to what you have built a model for.
You will know it is working when you catch yourself reaching for a borrowed conclusion and either reconstruct it before using it, or use it labeled. The pattern that breaks: defending a conclusion you cannot reconstruct. If that keeps happening, the discrimination has not landed yet. Pick which side.
This was the discipline of choosing where your cognition goes. Whichever side you choose, AI amplifies what you bring to it.
What survived testing
- Generation effect on encoding (Slamecka and Graf 1978). Generating produces deeper encoding than reading. 86-experiment meta-analysis (Bertsch et al. 2007) confirms robustness across word lists, sentences, and complex material.
- Self-explanation effect (Chi et al. 1989). Generating explanations while studying produces 2 to 3x learning over passive reading.
- Ironies of automation (Bainbridge 1983). The more you automate the easy parts, the more critical the remaining human role becomes, and the less practiced the human is for it.
- The construction trace as the mechanism behind evaluation depth. What you generated, you can evaluate deeply. What was delivered to you, you can only check on the surface.
What didn't survive
- "Always generate first" as universal prescription, and its mirror, "delegate everything to AI." Both miss the discrimination move. The practice is generating what compounds for you and delegating what does not.
- "Borrowed is bad." Borrowed is fine when labeled. The failure is borrowed-mistaken-for-yours.
- Anchoring risk on generate-first (Tversky and Kahneman 1974) is real and unresolved at the controlled-test level. The mitigation: use the construction trace for structural evaluation (framing, completeness, what's missing), not content comparison.
Honest limits
- The piece gives the principle of choosing what to own, not what specifically should compound for you. That depends on what you are building toward. Problem framing over syntax for one practitioner. Thesis over formatting for another. Strategy over execution for a third. Community design over message drafting for a fourth.
- The ownership test is self-report and a rough proxy, not a precise measurement.
- N=1 on the practice itself. The construction trace mechanism is established cognitive science.