Skip to content
All findings
ExperimentTested March 2026 · xAI / Gemini· 4 min

Same Technique, Opposite Results

By Lovro Lucic ·

The "It Depends" Problem · 1 of 3

Two tasks. Same AI model. Same approach. One task needs a precise answer. The other needs creative exploration.

The approach helped the first task immediately. The output got more specific, more grounded, more precise. Measurably so, and the effect held across different AI models. The same approach on the second task made the output worse.

Not "helped less." Damaged. The structured approach that produced precision on convergent problems actively harmed exploratory ones. Identical technique. Opposite results. Cross-generator validated on xAI and Gemini Flash.

There is no universally good prompt. No best practice that works everywhere. The task type determines whether a technique helps or hurts, and most people don't distinguish task types before choosing their approach.

The mechanism: when a task has a known answer and needs precision, structure concentrates the model's output distribution. It narrows toward the right region. Focused, specific, hitting the target.

When a task requires exploration, the same concentration collapses the search space. The model needs to spread across possibilities, consider non-obvious angles, resist premature convergence. Structure forces it to organize before it's explored. The output gets tidier and shallower. 100 percent compliance with the instruction. Narrower range. Less discovery.

Two types of AI users fall on either side of this split.

The first reads every output, approves what they understand, rejects what they don't. Slow. Scales linearly with human attention. But safe when you're the domain expert.

The second builds systems (quality gates, tests, standards) and audits selectively. Faster. Scales with system quality. But only works when the system matches the task type. A quality system built for convergent tasks applied to exploratory work produces compliant mediocrity.

The practical move: before choosing any technique, ask one question. Does this task have a known right answer that needs precision? Or does this task need range and exploration? The technique that's optimal for one is harmful for the other.

Specificity helps convergence. Openness helps exploration. Mixing them doesn't average out. It damages whichever mode the task actually needed. The same logic applies to context itself: redirecting attention works differently from adding background.

One caveat worth stating here, not just in the evidence section: the effect sizes measure programmatic specificity markers, not quality as a domain expert would judge it. In a blind test (one domain expert, 5 pairs), the expert couldn't distinguish specific from generic outputs on quality. Both conditions produced the same analytical substance. The direction (structure helps convergence, harms exploration) is robust. But what specificity changes is output form (more verifiable references), not substance (same conclusions to a domain expert). The original effect size was 40 percent inflated by confounded length and quality-demand variables.

Most people pick one approach and run it on everything. Same level of structure, same specificity, same constraint density regardless of what the task actually needs. The mismatch is invisible because the output always looks competent. You only see it when you run both versions side by side on the same task.

Try this: take one task you do regularly with AI. Run it twice, once with tight constraints and once with loose constraints. Compare the outputs. Which version did you assume would win before you tested? The gap between your assumption and the result is the data.

Test this yourself

Take one prompt technique you use. Run it on a precise analytical task and an open creative task. Compare whether it helps or hurts each.

What survived testing

  • Structure helps convergence (large effect, cross-generator confirmed with virtually identical magnitude)Copy link
  • Same structure hurts exploration (consistent direction across multiple replications)Copy link
  • Cross-generator: effect replicates on xAI and GeminiCopy link
  • 100% compliance with structure on exploratory tasks. The model follows the instruction, the output gets worseCopy link

What didn't survive

  • "Structure always helps" killed. Task-type dependent.Copy link
  • "The harm is about constraint density" partially killed. It's about concentration vs range, not about how many constraints.Copy link
  • Quality magnitude claims are LLM-calibrated. Human evaluation shows no holistic agreement with LLM scores.Copy link

Honest limits

  • Quality scores measure LLM-valued properties. Direction claims hold; magnitude claims are LLM-calibrated.Copy link
  • "Exploratory" operationalized as open-ended creative/strategic tasks. Other definitions may produce different boundaries.Copy link
  • March 2026 models. The task-type dependency is likely structural to how attention works. The specific effect sizes will shift.Copy link

Next in The "It Depends" Problem

More Context Barely Helps

New findings when they land.

No spam. Just what held up.