# Receipts: Trust Signals Are Inverted

These files are the raw artifacts behind the finding published at
<https://blog.clarethium.com/trust-signals-are-inverted>.

The published claim is that **the signals readers use to judge AI output as
trustworthy are the same signals fabrication produces**: more citations,
more named entities, more confident assertions, less hedging. This folder
contains the experiment that measured each of those signals programmatically
across 60 documents and the blinded LLM evaluation that confirmed the
inversion.

## What's here

| File | What it is |
|---|---|
| [`method.md`](./method.md) | The experiment design in human-readable form: 10 topics × 2 conditions × 3 versions, the two prompt templates, how trust signals were extracted, how the blinded evaluation was run. |
| [`trust_signals.py`](./trust_signals.py) | The experiment script that produced the data, verbatim. 348 lines. Three phases: generation (xAI), programmatic measurement (zero LLM), blinded LLM trust evaluation (Gemini). |
| [`trust_signals_results.json`](./trust_signals_results.json) | All 60 outputs with per-document signal counts, the blinded trust scores, and full output texts. Nothing summarized away. |
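| [`analysis.md`](./analysis.md) | The aggregate numbers and how each maps to a claim in the published post. |
| [`_config.py`](./_config.py) | A documented stub for the two provider clients the script imports (`get_xai_client`, `get_gemini_client`). Needed only for end-to-end replication. |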

## How to read this

- **If you want to check the claim:** open [`analysis.md`](./analysis.md) first.
  Each row in the claims table cites the path inside `trust_signals_results.json`
  that the number came from.
- **If you want to replicate:** [`method.md`](./method.md) describes the
  design, the verbatim prompt templates, and what each signal measures.
  [`trust_signals.py`](./trust_signals.py) holds the procedure exactly
  as run, including the regex patterns (lines 154-200).
  [`_config.py`](./_config.py) is a documented stub for the two provider
  clients the script imports (`get_xai_client`, `get_gemini_client`);
  replace each `NotImplementedError` body with a call to your own SDK
  to reproduce end-to-end. The patterns, prompt strings, and Phase 2
  signal extraction are reusable verbatim with no API access at all.
- **If you want to audit:** [`trust_signals_results.json`](./trust_signals_results.json)
  contains every output text, every signal count, and every trust score
  with the topic and condition labels.
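
As a rough illustration of what "programmatic measurement (zero LLM)" means in practice, here is a minimal sketch of regex-based signal counting. The patterns and the `count_signals` helper are hypothetical stand-ins written for this example; they are not the patterns in `trust_signals.py` (lines 154-200), which remain the authoritative version.

```python
import re

# Illustrative patterns only. These are NOT the patterns from trust_signals.py;
# they are assumptions chosen to show the shape of the technique.
ILLUSTRATIVE_PATTERNS = {
    # Parenthetical author-year citations or bracketed numeric references.
    "citations": re.compile(r"\(\s*[A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\s*\)|\[\d+\]"),
    # A small sample of hedging words.
    "hedges": re.compile(r"\b(?:may|might|could|possibly|perhaps)\b", re.IGNORECASE),
    # A small sample of confident-assertion words.
    "confident": re.compile(r"\b(?:clearly|definitely|certainly|undoubtedly)\b", re.IGNORECASE),
    # Numbers with a decimal point, e.g. 3.14.
    "precise_decimals": re.compile(r"\b\d+\.\d+\b"),
}


def count_signals(text: str) -> dict[str, int]:
    """Count non-overlapping matches of each illustrative pattern in text."""
    return {
        name: len(pattern.findall(text))
        for name, pattern in ILLUSTRATIVE_PATTERNS.items()
    }


sample = (
    "Prior work clearly showed (Smith et al., 2020) that growth "
    "may reach 3.14 percent [2]."
)
counts = count_signals(sample)
print(counts)
```

Because every count comes from a fixed pattern, anyone can rerun the extraction on the stored output texts and dispute a pattern without needing model access.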

## What the receipts prove (and don't)

These receipts prove:

- The 60 documents were generated under two conditions that differ only
  in whether real source material was provided in the prompt. Both
  conditions used the same model (`grok-4-1-fast`), same temperature, same
  word-count target.
- Programmatic signal extraction (zero LLM judgment) shows unsourced
  output produces 54.5% more citation references, 56.6% more named
  entities, and a 35.2% higher confidence-to-hedge ratio than sourced
  output. These are objective regex counts; the patterns are in
  `trust_signals.py` lines 154-200.
- The blinded LLM trust evaluation (Gemini, no condition labels visible)
  rated unsourced output higher in 7 of 10 topics, sourced higher in 2,
  tied in 1. Mean trust score: sourced 4.57, unsourced 4.77.
- The one signal that goes the other way is precise decimal numbers:
  sourced output has about 2.8x as many of these (11.33 vs 4.03 per
  document). This is the exception named in the post: real data has real
  decimals.
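
The headline figures above can be sanity-checked with plain arithmetic. The sketch below uses only numbers stated in this README; the `percent_more` helper is a hypothetical name, and the assumption that "X% more" means `100 * (a - b) / b` should be confirmed against `analysis.md`.

```python
# Per-document means and topic outcomes as stated in this README.
sourced_decimals = 11.33
unsourced_decimals = 4.03
sourced_trust = 4.57
unsourced_trust = 4.77
topic_outcomes = {"unsourced_higher": 7, "sourced_higher": 2, "tied": 1}


def percent_more(a: float, b: float) -> float:
    """Percent by which a exceeds b, assuming the 100 * (a - b) / b convention."""
    return 100.0 * (a - b) / b


# Ratio of precise decimals, sourced over unsourced.
decimal_ratio = sourced_decimals / unsourced_decimals

# Gap in mean blinded trust score, unsourced minus sourced.
trust_gap = unsourced_trust - sourced_trust

print(f"decimal ratio: {decimal_ratio:.2f}x")
print(f"decimals, percent more in sourced: {percent_more(sourced_decimals, unsourced_decimals):.1f}%")
print(f"trust gap: {trust_gap:.2f}")
print(f"topics accounted for: {sum(topic_outcomes.values())}")
```

Note that a ratio of roughly 2.8x corresponds to roughly 181% more under this convention, which is why the ratio and the percent-more figures should not be compared directly.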

These receipts do NOT prove:

- That a human evaluator would replicate the blinded LLM rating. The
  post's domain-expert N=1 result is a separate observation; it informs
  the finding but is not in this receipts kit. An LLM judge is known to
  share bias with the generator, because the same RLHF-trained mechanism
  that produces the trust signals also shapes the judge. The post and
  `analysis.md` both acknowledge this.
- That the inversion holds across model families. Generation here is
  single-generator (xAI). Cross-generator replication is a separate
  experiment.
- That readers in real reading conditions weight signals the way the
  programmatic count assumes. The correlation between signal presence
  and trust rating has not been tested with human readers at scale.

## What this kit is for

The point of receipts is verification. If you want to check whether the
55% / 57% numbers in the post are real, open `trust_signals_results.json`,
group by `condition`, and recompute. If you want to know whether the
regex patterns are reasonable measures of what the post calls "trust
signals," they are in `trust_signals.py` and you can challenge them.
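
A minimal sketch of that recomputation follows. Only the `condition` field name comes from this README; the signal key (`citations`), the condition labels (`sourced`, `unsourced`), and the flat list-of-records shape are assumptions made for illustration, so check the actual paths cited in `analysis.md` before pointing this at the real file.

```python
import json
from collections import defaultdict
from statistics import mean


def load_records(path, records_key):
    """Load the results file and return the list of per-document records.

    records_key is hypothetical: use whatever top-level key actually holds
    the document list in trust_signals_results.json.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)[records_key]


def mean_by_condition(records, signal_key, condition_key="condition"):
    """Group records by condition and return the mean of one signal per group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[condition_key]].append(record[signal_key])
    return {condition: mean(values) for condition, values in grouped.items()}


def percent_difference(unsourced_mean, sourced_mean):
    """Percent by which the unsourced mean exceeds the sourced mean."""
    return 100.0 * (unsourced_mean - sourced_mean) / sourced_mean


# Toy records to show the mechanics. These are NOT values from the experiment.
toy_records = [
    {"condition": "sourced", "citations": 2},
    {"condition": "sourced", "citations": 4},
    {"condition": "unsourced", "citations": 3},
    {"condition": "unsourced", "citations": 6},
]

means = mean_by_condition(toy_records, "citations")
difference = percent_difference(means["unsourced"], means["sourced"])
print(means, difference)
```

To run it against the real data, replace `toy_records` with the result of `load_records(...)` and swap in the signal key you want to verify.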

## Related receipts

[The Fabrication Architecture](/receipts/fabrication-architecture)
([`../fabrication-architecture/`](../fabrication-architecture/))
covers the foundational temporal-instability claim that explains why
fabricated output exists in the first place. The trust-inversion
finding here is the reading-side consequence of that mechanism.

[Source Conditioning](/receipts/source-conditioning)
([`../source-conditioning/`](../source-conditioning/)) carries the
receipts for the operational fix. Together, the three kits cover the
loop: fabrication exists, fabricated output reads as more trustworthy
than sourced output, and source grounding plus prohibition is what
makes the output checkable.

## Errata

Found a problem with the data, the method, or the analysis? Send it
via LinkedIn DM (linked from
[/about](https://blog.clarethium.com/about)). Corrections get
published on the record at [/record](https://blog.clarethium.com/record),
with attribution.