# Analysis: what the numbers show

This document walks through the numbers in
[`trust_signals_results.json`](./trust_signals_results.json) and maps
them to the claims in the published post.

## Aggregate signal counts

Each value is the mean of one signal across the 30 sourced or the 30
unsourced outputs, read from `outputs[*].signals` in
`trust_signals_results.json` and grouped by `condition`. The Difference
column is the relative change from the sourced mean to the unsourced mean.

| Signal                  | Sourced | Unsourced | Difference |
|:------------------------|--------:|----------:|-----------:|
| total_citations         |   8.80  |   13.60   |   +54.5%   |
| total_numbers           |  31.80  |   35.17   |   +10.6%   |
| precise_numbers         |  11.33  |    4.03   |   −64.4%   |
| confidence_markers      |   0.53  |    0.70   |   +31.2%   |
| hedge_markers           |   1.33  |    1.20   |   −10.0%   |
| confidence_hedge_ratio  |   0.41  |    0.56   |   +35.2%   |
| named_entities          |  15.30  |   23.97   |   +56.6%   |
| word_count              | 529.53  |  565.77   |    +6.8%   |
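
The Difference column can be sanity-checked from the two-decimal means in the table. The sketch below is illustrative: `TABLE_MEANS` and `relative_difference` are names introduced here, not part of the repository. Because the inputs are already rounded, a few rows (for example `confidence_markers` and `confidence_hedge_ratio`) come out about a point away from the table, which is what one would expect if the published percentages were computed from unrounded means. That is an assumption, not something the results file confirms.

```python
# Rounded condition means copied from the table above: (sourced, unsourced).
TABLE_MEANS = {
    "total_citations": (8.80, 13.60),
    "total_numbers": (31.80, 35.17),
    "precise_numbers": (11.33, 4.03),
    "confidence_markers": (0.53, 0.70),
    "hedge_markers": (1.33, 1.20),
    "confidence_hedge_ratio": (0.41, 0.56),
    "named_entities": (15.30, 23.97),
    "word_count": (529.53, 565.77),
}


def relative_difference(sourced, unsourced):
    """Percent change from the sourced mean to the unsourced mean."""
    return (unsourced - sourced) / sourced * 100


for name, (s, u) in TABLE_MEANS.items():
    print(f"{name:<24}{relative_difference(s, u):+7.1f}%")
```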

## Blinded trust scores

Mean trust rating (1-5 scale, higher = more trustworthy) assigned by
Gemini with the condition labels hidden.

| Condition | Mean | N  |
|:----------|-----:|---:|
| Sourced   | 4.57 | 30 |
| Unsourced | 4.77 | 30 |

## Per-topic trust scores

| Topic              | Sourced | Unsourced | Winner |
|:-------------------|--------:|----------:|:-------|
| ai_adoption        |    4.33 |      5.00 | U      |
| cloud_computing    |    4.67 |      4.00 | S      |
| cybersecurity      |    5.00 |      5.00 | =      |
| digital_payments   |    4.33 |      4.67 | U      |
| ecommerce          |    4.67 |      5.00 | U      |
| remote_work        |    4.00 |      4.67 | U      |
| renewable_energy   |    4.67 |      5.00 | U      |
| saas_metrics       |    4.33 |      4.67 | U      |
| supply_chain       |    4.67 |      5.00 | U      |
| talent_retention   |    5.00 |      4.67 | S      |

Unsourced rated higher: 7 of 10 topics. Sourced higher: 2. Tied: 1.
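
The tally and the condition means can be re-derived from the per-topic table itself. This is a small sketch using the rounded values shown above; `TOPIC_SCORES`, `tally`, and `condition_means` are illustrative names. Averaging the per-topic means to get the condition mean assumes every topic contributes the same number of outputs per condition.

```python
# Per-topic mean trust ratings copied from the table above: (sourced, unsourced).
TOPIC_SCORES = {
    "ai_adoption": (4.33, 5.00),
    "cloud_computing": (4.67, 4.00),
    "cybersecurity": (5.00, 5.00),
    "digital_payments": (4.33, 4.67),
    "ecommerce": (4.67, 5.00),
    "remote_work": (4.00, 4.67),
    "renewable_energy": (4.67, 5.00),
    "saas_metrics": (4.33, 4.67),
    "supply_chain": (4.67, 5.00),
    "talent_retention": (5.00, 4.67),
}


def tally(scores):
    """Count topics where unsourced wins (U), sourced wins (S), or they tie (=)."""
    counts = {"U": 0, "S": 0, "=": 0}
    for sourced, unsourced in scores.values():
        if unsourced > sourced:
            counts["U"] += 1
        elif sourced > unsourced:
            counts["S"] += 1
        else:
            counts["="] += 1
    return counts


def condition_means(scores):
    """Average the per-topic means for each condition, rounded to 2 decimals."""
    n = len(scores)
    sourced = sum(s for s, _ in scores.values()) / n
    unsourced = sum(u for _, u in scores.values()) / n
    return round(sourced, 2), round(unsourced, 2)


print(tally(TOPIC_SCORES))
print(condition_means(TOPIC_SCORES))
```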

## What the post claims, where it comes from

### Claim 1: "Citation count 55% higher in fabricated output."

`Sourced 8.80 vs Unsourced 13.60`. Computed by averaging
`signals.total_citations` across each condition's 30 outputs.
The 55% in the post is a rounded version of the 54.5% above.

### Claim 2: "Named entities 57% higher in fabricated output."

`Sourced 15.30 vs Unsourced 23.97`. From `signals.named_entities`.
Rounds to 57%.

### Claim 3: "Higher confidence-to-hedge ratio in fabricated output."

`Sourced 0.41 vs Unsourced 0.56`. From `signals.confidence_hedge_ratio`.
A +35.2% relative increase. The post does not quote this percentage,
only the direction.

### Claim 4: "Sourced output has more precise decimal numbers."

`Sourced 11.33 vs Unsourced 4.03`. From `signals.precise_numbers`.
This is the one signal among the post's claims that is higher in sourced
output. Real source data contains real decimal-place statistics; unsourced
output gravitates to round percentages and large round monetary amounts.

### Claim 5: "An LLM evaluator scored unsourced output higher in 7 of 10 topics."

Direct from the per-topic table above. Gemini rated unsourced higher
in seven topics, sourced higher in two, and tied in one.

## How to recompute

```python
import json
with open("trust_signals_results.json") as f:
    d = json.load(f)

sourced = [o for o in d["outputs"] if o["condition"] == "sourced"]
unsourced = [o for o in d["outputs"] if o["condition"] == "unsourced"]

# Mean of any signal:
def mean(group, key):
    return sum(o["signals"][key] for o in group) / len(group)

print(mean(sourced, "total_citations"), mean(unsourced, "total_citations"))
# 8.8 13.6
```
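
To regenerate every row of the aggregate table in one pass, the same idea can be looped over all signal keys. The sketch below relies only on the `outputs`, `condition`, and `signals` fields used above; the helper names are illustrative, and it assumes every signal value is numeric and every sourced mean is non-zero.

```python
import json
from collections import defaultdict


def load_outputs(path="trust_signals_results.json"):
    """Read the results file and return its list of outputs."""
    with open(path) as f:
        return json.load(f)["outputs"]


def signal_means(outputs):
    """Return {condition: {signal: mean}} for every signal key present."""
    grouped = defaultdict(lambda: defaultdict(list))
    for o in outputs:
        for key, value in o["signals"].items():
            grouped[o["condition"]][key].append(value)
    return {
        cond: {key: sum(vals) / len(vals) for key, vals in signals.items()}
        for cond, signals in grouped.items()
    }


def print_table(means):
    """Print sourced mean, unsourced mean, and relative difference per signal."""
    sourced, unsourced = means["sourced"], means["unsourced"]
    for key in sourced:
        diff = (unsourced[key] - sourced[key]) / sourced[key] * 100
        print(f"{key:<24}{sourced[key]:9.2f}{unsourced[key]:9.2f}{diff:+8.1f}%")
```

Calling `print_table(signal_means(load_outputs()))` from the repository root should print one line per signal.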

## What this analysis does not address

- **Effect size.** The post does not report Hedges' g or a confidence
  interval for the signal differences. With 30 outputs per condition,
  the differences are large in relative terms but are not formally
  tested here. Readers who replicate the experiment and compute
  bootstrap CIs on the raw counts are welcome to publish results that
  confirm or overturn these estimates.

- **Independence between signals.** Citation count and named-entity
  count partially overlap (a citation often contains a named entity).
  The patterns are reported separately because they are separately
  measurable, not because they are fully orthogonal.

- **Whether human readers respond the way Gemini does.** This is the
  central honest limit, named in the post and again in
  [`README.md`](./README.md). The signal counts are robust. The trust
  inversion measured by an LLM evaluator is corroborative but not
  independent of the mechanism the post is testing: RLHF-trained
  models share the bias the post argues humans have.
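
For readers who want to fill the effect-size gap, here is a minimal sketch of one way to do it with the standard library only. It is not the post's method: the function names, the percentile bootstrap, the 2000 resamples, and the fixed seed are all choices made for illustration. Each function takes two lists of raw per-output values for one signal, sourced first.

```python
import random
import statistics


def hedges_g(a, b):
    """Hedges' g for b relative to a (positive when b has the larger mean)."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    d = (statistics.mean(b) - statistics.mean(a)) / pooled_var ** 0.5
    # Common approximation of the small-sample correction factor.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a)."""
    rng = random.Random(seed)
    # Resample each group with replacement and record the mean difference.
    diffs = sorted(
        statistics.mean(rng.choices(b, k=len(b)))
        - statistics.mean(rng.choices(a, k=len(a)))
        for _ in range(n_resamples)
    )
    lo = diffs[int(alpha / 2 * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

With the `sourced` and `unsourced` lists from the recompute snippet, the inputs for one signal would be `[o["signals"]["total_citations"] for o in sourced]` and the matching list for `unsourced`. An interval that excludes zero would support the direction of the difference; it would not address the overlap between signals noted above.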
