# Conditions (2×2 factorial)

Four instructions, matched at approximately 20 words each, varying two
binary factors: specificity (present or absent) and quality demands
(present or absent). Each condition was run 10 times against a single
generator, for 40 outputs total. The generator was instructed to write
500 words; actual outputs ranged from 368 to 443 words.

The prior experiment (EXP-025) compared a 22-word specific instruction
against a 2-word vague instruction ("Be specific"). That design
confounded specificity with instruction length. This 2×2 removes that
confound by holding instruction length roughly constant across all four
conditions.
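
The factorial layout described above can be enumerated in a few lines. This is a minimal sketch, not the code in `script.py`: the function name `build_run_plan` and the `(condition, run_index)` tuple format are assumptions made for illustration; only the condition labels and the run count come from this document.

```python
from itertools import product

RUNS_PER_CONDITION = 10  # from the design: 10 runs per condition

# Condition labels keyed by (specificity, quality_demands).
CONDITIONS = {
    (True, True): "SPEC_QUAL",
    (True, False): "SPEC_ONLY",
    (False, True): "QUAL_ONLY",
    (False, False): "BARE",
}


def build_run_plan(runs_per_condition=RUNS_PER_CONDITION):
    """Return one (condition, run_index) pair per planned generator call.

    Hypothetical helper: the real script may organize its runs differently.
    """
    plan = []
    # Cross the two binary factors to get the four cells of the 2x2.
    for specificity, quality in product([True, False], repeat=2):
        name = CONDITIONS[(specificity, quality)]
        for run_index in range(runs_per_condition):
            plan.append((name, run_index))
    return plan


plan = build_run_plan()
print(len(plan))  # 4 conditions x 10 runs = 40
```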

## Scenario (fixed across all conditions)

The same scenario is described to the generator for every run: Northvane
Analytics, a fictional B2B SaaS logistics optimization company with
specific attributes (47 employees, $8M ARR, 40% YoY growth, proprietary
algorithm, five competitors, defined market). The full scenario text is
embedded in [`script.py`](./script.py) under the `SCENARIO` constant.

## The four instructions, verbatim

### SPEC_QUAL: Specific + Quality demands (20 words)

> Ground every point in Northvane's specific assets and current
> situation. Provide detailed and thorough analysis with clear reasoning
> throughout.

Both factors present.

### SPEC_ONLY: Specific only, no quality demands (20 words)

> Ground every point in Northvane's specific assets and current
> situation. Write your strategic recommendation for their next twelve
> months.

Specificity present; quality demands absent.

### QUAL_ONLY: Quality demands only, no specificity (19 words)

> Provide detailed and thorough analysis with clear reasoning
> throughout. Write your strategic recommendation for the company's next
> twelve months.

Quality demands present; specificity absent.

### BARE: Neutral filler, no specificity or quality demands (19 words)

> Write your strategic recommendation for the company's next twelve
> months. Cover the key strategic considerations and options for their
> situation.

Both factors absent. Length matches the other conditions via neutral
filler.

## Length verification

The script asserts that every condition is within ±1 word of the
target. The assertion is in `script.py` around line 82. Because the
script ran to completion (see `data.json`), all four conditions passed
the length check.
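
A check of this kind might look like the sketch below. It is not the assertion from `script.py`: the 20-word target, whitespace-based word counting, and the names `word_count` and `check_lengths` are all assumptions made here. The instruction strings are copied from the section above.

```python
TARGET_WORDS = 20  # assumed target, based on "approximately 20 words each"
TOLERANCE = 1      # the ±1 word tolerance described above

INSTRUCTIONS = {
    "SPEC_QUAL": (
        "Ground every point in Northvane's specific assets and current "
        "situation. Provide detailed and thorough analysis with clear "
        "reasoning throughout."
    ),
    "SPEC_ONLY": (
        "Ground every point in Northvane's specific assets and current "
        "situation. Write your strategic recommendation for their next "
        "twelve months."
    ),
    "QUAL_ONLY": (
        "Provide detailed and thorough analysis with clear reasoning "
        "throughout. Write your strategic recommendation for the company's "
        "next twelve months."
    ),
    "BARE": (
        "Write your strategic recommendation for the company's next twelve "
        "months. Cover the key strategic considerations and options for "
        "their situation."
    ),
}


def word_count(text):
    """Count whitespace-separated tokens (an assumed counting rule)."""
    return len(text.split())


def check_lengths(instructions, target=TARGET_WORDS, tolerance=TOLERANCE):
    """Assert each instruction is within `tolerance` words of `target`."""
    counts = {name: word_count(text) for name, text in instructions.items()}
    for name, n in counts.items():
        assert abs(n - target) <= tolerance, f"{name}: {n} words"
    return counts


counts = check_lengths(INSTRUCTIONS)
print(counts)
```

Exact counts depend on the tokenizer, so a whitespace split can differ by a word from the figures given in the headings above while still falling inside the ±1 tolerance.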

## Scoring rubric

Each generated output is scored on six specificity markers:

| Marker | What it counts |
|---|---|
| `m1_company` | Named references to Northvane's stated assets |
| `m2_scenario` | Mentions of the scenario's fixed elements (growth rate, ARR, employee count) |
| `m3_market` | Market-context references (specific competitors, positioning, market size) |
| `m4_algorithm` | References to Northvane's proprietary-algorithm specifics |
| `m5_competitors` | Named competitors from the scenario |
| `m6_specific_recs` | Concrete, scenario-grounded recommendations vs. generic strategy |

Each marker returns an integer count. The `total` score sums the six
counts; `density_per_1kw` divides `total` by the output's word count and
multiplies by 1,000, giving markers per 1,000 words. Density is the
measure that separates genuine specificity from raw-score inflation
caused by longer outputs.
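
The arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the scoring helper from `script.py`: the function name `score_summary`, whitespace-based word counting, and the example marker counts are hypothetical; the marker keys and the `total` and `density_per_1kw` names come from the rubric above.

```python
MARKERS = (
    "m1_company",
    "m2_scenario",
    "m3_market",
    "m4_algorithm",
    "m5_competitors",
    "m6_specific_recs",
)


def score_summary(marker_counts, output_text):
    """Combine per-marker counts into `total` and `density_per_1kw`.

    Hypothetical helper; word counting by whitespace is an assumption.
    """
    word_count = len(output_text.split())
    if word_count == 0:
        raise ValueError("cannot compute density for an empty output")
    total = sum(marker_counts[m] for m in MARKERS)
    # Markers per 1,000 words: total / word_count, scaled by 1,000.
    density = total * 1000 / word_count
    return {
        "total": total,
        "word_count": word_count,
        "density_per_1kw": density,
    }


# Made-up counts, for illustration only (not experimental data).
example_counts = {
    "m1_company": 3,
    "m2_scenario": 2,
    "m3_market": 1,
    "m4_algorithm": 2,
    "m5_competitors": 1,
    "m6_specific_recs": 3,
}
short_output = "word " * 400  # stand-in for a 400-word output
long_output = "word " * 800   # same marker total spread over twice the words

short_summary = score_summary(example_counts, short_output)
long_summary = score_summary(example_counts, long_output)
print(short_summary["density_per_1kw"], long_summary["density_per_1kw"])
```

With the same `total` of 12, the 400-word stand-in scores 30 markers per 1,000 words and the 800-word stand-in scores 15, which is the length effect that density is meant to control for.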

The rubric implementation is in `script.py` under the scoring helpers.
A 10-output calibration sample with human labels is in
[`validation.json`](./validation.json).
