AIRVS — The Recommendation Provenance Standard  ·  v1.0.0  ·  DOI: 10.5281/zenodo.20391984


The AIRVS Standard v1.0.0

AI Recommendation Verification Standard — open, versioned, peer-reviewable. The standard defines how to measure and what the labels mean; each evaluator publishes their own decision rule.

DOI: 10.5281/zenodo.20391984 · CC BY 4.0 · SemVer · L1 only

Section 1 · What we evaluate

AIRVS evaluates a single object: an external AI-generated investment recommendation — one the evaluator did not write (L1). It produces four independent records that are never summed into a single score, because single scores invite gaming.

Section 2 · Six process axes (Pass / Fail)

Each axis is Pass or Fail, gated on evidence — a claim must cite its support to pass.

1. Data Source
PASS IF: Sources real, primary, and tier-classified; each claim cites where it came from.
2. Reasoning Logic
PASS IF: The argument holds from premise to conclusion.
3. Counter Scenario
PASS IF: ≥2 downside cases, each with a primary source and weighted probabilities.
4. Timing
PASS IF: Explicit entry window and a standard horizon, not an open-ended bet.
5. Accuracy / Hallucination
PASS IF: No non-existent tickers, fabricated figures, or invented facts.
6. Causal Chain
PASS IF: The cited sources actually support the stated conclusion.
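One way to picture the evidence gate is a small record type in which an axis passes only when the evaluator judges the criterion met and at least one citation is attached. This is a minimal sketch, not part of the standard: the names `Axis`, `AxisResult`, and `process_record` are hypothetical, and the "at least one citation" test is an assumed simplification of "a claim must cite its support".

```python
from dataclasses import dataclass, field
from enum import Enum


class Axis(Enum):
    DATA_SOURCE = 1
    REASONING_LOGIC = 2
    COUNTER_SCENARIO = 3
    TIMING = 4
    ACCURACY = 5
    CAUSAL_CHAIN = 6


@dataclass
class AxisResult:
    axis: Axis
    criterion_met: bool  # evaluator's judgment against the PASS IF line
    evidence: list = field(default_factory=list)  # citations backing that judgment

    @property
    def passed(self) -> bool:
        # Evidence gate (assumed form): a met criterion with no citation still fails.
        return self.criterion_met and len(self.evidence) > 0


def process_record(results):
    """Return a per-axis Pass/Fail mapping; deliberately no total and no score."""
    seen = {r.axis for r in results}
    missing = set(Axis) - seen
    if missing:
        names = sorted(a.name for a in missing)
        raise ValueError(f"axes not evaluated: {names}")
    return {r.axis.name: "Pass" if r.passed else "Fail" for r in results}
```

The function returns six separate outcomes rather than a count, which keeps the record in line with the rule that results are never summed into a single score.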

When the recommender is an AI (an LLM response), Annex A adds 11 AI-specific checks, including model identity, prompt reproducibility, answer-distribution stability, RAG/search use, training cutoff, and non-existent-source verification.

Section 3 · Coherence & outcome

Macro / micro coherence is rated in three tiers — Sufficient, Partial, or Missing. Outcome is a time-series: return vs benchmark and drawdown at D+30, D+60, D+90, and D+180.
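The outcome series can be sketched as a small calculation over two price series. This is an illustration under stated assumptions, not the standard's definition: it assumes daily closes indexed from D+0, treats D+N as index N (the text above does not say whether days are calendar or trading days), and measures drawdown as the worst peak-to-trough decline since D+0. The names `outcome_series` and `max_drawdown` are hypothetical.

```python
CHECKPOINTS = (30, 60, 90, 180)  # D+30, D+60, D+90, D+180


def max_drawdown(prices):
    """Worst peak-to-trough decline as a negative fraction (0.0 if none)."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst


def outcome_series(asset, benchmark):
    """Build one row per checkpoint reached so far.

    asset, benchmark: sequences of daily closes, index 0 = D+0 (assumed).
    """
    rows = []
    for d in CHECKPOINTS:
        if d >= len(asset) or d >= len(benchmark):
            break  # checkpoint not reached yet; later rows are added as data arrives
        ret = asset[d] / asset[0] - 1.0
        bench = benchmark[d] / benchmark[0] - 1.0
        rows.append(
            {
                "day": d,
                "return": ret,
                "benchmark": bench,
                "excess": ret - bench,
                "max_drawdown": max_drawdown(asset[: d + 1]),
            }
        )
    return rows
```

Keeping one row per checkpoint, rather than a single final figure, preserves the time-series character of the outcome record.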

Section 4 · Verdict label (5 tiers)

The four dimensions combine into one verdict via the evaluator's pre-published, version-locked decision rule. The label vocabulary is standard; the mapping algorithm is implementer-defined.

Trustworthy
Acceptable
Questionable
Unreliable
Hallucinated

A verdict is Provisional at D+0 and Confirmed at D+90. It is a per-recommendation record, not a reputation score.
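Because the mapping is implementer-defined, any code here can only show the shape of a decision rule, not the rule itself. The sketch below is a purely hypothetical rule: the thresholds, the `RULE_VERSION` string, and the function name `assign_verdict` are invented for illustration, and for brevity it ignores the outcome record. It also assumes the Provisional/Confirmed status depends only on days elapsed since D+0.

```python
from enum import Enum


class Verdict(Enum):
    TRUSTWORTHY = "Trustworthy"
    ACCEPTABLE = "Acceptable"
    QUESTIONABLE = "Questionable"
    UNRELIABLE = "Unreliable"
    HALLUCINATED = "Hallucinated"


RULE_VERSION = "example-rule-0.1.0"  # hypothetical; a real rule is pre-published and locked


def assign_verdict(process, coherence, days_elapsed):
    """Map records to (label, status, rule version) under an invented example rule.

    process: dict of axis name -> bool (True = Pass); must include "accuracy".
    coherence: "Sufficient", "Partial", or "Missing".
    """
    if not process["accuracy"]:
        label = Verdict.HALLUCINATED  # assumed: an accuracy failure dominates
    else:
        fails = sum(1 for ok in process.values() if not ok)
        if fails == 0 and coherence == "Sufficient":
            label = Verdict.TRUSTWORTHY
        elif fails <= 1 and coherence != "Missing":
            label = Verdict.ACCEPTABLE
        elif fails <= 3:
            label = Verdict.QUESTIONABLE
        else:
            label = Verdict.UNRELIABLE
    status = "Confirmed" if days_elapsed >= 90 else "Provisional"
    return label, status, RULE_VERSION
```

Returning the rule version alongside the label makes each verdict traceable to the exact rule that produced it.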

Section 5 · Version & evolution

v1.0.0 is frozen and evolves under Semantic Versioning: MAJOR (breaking — axis count, Pass model, verdict tiers), MINOR (backward-compatible), and PATCH (wording / edge cases). Breaking changes ship only after external peer review. Verifications are version-locked; later versions never silently re-score prior records.
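The versioning rules can be sketched in a few lines: classify a version bump, flag the ones that need peer review, and keep each verification immutable so it stays tied to the version it was scored under. This is a minimal sketch with hypothetical names (`bump_kind`, `needs_peer_review`, `Verification`); it handles only plain `MAJOR.MINOR.PATCH` strings and ignores pre-release and build tags.

```python
from dataclasses import dataclass


def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old, new):
    """Classify the change from old to new as MAJOR, MINOR, or PATCH."""
    o, n = parse(old), parse(new)
    if n <= o:
        raise ValueError(f"{new} is not newer than {old}")
    if n[0] != o[0]:
        return "MAJOR"
    if n[1] != o[1]:
        return "MINOR"
    return "PATCH"


def needs_peer_review(old, new):
    # Breaking changes ship only after external peer review.
    return bump_kind(old, new) == "MAJOR"


@dataclass(frozen=True)
class Verification:
    """Immutable record: the label stays bound to the version it was scored under."""
    recommendation_id: str
    standard_version: str
    label: str
```

Making the record frozen means a later version can only add a new `Verification` next to the old one; it cannot overwrite the earlier label in place.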

Read the governance & RFC policy →

Section 6 · How to cite

Each version carries a DOI (Zenodo) and a machine-readable CITATION.cff.

How to cite AIRVS →