The AIRVS Standard v1.0.0
AI Recommendation Verification Standard — open, versioned, peer-reviewable. The standard defines how to measure and what the labels mean; each evaluator publishes their own decision rule.
Section 1 · What we evaluate
AIRVS evaluates a single object: an external AI-generated investment recommendation — one the evaluator did not write (L1). It produces four independent records that are never summed into a single score, because single scores invite gaming.
Section 2 · Six process axes (Pass / Fail)
Each axis is Pass or Fail, gated on evidence — a claim must cite its support to pass.
- 1. Data Source
  - PASS IF: Sources are real, primary, and tier-classified; each claim cites where it came from.
- 2. Reasoning Logic
  - PASS IF: The argument holds from premise to conclusion.
- 3. Counter Scenario
  - PASS IF: ≥2 downside cases, each with a primary source and a weighted probability.
- 4. Timing
  - PASS IF: There is an explicit entry window and a standard horizon, not an open-ended bet.
- 5. Accuracy / Hallucination
  - PASS IF: No non-existent tickers, fabricated figures, or invented facts.
- 6. Causal Chain
  - PASS IF: The cited sources actually support the stated conclusion.
When the recommender is an AI (the recommendation is an LLM response), Annex A adds 11 AI-specific checks, including model identity, prompt reproducibility, answer-distribution stability, RAG/search use, training cutoff, and non-existent-source verification.
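As an illustration of the evidence gate, the six axes could be recorded roughly as follows. This is a minimal sketch, not part of the standard: the names `Axis`, `AxisResult`, `criterion_met`, `evidence`, and `process_record` are hypothetical, and the only behavior taken from the text above is that an axis passes only when its criterion is met and support is cited, and that no aggregate score is produced.

```python
from dataclasses import dataclass
from enum import Enum


class Axis(Enum):
    DATA_SOURCE = 1
    REASONING_LOGIC = 2
    COUNTER_SCENARIO = 3
    TIMING = 4
    ACCURACY_HALLUCINATION = 5
    CAUSAL_CHAIN = 6


@dataclass(frozen=True)
class AxisResult:
    axis: Axis
    criterion_met: bool             # evaluator's judgment on the PASS IF condition
    evidence: tuple[str, ...] = ()  # citations backing that judgment (hypothetical field)

    @property
    def passed(self) -> bool:
        # Evidence gate: a met criterion with no cited support is still a Fail.
        return self.criterion_met and len(self.evidence) > 0


def process_record(results):
    """Return {axis name: "Pass" | "Fail"}; deliberately no aggregate score."""
    by_axis = {r.axis: r for r in results}
    missing = [a.name for a in Axis if a not in by_axis]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    return {a.name: "Pass" if by_axis[a].passed else "Fail" for a in Axis}
```

Keeping the result as a per-axis mapping, rather than a count of passes, mirrors the rule that records are never summed into a single score.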
Section 3 · Coherence & outcome
Macro / micro coherence is rated in three tiers: Sufficient, Partial, or Missing. Outcome is a time series: return versus benchmark, and drawdown, at D+30, D+60, D+90, and D+180.
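The outcome series could be computed along these lines. This is a sketch under stated assumptions: the standard text here does not define drawdown or say whether D+N counts calendar or trading days, so the code assumes one price per day offset (index 0 is D+0) and takes drawdown to be the maximum peak-to-trough decline between D+0 and the checkpoint. The function and field names are hypothetical.

```python
CHECKPOINTS = (30, 60, 90, 180)  # D+30, D+60, D+90, D+180


def outcome_series(asset_prices, benchmark_prices):
    """Prices are indexed by day offset; index 0 is D+0.

    Returns one row per checkpoint that both series already cover.
    """
    rows = []
    for d in CHECKPOINTS:
        if d >= len(asset_prices) or d >= len(benchmark_prices):
            break  # checkpoint not reached yet
        asset_ret = asset_prices[d] / asset_prices[0] - 1.0
        bench_ret = benchmark_prices[d] / benchmark_prices[0] - 1.0
        # Assumed definition: maximum peak-to-trough decline over D+0..D+d.
        peak = asset_prices[0]
        max_dd = 0.0
        for p in asset_prices[: d + 1]:
            peak = max(peak, p)
            max_dd = max(max_dd, 1.0 - p / peak)
        rows.append({
            "day": d,
            "return": asset_ret,
            "benchmark_return": bench_ret,
            "excess_return": asset_ret - bench_ret,
            "max_drawdown": max_dd,
        })
    return rows
```

A record evaluated before D+30 yields an empty list, which fits an outcome that fills in over time rather than being known at D+0.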
Section 4 · Verdict label (5 tiers)
The four dimensions combine into one verdict via the evaluator's pre-published, version-locked decision rule. The label vocabulary is standard; the mapping algorithm is implementer-defined.
The verdict is Provisional at D+0 and Confirmed at D+90. It is a per-recommendation record, not a reputation score.
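One way an evaluator might store a verdict together with the rule that produced it is sketched below. Everything here is hypothetical naming (`Verdict`, `issue_verdict`, `rule_id`, `rule_version`); the decision rule is treated as an opaque callable because the mapping is implementer-defined, and the code assumes the verdict stays Provisional on every day before D+90.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    label: str         # one of the standard's five tier labels
    rule_id: str       # hypothetical identifier of the evaluator's decision rule
    rule_version: str  # version of that rule, recorded when the verdict is issued
    days_elapsed: int  # days since the recommendation date (D+0)

    @property
    def status(self) -> str:
        # Assumption: the verdict stays Provisional until D+90 is reached.
        return "Confirmed" if self.days_elapsed >= 90 else "Provisional"


def issue_verdict(record, rule, rule_id, rule_version, days_elapsed):
    """Apply a pre-published rule (any callable: record -> label) and keep its version."""
    return Verdict(rule(record), rule_id, rule_version, days_elapsed)
```

Storing the rule identifier and version on the frozen record is what lets a reader later check which published rule produced a given label.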
Section 5 · Version & evolution
v1.0.0 is frozen; the standard evolves under Semantic Versioning: MAJOR (breaking: axis count, Pass model, verdict tiers), MINOR (backward-compatible), and PATCH (wording / edge cases). Breaking changes ship only after external peer review. Verifications are version-locked; later versions never silently re-score prior records.
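The bump rules above can be sketched as a small classifier. The change-kind strings (`axis_count`, `pass_model`, `verdict_tiers`, `wording`, `edge_case`) are hypothetical tags chosen for this example; treating every other kind of change as backward-compatible (MINOR) is an assumption, not something the policy states.

```python
BREAKING = {"axis_count", "pass_model", "verdict_tiers"}  # MAJOR, per the list above
PATCH_ONLY = {"wording", "edge_case"}                     # PATCH, per the list above


def next_version(current, changes):
    """Return the next version string for a set of change kinds."""
    major, minor, patch = (int(x) for x in current.split("."))
    kinds = set(changes)
    if kinds & BREAKING:
        return f"{major + 1}.0.0"
    if kinds - PATCH_ONLY:  # assumption: anything else is backward-compatible
        return f"{major}.{minor + 1}.0"
    if kinds:
        return f"{major}.{minor}.{patch + 1}"
    return current  # no changes, no new version
```

For example, a release that only rewords an axis description would move 1.0.0 to 1.0.1, while changing the number of axes would move it to 2.0.0.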
Read the governance & RFC policy →
Section 6 · How to cite
Each version carries a DOI (Zenodo) and a machine-readable CITATION.cff.
How to cite AIRVS →