The Standard

The AIRVS Standard v1.0.0

AI Recommendation Verification Standard — open, versioned, peer-reviewable. The standard defines how to measure and what the labels mean; each evaluator publishes their own decision rule.

DOI: 10.5281/zenodo.20391984 · CC BY 4.0 · SemVer · L1 only

Full specification (v1.0.0), available for download:

core.md — the standard (four dimensions, six axes, verdict, SemVer, runtime)
tier-rulebook.md — source Tier 1/2/3 classification rulebook
annex-a-ai.md — 11 AI-specific assessment items

v1.1 / v1.2 (amendments · MINOR)

airvs-v1.1.md — continuous-strategy outcome mode, Annex A-F, and two result sheets: Template L (LLM) / Template F (AI-managed fund)
airvs-v1.2.md — §9 rebuttal simplified: pre-publication notice removed; post-publication rebuttal right strengthened (§9-R1–R4)

Canonical source: github.com/emceeKim/AI-RVS. A consolidated Korean edition is maintained in the project wiki. Amendments are backward-compatible and not retroactive: prior evaluations keep their version lock.

Section 1 · What we evaluate

AIRVS evaluates a single object: an external AI-generated investment recommendation — one the evaluator did not write (L1). It produces four independent records that are never summed into a single score, because single scores invite gaming.

Section 2 · Six process axes (Pass / Fail)

Each axis is Pass or Fail, gated on evidence — a claim must cite its support to pass.

1. Data Source: PASS IF sources are real, primary, and tier-classified, and each claim cites where it came from.
2. Reasoning Logic: PASS IF the argument holds from premise to conclusion.
3. Counter Scenario: PASS IF at least two downside cases are given, each with a primary source and a probability weight.
4. Timing: PASS IF there is an explicit entry window and a standard horizon, not an open-ended bet.
5. Accuracy / Hallucination: PASS IF there are no non-existent tickers, fabricated figures, or invented facts.
6. Causal Chain: PASS IF the cited sources actually support the stated conclusion.
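A minimal sketch of the evidence gate, assuming each axis judgment is stored together with the list of citations that support it. The names `AxisFinding`, `counter_scenario_met`, and `process_record` are hypothetical, and reading axis 3 as "each downside case carries a primary source and a probability in (0, 1]" is an assumption about what the standard requires.

```python
from dataclasses import dataclass, field

# The six axis names come from Section 2; the data model below is hypothetical.
AXES = (
    "Data Source",
    "Reasoning Logic",
    "Counter Scenario",
    "Timing",
    "Accuracy / Hallucination",
    "Causal Chain",
)


@dataclass
class AxisFinding:
    axis: str
    criterion_met: bool  # evaluator's judgment that the PASS IF condition holds
    evidence: list[str] = field(default_factory=list)  # citations backing that judgment

    @property
    def result(self) -> str:
        # Evidence gating: a judgment with no cited support cannot pass.
        return "Pass" if self.criterion_met and self.evidence else "Fail"


def counter_scenario_met(cases: list[dict]) -> bool:
    """Hypothetical test for axis 3: at least two downside cases,
    each with a primary source and a probability in (0, 1]."""
    valid = [
        c for c in cases
        if c.get("primary_source") and 0 < c.get("probability", 0) <= 1
    ]
    return len(valid) >= 2


def process_record(findings: list[AxisFinding]) -> dict[str, str]:
    # Each axis must be assessed exactly once; results stay per axis and are never summed.
    if sorted(f.axis for f in findings) != sorted(AXES):
        raise ValueError("each of the six axes must be assessed exactly once")
    by_axis = {f.axis: f.result for f in findings}
    return {axis: by_axis[axis] for axis in AXES}


# Usage: Timing is judged met, but nothing is cited for it, so it fails.
findings = [AxisFinding(axis, True, ["source-A"]) for axis in AXES]
findings[3] = AxisFinding("Timing", True)
print(process_record(findings)["Timing"])  # Fail
```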

When the recommender is an AI (an LLM response), Annex A adds 11 AI-specific checks, including model identity, prompt reproducibility, answer-distribution stability, RAG/search use, training cutoff, and non-existent-source verification.
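This page does not give a formula for answer-distribution stability. One hypothetical way to quantify it: run the same prompt against the same model several times and record the share of runs that agree with the most common recommendation. The function name and the case/whitespace normalization are assumptions, and any pass threshold would be the evaluator's own choice.

```python
from collections import Counter


def answer_stability(answers: list[str]) -> tuple[str, float]:
    """Modal answer and the share of repeated runs that agree with it.

    Hypothetical metric: Annex A names answer-distribution stability as a
    check, but this agreement-share formula is an illustrative assumption.
    """
    if not answers:
        raise ValueError("need at least one run")
    # Normalize case and surrounding whitespace so trivial formatting differences don't count.
    normalized = [a.strip().upper() for a in answers]
    modal, count = Counter(normalized).most_common(1)[0]
    return modal, count / len(normalized)


# Usage: the same prompt run four times against the same model.
print(answer_stability(["BUY", "buy", "HOLD", "BUY "]))  # ('BUY', 0.75)
```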

Section 3 · Coherence & outcome

Macro and micro coherence are rated on three tiers: Sufficient, Partial, or Missing. Outcome is recorded as a time series of return versus benchmark, and drawdown, measured at D+30, D+60, D+90, and D+180.
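A sketch of the outcome series, assuming daily closing prices keyed by date for the recommended asset and its benchmark, simple price-only returns, and "drawdown" read as the maximum drawdown from the running peak between D+0 and each checkpoint. A checkpoint that lands on a non-trading day uses the last close on or before it. The function names and the toy prices are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

CHECKPOINTS = (30, 60, 90, 180)  # the D+N horizons named in Section 3


def last_close(prices: dict[date, float], day: date) -> Optional[float]:
    # Most recent close on or before `day`, since D+N may fall on a non-trading day.
    eligible = [d for d in prices if d <= day]
    return prices[max(eligible)] if eligible else None


def outcome_series(asset: dict[date, float], benchmark: dict[date, float], d0: date) -> list[dict]:
    base_a, base_b = last_close(asset, d0), last_close(benchmark, d0)
    if base_a is None or base_b is None:
        raise ValueError("no price on or before D+0")
    last_day = min(max(asset), max(benchmark))
    rows = []
    for n in CHECKPOINTS:
        day = d0 + timedelta(days=n)
        if day > last_day:
            break  # checkpoint not reached yet
        ret_a = last_close(asset, day) / base_a - 1
        ret_b = last_close(benchmark, day) / base_b - 1
        # Maximum drawdown of the asset from its running peak between D+0 and D+N.
        path = [base_a] + [asset[d] for d in sorted(asset) if d0 < d <= day]
        peak, max_dd = path[0], 0.0
        for p in path:
            peak = max(peak, p)
            max_dd = min(max_dd, p / peak - 1)
        rows.append({"checkpoint": f"D+{n}", "return": ret_a, "benchmark": ret_b,
                     "excess": ret_a - ret_b, "max_drawdown": max_dd})
    return rows


# Usage with toy prices: only D+30 has been reached.
d0 = date(2024, 1, 1)
asset = {d0: 100.0, date(2024, 1, 16): 90.0, date(2024, 1, 31): 110.0}
bench = {d0: 100.0, date(2024, 1, 31): 105.0}
for row in outcome_series(asset, bench, d0):
    print(row["checkpoint"], round(row["excess"], 4), round(row["max_drawdown"], 4))  # D+30 0.05 -0.1
```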

Section 4 · Verdict label (5 tiers)

The four dimensions combine into one verdict via the evaluator's pre-published, version-locked decision rule. The label vocabulary is standard; the mapping algorithm is implementer-defined.

Trustworthy

Acceptable

Questionable

Unreliable

Hallucinated

Each verdict is Provisional at D+0 and becomes Confirmed at D+90. It is a per-recommendation record, not a reputation score.
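Because the mapping is implementer-defined, the following is only one possible decision rule, not part of the standard. It assumes the four dimensions are the six process axes, macro coherence, micro coherence, and the outcome series, and every threshold in it is invented for illustration. `example_verdict` and `status` are hypothetical names; a real evaluator would publish its own rule, with a version identifier, before evaluating.

```python
from typing import Optional

LABELS = ("Trustworthy", "Acceptable", "Questionable", "Unreliable", "Hallucinated")
COHERENCE_RANK = {"Missing": 0, "Partial": 1, "Sufficient": 2}


def example_verdict(axes: dict[str, str], macro: str, micro: str,
                    excess_d90: Optional[float] = None) -> str:
    """One possible decision rule; every threshold here is an assumption."""
    # A fabricated ticker, figure, or fact overrides everything else.
    if axes["Accuracy / Hallucination"] == "Fail":
        return "Hallucinated"
    failed = [name for name, result in axes.items() if result == "Fail"]
    weakest = min(COHERENCE_RANK[macro], COHERENCE_RANK[micro])
    if len(failed) >= 2 or weakest == 0:
        label = "Unreliable"
    elif failed:
        label = "Questionable"
    elif weakest == 1:
        label = "Acceptable"
    else:
        label = "Trustworthy"
    # Outcome may only downgrade: trailing the benchmark at D+90 costs one tier.
    if excess_d90 is not None and excess_d90 < 0 and label in ("Trustworthy", "Acceptable"):
        label = LABELS[LABELS.index(label) + 1]
    return label


def status(days_since_recommendation: int) -> str:
    # Provisional from D+0 until D+90, Confirmed from D+90 on.
    return "Confirmed" if days_since_recommendation >= 90 else "Provisional"


# Usage: all six axes pass, micro coherence is only Partial.
AXES = ("Data Source", "Reasoning Logic", "Counter Scenario", "Timing",
        "Accuracy / Hallucination", "Causal Chain")
all_pass = {axis: "Pass" for axis in AXES}
print(example_verdict(all_pass, "Sufficient", "Partial"), status(0))  # Acceptable Provisional
print(example_verdict(all_pass, "Sufficient", "Partial", excess_d90=-0.02), status(90))  # Questionable Confirmed
```

Letting the outcome only downgrade a label is one design choice among many; it keeps a lucky return from upgrading a recommendation whose process failed.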

Section 5 · Version & evolution

v1.0.0 is frozen and evolves under Semantic Versioning: MAJOR (breaking — axis count, Pass model, verdict tiers), MINOR (backward-compatible), and PATCH (wording / edge cases). Breaking changes ship only after external peer review. Verifications are version-locked; later versions never silently re-score prior records.
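A sketch of the versioning arithmetic and the version lock, assuming plain `MAJOR.MINOR.PATCH` strings without pre-release tags. The change-kind keys in `CHANGE_LEVEL` are hypothetical labels for the examples Section 5 gives, and the frozen dataclass stands in for a verification record whose spec version cannot be changed after the fact.

```python
from dataclasses import dataclass

# Hypothetical change kinds, mapped to the bump levels described in Section 5.
CHANGE_LEVEL = {
    "axis_count": "MAJOR",
    "pass_model": "MAJOR",
    "verdict_tiers": "MAJOR",
    "backward_compatible_addition": "MINOR",
    "wording": "PATCH",
    "edge_case": "PATCH",
}


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    major, minor, patch = parse(version)
    level = CHANGE_LEVEL[change]
    if level == "MAJOR":
        return f"{major + 1}.0.0"
    if level == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


@dataclass(frozen=True)
class Verification:
    recommendation_id: str
    spec_version: str  # locked when the evaluation is made
    verdict: str


def backward_compatible(locked: str, current: str) -> bool:
    # Same MAJOR and not older: the locked record still reads the same way,
    # but it is never re-scored under `current`.
    return parse(current)[0] == parse(locked)[0] and parse(current) >= parse(locked)


# Usage: two MINOR amendments, then a hypothetical breaking change.
record = Verification("rec-001", "1.0.0", "Acceptable")
spec = bump(bump("1.0.0", "backward_compatible_addition"), "backward_compatible_addition")
print(spec, record.spec_version, backward_compatible(record.spec_version, spec))  # 1.2.0 1.0.0 True
print(bump(spec, "verdict_tiers"))  # 2.0.0
```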

See the governance & RFC policy.

Section 6 · How to cite

Each version carries a DOI (Zenodo) and a machine-readable CITATION.cff.
