u22a8.rag-anchored

rag anchored

Measures whether a response is grounded in retrieved context versus floating on model knowledge. Distinct from faithfulness (claim-level accuracy) — this is about style and posture. A grounded response reads like it was written by someone who just read the sources; an unanchored response reads like a model generating from training data with context as decoration. Targets the "context was retrieved but ignored" failure mode in RAG systems.

Score content

Text URL
compare against another → Ctrl+Enter
Model card

Version: v1 · Status: ready

Traits

Source Engagement

Actively references, builds on, or synthesizes from source material ↔ Could have been written without seeing the context — generic on-topic prose

Measures whether the response actively engages with the provided context — referencing specific passages, building on particular details, or synthesizing across source sections. High-scoring text reads like it was composed while looking at the sources. Low-scoring text could have been written without ever seeing the context — generic knowledge-based prose that happens to be on-topic.

Context Vocabulary Uptake

Uses terminology, names, and framings from the source material ↔ Substitutes generic synonyms, ignores source-specific vocabulary

Measures whether the response uses terminology, phrasing, and entities from the provided context rather than substituting generic synonyms. High-scoring text adopts the source's vocabulary — using the same names, terms, and framings. Low-scoring text paraphrases everything into generic language, replacing specific context terms with vaguer alternatives.

Context-Specific Detail

Surfaces non-obvious details found only in the provided context ↔ States only general domain knowledge available without the context

Measures whether the response includes details that could only come from the provided context rather than general knowledge. High-scoring text surfaces non-obvious information from the sources — specific configurations, edge cases, or nuances found only there. Low-scoring text sticks to general truths about the topic that anyone familiar with the domain would know without the context.

Source Attribution Posture

Positions itself as reporting from sources, signals provenance ↔ Presents context-derived information as authoritative model knowledge

Measures whether the response's framing signals awareness of its sources — "according to the documentation", "the config shows", "based on the retrieved data" — versus presenting everything as authoritative model knowledge. High-scoring text positions itself as reporting from sources. Low-scoring text positions itself as an authority generating knowledge, even when that knowledge came from context.

Context Coverage

Synthesizes across multiple relevant context passages ↔ Latches onto one fragment, ignores other relevant retrieved material

Measures whether the response uses the breadth of provided context rather than cherry-picking one fragment and ignoring the rest. High-scoring text synthesizes across multiple context passages or sections when relevant. Low-scoring text latches onto one sentence or chunk and ignores other retrieved material that would enrich or qualify the answer.

About

u22a8.rag-anchored

Measures whether a response is grounded in retrieved context versus floating on model knowledge.

Distinct from faithfulness (which checks claim-level accuracy), this model captures the posture and style of a grounded response. Does it read like someone who just read the sources and is reporting from them? Or does it read like a model generating from training data, with retrieved context as window dressing? This targets the "context was retrieved but ignored" failure mode common in RAG systems.

Five traits: source engagement (actively references and builds on source material), context vocabulary uptake (uses the source's terminology rather than generic synonyms), context-specific detail (surfaces non-obvious information from context), source attribution posture (signals provenance rather than presenting as authoritative), and context coverage (synthesizes across retrieved material rather than cherry-picking one fragment).

Limitations

Some response styles legitimately synthesize context into a natural voice without explicit attribution — this isn't necessarily a quality failure. The model may penalize well-integrated responses that fully absorb context but present it conversationally. Works best when the evaluation use case values traceability and transparency over seamless prose.

Pairs well with

  • u22a8.faithfulness — claim-level accuracy + style-level groundedness
  • u22a8.answer-relevancy — together they form the full RAG evaluation triad
  • u22a8.specificity — context-grounded responses tend to be specific

Docs

  • Tiers and scoring — the per-trait trained boundaries between tiers
  • Breaks — where meaningful quality transitions occur

From your terminal

$ curl -s -d "your content here" \ https://u22a8.ai/m/u22a8.rag-anchored
A signal, not a verdict.