u22a8.answer-relevancy

answer relevancy

Measures whether a response actually addresses the question that was asked. It reimagines the RAGAS answer_relevancy metric as a learned model that captures topic alignment, completeness of address, and the absence of tangential content. An answer can be faithful to its context yet irrelevant if it talks about the wrong thing; this model catches that failure mode.

Model card

Version: v1 · Status: ready

Traits

Topic Alignment

Directly engages with the exact topic and entities in the question ↔ Drifts to adjacent topics or answers a different question than asked

Measures whether the response is about the same subject as the question. High-scoring text directly engages with the topic, entities, and concepts raised in the question. Low-scoring text drifts to adjacent but different topics, addresses a related question instead of the actual one, or provides information that would be relevant to a different query.

Completeness of Address

Addresses all parts and sub-questions raised in the input ↔ Cherry-picks easy parts, ignores harder aspects or sub-questions

Measures whether the response addresses all parts of the question, especially when the question contains multiple sub-questions or compound asks. High-scoring text covers each aspect raised. Low-scoring text cherry-picks the easiest part of the question while ignoring harder aspects, or provides a partial answer that leaves significant components unaddressed.

Directness of Answer

Provides a clear, committed answer that resolves the question's intent ↔ Offers related context without committing to an actual answer

Measures whether the response provides a clear answer rather than just related information. High-scoring text gives a definitive response that satisfies the question's intent. Low-scoring text provides tangentially related context, background, or "it depends" hedging without ever committing to an answer that resolves the question.

Focus

Everything included serves answering the question, no tangential padding ↔ Padded with background, caveats, or related information not asked for

Measures the proportion of the response that's relevant to the question versus tangential content. High-scoring text stays focused — everything included serves answering the question. Low-scoring text pads with background the user didn't ask for, unsolicited caveats, related-but-unrequested information, or general context that doesn't advance the answer.

Intent Match

Response shape matches question intent — steps for how, comparison for which ↔ Wrong response shape — essay for yes/no, definition for procedural question

Measures whether the response matches the question's intent (informational, procedural, comparative, yes/no) with the appropriate response type. High-scoring text recognizes that a "how do I" question needs steps, a "which is better" question needs a comparison, and a "what is" question needs a definition. Low-scoring text gives the wrong shape of answer, e.g., an essay when a yes/no was asked, or a definition when a procedure was needed.

About

The RAGAS answer_relevancy metric uses synthetic question generation and cosine similarity — clever but indirect. This model learns the signal directly: does the response align with the question's topic, cover all its parts, provide a committed answer, stay focused, and match the question's intent shape? An answer that's faithful to context but talks about the wrong thing still fails the user. This model catches that.
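For contrast, the similarity-based approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the RAGAS implementation and not how this model works: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the "generated" questions are hard-coded where an LLM would normally reverse-engineer them from the answer. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_relevancy(question: str, generated_questions: list[str]) -> float:
    """Mean cosine similarity between the real question and questions
    reverse-engineered from the answer (hard-coded below; an LLM would
    normally produce them)."""
    if not generated_questions:
        return 0.0
    q = embed(question)
    total = sum(cosine(q, embed(g)) for g in generated_questions)
    return total / len(generated_questions)


question = "How do I reset my router password?"
# Hypothetical questions an LLM might infer from two different answers.
on_topic = ["How can I reset the password on my router?"]
off_topic = ["What is the history of wireless networking?"]

print(similarity_relevancy(question, on_topic) > similarity_relevancy(question, off_topic))
```

The indirection is visible in the sketch: the answer itself is never compared with the question, only the questions inferred from it, which is the gap a directly learned signal is meant to close.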

Five traits capture what "relevant" means in practice: topic alignment (same subject?), completeness of address (all parts covered?), directness (actual answer vs. related context?), focus (proportion of relevant content?), and intent match (right response shape for the question type?).

Limitations

Doesn't assess factual correctness: a relevant but wrong answer scores well here. Questions with ambiguous intent may be legitimately addressable in multiple ways, so a response that answers one reasonable reading may still score lower than it deserves. Exploratory or clarifying responses ("did you mean X or Y?") may score lower on directness despite being appropriate. Best used alongside faithfulness and correctness checks.

Pairs well with

  • u22a8.faithfulness — relevancy + faithfulness covers the "right topic, right facts" requirement
  • u22a8.conciseness — focus and conciseness share the "no tangential padding" signal
  • u22a8.rag-anchored — together they cover the full RAG quality triad

Docs

  • Tiers and scoring — the per-trait trained boundaries between tiers
  • Breaks — where meaningful quality transitions occur

From your terminal

$ curl -s -d "your content here" \
    https://u22a8.ai/m/u22a8.answer-relevancy
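The same request can be sent from Python using only the standard library. This is a sketch under assumptions: it mirrors what `curl -d` does (a POST with the raw body and a form content type), and it returns the response body as plain text because the response format is not documented here. The helper names `build_request` and `score` are hypothetical.

```python
import urllib.request

# URL taken from the curl command above.
ENDPOINT = "https://u22a8.ai/m/u22a8.answer-relevancy"


def build_request(content: str) -> urllib.request.Request:
    """Build a POST that mirrors `curl -d`: raw body, form content type."""
    return urllib.request.Request(
        ENDPOINT,
        data=content.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def score(content: str, timeout: float = 30.0) -> str:
    """Send the content and return the raw response body.

    The response format is not assumed, so no parsing is attempted.
    """
    with urllib.request.urlopen(build_request(content), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example (performs a network call, so it is left commented out):
# print(score("your content here"))
```

Building the request separately from sending it keeps the network call in one place, which makes the payload easy to inspect before anything leaves the machine.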
A signal, not a verdict.