u22a8.ai·scoring·catalog
Models
A model scores content along a handful of traits — each one is its own standard, trained from labelled examples. Different models measure different things.
20 models · search by name or trait — or paste content to rerank by what fits it best
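Searching by name or trait can be pictured as a case-insensitive substring match over each model's identifier and trait list. The following is a minimal sketch under that assumption: the `CATALOG` mapping and the `search` function are hypothetical names for illustration, not the site's implementation, and only three entries from the catalog are copied in.

```python
# Hypothetical in-memory catalog: model identifier -> trait names.
# The entries are copied from this page; the structure is an assumption.
CATALOG = {
    "u22a8.faithfulness": [
        "Claim Support", "Hedging Calibration",
        "Inference Validity", "Scope Respect",
    ],
    "u22a8.changelog-entry": [
        "Actionability", "Motivation", "Scoping",
        "Specificity", "User Impact Focus",
    ],
    "u22a8.peer-congratulation": [
        "Forward-Looking", "Personal Connection",
        "Specificity", "Warmth Without Excess",
    ],
}


def search(query: str) -> list[str]:
    """Return identifiers whose name or any trait contains the query."""
    q = query.strip().lower()
    return sorted(
        name
        for name, traits in CATALOG.items()
        if q in name.lower() or any(q in t.lower() for t in traits)
    )


print(search("specificity"))  # ['u22a8.changelog-entry', 'u22a8.peer-congratulation']
print(search("faith"))        # ['u22a8.faithfulness']
```

Matching on traits as well as names is what lets one query surface several models that share a trait such as Specificity.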
answer relevancy
u22a8.answer-relevancy
Measures whether a response actually addresses the question that was asked. The RAGAS answer_relevancy metric reimagined as a learned model: it captures topic alignment, completeness of address, and absence of tangential content. An answer can be faithful to context yet irrelevant if it talks about the wrong thing. This model catches that failure mode.
Completeness of Address · Directness of Answer · Focus · Intent Match · Topic Alignment
changelog entry
u22a8.changelog-entry
Measures: Actionability, Motivation, Scoping, Specificity, User Impact Focus
Actionability · Motivation · Scoping · Specificity · User Impact Focus
cold outreach opener
u22a8.cold-outreach-opener
Scores the quality of a cold outreach opener: whether it demonstrates genuine research on the recipient, offers a specific reason for reaching out, and earns the right to a reply without resorting to templates or manipulation.
Ask Clarity · Authenticity · Brevity · Relevance Bridge · Research Signal
commit message
u22a8.commit-message
Measures: Actionable Summary, Context Sufficiency, Intent Clarity, Scope Precision, Signal Density
Actionable Summary · Context Sufficiency · Intent Clarity · Scope Precision · Signal Density
compelling readme
u22a8.compelling-readme
Measures: Concrete Usage Demonstration, Copy-Pasteable Setup, Hook Speed, Hype-Free Credibility, Problem Framing, Progressive Disclosure, Structural Scannability, Value Proposition Clarity
Concrete Usage Demonstration · Copy-Pasteable Setup · Hook Speed · Hype-Free Credibility · Problem Framing · Progressive Disclosure · Structural Scannability · Value Proposition Clarity
conciseness
u22a8.conciseness
Measures whether text communicates efficiently without unnecessary padding. Targets the specific verbosity patterns that plague LLM output: preambles, question-restating, hedging, meta-commentary, filler transitions, and redundant restatement. Based on the Phoenix/Arize conciseness evaluator and the ConCISE (2025) framework: information density over raw word count.
Hedge Absence · Information Density · Preamble Absence · Repetition Absence · Structural Efficiency
crisis comms
u22a8.crisis-comms
Scores the quality of crisis communication: whether it takes ownership, scopes the impact honestly, and provides clear next steps, rather than deflecting or hiding behind corporate platitudes.
Next Steps Clarity · Ownership · Platitude Absence · Scope Honesty · Update Cadence Commitment
customer support response
u22a8.customer-support-response
Measures: Empathy & Acknowledgment, Expectation Setting, Personalization, Resolution Specificity, Tone Calibration
Empathy & Acknowledgment · Expectation Setting · Personalization · Resolution Specificity · Tone Calibration
developer landing page
u22a8.developer-landing-page
Measures: Developer Voice, Honest Scope, Path Clarity, Show Don't Tell, Technical Credibility, Time to Understanding, Zero Friction Try
Developer Voice · Honest Scope · Path Clarity · Show Don't Tell · Technical Credibility · Time to Understanding · Zero Friction Try
faithfulness
u22a8.faithfulness
Measures whether every claim in a response is supported by the provided context. The RAGAS faithfulness metric as a learned model: it detects when LLMs hallucinate beyond their source material, add unsupported details, or confabulate facts not present in the retrieved documents. Operates at the claim level: a response with 5 claims where 1 is unsupported should score lower than one where all are grounded.
Claim Support · Hedging Calibration · Inference Validity · Scope Respect
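The claim-level example above (5 claims, 1 unsupported) can be made concrete with the ratio the original RAGAS faithfulness metric uses: supported claims divided by total claims. This is a minimal sketch of that baseline arithmetic only, not the learned model's scoring, which this page does not describe. The `Claim` type and `faithfulness_ratio` function are hypothetical names, and the `supported` flags are set by hand rather than produced by any verifier.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    supported: bool  # assumption: judged against the context elsewhere


def faithfulness_ratio(claims: list[Claim]) -> float:
    """Fraction of claims that are supported by the provided context."""
    if not claims:
        raise ValueError("need at least one claim")
    return sum(c.supported for c in claims) / len(claims)


# Five claims, all grounded in the context.
grounded = [Claim(f"claim {i}", True) for i in range(5)]
# Same response, but the fifth claim is not supported.
one_unsupported = grounded[:4] + [Claim("claim 4", False)]

print(faithfulness_ratio(grounded))         # 1.0
print(faithfulness_ratio(one_unsupported))  # 0.8
```

A single unsupported claim out of five lowers the ratio from 1.0 to 0.8, which is the ordering the description calls for.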
humor
u22a8.humor
Measures whether text is genuinely funny: not just attempting humor, but landing it. Evaluates the mechanics that make comedy work: surprise, economy, specificity, and structural craft. Replaces LLM-as-judge humor scoring (Braintrust autoevals, Phoenix) with a learned model that captures what separates a laugh from a groan.
Comic Economy · Incongruity & Surprise · Comedic Originality · Specificity of Reference · Tonal Control
peer congratulation
u22a8.peer-congratulation
Scores the quality of a peer congratulation message: whether it's specific and warm enough to land as genuine, rather than reading as a perfunctory LinkedIn reflex.
Forward-Looking · Personal Connection · Specificity · Warmth Without Excess
postmortem ref
u22a8.postmortem-ref
Measures: Blamelessness, Impact Transparency, Remediation Commitment, Root Cause Depth, Timeline Specificity
Blamelessness · Impact Transparency · Remediation Commitment · Root Cause Depth · Timeline Specificity
prospect research note
u22a8.prospect-research-note
Scores the quality of a prospect research note: whether it surfaces signal-bearing observations that inform outreach, rather than restating generic profile information.
Observation Depth · Open Questions · Outreach Utility · Signal Over Noise
puns
u22a8.puns
Measures pun quality along the dimensions that distinguish a Jimmy Carr line from a strained, telegraphed, or over-explained attempt. Built on the bisociation theory of humor (Koestler), the Script-Based Semantic Theory of Humor (Raskin / Attardo), and the empirical finding from Kao, Levy & Goodman that distinctiveness (each meaning anchored to different parts of the carrier sentence) separates great puns from merely ambiguous ones. Operates at the carrier-sentence level: a pun where every word earns its place, the sound-bridge is recognizable but not identical, and the resolution feels both surprising and retrospectively inevitable should score higher than one whose setup announces the joke or whose punchline restates rather than transforms.
Bisociation Strength · Carrier Naturalness · Distinctiveness · Economy · Phonological Proximity · Resolution Inevitability
rag anchored
u22a8.rag-anchored
Measures whether a response is grounded in retrieved context versus floating on model knowledge. Distinct from faithfulness (claim-level accuracy): this is about style and posture. A grounded response reads like it was written by someone who just read the sources; an unanchored response reads like a model generating from training data with context as decoration. Targets the "context was retrieved but ignored" failure mode in RAG systems.
Context Coverage · Context-Specific Detail · Context Vocabulary Uptake · Source Attribution Posture · Source Engagement
retention message
u22a8.retention-message
Measures: Commitment Specificity, Dignity Preservation, Offer Relevance, Ownership of Failure, Value Reaffirmation
Commitment Specificity · Dignity Preservation · Offer Relevance · Ownership of Failure · Value Reaffirmation
specificity
u22a8.specificity
Measures how concrete and specific text is versus generic LLM-style prose. The core signal that separates human-distinctive writing from AI-generated filler: proper nouns, numbers, dates, named examples, particular behavioral details. Replaces vibe-check "does this sound like AI?" with a learned model that captures the linguistic markers of specificity across domains.
Concrete Reference Density · Example Concreteness · Particular Detail · Quantification · Voice Distinctiveness
sycophancy
u22a8.sycophancy
Detects sycophantic behavior in AI-generated text: gratuitous validation, opinion-matching, and performative helpfulness that prioritizes pleasing the user over being truthful or direct. Based on the Sharma et al. 2023 (Anthropic/ICLR 2024) taxonomy of sycophancy types. Replaces prompted judges with a learned detector that catches the subtle patterns RLHF trains into language models.
Willingness to Correct · Directness of Communication · Gratuitous Validation · Opinion Independence · Proportional Enthusiasm
technical writing
u22a8.technical-writing
Measures: Actionable Takeaways, Grounded Motivation, Honest Specificity, Incremental Complexity, Narrative Throughline, Progressive Concreteness
Actionable Takeaways · Grounded Motivation · Honest Specificity · Incremental Complexity · Narrative Throughline · Progressive Concreteness