U⊨22A8
new
models
Use ⊨ in your editor
Claude Code plugin Score and iteratively improve content until traits converge.
$ claude mcp add \ --transport http \ u22a8 https://u22a8.ai/mcp
mcp.json — any MCP client
{ "mcpServers": { "u22a8": { "type": "http", "url": "https://u22a8.ai/mcp" } }}
Understanding scores
Models What a model is and why it matters. Score card Anatomy of a score — what every element means. Traits Independent dimensions, each scored separately. Tiers Weak, Developing, Solid, Strong. Confidence How reliable is this score.
Integrate
REST API The main integration. HTTP, JSON, one endpoint. MCP Any MCP client, scored from the editor. GitHub Action Score docs on every pull request.
research
punchlines
U⊨22A8
⊨ Models
Browse models Public preview catalog Compare models Side by side
⊨ Plugins
Claude Code plugin Score & improve from your editor view all plugins →
⊨ Docs
Models What a model is Score card Anatomy of a score Traits Dimensions of quality Tiers Weak to Strong Confidence How reliable is this score
⊨ Integrate
REST API Main integration — HTTP/JSON MCP Connect any MCP client GitHub Action Score docs on every PR
⊨ Research
qed-bench Benchmarks against task-appropriate baselines
punchlines
U⊨22A8 · built by @onebit0fme · Terms · Privacy

Research

Benchmarks, methodology, and the raw artifacts behind them. We publish work here when the comparisons are reproducible end-to-end and the failure modes are stateable.

  • Benchmarks May 5, 2026

    qed-bench: benchmarking small scoring models against task-appropriate baselines

    We trained scoring models on four content-judgment tasks — holistic essay quality, SMS spam, AI-vs-human authorship, and LLM authorship attribution — and compared each one to its task-appropriate baseline: trained human raters, gold labels, or an eight-model LLM-as-judge panel. Notebooks, models, and per-judge artifacts at github.com/u22a8/qed-bench.

    Read →
← back to the landing
U⊨22A8 · built by @onebit0fme · Terms · Privacy