Confidence is a three-level signal (high, moderate, low) returned alongside every trait score. It expresses how reliably the model can distinguish content in the region of the 0–100 axis where the score fell, based on how well the positive and negative training distributions separate there. Confidence is distinct from both the score and the tier: it describes the quality of the measurement, not the quality of the content. It is a property of the training distributions, not of the content being scored.
| Level | Distribution condition at the score's region |
|---|---|
| High | Score falls clearly within one of the two training distributions (positive or negative). |
| Moderate | Score falls between the two training distributions, in the gap where neither distribution is dense. |
| Low | The two training distributions overlap at the score's region — the model cannot discriminate reliably. |
Table 1. The three confidence levels and the distribution condition each indicates.
At training time, the trait's positive and negative samples produce two distributions on the 0–100 axis. At scoring time, confidence is computed from how those distributions behave near the score: cleanly separated, gapped, or overlapping. No external calibration is applied; the signal is entirely determined by the training data and the resulting breaks.
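The mapping from distribution behavior to confidence level can be sketched as follows. This is a minimal illustration, assuming each training distribution is summarized by its observed score range; the function name and the range-based test are illustrative, not the actual implementation.

```python
def confidence(score, pos_scores, neg_scores):
    """Return 'high', 'moderate', or 'low' for a score on the 0-100 axis."""
    pos_lo, pos_hi = min(pos_scores), max(pos_scores)
    neg_lo, neg_hi = min(neg_scores), max(neg_scores)

    in_pos = pos_lo <= score <= pos_hi
    in_neg = neg_lo <= score <= neg_hi

    if in_pos and in_neg:
        return "low"        # the two distributions overlap at the score's region
    if in_pos or in_neg:
        return "high"       # score falls clearly inside one distribution
    return "moderate"       # score falls in the gap between the two
```

Note that no external calibration enters this computation: the answer depends only on where the score sits relative to the two training samples.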
For a score card with multiple traits, the composite's confidence is the minimum per-trait confidence — the weakest trait's confidence becomes the composite's. A composite score inherits the reliability of its least reliable input. See composite.
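The minimum rule above reduces to taking the lowest level under the ordering low < moderate < high. A minimal sketch, with illustrative names:

```python
# Ordering of confidence levels: low < moderate < high.
CONFIDENCE_ORDER = {"low": 0, "moderate": 1, "high": 2}

def composite_confidence(trait_confidences):
    """The weakest per-trait confidence becomes the composite's confidence."""
    return min(trait_confidences, key=CONFIDENCE_ORDER.__getitem__)
```

For example, a score card with traits at high, moderate, and high confidence yields a composite confidence of moderate.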
Tier and confidence both come from the same training distributions and therefore correlate. A trait's tier describes where on the axis the score sits; confidence describes how reliably the model can place a score at that position. The two signals answer different questions but are not independent — scores in different regions of the axis receive different confidence levels by construction:
| Tier | Confidence | Why |
|---|---|---|
| Strong | High | Inside the positive distribution's upper tail. |
| Solid | High | Inside the positive distribution's inter-quartile range. |
| Developing | Moderate | In the gap between the two distributions. |
| Weak | High | Inside the negative distribution. |
| null | Low | Distributions overlap — no tier assigned. |
Table 2. How tier and confidence pair for well-separated training distributions. Weak content carries high confidence for the same reason Strong content does: the model has direct training signal for content in that region.
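The pairing in Table 2 can be sketched as a single pass over the axis. The region boundaries passed in here (a simplified positive range with an upper-tail cutoff at `pos_q3`, and a negative range) are illustrative assumptions, as is the fallback for a score outside every modeled region; the real cutoffs come from the training distributions themselves.

```python
def tier_and_confidence(score, pos_lo, pos_q3, pos_hi, neg_lo, neg_hi):
    """Return (tier, confidence) for well-separated training distributions."""
    # Weak: inside the negative distribution -> high confidence.
    if neg_lo <= score <= neg_hi:
        return ("Weak", "high")
    # Strong: upper tail of the positive distribution -> high confidence.
    if pos_q3 < score <= pos_hi:
        return ("Strong", "high")
    # Solid: inside the positive distribution, below its upper tail.
    if pos_lo <= score <= pos_q3:
        return ("Solid", "high")
    # Developing: the gap between the two distributions -> moderate confidence.
    if neg_hi < score < pos_lo:
        return ("Developing", "moderate")
    # Illustrative fallback: no tier, low confidence.
    return (None, "low")
```

The sketch makes the correlation concrete: Weak and Strong land in regions with direct training signal and so share high confidence, while Developing sits in the gap and carries moderate confidence by construction.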
A low-confidence score does not indicate that the content is borderline or problematic. It indicates that the model's training data does not cleanly separate at the score's region — a property of the model, not of the content. Two pieces of content with similar scores may carry different confidence levels if one score falls inside a dense training region and the other falls near a distributional overlap.
Low confidence implies that the breaks in the score's region cannot be placed reliably. Because the tier label and headroom both depend on break positions, they are returned as null when confidence is low. The raw score is still returned.
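The return shape this implies can be sketched as below: when confidence is low, tier and headroom come back null while the raw score survives. Field and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraitResult:
    score: int
    confidence: str
    tier: Optional[str] = None      # null when confidence is low
    headroom: Optional[int] = None  # null when confidence is low

def build_result(score, confidence, tier, headroom):
    if confidence == "low":
        # Break positions are unreliable here, so tier and headroom
        # (which both depend on them) are suppressed.
        return TraitResult(score=score, confidence="low")
    return TraitResult(score=score, confidence=confidence,
                       tier=tier, headroom=headroom)
```

A caller always receives a usable raw score; only the break-dependent fields degrade to null.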
A Developing score is, by construction, in the gap between the training distributions. That region produces moderate confidence. The two signals therefore co-occur and carry complementary information: the tier reports the position, the confidence reports the gap.