Calibration

Calibration is the training step that derives each trait's three breaks from the trait's scored training-sample distributions. Each trait is calibrated independently against its own distributions; there is no global scale.

§1 Definition

Calibration takes the scored positive and negative samples for a trait and computes the three break values that partition the 0–100 axis into four tiers. The break values are the quartile-derived thresholds defined by the breaks specification. Calibration runs once per trait, per training run, and produces the break values stored on the trained model.

§2 Mechanism

§2.1 Score the training samples

The trait's scoring parameters, fit earlier in the training run, are applied to the positive and negative training samples. Each sample receives a score on the 0–100 axis. The collection of scored positive samples forms the positive training distribution; the collection of scored negative samples forms the negative training distribution.
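The scoring step can be sketched as follows. `score_sample` and its linear parameters are illustrative stand-ins, since the actual scoring function is fit earlier in the training run and is not defined in this section; the sample values are invented.

```python
def score_sample(features, weights, bias):
    """Hypothetical linear scorer; the real scoring parameters are
    fit earlier in the training run. Output is clamped to 0-100."""
    raw = sum(w * x for w, x in zip(weights, features)) + bias
    return max(0.0, min(100.0, raw))

# Illustrative training samples: each is a small feature vector.
positive_samples = [[0.9, 0.8], [0.7, 0.9], [0.6, 0.5]]
negative_samples = [[0.2, 0.1], [0.3, 0.4], [0.1, 0.2]]
weights, bias = [60.0, 40.0], 0.0

# Scoring every sample yields the two training distributions.
pos_dist = [score_sample(s, weights, bias) for s in positive_samples]
neg_dist = [score_sample(s, weights, bias) for s in negative_samples]
```

Everything downstream of this step operates only on the two score lists, not on the raw samples.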

Figure 1. Training samples scored on the 0–100 axis for one trait, with the three break values (38, 62, 78) derived from their distributions. The same procedure runs independently for every trait in the model.

§2.2 Compute break values from quartiles

The three breaks are computed from the positive and negative distributions as defined in the breaks specification:

  • developing = 75th percentile of the negative training distribution
  • solid = 25th percentile of the positive training distribution, floored to developing
  • strong = 75th percentile of the positive training distribution, floored to solid

The floor operations enforce monotonicity: developing ≤ solid ≤ strong. When the positive and negative distributions overlap, the floors cause adjacent breaks to collapse, and scores in the overlap region receive null tier assignments and low confidence.
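The bullet points above can be sketched directly. The percentile choices follow the breaks specification as quoted here; the quartile estimator itself (Python's `statistics.quantiles` with its default exclusive interpolation) is an assumption, since this section does not name an interpolation rule.

```python
import statistics

def compute_breaks(neg_scores, pos_scores):
    """Derive the three breaks from the two training distributions.
    quantiles(..., n=4) returns the [25th, 50th, 75th] cut points."""
    developing = statistics.quantiles(neg_scores, n=4)[2]    # neg 75th
    solid = max(statistics.quantiles(pos_scores, n=4)[0],    # pos 25th,
                developing)                                  # floored
    strong = max(statistics.quantiles(pos_scores, n=4)[2],   # pos 75th,
                 solid)                                      # floored
    return developing, solid, strong
```

With well-separated distributions the floors are inactive; when the distributions overlap, the floors pull adjacent breaks together.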

§3 Per-trait calibration

Each trait in a model has its own positive and negative distributions and is calibrated from those distributions alone. A score of 65 may correspond to the solid break on one trait and the developing break on another because each trait's breaks are derived from a different distribution of training samples. Calibration does not impose a shared scale across traits.
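The absence of a shared scale can be made concrete with a small sketch. All break values here are illustrative, not outputs of a real training run.

```python
# Break values for two traits, each derived from its own distributions.
breaks = {
    "trait_a": {"developing": 38, "solid": 62, "strong": 78},
    "trait_b": {"developing": 55, "solid": 68, "strong": 85},
}

score = 65
# The same score clears trait_a's solid break but only trait_b's
# developing break: the axis is shared, the breaks are not.
past_solid_a = score >= breaks["trait_a"]["solid"]
past_solid_b = score >= breaks["trait_b"]["solid"]
```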

§4 Edge cases

§4.1 Small sample distributions

Quartile estimates from very few samples are unstable. A trait trained on a handful of samples per side produces quartiles that depend heavily on the specific samples chosen, and the derived breaks may shift substantially when a new sample is added. Stability improves as the per-side sample count grows; the effect on scoring is captured by confidence, which drops in regions where the distributions fail to separate reliably.
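The instability is easy to demonstrate with a hedged sketch: the sample values are invented, and the quartile estimator is Python's `statistics.quantiles`, which this section does not prescribe.

```python
import statistics

few = [20, 35, 50, 80]                          # a handful of scores
q75_before = statistics.quantiles(few, n=4)[2]  # 75th percentile
q75_after = statistics.quantiles(few + [90], n=4)[2]
shift = q75_after - q75_before
# Adding a single sample moves the estimated 75th percentile -- and
# with it the derived break -- by double-digit points.
```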

§4.2 Overlapping distributions

When the positive and negative distributions overlap on the score axis, the monotonicity floor causes one or more breaks to collapse. The collapsed band has zero width — no score can fall within it — and scores in that region are returned with a null tier label and low confidence. Calibration does not attempt to resolve overlap; the overlap is surfaced as reduced confidence rather than hidden.
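The collapse can be seen directly in a sketch with heavily overlapping distributions; the sample values are invented and the quartile estimator (`statistics.quantiles`) is an assumption.

```python
import statistics

# Heavily overlapping illustrative distributions.
neg = [30, 40, 50, 60, 70]
pos = [35, 45, 55, 65, 75]

developing = statistics.quantiles(neg, n=4)[2]              # neg 75th
solid = max(statistics.quantiles(pos, n=4)[0], developing)  # pos 25th,
                                                            # floored up
strong = max(statistics.quantiles(pos, n=4)[2], solid)      # pos 75th

# developing == solid: the floor collapsed the band between them to
# zero width, so no score can land inside it.
```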

§5 Related concepts

  • Breaks — the break values calibration produces.
  • Samples — the training input calibration scores to produce distributions.
  • Confidence — the signal that reports when the calibrated distributions fail to separate.

Scores are approximate — not a substitute for human judgment.