A sample is a content item paired with a label that indicates whether it exemplifies the standard. Samples are the direct form of supervision and the common representation every other supervision form resolves to before training.
A sample consists of two parts: the content item itself and the label attached to it.
A training set is a collection of samples used to produce a model. Training derives the trait axes and the breaks placed by calibration from the contrast between positive and negative samples.
The simplest label is a binary verdict: positive or negative for the trait being trained. When a model has multiple traits, a sample may carry one label per trait, and a single item may be positive for one trait and negative for another. Continuous ratings (for example, a score on a Likert scale) are also accepted and treated as graded intensity on the trait axis; calibration uses whichever label resolution the training set provides.
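The label forms above can be sketched as a small data structure. This is an illustrative shape only; the `Sample` name and the `labels` field are assumptions, not part of any fixed format:

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    content: str
    # One entry per trait. A bool is a binary verdict; a float in [0, 1]
    # is a graded intensity (e.g. a rescaled Likert rating).
    labels: dict = field(default_factory=dict)

# A single item may be positive for one trait and negative for another,
# with a continuous rating on a third.
s = Sample(
    content="Short, precise answer with a worked example.",
    labels={"specificity": True, "formality": False, "brevity": 0.75},
)
```

Whichever resolution a trait's labels use, calibration consumes them as given; binary and continuous labels can coexist across traits in one sample.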
A trait axis is fit from the difference between positive and negative samples, not from either set in isolation. A training set with only positive samples or only negative samples cannot train a trait. The minimum functional training set has at least one positive and one negative sample per trait; stable calibration requires enough samples on each side to produce quartile estimates that reflect a real distribution.
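The contrast requirement reduces to a simple check. A minimal sketch, assuming samples are plain dicts of per-trait binary labels (an illustrative format, not a fixed one):

```python
def trait_is_trainable(samples, trait, min_per_side=1):
    """A trait axis needs contrast: at least one positive and one
    negative sample carrying a label for that trait."""
    pos = sum(1 for labels in samples if labels.get(trait) is True)
    neg = sum(1 for labels in samples if labels.get(trait) is False)
    return pos >= min_per_side and neg >= min_per_side

samples = [
    {"specificity": True},
    {"specificity": True},
    {"specificity": False},
    {"brevity": True},        # no negative for "brevity" yet
]
trait_is_trainable(samples, "specificity")  # True: both sides present
trait_is_trainable(samples, "brevity")      # False: positives only
```

Raising `min_per_side` turns the same check into a stability gate rather than a bare functionality gate.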
Positive samples are treated as instances of the same trait. If positives include items that exemplify the trait for different reasons — one because it is specific, another because it is concise, another because it is well-structured — training fits a trait axis that spans those reasons rather than isolating any one. The resulting trait is less selective than the labeler intended. The same holds for negatives.
Positive and negative samples should differ along the trait that is being labeled. If the most visible difference between the two groups is an unrelated attribute — length, topic, formality, source — training fits on that attribute rather than the intended trait. Coherent labeling on one axis at a time produces a trait that isolates that axis.
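One cheap sanity check for this failure mode, sketched here with length as the unrelated attribute (the 1.5 ratio cutoff is an arbitrary assumption for illustration):

```python
import statistics

def length_confound(pos_texts, neg_texts, ratio=1.5):
    """Flag training sets whose positives and negatives differ sharply
    in mean length; the fitted axis may track length, not the trait."""
    pos_len = statistics.mean(len(t) for t in pos_texts)
    neg_len = statistics.mean(len(t) for t in neg_texts)
    longer, shorter = max(pos_len, neg_len), min(pos_len, neg_len)
    return longer / shorter >= ratio

pos = ["a long, detailed, carefully qualified explanation of the point"]
neg = ["no", "maybe"]
length_confound(pos, neg)  # flags the large length gap between groups
```

Analogous checks apply to topic, formality, or source; the point is to compare the groups on attributes other than the labeled trait before training.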
More samples tighten the training distributions and produce more stable breaks. Below a handful of samples per side, quartile estimates become unstable and the breaks can collapse — the downstream effect is low confidence in the overlap region. Above a few dozen per side, further samples primarily sharpen the distribution's tails rather than changing where the breaks fall.
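To make the fragility concrete, a per-side quartile estimate might look like the sketch below. The scores, the four-sample floor, and the use of quartiles as break inputs are assumptions for illustration; the projection that produces the scores is out of scope here:

```python
import statistics

def side_quartiles(scores):
    """Quartile cut points for one side of a trait's training
    distribution. Below a handful of samples the estimates are
    unstable, so this sketch refuses to compute them at all."""
    if len(scores) < 4:
        raise ValueError("too few samples for stable quartile estimates")
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return q1, q2, q3

pos_scores = [0.62, 0.70, 0.74, 0.81, 0.88, 0.93]
neg_scores = [0.11, 0.18, 0.25, 0.33, 0.40, 0.52]
side_quartiles(pos_scores)  # three ordered cut points for the positive side
side_quartiles(neg_scores)  # three ordered cut points for the negative side
```

With a few dozen scores per side, adding more mostly refines the tails; the cut points themselves move very little.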
An item that the labeler cannot confidently call positive or negative is a source of noise when labeled arbitrarily. Such items are better excluded from the training set or labeled on a continuous scale so their intermediate status is represented rather than collapsed to one side.
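A sketch of the exclusion option, assuming each item records a labeler confidence (the field name and the 0.8 cutoff are illustrative assumptions):

```python
def filter_confident(items, cutoff=0.8):
    """Keep only items the labeler could call confidently; the rest
    are excluded rather than labeled arbitrarily."""
    return [it for it in items if it["confidence"] >= cutoff]

items = [
    {"text": "clearly positive", "label": True, "confidence": 0.95},
    {"text": "hard to call", "label": True, "confidence": 0.50},
]
filter_confident(items)  # keeps only the confidently labeled item
```

The alternative is to keep such items but give them an intermediate continuous label, so their ambiguity is represented rather than collapsed to one side.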
Identical or near-identical samples skew the distribution estimates toward the region they occupy. They do not reduce uncertainty in that region; they shrink the effective sample count. Deduplication before training is recommended.
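A minimal deduplication pass, using normalized-text hashing. This only catches trivial variants (case and whitespace); a real pipeline would use fuzzier matching such as shingling or MinHash:

```python
def dedupe(texts):
    """Drop samples whose normalized text has already been seen,
    keeping the first occurrence of each."""
    seen, kept = set(), []
    for t in texts:
        key = " ".join(t.lower().split())  # collapse case and whitespace
        if key not in seen:
            seen.add(key)
            kept.append(t)
    return kept

dedupe(["A sample.", "a  sample.", "Another sample."])  # drops the near-copy
```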