Samples

A sample is a content item paired with a label that indicates whether it exemplifies the trait being trained. Samples are the direct form of supervision and the common representation every other supervision form resolves to before training.

§1 Definition

A sample consists of two parts:

  • Content — the item being labeled. For text models, a passage. For other media, the analogous unit.
  • Label — a verdict identifying the sample as a positive or a negative example of the trait. A sample may carry per-trait labels when a model has multiple traits.

A training set is a collection of samples used to produce a model. Training derives the trait axes and the breaks placed by calibration from the contrast between positive and negative samples.
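The two-part structure above can be sketched as a minimal data model. This is an illustrative sketch, not an API from any library; the names `Sample` and `TrainingSet` are assumptions, and per-trait labels are represented as a mapping from trait name to verdict.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a sample is content plus per-trait labels.
@dataclass
class Sample:
    content: str
    labels: dict[str, bool]  # trait name -> True (positive) / False (negative)

# A training set is just a collection of samples, queryable per trait.
@dataclass
class TrainingSet:
    samples: list[Sample] = field(default_factory=list)

    def positives(self, trait: str) -> list[Sample]:
        return [s for s in self.samples if s.labels.get(trait) is True]

    def negatives(self, trait: str) -> list[Sample]:
        return [s for s in self.samples if s.labels.get(trait) is False]

ts = TrainingSet([
    Sample("Retries on 5xx up to three times with backoff.", {"specificity": True}),
    Sample("Handles errors gracefully.", {"specificity": False}),
])
assert len(ts.positives("specificity")) == 1
assert len(ts.negatives("specificity")) == 1
```

A single item can carry labels for several traits at once, which is how one item ends up positive for one trait and negative for another.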

Positive
  • "The function retries on 5xx responses up to three times with exponential backoff starting at 200 ms. 4xx responses are returned without retry."
  • "Deployments fail when the Postgres version on the replica is older than the primary. Upgrade the replica first, then the primary."

Negative
  • "The function handles errors gracefully and retries as needed."
  • "Make sure your environment is properly configured before deploying."

Figure 1. Positive and negative samples for a technical-writing trait. The positives name specific behaviors, values, and failure conditions; the negatives use generalities that do not commit to observable claims.

§2 Mechanism

§2.1 Label granularity

The simplest label is a binary verdict: positive or negative for the trait being trained. When a model has multiple traits, a sample may carry one label per trait, and a single item may be positive for one trait and negative for another. Continuous ratings (for example, a score on a Likert scale) are also accepted and treated as graded intensity on the trait axis; calibration uses whichever label resolution the training set provides.
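One way to picture how mixed label resolutions can coexist is to normalize them to a graded score. This is a sketch under assumptions: the function name and the mapping (binary verdicts to the endpoints, Likert ratings rescaled linearly) are illustrative, not a specification of how calibration actually consumes labels.

```python
# Sketch: resolve a binary verdict or a Likert rating to a score in [0, 1].
# The bool check must come first, since bool is a subclass of int in Python.
def to_graded(label, scale=(1, 5)):
    if isinstance(label, bool):
        return 1.0 if label else 0.0
    lo, hi = scale
    return (label - lo) / (hi - lo)

assert to_graded(True) == 1.0   # positive verdict -> top of the axis
assert to_graded(False) == 0.0  # negative verdict -> bottom
assert to_graded(3) == 0.5      # midpoint of a 1-5 Likert scale
```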

§2.2 Contrast is the training signal

A trait axis is fit from the difference between positive and negative samples, not from either set in isolation. A training set with only positive samples or only negative samples cannot train a trait. The minimum functional training set has at least one positive and one negative sample per trait; stable calibration requires enough samples on each side to produce quartile estimates that reflect a real distribution.
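One simple way to realize "fit from the difference" is a difference of group means over embedded samples. This is a minimal sketch, assuming samples are already embedded as vectors; the actual fitting procedure is not specified here, and the toy vectors are illustrative.

```python
# Sketch: a trait axis as the difference between the positive-group mean
# and the negative-group mean. Neither group alone defines the axis.
def trait_axis(pos_vecs, neg_vecs):
    dim = len(pos_vecs[0])
    pos_mean = [sum(v[i] for v in pos_vecs) / len(pos_vecs) for i in range(dim)]
    neg_mean = [sum(v[i] for v in neg_vecs) / len(neg_vecs) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]

# The groups differ only in the first dimension, so the axis isolates it.
axis = trait_axis([[2.0, 1.0], [4.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
assert axis == [2.0, 0.0]
```

Note that a dimension on which the two groups agree contributes nothing to the axis, which is the geometric form of the claim that neither set in isolation carries the signal.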

§3 Properties of a usable training set

§3.1 Coherence within each label

Positive samples are treated as instances of the same trait. If positives include items that exemplify the trait for different reasons — one because it is specific, another because it is concise, another because it is well-structured — training fits a trait axis that spans those reasons rather than isolating any one. The resulting trait is less selective than the labeler intended. The same holds for negatives.

§3.2 Contrast between labels

Positive and negative samples should differ along the trait that is being labeled. If the most visible difference between the two groups is an unrelated attribute — length, topic, formality, source — training fits on that attribute rather than the intended trait. Coherent labeling on one axis at a time produces a trait that isolates that axis.
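A cheap sanity check for one such confound is to compare a surface attribute across the two groups before training. The sketch below checks length only; the function name and the 2x ratio threshold are assumptions, and a real check would cover other attributes (topic, formality, source) as well.

```python
# Sketch: flag a training set whose positive and negative groups differ
# sharply in average length, a common unintended contrast.
def length_confound(positives, negatives, ratio_threshold=2.0):
    pos_len = sum(len(s.split()) for s in positives) / len(positives)
    neg_len = sum(len(s.split()) for s in negatives) / len(negatives)
    ratio = max(pos_len, neg_len) / min(pos_len, neg_len)
    return ratio >= ratio_threshold  # True means the groups differ on length

assert length_confound(["a b c d e f g h"], ["a b"])  # 8 vs 2 words: flagged
assert not length_confound(["a b c"], ["d e f"])      # same length: clean
```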

§3.3 Size

More samples tighten the training distributions and produce more stable breaks. Below a handful of samples per side, quartile estimates become unstable and the breaks can collapse — the downstream effect is low confidence in the overlap region. Above a few dozen per side, further samples primarily sharpen the distribution's tails rather than changing where the breaks fall.
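The instability of small-group quartiles can be demonstrated directly: draw repeated groups of a given size and measure how much the first-quartile estimate moves between draws. This is an illustrative simulation, not a calibration procedure; the distribution, seed, and group sizes are assumptions.

```python
import random
import statistics

random.seed(0)

# Sketch: spread of the first-quartile estimate across repeated draws.
# Small groups give quartiles that jump around; larger groups settle.
def quartile_spread(n, draws=200):
    q1s = []
    for _ in range(draws):
        xs = [random.gauss(0, 1) for _ in range(n)]
        q1s.append(statistics.quantiles(xs, n=4)[0])  # first quartile
    return statistics.stdev(q1s)

small, large = quartile_spread(5), quartile_spread(40)
assert small > large  # more samples per side -> more stable break placement
```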

§4 Edge cases

§4.1 Ambiguous items

An item that the labeler cannot confidently call positive or negative is a source of noise when labeled arbitrarily. Such items are better excluded from the training set or labeled on a continuous scale so their intermediate status is represented rather than collapsed to one side.

§4.2 Duplicates and near-duplicates

Identical or near-identical samples skew the distribution estimates toward the region they occupy. They add no new information about that region, so they inflate the apparent sample count while the effective sample count stays smaller. Deduplication before training is recommended.
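A minimal deduplication pass might drop exact duplicates by normalized text and near-duplicates by token-set overlap. This is a sketch: the Jaccard similarity measure and the 0.9 threshold are assumptions, not prescribed values, and real pipelines often use stronger near-duplicate detection.

```python
# Sketch: keep the first occurrence of each sample; drop later samples whose
# token-set Jaccard similarity to a kept sample meets the threshold.
def dedup(samples, threshold=0.9):
    kept, kept_tokens = [], []
    for s in samples:
        toks = set(s.lower().split())
        if any(len(toks & t) / len(toks | t) >= threshold for t in kept_tokens):
            continue  # near-duplicate of an already-kept sample
        kept.append(s)
        kept_tokens.append(toks)
    return kept

batch = ["Retry on 5xx errors.", "retry on 5xx errors.", "Upgrade the replica first."]
assert dedup(batch) == ["Retry on 5xx errors.", "Upgrade the replica first."]
```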

§5 Related concepts

  • Supervision — the umbrella of input forms; samples are one of them and the representation the others resolve to.
  • Briefs — the articulated form, useful when direct samples are not available.
  • Calibration — the step that places breaks from the sample distribution.
  • Trait discovery — the step that proposes traits from a set of samples when traits are not declared up front.