What makes a pun land?

Semantic surprise, comedic density, absurd imagery — scored 0–100 with a straight face.

Type a pun. Get judged.

try another bad one

What do the scores mean? → · sample puns are AI-generated

From your terminal

Judge puns programmatically. Because someone has to.

$ echo "Why do cows have hooves? Because they lactose." | \
    curl -s -d @- -H "Accept: text/plain" \
    https://u22a8.ai/m/u22a8.puns

Model card

Version: v1 · Status: ready

Traits

Phonological Proximity

Sound-bridge between meanings is recognizable and effortless ↔ Sound-bridge is strained, distant, or imperceptible

Measures the sound-bridge between the two activated meanings. The empirical Goldilocks zone (Hempelmann; Fleischhacker) sits just inside the threshold where target recovery is effortless but the connection still feels discovered rather than mechanical. High-scoring text has a sound link the listener catches without working — perfect homophones or close paronyms in natural use. Low-scoring text either requires cognitive stretching to bridge the sounds (strained substitution) or relies on a pivot word so identical to itself that no surprise registers.

Bisociation Strength

Two opposed meanings both genuinely activated on the same string ↔ One meaning dominates or the second meaning feels arbitrary

Measures the dual condition Raskin specifies for a pun to function: two semantic scripts must coexist on the same string, must oppose in a meaningful way (life/death, real/unreal, normal/abnormal, literal/figurative), and must both plausibly fit. High-scoring text activates two genuinely distinct frames of reference that collide at the pivot. Low-scoring text either fails to activate a second meaning, activates a second meaning that doesn't meaningfully oppose the first, or pairs meanings with no plausible co-existence on the text.

Distinctiveness

Each meaning anchored to different carrier elements ↔ One word carries the joke alone with no contextual support

Measures whether each of the two activated meanings is anchored to a different element of the carrier sentence — the property Kao, Levy & Goodman (Cognitive Science, 2015) identify as the strongest computational predictor of fine-grained funniness among puns. High-scoring text has clear contextual scaffolding for each interpretation, with different words supporting each. Low-scoring text leans on the pivot word to do all the work — either one meaning is asserted alone or both meanings hang off the same lexical anchor with no surrounding support.

Carrier Naturalness

Setup reads as ordinary speech, no telegraphing ↔ Setup announces wordplay is coming

Measures whether the setup reads as ordinary speech that would survive removal of the pun — what practitioners call the plausible-deniability principle. High-scoring text contains no signposting ("I heard a saying about…", "I'm reading a book about…", "There's a joke that goes…"); the audience doesn't know wordplay is coming until it lands. Low-scoring text telegraphs the joke through phrasing the listener recognizes as pun-setup template, putting them in wordplay-detection mode before the pivot.

Resolution Inevitability

Surprising on first read and inevitable in retrospect ↔ Either telegraphed or arbitrary, no aha resolution

Measures the retrospective sense that the punchline is both surprising on first read and, in retrospect, the only possible ending — what Koestler describes as the sudden emergence of a new synthesis through bisociation, and what comedians call inevitable surprise. High-scoring text produces a satisfying click of recognition where the second meaning feels earned by the setup. Low-scoring text either lands the pivot before the listener experiences any surprise (telegraphed) or lands a connection that feels arbitrary rather than discovered (the absurd-pathway groan of failure rather than the affectionate groan of success).

Economy

Irreducible — every word earns its place ↔ Bloated setup or over-explained pivot

Measures whether every word in the carrier sentence earns its place — what Lederer calls the greatest possible pressure per square syllable of language. High-scoring text is irreducible: cutting any single word diminishes the joke. Low-scoring text has bloated setup, restated premises, or explicit explanation of the wordplay after the pivot lands (the over-explained pun is the canonical failure mode of amateur punning).

About

Scores the quality of a pun against the dimensions that distinguish a laureate one-liner from a strained, telegraphed, or over-explained attempt.

A pun is a single linguistic pivot that activates two distinct meanings. Almost every adult can recognise that a string is a pun; very few can say why one pun lands and another flops. This model decomposes the "why" into six axes drawn from the cognitive theory of humor, the linguistic taxonomy of paronomasia, and the working theory of comedians who specialise in the form.

The failure mode this catches: prolific generators (LLMs, joke books, cracker manufacturers) producing wordplay that is technically correct but feels forced — the carrier sentence reads as a pun-shaped object rather than as natural language that happens to carry a second meaning. High score means the pivot was discovered in the language, not imposed on it.

What this model measures

Phonological proximity — the sound-bridge between the two activated meanings sits in the Goldilocks zone: recognizable on first hearing, not so identical that no surprise registers, not so distant that the connection feels stretched.
Bisociation strength — two opposed semantic scripts coexist on the same string and both plausibly fit. The collision is real, not decorative.
Distinctiveness — each of the two meanings is anchored to a different element of the carrier sentence; the pivot word doesn't do all the work alone.
Carrier naturalness — the setup reads as ordinary speech and contains no signposting ("I heard a saying about…", "I'm reading a book about…"); the audience doesn't know wordplay is coming until it lands.
Resolution inevitability — the punchline is both surprising on first read and, in retrospect, the only possible ending. The affectionate groan of success, not the dismissive groan of failure.
Economy — every word earns its place. Cutting any single word diminishes the joke; no over-explanation of the wordplay after the pivot lands.

Usage

Send the pun as the request body to /m/u22a8.puns. No special format. Punctuation and capitalisation matter — they shape the carrier rhythm and signal sentence boundaries the model uses to assess setup vs. pivot.

Single-sentence one-liners and short setup/punchline pairs are the model's home territory. Longer extended-pun routines (90-second Punniest-of-Show pieces) work but trade per-sample resolution for breadth.

Limitations

Text only. Delivery, timing, intonation, pause length, and audience read are core to live-performance punning and entirely invisible to the model. A Stewart Francis line scored cold reads differently from one heard live.
English-language pun conventions. The model is calibrated to English-language paronomasia and will not transfer cleanly to Japanese dajare, French calembour, or other languages whose phonological and orthographic systems open different pun pathways.
Visual and multimodal puns are out of scope. A homographic visual pun whose joke is the kerning gets no credit here.
Cultural-knowledge gating. Puns whose second meaning depends on recognising a specific reference (a 2017 football scoreline, a regional idiom, a current news figure) will be scored on the linguistic mechanics; whether the audience holds the prerequisite knowledge is a separate question the model can't answer.
Pure absurdism is not a pun. Surreal jokes without a phonological or polysemous pivot ("I went to a fancy-dress party as a chicken going to a fancy-dress party as a chicken") will score low here even when they are excellent in their own genre — pair with u22a8.humor for that signal.
Score reflects the carrier as written. A pun that depends on a written-vs-spoken disambiguation (e.g., a homograph that resolves visually but collapses when read aloud) is judged on the written form.

Theoretical foundation

The trait set is grounded in three converging research traditions:

Bisociation theory — Koestler, A. (1964). The Act of Creation. London: Hutchinson. The pivot connects two cognitive matrices; pun quality scales with how distinct the matrices are while still plausibly overlapping on the text.
Script-Based Semantic Theory of Humor (SSTH) — Raskin, V. (1985). Semantic Mechanisms of Humor. Dordrecht: Reidel. Two semantic scripts must coexist on the same text and oppose meaningfully.
General Theory of Verbal Humor (GTVH) — Attardo, S. & Raskin, V. (1991). "Script theory revis(it)ed: Joke similarity and joke representation model." HUMOR 4(3): 293–347. Extends SSTH with six Knowledge Resources (Script Opposition, Logical Mechanism, Situation, Target, Narrative Strategy, Language).
Computational pun quality — Kao, J. T., Levy, R., & Goodman, N. D. (2015). "A Computational Model of Linguistic Humor in Puns." Cognitive Science 40(5): 1270–1285. Empirically identifies ambiguity × distinctiveness as the predictors of fine-grained funniness within puns; distinctiveness — each meaning supported by different contextual elements — is the stronger signal.
Phonological similarity — Hempelmann, C. F. (2003). Paronomasic Puns: Target Recoverability Towards Automatic Generation. Purdue University dissertation. Establishes the negative correlation between phonological distance and perceived funniness, with a threshold effect: too distant strains, too identical telegraphs.
Taxonomy of pun types — Delabastita, D. (1996). "Introduction" in Wordplay and Translation (D. Delabastita, ed.). Manchester: St. Jerome. Distinguishes homophonic, homographic, homonymic, and paronymic puns along phonological and orthographic axes.

Practitioner-level theory draws on Carr, J. & Greeves, L. (2006). The Naked Jape: Uncovering the Hidden World of Jokes. London: Michael Joseph — particularly the editing discipline (each line rewritten through dozens of drafts) and the plausible deniability principle for setup construction.

Empirical calibration anchors include the Edinburgh Festival Fringe Joke of the Fringe shortlists 2009–2025 (jury-curated, audience-voted) and the O. Henry Pun-Off World Championships Punniest-of-Show finals. The SemEval-2017 Task 7 dataset (Miller, T., Hempelmann, C. F., & Gurevych, I., 2017, "SemEval-2017 Task 7: Detection and Interpretation of English Puns," Proceedings of SemEval-2017) provides the underlying pun/non-pun corpus with WordNet sense annotations.

Scores are approximate — not a substitute for human laughs.