Semantic surprise, comedic density, absurd imagery — scored 0–100 with a straight face.
Sample puns are AI-generated.
Judge puns programmatically. Because someone has to.
$ echo "Why do cows have hooves? Because they lactose." | \
curl -s -d @- -H "Accept: text/plain" \
https://u22a8.ai/m/u22a8.puns
Version: v1 · Status: ready
Sound-bridge between meanings is recognizable and effortless ↔ Sound-bridge is strained, distant, or imperceptible
Measures the sound-bridge between the two activated meanings. The empirical Goldilocks zone (Hempelmann; Fleischhacker) sits just inside the threshold where target recovery is effortless but the connection still feels discovered rather than mechanical. High-scoring text has a sound link the listener catches without working — perfect homophones or close paronyms in natural use. Low-scoring text either requires cognitive stretching to bridge the sounds (strained substitution) or relies on a pivot word so identical to itself that no surprise registers.
Two opposed meanings both genuinely activated on the same string ↔ One meaning dominates or the second meaning feels arbitrary
Measures the dual condition Raskin specifies for a pun to function: two semantic scripts must coexist on the same string, must oppose in a meaningful way (life/death, real/unreal, normal/abnormal, literal/figurative), and must both plausibly fit. High-scoring text activates two genuinely distinct frames of reference that collide at the pivot. Low-scoring text either fails to activate a second meaning, activates a second meaning that doesn't meaningfully oppose the first, or pairs meanings with no plausible co-existence on the text.
Each meaning anchored to different carrier elements ↔ One word carries the joke alone with no contextual support
Measures whether each of the two activated meanings is anchored to a different element of the carrier sentence — the property Kao, Levy & Goodman (Cognitive Science, 2015) identify as the strongest computational predictor of fine-grained funniness among puns. High-scoring text has clear contextual scaffolding for each interpretation, with different words supporting each. Low-scoring text leans on the pivot word to do all the work — either one meaning is asserted alone or both meanings hang off the same lexical anchor with no surrounding support.
Setup reads as ordinary speech, no telegraphing ↔ Setup announces wordplay is coming
Measures whether the setup reads as ordinary speech that would survive removal of the pun, what practitioners call the plausible-deniability principle. High-scoring text contains no signposting ("I heard a saying about…", "I'm reading a book about…", "There's a joke that goes…"); the audience doesn't know wordplay is coming until it lands. Low-scoring text telegraphs the joke through phrasing the listener recognizes as a pun-setup template, putting them in wordplay-detection mode before the pivot.
Surprising on first read and inevitable in retrospect ↔ Either telegraphed or arbitrary, no aha resolution
Measures the sense that the punchline is surprising on first read and, in retrospect, the only possible ending: what Koestler describes as the sudden emergence of a new synthesis through bisociation, and what comedians call inevitable surprise. High-scoring text produces a satisfying click of recognition where the second meaning feels earned by the setup. Low-scoring text either lands the pivot before the listener experiences any surprise (telegraphed) or lands a connection that feels arbitrary rather than discovered (the absurd-pathway groan of failure rather than the affectionate groan of success).
Irreducible — every word earns its place ↔ Bloated setup or over-explained pivot
Measures whether every word in the carrier sentence earns its place — what Lederer calls the greatest possible pressure per square syllable of language. High-scoring text is irreducible: cutting any single word diminishes the joke. Low-scoring text has bloated setup, restated premises, or explicit explanation of the wordplay after the pivot lands (the over-explained pun is the canonical failure mode of amateur punning).
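The irreducibility claim ("cutting any single word diminishes the joke") can be probed from the outside by generating every one-word deletion of a pun and scoring each variant. The sketch below only builds the variants; leave_one_out is a hypothetical helper name, words are split naively on whitespace, and nothing here reflects how the model itself computes this trait.

```shell
# Hypothetical helper for this sketch; not part of the service.
# Prints one variant per line, each with a different single word removed.
leave_one_out() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i <= NF; i++) {          # i = index of the word to drop
      out = ""
      for (j = 1; j <= NF; j++)
        if (j != i) out = out (out == "" ? "" : " ") $j
      print out
    }
  }'
}
```

Calling leave_one_out "Why do cows have hooves? Because they lactose." yields eight variants. Each can be posted the same way as the curl example at the top of the page and compared with the original's score; whether score differences at this granularity are meaningful is an assumption to check against real responses.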
Scores the quality of a pun against the dimensions that distinguish a laureate one-liner from a strained, telegraphed, or over-explained attempt.
A pun is a single linguistic pivot that activates two distinct meanings. Almost every adult can recognise that a string is a pun; very few can say why one pun lands and another flops. This model decomposes the "why" into six axes drawn from the cognitive theory of humor, the linguistic taxonomy of paronomasia, and the working theory of comedians who specialise in the form.
The failure mode this catches: prolific generators (LLMs, joke books, cracker manufacturers) producing wordplay that is technically correct but feels forced — the carrier sentence reads as a pun-shaped object rather than as natural language that happens to carry a second meaning. High score means the pivot was discovered in the language, not imposed on it.
Send the pun as the request body to /m/u22a8.puns. No special format.
Punctuation and capitalisation matter: they shape the carrier rhythm and signal the sentence boundaries the model uses to assess setup vs. pivot.
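Because sentence boundaries matter, a two-line setup/punchline pair is worth sending with its line break intact. curl's -d @- strips carriage returns and newlines from the data it reads, whereas --data-binary @- posts the bytes unchanged. The sketch below is one way to do that; the score_pun wrapper name is hypothetical, and whether the service treats a line break differently from a space is an assumption rather than documented behaviour.

```shell
# Hypothetical wrapper for this sketch; not part of the service.
score_pun() {
  # printf '%s' writes the text exactly as given, adding no trailing newline.
  # --data-binary @- posts stdin byte-for-byte, so internal line breaks,
  # punctuation and capitalisation reach the endpoint unchanged.
  printf '%s' "$1" | curl -s --data-binary @- \
    -H "Accept: text/plain" \
    https://u22a8.ai/m/u22a8.puns
}
```

A call such as score_pun "Why do cows have hooves? Because they lactose." prints whatever the endpoint returns; the response format is not described here, so the wrapper passes it through untouched.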
Single-sentence one-liners and short setup/punchline pairs are the model's home territory. Longer extended-pun routines (90-second Punniest-of-Show pieces) work but trade per-sample resolution for breadth.
u22a8.humor for that signal.
The trait set is grounded in three converging research traditions:
Practitioner-level theory draws on Carr, J. & Greeves, L. (2006). The Naked Jape: Uncovering the Hidden World of Jokes. London: Michael Joseph, particularly the editing discipline (each line rewritten through dozens of drafts) and the plausible-deniability principle for setup construction.
Empirical calibration anchors include the Edinburgh Festival Fringe Joke of the Fringe shortlists 2009–2025 (jury-curated, audience-voted) and the O. Henry Pun-Off World Championships Punniest-of-Show finals. The SemEval-2017 Task 7 dataset (Miller, T., Hempelmann, C. F., & Gurevych, I., 2017, "SemEval-2017 Task 7: Detection and Interpretation of English Puns," Proceedings of SemEval-2017) provides the underlying pun/non-pun corpus with WordNet sense annotations.