Computational Approach To Recognizing Wordplay In Jokes Julia M. Taylor & Lawrence J. Mazlack Applied Artificial Intelligence Laboratory University of Cincinnati
Introduction • Computational recognition of humor is difficult • Requires natural language understanding • World knowledge • This is an initial investigation into computational humor recognition using wordplay • Learns statistical patterns of text • Recognizes utterances similar in pronunciation to a given word • Determines if found utterances transform a text into a joke
Subclass of Humor: Joke • A joke is a short humorous piece of literature in which the funniness culminates in the final sentence • Most jokes have: • Setup – the first part of the joke, which establishes certain expectations • Punchline – a much shorter part of the joke, which causes some form of conflict • Forces another interpretation • Violates expectation • Example: “Is the doctor home?” the patient asked in his bronchial whisper. “No,” the doctor’s young and pretty wife whispered in reply. “Come right in.”
Computational Humor • Computational humor generators, examples: • Light bulb joke generator • Joke generator that focuses on witticisms based around idioms • Generator of humorous parodies of existing acronyms • Generator of a humorous sentence based on an alphanumeric password • Pun generators • Computational humor recognizer
Wordplay Jokes • Depend on words that are similar in sound but have different meaning • Same pronunciation, same spelling • Same pronunciation, different spelling • Similar pronunciation, different spelling • The difference in meaning creates conflict or breaks prediction Nurse: I need to get your weight today. Impatient patient: Three hours and twelve minutes. weight=wait
Statistical Language Recognition: N-grams • Model that uses conditional probability to predict the Nth word based on the N-1 previous words • Probabilities depend on the training corpus • Example: find a word with the largest P(word | “is”) in the following text: A newspaper reporter goes around the world with his investigation. He stops people on the street and asks them: “Excuse me, what is your opinion of the meat shortage?” An American asks: “What is ‘shortage’?” A Russian asks: “What is ‘opinion’?” A Pole asks: “What is ‘meat’?” A New York taxi-driver asks: “What is ‘excuse me’?” The Parisian Little Moritz is asked in school: “How many deciliters are there in a liter of milk?” He replies: “One deciliter of milk and nine deciliters of water.” – In France, this is a good joke; in Hungary, this is a good milk.
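The bigram case of the model above (N = 2) can be sketched in a few lines of Python. This is only an illustration: the toy corpus and the function names `train_bigrams` and `most_likely_next` are assumptions made for the example, not part of the original system.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each preceding word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def most_likely_next(follows, prev):
    """Return (word, P(word | prev)) for the most frequent successor of prev."""
    counts = follows.get(prev)
    if not counts:
        return None  # prev never occurred as a first word in the corpus
    total = sum(counts.values())
    word, count = counts.most_common(1)[0]
    return word, count / total


# Toy corpus (an assumption for illustration, not the authors' training text).
corpus = "this is a good joke this is a good milk what is meat"
model = train_bigrams(corpus)
best = most_likely_next(model, "is")
print(best)
```

Because the estimate is a ratio of counts, the answer depends entirely on the training corpus, which is the point made on the slide.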
Possible Methods for Joke Recognition • Determine if a given text is a joke • Given a joke, determine the punchline location Hotel clerk: Welcome to our hotel Max: Thank you. Can you tell me what room I’m in? Clerk: The lobby
Restricted Domain: Knock Knock Jokes • Line1: “Knock, Knock” • Line2: “Who’s there?” • Line3: any phrase • Line4: Line3 followed by “who?” • Line5: One or several sentences containing one of the following • Type1: Line3 • Type2: A wordplay on Line3 • Type3: A meaningful response to a wordplay of Line4
Restricted Domain: Knock Knock Jokes • Type1: Line3 --Knock, Knock --Who’s there? --Water --Water who? --Water you doing tonight? • Type2: A wordplay on Line3 --Knock, Knock --Who’s there? --Ashley --Ashley who? --Actually, I don’t know. • Type3: A meaningful response to a wordplay of Line4 --Knock, Knock --Who’s there? --Tank --Tank who? --You are welcome.
Script-based Semantic Theory of Humor • The text is compatible with 2 different scripts • The 2 scripts are opposite • Example: --Knock, Knock --Who’s there? --Water --Water who? --Water you doing tonight? • Scripts overlap in the phonetic representation of water and what are • Scripts differ in meaning
Experimental Design • Definitions: • Wordplay: a word that sounds similar but has a different spelling (and meaning); e.g., what are is a wordplay on water • Keyword: the word that the wordplay is based on (Line3); e.g., water is a keyword • Recognize only Type1 jokes
Experimental Design --Knock, Knock --Who’s there? --Water --Water who? --Water you doing tonight? • Step1: joke format validation • Step2: computational generation of sound-alike sequences • Step3: validation of the meaning of a chosen sound-alike sequence • Step4: last sentence validation with the sound-alike sequence
Experimental Design • Training set: • 66 Knock Knock jokes • Enhance similarity table of letters • Select N-gram training texts • 66 texts containing wordplay from 66 training jokes • Test set: • 130 Knock Knock jokes • 66 Non-jokes that have similar structure to Knock Knock jokes Water is cold.
Experimental Design • Similarity Table • Contains combination of letters that sound similar • Based on similarity table of cross-referenced English consonant pairs • Modified by: • translating phonemes to letters • adding vowels that are close in sound • adding other combinations of letters that may be used to recognize wordplay Segment of similarity table
Experimental Design • Training texts were entered into N-gram database • Wordplay validation: bigram table • Pairs of words from training texts with count of their occurrences (training texts 1) (texts were 1) (were entered 1) (entered into 1)… • Punchline validation: trigram table • Three words in a row from training texts with count of their occurrences (training texts were 1) (texts were entered 1) (were entered into 1)
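The bigram and trigram tables above can be built with a sliding window over the training text. The sketch below reuses the slide's own example sentence; the function name `ngram_table` is a hypothetical helper, and the real system's tokenization may differ.

```python
from collections import Counter


def ngram_table(text, n):
    """Map each run of n consecutive words to its number of occurrences."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


text = "Training texts were entered into N-gram database"
bigram_table = ngram_table(text, 2)   # used for wordplay validation
trigram_table = ngram_table(text, 3)  # used for punchline validation

for words, count in bigram_table.items():
    print(words, count)
```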
Step 1: Joke Format Validation • Line1: “Knock, Knock” • Line2: “Who’s there?” • Line3: any phrase • Line4: Line3 followed by “who?” • Line5: One or several sentences containing Line3 • Valid: --Knock, Knock --Who’s there? --Water --Water who? --Water you doing tonight? • Invalid (Line5 does not contain Line3): --Knock, Knock --Who’s there? --Water --Water who? --What are you doing tonight?
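A minimal sketch of the five-line format check, assuming the joke arrives as a list of five strings. The helper names `words` and `validate_format` and the punctuation handling are assumptions made for the example.

```python
import re


def words(line):
    """Lowercase a line and strip punctuation, returning its words."""
    return re.sub(r"[^a-z0-9\s]", "", line.lower()).split()


def validate_format(lines):
    """Return the Line3 keyword (as a word list) if the format is valid, else None."""
    if len(lines) != 5:
        return None
    l1, l2, l3, l4, l5 = (words(line) for line in lines)
    if l1 != ["knock", "knock"] or l2 != ["whos", "there"]:
        return None
    if not l3 or l4 != l3 + ["who"]:
        return None
    # Line5 must contain Line3 as a contiguous run of words.
    n = len(l3)
    if not any(l5[i:i + n] == l3 for i in range(len(l5) - n + 1)):
        return None
    return l3


joke = ["Knock, Knock", "Who's there?", "Water", "Water who?",
        "Water you doing tonight?"]
not_valid = joke[:4] + ["What are you doing tonight?"]
print(validate_format(joke), validate_format(not_valid))
```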
Step 2: Generation of Wordplay Sequences • Repeated letter replacements in Line3 • The similarity table is used for letter replacements • Resulting utterances are ordered according to their similarity with Line3 • Utterances with the highest similarity are checked for decomposition into several words • Words have to be in Webster's Second International (234,936 words) Segment of similarity table
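One way to realize the ordered generation described above is a best-first search over letter replacements. The sketch below is an assumption-laden illustration, not the authors' implementation: the `SIMILARITY` values are back-computed from the scores shown on the following slides (assuming each unchanged letter scores 1.0 and each replaced letter scores its similarity value), and `ranked_utterances` is a hypothetical name.

```python
import heapq

# Hypothetical table: letter -> {replacement: similarity}. Values are inferred
# from the slide scores (e.g. water 5.0 -> mater 4.44 implies w/m = 0.44);
# the authors' full similarity table is not reproduced here.
SIMILARITY = {
    "w": {"m": 0.44, "r": 0.42, "wh": 0.23},
    "a": {"e": 0.23, "o": 0.23},
    "t": {"d": 0.39, "th": 0.32, "z": 0.17},
    "e": {"a": 0.23, "i": 0.23, "o": 0.23},
    "r": {"m": 0.44, "re": 0.23},
}


def ranked_utterances(keyword):
    """Yield (utterance, score) pairs from most to least similar to keyword."""
    start = tuple(keyword)  # one slot per original letter
    # Heap entries: (negated score, slots, first position still replaceable).
    heap = [(-float(len(start)), start, 0)]
    while heap:
        neg_score, slots, next_pos = heapq.heappop(heap)
        yield "".join(slots), round(-neg_score, 2)
        # Replace only positions to the right of the last replacement, so each
        # combination of replacements is generated exactly once.
        for i in range(next_pos, len(slots)):
            for repl, sim in SIMILARITY.get(slots[i], {}).items():
                new_slots = slots[:i] + (repl,) + slots[i + 1:]
                heapq.heappush(heap, (neg_score + (1.0 - sim), new_slots, i + 1))


scores = dict(ranked_utterances("water"))
print(scores["mater"], scores["whator"], scores["whatare"])
```

A priority queue keeps the most similar unexplored utterance at the front, so candidates can be handed to the later validation steps one at a time and generation can stop as soon as one passes.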
Step 2: Generation of Wordplay Sequences • Utterances generated from water by letter replacement, with their similarity scores: water 5.0; mater 4.44; watem 4.44; rater 4.42; wader 4.39; wather 4.32; weter 4.23; woter 4.23; watar 4.23; watir 4.23; wator 4.23; whater 4.23; watere 4.23; wazer 4.17; meter 3.67; moter 3.67
Step 2: Generation of Wordplay Sequences • Next utterance: whator 3.46 • Decomposition of whator is what or • what or is different from water • Wordplay found; return what or
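The decomposition check can be sketched as a recursive split against a word list. The tiny `DICTIONARY` below is a stand-in for the Webster's list, and `decompositions` and `find_wordplay` are hypothetical helper names.

```python
def decompositions(utterance, dictionary):
    """Return every way to split utterance into one or more dictionary words."""
    if not utterance:
        return [[]]
    results = []
    for end in range(1, len(utterance) + 1):
        head = utterance[:end]
        if head in dictionary:
            for rest in decompositions(utterance[end:], dictionary):
                results.append([head] + rest)
    return results


def find_wordplay(utterance, keyword, dictionary):
    """Return the first decomposition that differs from the keyword, else None."""
    for split in decompositions(utterance, dictionary):
        if split != [keyword]:
            return split
    return None


# Stand-in dictionary (assumption); the slides use Webster's Second International.
DICTIONARY = {"what", "or", "are", "water", "meter"}
print(find_wordplay("whator", "water", DICTIONARY))
```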
Step 3: Wordplay Validation • Check if the wordplay is meaningful • If the wordplay is at least two words • Decompose the wordplay into word pairs: what or • Check if each word pair is in the bigram database: no • If the wordplay is one word • Check that the word is in the dictionary • If the wordplay is meaningful, go to Step 4 • Otherwise, return to Step 2
Step 2: Generation of Wordplay Sequences • Next utterance: whatare 2.69 • Decomposition of whatare is what are • what are is different from water • Wordplay found; return what are
Step 3: Wordplay Validation • Check if the wordplay is meaningful • If the wordplay is at least two words • Decompose the wordplay into word pairs: what are • Check if each word pair is in the bigram database: yes • Proceed to Step 4
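The meaningfulness check reduces to a table lookup. In this sketch the bigram counts and the dictionary are made-up stand-ins, and `is_meaningful` is a hypothetical name; only the decision rule follows the slides.

```python
def is_meaningful(wordplay, bigram_table, dictionary):
    """Apply the Step 3 rule to a wordplay given as a list of words."""
    if len(wordplay) >= 2:
        # Every adjacent word pair must have been seen in the training texts.
        return all(pair in bigram_table for pair in zip(wordplay, wordplay[1:]))
    # A one-word wordplay only needs to be a dictionary word.
    return len(wordplay) == 1 and wordplay[0] in dictionary


# Stand-in data (assumptions for illustration only).
bigram_table = {("what", "are"): 3, ("are", "you"): 5}
dictionary = {"what", "or", "are", "meter"}

print(is_meaningful(["what", "or"], bigram_table, dictionary))
print(is_meaningful(["what", "are"], bigram_table, dictionary))
```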
Step 4: Last Sentence Validation with Wordplay • The wordplay is meaningful • It could occur • At the beginning of the last sentence: What are you doing? • In the middle of the last sentence: Please tell me what are you doing? • At the end of the last sentence: The question started with “what are”.
Step 4: Last Sentence Validation with Wordplay • At the beginning of the last sentence: • If the wordplay is at least 2 words • What are you doing? • Check if (what are you) and (are you doing) are in the trigram table • If the wordplay is only one word • Meter you doing? • Check if (meter you doing) is in the trigram table • If at least one of the needed sequences is not in the trigram table, return to Step 2 • Otherwise, the text is a joke
Step 4: Last Sentence Validation with Wordplay • In the middle of the last sentence: • If the wordplay is at least 2 words • Please tell me what are you doing? • Check if (tell me what), (me what are), (what are you) and (are you doing) are in the trigram table • If the wordplay is only one word • Please tell me meter you doing? • Check if (tell me meter) and (meter you doing) are in the trigram table • If at least one of the needed sequences is not in the trigram table, return to Step 2 • Otherwise, the text is a joke
Step 4: Last Sentence Validation with Wordplay • At the end of the last sentence: • If the wordplay is at least 2 words • The sentence ended with what are? • Check if (ended with what) and (with what are) are in the trigram table • If the wordplay is only one word • The sentence ended with meter? • Check if (ended with meter) is in the trigram table • If at least one of the needed sequences is not in the trigram table, return to Step 2 • Otherwise, the text is a joke
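The three positional cases above appear to follow one rule: for each wordplay word, check the trigram that ends at it and the trigram that begins at it, whenever enough neighboring words exist. That rule is inferred from the slide examples rather than stated by the authors, and the names `needed_trigrams` and `validate_last_sentence` are hypothetical.

```python
def needed_trigrams(sentence, start, length):
    """List the trigrams to look up for a wordplay at sentence[start:start+length].

    sentence is the last line as a word list, with the keyword already
    replaced by the wordplay.
    """
    needed = []
    for i in range(start, start + length):
        if i - 2 >= 0:  # trigram ending at this wordplay word
            needed.append(tuple(sentence[i - 2:i + 1]))
        if i + 3 <= len(sentence):  # trigram beginning at this wordplay word
            needed.append(tuple(sentence[i:i + 3]))
    return needed


def validate_last_sentence(sentence, start, length, trigram_table):
    """Return True if every needed trigram occurs in the trigram table.

    Note: a sentence too short to yield any trigram passes vacuously.
    """
    return all(t in trigram_table
               for t in needed_trigrams(sentence, start, length))


# Stand-in trigram table (assumption for illustration only).
trigram_table = {("what", "are", "you"), ("are", "you", "doing")}
print(validate_last_sentence("what are you doing".split(), 0, 2, trigram_table))
print(validate_last_sentence("meter you doing".split(), 0, 1, trigram_table))
```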
Results • 66 training jokes • 59 jokes were recognized • 7 unrecognized, no wordplay found • 66 non-jokes • 62 correctly recognized as non-jokes • 1 found wordplay that makes sense • 3 incorrectly recognized as jokes • 130 test jokes • 8 jokes were not expected to be recognized • 12 identified as jokes with expected wordplay • 5 identified as jokes with unexpected wordplay • 80 expected wordplay found
Possible Enhancements • Improve last sentence validation • Increase the size of the text used for N-gram training • Use a parser • Use N-grams with stemming • Improve the wordplay generator • Use phoneme comparison • Use a wider domain • All types of Knock Knock jokes • Other types of wordplay jokes
Conclusion • Initial investigation into computational humor recognition using wordplay • The program was designed to • Recognize wordplay in jokes: 67% • Recognize jokes containing wordplay: 12%