Gaiku: Generating Haiku with Word Associations Norms
Yael Netzer, David Gabay, Yoav Goldberg and Michael Elhadad
Department of Computer Science, Ben Gurion University of the Negev, Israel
CALC’09, May 35th 2009
Creativity
“the forming of associative elements into new combinations which either meet specified requirements or are in some way useful…” [Mednick 1962]
Three main paths to a creative solution:
• serendipity
• similarity
• mediation
WAN Computational Creativity Poetry Generating Haiku!
Haiku
• Form of poetry that originated in Japan in the 16th century
• Three lines of 5, 7, 5 phonetic units (mora)
• Uses the present tense and no judgmental words
• Adopted in Western languages in the 20th century
• 5, 7, 5, or 3 short lines
• Traditionally refers to nature and seasons, but modern Haiku are not restricted
• Basho Haiku:
  古池や蛙飛込む水の音
  old pond . . .
  a frog leaps in
  water’s sound
fishing guides boat in the background a new trip
iced over pond I skip a rock the entire width
a holy cow a carton of milk seeking a church
blind snakes on the wet grass tombstoned terror
blossomless but not unloved the old magnolia
first date — the little pile of anchovies
Body and Soul
• Body (Structure): 3 lines, Grammatical, Haiku-like
• Soul (Content): Inspiring, Interesting, Intriguing, Joyful, …
Previous works
• Manurung [2003]
• Manurung et al. [2000]
• Gervás [2001]
Emphasis on structure, less on content
Body / Structure
• Haiku Corpus
• ~3,500 Haiku in English
• Various sources:
  • amateur sites
  • children’s writings
  • translations of classic Japanese Haiku by Basho and others
  • “official” sites of Haiku associations (e.g., Haiku Path, Haiku Society of America)
Body / Structure
Haiku corpus → POS tag → count
• POS-tagged corpus lines, e.g.:
  NN IN_of NNP
  DT_a NN IN_of
  NNS NN NN
  NNS CC NNS
  IN_on DT_a NN NN
  …
• Line 1 patterns (count, pattern): 280 JJ NN; 276 NN NN; ...
• Line 2 patterns: 64 DT_the JJ NN; …
• Line 3 patterns: …
• Pattern transitions: P(line2 == DT_the NN | line1 == JJ NN) = ...
Body / Structure
• Candidate text: Google 1T-Web / Project Gutenberg, POS tagged and matched against the line patterns
• Line 1 patterns: AA BB CC / 12, BB CC DD / 10, …
• Line 2 patterns: CC DD EE / 20, …
• Line 3 patterns: …
• Pattern transitions: P(Line2 = AA BB | Line1 = XX YY) …
• Example pattern: JJ NNS / DT_a JJ NN / IN_of NN
• Example match: pouring cats / a pilot care / of fighter
• Grammatical output
• Preserves Haiku “Texture”
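The counting step on the Body / Structure slides can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the haiku are already POS tagged, and the toy corpus, `count_line_patterns` and `transition_probabilities` are hypothetical names and data invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each haiku is three lines, already POS tagged.
# Tags follow the slide notation (DT_the keeps the function word itself).
tagged_haiku = [
    ("JJ NN", "DT_the JJ NN", "NN IN_of NNP"),
    ("JJ NN", "DT_the NN", "NNS CC NNS"),
    ("NN NN", "DT_the JJ NN", "IN_on DT_a NN NN"),
    ("JJ NN", "DT_the JJ NN", "NNS NN NN"),
]

def count_line_patterns(haiku):
    """Count how often each POS pattern occurs at each line position."""
    counts = [Counter(), Counter(), Counter()]
    for poem in haiku:
        for position, pattern in enumerate(poem):
            counts[position][pattern] += 1
    return counts

def transition_probabilities(haiku, first=0, second=1):
    """Estimate P(line[second] == q | line[first] == p) by relative frequency."""
    pair_counts = defaultdict(Counter)
    for poem in haiku:
        pair_counts[poem[first]][poem[second]] += 1
    return {
        p: {q: n / sum(followers.values()) for q, n in followers.items()}
        for p, followers in pair_counts.items()
    }

line_counts = count_line_patterns(tagged_haiku)
transitions = transition_probabilities(tagged_haiku)
```

With real data, `line_counts[0]` would play the role of the "Line 1 patterns" table and `transitions` the role of the pattern-transition estimates.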
Soul?
• Requirements: a good “story”
  • cohesive
  • surprising
  • provokes feelings/emotions
  • metaphorical
  • “Should leave the reader wondering…”
… Creative!
Soul?
• An idea: capture the “story” seed as a sequence of concepts
  • butterfly, spring, flower
  • thief, steal, jail
  • mosquito, blood, vampire
• But not any seed will do
  • cat, feline, claw: too cohesive
  • computer, coat, queen: too divergent
Soul?
Is WordNet a good soul? Not really:
• it may give cohesiveness, but bad stories
• we actually measured it in the Haiku Corpus
Butterfly, Spring, Flower
• The connection between these words can be reconstructed by humans
• It is not available in WordNet
• Where can we find such relations?
Word Association Norms (WAN)
• A collection of cue words, each with a set of free associations (targets), with quantitative and statistical measures (mouse: CAT 0.5, RAT 0.08, CHEESE 0.07, HOLE 0.05…)
• Given a cue, collect the immediate response: the first word that comes to mind
• The largest WAN we know of for English is the University of South Florida Free Association Norms (Nelson et al., 1998). http://w3.usf.edu/FreeAssociation/
• 5,019 cue words and 10,469 additional targets, collected from more than 6,000 participants since 1973
The WAN is a weighted directed graph whose nodes are stemmed words.
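One way to hold such norms in memory is a dictionary of dictionaries, sketched below. The `mouse` row is the example from the slide; the `cat` row, the function names, and the omission of stemming are assumptions made only for illustration.

```python
# Toy association norms as (cue, target, strength) rows.
# The "mouse" rows come from the slide; the "cat" row is a made-up value.
norms = [
    ("mouse", "cat", 0.5),
    ("mouse", "rat", 0.08),
    ("mouse", "cheese", 0.07),
    ("mouse", "hole", 0.05),
    ("cat", "dog", 0.6),  # hypothetical
]

def build_wan(rows):
    """Return a weighted directed graph as {cue: {target: strength}}."""
    graph = {}
    for cue, target, strength in rows:
        graph.setdefault(cue, {})[target] = strength
        graph.setdefault(target, {})  # targets are nodes even if never a cue
    return graph

def strongest_associations(graph, cue, k=3):
    """Top-k targets for a cue, strongest first."""
    targets = graph.get(cue, {})
    return sorted(targets.items(), key=lambda item: item[1], reverse=True)[:k]

wan = build_wan(norms)
top = strongest_associations(wan, "mouse", k=2)
```

Words would normally be stemmed before insertion, as the slide notes; that step is left out here to keep the sketch dependency-free.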
[Figure: fragment of the association graph; node labels include water, spring, fall, flower, butterfly, green, bloom]
Why Word Associations
• Added value of WAN: insight into language that is not found in WordNet and is hard to acquire from corpora [Sinopalnikova & Smrz 2004]
• Associative thinking takes part in the process of writing and reading poetry
• Haiku, being so short, relies on lexical associations for concept progression
Hypothesis: word associations are good catalysts for creativity and can be used as a building block in the creative process of Haiku generation.
We first test this hypothesis by analyzing a corpus of existing Haiku poems.
• Can the creativity of text, as reflected in word associations, be quantified?
• Are Haiku poems indeed more associative than newswire text or prose?
• Two nodes are connected iff one of them is a cue for the other.
• Associative distance: the number of edges in the shortest path between the words in the associations graph.
• WordNet distance: the number of edges in the shortest path between any synset of one word and any synset of the other word.
• Associativity of a text: the number of associated word pairs in the text, normalized by the number of word pairs in the text of which both words are in the WAN.
• WordNet-relations level: the number of WordNet-related word pairs in the text.
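The associative distance and associativity definitions above could be computed along these lines. This is a sketch under stated assumptions: the toy graph is invented, a pair counts as "associated" when its distance is at most `max_distance` (default 1, a guess at the intended threshold), and pairs are taken over distinct words.

```python
from collections import deque
from itertools import combinations

# Hypothetical directed WAN fragment: cue -> set of targets.
wan = {
    "spring": {"flower", "water"},
    "flower": {"bloom"},
    "butterfly": {"flower"},
    "water": set(),
    "bloom": set(),
}

def undirected(graph):
    """Connect two nodes iff one of them is a cue for the other."""
    adj = {node: set() for node in graph}
    for cue, targets in graph.items():
        for target in targets:
            adj[cue].add(target)
            adj.setdefault(target, set()).add(cue)
    return adj

def associative_distance(adj, a, b):
    """Edges on the shortest path between a and b, or None if unreachable."""
    if a not in adj or b not in adj:
        return None
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None

def associativity(adj, words, max_distance=1):
    """Associated pairs divided by pairs whose words are both in the WAN."""
    known = sorted({w for w in words if w in adj})
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    hits = 0
    for a, b in pairs:
        dist = associative_distance(adj, a, b)
        if dist is not None and dist <= max_distance:
            hits += 1
    return hits / len(pairs)

adj = undirected(wan)
score = associativity(adj, ["spring", "flower", "butterfly", "computer"])
```

Here `computer` is ignored because it is not in the toy graph, so only three pairs enter the denominator.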
Average Associativity
We measure the associativity and WordNet-relations levels of 200 of the Haiku in our Haiku Corpus, as well as of random 12-word sequences from Project Gutenberg and from the NANC newswire corpus.
Filling body with soul: Theme Selection
• Generating the seed of the story:
  • Start with a word
  • Random walk on a word graph
Many possible variants. We currently use:
• start with the node of the seed word
• do several short random walks
• keep the resulting word set
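A minimal sketch of this theme-selection variant follows. The graph, the default numbers of walks and steps, and the choice to weight each step by association strength are all assumptions; the slide only says "several short random walks".

```python
import random

# Hypothetical association graph: cue -> {target: strength}.
wan = {
    "spring": {"flower": 0.4, "water": 0.3, "fall": 0.3},
    "flower": {"bloom": 0.5, "butterfly": 0.2},
    "water": {"fall": 0.6},
    "fall": {"spring": 0.5},
    "butterfly": {"flower": 0.7},
    "bloom": {"flower": 0.8},
}

def random_walk(graph, start, steps, rng):
    """Take up to `steps` strength-weighted steps from `start`; stop at dead ends."""
    path, node = [start], start
    for _ in range(steps):
        targets = graph.get(node)
        if not targets:
            break
        node = rng.choices(list(targets), weights=list(targets.values()), k=1)[0]
        path.append(node)
    return path

def select_theme(graph, seed_word, walks=5, steps=2, rng=None):
    """Union of the words visited by several short walks from the seed."""
    rng = rng or random.Random()
    theme = {seed_word}
    for _ in range(walks):
        theme.update(random_walk(graph, seed_word, steps, rng))
    return theme

theme = select_theme(wan, "spring", rng=random.Random(0))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing variants of the walk.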
[Figure: the same association graph fragment, with the seed word and its selected word set: Spring {flower, butterfly…}]
Filling body with soul
• For a given structure:
  • Choose a first line containing the seed word
  • Choose other lines containing a word from the set
• This is adequate, but the relations might be straightforward
Searching for a better soul:
• Generate several poems for the pattern
• Rerank them based on the associativity measure
• Reranking catches further “residual” relations
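The generate-then-rerank loop described above might look like the sketch below. Everything in it is illustrative: the candidate lines, the association pairs, the simplified direct-pair associativity score, and the function names are assumptions, and each filter is assumed to leave at least one candidate line.

```python
import random
from itertools import combinations

# Hypothetical candidate lines for one three-line pattern, keyed by line number.
LINES = {
    1: ["pear tree", "pear salad", "alligator pear"],
    2: ["a seasoning of spices", "a season of tears", "a kind of boots"],
    3: ["in the fall", "in the summer", "in the blackness"],
}
# Hypothetical association pairs, treated as undirected.
ASSOCIATED = {frozenset(p) for p in [
    ("pear", "tree"), ("tree", "fall"), ("spices", "seasoning"),
    ("pear", "salad"), ("tears", "summer"),
]}
VOCAB = {w for pair in ASSOCIATED for w in pair}

def associativity(poem):
    """Associated word pairs over all pairs of known words in the poem."""
    words = sorted({w for line in poem for w in line.split() if w in VOCAB})
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(frozenset(p) in ASSOCIATED for p in pairs) / len(pairs)

def generate(seed, theme, n, rng):
    """Line 1 must contain the seed; lines 2 and 3 must contain a theme word."""
    first = [l for l in LINES[1] if seed in l.split()]
    second = [l for l in LINES[2] if theme & set(l.split())]
    third = [l for l in LINES[3] if theme & set(l.split())]
    return [(rng.choice(first), rng.choice(second), rng.choice(third))
            for _ in range(n)]

def rerank(poems):
    """Drop duplicates, then sort by associativity, highest first."""
    unique = list(dict.fromkeys(poems))
    return sorted(unique, key=associativity, reverse=True)

theme = {"pear", "spices", "tears", "fall", "summer"}
ranked = rerank(generate("pear", theme, 20, random.Random(1)))
```

Reranking can reward relations the line filter never looked at, such as the `tree` and `fall` pair here, which is one reading of the "residual" relations mentioned on the slide.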
Pattern: NN NN / DET_a NN of NNS / PP_in DET_the NN
6    alligator pear / a handful of whites / in the spring
8    avocado pear / a kind of boots / in the fall
10   pear salad / a season of tears / in the summer
10   pear tree / a seasoning of spices / in the fall
10   alligator pear / a spring of tears / in the blackness
Evaluation Method
• “Turing test”:
  • Was this Haiku written by a human or by a computer?
  • How would you grade it, from 1 to 5?
• Settings:
  • AUTO set: 15 Haiku created by Gaiku without any manual selection, 10 random human Haiku on the same subjects
  • SEL set: 17 Haiku created by Gaiku, selected manually out of several runs, 9 award-winning human Haiku
  • 52 subjects
The Best of Gaiku
• Best in SEL. Classified as human: 77.2%, average grade 3.09
  early dew
  the water contains
  teaspoons of honey
• Best in AUTO. Classified as human: 72.2%, average grade 2.75
  cherry tree
  poisonous flowers lie
  blooming
Conclusions
• Word Association Norms have good potential in creative content generation
Future Work: Lots!
• Haiku: improve theme selection
• Additional forms of creative texts
• Test WAN in general NLP tasks:
  • Use WAN for (non-creative) generation
  • Word Sense Disambiguation
  • Lexical chains
  • “Guess the word” given associations (for people with SLI)