Language Technologies Institute School of Computer Science Carnegie Mellon University, USA

Diversifiable Bootstrapping for Acquiring High-Coverage Paraphrase Resource
Hideki Shima, Teruko Mitamura
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, USA

Can a machine recognize the meaning similarity?
• passivization
• nominalization
• entailment
• slang
187 means: “California penal code for murder, made popular in west coast gangsta rap”. – From The Urban Dictionary dot com
Usage: “This is Gavilan. In pursuit of possible 187 suspects.” – From the movie, Hollywood Homicide
John killed Mary.
Mary was killed by John.
John is the killer of Mary.
John assassinated Mary.
John is the 187 suspect of Mary.

Can a machine recognize the meaning similarity?
• passivization
• nominalization
• entailment
• slang
• euphemism
“In military and other covert operations, terminate with extreme prejudice is a euphemism for execution” – Wikipedia
John killed Mary.
Mary was killed by John.
John is the killer of Mary.
John assassinated Mary.
John is the 187 suspect of Mary.
John terminated Mary with extreme prejudice.

Can a machine recognize the meaning similarity?
• passivization
• nominalization
• entailment
• slang
• euphemism
Humans use various expressions to convey the same or similar meaning, which makes it difficult for machines to “read” text.
John killed Mary.
Mary was killed by John.
John is the killer of Mary.
John assassinated Mary.
John is the 187 suspect of Mary.
John terminated Mary with extreme prejudice.

Can a machine recognize the meaning similarity?
• passivization
• nominalization
• entailment
• slang
• euphemism
Goal: automatically acquire paraphrase patterns that are lexically diverse
X killed Y.
Y was killed by X.
X is the killer of Y.
X assassinated Y.
X is the 187 suspect of Y.
X terminated Y with extreme prejudice.

Paraphrase Recognition / Generation is a common need in various applications
• Automatic Evaluation
  • In Machine Translation [Kauchak & Barzilay, 2006][Padó et al., 2009]
  • In Text Summarization [Zhou et al., 2006]
  • In Question Answering [Ibrahim et al., 2003][Dalmas, 2007]
• Text Summarization [Lloret et al., 2008][Tatar et al., 2009]
• Information Retrieval [Parapar et al., 2005][Riezler et al., 2007]
• Information Extraction [Romano et al., 2006]
• Question Answering [Harabagiu & Hickl, 2006][Dogdan et al., 2008]
• Collocation Error Correction [Dahlmeier and Ng, 2011]

Outline
• Motivation
• Method: Diversifiable Bootstrapping
• Experiment
• Related Works
• Conclusion

Bootstrap Paraphrase Learning
• INPUT: seed instances, monolingual plain corpus
• BOOTSTRAP LEARNING ALGORITHM
• OUTPUT: patterns, more instances

Bootstrap Paraphrase Learning
• INPUT: seed instances, monolingual plain corpus
• Bootstrapping
• OUTPUT: patterns, more instances
Example output patterns:
X, the assassin of Y
assassination of Y by X
X assassinated Y
the assassination of Y by X
of X, the assassin of Y
X assassinated Y in
: : :
Unlike many other bootstrapping works, the goal is to acquire patterns, not instances.

Bootstrap Learning Algorithm
[Diagram: Seed Instances → Sentences → Extracted Patterns → Ranked Patterns → Sentences → Extracted Instances → Ranked Instances → . . . (1st iteration, 2nd iteration, . . .)]
This framework is based on ESPRESSO [Pantel & Pennacchiotti, 2006]

Bootstrap Learning Algorithm
• Search sentences by instances
Retrieved sentences:
• Edwin Booth was brother of John Wilkes Booth, the assassin of Abraham Lincoln.
• John Wilkes Booth, the assassin of Abraham Lincoln, was inspired by Brutus.
• In 1969 Berman was part of the defense team of Sirhan Sirhan, the assassin of Robert F. Kennedy.
• : : :

Bootstrap Learning Algorithm
• Search sentences by instances
Retrieved sentences, with the instance terms replaced by the slots X and Y:
• Edwin Booth was brother of X, the assassin of Y.
• X, the assassin of Y, was inspired by Brutus.
• In 1969 Berman was part of the defense team of X, the assassin of Y.
• : : :
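The slot substitution shown on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `generalize` and the use of plain substring matching are assumptions.

```python
def generalize(sentence, instance):
    """Replace the two terms of an instance with the slots X and Y.

    Returns None when the sentence does not mention both terms.
    Plain substring matching is an assumption made for brevity.
    """
    x_term, y_term = instance
    if x_term not in sentence or y_term not in sentence:
        return None
    return sentence.replace(x_term, "X").replace(y_term, "Y")


instance = ("John Wilkes Booth", "Abraham Lincoln")
sentence = ("John Wilkes Booth, the assassin of Abraham Lincoln, "
            "was inspired by Brutus.")
print(generalize(sentence, instance))
# prints: X, the assassin of Y, was inspired by Brutus.
```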

Bootstrap Learning Algorithm
• Extract patterns from sentences
• … brother of X, the assassin of Y.
• X, the assassin of Y, was
• … team of X, the assassin of Y.
Extracted Pattern: Longest Common Substring among retrieved sentences
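The longest-common-substring rule can be sketched on the three slotted sentences from the previous step. This is a character-level brute-force version written for illustration; the slides do not say whether the original works on characters or tokens, so that choice and the function name are assumptions.

```python
def longest_common_substring(strings):
    """Return the longest substring shared by every string in `strings`.

    Brute force over substrings of the shortest string, longest first.
    Adequate for a handful of sentences, not for a large corpus.
    """
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""


sentences = [
    "Edwin Booth was brother of X, the assassin of Y.",
    "X, the assassin of Y, was inspired by Brutus.",
    "In 1969 Berman was part of the defense team of X, the assassin of Y.",
]
print(longest_common_substring(sentences))
# prints: X, the assassin of Y
```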

Bootstrap Learning Algorithm
• Score and rank patterns
• Rank by reliability of pattern: r(p).
• r(p) is based on an association measure with each instance in the corpus.

Bootstrap Learning Algorithm
• Score and rank patterns
Ranked patterns:
1. 0.422 X, the assassin of Y
2. 0.324 assassination of Y by X
3. 0.312 X assassinated Y
4. 0.231 the assassination of Y by X
5. 0.208 of X, the assassin of Y
: : :
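The slides only state that r(p) is based on an association measure between the pattern and each instance. A minimal sketch, assuming the PMI-weighted average used in the ESPRESSO framework [Pantel & Pennacchiotti, 2006]: each instance contributes its PMI with the pattern, normalized by the maximum PMI and weighted by the instance's own reliability. The function name, the instance and pattern labels, and the PMI values are invented for illustration and do not come from the experiments.

```python
def pattern_reliability(pattern, instances, pmi_table, instance_reliability):
    """PMI-weighted average of instance reliabilities for one pattern.

    pmi_table maps (instance, pattern) pairs to PMI values; pairs that
    never co-occur are treated as contributing 0 (an assumption).
    """
    max_pmi = max(pmi_table.values())
    total = sum(
        pmi_table.get((inst, pattern), 0.0) / max_pmi * instance_reliability[inst]
        for inst in instances
    )
    return total / len(instances)


# Hypothetical toy data: two seed instances, each with reliability 1.0.
instances = ["booth-lincoln", "sirhan-kennedy"]
instance_reliability = {"booth-lincoln": 1.0, "sirhan-kennedy": 1.0}
pmi_table = {
    ("booth-lincoln", "X, the assassin of Y"): 2.0,
    ("sirhan-kennedy", "X, the assassin of Y"): 1.0,
    ("booth-lincoln", "X assassinated Y"): 0.5,
}

for p in ["X, the assassin of Y", "X assassinated Y"]:
    print(p, pattern_reliability(p, instances, pmi_table, instance_reliability))
```

With these toy numbers the first pattern scores 0.75 and the second 0.125, so patterns that co-occur strongly with many reliable instances rise to the top.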

Bootstrap Learning Algorithm
• Search sentences by pattern(s)
Retrieved sentences:
• Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.
• Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.

Bootstrap Learning Algorithm
• Extract instances from sentences
• Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.
• Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.
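Instance extraction with a ranked pattern can be sketched by turning the pattern into a regular expression with one capture group per slot. This is an illustrative sketch only: treating a slot filler as a run of capitalized words is a crude assumption standing in for whatever entity detection the original system uses, and the function names are hypothetical.

```python
import re

# Assumption: a slot filler is a run of capitalized words.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"


def pattern_to_regex(pattern):
    """Compile a surface pattern with X and Y slots into a regex.

    Each slot may appear only once in the pattern.
    """
    parts = re.split(r"\b([XY])\b", pattern)
    regex = "".join(
        f"(?P<{part}>{NAME})" if part in ("X", "Y") else re.escape(part)
        for part in parts
    )
    return re.compile(regex)


def extract_instances(pattern, sentences):
    """Return the (X, Y) pairs matched by the pattern in the sentences."""
    rx = pattern_to_regex(pattern)
    return [(m.group("X"), m.group("Y"))
            for s in sentences for m in rx.finditer(s)]


sentences = [
    "Still shot from the CCTV video footage showing Oguen Samast, "
    "the assassin of Hrant Dink.",
    "Henry Bellingham is a descendant of John Bellingham, "
    "the assassin of Spencer Perceval.",
]
print(extract_instances("X, the assassin of Y", sentences))
# prints: [('Oguen Samast', 'Hrant Dink'), ('John Bellingham', 'Spencer Perceval')]
```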

Bootstrap Learning Algorithm
• Score and rank instances
• Rank instances by reliability: r(i)
• (similar to pattern reliability scoring)

Issue: Lack of Lexical Diversity
• Words participating in patterns are skewed
X, the assassin of Y
assassination of Y by X
X assassinated Y
the assassination of Y by X
of X, the assassin of Y
X assassinated Y in
As a solution, we propose Diversifiable Bootstrapping.

Diversifiable Bootstrapping
[Scoring formula not preserved in this transcript; its annotated terms are listed below.]
• Original reliability score of a pattern
• How is this pattern lexically different from other patterns originally ranked higher than this?
• Interpolation parameter: λ
Key contribution: By tweaking the parameter λ, the acquired patterns can be diversified to a degree one can control.
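The re-scoring idea can be sketched as a linear interpolation between a pattern's original reliability and its lexical difference from the patterns originally ranked above it. The exact formula is not preserved in this transcript, so everything specific below is an assumption: the linear form, the direction of λ (here λ = 1 keeps the original ranking), and the use of Jaccard overlap of words as the diversity measure. The reliability scores are the ones from the ranked-pattern slide.

```python
import re


def content_words(pattern):
    """Lowercased words of a pattern, without the X and Y slots."""
    return set(re.findall(r"[a-z]+", pattern.lower())) - {"x", "y"}


def jaccard(a, b):
    """Jaccard overlap of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversify(ranked, lam):
    """Re-rank (pattern, reliability) pairs.

    New score = lam * reliability + (1 - lam) * diversity, where diversity
    is 1 minus the largest word overlap with any pattern that was
    originally ranked higher. Hypothetical scoring, for illustration only.
    """
    ranked = sorted(ranked, key=lambda pr: pr[1], reverse=True)
    rescored = []
    for k, (pattern, reliability) in enumerate(ranked):
        words = content_words(pattern)
        overlap = max(
            (jaccard(words, content_words(q)) for q, _ in ranked[:k]),
            default=0.0,
        )
        score = lam * reliability + (1.0 - lam) * (1.0 - overlap)
        rescored.append((pattern, score))
    return sorted(rescored, key=lambda pr: pr[1], reverse=True)


ranked_patterns = [
    ("X, the assassin of Y", 0.422),
    ("assassination of Y by X", 0.324),
    ("X assassinated Y", 0.312),
    ("the assassination of Y by X", 0.231),
    ("of X, the assassin of Y", 0.208),
]
for pattern, score in diversify(ranked_patterns, lam=0.5):
    print(f"{score:.3f}  {pattern}")
```

Under these assumptions, λ = 0.5 moves "X assassinated Y" above "assassination of Y by X" and pushes "of X, the assassin of Y", which shares all its words with the top pattern, to the bottom.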

Experimental Settings
• Bootstrapping Algorithm
  • Based on ESPRESSO framework [Pantel & Pennacchiotti, 2006]
  • Unlike ESPRESSO, we aim to obtain patterns, not instances
• Lexical diversity scoring function:
  • Based on Shima & Mitamura [2011]
• Seed instances: Schlaefer et al. [2006]
• Corpus: English Wikipedia

Acquired Paraphrases: killed (no diversification)

Acquired Paraphrases: killed

Acquired Paraphrases: died-of

Acquired Paraphrases: was-led-by

Related Works – Use of Thesaurus
E.g., WordNet [Miller, 1995], FrameNet [Baker et al., 1998], Nomlex [Macleod et al., 1998], VerbNet [Kipper et al., 2006]
[Figure: Synonyms of “lead (v)” in WordNet]
WEAKNESS: Need WSD or contexts to avoid false positives.

Related Works – Paraphrase Acquisition
• Alignment Approach
  • Monolingual Comparable Corpus [Shinyama et al., 2002]
  • Bilingual Parallel Corpus [Barzilay & McKeown, 2001][Bannard & Callison-Burch, 2005][Callison-Burch, 2008]
• Distributional Approach
  • Context as Vector Space [Pasca & Dienes, 2005][Bhagat & Ravichandran, 2008]
  • Context as Surface Pattern [Lin & Pantel, 2001][Ravichandran & Hovy, 2002]

Related Works – Paraphrase Acquisition Paraphrases acquired by Metzler et al., [2011]

Differences from Related Works
• Our work requires just a plain non-parallel corpus
  • Language portability: good news for resource/tool-scarce languages
  • There is potential to learn words used in a closed community (slang, technical terms, etc.) by providing a domain-specific corpus
• Bootstrapping works iteratively with minimum supervision
  • Less human effort is required than with heavily supervised learning methods, or with relying on domain experts to hand-craft patterns.

Conclusion
• We proposed Diversifiable Bootstrapping, which can acquire lexically diverse paraphrase patterns.
• We gave initial experimental results on a few relations, which look promising.
• As future work, we hope to conduct formal evaluations on a larger set of relations in different languages.

Acknowledgment
This publication was made possible in part by an NPRP grant (No: 09-873-1-129) from the Qatar National Research Fund (a member of The Qatar Foundation). The statements made herein are solely the responsibility of the authors. We also gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0172. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of DARPA, AFRL, or the US government.

Questions?
