Normalized alignment of dependency trees for detecting textual entailment Erwin Marsi & Emiel Krahmer Tilburg University Wauter Bosma & Mariët Theune University of Twente
Basic idea • A true hypothesis is included in the text, allowing omission and rephrasing Text: The Rolling Stones kicked off their latest tour on Sunday with a concert at Boston's Fenway Park. Hypothesis: The Rolling Stones have begun their latest tour with a concert in Boston. Entailment: True • Omissions: • on Sunday • Fenway Park • Paraphrases: • kicked off → begun • Boston's Fenway Park → Boston RTE2 Workshop
Matching surface words alone is not sufficient... • Variation in surface realization: a perfect word match is no guarantee for entailment • Using syntactic analysis • for syntactic normalization • to match on hierarchical relations among constituents Example: “He became a boxing referee in 1964, and became well-known […]” vs. “He became well-known in 1964”
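The point of the example can be checked with a small word-overlap sketch. The function name `word_overlap` and the tokenization are assumptions for illustration, not part of the system described in these slides, and the elided tail of the text sentence is simply left out.

```python
import re

def word_overlap(text: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur somewhere in the text."""
    def tokenize(s: str) -> list:
        # Lowercase and keep letters, digits and hyphens ("well-known" stays one token).
        return re.findall(r"[a-z0-9-]+", s.lower())

    text_words = set(tokenize(text))
    hyp_words = tokenize(hypothesis)
    return sum(w in text_words for w in hyp_words) / len(hyp_words)

text = "He became a boxing referee in 1964, and became well-known"
hypothesis = "He became well-known in 1964"

# Every hypothesis word occurs in the text, although the text attaches
# "in 1964" to becoming a referee, not to becoming well-known.
score = word_overlap(text, hypothesis)
```

A bag-of-words score like this ignores which word depends on which, which is the information the dependency-based approach tries to use.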
Preprocessing • Input: T-H pairs in XML • Processing pipeline: • Sentence splitting, MXTERMINATOR (Reynar & Ratnaparkhi, 1997) • Tokenization, Penn Treebank SED script • POS tagging with PTB POS tags using Mbt (van den Bosch et al.) • Lemmatizing using memory-based learning (van den Bosch et al.) • Dependency parsing using Maltparser trained on PTB (Nivre & Scholz, 2004) • Syntactic normalization • Output: T-H dependency tree(s) pairs in XML
Syntactic Normalization • Three types of syntactic normalization: • Auxiliary reduction • Passive to active form • Copula reduction
Auxiliary Reduction • Auxiliaries of progressive and perfective tense are removed • Their children are attached to the remaining content verb • The same goes for modal verbs, and for do in the do-support function. Example: “demand for ivory has dropped” → “demand for ivory dropped” Example: “legalization does not solve any social problems” → “legalization not solves any social problems”
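A minimal sketch of auxiliary reduction on a toy dependency tree follows. The `Node` class, the relation labels (`sbj`, `vc`, `nmod`, `pmod`), the `AUXILIARIES` word list and the assumption that the auxiliary heads the content verb are all illustrative choices, not the authors' actual data structures or parser output.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    rel: str = "root"
    children: List["Node"] = field(default_factory=list)

# Hypothetical closed list; a real system would rely on POS tags and lemmas.
AUXILIARIES = {"has", "have", "had", "is", "are", "was", "were", "been",
               "do", "does", "did", "will", "would", "can", "could",
               "may", "might", "must", "shall", "should"}

def reduce_auxiliaries(node: Node) -> Node:
    """Remove auxiliaries and re-attach their children to the content verb."""
    node.children = [reduce_auxiliaries(c) for c in node.children]
    if node.word.lower() in AUXILIARIES:
        # "vc" (verb chain) is the assumed label linking auxiliary to verb.
        verbs = [c for c in node.children if c.rel == "vc"]
        if verbs:
            verb = verbs[0]
            others = [c for c in node.children if c is not verb]
            verb.children = others + verb.children  # inherit the auxiliary's children
            verb.rel = node.rel                      # verb takes the auxiliary's place
            return verb
    return node

# "demand for ivory has dropped"
tree = Node("has", "root", [
    Node("demand", "sbj", [Node("for", "nmod", [Node("ivory", "pmod")])]),
    Node("dropped", "vc"),
])
reduced = reduce_auxiliaries(tree)
```

Because children are processed before their parent, chains of auxiliaries collapse from the bottom up. The sketch does not move inflection onto the content verb, as the slide's “not solves” example does.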
Passive to Active Form • The passive form auxiliary is removed • The original subject becomes object • Where possible, a by-phrase becomes the subject Example: “Ahmedinejad was attacked by the US” → “the US attacked Ahmedinejad”
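The three steps above can be sketched on a toy tree as follows. The `Node` fields, the labels `sbj`, `obj`, `vc`, and the test for a passive (a form of "be" governing a `VBN` participle) are assumptions made for this illustration; treating every by-phrase as the agent is a deliberate simplification of “where possible”.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    pos: str
    rel: str
    children: List["Node"] = field(default_factory=list)

BE_FORMS = {"be", "is", "are", "was", "were", "been", "being"}

def passive_to_active(node: Node) -> Node:
    node.children = [passive_to_active(c) for c in node.children]
    if node.word.lower() not in BE_FORMS:
        return node
    participle = next((c for c in node.children
                       if c.rel == "vc" and c.pos == "VBN"), None)
    if participle is None:
        return node  # not a passive construction under this toy test
    # Step 1: drop the passive auxiliary; its other children move to the verb.
    promoted = [c for c in node.children if c is not participle]
    # Step 2: the original subject becomes the object.
    for child in promoted:
        if child.rel == "sbj":
            child.rel = "obj"
    # Step 3: a by-phrase, if present, supplies the new subject.
    by_phrase = next((c for c in participle.children
                      if c.word.lower() == "by" and c.children), None)
    if by_phrase is not None:
        agent = by_phrase.children[0]
        agent.rel = "sbj"
        participle.children = [agent if c is by_phrase else c
                               for c in participle.children]
    participle.children = promoted + participle.children
    participle.rel = node.rel
    return participle

# "Ahmedinejad was attacked by the US"
tree = Node("was", "VBD", "root", [
    Node("Ahmedinejad", "NNP", "sbj"),
    Node("attacked", "VBN", "vc", [
        Node("by", "IN", "adv", [Node("US", "NNP", "pmod",
                                      [Node("the", "DT", "nmod")])]),
    ]),
])
active = passive_to_active(tree)
roles = {c.rel: c.word for c in active.children}
```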
Copula Reduction • Copular verbs are removed by attaching the predicate as a daughter to the subject Example: “Microsoft Corp. is a partner of Intel Corp.” → “Microsoft Corp., a partner of Intel Corp.”
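As a rough sketch, copula reduction can be written as a small tree rewrite. The labels `sbj` and `prd`, the `COPULAS` list and the decision to hang any remaining children of the copula under the subject are assumptions; the slide only specifies that the predicate becomes a daughter of the subject.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    rel: str = "root"
    children: List["Node"] = field(default_factory=list)

COPULAS = {"be", "am", "is", "are", "was", "were"}

def reduce_copula(node: Node) -> Node:
    node.children = [reduce_copula(c) for c in node.children]
    if node.word.lower() in COPULAS:
        subject = next((c for c in node.children if c.rel == "sbj"), None)
        predicate = next((c for c in node.children if c.rel == "prd"), None)
        if subject is not None and predicate is not None:
            rest = [c for c in node.children
                    if c is not subject and c is not predicate]
            # The predicate becomes a daughter of the subject;
            # the subject replaces the copula in the tree.
            subject.children = subject.children + [predicate] + rest
            subject.rel = node.rel
            return subject
    return node

# "Microsoft Corp. is a partner of Intel Corp." (tree simplified)
tree = Node("is", "root", [
    Node("Microsoft", "sbj"),
    Node("partner", "prd", [Node("of", "nmod", [Node("Intel", "pmod")])]),
])
reduced = reduce_copula(tree)
```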
Alignment of Dependency Trees • Tree alignment algorithm based on (Meyers, Yangarber and Grishman, 1996) • Searches for an optimal alignment of the nodes of the text tree to the nodes of the hypothesis tree • Tree alignment is a function of: • how well the words of the two nodes match • recursively, the weighted alignment score for each of the aligned daughter nodes
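The recursive scoring idea can be illustrated with a heavily simplified sketch. This is not the Meyers et al. algorithm: it picks the best text daughter for each hypothesis daughter independently (no one-to-one constraint), and the equal 0.5/0.5 weighting, the `skip_penalty` value and the exact-match word scorer are all assumptions.

```python
from typing import Callable, List, Tuple

Tree = Tuple[str, List]  # (word, list of daughter trees)

def exact_match(wt: str, wh: str) -> float:
    return 1.0 if wt.lower() == wh.lower() else 0.0

def node_score(t: Tree, h: Tree,
               match: Callable[[str, str], float] = exact_match,
               skip_penalty: float = 0.2) -> float:
    """Score for aligning hypothesis node h with text node t."""
    word_score = match(t[0], h[0])
    if not h[1]:
        return word_score
    daughter_scores = []
    for hd in h[1]:
        candidates = [subtree_score(td, hd, match, skip_penalty) for td in t[1]]
        daughter_scores.append(max(candidates, default=0.0))
    # Combine the word match with the average daughter alignment score.
    return 0.5 * word_score + 0.5 * sum(daughter_scores) / len(daughter_scores)

def subtree_score(t: Tree, h: Tree,
                  match: Callable[[str, str], float],
                  skip_penalty: float) -> float:
    """Best score for h against t or a descendant of t; each skipped text node costs a penalty."""
    best = node_score(t, h, match, skip_penalty)
    for td in t[1]:
        best = max(best, subtree_score(td, h, match, skip_penalty) - skip_penalty)
    return best

text_tree = ("attacked", [("US", [("the", [])]), ("Ahmedinejad", [])])
full = node_score(text_tree, ("attacked", [("US", [])]))
partial = node_score(text_tree, ("attacked", [("Iran", [])]))
```

Passing a graded word scorer as `match` instead of `exact_match` lets synonym or thesaurus similarity flow into the tree score.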
Word Matching • function WordMatch(wt,wh) -> [0,1] maps text-hypothesis word pairs to a similarity score • returns 1 if • wt is identical to wh • the lemma of wt is identical to the lemma of wh • wt is a synonym of wh (lookup in EuroWordNet with lemma & POS) • wh is a hypernym of wt (idem) • returns similarity from automatically derived thesaurus if > 0.1 (Lin’s dependency-based thesaurus) • otherwise returns 0 • also match on phrasal verbs • e.g. “kick off” is a synonym of “begin”
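The cascade above can be sketched with toy lookup tables standing in for the real resources. The dictionaries `LEMMAS`, `SYNONYMS`, `HYPERNYMS` and `THESAURUS`, and every value in them, are made up for illustration; they are not EuroWordNet or Lin's thesaurus, and POS is ignored.

```python
# Toy stand-ins for the lemmatizer, EuroWordNet and the thesaurus (all hypothetical).
LEMMAS = {"kicked off": "kick off", "begun": "begin", "concerts": "concert"}
SYNONYMS = {("kick off", "begin")}
HYPERNYMS = {"concert": {"event"}}           # lemma -> its hypernyms
THESAURUS = {("tour", "trip"): 0.25,         # made-up similarity values
             ("tour", "concert"): 0.05}

def lemma(word: str) -> str:
    return LEMMAS.get(word.lower(), word.lower())

def word_match(wt: str, wh: str) -> float:
    """Similarity in [0, 1] between a text word wt and a hypothesis word wh."""
    if wt == wh:
        return 1.0
    lt, lh = lemma(wt), lemma(wh)
    if lt == lh:
        return 1.0
    if (lt, lh) in SYNONYMS or (lh, lt) in SYNONYMS:
        return 1.0
    if lh in HYPERNYMS.get(lt, set()):       # hypothesis word is more general
        return 1.0
    similarity = THESAURUS.get((lt, lh), 0.0)
    return similarity if similarity > 0.1 else 0.0

phrasal = word_match("kicked off", "begun")
general = word_match("concert", "event")
specific = word_match("event", "concert")
graded = word_match("tour", "trip")
below_cutoff = word_match("tour", "concert")
```

The hypernym test is deliberately one-directional: a more general hypothesis word matches a more specific text word, but not the other way round.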
Alignment example Text: The development of agriculture by early humans, roughly 10,000 years ago, was also harmful to many natural ecosystems as they were systematically destroyed and replaced with artificial versions. Hypothesis: Humans existed 10,000 years ago. Entailment: True
Alignment example (cont’d) [figure not preserved]
Entailment prediction • Prediction rule: IF top node of the hypothesis is aligned AND score > threshold THEN entailment = true ELSE entailment = false • Threshold and parameters of tree alignment algorithm (skip penalty) optimized per task
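The prediction rule and a simple per-task threshold search might look like the sketch below. The `Alignment` record, the grid of candidate thresholds and the toy development examples are hypothetical; the slides do not say how the optimization was actually carried out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class Alignment:
    root_aligned: bool   # is the top node of the hypothesis aligned?
    score: float         # overall tree alignment score

def predict_entailment(alignment: Alignment, threshold: float) -> bool:
    return alignment.root_aligned and alignment.score > threshold

def tune_threshold(examples: List[Tuple[Alignment, bool]],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest accuracy on the examples."""
    def accuracy(threshold: float) -> float:
        hits = sum(predict_entailment(a, threshold) == gold for a, gold in examples)
        return hits / len(examples)
    return max(candidates, key=accuracy)

# Made-up development examples: (alignment, gold entailment label)
dev = [
    (Alignment(True, 0.9), True),
    (Alignment(True, 0.4), False),
    (Alignment(False, 0.8), False),
    (Alignment(True, 0.7), True),
]
best = tune_threshold(dev, [0.3, 0.5, 0.8])
```

Running the search separately on each task's development data gives one threshold per task, in the spirit of the per-task optimization mentioned above.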
Results Percentage entailment accuracy (n=800) [table not preserved]
Problems • Many parses contain errors due to syntactic ambiguity and propagation of • Spelling errors • Tokenization errors • POS errors • broken dependency trees • Consequently, syntactic normalization & alignment failed • Dependency relations did not help
Discussion & Conclusion • There are many forms of textual entailment that we cannot recognize automatically... • Paraphrasing • Co-reference resolution • Ellipsis • Condition/modality • Inference • Common sense / world knowledge • RTE requires a combination of deep NLP, common sense knowledge and reasoning