Using Maximal Embedded Subtrees for Textual Entailment Recognition
Sophia Katrenko & Pieter Adriaans
Adaptive Information Disclosure project, Human Computer Studies Laboratory, IvI, University of Amsterdam
katrenko@science.uva.nl
Outline • Task statement • Tree mining: methods • Experiments • Discussion
Why trees? Complex structure • What do these two pictures have in common? [Figure: two pictures; caption: Scottish handwriting (17th century)] • Complex structure!
Motivation • Idea: trees can be compared to find highly similar structures • Tree mining is an intermediate step that enables frequent subtree discovery • When looking for the most frequent subtrees, we can relax the restrictions on how similar two subtrees must be
What type of trees? (1) • Tree mining distinguishes the following types of subtrees: • Bottom-up subtrees • Induced subtrees • Embedded subtrees • We use embedded tree mining as described in M. Zaki (2005), “Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications”.
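You can tell the three subtree types apart by testing a set of node positions within one tree. The sketch below is a minimal illustration of the standard definitions. It is not Zaki's TreeMiner. It assumes nodes are numbered in preorder and stored as a parent array, which is our own choice of representation, and all function names are hypothetical.

```python
# Nodes are numbered in preorder; parent[i] is the parent of node i (-1 = root),
# so parent[i] < i for every non-root node.

def ancestors(parent):
    """Proper ancestors of each node."""
    anc = []
    for p in parent:
        anc.append(set() if p < 0 else anc[p] | {p})
    return anc


def tops(nodes, anc):
    """Selected nodes that have no selected proper ancestor."""
    return [v for v in nodes if not (anc[v] & nodes)]


def is_embedded(nodes, parent):
    """Rooted embedded subtree: one selected node is an ancestor of all others.
    Subtree edges may skip intermediate nodes (ancestor-descendant links)."""
    return len(tops(nodes, ancestors(parent))) == 1


def is_induced(nodes, parent):
    """Induced subtree: single root, and every other node keeps its real parent."""
    top = tops(nodes, ancestors(parent))
    return len(top) == 1 and all(parent[v] in nodes for v in nodes if v != top[0])


def is_bottom_up(nodes, parent):
    """Bottom-up subtree: one node together with all of its descendants."""
    anc = ancestors(parent)
    top = tops(nodes, anc)
    if len(top) != 1:
        return False
    r = top[0]
    return nodes == {r} | {v for v in range(len(parent)) if r in anc[v]}


# Example tree: root 0 with children 1 and 4; node 1 has children 2 and 3.
parent = [-1, 0, 1, 1, 0]
print(is_embedded({0, 2, 4}, parent), is_induced({0, 2, 4}, parent))  # True False
```

Every bottom-up subtree is induced, and every induced subtree is embedded, but the reverse does not hold. The set {0, 2, 4} keeps node 2 under the root even though its real parent, node 1, is skipped. Only the embedded definition allows that.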
What type of trees? (2) [Figure: two example trees, Tree 1 and Tree 2, with nodes labeled A, B, C, D, E, F, G, H, K; red: embedded subtrees; yellow: bottom-up subtrees]
Methodology • Data • Dependency parsing • Depth-first search (DFS, preorder) • Rooted ordered embedded tree mining • Setting thresholds • Evaluation
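The slides do not say how the thresholds were set. One common approach, assumed here purely for illustration, is to pick the cut-off that maximizes accuracy on labeled development pairs. The names `dev_scores`, `dev_gold`, `accuracy`, and `best_threshold` are hypothetical.

```python
def accuracy(scores, gold, threshold):
    """Share of pairs where 'score >= threshold' agrees with the gold label."""
    hits = sum((s >= threshold) == g for s, g in zip(scores, gold))
    return hits / len(gold)


def best_threshold(scores, gold):
    """Try each observed score as the cut-off and keep the most accurate one.

    Ties go to the lowest candidate threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy(scores, gold, t))


dev_scores = [0.9, 0.8, 0.4, 0.3]
dev_gold = [True, True, False, False]
print(best_threshold(dev_scores, dev_gold))  # 0.8
```

Only observed scores are tried as cut-offs. That is enough here, because accuracy can change only at those points.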
Data preprocessing • Each pair of sentences was parsed with Minipar (Dekang Lin) • Each dependency tree was transformed by incorporating edge labels into node labels • Each transformed tree was represented in preorder (DFS order)
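Here is a minimal sketch of the last two preprocessing steps. It uses a hypothetical `DepNode` structure in place of real Minipar output, and the relation names below are illustrative only. The `-1` backtrack marker follows the string encoding used in Zaki's tree-mining work. The `lemma:relation` label format is our own assumption. Setting `with_relation=False` drops the syntactic functions from the labels.

```python
from dataclasses import dataclass, field


@dataclass
class DepNode:
    """Hypothetical dependency-tree node (not Minipar's own format)."""
    lemma: str
    relation: str = ""  # label of the edge from the head; empty for the root
    children: list = field(default_factory=list)


def merge_labels(node, with_relation=True):
    """Fold each incoming edge label into its node label, e.g. 'currency:subj'.

    Returns a (label, [children]) tuple tree."""
    if with_relation and node.relation:
        label = f"{node.lemma}:{node.relation}"
    else:
        label = node.lemma
    return (label, [merge_labels(c, with_relation) for c in node.children])


def preorder_encoding(tree):
    """Labels in preorder (DFS), with -1 each time we backtrack to a parent."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(preorder_encoding(child))
        out.append(-1)
    return out


t = DepNode("be", children=[DepNode("currency", "subj"), DepNode("Yuan", "pred")])
print(preorder_encoding(merge_labels(t)))
# ['be', 'currency:subj', -1, 'Yuan:pred', -1]
```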
Syntactic matching • Given two sentences S1 and S2 (and hence their dependency trees), let $n_1 = |S_1|$ and $n_2 = |S_2|$, and let $m$ be the size of the rooted maximal embedded subtree they share. We define the similarity score as a ratio of these quantities.
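The exact ratio did not survive extraction. This sketch therefore stops at the ingredients: the two tree sizes and the size of the largest rooted embedded subtree the trees share. It finds that subtree by brute force over all node subsets. That approach is exponential and only suitable for toy trees, and it is not the mining algorithm used in this work. Trees use a hypothetical `(labels, parent)` format with nodes in preorder.

```python
def ancestors(parent):
    """Proper ancestors of each node; nodes are numbered in preorder."""
    anc = []
    for p in parent:
        anc.append(set() if p < 0 else anc[p] | {p})
    return anc


def rooted_embedded_codes(labels, parent):
    """Codes of all rooted, ordered embedded subtrees (brute force, 2**n subsets)."""
    anc = ancestors(parent)
    n = len(labels)
    codes = set()
    for mask in range(1, 1 << n):
        nodes = [i for i in range(n) if mask >> i & 1]
        # Depth inside the embedded tree = number of selected proper ancestors.
        depth = [sum(1 for a in anc[i] if mask >> a & 1) for i in nodes]
        if depth.count(0) != 1:
            continue  # several roots: an embedded forest, not a rooted tree
        # The preorder (depth, label) sequence identifies an ordered labeled tree.
        codes.add(tuple(zip(depth, (labels[i] for i in nodes))))
    return codes


def max_common_embedded(t1, t2):
    """Size of the largest rooted embedded subtree that occurs in both trees."""
    common = rooted_embedded_codes(*t1) & rooted_embedded_codes(*t2)
    return max((len(code) for code in common), default=0)


# Toy trees (labels illustrative): be -> currency -> use -> China, be -> Yuan
text = (["be", "currency", "use", "China", "Yuan"], [-1, 0, 1, 2, 0])
hyp = (["be", "currency", "China", "Yuan"], [-1, 0, 1, 0])
print(len(text[0]), len(hyp[0]), max_common_embedded(text, hyp))  # 5 4 4
```

Because the trees are ordered, swapping the root's two children in `hyp` gives `(["be", "Yuan", "currency", "China"], [-1, 0, 0, 2])`. That version shares only 3 of its 4 nodes with the original `hyp`.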
Runs • Run 1: syntactic matching (with syntactic functions incorporated into the node labels) & lemma overlap • Run 2: lemma overlap (baseline) • Run 3: syntactic matching (without syntactic functions) & lemma overlap
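The slides do not define lemma overlap. One simple reading, assumed here, is the share of distinct hypothesis lemmas that also occur in the text. The function name and inputs are hypothetical, and lemmatization is taken as already done.

```python
def lemma_overlap(text_lemmas, hyp_lemmas):
    """Share of distinct hypothesis lemmas that also occur in the text."""
    hyp = set(hyp_lemmas)
    if not hyp:
        return 0.0
    return len(hyp & set(text_lemmas)) / len(hyp)


text = ["the", "renminbi", "yuan", "be", "the", "currency", "use", "in", "china"]
hyp = ["the", "currency", "use", "in", "china", "be", "the", "renminbi", "yuan"]
print(lemma_overlap(text, hyp))  # 1.0
```

A bag-of-lemmas measure like this ignores word order entirely, so reordered sentences score as perfect matches. Ordered syntactic matching does not behave that way.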
Official results (accuracy) • Run 1: 59% overall • QA: 60.50% • SUM: 69.50% • IR: 62.00% • IE: 44.00%
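As a quick consistency check, assume the four task subsets are of equal size. The overall Run 1 accuracy should then equal the unweighted mean of the per-task accuracies, and it does:

$$\frac{60.50 + 69.50 + 62.00 + 44.00}{4} = \frac{236.00}{4} = 59.00\%$$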
Conclusions: Does it work? • Syntactic matching improves precision! But… • In some cases, it is too flexible, which leads to false positives • Because we used ordered trees, pairs such as the following do not get high matching scores: (h) The currency used in China is the Renminbi Yuan. (t) The Renminbi Yuan is the currency used in China.
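The effect of sibling order can be shown on simplified trees for (h) and (t). The labels and structure below are illustrative and are not Minipar's actual parses. Comparing the trees as ordered structures fails. Comparing them under a canonical form that sorts children recursively, which amounts to treating them as unordered trees, succeeds. This is only a demonstration of the limitation, not the method used in the runs.

```python
def canonical_unordered(tree):
    """Canonical form that ignores sibling order (children sorted recursively)."""
    label, children = tree
    return (label, tuple(sorted(canonical_unordered(c) for c in children)))


# Simplified trees for (h) and (t); labels and structure are illustrative only.
h = ("be", [("currency", [("use", [("China", [])])]), ("Yuan", [])])
t = ("be", [("Yuan", []), ("currency", [("use", [("China", [])])])])

print(h == t)  # False: as ordered trees they differ
print(canonical_unordered(h) == canonical_unordered(t))  # True
```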
Possible extensions • Use synonyms/antonyms from WordNet • Handle situations where there are several maximal subtrees • Use weighting for the tree nodes • Use deep semantic analysis
H: The author expressed his gratitude to the audience. T: Thank you! • False? True!