Toward Dependency Path based Entailment
Rodney Nielsen, Wayne Ward, and James Martin
Why Entailment
• Intelligent Tutoring Systems
• Student Interaction Analysis
• Are all aspects of the student’s answer entailed by the text and the gold standard answer?
• Are all aspects of the desired answer entailed by the student’s response?
Dependency Path-based Entailment
• DIRT (Lin and Pantel, 2001)
  • Unsupervised method to discover inference rules
    • “X is author of Y ≈ X wrote Y”
    • “X solved Y ≈ X found a solution to Y”
  • Based on Harris’ Distributional Hypothesis
    • Words occurring in the same contexts tend to be similar
  • If two dependency paths tend to link the same sets of words, Lin and Pantel hypothesize that the paths’ meanings are similar
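The DIRT intuition above can be illustrated with a minimal sketch. The triples, the function names, and the use of plain Jaccard overlap are assumptions made for illustration; DIRT itself weights slot fillers with mutual information rather than raw set overlap.

```python
from collections import defaultdict

def slot_fillers(triples):
    """Map each dependency path to the sets of words seen in its X and Y slots."""
    fillers = defaultdict(lambda: {"X": set(), "Y": set()})
    for x, path, y in triples:
        fillers[path]["X"].add(x)
        fillers[path]["Y"].add(y)
    return fillers

def jaccard(a, b):
    """Set overlap; a simplification swapped in for DIRT's mutual-information weighting."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def path_similarity(fillers, p1, p2):
    """Geometric mean of the X-slot and Y-slot overlaps of two paths."""
    sx = jaccard(fillers[p1]["X"], fillers[p2]["X"])
    sy = jaccard(fillers[p1]["Y"], fillers[p2]["Y"])
    return (sx * sy) ** 0.5

# Hypothetical toy observations: (X filler, path, Y filler)
triples = [
    ("Tolstoy", "X wrote Y", "War and Peace"),
    ("Austen", "X wrote Y", "Emma"),
    ("Tolstoy", "X is author of Y", "War and Peace"),
    ("Austen", "X is author of Y", "Emma"),
    ("Wiles", "X solved Y", "Fermat's Last Theorem"),
]
fillers = slot_fillers(triples)
same = path_similarity(fillers, "X wrote Y", "X is author of Y")
different = path_similarity(fillers, "X wrote Y", "X solved Y")
print(same, different)
```

Paths that link the same word sets score high, and paths with disjoint fillers score zero, which is the behaviour the distributional hypothesis predicts for paraphrases versus unrelated relations.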
ML Classification Approach
• Features derived from corpus statistics
  • Unigram co-occurrence
  • Surface form bigram co-occurrence
  • Dependency-derived bigram co-occurrence
• Mixture of experts:
  • About 18 ML classifiers from Weka toolkit
  • Classify by majority vote or average probability
[Figure labels: Bag of Words, Dependency Path Based Entailment, Graph Matching]
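The two ways of combining the experts can be sketched as follows. This is a plain-Python illustration, not the Weka setup from the slides; the 0.5 decision threshold, the tie handling, and the function names are assumptions.

```python
def majority_vote(probabilities, threshold=0.5):
    """Entailment is predicted if more than half of the classifiers vote for it.

    Each classifier votes 'entailed' when its probability reaches the
    (assumed) threshold; ties count as 'not entailed' in this sketch.
    """
    votes = sum(1 for p in probabilities if p >= threshold)
    return votes * 2 > len(probabilities)

def average_probability(probabilities, threshold=0.5):
    """Entailment is predicted if the mean classifier probability reaches the threshold."""
    return sum(probabilities) / len(probabilities) >= threshold

# Hypothetical outputs of three classifiers for one text-hypothesis pair
probs = [0.9, 0.45, 0.45]
by_vote = majority_vote(probs)
by_average = average_probability(probs)
print(by_vote, by_average)
```

The two rules can disagree: here one confident classifier pulls the average above the threshold even though it is outvoted, so the choice of combination rule matters.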
Corpora
• 7.4M articles, 2.5B words, 347 words/doc
  • Gigaword (Graff, 2003) – 77% of documents
  • Reuters Corpus (Lewis et al., 2004)
  • TIPSTER
• Lucene IR engine
  • Two indices
    • Word surface form
    • Porter stem filter
  • Stop words = {a, an, the}
Word Alignment Features
• Unigram word alignment
Word Alignment Features
• Bigram word alignment
• Example:
  • <t>Newspapers choke on rising paper costs and falling revenue.</t>
    <h>The cost of paper is rising.</h>
  • $\mathrm{MLE}(\text{cost}, t) = n_{\text{cost of},\,\text{costs of}} / n_{\text{costs of}} = 6086 / 35800 = 0.17$
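The bigram estimate in the example can be sketched as a ratio of co-occurrence counts. The sketch assumes the counts are numbers of indexed documents (documents containing both bigrams, divided by documents containing the text-side bigram); the toy documents and function names are hypothetical, and only the figures 6086 and 35800 come from the slide.

```python
def bigrams(tokens):
    """Set of adjacent token pairs in one document."""
    return set(zip(tokens, tokens[1:]))

def cooccurrence_mle(documents, hyp_bigram, text_bigram):
    """Fraction of documents containing text_bigram that also contain hyp_bigram."""
    n_text = sum(1 for d in documents if text_bigram in d)
    n_both = sum(1 for d in documents if text_bigram in d and hyp_bigram in d)
    return n_both / n_text if n_text else 0.0

# Hypothetical three-document corpus
docs = [
    bigrams("the cost of paper".split()),
    bigrams("rising costs of paper and the cost of ink".split()),
    bigrams("costs of fuel".split()),
]
toy_estimate = cooccurrence_mle(docs, ("cost", "of"), ("costs", "of"))

# The same ratio with the counts reported on the slide
slide_estimate = 6086 / 35800
print(toy_estimate, round(slide_estimate, 2))
```

In the toy corpus, two documents contain "costs of" and one of them also contains "cost of", so the estimate is one half; the slide's counts give 0.17 in the same way.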
Dependency Features
• Dependency bigram features
[Figure: dependency trees of Text t and Hypothesis h]
Dependency Features
• Descendent relation statistics
[Figure: dependency trees of Text t and Hypothesis h]
Verb Dependency Features
• Combined verb descendent relation features
• Worst verb descendent relation features
[Figure: dependency trees of Text t and Hypothesis h]
Subject Dependency Features
• Combined and worst subject descendent relations
• Combined and worst subject-to-verb paths
[Figure: dependency trees of Text t and Hypothesis h]
Other Dependency Features
• Repeat these same features for:
  • Object
  • pcomp-n
  • Other descendent relations
Feature Analysis
• All feature sets are contributing according to cross validation on the training set
• Most significant feature set:
  • Unigram stem based word alignment
• Most significant core repeated feature:
  • Average Probability
Conclusions
• While our current dependency path features are only a step in the direction of our proposed inference system, they provided a significant improvement over the best results from the first PASCAL Recognizing Textual Entailment challenge (RTE1)
• Our system (after fixing a couple of bugs) ranked 6th in accuracy and 4th in average precision out of 23 entrants at this year’s RTE2 challenge
• We believe our proposed system will provide an effective foundation for the detailed assessment of students’ responses to an intelligent tutor
Questions
Dependency Path Based Entailment
• Mixture of experts classifier using corpus co-occurrence statistics
• Moving in the direction of DIRT
• Domain of Interest: Student response analysis in intelligent tutoring systems
[Figure: dependency trees of Text t and Hypothesis h; labels: Bag of Words, Graph Matching]