PAE-DIRT: Paraphrase Acquisition Enhancing DIRT
David Hall, Michael Chung
Motivation
• DIRT: Discovery of Inference Rules from Text
• Extracts paraphrases using dependency links:
  • Desperate, Bush asked Congress if it would...
  • Desperate, Bush inquired of Congress if it would...
  • (“asked” and “inquired of” are similar!)
• Unfortunately, some things aren't so great:
  • Angry, Bush told Congress it would...
  • “told” = “inquired of”? DIRT would think so.
• Use context clues to determine similar sentences
  • Paraphrases!
Dependency Link Overlap Model
• Sentences are parsed with Minipar, and sets of dependency links (triples with path length one) are extracted.
  • Ex. (produce, obj, evidence)
• The sentence similarity score is based on the percentage of shared dependency links.
• Best metric:
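The metric the slide refers to is not preserved in this transcript, so the following is only a minimal sketch of the general idea of scoring two sentences by shared dependency links. The function name `link_overlap`, the hand-written triples (not real Minipar output), and the choice to normalise by the smaller link set are all assumptions made for illustration.

```python
from typing import Set, Tuple

# A dependency link as a (head, relation, dependent) triple,
# e.g. ("produce", "obj", "evidence").
Triple = Tuple[str, str, str]


def link_overlap(a: Set[Triple], b: Set[Triple]) -> float:
    """Share of dependency links two sentences have in common.

    Normalising by the smaller set is an assumed choice, not the
    metric from the slides.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hand-written triples for illustration only (not parser output).
s1 = {
    ("produce", "subj", "lawyer"),
    ("produce", "obj", "evidence"),
    ("evidence", "mod", "new"),
}
s2 = {
    ("produce", "subj", "attorney"),
    ("produce", "obj", "evidence"),
}

score = link_overlap(s1, s2)
```

Because the triples are matched exactly, a synonym in any slot ("lawyer" vs. "attorney") counts as a miss, which is one reason to compare this model against the bag-of-words alternatives.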
Bag of Words Models
• Word overlap:
  • Use IDF scores to determine important words
  • See how much information two sentences share.
• N-gram overlap:
  • Use a BLEU-style metric to count the n-grams shared. (Use IDF scores!)
• LSA + word overlap:
  • Use word-to-word similarity scores to help find phrase-to-phrase similarity.
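The word-overlap and n-gram-overlap bullets can be sketched with one function, since word overlap is the n = 1 case. This is an illustrative sketch, not the authors' implementation: the names `idf_table`, `ngrams`, and `idf_ngram_overlap`, the `log(N / df)` form of IDF, the rule that an n-gram's weight is the sum of its words' IDF scores, and the normalisation by the lighter sentence are all assumptions. The BLEU-style part is the clipped count `min(count_a, count_b)`.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def idf_table(corpus: List[List[str]]) -> Dict[str, float]:
    """IDF per word as log(N / document frequency); an assumed variant."""
    n_docs = len(corpus)
    df = Counter(word for sent in corpus for word in set(sent))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def idf_ngram_overlap(a: List[str], b: List[str],
                      idf: Dict[str, float], n: int = 1) -> float:
    """IDF-weighted, count-clipped n-gram overlap in [0, 1]."""
    def weight(gram: Tuple[str, ...]) -> float:
        # Assumed weighting: sum of the IDF scores of the n-gram's words.
        return sum(idf.get(word, 0.0) for word in gram)

    counts_a, counts_b = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    shared = sum(weight(g) * min(c, counts_b[g])
                 for g, c in counts_a.items() if g in counts_b)
    total_a = sum(weight(g) * c for g, c in counts_a.items())
    total_b = sum(weight(g) * c for g, c in counts_b.items())
    denom = min(total_a, total_b)
    return shared / denom if denom > 0 else 0.0


# Toy corpus, pre-tokenised and lower-cased for illustration.
corpus = [
    ["bush", "asked", "congress"],
    ["bush", "told", "congress"],
    ["the", "lawyer", "produced", "evidence"],
]
idf = idf_table(corpus)
unigram_score = idf_ngram_overlap(corpus[0], corpus[1], idf, n=1)
trigram_score = idf_ngram_overlap(corpus[0], corpus[1], idf, n=3)
```

On this toy pair the unigram score is partial because only "bush" and "congress" are shared, while the trigram score is zero because the single trigram in each sentence differs in its middle word.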
Conclusion
• The trigram overlap metric has the best performance, with an F-measure of 0.77.
• Future work:
  • Combine metrics (filter with a high-recall, high-speed metric first, then re-calculate with the dependency metric)
  • Try using the various metrics with DIRT.
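The "combine metrics" idea under future work could look roughly like the two-stage cascade below. This is a hypothetical sketch of something the slides only propose: `cascade`, `word_jaccard`, the threshold value, and the use of plain strings as sentences are all assumptions, and the slow metric is passed in as a parameter because the dependency metric itself is not given here.

```python
from typing import Callable, List, Tuple

Metric = Callable[[str, str], float]


def word_jaccard(a: str, b: str) -> float:
    """Cheap stand-in filter: Jaccard overlap of whitespace tokens."""
    x, y = set(a.split()), set(b.split())
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def cascade(pairs: List[Tuple[str, str]], cheap: Metric,
            expensive: Metric, threshold: float) -> List[Tuple[float, str, str]]:
    """Run `expensive` only on pairs whose `cheap` score reaches `threshold`.

    Returns (expensive_score, a, b) tuples, best score first.
    """
    survivors = [(a, b) for a, b in pairs if cheap(a, b) >= threshold]
    scored = [(expensive(a, b), a, b) for a, b in survivors]
    return sorted(scored, key=lambda item: item[0], reverse=True)


pairs = [
    ("bush asked congress", "bush told congress"),
    ("bush asked congress", "the lawyer produced evidence"),
]
# The second pair shares no words, so it never reaches the slow metric.
survivors = [(a, b) for a, b in pairs if word_jaccard(a, b) >= 0.3]
```

The threshold trades recall for speed: set low, it lets almost every pair through to the slow metric; set high, it risks discarding true paraphrases before they are ever scored.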