Natural Language Based Reformulation Resource and Web Exploitation for Question Answering
Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu
University of Southern California
Presented by: Soobia Afroz
Introduction
• The degree of difficulty of a question depends on how closely a given corpus matches the question, NOT on the question itself.
• Q: When was the UN founded?
  A: The UN was formed in January 1942.
  A: The name "United Nations", coined by United States President Franklin D. Roosevelt, was first used in the "Declaration by United Nations" of 1 January 1942, during the Second World War, when representatives of 26 nations pledged their Governments to continue fighting together against the Axis Powers.
• Larger text collection => good answers are easier to find => answers can be validated in the original text.
Paraphrasing questions
• Create semantically equivalent paraphrases of the question.
• Match an answer string against any of the paraphrases (illustrated below).
• Question paraphrases + retrieval engine: find documents containing correct answers.
• Rank and select better answers.
• Questions are paraphrased automatically by TextMap. Examples:
  "How did Mahatma Gandhi die?"
  "How deep is Crater Lake?"
  "Who invented the cotton gin?"
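A toy illustration (not the system's actual mechanism) of matching a candidate sentence against any of several question paraphrases; the paraphrases below are hand-written for this example, whereas TextMap generates them automatically:

    # Hand-written paraphrases of "How did Mahatma Gandhi die?"
    # (TextMap derives such paraphrases automatically); a candidate
    # sentence is accepted if it contains any of them.
    paraphrases = [
        "mahatma gandhi died",
        "mahatma gandhi was assassinated",
        "death of mahatma gandhi",
        "assassination of mahatma gandhi",
    ]

    def matches_any(sentence, paraphrases):
        text = sentence.lower()
        return any(p in text for p in paraphrases)

    candidate = "The assassination of Mahatma Gandhi occurred on 30 January 1948."
    print(matches_any(candidate, paraphrases))   # True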
How the system works (a toy sketch follows this list)
• Parse the question.
• Identify the answer type of the question.
• Reformulate the question (3.14 reformulations per question on average).
• Match at the parse-tree level.
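A toy, token-level sketch of these four steps; the real system matches at the parse-tree level, and all helper logic below (tokenizer, answer-type table, reformulation rule) is purely illustrative:

    import re

    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def answer_type(question):
        # Rough answer-type detection from the wh-word (illustrative only).
        first = tokenize(question)[0]
        return {"when": "DATE", "who": "PERSON", "where": "LOCATION"}.get(first, "UNKNOWN")

    def reformulate(question):
        # Keep the original token sequence plus a variant with the wh-word dropped.
        tokens = tokenize(question)
        return [tokens, tokens[1:]]

    def match(reformulation, sentence):
        # A reformulation matches if all its content words appear in the sentence.
        s_tokens = set(tokenize(sentence))
        return all(t in s_tokens for t in reformulation if t not in {"was", "did", "is"})

    question = "When was the UN founded?"
    sentence = "The UN was founded on 24 October 1945."
    print(answer_type(question))                                   # DATE
    print(any(match(r, sentence) for r in reformulate(question)))  # True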
1. Syntactic reformulations
• Turn a question into declarative form, e.g., "Who invented the cotton gin?" => "<PERSON> invented the cotton gin" (sketch below).
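A minimal sketch of such a transformation, assuming simple pattern-based rewrites; the real system derives its reformulations from parse trees, so these regexes are only a stand-in:

    import re

    # Illustrative pattern-based rewrites of wh-questions into declarative
    # templates with an answer slot; a toy stand-in for parse-tree rules.
    RULES = [
        (re.compile(r"^Who invented (.+)\?$", re.I), r"\1 was invented by <PERSON>"),
        (re.compile(r"^How did (.+) die\?$", re.I),  r"\1 died <MANNER>"),
        (re.compile(r"^How deep is (.+)\?$", re.I),  r"\1 is <DEPTH> deep"),
    ]

    def to_declarative(question):
        for pattern, template in RULES:
            if pattern.match(question):
                return pattern.sub(template, question)
        return question

    print(to_declarative("Who invented the cotton gin?"))  # the cotton gin was invented by <PERSON>
    print(to_declarative("How did Mahatma Gandhi die?"))   # Mahatma Gandhi died <MANNER>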
Information Retrieval and the Web
• Two text collections: the Web and TREC (Text Retrieval Conference).
• A Web-based IR system for Webclopedia, consisting of:
  -- a Query Reformulation module
  -- a Sentence Ranking module
  -- a Web search engine
1. Query Reformulation module
Previous attempts:
• Simple, exhaustive string-based manipulations
• Transformation grammars
• Learning algorithms
Current approach (a toy sketch follows this list):
• Analyze how people naturally form queries to find answers.
• Randomly selected 50 TREC-8 questions.
• Manually produced the simplest queries that yield the most Web pages containing answers.
• Analyzed the manually produced queries and categorized them into seven 'natural' techniques for turning a natural language question into a query.
• Derived algorithms that replicate each of the observed techniques.
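As referenced above, a hedged sketch of what one derived technique might look like: dropping the wh-word and auxiliary and emitting quoted, declarative-order web queries. This mimics only the spirit of the seven techniques, not their actual definitions:

    # Toy query reformulation: keyword query, exact-phrase query, and the
    # original word order with the wh-word dropped. Illustrative only.
    STOP = {"who", "what", "when", "where", "how", "did", "does", "was", "is", "the"}

    def web_queries(question):
        words = question.rstrip("?").split()
        content = [w for w in words if w.lower() not in STOP]
        return [
            " ".join(content),               # plain keyword query
            '"' + " ".join(content) + '"',   # exact-phrase query
            '"' + " ".join(words[1:]) + '"', # wh-word dropped, original word order
        ]

    for q in web_queries("Who invented the cotton gin?"):
        print(q)
    # invented cotton gin
    # "invented cotton gin"
    # "invented the cotton gin"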
2. Sentence Ranking module
• Produce a list of Boolean queries for each question using all the query reformulation techniques.
• Retrieve the top ten results for each query using a Web search engine.
• Retrieve the documents, strip HTML, and segment the text into sentences.
• Rank each sentence according to two scores (a toy scorer follows this list):
Score w.r.t. query terms:
-- Each word in the query is assigned a weight.
-- Each quoted term in the query has a weight equal to the sum of the weights of its words.
-- Each sentence has a weight equal to its weighted overlap with the query terms.
Score w.r.t. answers:
-- Tag sentences using BBN's IdentiFinder (a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities).
-- Score sentences according to the overlap between the expected answer type and the semantic entities found by IdentiFinder.
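As referenced above, a small sketch of the query-term scoring scheme; the word weights here are arbitrary illustrative values, and the answer-type score via IdentiFinder is omitted:

    import re

    # Toy weighted-overlap scorer: a sentence scores the sum of the weights of
    # the query terms it contains; a quoted term weighs the sum of its words.
    def score_sentence(sentence, query_terms, word_weights):
        tokens = set(re.findall(r"\w+", sentence.lower()))
        text = sentence.lower()
        score = 0.0
        for term in query_terms:
            if term.startswith('"') and term.endswith('"'):
                phrase = term.strip('"').lower()
                if phrase in text:                  # quoted term: exact phrase match
                    score += sum(word_weights.get(w, 1.0) for w in phrase.split())
            elif term.lower() in tokens:            # single-word match
                score += word_weights.get(term.lower(), 1.0)
        return score

    weights = {"crater": 2.0, "lake": 1.5, "deep": 1.0}
    query = ['"Crater Lake"', "deep"]
    candidates = [
        "Crater Lake is 1,949 feet deep at its deepest point.",
        "The lake froze over last winter.",
    ]
    for s in sorted(candidates, key=lambda s: -score_sentence(s, query, weights)):
        print(score_sentence(s, query, weights), s)
    # 4.5 Crater Lake is 1,949 feet deep at its deepest point.
    # 0.0 The lake froze over last winter.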
Evaluation of the results
• Reformulations led to more correct answers when used in conjunction with a large corpus like the Web.
Conclusion
• Query reformulation increases the likelihood of finding correct answers.
• The IR module produces higher quality answer candidates.
• Scoring precision is increased for answer candidates.
• A strong match with a reformulation provides additional confidence in the correctness of the answer.