Semantic Retrieval for Question Answering
Student Research Symposium, Language Technologies Institute
Matthew W. Bilotti, mbilotti@cs.cmu.edu
September 23, 2005
Outline
• What is Question Answering?
• What is the cause of wrong answers?
• What is Semantic Retrieval, and can it help?
• What have other teams tried?
• How is JAVELIN using Semantic Retrieval?
• How can we evaluate the impact of Semantic Retrieval on Question Answering systems?
• Where can we go from here?
What is Question Answering?
• A process that finds succinct answers to questions phrased in natural language
[Diagram: Input Question → Question Answering → Output Answers]
Q: “Where is Carnegie Mellon?” A: “Pittsburgh, Pennsylvania, USA”
Q: “Who is Jared Cohon?” A: “... is the current President of Carnegie Mellon University.”
Q: “When was Herbert Simon born?” A: “15 June 1916”
Google. http://www.google.com
Classic “Pipelined” QA Architecture
• A sequence of discrete modules cascaded so that the output of each module is the input to the next
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
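Because each module only sees its predecessor's output, the cascade is function composition. The sketch below shows that structure with hypothetical toy stages (a stopword filter, a two-sentence corpus, a two-entry gazetteer); it is not JAVELIN's code, and real modules would do full NLP and IR.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]

def run_pipeline(stages: Sequence[Stage], question: str) -> Any:
    """Cascade: the output of each stage becomes the input to the next."""
    return reduce(lambda data, stage: stage(data), stages, question)

# Hypothetical toy stages standing in for real QA modules.
def question_analysis(question: str) -> list[str]:
    stopwords = {"where", "when", "who", "was", "is"}
    return [w for w in question.rstrip("?").split() if w.lower() not in stopwords]

def document_retrieval(keywords: list[str]) -> list[str]:
    corpus = ["Andy Warhol was born in Pittsburgh.", "Warhol died in New York."]
    return [doc for doc in corpus if all(k in doc for k in keywords)]

def answer_extraction(documents: list[str]) -> list[str]:
    places = ["Pittsburgh", "New York"]  # toy gazetteer of Location answers
    return [p for doc in documents for p in places if p in doc]

def post_processing(candidates: list[str]) -> list[str]:
    return sorted(set(candidates))  # merge duplicate candidates

STAGES = [question_analysis, document_retrieval, answer_extraction, post_processing]
print(run_pipeline(STAGES, "Where was Andy Warhol born?"))  # ['Pittsburgh']
```

If `document_retrieval` returns an empty list, as it does here for “Where was Herbert Simon born?”, every later stage has nothing to work with.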
Classic “Pipelined” QA Architecture
Input question: “Where was Andy Warhol born?”
Classic “Pipelined” QA Architecture
Input question: “Where was Andy Warhol born?”
Question Analysis: discover keywords in the question, generate alternations, and determine the answer type.
• Keywords: Andy (Andrew), Warhol, born
• Answer type: Location (City)
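A minimal sketch of the question-analysis step, assuming a hypothetical hand-written alternation table and a wh-word lookup for the answer type. A real system would draw alternations from lexical resources and use a trained classifier, and would refine “Location” down to “City”.

```python
# Hypothetical resources: a real system would draw alternations from lexicons
# (e.g. nickname lists, WordNet) and use a trained answer-type classifier.
ALTERNATIONS = {"Andy": ["Andrew"]}
ANSWER_TYPES = {"where": "Location", "when": "Date", "who": "Person"}
STOPWORDS = {"where", "when", "who", "was", "is", "does", "did"}

def analyze(question: str) -> tuple[dict[str, list[str]], str]:
    """Return (keywords mapped to their alternations, expected answer type)."""
    tokens = question.rstrip("?").split()
    answer_type = ANSWER_TYPES.get(tokens[0].lower(), "Unknown")
    keywords = {t: ALTERNATIONS.get(t, []) for t in tokens if t.lower() not in STOPWORDS}
    return keywords, answer_type

keywords, answer_type = analyze("Where was Andy Warhol born?")
print(keywords)     # {'Andy': ['Andrew'], 'Warhol': [], 'born': []}
print(answer_type)  # Location
```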
Classic “Pipelined” QA Architecture
Document Retrieval: formulate IR queries using the keywords, and retrieve answer-bearing documents.
• Query: ( Andy OR Andrew ) AND Warhol AND born
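The query on this slide follows a simple rule: OR each keyword with its alternations, then AND the resulting clauses. The sketch below builds that string and evaluates the same Boolean condition against a document's words. The `matches` helper is a hypothetical stand-in for a real IR engine and is case-sensitive for simplicity.

```python
import re

def boolean_query(keywords: dict[str, list[str]]) -> str:
    """AND the keywords together; OR each keyword with its alternations."""
    clauses = []
    for term, alternations in keywords.items():
        if alternations:
            clauses.append("( " + " OR ".join([term, *alternations]) + " )")
        else:
            clauses.append(term)
    return " AND ".join(clauses)

def matches(document: str, keywords: dict[str, list[str]]) -> bool:
    """Evaluate the same Boolean query against one document (case-sensitive)."""
    words = set(re.findall(r"\w+", document))
    return all(any(t in words for t in [term, *alts]) for term, alts in keywords.items())

KEYWORDS = {"Andy": ["Andrew"], "Warhol": [], "born": []}
print(boolean_query(KEYWORDS))  # ( Andy OR Andrew ) AND Warhol AND born
print(matches("Andrew Warhol was born in Pittsburgh.", KEYWORDS))  # True
print(matches("Warhol died in New York.", KEYWORDS))  # False
```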
Classic “Pipelined” QA Architecture
Answer Extraction: extract answers of the expected type from retrieved documents.
• “Andy Warhol was born on August 6, 1928 in Pittsburgh and died February 22, 1987 in New York.”
• “Andy Warhol was born to Slovak immigrants as Andrew Warhola on August 6, 1928, on 73 Orr Street in Soho, Pittsburgh, Pennsylvania.”
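Type-filtered extraction can be sketched with a hypothetical gazetteer in place of a named-entity recognizer. Only entries whose type matches the expected answer type survive.

```python
import re

# Hypothetical gazetteer standing in for a named-entity recognizer.
GAZETTEER = {"Pittsburgh": "City", "New York": "City", "Pennsylvania": "State"}

def extract_answers(documents: list[str], expected_type: str) -> list[str]:
    """Return each gazetteer entry of the expected type, once per document mentioning it."""
    found = []
    for doc in documents:
        for name, entity_type in GAZETTEER.items():
            if entity_type == expected_type and re.search(rf"\b{re.escape(name)}\b", doc):
                found.append(name)
    return found

docs = [
    "Andy Warhol was born on August 6, 1928 in Pittsburgh and died "
    "February 22, 1987 in New York.",
    "Andy Warhol was born to Slovak immigrants as Andrew Warhola on August 6, "
    "1928, on 73 Orr Street in Soho, Pittsburgh, Pennsylvania.",
]
print(extract_answers(docs, "City"))  # ['Pittsburgh', 'New York', 'Pittsburgh']
```

Typing alone cannot tell that “New York” is where Warhol died rather than where he was born. Merging, ranking, and constraint checking in later stages have to handle that.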
Classic “Pipelined” QA Architecture
Post-Processing: answer cleanup and merging, consistency or constraint checking, answer selection and presentation.
• Candidates: “Pittsburgh”, “73 Orr Street in Soho, Pittsburgh, Pennsylvania”, “New York”
• Select appropriate granularity and merge: “Pittsburgh” and “73 Orr Street in Soho, Pittsburgh, Pennsylvania” become “Pittsburgh, Pennsylvania”
• Rank: 1. “Pittsburgh, Pennsylvania” 2. “New York”
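Here is one way the granularity, merge, and rank steps could fit together. This sketch assumes a hypothetical city-to-presentation table and ranks merged answers by how many candidates support them. The slide does not specify its ranking criterion, so frequency is an illustrative choice.

```python
from collections import Counter

# Hypothetical table mapping a city to the granularity we want to present.
CITY_GRANULARITY = {"Pittsburgh": "Pittsburgh, Pennsylvania", "New York": "New York"}

def normalize(candidate: str) -> str:
    """Select granularity: map any candidate mentioning a known city to that city's form."""
    for city, presented in CITY_GRANULARITY.items():
        if city in candidate:
            return presented
    return candidate

def merge_and_rank(candidates: list[str]) -> list[tuple[str, int]]:
    """Merge candidates that normalize to the same answer; rank by support count."""
    return Counter(normalize(c) for c in candidates).most_common()

ranked = merge_and_rank(
    ["Pittsburgh", "73 Orr Street in Soho, Pittsburgh, Pennsylvania", "New York"]
)
print(ranked)  # [('Pittsburgh, Pennsylvania', 2), ('New York', 1)]
```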
What is the cause of wrong answers?
• A pipelined QA system is only as good as its weakest module
• Poor retrieval and/or query formulation can result in low ranks for answer-bearing documents, or in no answer-bearing documents being retrieved at all
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers, with a failure point marked]
What is Semantic Retrieval, and can it help?
• Semantic Retrieval is a broad term for document retrieval techniques that make use of semantic information and language understanding
• Hypothesis: Semantic Retrieval can improve performance by retrieving more relevant documents and ranking them more highly
What have other teams tried?
• LCC/SMU approach
  • Use an existing IR system as a black box; rich query expansion
• CL Research approach
  • Process top documents retrieved from an IR engine, extracting semantic relation triples; index and retrieve using an RDBMS
• IBM (Prager) Predictive Annotation
  • Store answer types (QA-Tokens) in the IR system’s index, and retrieve on them
LCC/SMU Approach
• Syntactic relationships (controlled synonymy), morphological and derivational expansions for Boolean keywords
• Statistical passage extraction finds windows around keywords
• Semantic constraint check for filtering (unification)
• NE recognition and pattern matching as a third pass for answer extraction
• Ad hoc relevance scoring: term proximity, occurrence of answer in an apposition, etc.
[Diagram: Keywords and Alternations (Extended WordNet) → Boolean query → IR → Documents → Passage Extraction → Passages → Named Entity Extraction, Constraint Checking → Answer Candidates]
Moldovan et al., Performance issues and error analysis in an open-domain QA system, ACM TOIS, vol. 21, no. 2, 2003
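To illustrate what “windows around keywords” means, here is a simple window-and-count heuristic. It is not LCC's actual statistical passage extractor. Each keyword hit opens a window of `radius` words on either side, and the window is scored by the number of distinct keywords it contains. The radius and scoring rule are hypothetical choices.

```python
import re

def keyword_windows(text: str, keywords: set[str], radius: int = 5) -> list[tuple[int, str]]:
    """For each keyword hit, take the words within `radius` positions of it and
    score the window by how many distinct keywords it contains (best first)."""
    words = text.split()
    normalized = [re.sub(r"\W", "", w).lower() for w in words]
    passages = []
    for i, word in enumerate(normalized):
        if word in keywords:
            lo, hi = max(0, i - radius), i + radius + 1
            score = len(set(normalized[lo:hi]) & keywords)
            passages.append((score, " ".join(words[lo:hi])))
    return sorted(passages, key=lambda p: p[0], reverse=True)

SENTENCE = ("Andy Warhol was born to Slovak immigrants as Andrew Warhola on "
            "August 6, 1928, on 73 Orr Street in Soho, Pittsburgh, Pennsylvania.")
KEYWORDS = {"andy", "andrew", "warhol", "born"}
best = keyword_windows(SENTENCE, KEYWORDS)[0]
print(best)  # (4, 'Andy Warhol was born to Slovak immigrants as Andrew')
```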
Litkowski/CL Research Approach
• Relation triples: discourse entity (NP) + semantic role or relation + governing word; essentially similar to our predicates
• Unranked XPath querying against RDBMS
• Entity mention canonicalization
[Diagram: 10-20 top PRISE documents → Sentences → Semantic relationship triples (XML) → RDBMS, queried with XPath. Example: “The quick brown fox jumped over the lazy dog.” yields triples linking “quick brown fox” and “lazy dog” to “jumped”.]
Litkowski, K.C. Question Answering Using XML-Tagged Documents. TREC 2003
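The triple store can be sketched with Python's built-in `sqlite3`. One substitution to note: this sketch queries with plain SQL rather than the XPath interface described on the slide. The role labels (“subject”, “over”) are hypothetical. The query is unranked, so every matching row comes back with equal standing.

```python
import sqlite3

# Hypothetical triples for "The quick brown fox jumped over the lazy dog."
# Each row: (discourse entity, semantic role or relation, governing word).
triples = [
    ("quick brown fox", "subject", "jumped"),
    ("lazy dog", "over", "jumped"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (entity TEXT, relation TEXT, governor TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

# "What jumped?": look up the subject of "jumped"; no ranking is applied.
rows = conn.execute(
    "SELECT entity FROM triples WHERE governor = ? AND relation = ?",
    ("jumped", "subject"),
).fetchall()
print(rows)  # [('quick brown fox',)]
```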
Predictive Annotation
• Textract identifies candidate answers at indexing time
• QA-Tokens are indexed as text items along with actual document tokens
• Passage retrieval, with simple bag-of-words combo-match (heuristic) ranking formula
[Diagram: Corpus → Textract (IE/NLP), guided by the answer type taxonomy → Docs with QA-Tokens → IR]
Example: “Gasoline cost $0.78 per gallon in 1999.” is indexed as “Gasoline cost $0.78 MONEY$ per gallon VOLUME$ in 1999 YEAR$.”
Prager et al. Question-answering by predictive annotation. SIGIR 2000
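The annotation on this slide can be reproduced with a few regular expressions. These are hypothetical stand-ins for Textract, which is far richer. Each annotator appends its QA-Token after the matched span, so the token is indexed next to the candidate answer. The scoring function is a plain bag-of-words count, not Prager's combo-match formula.

```python
import re

# Hypothetical annotators: (pattern, QA-Token).
ANNOTATORS = [
    (re.compile(r"\$\d+(?:\.\d+)?"), "MONEY$"),
    (re.compile(r"\bgallons?\b"), "VOLUME$"),
    (re.compile(r"\b(?:19|20)\d{2}\b"), "YEAR$"),
]

def annotate(sentence: str) -> str:
    """Insert a QA-Token right after each candidate answer span."""
    for pattern, token in ANNOTATORS:
        sentence = pattern.sub(lambda m: f"{m.group(0)} {token}", sentence)
    return sentence

def score(annotated: str, query_terms: list[str]) -> int:
    """Bag-of-words match: count query terms (QA-Tokens included) in the text."""
    tokens = annotated.rstrip(".").split()
    return sum(term in tokens for term in query_terms)

annotated = annotate("Gasoline cost $0.78 per gallon in 1999.")
print(annotated)  # Gasoline cost $0.78 MONEY$ per gallon VOLUME$ in 1999 YEAR$.
# A "How much ...?" question could add MONEY$ to its query terms.
print(score(annotated, ["MONEY$", "Gasoline", "1999"]))  # 3
```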
How is JAVELIN using Semantic Retrieval?
• Annotate the corpus with semantic content (e.g., predicates), and index this content
• At runtime, perform similar analysis on input questions to get predicate templates
• Maximize recall of documents that contain matching predicate instances
• Check constraints at the answer extraction stage to filter out false positives and rank the best matches
Nyberg et al. “Extending the JAVELIN QA System with Domain Semantics”, AAAI 2005.
Annotating and Indexing the Corpus
[Diagram: Text → Annotation Framework → predicate-argument structure loves(ARG0: John, ARG1: Mary) → Indexer → IR index and RDBMS. Actual index content: loves, ARG0 John, ARG1 Mary]
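A minimal sketch of the indexing step. It assumes hypothetical role-tagged index terms such as `ARG0=John` and an in-memory list standing in for the RDBMS. The flat inverted index records which terms occur in a document but not which argument belongs to which predicate. The structured table keeps that binding for later constraint checking.

```python
from collections import defaultdict
from typing import NamedTuple

class Predicate(NamedTuple):
    verb: str
    arg0: str
    arg1: str

# Hypothetical annotation output; a semantic role labeler would produce this.
annotated = {
    "doc1": [Predicate("loves", "Frank", "Alice"), Predicate("dislikes", "John", "Bob")],
    "doc2": [Predicate("loves", "John", "Mary")],
}

inverted_index: dict[str, set[str]] = defaultdict(set)  # IR side: flat terms
predicate_table: list[tuple[str, Predicate]] = []        # "RDBMS" side: structure

for doc_id, predicates in annotated.items():
    for p in predicates:
        for term in (p.verb, f"ARG0={p.arg0}", f"ARG1={p.arg1}"):
            inverted_index[term].add(doc_id)
        predicate_table.append((doc_id, p))

# doc1 contains both terms, although John does not love anyone there.
print(sorted(inverted_index["loves"] & inverted_index["ARG0=John"]))  # ['doc1', 'doc2']
```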
Retrieval on Predicate-Argument Structure
Input question: “Who does John love?”
[Diagram: Input Question → Question Analysis → Document Retrieval → Answer Extraction → Post-Processing → Output Answers]
Retrieval on Predicate-Argument Structure
Input question: “Who does John love?”
Predicate-argument template: loves(ARG0: John, ARG1: ?x)
Retrieval on Predicate-Argument Structure
Input question: “Who does John love?”
What the IR engine sees: loves, ARG0: John, ARG1: ?x
Some retrieved documents: “Frank loves Alice. John dislikes Bob.” and “John loves Mary.”
Retrieval on Predicate-Argument Structure
Input question: “Who does John love?”
Predicate instances checked in the RDBMS:
• “Frank loves Alice. John dislikes Bob.” → loves(ARG0: Frank, ARG1: Alice), dislikes(ARG0: John, ARG1: Bob): rejected (X)
• “John loves Mary.” → loves(ARG0: John, ARG1: Mary): matching predicate instance
Answer: “Mary”
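The retrieve-then-check sequence on these slides can be sketched in two steps. The first is high-recall retrieval that sees only a bag of template terms. The second is a unification-style constraint check over the stored predicate instances. The corpus and the `?x` variable convention mirror the slides, but the functions are hypothetical, not JAVELIN's implementation.

```python
from typing import NamedTuple, Optional

class Predicate(NamedTuple):
    verb: str
    arg0: str
    arg1: str

# Hypothetical annotated corpus: the two documents retrieved on the slides.
CORPUS = {
    "Frank loves Alice. John dislikes Bob.": [
        Predicate("loves", "Frank", "Alice"),
        Predicate("dislikes", "John", "Bob"),
    ],
    "John loves Mary.": [Predicate("loves", "John", "Mary")],
}

def is_variable(term: str) -> bool:
    return term.startswith("?")

def retrieve(template: Predicate) -> list[str]:
    """High-recall IR step: return any document containing every bound
    template term, regardless of which predicate or role it fills."""
    needed = {t for t in template if not is_variable(t)}
    return [doc for doc, preds in CORPUS.items()
            if needed <= {t for p in preds for t in p}]

def unify(template: Predicate, instance: Predicate) -> Optional[str]:
    """Constraint check: bound slots must match exactly; return the value
    filling the variable slot, or None if the instance is rejected."""
    binding = None
    for slot, value in zip(template, instance):
        if is_variable(slot):
            binding = value
        elif slot != value:
            return None
    return binding

template = Predicate("loves", "John", "?x")  # "Who does John love?"
retrieved = retrieve(template)
answers = []
for doc in retrieved:
    for instance in CORPUS[doc]:
        value = unify(template, instance)
        if value is not None:
            answers.append(value)
print(len(retrieved), answers)  # 2 ['Mary']
```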
How can we evaluate the impact of Semantic Retrieval on QA systems?
• Indirectly, by measuring the performance of the end-to-end QA system while varying the document retrieval strategy, in one of two ways:
  • NIST-style comparative evaluation
  • Absolute evaluation against new test sets
• Directly, by analyzing document retrieval performance
  • Requires an assumption such as, “maximal recall of relevant documents translates to best end-to-end system performance”
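Direct analysis of retrieval would typically measure how many answer-bearing documents appear near the top of the ranking. The sketch below computes recall at rank k. The document IDs and relevance set are hypothetical.

```python
def recall_at_k(ranked_docs: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant (answer-bearing) documents found in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked_docs[:k]) & relevant) / len(relevant)

ranking = ["d3", "d1", "d7", "d2"]  # hypothetical retrieval output
relevant = {"d1", "d2", "d9"}       # hypothetical answer-bearing documents
print(recall_at_k(ranking, relevant, k=3))
print(recall_at_k(ranking, relevant, k=4))
```

Under the assumption stated above, a retrieval strategy with higher recall at the cutoff the answer extractor consumes would be preferred.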
NIST-style Comparative Evaluation
• Answer keys developed by pooling
  • All answers returned by all systems are checked by a human to develop the answer key
  • Voorhees showed that the relative ordering of systems is stable regardless of how exhaustive the judgments are
  • Answer keys from TREC evaluations are not suitable for post-hoc evaluation (nor were they intended to be), since they may penalize a new strategy that discovers good answers not in the original pool
• Manual scoring
  • Judging system output involves semantics (Voorhees 2003)
  • Judges abstract away from differences in vocabulary or syntax, and robustly handle paraphrase
  • This is the same methodology used in the Definition QA evaluation in TREC 2003 and 2004
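The post-hoc problem with pooled keys can be shown in a few lines. The runs are hypothetical, and the lambda judging “correct” answers is a stand-in for the human assessor. A correct answer that no participating system returned never enters the pool, so a later system that finds it is still scored wrong.

```python
# Hypothetical submissions from three systems for one question.
runs = {
    "sysA": ["Pittsburgh", "New York"],
    "sysB": ["Pittsburgh, Pennsylvania"],
    "sysC": ["Philadelphia"],
}

# Pool: the union of every answer returned by every participating system.
pool = sorted({a for answers in runs.values() for a in answers})

# Stand-in for the human assessor: only pooled answers ever get judged.
judgments = {a: a.startswith("Pittsburgh") for a in pool}
answer_key = {a for a, correct in judgments.items() if correct}

def score(answer: str) -> bool:
    """An answer is credited only if it appears in the judged key."""
    return answer in answer_key

# A new strategy finds a correct answer that nobody pooled; it gets no credit.
print(score("Pittsburgh, PA"))  # False
```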
Absolute Evaluation
• Requires building new test collections
• Not dependent on pooled results from systems, so suitable for post-hoc experimentation
• Human effort is required; a methodology is described in (Katz and Lin 2005), (Bilotti, Katz and Lin 2004) and (Bilotti 2004)
• Automatic scoring methods based on n-grams or on fuzzy unification over predicate-argument structure (Lin and Demner-Fushman 2005), (Van Durme et al. 2003) can be applied
• Can evaluate at the level of documents or passages retrieved, predicates matched, or answers extracted, depending on the level of detail in the test set
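As a rough illustration of n-gram-based automatic scoring, here is a simplified unigram-overlap nugget matcher. It is not the published POURPRE metric. The response is the first result from the Iraqi oil smuggling example, while the nuggets and the 0.5 threshold are hypothetical.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def nugget_recall(nugget: str, response: str) -> float:
    """Fraction of the nugget's unigrams that also appear in the response."""
    n = tokens(nugget)
    return len(n & tokens(response)) / len(n) if n else 0.0

def matched(nugget: str, response: str, threshold: float = 0.5) -> bool:
    return nugget_recall(nugget, response) >= threshold

RESPONSE = ("The amount of oil smuggled out of Iraq has doubled since August "
            "last year, when oil prices began to increase")
print(nugget_recall("oil smuggled out of Iraq has doubled", RESPONSE))  # 1.0
print(matched("Iraqi oil smuggled along the Turkish route", RESPONSE))  # False
```

Exact token overlap misses “Iraqi” vs. “Iraq”. This is one reason fuzzier matching, such as unification over predicate-argument structure, is attractive.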
Preliminary Results: TREC 2005 Relationship QA Track
• 25 scenario-type questions; the first time such questions have been included officially in the TREC QA track
• Semi-automatic runs were allowed: JAVELIN submitted a second run using manual question analysis
• Results (in MRR of relevant nuggets):
  • Run 1: 0.1356
  • Run 2: 0.5303
• Example on the next slide!
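For reference, the standard mean reciprocal rank averages 1/rank of the first relevant item over all questions, and counts 0 when nothing relevant is returned. The track's exact nugget-based variant may differ in detail. The ranks below are made up for illustration and are not JAVELIN's results.

```python
from typing import Optional, Sequence

def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """first_relevant_ranks[i] is the 1-based rank of the first relevant item
    for question i, or None if nothing relevant was returned (scores 0)."""
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_relevant_ranks if r is not None)
    return total / len(first_relevant_ranks)

print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```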
Example: Question Analysis
The analyst is interested in Iraqi oil smuggling. Specifically, is Iraq smuggling oil to other countries, and if so, which countries? In addition, who is behind the Iraqi oil smuggling?
• interested(ARG0: The analyst, ARG1: Iraqi oil smuggling)
• smuggling(ARG0: Iraq, ARG1: oil, ARG2: other countries)
• smuggling(ARG0: Iraq, ARG1: oil, ARG2: which countries)
• is behind(ARG0: Who, ARG1: the Iraqi oil smuggling)
Example: Results
The analyst is interested in Iraqi oil smuggling. Specifically, is Iraq smuggling oil to other countries, and if so, which countries? In addition, who is behind the Iraqi oil smuggling?
1. “The amount of oil smuggled out of Iraq has doubled since August last year, when oil prices began to increase,” Gradeck said in a telephone interview Wednesday from Bahrain.
2. U.S.: Russian Tanker Had Iraqi Oil By ROBERT BURNS, AP Military Writer WASHINGTON (AP) – Tests of oil samples taken from a Russian tanker suspected of violating the U.N. embargo on Iraq show that it was loaded with petroleum products derived from both Iranian and Iraqi crude, two senior defense officials said.
5. With no American or allied effort to impede the traffic, between 50,000 and 60,000 barrels of Iraqi oil and fuel products a day are now being smuggled along the Turkish route, Clinton administration officials estimate.
(7 of 15 relevant)
Where do we go from here?
• What to index and how to represent it
  • Moving to Indri [1] allows exact representation of our predicate structure in the index
• Building a Scenario QA test collection
• Query formulation and relaxation
• Learning or planning strategies
• Ranking retrieved predicate instances
• Aggregating information across documents
• Inference and evidence combination
• Extracting answers from predicate-argument structure
[1] http://www.lemurproject.org
References
• Bilotti. Query Expansion Techniques for Question Answering. Master's thesis, MIT. 2004.
• Bilotti et al. What Works Better for Question Answering: Stemming or Morphological Query Expansion? IR4QA Workshop at SIGIR 2004.
• Lin and Demner-Fushman. Automatically Evaluating Answers to Definition Questions. HLT/EMNLP 2005.
• Litkowski, K.C. Question Answering Using XML-Tagged Documents. TREC 2003.
• Metzler and Croft. Combining the Language Model and Inference Network Approaches to Retrieval. Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004.
• Metzler et al. Indri at TREC 2004: Terabyte Track. TREC 2004.
• Moldovan et al. Performance issues and error analysis in an open-domain question answering system. ACM TOIS, vol. 21, no. 2, 2003.
• Nyberg et al. “Extending the JAVELIN QA System with Domain Semantics”. Proceedings of the 20th National Conference on Artificial Intelligence (AAAI 2005).
• Pradhan, S., et al. Shallow Semantic Parsing using Support Vector Machines. HLT/NAACL 2004.
• Prager et al. Question-answering by predictive annotation. SIGIR 2000.
• Van Durme, B., et al. Towards Light Semantic Processing for Question Answering. HLT/NAACL 2003.
• Voorhees, E. Overview of the TREC 2003 question answering track. TREC 2003.