Explore advanced question answering techniques and methodologies to improve information retrieval, passage retrieval, and answer selection. Learn about successful systems, evaluation metrics, and the benefits of exploiting redundancy in question answering.
Question Answering: From Zero to Hero • Elena Eneva • 11 Oct 2001 • Advanced IR Seminar
Sources • TREC-9. 2001. • http://la.lti.cs.cmu.edu/Javelin • [V] E. Voorhees. "The Overview of the TREC-9 Question Answering track." • [P] J. Prager, E. Brown, A. Coden and D. Radev. "Question answering by predictive annotation." SIGIR '00. • [C] C.L.A. Clarke, G.V. Cormack and T.R. Lynam. "Exploiting redundancy in question answering." In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2001. • The labels V, P and C are used on later slides to point to tables and figures in these papers
Question Answering • IR (information retrieval) • Successful in large scale text search problems • Retrieves full documents • IE (information extraction) • Successful in extracting very precise answers from text • Works on pre-specified domains • QA: combining the strengths of both
QA track in TREC • Collection of unstructured documents (Table 1 in V) • Short factual questions in English (Why can't ostriches fly? Where did Bill Gates go to college?), see also Figure 1 in V • Return the answer as a ranked list of 5 document fragments (2 categories: 50 and 250 bytes)
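Evaluation • By human assessors • Reciprocal rank of the first correct answer, or 0 if none of the 5 is correct • % of questions for which a correct answer was found • Strict and lenient scores (unsupported answers count as wrong under strict, as correct under lenient) • Short (50-byte) and long (250-byte) versions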
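The reciprocal-rank measure on this slide can be made concrete with a small sketch. It assumes the human judgments are already available as one list of booleans per question (ranked answers, `True` meaning judged correct); the function names are illustrative and do not come from the TREC tooling.

```python
def reciprocal_rank(judgments):
    """Return 1/rank of the first correct answer, or 0.0 if none is correct.

    judgments: booleans for the ranked answers of one question
    (at most 5 in the TREC QA track), True meaning "judged correct".
    """
    for rank, correct in enumerate(judgments, start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_judgments):
    """Average the reciprocal rank over all questions."""
    if not all_judgments:
        return 0.0
    return sum(reciprocal_rank(j) for j in all_judgments) / len(all_judgments)


def fraction_answered(all_judgments):
    """Fraction of questions with at least one correct answer in the list."""
    if not all_judgments:
        return 0.0
    return sum(1 for j in all_judgments if any(j)) / len(all_judgments)


# Three questions: correct at rank 1, never correct, correct at rank 2.
runs = [[True], [False, False], [False, True]]
print(mean_reciprocal_rank(runs))  # (1 + 0 + 0.5) / 3 = 0.5
```

Strict and lenient scores would use the same functions; only the judgment lists differ, depending on whether unsupported answers are marked correct.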
2 TREC QA systems • Question Answering by Predictive Annotation - Prager, Brown, Coden (IBM) and Radev (U of Michigan) • Exploiting Redundancy in Question Answering - Clarke, Cormack, Lynam (U of Waterloo) • Ranking - Table 2 in V
Exploiting Redundancy in Question Answering • Figure 1 in C • Question -> (1) a query for submission to a passage retrieval component and (2) a set of selection rules that guide the process of extracting answers from the passages (answer category) • Get a list of k passages • Identify possible answers • Rank the possible answers • Question analysis – IR – IE
3 features with greatest contribution • Flexibility of the parser • Passage retrieval technique (high quality passages) • Redundancy in the answer selection component – contribution of evidence from multiple passages to identify the most likely answer
Passage Retrieval techniques • Each document D is an ordered sequence of terms D = d1 d2 d3 … dm • Extent (u, v): the passage du … dv; a cover is a minimal extent that contains every term of a term set • Query Q generated from the question, Q = {q1, q2, q3, …} • Compute the score for each extent (u, v) that is a cover for a term set T ⊆ Q • Higher scores go to passages whose probability of occurrence is lower
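A minimal sketch of cover-based passage scoring follows. It finds every cover (a minimal extent containing all terms of a term set T) in one document and scores it. The slide does not give the scoring formula, so the one used here is an assumption: the sum of log(N/ft) over the terms of T, minus |T| times the log of the extent length, which rewards rare terms packed into a short extent. The names `find_covers`, `cover_score`, `corpus_freq` and `corpus_size` are hypothetical, and positions are 0-based rather than the 1-based d1 … dm of the slide.

```python
import math


def find_covers(doc_terms, term_set):
    """Return all minimal extents (u, v) of doc_terms containing every term in term_set."""
    last_seen = {}      # most recent position of each query term
    covers = []
    prev_start = -1     # start of the last cover emitted
    for v, term in enumerate(doc_terms):
        if term not in term_set:
            continue
        last_seen[term] = v
        if len(last_seen) < len(term_set):
            continue    # some term of the set has not occurred yet
        u = min(last_seen.values())   # shortest satisfying extent ending at v
        if u > prev_start:            # same start as before means not minimal
            covers.append((u, v))
            prev_start = u
    return covers


def cover_score(u, v, term_set, corpus_freq, corpus_size):
    """Assumed score: rarer terms and shorter extents score higher."""
    rarity = sum(math.log(corpus_size / corpus_freq[t]) for t in term_set)
    return rarity - len(term_set) * math.log(v - u + 1)


doc = "a x b a y b".split()
freq = {"a": 10, "b": 100}            # made-up corpus frequencies
for u, v in find_covers(doc, {"a", "b"}):
    print((u, v), round(cover_score(u, v, {"a", "b"}, freq, 1000), 3))
```

In this toy document the covers are (0, 2), (2, 3) and (3, 5); the two-term extent (2, 3) gets the highest score because it is the shortest.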
Redundancy • Each candidate term t is assigned a weight that takes into account the number of distinct passages in which the term appears, as well as the relative frequency of the term in the database • Wt = Ct log (N/ft) • Ct is the number of distinct passages in which t appears • Score a candidate answer by summing the weights of all its terms • Select the top-scoring candidate, reduce the weights of its terms to 0, and repeat until 5 answers are selected • Figure 2 in C
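The weighting and selection loop can be sketched as below. This is an illustration under stated assumptions, not the Waterloo implementation: N is taken to be the corpus size and ft the corpus frequency of t (the slide only says "relative frequency"), candidates are represented as tuples of terms, and the function names and toy numbers are invented for the example.

```python
import math


def term_weights(passages, corpus_freq, corpus_size):
    """Wt = Ct * log(N / ft), where Ct counts the distinct passages containing t."""
    passage_count = {}
    for passage in passages:
        for t in set(passage):        # count each passage at most once per term
            passage_count[t] = passage_count.get(t, 0) + 1
    return {t: c * math.log(corpus_size / corpus_freq[t])
            for t, c in passage_count.items()}


def select_answers(candidates, weights, n=5):
    """Greedily pick up to n candidates, zeroing the weights of terms already used."""
    weights = dict(weights)           # work on a copy
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < n:
        best = max(remaining,
                   key=lambda cand: sum(weights.get(t, 0.0) for t in cand))
        selected.append(best)
        remaining.remove(best)
        for t in best:
            weights[t] = 0.0          # later answers must bring new evidence
    return selected


# Toy data: three retrieved passages, already reduced to candidate terms.
passages = [["lincoln", "abraham"], ["lincoln", "president"], ["lincoln", "booth"]]
freq = {"lincoln": 100, "abraham": 50, "president": 1000, "booth": 20}
w = term_weights(passages, freq, corpus_size=1_000_000)
candidates = [("abraham", "lincoln"), ("president", "lincoln"), ("booth",)]
print(select_answers(candidates, w))
```

Zeroing the weights after each pick is what keeps the five returned answers from being near-duplicates: once "lincoln" has been used, the candidate ("president", "lincoln") drops below ("booth",).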
Exploiting redundancy • “Who” questions • 100 GB corpus • K depth, W width • Figure 2 in C
Who wants to be a Millionaire? • Real life example • 70% correct overall • Figure 5 in C
Question answering by predictive annotation • IBM system • Shallow NLP • System structure Figure 1 in P • Annotation • Indexing