Using Information Extraction for Question Answering Done by Rani Qumsiyeh
Problem • More information is added to the web every day. • Search engines exist, but they return lists of documents rather than direct answers. • This calls for a different kind of search engine.
History of QA • QA dates back to the 1960s • Two common approaches to designing QA systems: • Information Extraction • Information Retrieval • Two conferences that evaluate QA systems: • TREC (Text REtrieval Conference) • MUC (Message Understanding Conference)
Common Issues with QA Systems • Information retrieval matches keywords. • Information extraction interprets the question itself. • The same question can be phrased in many ways, which means: • Easier for IR, but broader results • Harder for IE, but more EXACT results
Message Understanding Conference (MUC) • Sponsored by the Defense Advanced Research Projects Agency (DARPA), 1991-1998. • Developed methods for formal evaluation of IE systems. • Run as a competition, where participants compare their results with each other and against human annotators' key templates. • Short system preparation time to encourage portability to new extraction problems: only 1 month to adapt the system to the new scenario before the formal run.
Evaluation Metrics • Precision and recall: • Precision: correct answers / answers produced • Recall: correct answers / total possible answers • F-measure: F = ((β² + 1)·P·R) / (β²·P + R) • where β is a parameter representing the relative importance of P and R: • e.g., β = 1 gives P and R equal weight; β = 0 reduces to P alone • Current state of the art: the F = 0.60 barrier
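A minimal sketch of these metric definitions (an illustration, not any official MUC scoring code):

```python
def precision(correct, produced):
    """Precision: correct answers / answers produced."""
    return correct / produced if produced else 0.0

def recall(correct, possible):
    """Recall: correct answers / total possible answers."""
    return correct / possible if possible else 0.0

def f_measure(p, r, beta=1.0):
    """F = ((beta^2 + 1) * P * R) / (beta^2 * P + R).
    beta=1 weights P and R equally; beta=0 reduces to P alone."""
    denom = beta ** 2 * p + r
    return (beta ** 2 + 1) * p * r / denom if denom else 0.0

# Example: 30 correct answers out of 50 produced, 100 possible answers.
p, r = precision(30, 50), recall(30, 100)
print(round(f_measure(p, r), 2))  # 0.4
```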
MUC Extraction Tasks • Named Entity task (NE) • Template Element task (TE) • Template Relation task (TR) • Scenario Template task (ST) • Coreference task (CO)
Named Entity Task (NE) • Mark in the text each string that represents a person, organization, or location name; a date or time; or a currency or percentage figure.
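As a toy illustration (not one of the MUC systems), a few regular expressions can mark some of the surface patterns the NE task targets, such as currency and percentage figures; person, organization, and location names generally require gazetteers or learned models:

```python
import re

# Hypothetical toy patterns; real NE systems use gazetteers and statistical models.
PATTERNS = {
    "MONEY":   re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion))?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "DATE":    re.compile(r"\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b"),
}

def tag_entities(text):
    """Return (label, matched string, span) tuples for each pattern hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.span()))
    return sorted(hits, key=lambda h: h[2])

print(tag_entities("The rocket, costing $1.2 million, was fired on Tuesday; 40% of tests passed."))
```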
Template Element Task (TE) • Extract basic information related to organization, person, and artifact entities, drawing evidence from everywhere in the text.
Template Relation task (TR) • Extract relational information on employee_of, manufacture_of, location_of relations etc. (TR expresses domain independent relationships between entities identified by TE)
Scenario Template task (ST) • Extract prespecified event information and relate the event information to particular organization, person, or artifact entities (ST identifies domain and task specific entities and relations)
Coreference task (CO) • Capture information on coreferring expressions, i.e. all mentions of a given entity, including those marked in NE and TE (nouns, noun phrases, pronouns)
An Example • The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc. • NE: entities are rocket, Tuesday, Dr. Head and We Build Rockets • CO: it refers to the rocket; Dr. Head and Dr. Big Head are the same • TE: the rocket is shiny red and Head's brainchild • TR: Dr. Head works for We Build Rockets Inc. • ST: a rocket launching event occurred with the various participants.
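One way to picture the output for this example is a set of nested records, one per MUC task level. The field names below are illustrative, not the official MUC template slots:

```python
# Illustrative structures for the rocket example; slot names are made up.
ne = ["rocket", "Tuesday", "Dr. Head", "We Build Rockets"]          # named entities

co = {"it": "rocket", "Dr. Big Head": "Dr. Head"}                   # coreference links

te = {"rocket": {"attributes": ["shiny", "red"],                    # template element
                 "creator": "Dr. Head"}}

tr = [("employee_of", "Dr. Head", "We Build Rockets Inc.")]         # template relation

st = {"event": "rocket_launching",                                  # scenario template
      "vehicle": "rocket",
      "date": "Tuesday",
      "participants": ["Dr. Head", "We Build Rockets Inc."]}
```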
Scoring templates • Templates are compared on a slot-by-slot basis: • Correct: response = key • Partial: response ≈ key • Incorrect: response != key • Spurious: response is filled but the key slot is blank • Missing: key slot is filled but the response is blank • Overgeneration = spurious / actual (slots produced)
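A minimal sketch of slot-by-slot scoring under these categories (a simplification; the actual MUC scorer also handles partial credit and template alignment in more detail):

```python
def score_slot(response, key):
    """Classify one slot of a response template against the human-annotated key."""
    if not response and not key:
        return None                      # slot unused in both; not scored
    if not key:
        return "spurious"                # system filled a slot the key leaves blank
    if not response:
        return "missing"                 # key filled, system left the slot blank
    if response == key:
        return "correct"
    if response in key or key in response:
        return "partial"                 # approximate match (simplified test)
    return "incorrect"

def score_template(response_slots, key_slots):
    """Tally slot categories and compute overgeneration = spurious / actual."""
    counts = {c: 0 for c in ("correct", "partial", "incorrect", "spurious", "missing")}
    for slot in set(response_slots) | set(key_slots):
        category = score_slot(response_slots.get(slot, ""), key_slots.get(slot, ""))
        if category:
            counts[category] += 1
    actual = sum(1 for value in response_slots.values() if value)   # slots actually produced
    counts["overgeneration"] = counts["spurious"] / actual if actual else 0.0
    return counts

# Toy example with made-up slot names.
key = {"event": "rocket_launching", "date": "Tuesday", "vehicle": "rocket"}
resp = {"event": "rocket_launching", "date": "Tues", "sponsor": "DARPA"}
print(score_template(resp, key))
# 1 correct, 1 partial, 1 missing, 1 spurious, overgeneration = 1/3
```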
KnowItAll, TextRunner, KnowItNow • They differ in implementation, but all do the same thing: extract large collections of facts from the web.
Using them as QA systems • They can handle questions that map to a single relation: • Who is the president of the US? "can handle" • Who was the president of the US in 1998? "fails" • They produce a huge number of facts that the user still has to sift through.
Textract • Aims at resolving ambiguity in text by recognizing additional named entities. • What is Julian Werver Hill's wife's telephone number? • equivalent to: What is Polly's telephone number? • Where is Werver Hill's affiliated company located? • equivalent to: Where is Microsoft located?
Proposed System • Determine what type of named entity we are looking for, using Textract. • Use part-of-speech tagging. • Use TextRunner as the basis for search. • Use WordNet to find synonyms. • Use extra entities in the text as "constraints".
Example • (WP who) (VBD was) (DT the) (JJ first) (NN man) (TO to) (VB land) (IN on) (DT the) (NN moon) • The verb (VB) is treated as the argument. • The noun (NN) is treated as the predicate. • We make sure that word position is maintained. • We keep prepositions when they join two nouns (e.g., "president of the US"). • Other non-stop words become constraints, e.g., "first".
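A minimal sketch of this parsing step using NLTK's POS tagger (assuming the "punkt" and "averaged_perceptron_tagger" data are installed; the split into argument, predicate, and constraints follows the slide's rules, and the stop-word list is illustrative, not the system's actual one):

```python
import nltk

# Question-word and auxiliary stop list (illustrative only).
STOP = {"who", "what", "when", "where", "was", "is", "the", "a", "an", "to", "on", "of", "did"}

def parse_question(question):
    """Split a question into (argument, predicate, constraints) per the rules above.
    The preposition rule (keep IN between two nouns) is omitted for brevity."""
    tagged = nltk.pos_tag(nltk.word_tokenize(question.lower()))
    argument = [w for w, t in tagged if t.startswith("VB") and w not in STOP]   # verbs
    predicate = [w for w, t in tagged if t.startswith("NN")]                    # nouns
    constraints = [w for w, t in tagged
                   if w not in STOP and not t.startswith(("VB", "NN", "IN", "DT", "TO", "WP"))]
    return argument, predicate, constraints

print(parse_question("Who was the first man to land on the moon"))
# e.g. (['land'], ['man', 'moon'], ['first'])
```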
Anaphora Resolution • Use anaphora resolution to determine which antecedent a mention refers to, e.g., to decide that it is associated with "wrote" rather than "landed".
Use Synonyms • We use WordNet to find synonyms for verbs and nouns, in order to produce more candidate facts. • We consider only 3 synonyms, since each additional synonym requires more fact retrievals and therefore more time.
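A small sketch of the synonym lookup using NLTK's WordNet interface (assuming the "wordnet" corpus is downloaded; the cap of three synonyms follows the slide, and the printed outputs are only indicative):

```python
from nltk.corpus import wordnet as wn

def top_synonyms(word, pos, limit=3):
    """Return up to `limit` WordNet synonyms for a word with the given part of speech."""
    found = []
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemma_names():
            name = lemma.replace("_", " ")
            if name != word and name not in found:
                found.append(name)
            if len(found) == limit:
                return found
    return found

print(top_synonyms("land", wn.VERB))   # e.g. ['set down', 'bring', 'put down']
print(top_synonyms("man", wn.NOUN))    # e.g. ['adult male', 'homo', 'human being']
```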
Delimitations • Works well with Who, When, and Where questions, since the target named entity type is easily determined. • Achieves about 90% accuracy on all three. • Works less well with What and How questions. • Achieves about 70% accuracy. • Takes about 13 seconds to answer a question.
Future Work • Build an ontology to determine the named entity type and parse the question (faster). • Handle combinations of questions, e.g., "When and where did the Holocaust happen?"