Increasing the coverage of answer extraction by applying anaphora resolution • IS-LTC, October 10, 2006 • Jori Mur, Humanities Computing, University of Groningen
Outline • Background • Question Answering (QA) • Off-line answer extraction • Anaphora resolution for answer extraction • Anaphora resolution technique for definite nouns • Anaphora resolution technique for pronouns • Experiment and Results • Conclusion
Question Answering (QA) • Task: Find an answer in a text collection to a question posed in natural language. • Question: How old is John McEnroe? Answer: 35 years • Question: When was Hillary Clinton born? Answer: October 26, 1947
Off-line answer extraction • Use a dependency parser to parse the corpus • Define dependency patterns • [Location Name] has [Number] inhabitants: <have, subj, [Location Name]> <have, obj, inhabitants> <inhabitants, det, [Number]> • Match the dependency relations of each sentence in the text against the dependency patterns • Extract and save facts
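The matching step on this slide can be illustrated with a minimal Python sketch. It assumes each parsed sentence is available as a list of (head, relation, dependent) triples; the function names, the greedy matching strategy, and the sample sentence about "Exampletown" are all invented for illustration and are not taken from the talk. The sketch does not check that a slot filler really is a location name or a number.

```python
from typing import Dict, List, Optional, Tuple

# A dependency relation is a (head, relation, dependent) triple.
Triple = Tuple[str, str, str]

# Pattern from the slide: "[Location Name] has [Number] inhabitants".
# Bracketed tokens are slots; all other tokens must match literally.
INHABITANTS_PATTERN: List[Triple] = [
    ("have", "subj", "[Location Name]"),
    ("have", "obj", "inhabitants"),
    ("inhabitants", "det", "[Number]"),
]


def is_slot(token: str) -> bool:
    return token.startswith("[") and token.endswith("]")


def unify(pattern_token: str, token: str, bindings: Dict[str, str]) -> bool:
    """Match one pattern token against one sentence token, binding slots."""
    if is_slot(pattern_token):
        if pattern_token in bindings:
            return bindings[pattern_token] == token
        bindings[pattern_token] = token
        return True
    return pattern_token == token


def match_pattern(pattern: List[Triple],
                  relations: List[Triple]) -> Optional[Dict[str, str]]:
    """Return slot bindings if every pattern triple matches some relation.

    Greedy and without backtracking: enough for a sketch, not a full matcher.
    """
    bindings: Dict[str, str] = {}
    for p_head, p_rel, p_dep in pattern:
        for head, rel, dep in relations:
            if rel != p_rel:
                continue
            trial = dict(bindings)  # work on a copy so failed attempts leave no trace
            if unify(p_head, head, trial) and unify(p_dep, dep, trial):
                bindings = trial
                break
        else:
            return None  # no relation in the sentence matched this pattern triple
    return bindings


# Hypothetical parse of the invented sentence "Exampletown has 12000 inhabitants".
relations = [
    ("have", "subj", "Exampletown"),
    ("have", "obj", "inhabitants"),
    ("inhabitants", "det", "12000"),
]
fact = match_pattern(INHABITANTS_PATTERN, relations)
```

In this sketch, `fact` holds the slot bindings that would be saved as an extracted fact; a sentence lacking any of the three relations yields `None`.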
Problem • Text: McEnroe was injured on his right knee. [...] The problems with his knee kept bothering the 35-year old American for two weeks.
Anaphora resolution for definite nouns • Modify patterns to match definite nouns • [Definite noun] has [Number] inhabitants: <have, subj, [Definite noun]> <have, obj, inhabitants> <inhabitants, det, [Number]> • Create an instance list using predicate and apposition relations • Select the first preceding name and check whether it occurs together with the noun in the instance list • Fallback: select the first preceding name
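A rough sketch of this selection strategy in Python follows. It reads the slide as: walk back through the preceding names, accept the first one that is paired with the noun in the instance list, and otherwise fall back to the nearest preceding name. That reading, the function name, the data layout, and the extra name "Becker" are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical instance list: (name, noun) pairs that would be harvested
# from predicate and apposition relations in the corpus.
INSTANCES: Set[Tuple[str, str]] = {("McEnroe", "American")}


def resolve_definite_noun(noun: str,
                          preceding_names: List[str],
                          instances: Set[Tuple[str, str]]) -> Optional[str]:
    """Pick an antecedent for a definite noun.

    preceding_names is ordered from the nearest name before the noun
    to the farthest.
    """
    for name in preceding_names:
        if (name, noun) in instances:
            return name
    # Fallback from the slide: take the first preceding name, if there is one.
    return preceding_names[0] if preceding_names else None


# "the 35-year old American" with an invented nearer name, "Becker".
antecedent = resolve_definite_noun("American", ["Becker", "McEnroe"], INSTANCES)
```

Here `antecedent` is "McEnroe", because only that name is paired with "American" in the instance list; for a noun with no matching pair, the nearest name would be returned instead.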
Experiment • 12 question types • Age • Date of Birth • Location of Birth • Capital • Date of Death • Location of Death • Manner/Cause of Death • Age of Death • Founded • Function • Inhabitants • Winner
Experiment • CLEF corpus for Dutch: two newspapers (Algemeen Dagblad and NRC Handelsblad) • 1994 and 1995 • Simple predefined dependency patterns and patterns based on anaphora resolution • 200 Dutch questions from CLEF-2005 • QA system: Joost
Results for extraction • Around 10,900 additional fact types
Results for QA • 200 questions from the CLEF-2005 data set
Discussion of Results • Hypothesis 1: Precision should be increased. • Hypothesis 2: Selection of types was limited. • Hypothesis 3: Answers to questions occur in one sentence
Answer in one sentence • Question 107: Who was the pilot of the mission that repaired the astronomic satellite, the Hubble Space Telescope? • Text AD19940719: Bowersox was the pilot of the mission that repaired the astronomic satellite, the Hubble Space Telescope.
Conclusion • Anaphora resolution is one way to improve the coverage of answer extraction • Although precision drops, QA performance is not hurt; the results even improved • Future work: investigate what happens when anaphora resolution is applied to a broader range of question types • Future work: investigate what happens when the questions are truly independent of the corpus
Anaphora resolution for pronouns • Modify patterns to match pronouns • [Pronoun] has [Number] inhabitants: <have, subj, [Pronoun]> <have, obj, inhabitants> <inhabitants, det, [Number]> • Create lists of boys' and girls' names (from a baby-names site on the internet) • Select the first preceding name and check that it does not occur on the name list for the opposite sex of the pronoun • Fallback: select the first preceding name
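The gender check can be sketched in Python as below. The tiny name lists, the English pronoun table (the experiment itself used Dutch text), the use of the first token of a name as the given name, and the function name are all assumptions made for illustration rather than details from the talk.

```python
from typing import Dict, List, Optional, Set

# Hypothetical name lists; the talk built these from a baby-names website.
BOY_NAMES: Set[str] = {"John"}
GIRL_NAMES: Set[str] = {"Hillary"}

# English pronouns used for readability only.
PRONOUN_GENDER: Dict[str, str] = {
    "he": "male", "his": "male", "him": "male",
    "she": "female", "her": "female",
}


def resolve_pronoun(pronoun: str,
                    preceding_names: List[str],
                    boy_names: Set[str],
                    girl_names: Set[str]) -> Optional[str]:
    """Pick the nearest preceding name not listed under the opposite sex.

    preceding_names is ordered from the nearest name before the pronoun
    to the farthest.
    """
    gender = PRONOUN_GENDER.get(pronoun.lower())
    if gender == "male":
        opposite = girl_names
    elif gender == "female":
        opposite = boy_names
    else:
        opposite = set()  # unknown pronoun: no gender filter is applied
    for name in preceding_names:
        given_name = name.split()[0]  # assumes the given name comes first
        if given_name not in opposite:
            return name
    # Fallback from the slide: take the first preceding name, if there is one.
    return preceding_names[0] if preceding_names else None


names = ["Hillary Clinton", "John McEnroe"]
antecedent_he = resolve_pronoun("he", names, BOY_NAMES, GIRL_NAMES)
antecedent_she = resolve_pronoun("she", names, BOY_NAMES, GIRL_NAMES)
```

Note that the check is negative: a name absent from both lists is still accepted, which matches the slide's wording that the name must not occur on the opposite-sex list.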
Example • Text NH19941209: 35-year old McEnroe ... • Question: How old is McEnroe? • Answer: 35