A Pattern Based Approach to Answering Factoid, List and Definition Questions
Mark A. Greenwood and Horacio Saggion
Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK
Outline of Talk
• What is Question Answering?
• Different Question Types
• System Description
  • Factoid and List Questions
    • System Architecture
    • Surface Matching Text Patterns
    • Fallback to Semantic Entities
  • Definition Questions
    • System Architecture
    • Knowledge Acquisition
    • Locating Possible Definitions
• Results and Evaluation
  • Factoid and List Questions
  • Definition Questions
• Conclusions and Future Work
RIAO 2004
What is Question Answering?
• The main aim of QA is to present the user with a short answer to a question rather than a list of possibly relevant documents.
• As it becomes more and more difficult to find answers on the WWW using standard search engines, question answering technology will become increasingly important.
• Answering questions using the web is already enough of a problem for it to appear in fiction (Marshall, 2002):
  “I like the Internet. Really, I do. Any time I need a piece of shareware or I want to find out the weather in Bogotá… I’m the first guy to get the modem humming. But as a source of information, it sucks. You got a billion pieces of data, struggling to be heard and seen and downloaded, and anything I want to know seems to get trampled underfoot in the crowd.”
Different Question Types
• There are many different types of question which a user can ask. The system discussed in this presentation attempts to answer three of them:
  • Factoid questions usually require a single fact as the answer and include questions such as “How high is Everest?” or “When was Mozart born?”.
  • List questions require multiple facts to be returned in answer to a question, for example “Name 22 cities that have a subway system” or “Name companies which manufacture tractors”.
  • Definition questions, such as “What is aspirin?”, require answers covering essential (e.g. “aspirin is a drug”) as well as non-essential (e.g. “aspirin is a blood thinner”) descriptions of the definiendum (the term being defined).
• The system makes no attempt to answer other question types; for example, speculative questions such as “Is the airline industry in trouble?” are not handled.
System Description
• As the three types of question require different techniques to answer them, the system consists of two sub-systems:
  • Factoid: answers both factoid and list questions. For factoid questions the system returns the best answer; for list questions it returns all the answers it found.
  • Definition: answers only the definition questions.
• The rest of this section gives an overview of both sub-systems and of how patterns are used to answer the different question types.
Factoid System Architecture
Surface Text Patterns
• Learning patterns which can be used to find answers is a two-stage process:
  • The first stage learns a set of patterns from a set of question-answer pairs.
  • The second stage assigns a precision to each pattern and discards those patterns which are tied to a specific question-answer pair.
• To explain the process we will use questions of the form “When was X born?”:
  • As a concrete example we will use “When was Mozart born?”, for which the question-answer pair is (Mozart, 1756).
Surface Text Patterns
• The first stage learns a set of patterns from the question-answer pairs for a specific question type:
  • For each example, the question and answer terms are submitted to Google and the top ten documents are downloaded.
  • In each document the question and answer terms are replaced by AnCHoR and AnSWeR respectively.
  • Depending upon the question type, other replacements are also made; e.g. any dates may be replaced by the tag DatE.
  • The sentences which contain both AnCHoR and AnSWeR are retained and joined together to create a single document.
  • This generated document is then used to build a token-level suffix tree, from which repeated strings that contain both AnCHoR and AnSWeR and do not span a sentence boundary are extracted as patterns.
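As a rough illustration, the first stage can be sketched in Python. This is a minimal sketch under several assumptions, not the authors' implementation: the downloaded documents are taken to be already split into whitespace-tokenised sentences, tagging is plain string replacement, any remaining four-digit number is treated as a date for the DatE tag, and the token-level suffix tree is swapped for a brute-force count of every token substring. The brute-force version finds the same repeated strings (including non-maximal ones) but is quadratic in sentence length. The function names and example sentences are hypothetical.

```python
import re
from collections import Counter


def prepare_sentences(sentences, question_term, answer_term):
    """Tag the question and answer terms; keep tokenised sentences containing both tags."""
    kept = []
    for sentence in sentences:
        tagged = sentence.replace(question_term, "AnCHoR").replace(answer_term, "AnSWeR")
        # Assumed extra replacement for "When was X born?": other four-digit years -> DatE
        tagged = re.sub(r"\b\d{4}\b", "DatE", tagged)
        if "AnCHoR" in tagged and "AnSWeR" in tagged:
            kept.append(tagged.split())  # naive whitespace tokenisation
    return kept


def repeated_patterns(tokenised, min_count=2):
    """Brute-force stand-in for the suffix tree: count every token substring that
    contains both tags (never crossing a sentence boundary) and keep the repeats."""
    counts = Counter()
    for tokens in tokenised:
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                span = tokens[i:j]
                if "AnCHoR" in span and "AnSWeR" in span:
                    counts[" ".join(span)] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# Hypothetical, pre-tokenised sentences standing in for downloaded Google results.
docs = [
    "Mozart ( 1756 - 1791 ) was a composer .",
    "From Mozart ( 1756 - 1791 ) to Beethoven .",
    "Mozart was born in Salzburg .",
]
patterns = repeated_patterns(prepare_sentences(docs, "Mozart", "1756"))
```

With these three sentences the third is discarded (it lacks the answer), and only the substrings shared by the first two survive, such as `AnCHoR ( AnSWeR - DatE )`.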
Surface Text Patterns
• The result of the first stage is a set of patterns. For questions of the form “When was X born?” these may include:
  • AnCHoR ( AnSWeR –
  • From AnCHoR ( AnSWeR – DatE )
  • AnCHoR ( AnSWeR
• Unfortunately some of these patterns may be specific to the question used to generate them.
• So the second stage of the approach filters out these specific patterns to produce a set which can be used to answer unseen questions.
Surface Text Patterns
• The second stage of the approach requires a different set of question-answer pairs from those used in the first stage:
  • Only the question term is submitted to Google. Within each of the top ten documents returned, the question term is replaced by AnCHoR, the answer (if present) by AnSWeR, and any other replacements made in the first stage are also carried out.
  • The sentences which contain AnCHoR are retained.
  • All of the patterns from the first stage are converted to regular expressions designed to capture the token which appears in place of AnSWeR.
  • Each regular expression is then matched against each sentence, and two counts are maintained for each pattern: Ca, the total number of times the pattern has matched, and Cc, the number of times AnSWeR was selected as the answer.
  • After a pattern has been matched against every sentence, it is discarded if Cc is less than 5; otherwise its precision is calculated as Cc/Ca, and the pattern is retained only if its precision is greater than 0.1.
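A minimal Python sketch of the second stage follows. It assumes, as the example regular expressions on these slides suggest, that the AnSWeR slot becomes a capture group for a single non-space token, `([^ ]+)`, and that Ca counts every individual match. `pattern_to_regex`, `score_pattern` and the sample sentences are hypothetical; the thresholds (Cc ≥ 5, precision > 0.1) are the ones stated above.

```python
import re


def pattern_to_regex(pattern):
    """Escape a learned pattern and turn its AnSWeR slot into a one-token capture group."""
    return re.escape(pattern).replace("AnSWeR", r"([^ ]+)", 1)


def score_pattern(pattern, sentences, min_correct=5, min_precision=0.1):
    """Return (precision, regex) if the pattern survives both filters, else None."""
    regex = re.compile(pattern_to_regex(pattern))
    c_a = c_c = 0
    for sentence in sentences:
        for match in regex.finditer(sentence):
            c_a += 1                    # Ca: the pattern matched
            if match.group(1) == "AnSWeR":
                c_c += 1                # Cc: it selected the known answer
    if c_c < min_correct:
        return None
    precision = c_c / c_a
    return (precision, regex.pattern) if precision > min_precision else None


# Hypothetical tagged sentences from the second set of question-answer pairs.
sentences = (["AnCHoR ( AnSWeR - DatE ) was a composer ."] * 6
             + ["AnCHoR ( Salzburg ) is a city ."] * 2)
general = score_pattern("AnCHoR ( AnSWeR", sentences)
too_rare = score_pattern("From AnCHoR ( AnSWeR - DatE )", sentences)
```

Here the general pattern `AnCHoR ( AnSWeR` matches eight times and selects AnSWeR six times, giving a precision of 0.75, while the question-specific `From AnCHoR ( AnSWeR - DatE )` never matches and is discarded.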
Surface Text Patterns
• Assigning precisions to patterns in this way produces a set of precisions and regular expressions such as:
  • 0.967: AnCHoR \( ([^ ]+) - DatE
  • 0.566: AnCHoR \( ([^ ]+)
  • 0.263: AnCHoR ([^ ]+) –
• These patterns can then be used to answer unseen questions:
  • The question term is submitted to Okapi, and in the top 20 returned documents the question term is replaced with AnCHoR and any other necessary replacements are also made.
  • The sentences which contain AnCHoR are extracted and combined to make a single document.
  • Each pattern is then applied to each sentence to extract possible answers.
  • All the answers found are sorted firstly on the precision of the pattern which selected them and secondly on the number of times the same answer was found.
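Answer extraction and ranking might look like the Python sketch below. It assumes that when several patterns select the same answer, the highest of their precisions is used for sorting; the slide does not say how that case is handled. The tagged sentences are invented, and the DatE replacement is skipped to keep the example small. The function name is hypothetical.

```python
import re
from collections import defaultdict


def rank_answers(scored_patterns, sentences):
    """Apply (precision, regex) pairs to tagged sentences and rank the extracted answers
    by the best precision of any pattern that selected them, then by frequency."""
    best = defaultdict(float)  # answer -> highest precision of a pattern that selected it
    freq = defaultdict(int)    # answer -> number of times it was extracted
    for precision, regex in scored_patterns:
        for sentence in sentences:
            for match in re.finditer(regex, sentence):
                answer = match.group(1)
                best[answer] = max(best[answer], precision)
                freq[answer] += 1
    return sorted(best, key=lambda a: (best[a], freq[a]), reverse=True)


# Two of the regular expressions above, applied to hypothetical sentences
# for "When was Mozart born?" (DatE replacement skipped for simplicity).
scored = [(0.566, r"AnCHoR \( ([^ ]+)"), (0.263, r"AnCHoR ([^ ]+) –")]
tagged = [
    "AnCHoR ( 1756 – 1791 ) was born in Salzburg .",
    "AnCHoR ( 1756 – 1791 ) composed over 600 works .",
    "AnCHoR ( 1757 ) is a misprint .",
    "AnCHoR wrote – among other things – operas .",
]
ranking = rank_answers(scored, tagged)
```

“1756” and “1757” share the same best precision (0.566), so the second key, frequency, puts “1756” first; “wrote” comes last because only the weaker pattern selects it.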
Fallback to Semantic Entities
• Q: How high is Everest?
• If Q contains ‘how’ and ‘high’ then the semantic class, S, is measurement:distance.
• Documents retrieved by Okapi:
  • D1: Everest’s 29,035 feet is 5.4 miles above sea level…
  • D2: At 29,035 feet the summit of Everest is the highest…
• Known entities (#: number of occurrences):
  • 2 location(‘Everest’)
  • 2 measurement:distance(‘29,035 feet’)
  • 1 measurement:distance(‘5.4 miles’)
• Answer: 29,035 feet
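The fallback step can be sketched as follows. The sketch assumes that entity tagging has already been done and that the answer is the most frequent entity of the expected semantic class S, which is what the Everest example suggests (29,035 feet occurs twice, 5.4 miles once). Only the single ‘how’ + ‘high’ rule from the slide is encoded; a real system would need many more rules, and all names here are hypothetical.

```python
from collections import Counter

# The one question-classification rule shown on the slide: (required words, class).
RULES = [({"how", "high"}, "measurement:distance")]


def answer_type(question):
    """Return the expected semantic class S, or None if no rule fires."""
    words = set(question.lower().rstrip("?").split())
    for required, semantic_class in RULES:
        if required <= words:
            return semantic_class
    return None  # outside the scope of the fallback system


def fallback_answer(question, entities):
    """Pick the most frequent entity of class S; entities are (class, text) pairs
    assumed to come from an entity tagger run over the retrieved documents."""
    s = answer_type(question)
    counts = Counter(text for cls, text in entities if cls == s)
    return counts.most_common(1)[0][0] if counts else None


# Entities from D1 and D2 in the Everest example.
entities = [
    ("location", "Everest"), ("measurement:distance", "29,035 feet"),
    ("measurement:distance", "5.4 miles"),
    ("location", "Everest"), ("measurement:distance", "29,035 feet"),
]
answer = fallback_answer("How high is Everest?", entities)
```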
Definition System
• Definition questions such as “What is Goth?” contain very little information which can be used to retrieve relevant documents, as they have almost nothing in common with potential answers:
  • “a subculture that started as one component of the punk rock scene”
  • “horror/mystery literature that is dark, eerie, and gloomy”
• Having extra knowledge about the definiendum is important:
  • 217 sentences in AQUAINT contain the term “Goth”.
  • If we know that “Goth” seems to be associated with “subculture” in definition passages then we can narrow the search space.
  • Only 6 sentences in AQUAINT contain both “Goth” and “subculture”, for example:
    • “the Goth subculture”
    • “gloomy subculture known as Goth”
Definition System
• To extract extra information about the definiendum we use a set of linguistic patterns which we instantiate with the definiendum, for example:
  • “X is a”
  • “such as X”
  • “X consists of”
• The patterns match many sentences, some of which are definition-bearing and some of which are not:
  • “Goth is a subculture”
  • “Becoming a Goth is a process that demands lots of effort”
• These patterns can be used to find terms which regularly appear along with the definiendum outside the target collection.
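Instantiating and matching the linguistic patterns can be sketched with Python's `re` module. Treating each pattern as a case-insensitive regular expression with word boundaries is an assumption made here for illustration, and the helper names are hypothetical. The example reproduces the point above: the second sentence matches “X is a” even though it is not definition-bearing.

```python
import re

# The example patterns from the slide; X stands for the definiendum.
PATTERNS = ["X is a", "such as X", "X consists of"]


def instantiate(definiendum):
    """Compile one case-insensitive, word-bounded regex per pattern."""
    return [
        re.compile(r"\b" + re.escape(p.replace("X", definiendum)) + r"\b", re.IGNORECASE)
        for p in PATTERNS
    ]


def matching_sentences(definiendum, sentences):
    """Keep the sentences matched by at least one instantiated pattern."""
    regexes = instantiate(definiendum)
    return [s for s in sentences if any(r.search(s) for r in regexes)]


sentences = [
    "Goth is a subculture",                                      # definition-bearing
    "Becoming a Goth is a process that demands lots of effort",  # matches, but is not
    "The band played Goth rock all night",                       # no pattern matches
]
matches = matching_sentences("Goth", sentences)
```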
Definition System Architecture
Knowledge Acquisition
• We parse the question in order to extract the definiendum.
• We then use the linguistic patterns (“Goth is a”, “such as Goth”…) to find definition-bearing passages in:
  • WordNet
  • Britannica
  • the Web
• From these sources we extract words (nouns, adjectives, verbs) and their frequencies from definition-bearing sentences.
• Which sentences count as definition-bearing depends on the source:
  • WordNet: the glosses of the definiendum and of any associated hypernyms.
  • Britannica: only sentences that contain the definiendum.
  • Web: only sentences that contain one of the linguistic patterns.
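The source-specific tests and the word counting can be sketched as below. The sketch assumes the WordNet input already consists of the relevant glosses and uses a simple substring check for Britannica. Because the slides restrict extraction to nouns, adjectives and verbs without naming a tagger, it substitutes a small hypothetical stopword list for a part-of-speech filter. All names and data are illustrative.

```python
import re
from collections import Counter

# Illustrative stand-in for the noun/adjective/verb filter, which would need a POS tagger.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "that", "as", "such", "to", "in"}


def definition_bearing(source, sentence, definiendum, patterns):
    """Source-specific test; patterns are regexes instantiated with the definiendum."""
    if source == "wordnet":
        return True  # caller passes only glosses of the definiendum and its hypernyms
    if source == "britannica":
        return definiendum.lower() in sentence.lower()
    if source == "web":
        return any(p.search(sentence) for p in patterns)
    raise ValueError(f"unknown source: {source}")


def term_frequencies(source, sentences, definiendum, patterns):
    """Count candidate secondary terms in the definition-bearing sentences of one source."""
    counts = Counter()
    for sentence in sentences:
        if definition_bearing(source, sentence, definiendum, patterns):
            for word in re.findall(r"[a-z]+", sentence.lower()):
                if word not in STOPWORDS and word != definiendum.lower():
                    counts[word] += 1
    return counts


goth_is_a = [re.compile(r"\bGoth is a\b", re.IGNORECASE)]
web = ["Goth is a subculture of punk", "Goth is a dark subculture", "I bought a Goth album"]
web_counts = term_frequencies("web", web, "Goth", goth_is_a)
```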
Knowledge Acquisition
• We retain all the words extracted from WordNet and, from the other sources, only those words which occurred more than once. The words are sorted by their frequency of occurrence.
• A list of n secondary terms to be used for query expansion is formed from:
  • all m terms found in WordNet;
  • at most (n − m)/2 terms from Britannica;
  • terms found on the web, which expand the list to size n.
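One way to assemble the secondary-term list is sketched below. The slide leaves some details open, so the sketch assumes that (n − m)/2 is rounded down, that a term already taken from an earlier source is not added twice, and that if m exceeds n all WordNet terms are still kept. The function names and frequencies are hypothetical.

```python
from collections import Counter


def ranked(counts, keep_singletons):
    """Terms sorted by frequency; words seen only once are dropped unless kept."""
    return [w for w, c in counts.most_common() if keep_singletons or c > 1]


def secondary_terms(wordnet, britannica, web, n):
    """Build the list of n expansion terms from per-source frequency Counters."""
    terms = ranked(wordnet, keep_singletons=True)  # all m WordNet terms
    m = len(terms)
    quota = max(0, (n - m) // 2)                   # at most (n - m) / 2 from Britannica
    for w in ranked(britannica, keep_singletons=False):
        if quota == 0:
            break
        if w not in terms:
            terms.append(w)
            quota -= 1
    for w in ranked(web, keep_singletons=False):   # fill the list up to n from the web
        if len(terms) >= n:
            break
        if w not in terms:
            terms.append(w)
    return terms


terms = secondary_terms(
    wordnet=Counter({"subculture": 1}),
    britannica=Counter({"punk": 3, "music": 2, "fashion": 2, "rare": 1}),
    web=Counter({"dark": 5, "punk": 4, "gloomy": 2, "once": 1}),
    n=5,
)
```

With m = 1 and n = 5, at most two Britannica terms are taken (“rare” would be dropped anyway, as it occurs only once), and the web supplies the remaining two, skipping “punk” because it is already in the list.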
Locating Possible Definitions
• An IR query consisting of all the words in the question together with the acquired secondary terms is submitted to Okapi, and the 20 most relevant passages are retrieved.
• Sentences which pass either of the following tests are then extracted as definition candidates:
  • the sentence matches one of the linguistic patterns;
  • the sentence contains the definiendum and at least 3 secondary terms.
• To avoid including unnecessary information, we discard the sentence prefix which contains neither the definiendum nor any secondary term.
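The two candidate tests and the prefix trimming can be sketched as follows. The sketch assumes a single-word definiendum, whitespace tokenisation with punctuation stripped, lower-case secondary terms, and that “at least 3 secondary terms” means three distinct terms. The function name and example data are hypothetical.

```python
import re


def extract_candidate(sentence, definiendum, secondary, patterns, min_terms=3):
    """Return the trimmed sentence if it passes either test, otherwise None.
    `secondary` is a set of lower-case secondary terms."""
    tokens = sentence.split()
    words = [re.sub(r"\W", "", t).lower() for t in tokens]  # strip punctuation
    target = definiendum.lower()
    pattern_hit = any(p.search(sentence) for p in patterns)
    term_hit = target in words and len(set(words) & secondary) >= min_terms
    if not (pattern_hit or term_hit):
        return None
    # Drop the prefix containing neither the definiendum nor any secondary term.
    for i, w in enumerate(words):
        if w == target or w in secondary:
            return " ".join(tokens[i:])
    return sentence


secondary = {"subculture", "punk", "music", "dark"}
patterns = [re.compile(r"\bGoth is a\b", re.IGNORECASE)]
by_terms = extract_candidate("In the 1980s Goth emerged from punk as a dark music subculture",
                             "Goth", secondary, patterns)
rejected = extract_candidate("Goth fans like black clothes", "Goth", secondary, patterns)
by_pattern = extract_candidate("Critics say Goth is a subculture", "Goth", secondary, patterns)
```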
Locating Possible Definitions
• Equivalent definitions are identified via the vector space model using the cosine similarity measure, and only one of them is retained.
• For example, the following two definitions are similar, so only one would be retained by the system:
  • “the Goth subculture”
  • “gloomy subculture known as Goth”
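A minimal sketch of the duplicate filter, assuming raw term-frequency vectors (no IDF weighting) and a hypothetical similarity threshold of 0.5, since the slides give neither. For the two Goth definitions the shared terms are “goth” and “subculture”, so the cosine similarity is 2 / (√3 · √5) ≈ 0.52 and the second definition is dropped.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two strings as raw term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def deduplicate(definitions, threshold=0.5):
    """Keep a definition only if it is not too similar to one already kept."""
    kept = []
    for d in definitions:
        if all(cosine(d, k) < threshold for k in kept):
            kept.append(d)
    return kept


defs = ["the Goth subculture", "gloomy subculture known as Goth", "Goth is a music genre"]
unique = deduplicate(defs)
```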
Results and Evaluation
• The system was independently evaluated as part of the TREC 2003 question answering evaluation, which consisted of 413 factoid questions, 37 list questions and 50 definition questions.
• For further details on the evaluation metrics used by NIST see (Voorhees, 2003).
Results & Evaluation: Factoid
• Unfortunately only 12 of the 413 factoid questions were suitable to be answered by the pattern sets.
• Even worse, none of the patterns selected any answers, correct or otherwise.
• The fallback system correctly identified the answer type for 241 of the 413 questions; of the rest, 53 were given an incorrect type and 119 were outside the scope of the system.
• Okapi only located relevant documents for 131 of the questions the system could answer, giving:
  • a maximum attainable score of 0.317 (131/413);
  • an official score of 0.138 (57/413), which included 15 correct NIL responses, so…
  • the system correctly answered 42 questions (57 − 15), giving a score of 0.102 (42/413), 32% of the maximum attainable score.
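The figures on this slide can be reproduced with a few lines of arithmetic. The only relation not spelled out on the slide is that the 32% is simply 42/131, because both scores share the denominator 413.

```python
TOTAL = 413            # factoid questions in TREC 2003
WITH_DOCS = 131        # answerable questions for which Okapi found relevant documents
OFFICIAL_CORRECT = 57  # questions judged correct, NIL responses included
CORRECT_NIL = 15

max_score = WITH_DOCS / TOTAL                # about 0.317
official = OFFICIAL_CORRECT / TOTAL          # about 0.138
answered = OFFICIAL_CORRECT - CORRECT_NIL    # 42 correct non-NIL answers
answered_score = answered / TOTAL            # about 0.102
share_of_max = answered_score / max_score    # equals 42 / 131, about 32%

# The answer-type breakdown accounts for every question.
assert 241 + 53 + 119 == TOTAL
```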
Results & Evaluation: List
• Similar problems occurred when the system was used to answer list questions.
• Over the 37 questions only 20 distinct correct answers were returned, giving an official F-score of 0.029.
• The system's ability to locate a reasonable number of correct answers was offset by the large number of answers it returned per question:
  • There are seven known answers (in AQUAINT) to the question “What countries have won the men’s World Cup for soccer?”
  • The system returned 32 answers, only two of which were correct.
  • This gives a recall of 0.286 (2/7) but a precision of only 0.062 (2/32).
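The recall and precision for the World Cup question are simple ratios, computed below. The sketch also computes a balanced F-measure (β = 1) for this one question; that choice of β is an assumption here (see Voorhees, 2003 for the official metric), and the official F-score of 0.029 is averaged over all 37 questions, so it is not reproduced. Note that 2/32 is exactly 0.0625, which the slide reports as 0.062.

```python
def precision_recall_f(correct, returned, known):
    """Instance precision, instance recall and balanced F for one list question."""
    precision = correct / returned
    recall = correct / known
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f


# "What countries have won the men's World Cup for soccer?"
p, r, f = precision_recall_f(correct=2, returned=32, known=7)
print(f"precision={p:.4f} recall={r:.3f} F={f:.3f}")
```

Returning many answers keeps recall reasonable but drives precision, and with it the F-score, down.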
Results & Evaluation: Definition
• Definition systems are evaluated on their ability to return information nuggets (snippets of text containing information that helps define the definiendum). Some of these nuggets are considered essential, i.e. a full definition must contain them.
• Our system produced answers for 28 of the 50 questions, 23 of which contained at least one essential nugget.
• The official score for the system was 0.236, placing it 9th out of the 25 participants.
• The knowledge acquisition step provided relevant secondary terms for a number of questions:
  • WordNet helped in 4 cases
  • Britannica helped in 5 cases
  • the Web helped in 39 cases
Conclusions
• When using patterns to answer factoid and list questions, the surface text patterns should probably be acquired from a source with a writing style similar to that of the collection from which answers will be drawn.
• Here we used the web to acquire the patterns and used them to find answers in the AQUAINT collection, whose writing style differs from that of the web.
• Using patterns to answer definition questions, while more successful than the factoid system, still has its problems:
  • the filters used to determine whether a passage is definition-bearing are too restrictive.
• Despite these failings, the use of patterns for answering factoid, list and definition questions shows promise.
Future Work
• For the factoid and list QA system, future work could include:
  • acquiring a wider range of pattern sets to cover more question types;
  • using the full question, not just the question term, for passage retrieval.
• For the definition QA system, future research could include:
  • ranking the extracted secondary terms, perhaps using IDF values, to help eliminate inappropriate matches (“aspirin is a great choice for active people”);
  • implementing a syntax-based technique that prunes parse trees to extract better definition strings;
  • using coreference information in combination with the extraction patterns.
Any Questions? Copies of these slides can be found at: http://www.dcs.shef.ac.uk/~mark/phd/work/
Bibliography
Hamish Cunningham, Diana Maynard, Kalina Bontcheva and Valentin Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
Mark A. Greenwood and Robert Gaizauskas. Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering. In Proceedings of the Workshop on Natural Language Processing for Question Answering (EACL03), pages 29–34, Budapest, Hungary, April 14, 2003.
Michael Marshall. The Straw Men. HarperCollins Publishers, 2002.
Ellen M. Voorhees. Overview of the TREC 2003 Question Answering Track. In Proceedings of the 12th Text REtrieval Conference, 2003.