
Acquiring Syntactic and Semantic Transformations in Question Answering

Learn about the complexity of factoid QA, the challenges with answer sentences, and the usefulness of QASPs for acquiring syntactic and semantic reformulations in QA.


Presentation Transcript


  1. Acquiring Syntactic and Semantic Transformations in Question Answering MSR Summer School 2010 Michael Kaisser Was: PhD student, School of Informatics, University of Edinburgh Now: Program Manager, Search Technology Center Munich, Bing

  2. Overview • What is Question Answering (QA)? • Why is QA difficult? • A Corpus of Question-Answer Sentence Pairs (QASPs) • Acquiring reformulation rules from the QASP Corpus

  3. Part 1: What is Question Answering and why is it difficult?

  4. What is factoid QA? • Question answering (QA) is the task of automatically answering a question posed in natural language → something like a search engine. • Usually, a QA system searches for the answer in a collection of natural language texts. • This might be a newspaper corpus or the WWW (or something else).

  5. What is factoid QA?

  6. Why is factoid QA difficult? • Questions are fairly simple. • But what about the sentences containing the answers? Average length in words (TREC 02-06 data): Questions: 8.14 (st. dev. 2.81) Answer Sentences: 28.99 (st. dev. 13.13) (in a corpus of newspaper articles, e.g. from the NYT.)

  7. Why is factoid QA difficult? But what about the sentences containing the answers? Who defeated the Spanish armada? "The old part of Plymouth city clusters around the Hoe, the famous patch of turf on which Drake is said to have finished a calm game of bowls before heading off to defeat the invading Spanish Armada in 1588." What day did Neil Armstrong land on the moon? "Charlie Duke, Jim Lovell, Apollo 11's back-up commander, and Fred Haise, the back-up lunar module pilot, during the tense moments before the lunar module carrying Neil Armstrong and Edwin “Buzz” Aldrin Jr. landed on the moon on July 20, 1969."

  8. Why is factoid QA difficult? But what about the sentences containing the answers? Average length in words (TREC 02-06 data): Questions: 8.14 (st. dev. 2.81) Answer Sentences: 28.99 (st. dev. 13.13) The problematic part of factoid QA is not the questions, but the answer sentences.

  9. Why is factoid QA difficult? The problematic part of factoid QA is not the questions, but the answer sentences. The problem here is the many syntactic and semantic ways in which an answer to a question can be formulated. (Paraphrasing)

  10. Why is factoid QA difficult? The problematic part of factoid QA is not the questions, but the answer sentences. The problem here is the many syntactic and semantic ways in which an answer to a question can be formulated. (Paraphrasing) How can we deal with this? How can we detect all these possible answer sentence formulations?

  11. Part 2: A Corpus of Question-Answer Sentence Pairs (QASPs)

  12. Usefulness of TREC data TREC publishes lots of valuable data: • question test sets • correct answers • lists of documents that contain the identified instances of the correct answers

  13. Usefulness of TREC data TREC publishes lots of valuable data: • question test sets • correct answers • lists of documents that contain the identified instances of the correct answers • yet, no answer sentences are identified

  14. Usefulness of TREC data TREC publishes lots of valuable data: • question test sets • correct answers • lists of documents that contain the identified instances of the correct answers • yet, no answer sentences are identified But maybe we can get these ourselves?

  15. Excursus: Mechanical Turk • Amazon Web Service • “Artificial Artificial Intelligence” • Requesters upload Human Intelligence Tasks (HITs) • Users (turkers) complete HITs for small monetary rewards

  16. QASP Corpus Creation [Screenshot of the Mechanical Turk HIT: the TREC question, instructions, the AQUAINT document (shortened for the screenshot), an input field for the answer sentence, and an input field for the answer]

  17. QASP Corpus – Numbers Data collected: 8,830 QASPs Cost of the complete experiment: approx. USD 650

  18. QASP Corpus - Examples
  • 1396, XIE19961004.0048, "What is the name of the volcano that destroyed the ancient city of Pompeii?", "However, both sides made some gestures of appeasement before Chirac set off for the Italian resort city lying beside the Vesuve volcano which destroyed the Roman city of Pompeii.", "Vesuve", 1
  • 1396, NYT19980607.0105, "What is the name of the volcano that destroyed the ancient city of Pompeii?", "The ruins of Pompeii, the ancient city wiped out in A.D. 79 by the eruption at Vesuvius, are Italy's most popular tourist attraction, visited by two million people a year.", "Vesuvius", 1
  • 1396, NYT19981201.0229, "What is the name of the volcano that destroyed the ancient city of Pompeii?", "Visiting tourists enter the excavated ruins of the city - buried by the eruption of Mount Vesuvius - via a tunnel through the defensive walls that surround it, just as visiting traders did 2,000 years ago.", "Mount Vesuvius", C
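Each record above is a flat, comma-separated line with quoted fields. A minimal sketch of reading such records, assuming the field order shown (question id, source document id, question, answer sentence, answer string, judgement flag) and a hypothetical local file name; the actual download format of the corpus may differ:

```python
# Sketch: reading QASP records in the comma-separated, quoted form
# shown on the slide. Field names are inferred from the examples;
# "qasp_corpus.txt" is a hypothetical file name.
import csv
from dataclasses import dataclass

@dataclass
class QASP:
    question_id: str
    document_id: str
    question: str
    answer_sentence: str
    answer: str
    judgement: str   # e.g. "1" or "C" in the examples above

def read_qasps(path="qasp_corpus.txt"):
    """Yield one QASP per well-formed line of the corpus file."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) == 6:
                yield QASP(*row)
```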

  19. Part 3: Acquiring Syntactic and Semantic Reformulations from the QASP Corpus

  20. Employing the QASP Corpus We will use the QASP corpus to learn syntactic structures of answer sentences for classes of questions. Algorithm has three steps: • Rule Creation • Rule Evaluation • Rule Execution

  21. Employing the QASP Corpus Example QASP: Q: “Who is Tom Cruise married to?” AS: “Tom Cruise and Nicole Kidman are married.” A: “Nicole Kidman” (Side note: Data from 2000.) Question matches pattern “Who+is+NP+VERB+to+?” Non-stop-word constituents are: NP (“Tom Cruise”), V (“married”) These and the answer can be found in the answer sentence.
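To make the surface-pattern step concrete, here is a minimal sketch, assuming a regular expression with named groups as a stand-in for the pattern "Who+is+NP+VERB+to+?"; the author's system presumably identifies the NP and VERB constituents with a POS tagger or parser rather than a regex:

```python
import re

# Hypothetical stand-in for the surface pattern "Who+is+NP+VERB+to+?".
# A lazy NP group plus a single-word VERB group approximates the
# constituent slots for this illustration only.
WHO_IS_NP_VERB_TO = re.compile(r"^Who is (?P<NP>.+?) (?P<VERB>\w+) to\?$")

m = WHO_IS_NP_VERB_TO.match("Who is Tom Cruise married to?")
if m:
    print(m.group("NP"))    # Tom Cruise
    print(m.group("VERB"))  # married
```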

  22. Employing the QASP Corpus “Nicole Kidman and Tom Cruise are married.” Sentence is parsed with the Stanford Parser. Paths from each question constituent to the answer are extracted and stored in a rule: Pattern: Who[1]+is[2]+NP[3]+VERB[4]+to[5] Path 3: ↑conj Path 4: ↓nsubj

  23. Employing the QASP Corpus "Nicole Kidman and Tom Cruise are married." Pattern: Who[1]+is[2]+NP[3]+VERB[4]+to[5] Path 3: ↑conj Path 4: ↓nsubj
  Parser output (index: token (lemma, POS, head index) [relation]):
  1: Nicole (Nicole, NNP, 2) [nn]
  2: Kidman (Kidman, NNP, 7) [nsubj]
  3: and (and, CC, 2) [cc]
  4: Tom (Tom, NNP, 5) [nn]
  5: Cruise (Cruise, NNP, 2) [conj]
  6: are (be, VBP, 7) [cop]
  7: married (married, JJ, null) [ROOT]
  [Slide shows the corresponding dependency tree diagram]
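A minimal sketch of the path extraction itself, assuming the parse is represented as a map from token index to (head index, relation) hand-built to mirror the parser output above, and that the multi-word answer "Nicole Kidman" is represented by its head token "Kidman" (token 2); the helper names are illustrative, not the author's implementation:

```python
# Sketch of constituent-to-answer path extraction over the dependency
# parse of "Nicole Kidman and Tom Cruise are married." shown above.
# Head index 0 stands for ROOT; the relation is the edge to the head.

HEADS = {
    1: (2, "nn"),     # Nicole -> Kidman
    2: (7, "nsubj"),  # Kidman -> married
    3: (2, "cc"),     # and -> Kidman
    4: (5, "nn"),     # Tom -> Cruise
    5: (2, "conj"),   # Cruise -> Kidman
    6: (7, "cop"),    # are -> married
    7: (0, "ROOT"),   # married
}

def ancestors(node):
    """Chain [node, head, head-of-head, ...] up to the root token."""
    chain = [node]
    while HEADS[chain[-1]][0] != 0:
        chain.append(HEADS[chain[-1]][0])
    return chain

def path(src, dst):
    """Up/down dependency path from token src to token dst."""
    up, down = ancestors(src), ancestors(dst)
    common = next(n for n in up if n in down)   # lowest common ancestor
    steps = ["↑" + HEADS[n][1] for n in up[:up.index(common)]]
    steps += ["↓" + HEADS[n][1] for n in reversed(down[:down.index(common)])]
    return " ".join(steps)

print(path(5, 2))  # NP head "Cruise" -> answer head "Kidman": ↑conj
print(path(7, 2))  # VERB "married" -> answer head "Kidman": ↓nsubj
```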

  24. Employing the QASP Corpus "Tom Cruise married Nicole Kidman in 1990." Pattern: Who[1]+is[2]+NP[3]+VERB[4]+to[5] Path 3: ↑nsubj ↓dobj Path 4: ↓dobj "Tom Cruise is married to Nicole Kidman." Pattern: Who[1]+is[2]+NP[3]+VERB[4]+to[5] Path 3: ↑nsubjpass ↓prep ↓pobj Path 4: ↓prep ↓pobj

  25. Employing the QASP Corpus “Tom Cruise is married to Nicole Kidman.” Pattern: Who[1]+is[2]+NP[3]+VERB[4]+to[5] Path 3: ↑nsubjpass ↓prep ↓pobj Path 4: ↓prep ↓pobj • Process is repeated for all QASPs (a test set might be set aside) • All rules are stored in a file.

  26. Employing the QASP Corpus Rule evaluation: • For each question in the corpus • Search for candidate sentences in AQUAINT corpus using Lucene. • Test if paths are present and point to the same node. Check if answer is correct. Store results in file. Pattern: Who[1]+is[2]+NP[3]+VERB[4]+to[5] Path 3: ↑nsubj ↓dobj Path 4: ↓dobj correct: 5 incorrect: 3

  27. Employing the QASP Corpus Pattern: Who[1]+is[2]+NP[3]+VERB[4]+to[5] Path 3: ↑nsubj ↓dobj Path 4: ↓dobj correct: 5 incorrect: 3 Pattern precision: p = correct / (correct + incorrect) (see e.g. Ravichandran and Hovy, 2002)
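With the counts shown above, this rule's precision is p = 5 / (5 + 3) = 0.625.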

  28. Employing the QASP Corpus Rule execution: • Very similar to rule evaluation • For each question in the corpus • Search for candidate sentences in AQUAINT corpus using Lucene. • Test if paths are present and point to the same node. If so, extract answer.
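A sketch of the execution step under the same toy representation as above, assuming a hand-built parse of "Tom Cruise is married to Nicole Kidman." and a learned rule with one path per non-stop-word constituent; Lucene retrieval and real parsing are omitted, and all names here are illustrative rather than the author's code:

```python
# Sketch of rule execution on one (hand-built) candidate parse of
# "Tom Cruise is married to Nicole Kidman." The rule fires only if
# every path ends at the same node, which is taken as the answer head.

TOKENS = {1: "Tom", 2: "Cruise", 3: "is", 4: "married",
          5: "to", 6: "Nicole", 7: "Kidman"}
HEADS = {1: (2, "nn"), 2: (4, "nsubjpass"), 3: (4, "auxpass"),
         4: (0, "ROOT"), 5: (4, "prep"), 6: (7, "nn"), 7: (5, "pobj")}

def children(node, rel):
    """Dependents of `node` attached via relation `rel`."""
    return [m for m, (h, r) in HEADS.items() if h == node and r == rel]

def follow(start, path):
    """Follow a path like '↑nsubjpass ↓prep ↓pobj'; return the set of
    tokens reached (a ↓ step may branch to several dependents)."""
    nodes = {start}
    for step in path.split():
        direction, rel = step[0], step[1:]
        nxt = set()
        for n in nodes:
            if direction == "↑":
                head, head_rel = HEADS[n]
                if head != 0 and head_rel == rel:
                    nxt.add(head)
            else:
                nxt.update(children(n, rel))
        nodes = nxt
    return nodes

# Learned rule for "Who+is+NP+VERB+to+?": question constituent -> path.
# In this parse the NP head "Cruise" is token 2, the VERB is token 4.
rule = {2: "↑nsubjpass ↓prep ↓pobj", 4: "↓prep ↓pobj"}

targets = [follow(token, p) for token, p in rule.items()]
common = set.intersection(*targets)
if len(common) == 1:
    head = common.pop()
    # The extracted answer is the phrase headed by this token,
    # here "Nicole Kidman" (tokens 6-7).
    print("answer head:", TOKENS[head])  # answer head: Kidman
```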

  29. Finally...

  30. Employing the QASP Corpus Results evaluation set 3: * Results rise to 0.278 / 0.411 with semantic alignment (not covered in this talk).

  31. Employing the QASP Corpus Comparison with baseline: • The baseline gets syntactic answer structures from the questions, not from answer sentences; otherwise it is similar. • There is a long tradition in QA of doing this: • Katz and Lin, 2003; Punyakanok et al., 2004; Bouma et al., 2005; Cui et al., 2005, etc. • The baseline is very simple and uses none of the commonly used improvements (e.g. fuzzy matching), but neither does the method proposed here. • Baseline performance (evaluation set 3): • 0.068 (accuracy overall) compared to 0.278 (+308%) • 0.100 (accuracy if a rule exists) compared to 0.411 (+311%)

  32. Employing the QASP Corpus

  33. Part 4: STC Europe

  34. STC Europe • Program Manager at STC Europe in Munich • STC = Search Technology Center • We have sites in London, Munich, Paris and soon in Poland • In Munich we work on Relevance (Quality of our 10 blue links) for European markets • In my group, we work on query alterations

  35. Thank You! PS: QASP corpus downloadable on my homepage (http://homepages.inf.ed.ac.uk/s0570760/).

