PIQUANT

PIQUANT. A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation. Practical Intelligent QUestion ANswering Technology. Prime Contractor: IBM T.J. Watson Research Center 30 Saw Mill River Road Hawthorne, NY 10532.

Presentation Transcript


  1. PIQUANT
     A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation
     Practical Intelligent QUestion ANswering Technology
     Prime Contractor: IBM T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532
     Subcontractor: Cycorp
     Dec 3-5 2001

  2. IBM & Cycorp: Bringing Complementary Strengths to QA
     • IBM (slide annotation: both symbolic and statistical)
       • Information Retrieval
       • Natural Language Processing
       • Scalable System Architectures
       • Business Applications Architectures
     • Cycorp
       • Structured Knowledge Representation
       • Rich Common Sense Knowledge Bases
       • Deep Inferencing
       • Ontologies

  3. Experience from TREC8-10
     • End-to-end system that has performed well
     • Invaluable experience in learning where the problems are:
       • Coverage
       • Engineering
       • Understanding

  4. IBM’s PIQUANT: Principal Extensions
     • Integration of IR/NLP with Structured KBs and Deep Inference
       • Knowledge System to assist in decomposing and answering questions
       • Provide justification and/or invalidation of candidate answers
     • Parallel Solution Paths and Pervasive Confidence Analysis
       • Multiple parallel solution approaches to problem/subproblem
       • Pervasive use of confidences to mediate management of alternatives
       • Extensive reinforcement of symbolic approaches by statistical data
     • Well-Defined Component Architecture
       • Modular
       • Defined interfaces between NLP, IR, KS and Statistical Components
       • Declarative representation of question answering plans

  5. Where Knowledge-Systems Help
     The heuristic of finding short passages with all the query terms/semantic classes is good but not sufficient. E.g. from TREC9:
     • Q: How much folic acid should an expectant mother take daily? A: 360 tons
     • Q: What is the diameter of the Earth? A: 14 ft.
     • Q: How many states have a lottery? A: 3,312
     We will investigate the use of a sophisticated inference engine and knowledge-base (Cyc) to eliminate such answers.
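The kind of elimination described on this slide can be illustrated with a minimal sketch. It is not how Cyc works: the bounds table, the `Candidate` type and the unit conversions below are hypothetical stand-ins, hand-written for illustration, for constraints that PIQUANT proposes to obtain from a knowledge base and inference engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    value: float  # magnitude converted to the unit named in the check
    text: str     # answer string as extracted from the corpus


# Hypothetical plausibility bounds, hand-written for illustration only.
PLAUSIBLE_RANGE = {
    "daily_vitamin_dose_grams": (1e-6, 1.0),
    "planet_diameter_metres": (1e6, 1e9),
    "count_of_us_states": (0, 50),
}


def is_plausible(kind: str, cand: Candidate) -> bool:
    """Return True when the candidate's magnitude lies inside the bounds."""
    low, high = PLAUSIBLE_RANGE[kind]
    return low <= cand.value <= high


# The three TREC9 answers from the slide, converted to base units.
# "tons" is assumed here to mean metric tons (1e6 grams each).
trec9_answers = [
    ("daily_vitamin_dose_grams", Candidate(360 * 1e6, "360 tons")),
    ("planet_diameter_metres", Candidate(14 * 0.3048, "14 ft.")),
    ("count_of_us_states", Candidate(3312, "3,312")),
]

rejected = [c.text for kind, c in trec9_answers if not is_plausible(kind, c)]
```

All three answers fall outside their bounds, so all three are rejected, while a count such as 37 states would pass the same check.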

  6. Question Complexity
     “Simple” questions are not a solved problem:
     • Complex questions can be decomposed into simpler components.
     • If simpler questions cannot be handled successfully, there’s no hope for more complex ones.
     • Areas not explored (intentionally) by TREC to date:
       • spelling errors
       • grammatical errors
       • syntactic precision, e.g. significance of articles
       • not, only, just …
     BUT:

  7. Complexity is a function of question and data source
     “simple” -> “simple to state”
     Is there such a thing as a “simple” question? Which is more complex?
     • A: How many members are there in the Cabinet? (Suppose there is no text that gives the answer explicitly.)
     • B: What is the meaning of life? 42 (from HGTTG)

  8. Different Solution Approaches
     What is the largest city in England?
     • Text Match
       • Find text that says “London is the largest city in England” (or paraphrase). Confidence is confidence of NL parser * confidence of source. Find multiple instances and confidence of source -> 1.
     • “Superlative” Search
       • Find a table of English cities and their populations, and sort.
       • Find a list of the 10 largest cities in the world, and see which are in England.
         • Uses logic: if L > all objects in set R then L > all objects in set E ⊂ R.
       • Find the population of as many individual English cities as possible, and choose the largest.
     • Heuristics
       • London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)
     • Complex Inference
       • E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
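The Text Match confidence rule can be made concrete with a small sketch. The per-instance product comes from the slide; the way several instances are combined is an assumption. The slide only says the confidence tends to 1, and a noisy-OR over independent instances is one simple rule with that behaviour. Function names are illustrative.

```python
from math import prod


def instance_confidence(parser_conf: float, source_conf: float) -> float:
    """Confidence of one matching passage: parser confidence * source confidence."""
    return parser_conf * source_conf


def combined_confidence(instances: list[float]) -> float:
    """Noisy-OR (assumed rule): probability that at least one instance is right,
    treating the instances as independent."""
    return 1.0 - prod(1.0 - c for c in instances)


one = instance_confidence(0.8, 0.5)      # a single passage: 0.4
few = combined_confidence([one] * 3)     # three such passages
many = combined_confidence([one] * 20)   # twenty such passages, close to 1
```

Under this rule each extra supporting passage raises the combined value, which is one way to read the reinforcement idea that recurs throughout the deck.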

  9. Parallel Confidence Propagation
     [Diagram. Labels: Question; Classifications; Confidences; QFRAMES; QPLANS; Goals (logical forms) with boolean connectives, sequencing and recombination information; Candidate Answers; Validation and Sanity Checks (eliminate some answers and adjust confidences); Selected Answers]

  10. Probability Management
     • Associated with every data element
     • A priori probabilities associated with every processing module. Given default values at first, then learned as experience is gained
     • Bayesian, Dempster-Shafer, …
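One simple Bayesian reading of "default values at first, then learned as experience is gained" is a Beta-Bernoulli estimate per module. This is a sketch under that assumption, not the method the slides commit to; the class name and the Beta(1, 1) default are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class ModuleConfidence:
    """A priori reliability of one processing module, kept as Beta pseudo-counts."""
    successes: float = 1.0  # default prior Beta(1, 1): estimate starts at 0.5
    failures: float = 1.0

    @property
    def estimate(self) -> float:
        """Posterior mean probability that the module's output is correct."""
        return self.successes / (self.successes + self.failures)

    def observe(self, correct: bool) -> None:
        """Update the counts after the module's output has been judged."""
        if correct:
            self.successes += 1
        else:
            self.failures += 1


recognizer = ModuleConfidence()
for outcome in [True, True, True, False]:
    recognizer.observe(outcome)
# After 3 correct and 1 incorrect outputs the estimate is 4 / 6.
```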

  11. IBM PIQUANT High-Level Architecture

  12. IBM PIQUANT Block Diagram

  13. QA-Manager Internals
     [Diagram. Labels: NLP Components; Linguistic Question Analysis; QFRAME; Ontology & Data Services; Knowledge Representation; Reasoning Services; Question Classification; WN; CYC; QFRAMES; QPLANS; IR; QGOAL; Plan Generation; QPLAN Execution Engine; DB; KB; Answer Candidates; Answer Resolution; Answer Presentation; Answers]

  14. Question Classification “Daemons”
     Classifiers act as “daemons”; perform recognition and sub-plan generation
     • Definition
       • What is OPEC?
     • Comparative & Superlative
       • Does Kuwait export more oil than Venezuela?
       • Which country exports the most uranium?
     • Profile
       • Who is Rabbani?
     • Relationship
       • Which countries are allies of Qatar?
     • Chronology
       • Was OPEC formed before Nixon became president?
     • Enumeration
       • How many oil refineries are in the U.S.?
     • Cause & Effect
       • Why did Iraq invade Kuwait?
     • Combination
       • Which countries are Qatar’s most powerful allies?
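A toy version of the recognition half of these daemons is sketched below. The regular expressions are hypothetical and tuned only to the example questions on the slide; real classifiers would work on linguistic analysis rather than surface patterns, and would also generate sub-plans, which this sketch omits. Every daemon inspects the question independently, so a "Combination" question is simply one that triggers more than one of them.

```python
import re

# Hypothetical surface patterns, one per question class on the slide.
DAEMONS = {
    "definition": re.compile(r"^what is \w+\?$", re.IGNORECASE),
    "comparative_superlative": re.compile(
        r"\bmore\b.*\bthan\b|\b(most|least)\b", re.IGNORECASE
    ),
    "profile": re.compile(r"^who is\b", re.IGNORECASE),
    "relationship": re.compile(r"\ball(y|ies)\b", re.IGNORECASE),
    "chronology": re.compile(r"\b(before|after)\b", re.IGNORECASE),
    "enumeration": re.compile(r"^how many\b", re.IGNORECASE),
    "cause_effect": re.compile(r"^why\b", re.IGNORECASE),
}


def classify(question: str) -> list[str]:
    """Return every class whose daemon recognises the question."""
    return [name for name, pattern in DAEMONS.items() if pattern.search(question)]


single = classify("What is OPEC?")
combo = classify("Which countries are Qatar's most powerful allies?")
```

Here `single` contains one class, while `combo` contains both the superlative and the relationship class.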

  15. Architectural Features
     • Modularity
       • Self-contained components with well-defined functions and interfaces
       • Ease of development, experimentation and maintenance
     • Robustness
       • If a “Knowledge Source” fails the system will continue to operate with (minor) degradation
       • Exploit redundancy to find best answer
     • Reinforcement
       • Multiple sources of evidence for same answer are synergistic
     • Transparency
       • Explicit plans permit ready generation of explanations and symbolic analysis

  16. IBM PIQUANT Implementation Highlights

  17. Implementation Highlights
     • Predictive Annotation
       • Shift computational burden from NLP towards IR
       • Index semantic labels along with text
       • Beat the Precision-Recall tradeoff by boosting precision at little cost to recall
     • Virtual Annotation
       • Answer definitional (“What is”) questions by combination of linguistic, ontological and statistical techniques
       • Find the hypernyms in e.g. WordNet that have the best combination of closeness and co-occurrence
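The Virtual Annotation bullet does not say how closeness and co-occurrence are combined, so the following is only a sketch of one possible trade-off. The hypernym chain, the counts and the scoring formula are all invented for illustration; nothing here is taken from WordNet or from the TREC corpus.

```python
# Invented toy data: a hypernym chain for some term, nearest ancestor first,
# with invented counts of how often each hypernym co-occurs with the term.
HYPERNYM_CHAIN = ["viverrine", "carnivore", "mammal", "animal", "organism"]
COOCCURRENCE = {
    "viverrine": 0,
    "carnivore": 3,
    "mammal": 40,
    "animal": 45,
    "organism": 2,
}


def score(level: int, count: int) -> float:
    """Assumed trade-off: reward co-occurrence, penalise distance from the term."""
    return count / level


def best_hypernym(chain: list[str], counts: dict[str, int]) -> str:
    """Pick the hypernym with the highest combined score."""
    scored = [
        (score(level, counts.get(h, 0)), h)
        for level, h in enumerate(chain, start=1)
    ]
    return max(scored)[1]


choice = best_hypernym(HYPERNYM_CHAIN, COOCCURRENCE)
```

With these invented numbers the closest hypernym is too rare to be useful and the most frequent one is too distant, so an intermediate term scores best.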

  18. Predictive Annotation (1)
     [Figure: the text “including Belize in central America” shown with the semantic labels COUNTRY$, CONTINENT$, PLACE$, PLACE$]
     Predictive Annotation
     • Annotate entire corpus and index semantic labels along with text
     • Identify sought-after label(s) in questions and include in queries
     • Example: Question is “Where is Belize?”
       • “Where” can map to CONTINENT$, COUNTRY$, STATE$, CITY$, CAPITAL$, PLACE$.
       • Knowing Belize is a country: “Where is Belize?” → {CONTINENT$ Belize} (assume CONTINENT$ = Continents plus sub-continental regions)
       • Suppose text is “… including Belize in central America … ”

  19. Predictive Annotation (2)
     [Figure: the text “including Belize in central America” shown with the semantic labels COUNTRY$, CONTINENT$, PLACE$, PLACE$]
     Increased precision of enhanced bag-of-words:
     • “Where is Belize” → {CONTINENT$ Belize}
     • Belize occurs 704 times in TREC corpus
     • Belize and CONTINENT$ co-occur in only 22 sentences
     • Note: data structure equally appropriate for “Name a country in Central America”, which → {COUNTRY$ Central America}
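The indexing idea on the two Predictive Annotation slides can be sketched as an inverted index in which semantic labels are stored as ordinary terms next to the words. The gazetteer, the three sentences and the function names below are hypothetical; the gazetteer stands in for the real annotator, and the sketch indexes sentences rather than passages.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer standing in for the corpus annotator.
GAZETTEER = {
    "belize": {"COUNTRY$", "PLACE$"},
    "central america": {"CONTINENT$", "PLACE$"},
}


def index_terms(sentence: str) -> set[str]:
    """Lower-cased words of the sentence plus the labels of any entity found."""
    text = sentence.lower()
    terms = set(re.findall(r"[a-z]+", text))
    for phrase, labels in GAZETTEER.items():
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            terms |= labels
    return terms


def build_index(sentences: list[str]) -> dict[str, set[int]]:
    """Map each word or label to the ids of the sentences containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for term in index_terms(sentence):
            index[term].add(sid)
    return index


def search(index: dict[str, set[int]], query_terms: list[str]) -> set[int]:
    """Sentences containing every query term (bag-of-words conjunction)."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()


corpus = [
    "Several nations, including Belize in central America, signed the treaty.",
    "Belize exports sugar and bananas.",
    "Central America has seven countries.",
]
idx = build_index(corpus)
plain = search(idx, ["belize"])                   # word only
labelled = search(idx, ["CONTINENT$", "belize"])  # {CONTINENT$ Belize}
reverse = search(idx, ["COUNTRY$", "central", "america"])
```

Adding the label to the query narrows the hits from every sentence mentioning Belize to those that also contain a continent-like entity, and the same index serves the reverse question. Because the query is a bag of terms, it does not check that the label belongs to a different token than the query word.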

  20. Dec 3-5 2001

  21. Summary
     • Leverage existing technology base
     • Parallel approach to find answer, exploiting redundancy
     • Declarative plan representation
     • Associate confidences with each component and each intermediate and final result
     • CYC’s knowledge-base and inference engine to solve sub-problems and eliminate nonsensical answer candidates

  22. High-Level 1st Year Development Plan
     • Finalize design of data-structures:
       • QFRAME: question and derived attributes
       • QPLAN: script for tackling solution
       • QGOAL: logical-form like structure representing predicate for instantiation or verification
     • Build several recognizers and QPLAN executor (many pieces already exist)
     • Run on many examples to fine-tune and to develop a priori component confidence values
     • Build answer resolution module
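Since the slide says the data-structure design was still to be finalized, the following is only a guess at what the three structures might hold, based on their one-line descriptions. Every field name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QGoal:
    """Logical-form-like predicate to instantiate (None = unknown) or verify."""
    predicate: str
    args: tuple
    confidence: float = 1.0


@dataclass
class QPlan:
    """Script for tackling a solution: an ordered list of goals."""
    name: str
    steps: list[QGoal] = field(default_factory=list)
    confidence: float = 0.5  # a priori value, to be learned from experience


@dataclass
class QFrame:
    """The question plus attributes derived from it."""
    question: str
    classes: dict[str, float] = field(default_factory=dict)  # class -> confidence
    plans: list[QPlan] = field(default_factory=list)


frame = QFrame(
    question="What is the largest city in England?",
    classes={"superlative": 0.9},
    plans=[
        QPlan(
            name="capital-heuristic",
            steps=[QGoal("capitalOf", ("England", None))],
            confidence=0.4,
        )
    ],
)
```

Keeping a confidence on each structure follows the deck's stated aim of attaching confidences to every component and intermediate result.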

  23. IBM PIQUANT Back up Slides

  24. Statistical Features
     • Co-occurrences to support definition answers
     • Machine Learning to evaluate search engine results
     • Machine Learning to assist in answer selection
     • Learn probable confidence of question recognizers

  25. QPLAN
     • Multiple per question type
     • Declarative representation of a solution
     • Independent of knowledge source’s details
     • Executed by planning engine
     • Sequence of solution steps
       • structured knowledge queries
       • text search queries
       • statistical queries etc.
     • Confidences learned over time
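A minimal sketch of what "declarative" and "independent of knowledge source's details" could mean in practice: the plan is plain data, and a small executor dispatches each step to whichever handler is registered for its source. The handlers, the plan contents and all confidence values are hypothetical stubs.

```python
# Hypothetical handlers, one per knowledge source. Real ones would call a
# search engine or a knowledge base; these return canned (answer, confidence).
def text_search(query: str) -> list[tuple[str, float]]:
    return [("London", 0.6)] if "largest city" in query else []


def kb_lookup(query: str) -> list[tuple[str, float]]:
    return [("London", 0.7)] if query == "capitalOf(England)" else []


HANDLERS = {"text": text_search, "kb": kb_lookup}

# The plan itself is plain data: a sequence of steps naming a source and a query.
LARGEST_CITY_PLAN = [
    {"source": "text", "query": "largest city in England", "step_conf": 0.9},
    {"source": "kb", "query": "capitalOf(England)", "step_conf": 0.5},
]


def execute(plan, handlers):
    """Run each step and scale the returned confidences by the step's own."""
    candidates = []
    for step in plan:
        handler = handlers.get(step["source"])
        if handler is None:  # a missing source is skipped, the run continues
            continue
        for answer, conf in handler(step["query"]):
            candidates.append((answer, conf * step["step_conf"], step["source"]))
    return candidates


full = execute(LARGEST_CITY_PLAN, HANDLERS)
degraded = execute(LARGEST_CITY_PLAN, {"kb": kb_lookup})  # text source unavailable
```

Because the plan never names a concrete function, a knowledge source can be removed or swapped without touching the plan, and the run degrades rather than fails when a source is absent.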

  26. High-level View of Solution Steps
     1. Question is processed by linguistic tools.
     2. Question is classified into 1 or more types.
     3. Parallel solution plan is generated and executed.
     4. Responses are gathered and examined.
     5. If necessary, plan is revised and steps 3-5 revisited.
     6. Candidate answers are checked for sanity, merged, sorted and presented.
     Note:
     • Dialog manager functions are not considered here.
     • All data-structures are assigned confidences and all selections of next steps are mediated by probabilistic computations.