PIQUANT
A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation
Practical Intelligent QUestion ANswering Technology
Prime Contractor: IBM T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532
Subcontractor: Cycorp
Dec 3-5 2001
IBM & Cycorp: Bringing Complementary Strengths to QA (both symbolic and statistical)
• IBM
• Information Retrieval
• Natural Language Processing
• Scalable System Architectures
• Business Applications Architectures
• Cycorp
• Structured Knowledge Representation
• Rich Common Sense Knowledge Bases
• Deep Inferencing
• Ontologies
Experience from TREC8-10
• End-to-end system that has performed well
• Invaluable experience in learning where the problems are:
• Coverage
• Engineering
• Understanding
IBM’s PIQUANT: Principal Extensions
• Integration of IR/NLP with Structured KBs and Deep Inference
• Knowledge System to assist in decomposing and answering questions
• Provide justification and/or invalidation of candidate answers
• Parallel Solution Paths and Pervasive Confidence Analysis
• Multiple parallel solution approaches to each problem/subproblem
• Pervasive use of confidences to mediate management of alternatives
• Extensive reinforcement of symbolic approaches by statistical data
• Well-Defined Component Architecture
• Modular
• Defined interfaces between NLP, IR, KS and Statistical Components
• Declarative representation of question answering plans
Where Knowledge Systems Help
The heuristic of finding short passages containing all the query terms/semantic classes is good but not sufficient. E.g. from TREC9:
Q: How much folic acid should an expectant mother take daily? A: 360 tons
Q: What is the diameter of the Earth? A: 14 ft.
Q: How many states have a lottery? A: 3,312
We will investigate the use of a sophisticated inference engine and knowledge base (Cyc) to eliminate such answers.
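As a rough illustration of this idea, the sketch below rejects candidate answers whose quantities fall outside a plausible range; the range table and all names are hypothetical stand-ins for what an inference engine such as Cyc would actually derive.

```python
# Minimal sketch of a knowledge-based sanity check on candidate answers.
# PLAUSIBLE_RANGES is an illustrative stand-in for constraints an inference
# engine such as Cyc would derive; all names and values are hypothetical.

PLAUSIBLE_RANGES = {
    # (quantity, unit): (min, max) -- illustrative values only
    ("daily_folic_acid_dose", "microgram"): (1, 10_000),
    ("earth_diameter", "kilometer"): (12_000, 13_000),
    ("us_states_with_lottery", "count"): (0, 50),
}

def is_sane(quantity: str, unit: str, value: float) -> bool:
    """Reject candidate answers falling outside the known plausible range."""
    lo, hi = PLAUSIBLE_RANGES.get((quantity, unit),
                                  (float("-inf"), float("inf")))
    return lo <= value <= hi

# "360 tons" of folic acid is about 3.3e11 micrograms -- rejected.
print(is_sane("daily_folic_acid_dose", "microgram", 3.3e11))  # False
# "3,312" states with a lottery exceeds the 50 US states -- rejected.
print(is_sane("us_states_with_lottery", "count", 3_312))      # False
```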
Question Complexity
• Complex questions can be decomposed into simpler components.
• If simpler questions cannot be handled successfully, there’s no hope for more complex ones.
BUT: “Simple” questions are not a solved problem. Areas not explored (intentionally) by TREC to date:
• spelling errors
• grammatical errors
• syntactic precision, e.g. significance of articles; not, only, just …
Complexity is a function of question and data source
“simple” -> “simple to state”
Is there such a thing as a “simple” question? Which is more complex?
A: How many members are there in the Cabinet? (Suppose there is no text that gives the answer explicitly.)
B: What is the meaning of life? 42 (from The Hitchhiker’s Guide to the Galaxy)
Different Solution Approaches: What is the largest city in England?
• Text Match
• Find text that says “London is the largest city in England” (or a paraphrase). Confidence is confidence of NL parser × confidence of source. Finding multiple instances drives the confidence toward 1.
• “Superlative” Search
• Find a table of English cities and their populations, and sort.
• Find a list of the 10 largest cities in the world, and see which are in England.
• Uses logic: if L is larger than all objects in set R, then L is larger than all objects in any subset E ⊂ R.
• Find the population of as many individual English cities as possible, and choose the largest.
• Heuristics
• London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)
• Complex Inference
• E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
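The “superlative” strategies reduce to simple operations once structured data is available. A minimal sketch, assuming a hypothetical table of (city, country, population) rows has already been extracted:

```python
# Minimal sketch of the "superlative" search strategy over a hypothetical
# extracted table; the figures below are illustrative, not authoritative.

CITY_TABLE = [
    ("London", "England", 7_200_000),
    ("Birmingham", "England", 1_000_000),
    ("Leeds", "England", 720_000),
    ("Paris", "France", 2_100_000),
]

def largest_city(country: str, table=CITY_TABLE):
    """Answer 'largest city in X' by filtering the table and taking the max
    by population; return None so another solution path can take over."""
    candidates = [(pop, city) for city, c, pop in table if c == country]
    return max(candidates)[1] if candidates else None

print(largest_city("England"))  # London
```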
Parallel Confidence Propagation
[Diagram] Question → Classifications (with confidences) → QFRAMES → QPLANS → Goals (logical forms with boolean connectives, sequencing and recombination information) → Candidate Answers → Validation and Sanity Checks (eliminate some answers and adjust confidences) → Selected Answers
Probability Management
• Confidences associated with every data element
• A priori probabilities associated with every processing module: given default values at first, then learned as experience is gained
• Candidate frameworks: Bayesian, Dempster-Shafer, …
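The sketch below shows one simple way such confidences could combine; it assumes independent evidence sources and uses a noisy-OR rule, whereas the real system may use full Bayesian or Dempster-Shafer combination. All prior values are invented.

```python
# Minimal sketch of confidence combination under a noisy-OR independence
# assumption; the a priori values are illustrative defaults, not real data.

PARSER_CONFIDENCE = 0.9                      # a priori module confidence
SOURCE_CONFIDENCE = {"newswire": 0.95,       # hypothetical source priors
                     "web_forum": 0.60}

def instance_confidence(source: str) -> float:
    """Confidence contributed by one matching text passage."""
    return PARSER_CONFIDENCE * SOURCE_CONFIDENCE[source]

def combine(confidences) -> float:
    """Noisy-OR: independent confirmations drive confidence toward 1."""
    disbelief = 1.0
    for c in confidences:
        disbelief *= 1.0 - c
    return 1.0 - disbelief

hits = ["newswire", "newswire", "web_forum"]
print(combine(instance_confidence(s) for s in hits))  # ≈ 0.99
```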
IBM PIQUANT High-Level Architecture [diagram]
IBM PIQUANT Block Diagram [diagram]
QA-Manager Internals
[Diagram] Linguistic Question Analysis (NLP components) → QFRAME → Question Classification (consulting WordNet and CYC) → QFRAMES → Plan Generation → QPLANS → QPLAN Execution Engine → QGOALs issued against Ontology & Data Services, Knowledge Representation & Reasoning Services, IR, DB and KB → Answer Candidates → Answer Resolution → Answers → Answer Presentation
Question Classification “Daemons”
Classifiers act as “daemons” that perform recognition and sub-plan generation (see the sketch after this list):
• Definition
• What is OPEC?
• Comparative & Superlative
• Does Kuwait export more oil than Venezuela?
• Which country exports the most uranium?
• Profile
• Who is Rabbani?
• Relationship
• Which countries are allies of Qatar?
• Chronology
• Was OPEC formed before Nixon became president?
• Enumeration
• How many oil refineries are in the U.S.?
• Cause & Effect
• Why did Iraq invade Kuwait?
• Combination
• Which countries are Qatar’s most powerful allies?
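A minimal sketch of the daemon idea: each classifier watches the question and, on recognizing its pattern, contributes a sub-plan. The regex triggers and sub-plan names are hypothetical simplifications of the real recognizers.

```python
# Minimal sketch of question-classification "daemons"; patterns and
# sub-plan names are hypothetical simplifications.

import re

DAEMONS = [
    ("Definition",     re.compile(r"^what is\b", re.I),  "lookup_definition"),
    ("Profile",        re.compile(r"^who is\b", re.I),   "build_profile"),
    ("Enumeration",    re.compile(r"^how many\b", re.I), "count_entities"),
    ("Cause & Effect", re.compile(r"^why\b", re.I),      "find_causes"),
]

def classify(question: str):
    """Return every (class, sub-plan) whose daemon fires; a question may
    legitimately match several classes at once."""
    return [(name, plan) for name, pattern, plan in DAEMONS
            if pattern.search(question)]

print(classify("What is OPEC?"))  # [('Definition', 'lookup_definition')]
print(classify("How many oil refineries are in the U.S.?"))
```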
Architectural Features
• Modularity
• Self-contained components with well-defined functions and interfaces
• Ease of development, experimentation and maintenance
• Robustness
• If a “Knowledge Source” fails, the system will continue to operate with (minor) degradation
• Exploit redundancy to find the best answer
• Reinforcement
• Multiple sources of evidence for the same answer are synergistic
• Transparency
• Explicit plans permit ready generation of explanations and symbolic analysis
IBM PIQUANT Implementation Highlights
Implementation Highlights
• Predictive Annotation
• Shift computational burden from NLP towards IR
• Index semantic labels along with text
• Beat the precision-recall tradeoff by boosting precision at little cost to recall
• Virtual Annotation
• Answer definitional (“What is”) questions by a combination of linguistic, ontological and statistical techniques
• Find the hypernyms in e.g. WordNet that have the best combination of closeness and co-occurrence
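A minimal sketch of the Virtual Annotation idea, assuming WordNet is available through NLTK; the scoring rule (co-occurrence discounted by hypernym distance) and the toy co-occurrence counts are illustrative stand-ins, not the published formula.

```python
# Minimal sketch of Virtual Annotation: score each WordNet hypernym of the
# question term by corpus co-occurrence discounted by distance up the
# hierarchy. Requires `nltk` with the WordNet data downloaded.

from nltk.corpus import wordnet as wn

def cooccurrence_count(term: str, candidate: str) -> int:
    """Hypothetical corpus statistic (stubbed): how often the term and the
    candidate hypernym co-occur in the reference corpus."""
    return {("penguin", "bird"): 120, ("penguin", "animal"): 40}.get(
        (term, candidate), 0)

def best_hypernym(term: str):
    """Pick the hypernym with the best closeness/co-occurrence balance."""
    scored = []
    for synset in wn.synsets(term, pos=wn.NOUN):
        path = synset.hypernym_paths()[0]       # root ... synset
        ancestors = list(reversed(path))[1:]    # nearest hypernym first
        for distance, ancestor in enumerate(ancestors, start=1):
            name = ancestor.lemma_names()[0].replace("_", " ")
            scored.append((cooccurrence_count(term, name) / distance, name))
    return max(scored)[1] if scored else None

print(best_hypernym("penguin"))  # 'bird' under these toy counts
```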
Predictive Annotation (1)
• Annotate entire corpus and index semantic labels along with text
• Identify sought-after label(s) in questions and include them in queries
• Example: Question is “Where is Belize?”
• “Where” can map to CONTINENT$, COUNTRY$, STATE$, CITY$, CAPITAL$, PLACE$.
• Knowing Belize is a country: “Where is Belize?” → {CONTINENT$ Belize} (assume CONTINENT$ = continents plus sub-continental regions)
• Suppose the text is “… including Belize in central America …”, annotated as “… including [COUNTRY$ Belize] in [CONTINENT$ central America] …” (both also labeled PLACE$)
Predictive Annotation (2)
Increased precision of the enhanced bag-of-words:
• “Where is Belize?” → {CONTINENT$ Belize}
• Belize occurs 704 times in the TREC corpus
• Belize and CONTINENT$ co-occur in only 22 sentences
• Note: the same data structure is equally appropriate for “Name a country in Central America”, which maps to {COUNTRY$ Central America}
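A minimal sketch of how such an index might work: semantic labels are indexed alongside tokens at annotation time, and the question’s sought-after label simply joins the bag of words. The annotated sentence and the single-sentence “index” are toy stand-ins.

```python
# Minimal sketch of predictive annotation: an inverted index over both
# words and semantic labels; the annotations below are toy examples.

from collections import defaultdict

# One sentence as annotated at indexing time.
ANNOTATED = [
    ("including", []),
    ("Belize", ["COUNTRY$", "PLACE$"]),
    ("in", []),
    ("central America", ["CONTINENT$", "PLACE$"]),
]

index = defaultdict(set)
for position, (token, labels) in enumerate(ANNOTATED):
    index[token.lower()].add(position)
    for label in labels:
        index[label].add(position)

def matches(query_terms) -> bool:
    """The passage matches if every query term (word or label) occurs."""
    return all(index[term] for term in query_terms)

# "Where is Belize?" becomes the enhanced bag-of-words {CONTINENT$, belize}.
print(matches(["CONTINENT$", "belize"]))         # True
print(matches(["COUNTRY$", "central america"]))  # True
```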
Summary
• Leverage existing technology base
• Parallel approaches to finding the answer, exploiting redundancy
• Declarative plan representation
• Associate confidences with each component and each intermediate and final result
• Cyc’s knowledge base and inference engine to solve sub-problems and eliminate nonsensical answer candidates
High-Level 1st Year Development Plan
• Finalize design of data structures (sketched below):
• QFRAME: question and derived attributes
• QPLAN: script for tackling a solution
• QGOAL: logical-form-like structure representing a predicate for instantiation or verification
• Build several recognizers and the QPLAN executor (many pieces already exist)
• Run on many examples to fine-tune and to develop a priori component confidence values
• Build answer resolution module
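One possible shape for those three structures, as plain Python dataclasses; every field name here is an illustrative guess at the design, not the actual specification.

```python
# Minimal sketch of the three data structures; all fields are hypothetical.

from dataclasses import dataclass, field

@dataclass
class QFrame:
    """The question plus attributes derived by linguistic analysis."""
    text: str
    question_classes: list = field(default_factory=list)  # e.g. ["Definition"]
    focus: str = ""                # the entity or property asked about
    confidence: float = 1.0

@dataclass
class QGoal:
    """Logical-form-like predicate to be instantiated or verified."""
    predicate: str                 # e.g. "largest_city"
    arguments: tuple = ()          # e.g. ("England", "?x")
    confidence: float = 1.0

@dataclass
class QPlan:
    """Declarative script for tackling a solution: ordered goals plus the
    knowledge sources each goal should be sent to."""
    goals: list = field(default_factory=list)              # list[QGoal]
    knowledge_sources: list = field(default_factory=list)  # e.g. ["IR", "CYC"]
```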
IBM PIQUANT Backup Slides
Statistical Features
• Co-occurrences to support definition answers
• Machine learning to evaluate search engine results
• Machine learning to assist in answer selection
• Learn probable confidence of question recognizers
QPLAN
• Multiple per question type
• Declarative representation of a solution
• Independent of any knowledge source’s details
• Executed by planning engine
• Sequence of solution steps:
• structured knowledge queries
• text search queries
• statistical queries, etc.
• Confidences learned over time
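Because the plan is declarative, it can be represented as plain data; the sketch below shows one hypothetical encoding (step kinds, field names and confidence values are all invented for illustration).

```python
# Minimal sketch of a declarative QPLAN as plain data that a planning
# engine could execute, revise and explain; all values are hypothetical.

LARGEST_CITY_PLAN = {
    "question_type": "Superlative",
    "steps": [
        {"kind": "text_search",                    # text search query
         "query": ["CITY$", "largest", "England"],
         "a_priori_confidence": 0.7},              # learned over time
        {"kind": "structured_knowledge_query",     # e.g. sent to Cyc
         "goal": ("largest_city", "England", "?x"),
         "a_priori_confidence": 0.9},
        {"kind": "statistical_query",              # e.g. a population table
         "query": "population of English cities",
         "a_priori_confidence": 0.6},
    ],
    "combine": "noisy_or",  # how step results are merged
}
```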
High-level View of Solution Steps
1. Question is processed by linguistic tools.
2. Question is classified into one or more types.
3. Parallel solution plan is generated and executed.
4. Responses are gathered and examined.
5. If necessary, the plan is revised and steps 3-5 are revisited.
6. Candidate answers are checked for sanity, merged, sorted and presented.
Note:
• Dialog manager functions are not considered here.
• All data structures are assigned confidences and all selections of next steps are mediated by probabilistic computations.
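A minimal sketch of that control loop, treating analysis, planning, execution and checking as injected black-box functions; the signature and the revision policy are assumptions for illustration.

```python
# Minimal sketch of the solution loop above; every callable is a stand-in.

def answer(question, analyze, classify, make_plan, execute, is_sane, resolve,
           max_revisions=3):
    """classify -> plan -> execute -> check, revising the plan (steps 3-5)
    until sane candidate answers emerge or the revision budget runs out."""
    qframe = analyze(question)                    # step 1: linguistic tools
    qframe.question_classes = classify(qframe)    # step 2: classification
    plan = make_plan(qframe)                      # step 3: plan generation
    for _ in range(max_revisions):
        candidates = [c for c in execute(plan)    # steps 3-4: execute, gather
                      if is_sane(c)]              # step 6: sanity checks
        if candidates:
            return resolve(candidates)            # merge, sort, present
        plan = make_plan(qframe)                  # step 5: revise and retry
    return None
```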