Open-Domain Question Answering Eric Nyberg, Associate Professor ehn@cs.cmu.edu 15-381 Lecture, Spring 2003
Outline • What is question answering? • Typical QA pipeline • Unsolved problems • The JAVELIN QA architecture • Related research areas These slides and links to other background material can be found here: http://www.cs.cmu.edu/~ehn/15-381 15-381 Lecture, Spring 2003
Question Answering • Inputs: a question in English; a set of text and database resources (text corpora and RDBMS) • Output: a set of possible answers drawn from the resources • Example: "When is the next train to Glasgow?" → QA SYSTEM → "8:35, Track 9." 15-381 Lecture, Spring 2003
Ancestors of Modern QA • Information Retrieval • Retrieve relevant documents from a set of keywords; search engines • Information Extraction • Template filling from text (e.g. event detection); e.g. TIPSTER, MUC • Relational QA • Translate question to relational DB query; e.g. LUNAR, FRED 15-381 Lecture, Spring 2003
The TREC Question Answering evaluation: http://trec.nist.gov 15-381 Lecture, Spring 2003
Typical TREC QA Pipeline: Question ("a simple factoid question") → Extract Keywords → Query → Search Engine (over the Corpus) → Docs → Passage Extractor → Answer Selector → Answer ("a 50-byte passage likely to contain the desired answer", per the TREC QA track) 15-381 Lecture, Spring 2003
Sample Results Mean Reciprocal Rank (MRR): Find the ordinal position (rank) of the first correct answer in your output (1st answer, 2nd answer, etc.), take its reciprocal (divide one by the rank), and average over the entire test suite. 15-381 Lecture, Spring 2003
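To make the metric concrete, here is a minimal MRR sketch in Python; the rank list and the convention that an unanswered question scores zero are illustrative assumptions.

```python
# Minimal sketch of Mean Reciprocal Rank (MRR). Each entry is the 1-based
# rank of the first correct answer for one question, or None if no ranked
# answer was correct (scored as 0.0 here, by assumption).
def mean_reciprocal_rank(first_correct_ranks):
    scores = [1.0 / r if r is not None else 0.0 for r in first_correct_ranks]
    return sum(scores) / len(scores)

# Example: correct answer at rank 1, at rank 3, and not found at all.
print(mean_reciprocal_rank([1, 3, None]))  # (1.0 + 0.333 + 0.0) / 3 ≈ 0.444
```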
Functional Evolution • Traditional QA Systems (TREC) • Question treated like keyword query • Single answers, no understanding • Q: Who is prime minister of India? <find a person name close to prime, minister, India (within 50 bytes)> • A: John Smith is not prime minister of India 15-381 Lecture, Spring 2003
Functional Evolution [2] • Future QA Systems • System understands questions • System understands answers and interprets which are most useful • System produces sophisticated answers (list, summarize, evaluate) • What other airports are near Niletown? • Where can helicopters land close to the embassy? 15-381 Lecture, Spring 2003
Major Research Challenges • Acquiring high-quality, high-coverage lexical resources • Improving document retrieval • Improving document understanding • Expanding to multi-lingual corpora • Flexible control structure • “beyond the pipeline” • Answer Justification • Why should the user trust the answer? • Is there a better answer out there? 15-381 Lecture, Spring 2003
Why NLP is Required • Question: “When was Wendy’s founded?” • Passage candidate: • “The renowned Murano glassmaking industry, on an island in the Venetian lagoon, has gone through several reincarnations since it was founded in 1291. Three exhibitions of 20th-century Murano glass are coming up in New York. By Wendy Moonan.” • Answer: 20th Century 15-381 Lecture, Spring 2003
Predicate-argument structure • Q336: When was Microsoft established? • Difficult because Microsoft tends to establish lots of things… Microsoft plans to establish manufacturing partnerships in Brazil and Mexico in May. • Need to be able to detect sentences in which `Microsoft’ is object of `establish’ or close synonym. • Matching sentence: Microsoft Corp was founded in the US in 1975, incorporated in 1981, and established in the UK in 1982. 15-381 Lecture, Spring 2003
Why Planning is Required • Question: What is the occupation of Bill Clinton’s wife? • No documents contain these keywords plus the answer • Strategy: decompose into two questions: • Who is Bill Clinton’s wife? = X • What is the occupation of X? 15-381 Lecture, Spring 2003
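A minimal sketch of the decomposition strategy above, assuming a hypothetical qa() function that answers a single factoid question; it is not the JAVELIN Planner, just an illustration of chaining two sub-questions.

```python
# Minimal sketch of question decomposition. qa() is a hypothetical
# single-question answering function, not part of JAVELIN itself.
def answer_by_decomposition(qa):
    # Step 1: resolve the inner entity X.
    x = qa("Who is Bill Clinton's wife?")            # e.g. "Hillary Clinton"
    # Step 2: substitute X into the outer question.
    return qa(f"What is the occupation of {x}?")
```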
JAVELIN: Justification-based Answer Valuation through Language Interpretation, Carnegie Mellon Univ. (Language Technologies Institute) • Architecture (diagram): GUI; Planner, with operator (action) models and a Domain Model; Execution Manager; Data Repository (process history and results); Question Analyzer; Request Filler; Answer Generator; Retrieval Strategist over search engines & document collections • OBJECTIVES • QA as planning, by developing a glass-box planning infrastructure • Universal auditability, by developing a detailed set of labeled dependencies that form a traceable network of reasoning steps • Utility-based information fusion • PLAN • Address the full Q/A task: • Question analysis - question typing, interpretation, refinement, clarification • Information seeking - document retrieval, entity and relation extraction • Multi-source information fusion - multi-faceted answers, redundancy and contradiction detection 15-381 Lecture, Spring 2003
JAVELIN Objectives • QA as Planning • Create a general QA planning system • How should a QA system represent its chain of reasoning? • QA and Auditability • How can we improve a QA system’s ability to justify its steps? • How can we make QA systems open to machine learning? 15-381 Lecture, Spring 2003
JAVELIN Objectives [2] • Utility-Based Information Fusion • Perceived utility is a function of many different factors • Create and tune utility metrics, e.g.: U = Argmax_k [ F( Rel(I,Q,T), Nov(I,T,A), Ver(S, Sup(I,S)), Div(S), Cmp(I,A), Cst(I,A) ) ] - relevance - novelty - veracity, support - diversity - comprehensibility - cost I: Info item, Q: Question, S: Source, T: Task context, A: Analyst 15-381 Lecture, Spring 2003
Control Flow (diagram of strategic decision points) 15-381 Lecture, Spring 2003
Repository ERD (Entity Relationship Diagram) 15-381 Lecture, Spring 2003
JAVELIN User Interface 15-381 Lecture, Spring 2003
JAVELIN Architecture (diagram): GUI; Planner, with operator (action) models and a Domain Model; Execution Manager; Data Repository (process history and results); Question Analyzer; Information Extractor; Answer Generator; Retrieval Strategist over search engines & document collections • Integrated with XML • Modules can run on different servers 15-381 Lecture, Spring 2003
Module Integration • Via XML DTDs for each object type • Modules use simple XML object-passing protocol built on TCP/IP • Execution Manager takes care of checking objects in/out of Repository 15-381 Lecture, Spring 2003
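A minimal sketch of the kind of XML object passing over TCP/IP described above; the element names, module host, and port are illustrative assumptions, not the actual JAVELIN DTDs or protocol.

```python
# Minimal sketch of XML object passing over TCP/IP. One XML object is sent
# per connection and one is read back. Element names, host, and port are
# illustrative assumptions, not the real JAVELIN DTDs.
import socket
import xml.etree.ElementTree as ET

def send_object(host, port, xml_element):
    payload = ET.tostring(xml_element)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)        # signal end of request
        reply = b""
        while chunk := sock.recv(4096):
            reply += chunk
    return ET.fromstring(reply)

# Example: pass a question to a hypothetical Question Analyzer module.
request = ET.Element("RequestObject")
ET.SubElement(request, "question").text = "When was Wendy's founded?"
# response = send_object("qa-module.example.org", 9000, request)
```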
Sample Log File Excerpt Components communicate via XML object representations 15-381 Lecture, Spring 2003
Question Analyzer (diagram): question input (XML format) → Tokenizer → token information extraction (WordNet, KANTOO Lexicon, Brill Tagger, BBN Identifier, KANTOO lexifier) → Parser (KANTOO grammars); if a functional representation (FR) is obtained, an event/entity template filler builds the Request Object, otherwise a pattern-matching Request Object builder is used, drawing on the QA taxonomy and type-specific constraints → Request Object + system result (XML format) • Taxonomy of question-answer types and type-specific constraints • Knowledge integration • Pattern matching approach for this year's evaluation 15-381 Lecture, Spring 2003
Question Taxonomies • Q-Types • Express relationships between events, entities and attributes • Influence Planner strategy • A-Types • Express semantic type of valid answers We expect to add more A-types and refine granularity 15-381 Lecture, Spring 2003
Sample of Q-Type Hierarchy 15-381 Lecture, Spring 2003
Sample of A-Type Hierarchy 15-381 Lecture, Spring 2003
Request Object for "Who was the first U.S. president to appear on TV?" • Question type: event-completion • Answer type: person-name • Computation element: order 1 • Keyword set: first, U.S. president, appear, TV • F-structure: (event (subject (person-name ?) (occupation "U.S. president")) (act appear) (order 1) (theme TV)) 15-381 Lecture, Spring 2003
How the Retrieval Strategist Works • Inputs: • Keywords and keyphrases • Type of answer desired • Resource constraints • Min/Max documents, time, etc. • Outputs: • Ranked set of documents • Location of keyword matches 15-381 Lecture, Spring 2003
How the Retrieval Strategist Works • Constructs sequences of queries based on a Request Object • Start with very constrained queries • High quality matches, low probability of success • Progressively relax queries until search constraints are met • Lower quality matches, high probability of success 15-381 Lecture, Spring 2003
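A minimal sketch of progressive query relaxation as described above; the search() function, the boolean query syntax, and the 30-document budget are illustrative assumptions, not the Retrieval Strategist's actual query language.

```python
# Minimal sketch of progressive query relaxation: start with the most
# constrained query (all keywords) and relax to smaller keyword subsets
# until the document budget is met. search() is a hypothetical function
# returning ranked document IDs for a boolean query string.
from itertools import combinations

def relaxed_queries(keywords):
    """Yield queries from most constrained to least constrained."""
    for size in range(len(keywords), 0, -1):
        for subset in combinations(keywords, size):
            yield " AND ".join(subset)

def retrieve(search, keywords, max_docs=30):
    docs = []
    for query in relaxed_queries(keywords):
        docs.extend(d for d in search(query) if d not in docs)
        if len(docs) >= max_docs:            # stop once constraints are met
            break
    return docs[:max_docs]
```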
Sample Search Strategy (diagram) 15-381 Lecture, Spring 2003
Retrieval Strategist (RS): TREC Results Analysis • Success: % of questions where at least 1 answer document was found • TREC 2002: Success rate @ 30 docs: ~80% @ 60 docs: ~85% @ 120 docs: ~86% • Reasonable performance for a simple method, but room for improvement 15-381 Lecture, Spring 2003
RS: Ongoing Improvements • Improved incremental relaxation • Searching for all keywords too restrictive • Use subsets prioritized by discriminative ability • Remove duplicate documents from results • Don’t waste valuable list space • 15% fewer failures (229 test questions) • Overall success rate: @ 30 docs 83% (was 80%) @ 60 docs 87% (was 85%) • Larger improvements unlikely without additional techniques, such as constrained query expansion • Investigate constrained query expansion • WordNet, Statistical methods 15-381 Lecture, Spring 2003
What Does the Request Filler Do? • Input: • Request Object (from QA module) • Document Set (from RS module) • Output: • Set of extracted answers which match the desired type (Request Fill objects) • Confidence scores • Role in JAVELIN: Extract possible answers & passages from documents 15-381 Lecture, Spring 2003
Request Filler Steps • Filter passages • Match answer type? • Contain sufficient keywords? • Create variations on passages • POS tagging (Brill) • Cleansing (punctuation, tags, etc.) • Expand contractions • Reduce surface forms to lexemes • Calculate feature values • A classifier scores the passages, which are output with confidence scores 15-381 Lecture, Spring 2003
Features • Features are self-contained algorithms that score passages in different ways • Example: Simple Features • # Keywords present • Normalized window size • Average <Answer,Keywords> distance • Example: Pattern Features • cN [..] cV [..] in/on [date] • [date], iN [..] cV [..] • Any procedure that returns a numeric value is a valid feature! 15-381 Lecture, Spring 2003
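A minimal sketch of two of the simple features named above, assuming a passage is a list of tokens and the candidate answer's token position is known; the exact feature definitions in the Request Filler may differ.

```python
# Minimal sketch of two "simple features". A passage is a token list and
# answer_pos is the candidate's token index; both conventions are assumptions.
def keyword_coverage(passage_tokens, keywords):
    """Fraction of question keywords present in the passage."""
    tokens = {t.lower() for t in passage_tokens}
    present = [k for k in keywords if k.lower() in tokens]
    return len(present) / len(keywords) if keywords else 0.0

def avg_answer_keyword_distance(answer_pos, keyword_positions):
    """Average token distance between the answer candidate and the keywords."""
    if not keyword_positions:
        return float("inf")
    return sum(abs(answer_pos - p) for p in keyword_positions) / len(keyword_positions)
```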
Learning An Answer Confidence Function • Supervised learning • Answer type-specific model • Aggregate model across answer types • Decision Tree – C4.5 • Variable feature dependence • Fast enough to re-learn from each new instance 15-381 Lecture, Spring 2003
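A minimal sketch of learning an answer-confidence function from labeled passages. JAVELIN used C4.5; scikit-learn's CART-based decision tree is used here as a modern stand-in, and the feature rows and labels are invented for illustration.

```python
# Minimal sketch of training a decision tree to score passages. JAVELIN used
# C4.5; sklearn's CART implementation is a stand-in. Data is illustrative.
from sklearn.tree import DecisionTreeClassifier

# Rows: [keyword_coverage, normalized_window_size, avg_answer_keyword_distance]
X = [[1.0, 0.2, 3.0],
     [0.5, 0.6, 25.0],
     [0.8, 0.3, 5.0],
     [0.3, 0.9, 40.0]]
y = [1, 0, 1, 0]                      # 1 = passage contains a correct answer

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
confidence = clf.predict_proba([[0.9, 0.25, 4.0]])[0][1]  # P(correct answer)
```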
A 'When' Q-Type Decision Tree (diagram): splits on the % of keywords present in the passage (thresholds 0.75 and 0.2), the average <date, keywords> distance (threshold 60), and the maximum scaled keyword window size; leaves carry instance counts such as 876.0/91.8, 62.0/11.6, 33.0/10.3, and 5.0/1 15-381 Lecture, Spring 2003
Semantic Analysis Would Help • Surface text: "The company said it believes the expenses of the restructuring will be recovered by the end of 1992" vs. the tagged/lemmatized form "The/DT company/NN say/VBD it/PRP believe/VBZ the/DT expense/NNS of/IN the/DT restructuring/NN will/MD be/VB recover/VBN by/IN the/DT end/NN of/IN 1992/CD" • "…the artist expressed" vs. "…the performer expressed" • "The company said it believes …" vs. "Microsoft said it believes …" • "It is a misconception the Titanic sank on April the 15th, 1912" vs. "The Titanic sank on April the 15th, 1912" 15-381 Lecture, Spring 2003
Information Extractor (IX): TREC Analysis If the answer is in the doc set returned by the Retrieval Strategist, does the IX module identify it as an answer candidate with a high confidence score? 15-381 Lecture, Spring 2003
IX: Current & Future Work • Enrich feature space beyond surface patterns & surface statistics • Perform AType-specific learning • Perform adaptive semantic expansion • Enhance training data quantity/quality • Tune objective function 15-381 Lecture, Spring 2003
NLP for Information Extraction • Simple statistical classifiers are not sufficient on their own • Need to supplement statistical approach with natural language processing to handle more complex queries 15-381 Lecture, Spring 2003
Example question • Question: “When was Wendy’s founded?” • Question Analyzer extended output: • { temporal(?x), found(*, Wendy’s) } • Passage discovered by retrieval module: • “R. David Thomas founded Wendy’s in 1969, …” • Conversion to predicate form by Passage Analyzer: • { founded(R. David Thomas, Wendy’s), DATE(1969), … } • Unification of QA literals against PA literals: • Equiv(found(*,Wendy’s), founded(R. David Thomas, Wendy’s)) • Equiv(temporal(?x), DATE(1969)) • ?x := 1969 • Answer: 1969 15-381 Lecture, Spring 2003
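A minimal sketch of the unification step above, assuming literals are (predicate, argument-list) tuples, '*' is a wildcard, and '?'-prefixed strings are answer variables; synonym and type mapping (found vs. founded, temporal vs. DATE) is assumed to have already happened.

```python
# Minimal sketch of unifying a question literal against a passage literal.
# Literals are (name, args) tuples; '*' is a wildcard and '?x' is a variable.
# Synonym/type mapping (found ~ founded, temporal ~ DATE) is assumed done.
def unify_literal(q_lit, p_lit, bindings):
    q_name, q_args = q_lit
    p_name, p_args = p_lit
    if q_name != p_name or len(q_args) != len(p_args):
        return None
    new = dict(bindings)
    for q_arg, p_arg in zip(q_args, p_args):
        if q_arg == "*":
            continue                        # wildcard matches anything
        if q_arg.startswith("?"):
            new[q_arg] = p_arg              # bind the answer variable
        elif q_arg != p_arg:
            return None
    return new

passage = [("founded", ["R. David Thomas", "Wendy's"]), ("DATE", ["1969"])]
print(unify_literal(("DATE", ["?x"]), passage[1], {}))   # {'?x': '1969'}
```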
Answer Generator • Currently the last module in the pipeline. • Main tasks: • Combination of different sorts of evidence for answer verification. • Detection and combination of similar answer candidates to address answer granularity. • Initiation of processing loops to gather more evidence. • Generation of answers in required format. 15-381 Lecture, Spring 2003
Answer Generator input • Analyzed question (RequestObject): • Question/Answer type (qtype/atype) • Number of expected answers; • Syntactic parse and keywords. • Passages (RequestFills): • Marked candidates of right semantic type (right NE type); • Confidences computed using set of text-based (surface) features such as keyword placement. 15-381 Lecture, Spring 2003
Answer Generator output • Answer string from document (for now). • Set of text passages (RequestFills) Answer Generator decided were supportive of answer. • Or, requests for more information (exceptions) passed on to Planner: • “Not enough answer candidates” • “Can’t distinguish answer candidates” 15-381 Lecture, Spring 2003
Types of evidence • Currently implemented: Redundancy, frequency counts. • Preference given to more often occurring, normalized answer candidates. • Next step: Structural information from parser. • Matching question and answer predicate-argumentstructure. • Detecting hypotheticals, negation, etc. • Research level: Combining collection-wide statistics with ‘symbolic’ QA. • Ballpark estimates of temporal boundaries of events/states. 15-381 Lecture, Spring 2003
Example • Q: What year did the Titanic sink? A: 1912 Supporting evidence: It was the worst peacetime disaster involving a British ship since the Titanic sank on the 14th of April, 1912. The Titanic sank after striking an iceberg in the North Atlantic on April 14th, 1912. The Herald of Free Enterprise capsized off the Belgian port of Zeebrugge on March 6, 1987, in the worst peacetime disaster involving a British ship since the Titanic sank in 1912. 15-381 Lecture, Spring 2003
What happened? • Different formats for answer candidates detected, normalized and combined: • `April 14th, 1912’ • `14th of April, 1912’ • Supporting evidence detected and combined: • `1912’ supports `April 14th, 1912’ • Structure of date expressions understood and correct piece output: • `1912’ rather than `April 14th, 1912’ • Most frequent answer candidate found and output: • `April 14th, 1912’ rather than something else. 15-381 Lecture, Spring 2003
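A minimal sketch of the frequency-based combination just described; the normalize() table is a stand-in for the real date normalizer (covered on the following slides), and the candidate strings are the ones from the example.

```python
# Minimal sketch of picking the most frequent normalized answer candidate,
# with less specific candidates ("1912") counted as support for more specific
# ones ("1912-04-14"). normalize() is a stand-in for the real normalizer.
from collections import Counter

def normalize(candidate):
    table = {"April 14th, 1912": "1912-04-14",
             "14th of April, 1912": "1912-04-14"}
    return table.get(candidate, candidate)

def pick_answer(candidates):
    counts = Counter(normalize(c) for c in candidates)
    def score(value):
        # Count the candidate itself plus any less specific supporters.
        return sum(n for v, n in counts.items() if value.startswith(v))
    return max(counts, key=score)

print(pick_answer(["April 14th, 1912", "14th of April, 1912", "1912"]))
# -> '1912-04-14'; for a "what year" question, only '1912' would be output.
```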
Answer Normalization • Request Filler/Answer Generator aware of NE types: dates, times, people names, company names, locations, currency expressions. • `April 14th, 1912’, `14th of April 1912’, `14 April 1912’ instances of same date, but different strings. • For date expressions, normalization performed to ISO 8601 (YYYY-MM-DD) in Answer Generator. • ‘summer’, ‘last year’, etc. remain as strings. 15-381 Lecture, Spring 2003
Answer Normalization • Normalization enables comparison and detection of redundant or complementary answers. • Define supporting evidence as a piece of text expressing the same or less specific information. • E.g., `1912' supports `April 14th, 1912'. • Complementary evidence: '1912' complements 'April 14th'. • Normalization and support extend to other NE types: • `Clinton' supports `Bill Clinton'; • `William Clinton' and `Bill Clinton' are normalized to the same form. • For locations, `Pennsylvania' supports `Pittsburgh'. 15-381 Lecture, Spring 2003
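A minimal sketch of ISO 8601 date normalization and the "supports" relation described on these two slides; the tiny pattern table only covers the example formats and is not the real normalizer.

```python
# Minimal sketch of date normalization to ISO 8601 (YYYY-MM-DD) and the
# "supports" relation. The patterns cover only the example formats above;
# a real normalizer handles far more. Vague strings stay unnormalized.
import re

MONTHS = {"January": "01", "February": "02", "March": "03", "April": "04",
          "May": "05", "June": "06", "July": "07", "August": "08",
          "September": "09", "October": "10", "November": "11", "December": "12"}

def normalize_date(text):
    # "April 14th, 1912" -> "1912-04-14"
    m = re.search(r"(\w+) (\d{1,2})(?:st|nd|rd|th)?,? (\d{4})", text)
    if m and m.group(1) in MONTHS:
        return f"{m.group(3)}-{MONTHS[m.group(1)]}-{int(m.group(2)):02d}"
    # "14th of April, 1912" -> "1912-04-14"
    m = re.search(r"(\d{1,2})(?:st|nd|rd|th)? of (\w+),? (\d{4})", text)
    if m and m.group(2) in MONTHS:
        return f"{m.group(3)}-{MONTHS[m.group(2)]}-{int(m.group(1)):02d}"
    return text                     # 'summer', 'last year', '1912' stay as-is

def supports(less_specific, more_specific):
    """'1912' supports '1912-04-14'; 'Clinton' supports 'Bill Clinton'."""
    return less_specific in more_specific

print(normalize_date("14th of April, 1912"))                   # 1912-04-14
print(supports("1912", normalize_date("April 14th, 1912")))    # True
```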