
Is Question Answering an Acquired Skill?

With Ganesh Ramakrishnan, Deepa Paranjpe, Vijay Krishnan, Arnab Nandi. Soumen Chakrabarti, IIT Bombay.


Presentation Transcript


  1. Is Question Answering an Acquired Skill?
     With Ganesh Ramakrishnan, Deepa Paranjpe, Vijay Krishnan, Arnab Nandi
     Soumen Chakrabarti, IIT Bombay

  2. Web search and QA
     • Information need – words relating “things” + “thing” aliases = telegraphic Web queries
       • Cheapest laptop with wireless → best price laptop 802.11
       • Why is the sky blue? → sky blue reason
       • When was the Space Needle built? → “Space Needle” history
     • Entity + relation extraction technology better than ever (SemTag, KnowItAll, Biotext)
       • Ontology extension (e.g., is a kind of)
       • List extraction (e.g., is an instance of)
       • Slot-filling (author X wrote book Y)
     Chakrabarti

  3. Factoid QA
     • Specialize the given answer domain (atype) to a token related to the ground constants in the query
       • What animal is Winnie the Pooh?
         hyponym(“animal”) NEAR “Winnie the Pooh”
       • When was television invented?
         instance-of(“time”) NEAR “television” NEAR synonym(“invented”)
     • FIND x “NEAR” GroundConstants(question) WHERE x IS-A Atype(question)
       • Ground constants: Winnie the Pooh, television
       • Atypes: animal, time

  4. A relational view of QA
     [Figure: question words split into atype clues (attribute or column name: locate which column to read; entity class via IS-A) and selectors (direct syntactic match: limit search to certain rows), both matched in the “answer zone” of the answer passage]
     • An entity class or atype may be expressed by
       • A finite IS-A hierarchy (e.g., WordNet, TAP)
       • A surface pattern matching infinitely many strings (e.g., “digit+”, “Xx+”, “preceded by a preposition”)
     • Match selectors, then specialize the atype to answer tokens

  5. Benefits of the relational view
     • “Scaling up by dumbing down”
       • Next stop after the vector-space model
       • Far short of real knowledge representation and inference
       • Barely getting practical at (near) Web scale
     • Can set up as a learning problem: train with questions (query logs) and answers in context
       • Transparent, self-tuning, easy to deploy
       • Feature extractors used in entity taggers
       • Relational/graphical learning on features

  6. What TREC QA feels like
     • How to assemble chunker, parser, POS and NE tagger, WordNet, WSD, … into a QA system?
     • Experts get much insight from old QA pairs
       • Matching an upper-cased term adds a 60% bonus … for multi-word terms and 30% for single words
       • Matching a WordNet synonym … discounts by 10% (lower case) and 50% (upper case)
       • Lower-case term matches after Porter stemming are discounted 30%; upper-case matches 70%

  7. Talk outline
     • Relational interpretation of QA
     • Motivation for a “clean-room” IE+ML system
     • Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
       • Can handle a prominent finite set of atypes: person, place, time, measurements, …
     • Extending to arbitrary atype specializations
       • Required for what… and which… questions
     • Ongoing work and concluding remarks

  8. Feature + soft match
     • FIND x “NEAR” GroundConstants(question) WHERE x IS-A Atype(question)
     • No fixed question or answer type system
     • Convert “x IS-A Atype(question)” to a soft match “DoesAtypeMatch(x, question)”
     [Figure: the question passes through IE-style surface feature extractors to give a question feature vector; answer tokens in the passage pass through IE-style surface and WordNet hypernym feature extractors to give a snippet feature vector; a joint distribution is learned over both]

  9. Feature extraction: Intuition
     [Figure: question features (how, who, fast, far, many, rich, wrote, first) linked to passage features (NNS, NNP, person, explorer, writer, composer, artist, musician, abstraction#n#6, rate#n#2, mile#n#3, linear_unit#n#1, paper_money#n#1, currency#n#1, measure#n#3, definite_quantity#n#1, magnitude_relation#n#1)]
     • Q: How fast can a cheetah run? Passage: A cheetah can chase its prey at up to 90 km/h
     • Q: How fast does light travel? Passage: Nothing moves faster than 186,000 miles per hour, the speed of light

  10. Feature extractors
      • Question features: 1-, 2-, and 3-token sequences starting with standard wh-words
      • Passage surface features: hasCap, hasXx, isAbbrev, hasDigit, isAllDigit, lpos, rpos, …
      • Passage WordNet features: all noun hypernym ancestors of all senses of the token
      • Get the top 300 passages from the IR engine
      • For each token, invoke the feature extractors
      • Label = 1 if the token is in the answer span, 0 otherwise
      • Question vector $x_q$, passage vector $x_p$
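A minimal sketch of the question and surface feature extractors, assuming a standard list of wh-words and guessing at the meaning of the named surface features (`hasXx` is read as “contains an upper-case letter followed by a lower-case one”). `isAbbrev`, `lpos`, `rpos` and the WordNet hypernym features are left out because their exact definitions are not given and the latter need a WordNet installation.

```python
import re

# Standard wh-words; the exact list used in the talk is not given, so this is an assumption.
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}


def question_features(question):
    """1-, 2- and 3-token sequences starting at each wh-word, joined with '_'."""
    toks = re.findall(r"\w+", question.lower())
    feats = set()
    for i, tok in enumerate(toks):
        if tok in WH_WORDS:
            for n in (1, 2, 3):
                if i + n <= len(toks):
                    feats.add("_".join(toks[i:i + n]))
    return feats


def surface_features(token):
    """A few IE-style surface features; these definitions are assumptions."""
    return {
        "hasCap": any(c.isupper() for c in token),
        "hasXx": re.search(r"[A-Z][a-z]", token) is not None,
        "hasDigit": any(c.isdigit() for c in token),
        "isAllDigit": token.isdigit(),
    }


def labeled_tokens(passage_tokens, answer_span):
    """Pair each token's features with label 1 if it lies in the half-open answer span."""
    start, end = answer_span
    return [(surface_features(t), int(start <= i < end)) for i, t in enumerate(passage_tokens)]


print(sorted(question_features("How fast can a cheetah run?")))
# ['how', 'how_fast', 'how_fast_can']
```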

  11. Preliminary likelihood ratio tests
      [Figure: likelihood ratio test results for surface patterns and for WordNet hypernyms]

  12. A simple, flat conditional model
      [Figure: example features how_far, when, what_city, region#n#3, entity#n#1]
      • Let $x = x_q \otimes x_p$ (pairwise products of elements)
      • Model $\Pr(Y=1 \mid x) = \frac{\exp(w \cdot x)}{1 + \exp(w \cdot x)}$
        • For every (question-feature, passage-feature) pair, w has a parameter
      • Expect this to perform better than the “linear” model $x = (x_p, x_q)$
        • Can discount for redundancy in pair information
      • If $x_q$ (resp. $x_p$) is fixed, what $x_p$ (resp. $x_q$) will yield the largest $\Pr(Y=1 \mid x)$? (linear iceberg query)
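The pairwise model and the iceberg question can be sketched in a few lines. This is an illustration, not the authors' code: `W` holds arbitrary illustrative weights (in practice they would be fit by logistic regression), features are assumed binary, and the iceberg step relies on the fact that with $x_q$ fixed, $w \cdot x = \sum_j v_j x_p[j]$ with $v_j = \sum_i W_{ij} x_q[i]$, which is linear in $x_p$.

```python
import math


def sigmoid(s):
    """Numerically stable exp(s) / (1 + exp(s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def pair_features(xq, xp):
    """x = xq ⊗ xp: one entry per (question-feature, passage-feature) pair."""
    return [[a * b for b in xp] for a in xq]


def prob_answer(W, xq, xp):
    """Pr(Y=1|x) with w·x = sum_ij W[i][j] * xq[i] * xp[j]."""
    x = pair_features(xq, xp)
    s = sum(W[i][j] * x[i][j] for i in range(len(xq)) for j in range(len(xp)))
    return sigmoid(s)


def best_passage_features(W, xq):
    """Iceberg query with xq fixed: the score is linear in xp, so over binary xp
    it is maximized by switching on exactly the features with v_j > 0."""
    v = [sum(W[i][j] * xq[i] for i in range(len(xq))) for j in range(len(W[0]))]
    return [1 if vj > 0 else 0 for vj in v]


# Arbitrary illustrative weights: rows = 2 question features, columns = 2 passage features.
W = [[2.0, -1.0], [-0.5, 1.5]]
print(best_passage_features(W, [1, 0]))  # [1, 0]
```

The symmetric query (fix $x_p$, search over $x_q$) uses the column sums $\sum_j W_{ij} x_p[j]$ instead.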

  13. Classification accuracy
      • Pairing is more accurate than the linear model
      • Steep learning curve; the linear model never “gets it” beyond “prior” atypes like proper nouns (common in TREC)
      • Are the estimated w parameters meaningful?

  14. Parameter anecdotes
      • Surface and WordNet features complement each other
      • General concepts get negative parameters: use in predictive annotation
      • Learning is symmetric (QA)

  15. Query-driven information extraction
      • “Basis” of atypes $A$; each $a \in A$ could be a synset, a surface pattern, or a feature of a parse tree
      • Question $q$ is “projected” to a vector $(w_a : a \in A)$ in atype space via a learned conditional model
        • E.g., if $q$ is “when…” or “how long…”, $w_{\text{hasDigit}}$ and $w_{\text{time\_period\#n\#1}}$ are large and $w_{\text{region\#n\#1}}$ is small
      • Each corpus token $t$ has associated indicator features $\phi_a(t)$ for every $a$
        • E.g., hasDigit(3,000) = is-a(region#n#1)(Japan) = 1
      • Can also learn a [0,1] value of is-a proximity

  16. Single token scoring
      • A token $t$ is a candidate answer if
        • $H_q(t)$: reward tokens appearing “near” selectors matched from the question
          • 0/1: appears within a fixed window of one or more selectors
          • Activation in a linear token sequence model
          • Proximity in chunk sequences, parse trees, …
        • Order tokens by decreasing score, combining the projection of the question to “atype space” with the atype indicator features of the token
      • Example passage: …the armadillo, found in Texas, is covered with strong horny plates
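The exact scoring formula on this slide did not survive extraction, so the sketch below assumes one plausible form, $H_q(t) \sum_a w_a \phi_a(t)$, using the 0/1 window version of $H_q$. The atype basis (`region#n#1`, `hasDigit`), the toy `REGIONS` lexicon, and the weights `w_where` are all hypothetical stand-ins.

```python
# Hypothetical toy lexicon standing in for WordNet's is-a(region#n#1) test.
REGIONS = {"texas", "japan"}


def indicators(token):
    """Atype indicator features phi_a(t) for a tiny, assumed atype basis."""
    return {
        "region#n#1": int(token.lower() in REGIONS),
        "hasDigit": int(any(c.isdigit() for c in token)),
    }


def h_window(i, selector_positions, window=5):
    """0/1 proximity reward H_q(t): is token i within `window` tokens of a matched selector?"""
    return int(any(abs(i - p) <= window for p in selector_positions))


def atype_score(w, phi):
    """sum_a w_a * phi_a(t): question projection dotted with token indicators."""
    return sum(w.get(a, 0.0) * v for a, v in phi.items())


def rank_tokens(tokens, selector_positions, w, window=5):
    """Rank non-selector tokens by the assumed score H_q(t) * sum_a w_a phi_a(t)."""
    scored = []
    for i, tok in enumerate(tokens):
        if i in selector_positions:
            continue  # selectors are already known to the user
        score = h_window(i, selector_positions, window) * atype_score(w, indicators(tok))
        if score > 0:
            scored.append((score, i, tok))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then passage order
    return [tok for _, _, tok in scored]


# Hypothetical projection of a "where..." question onto the atype basis.
w_where = {"region#n#1": 2.0, "hasDigit": -0.5}
tokens = "the armadillo found in Texas is covered with strong horny plates".split()
print(rank_tokens(tokens, [1, 2], w_where))  # ['Texas']
```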

  17. Mean reciprocal rank (MRR)
      • $n_q$ = smallest rank among answer passages
      • $\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{n_q}$
      • Dropping a passage from #1 to #2 is as bad as dropping it from #2 to $\infty$
      • TREC requires $\mathrm{MRR}_5$: round up $n_q > 5$ to $\infty$
        • Improving rank from 20 to 6 is as useless as improving it from 20 to 15
      • Aggregate score influenced by many complex subsystems
        • Complete description rarely available
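A direct transcription of the MRR definition, with an optional cutoff that treats any $n_q$ beyond it as $\infty$ (contributing 0), as in $\mathrm{MRR}_5$. Representing “no correct passage returned” as `None` is a choice made for this sketch.

```python
def mrr(first_correct_ranks, cutoff=None):
    """Mean reciprocal rank over questions.

    first_correct_ranks: for each question q, n_q = 1-based rank of the first
    correct passage, or None if no correct passage was returned.
    cutoff: e.g. 5 for MRR_5; any n_q > cutoff counts as infinity (adds 0).
    """
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = 0.0
    for n in first_correct_ranks:
        if n is None or (cutoff is not None and n > cutoff):
            continue
        total += 1.0 / n
    return total / len(first_correct_ranks)


print(mrr([1, 2, None]))  # 0.5
```

Dropping from #1 to #2 costs $1 - \tfrac12 = \tfrac12$, exactly what dropping from #2 to $\infty$ costs; under the cutoff of 5, ranks 20, 15 and 6 all contribute 0.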

  18. Effect of eliminating non-answers
      • Take the top 300 hits by IR score
      • If Pr(Y=1|token) < threshold, reject the token
      • If all tokens are rejected, reject the passage
      • Present the survivors in IR order
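The filtering steps above amount to a short loop. In this sketch `token_prob` is any callable returning Pr(Y=1 | token), a placeholder for the learned model, and the default `threshold=0.5` is an assumption, since the slide does not give its value.

```python
def filter_passages(passages, token_prob, threshold=0.5):
    """Drop passages in which every token looks like a non-answer.

    passages:   token lists in IR order (e.g. the top 300 hits).
    token_prob: callable giving Pr(Y=1 | token); a stand-in for the learned model.
    """
    survivors = []
    for tokens in passages:
        # A token is rejected if its probability is below threshold;
        # the passage survives if at least one token is not rejected.
        if any(token_prob(t) >= threshold for t in tokens):
            survivors.append(tokens)
    return survivors  # original IR order is preserved


def toy_prob(token):
    # Hypothetical model: tokens containing digits look like answers.
    return 0.9 if any(c.isdigit() for c in token) else 0.1


passages = [["opened", "in", "1962"], ["a", "tall", "tower"], ["about", "605", "feet"]]
print(len(filter_passages(passages, toy_prob)))  # 2
```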

  19. Drill-down and ablation studies
      • Scale the average MRR improvement to 1
        • What, Which < average
        • Who  average
      • The atype of what… and which… questions is not captured well by 3-grams starting at wh-words
        • The atype ranges over an essentially infinite set, with relatively little training data

  20. Talk outline
      • Relational interpretation of QA
      • Motivation for a “clean-room” IE+ML system
      • Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
        • Can handle a prominent finite set of atypes: person, place, time, measurements, …
      • Extending to arbitrary atype specializations
        • Required for what… and which… questions
      • Ongoing work and concluding remarks

  21. What…, which…, name… atype clues
      • Assumption: the question sentence has a wh-word and a main/auxiliary verb
      • Observation: atype clues are embedded in a noun phrase (NP) adjoining the main or auxiliary verb
      • Heuristic: atype clue = head of this NP
        • Use a shallow parser and apply the rule
      • The head can have attributes
        • Which (American (general)) is buried in Salzburg?
        • Name (Saturn’s (largest (moon)))
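The head-of-NP heuristic can be approximated without a full parser. This sketch assumes Penn Treebank tags supplied by hand (a real system would run a POS tagger and shallow parser), treats a maximal run of determiner/adjective/noun/possessive tags as the NP, tries the NP just left of the first verb before the one just right of it, and takes the rightmost noun as the head. It targets what/which/name questions only; for who/when/where questions the slides fall back to the “self-evident” atypes.

```python
# Penn Treebank tags that may occur inside a simple noun phrase (an assumption).
NP_TAGS = {"DT", "PRP$", "CD", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS", "POS"}


def np_span(tagged, start, step):
    """Indices of the contiguous NP-like tokens starting at `start`, walking by `step`."""
    idx = []
    i = start
    while 0 <= i < len(tagged) and tagged[i][1] in NP_TAGS:
        idx.append(i)
        i += step
    return sorted(idx)


def atype_clue(tagged):
    """Head noun of the NP adjoining the first main/auxiliary verb, or None."""
    verb = next((i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB")), None)
    if verb is None:
        return None
    # Try the NP just left of the verb, then the NP just right of it.
    for span in (np_span(tagged, verb - 1, -1), np_span(tagged, verb + 1, 1)):
        nouns = [i for i in span if tagged[i][1].startswith("NN")]
        if nouns:
            return tagged[nouns[-1]][0]  # head = rightmost noun of the NP
    return None


q1 = [("Which", "WDT"), ("American", "JJ"), ("general", "NN"), ("is", "VBZ"),
      ("buried", "VBN"), ("in", "IN"), ("Salzburg", "NNP"), ("?", ".")]
q2 = [("Name", "VB"), ("Saturn", "NNP"), ("'s", "POS"), ("largest", "JJS"), ("moon", "NN")]
print(atype_clue(q1), atype_clue(q2))  # general moon
```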

  22. Atype clue extraction stats
      • Simple heuristic quite effective
      • If successful, the extracted atype is mapped to a WordNet synset (moon → celestial body, etc.)
      • If no atype of this form is available, try the “self-evident” atypes (who, when, where, how_X, etc.)
      • New boolean feature for a candidate token: is the token a hyponym of the atype synset?

  23. The last piece: Learning selectors
      • Which question words are likely to appear (almost) unchanged in an answer passage?
        • Constants in select-clauses of SQL queries
        • Guides backoff policy for keyword query
      • Local and global features
        • POS of word, POS of adjacent words (POS@-1, POS@0, POS@1), case info, proximity to wh-word
        • Suppose the word is associated with synset set $S$
        • NumSense: size of $S$ (how polysemous is the word?)
        • NumLemma: average number of lemmas describing $s \in S$
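A sketch of the selector feature vector for one question word. The sense inventory `toy_senses` and IDF table `toy_idf` are hypothetical values (not real WordNet or corpus statistics), “proximity to wh-word” is taken as token distance to the nearest Penn tag starting with `W`, and the classifier that consumes these features (decision tree or logistic regression) is not shown.

```python
def selector_features(tagged, i, senses, idf):
    """Local and global features for deciding whether question word i is a selector.

    tagged: list of (word, Penn POS tag) pairs for the question.
    senses: hypothetical map word -> list of synsets, each a list of lemma names.
    idf:    hypothetical map word -> inverse document frequency.
    """
    def pos_at(j):
        return tagged[j][1] if 0 <= j < len(tagged) else "NONE"

    word, _ = tagged[i]
    wh_positions = [j for j, (_, tag) in enumerate(tagged) if tag.startswith("W")]
    synsets = senses.get(word.lower(), [])
    return {
        # local features
        "POS@-1": pos_at(i - 1),
        "POS@0": pos_at(i),
        "POS@1": pos_at(i + 1),
        "isCapitalized": word[:1].isupper(),
        "distToWh": min((abs(i - j) for j in wh_positions), default=len(tagged)),
        # global features
        "IDF": idf.get(word.lower(), 0.0),
        "NumSense": len(synsets),
        "NumLemma": sum(len(s) for s in synsets) / len(synsets) if synsets else 0.0,
    }


question = [("When", "WRB"), ("was", "VBD"), ("television", "NN"), ("invented", "VBN"), ("?", ".")]
toy_senses = {"television": [["television", "TV"], ["television_set", "TV_set", "telly"]]}
toy_idf = {"television": 4.2}
feats = selector_features(question, 2, toy_senses, toy_idf)
print(feats["NumSense"], feats["NumLemma"], feats["distToWh"])  # 2 2.5 2
```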

  24. Selector results
      • Global features (IDF, NumSense, NumLemma) essential for accuracy
        • Best F1 accuracy with local features alone: 71–73%
        • With local and global features: 81%
      • Decision trees better than logistic regression
        • F1 = 81% versus 75% for logistic regression
        • Intuitive decision branches
        • But logistic regression gives scores for query backoff

  25. Putting together a QA system
      [Figure: components: training corpus, learning tools, shallow parser, WordNet, POS tagger, N-E tagger, QA system]

  26. Putting together a QA system
      [Figure: pipeline. Question → tokenizer, POS tagger, shallow parser → tagged question with noun and verb markers → atype extractor (atype clues) and selector learner → keyword query generator → keyword query against a passage index (built from the corpus by a sentence splitter and passage indexer) → candidate passage → tokenizer, POS tagger, entity extractor → tagged passage → logistic regression (“Is QA pair?”) → reranked passages]
      • Learning to rerank passages
      • Sample features:
        • Do selectors match? How many?
        • Is some non-selector passage token a specialization of the question’s atype clue?
        • Min and avg linear token distance between candidate token and matched selectors

  27. Learning to re-rank passages
      • Remove passage tokens matching selectors
        • User already knows these are in the passage
      • Find passage token(s) specializing the atype
      • For each candidate token collect
        • Atype of question, original rank of passage
        • Min, avg linear distances to matched selectors
        • POS and entity tag of token if available
      • Q: How many inhabitants live in the town of Ushuaia
        Passage: Ushuaia, a port of about 30,000 dwellers set between the Beagle Channel and …
      [Figure: the example annotated with surface pattern hasDigits, selector match, WordNet match, and “5 tokens apart”]
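The per-candidate features can be sketched on the Ushuaia example. Two simplifications are assumptions of this sketch: punctuation is stripped from token ends before counting distances, and “token specializing the atype” is approximated by the hasDigit surface pattern (appropriate for a how-many question). POS and entity tags are omitted.

```python
import re


def candidate_features(passage, selectors, question_atype, original_rank):
    """Feature rows for candidate answer tokens of one passage.

    Candidates are non-selector tokens containing a digit, a stand-in
    (assumed here) for "token specializing a how-many atype".
    """
    tokens = [t.strip(",.;:") for t in passage.split()]
    sel = {s.lower() for s in selectors}
    sel_pos = [i for i, t in enumerate(tokens) if t.lower() in sel]
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() in sel or not re.search(r"\d", tok):
            continue  # skip selector matches and non-candidates
        dists = [abs(i - p) for p in sel_pos]
        rows.append({
            "token": tok,
            "atype": question_atype,
            "orig_rank": original_rank,
            "min_dist": min(dists) if dists else None,
            "avg_dist": sum(dists) / len(dists) if dists else None,
        })
    return rows


passage = "Ushuaia, a port of about 30,000 dwellers set between the Beagle Channel"
rows = candidate_features(passage, ["inhabitants", "live", "town", "Ushuaia"], "how_many", 1)
print(rows[0]["token"], rows[0]["min_dist"])  # 30,000 5
```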

  28. Re-ranking results
      • Categorical and numeric attributes
      • Logistic regression
        • Good precision, poor recall
      • Use the logit score to re-rank passages
      • Rank of the first correct passage shifts substantially
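Re-ranking by logit score is a sort on the logistic regression output. In this sketch each passage is summarized by a single feature dictionary (for instance, that of its best candidate token, which is an assumption), the weights are made up rather than learned, and Python's stable sort keeps the original IR order among ties.

```python
import math


def logit_score(features, weights, bias=0.0):
    """Logistic regression score Pr(QA pair | features), computed stably."""
    s = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def rerank(passages, weights, bias=0.0):
    """passages: (passage_id, feature dict) pairs in original IR order.
    Returns ids by decreasing score; ties keep IR order (stable sort)."""
    ranked = sorted(passages, key=lambda p: -logit_score(p[1], weights, bias))
    return [pid for pid, _ in ranked]


# Hypothetical weights; a real system would learn them from labeled QA pairs.
weights = {"selector_matches": 1.0, "min_dist": -0.3, "atype_match": 2.0}
passages = [
    ("p1", {"selector_matches": 1, "min_dist": 9, "atype_match": 0}),
    ("p2", {"selector_matches": 2, "min_dist": 5, "atype_match": 1}),
]
print(rerank(passages, weights))  # ['p2', 'p1']
```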

  29. MRR gains from what, which, name
      • Substantial gain in MRR
      • What/which now show above-average MRR gains
      • TREC 2000 top MRRs: 0.76, 0.71, 0.46, 0.46, 0.31

  30. Generalization across corpora
      • Across-year numbers close to train/test split on a single year
      • Features and model seem to capture corpus-independent linguistic Q+A artifacts

  31. Conclusion
      • Clean-room QA = feature extraction + learning
        • Recover structure info from the question
        • Learn correlations between question structure and passage features
        • Competitive accuracy with negligible domain expertise or manual intervention
      • Ongoing work
        • Model how selector and atype are related
        • Use model coefficients for predictive annotation
        • Combine token scores into better passage scores
        • Treat all question types uniformly
        • Use redundancy available from the Web
