Employing Two Question Answering Systems in TREC 2005

Harabagiu, Moldovan, et al., 2005, Language Computer Corporation

Highlights
• Two systems
  • PowerAnswer-2: factoids (main task)
  • PALANTIR: relationships
• Bells and whistles
  • Web-boosting strategy
  • Abductive logic prover
  • World-knowledge axioms: XWN, SUMO, …
• Results: “above median for all groups”
  • 53.4% main task, 20.4% relationships task

TREC 2005
• Tasks: Main (factoids), Relationships
• What’s new
  • Question types: “Other”
  • Answer types: Events
• Challenges
  • More complex coreference resolution
  • Temporal and other event-like constraints
  • Discovering info nuggets for “Other” questions

Challenges: Coreference resolution
• TREC 2004: a single antecedent for anaphora
• TREC 2005: more candidate antecedents…

Challenges: Inter-question constraints
• A question and its answer constrain subsequent questions
• The correct answer to Q136.5 depends on
  • correct coreference resolution with previous questions
  • the correct answer to Q136.4
• Event answer types
  • Nominal answer types act as topics of subsequent questions; events constrain subsequent questions with event-like properties: time, participants, …

The LCC Solution: Two Systems
• PowerAnswer-2
  • Factoid questions
  • Includes: abductive logic, temporal reasoner, world-knowledge axioms
  • Bonus: discovers interesting and novel nuggets for “Other” questions
• PALANTIR
  • Relationship questions
  • Includes: keyword expansion, topic representation, automatic lexicon generation

PowerAnswer-2: Architecture

PowerAnswer-2: Components
• Standard modules: QP, PR, AP (Question Processor, Passage Retrieval, Answer Processor)
• Sneaky module: WebBooster
• Fancy module: COGEX logic prover
  • World knowledge: SUMO, eXtended WordNet, JAGUAR
  • Linguistic knowledge: WordNet, hand-written ellipsis and coreference axioms
  • “Proves” correct answers with abductive logic
  • Temporal inference from “advanced textual inference techniques”

WebBooster
• Exploit redundancy on the web for answer ranking
• Construct a series of search-engine queries
  • from “linguistic patterns” (morph/lex alternations?)
• Extract the most redundant answers from web documents
• “Boost” (i.e., increase the weight of) answers from the TREC collection that most closely match answers from the web collection
• Justification: the larger the set, the easier it is to pinpoint answers that closely resemble the surface form of the question
• Results: 20.8% increase in factoid score
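The boosting step can be illustrated with a minimal sketch. It assumes the web answers have already been extracted from search-engine results, treats “closely match” as exact equality after lowercasing and whitespace normalization, and uses a hypothetical multiplicative boost `score * (1 + alpha * redundancy)`; the slides do not give LCC’s actual scoring formula, query patterns, or weights.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial surface variants compare equal."""
    return " ".join(answer.lower().split())


def web_boost(trec_candidates: dict[str, float],
              web_answers: list[str],
              alpha: float = 0.5) -> list[tuple[str, float]]:
    """Re-rank TREC candidates by how redundant each answer is on the web.

    trec_candidates: candidate answer -> score from the answer processor.
    web_answers: answers extracted from web documents; repeats signal redundancy.
    alpha: hypothetical weight controlling how strongly web redundancy boosts a score.
    """
    web_counts = Counter(normalize(a) for a in web_answers)
    max_count = max(web_counts.values(), default=0)
    boosted = {}
    for cand, score in trec_candidates.items():
        # Redundancy relative to the most frequent web answer (0.0 if never seen on the web).
        redundancy = web_counts[normalize(cand)] / max_count if max_count else 0.0
        boosted[cand] = score * (1.0 + alpha * redundancy)
    return sorted(boosted.items(), key=lambda kv: kv[1], reverse=True)


# Usage: "1915" scores higher in the TREC collection, but "1912" is more redundant on the web.
ranked = web_boost({"1912": 0.40, "1915": 0.45},
                   ["1912", "1912", "April 1912", "1915"])
print(ranked)  # 1912 now outranks 1915
```

Note that “April 1912” does not count toward “1912” here; a fuller matcher would compare answers more loosely than exact string equality.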

COGEX: Logic Prover
• Convert question → QLF, answer → ALF (question and answer logic forms)
• Perform a “proof” of the question over the candidate answers
• Rank answers by semantic similarity to the question
  • Semantic similarity: WordNet!
  • Ex: similarity of “buy” and “own” judged by the length of the connecting path in WordNet
• Results: 12.4% increase in factoid score
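The path-length similarity can be approximated as a shortest-path search. The sketch below uses a tiny hand-written graph (`TOY_LEXICON`, hypothetical links, not real WordNet structure) and scores two words as `1 / (1 + path length)`, so a shorter connecting path means higher similarity. It illustrates only the path-length idea from the slide, not the logic-form proof itself; with the real WordNet, NLTK’s `Synset.path_similarity` computes a comparable path-based score.

```python
from collections import deque

# Toy, hand-written lexical graph standing in for WordNet relations
# (hypernymy, entailment, ...). The links are illustrative only.
TOY_LEXICON = {
    "buy": {"acquire", "pay"},
    "acquire": {"buy", "get", "own"},
    "own": {"acquire", "have"},
    "have": {"own"},
    "get": {"acquire"},
    "pay": {"buy", "give"},
    "give": {"pay"},
}


def path_length(graph, src, dst):
    """Breadth-first search: number of edges on the shortest src-dst path, or None."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two words are not connected


def similarity(graph, a, b):
    """Shorter connecting path -> higher similarity in (0, 1]; 0.0 if unconnected."""
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


print(similarity(TOY_LEXICON, "buy", "own"))  # 1/3: buy -> acquire -> own
```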

COGEX: Temporal Context Reasoner
• Document processing: index by dates
• Q and A processing: represent temporal relations as triples (S, E1, E2)
  • S is a temporal signal (“during”, “after”); E1 and E2 are events
• Reasoning:
  • Prefer passages that match the temporal constraints detected in Q
  • Discover events related by temporal signals in Q and in candidate As
  • Perform temporal unification between Q and candidate As, boosting As that match Q’s times
• Results: 2% increase in factoid score
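A sketch of the (S, E1, E2) representation and of temporal unification, assuming every event has already been resolved to a date interval. The interval semantics for “before”, “after” and “during”, the additive `bonus`, the made-up events, and all names (`TemporalRelation`, `holds`, `temporal_unify`) are illustrative assumptions, not the reasoner’s actual rules.

```python
from dataclasses import dataclass
from datetime import date

# An event's time span as (start, end); a point event has start == end.
Interval = tuple[date, date]


@dataclass(frozen=True)
class TemporalRelation:
    """Triple (S, E1, E2): temporal signal S relating event E1 to event E2."""
    signal: str  # e.g. "before", "after", "during"
    e1: str
    e2: str


def holds(signal: str, a: Interval, b: Interval) -> bool:
    """Does interval a stand in relation `signal` to interval b? (simplified semantics)"""
    if signal == "before":
        return a[1] < b[0]
    if signal == "after":
        return a[0] > b[1]
    if signal == "during":
        return b[0] <= a[0] and a[1] <= b[1]
    raise ValueError(f"unsupported temporal signal: {signal!r}")


def temporal_unify(q_rel: TemporalRelation,
                   event_times: dict[str, Interval],
                   candidates: dict[str, tuple[float, Interval]],
                   bonus: float = 0.2) -> list[tuple[str, float]]:
    """Boost candidate answers whose event time satisfies the question's constraint.

    q_rel.e1 is the event the question asks about; q_rel.e2 is an anchor event
    whose time is known (e.g. from date-indexed documents).
    candidates: answer -> (score, time of the event the answer describes).
    """
    anchor = event_times[q_rel.e2]
    rescored = {
        answer: score + bonus if holds(q_rel.signal, when, anchor) else score
        for answer, (score, when) in candidates.items()
    }
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)


# Usage with made-up events: "What happened after event X?"
q = TemporalRelation("after", "ANSWER", "event X")
times = {"event X": (date(2001, 1, 1), date(2001, 1, 1))}
cands = {
    "reform": (0.5, (date(2001, 3, 1), date(2001, 3, 1))),    # after X
    "protest": (0.6, (date(2000, 12, 1), date(2000, 12, 1))),  # before X
}
print(temporal_unify(q, times, cands))  # "reform" is boosted above "protest"
```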

“Other” Questions
• Generic definition-pattern-based nuggets
  • “...Russian submarine Kursk, which is lying on the sea bed in the Barents Sea...”
• Answer-type-based nuggets
  • Nugget patterns specific to properties of the answer type
  • 33 target classes generated by a Naïve Bayes classifier on WordNet synsets
  • Bing Crosby → musician_person: band, singer, born, …
• Entity-relationship-based nuggets
  • Nugget patterns are based on relations to other NEs
  • Akira Kurosawa AND _date
  • Akira Kurosawa AND _location
  • …
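The first nugget type, generic definition patterns, can be sketched with regular expressions anchored on the target. The three patterns here (relative clause, appositive noun phrase, copula) are hypothetical examples of the kind of surface patterns the slide describes; LCC’s actual pattern set is not listed, and the answer-type and entity-relationship nuggets are not modeled.

```python
import re


def definition_patterns(target: str) -> list[re.Pattern]:
    """Hypothetical surface patterns that often introduce a definitional nugget about target."""
    t = re.escape(target)
    return [
        re.compile(t + r",\s+((?:which|who)\s+[^.,;]+)"),             # relative clause
        re.compile(t + r",\s+((?:a|an|the)\s+[^.,;]+)"),              # appositive NP
        re.compile(t + r"\s+(?:is|was)\s+((?:a|an|the)\s+[^.,;]+)"),  # copular definition
    ]


def extract_nuggets(target: str, text: str) -> list[str]:
    """Return the text captured by each pattern match, in pattern order."""
    nuggets = []
    for pattern in definition_patterns(target):
        for match in pattern.finditer(text):
            nuggets.append(match.group(1).strip())
    return nuggets


snippet = "...Russian submarine Kursk, which is lying on the sea bed in the Barents Sea..."
print(extract_nuggets("Kursk", snippet))
# ['which is lying on the sea bed in the Barents Sea']
```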

PALANTIR: Architecture

PALANTIR: Keyword Selection
• Collocation detection
  • Identify complete phrases that aren’t just bags of keywords (Organization of African States)
• Keyword ranking
  • Detect the overall importance of each keyword in the query
  • Use a keyword-density strategy for document ranking
• Keyword expansion
  • Synonyms, alternate forms for keywords
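Collocation detection and keyword-density ranking can be combined in a small sketch. Multi-word keywords such as “Organization of African States” only count when their tokens appear contiguously, and passages are ranked first by how many distinct keywords they contain, then by keyword density (keywords per token of the span they cover). The exact density measure PALANTIR used is not given on the slide, so this ranking key is an assumption; keyword expansion is not modeled.

```python
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_occurrence(tokens: list[str], keyword: str) -> Optional[int]:
    """Start index of a (possibly multi-word) keyword; collocations must match contiguously."""
    kw = tokenize(keyword)
    for i in range(len(tokens) - len(kw) + 1):
        if tokens[i:i + len(kw)] == kw:
            return i
    return None


def passage_key(passage: str, keywords: list[str]) -> tuple[int, float]:
    """Hypothetical ranking key: (distinct keywords found, keywords per token of covered span)."""
    tokens = tokenize(passage)
    spans = []
    for kw in keywords:
        start = first_occurrence(tokens, kw)
        if start is not None:
            spans.append((start, start + len(tokenize(kw))))
    if not spans:
        return (0, 0.0)
    width = max(1, max(end for _, end in spans) - min(start for start, _ in spans))
    return (len(spans), len(spans) / width)


keywords = ["Organization of African States", "members"]
passages = [
    "Members of African states and an organization",  # collocation broken up
    "The Organization of African States, founded long ago, today counts many members",
    "The Organization of African States has many members",
]
for text in sorted(passages, key=lambda text: passage_key(text, keywords), reverse=True):
    print(passage_key(text, keywords), text)
```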

PALANTIR: Topic Representation
• Harvest “topic signatures” from text
  • ??
• Find relationships between topic signatures
  • Use syntax- and semantics-based relations between verbs and their arguments
  • Use context-based relations that exist between entities
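The slide leaves the harvesting method open. One standard way to build topic signatures, from Lin and Hovy (2000), ranks terms by Dunning’s log-likelihood ratio, comparing a term’s rate in topic-relevant text against a background corpus. The sketch below implements that test on toy token lists, assuming both lists are non-empty; whether PALANTIR used this exact procedure is not stated.

```python
import math
from collections import Counter


def _log_l(p: float, k: int, n: int) -> float:
    """Binomial log-likelihood k*log(p) + (n-k)*log(1-p), with 0*log(0) taken as 0."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1 - p)
    return out


def llr(k1: int, n1: int, k2: int, n2: int) -> float:
    """Dunning's -2 log lambda: does the term's rate differ between topic and background?"""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2.0 * (_log_l(p1, k1, n1) + _log_l(p2, k2, n2)
                  - _log_l(p, k1, n1) - _log_l(p, k2, n2))


def topic_signature(topic_tokens: list[str], background_tokens: list[str],
                    top_n: int = 5) -> list[str]:
    """Terms over-represented in the topic text, ranked by log-likelihood ratio."""
    t_counts, b_counts = Counter(topic_tokens), Counter(background_tokens)
    n1, n2 = len(topic_tokens), len(background_tokens)
    scored = []
    for term, k1 in t_counts.items():
        k2 = b_counts[term]
        if k1 / n1 > k2 / n2:  # keep only terms more frequent in the topic set
            scored.append((term, llr(k1, n1, k2, n2)))
    scored.sort(key=lambda ts: ts[1], reverse=True)
    return [term for term, _ in scored[:top_n]]


topic = "submarine sank navy submarine rescue the navy submarine the".split()
background = "the market the economy the weather rescue the city".split()
print(topic_signature(topic, background, top_n=3))  # ['submarine', 'navy', 'sank']
```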

PALANTIR: Lexicon Generation
• Q: Relationship questions have no single semantic answer type; how can appropriate answers be identified in passages?
• A: By generating set types on the fly, of course!
  • Use a weakly supervised learning approach to identify semantic sets in the question (South American countries), then the keywords relevant to each set
  • Automatically generate a large database of syntactic frames that represent semantic relations
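To illustrate weakly supervised set discovery, the sketch below swaps in a simple bootstrapping technique: starting from a few seed members, any capitalized item coordinated with a known member (“Brazil, Peru and Chile”) is added to the set, and the process repeats. The coordination pattern, the `min_support` threshold, and the sentences are hypothetical; the slide does not describe PALANTIR’s actual learner or its database of syntactic frames.

```python
import re
from collections import Counter

# A capitalized word or run of capitalized words, e.g. "Chile" or "New Zealand".
CAP = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
CAP_RE = re.compile(CAP)
# A coordination of capitalized items: "A, B and C", "A or B", "A, B, and C".
LIST_RE = re.compile(rf"{CAP}(?:, {CAP})*,? (?:and|or) {CAP}")


def expand_set(seeds: set[str], sentences: list[str],
               rounds: int = 2, min_support: int = 1) -> set[str]:
    """Bootstrap a semantic set: items coordinated with known members become members."""
    known = set(seeds)
    for _ in range(rounds):
        support = Counter()
        for sentence in sentences:
            for match in LIST_RE.finditer(sentence):
                members = CAP_RE.findall(match.group(0))
                if known & set(members):  # this list shares a member with the current set
                    support.update(m for m in members if m not in known)
        new = {item for item, count in support.items() if count >= min_support}
        if not new:
            break
        known |= new
    return known


SENTENCES = [
    "Exports from Brazil, Peru and Chile grew last year.",
    "Talks between Chile and Argentina resumed.",
    "Markets in Argentina, Uruguay or Paraguay were calm.",
    "Delegates from France and Germany met in Paris.",
]
print(sorted(expand_set({"Brazil"}, SENTENCES, rounds=3)))
# ['Argentina', 'Brazil', 'Chile', 'Paraguay', 'Peru', 'Uruguay']
```

Each extra round lets the set reach one more “hop” of coordination, which is why a cap on rounds (or a higher `min_support`) is the usual guard against drift.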

Results: PowerAnswer-2 and PALANTIR

Summary
• WebBooster: 20.8% increase
• COGEX: 12.4% increase
• Temporal Reasoner: 2% increase
• Nugget-pattern discovery: 22.8% F-measure
• PALANTIR strategies:
