QUALIFIER in TREC-12 QA Main Task Hui Yang, Hang Cui, Min-Yen Kan, Mstislav Maslennikov, Long Qiu, Tat-Seng Chua School of Computing National University of Singapore Email: yangh@comp.nus.edu.sg
Outline • Introduction • Factoid Subsystem • List Subsystem • Definition Subsystem • Result • Conclusion and Future Work
Introduction • Given a question and a large text corpus, return an “answer” rather than relevant “documents” • QA lies at the intersection of IR, IE, and NLP • Our system – QUALIFIER • Consists of 3 subsystems • External resources – Web, WordNet, Ontology • Event-based Question Answering • New modules introduced
Outline • Introduction • Factoid Subsystem • List Subsystem • Definition Subsystem • Result • Conclusion and Future Work
Factoid Subsystem • Detailed Question Analysis • QA Event Construction • QA Event Mining • Answer Selection • Answer Justification • Fine-grained Named Entity Recognition • Anaphora Resolution • Canonicalization Coreference • Successive Constraint Relaxation
Why Event-based QA - I • The world consists of two basic types of things: entities and events, and people often ask questions about both. • From Question Answering’s point of view • Questions = “enquiries about entities or events”.
Why Event-based QA - II • QA Entities • “Anything having existence (living or nonliving)” • E.g. “What is the Democratic Party symbol?” • QA Events • “Something that happens at a given place and time” • E.g. “How did the donkey become the Democratic Party symbol?” [Image: Thomas Nast’s 1870 Harper’s Weekly cartoon]
Why Event-based QA - III (Table 1: Correspondence of WH-Questions & Event Elements) • Entity Questions ask about entity properties, or about the entities themselves (definition questions). • Event Questions ask about the elements of events: location, time, subject, object, quantity, description, action, etc.
question ::= event | event_element | entity | entity_property
event ::= { event_element }
event_element ::= time | location | subject | object | quantity | description | action | other
entity ::= object | subject
entity_property ::= quantity | description | other
Event-based QA Hypothesis • Equivalency: for QA events Ei, Ej, if all_elements(Ei) = all_elements(Ej), then Ei = Ej, and vice versa; • Generality: if all_elements(Ei) is a subset of all_elements(Ej), then Ei is more general than Ej; • Cohesiveness: if elements a, b both belong to an event Ei, and a, c do not belong to a known event, then co-occurrence(a, b) is greater than co-occurrence(a, c); • Predictability: if elements a, b both belong to an event Ei, then a => b and b => a.
QA Event Space • Consider an event to be a point in a multi-dimensional QA event space. • If we know all the elements of an event, then we can easily answer different questions about it • E.g. “When did Bob Marley die?” • Since there are innate associations among elements that belong to the same event (Cohesiveness), we can use what is already known • To narrow the search scope • To find the rest of the unknown event elements, i.e., the answer (Predictability)
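As an illustration, here is a minimal sketch of the event-space view; the event data and element names are illustrative only, not QUALIFIER’s actual representation. Answering a question amounts to reading off the asked-for dimension of a fully known event:

```python
# A QA event as a point in a multi-dimensional element space.
# The event data and element names here are illustrative only.
event = {
    "subject": "Bob Marley",
    "action": "die",
    "time": "May 11, 1981",
    "location": "Miami",
}

def answer(event, asked_element):
    """Answer a question by reading off one dimension of the event."""
    return event.get(asked_element, "unknown")

# "When did Bob Marley die?" asks for the time element.
print(answer(event, "time"))  # May 11, 1981
```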
Problems to be Solved • However, in most cases it is difficult to find the correct unknown element(s), i.e., the correct answer • Two major problems: • Insufficient known elements • Inexact known elements • Solution: • Explore the use of world knowledge (the Web and WordNet glosses) to find more known elements • Exploit lexical knowledge (WordNet synsets and morphology) to find exact forms
How to Find a QA Event • Using the Web • From the original query terms q(0), retrieve the top N web documents • For each qi(0) ∈ q(0), extract nearby non-trivial words (within the same sentence or up to n words away), collect them in Cq, and rank them by their probability of correlation with qi(0) • Using WordNet • For each qi(0) ∈ q(0), extract terms that are lexically related to qi(0) by locating them in the gloss Gq and synset Sq • Combine the external knowledge resources to form the term collection Kq = Cq ∪ (Gq ∪ Sq)
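A rough sketch of the Web step, assuming simple whitespace tokenization and an illustrative stopword list; the correlation probability QUALIFIER actually uses is more elaborate, so raw co-occurrence frequency stands in for it here:

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and", "to", "is"}  # illustrative list

def candidate_terms(snippets, query_term, window=5):
    """Collect non-trivial words within `window` words of the query term
    and rank them by how often they co-occur with it (a stand-in for the
    correlation probability)."""
    counts = Counter()
    for snippet in snippets:
        words = snippet.lower().split()
        for i, w in enumerate(words):
            if w == query_term.lower():
                for n in words[max(0, i - window): i + window + 1]:
                    if n != w and n.isalpha() and n not in STOPWORDS:
                        counts[n] += 1
    total = sum(counts.values()) or 1
    return [(t, c / total) for t, c in counts.most_common()]
```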
QA Event Construction • Structured Query Formulation • We perform structural analysis on Kq to form semantic groups of terms • Given any two distinct terms ti, tj ∈ Kq, we compute their • Lexical correlation • Co-occurrence correlation • Distance correlation
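The slide does not give the exact scoring functions, so the following sketch uses plausible stand-ins for the co-occurrence and distance correlations between two terms over a set of passages:

```python
def cooccurrence(ti, tj, passages):
    """Fraction of passages that contain both terms."""
    both = sum(1 for p in passages if ti in p and tj in p)
    return both / max(len(passages), 1)

def avg_distance(ti, tj, passages):
    """Average word distance between the two terms when they co-occur;
    a smaller distance suggests the terms belong to the same semantic group."""
    dists = []
    for p in passages:
        words = p.split()
        pos_i = [k for k, w in enumerate(words) if w == ti]
        pos_j = [k for k, w in enumerate(words) if w == tj]
        if pos_i and pos_j:
            dists.append(min(abs(a - b) for a in pos_i for b in pos_j))
    return sum(dists) / len(dists) if dists else float("inf")
```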
QA Event Construction • For example, for “What Spanish explorer discovered the Mississippi River?”, the final Boolean query becomes: “(Mississippi) & (French | Spanish) & (Hernando & Soto & De) & (1541) & (explorer) & (first | European | river)”.
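A small sketch of how such a query could be assembled; the semantic grouping (which terms are ANDed versus ORed) is hard-coded here, whereas in the real system it comes from the correlation analysis above:

```python
def group_expr(terms, op):
    """Render one semantic group, joining terms with & or |."""
    return "(" + f" {op} ".join(terms) + ")"

def boolean_query(groups):
    """AND the rendered groups together."""
    return " & ".join(group_expr(terms, op) for terms, op in groups)

# Groups for the Mississippi example; grouping decisions are hard-coded.
groups = [(["Mississippi"], "&"), (["French", "Spanish"], "|"),
          (["Hernando", "Soto", "De"], "&"), (["1541"], "&"),
          (["explorer"], "&"), (["first", "European", "river"], "|")]
print(boolean_query(groups))
# (Mississippi) & (French | Spanish) & (Hernando & Soto & De)
#   & (1541) & (explorer) & (first | European | river)
```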
QA Event Mining • Extract important association rules among the elements by using data mining techniques. • Given a QA event Ei, we define X, Y as two sets of event elements. • Event mining studies rules of the form X → Y, where X, Y are QA event element sets, X ∩ Y = ∅, and Y ∩ {element_original} = ∅. • If X ∩ Y ≠ ∅, ignore X → Y. • If cardinality(Y) > 1, ignore X → Y. • If Y ∩ {element_original} ≠ ∅, ignore X → Y.
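A minimal sketch of the three pruning filters, with rules represented as Python sets of elements; the element names in the example comment are illustrative:

```python
def keep_rule(X, Y, original_elements):
    """Apply the three pruning filters to a candidate rule X -> Y."""
    if X & Y:                   # antecedent and consequent must be disjoint
        return False
    if len(Y) > 1:              # consequent must be a single element
        return False
    if Y & original_elements:   # consequent must predict a *new* element
        return False
    return True

# keep_rule({"Bob Marley", "die"}, {"1981"}, {"Bob Marley", "die"}) -> True
```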
Passage & Answer Selection • Select passages based on the Answer Event Score (AES) from the relevant documents in the QA corpus • Support(X → Y) = P(X ∪ Y), the fraction of passages containing all elements of X and Y • Confidence(X → Y) = Support(X ∪ Y) / Support(X) • The weight for answer candidate j combines the support and confidence of the mined rules that predict j
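A sketch of the support and confidence computations over candidate passages, following the standard association-rule definitions; substring containment stands in for proper element matching:

```python
def support(itemset, passages):
    """Fraction of passages containing every element of the itemset."""
    hits = sum(1 for p in passages if all(e in p for e in itemset))
    return hits / max(len(passages), 1)

def confidence(X, Y, passages):
    """Confidence of X -> Y: support of X ∪ Y divided by support of X."""
    s_x = support(X, passages)
    return support(X | Y, passages) / s_x if s_x else 0.0
```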
Related Modules: Fine-grained Named Entity Recognition • Fine-grained NE tagging • Non-ASCII character remover • Number format converter • E.g. “one hundred eleven” => 111 • Rule conflict resolver • Longer length • Ontology • Handcrafted priorities
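A minimal sketch of the number format converter; it covers only a partial English number vocabulary (no teens beyond twelve, for instance), whereas a real converter needs a fuller word list:

```python
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "twenty": 20, "thirty": 30,
         "forty": 40, "fifty": 50, "sixty": 60, "seventy": 70,
         "eighty": 80, "ninety": 90}
SCALES = {"hundred": 100, "thousand": 1000, "million": 10**6}

def words_to_number(text):
    """Convert a simple English number phrase to an integer,
    e.g. 'one hundred eleven' -> 111."""
    total, current = 0, 0
    for word in text.lower().replace("-", " ").split():
        if word in UNITS:
            current += UNITS[word]
        elif word in SCALES:
            scale = SCALES[word]
            if scale == 100:
                current *= scale        # "one hundred" -> 100
            else:
                total += current * scale  # close out a thousand/million group
                current = 0
    return total + current

print(words_to_number("one hundred eleven"))  # 111
```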
Related Modules: Answer Justification • We generate axioms based on our manually constructed ontology. For example: • q1425: What is the population of Maryland? • Sentence: “Maryland’s population is 50,000 and growing rapidly.” • Ontology Axiom (OA): Maryland(c1) & population(c1, c2) -> 5000000(c2) • In this way, we can reject the wrong answer “50,000”, which is only the surface text shown.
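A simplified sketch of axiom-based justification, with the ontology reduced to a lookup table; this is a stand-in for the first-order axioms above, not the actual inference machinery:

```python
# Ontology facts keyed by (entity, property); values are the sanctioned
# answers. A simplified stand-in for the first-order axioms above.
ONTOLOGY = {("Maryland", "population"): "5000000"}

def justify(entity, prop, candidate):
    """Accept a candidate answer only if it agrees with the ontology."""
    expected = ONTOLOGY.get((entity, prop))
    return expected is None or candidate == expected

print(justify("Maryland", "population", "50000"))    # False: rejected
print(justify("Maryland", "population", "5000000"))  # True
```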
Outline • Introduction • Factoid Subsystem • List Subsystem • Definition Subsystem • Result • Conclusion and Future Work
List Subsystem • Multiple answers from the same paragraph • Canonicalization resolution • Map variants to a unique answer • E.g. “the States”, “USA”, “United States”, etc. • Pattern-based answer extraction • <same_type_NE>, <same_type_NE> and <same_type_NE> + verb … • … include: <same_type_NE>, <same_type_NE>, <same_type_NE> … • “list of …” • “top” + number + adj-superlative
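A sketch of the first enumeration pattern, with capitalized tokens standing in for same-type named entities; the real system matches NE tags produced by the fine-grained NE recognizer, not raw surface words:

```python
import re

# Capitalized tokens stand in for same-type named entities here.
ENUM = re.compile(
    r"((?:[A-Z][a-zA-Z]+)(?:,\s*[A-Z][a-zA-Z]+)*\s+and\s+[A-Z][a-zA-Z]+)"
)

def extract_list(text):
    """Pull the members out of an '<NE>, <NE>, ... and <NE>' enumeration."""
    m = ENUM.search(text)
    if not m:
        return []
    return [e.strip() for e in re.split(r",\s*|\s+and\s+", m.group(1))]

print(extract_list("Brazil, Argentina, Peru and Chile signed the pact."))
# ['Brazil', 'Argentina', 'Peru', 'Chile']
```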
Outline • Introduction • Factoid Subsystem • List Subsystem • Definition Subsystem • Result • Conclusion and Future Work
Definition Subsystem • Pre-processing • Document filtering • Anaphora resolution • Partition sentences into a “positive set” and a “negative set” • Sentence Ranking • Sentence weighting in the corpus • Sentence weighting on the Web • Overall weighting combines the corpus and Web weights
Definition Subsystem • Answer Generation (Progressive Maximal Margin Relevance) 1) Order all sentences in descending order of weight. 2) Add the first sentence to the summary. 3) Examine the following sentences: if Weight(stc) − Weight(next_stc) > avg_sim(stc), add next_stc to the summary. 4) Go to Step 3 until the length limit of the target summary is satisfied.
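A sketch of this selection loop; sentences are (text, weight) pairs, and avg_sim is assumed to be a precomputed similarity-threshold function, which the slide does not define:

```python
def pmmr_summary(sentences, avg_sim, max_len):
    """Progressive MMR selection as sketched above: walk the weight-sorted
    list and admit the next sentence while the weight drop from the last
    admitted sentence exceeds its average-similarity threshold."""
    ranked = sorted(sentences, key=lambda s: s[1], reverse=True)
    if not ranked:
        return []
    summary = [ranked[0]]                       # Step 2: seed with the top sentence
    for cand in ranked[1:]:                     # Step 3: examine the rest in order
        if sum(len(s) for s, _ in summary) >= max_len:
            break                               # Step 4: length limit reached
        last = summary[-1]
        if last[1] - cand[1] > avg_sim(last[0]):
            summary.append(cand)
    return [s for s, _ in summary]
```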
Definition Results • We empirically set the length of the summary for People and Objects based on question classification results.
Outline • Introduction • Factoid Subsystem • List Subsystem • Definition Subsystem • Result • Conclusion and Future Work
Conclusion and Future Work • Conclusion • Event-based Question Answering • Factoid and list questions exploit the power of event-based QA • Definition question answering combines IR and summarization • Ontology is used to boost the performance of our NE and answer justification modules • Future Work • Give a formal proof of our QA event hypothesis • Work towards an online question answering system • Interactive QA • Analysis and opinion questions • VideoQA – question answering on news video