Finding What Matters in Questions
Xiaoqiang Luo, Hema Raghavan, Vittorio Castelli, Sameer Maskey and Radu Florian
IBM T.J. Watson Research Center
NAACL-HLT 2013
Introduction
• E.g.: “How does one apply for a New York day care license?”
• Top-scoring result under a bag-of-words model:
  • “New licenses for day care centers in York county, PA”
• MMP model:
  • searches with the three phrases “New York,” “day care,” and “license”
• We call these important phrases mandatory matching phrases (MMPs)
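Question Corpus
• A subset of the DARPA BOLT corpus containing forum postings in English.
• Four people selected the questions.
• The following five kinds of questions were not used:
  • questions whose answers require reasoning or calculation
  • questions that are unclearly described or ambiguous
  • questions that can be split into several questions
  • multiple choice questions
  • factoid questions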
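Question Corpus
• Two annotators labeled the MMP type (MMP-Must, MMP-Maybe) and the span of each MMP in the selected questions
• E.g. (annotated example): the spans do not overlap and each span is contiguous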
Generate MMP Training Instances
Generate MMP Training Instances
• MMP type:
  • MMP-Must: +1
  • MMP-Skip: -1
  • MMP-Maybe: -1
• Output instances:
  • <span, MMP type>
• E.g.: hedge funds = <(5, 6), +1>
[Figure: parse tree with node depths 0 to 6 and word positions 0 to 9; instances shown: <(4, 4), +1>, <(4, 6), +1>, <(5, 6), +1>, <(7, 9), +1>, <(9, 9), +1>]
MMP Features
Lexical Features:
• CaseFeatures:
  • Is the first word of an MMP upper-case?
  • Is it all capital letters?
  • Does it contain digits?
• E.g.: For “(NP American)” in Figure 1, the upper-case feature fires.
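A minimal sketch of the three case checks, assuming whitespace tokenization and assuming that the all-capitals test applies to the whole phrase (the slide does not say whether it means the first word or the phrase). The function name `case_features` and the feature keys are illustrative and do not come from the paper.

```python
def case_features(phrase: str) -> dict:
    """Case-related lexical features for a candidate MMP (illustrative names)."""
    words = phrase.split()
    first_word = words[0] if words else ""
    return {
        # Does the first word start with an upper-case letter?
        "first_upper": first_word[:1].isupper(),
        # Are all cased characters upper-case? (assumed scope: the whole phrase)
        "all_caps": phrase.isupper(),
        # Does the phrase contain any digit?
        "has_digit": any(ch.isdigit() for ch in phrase),
    }
```

For the slide's example phrase, `case_features("American")` fires only the first-word upper-case feature.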
MMP Features
Lexical Features:
• CommonQWord:
  • Does the MMP contain question words such as “What,” “When,” “Who,” etc.?
MMP Features
Syntactic Features:
• PhraseLabel:
  • this feature returns the phrasal label of the MMP.
• E.g.: For “(NP American)” in Figure 1, the feature value is “NP.”
MMP Features
Syntactic Features:
• NPUnique:
  • this Boolean feature fires if a phrase is the only NP in a question
• E.g.: For “(NP American),” the feature value would be false.
MMP Features
Syntactic Features:
• PosOfPTN:
  • (1) the position of the left-most word of the node
  • (2) whether the left-most word is the beginning of the question
  • (3) the depth of the anchoring node, defined as the length of the path to the root node.
Example of PosOfPTN
• E.g.: For “(NP American)” in Figure 1:
  • its left-most word is the 5th word in the sentence
  • it is not the first word of the sentence
  • the depth of the node is 6
[Figure 1: parse tree annotated with node depths 0 to 6 and word positions 1 to 10]
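The three PosOfPTN values can be read off a parse tree with one recursive walk. The sketch below assumes a toy tree encoding (a word is a string, a node is a tuple `(label, child, ...)`), 1-based word positions, and a root depth of 0. The tree used in the usage note is hypothetical and is not Figure 1.

```python
def pos_of_ptn(tree):
    """Return one feature dict per node of a toy parse tree.

    A tree is either a word (str) or a tuple (label, child, child, ...).
    """
    features = []

    def visit(node, start, depth):
        # A leaf is a word: it occupies one position and yields no features.
        if isinstance(node, str):
            return start + 1
        label, *children = node
        next_pos = start
        for child in children:
            next_pos = visit(child, next_pos, depth + 1)
        features.append({
            "label": label,
            "left_pos": start,                # (1) position of the left-most word
            "is_question_start": start == 1,  # (2) does the node start the question?
            "depth": depth,                   # (3) path length to the root
        })
        return next_pos

    visit(tree, 1, 0)
    return features
```

For example, with `("SBARQ", ("WHNP", ("WP", "Who")), ("SQ", ("VBD", "founded"), ("NP", ("NNP", "Apple"))))`, the NP node starts at position 3, does not start the question, and has depth 2.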
MMP Features
Syntactic Features:
• PhrLenToQLenRatio:
  • this feature computes the number of words in an MMP and its ratio to the sentence length.
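As a small illustration, the sketch below returns both values, assuming whitespace tokenization and that sentence length is counted in words. The function name is hypothetical.

```python
def phr_len_to_qlen_ratio(phrase: str, question: str):
    """Return (number of words in the phrase, phrase length / question length)."""
    phrase_len = len(phrase.split())
    question_len = len(question.split())
    # Guard against an empty question to avoid dividing by zero.
    ratio = phrase_len / question_len if question_len else 0.0
    return phrase_len, ratio
```

For “day care” in the 11-word question “How does one apply for a New York day care license”, this gives a length of 2 and a ratio of 2/11.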
MMP Features
Semantic Features (NETypes):
• The feature tests if a phrase is or contains a named entity and, if this is the case, the value is the entity type.
• Uses an information extraction (IE) pipeline consisting of syntactic parsing, mention detection and coreference resolution (Florian et al., 2004; Luo et al., 2004; Luo and Zitouni, 2005)
• E.g.: For “(NP American)” in Figure 1, the feature value would be “GPE.”
MMP Features
Corpus-based Features (AvgCorpusIDF):
• This group of features computes the average of the IDFs of the words in the phrase.
• Also takes into account the stop words in the phrase
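A minimal sketch of the averaging step. The slides do not give the IDF formula, so this assumes the common definition log(N / df) over a small in-memory corpus, whitespace tokenization, and a caller-supplied value for unseen words. `build_idf` and `avg_corpus_idf` are illustrative names.

```python
import math
from collections import Counter


def build_idf(documents):
    """Map each word to log(N / df), where df is its document frequency."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document.
        doc_freq.update(set(doc.lower().split()))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def avg_corpus_idf(phrase, idf, unseen_idf=0.0):
    """Average the IDF values of the words in the phrase."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    return sum(idf.get(w, unseen_idf) for w in words) / len(words)
```

With this definition, a word that occurs in every document (for instance a stop word) gets an IDF of 0 and pulls the phrase average down.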
MMP Classification Results
Classifier:
• a binary logistic regression classifier trained with WEKA.
Data set:
Performance of the MMP Classifier
Example Questions by MMP Model
Data for Relevance Model
• From the BOLT-IR task (IR, 2012)
• The top snippets returned by the search engine are judged for relevance by our annotators.
Relevance Prediction
• The relevance model is a conditional distribution P(r | q, s; D), where:
  • r is a binary random variable indicating whether the candidate snippet s is relevant to the question q
  • D is the document in which the snippet s is found
Relevance Prediction
Baseline system
• (1) Text Match Features:
  • cosine scores between the query and the snippet
• (2) Answer Type Features:
  • the top 3 predictions of a statistical classifier trained to predict answer categories are used as features.
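To make the text match feature concrete, here is a minimal cosine score over bag-of-words vectors. The slides do not specify the term weighting, so raw term counts and whitespace tokenization are assumptions, and `cosine_score` is an illustrative name.

```python
import math
from collections import Counter


def cosine_score(query: str, snippet: str) -> float:
    """Cosine similarity between raw term-count vectors of two texts."""
    q = Counter(query.lower().split())
    s = Counter(snippet.lower().split())
    # A Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * s[word] for word, count in q.items())
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in s.values()))
    return dot / norm if norm else 0.0
```

A score like this rewards shared words regardless of their order, which is why a bag-of-words match can rank a snippet about York county above one about New York.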
Relevance Prediction
Baseline system
• (3) Mention Match Features:
  • whether a named entity in the query occurs in the snippet.
Relevance Prediction
Baseline system
• (4) Event Match Features:
  • use several hand-crafted dictionaries containing terms exclusive to various types of events, such as “violence”, “legal” and “election”.
  • if both the query and the snippet contain the same event type, the feature takes the value 1.
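A rough sketch of how such a dictionary lookup could work. The term lists below are made up for illustration (the authors' dictionaries are not shown in the slides), and the feature layout of one 0/1 value per event type is an assumption.

```python
# Hypothetical event dictionaries; the real hand-crafted lists are not in the slides.
EVENT_TERMS = {
    "violence": {"attack", "bombing", "riot"},
    "legal": {"court", "lawsuit", "trial"},
    "election": {"vote", "ballot", "candidate"},
}


def event_types(text, dictionaries=EVENT_TERMS):
    """Return the event types whose dictionary shares a term with the text."""
    tokens = set(text.lower().split())
    return {etype for etype, terms in dictionaries.items() if tokens & terms}


def event_match_features(query, snippet, dictionaries=EVENT_TERMS):
    """One feature per event type: 1 if query and snippet both contain it, else 0."""
    shared = event_types(query, dictionaries) & event_types(snippet, dictionaries)
    return {etype: int(etype in shared) for etype in dictionaries}
```

Note that the query and the snippet do not need to share a word: “vote” in the query and “candidate” in the snippet both map to the same event type.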
Relevance Prediction
Baseline system
• (5) Snippet Statistics:
  • features such as the snippet length and the position of the snippet in the post.
Relevance Prediction
Features Derived from MMP
• HardMatch:
  • Let I(m ∈ s) be a 0/1 indicator function that is 1 if the snippet s contains the MMP m
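The indicator I(m ∈ s) can be sketched as a contiguous token match. The slide's HardMatch formula is not preserved here, so the aggregation in `hard_match` (the score-weighted fraction of MMPs found in the snippet) is only one plausible form and should be read as an assumption, as should the MMP scores in the usage note.

```python
def contains_mmp(mmp: str, snippet: str) -> int:
    """I(m in s): 1 if the MMP's words occur contiguously in the snippet, else 0."""
    m = mmp.lower().split()
    s = snippet.lower().split()
    if not m:
        return 0
    return int(any(s[i:i + len(m)] == m for i in range(len(s) - len(m) + 1)))


def hard_match(mmp_scores: dict, snippet: str) -> float:
    """Assumed aggregation: score-weighted fraction of MMPs contained in the snippet."""
    total = sum(mmp_scores.values())
    if total == 0:
        return 0.0
    matched = sum(score * contains_mmp(m, snippet) for m, score in mmp_scores.items())
    return matched / total
```

On the introduction's example snippet “New licenses for day care centers in York county, PA”, only “day care” matches: “New” and “York” are not adjacent, and “licenses” is not the same token as “license”. With hypothetical scores 0.9, 0.8 and 0.7 for the three MMPs, the value is 0.8 / 2.4.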
Relevance Prediction
Features Derived from MMP
• SoftLMMatch:
  • The SoftLMMatch score is a language-model (LM) based score, similar to that used by Bendersky and Croft (2008), except that MMPs play the role of concepts.
Relevance Prediction
Features Derived from MMP
• SoftLMMatch, where:
  • w_i is the i-th word in snippet s
  • I(w_i = v) is an indicator function, taking value 1 if w_i is v and 0 otherwise
  • |V| is the vocabulary size
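The definitions above (word counts via I(w_i = v) and the vocabulary size |V|) are the ingredients of a smoothed unigram language model of the snippet. Since the slide's formula is not preserved, the sketch below assumes additive smoothing with a hypothetical constant `alpha`, and scores an MMP by the log-likelihood of its words. Both choices are illustrative, not the authors' exact definition.

```python
import math
from collections import Counter


def smoothed_unigram_lm(snippet_words, vocab_size, alpha=0.5):
    """Return P(v | s) with additive smoothing (alpha is an assumed constant)."""
    counts = Counter(snippet_words)  # counts[v] = sum_i I(w_i = v)
    denom = len(snippet_words) + alpha * vocab_size

    def prob(v):
        return (counts[v] + alpha) / denom

    return prob


def mmp_log_likelihood(mmp_words, prob):
    """Assumed soft-match score: log-probability of the MMP's words under the LM."""
    return sum(math.log(prob(w)) for w in mmp_words)
```

Smoothing keeps the score finite when an MMP word is absent from the snippet, which is what makes the match soft rather than hard.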
Relevance Prediction
Features Derived from MMP
• MMPInclScore, where:
  • w ∈ m are the words in m
  • I(·) is the indicator function, taking value 1 when its argument is true and 0 otherwise
  • the threshold is a constant
  • l(w, s) is the similarity of word w to the snippet s, defined as:
    • l(w, s) = max_{v ∈ s} JW(w, v)
    • JW(w, v) is the Jaro-Winkler similarity score between words w and v
Relevance Prediction
Features Derived from MMP
• MMPInclScore:
  • The MMP weighted inclusion score between the question q and snippet s is computed as:
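The inclusion score can be sketched end to end with a small Jaro-Winkler implementation. The slide's formulas are not preserved, so three things here are assumptions: the threshold value 0.9, the per-MMP score as the fraction of MMP words whose l(w, s) exceeds the threshold, and the question-level score as an MMP-score-weighted average. The Jaro-Winkler code uses the standard definition with a prefix scale of 0.1 and a prefix cap of 4.

```python
def jaro(a: str, b: str) -> float:
    """Standard Jaro similarity between two strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i, ch in enumerate(a):
        if a_hit[i]:
            while not b_hit[k]:
                k += 1
            out_of_order += ch != b[k]
            k += 1
    t = out_of_order // 2
    return (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3


def jaro_winkler(a: str, b: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (at most 4)."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * scale * (1 - j)


def word_to_snippet_sim(word, snippet_words):
    """l(w, s) = max over v in s of JW(w, v)."""
    return max((jaro_winkler(word, v) for v in snippet_words), default=0.0)


def mmp_inclusion(mmp, snippet_words, threshold=0.9):
    """Assumed per-MMP score: fraction of MMP words with l(w, s) above the threshold."""
    words = mmp.lower().split()
    if not words:
        return 0.0
    hits = sum(word_to_snippet_sim(w, snippet_words) > threshold for w in words)
    return hits / len(words)


def weighted_inclusion(mmp_scores, snippet, threshold=0.9):
    """Assumed question-level score: MMP-score-weighted average of inclusion scores."""
    snippet_words = snippet.lower().split()
    total = sum(mmp_scores.values())
    if total == 0:
        return 0.0
    return sum(sc * mmp_inclusion(m, snippet_words, threshold)
               for m, sc in mmp_scores.items()) / total
```

Unlike a hard match, this lets “license” match “licenses” (Jaro-Winkler 0.975), so morphological variants in the snippet still count toward inclusion.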
Relevance Prediction
Features Derived from MMP
• MMPRankDep:
  • This feature, RD(q, s), first tests whether there exists a matched bilexical dependency between q and s
Relevance Prediction
Features Derived from MMP
• MMPRankDep:
  • Let m(i) be the i-th ranked MMP
  • Let <w_h, w_d | q> and <u_h, u_d | s> be bilexical dependencies from q and s, respectively
  • w_h and u_h are the heads
  • w_d and u_d are the dependents
Relevance Prediction
Features Derived from MMP
• MMPRankDep:
  • EQ(w, u)
    • EQ(w, u) is true if w and u are exactly the same, or their morphs are the same, or they head the same entity, or their WordNet synsets overlap
  • RD(q, s)
    • RD(q, s) is true if and only if
    • EQ(w_h, u_h) ∧ EQ(w_d, u_d) ∧ w_h ∈ m(i) ∧ w_d ∈ m(j) is true for some <w_h, w_d | q>, for some <u_h, u_d | s> and for some i and j.
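The RD(q, s) test can be sketched directly from its definition. In this sketch, dependencies are (head, dependent) pairs, and `simple_eq` is a deliberately simplified stand-in for EQ that only checks case-insensitive identity. The morphological, entity and WordNet checks named on the slide are not implemented, and a fuller equivalence test can be passed in through the `eq` parameter.

```python
def simple_eq(w: str, u: str) -> bool:
    """Simplified EQ: case-insensitive exact match only (stand-in for the full test)."""
    return w.lower() == u.lower()


def rank_dep(q_deps, s_deps, ranked_mmps, eq=simple_eq) -> bool:
    """RD(q, s): is there a matched dependency whose head and dependent lie in MMPs?

    q_deps, s_deps: lists of (head, dependent) word pairs.
    ranked_mmps: list of MMP strings, with m(1) first.
    """
    mmp_words = [set(m.lower().split()) for m in ranked_mmps]

    def in_some_mmp(word):
        return any(word.lower() in words for words in mmp_words)

    for w_h, w_d in q_deps:
        # w_h must be in m(i) and w_d in m(j) for some i and j.
        if not (in_some_mmp(w_h) and in_some_mmp(w_d)):
            continue
        for u_h, u_d in s_deps:
            if eq(w_h, u_h) and eq(w_d, u_d):
                return True
    return False
```

A dependency shared by the question and the snippet does not count unless both of its words come from MMPs, so a match on words such as “apply” and “one” leaves RD false.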
Relevance Prediction
Three snippet classifier models
• noMMP model:
  • a system without MMP features
• IDF-as-MMP model:
  • a baseline with each word as an MMP and the word’s IDF as the MMP score
• MMP model
Relevance Prediction
Performance of the three snippet classifier systems
End-to-End System Results
• The question-answering system was used in the 2012 BOLT IR evaluation (IR, 2012)
• The corpus contains 499K Arabic, 449K Chinese and 262K English threads.
• The Arabic and Chinese posts were first translated into English before being processed.
End-to-End System Results
• Performance
BOLT Evaluation Results
• The BOLT evaluation consists of 146 questions, mostly event- or topic-related
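Conclusions
• The authors present a QA system that uses mandatory matching phrases (MMPs)
• Extracting MMPs from questions reaches an F-measure of 88.6%
• Combining the MMP model with the snippet relevance model effectively improves the performance of the snippet relevance model
• The QA system using MMPs was the best-performing system in the 2012 BOLT IR evaluation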