Huizhong Duan, Yunbo Cao, Chin-Yew Lin and Yong Yu. Shanghai Jiao Tong University & MSRA. ACL 2008. Searching Questions by Identifying Question Topic and Question Focus.
Huizhong Duan, Yunbo Cao, Chin-Yew Lin and Yong Yu
Shanghai Jiao Tong University & MSRA
ACL 2008
Searching Questions by Identifying Question Topic and Question Focus
Rick Liu
Introduction
• Questions & their answers
  • A very large archive
  • Built up by online services
• Examples
  • Traditional FAQ services
  • Community-based Q&A services (emerging)
    • Yahoo! Answers, Live QnA, Baidu Zhidao
Motivation
• Question search
  • Help users find answers to previously asked questions
• Queried question: Any cool clubs in Berlin or Hamburg?
  • Relevant: What are the best/most fun clubs in Berlin?
  • Not relevant: Any nice hotels in Berlin or Hamburg?
  • Not relevant: How long does it take to Hamburg from Berlin?
  • Not relevant: Cheap hotels in Berlin?
Motivation
• Example: Any cool clubs in Berlin or Hamburg?
  • Question focus: cool clubs
  • Question topic: Berlin, Hamburg
Approach
• Identifying question topic & focus
  • Question tree
  • Determining the tree cut
• Modeling question topic & focus for search
  • Language model
Question Tree
• Topic terms
  • BaseNP, WH-ngram
• Topic profile
  • Probability distribution of a topic term over categories
• Specificity
  • Inverse of the entropy of the topic profile
• Topic chain
  • Topic terms ordered by specificity value (descending)
• Topic tree
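The definitions above (topic profile, specificity, topic chain) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the category names and counts are hypothetical, and the small constant `eps` is an assumption added only to avoid dividing by zero when a term occurs in a single category.

```python
import math
from collections import Counter


def topic_profile(category_counts):
    """Normalize a term's per-category counts into a probability distribution."""
    total = sum(category_counts.values())
    return {cat: n / total for cat, n in category_counts.items()}


def specificity(profile, eps=1e-3):
    """Inverse of the entropy of the topic profile (eps is an assumed smoothing constant)."""
    entropy = -sum(p * math.log(p) for p in profile.values() if p > 0)
    return 1.0 / (entropy + eps)


def topic_chain(terms, counts_by_term):
    """Order a question's topic terms by descending specificity."""
    return sorted(
        terms,
        key=lambda t: specificity(topic_profile(counts_by_term[t])),
        reverse=True,
    )


# Hypothetical counts: "hamburg" is concentrated in one category,
# "clubs" is spread evenly over four.
counts_by_term = {
    "hamburg": Counter({"Germany": 48, "Europe": 2}),
    "clubs": Counter({"Germany": 10, "Spain": 10, "Japan": 10, "Brazil": 10}),
}
chain = topic_chain(["clubs", "hamburg"], counts_by_term)
print(chain)
```

A term concentrated in few categories has low entropy and therefore high specificity, so it moves to the front of the chain.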
Question Tree Example
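Since the original figure is not available here, the following sketch shows one plausible way to build a question tree, assuming it is a prefix tree over topic chains. The chains and the nested-dict representation are hypothetical and chosen only for illustration.

```python
def build_question_tree(topic_chains):
    """Insert each topic chain into a nested-dict prefix tree (assumed structure)."""
    root = {}
    for chain in topic_chains:
        node = root
        for term in chain:
            # Reuse the child if the prefix already exists, else create it.
            node = node.setdefault(term, {})
    return root


# Hypothetical topic chains, already ordered by descending specificity.
chains = [
    ["hamburg", "berlin", "cool clubs"],
    ["hamburg", "berlin", "nice hotels"],
    ["berlin", "cheap hotels"],
]
tree = build_question_tree(chains)
print(tree)
```

Questions that share their most specific terms end up under the same branch, which is what lets a cut through the tree separate shared context from question-specific detail.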
Tree Cut Model
• M = (Γ, θ)
  • Γ = [C_1, C_2, ..., C_k], the tree cut
  • θ = [P(C_1), P(C_2), ..., P(C_k)], the probability parameter vector
• A cut is any set of nodes in the tree that defines a partition of all the leaf nodes
• Σ_{i=1..k} P(C_i) = 1
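As a rough illustration of the model, the sketch below checks whether a set of nodes is a valid cut, reading "cut" in the Li and Abe sense: the leaves under the chosen nodes must cover every leaf of the tree exactly once. The tree, its node names, and the helper functions are hypothetical.

```python
def leaves_under(tree, node):
    """Return the set of leaf nodes dominated by `node` (a leaf dominates itself)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(tree, child)
    return result


def is_tree_cut(tree, root, nodes):
    """True if the leaves under `nodes` cover every leaf of the tree exactly once."""
    covered = []
    for node in nodes:
        covered.extend(leaves_under(tree, node))
    no_overlap = len(covered) == len(set(covered))
    return no_overlap and set(covered) == leaves_under(tree, root)


# Hypothetical tree: parent -> list of children.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}

print(is_tree_cut(tree, "root", ["a", "b"]))        # valid cut
print(is_tree_cut(tree, "root", ["a", "a1", "b"]))  # a1 is covered twice
```

Once a cut is fixed, the parameter vector θ assigns one probability to each class in the cut, and those probabilities sum to 1.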
Tree Cut Model Example
• [n0, n11], [n12, n21, n22, n23], [n13, n24]
• [n11, n21, n22, n23, n24]
MDL-based Tree Cut Model
• Minimum Description Length (MDL)
• Ref: Li and Abe, 1998
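The slide's formula is not preserved, so the sketch below follows the general Li and Abe style of MDL scoring as an assumption: the description length of a cut is a parameter cost, (k/2)·log2|S| with k free parameters, plus a data cost, the negative log-likelihood of the sample when probability is spread uniformly over the leaves of each class. The leaf names and counts are hypothetical, and the authors' exact formulation may differ.

```python
import math


def description_length(cut, leaf_counts):
    """Parameter cost plus data cost for a cut.

    cut: list of classes, each a list of leaf names.
    leaf_counts: leaf name -> observed frequency.
    """
    sample_size = sum(leaf_counts.values())
    free_params = len(cut) - 1  # assumed: probabilities sum to 1, so k = |cut| - 1
    param_len = (free_params / 2) * math.log2(sample_size)

    data_len = 0.0
    for leaves in cut:
        class_count = sum(leaf_counts.get(leaf, 0) for leaf in leaves)
        if class_count == 0:
            continue  # no observations to encode for this class
        # Class probability is shared uniformly among the leaves of the class.
        p_leaf = class_count / sample_size / len(leaves)
        data_len -= class_count * math.log2(p_leaf)
    return param_len + data_len


# Hypothetical leaf frequencies and three candidate cuts.
leaf_counts = {"a1": 10, "a2": 10, "b1": 1, "b2": 1}
root_cut = [["a1", "a2", "b1", "b2"]]
mid_cut = [["a1", "a2"], ["b1", "b2"]]
leaf_cut = [["a1"], ["a2"], ["b1"], ["b2"]]

best = min([root_cut, mid_cut, leaf_cut], key=lambda c: description_length(c, leaf_counts))
print(best)
```

The coarse cut fits the data poorly and the finest cut pays for parameters it does not need, so the intermediate cut has the shortest description length in this toy example.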
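Determining the Tree Cut
• HEAD: the part of each topic chain before the cut (question topic)
• TAIL: the part of each topic chain after the cut (question focus)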
Modeling for Search
• P(q | q̃)
  • q: queried question
  • q̃: targeted question
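A minimal sketch of how P(q | q̃) might be computed, assuming the score is a linear mixture of a question-topic (head) language model and a question-focus (tail) language model, weighted by the λ discussed on a later slide. The mixture form, the Jelinek-Mercer smoothing, the `smoothing` value, and all data below are assumptions for illustration, not the authors' exact model.

```python
import math
from collections import Counter


def term_prob(term, segment, collection_counts, collection_size, smoothing=0.2):
    """Unigram probability of `term` given a segment, smoothed with the collection."""
    counts = Counter(segment)
    p_segment = counts[term] / len(segment) if segment else 0.0
    p_collection = collection_counts.get(term, 0) / collection_size
    return (1 - smoothing) * p_segment + smoothing * p_collection


def segment_prob(query_terms, target_terms, collection_counts, collection_size):
    """Probability of generating all query terms from the target segment."""
    return math.prod(
        term_prob(t, target_terms, collection_counts, collection_size)
        for t in query_terms
    )


def score(query, target, collection_counts, collection_size, lam=0.7):
    """Assumed mixture: lam * P(head | head) + (1 - lam) * P(tail | tail)."""
    head = segment_prob(query["head"], target["head"], collection_counts, collection_size)
    tail = segment_prob(query["tail"], target["tail"], collection_counts, collection_size)
    return lam * head + (1 - lam) * tail


# Hypothetical collection statistics and questions split into head (topic) and tail (focus).
collection_counts = {"berlin": 5, "clubs": 2, "hotels": 3}
collection_size = 10
queried = {"head": ["berlin"], "tail": ["clubs"]}
same_focus = {"head": ["berlin"], "tail": ["clubs"]}
other_focus = {"head": ["berlin"], "tail": ["hotels"]}

s_same = score(queried, same_focus, collection_counts, collection_size)
s_other = score(queried, other_focus, collection_counts, collection_size)
print(s_same, s_other)
```

Both candidates match the topic, but only the first matches the focus, so it receives the higher score.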
Experimental Data
• Yahoo! Answers
  • Resolved questions
  • travel: 314,616 items
  • computers & internet: 210,785 items
• Three fields
  • title (the only field used)
  • description
  • answers
Ground Truth
• Employed the Vector Space Model (VSM)
• Manual judgments: relevant / irrelevant
• Baselines: VSM, LMIR (language model for information retrieval)
• Evaluation: MAP (mean average precision), R-precision, MRR (mean reciprocal rank)
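For reference, the three evaluation measures can be computed per query as sketched below, using their standard definitions; MAP and MRR are then the means of average precision and reciprocal rank over all queries. The question identifiers and judgments are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant items appear."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    """Average over queries (gives MAP or MRR when fed per-query values)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical ranking and manual judgments for one queried question.
ranked = ["q2", "q3", "q5", "q4"]
relevant = {"q2", "q5"}

ap = average_precision(ranked, relevant)
rp = r_precision(ranked, relevant)
rr = reciprocal_rank(ranked, relevant)
print(ap, rp, rr)
```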
Results for ‘travel’
Results for ‘computers & internet’
About λ
• Places more emphasis on the question topic
Error Analysis (travel)
• Examined the correctness of the identified question topics and question foci
• 200 queried questions, of which 69 were identified incorrectly
  • (a) Only the head part is present (59)
  • (b) Incorrect order (10)
• (a) explains why λ is 0.7
Related Work
• FAQ data
• Community-based
  • Jeon et al., 2005
  • Compared four different retrieval methods
    • Vector space model
    • Okapi
    • Language model
    • Translation-based model
  • The translation-based model performed best
Translation Model
• Lexical chasm
  • Where to stay in Hamburg?
  • The best hotel in Hamburg?
• IBM Model 1
• Use question titles and question descriptions as the parallel corpus
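To show how word translation probabilities could bridge the lexical chasm, here is a compact sketch of IBM Model 1 training with EM. It is a simplified version (no NULL word), it assumes title words are the source side and description words the target side, and the tiny parallel corpus is hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM (simplified: no NULL word)."""
    source_vocab = {s for source, _ in pairs for s in source}
    # Uniform start; the exact constant does not matter because the
    # first M-step renormalizes every source word's distribution.
    t = defaultdict(lambda: 1.0 / len(source_vocab))

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words of its pair.
        for source, target in pairs:
            for w in target:
                norm = sum(t[(w, s)] for s in source)
                for s in source:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize expected counts per source word.
        for (w, s), c in count.items():
            t[(w, s)] = c / total[s]
    return dict(t)


# Hypothetical (title words, description words) pairs.
pairs = [
    (["stay", "hamburg"], ["hotel", "hamburg"]),
    (["stay", "berlin"], ["hotel", "berlin"]),
    (["clubs", "hamburg"], ["nightlife", "hamburg"]),
]
t = train_ibm_model1(pairs)
print(t[("hotel", "stay")], t[("hamburg", "stay")])
```

Because "stay" co-occurs with "hotel" in every pair where it appears, EM shifts probability mass toward t(hotel | stay), which is the kind of association that lets "Where to stay in Hamburg?" match "The best hotel in Hamburg?".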
Results
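Conclusions and Future Work
• Conclusions
  • A data structure: the question tree
  • An MDL-based tree cut model to identify question topic and question focus
  • A new form of language modeling for question search
  • Extensive experiments
• Future work
  • Currently evaluated only on community-based Q&A data
  • Extend to data from forum sites / FAQ sites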
Thanks
Modeling for Search
Translation Probability