Segmentation of Multi-Sentence Questions: Towards Effective Question Retrieval in cQA Services Kai Wang, Zhao-Yan Ming, Xia Hu, Tat-Seng Chua SIGIR’10 Speaker: Hsin-Lan, Wang Date: 2011/03/07
Outline • Introduction • Question Sentence Detection • Sequential Pattern Mining • Syntactic Shallow Pattern Mining • Model Learning • Multi-Sentence Question Segmentation • Building Graphs for Question Threads • Propagating the Closeness Scores • Segmentation-aided Retrieval • Experiment • Conclusion
Introduction • cQA: Community-based Question Answering services
Introduction • This paper introduces a new graph-based approach to segmenting multi-sentence questions. • Basic idea: • Detect question sentences • Measure the closeness score • Model their relationships to form a graph • Use the graph to propagate the closeness scores • Group topically related sentences
Question Sentence Detection • Human-generated content on the Web is usually informal. • Solution: Use salient sequential and syntactic patterns as features to build a question detector.
Question Sentence Detection • Sequential Pattern Mining • Sequential Pattern is also referred to as Labeled Sequential Pattern. S→C, C is the class label that the sequence S is classified to. • Sequence is defined to be a series of tokens from sentences, and the class is in the binary form of {Q, NQ}.
Question Sentence Detection • Sequential Pattern Mining • The purpose is to extract a set of frequent subsequences of words that are indicative of questions. • POS tags are substituted for all tokens except some keywords, e.g. <any1, know, what>→<any1, VB, what>
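The generalization step and the subsequence test behind a sequential pattern can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `TOY_POS` lookup stands in for a real POS tagger, and the `KEYWORDS` list is an assumed example rather than the keyword set used in the paper.

```python
# Hypothetical hand-written tag lookup; a real system would call a POS tagger.
TOY_POS = {"know": "VB", "help": "VB", "the": "DT", "answer": "NN"}

# Keywords kept as surface tokens (assumed list, not taken from the paper).
KEYWORDS = {"any1", "anyone", "what", "how", "why", "where", "who", "which"}


def generalize(tokens):
    """Replace each non-keyword token with its POS tag when one is known."""
    return [t if t in KEYWORDS else TOY_POS.get(t, t) for t in tokens]


def is_subsequence(pattern, sequence):
    """True if the pattern tokens occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(tok in it for tok in pattern)


sentence = ["any1", "know", "what", "the", "answer", "is"]
generalized = generalize(sentence)
matches = is_subsequence(["any1", "VB", "what"], generalized)
```

Here `generalized` begins with `any1, VB, what`, so the pattern from the slide matches, while a pattern with the same tokens in a different order would not.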
Question Sentence Detection • Syntactic Shallow Pattern Mining
Question Sentence Detection • Model Learning • Patterns mined from questions do not naturally characterize non-questions, so negative examples are hard to define. • Solution: One-class SVM • Training data: all sentences ending with question marks are assumed to be an initial set of positive examples.
Multi-Sentence Question Segmentation • Building Graphs for Question Threads • Vq: question sentence vertex set Vc: context sentence vertex set • Model the question thread as a weighted graph (V,E).
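A minimal sketch of how the initial positive set and pattern-based features might be assembled before training. The one-class SVM itself is not shown; `PATTERNS` is a hypothetical pair of mined patterns, and the binary match encoding is an assumption rather than the paper's exact feature representation.

```python
def is_subsequence(pattern, tokens):
    """True if the pattern tokens occur in order within tokens (gaps allowed)."""
    it = iter(tokens)
    return all(tok in it for tok in pattern)


def seed_positives(sentences):
    """Take every sentence ending with a question mark as a positive example."""
    return [s for s in sentences if s.rstrip().endswith("?")]


def featurize(sentence, patterns):
    """Binary vector: 1 where a mined pattern matches the sentence, else 0."""
    tokens = sentence.lower().rstrip("?.! ").split()
    return [1 if is_subsequence(p, tokens) else 0 for p in patterns]


# Hypothetical mined patterns, for illustration only.
PATTERNS = [("any1", "know"), ("how", "do", "i")]

thread = ["My laptop will not boot.", "Any1 know what is wrong?", "How do I fix it?"]
positives = seed_positives(thread)
features = [featurize(s, PATTERNS) for s in positives]
```

The resulting vectors would then be the input to a one-class learner trained on positives only.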
Multi-Sentence Question Segmentation • Building Graphs for Question Threads • Directed edge (u→v): • KL-divergence • Coherence • Coreference
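The KL-divergence factor can be illustrated with a small sketch. It assumes add-one-smoothed unigram models over the shared vocabulary of the two sentences; the smoothing scheme and the helper names are assumptions, not details given on the slide. Because KL-divergence is asymmetric, the score for u→v generally differs from the score for v→u, which fits its use on directed edges.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D_KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


u = "my laptop will not boot".split()
v = "how do i fix my laptop".split()
vocab = set(u) | set(v)

p_u = unigram_model(u, vocab)
p_v = unigram_model(v, vocab)

d_uv = kl_divergence(p_u, p_v)  # divergence in the direction u -> v
d_vu = kl_divergence(p_v, p_u)  # divergence in the direction v -> u
```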
Multi-Sentence Question Segmentation • Building Graphs for Question Threads • Undirected edge (u-v): • Cosine Similarity • Distance : proportional to the number of sentences between u and v.
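The two undirected-edge factors on this slide can be sketched as below, assuming raw term-frequency vectors for the cosine similarity and sentence positions given as zero-based indices within the thread. How the factors are weighted and combined into a single edge score is not stated on the slide, so the sketch leaves them separate.

```python
import math
from collections import Counter


def cosine_similarity(u_tokens, v_tokens):
    """Cosine similarity between term-frequency vectors of two sentences."""
    a, b = Counter(u_tokens), Counter(v_tokens)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_distance(i, j):
    """Number of sentences lying strictly between positions i and j."""
    return max(abs(i - j) - 1, 0)


u = "my laptop will not boot".split()
v = "how do i fix my laptop".split()
similarity = cosine_similarity(u, v)
gap = sentence_distance(0, 3)
```

Both functions are symmetric in their arguments, which is consistent with their use on undirected edges.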
Multi-Sentence Question Segmentation • Building Graphs for Question Threads • Undirected edge (u-v): • Coherence • Coreference
Multi-Sentence Question Segmentation • Propagating the Closeness Scores
Multi-Sentence Question Segmentation • Propagating the Closeness Scores • Sort edges in Er by the closeness score. <e1, e2, … , en > • Extraction process terminates at em when one of the following criteria is met:
Multi-Sentence Question Segmentation • Propagating the Closeness Scores • Example: final edge set {(q1,c1), (q2,c2), (q1,c2)} question segments (q1–c1, c2), (q2–c2)
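The grouping in this example can be reproduced with a short sketch that collects, for each question, the context sentences it is linked to in the final edge set. The function name and the (question, context) tuple layout are illustrative assumptions.

```python
from collections import defaultdict


def question_segments(edges):
    """Group context sentences under the question each edge links them to."""
    segments = defaultdict(list)
    for question, context in edges:
        segments[question].append(context)
    return dict(segments)


final_edges = [("q1", "c1"), ("q2", "c2"), ("q1", "c2")]
segments = question_segments(final_edges)
# segments == {"q1": ["c1", "c2"], "q2": ["c2"]}
```

Note that c2 appears in both segments: a context sentence may be shared by more than one question.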
Multi-Sentence Question Segmentation • Segmentation-aided Retrieval
Experiments • Evaluation of Question Detection • Dataset: collected by issuing getByCategory API queries to Yahoo! Answers. • Three datasets were generated: • Pattern Mining Set: 350k sentences extracted from 60k question threads. • Training Set: 130k sentences from another 60k question threads. • Testing Set: 2004 question sentences and 2039 non-question sentences, tagged by two annotators.
Experiments • Evaluation of Question Detection
Experiments • Direct Assessment of Multi-Sentence Question Segmentation via User Study
Experiments • Performance Evaluation on Question Retrieval with Segmentation Model
Conclusion • Presents a new approach for segmenting multi-sentence questions. • Separates question sentences from non-question sentences and aligns them according to their closeness scores.