A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University SIGIR 2009
Outline • Introduction • Related Work • Problem Definition • Classification Methods • Experiments • Conclusion
Introduction • Online users share ideas, discuss issues and form communities within discussion boards (online forums) • Knowledge discovery and information extraction • Several potential applications of mining QA content: • Search engines • Online QA services • Experts in social media • Knowledge bases for automatic chat-bots
Related Work • Cong et al., 2008 • Developed a classification-based method for question detection • Uses sequential pattern features extracted from both questions and non-questions in forums • Preprocesses text with a POS tagger while keeping 5W1H and modal words • Time-consuming • Focuses on question sentences or question paragraphs
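Cong et al. (2008) abstract text into POS tags while keeping 5W1H and modal words, then use mined sequential patterns as classification features. A minimal sketch of that preprocessing follows, assuming a tiny hypothetical lexicon (`TOY_POS`) in place of a real POS tagger; the pattern-mining step itself is omitted, and a given pattern is simply tested for occurrence as a gapped subsequence.

```python
import re

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_POS = {
    "i": "PRP", "you": "PRP", "it": "PRP",
    "install": "VB", "fix": "VB", "mount": "VB",
    "driver": "NN", "drive": "NN", "card": "NN",
    "the": "DT", "a": "DT", "this": "DT",
}

# 5W1H and modal words are kept verbatim instead of being replaced by tags.
KEEP = {"what", "when", "where", "which", "who", "why", "how",
        "can", "could", "should", "would", "will", "may", "might", "must"}


def abstract_tokens(text: str) -> list[str]:
    """Map each token to its tag, except 5W1H and modal words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t if t in KEEP else TOY_POS.get(t, "UNK") for t in tokens]


def contains_pattern(seq: list[str], pattern: list[str]) -> bool:
    """True if `pattern` occurs in `seq` as a subsequence (gaps allowed)."""
    it = iter(seq)
    # `p in it` advances the iterator past the first match of p.
    return all(p in it for p in pattern)


seq = abstract_tokens("How can I install the driver?")
print(seq)  # ['how', 'can', 'PRP', 'VB', 'DT', 'NN']
print(contains_pattern(seq, ["how", "can", "VB"]))  # True
```

Running a full tagger plus pattern mining over every sentence is where the time cost noted above comes from; the toy lexicon sidesteps that only for illustration.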
Related Work(cont’d) • Knowledge acquisition from discussion boards • Zhou and Hovy, 2005 • Feng et al., 2006 • Using non-textual features like click count to predict the quality of answers • Jeon et al., 2006 • In general, these works do not need to detect questions
Tasks • Tasks: • Identifying question-related first posts • Finding potential answers in subsequent responses within the corresponding threads • Some questions…
Tasks(cont’d) • Some questions: • Can we detect question-related threads in an efficient and effective manner? • What other features can be used to improve the performance? • How much can the combinations of some simple heuristics improve performance? • Are traditional relevance-based approaches suitable for this QA content?
Problem Definition • Questions • Focus on finding whether the first post is a question post • Treat the whole post as a question post:
Problem Definition(cont’d) • Answers • If one of the replies contains answers to the questions posed in the first post, that reply is regarded as an answer post • Replies that do not contain the actual answer content but provide links to other potential answers are also regarded as answer posts • Output of the system: question-answer post pairs
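The problem definition reduces each thread to question-answer post pairs. The sketch below shows only that output shape, assuming two trained classifiers are available as plain predicates; `Post`, `Thread`, `extract_qa_pairs` and the keyword-based stand-in classifiers are hypothetical names for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Post:
    author: str
    text: str


@dataclass
class Thread:
    posts: list[Post]  # posts[0] is the first post; the rest are replies


def extract_qa_pairs(
    thread: Thread,
    is_question: Callable[[Post], bool],
    is_answer: Callable[[Post, Post], bool],
) -> list[tuple[Post, Post]]:
    """Pair a question-related first post with every reply judged to answer it."""
    if not thread.posts or not is_question(thread.posts[0]):
        return []  # threads whose first post is not a question yield nothing
    question = thread.posts[0]
    return [(question, r) for r in thread.posts[1:] if is_answer(question, r)]


thread = Thread([
    Post("alice", "How do I mount a USB drive?"),
    Post("bob", "Try: sudo mount /dev/sdb1 /mnt"),
    Post("alice", "Thanks!"),
    Post("carol", "See http://example.org/mount-howto"),
])
# Toy stand-ins for the trained classifiers (hypothetical).
toy_q = lambda p: "?" in p.text
toy_a = lambda q, r: "try" in r.text.lower() or "http" in r.text
print([r.author for _, r in extract_qa_pairs(thread, toy_q, toy_a)])  # ['bob', 'carol']
```

Carol's link-only reply is kept as an answer post, matching the definition above; in the paper the predicates would be SVM classifiers rather than keyword checks.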
Classification Methods(1/3) • Classifier: LIBSVM 2.88 (NTU CSIE) • Question detection features: • Question mark • 5W1H words • Total number of posts within one thread • Authorship • N-gram
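The question-detection features above can be turned into LIBSVM input roughly as follows. This is a sketch under stated assumptions: the question mark is a presence flag, 5W1H is a count, n-grams are unigram and bigram counts, and "authorship" is read here as whether the asker posts again in the thread, which is a guess since the slide does not define it. `to_libsvm_line` writes LIBSVM's sparse text format (`<label> <index>:<value> ...` with ascending indices); the function names are hypothetical.

```python
import re
from collections import Counter

FIVE_W1H = {"what", "when", "where", "which", "who", "why", "how"}


def ngrams(tokens: list[str], n: int) -> list[str]:
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def question_features(first_post: str, asker: str, reply_authors: list[str],
                      max_n: int = 2) -> dict[str, float]:
    """Feature dict for deciding whether a thread's first post is a question."""
    tokens = re.findall(r"[a-z0-9']+", first_post.lower())
    feats = {
        "has_qmark": float("?" in first_post),
        "num_5w1h": float(sum(t in FIVE_W1H for t in tokens)),
        "thread_posts": float(1 + len(reply_authors)),  # first post + replies
        # Assumed reading of "authorship": does the asker post again?
        "asker_replies": float(asker in reply_authors),
    }
    for n in range(1, max_n + 1):  # unigram .. max_n-gram counts
        for gram, count in Counter(ngrams(tokens, n)).items():
            feats["ng:" + gram] = float(count)
    return feats


def to_libsvm_line(label: int, feats: dict[str, float], vocab: dict[str, int]) -> str:
    """Encode as '<label> <index>:<value> ...' with ascending 1-based indices.

    New feature names are added to `vocab`; zero-valued features are omitted.
    """
    pairs = sorted(
        (vocab.setdefault(name, len(vocab) + 1), value)
        for name, value in feats.items() if value != 0.0
    )
    return " ".join([str(label)] + [f"{i}:{v:g}" for i, v in pairs])
```

Sharing one `vocab` across all threads keeps feature indices consistent between training and test files.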
Classification Methods(2/3) • Answer detection • The position of the answer post • Authorship • N-gram • Stop words • Query likelihood model score
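The query likelihood feature scores how probable the question text is under a language model of a candidate reply. A minimal sketch follows; Dirichlet smoothing and the default `mu=2000` are assumptions (a common choice in language-model retrieval), since the slide does not say which smoothing was used.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def query_likelihood(question: str, reply: str, coll: Counter, mu: float = 2000.0) -> float:
    """log P(question | reply) under a Dirichlet-smoothed unigram model.

    `coll` holds term counts over the whole collection of posts and is
    assumed to include the reply itself.
    """
    coll_len = sum(coll.values())
    doc = Counter(tokenize(reply))
    doc_len = sum(doc.values())
    score = 0.0
    for w in tokenize(question):
        p_coll = coll[w] / coll_len
        if p_coll == 0.0:
            continue  # term never seen anywhere: skip it rather than take log(0)
        score += math.log((doc[w] + mu * p_coll) / (doc_len + mu))
    return score


posts = ["mount usb drive", "use sudo mount to mount the usb drive", "nice photo"]
coll = Counter(t for p in posts for t in tokenize(p))
print(query_likelihood(posts[0], posts[1], coll) > query_likelihood(posts[0], posts[2], coll))  # True
```

Forum replies are short, so a large `mu` pulls every reply toward the collection model; a smaller value would make the feature more sensitive to term overlap.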
Classification Methods(3/3) • Cong et al., 2008 • Sequential pattern mining • Graph-based model • Query likelihood language model • KL-divergence language model
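Two of these baselines are language-modeling retrieval methods. As a reminder of how they relate (a standard result from the language-modeling retrieval literature, not stated on this slide), let θ_Q be the question model and θ_D the reply model. KL-divergence ranking scores a reply by the negative divergence; the query-entropy term does not depend on the reply and can be dropped, and with a maximum-likelihood question model the score reduces to a length-normalized query likelihood:

```latex
\begin{aligned}
\operatorname{score}_{KL}(Q, D) &= -D_{KL}(\theta_Q \,\|\, \theta_D)
  = \sum_{w} p(w \mid \theta_Q) \log p(w \mid \theta_D)
    - \sum_{w} p(w \mid \theta_Q) \log p(w \mid \theta_Q) \\
&\overset{\text{rank}}{=} \sum_{w} p(w \mid \theta_Q) \log p(w \mid \theta_D) \\
&= \frac{1}{|Q|} \sum_{w} c(w, Q) \log p(w \mid \theta_D)
   \qquad \text{if } p(w \mid \theta_Q) = \frac{c(w, Q)}{|Q|} \\
&= \frac{1}{|Q|} \log p(Q \mid \theta_D)
\end{aligned}
```

Under these assumptions the two baselines rank replies identically; they differ only when θ_Q is estimated some other way, for example with smoothing or feedback.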
Experiments(1/9) • Data crawled • 555,954 threads from the Ubuntu dataset • 721,422 threads from Photography On The Net (the DC dataset) • Question detection task: • Randomly sampled 572 threads from the Ubuntu dataset and 500 threads from the DC dataset • Answer detection task: • Randomly sampled 500 question-related threads from both datasets
Experiments(8/9) • Proposed a ranking scheme • Ranking score variants: V1: position + authorship; V2: position only; V3: authorship only
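The slide names only the ingredients of each ranking variant, not how position and authorship are scored. The sketch below fills those in with hypothetical choices that are not the authors' definitions: the k-th reply gets a position score of 1/k, and a reply gets an authorship score of 1 when someone other than the asker wrote it, else 0. V1 adds the two scores; V2 and V3 use one each.

```python
def position_score(k: int) -> float:
    """Hypothetical position score: the k-th reply (1-based) gets 1/k."""
    return 1.0 / k


def authorship_score(reply_author: str, asker: str) -> float:
    """Hypothetical authorship score: 1 if someone other than the asker replied."""
    return 0.0 if reply_author == asker else 1.0


# The three variants from the slide; only the combinations come from the source.
VARIANTS = {
    "V1": lambda pos, auth: pos + auth,  # position + authorship
    "V2": lambda pos, auth: pos,         # position only
    "V3": lambda pos, auth: auth,        # authorship only
}


def rank_replies(asker: str, reply_authors: list[str], variant: str = "V1") -> list[int]:
    """Return 1-based reply positions ordered from most to least answer-like."""
    combine = VARIANTS[variant]
    scored = [
        (combine(position_score(k), authorship_score(a, asker)), k)
        for k, a in enumerate(reply_authors, start=1)
    ]
    # Higher score first; ties fall back to the earlier reply.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [k for _, k in scored]


replies = ["alice", "bob", "carol"]  # alice is also the asker
for v in ("V1", "V2", "V3"):
    print(v, rank_replies("alice", replies, v))
# V1 [2, 3, 1]
# V2 [1, 2, 3]
# V3 [2, 3, 1]
```

With these toy scores, V1 demotes the asker's own follow-up while still preferring earlier replies among the rest.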
Conclusion • The use of N-grams and the combination of several non-content features can improve performance • Relevance-based retrieval methods alone would not be effective in tackling the problem, but their performance can be improved by combining them with non-content features • Designed a simple ranking scheme that outperforms previous approaches
Combine several potential answers to make a better answer? • A good understanding of question-answering interactions in discussion boards