A Classification-based Approach to Question Answering in Discussion Boards

Liangjie Hong and Brian D. Davison, Department of Computer Science and Engineering, Lehigh University. SIGIR 2009. Outline: Introduction, Related Work, Problem Definition, Classification Methods, Experiments.

Presentation Transcript


  1. A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University SIGIR 2009

  2. Outline • Introduction • Related Work • Problem Definition • Classification Methods • Experiments • Conclusion

  3. Introduction • Online users share ideas, discuss issues and form communities within discussion boards (online forums) • Knowledge discovery and information extraction • Several potential applications of mining QA content: • Search engines • Online QA services • Experts in social media • Knowledge bases for automatic chat-bots

  4. Related Work • Cong et al., 2008 • They developed a classification-based method for question detection • Sequential pattern features extracted from both questions and non-questions in forums • Preprocessing applies a POS tagger while keeping 5W1H and modal words • Time-consuming • Focus on question sentences or question paragraphs
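
The preprocessing attributed to Cong et al. (2008) can be illustrated with a small sketch. This is not their implementation: it assumes the tokens arrive already POS-tagged (Penn Treebank tags assigned by hand here), uses illustrative 5W1H and modal word lists, and only checks whether one given pattern occurs as an ordered subsequence with gaps allowed. Mining the patterns and training the classifier are out of scope.

```python
# Minimal sketch of Cong et al. (2008)-style preprocessing, under assumptions:
# the input is already POS-tagged and the word lists below are illustrative.
FIVE_W_ONE_H = {"what", "who", "when", "where", "why", "how"}
MODALS = {"can", "could", "may", "might", "shall", "should", "will", "would", "must"}


def preprocess(tagged_tokens):
    """Keep 5W1H and modal words; replace every other token by its POS tag."""
    out = []
    for word, tag in tagged_tokens:
        lowered = word.lower()
        out.append(lowered if lowered in FIVE_W_ONE_H or lowered in MODALS else tag)
    return out


def contains_pattern(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so each pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)


if __name__ == "__main__":
    # Hand-tagged example sentence (tags assigned manually for illustration).
    tagged = [("How", "WRB"), ("can", "MD"), ("I", "PRP"), ("mount", "VB"),
              ("a", "DT"), ("USB", "NNP"), ("drive", "NN"), ("?", ".")]
    seq = preprocess(tagged)
    print(seq)  # ['how', 'can', 'PRP', 'VB', 'DT', 'NNP', 'NN', '.']
    print(contains_pattern(seq, ["how", "can", "VB"]))  # True
    print(contains_pattern(seq, ["VB", "how"]))         # False
```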

  5. Related Work(cont’d) • Knowledge acquisition from discussion boards • Zhou and Hovy, 2005 • Feng et al., 2006 • Using non-textual features like click count to predict the quality of answers • Jeon et al., 2006 • In general, these related approaches do not need to detect questions

  6. Tasks • Tasks: • Identifying question-related first posts • Finding potential answers in subsequent responses within the corresponding threads • Some questions…

  7. Tasks(cont’d) • Some questions: • Can we detect question-related threads in an efficient and effective manner? • What other features can be used to improve the performance? • How much can combining some simple heuristics improve performance? • Are traditional relevance-based approaches suitable for this QA content?

  8–10. Problem Definition • Questions • Focus on finding whether the first post is a question post • Treat the whole post as a question post:

  11. Problem Definition(cont’d) • Answers • If one of the replies contains answers to the questions posed in the first post, that reply is regarded as an answer post • Replies that do not contain the actual answer content but provide links to other potential answers are also regarded as answer posts • Result from the system: question-answer post pairs

  12. Classification Methods(1/3) • Classifier: LIBSVM 2.88 (NTU CSIE) • Question detection features: • Question mark • 5W1H words • Total number of posts within one thread • Authorship • N-gram
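
The question-detection features listed above could be encoded in LIBSVM's sparse text input format (`<label> <index>:<value> ...`, indices ascending) roughly as follows. This is a hedged sketch: the tokenizer, the use of word bigrams for "N-gram", and reading "Authorship" as the number of thread posts written by the thread starter are assumptions for illustration, not the authors' definitions. The names `question_features` and `to_libsvm_line` are hypothetical.

```python
import re
from collections import Counter

WH_WORDS = ("what", "who", "when", "where", "why", "how")
TOKEN_RE = re.compile(r"[a-z0-9']+|\?")  # words plus "?" as its own token (assumption)


def question_features(thread, n=2):
    """thread: list of {"author": str, "text": str}; thread[0] is the first post."""
    first = thread[0]
    tokens = TOKEN_RE.findall(first["text"].lower())
    feats = {
        "has_qmark": 1.0 if "?" in tokens else 0.0,
        "num_posts": float(len(thread)),
        # Hypothetical "authorship" feature: number of posts by the thread starter.
        "starter_posts": float(sum(p["author"] == first["author"] for p in thread)),
    }
    for w in WH_WORDS:
        feats["wh=" + w] = float(tokens.count(w))
    # Word n-gram counts (bigrams by default).
    for gram, count in Counter(zip(*(tokens[i:] for i in range(n)))).items():
        feats["ng=" + " ".join(gram)] = float(count)
    return feats


def to_libsvm_line(label, feats, vocab):
    """Encode one example; `vocab` maps feature names to 1-based indices and grows."""
    for name in feats:
        vocab.setdefault(name, len(vocab) + 1)
    pairs = sorted((vocab[name], value) for name, value in feats.items() if value != 0.0)
    return " ".join([str(label)] + [f"{i}:{v:g}" for i, v in pairs])


if __name__ == "__main__":
    thread = [{"author": "a", "text": "How do I mount a USB drive?"},
              {"author": "b", "text": "Use mount."},
              {"author": "a", "text": "Thanks!"}]
    vocab = {}
    print(to_libsvm_line(1, question_features(thread), vocab))
    # prints "1 1:1 2:3 3:2 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1"
```

Zero-valued features are omitted, as the sparse format allows; the shared `vocab` keeps feature indices consistent across training and test files.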

  13. Classification Methods(2/3) • Answer detection features: • Position of the answer post • Authorship • N-gram • Stop words • Query likelihood model score
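
The "query likelihood model score" feature can be illustrated with a Dirichlet-smoothed query likelihood, treating the first post as the query and a candidate reply as the document: log P(q|d) = Σ_w c(w,q) log((c(w,d) + μ·p(w|C)) / (|d| + μ)). The smoothing method, μ = 2000, whitespace tokenization, and skipping query terms unseen in the collection are assumptions for this sketch; the slides do not state which variant was used.

```python
import math
from collections import Counter


def query_likelihood(query_tokens, doc_tokens, collection_counts, collection_len, mu=2000.0):
    """log P(query | doc) with Dirichlet smoothing:
    sum_w c(w, q) * log((c(w, d) + mu * p(w | C)) / (|d| + mu))."""
    doc = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for word, q_count in Counter(query_tokens).items():
        p_collection = collection_counts.get(word, 0) / collection_len
        if p_collection == 0.0:
            continue  # unseen in the collection: skip rather than take log(0)
        p_doc = (doc.get(word, 0) + mu * p_collection) / (doc_len + mu)
        score += q_count * math.log(p_doc)
    return score


if __name__ == "__main__":
    question = "how to mount usb drive".split()
    replies = ["use the mount command for the usb drive".split(),
               "thanks for the help".split()]
    # Toy collection model built from the question and its replies.
    collection = Counter(question)
    for r in replies:
        collection.update(r)
    total = sum(collection.values())
    for r in replies:
        print(" ".join(r), query_likelihood(question, r, collection, total))
```

The reply that shares "mount", "usb" and "drive" with the question receives the higher (less negative) score.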

  14. Classification Methods(3/3) • Cong et al., 2008 • Sequential pattern mining • Graph-based model • Query likelihood language model • KL-divergence language model
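
For the KL-divergence language model, a common formulation scores a document by the negative KL divergence between a query model and a smoothed document model. With a maximum-likelihood query model this ranks documents the same way as query likelihood, because -KL(θ_q || θ_d) = (1/|q|) Σ_w c(w,q) log p(w|θ_d) + H(θ_q), and the entropy term H(θ_q) is constant for a fixed query. The sketch assumes Dirichlet smoothing with μ = 2000 for illustration; the slides do not give the baseline's settings, and `neg_kl_score` is a hypothetical name.

```python
import math
from collections import Counter


def neg_kl_score(query_tokens, doc_tokens, collection_counts, collection_len, mu=2000.0):
    """-KL(theta_q || theta_d) summed over the query's terms, using an MLE query
    model and a Dirichlet-smoothed document model (higher is better).
    The result is <= 0 whenever every query term occurs in the collection."""
    query = Counter(query_tokens)
    query_len = sum(query.values())
    doc = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for word, q_count in query.items():
        p_collection = collection_counts.get(word, 0) / collection_len
        if p_collection == 0.0:
            continue  # unseen in the collection: skipped (assumption)
        p_query = q_count / query_len
        p_doc = (doc.get(word, 0) + mu * p_collection) / (doc_len + mu)
        score += p_query * math.log(p_doc / p_query)
    return score
```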

  15. Experiments(1/9) • Data crawled: • 555,954 threads from the Ubuntu dataset • 721,422 threads from Photography On The Net (DC dataset) • Question detection task: • Randomly sampled 572 threads from the Ubuntu dataset and 500 threads from the DC dataset • Answer detection task: • Randomly sampled 500 question-related threads from both datasets

  16–19. Experiments(2/9)

  20. Experiments(3/9)

  21–22. Experiments(4/9)

  23–25. Experiments(5/9)

  26. Experiments(6/9)

  27–29. Experiments(7/9)

  30. Experiments(8/9) • Propose a ranking scheme • Ranking score variants: V1: position + authorship; V2: position only; V3: authorship only
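
The three ranking-score variants can be sketched as a scoring function over the replies in a thread. The concrete scores are hypothetical, since the slide names the components but not their formulas: position is scored as 1/position so earlier replies rank higher, authorship is 1 when the reply is not written by the thread starter and 0 otherwise, and V1 adds the two with equal weight. Ties are broken by position. `Reply`, `rank_replies` and the scoring functions are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    position: int  # 1 = first reply after the question post
    author: str


def position_score(reply):
    # Hypothetical: earlier replies get higher scores.
    return 1.0 / reply.position


def authorship_score(reply, starter):
    # Hypothetical: replies by someone other than the thread starter score 1.
    return 0.0 if reply.author == starter else 1.0


VARIANTS = {
    "V1": lambda r, s: position_score(r) + authorship_score(r, s),  # position + authorship
    "V2": lambda r, s: position_score(r),                           # position only
    "V3": lambda r, s: authorship_score(r, s),                      # authorship only
}


def rank_replies(replies, starter, variant="V1"):
    """Sort replies by descending score; ties go to the earlier reply."""
    score = VARIANTS[variant]
    return sorted(replies, key=lambda r: (-score(r, starter), r.position))


if __name__ == "__main__":
    thread = [Reply(1, "alice"), Reply(2, "bob"), Reply(3, "carol")]
    for v in VARIANTS:
        print(v, [r.position for r in rank_replies(thread, starter="alice", variant=v)])
    # V1 [2, 3, 1]
    # V2 [1, 2, 3]
    # V3 [2, 3, 1]
```

Under these assumed scores, the starter's own follow-up (position 1) drops to the bottom in V1 and V3, while V2 keeps chronological order.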

  31. Experiments(9/9)

  32. Conclusion • Use of N-grams and the combination of several non-content features can improve the performance • Relevance-based retrieval methods would not be effective in tackling the problem, but their performance can be improved by combining them with non-content features • Design a simple ranking scheme that outperforms previous approaches

  33. Combine several potential answers together to make a better answer? • A good understanding of question-answering interactions in discussion boards

  34. Thank You!
