
LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS


Presentation Transcript


  1. LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS Date: 2012/11/22 Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor Source: WWW ’12 Advisor: Dr. Jia-Ling Koh Speaker: Yi-Hsuan Yeh

  2. OUTLINE • Introduction • Description of approach • Stage one: top candidate selection • Stage two: top candidate validation • Experiment • Offline • Online • Conclusion

  3. INTRODUCTION • Users struggle with expressing their information need as a short query.

  4. INTRODUCTION • Community-based Question Answering (CQA) sites, such as Yahoo! Answers or Baidu Zhidao, let users post questions consisting of a title and a body. • About 15% of the questions remain unanswered. • Idea: answer new questions with the answers of past resolved questions.

  5. OUTLINE • Introduction • Description of approach • Stage one: top candidate selection • Stage two: top candidate validation • Experiment • Offline • Online • Conclusion

  6. A TWO-STAGE APPROACH • Stage one: find the past question most similar to the new question. • Stage two: decide whether or not to serve its answer. (A pipeline sketch follows below.)
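To make the two stages concrete, here is a minimal Python sketch of the pipeline, assuming a `top_candidate` retrieval helper (shown under stage one below) and a hypothetical `build_features` helper that assembles the classifier features described later:

```python
def answer_new_question(q_new, past_qa, classifier, alpha=0.3):
    """Two-stage sketch: retrieve the closest past question, then
    let a classifier decide whether its answer can be served.
    `past_qa` is a list of (past_question, best_answer) pairs;
    alpha=0.3 is an illustrative threshold, not the paper's value."""
    hit = top_candidate(q_new, [q for q, _ in past_qa], alpha)
    if hit is None:
        return None                    # no past question is similar enough
    q_past, answer = past_qa[hit[0]]
    feats = build_features(q_new, q_past, answer)  # hypothetical helper
    return answer if classifier.predict([feats])[0] == 1 else None
```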

  7. STAGE ONE: TOP CANDIDATE SELECTION
  • Vector-space unigram model with TF-IDF weights; each question (title) becomes a vector over terms w1…wn:

              w1    w2    w3    …    wn
    Qnew      0.1   0.2   0.12  …    0.8
    Qpast 1   0.3   0.5   0.2   …    0.1
    Qpast 2   0.2   0     0.1   …    0.6
    …
    Qpast n   0.9   0.3   0.5   …    0.1

  • Cosine similarity against Qnew, filtered by a threshold α.
  • Ranking: Cos(Qpast title+body, Qnew title+body) selects the top candidate past question and its best answer A.
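A minimal sketch of stage one using scikit-learn, assuming plain unigram TF-IDF over a single text field per question (the slide filters on titles and ranks on title+body; the threshold value here is illustrative, not the paper's α):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_candidate(q_new, past_questions, alpha=0.3):
    """Return (index, score) of the most similar past question,
    or None when no candidate passes the threshold alpha."""
    vec = TfidfVectorizer()                    # unigram, TF-IDF weights
    past = vec.fit_transform(past_questions)   # one row per past question
    sims = cosine_similarity(vec.transform([q_new]), past)[0]
    best = sims.argmax()
    return (int(best), float(sims[best])) if sims[best] >= alpha else None
```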

  8. STAGE TWO: TOP CANDIDATE VALIDATION • Train a classifier that validates whether A can be served as an answer to Qnew.

  9. SURFACE-LEVEL FEATURES
  • Surface-level statistics: text length, number of question marks, stop-word count, maximal IDF over all terms in the text, minimal IDF, average IDF, IDF standard deviation, HTTP link count, number of figures.
  • Surface-level similarity: cosine similarity in the TF-IDF-weighted word unigram vector-space model between: Qnew title – Qpast title; Qnew body – Qpast body; Qnew title+body – Qpast title+body; Qnew title+body – Answer; Qpast title+body – Answer.
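A sketch of the surface-level statistics for one text field, assuming a precomputed term→IDF map and a stop-word set are available (both names are illustrative, and "figures" is read as digit tokens):

```python
import numpy as np

def surface_stats(text, idf, stopwords):
    """Surface-level statistics; `idf` maps term -> IDF weight."""
    terms = text.lower().split()
    idfs = [idf.get(t, 0.0) for t in terms] or [0.0]
    return {
        "length": len(terms),
        "question_marks": text.count("?"),
        "stopword_count": sum(t in stopwords for t in terms),
        "http_links": text.count("http"),
        "figures": sum(t.isdigit() for t in terms),
        "max_idf": max(idfs),
        "min_idf": min(idfs),
        "avg_idf": float(np.mean(idfs)),
        "std_idf": float(np.std(idfs)),
    }
```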

  10. LINGUISTIC ANALYSIS
  • Latent topics via LDA (Latent Dirichlet Allocation); each text is represented as a distribution over topics:

               Qnew   Qpast   A
    Topic 1    0.3    0.1     0.25
    Topic 2    0.03   0.1     0.02
    Topic 3    0.15   0.08    0.12
    …
    Topic n    0.06   0.13    0.05

  • Features: entropy of the topic distribution, most probable topic, JS divergence between distributions.
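These topic features can be computed directly from the LDA distributions; a sketch with NumPy/SciPy (note that `scipy.spatial.distance.jensenshannon` returns the JS distance, i.e. the square root of the divergence, hence the squaring):

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

def topic_features(p_new, p_past, p_ans):
    """p_* are LDA topic distributions (1-D arrays summing to 1)."""
    return {
        "entropy_new": float(entropy(p_new)),
        "same_top_topic": int(np.argmax(p_new) == np.argmax(p_past)),
        "js_new_past": float(jensenshannon(p_new, p_past) ** 2),
        "js_new_ans": float(jensenshannon(p_new, p_ans) ** 2),
    }
```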

  11. LEXICO-SYNTACTIC ANALYSIS
  • Stanford dependency parser extracts the main verb, subject, object, and the main noun and adjective.
  Example:
  Q1: Why doesn’t my dog eat? → main predicate: eat; main predicate argument: dog.
  Q2: Why doesn’t my cat eat? → main predicate: eat; main predicate argument: cat.
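A sketch of the predicate/argument extraction using spaCy in place of the Stanford parser used in the paper (the dependency labels are spaCy's, and the example output is what its small English model typically produces):

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # stand-in for the Stanford parser

def main_predicate(question):
    """Return the main verb (sentence root) and its core arguments."""
    doc = nlp(question)
    root = next(tok for tok in doc if tok.dep_ == "ROOT")
    args = [tok.lemma_ for tok in root.children
            if tok.dep_ in ("nsubj", "nsubjpass", "dobj", "obj")]
    return root.lemma_, args

# e.g. main_predicate("Why doesn't my dog eat?")  ->  ("eat", ["dog"])
```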

  12. RESULT LIST ANALYSIS
  • Query clarity: how focused the result list retrieved for the query is, relative to the whole collection.
  [Slide figure: example score tables over Qnew, Qpast1–3, Qpast-all and items A–D; exact layout not recoverable.]
  • Language models & KL divergence.
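Query clarity is the KL divergence between a language model estimated from the retrieved result list and one estimated from the whole collection; a minimal unigram sketch, with add-one smoothing on the collection model as an assumption:

```python
import math
from collections import Counter

def clarity(result_docs, collection_docs):
    """KL( P(w | result list) || P(w | collection) ), unigram models."""
    p_res = Counter(w for d in result_docs for w in d.lower().split())
    p_col = Counter(w for d in collection_docs for w in d.lower().split())
    n_res, n_col, vocab = sum(p_res.values()), sum(p_col.values()), len(p_col)
    score = 0.0
    for w, c in p_res.items():
        pr = c / n_res
        pc = (p_col[w] + 1) / (n_col + vocab)   # add-one smoothing
        score += pr * math.log(pr / pc)
    return score
```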

  13. QUERY FEEDBACK • Informational similarity between two queries can be effectively estimated by the similarity between their ranked document lists. • Result list length: the number of past questions that pass the threshold α.
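One simple proxy for the query-feedback idea (not necessarily the exact measure used in the paper) is the overlap between the top-k ranked lists retrieved for the two queries:

```python
def list_overlap(list_new, list_past, k=10):
    """Overlap@k between two ranked lists of question IDs."""
    return len(set(list_new[:k]) & set(list_past[:k])) / k
```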

  14. CLASSIFIER MODEL • Random forest classifier: an ensemble of decision trees, each grown over a random subset of the features and trained on a bootstrap sample of the past questions.
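With scikit-learn the validation classifier can be sketched as follows; the hyperparameter values are illustrative, not the paper's:

```python
from sklearn.ensemble import RandomForestClassifier

# X: one feature vector per (Qnew, Qpast, A) triple, combining the
# surface, topic, lexico-syntactic, and result-list features above;
# y: 1 if A was judged a valid answer to Qnew, else 0.
clf = RandomForestClassifier(
    n_estimators=100,      # number of trees (illustrative)
    max_features="sqrt",   # random feature subset at each split
    bootstrap=True,        # each tree trains on a bootstrap sample
)
# clf.fit(X_train, y_train)
# predict_proba lets the system trade coverage for precision:
# serve = clf.predict_proba(X_new)[:, 1] >= 0.5
```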

  15. OUTLINE • Introduction • Description of approach • Stage one: top candidate selection • Stage two: top candidate validation • Experiment • Offline • Online • Conclusion

  16. OFFLINE • Dataset: Yahoo! Answers, categories Beauty & Style, Health, and Pets. • Only questions whose best answer was chosen by the asker and received at least three stars are included. • Questions posted between February and December 2010.

  17. MTURK EVALUATION • Human judgments collected via Amazon Mechanical Turk (MTurk). • Inter-annotator agreement measured with Fleiss’s kappa.
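Fleiss's kappa measures agreement among multiple raters beyond what chance would produce; a small self-contained implementation for an items × categories count matrix:

```python
import numpy as np

def fleiss_kappa(ratings):
    """ratings[i, j] = number of raters assigning item i to category j;
    assumes every item was rated by the same number of raters."""
    ratings = np.asarray(ratings, dtype=float)
    n = ratings.sum(axis=1)[0]                  # raters per item
    p_i = ((ratings ** 2).sum(axis=1) - n) / (n * (n - 1))
    p_bar = p_i.mean()                          # observed agreement
    p_j = ratings.sum(axis=0) / ratings.sum()   # category proportions
    p_e = (p_j ** 2).sum()                      # chance agreement
    return (p_bar - p_e) / (1 - p_e)
```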

  18. ONLINE

  19. OUTLINE • Introduction • Description of approach • Stage one: top candidate selection • Stage two: top candidate validation • Experiment • Offline • Online • Conclusions

  20. CONCLUSIONS • Short questions may suffer from vocabulary mismatch and sparsity. • Long, cumbersome descriptions introduce many irrelevant aspects that can hardly be separated from the essential question details (even for a human reader). • Terms repeated in both the past question and its best answer should usually be emphasized more, as they relate to the expressed need.

  21. • A general informative answer can satisfy a number of topically connected but different questions. • A general social answer may often satisfy a certain type of question. • Future work: better understand time-sensitive questions, such as those common in the Sports category.
