Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint

This study presents a family of algorithms for predicting the long-term impact of CQA posts by capturing coupling, non-linearity, and dynamics. The algorithms aim to be scalable and efficient, achieving up to 35.8% improvement in effectiveness and 390x speedup in efficiency.

Presentation Transcript


  1. Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint. Yuan Yao. Joint work with Hanghang Tong, Feng Xu, and Jian Lu. KDD 2014, Aug 24-27.

  2. Roadmap • Background and Motivations • Modeling Multi-aspect • Computation Speedup • Empirical Evaluations • Conclusions

  3. Roadmap • Background and Motivations • Modeling Multi-aspect • Computation Speedup • Empirical Evaluations • Conclusions

  4. CQA (community question answering)

  5. Long-Term Impact • What is the long-term impact of a Q/A post? • Q: How many users will find it beneficial?

  6. Challenges • Q: Why not off-the-shelf data mining algorithms? • Challenge 1: Multi-aspect • C1.1. Coupling between questions and answers • C1.2. Feature non-linearity • C1.3. Posts arrive dynamically • Challenge 2: Efficiency

  7. C1.1 Coupling • Strong positive correlation between predicted question impact Yq and predicted answer impact Ya [Yao+@ASONAM'14] • Diagram: (1) question features Fq -> Yq; (2) answer features Fa -> Ya; (3) voting consistency between the two • [Yao+@ASONAM'14] Y Yao, H Tong, T Xie, L Akoglu, F Xu, J Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM 2014.

  8. C1.1 Coupling • LIP-M [Yao+@ASONAM'14] combines three terms: (1) question prediction, (2) answer prediction, (3) voting consistency regularization
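The three terms on this slide can be illustrated with a small joint ridge regression. This is only a sketch under stated assumptions: the transcript does not preserve the LIP-M objective, so the squared-loss form below, the membership matrix `M` (answer i belongs to question j), and the parameter names `theta` and `lam` are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_coupled_ridge(Fq, yq, Fa, ya, M, theta=1.0, lam=0.1):
    """Jointly fit question weights wq and answer weights wa.

    Assumed objective (NOT taken from the slides):
        ||Fq wq - yq||^2 + ||Fa wa - ya||^2
        + theta * ||M Fq wq - Fa wa||^2
        + lam * (||wq||^2 + ||wa||^2)
    """
    dq, da = Fq.shape[1], Fa.shape[1]
    # Stack both weight vectors as w = [wq; wa] and write each term as a
    # linear map acting on w.
    A = np.hstack([Fq, np.zeros((Fq.shape[0], da))])   # (1) question prediction
    B = np.hstack([np.zeros((Fa.shape[0], dq)), Fa])   # (2) answer prediction
    C = np.hstack([M @ Fq, -Fa])                       # (3) consistency residual
    H = A.T @ A + B.T @ B + theta * (C.T @ C) + lam * np.eye(dq + da)
    w = np.linalg.solve(H, A.T @ yq + B.T @ ya)
    return w[:dq], w[dq:]

# Synthetic data: 30 questions, 90 answers, each answer tied to one question.
rng = np.random.default_rng(0)
nq, na, dq, da = 30, 90, 4, 5
Fq, Fa = rng.normal(size=(nq, dq)), rng.normal(size=(na, da))
yq, ya = rng.normal(size=nq), rng.normal(size=na)
owner = rng.integers(0, nq, size=na)      # question index of each answer
M = np.zeros((na, nq))
M[np.arange(na), owner] = 1.0

wq0, wa0 = fit_coupled_ridge(Fq, yq, Fa, ya, M, theta=0.0)   # no coupling
wq1, wa1 = fit_coupled_ridge(Fq, yq, Fa, ya, M, theta=5.0)   # with coupling
gap0 = np.linalg.norm(M @ Fq @ wq0 - Fa @ wa0)
gap1 = np.linalg.norm(M @ Fq @ wq1 - Fa @ wa1)
```

With `theta=0` the problem decouples into two independent ridge regressions; raising `theta` can only shrink the disagreement between a question's predicted impact and that of its answers.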

  9. C1.2 Non-linearity • The kernel trick (e.g., SVM) • Mercer kernel • Kernel matrix as new feature matrix
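A minimal sketch of "kernel matrix as new feature matrix", using plain kernel ridge regression. The Gaussian (RBF) kernel, the `gamma` and `lam` values, and the toy sine data are illustrative assumptions; the slides do not say which Mercer kernel the authors used.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam I) alpha = y; K takes the place of the feature matrix."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_kernel_ridge(X_train, alpha, X_new, gamma=1.0):
    """Predict with kernel values between new and training points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# A target that a linear model on the raw feature cannot fit.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0])

alpha = fit_kernel_ridge(X, y)
pred = predict_kernel_ridge(X, alpha, X)
krr_train_error = float(np.max(np.abs(pred - y)))
```

The linear solve on an n x n kernel matrix is where the O(n^3) cost of the kernelized models comes from.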

  10. C1.3 Dynamics • Existing examples (x1, y1), (x2, y2), …, (xt, yt) give the current model, Model_t • New examples (xt+1, yt+1) are combined with Model_t to give the new model, Model_t+1 • Solution: recursive least squares regression [Haykin2005] • [Haykin2005] S Haykin. Adaptive filter theory. 2005.
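The Model_t to Model_t+1 step can be sketched with textbook recursive least squares in its ridge-regularized form. The class name, the `lam` parameter, and the synthetic data are illustrative; the sketch only shows that one-example-at-a-time updates reproduce the batch solution.

```python
import numpy as np

class RecursiveLeastSquares:
    """Ridge regression updated one example at a time."""

    def __init__(self, dim, lam=1.0):
        self.P = np.eye(dim) / lam    # running inverse of (X^T X + lam I)
        self.w = np.zeros(dim)        # current model (Model_t)

    def update(self, x, y):
        """Fold one new example (x, y) into the model in O(dim^2)."""
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        self.w = self.w + gain * (y - x @ self.w)   # correct by prediction error
        self.P = self.P - np.outer(gain, Px)        # rank-one inverse update

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=200)

model = RecursiveLeastSquares(dim=3, lam=1.0)
for x_t, y_t in zip(X, y):
    model.update(x_t, y_t)

# Batch ridge solution on all 200 examples, for comparison.
w_batch = np.linalg.solve(X.T @ X + np.eye(3), X.T @ y)
```

Starting from `w = 0` and `P = I / lam`, the streamed weights match the batch ridge solution up to floating-point error, so no retraining from scratch is needed when new posts arrive.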

  11. This Paper • Q1: How to comprehensively capture the multiple aspects in one algorithm? • Coupling, non-linearity, and dynamics • Q2: How to make the long-term impact prediction algorithm efficient?

  12. Roadmap • Background and Motivations • Modeling Multi-aspect • Computation Speedup • Empirical Evaluations • Conclusions

  13. Modeling Non-linearity • Terms: (1) question prediction, (2) answer prediction, (3) voting consistency regularization • Basic idea: kernelize LIP-M • Details (LIP-KM): closed-form solution • Complexity: O(n^3)

  14. Modeling Dynamics • Basic idea: recursively update LIP-KM (matrix inverse lemma) • Details: LIP-KIM • Complexity: O(n^3) -> O(n^2)
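To illustrate why a recursive update lowers the cost from O(n^3) to O(n^2), here is a sketch of the block form of the matrix inverse lemma: when one new example adds a row and a column to a symmetric kernel matrix, the new inverse follows from the old one without a fresh inversion. The function name `grow_inverse` and the toy matrix are hypothetical; the actual LIP-KIM update equations are not preserved in the transcript.

```python
import numpy as np

def grow_inverse(A_inv, b, c):
    """Return the inverse of [[A, b], [b^T, c]] given A^{-1}, in O(n^2).

    A is assumed symmetric (as a kernel matrix is), b is the new column,
    and c is the new diagonal entry.
    """
    Ab = A_inv @ b
    s = c - b @ Ab                      # Schur complement (a scalar)
    n = A_inv.shape[0]
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = A_inv + np.outer(Ab, Ab) / s
    out[:n, n] = -Ab / s
    out[n, :n] = -Ab / s
    out[n, n] = 1.0 / s
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = X @ X.T + 0.5 * np.eye(6)           # regularized linear-kernel matrix

K_inv_old = np.linalg.inv(K[:5, :5])    # inverse before the 6th example arrives
K_inv_new = grow_inverse(K_inv_old, K[:5, 5], K[5, 5])
```

Only matrix-vector products and one outer product are needed per new example, which is the source of the quadratic cost.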

  15. Roadmap • Background and Motivations • Modeling Multi-aspect • Computation Speedup • Empirical Evaluations • Conclusions

  16. Approximation Method (1) • Basic idea: compress the kernel matrix • Details (LIP-KIMA): 1) Separate decomposition 2) Make decomposition reusable 3) Apply decomposition on LIP-KIM • Steps annotated on the slide: Nyström approximation, SVD on X1, eigen-decomposition (twice) • Complexity: O(n^2) -> O(n)
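A small sketch of the standard Nyström approximation, the compression step named on this slide. It approximates the n x n kernel matrix from m landmark columns, so only n x m and m x m factors need to be stored. The RBF kernel, the random landmark choice, and the `rcond` cutoff are assumptions for illustration; the slide's separate and reusable decompositions for LIP-KIMA are not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_factors(X, landmark_idx, gamma=1.0):
    """Return C (n x m) and pinv(W) (m x m) such that K ~= C pinv(W) C^T."""
    C = rbf_kernel(X, X[landmark_idx], gamma)   # kernel against landmarks only
    W = C[landmark_idx]                         # landmark-landmark block
    return C, np.linalg.pinv(W, rcond=1e-10)    # cutoff guards tiny eigenvalues

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
idx = rng.choice(200, size=40, replace=False)   # 40 random landmarks

C, W_pinv = nystrom_factors(X, idx)
K_approx = C @ W_pinv @ C.T                     # formed here only to measure error
K_exact = rbf_kernel(X, X)
rel_error = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

In practice the full `K_approx` is never formed; downstream solves work with the thin factors, which is what makes linear cost in n possible.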

  17. Approximation Method (2) • Basic idea: filter less informative examples • Diagram: existing examples (x1, y1), (x2, y2), …, (xt, yt) give Model_t; new examples (xt+1, yt+1) pass through a filtering step before being combined with Model_t to give Model_t+1 • Details: LIP-KIMAA • Complexity: O(n) -> < O(n) (sub-linear)
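One way to picture the filtering step is to skip the model update whenever the current model already predicts a new example well. This is a hypothetical criterion chosen for illustration; the transcript does not state which filter LIP-KIMAA uses. The sketch applies it to linear recursive ridge regression, and the names `filtered_rls` and `tol` are assumptions.

```python
import numpy as np

def filtered_rls(X, y, lam=0.1, tol=0.05):
    """Recursive ridge regression that ignores examples it already predicts well.

    Filtering rule (an assumption, not the authors' rule): skip the update
    when the absolute prediction error is at most `tol`.
    """
    dim = X.shape[1]
    P = np.eye(dim) / lam
    w = np.zeros(dim)
    used = 0
    for x_t, y_t in zip(X, y):
        err = y_t - x_t @ w
        if abs(err) <= tol:
            continue                    # uninformative example: no update
        Px = P @ x_t
        gain = Px / (1.0 + x_t @ Px)
        w = w + gain * err
        P = P - np.outer(gain, Px)
        used += 1
    return w, used

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(500, 3))
y = X @ w_true + 0.01 * rng.normal(size=500)

w_filtered, n_used = filtered_rls(X, y)
weight_error = float(np.linalg.norm(w_filtered - w_true))
```

Because skipped examples cost only one prediction, the number of expensive updates grows more slowly than the number of arriving examples.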

  18. Summary • Naming: K = non-linearity, I = dynamics, M = coupling, A = approximation • Base model: Ridge Regression • One aspect: LIP-K (Kernel Ridge Regression), LIP-I (Recursive Ridge Regression), LIP-M (CoPs) • Two aspects: LIP-KI (Recursive Kernel Ridge Regression), LIP-IM, LIP-KM with O(n^3) • Three aspects: LIP-KIM with O(n^2) • With approximation: LIP-KIMA with O(n), LIP-KIMAA with < O(n)

  19. Roadmap • Background and Motivations • Modeling Multi-aspect • Computation Speedup • Empirical Evaluations • Conclusions

  20. Experiment Setup • Datasets (http://blog.stackoverflow.com/category/cc-wiki-dump/): Stack Overflow, Mathematics Stack Exchange • Features: content (bag-of-words) and contextual features • Data split over time (diagram labels: initial set, incremental set, training set, test set)

  21. Evaluation Objectives • O1: Effectiveness • How accurate are the proposed algorithms for long-term impact prediction? • O2: Efficiency • How scalable are the proposed algorithms?

  22. Effectiveness Results • Comparisons with existing models • Our methods perform better

  23. Efficiency Results • The speed comparisons • Our methods are faster; LIP-KIMAA scales sub-linearly

  24. Quality-Speed Trade-off • Our methods give the better balance

  25. Roadmap • Background and Motivations • Modeling Multi-aspect • Computation Speedup • Empirical Evaluations • Conclusions

  26. Conclusions • A family of algorithms for long-term impact prediction of CQA posts • Q1: How to capture coupling, non-linearity, and dynamics? A1: voting consistency + kernel trick + recursive updating • Q2: How to make the algorithms scalable? A2: approximation methods • Empirical evaluations: effectiveness, up to 35.8% improvement; efficiency, up to 390x speedup and sub-linear scalability

  27. Thanks! Q&A • Yuan Yao, yyao@smail.nju.edu.cn • Authors: Yuan Yao, Hanghang Tong, Feng Xu, and Jian Lu
