This study presents a family of algorithms for predicting the long-term impact of CQA posts by capturing coupling, non-linearity, and dynamics. The algorithms aim to be scalable and efficient, achieving up to 35.8% improvement in effectiveness and 390x speedup in efficiency.
Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint
Yuan Yao
Joint work with Hanghang Tong, Feng Xu, and Jian Lu
KDD 2014, Aug 24-27
Roadmap
• Background and Motivations
• Modeling Multi-aspect
• Computation Speedup
• Empirical Evaluations
• Conclusions
CQA (community question answering)
Long-Term Impact
• What is the long-term impact of a Q/A post?
• Q: How many users will find it beneficial?
Challenges
• Q: Why not off-the-shelf data mining algorithms?
• Challenge 1: Multi-aspect
  • C1.1. Coupling between questions and answers
  • C1.2. Feature non-linearity
  • C1.3. Posts arrive dynamically
• Challenge 2: Efficiency
C1.1 Coupling
• Strong positive correlation! [Yao+@ASONAM’14]
• [Figure: question features Fq and predicted question impact Yq (1); answer features Fa and predicted answer impact Ya (2); voting consistency (3)]
[Yao+@ASONAM’14] Y. Yao, H. Tong, T. Xie, L. Akoglu, F. Xu, J. Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM 2014.
C1.1 Coupling
• LIP-M [Yao+@ASONAM’14]: an objective with (1) a question prediction term, (2) an answer prediction term, (3) a voting consistency term, and a regularization term
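The four terms named on the LIP-M slide (question prediction, answer prediction, voting consistency, regularization) can be illustrated with a small coupled ridge regression. This is only a sketch: the objective in the docstring, the function name `fit_coupled_ridge`, the link matrix `M`, and the weights `theta` and `lam` are assumptions made for illustration, not the formula from the slides or the cited paper.

```python
import numpy as np

def fit_coupled_ridge(Fq, yq, Fa, ya, M, theta=1.0, lam=0.1):
    """Jointly fit question weights wq and answer weights wa.

    Hypothetical objective (an assumption, not the paper's exact formula):
        ||Fq wq - yq||^2 + ||Fa wa - ya||^2
        + theta * ||M Fq wq - Fa wa||^2 + lam * (||wq||^2 + ||wa||^2)
    M is an (n_answers x n_questions) 0/1 matrix linking each answer
    to the question it belongs to.
    """
    dq, da = Fq.shape[1], Fa.shape[1]
    # Stack the three squared-error terms into one system A w ~ b,
    # where w = [wq; wa].
    A = np.vstack([
        np.hstack([Fq, np.zeros((Fq.shape[0], da))]),   # question prediction
        np.hstack([np.zeros((Fa.shape[0], dq)), Fa]),   # answer prediction
        np.sqrt(theta) * np.hstack([M @ Fq, -Fa]),      # voting consistency
    ])
    b = np.concatenate([yq, ya, np.zeros(Fa.shape[0])])
    # Ridge-regularized normal equations.
    w = np.linalg.solve(A.T @ A + lam * np.eye(dq + da), A.T @ b)
    return w[:dq], w[dq:]
```

With `theta=0` the two problems decouple into independent ridge regressions; larger `theta` pulls each answer's prediction toward the prediction for its question.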
C1.2 Non-linearity
• The kernel trick (e.g., SVM)
• Mercer kernel
• Kernel matrix as new feature matrix
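A minimal sketch of "kernel matrix as new feature matrix", assuming a Gaussian (RBF) kernel and a toy XOR dataset. The kernel choice, `gamma`, and the data are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq_dist = ((X ** 2).sum(axis=1)[:, None]
               + (Z ** 2).sum(axis=1)[None, :]
               - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# XOR targets: no linear function of the raw features fits them exactly.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

# Linear model on raw features (plus a bias column) for contrast.
Xb = np.hstack([X, np.ones((4, 1))])
w_lin = np.linalg.lstsq(Xb, y, rcond=None)[0]
pred_lin = Xb @ w_lin

# Linear model on the kernel matrix, used as a 4-column feature matrix.
K = rbf_kernel(X, X)
w_ker = np.linalg.solve(K, y)
pred_ker = K @ w_ker
```

A Mercer kernel yields a symmetric positive semi-definite `K`, so a linear model on its columns is well defined. Here it fits the XOR targets, while the raw-feature model cannot.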
C1.3 Dynamics
• Existing examples (x_1, y_1), (x_2, y_2), …, (x_t, y_t) give the current model Model_t
• Model_t + new examples (x_{t+1}, y_{t+1}) give the new model Model_{t+1}
• Solution: recursive least squares regression [Haykin2005]
[Haykin2005] S. Haykin. Adaptive filter theory. 2005.
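The recursive update can be sketched with the standard recursive least squares recursion, shown here in its ridge-regularized form without a forgetting factor. The class name `RecursiveRidge` and the parameter `lam` are illustrative assumptions.

```python
import numpy as np

class RecursiveRidge:
    """Recursive least squares with ridge initialization (no forgetting factor)."""

    def __init__(self, dim, lam=1.0):
        self.w = np.zeros(dim)
        # P tracks the inverse of (lam * I + sum of x x^T over seen examples).
        self.P = np.eye(dim) / lam

    def update(self, x, y):
        """Fold one new example (x, y) into the model in O(dim^2) time."""
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        self.w = self.w + gain * (y - x @ self.w)   # correct by the prediction error
        self.P = self.P - np.outer(gain, Px)        # rank-one update of the inverse
        return self.w
```

After processing t examples one at a time, `w` matches the batch ridge solution on those t examples, so Model_{t+1} is obtained from Model_t without retraining from scratch.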
This Paper
• Q1: How to comprehensively capture the multiple aspects in one algorithm?
  • Coupling, non-linearity, and dynamics
• Q2: How to make the long-term impact prediction algorithm efficient?
Modeling Non-linearity
• Basic idea: kernelize LIP-M
• Details - LIP-KM: objective with (1) question prediction, (2) answer prediction, (3) voting consistency, and regularization terms
• Closed-form solution
• Complexity: O(n^3)
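The LIP-KM closed form itself is not reproduced in this transcript. As a simpler stand-in, the sketch below shows plain kernel ridge regression (the LIP-K case without the coupling term), whose closed form has the same O(n^3) cost from solving an n x n system. The function names and `lam` are illustrative assumptions.

```python
import numpy as np

def fit_kernel_ridge(K, y, lam=1.0):
    """Closed-form dual weights alpha = (K + lam * I)^{-1} y.

    K is the n x n kernel matrix of the training examples; solving the
    n x n linear system is the O(n^3) step.
    """
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def predict_kernel_ridge(K_test_train, alpha):
    """Predict from kernel values between test rows and training rows."""
    return K_test_train @ alpha

# Usage with a linear kernel K = X X^T, where the result coincides
# with ordinary ridge regression on the raw features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = rng.normal(size=20)
X_test = rng.normal(size=(5, 3))

alpha = fit_kernel_ridge(X_train @ X_train.T, y_train, lam=0.3)
pred = predict_kernel_ridge(X_test @ X_train.T, alpha)
```

Swapping the linear kernel for a non-linear Mercer kernel changes only how `K` and `K_test_train` are computed; the solve step stays the same.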
Modeling Dynamics
• Basic idea: recursively update LIP-KM (matrix inverse lemma)
• Details - LIP-KIM
• Complexity: O(n^3) -> O(n^2)
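The drop from O(n^3) to O(n^2) can be illustrated with a block-inverse (Schur complement) identity, which is closely related to the matrix inverse lemma. When one example is added, the regularized kernel matrix gains one row and one column, and its inverse can be updated rather than recomputed. This is a generic sketch, not the LIP-KIM update itself; `grow_inverse` and its arguments are hypothetical names.

```python
import numpy as np

def grow_inverse(A_inv, b, c):
    """Inverse of the symmetric matrix [[A, b], [b^T, c]] given A^{-1}.

    A_inv : (n, n) inverse of the current symmetric matrix A
    b     : (n,)   new off-diagonal column (kernel values to the new example)
    c     : float  new diagonal entry (kernel of the new example plus regularizer)
    Cost is O(n^2) instead of the O(n^3) of inverting from scratch.
    """
    n = A_inv.shape[0]
    u = A_inv @ b
    s = c - b @ u                      # scalar Schur complement
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = A_inv + np.outer(u, u) / s
    out[:n, n] = -u / s
    out[n, :n] = -u / s
    out[n, n] = 1.0 / s
    return out
```

Only matrix-vector and outer products appear, which is where the quadratic cost per new example comes from.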
Approximation Method (1)
• Basic idea: compress the kernel matrix
• Details - LIP-KIMA:
  • 1) Separate decomposition
  • 2) Make decomposition reusable
  • 3) Apply decomposition on LIP-KIM
• Techniques annotated on the derivation: Nyström approximation, SVD on X1, eigen-decomposition
• Complexity: O(n^2) -> O(n)
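A minimal sketch of the Nyström approximation named on this slide, assuming an RBF kernel and a hand-picked set of m landmark points. It shows the generic technique only, not the three LIP-KIMA steps; the function names, `gamma`, and `eps` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq_dist = ((X ** 2).sum(axis=1)[:, None]
               + (Z ** 2).sum(axis=1)[None, :]
               - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def nystrom_features(X, landmarks, gamma=1.0, eps=1e-10):
    """Return Z (n x r, r <= m) such that Z @ Z.T approximates the kernel matrix.

    Uses K ~ C W^+ C^T with C = k(X, landmarks) and W = k(landmarks, landmarks).
    """
    C = rbf_kernel(X, landmarks, gamma)            # n x m
    W = rbf_kernel(landmarks, landmarks, gamma)    # m x m
    vals, vecs = np.linalg.eigh(W)                 # eigen-decomposition of W
    keep = vals > eps                              # drop numerically null directions
    return C @ (vecs[:, keep] / np.sqrt(vals[keep]))

# Usage: 12 points compressed through 4 landmarks.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
idx = np.array([0, 3, 6, 9])
Z = nystrom_features(X, X[idx])
K_approx = Z @ Z.T
```

The n x n kernel matrix is never formed. For a fixed number of landmarks m, building `Z` and fitting a ridge model on it scales linearly in n.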
Approximation Method (2)
• Basic idea: filter less informative examples
• [Figure: existing examples (x_1, y_1), (x_2, y_2), …, (x_t, y_t) give the current model Model_t; new examples (x_{t+1}, y_{t+1}) pass through filtering before being combined with Model_t into the new model Model_{t+1}]
• Details - LIP-KIMAA
• Complexity: O(n) -> sub-linear (< O(n))
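The slides do not state the filtering criterion. The sketch below assumes one plausible rule for illustration only: skip a new example when the current model already predicts it within a tolerance `tol`. The rule, the function name `filtered_rls`, and all parameters are assumptions, not the LIP-KIMAA criterion.

```python
import numpy as np

def filtered_rls(stream, dim, lam=1e-6, tol=0.1):
    """Recursive ridge regression that skips examples it already predicts well.

    The skip rule (|prediction error| <= tol) is a hypothetical stand-in
    for the filtering step; the slides do not specify the actual rule.
    Returns the weights and the number of examples actually used.
    """
    w = np.zeros(dim)
    P = np.eye(dim) / lam
    used = 0
    for x, y in stream:
        err = y - x @ w
        if abs(err) <= tol:
            continue                    # filtered out: no model update
        Px = P @ x
        gain = Px / (1.0 + x @ Px)
        w = w + gain * err
        P = P - np.outer(gain, Px)
        used += 1
    return w, used

# Usage on a noiseless linear stream.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
w_hat, used = filtered_rls(zip(X, X @ w_true), dim=3)
```

Because updates happen only for examples that pass the filter, the update cost grows with the number of used examples rather than the number of arriving ones.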
Summary
• Legend: K: Non-linearity, I: Dynamics, M: Coupling, A: Approximation
• Base model: Ridge Regression
• One aspect: LIP-M (CoPs), LIP-K (Kernel Ridge Regression), LIP-I (Recursive Ridge Regression)
• Two aspects: LIP-KM, O(n^3); LIP-KI (Recursive Kernel Ridge Regression); LIP-IM
• Three aspects: LIP-KIM, O(n^2)
• With approximation: LIP-KIMA, O(n); LIP-KIMAA, < O(n)
Experiment Setup
• Datasets (http://blog.stackoverflow.com/category/cc-wiki-dump/)
  • Stack Overflow, Mathematics Stack Exchange
• Features
  • Content (bag-of-words) & contextual features
• [Figure: timeline with initial set, incremental set, training set, and test set]
Evaluation Objectives
• O1: Effectiveness
  • How accurate are the proposed algorithms for long-term impact prediction?
• O2: Efficiency
  • How scalable are the proposed algorithms?
Effectiveness Results
• [Figure: comparisons with existing models; our methods are better]
Efficiency Results
• [Figure: speed comparisons; ours are better, and LIP-KIMAA is sub-linear]
Quality-Speed Trade-off
• [Figure: quality versus speed; our methods are better]
Conclusions
• A family of algorithms for long-term impact prediction of CQA posts
  • Q1: How to capture coupling, non-linearity, and dynamics?
  • A1: Voting consistency + kernel trick + recursive updating
  • Q2: How to make the algorithms scalable?
  • A2: Approximation methods
• Empirical evaluations
  • Effectiveness: up to 35.8% improvement
  • Efficiency: up to 390x speedup and sub-linear scalability
Thanks! Q&A
• Yuan Yao, yyao@smail.nju.edu.cn
• Authors: Yuan Yao, Hanghang Tong, Feng Xu, and Jian Lu