Introduction to Recommender System Guo, Guangming guogg.good@gmail.com
Outline • Background & Definition • Some history worth noting • Various applications • Main-stream approach • Evaluation • Some resources Lab of Semantic Computing and Data Mining
Outline • Background & Definition • Related areas • Challenges • Paradigms • Some history worth noting • Various applications • Main-stream approach • Evaluation • Some resources
Getting clear on the basic concepts • The first step of learning • Building blocks for new ideas • Define the rules of the game • A prerequisite for communication
Definition of Recommender Systems • Also called recommendation systems • A subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item (such as music, books, or movies) or social element (e.g. people or groups) they had not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches). --http://en.wikipedia.org/wiki/Recommender
More facts • An important vertical technique in data mining • One of the most successful solutions in industry • Became an independent research area in the 1990s • Many highly reputed academic conferences, such as SIGIR, KDD, ICML, WWW, and EMNLP, list it as a subtopic • The RecSys conference is fully devoted to this area • Data mining/machine learning approaches • 1) specifying heuristics that define the utility function and empirically validating its performance • 2) estimating the utility function that optimizes a certain performance criterion, such as the mean squared error
Challenges • Cold start • Long tail • Data sparsity • Scalability • Social & temporal • Context-aware • Personality-aware • Being accurate is not enough
Related Research Areas • Cognitive science • Text mining • Natural language processing • Information retrieval • Machine learning • Association mining • Approximation theory • Management science • Consumer choice in marketing
Paradigms of RecSys • Content-based recommendations: recommend items similar to the ones the user preferred in the past • Collaborative recommendations: recommend items that people with similar tastes and preferences liked in the past • Knowledge-based recommendations: recommend items based on existing knowledge models that fit the users' needs • Hybrid approaches: combine various input data and/or compose various mechanisms
Background • A universal problem of the information age • Information overload • From search engines (SE) to RecSys • Pull vs. push • Web 1.0 vs. Web 2.0 • Leverage existing user-generated data • User profiles • Behavior history on the web, ratings • Click-through data, browsing data • Great benefits (win-win) • Help users find valuable information • Help businesses make more profit
Outline • Background & Definition • Some history worth noting • Netflix prize • Various applications • Main-stream approach • Evaluation • Some resources
A peak in the history • Research on collaborative filtering algorithms reached a peak during the Netflix movie recommendation competition • October 2, 2006 to September 21, 2009 • Evaluation metric: RMSE (root mean squared error) • Must outperform the baseline by 10%
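As a rough illustration of the competition criterion, the sketch below computes RMSE for two sets of predictions and the relative improvement of one over the other. The rating lists and the names `rmse` and `relative_improvement` are made up for this example; they are not taken from the slides or from the Netflix data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers RMSE relative to the baseline."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Hypothetical toy data: true ratings and two competing predictors.
actual = [4, 3, 5, 1]
baseline_pred = [3, 3, 3, 3]   # assumed baseline: always predict 3
candidate_pred = [4, 3, 4, 2]  # assumed improved model

baseline_score = rmse(baseline_pred, actual)
candidate_score = rmse(candidate_pred, actual)
improvement = relative_improvement(baseline_score, candidate_score)
print(f"baseline={baseline_score:.3f} candidate={candidate_score:.3f} "
      f"improvement={improvement:.1%}")
```

Under this reading, a submission qualifies when `improvement` is at least 0.10.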
The Million Dollar Programming Prize • The Netflix Prize • Greatly energized research in RecSys • Lasted from 2006 to 2009 • Finalists: • BellKor's Pragmatic Chaos team (the eventual winner) • A joint team • Andreas Töscher and Michael Jahrer (Commendo Research & Consulting GmbH), originally team BigChaos • Robert Bell and Chris Volinsky (AT&T) and Yehuda Koren (Yahoo), originally team BellKor • Martin Piotte and Martin Chabbert, originally team Pragmatic Theory • The Ensemble team • The most accurate algorithm in 2007 used an ensemble method of 107 different algorithmic approaches
Outline • Background & Definition • Some history worth noting • Various applications • Main-stream approach • Evaluation • Some resources
Existing applications • News/article recommendation • Targeted advertising • Tag recommendation • Mobile recommendation • E-commerce • Books, movies, music…
Benefits • An alternative to search engines • Boost profits • Amazon and others • Better user experience
Outline • Background & Definition • Some history worth noting • Various applications • Main-stream approach • Content-based • Collaborative filtering • Evaluation • Some resources
Content-based • Simply compute the similarity between items • Cosine similarity or Pearson correlation coefficient • TF-IDF • Utilize dimensionality reduction • LDA
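A minimal sketch of the TF-IDF plus cosine similarity idea, in plain Python. The item names, their keyword descriptions, and the particular TF-IDF weighting (term frequency times log(N / document frequency)) are assumptions chosen for illustration; real systems often use smoothed variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical item descriptions, already tokenized.
items = {
    "space_opera": "space ship alien war".split(),
    "alien_thriller": "alien ship horror".split(),
    "romcom": "love wedding comedy".split(),
}
names = list(items)
vecs = dict(zip(names, tfidf_vectors([items[n] for n in names])))

# Recommend the items most similar to one the user liked in the past.
liked = "space_opera"
ranked = sorted(
    (n for n in names if n != liked),
    key=lambda n: cosine(vecs[liked], vecs[n]),
    reverse=True,
)
print(ranked)
```

The item sharing the most distinctive terms with the liked item comes first; items with no terms in common get a similarity of zero.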
Collaborative filtering • Association mining • Memory-based • Nearest neighbors • Model-based • Latent factor model • Some comparisons • Space & time • Theoretical foundation and interpretability
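The memory-based (nearest-neighbor) variant can be sketched as follows: predict a user's rating for an item as the similarity-weighted average of the ratings given by the k most similar users. The toy ratings, the choice of cosine similarity over co-rated items, and the function names are assumptions for illustration only.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_sim(r_u, r_v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings, k=2):
    """Similarity-weighted average over the k nearest users who rated item."""
    neighbours = sorted(
        (
            (cosine_sim(ratings[user], other_ratings), other_ratings[item])
            for other, other_ratings in ratings.items()
            if other != user and item in other_ratings
        ),
        reverse=True,
    )[:k]
    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim == 0:
        return None  # no usable neighbours for this item
    return sum(sim * rating for sim, rating in neighbours) / total_sim


prediction = predict("alice", "D", ratings)
print(f"predicted rating of alice for D: {prediction:.2f}")
```

Every prediction scans the stored ratings, which is why memory-based methods tend to cost more space and time at query time than a trained model.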
Latent factor model • LSI, pLSA, LDA, latent class models, topic models, etc. • A method based on matrix factorization (decomposition): the rating matrix R is approximated by the product of two smaller matrices P and Q obtained through dimension reduction • A low-rank approximation of the original matrix
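The formula on this slide did not survive in the transcript. One common way to write such a factorization is given here as an assumption, not as the slide's exact notation: P is taken to be an f × m matrix of user factors, Q an f × n matrix of item factors, and f the (small) number of latent factors.

```latex
R \approx \hat{R} = P^{\top} Q,
\qquad
\hat{r}_{ui} = p_u^{\top} q_i = \sum_{k=1}^{f} p_{uk}\, q_{ik}
```

Here p_u is the column of P for user u and q_i is the column of Q for item i, so each predicted rating is the inner product of a user vector and an item vector.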
Computations • Traditional SVD • Needs a simple method to complete (fill in) the rating matrix first • Computing the SVD of the completed dense matrix is very costly • The situation changed in 2006, after the Netflix Prize began • Simon Funk • Defined a cost function C(p,q) on the training data only • Added a regularization term to avoid overfitting • Used gradient descent to minimize C(p,q)
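A minimal sketch of this style of training, assuming the usual regularized squared-error cost C(p,q) = Σ (r_ui − p_u·q_i)² + λ(‖p_u‖² + ‖q_i‖²) summed over observed ratings, and optimized with stochastic gradient descent. The toy ratings, hyperparameters, and function names are illustrative assumptions, not Simon Funk's original code.

```python
import math
import random


def predict(p, q, user, item):
    """Predicted rating: inner product of user and item factor vectors."""
    return sum(pk * qk for pk, qk in zip(p[user], q[item]))


def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, n_epochs=1000, seed=0):
    """Learn latent factors from (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small positive random initialization (an arbitrary choice).
    p = {u: [rng.uniform(0.1, 1.0) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(0.1, 1.0) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for k in range(n_factors):
                pk, qk = p[u][k], q[i][k]
                # Gradient step on the squared error plus L2 penalty.
                p[u][k] += lr * (err * qk - reg * pk)
                q[i][k] += lr * (err * pk - reg * qk)
    return p, q


def training_rmse(p, q, ratings):
    """RMSE of the factor model on the observed ratings."""
    squared = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / len(ratings))


# Hypothetical observed ratings; unobserved (user, item) pairs are skipped.
observed = [
    ("alice", "A", 5), ("alice", "B", 3), ("alice", "C", 4),
    ("bob", "A", 4), ("bob", "B", 2), ("bob", "C", 5), ("bob", "D", 4),
    ("carol", "A", 1), ("carol", "B", 5), ("carol", "D", 2),
]

p0, q0 = train_mf(observed, n_epochs=0)  # untrained factors, for comparison
p, q = train_mf(observed)
print("RMSE before training:", round(training_rmse(p0, q0, observed), 3))
print("RMSE after training: ", round(training_rmse(p, q, observed), 3))
print("alice on D:", round(predict(p, q, "alice", "D"), 2))
```

Because the loop only visits observed ratings, no matrix completion step is needed and the cost per epoch grows with the number of ratings rather than with the full size of the matrix.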
Outline • Background & Definition • Some history worth noting • Various applications • Main-stream approach • Evaluation • Some resources
Evaluation Criteria • User satisfaction, measured by questionnaire • Precision • RMSE • Top-k • Coverage • Diversity • Novelty • Serendipity • A recommendation that at first seems to make no sense to the user • …
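Two of the criteria listed here, top-k precision and coverage, can be sketched in a few lines. The definitions used (hits in the top k divided by k, and the share of the catalog that appears in at least one recommendation list) are common ones but not the only ones, and the data below is invented for the example.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that the user actually liked."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def catalog_coverage(recommendation_lists, catalog):
    """Share of catalog items that appear in at least one recommendation."""
    recommended = set().union(*recommendation_lists)
    return len(recommended & set(catalog)) / len(catalog)


# Hypothetical ranked recommendations and held-out liked items per user.
recs = {"alice": ["A", "B", "C"], "bob": ["A", "D", "B"]}
relevant = {"alice": {"A", "C"}, "bob": {"B"}}
catalog = ["A", "B", "C", "D", "E"]

for user in recs:
    print(user, precision_at_k(recs[user], relevant[user], k=3))
print("coverage:", catalog_coverage(recs.values(), catalog))
```

A system can score well on precision while recommending only a handful of popular items, which is one reason coverage, diversity, and novelty are tracked alongside accuracy.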
Outline • Background & Definition • Some history worth noting • Various applications • Main-stream approach • Evaluation • Some resources
Hulu: Xiang Liang (项亮)
Resources • www.recsyswiki.com • A roundup of materials on the major recommendation engines, by 大魁 (Dakui) • http://blog.csdn.net/lzt1983/article/details/7914536