The Summary of My Work in My First Year of Graduate School Reporter: Yuanshuai Sun E-mail: sunyuan_2008@yahoo.cn
Content
1. Recommender System
2. KNN Algorithm—CF
3. Matrix Factorization
4. MF on Hadoop
5. Thesis Framework
Recommender System 1 A recommender system is a system that recommends items you may be interested in but have not yet tried. For example, if you have bought a book about machine learning, the system might give you a recommendation list including books about data mining, pattern recognition, and even programming.
Recommender System 1 But how does the system get the recommendation list? Purchased: Machine Learning. Recommendation list: 1. Nuclear Pattern Recognition Method and Its Application 2. Introduction to Robotics 3. Data Mining 4. Beauty of Programming 5. Artificial Intelligence
Recommender System 1 There are many ways to obtain the list. Recommender systems are usually classified into the following categories, based on how recommendations are made: 1. Content-based recommendations: the user is recommended items similar to the ones he or she preferred in the past;
Recommender System 1 2. Collaborative recommendations: the user is recommended items that people with similar tastes and preferences liked in the past. [Figure: co-rated items link the target user to a similar user; the Top-1 item the similar user liked but the target user has not bought is recommended.]
Recommender System 1 3. Hybrid approaches: these methods combine collaborative and content-based methods, which helps to avoid certain limitations of each. The different ways to combine collaborative and content-based methods into a hybrid recommender system can be classified as follows:
1) implementing collaborative and content-based methods separately and combining their predictions;
2) incorporating some content-based characteristics into a collaborative approach;
3) incorporating some collaborative characteristics into a content-based approach;
4) constructing a general unifying model that incorporates both content-based and collaborative characteristics.
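As an illustration of combination strategy 1) above, a minimal sketch of merging the two separately computed predictions by a weighted average (the function name, the scores, and the weight `alpha` are illustrative assumptions, not from the slides):

```python
# Hybrid strategy 1): combine the prediction of a collaborative
# recommender (cf_score) and a content-based recommender (cb_score)
# by a weighted average. alpha is an assumed mixing weight.

def hybrid_score(cf_score: float, cb_score: float, alpha: float = 0.5) -> float:
    """Weighted combination of a collaborative and a content-based prediction."""
    return alpha * cf_score + (1.0 - alpha) * cb_score

print(hybrid_score(4.0, 2.0, alpha=0.75))  # 3.5
```

In practice `alpha` would be tuned on held-out data, or replaced by a learned combiner.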
KNN Algorithm—CF 2 KDD CUP 2011 website: http://kddcup.yahoo.com/index.php Recommending music items based on the Yahoo! Music dataset. The dataset is split into two subsets:
- Train data: in the file trainIdx2.txt
- Test data: in the file testIdx2.txt
In each subset, user rating data is grouped by user. The first line for a user is formatted as: <UserId>|<#UserRatings>\n. Each of the next <#UserRatings> lines describes a single rating by <UserId>, with the format: <ItemId>\t<Score>\n. The scores are integers between 0 and 100, and are withheld from the test set. All user ids and item ids are consecutive integers, both starting at zero.
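The per-user format above can be read with a short generator. A minimal sketch, assuming the file has already been read into a sequence of lines (the function name and the sample data are illustrative):

```python
# Parse the KDD Cup 2011 rating format: one "<UserId>|<#UserRatings>"
# header line per user, followed by that many "<ItemId>\t<Score>" lines.

def parse_ratings(lines):
    """Yield (user_id, item_id, score) triples from the grouped-by-user format."""
    it = iter(lines)
    for header in it:
        user_id, n_ratings = header.rstrip("\n").split("|")
        for _ in range(int(n_ratings)):
            item_id, score = next(it).rstrip("\n").split("\t")
            yield int(user_id), int(item_id), int(score)

sample = ["0|2\n", "5\t80\n", "9\t0\n", "1|1\n", "5\t100\n"]
print(list(parse_ratings(sample)))
# [(0, 5, 80), (0, 9, 0), (1, 5, 100)]
```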
KNN Algorithm—CF 2 KNN is the algorithm I used when I participated in KDD CUP 2011 with my advisor Mrs. Lin; KNN belongs to collaborative recommendation. [Figure: co-rated items link the target user to a similar user; the Top-1 song the similar user likes but the target user has not heard is recommended.]
KNN Algorithm—CF 2 [Figure: the user–item rating matrix, with users as rows and items as columns.]
KNN Algorithm—CF 2 Two similarity measures between users x and y are used: 1. Cosine distance 2. Pearson correlation coefficient, where $S_{xy}$ is the set of all items co-rated by both users x and y.
KNN Algorithm—CF 2 1. Cosine distance: $\mathrm{sim}(x,y) = \frac{\sum_{i \in S_{xy}} r_{x,i}\, r_{y,i}}{\sqrt{\sum_{i \in S_{xy}} r_{x,i}^2}\, \sqrt{\sum_{i \in S_{xy}} r_{y,i}^2}}$, where $r_{x,i}$ and $r_{y,i}$ are the ratings of users x and y on item i.
KNN Algorithm—CF 2 2. Pearson correlation coefficient: $\mathrm{sim}(x,y) = \frac{\sum_{i \in S_{xy}} (r_{x,i} - \bar{r}_x)(r_{y,i} - \bar{r}_y)}{\sqrt{\sum_{i \in S_{xy}} (r_{x,i} - \bar{r}_x)^2}\, \sqrt{\sum_{i \in S_{xy}} (r_{y,i} - \bar{r}_y)^2}}$, where $\bar{r}_x = \frac{1}{|S_{xy}|}\sum_{i \in S_{xy}} r_{x,i}$ and $\bar{r}_y = \frac{1}{|S_{xy}|}\sum_{i \in S_{xy}} r_{y,i}$.
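Both measures above are computed only over $S_{xy}$, the items co-rated by both users. A minimal sketch, representing each user's ratings as a dict mapping item id to score (the function names and the sample ratings are illustrative):

```python
import math

def cosine_sim(rx, ry):
    """Cosine similarity over the items co-rated by both users."""
    shared = rx.keys() & ry.keys()          # S_xy
    if not shared:
        return 0.0
    num = sum(rx[i] * ry[i] for i in shared)
    den = (math.sqrt(sum(rx[i] ** 2 for i in shared))
           * math.sqrt(sum(ry[i] ** 2 for i in shared)))
    return num / den

def pearson_sim(rx, ry):
    """Pearson correlation over the items co-rated by both users."""
    shared = rx.keys() & ry.keys()          # S_xy
    if len(shared) < 2:
        return 0.0
    mx = sum(rx[i] for i in shared) / len(shared)
    my = sum(ry[i] for i in shared) / len(shared)
    num = sum((rx[i] - mx) * (ry[i] - my) for i in shared)
    den = (math.sqrt(sum((rx[i] - mx) ** 2 for i in shared))
           * math.sqrt(sum((ry[i] - my) ** 2 for i in shared)))
    return num / den if den else 0.0

x = {1: 80, 2: 40, 3: 100}   # user x's ratings
y = {1: 90, 2: 30, 4: 50}    # user y's ratings; S_xy = {1, 2}
print(cosine_sim(x, y), pearson_sim(x, y))
```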
KNN Algorithm—CF 2 Item metadata files:
- trackData.txt: Track information formatted as <TrackId>|<AlbumId>|<ArtistId>|<Optional GenreId_1>|...|<Optional GenreId_k>\n
- albumData.txt: Album information formatted as <AlbumId>|<ArtistId>|<Optional GenreId_1>|...|<Optional GenreId_k>\n
- artistData.txt: Artist listing formatted as <ArtistId>\n
- genreData.txt: Genre listing formatted as <GenreId>\n
KNN Algorithm—CF 2 Using the item taxonomy (track/album/artist/genre): 1. The distance between a parent node and a child node, defined in terms of information entropy. 2. The similarity between two categories c1 and c2.
Matrix Factorization 3 The users feature matrix (rows u1, u2, u3) multiplied by the items feature matrix (columns i1, i2, i3) reproduces the known ratings and predicts the unknown ones:
Known ratings: x11*y11 + x12*y12 = 1, x11*y21 + x12*y22 = 3, x21*y11 + x22*y12 = 2, x31*y21 + x32*y22 = 1, x31*y31 + x32*y32 = 3
To predict: x11*y31 + x12*y32 = ?, x21*y21 + x22*y22 = ?, x21*y31 + x22*y32 = ?, x31*y11 + x32*y12 = ?
Matrix Factorization 3 Matrix factorization (abbr. MF), just as the name suggests, decomposes a big matrix into the product of several smaller matrices. It is defined mathematically as follows: we assume the target matrix $R \in \mathbb{R}^{m \times n}$ and the factor matrices $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$, where $k \ll \min(m, n)$, so that $R \approx U V^T$.
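The definition above can be sketched in a few lines of NumPy: two small random factor matrices whose product has the shape of the target matrix (the dimensions and random initialization are illustrative, not from the slides):

```python
import numpy as np

# R (m x n) is approximated by U V^T with U (m x k), V (n x k)
# and k << min(m, n). Shapes here are illustrative.
m, n, k = 4, 5, 2
rng = np.random.default_rng(0)
U = rng.standard_normal((m, k))   # users feature matrix
V = rng.standard_normal((n, k))   # items feature matrix
R_hat = U @ V.T                   # the low-rank approximation of R
print(R_hat.shape)                # (4, 5)
```

The point of $k \ll \min(m, n)$ is compression: $U$ and $V$ together hold $(m + n)k$ numbers instead of the $mn$ entries of $R$.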
Matrix Factorization 3 Kernel Function The kernel function decides how the prediction matrix $\hat{R}$ is computed; that is, it is a function with the feature matrices U and V as arguments. We can express it as $\hat{r}_{ij} = K(u_i, v_j)$, where $u_i$ and $v_j$ are the feature vectors of user i and item j.
Matrix Factorization 3 Kernel Function For the kernel K one can use one of the following well-known kernels:
- linear: $K(u, v) = \langle u, v \rangle$
- polynomial: $K(u, v) = (1 + \langle u, v \rangle)^d$
- RBF: $K(u, v) = \exp\left(-\frac{\|u - v\|^2}{2\sigma^2}\right)$
- logistic: $K(u, v) = s(\langle u, v \rangle)$ with $s(x) = \frac{1}{1 + e^{-x}}$
Matrix Factorization 3 We quantify the quality of the approximation with the Euclidean distance, so we get the objective function $f(U, V) = \sum_{(i,j) \in \mathcal{O}} (r_{ij} - \hat{r}_{ij})^2$, where $\mathcal{O}$ is the set of observed ratings and $\hat{r}_{ij}$ is the predicted value.
Matrix Factorization 3 1. Alternating Descent Method This method only works when the loss function is based on the Euclidean distance. Fixing V and setting $\partial f / \partial U = 0$ gives the closed-form update $U = R V (V^T V)^{-1}$; the same applies to V.
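One alternating step can be sketched in NumPy. This is a simplified sketch that assumes a fully observed R and no regularization; the dimensions and random data are illustrative:

```python
import numpy as np

def als_step(R, U, V):
    """One alternating-descent sweep: solve for U with V fixed, then for V."""
    U = R @ V @ np.linalg.inv(V.T @ V)      # least-squares solution for U
    V = R.T @ U @ np.linalg.inv(U.T @ U)    # least-squares solution for V
    return U, V

rng = np.random.default_rng(1)
R = rng.random((6, 5))                       # fully observed target matrix
U, V = rng.random((6, 2)), rng.random((5, 2))
for _ in range(20):
    U, V = als_step(R, U, V)
err = np.linalg.norm(R - U @ V.T)            # residual of the rank-2 fit
print(err)
```

With missing entries, each row of U would instead be solved from only that user's observed ratings, which is why the method pairs naturally with the Euclidean loss.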
Matrix Factorization 3 2. Gradient Descent Method The update rule for U is defined as $U \leftarrow U - \eta \frac{\partial f}{\partial U}$, where $\frac{\partial f}{\partial U} = -2\,(R - U V^T)\,V$. The same applies to V.
Matrix Factorization 3 Stochastic Gradient Algorithm vs. Gradient Algorithm: the batch gradient algorithm updates U and V once per pass over all ratings, while the stochastic version updates the factors after every single observed rating.
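The stochastic variant can be sketched as follows: for each observed rating, compute the error $e = r_{ij} - \langle u_i, v_j \rangle$ and move both factors along the negative gradient. The learning rate, dimensions, and toy ratings are illustrative assumptions:

```python
import numpy as np

def sgd_epoch(ratings, U, V, eta=0.05):
    """One stochastic pass: update U and V after every single rating."""
    for i, j, r in ratings:
        e = r - U[i] @ V[j]          # prediction error for this rating
        u_old = U[i].copy()          # use the pre-update u_i for V's step
        U[i] += eta * e * V[j]
        V[j] += eta * e * u_old
    return U, V

rng = np.random.default_rng(2)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0)]
U, V = rng.random((2, 2)), rng.random((2, 2))
for _ in range(500):
    U, V = sgd_epoch(ratings, U, V)
err = sum((r - U[i] @ V[j]) ** 2 for i, j, r in ratings)
print(err)   # squared training error after fitting
```

Unlike the batch rule, each update touches only one user row and one item row, which is what makes the stochastic version practical on large, sparse rating data.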
Matrix Factorization 3 Online Algorithm Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems
MF on Hadoop 4 Loss Function We update the factor V to reduce the objective function f with conventional gradient descent, as follows: $V \leftarrow V - \eta \frac{\partial f}{\partial V}$, where $\frac{\partial f}{\partial V} = -2\,(R - U V^T)^T U$. With a suitable step size $\eta$, each update decreases f; the same applies to the factor matrix U.
MF on Hadoop 4 [Figure: the gradient is computed block-wise; each block of the left matrix is multiplied with the right matrix and the partial results are summed.]
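The block-wise computation maps naturally onto MapReduce: each mapper handles one row block of R (with the matching rows of U) and emits a partial gradient for V; the reducer sums the partials. A sketch simulated in plain Python rather than an actual Hadoop job; all names and shapes are illustrative:

```python
import numpy as np

def map_partial_grad(R_block, U_block, V):
    """Mapper: partial gradient of ||R - U V^T||^2 w.r.t. V from one row block."""
    E = R_block - U_block @ V.T      # residuals for this block of rows
    return -2.0 * E.T @ U_block      # this block's contribution to dL/dV

def reduce_grads(partials):
    """Reducer: the full gradient is the sum of the per-block partials."""
    return sum(partials)

rng = np.random.default_rng(3)
R = rng.random((6, 4))
U, V = rng.random((6, 2)), rng.random((4, 2))

blocks = [(R[:3], U[:3]), (R[3:], U[3:])]     # two "map" splits by rows
grad_V = reduce_grads([map_partial_grad(Rb, Ub, V) for Rb, Ub in blocks])

full = -2.0 * (R - U @ V.T).T @ U             # single-machine reference
print(np.allclose(grad_V, full))              # True: block sums = full gradient
```

Because the gradient decomposes additively over row blocks, the map splits can be processed independently, which is exactly what Hadoop parallelizes.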
Thesis Framework 5 • Introduction to recommendation systems • My work on KNN • Matrix factorization in recommendation systems • MF incremental updating using Hadoop
Thanks for watching!