A shot at Netflix Challenge


  1. A shot at Netflix Challenge: Hybrid Recommendation System. Priyank Chodisetti

  2. Problem and Approach
     • A data set of 240,000 users and their ratings for 17,770 movies is provided.
     • Given a user ‘p’ and a movie ‘m’, we should predict the rating that ‘p’ will give to ‘m’.
     • My idea: take two entirely different approaches and merge their results.
     • Applied Latent Semantic Analysis (LSA, also known as Latent Semantic Indexing, LSI) and collaborative filtering to the dataset independently.
     • Through LSI, mapped the dataset to a lower-dimensional space and tried to extract relations between different movies.
     • Through collaborative filtering, tried to find a user's tastes by comparing with other similar users.
     • Major problems:
       • Computationally expensive: for example, one solution of mine ran for 14 hrs with most disappointing results.
       • The matrix is ~99% sparse, i.e. ~99% of its values are missing.
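To make the sparsity problem concrete: a dense 240,000 × 17,770 matrix would have about 4.26 billion cells, so storing only the known ratings is the natural choice. The following is a minimal Python sketch of that idea (the slides' own implementation was in C++); the user ids, movie ids, ratings, and the `sparsity` helper are made up for illustration and are not from the original work.

```python
# Toy sparse rating store: ratings[user][movie] = stars.
# All ids and values below are hypothetical, for illustration only.
ratings = {
    "u1": {"m1": 5, "m3": 2},
    "u2": {"m2": 4},
    "u3": {"m1": 3},
}
all_movies = {"m1", "m2", "m3", "m4"}


def sparsity(ratings, n_movies):
    """Fraction of user-movie cells that hold no rating."""
    n_cells = len(ratings) * n_movies
    n_known = sum(len(by_movie) for by_movie in ratings.values())
    return 1.0 - n_known / n_cells


# 3 users x 4 movies = 12 cells, 4 of them filled.
print(round(sparsity(ratings, len(all_movies)), 3))  # 0.667
```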

  3. Handling Major Problems
     • Missing values are generally handled by substituting the average rating given by the user or the overall average rating of all users. But I believe the $1,000,000 winner will be the one who handles the missing values well.
     • Adopted the method described in [2], which fits the current situation well.
     • LSI:
       • Apply SVD to the matrix and retain the ‘k’ largest singular values. This gives the best rank-‘k’ approximation, i.e. the space in ‘k’ dimensions.
       • But how many dimensions? Determined by experiment.
       • To predict person p's rating for movie m, take the mth row of U, matrix-multiply it by S, and matrix-multiply the result by the pth column of V(t).
     • Collaborative filtering:
       • Find the k nearest neighbours (kNN) and derive the predicted rating from them.
       • With Euclidean distance as the distance measure, we would have 17,770 dimensions, so use the Pearson correlation coefficient instead.
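The two ingredients on this slide can be sketched in a few lines of Python (chosen for brevity; the original implementation was in C++). The first function evaluates the "row of U, times S, times column of V(t)" product for already-computed rank-k factors; the second computes the Pearson correlation of two users over the movies both have rated. The function names, the toy factors, and the choice to return 0.0 when the correlation is undefined are assumptions of this sketch, not details from the slides.

```python
from math import sqrt


def predict_from_factors(U, S, Vt, m, p):
    """Rating estimate for movie m and person p from rank-k SVD factors.

    U  : movies x k matrix, as a list of rows
    S  : the k retained singular values
    Vt : k x users matrix (V transposed), as a list of rows
    """
    return sum(U[m][i] * S[i] * Vt[i][p] for i in range(len(S)))


def pearson(a, b):
    """Pearson correlation of two users over their co-rated movies.

    a and b map movie id -> rating. Returns 0.0 when the correlation
    is undefined (fewer than two common movies or zero variance);
    that fallback is an assumption of this sketch.
    """
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / n
    mean_b = sum(b[m] for m in common) / n
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hand-made rank-1 factors for a 2-movie x 2-user toy matrix.
U = [[0.6], [0.8]]
S = [5.0]
Vt = [[0.6, 0.8]]
print(predict_from_factors(U, S, Vt, m=1, p=0))  # about 2.4

alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2}
carol = {"m1": 1, "m2": 3, "m3": 5}
print(pearson(alice, bob))    # same taste ordering: correlation 1
print(pearson(alice, carol))  # opposite ordering: correlation -1
```

Because Pearson correlation is computed only over co-rated movies, it sidesteps comparing users across all 17,770 dimensions, most of which are empty for any given pair.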

  4. Implementation
     • Mixing LSI and collaborative filtering:
       • Find the kNN in the reduced-dimension space, with Euclidean distance as the distance measure.
       • Used SVDLIBC, which uses the Lanczos method for singular value decomposition.
     • Computational challenges:
       • All the files in the training set were merged into one single large file, to reduce disk access and improve response time.
       • Converted the whole data set into sparse text format.
       • Also generated a large data set in "user: movie, rating" format, in contrast to the given "movie: user, rating" format.
     • Implemented in C++.
     • Future extensions this winter:
       • Plan to implement the Generalized Hebbian Algorithm, which should reduce the computation time and make missing values easier to handle.
       • Interested and motivated friends can join me this winter.
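The hybrid step, kNN with Euclidean distance in the reduced space, might look like the Python sketch below (Python for brevity; the original was in C++). It assumes each user already has k-dimensional coordinates from a truncated SVD, and it predicts the plain average of the nearest neighbours' ratings. The slides do not say how neighbours were weighted, so the unweighted mean, the function names, and the toy data are all assumptions.

```python
from math import sqrt


def euclidean(x, y):
    """Euclidean distance between two equal-length coordinate tuples."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(coords, ratings, user, movie, k=2):
    """Predict a rating from the k nearest users who rated the movie.

    coords  : user -> coordinates in the reduced-dimension space
    ratings : user -> {movie: rating}
    Returns None when no other user has rated the movie.
    """
    candidates = [
        (euclidean(coords[user], coords[other]), other)
        for other in coords
        if other != user and movie in ratings.get(other, {})
    ]
    if not candidates:
        return None
    nearest = sorted(candidates)[:k]
    # Unweighted mean of the neighbours' ratings (an assumption).
    return sum(ratings[other][movie] for _, other in nearest) / len(nearest)


# Hypothetical 2-dimensional coordinates and ratings.
coords = {"p": (0.0, 0.0), "a": (0.1, 0.0), "b": (0.0, 0.2), "c": (3.0, 3.0)}
ratings = {"a": {"m1": 4}, "b": {"m1": 5}, "c": {"m1": 1}}
print(knn_predict(coords, ratings, "p", "m1", k=2))  # 4.5
```

Working in the reduced space keeps Euclidean distance meaningful, since every user has a value in each of the k dimensions, unlike the original 17,770-dimensional rating vectors.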

  5. References
     • [1] M. Brand. Fast online SVD revisions for lightweight recommender systems. In Proc. SIAM International Conference on Data Mining, 2003.
     • [2] M. W. Berry. Incremental singular value decomposition of uncertain data. In Proceedings, European Conference on the SIGIR. ACM, 1999.
     • [3] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system - a case study. In ACM WebKDD Workshop, 2000.
