Overview of KDDCUP 2011 Nathan Liu nliu@cse.ust.hk
KDDCUP 2011 Music Recommendation • KDDCUP is the most prominent data mining competition. • In recent years, there have been a number of contests related to movie recommendation: • Netflix 2006: predict future ratings • KDDCUP 2007: how many ratings and who rated what • CAMRA 2010: context-aware movie recommendation • KDDCUP 2011 is organized by Yahoo! and provides the first and largest public music rating dataset.
KDDCUP 2011 • There are three types of items: songs, artists, albums. • Songs and albums are annotated with genres. • You are given the date, time and score of each user’s ratings of these different items. • Challenges: • Scale: the largest public rating dataset to date: 1 million users, 0.6 million items, 300 million ratings • Hierarchical item relations: songs belong to albums, albums belong to artists, and all of them are annotated with genre tags. • Rich metadata: over 900 genres • Fine temporal resolution: no previous challenge provided the time of day in addition to the date. • For the project, you will be provided with a small subset of the data, and we will hold a mini internal competition to determine which group obtains the best results.
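To make the scale and the item hierarchy concrete, here is a minimal Python sketch of how the data could be held in memory: ratings go into a sparse user-by-item matrix (a dense 1M × 0.6M matrix would not fit), and a simple parent map resolves a song to its artist. The data structures and field layout here are hypothetical; adapt them to the actual KDDCUP 2011 file formats.

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_rating_matrix(ratings, n_users, n_items):
    """Pack (user, item, score, ...) tuples into a sparse user x item matrix.
    With ~300 million ratings over 1M x 0.6M cells, only a sparse layout is feasible."""
    users, items, scores = zip(*[(u, i, s) for u, i, s, *_ in ratings])
    return csr_matrix((scores, (users, items)), shape=(n_users, n_items))

def artist_of(item_id, parent):
    """Walk the song -> album -> artist hierarchy.
    `parent` is a hypothetical dict mapping an item id to its parent id
    (artists map to None); build it from the released item files."""
    while parent.get(item_id) is not None:
        item_id = parent[item_id]
    return item_id
```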
KDDCUP 2011: Task 1 • The test set consists of held-out ratings from users in the training set. Each rating is timestamped. • In the test set, you are given who rated which items at what time. • You are asked to predict the rating scores. • Closely related to the Netflix competition, but may also require modeling time-of-day effects. • References: • Koren. Matrix Factorization Techniques for Recommender Systems. (IEEE Computer 2009) • Koren. Collaborative Filtering with Temporal Dynamics (KDD’09) • Xiong. Time-Evolving Collaborative Filtering (SDM’10) • Liu. Online Evolutionary Collaborative Filtering (RECSYS’10)
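As a starting point for Task 1, the sketch below shows a plain biased matrix factorization trained with SGD, in the spirit of the Koren (IEEE Computer 2009) reference; the hyperparameter values are illustrative, and temporal effects are left to the later slide on temporal dynamics.

```python
import numpy as np

def train_biased_mf(ratings, n_users, n_items, k=20, lr=0.005, reg=0.02, epochs=20):
    """Biased matrix factorization r_ui ~ mu + b_u + b_i + p_u . q_i, trained by SGD.
    `ratings` is a list of (user, item, score) triples with 0-based ids."""
    mu = np.mean([r for _, _, r in ratings])        # global mean rating
    bu, bi = np.zeros(n_users), np.zeros(n_items)   # user / item biases
    P = np.random.normal(0, 0.01, (n_users, k))     # user factors
    Q = np.random.normal(0, 0.01, (n_items, k))     # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # update both factor vectors from their pre-update values
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

def predict_rating(model, u, i):
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + P[u] @ Q[i]
```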
KDDCUP 2011: Task 2 • The test set consists of held-out ratings from users in the training set. Time has been removed. • In the test set, you are given 6 items for each user. • You are asked to predict which 3 of the 6 were actually rated by the user. • Closely related to the KDDCUP 2007 “who rated what” task and the CAMRA 2010 weekly recommendation track. • References: • Hu. Collaborative Filtering for Implicit Feedback Datasets (ICDM’08) • Rendle. Bayesian Personalized Ranking from Implicit Feedback (UAI’09) • Cremonesi. Performance of Recommender Algorithms on Top-N Recommendation Tasks (RECSYS’10) • Steck. Training and Testing of Recommender Systems on Data Missing Not at Random (KDD’10)
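For Task 2 the final step is simple once you have a preference score: rank each user's 6 candidates and mark the top 3 as rated. A minimal sketch, assuming a hypothetical `score_fn(user, item)` produced by one of the implicit-feedback models in the references above:

```python
import numpy as np

def classify_six(score_fn, user, candidate_items):
    """Label the 3 highest-scoring of the 6 candidates as rated (1), the rest as 0."""
    scores = np.array([score_fn(user, item) for item in candidate_items])
    top3 = set(np.argsort(-scores)[:3])
    return [1 if rank in top3 else 0 for rank in range(len(candidate_items))]
```

All of the modeling effort goes into `score_fn`; because exactly 3 of 6 must be chosen per user, ranking quality matters more than well-calibrated rating values.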
For The Project • We will extract a subset of the data for you to work on. • We will provide some basic algorithms. • You can choose to work on one of the two tasks. • The minimum requirement is to run thorough experiments with the provided algorithms and write a report on your findings about the different algorithms. • There are also new things to try…
Things to Try (1): Ensembles • Same algorithm, different parameter settings • Different algorithms • Stacking: • Which meta-learner? Gradient boosted decision trees, linear regression • Any meta-features? Tail vs. head segmentation strategies • References: • Bao et al. Stacking Recommendation Engines with Additional Meta-Features (RECSYS’09) • Jahrer et al. Combining Predictions for Accurate Recommender Systems (KDD’10)
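As a sketch of the simplest stacking setup, the snippet below learns linear blending weights for K base recommenders on a hold-out set; a gradient boosted decision tree meta-learner would replace the least-squares fit, and meta-features (e.g. a tail/head indicator) could be appended as extra columns. The function names are illustrative.

```python
import numpy as np

def fit_linear_stack(base_preds, targets):
    """base_preds: (N, K) hold-out predictions from K base models; targets: (N,) true ratings.
    Returns K blending weights plus an intercept."""
    X = np.hstack([base_preds, np.ones((base_preds.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights

def stack_predict(weights, base_preds):
    """Blend new base-model predictions with the learned weights."""
    X = np.hstack([base_preds, np.ones((base_preds.shape[0], 1))])
    return X @ weights
```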
Things to Try (2): Exploiting Item Relations and Genres • From social networks of users to networks of items. • Combining collaborative filtering with genre-based prediction to alleviate sparsity. • References: • Ma. Recommender Systems with Social Regularization (WSDM’11) • Agarwal. Regression-based Latent Factor Models (KDD’09) • Popescul. Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments (UAI’01) • Gunawardana. Tied Boltzmann Machines for Cold Start Recommendations (RecSys’08)
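One lightweight way to use the genre annotations is as a content-based prior that takes over for sparse (tail) items, where collaborative filtering has little data to work with. The sketch below only illustrates that blending idea, not any of the cited models; `cf_predict`, `genre_means`, and the thresholds are hypothetical.

```python
import numpy as np

def make_genre_blended_predictor(cf_predict, item_rating_counts, item_genres,
                                 genre_means, global_mean,
                                 min_support=10, alpha=0.7):
    """Return a predictor that backs off from CF to a genre-based prior on sparse items."""
    def predict(user, item):
        genres = item_genres.get(item, [])
        prior = np.mean([genre_means[g] for g in genres]) if genres else global_mean
        if item_rating_counts.get(item, 0) < min_support:
            return prior                      # cold/tail item: trust the content prior
        return alpha * cf_predict(user, item) + (1 - alpha) * prior
    return predict
```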
Things to Try (3): Temporal Dynamics • Various possible types of temporal dynamics: • Long-term effects: people getting pickier over time • Short-term effects: festival moods • Time-of-day effects: daytime vs. nighttime preferences • Periodicity: every Friday night is party time • References: • Koren. Collaborative Filtering with Temporal Dynamics (KDD’09)
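A cheap way to probe time-of-day effects is to extend the usual baseline predictor mu + b_u + b_i with an item-by-time-of-day bias estimated with shrinkage. This is only a sketch inspired by the baseline predictors behind Koren (KDD’09), not his full model; the rating tuple layout and binning scheme are assumptions.

```python
import numpy as np
from collections import defaultdict

def time_bin(hour, n_bins=4):
    """Coarse time-of-day bucket, e.g. night / morning / afternoon / evening."""
    return (hour * n_bins) // 24

def fit_temporal_baseline(ratings, n_bins=4, lam=25.0):
    """ratings: (user, item, score, hour) tuples. Fits mu, b_i, b_u and b_{i,bin}
    sequentially on residuals, shrinking each estimate toward 0 by lam."""
    mu = np.mean([r for _, _, r, _ in ratings])

    def shrunk_means(keyed_residuals):
        sums, cnts = defaultdict(float), defaultdict(int)
        for key, res in keyed_residuals:
            sums[key] += res
            cnts[key] += 1
        return {key: sums[key] / (lam + cnts[key]) for key in sums}

    bi = shrunk_means((i, r - mu) for u, i, r, h in ratings)
    bu = shrunk_means((u, r - mu - bi[i]) for u, i, r, h in ratings)
    bit = shrunk_means(((i, time_bin(h, n_bins)), r - mu - bi[i] - bu[u])
                       for u, i, r, h in ratings)

    def predict(user, item, hour):
        return (mu + bi.get(item, 0.0) + bu.get(user, 0.0)
                + bit.get((item, time_bin(hour, n_bins)), 0.0))
    return predict
```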