The Wisdom of the Few: A Collaborative Filtering Approach Based on Expert Opinions from the Web Xavier Amatriain, Josep M. Pujol, Nuria Oliver (Telefonica Research, Spain), Neal Lathia (University College London, UK), Haewoon Kwak (KAIST, Korea) Session: Recommenders, SIGIR'09, July 19-23, 2009 Presented by SungEun Park, Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea, 2010-02-10
Brief Summary [Diagram: many users connected to a small set of experts (Expert1–Expert3)] • Collaborative filtering approach: the wisdom of the crowd? • Try expert opinions as an external data source for CF. • Neighbor CF vs. expert CF. • Strong point: thorough experiments and analysis of the dataset and the expert-CF method.
Contents • Goal • Dataset Analysis • Expert Nearest-Neighbors Approach • Experiment • Mean Absolute Error • Top-N Recommendation Precision • User Study • Discussion
Goal • The goal is not to increase CF accuracy, but to: • Understand how the preferences of a large population can be predicted using a very small set of users. • Understand the potential of an independent and uncorrelated data set to generate recommendations. • Analyze whether professional raters are good predictors for general users. • Discuss how this approach addresses some of the traditional pitfalls in CF.
Dataset • Expert data set: Rottentomatoes.com • Aggregates the opinions of movie critics from various media sources. • 169 experts selected out of 1,750 (threshold: more than 250 ratings each). • User data set: Netflix.com movie ratings • 8,000 out of 17,770 movies, matched to the movies in Rotten Tomatoes. • 20% of the movies have only one rating.
Dataset Analysis (1): Number of Ratings and Data Sparsity [Figure: CDF of user rating counts. The CDF (cumulative distribution function) is defined as the cumulative integral of the probability density function P[u] of the random variable u.] • The expert matrix is less sparse than the user matrix and more evenly distributed. • Sparsity coefficient: users 0.01 vs. experts 0.07. • A single expert typically has more ratings than a single user.
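The sparsity coefficient above is simply the fraction of cells in the user-item rating matrix that hold a rating. A minimal sketch, using hypothetical counts chosen only to illustrate the 0.01 vs. 0.07 comparison (the true rating counts are not given in the slides):

```python
def sparsity_coefficient(num_ratings: int, num_users: int, num_items: int) -> float:
    """Fraction of cells in the user-item matrix that hold a rating."""
    return num_ratings / (num_users * num_items)

# Hypothetical counts, just to illustrate the comparison:
# a denser expert matrix yields a larger coefficient.
users_coeff = sparsity_coefficient(4_000_000, 50_000, 8_000)  # -> 0.01
experts_coeff = sparsity_coefficient(94_640, 169, 8_000)      # -> 0.07
```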
Dataset Analysis (2): Average Rating Distribution [Figure: per-movie average rating distributions for users and experts.] • Experts tend to rate similarly to one another. • Their overall opinions on the movies are more varied.
Dataset Analysis (3): Rating Standard Deviation [Figure: standard-deviation distributions; the user curve is wider.] • Experts tend to agree more than regular users (lower standard deviation per movie). • Experts tend to deviate less from their personal average rating.
Expert Nearest-Neighbors Approach • Compute the similarity between each user and every expert in the expert set. • Similarity: a similarity measure with an adjusting factor that accounts for the number of items co-rated by both users. • a, b: users; Na, Nb: the number of items rated by each user; Na∪b: the number of co-rated items. • Keep only the experts whose similarity to the given user is greater than a threshold δ. • Risk: very few neighbors may be found. • Confidence threshold τ: the minimum number of expert neighbors who must have rated the item in order to trust their prediction.
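A sketch of the similarity computation. The slides do not reproduce the exact adjusting factor, so the form 2·Na∪b / (Na + Nb) below is an assumption, as is the use of plain cosine similarity as the base measure:

```python
import math

def cosine_sim(ra: dict, rb: dict) -> float:
    """Cosine similarity over the items co-rated by users a and b.
    ra, rb map item id -> rating."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(ra[i] ** 2 for i in common))
    nb = math.sqrt(sum(rb[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0

def adjusted_sim(ra: dict, rb: dict) -> float:
    """Cosine similarity scaled by a co-rating factor so that user pairs
    with few items in common receive a lower similarity.
    The factor 2 * |co-rated| / (Na + Nb) is an assumed form."""
    n_common = len(set(ra) & set(rb))
    if n_common == 0:
        return 0.0
    factor = 2 * n_common / (len(ra) + len(rb))
    return factor * cosine_sim(ra, rb)
```

Two users with identical rating profiles get similarity 1.0; users with no overlap get 0.0, and partial overlap is penalized by the factor.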
Expert Nearest-Neighbors Approach • Given a similarity sim: V × V → R, define a set of experts E = {e1, ..., ek} ⊆ V and a set of users U = {u1, ..., uN} ⊆ V. For a particular user u ∈ U and a threshold δ, find the set of experts E′ ⊆ E such that ∀e ∈ E′: sim(u, e) ≥ δ. • For a target item i, let E′′ ⊆ E′ be the experts who rated it: ∀e ∈ E′′: rei ≠ ◦, where rei is the rating of item i by expert e ∈ E′ and ◦ denotes an unrated item. • With E′′ = {e1, ..., en}: • No prediction when n < τ. • The predicted rating is computed when n ≥ τ.
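The steps above can be sketched end-to-end. The mean-centered, similarity-weighted aggregation is an assumption about the prediction formula (the slide's formula image was lost), and plain cosine similarity stands in for the paper's adjusted measure:

```python
import math

def cosine_sim(ru: dict, re: dict) -> float:
    """Cosine similarity over co-rated items (stand-in for the paper's
    adjusted similarity). ru, re map item id -> rating."""
    common = set(ru) & set(re)
    if not common:
        return 0.0
    dot = sum(ru[i] * re[i] for i in common)
    nu = math.sqrt(sum(ru[i] ** 2 for i in common))
    ne = math.sqrt(sum(re[i] ** 2 for i in common))
    return dot / (nu * ne) if nu and ne else 0.0

def predict(user_ratings, user_mean, experts, item, delta=0.01, tau=10):
    """Expert-CF prediction sketch.
    experts: list of (expert_ratings: dict, expert_mean: float).
    Returns None when fewer than tau sufficiently similar experts
    (sim >= delta) have rated the item."""
    neighbors = []
    for exp_ratings, exp_mean in experts:
        s = cosine_sim(user_ratings, exp_ratings)
        if s >= delta and item in exp_ratings:       # e in E''
            neighbors.append((s, exp_ratings[item], exp_mean))
    if len(neighbors) < tau:                         # n < tau: no prediction
        return None
    num = sum(s * (r - m) for s, r, m in neighbors)  # mean-centered,
    den = sum(s for s, _, _ in neighbors)            # similarity-weighted
    return user_mean + num / den
```

With a single expert ({1: 4, 2: 5}, mean 4.5) and τ = 1, a user with profile {1: 4} and mean 4.0 receives a prediction of 4.5 for item 2: the expert rated it half a point above his own mean, so the user's mean is shifted up by the same amount.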
Experiment • Mean Absolute Error and Coverage • Top-N Recommendation Precision • User Study
Experiment (1): Mean Absolute Error and Coverage [Figure: interplay between τ and δ; the settings τ = 10 and δ = 0.01 are used in the remaining experiments.]
Experiment (1): Mean Absolute Error and Coverage • 5-fold cross-validation: the user data set is split (by random sampling) into 80% training and 20% testing sets, with τ = 10 and δ = 0.01. • Worst-case baseline: the "critics' choice" recommendation (the experts' average); expert-CF yields a significant accuracy improvement over it. • Compared with standard NN-CF, expert-CF shows 0.08 higher MAE but 6% higher coverage.
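The two evaluation measures on this slide can be sketched as follows, with `None` marking test cases for which the confidence threshold τ prevented a prediction:

```python
def mae(predictions, actuals):
    """Mean Absolute Error over the cases where a prediction exists."""
    pairs = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

def coverage(predictions):
    """Fraction of test cases for which a prediction could be made."""
    return sum(p is not None for p in predictions) / len(predictions)

preds = [4.0, None, 3.0, 5.0]  # None: fewer than tau expert neighbors
truth = [5.0, 2.0, 3.0, 4.0]
print(mae(preds, truth))       # (1 + 0 + 1) / 3
print(coverage(preds))         # 3 / 4 = 0.75
```

This makes the trade-off on the slide concrete: restricting predictions to confident cases (larger τ) can lower MAE at the cost of coverage.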
Experiment (1): Mean Absolute Error and Coverage [Figure: per-user MAE comparison; regions where expert-CF or NN-CF is better are marked.] • Expert-CF is better for users with MAE > 1.0. • NN-CF is better for users with small MAE (less than 0.5), a minority of around 10% of the population. • Expert-CF and NN-CF perform similarly over the rest of the range.
Experiment (2): Top-N Recommendation Precision • Classify items as recommendable or not recommendable given a threshold σ. • For a given user, compute all predictions and present those greater than or equal to σ; if no item in the test set is worth recommending, simply return an empty list. • For every predicted item present in the user's test set, check whether it is a true positive (actual user rating ≥ σ) or a false positive (actual rating < σ). • Compute the precision of the classification using the classical definition of this measure.
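The procedure above can be sketched for a single user (the empty-list case returns `None` so it can be excluded from averaging):

```python
def topn_precision(predicted: dict, actual: dict, sigma: float = 4.0):
    """Precision of the 'recommendable' classification: among items
    predicted >= sigma that appear in the user's test set, the
    fraction actually rated >= sigma (true positives)."""
    recommended = [i for i, p in predicted.items()
                   if p >= sigma and i in actual]
    if not recommended:
        return None  # empty recommendation list for this user
    true_pos = sum(actual[i] >= sigma for i in recommended)
    return true_pos / len(recommended)

predicted = {"m1": 4.5, "m2": 4.0, "m3": 2.5}
actual    = {"m1": 5.0, "m2": 3.0, "m3": 4.0}
print(topn_precision(predicted, actual, sigma=4.0))  # m1 TP, m2 FP -> 0.5
```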
Experiment (2): Top-N Recommendation Precision [Figure: precision as a function of σ.] • For σ = 4, NN-CF clearly outperforms expert-CF. • For σ = 3, the precision of both methods is similar: for users willing to accept recommendations for any above-average (≥ 3) item, the expert-based method behaves as well as standard NN-CF.
Experiment (3): User Study • Asked 57 participants to rate 100 preselected movies, then generated four recommendation lists: • Random list: a random sequence of movie titles. • Critics' choice: the movies with the highest mean expert rating. • Neighbor-CF: based on the Netflix users most similar to each survey respondent. • Expert-CF: like Neighbor-CF, but using the expert dataset instead of the Netflix ratings. • Recommendation lists were generated from limited user feedback: an average of 14.5 ratings per participant → a cold-start condition.
Experiment (3): User Study [Figure: distribution of responses on the overall quality of the recommendation lists.] • Overall quality: average response per list. • The expert-CF approach is the only method that obtains an average rating higher than 3. • The only approaches rated "very good" by some participants (critics' choice and expert-CF) are expert based.
Experiment (3): User Study • Ratings for the question "the list contains movies I think I would like or not like." • An important aspect of evaluating a recommender system is how often the user is disappointed by the results: recommending wrong items undermines the user's assessment of the system and compromises its usability. • The expert-CF approach generates the fewest negative responses compared with the other methods.
Experiment (3): User Study • Analysis of variance (ANOVA): test whether the differences between the four recommendation lists are statistically significant. • Null hypothesis: the average user evaluation is the same for the four lists. • P-values smaller than 0.01 reject the null hypothesis: • The differences in user satisfaction among the three baseline methods are not statistically significant. • The differences in user satisfaction for expert-CF are statistically significant.
Discussion • Using a limited set of external experts to generate predictions addresses several CF pitfalls: • Data sparsity: domain experts are more likely to have rated a large percentage of the items. • Noise and malicious ratings: experts are expected to be more consistent and conscientious with their ratings, reducing noise. • Cold-start problem: motivated experts typically rate a new item entering the collection as soon as they know of its existence, minimizing item cold-start.
Discussion • Scalability: computing the similarity matrix is O(N²M) for N users in an M-item collection; the expert set is much smaller than the user population (169 experts vs. 500,000 potential neighbors). • Privacy: prediction only needs the target user's profile and the current expert ratings.
Conclusion • A reduced set of expert ratings can predict the ratings of a large population. • The method's performance is comparable to traditional CF algorithms, even with an extremely small expert set. • It addresses some of the shortcomings of traditional CF: data sparsity, scalability, noise in user feedback, privacy, and the cold-start problem.
Q&A Thank you…