Rachid Guerraoui, EPFL
Recommendation systems are good. What is a good recommendation system?
A good recommendation system is one that provides good recommendations. What is a good recommendation?
You know it when you see it • “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.” • Justice Potter Stewart, US Supreme Court, 1964
What is a good recommendation system? Ideally: build and deploy your system. Pragmatically: transform the past into the future.
Example • Members of program committee (20) want to evaluate the submitted papers (200) • Nobody has enough time to read all papers • Each researcher is assigned a subset of papers • A recommendation system uses the scores to find the opinion of all members about all papers
What is a good recommendation? It depends on the correlation between users. Theory to the rescue.
General recommendation model • n users • k·n objects • For each user and object: a grade • The grades of a user form that user's preference vector • The vectors of all users form the preference matrix • Grades may be binary, discrete, or continuous
Input? Vectors of grades: v(p) (known partially to the players) Output? Vectors of grades: w(p) (seeking to approximate v(p))
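The model and its input can be sketched in a few lines of Python. This is only an illustration under stated assumptions: grades are binary, and the names `random_preference_matrix` and `partial_view` are hypothetical helpers, not part of the talk.

```python
import random

def random_preference_matrix(n, k, seed=0):
    """Build an n x (k*n) matrix of binary grades, one row per user (assumed binary)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(k * n)] for _ in range(n)]

def partial_view(vector, probed):
    """Input side: the grades a player actually knows; None marks an unknown grade."""
    return [g if j in probed else None for j, g in enumerate(vector)]

V = random_preference_matrix(n=4, k=2)      # 4 users, 8 objects
known = partial_view(V[0], probed={0, 3})   # player 0 has graded objects 0 and 3
print(len(V), len(V[0]))
print(known)
```

The output vector w(p) would then be any attempt to replace the `None` entries with estimated grades.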
Ideal output: w(p) = v(p). Target output: • Minimize max_p |w(p) - v(p)|, where |·| is the Hamming distance
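As a minimal sketch of this objective, assuming grades are compared entry by entry and that the maximum is taken over players (the function names are illustrative):

```python
def hamming(u, v):
    """Number of positions where two grade vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))

def max_error(W, V):
    """Worst-case error over all players: max over p of |w(p) - v(p)|."""
    return max(hamming(w, v) for w, v in zip(W, V))

V = [[1, 0, 1, 1], [0, 0, 1, 0]]   # true preferences
W = [[1, 1, 1, 1], [0, 0, 1, 0]]   # estimated preferences
print(max_error(W, V))             # 1: player 0 is wrong on one object
```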
How to account for the level of correlation? Compare with a perfect on-line algorithm
The perfect on-line algorithm (1) All players know all partial vectors Shared billboard
The perfect on-line algorithm (2) Chooses elements of the partial vectors to fill (B budget) The algorithm assigns initial papers The player is initially indulgent (learning phase)
The perfect on-line algorithm (3) Knows the level of correlation Hamming diameter of a set P
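The Hamming diameter of a set P is the largest Hamming distance between any two preference vectors in P. A small sketch, with illustrative function names and binary grades assumed:

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions where two grade vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def diameter(vectors):
    """Largest pairwise Hamming distance; 0 for a set with fewer than two vectors."""
    return max((hamming(u, v) for u, v in combinations(vectors, 2)), default=0)

P = [[0, 0, 0, 0], [0, 0, 1, 1], [1, 0, 1, 1]]
print(diameter(P))  # 3: the first and last vectors differ on three objects
```

A small diameter means the players in P are highly correlated, so one player's grades are a good proxy for another's.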
20 pc members; 200 papers Every member can read 10 papers All have the same taste Perfect solution possible?
20 pc members; 200 papers Two clusters of 10 have the same taste Perfect solution possible? Every member needs to read 20
Assume player p can probe B objects. How many other players does p need to collaborate with to fill its vector? (k·n)/B - 1
20 pc members; 200 papers 4 clusters of 5 with diameter 8 Every member reads 20 What is the minimal error rate?
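The program-committee scenarios above all reduce to the same counting argument: k·n objects must be covered by players who each probe B of them. A minimal sketch of that arithmetic, with hypothetical function names and assuming no two collaborators probe the same object:

```python
from math import ceil

def players_needed(num_objects, budget):
    """Players required to cover every object when each probes `budget` of them."""
    return ceil(num_objects / budget)

def collaborators_needed(n, k, budget):
    """Other players p must rely on: (k*n)/B - 1, rounded up."""
    return players_needed(k * n, budget) - 1

# 20 members, 200 papers (n = 20, k = 10)
print(collaborators_needed(20, 10, 10))  # 19: all 20 members must share one taste
print(collaborators_needed(20, 10, 20))  # 9: a cluster of 10 suffices
```

With B = 20 a player needs 9 collaborators, so a cluster of 5 cannot cover all 200 papers by itself; the remaining grades have to come from players in other clusters.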
Ideal algorithm (k=1) • A player p has to use the opinions of (n/B)-1 other players to estimate her/his preferences • The rate of error for p depends on the Hamming distance between p and those (n/B)-1 other players • This is within a constant factor of the diameter of these n/B players. In the worst case, p cannot do better
Claim For every B-algorithm, there is some distribution of preferences such that (with constant probability)
Proof (sketch) Consider a constant D > 2B. Define the preference vectors as follows: • Let P be a set of players of size n/B • Let p in P have a random preference vector • Assign a random preference vector to each player outside P • Choose a set S of D objects. For every player q in P, v(q) = v(p) except on S, where v(q) is random
Proof (sketch) • Probes outside P provide no information to p • Probes inside P provide no information to p w.r.t. S • Since p probes at most B objects and S contains D > 2B objects, there are at least D/2 objects of S for which p has no information • No algorithm can do better than guess preferences in S • The rate of error is at least D/4 and the diameter of P is less than D
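The D/4 figure can be recovered with a short calculation. This is a sketch under an assumption the slides do not state explicitly: grades are binary and the random grades on S are independent and uniform, so a guess on an unprobed object is wrong with probability 1/2.

```latex
\begin{align*}
\#\{\text{objects of } S \text{ not probed by } p\}
  &\ge |S| - B = D - B > \tfrac{D}{2}
  && \text{since } D > 2B, \\
\mathbb{E}\bigl[\,|w(p) - v(p)|\,\bigr]
  &\ge \tfrac{1}{2}\,(D - B) > \tfrac{D}{4}
  && \text{each guess is wrong with probability } \tfrac12 .
\end{align*}
```

Because the players in P differ from p only on S, the error is bounded below by a constant fraction of a quantity that also bounds the diameter of P.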
Optimality An algorithm is (B,c)-optimal if for every input set of preferences
So what? The best we can do is find clusters of players that are: • Small enough (small diameter) to provide “accurate” preferences • Big enough to cover all objects. Practically speaking? • Try different sizes of clusters
Optimality • Assume each player can evaluate B objects. • Given B and the level of correlation among players, there is a minimum rate of error that can be achieved. • There is an algorithm that obtains a constant approximation of this error rate, with each player evaluating O(B · polylog(n)) objects.
Definition of Optimality • An algorithm is asymptotically optimal in terms of error rate if for every player p we have: • |w(p) - v(p)| < min_{|P| > n/B - 1} c · D(P) • Here c is a constant and D(P) is the diameter of the set P. P can be any set of players of size at least n/B.
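For very small instances the right-hand side of this bound can be evaluated by brute force. The sketch below is illustrative only: it assumes k = 1, binary grades, and a hypothetical function name `optimal_error_bound`. It minimises over all sets of players, as the definition states, and its cost grows exponentially with n.

```python
from itertools import combinations
from math import ceil

def hamming(u, v):
    """Number of positions where two grade vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def diameter(vectors):
    """Largest pairwise Hamming distance within a set of vectors."""
    return max((hamming(u, v) for u, v in combinations(vectors, 2)), default=0)

def optimal_error_bound(V, budget, c=1.0):
    """min over sets P with |P| >= n/B of c * D(P)."""
    n = len(V)
    size = ceil(n / budget)
    # Adding players never shrinks the diameter, so the minimum is
    # reached at the smallest allowed set size.
    return min(c * diameter([V[i] for i in group])
               for group in combinations(range(n), size))

V = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 1, 1, 0]]
print(optimal_error_bound(V, budget=2))  # 1.0: two tight pairs of players
```

An algorithm's measured error |w(p) - v(p)| can then be compared against this value for each player p.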