Rachid Guerraoui, EPFL

Recommendation systems are good.
What is a good recommendation system?

A good recommendation system is one that provides good recommendations.
What is a good recommendation?

You know it when you see it
• “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”
• Justice Potter Stewart, US Supreme Court, 1964

What is a good recommendation system?
• Ideally: build and deploy your system
• Pragmatically: transform past into future

Example
• Members of a program committee (20) want to evaluate the submitted papers (200)
• Nobody has enough time to read all the papers
• Each researcher is assigned a subset of the papers
• A recommendation system uses the scores to estimate the opinion of every member about every paper

What is a good recommendation?
• It depends on the correlation
• Theory to the rescue

General recommendation model
• n users
• k·n objects
• For each user and object: a grade
• The grades of a user form that user's preference vector
• The preference vectors of all users form the preference matrix
• Grades may be binary, discrete, or continuous

Input? Vectors of grades v(p), known partially to the players
Output? Vectors of grades w(p), seeking to approximate v(p)

Ideal output: w(p) = v(p)
Target output:
• Minimize max |w(p) - v(p)| over all players p, where |·| is the Hamming distance
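
The target above can be made concrete with a minimal sketch. It assumes grades are stored as equal-length Python lists keyed by player name; the names `hamming`, `max_error`, and the sample players are illustrative and do not come from the talk.

```python
from typing import Dict, List


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two equal-length grade vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def max_error(v: Dict[str, List[int]], w: Dict[str, List[int]]) -> int:
    """Worst-case error over all players: max over p of |w(p) - v(p)|."""
    return max(hamming(w[p], v[p]) for p in v)


# Hypothetical true grades v and estimated grades w for two players.
v = {"alice": [1, 0, 1, 1], "bob": [0, 0, 1, 0]}
w = {"alice": [1, 0, 0, 1], "bob": [0, 0, 1, 0]}
print(max_error(v, w))  # 1 (alice's third grade is wrong)
```

The ideal output w(p) = v(p) corresponds to `max_error` returning 0.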

How to account for the level of correlation?
• Compare with a perfect on-line algorithm

The perfect on-line algorithm (1)
• All players know all partial vectors
• Shared billboard

The perfect on-line algorithm (2)
• Chooses elements of the partial vectors to fill (budget B)
• The algorithm assigns initial papers
• The player is initially indulgent (learning phase)

The perfect on-line algorithm (3)
• Knows the level of correlation
• Hamming diameter of a set P
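
As a rough illustration of the Hamming diameter of a set P, the sketch below takes it to be the largest pairwise Hamming distance between the preference vectors of players in P. The function names and the three sample vectors are assumptions made for the example.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length grade vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def diameter(vectors: Sequence[Sequence[int]]) -> int:
    """Largest pairwise Hamming distance in the set (0 for fewer than 2 vectors)."""
    return max((hamming(a, b) for a, b in combinations(vectors, 2)), default=0)


# Three hypothetical committee members grading five papers (binary grades).
P = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
print(diameter(P))  # 3 (the second and third members disagree on three papers)
```

A set of players who all have the same taste has diameter 0.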

20 PC members; 200 papers
• Every member can read 10 papers
• All have the same taste
• Perfect solution possible?

20 PC members; 200 papers
• Two clusters of 10 have the same taste
• Perfect solution possible?
• Every member needs to read 20

Assume player p can probe B objects.
How many other players does p need to collaborate with to fill its vector?
• (n/B)·k - 1
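
The count above can be checked against the program-committee numbers with a small sketch. It assumes the k·n objects are split evenly among players who each probe B of them, and it rounds up when B does not divide k·n; the rounding and the function name are assumptions, not part of the talk.

```python
import math


def collaborators_needed(n_players: int, k: int, budget: int) -> int:
    """Other players p must rely on so that k*n objects are covered at B probes each."""
    n_objects = k * n_players
    return math.ceil(n_objects / budget) - 1


# 20 PC members and 200 papers, so n = 20 and k = 10.
print(collaborators_needed(20, 10, 10))  # 19: all 20 members must share one taste
print(collaborators_needed(20, 10, 20))  # 9: a cluster of 10 suffices
```

This matches the two committee examples: with 10 papers per member, a perfect solution needs all 20 members to agree, and with two clusters of 10, every member needs to read 20 papers.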

20 PC members; 200 papers
• 4 clusters of 5 with diameter 8
• Every member reads 20
• What is the minimal error rate?

Ideal algorithm (k=1)
• A player p has to use the opinions of (n/B)-1 other players to estimate her/his preferences
• The rate of error for p depends on the Hamming distance between p and the other (n/B)-1 players
• This is within a constant factor of the diameter of these n/B players
• In the worst case, p cannot do better

Claim
For every B-algorithm, there is some distribution of preferences such that (with constant probability)

Proof (sketch)
Consider a constant D > 2B. Define the preference vectors as follows:
• Let P be a set of players of size n/B
• Let p in P have a random preference vector
• Assign a random preference vector to every player outside P
• Choose a set S of D objects. For every player q in P, v(q) = v(p) except on S, where v(q) is random

Proof (sketch)
• Probes outside P provide no information to p
• Probes inside P provide no information to p w.r.t. S
• Since p probes at most B objects and S contains D > 2B objects, there are at least D/2 objects in S for which p has no information
• No algorithm can do better than guess the preferences in S
• The rate of error is at least D/4 and the diameter of P is less than D
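
The counting behind the D/4 bound can be spelled out as follows. This worked step assumes binary grades that are uniform and independent on S, so that a guess on an unprobed object is wrong with probability 1/2; the talk does not state this assumption explicitly.

```latex
\begin{align*}
\#\{\text{objects of } S \text{ not probed by } p\}
  &\ge D - B > D - \tfrac{D}{2} = \tfrac{D}{2}
  && \text{(since } D > 2B\text{)} \\
\mathbb{E}\bigl[\,|w(p) - v(p)|\,\bigr]
  &\ge \tfrac{1}{2}\,(D - B) > \tfrac{D}{4}
  && \text{(each guess is wrong with probability } \tfrac{1}{2}\text{)}
\end{align*}
```

Since the vectors of players in P differ only on S, the diameter of P cannot exceed D, so the error is within a constant factor of the diameter.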

Optimality
An algorithm is (B,c)-optimal if for every input set of preferences

So what?
• The best we can do is find clusters of players that are
- Small enough (small diameter) to provide “accurate” preferences
- and big enough to cover all objects
• Practically speaking?
- Try different sizes of clusters
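
One way to picture "try different sizes of clusters" is the sketch below. It is a hypothetical heuristic, not the algorithm from the talk: it assumes the full preference vectors are available (as for the perfect on-line algorithm), grows a cluster around player p by adding the nearest remaining player at each step, and reports for each size the diameter and whether the cluster's combined budget covers all objects.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length grade vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def diameter(vectors: Sequence[Sequence[int]]) -> int:
    """Largest pairwise Hamming distance in the set (0 for fewer than 2 vectors)."""
    return max((hamming(a, b) for a, b in combinations(vectors, 2)), default=0)


def cluster_tradeoff(
    vectors: Sequence[Sequence[int]], p: int, budget: int
) -> List[Tuple[int, int, bool]]:
    """For each cluster size around p, return (size, diameter, covers_all_objects)."""
    n_objects = len(vectors[p])
    # Other players, nearest to p first (illustrative ordering, not from the talk).
    others = sorted(
        (q for q in range(len(vectors)) if q != p),
        key=lambda q: hamming(vectors[p], vectors[q]),
    )
    cluster = [vectors[p]]
    rows = [(1, 0, budget >= n_objects)]
    for q in others:
        cluster.append(vectors[q])
        rows.append((len(cluster), diameter(cluster), len(cluster) * budget >= n_objects))
    return rows


# Four hypothetical players, four objects, budget B = 2 probes per player.
vectors = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [0, 0, 1, 1]]
for row in cluster_tradeoff(vectors, p=0, budget=2):
    print(row)
```

In this toy instance the smallest cluster that covers all objects has size 2 and diameter 1; larger clusters still cover everything but only increase the diameter.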

Optimality
• Assume each player can evaluate B objects.
• Given B and the level of correlation among players, there is a minimum rate of error that can be achieved.
• There is an algorithm that obtains a constant approximation of this error rate, with each player evaluating O(B · polylog(n)) objects.

Definition of Optimality
• An algorithm is asymptotically optimal in terms of error rate if, for every player p, we have:
• $|w(p) - v(p)| < \min_{|P| > n/B - 1} c \, D(P)$
• where c is a constant and D(P) is the diameter of the set P. P can be any set of players of size at least n/B.
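
For very small instances, the right-hand side of this definition can be evaluated by brute force. The sketch below assumes k = 1, takes the definition literally (P ranges over all sets of players, whether or not they contain p), and uses the fact that adding players never shrinks a diameter, so only sets of size exactly ceil(n/B) need checking. The function names and the sample data are illustrative, and the search is exponential in n.

```python
import math
from itertools import combinations
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length grade vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def diameter(vectors: Sequence[Sequence[int]]) -> int:
    """Largest pairwise Hamming distance in the set (0 for fewer than 2 vectors)."""
    return max((hamming(a, b) for a, b in combinations(vectors, 2)), default=0)


def optimal_error_bound(
    vectors: Sequence[Sequence[int]], budget: int, c: float = 1.0
) -> float:
    """c times the smallest diameter over all sets of ceil(n/B) players."""
    n = len(vectors)
    size = math.ceil(n / budget)
    best = min(
        diameter([vectors[i] for i in subset])
        for subset in combinations(range(n), size)
    )
    return c * best


# Four hypothetical players, budget B = 2, so sets of size 2 are examined.
vectors = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [0, 0, 1, 1]]
print(optimal_error_bound(vectors, budget=2, c=2.0))  # 2.0
```

An algorithm would meet the definition on this instance if every player's error |w(p) - v(p)| stayed below the printed value for the chosen constant c.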
