Outline

Privacy-Enhanced Collaborative Filtering. Privacy-Enhanced Personalization workshop, July 25, 2005, Edinburgh, Scotland. Shlomo Berkovsky (1), Yaniv Eytani (1), Tsvi Kuflik (2), Francesco Ricci (3). (1) Computer Science Department, University of Haifa, Israel; (2) Management Information Systems Department, University of Haifa, Israel; (3) ITC-irst, Trento, Italy. This work is supported by the collaboration project between the University of Haifa and ITC/irst.

Outline • Collaborative Filtering (CF) • Distributed Privacy-Enhanced CF • Experimental Results • Open Questions

Collaborative Filtering (CF) • Based on the assumption that people with similar tastes prefer similar items • 3 basic stages: • Similarity computation (Pearson correlation, Cosine, Mean-Squared Difference) • Neighborhood formation (K-Nearest Neighbors) • Personalized prediction generation (Weighted average of neighbors’ ratings)
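The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `pearson` and `predict`, the dict-based profiles (item -> rating), and the choice to keep only neighbors with positive similarity are all assumptions made for the example. It uses a plain weighted average, as named on the slide, without mean-centering.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, others, item, k=2):
    """Predict the active user's rating for `item` (hypothetical helper)."""
    # Stage 1: similarity between the active user and every user who rated the item
    scored = [(pearson(active, o), o) for o in others if item in o]
    # Stage 2: neighborhood = the k most similar users (assumption: positive similarity only)
    neighbors = sorted((s for s in scored if s[0] > 0),
                       key=lambda s: s[0], reverse=True)[:k]
    if not neighbors:
        return None
    # Stage 3: similarity-weighted average of the neighbors' ratings
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * o[item] for sim, o in neighbors) / total

# Toy data on the -10..10 scale; item names are made up.
active = {"j1": 5, "j2": 3, "j3": 1}
others = [
    {"j1": 4, "j2": 2, "j3": 0, "j4": 8},   # same taste as the active user
    {"j1": 1, "j2": 3, "j3": 5, "j4": -6},  # opposite taste, filtered out
    {"j1": 5, "j2": 3, "j3": 2, "j4": 4},   # close taste
]
prediction = predict(active, others, "j4")
```

With these toy profiles the second user has a negative correlation and is excluded, so the prediction is a blend of the ratings 8 and 4 given by the two remaining neighbors.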

CF and Privacy • Service providers collect information about their users • Personalization raises the issue of privacy • Prior works: • [Canny] – P2P-based CF, user communities, encryption • [Polat&Du] – partitioning of CF data, data perturbation techniques

Distributed Privacy-Enhanced CF • Combines the approaches of [Canny] and [Polat&Du] • Distributed and decentralized organization of users maintaining their personal profiles [Diagram labels: P1 … Pj … Pm and U1 … Ui … Un]

Recommendation Generation • A user sends his profile and requests a recommendation • Individual users independently decide whether to respond to the request • The responder locally computes the similarity and his prediction, and sends both to the requester • The requesting user collects the responses, builds the neighborhood and generates the personalized prediction
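The request/response flow above can be sketched as follows. This is a hedged, in-process illustration rather than the system described in the talk: `Peer`, `handle_request`, `request_recommendation`, and the `respond_prob` parameter are hypothetical names, the responder's own rating for the item stands in for "his prediction", cosine similarity is just one of the measures listed earlier, and there is no real networking.

```python
import random
from math import sqrt

def cosine(a, b):
    """Cosine similarity over the items rated by both users."""
    common = [i for i in a if i in b]
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(a[i] ** 2 for i in common)) * sqrt(sum(b[i] ** 2 for i in common))
    return num / den if den else 0.0

class Peer:
    """A user who keeps the profile locally (hypothetical class)."""
    def __init__(self, profile, respond_prob=1.0, rng=None):
        self.profile = profile
        self.respond_prob = respond_prob  # assumption: willingness to answer
        self.rng = rng or random.Random()

    def handle_request(self, requester_profile, item):
        # Each peer decides independently whether to respond at all.
        if item not in self.profile or self.rng.random() >= self.respond_prob:
            return None
        # Similarity is computed locally; only (similarity, rating) leaves the peer.
        return cosine(requester_profile, self.profile), self.profile[item]

def request_recommendation(profile, item, peers, k=10):
    """Requester side: collect responses, build the neighborhood, predict."""
    responses = [r for r in (p.handle_request(profile, item) for p in peers)
                 if r is not None]
    neighbors = sorted((r for r in responses if r[0] > 0), reverse=True)[:k]
    if not neighbors:
        return None
    return sum(s * r for s, r in neighbors) / sum(s for s, _ in neighbors)

peers = [
    Peer({"a": 2, "b": 4, "c": 6}),    # similar to the requester
    Peer({"a": -1, "b": -2, "c": 9}),  # dissimilar, dropped from the neighborhood
    Peer({"a": 1, "b": 2}),            # has not rated "c", so stays silent
]
result = request_recommendation({"a": 1, "b": 2}, "c", peers)
```

Note that in this sketch the responders never expose their full profiles, while the requester's profile is sent to every peer; that asymmetry is what the obfuscation idea on the next slide addresses.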

Privacy through Obfuscation • A user profile might be revealed by a malicious attacker through multiple requests • Privacy is increased by obfuscating parts of user profiles • Basic question: “What portion of a user profile can be obfuscated while continuing to generate accurate recommendations?”

Experimental Setting • Part of the Jester dataset of joke ratings (-10 .. 10) • Dense dataset of 1024 users x 100 jokes • 3 obfuscation policies: • Default(x) – replace the ratings with x • Uniform – replace the ratings with random values chosen uniformly within the range of ratings • Bell_curve – replace the ratings with random values chosen according to the distribution of real ratings in the dataset (bell curve distribution)
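The three obfuscation policies might look like this in code. It is a sketch under stated assumptions, not the experimental code: the function names, the `fraction` parameter, and the random choice of which ratings to replace are illustrative, and the bell-curve policy is approximated by a Gaussian with a given mean and standard deviation, clipped to the rating range.

```python
import random

R_MIN, R_MAX = -10.0, 10.0  # Jester rating range

def default_policy(x):
    """Default(x): every obfuscated rating becomes the constant x."""
    return lambda rng: x

def uniform_policy():
    """Uniform: obfuscated ratings are drawn uniformly from the rating range."""
    return lambda rng: rng.uniform(R_MIN, R_MAX)

def bell_curve_policy(mean, stdev):
    """Bell_curve: assumption - a Gaussian fitted to the real ratings, clipped to range."""
    return lambda rng: min(R_MAX, max(R_MIN, rng.gauss(mean, stdev)))

def obfuscate(profile, fraction, policy, rng):
    """Replace a randomly chosen `fraction` of the ratings using `policy`."""
    items = list(profile)
    chosen = set(rng.sample(items, round(fraction * len(items))))
    return {i: (policy(rng) if i in chosen else r) for i, r in profile.items()}

rng = random.Random(0)
profile = {f"joke{i}": 5.0 for i in range(10)}
obf_default = obfuscate(profile, 0.3, default_policy(0.0), rng)
obf_uniform = obfuscate(profile, 0.5, uniform_policy(), rng)
obf_bell = obfuscate(profile, 1.0, bell_curve_policy(0.0, 5.0), rng)
```

Sweeping `fraction` from 0 to 1 for each policy and measuring prediction error on the obfuscated profiles is one way to approach the basic question of how much of a profile can be hidden.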

Experimental Results

Open Questions • Will these results hold for other datasets? • Sparse datasets, e.g. MovieLens • “Extreme” ratings, e.g. edges of the bell curve • Will our approach scale under an organized attack of multiple malicious users?

Open Questions • Can the profile of the active user also be obfuscated to increase privacy? • Can just a portion of the user profile be communicated to decrease communication costs and to improve scalability?

Q & A Thank You!

Question • What happens if we simply give a random recommendation?
