Privacy-Enhanced Collaborative Filtering. Privacy-Enhanced Personalization workshop, July 25, 2005, Edinburgh, Scotland. Shlomo Berkovsky, Yaniv Eytani, Tsvi Kuflik, Francesco Ricci.
Privacy-Enhanced Collaborative Filtering • Privacy-Enhanced Personalization workshop, July 25, 2005, Edinburgh, Scotland • Shlomo Berkovsky (1), Yaniv Eytani (1), Tsvi Kuflik (2), Francesco Ricci (3) • (1) Computer Science Department, University of Haifa, Israel • (2) Management Information Systems Department, University of Haifa, Israel • (3) ITC-irst, Trento, Italy • This work is supported by the collaboration project between the University of Haifa and ITC/irst
Outline • Collaborative Filtering (CF) • Distributed Privacy-Enhanced CF • Experimental Results • Open Questions
Collaborative Filtering (CF) • Based on the assumption that people with similar tastes prefer similar items • 3 basic stages: • Similarity computation (Pearson correlation, Cosine, Mean-Squared Difference) • Neighborhood formation (K-Nearest Neighbors) • Personalized prediction generation (weighted average of the neighbors' ratings)
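The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `pearson` and `predict`, the dictionary-based profiles, and the choice to keep only positively correlated neighbors are all assumptions made for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items rated by both users.
    Profiles are dicts mapping item id -> rating (an assumed representation)."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(active, others, item, k=2):
    """Predict the active user's rating for `item`."""
    # Stage 1: similarity to every user who rated the item.
    sims = [(pearson(active, o), o) for o in others if item in o]
    # Stage 2: neighborhood = the k most similar users
    # (assumption: only positive correlations are kept).
    neighbors = sorted((s for s in sims if s[0] > 0),
                       key=lambda s: s[0], reverse=True)[:k]
    if not neighbors:
        return None
    # Stage 3: similarity-weighted average of the neighbors' ratings.
    return (sum(s * o[item] for s, o in neighbors)
            / sum(s for s, _ in neighbors))

# Made-up ratings on the Jester scale (-10 .. 10).
active = {"j1": 4, "j2": -2, "j3": 6}
others = [
    {"j1": 5, "j2": -1, "j3": 7, "j4": 8},
    {"j1": -3, "j2": 4, "j3": -6, "j4": -9},
    {"j1": 2, "j2": -4, "j3": 4, "j4": 3},
]
prediction = predict(active, others, "j4")
```

In this toy data the first and third users correlate perfectly with the active user and the second correlates negatively, so the prediction is the average of the two agreeing neighbors' ratings for `j4`.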
CF and Privacy • Service providers collect information about their users • Personalization raises the issue of privacy • Prior work: • [Canny]: P2P-based CF, user communities, encryption • [Polat&Du]: partitioning of CF data, data perturbation techniques
Distributed Privacy-Enhanced CF • Combines the approaches of [Canny] and [Polat&Du] • Distributed and decentralized organization of users maintaining their personal profiles • [Figure: diagram labeled P1 … Pm and U1 … Un]
Recommendation Generation • A user sends his profile and requests a recommendation • Individual users independently decide whether to respond to the request • The responder locally computes and sends similarity and his prediction • The requesting user collects the responses, builds the neighborhood and generates the personalized prediction
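The request/response flow above can be sketched as follows. This is a hedged illustration rather than the system described in the talk: the `Peer` class, the `respond_prob` parameter that models each user's independent decision to answer, the use of cosine similarity, and the reading of the responder's "prediction" as its own rating for the requested item are all assumptions.

```python
import random
from math import sqrt

def cosine(a, b):
    """Cosine similarity over co-rated items (profiles are dicts item -> rating)."""
    common = [i for i in a if i in b]
    num = sum(a[i] * b[i] for i in common)
    den = (sqrt(sum(a[i] ** 2 for i in common))
           * sqrt(sum(b[i] ** 2 for i in common)))
    return num / den if den else 0.0

class Peer:
    """A user who stores only their own profile (hypothetical class)."""

    def __init__(self, profile, respond_prob=1.0, rng=None):
        self.profile = profile
        self.respond_prob = respond_prob  # assumed model of the decision to respond
        self.rng = rng or random.Random()

    def respond(self, requester_profile, item):
        # Each peer decides independently whether to answer the request.
        if item not in self.profile or self.rng.random() >= self.respond_prob:
            return None
        # Similarity is computed locally; only the similarity value and the
        # peer's rating for the item leave the peer, not the whole profile.
        return cosine(requester_profile, self.profile), self.profile[item]

def request_recommendation(profile, item, peers, k=2):
    """Requester side: collect responses, form the neighborhood, predict."""
    responses = [r for r in (p.respond(profile, item) for p in peers)
                 if r is not None]
    neighbors = sorted((r for r in responses if r[0] > 0), reverse=True)[:k]
    if not neighbors:
        return None
    return sum(s * r for s, r in neighbors) / sum(s for s, _ in neighbors)

peers = [
    Peer({"j1": 5, "j2": -1, "j3": 7, "j4": 8}),
    Peer({"j1": -3, "j2": 4, "j3": -6, "j4": -9}),
    Peer({"j1": 2, "j2": -4, "j3": 4, "j4": 3}, respond_prob=0.0),  # never answers
]
prediction = request_recommendation({"j1": 4, "j2": -2, "j3": 6}, "j4", peers)
```

Note that in this flow the requester's profile is visible to every peer it contacts, which is the exposure the obfuscation step is meant to reduce.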
Privacy through Obfuscation • A user's profile might be revealed by a malicious attacker through multiple requests • Privacy is increased by obfuscating parts of the user profiles • Basic question: “What portion of a user profile can be obfuscated while continuing to generate accurate recommendations?”
Experimental Setting • Part of the Jester dataset of joke ratings (-10 .. 10) • Dense dataset of 1024 users x 100 jokes • 3 obfuscation policies: • Default(x): replace the ratings with the fixed value x • Uniform: replace the ratings with random values chosen uniformly from the range of ratings • Bell_curve: replace the ratings with random values chosen according to the distribution of real ratings in the dataset (bell curve distribution)
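The three policies can be sketched as interchangeable functions. This is an illustrative sketch under stated assumptions, not the experimental code: the slides do not say how the ratings to obfuscate are selected, so a random subset of a given `fraction` is assumed here, and Bell_curve is approximated by a normal distribution with a caller-supplied mean and standard deviation, clipped to the rating range.

```python
import random

R_MIN, R_MAX = -10.0, 10.0  # Jester rating range

def default_policy(x):
    """Default(x): every obfuscated rating becomes the fixed value x."""
    return lambda rng: x

def uniform_policy(rng):
    """Uniform: a value drawn uniformly from the rating range."""
    return rng.uniform(R_MIN, R_MAX)

def bell_curve_policy(mean, std):
    """Bell_curve: assumed here to be a normal draw clipped to the range;
    mean and std would come from the real ratings in the dataset."""
    def draw(rng):
        return min(R_MAX, max(R_MIN, rng.gauss(mean, std)))
    return draw

def obfuscate(profile, fraction, policy, rng=None):
    """Return a copy of `profile` with `fraction` of its ratings replaced.
    Which ratings are replaced is chosen at random (an assumption)."""
    rng = rng or random.Random()
    items = list(profile)
    chosen = rng.sample(items, round(fraction * len(items)))
    obfuscated = dict(profile)  # the original profile is left untouched
    for item in chosen:
        obfuscated[item] = policy(rng)
    return obfuscated

# Made-up profile of ten ratings; obfuscate 30% of it with Default(0).
profile = {f"joke{i}": float(i + 1) for i in range(10)}
hidden = obfuscate(profile, 0.3, default_policy(0.0), random.Random(0))
```

Because every policy has the same call shape, the obfuscation rate and the policy can be varied independently when measuring their effect on prediction accuracy.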
Open Questions • Will these results be true for other datasets? • Sparse datasets, e.g. MovieLens • “Extreme” ratings, e.g. edges of the bell curve • Will our approach scale under an organized attack of multiple malicious users?
Open Questions • Can the profile of the active user also be obfuscated to increase privacy? • Can just a portion of the user profile be communicated to decrease communication costs and improve scalability?
Q & A Thank You!
Question • What happens if we simply give a random recommendation?