A Privacy-Preserving Framework for Personalized Social Recommendations Zach Jorgensen¹ and Ting Yu¹,² ¹NC State University, Raleigh, NC, USA ²Qatar Computing Research Institute, Doha, Qatar EDBT, March 24-28, 2014, Athens, Greece
Motivation • Social recommendation task: predict items a user might like based on the items his/her friends like [Diagram: item preferences (i1 through i5) and social relations feed into a social recommendation system, which outputs recommendations]
Motivation Model: Top-n Social Recommender • Input: items, users, social graph, preference graph, number of recommendations n • Algorithm: for every item i and every user u, compute μ(i, u), the utility of recommending item i to user u; then, for every user u, sort items by utility and recommend the top n • Output: a personalized list of the top n items (by utility) for each user
Motivation μ(i, u) = utility of recommending item i to user u: μ(i, u) = Σ_{v ∈ U} w_{v,i} · s(u, v), where w_{v,i} = 1 if a preference edge (v, i) exists and 0 otherwise, and s(u, v) is a social similarity measure computed over the social graph (e.g., Common Neighbors)
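A minimal sketch of this utility computation (our code, not from the talk; all names are ours), using Common Neighbors as the similarity measure and a dict mapping each item to the set of users who prefer it:

```python
import networkx as nx

def common_neighbors_sim(social, u, v):
    """Common Neighbors similarity: number of friends u and v share."""
    return len(set(social[u]) & set(social[v]))

def utility(item, u, prefs, social):
    """mu(item, u) = sum over users v of w_{v,item} * s(u, v).

    `prefs[item]` is the set of users with a preference edge to `item`,
    so w_{v,item} = 1 exactly when v is in prefs[item].
    """
    return sum(common_neighbors_sim(social, u, v) for v in prefs.get(item, ()))

# Toy example: u1 and u2 share two friends, and u2 prefers item i1.
social = nx.Graph([("u1", "u3"), ("u1", "u4"), ("u2", "u3"), ("u2", "u4")])
print(utility("i1", "u1", {"i1": {"u2"}}, social))  # -> 2
```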
Motivation • Many existing structural similarity measures could be used [Survey: Lü & Zhou, 2011] • We considered: Common Neighbors, Adamic-Adar, Graph Distance, Katz
Motivation Two main privacy problems: • Protect privacy of user data from malicious service provider (i.e., the recommender) • Protect privacy of user data from malicious/curious users • Our focus: preventing disclosure of individual item preferences through the output
Motivation Simple attack on Common Neighbors [Diagram: an adversary observing Alice's recommendations can deduce that Bob listens to Bieber]
Motivation Adversary • Knowledge of all preferences except target edge • Observes all recommendations • Knowledge of the algorithm Goal: to deduce the presence/absence of a single preference edge (the target edge)
Motivation Differential Privacy [Dwork, 2006] • Provides strong, formal privacy guarantees • Informally: guarantees that recommendations will be (almost) the same with/without any one preference edge in the input
Motivation Related work: Machanavajjhala et al. (VLDB 2011) • Task: for each node, recommend the node with the highest social similarity (Common Neighbors, Katz) • No distinction between users/items or between preference/social edges • Negative theoretical results
Motivation • We assume that the social graph is public • Often true in practice
Motivation • Main contribution: a framework that enables differential privacy guarantees for preference edges • Demonstrate on real data sets that accurate and private social recommendations are feasible
Outline • Motivation • Differential Privacy • Our Approach • Experimental Results • Conclusions
Differential Privacy A randomized algorithm A gives ε-differential privacy if, for any neighboring data sets D, D′ and any S ⊆ Range(A): Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]. Neighboring data sets differ in a single record. [Dwork, 2006]
Achieving Differential Privacy • Laplace mechanism: release A(D) + Lap(GS_A / ε), where the global sensitivity GS_A = max over neighboring D, D′ of |A(D) − A(D′)| (typically 1 for a counting query) • Theorem: the Laplace mechanism satisfies ε-differential privacy • Smaller ε = more noise/privacy
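As a sketch (the standard Laplace mechanism, not anything specific to this paper), releasing an answer with noise calibrated to the query's global sensitivity:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon,
                      rng=np.random.default_rng()):
    """Release true_answer + Lap(sensitivity / epsilon).

    The noise scale grows with the query's global sensitivity and shrinks
    as the privacy budget epsilon grows (smaller epsilon = more noise).
    """
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.1)  # a count query
```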
Properties of Differential Privacy • Sequential Composition: running mechanisms A₁, …, A_k with budgets ε₁, …, ε_k on the same data gives (Σᵢ εᵢ)-differential privacy • Parallel Composition: running ε-differentially private mechanisms on disjoint subsets of the data is ε-differentially private overall
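The budget arithmetic behind these two rules, as a two-line sketch (our illustration, not the talk's):

```python
# Sequential composition: mechanisms run on the SAME data add up their budgets.
total_seq = sum([0.1, 0.1, 0.3])   # -> 0.5-differential privacy overall

# Parallel composition: mechanisms run on DISJOINT subsets of the data cost
# only the largest individual budget.
total_par = max([0.1, 0.1, 0.3])   # -> 0.3-differential privacy overall
```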
Outline • Motivation • Differential Privacy • Our Approach • Simplifying observations • Naïve Approaches • Our Approach • Experimental Results • Conclusions
Simplifying Observations • The outer loop over items uses disjoint inputs (each item's preference edges), so parallel composition applies • Sorting items by utility and recommending the top n is post-processing and consumes no privacy budget • Our focus: an ε-differentially private procedure for computing μ(i, u) for all users u and a given item i
Naïve Approaches Approach 1: Noise-on-Utilities • For each item i and every user u, compute μ(i, u) + Laplace noise; then, for each user, sort items by noisy utility and recommend the top n • Satisfies ε-differential privacy, but the noise destroys accuracy!
Naïve Approaches Approach 2: Noise-on-Edges • Add Laplace noise independently to each preference-edge weight • Run the non-private algorithm on the resulting sanitized preference graph • At practical values of ε, the per-edge noise will destroy accuracy!
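A sketch of Approach 2 (our code; the edge representation is assumed), which makes the failure mode visible: each 0/1 weight gets Lap(1/ε) noise, and at ε = 0.1 that noise has scale 10.

```python
import numpy as np

def noise_on_edges(pref_weights, epsilon, rng=np.random.default_rng()):
    """Perturb every 0/1 preference-edge weight independently.

    Each weight is a sensitivity-1 query on disjoint data, so each gets
    Lap(1/epsilon) noise; at small epsilon the scale 1/epsilon dwarfs
    the {0, 1} signal it is meant to hide.
    """
    return {edge: w + rng.laplace(scale=1.0 / epsilon)
            for edge, w in pref_weights.items()}

sanitized = noise_on_edges({("u1", "i1"): 1, ("u2", "i1"): 0}, epsilon=0.1)
```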
Our Approach • Step 1: use a strategy S to cluster the preference edges of item i [Diagram: item i with 0/1-weighted preference edges to users u1 through u8, grouped into clusters c1, c2, c3] • For now, assume S randomly assigns edges to clusters
Our Approach • Step 2: for each cluster, compute the average edge weight and add Laplace noise [Diagram: clusters c1, c2, c3, each summarized as (average weight + noise)]
Our Approach • Step 3: replace each edge weight with the noisy average of its cluster [Diagram: every edge now carries its cluster's noisy average weight]
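Putting steps 1-3 together as a sketch (our code; the data layout, clusters as lists of edges and weights as a dict, is our own choice):

```python
import numpy as np

def noisy_cluster_averages(clusters, weights, epsilon,
                           rng=np.random.default_rng()):
    """Replace each preference-edge weight with its cluster's noisy average.

    One flipped edge moves a cluster's average by at most 1/|c|, so
    Lap(1/(epsilon*|c|)) noise suffices; clusters are disjoint, so parallel
    composition makes the whole pass epsilon-differentially private.
    """
    sanitized = {}
    for cluster in clusters:                       # cluster: list of edges
        avg = np.mean([weights[e] for e in cluster])
        noisy = avg + rng.laplace(scale=1.0 / (epsilon * len(cluster)))
        for e in cluster:                          # every edge in the cluster
            sanitized[e] = noisy                   # gets the same noisy average
    return sanitized
```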
Our Approach • Step 4: run the non-private algorithm on the sanitized edges: for every item i and every user u, compute μ(i, u); then, for each user, sort items by utility and recommend the top n items
Our Approach: Rationale • Adding/removing a single preference edge affects one cluster average by at most 1/|cᵢ| • So the noise added to cluster cᵢ's average is Lap(1/(ε · |cᵢ|)): the bigger the cluster, the smaller the noise • Example: let ε = 0.1 and |c| = 50 edges; the noise scale is 1/(0.1 × 50) = 0.2, versus 10 for a single edge • Intuition: the bigger the cluster, the less sensitive its average weight is to any one preference edge
Our Approach: Rationale • The catch – averaging introduces approximation error! • Need a better clustering strategy that will keep approx. error relatively low • Strategy must not leak privacy.
Our Approach: Clustering Strategy • Cluster the users based on the natural community structure of the public social graph [Diagram: community detection partitions the social graph over u1 through u8 into communities c0 and c1] • For each item, derive clusters for its preference edges from the user clusters • Note: we only need to cluster the social graph once; the resulting clusters are used for all items • Key point: clustering based on the public social graph does not leak privacy!
Our Approach: Clustering Strategy • Louvain Method [Blondel et al. 2008] • Greedy modularity maximization • Well-studied and known to produce good communities • Fast enough for graphs with millions of nodes • No parameters to tune
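A sketch of the clustering step (our code; NetworkX's built-in Louvain, available in recent versions, is one convenient implementation, not necessarily the one the authors used):

```python
import networkx as nx

# Cluster users ONCE on the public social graph; reuse for every item.
social = nx.karate_club_graph()          # stand-in for a real social graph
communities = nx.community.louvain_communities(social, seed=42)

# The cluster of preference edge (u, i) is simply u's community, so the
# same partition of users serves all items.
cluster_of_user = {u: k for k, c in enumerate(communities) for u in c}
```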
Outline • Motivation • Preliminaries • Our Approach • Experimental Results • Conclusions
Data Sets • Last.fm: 1,892 users; 17,632 items; avg. user degree = 13.4 (std. 17.3); avg. prefs per user = 48.7 (std. 6.9) • Flixster: 137,372 users; 48,756 items; avg. user degree = 18.5 (std. 31.1); avg. prefs per user = 54.8 (std. 218.2) • Publicly available: Last.fm <http://ir.ii.uam.es/hetrec2011/datasets>, Flixster <http://www.sfu.ca/~sja25/datasets>
Measuring Accuracy • Normalized Discounted Cumulative Gain [Järvelin and Kekäläinen, 2002] • NDCG at n: measures the quality of the private recommendations relative to the non-private recommendations, taking rank and utility into account • Ranges from 0.0 to 1.0, with 1.0 meaning the private recommender achieves the ideal ranking • Averaged over all users in the data set
Experiments: Last.fm [Plot: avg. accuracy (NDCG at n = 50) vs. privacy, from high privacy (small ε) to low privacy (large ε)]
Experiments: Flixster [Plot: avg. NDCG at 50 over 10,000 random users vs. privacy, from high to low; note the different y-axis scale from the Last.fm plot]
Experiments: Naïve Approaches [Plots: the naïve approaches on the Last.fm data set, for Katz, Common Neighbors, Graph Distance, and Adamic-Adar]
Conclusions • Differential privacy guarantees for item preferences • Use clustering and averaging to trade Laplace noise for some approx. error • Clustering via the community structure of the social graph is a useful heuristic for clustering the edges without violating privacy • Personalized social recommendations can be both private and accurate
Accuracy Metric: NDCG • Normalized Discounted Cumulative Gain • R′(u) = items recommended to user u by the private recommender, sorted by noisy utility • R(u) = items recommended to user u by the non-private recommender, sorted by true utility • NDCG(u) = DCG(R′(u)) / DCG(R(u)), ranging from 0 to 1 • Averaged over all users in a data set
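A sketch of the computation (a standard NDCG formulation; the talk's exact gain/discount variant may differ):

```python
import numpy as np

def ndcg_at_n(private_items, true_utility, n):
    """NDCG of the private top-n list against the ideal (true-utility) ranking.

    DCG discounts each recommended item's true utility by the log of its
    rank; the ideal DCG ranks items by true utility directly.
    """
    discounts = 1.0 / np.log2(np.arange(2, n + 2))          # ranks 1..n
    dcg = sum(true_utility.get(item, 0.0) * d
              for item, d in zip(private_items[:n], discounts))
    ideal_gains = sorted(true_utility.values(), reverse=True)[:n]
    idcg = sum(g * d for g, d in zip(ideal_gains, discounts))
    return dcg / idcg if idcg > 0 else 0.0
```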
Social Similarity Measures • Adamic-Adar: s(u, v) = Σ_{w ∈ Γ(u) ∩ Γ(v)} 1 / log |Γ(w)| • Graph Distance: s(u, v) = (negated) length of the shortest path between u and v • Katz: s(u, v) = Σ_{l=1}^{∞} β^l · |paths of length l between u and v|, where β is a small damping factor
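Sketches of the three measures (our code; `karate_club_graph` is just a stand-in graph, and β = 0.005 an arbitrary small damping factor):

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()               # stand-in social graph
u, v = 0, 33

# Adamic-Adar: shared neighbors, down-weighted by their own degree.
aa = sum(1.0 / np.log(G.degree(w)) for w in nx.common_neighbors(G, u, v))

# Graph Distance: negated shortest-path length (closer = more similar).
gd = -nx.shortest_path_length(G, u, v)

# Katz: damped count of paths of every length l between u and v:
#   sum_l beta^l * |paths_l(u, v)| = [(I - beta*A)^{-1} - I]_{u, v}
A = nx.to_numpy_array(G)
beta = 0.005                             # must be < 1 / lambda_max(A) to converge
katz = (np.linalg.inv(np.eye(len(A)) - beta * A) - np.eye(len(A)))[u, v]
```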
Experiments: Last.fm [Plots: NDCG at 10 and NDCG at 100]
Experiments: Flixster [Plots: NDCG at 10 and NDCG at 100]
Comparison of approaches on the Last.fm data set: • Low Rank Mechanism (LRM) – Yuan et al., PVLDB 2012 • Group and Smooth (GS) – Kellaris & Papadopoulos, PVLDB 2013
Relationship between user degree and accuracy, due to approx. error (Common Neighbors).