OptRR: Optimizing Randomized Response Schemes For Privacy-Preserving Data Mining. Zhengli Huang and Wenliang (Kevin) Du Department of EECS Syracuse University. Data Mining/Analysis. Data cannot be published directly because of privacy concern. Background: Randomized Response.
OptRR: Optimizing Randomized Response Schemes For Privacy-Preserving Data Mining Zhengli Huang and Wenliang (Kevin) Du Department of EECS Syracuse University
Data Mining/Analysis • Data cannot be published directly because of privacy concerns.
Background: Randomized Response • Question: “Do you smoke?” • The true answer is “Yes”. • Biased coin: Head → answer “Yes”; Tail → answer “No”.
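The coin-flip idea above can be sketched in a few lines. This is a minimal illustration that assumes Warner's classic setup: each respondent reports the true answer with probability p_truth and the opposite answer otherwise. The function names and the parameter p_truth are hypothetical, chosen only for this example.

```python
import random


def randomize_answer(truth, p_truth, rng):
    """Report the true answer with probability p_truth, else the opposite."""
    return truth if rng.random() < p_truth else (not truth)


def estimate_yes_fraction(observed_yes, p_truth):
    """Recover the true 'Yes' fraction from the observed 'Yes' fraction.

    Assumed model: observed = p * pi + (1 - p) * (1 - pi),
    hence pi = (observed + p - 1) / (2p - 1).
    """
    if abs(2 * p_truth - 1) < 1e-12:
        raise ValueError("p_truth = 0.5 destroys all information")
    return (observed_yes + p_truth - 1) / (2 * p_truth - 1)


def simulate(true_answers, p_truth, seed=0):
    """Randomize every answer, then estimate the true 'Yes' fraction."""
    rng = random.Random(seed)
    reports = [randomize_answer(t, p_truth, rng) for t in true_answers]
    observed_yes = sum(reports) / len(reports)
    return estimate_yes_fraction(observed_yes, p_truth)


if __name__ == "__main__":
    population = [True] * 6000 + [False] * 14000  # 30% true "Yes"
    print(simulate(population, p_truth=0.7))
```

No individual report reveals the respondent's true answer with certainty, yet the aggregate fraction can still be estimated, which is the trade-off the rest of the deck optimizes.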
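RR for Categorical Data • True value: Si • Reported value: Si, Si+1, Si+2, or Si+3, with probabilities q1, q2, q3, q4 respectively • These probabilities form the RR matrix M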
A Generalization • Several RR Matrices have been proposed • [Warner 65] • [R.Agrawal et al. 05], [S. Agrawal et al. 05] • RR Matrix can be arbitrary • Can we find optimal RR matrices?
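To make the role of an arbitrary RR matrix concrete, here is a small sketch under stated assumptions: M[j][i] is taken to be the probability of reporting category j when the true category is i (so each column sums to 1), the disguised distribution is p* = M p, and the original distribution is recovered by solving M p = p*. The helper names are hypothetical, and the slides do not specify the estimation procedure, so the matrix-inversion estimate shown here is one common choice rather than the authors' method.

```python
def disguised_distribution(M, p):
    """Return p* = M p, with M[j][i] = P(report j | true value i)."""
    n = len(p)
    return [sum(M[j][i] * p[i] for i in range(n)) for j in range(n)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("RR matrix is singular; cannot recover p")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def estimate_original(M, p_star):
    """Estimate the original distribution from the disguised one."""
    return solve(M, p_star)


if __name__ == "__main__":
    # Illustrative 3-category matrix: keep the value with prob. 0.6,
    # otherwise move to one of the other two categories (0.2 each).
    M = [[0.6, 0.2, 0.2],
         [0.2, 0.6, 0.2],
         [0.2, 0.2, 0.6]]
    p = [0.5, 0.3, 0.2]
    p_star = disguised_distribution(M, p)
    print(p_star)
    print(estimate_original(M, p_star))
```

In practice p* is estimated from a finite sample of disguised records, so the recovered distribution carries sampling error that depends on M; that error is what the utility metric measures.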
What is an optimal matrix? • Which of the following is better? • Privacy: M2 is better • Utility: M1 is better • So, what is an optimal matrix?
Optimal RR Matrix • An RR matrix M is optimal if no other RR matrix is better than M in both privacy and utility (i.e., no other matrix dominates M). • Privacy Quantification • Utility Quantification • A number of privacy and utility metrics have been proposed. We use the following: • Privacy: how accurately one can estimate individual information. • Utility: how accurately one can estimate aggregate information.
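The dominance relation behind this definition can be sketched as follows. The example assumes each matrix has already been scored as a pair of costs where lower is better for both entries (for instance, an adversary's estimation accuracy and the aggregate estimation error); this encoding, the function names, and the sample scores are assumptions made for illustration. It also uses the standard Pareto convention: no worse in every objective and strictly better in at least one.

```python
def dominates(a, b):
    """True if score tuple a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(scores):
    """Return the scores that no other score in the list dominates."""
    return [s for s in scores
            if not any(dominates(other, s) for other in scores)]


if __name__ == "__main__":
    # Hypothetical (privacy_cost, utility_error) pairs for four matrices.
    scores = [(0.2, 0.9), (0.4, 0.5), (0.5, 0.6), (0.8, 0.1)]
    print(non_dominated(scores))
```

Here (0.5, 0.6) is dropped because (0.4, 0.5) is better in both objectives, while the remaining three are mutually incomparable and are all optimal in the sense defined above.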
Optimization Methods • Approach 1: Weighted sum: w1 Privacy + w2 Utility • Approach 2 • Fix Privacy, find M with the optimal Utility. • Fix Utility, find M with the optimal Privacy. • Challenge: Difficult to generate M with a fixed privacy or utility. • Our Approach: Multi-Objective Optimization
Evolutionary Multi-Objective Optimization (EMOO) • Genetic algorithms have difficulty dealing with multiple objectives. • We therefore use an EMOO algorithm. • Specifically, we use SPEA2.
EMOO • Evolution • Crossover • Mutation • Fitness Assignment (SPEA2) • Strength value S(M): the number of matrices dominated by M. • Raw fitness F’(M): the sum of the strengths of the RR matrices that dominate M. The lower the better. • Density d(M): discriminates between matrices with the same raw fitness.
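The fitness assignment listed above can be sketched like this. The strength and raw-fitness computations follow the definitions on the slide; the density term uses the form from the SPEA2 literature, 1 / (distance to the k-th nearest neighbour + 2) with k = floor(sqrt(N)), which the slides do not spell out and is therefore an assumption here. Scores are hypothetical cost tuples where lower is better, and the crossover and mutation operators are not shown because the slides give no details about them.

```python
import math


def dominates(a, b):
    """True if a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def spea2_fitness(scores, k=None):
    """Return (strength, raw_fitness, density, fitness) lists for scores."""
    n = len(scores)
    if k is None:
        k = max(1, int(math.sqrt(n)))  # assumed default, as in SPEA2
    # Strength S(i): how many other points i dominates.
    strength = [sum(dominates(scores[i], scores[j]) for j in range(n))
                for i in range(n)]
    # Raw fitness: sum of strengths of the points that dominate i.
    raw = [sum(strength[j] for j in range(n)
               if dominates(scores[j], scores[i]))
           for i in range(n)]
    # Density: based on distance to the k-th nearest neighbour.
    density = []
    for i in range(n):
        dists = sorted(math.dist(scores[i], scores[j])
                       for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        density.append(1.0 / (sigma_k + 2.0))
    fitness = [r + d for r, d in zip(raw, density)]
    return strength, raw, density, fitness


if __name__ == "__main__":
    scores = [(1, 1), (2, 2), (3, 3), (1, 3)]
    s, r, d, f = spea2_fitness(scores)
    print(s, r, f)
```

In this toy input the second and fourth points share the same raw fitness, and the density term (always below 1) breaks the tie in favour of the point in the less crowded region without overriding the dominance-based ranking.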
Diversity • [Figure: matrices M1 to M5 plotted in the objective space; axes: Privacy and Utility, each running from Worse to Better]
The Output of Optimization • Pareto Fronts • The optimal set is often plotted in the objective space, and the plot is called the Pareto front. • [Figure: Pareto front; axes: Privacy and Utility (error)]
Experiments For normal distribution with different δ
Summary • We use an evolutionary multi-objective optimization technique to search for optimal RR matrices. • The evaluation shows that our scheme achieves better performance than the existing RR schemes.