OptRR: Optimizing Randomized Response Schemes For Privacy-Preserving Data Mining

Zhengli Huang and Wenliang (Kevin) Du, Department of EECS, Syracuse University.


Presentation Transcript


  1. OptRR: Optimizing Randomized Response Schemes For Privacy-Preserving Data Mining. Zhengli Huang and Wenliang (Kevin) Du, Department of EECS, Syracuse University

  2. Data Mining/Analysis: data cannot be published directly because of privacy concerns

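  3. Background: Randomized Response • Question: “Do you smoke?” The true answer is “Yes”. • The respondent tosses a biased coin: on Head, they report the true answer (“Yes”); on Tail, they report the opposite (“No”).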
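A minimal Python sketch of the coin-flip scheme above, assuming Warner's formulation: the respondent reports the truth with probability p (Head) and the opposite answer otherwise (Tail). The function names and the example values are illustrative, not from the slides. Because the expected reported “Yes” rate is λ = pπ + (1 − p)(1 − π), the analyst can invert it to estimate the true rate π without learning any individual's answer.

```python
import random


def warner_respond(true_answer: bool, p: float, rng: random.Random) -> bool:
    """Report the true answer with probability p (Head), else its opposite (Tail)."""
    return true_answer if rng.random() < p else not true_answer


def warner_estimate(reported_yes_rate: float, p: float) -> float:
    """Estimate the true 'Yes' rate pi by inverting lambda = p*pi + (1 - p)*(1 - pi)."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes the reports independent of the truth")
    return (reported_yes_rate - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    p, true_rate, n = 0.7, 0.3, 100_000  # illustrative values
    smokers = [rng.random() < true_rate for _ in range(n)]
    reports = [warner_respond(s, p, rng) for s in smokers]
    observed = sum(reports) / n
    print(f"reported Yes rate: {observed:.3f}")
    print(f"estimated smoker rate: {warner_estimate(observed, p):.3f}")
```

Note the trade-off already visible here: p close to 0.5 hides individual answers better but makes the estimate of π noisier, since the error is divided by 2p − 1.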

  4. RR for Categorical Data • True value: S_i • The RR matrix M replaces S_i with S_i, S_{i+1}, S_{i+2}, or S_{i+3} with probabilities q1, q2, q3, q4, respectively.
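A minimal NumPy sketch of categorical RR, assuming the convention M[j, i] = P(report S_j | true value S_i), so each column of M is a probability distribution (column i holds the q values for true value S_i) and the reported distribution is P* = M P. When M is invertible, the aggregate distribution can then be estimated as P = M⁻¹ P*. The helper names and the 4-category example are hypothetical.

```python
import numpy as np


def randomize(values: np.ndarray, M: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Replace each true category index i by an index j drawn from column i of M."""
    cdf = np.cumsum(M, axis=0)              # cdf[j, i] = P(report <= S_j | true S_i)
    u = rng.random(len(values))             # one uniform draw per record
    idx = (u[None, :] >= cdf[:, values]).sum(axis=0)
    return np.minimum(idx, M.shape[0] - 1)  # guard against round-off in the last cdf entry


def reconstruct(p_star: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Estimate the original distribution P from the reported distribution P* = M P."""
    return np.linalg.solve(M, p_star)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cat, keep = 4, 0.6                    # illustrative parameters
    M = np.full((n_cat, n_cat), (1 - keep) / (n_cat - 1))
    np.fill_diagonal(M, keep)
    P = np.array([0.1, 0.2, 0.3, 0.4])
    original = rng.choice(n_cat, size=100_000, p=P)
    reported = randomize(original, M, rng)
    p_star = np.bincount(reported, minlength=n_cat) / len(reported)
    print("estimated P:", np.round(reconstruct(p_star, M), 3))
```

Because every column of M sums to 1, the estimate always sums to 1, but with finite samples individual entries can come out slightly negative.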

  5. A Generalization • Several RR matrices have been proposed: [Warner 65], [R. Agrawal et al. 05], [S. Agrawal et al. 05] • An RR matrix can be arbitrary • Can we find optimal RR matrices?
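To make the generalization concrete: Warner's scheme is the 2×2 RR matrix with p on the diagonal, and any square matrix with non-negative entries whose columns sum to 1 is a valid RR matrix (under the column convention M[j, i] = P(report S_j | true S_i), an assumption of this sketch). The sketch below, with hypothetical helper names, builds Warner's matrix, one simple n-category family that keeps the true value with probability p and otherwise reports one of the other categories uniformly, and a random “arbitrary” RR matrix, the kind of candidate a search for optimal matrices has to consider.

```python
import numpy as np


def warner_matrix(p: float) -> np.ndarray:
    """Warner's scheme: report the truth with probability p, the other answer otherwise."""
    return np.array([[p, 1 - p],
                     [1 - p, p]])


def keep_or_uniform_matrix(n: int, p: float) -> np.ndarray:
    """Hypothetical n-category family: keep S_i w.p. p, else one of the other n-1 uniformly."""
    M = np.full((n, n), (1 - p) / (n - 1))
    np.fill_diagonal(M, p)
    return M


def random_rr_matrix(n: int, rng: np.random.Generator) -> np.ndarray:
    """An arbitrary RR matrix: every column is a random probability distribution."""
    return rng.dirichlet(np.ones(n), size=n).T


def is_rr_matrix(M: np.ndarray) -> bool:
    """Valid RR matrix: square, non-negative, and every column sums to 1."""
    return (M.ndim == 2 and M.shape[0] == M.shape[1]
            and bool(np.all(M >= 0)) and bool(np.allclose(M.sum(axis=0), 1.0)))
```

For example, `keep_or_uniform_matrix(2, p)` coincides with `warner_matrix(p)`, so Warner's scheme is the two-category special case of that family; the random matrices show how much larger the space of admissible RR matrices is.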

  6. What is an optimal matrix? • Which of two RR matrices, M1 or M2, is better? • Privacy: M2 is better. • Utility: M1 is better. • So, what is an optimal matrix?

  7. Optimal RR Matrix • An RR matrix M is optimal if no other RR matrix is at least as good as M in both privacy and utility and strictly better in at least one (i.e., no other matrix dominates M). • Privacy quantification • Utility quantification • A number of privacy and utility metrics have been proposed. We use the following: • Privacy: how accurately one can estimate an individual's information. • Utility: how accurately one can estimate aggregate information.
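The transcript gives only the verbal definitions of the two metrics. As a hedged illustration, the sketch below uses two stand-in metrics that follow those definitions: privacy as 1 minus the accuracy of an adversary who sees a reported value and guesses the most likely original value (a MAP guess), and utility as the total variance of the estimate M⁻¹P* over a hypothetical number of records (lower is better). Both metrics, the function names, and the example matrices are assumptions of this sketch, not the authors' formulas. The `dominates` check encodes the optimality definition above.

```python
import numpy as np


def privacy(M: np.ndarray, P: np.ndarray) -> float:
    """Stand-in privacy metric: 1 - accuracy of an adversary who guesses the most
    likely original value S_i after seeing a reported value S_j (MAP guess)."""
    joint = M * P[None, :]                  # joint[j, i] = P(report S_j and true S_i)
    return float(1.0 - joint.max(axis=1).sum())


def utility_error(M: np.ndarray, P: np.ndarray, n_records: int) -> float:
    """Stand-in utility metric (lower is better): total variance of the estimate
    P_hat = M^{-1} P*, where P* is the observed proportion over n_records reports."""
    p_star = M @ P
    cov_star = (np.diag(p_star) - np.outer(p_star, p_star)) / n_records  # multinomial
    M_inv = np.linalg.inv(M)
    return float(np.trace(M_inv @ cov_star @ M_inv.T))


def dominates(a: tuple, b: tuple) -> bool:
    """a, b are (privacy, utility_error): a dominates b if it is no worse in both
    objectives and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


if __name__ == "__main__":
    P = np.array([0.3, 0.7])                 # illustrative distribution
    M1 = np.array([[0.9, 0.1], [0.1, 0.9]])  # illustrative: keeps more of the truth
    M2 = np.array([[0.7, 0.3], [0.3, 0.7]])  # illustrative: perturbs more
    a = (privacy(M1, P), utility_error(M1, P, 1000))
    b = (privacy(M2, P), utility_error(M2, P, 1000))
    print("M1:", a, "M2:", b)
    print("M1 dominates M2:", dominates(a, b), "| M2 dominates M1:", dominates(b, a))
```

With these values the second matrix has higher privacy (about 0.3 versus 0.1) but a larger error, so neither dominates the other and both can belong to the optimal set, the situation described on slide 6.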

  8. Optimization Methods • Approach 1: weighted sum, w1 · Privacy + w2 · Utility • Approach 2: • Fix privacy, find the M with optimal utility. • Fix utility, find the M with optimal privacy. • Challenge: it is difficult to generate an M with a fixed privacy or utility. • Our approach: multi-objective optimization
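A small sketch of one known limitation of Approach 1, assuming utility is expressed as an error to minimize (so the weighted sum rewards privacy and penalizes error). Each weight choice yields a single matrix, and a non-dominated candidate that lies in a non-convex region of the trade-off is never selected for any positive weights. The candidate names and numbers are hypothetical.

```python
def weighted_sum_choice(candidates, w_privacy, w_error):
    """Pick the candidate maximizing w_privacy * privacy - w_error * error.
    candidates: list of (name, privacy, error) tuples."""
    return max(candidates, key=lambda c: w_privacy * c[1] - w_error * c[2])[0]


if __name__ == "__main__":
    # Hypothetical (name, privacy, error) values; no candidate dominates another.
    candidates = [("A", 0.0, 0.0), ("B", 0.5, 0.6), ("C", 1.0, 1.0)]
    picked = {weighted_sum_choice(candidates, w1, w2)
              for w1 in range(1, 21) for w2 in range(1, 21)}
    print(sorted(picked))  # ['A', 'C']: B is never chosen
```

Choosing B would require w1/w2 > 1.2 (to beat A) and w1/w2 < 0.8 (to beat C) at the same time, which is impossible, even though B is a reasonable compromise.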

  9. Evolutionary Multi-Objective Optimization (EMOO) • Standard (single-objective) genetic algorithms have difficulty dealing with multiple objectives. • We use an EMOO algorithm: SPEA2 (Strength Pareto Evolutionary Algorithm 2).

  10. Our SPEA2-based algorithm
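As a hedged illustration of the crossover and mutation steps in an SPEA2-style search over RR matrices, the sketch below shows one possible pair of variation operators. The column-wise crossover and the Gaussian-plus-renormalize mutation, and all function names and parameter defaults, are assumptions of this sketch, not the authors' operators. The design constraint they share is that every offspring must remain a valid RR matrix (non-negative entries, columns summing to 1).

```python
import numpy as np


def random_rr_matrix(n: int, rng: np.random.Generator) -> np.ndarray:
    """Random RR matrix whose columns are probability distributions (Dirichlet draws)."""
    return rng.dirichlet(np.ones(n), size=n).T


def crossover(parent_a: np.ndarray, parent_b: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Uniform column crossover: each column of the child is copied from one parent,
    so every column stays a valid probability distribution."""
    take_a = rng.random(parent_a.shape[1]) < 0.5
    return np.where(take_a[None, :], parent_a, parent_b)


def mutate(M: np.ndarray, rng: np.random.Generator, rate: float = 0.2,
           scale: float = 0.05) -> np.ndarray:
    """Gaussian mutation: perturb a random subset of columns, clip negatives to 0,
    and renormalize each perturbed column so it sums to 1 again."""
    child = M.copy()
    for i in range(child.shape[1]):
        if rng.random() < rate:
            col = np.clip(child[:, i] + rng.normal(0.0, scale, size=child.shape[0]), 0.0, None)
            if col.sum() > 0:
                child[:, i] = col / col.sum()
    return child


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = random_rr_matrix(4, rng), random_rr_matrix(4, rng)
    child = mutate(crossover(a, b, rng), rng)
    print(np.round(child, 3))
    print("column sums:", child.sum(axis=0))
```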

  11. EMOO • Evolution • Crossover • Mutation • Fitness assignment (SPEA2) • Strength value S(M): the number of matrices dominated by M. • Raw fitness F’(M): the sum of the strengths of the RR matrices that dominate M; the lower, the better. • Density d(M): used to discriminate between matrices with the same raw fitness.
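A minimal NumPy sketch of this fitness assignment, assuming every objective is expressed as a value to minimize (e.g. (−privacy, utility error)) and using SPEA2's usual density d(M) = 1/(σ_k + 2), where σ_k is the distance to the k-th nearest neighbour in objective space and k = ⌊√n⌋ for n evaluated matrices. The function names and the example vectors are hypothetical. The final fitness is raw fitness plus density, so every non-dominated matrix (raw fitness 0) scores below 1.

```python
import numpy as np


def dominates(a: np.ndarray, b: np.ndarray) -> bool:
    """Minimization: a is no worse in every objective and strictly better in one."""
    return bool(np.all(a <= b) and np.any(a < b))


def spea2_fitness(objectives):
    """Return (strength S, raw fitness, density, fitness) for n >= 2 objective vectors."""
    objs = np.asarray(objectives, dtype=float)
    n = len(objs)
    dom = np.array([[dominates(objs[i], objs[j]) for j in range(n)] for i in range(n)])
    strength = dom.sum(axis=1)                    # S(M): how many matrices M dominates
    raw = np.array([strength[dom[:, j]].sum() for j in range(n)])  # sum of S over dominators
    dist = np.linalg.norm(objs[:, None, :] - objs[None, :, :], axis=2)
    k = int(np.sqrt(n))
    sigma_k = np.sort(dist, axis=1)[:, k]         # index 0 is the zero distance to itself
    density = 1.0 / (sigma_k + 2.0)               # in (0, 0.5]: breaks ties in raw fitness
    return strength, raw, density, raw + density


if __name__ == "__main__":
    # Illustrative objective vectors: (-privacy, utility error), both minimized.
    objs = [(-0.3, 1.0), (-0.1, 2.0), (-0.5, 3.0), (-0.05, 4.0)]
    S, R, D, F = spea2_fitness(objs)
    print("strength:", S, "raw:", R, "fitness:", np.round(F, 3))
```

In practice the objectives would usually be normalized before computing distances, so that neither privacy nor error dominates the density term just because of its scale.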

  12. Diversity [Figure: RR matrices M1 to M5 spread across the objective space; axes: Privacy and Utility, marked from worse to better]
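Standard SPEA2 maintains this kind of spread with its archive truncation: when too many non-dominated matrices survive, it repeatedly removes the one closest to its neighbours, comparing nearest-neighbour distances first, then second-nearest, and so on. A minimal sketch of that rule, assuming Euclidean distance in objective space; the function name and example points are hypothetical, and the authors' SPEA2-based variant may differ in detail.

```python
import numpy as np


def truncate_archive(objs, size):
    """Return indices of the points kept after SPEA2-style truncation to `size`."""
    if size < 1:
        raise ValueError("archive size must be at least 1")
    objs = np.asarray(objs, dtype=float)
    keep = list(range(len(objs)))
    while len(keep) > size:
        pts = objs[keep]
        dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)            # ignore each point's distance to itself
        sorted_d = np.sort(dist, axis=1)          # row r: distances from point r, nearest first
        # Lexicographic minimum: smallest nearest-neighbour distance, ties broken by the
        # second-nearest distance, and so on.
        victim = min(range(len(keep)), key=lambda r: tuple(sorted_d[r]))
        keep.pop(victim)
    return keep


if __name__ == "__main__":
    points = [(0, 0), (1, 1), (1.1, 1.1), (3, 3), (5, 5)]
    print(truncate_archive(points, 4))  # [0, 2, 3, 4]
```

Of the two crowded points (1, 1) and (1.1, 1.1), the first is removed because its second-nearest neighbour is closer, which leaves the surviving points more evenly spread.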

  13. The Output of Optimization • Pareto fronts • The optimal set is often plotted in the objective space; the plot is called the Pareto front. [Figure: example Pareto front; axes: Privacy and Utility (error), with error starting at 0]
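For two objectives, the Pareto front can be extracted from any set of evaluated matrices with one sort and a single sweep. A minimal sketch, assuming each matrix has been summarized as a (privacy, utility error) pair where higher privacy and lower error are better; the function name and sample points are hypothetical.

```python
def pareto_front_2d(points):
    """Return the non-dominated (privacy, error) pairs, in ascending privacy order,
    ready to plot as a Pareto front."""
    # Visit points from highest to lowest privacy; among equal privacy, lowest error first.
    ordered = sorted(points, key=lambda p: (-p[0], p[1]))
    front, best_error = [], float("inf")
    for privacy, error in ordered:
        # Every point seen so far has privacy >= this one, so this point is
        # non-dominated only if its error is strictly lower than all of theirs.
        if error < best_error:
            front.append((privacy, error))
            best_error = error
    return front[::-1]


if __name__ == "__main__":
    evaluated = [(0.1, 0.2), (0.3, 0.5), (0.2, 0.6), (0.6, 1.5), (0.05, 0.3)]
    print(pareto_front_2d(evaluated))  # [(0.1, 0.2), (0.3, 0.5), (0.6, 1.5)]
```

Sorting makes this O(n log n) instead of O(n²) pairwise dominance checks; it is only a post-processing step for plotting, not a replacement for the evolutionary search.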

  14. Experiments: normal distribution with different δ

  15. For the first attribute of the Adult data

  16. For normal distribution (δ=0.75)

  17. Summary • We use an evolutionary multi-objective optimization technique to search for optimal RR matrices. • The evaluation shows that our scheme achieves better performance than the existing RR schemes.
