From Stability to Differential Privacy


Presentation Transcript


  1. From Stability to Differential Privacy Abhradeep Guha Thakurta Yahoo! Labs, Sunnyvale

  2. Thesis: Stable algorithms yield differentially private algorithms

  3. Differential privacy: A short tutorial

  4.–7. Privacy in Machine Learning Systems
  [Diagram, built up over four slides: individuals contribute their data to a trusted learning algorithm, which releases summary statistics (classifiers, clusters, regression coefficients) to users; an attacker also observes the released statistics]

  8. Privacy in Machine Learning Systems
  • Two conflicting goals:
  • Utility: release accurate information
  • Privacy: protect the privacy of individual entries
  • Balancing the tradeoff is a difficult problem:
  • Netflix prize database attack [NS08]
  • Facebook advertisement system attack [Korolova11]
  • Amazon recommendation system attack [CKNFS11]
  • Data privacy is an active area of research: computer science, economics, statistics, biology, social sciences, …

  9. Differential Privacy [DMNS06, DKMMN06]
  • Intuition: the adversary learns essentially the same thing irrespective of your presence or absence in the data set
  • Data sets D and D′ that differ in one record are called neighboring data sets
  • Require: neighboring data sets induce close distributions on the outputs M(D) and M(D′)
  [Diagram: the mechanism M run with independent random coins on two neighboring data sets]

  10. Differential Privacy [DMNS06, DKMMN06]
  • Definition: a randomized algorithm M is (ε, δ)-differentially private if
  • for all data sets D and D′ that differ in one element, and
  • for all sets of answers S:
  Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ
  (δ = 0 gives pure ε-differential privacy)

  11. Semantics of Differential Privacy
  • Differential privacy is a condition on the algorithm
  • The guarantee is meaningful in the presence of any auxiliary information
  • Typically, think of the privacy parameters as a small constant ε and δ ≪ 1/n, where n = # of data samples
  • Composition: the ε's and δ's add up over multiple executions

  12. Laplace Mechanism [DMNS06]
  • Data set D, and a function f on data sets
  • Sensitivity: S(f) = max over neighboring D, D′ of |f(D) − f(D′)|
  • Sample a random variable Z from Lap(S(f)/ε)
  • Output f(D) + Z
  Theorem (Privacy): the algorithm is ε-differentially private
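A minimal Python sketch of the Laplace mechanism described above; the function name and the bounded-mean example are illustrative, not from the slides:

```python
import numpy as np

def laplace_mechanism(data, f, sensitivity, epsilon, rng=None):
    """Release f(data) plus Laplace noise with scale sensitivity/epsilon."""
    rng = rng or np.random.default_rng()
    return f(data) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privately release the mean of n values known to lie in [0, 1].
# Changing one record moves the mean by at most 1/n, so S(f) = 1/n.
data = np.random.default_rng(0).random(1000)
private_mean = laplace_mechanism(data, np.mean, sensitivity=1 / len(data), epsilon=0.5)
```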

  13. This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate

  14. Perturbation stability (a.k.a. zero local sensitivity)

  15. Perturbation Stability
  [Diagram: data set D → function f → output f(D)]

  16. Perturbation Stability
  • Stability of f at D: the output f(D) does not change on changing any one entry of D
  • Equivalently, the local sensitivity of f at D is zero

  17. Distance to Instability Property
  • Definition: a function f is k-stable at a data set D if for any data set D′ with |D △ D′| ≤ k, f(D′) = f(D)
  • Distance to instability: the largest k such that f is k-stable at D
  • Objective: output f(D) while preserving differential privacy
  [Diagram: the space of all data sets, split into stable and unstable regions, with the distance measured from D to the nearest unstable data set]

  18. Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith T.’13]

  19. A Meta-algorithm: Propose-Test-Release (PTR)
  • Basic tool: the Laplace mechanism
  • Compute d̂ = (distance to instability of f at D) + Lap(1/ε)
  • If d̂ > log(1/δ)/ε, then return f(D), else return ⊥
  Theorem: the algorithm is (ε, δ)-differentially private
  Theorem: if f is (2 log(1/δ)/ε)-stable at D, then w.p. ≥ 1 − δ the algorithm outputs f(D)
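A minimal sketch of the PTR test in Python. The callback `dist_to_instability` is an assumed helper: computing the distance to instability efficiently is problem-specific, and the exact threshold constants follow the slide above:

```python
import numpy as np

def propose_test_release(data, f, dist_to_instability, epsilon, delta, rng=None):
    """PTR sketch: release f(data) only if a noisy stability test passes.

    dist_to_instability(data) must return how many entries of `data` can
    be changed without changing f(data).
    """
    rng = rng or np.random.default_rng()
    noisy_dist = dist_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_dist > np.log(1.0 / delta) / epsilon:
        return f(data)   # stable: releasing the exact answer is safe
    return None          # unstable: output "bottom" instead of the answer
```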

  20. This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate

  21. Sample and aggregate framework [NRS07, Smith11, Smith T.’13]

  22. Sample and Aggregate Framework
  [Diagram: data set → random subsamples (blocks) → learning algorithm on each block → aggregator → output]

  23. Sample and Aggregate Framework
  Theorem: if the aggregator is differentially private, then the overall framework is differentially private
  Assumption: each entry appears in at most one data block
  Proof idea: each data entry affects only one data block, hence only one input to the aggregator
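A sketch of the framework under the disjoint-blocks assumption; `learner` and `private_aggregator` are assumed callbacks, not APIs from the talk:

```python
import numpy as np

def sample_and_aggregate(data, learner, private_aggregator, num_blocks, rng=None):
    """Split data into disjoint blocks, run the (non-private) learner on
    each block, then combine the per-block answers with a differentially
    private aggregator. One record influences exactly one block, so the
    privacy of the whole pipeline follows from the aggregator alone."""
    rng = rng or np.random.default_rng()
    indices = rng.permutation(len(data))
    blocks = np.array_split(indices, num_blocks)
    answers = [learner([data[i] for i in block]) for block in blocks]
    return private_aggregator(answers)
```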

  24. A differentially private aggregator using the PTR framework [Smith T.’13]

  25. An (ε, δ)-Differentially Private Aggregator
  • Assumption: the candidate outputs come from a discrete set
  • Each data block casts one vote for a candidate output
  [Diagram: per-block votes collected into a histogram of vote counts]

  26. PTR + Report-Noisy-Max Aggregator
  • Function f: the candidate output with the maximum number of votes
  • Let d̂ = (gap between the two highest vote counts) + Lap(1/ε)
  • If d̂ > log(1/δ)/ε, then return f, else return ⊥
  Observation: the distance to instability here is (up to scaling) the gap between the counts of the highest and the second-highest scoring model
  Observation: the algorithm is always computationally efficient
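A sketch of this aggregator, fitting the `private_aggregator` slot above; names are illustrative:

```python
import numpy as np
from collections import Counter

def ptr_noisy_max_aggregator(votes, epsilon, delta, rng=None):
    """Release the most-voted candidate only if its lead over the
    runner-up survives a noisy PTR check."""
    rng = rng or np.random.default_rng()
    ranked = Counter(votes).most_common()
    winner, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    noisy_gap = (top - second) + rng.laplace(scale=1.0 / epsilon)
    if noisy_gap > np.log(1.0 / delta) / epsilon:
        return winner
    return None  # the arg-max is not stable enough to release
```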

  27. Analysis of the aggregator under subsampling stability [Smith T.’13]

  28. Subsampling Stability
  • Data set D; D̂ is a random subsample of D that includes each entry independently w.p. q
  • Stability: f is q-subsampling stable at D if f(D̂) = f(D) w.p. ≥ 3/4 over the choice of D̂
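A Monte Carlo sketch for checking this definition empirically on a given data set. This is a diagnostic only (the check itself is not private), and the function name is hypothetical:

```python
import numpy as np

def estimate_subsampling_stability(data, f, q, trials=200, rng=None):
    """Estimate Pr[f(subsample) == f(data)]; f is q-subsampling stable
    at `data` when this probability is >= 3/4."""
    rng = rng or np.random.default_rng()
    reference = f(data)
    hits = sum(
        f([rec for rec in data if rng.random() < q]) == reference
        for _ in range(trials)
    )
    return hits / trials
```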

  29. A Private Aggregator using Subsampling Stability
  • D̂₁, …, D̂ₘ: sample each entry of D into each block independently w.p. q
  • Each entry of D appears in qm data blocks in expectation
  • The per-block outputs f(D̂ᵢ) vote, giving a voting histogram whose winner (in expectation) is f(D)

  30. PTR + Report-Noisy-Max Aggregator
  • D̂₁, …, D̂ₘ: sample each entry of D into each block independently w.p. q
  • Each entry of D appears in O(qm) data blocks w.h.p.
  • Let d̂ = (gap between the two highest vote counts, rescaled by the block overlap) + Lap(1/ε)
  • If d̂ > log(1/δ)/ε, then return the winning candidate, else return ⊥

  31. A Private Aggregator using Subsampling Stability
  Theorem: the above algorithm is (ε, δ)-differentially private
  Notice: the utility guarantee does not depend on the number of candidate models
  Theorem: if f is q-subsampling stable for a suitable q = q(ε, δ), then with high probability the true answer f(D) is output
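An end-to-end sketch of the subsampling-based selector: subsample, vote, rescale the gap, then apply the PTR test. The rescaling constant and the `select_model` callback are illustrative assumptions; the paper pins down the exact parameters:

```python
import numpy as np
from collections import Counter

def private_model_selection(data, select_model, q, num_blocks,
                            epsilon, delta, rng=None):
    """Subsampling-stability based private model selection (sketch).

    select_model(block) must return a hashable model identifier
    (e.g. a support set encoded as a sorted tuple)."""
    rng = rng or np.random.default_rng()
    votes = []
    for _ in range(num_blocks):
        block = [rec for rec in data if rng.random() < q]  # inclusion w.p. q
        votes.append(select_model(block))
    ranked = Counter(votes).most_common()
    gap = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
    # One record lands in roughly q * num_blocks blocks, so rescale the
    # gap by that overlap before the PTR test.
    noisy_gap = gap / max(1.0, q * num_blocks) + rng.laplace(scale=1.0 / epsilon)
    if noisy_gap > np.log(1.0 / delta) / epsilon:
        return ranked[0][0]
    return None
```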

  32. This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate

  33. Sparse linear regression in high-dimensions and the LASSO

  34. Sparse Linear Regression in High Dimensions (p ≫ n)
  • Data set: {(x₁, y₁), …, (xₙ, yₙ)}, where each feature vector xᵢ ∈ ℝᵖ and each response yᵢ ∈ ℝ
  • Assumption: data generated by a noisy linear system yᵢ = ⟨xᵢ, θ*⟩ + wᵢ, with parameter vector θ* and field noise wᵢ
  • Data normalization: the feature vectors are bounded
  • The noise wᵢ is sub-Gaussian

  35. Sparse Linear Regression in High Dimensions (p ≫ n)
  • In matrix form: y = Xθ* + w, with response vector y ∈ ℝⁿ, design matrix X ∈ ℝⁿˣᵖ, parameter vector θ* ∈ ℝᵖ, and field noise w

  36. Sparse Linear Regression in High Dimensions (p ≫ n)
  • Model: y = Xθ* + w (response vector, design matrix, field noise)
  • Sparsity: θ* has s ≪ p non-zero entries
  • Bounded norm: ‖θ*‖ = O(nᶜ) for an arbitrarily small constant c
  • Model selection problem: find the non-zero coordinates of θ*

  37. Sparse Linear Regression in High Dimensions (p ≫ n)
  • Model: y = Xθ* + w
  • Model selection: the non-zero coordinates (the support) of θ*
  • Solution: the LASSO estimator θ̂ ∈ argmin_θ (1/(2n))‖y − Xθ‖₂² + λ‖θ‖₁ [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, …]
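An illustrative (non-private) support-recovery run with scikit-learn's LASSO. The problem sizes, noise level, and λ here are made-up values, not from the talk; sklearn's `Lasso` objective matches the slide's, with `alpha` playing the role of λ:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5                 # n samples, p features, s-sparse truth
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0                   # true support = first s coordinates
y = X @ theta_star + 0.1 * rng.standard_normal(n)

# Lasso minimizes (1/(2n))||y - X theta||^2 + alpha * ||theta||_1
theta_hat = Lasso(alpha=0.1).fit(X, y).coef_
support = np.flatnonzero(theta_hat)    # recovered non-zero coordinates
```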

  38. Consistency of the LASSO Estimator
  Consistency conditions* [Wainwright06, ZY07], where S = support of the underlying parameter vector θ*:
  • Incoherence: the columns of X outside S are only weakly correlated with the columns in S
  • Restricted strong convexity: the loss is strongly convex when restricted to directions supported on S

  39. Consistency of the LASSO Estimator
  Consistency conditions* [Wainwright06, ZY07]: support S of θ*, incoherence, restricted strong convexity (as above)
  Theorem*: under a proper choice of λ and n, the support of the LASSO estimator θ̂ equals the support of θ*

  40. Stochastic Consistency of the LASSO
  Consistency conditions* [Wainwright06, ZY07]: support S of θ*, incoherence, restricted strong convexity (as above)
  Theorem [Wainwright06, ZY07]: if each data entry is drawn i.i.d. from a suitably well-behaved (e.g., sub-Gaussian) distribution, then the assumptions above are satisfied w.h.p.

  41. We show [Smith, T.’13]: the consistency conditions imply perturbation stability, via proxy conditions that are efficiently testable with privacy

  42. This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate

  43. Interlude: A simple subsampling-based private LASSO algorithm [Smith, T.’13]

  44. Notion of Neighboring Data Sets
  • Data set D = (y, X): response vector y and design matrix X, one row per individual

  45. Notion of Neighboring Data Sets
  • D and D′ are neighboring data sets if they differ in exactly one row of (y, X), i.e., one individual's (response, feature-vector) pair

  46. Recap: Subsampling Stability
  • Data set D; D̂ is a random subsample of D that includes each entry independently w.p. q
  • Stability: f is q-subsampling stable at D if f(D̂) = f(D) w.p. ≥ 3/4 over the choice of D̂

  47. Recap: PTR + Report-Noisy-Max Aggregator
  • Assumption: all candidate models come from a discrete set
  • Each data block casts one vote for a candidate model; the aggregator examines the vote counts

  48. Recap: PTR + Report-Noisy-Max Aggregator
  • D̂₁, …, D̂ₘ: sample each entry of D into each block independently w.p. q
  • Each entry of D appears in O(qm) data blocks w.h.p.
  • Fix the subsampling rate q and the number of blocks m as in the analysis; let d̂ = (rescaled gap between the two highest vote counts) + Lap(1/ε)
  • If d̂ > log(1/δ)/ε, then return the winning model, else return ⊥

  49. Subsampling Stability of the LASSO
  • Stochastic assumptions: each data entry is drawn i.i.d. from a well-behaved distribution; the noise w is sub-Gaussian
  • Model: y = Xθ* + w (response vector, design matrix, parameter vector, field noise)

  50. Subsampling Stability of the LASSO
  • Stochastic assumptions: each data entry is drawn i.i.d. from a well-behaved distribution; the noise w is sub-Gaussian
  Theorem [Wainwright06, ZY07]: under a proper choice of λ and n, the support of the LASSO estimator θ̂ equals the support of θ*
  Theorem: under a proper choice of λ, n, and the subsampling rate q, the output of the aggregator equals the support of θ*
  Notice the gap between the two results: the private guarantee requires a larger scale of n than the non-private one
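Hypothetical glue code showing how the LASSO plugs into the subsampling-based selector sketched after slide 31 (`private_model_selection`); encoding the support as a sorted tuple makes it a discrete candidate model, and all parameter values below are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_support(block, lam=0.1):
    """Per-block LASSO, returning the support as a hashable tuple."""
    X = np.stack([x for x, _ in block])
    y = np.array([y_i for _, y_i in block])
    coef = Lasso(alpha=lam).fit(X, y).coef_
    return tuple(np.flatnonzero(coef))

# dataset: a list of (feature_vector, response) pairs
# support = private_model_selection(dataset, lasso_support, q=0.1,
#                                   num_blocks=200, epsilon=0.5, delta=1e-6)
```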
