From Stability to Differential Privacy Abhradeep Guha Thakurta Yahoo! Labs, Sunnyvale
Thesis: Stable algorithms yield differentially private algorithms
Privacy in Machine Learning Systems
• Individuals contribute their data to a trusted learning algorithm
• The algorithm releases summary statistics to users: classifiers, clusters, regression coefficients
• An attacker may also observe everything the users see
Privacy in Machine Learning Systems
• Two conflicting goals:
  • Utility: release accurate information
  • Privacy: protect privacy of individual entries
• Balancing the tradeoff is a difficult problem:
  • Netflix prize database attack [NS08]
  • Facebook advertisement system attack [Korolova11]
  • Amazon recommendation system attack [CKNFS11]
• Data privacy is an active area of research: computer science, economics, statistics, biology, social sciences, …
Differential Privacy [DMNS06, DKMMN06]
• Intuition: the adversary learns essentially the same thing irrespective of your presence or absence in the data set
• Data sets $D$ and $D'$ differing in one entry are called neighboring data sets
• Require: neighboring data sets induce close distributions on the outputs $M(D)$ and $M(D')$ (over the algorithm's random coins)
Differential Privacy [DMNS06, DKMMN06]
• Definition: a randomized algorithm $M$ is $(\epsilon, \delta)$-differentially private if for all data sets $D$ and $D'$ that differ in one element, and for all sets of answers $S$:
$$\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S] + \delta$$
Semantics of Differential Privacy
• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typically, think of the privacy parameters as $\epsilon$ a small constant and $\delta$ negligible in $n$, where $n$ = # of data samples
• Composition: $\epsilon$'s and $\delta$'s add up over multiple executions
Laplace Mechanism [DMNS06]
• Data set $D$, and let $f$ be a function mapping data sets to $\mathbb{R}^k$
• Sensitivity: $S(f) = \max_{\text{neighbors } D, D'} \|f(D) - f(D')\|_1$
• Random vector $Z$ with coordinates sampled i.i.d. from $\text{Lap}(S(f)/\epsilon)$
• Output $f(D) + Z$
Theorem (Privacy): the algorithm is $\epsilon$-differentially private
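As a concrete illustration (not from the talk): a minimal Python sketch of the Laplace mechanism applied to a counting query, which has sensitivity 1. All names and data below are invented for the example.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value + Lap(sensitivity/epsilon); epsilon-differentially private."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many records exceed 10?") has sensitivity 1:
# adding or removing one individual changes the count by at most 1.
data = np.array([3, 12, 7, 25, 14, 9])
true_count = int(np.sum(data > 10))
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(f"true count = {true_count}, private release = {noisy_count:.2f}")
```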
This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate
Perturbation Stability
• A function $f$ maps a data set $D$ to an output $f(D)$
• Stability of $f$ at $D$: the output $f(D)$ does not change on changing any one entry of $D$
• Equivalently, the local sensitivity of $f$ at $D$ is zero
Distance to Instability Property
• Definition: a function $f$ is $k$-stable at a data set $D$ if for any data set $D'$ with $d(D, D') \le k$, we have $f(D') = f(D)$
• Distance to instability: the largest $k$ such that $f$ is $k$-stable at $D$
• Objective: output $f(D)$ while preserving differential privacy
[Figure: the space of all data sets, split into stable and unstable data sets, with the distance from $D$ to the unstable region]
Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith, T. '13]
A Meta-algorithm: Propose-Test-Release (PTR)
Basic tool: Laplace mechanism
• $\widehat{\text{dist}} \leftarrow \text{dist}(D) + \text{Lap}(1/\epsilon)$, where $\text{dist}(D)$ is the distance to instability of $f$ at $D$
• If $\widehat{\text{dist}} > \frac{\log(1/\delta)}{\epsilon}$, then return $f(D)$, else return $\bot$
Theorem: the algorithm is $(\epsilon, \delta)$-differentially private
Theorem: if $f$ is $\frac{2\log(1/\delta)}{\epsilon}$-stable at $D$, then w.p. $\ge 1 - \delta$ the algorithm outputs $f(D)$
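A minimal Python sketch of PTR, assuming a caller-supplied `dist_to_instability` oracle (computing this distance efficiently is problem-specific); `None` stands in for the $\bot$ output.

```python
import numpy as np

def propose_test_release(f, dist_to_instability, data, epsilon, delta, rng=None):
    """Release f(data) only if a noisy estimate of the distance to
    instability clears the log(1/delta)/epsilon threshold.

    dist_to_instability(data) changes by at most 1 between neighboring
    data sets, so Lap(1/epsilon) noise makes the test itself private.
    """
    rng = rng or np.random.default_rng()
    noisy_dist = dist_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_dist > np.log(1.0 / delta) / epsilon:
        return f(data)   # far from instability: safe to release exactly
    return None          # possibly unstable: refuse to answer
```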
This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate
Sample and Aggregate Framework
• Partition the data set into blocks and run the learning algorithm on each subsample
• Combine the block outputs with an aggregator to produce the final output
Sample and Aggregate Framework
Theorem: if the aggregator is differentially private, then the overall framework is differentially private
Assumption: each entry appears in exactly one data block
Proof idea: each data entry affects only one data block, hence only one input to the aggregator
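A sketch of the framework's skeleton, assuming the data is a NumPy array of records; the learner and the private aggregator are supplied by the caller, and privacy of the whole pipeline is inherited from the aggregator, per the theorem above.

```python
import numpy as np

def sample_and_aggregate(data, learner, private_aggregator, num_blocks, rng=None):
    """Partition the data into disjoint blocks, run the (non-private)
    learner on each block, and combine the block outputs with a
    differentially private aggregator.  Each entry lands in exactly one
    block, so it influences exactly one of the aggregator's inputs."""
    rng = rng or np.random.default_rng()
    shuffled = rng.permutation(len(data))
    blocks = np.array_split(shuffled, num_blocks)
    block_outputs = [learner(data[idx]) for idx in blocks]
    return private_aggregator(block_outputs)
```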
A differentially private aggregator using the PTR framework [Smith, T. '13]
A Differentially Private Aggregator
• Assumption: the outputs come from a discrete set of possible candidates
• Each data block votes for one candidate output; the aggregator works with the vote counts
[Figure: voting histogram over the candidate outputs]
PTR + Report-Noisy-Max Aggregator
• Function $f(D)$: the candidate output with the maximum number of votes
• $\widehat{\text{dist}} \leftarrow \text{dist}(D) + \text{Lap}(1/\epsilon)$; if $\widehat{\text{dist}} > \frac{\log(1/\delta)}{\epsilon}$, then return $f(D)$, else return $\bot$
• Observation: $\text{dist}(D)$ is (up to a factor of two) the gap between the counts of the highest and the second highest scoring model
• Observation: the algorithm is always computationally efficient
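A sketch of this aggregator in Python, assuming the block outputs are hashable candidate models. Changing one block moves one vote, so half the gap between the top two counts changes by at most 1 and lower-bounds the distance to instability of the arg-max.

```python
import numpy as np
from collections import Counter

def ptr_noisy_max_aggregator(block_outputs, epsilon, delta, rng=None):
    """Vote over discrete candidates; release the winner only if the
    (noisy) half-gap between the top two vote counts is large enough."""
    rng = rng or np.random.default_rng()
    tally = Counter(block_outputs).most_common()
    winner, top = tally[0]
    runner_up = tally[1][1] if len(tally) > 1 else 0
    dist = (top - runner_up) / 2.0        # sensitivity 1 w.r.t. one block
    if dist + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
        return winner
    return None
```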
Analysis of the aggregator under subsampling stability [Smith, T. '13]
Subsampling Stability
• $\widehat{D}$: a random subsample (with replacement) of the data set $D$, picking each entry w.p. $q$
• Stability: $f$ is $q$-subsampling stable at $D$ if $f(\widehat{D}) = f(D)$ w.p. $\ge 3/4$ over the subsampling
A Private Aggregator using Subsampling Stability
• $\widehat{D}_1, \ldots, \widehat{D}_m$: sample each entry from $D$ w.p. $q$, independently for each block
• Each entry of $D$ appears in $\approx qm$ data blocks
[Figure: voting histogram over candidate models (in expectation); the stable output $f(D)$ dominates]
PTR + Report-Noisy-Max Aggregator
• $\widehat{D}_1, \ldots, \widehat{D}_m$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in $O(qm)$ data blocks w.h.p.
• $\widehat{\text{dist}} \leftarrow \text{dist} + \text{Lap}(1/\epsilon)$, where $\text{dist}$ is the gap between the two highest vote counts, rescaled by $\approx qm$ so that its sensitivity is one
• If $\widehat{\text{dist}} > \frac{\log(1/\delta)}{\epsilon}$, then return the highest-vote model, else return $\bot$
A Private Aggregator using Subsampling Stability
Theorem: the above algorithm is $(\epsilon, \delta)$-differentially private
Theorem: if $f$ is $q$-subsampling stable at $D$ for $q = O\!\left(\frac{\epsilon}{\log(1/\delta)}\right)$, then with probability at least $1 - \delta$, the true answer $f(D)$ is output
Notice: the utility guarantee does not depend on the number of candidate models
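A sketch of the subsampling-based version; the rescaling constant 4 and the choice of `q` and `num_blocks` are illustrative, not the tuned settings from the analysis (which pins $q$ down as a function of $\epsilon$ and $\log(1/\delta)$).

```python
import numpy as np
from collections import Counter

def subsampling_stable_release(data, f, epsilon, delta, q, num_blocks, rng=None):
    """Run f on num_blocks random subsamples (each entry kept w.p. q),
    vote over the resulting models, and apply the PTR-style gap test.
    Each entry appears in ~ q*num_blocks blocks, so the gap is rescaled
    by that factor to get a statistic of sensitivity roughly one."""
    rng = rng or np.random.default_rng()
    outputs = []
    for _ in range(num_blocks):
        mask = rng.random(len(data)) < q      # Bernoulli(q) subsample
        outputs.append(f(data[mask]))
    tally = Counter(outputs).most_common()
    winner, top = tally[0]
    runner_up = tally[1][1] if len(tally) > 1 else 0
    dist = (top - runner_up) / (4.0 * q * num_blocks)   # illustrative constant
    if dist + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
        return winner
    return None
```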
This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate
Sparse Linear Regression in High-dimensions ($p \gg n$)
• Data set: $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^p$ is a feature vector and $y_i \in \mathbb{R}$
• Assumption: data generated by a noisy linear system $y_i = \langle x_i, \theta^* \rangle + w_i$, with parameter vector $\theta^*$ and field noise $w_i$
• Data normalization: the columns of the design matrix are normalized; the noise $w_i$ is sub-Gaussian

Sparse Linear Regression in High-dimensions ($p \gg n$)
• In matrix notation: $y = X\theta^* + w$, with response vector $y \in \mathbb{R}^n$, design matrix $X \in \mathbb{R}^{n \times p}$, field noise $w$, and parameter vector $\theta^*$
Sparse Linear Regression in High-dimensions ($p \gg n$)
$y = X\theta^* + w$ (response vector, design matrix, field noise)
• Sparsity: $\theta^*$ has $s \ll p$ non-zero entries
• Bounded norm: the norm of $\theta^*$ is bounded, up to an arbitrarily small constant
• Model selection problem: find the non-zero coordinates of $\theta^*$
Sparse Linear Regression in High-dimensions ($p \gg n$)
$y = X\theta^* + w$ (response vector, design matrix, field noise)
• Model selection: find the non-zero coordinates (i.e., the support) of $\theta^*$
• Solution: the LASSO estimator [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, …]
$$\widehat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \; \frac{1}{2n}\|y - X\theta\|_2^2 + \Lambda\|\theta\|_1$$
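As a quick non-private illustration, scikit-learn's `Lasso` (whose objective matches the estimator above, with its `alpha` in the role of $\Lambda$) recovers the support on synthetic data; every parameter below is arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5                        # high-dimensional: p >> n
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0                          # true support: coordinates 0..s-1
y = X @ theta_star + 0.1 * rng.standard_normal(n)

model = Lasso(alpha=0.1).fit(X, y)            # alpha plays the role of Lambda
support = np.flatnonzero(model.coef_)
print("recovered support:", support)          # ideally [0 1 2 3 4]
```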
Consistency of the LASSO Estimator
Consistency conditions* [Wainwright06, ZY07], with $S$ = support of the underlying parameter vector $\theta^*$:
• Incoherence: the columns of the design matrix off the support are nearly orthogonal to the columns on the support
• Restricted Strong Convexity: the loss is strongly convex when restricted to the support coordinates
Theorem*: under proper choice of $n$ and $\Lambda$, the support of the LASSO estimator $\widehat{\theta}$ equals the support of $\theta^*$
Stochastic Consistency of the LASSO
Theorem [Wainwright06, ZY07]: if each data entry $x_i$ is drawn i.i.d. from a suitable (e.g., sub-Gaussian) distribution, then the incoherence and restricted strong convexity conditions above are satisfied w.h.p.
We show [Smith, T. '13]: consistency conditions imply perturbation stability of the LASSO support, via proxy conditions that are efficiently testable with privacy
This Talk • Differential privacy via stability arguments: A meta-algorithm • Sample and aggregate framework and private model selection • Non-private sparse linear regression in high-dimensions • Private sparse linear regression with (nearly) optimal rate
Interlude: A simple subsampling-based private LASSO algorithm [Smith, T. '13]
Notion of Neighboring Data Sets
• Data set $D = (y, X)$: response vector $y$ together with design matrix $X$, one row $(x_i, y_i)$ per individual
• $D$ and $D'$ are neighboring data sets if they differ in one row
Recap: Subsampling Stability
• $\widehat{D}$: a random subsample of $D$, picking each entry w.p. $q$
• Stability: $f$ is $q$-subsampling stable at $D$ if $f(\widehat{D}) = f(D)$ w.p. $\ge 3/4$
Recap: PTR + Report-Noisy-Max Aggregator
• Assumption: all candidate models come from a discrete set (here: the possible supports)
• Each data block votes for a model; the aggregator works with the vote counts
Recap: PTR + Report-Noisy-Max Aggregator
• $\widehat{D}_1, \ldots, \widehat{D}_m$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in $O(qm)$ data blocks w.h.p.
• Fix $q$ and $m$ as functions of $\epsilon$ and $\delta$
• If $\widehat{\text{dist}} > \frac{\log(1/\delta)}{\epsilon}$, then return the highest-vote model, else return $\bot$
Subsampling Stability of the LASSO
• Stochastic assumptions: each data entry $x_i$ drawn i.i.d. from a suitable sub-Gaussian distribution; the noise $w_i$ is sub-Gaussian
• Model: $y = X\theta^* + w$ (response vector, design matrix, field noise, parameter vector)
Subsampling Stability of the LASSO
Stochastic assumptions as above.
Theorem [Wainwright06, ZY07]: under proper choice of $n$ and $\Lambda$, the support of the LASSO estimator $\widehat{\theta}$ equals the support of $\theta^*$
Theorem: under proper choice of $n$, $q$, and $\Lambda$, the output of the private aggregator equals the support of $\theta^*$
Notice the gap between the two guarantees: the private algorithm needs a larger sample size $n$, by a factor on the scale of $1/q$
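Putting the pieces together: a sketch of the subsampling-based private LASSO model selection, reusing `subsampling_stable_release` from the earlier sketch. Support tuples serve as the discrete candidate models, and every parameter below is illustrative rather than the tuned setting from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_support(block, lam=0.1):
    """Non-private base learner: the LASSO support as a hashable tuple."""
    X, y = block[:, :-1], block[:, -1]
    coef = Lasso(alpha=lam).fit(X, y).coef_
    return tuple(np.flatnonzero(coef))

rng = np.random.default_rng(1)
n, p, s = 10_000, 100, 5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)
data = np.column_stack([X, y])               # one row per individual: (x_i, y_i)

support = subsampling_stable_release(
    data, lasso_support, epsilon=1.0, delta=1e-6,
    q=0.01, num_blocks=500, rng=rng)         # each block has ~ n*q = 100 rows
print("privately released support:", support)
```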