Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies
Ashwin Machanavajjhala (ashwin @ cs.duke.edu)
Collaborators: Daniel Kifer (PSU), Bolin Ding (MSR), Xi He (Duke)
Summer @ Census, 8/15/2013
Overview of the talk • There is an inherent trade-off between the privacy (confidentiality) of individuals and the utility of statistical analyses over data collected from individuals. • Differential privacy has revolutionized how we reason about privacy • Nice tuning knob ε for trading off privacy and utility
Overview of the talk • However, differential privacy only captures a small part of the privacy-utility trade-off space • No Free Lunch Theorem • Differentially private mechanisms may not ensure sufficient utility • Differentially private mechanisms may not ensure sufficient privacy
Overview of the talk • I will present a new privacy framework that allows data publishers to more effectively trade off privacy for utility • Better control over what to keep secret and who the adversaries are • Can ensure more utility than differential privacy in many cases • Can ensure privacy where differential privacy fails
Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]
Data Privacy Problem [Figure: individuals 1, 2, 3, …, N each contribute a record r1, r2, r3, …, rN to the server's database DB] • Utility: the server supports statistical analyses over the collected records • Privacy: no breach about any individual
Data Privacy in the real world
Many definitions & several attacks • Attacks: linkage attack, background knowledge attack, minimality / reconstruction attack, de Finetti attack, composition attack • Definitions: K-Anonymity [Sweeney et al., IJUFKS ’02], L-diversity [Machanavajjhala et al., TKDD ’07], T-closeness [Li et al., ICDE ’07], E-Privacy [Machanavajjhala et al., VLDB ’09], Differential Privacy [Dwork et al., ICALP ’06]
Differential Privacy • For every pair of inputs D1, D2 that differ in one value, and for every output O: log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) < ε (ε > 0) • The adversary should not be able to distinguish between any D1 and D2 based on any O
Algorithms • No deterministic algorithm guarantees differential privacy. • Random sampling does not guarantee differential privacy. • Randomized response satisfies differential privacy.
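To make the last bullet concrete, here is a minimal sketch (mine, not from the deck) of binary randomized response; for either possible input, the likelihood ratio of any output is exactly e^ε, which is what ε-differential privacy requires:

```python
import math
import random

def randomized_response(true_bit, eps):
    # Report the truth with probability e^eps / (1 + e^eps), else flip the bit.
    # Pr[output = b | input = b] / Pr[output = b | input = 1 - b] = e^eps,
    # so the mechanism satisfies eps-differential privacy.
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return true_bit if random.random() < p_truth else 1 - true_bit
```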
Laplace Mechanism • Database D, query q, true answer q(D); the researcher receives q(D) + η • Noise η drawn from the Laplace distribution: h(η) ∝ exp( −|η| / λ ), with mean 0 and variance 2λ² • Privacy depends on the λ parameter
Laplace Mechanism [Dwork et al., TCC 2006] Thm: If the sensitivity of the query is S(q), then the Laplace mechanism with λ = S(q)/ε guarantees ε-differential privacy. Sensitivity: the smallest number S(q) s.t. for any D, D’ differing in one entry, || q(D) – q(D’) ||1 ≤ S(q)
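A minimal sketch of the mechanism (the function name is mine):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, eps):
    # Release q(D) + eta with eta ~ Laplace(scale = S(q)/eps).
    # By the theorem above this guarantees eps-differential privacy.
    scale = sensitivity / eps
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# Example: a count query has sensitivity 1 (one entry changes the count by at most 1).
noisy_count = laplace_mechanism(true_answer=8, sensitivity=1.0, eps=0.5)
```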
Contingency tables • Each tuple takes one of k = 4 different values [Figure: database D and the count query Count(·, ·) for each combination of values]
Laplace Mechanism for Contingency Tables • Sensitivity = 2: changing one tuple decreases one cell count and increases another, so two cells each change by 1 • Each true count of 8 in D is released as 8 + Lap(2/ε), with mean 8 and variance 8/ε²
Composition Property If algorithms A1, A2, …, Ak use independent randomness and each Ai satisfies εi-differential privacy, respectively, then outputting all the answers together satisfies differential privacy with ε = ε1 + ε2 + … + εk (the privacy budget)
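A minimal sketch (my own; the helper name and the even budget split are assumptions) of how the composition property is used as a privacy budget in practice:

```python
import numpy as np

def answer_queries(true_answers, sensitivities, total_eps):
    # Split a total privacy budget evenly across the k queries; by the
    # composition property the combined release satisfies total_eps-DP.
    k = len(true_answers)
    eps_i = total_eps / k  # each query gets an equal share of the budget
    return [a + np.random.laplace(scale=s / eps_i)
            for a, s in zip(true_answers, sensitivities)]
```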
Differential Privacy • Privacy definition that is independent of the attacker’s prior knowledge. • Tolerates many attacks that other definitions are susceptible to. • Avoids composition attacks • Claimed to be tolerant against adversaries with arbitrary background knowledge. • Allows simple, efficient and useful privacy mechanisms • Used in LEHD’s OnTheMap [M et al ICDE ‘08]
Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]
Differential Privacy & Utility • Differentially private mechanisms may not ensure sufficient utility for many applications. • Sparse Data: Integrated Mean Square Error due to the Laplace mechanism can be worse than returning a random contingency table for typical values of ε (around 1) • Social Networks [M et al PVLDB 2011]
Differential Privacy & Privacy • Differentially private algorithms may not limit the ability of an adversary to learn sensitive information about individuals when records in the data are correlated • Correlations across individuals occur in many ways: • Social Networks • Data with pre-released constraints • Functional Dependencies
Laplace Mechanism and Correlations • Auxiliary marginals of D are published for the following reasons: • Legal: 2002 Supreme Court case Utah v. Evans • Contractual: Advertisers must know exact demographics at coarse granularities • Does the Laplace mechanism still guarantee privacy?
Laplace Mechanism and Correlations [Figure: table D with small cells released as 2 + Lap(2/ε); combining the exact marginals with the noisy counts, the adversary derives several independent estimates of the same large cell: Count(·, ·) = 8 + Lap(2/ε), 8 – Lap(2/ε), 8 – Lap(2/ε), 8 + Lap(2/ε)]
Laplace Mechanism and Correlations • Averaging the k independent noisy estimates of the same cell gives mean 8 and variance 8/(kε²), so the adversary can reconstruct the table of D with high precision for large k
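A small simulation (mine, not from the deck) of the attack this slide describes: with the marginals fixed, averaging k independent Lap(2/ε)-noised copies of the same count shrinks the noise variance by a factor of k:

```python
import numpy as np

eps, true_count, k = 1.0, 8, 100
# k independent noisy estimates of the same cell, each 8 + Lap(2/eps),
# made available to the adversary by the exact published marginals.
estimates = true_count + np.random.laplace(scale=2 / eps, size=k)
# The average is close to 8: variance drops from 8/eps^2 to 8/(k * eps^2).
print(estimates.mean())
```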
No Free Lunch Theorem It is not possible to guarantee any utility in addition to privacy, without making assumptions about • the data generating distribution • the background knowledge available to an adversary [Kifer-M SIGMOD ‘11] [Dwork-Naor JPC ‘10]
To sum up … • Differential privacy only captures a small part of the privacy-utility trade-off space • No Free Lunch Theorem • Differentially private mechanisms may not ensure sufficient privacy • Differentially private mechanisms may not ensure sufficient utility
Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]
Pufferfish Framework
Pufferfish Semantics • What is being kept secret? • Who are the adversaries? • How is information disclosure bounded? • (similar to epsilon in differential privacy)
Sensitive Information • Secrets: S is a set of potentially sensitive statements • “individual j’s record is in the data, and j has Cancer” • “individual j’s record is not in the data” • Discriminative Pairs Spairs: mutually exclusive pairs of secrets • (“Bob is in the table”, “Bob is not in the table”) • (“Bob has cancer”, “Bob has diabetes”)
Adversaries • We assume a Bayesian adversary who can be completely characterized by his/her prior information about the data • We do not assume computational limits • Data Evolution Scenarios: the set of all probability distributions that could have generated the data (… think adversary’s prior) • No assumptions: All probability distributions over data instances are possible. • I.I.D.: Set of all f such that: P(data = {r1, r2, …, rk}) = f(r1) x f(r2) x … x f(rk)
Information Disclosure • Mechanism M satisfies ε-Pufferfish(S, Spairs, D) if, for every output O, every discriminative pair (s, s’) in Spairs, and every distribution θ in D under which both s and s’ have nonzero probability: Pr[ M(data) = O | s, θ ] ≤ e^ε Pr[ M(data) = O | s’, θ ]
Pufferfish Semantic Guarantee e^-ε ≤ (Posterior odds of s vs s’) / (Prior odds of s vs s’) ≤ e^ε
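As a quick numeric check (my own sketch, not from the deck), the guarantee caps how far any output can move the adversary's belief in a secret s relative to its alternative s'; here I assume the pair satisfies P(s') = 1 - P(s):

```python
import math

def max_posterior(prior, eps):
    # Posterior odds of s vs s' are at most e^eps times the prior odds,
    # so the posterior belief in s is capped as follows.
    odds = math.exp(eps) * prior / (1 - prior)
    return odds / (1 + odds)

print(max_posterior(prior=0.5, eps=0.1))  # ~0.525: beliefs barely move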
Pufferfish & Differential Privacy • Spairs: pairs (s_i,x , s_i,y), where s_i,x = “record i takes the value x” • Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.
Pufferfish & Differential Privacy • Data evolution: for all θ = [ f1, f2, f3, …, fk ], P(data = {r1, r2, …, rk} | θ) = f1(r1) x f2(r2) x … x fk(rk) • Adversary’s prior may be any distribution that makes records independent
Pufferfish & Differential Privacy • Spairs: pairs (s_i,x , s_i,y), where s_i,x = “record i takes the value x” • Data evolution: all θ = [ f1, f2, f3, …, fk ] that make records independent • A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using this Spairs and this set of θ’s
Summary of Pufferfish • A semantic approach to defining privacy • Enumerates the information that is secret and the set of adversaries. • Bounds the odds ratio of pairs of mutually exclusive secrets • Helps understand assumptions under which privacy is guaranteed • Differential privacy is one specific choice of secret pairs and adversaries • How should a data publisher use this framework? • Algorithms?
Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]
Blowfish Privacy • A special class of Pufferfish instantiations (Both pufferfish and blowfish are marine fish of the Tetraodontidae family)
Blowfish Privacy • A special class of Pufferfish instantiations • Extends differential privacy using policies • Specification of sensitive information • Allows more utility • Specification of publicly known constraints in the data • Ensures privacy in correlated data • Satisfies the composition property
Sensitive Information • Secrets: S is a set of potentially sensitive statements • “individual j’s record is in the data, and j has Cancer” • “individual j’s record is not in the data” • Discriminative Pairs Spairs: mutually exclusive pairs of secrets • (“Bob is in the table”, “Bob is not in the table”) • (“Bob has cancer”, “Bob has diabetes”)
Sensitive information in Differential Privacy • Spairs: pairs (s_i,x , s_i,y), where s_i,x = “record i takes the value x” • Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.
Other notions of Sensitive Information • Medical Data • OK to infer whether an individual is healthy or not • E.g., (“Bob is healthy”, “Bob has diabetes”) is not a discriminative pair of secrets for any individual • Partitioned Sensitive Information: the domain is partitioned into groups, and only values within the same group form discriminative pairs
Other notions of Sensitive Information • Geospatial Data • Do not want the attacker to distinguish between “close-by” points in the space • May distinguish between “far-away” points • Distance-based Sensitive Information
Generalization as a graph • Consider a graph G = (V, E), where V is the set of values that an individual’s record can take. • E encodes the set of discriminative pairs • Same for all records.
Blowfish Privacy + “Policy of Secrets” • A mechanism M satisfies Blowfish privacy w.r.t. policy G if, for every set S of outputs of the mechanism and every pair of datasets D1, D2 that differ in one record, with values x and y s.t. (x, y) ∈ E: Pr[ M(D1) ∈ S ] ≤ e^ε Pr[ M(D2) ∈ S ] • For any x and y in the domain: Pr[ M(D1) ∈ S ] ≤ e^(ε · d_G(x,y)) Pr[ M(D2) ∈ S ], where d_G(x,y) is the shortest distance between x and y in G • The adversary is allowed to distinguish between x and y that appear in different disconnected components of G
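A minimal sketch (mine; the salary-range policy graph is a hypothetical example) of computing the shortest-path distance d_G that scales the Blowfish guarantee:

```python
from collections import deque

# Hypothetical line-graph policy over ordered salary ranges: the adversary
# must not distinguish adjacent ranges; far-apart ranges are less protected.
policy_edges = {"0-20k": ["20-40k"], "20-40k": ["0-20k", "40-60k"],
                "40-60k": ["20-40k", "60-80k"], "60-80k": ["40-60k"]}

def d_G(graph, x, y):
    # BFS shortest-path distance in the policy graph G: the Blowfish
    # guarantee between values x and y degrades as e^(eps * d_G(x, y)).
    frontier, seen = deque([(x, 0)]), {x}
    while frontier:
        node, dist = frontier.popleft()
        if node == y:
            return dist
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return float("inf")  # disconnected: distinguishing x and y is allowed

print(d_G(policy_edges, "0-20k", "60-80k"))  # 3
```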
Algorithms for Blowfish • Consider an ordered 1-D attribute with Dom = {x1, x2, x3, …, xd} • E.g., ranges of Age, Salary, etc. • Suppose our policy is: the adversary should not distinguish whether an individual’s value is xj or xj+1 [Figure: line graph x1 — x2 — x3 — … — xd]
Algorithms for Blowfish • Suppose we want to release the histogram privately • Number of individuals in each age range • Any differentially private algorithm also satisfies Blowfish • Can use the Laplace mechanism (with sensitivity 2: moving one individual between adjacent ranges changes two counts by 1 each) [Figure: histogram counts C(x1), …, C(xd) over bins x1, x2, x3, …, xd]
Ordered Mechanism • We can answer a different set of queries, the cumulative counts Si = C(x1) + … + C(xi), to get a different private estimator for the histogram [Figure: cumulative counts S1, S2, S3, …, Sd stacked over the histogram bins x1, x2, x3, …, xd]
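A sketch of my reading of this slide (the construction below is an assumption, not verbatim from the talk): under the adjacent-value policy, moving one record between xj and xj+1 changes only the single cumulative count Sj (by 1), so the sensitivity of the cumulative-count vector is 1 rather than 2, and differencing the noisy Si recovers a histogram estimate:

```python
import numpy as np

def ordered_mechanism(hist, eps):
    # Release noisy cumulative counts S_i = C(x_1) + ... + C(x_i).
    # Moving one record between adjacent values changes only one S_i by 1,
    # so Lap(1/eps) noise per cumulative count suffices under this policy.
    S = np.cumsum(hist).astype(float)
    noisy_S = S + np.random.laplace(scale=1.0 / eps, size=len(S))
    return np.diff(noisy_S, prepend=0.0)  # noisy histogram estimate

print(ordered_mechanism([3, 5, 2, 8], eps=1.0))
```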