
Foundations of Privacy Lecture 7

Foundations of Privacy, Lecture 7. Lecturer: Moni Naor. (ε, δ)-Differential Privacy: sanitizer M gives (ε, δ)-differential privacy if, for all adjacent D1 and D2 and all A ⊆ range(M): Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A] + δ.


Presentation Transcript


  1. Foundations of Privacy, Lecture 7. Lecturer: Moni Naor

  2. (ε, δ)-Differential Privacy
     Sanitizer M gives (ε, δ)-differential privacy if, for all adjacent D1 and D2 and all A ⊆ range(M):
         Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A] + δ
     [Figure: Pr[response] plotted over the responses, with the bad responses marked; the ratio is bounded.]
     This course: δ negligible. Typical setting: ε … and δ negligible.

  3. Example: NO Differential Privacy
     U: set of (name, tag ∈ {0,1}) tuples.
     One counting query: # of participants with tag = 1.
     Sanitizer A: choose and release a few random tags.
     Bad event T: only my tag is 1, and my tag is released.
         Pr_A[A(D+I) ∈ T] ≥ 1/n
         Pr_A[A(D-I) ∈ T] = 0
     ε-differential privacy would require
         e^{-ε} ≤ Pr_A[A(D+I) ∈ T] / Pr_A[A(D-I) ∈ T] ≤ e^ε ≈ 1+ε
     • Not ε-differentially private for any ε!
     • It is (0, 1/n)-differentially private.

  4. Counting Queries
     Database x of size n: n individuals, each contributing a single point in U.
     Counting queries: Q is a set of predicates q: U → {0,1}.
     Query: how many participants in x satisfy q? (Sometimes we talk about the fraction instead.)
     Relaxed accuracy: answer the query within α additive error w.h.p.
     Not so bad: some error is inherent in statistical analysis anyway.
     [Figure: query q drawn as a subset of the universe U.]
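A counting query can be sketched in a few lines. This is only an illustration: the `(name, tag)` records echo the earlier example, and the helper names `counting_query` and `within_alpha` are hypothetical, not taken from the lecture.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[str, int]  # hypothetical (name, tag) record; one point of U per individual

def counting_query(db: Sequence[Row], q: Callable[[Row], bool]) -> int:
    """Number of participants in db that satisfy the predicate q."""
    return sum(1 for row in db if q(row))

def within_alpha(answer: float, db: Sequence[Row], q: Callable[[Row], bool], alpha: float) -> bool:
    """Relaxed accuracy: is the reported answer within additive error alpha?"""
    return abs(answer - counting_query(db, q)) <= alpha

db = [("alice", 1), ("bob", 0), ("carol", 1), ("dave", 0)]
has_tag = lambda row: row[1] == 1

true_count = counting_query(db, has_tag)   # count of rows with tag = 1
fraction = true_count / len(db)            # the same query stated as a fraction
```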

  5. Bounds on Achievable Privacy
     Bounds on the:
     • Accuracy: the responses from the mechanism to all queries are assured to be within α, except with probability β
     • Number of queries t for which we can receive accurate answers
     • The privacy parameter ε for which ε-differential privacy is achievable
       • Or for which (ε, δ)-differential privacy is achievable

  6. Composition: t-Fold
     Suppose we are going to apply a DP mechanism t times.
     • Perhaps on different databases
     Want: the combined outcome is differentially private.
     • A value b ∈ {0,1} is chosen
     • In each of the t rounds:
       • adversary A picks two adjacent databases D_0^i and D_1^i and an ε-DP mechanism M_i
       • A receives the result z_i of the ε-DP mechanism M_i on D_b^i
     • Want to argue: A's view is within ε' for both values of b
     • A's view: (z_1, z_2, …, z_t) plus the randomness used.

  7. Adversary's view
     • A's view: randomness + (z_1, z_2, …, z_t)
     • Distribution with b: V_b
     [Figure: A sends the pairs (D_0^1, D_1^1), (D_0^2, D_1^2), …, (D_0^t, D_1^t) to the mechanisms M_1, M_2, …, M_t and receives z_i = M_i(D_b^i) from each.]

  8. Differential Privacy: Composition
     Last week:
     • If all mechanisms M_i are ε-DP, then for any view, the probabilities that A gets the view when b=0 and when b=1 are within a factor e^{εt}
     • t releases, each ε-DP, are t·ε-DP
     Today:
     • t releases, each ε-DP, are (√t·ε + t·ε², δ)-DP (roughly)
     Therefore results for a single query translate to results on several queries.
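To get a feel for the gap between the two bounds, here is a small numeric sketch. The slide states the improved bound only roughly. The code assumes the usual form of the advanced composition theorem from the literature, ε' = ε·√(2t·ln(1/δ)) + t·ε·(e^ε - 1), which may differ in constants from what the lecture proves. The function names are hypothetical.

```python
import math

def basic_composition(eps: float, t: int) -> float:
    """t releases, each eps-DP, are (t * eps)-DP."""
    return t * eps

def advanced_composition(eps: float, t: int, delta: float) -> float:
    """Assumed standard form: eps' = eps*sqrt(2 t ln(1/delta)) + t*eps*(e^eps - 1).

    For small eps the second term is about t*eps^2, matching the slide's
    rough (sqrt(t)*eps + t*eps^2, delta) statement.
    """
    return eps * math.sqrt(2 * t * math.log(1 / delta)) + t * eps * (math.exp(eps) - 1)

eps, t, delta = 0.01, 10_000, 1e-6
print("basic   :", basic_composition(eps, t))
print("advanced:", advanced_composition(eps, t, delta))
```

For many releases with a small per-release ε, the √t term dominates and the advanced bound comes out far below t·ε.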

  9. Privacy Loss as a Random Walk
     [Figure: a walk with steps +1 / -1 over the number of steps t (the potentially dangerous rounds); the privacy loss grows as √t.]
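The random-walk picture can be checked with a quick simulation. This is a sketch under a simplifying assumption: each round contributes an independent, unbiased step of +1 or -1 (in units of ε), which is the idealized walk in the figure rather than the actual privacy-loss distribution of any particular mechanism.

```python
import math
import random

def mean_abs_displacement(t: int, trials: int, seed: int = 0) -> float:
    """Average |position| after t independent +1/-1 steps, over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        position = sum(rng.choice((-1, 1)) for _ in range(t))
        total += abs(position)
    return total / trials

t = 400
estimate = mean_abs_displacement(t, trials=2000)
print("mean |position|:", estimate, " sqrt(t):", math.sqrt(t), " t:", t)
```

The average displacement stays on the order of √t, far below the worst case of t steps all in the same direction.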

  10. The Exponential Mechanism [McSherry, Talwar]
     A general mechanism that:
     • Yields differential privacy
     • May yield utility/approximation
     • Is defined and evaluated by considering all possible answers
     The definition does not yield an efficient way of evaluating it.
     Application/original motivation: approximate truthfulness of auctions
     • Collusion resistance
     • Compatibility

  11. Side bar: Digital Goods Auction
     • Some product with 0 cost of production
     • n individuals with valuations v_1, v_2, …, v_n
     • Auctioneer wants to maximize profit
     Key to truthfulness: what you say should not affect what you pay.
     • What about approximate truthfulness?

  12. Example of the Exponential Mechanism
     • Data: x_i = website visited by student i today
     • Range: Y = {website names}
     • For each name y, let q(y, X) = #{i : x_i = y} (the size of a subset)
     Goal: output the most frequently visited site.
     • Procedure: given X, output website y with probability proportional to e^{ε·q(y,X)}
     • Popular sites are exponentially more likely than rare ones
     Website scores don't change too quickly.

  13. Setting
     • For input D ∈ U^n, want to find r ∈ R
     • Base measure μ on R, usually uniform
     • Score function w: U^n × R → ℝ (the reals) assigns any pair (D, r) a real value
       • Want to maximize it (approximately)
     The exponential mechanism:
     • Assign output r ∈ R with probability proportional to e^{ε·w(D,r)}·μ(r)
     Normalizing factor: ∫_r e^{ε·w(D,r)}·μ(r)
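A minimal sketch of the mechanism for a finite range R, where the normalizing integral becomes a sum. The function names and the toy data are hypothetical. The score used in the usage lines is the website-visit count from the earlier example, and μ defaults to the uniform base measure.

```python
import math
import random

def exp_mech_probs(data, outputs, w, eps, mu=lambda r: 1.0):
    """Return {r: Pr[r]} with Pr[r] proportional to exp(eps * w(data, r)) * mu(r)."""
    scores = [eps * w(data, r) for r in outputs]
    shift = max(scores)  # subtracting a constant cancels in the normalization
    weights = [math.exp(s - shift) * mu(r) for s, r in zip(scores, outputs)]
    total = sum(weights)  # normalizing factor (a sum, since R is finite here)
    return {r: wt / total for r, wt in zip(outputs, weights)}

def exp_mech_sample(data, outputs, w, eps, rng):
    """Draw one output according to the exponential mechanism."""
    probs = exp_mech_probs(data, outputs, w, eps)
    return rng.choices(list(probs.keys()), weights=list(probs.values()), k=1)[0]

# Toy data: website visited by each student (made up for illustration).
visits = ["a.org", "a.org", "a.org", "b.org"]
sites = ["a.org", "b.org", "c.org"]
q = lambda X, y: X.count(y)  # score: number of students who visited y

probs = exp_mech_probs(visits, sites, q, eps=1.0)
pick = exp_mech_sample(visits, sites, q, eps=1.0, rng=random.Random(0))
```

Enumerating every output is exactly the inefficiency the slides point out: the cost is linear in |R|, which is only acceptable when the range is small.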

  14. The exponential mechanism is private
     • Let Δ = max_{D,D',r} |w(D,r) - w(D',r)| over adjacent D, D' (the sensitivity)
     Claim: the exponential mechanism yields a 2·ε·Δ differentially private solution.
     For adjacent databases D and D' and for all possible outputs r ∈ R:
     • Prob[output = r when input is D] = e^{ε·w(D,r)}·μ(r) / ∫_r e^{ε·w(D,r)}·μ(r)
     • Prob[output = r when input is D'] = e^{ε·w(D',r)}·μ(r) / ∫_r e^{ε·w(D',r)}·μ(r)
     Ratio is bounded by e^{εΔ} · e^{εΔ}
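Written out, the bound on the ratio splits into a numerator term and a normalizer term, each contributing at most a factor e^{εΔ}. The following is a sketch of that step in LaTeX, using the same symbols as the slide (r' is just the integration variable):

```latex
\frac{\Pr[\text{output}=r \mid D]}{\Pr[\text{output}=r \mid D']}
  = \underbrace{\frac{e^{\varepsilon w(D,r)}\,\mu(r)}{e^{\varepsilon w(D',r)}\,\mu(r)}}_{=\,e^{\varepsilon\,(w(D,r)-w(D',r))}\;\le\;e^{\varepsilon\Delta}}
    \cdot
    \underbrace{\frac{\int_{r'} e^{\varepsilon w(D',r')}\,\mu(r')}{\int_{r'} e^{\varepsilon w(D,r')}\,\mu(r')}}_{\le\;e^{\varepsilon\Delta}}
  \;\le\; e^{\varepsilon\Delta}\cdot e^{\varepsilon\Delta} = e^{2\varepsilon\Delta}
```

The second factor is bounded because, for every r', the integrand for D' is at most e^{εΔ} times the integrand for D.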

  15. Laplace Noise as Exponential Mechanism
     • On query q: U^n → ℝ, let w(D,r) = -|q(D) - r|
     • Prob[noise = y] = e^{-ε|y|} / (2·∫_{y≥0} e^{-εy}) = (ε/2)·e^{-ε|y|}
     Laplace distribution: Y = Lap(b) has density function Pr[Y = y] = (1/2b)·e^{-|y|/b}
     [Figure: the Laplace density plotted over y from -4 to 5, peaked at 0.]
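The density above is Lap(b) with b = 1/ε, so the mechanism amounts to adding Laplace noise to the true answer. A minimal sketch using only the standard library follows. The helper names are hypothetical, and the sampler relies on the fact that the difference of two independent exponential variables with mean b is distributed as Lap(b).

```python
import math
import random

def laplace_density(y: float, b: float) -> float:
    """Density of Lap(b): (1 / 2b) * exp(-|y| / b)."""
    return math.exp(-abs(y) / b) / (2 * b)

def sample_laplace(b: float, rng: random.Random) -> float:
    """Difference of two independent Exp(mean b) draws is Lap(b)."""
    return rng.expovariate(1 / b) - rng.expovariate(1 / b)

def laplace_mechanism(true_answer: float, eps: float, rng: random.Random) -> float:
    """Release q(D) plus Lap(1/eps) noise, i.e. density (eps/2) * exp(-eps*|y|)."""
    return true_answer + sample_laplace(1 / eps, rng)

rng = random.Random(1)
noisy = laplace_mechanism(42.0, eps=0.5, rng=rng)
```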

  16. Any Differentially Private Mechanism is an instance of the Exponential Mechanism
     • Let M be a differentially private mechanism
     Take w(D,r) to be log(Prob[M(D) = r]).
     Remaining issue: accuracy

  17. Private Ranking
     • Each element i ∈ {1, …, n} has a real-valued score S_D(i) based on a data set D.
     • Goal: output the k elements with the highest scores.
     • Privacy
       • Data set D consists of n entries in domain 𝒟.
       • Differential privacy: protects the privacy of the entries in D.
     • Condition: insensitive scores
       • for any element i, for any data sets D and D' that differ in one entry: |S_D(i) - S_D'(i)| ≤ 1

  18. Approximate ranking
     • Let S_k be the kth highest score on data set D.
     • An output list is α-useful if:
       Soundness: no element in the output has score ≤ S_k - α
       Completeness: every element with score ≥ S_k + α is in the output.
     [Figure: the score axis split into three regions: Score ≤ S_k - α; S_k - α ≤ Score ≤ S_k + α; S_k + α ≤ Score.]
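The two conditions translate directly into a checker. This is an illustrative sketch: the function name, the dictionary representation of scores, and the symbol α for the usefulness parameter are assumptions made for the example.

```python
def is_alpha_useful(scores: dict, output: list, k: int, alpha: float) -> bool:
    """Check soundness and completeness of an output list against true scores."""
    s_k = sorted(scores.values(), reverse=True)[k - 1]  # k-th highest score
    chosen = set(output)
    # Soundness: no chosen element has score <= S_k - alpha.
    sound = all(scores[i] > s_k - alpha for i in chosen)
    # Completeness: every element with score >= S_k + alpha is chosen.
    complete = all(i in chosen for i, s in scores.items() if s >= s_k + alpha)
    return sound and complete

scores = {"a": 10, "b": 8, "c": 7, "d": 3}
ok = is_alpha_useful(scores, ["a", "c"], k=2, alpha=2)          # "c" is close enough to S_k = 8
too_low = is_alpha_useful(scores, ["a", "d"], k=2, alpha=2)     # "d" violates soundness
missing_top = is_alpha_useful(scores, ["b", "c"], k=2, alpha=2) # "a" violates completeness
```

Elements whose score lies strictly between S_k - α and S_k + α may appear in the output or not; the definition constrains only the two outer regions.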

  19. Two Approaches
     Each input affects all scores.
     • Score perturbation
       • Perturb the scores of the elements with noise
       • Pick the top k elements in terms of noisy scores
       • Fast and simple implementation
       Question: what sort of noise should be added? What sort of guarantees?
     • Exponential sampling
       • Run the exponential mechanism k times
       • More complicated and slower implementation
       What sort of guarantees?
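The score-perturbation approach is easy to sketch. The slide leaves the choice of noise open. The code below assumes Laplace noise with a caller-supplied scale purely for illustration, and makes no claim about which scale gives which privacy guarantee. The function name is hypothetical.

```python
import random

def noisy_top_k(scores: dict, k: int, scale: float, rng: random.Random) -> list:
    """Add independent Laplace(scale) noise to each score and return the top k.

    Laplace noise is an assumption of this sketch; the noise distribution and
    its calibration are exactly the open questions raised on the slide.
    """
    noisy = {
        i: s + (rng.expovariate(1 / scale) - rng.expovariate(1 / scale))
        for i, s in scores.items()
    }
    return sorted(noisy, key=noisy.get, reverse=True)[:k]

scores = {"a": 1000, "b": 900, "c": 10, "d": 5}
top = noisy_top_k(scores, k=2, scale=1.0, rng=random.Random(0))
```

When the gap between the kth and (k+1)th scores is large relative to the noise, the output matches the true top k with high probability; when scores are close, elements near the threshold can swap.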

  20. Exponential Mechanism: Simple Example, an (almost free) private lunch
     Database of n individuals, lunch options {1, …, k}; each individual likes or dislikes each option (1 or 0).
     Goal: output a lunch option that many like.
     For each lunch option j ∈ [k], ℓ(j) is the # of individuals who like j.
     Exponential mechanism: output j with probability proportional to e^{ε·ℓ(j)}
     Actual probability: e^{ε·ℓ(j)} / (∑_i e^{ε·ℓ(i)}), where the denominator is the normalizer.

  21. The Net Mechanism
     • Idea: limit the number of possible outputs
       • Want |R| to be small
     • Why is it good?
       • The good (accurate) output has to compete with only a few possible outputs
       • If there is a guarantee that there is at least one good output, then the total weight of the bad outputs is limited

  22. α-Nets
     A collection N of databases is called an α-net of databases for a class of queries C if:
     • for all possible databases x there exists a y ∈ N such that max_{q∈C} |q(x) - q(y)| ≤ α
     If we use the closest member of N instead of the real database, we lose at most α in terms of the worst query.

  23. The Net Mechanism
     For a class of queries C, privacy ε and accuracy α, on database x:
     • Let N be an α-net for the class of queries C
     • Let w(x,y) = -max_{q∈C} |q(x) - q(y)|
     • Sample and output according to the exponential mechanism with x, w, and R = N
       • For y ∈ N: Prob[y] proportional to e^{ε·w(x,y)}

  24. Privacy and Utility Claims
     Privacy: the net mechanism is ε·Δ differentially private.
     Utility: the net mechanism is (2α, β)-accurate for any α and β such that
     • α ≥ (2/ε) · log(|N|/β)
     Proof:
     • there is at least one good solution: it gets weight at least e^{-εα}
     • there are at most |N| (bad) outputs, with accuracy less than 2α: each gets weight at most e^{-2εα}
     • Use the union bound
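The union-bound step can be checked numerically. The sketch below takes the weights exactly as stated in the proof (good output at least e^{-εα}, each of at most |N| bad outputs at most e^{-2εα}) and confirms that the stated condition on α pushes the total bad weight below a β fraction of the good weight. Function names and the sample parameters are hypothetical.

```python
import math

def required_alpha(eps: float, net_size: int, beta: float) -> float:
    """The slide's condition: alpha >= (2 / eps) * log(|N| / beta)."""
    return (2 / eps) * math.log(net_size / beta)

def bad_to_good_ratio(eps: float, alpha: float, net_size: int) -> float:
    """(|N| * e^{-2 eps alpha}) / e^{-eps alpha} = |N| * e^{-eps alpha}."""
    return net_size * math.exp(-eps * alpha)

eps, net_size, beta = 0.5, 1000, 0.05
alpha = required_alpha(eps, net_size, beta)
ratio = bad_to_good_ratio(eps, alpha, net_size)
print("alpha:", alpha, " bad/good weight ratio:", ratio, " beta:", beta)
```

With these weights the condition holds with room to spare: at α = (2/ε)·log(|N|/β) the ratio works out to β²/|N|, well under β.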

  25. Synthetic DB: Output is a DB
     [Figure: Database → Sanitizer → output DB; the user poses query 1, query 2, … to the output and gets answer 1, answer 2, answer 3.]
     Synthetic DB: the output is always a DB
     • Of entries from the same universe U
     • User reconstructs answers to queries by evaluating the query on the output DB
     Software and people compatible. Consistent answers.

  26. Counting Queries
     Database D of size n.
     • Queries with low sensitivity
     Counting queries: C is a set of predicates c: U → {0,1}.
     Query: how many participants in D satisfy c?
     Relaxed accuracy: answer the query within α additive error w.h.p.
     Not so bad: error is inherent in statistical analysis anyway.
     Non-interactive: assume all queries are given in advance.
     [Figure: query c drawn as a subset of the universe U.]

  27. α-Net For Counting Queries
     If we want to answer many counting queries C with differential privacy:
     • Sufficient to come up with an α-net for C
     • Resulting accuracy: max{α, log(|N|/β)/ε}
     Claim: consider the set N consisting of all databases of size m, where m = log|C|/α².
     Consider each element in the set to have weight n/m.
     Then N is an α-net for any collection C of counting queries.
     Error is Õ(n^{2/3} log|C|)

  28. Remarkable
     Hope for rich private analysis of small DBs!
     • Quantitative: #queries >> DB size
     • Qualitative: the output of the sanitizer is a synthetic DB, i.e. the output is a DB itself

  29. The BLR Algorithm [Blum, Ligett, Roth 2008]
     For DBs F and D: dist(F,D) = max_{q∈C} |q(F) - q(D)|
     Intuition: far away DBs get smaller probability.
     Algorithm on input DB D:
     Sample from a distribution on DBs of size m (m < n): DB F gets picked w.p. ∝ e^{-ε·dist(F,D)}
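On a tiny universe the sampling distribution can be built by brute-force enumeration. The sketch below is an illustration under stated assumptions: candidate databases are multisets of size m over U, each candidate entry is given weight n/m so that its counts are comparable with those of D (the scaling used for the net of size-m databases), and all names and the toy data are hypothetical.

```python
import itertools
import math
import random

def count(db, q, scale=1.0):
    """Scaled number of entries of db satisfying predicate q."""
    return scale * sum(1 for x in db if q(x))

def dist(F, D, queries):
    """max over q of |q(F) - q(D)|, with each entry of F weighted by n/m."""
    scale = len(D) / len(F)
    return max(abs(count(F, q, scale) - count(D, q)) for q in queries)

def blr_distribution(D, universe, queries, m, eps):
    """Pr[F] proportional to exp(-eps * dist(F, D)) over all size-m multisets F."""
    candidates = list(itertools.combinations_with_replacement(universe, m))
    weights = [math.exp(-eps * dist(F, D, queries)) for F in candidates]
    total = sum(weights)
    return {F: w / total for F, w in zip(candidates, weights)}

def blr_sample(D, universe, queries, m, eps, rng):
    p = blr_distribution(D, universe, queries, m, eps)
    return rng.choices(list(p.keys()), weights=list(p.values()), k=1)[0]

universe = [0, 1, 2, 3]
queries = [lambda x: x >= 2, lambda x: x % 2 == 0]
D = [0, 0, 1, 2, 3, 3, 3, 3]          # n = 8
p = blr_distribution(D, universe, queries, m=2, eps=1.0)
F = blr_sample(D, universe, queries, m=2, eps=1.0, rng=random.Random(0))
```

The candidate list grows roughly like |U|^m, which is why enumeration is only feasible for toy parameters.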

  30. The BLR Algorithm
     Idea:
     • In general: do not use a large DB
       • Sample and answer accordingly
     • A DB of size m guarantees hitting each query with sufficient accuracy

  31. The BLR Algorithm: Error Õ(n^{2/3} log|C|)
     Goodness Lemma: there exists F_good of size m = Õ((n/α)²·log|C|) s.t. dist(F_good, D) ≤ α
     Proof: construct F_good by taking m random samples from U.

  32. The BLR Algorithm: Error Õ(n^{2/3} log|C|)
     Goodness Lemma: there exists F_good of size m = Õ((n/α)²·log|C|) s.t. dist(F_good, D) ≤ α
     Pr[F_good] ~ e^{-εα}
     For any F_bad with dist ≥ 2α: Pr[F_bad] ~ e^{-2εα}
     Union bound: ∑_{bad DB F_bad} Pr[F_bad] ~ |U|^m · e^{-2εα}
     For α = Õ(n^{2/3} log|C|): Pr[F_good] >> ∑ Pr[F_bad]

  33. The BLR Algorithm: 2ε-Privacy
     For adjacent D, D' and for every F: |dist(F,D) - dist(F,D')| ≤ 1
     Probability of F by D: e^{-ε·dist(F,D)} / ∑_{G of size m} e^{-ε·dist(G,D)}
     Probability of F by D': the numerator and the denominator can each change by an e^ε factor ⇒ 2ε-privacy

  34. The BLR Algorithm: Running Time
     Generating the distribution by enumeration: need to enumerate every size-m database, where m = Õ((n/α)²·log|C|)
     Running time ≈ |U|^{Õ((n/α)²·log|C|)}

  35. Conclusion
     Offline algorithm, 2ε-differential privacy for any set C of counting queries.
     Error α is Õ(n^{2/3} log|C|/ε)
     Super-poly running time: |U|^{Õ((n/α)²·log|C|)}

  36. Maintaining State
     [Figure: a query q arrives at the mechanism; State = Distribution D.]

  37. The Multiplicative Weights Algorithm
     • Powerful tool in algorithm design
     • Learn a probability distribution iteratively
     • In each round:
       • either the current distribution is good
       • or we get a lot of information on the distribution
     • Update the distribution

  38. The PMW Algorithm
     Maintain a distribution D on universe U. (This is the state. It is completely public!)
     Initialize D to be uniform on U.
     Repeat up to k times:
     • Set T̂ ← T + Lap(σ)
     • Repeat while no update occurs:
       • Receive query q ∈ Q
       • Let â = x(q) + Lap(σ), where x(q) is the true value
       • Test: if |q(D) - â| ≤ T̂, output q(D)
       • Else (update):
         • Output â
         • Update D[i] ∝ D[i]·e^{±(T/4)·q[i]} and re-weight (the plus or minus is according to the sign of the error)
     Algorithm fails if there are more than k updates.
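The loop above can be sketched as follows. This is an illustration under several assumptions, not the lecture's implementation: the database is represented by its normalized histogram `x_hist` over U, queries are 0/1 vectors over U, and the parameter names `sigma`, `T`, and `k` are chosen for the example. The sketch shows the control flow only and says nothing about how to calibrate `sigma` and `T` for a given privacy level.

```python
import math
import random

def lap(scale: float, rng: random.Random) -> float:
    """Laplace(scale) noise as a difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def pmw(x_hist, queries, sigma, T, k, rng):
    """Answer queries, updating the public distribution D only on large error."""
    size = len(x_hist)
    D = [1.0 / size] * size                 # public state: uniform on U
    answers, updates = [], 0
    T_hat = T + lap(sigma, rng)             # noisy threshold for this phase
    for q in queries:
        true_val = sum(qi * xi for qi, xi in zip(q, x_hist))
        a_hat = true_val + lap(sigma, rng)  # noisy true answer
        est = sum(qi * di for qi, di in zip(q, D))
        if abs(est - a_hat) <= T_hat:       # current distribution is good enough
            answers.append(est)
            continue
        updates += 1
        if updates > k:
            raise RuntimeError("PMW failed: more than k updates")
        answers.append(a_hat)
        sign = 1.0 if a_hat > est else -1.0  # direction of the error
        D = [di * math.exp(sign * (T / 4) * qi) for qi, di in zip(q, D)]
        total = sum(D)
        D = [di / total for di in D]         # re-weight to a distribution
        T_hat = T + lap(sigma, rng)          # fresh threshold after an update
    return answers, D, updates

x_hist = [0.7, 0.1, 0.1, 0.1]
queries = [[1, 0, 0, 0]] * 200
# sigma is tiny here only to make the demo nearly deterministic; it is not a private setting.
answers, D_final, n_updates = pmw(x_hist, queries, sigma=1e-6, T=0.1, k=100, rng=random.Random(0))
```

In the demo the same query is asked repeatedly, so the early rounds trigger updates until the public distribution answers it within the threshold; after that no further updates occur.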

  39. Overview: Privacy Analysis
     For the query family Q = {0,1}^U, for (ε, δ, β) and t, the PMW mechanism is:
     • (ε, δ)-differentially private
     • (α, β)-accurate for up to t queries, where the accuracy is α = Õ(1/(ε·n)^{1/2})
     • State = Distribution is privacy preserving for individuals (but not for queries)
     Log dependency on |U|, δ, and t.
