When Random Sampling Preserves Privacy Kamalika Chaudhuri U.C.Berkeley Nina Mishra U.Virginia
The Problem • [Figure: Database → Sanitizer → Sanitized Database] • Setting: • Table: Set of rows • Sanitizer: Releases each row with probability p • What are the conditions under which this sanitizer preserves privacy?
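The sanitizer in this setting can be sketched in a few lines. This is only an illustration of "release each row with probability p"; the function name `sanitize` and the representation of a table as a Python list are assumptions, not something the slides specify.

```python
import random


def sanitize(table, p, rng=None):
    """Release each row of `table` independently with probability p.

    `table` is assumed to be a list of rows; `rng` is an optional
    random.Random instance so that runs can be made reproducible.
    """
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so a row is kept with probability p.
    return [row for row in table if rng.random() < p]


if __name__ == "__main__":
    table = [0] * 1000
    sample = sanitize(table, p=0.1, rng=random.Random(0))
    print(len(sample))  # roughly 100 rows on average
```

With p = 0 nothing is released and with p = 1 the whole table is released; the interesting question is which values of p in between are safe.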
Search Data • AOL released user search data: • Replaced usernames with random ids
Search Data Kamalika Cynthia Nina “Berkeley restaurants” “Low degree spanning trees” “Tickets to India” “Privacy sampling” “Airfare Santa Barbara” “Traffic on 101N” “Restaurants Mountain View” “Rank Aggregation” “Memory bound functions” “Crypto registration” “Falafel Charlottesville” “Query Auditing” “Clustering streaming” “Tickets to SFO” “Privacy sampling”
U.S. Census Data • Random sample of preprocessed data: • Removing unique values • Merging cells with less than a threshold number of individuals
Privacy Definition [DMNS06,…] • ε-Indistinguishability: • Two tables T, T’ differ by a single row • S : Output of the sanitizer • Pr[S | T] ≤ (1 + ε) Pr[S | T’]
An Example • Cannot always get ε-Indistinguishability with random sampling • T : n rows with value 0 • T’ : n-1 rows with value 0, 1 row with value 1 • S : 1 row with value 1, s – 1 rows with value 0 • S has positive probability under T’ but probability 0 under T, so no ε bounds the ratio
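The counterexample can be checked numerically. The sketch below assumes the released sample is viewed as a multiset of values (a count per value), so that each count is binomial; the helper name `sample_prob` and the concrete numbers n = 10, s = 3, p = 0.5 are illustrative choices only.

```python
from math import comb


def sample_prob(table_counts, sample_counts, p):
    """Probability that sampling each row with probability p yields exactly
    `sample_counts[v]` rows of every value v (0 for values not listed).

    Both arguments map a value to a number of rows.
    """
    # A value that appears in the sample but not in the table is impossible.
    for value, j in sample_counts.items():
        if j > table_counts.get(value, 0):
            return 0.0
    prob = 1.0
    for value, f in table_counts.items():
        j = sample_counts.get(value, 0)
        # Binomial probability of keeping j of the f rows with this value.
        prob *= comb(f, j) * p**j * (1 - p) ** (f - j)
    return prob


n, s, p = 10, 3, 0.5
T = {0: n}                  # n rows with value 0
T_prime = {0: n - 1, 1: 1}  # n-1 rows with value 0, one row with value 1
S = {0: s - 1, 1: 1}        # one row with value 1, s-1 rows with value 0

print(sample_prob(T, S, p))        # 0.0: T can never produce a row with value 1
print(sample_prob(T_prime, S, p))  # strictly positive
```

Because one probability is zero and the other is not, no finite multiplicative factor relates them, which is the point of the slide.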
Privacy Definition [DKMMiNa06, BDMN05] • (ε,δ)-Indistinguishability: • Two tables T, T’ differ by a single row • S : Output of the sanitizer • With probability at least 1 - δ, • Pr[S | T] ≤ (1 + ε) Pr[S | T’]
An Example • Cannot always get (ε,δ)-Indistinguishability for all tables • A table where all rows have unique values
When does Random Sampling preserve Privacy? • Parameters: • (ε, δ)-indistinguishability • k : number of distinct values in T • t : number of values which occur at most log(k/δ)/ε times in T • Theorem: This can be guaranteed if • p < ε (if t = 0) • p < Õ(εδ/t)
Classification of Values • For (ε, δ)-indistinguishability, by the number of rows with value v: • Rare Value: at most log(k/δ)/ε rows • Infrequent Value: between log(k/δ)/ε and log(k/δ)/p rows • Common Value: more than log(k/δ)/p rows
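The classification can be sketched as a small function. The slides do not state the base of the logarithm or how ties at the thresholds are handled, so the natural log and the `<=` comparisons below are assumptions, as is the function name `classify_values`.

```python
from collections import Counter
from math import log


def classify_values(table, p, eps, delta):
    """Label each distinct value in `table` as rare, infrequent or common.

    Thresholds follow the slide: log(k/delta)/eps and log(k/delta)/p,
    where k is the number of distinct values. Natural log is assumed.
    """
    counts = Counter(table)
    k = len(counts)
    rare_max = log(k / delta) / eps    # at most this many rows: rare
    common_min = log(k / delta) / p    # more than this many rows: common
    labels = {}
    for value, f in counts.items():
        if f <= rare_max:
            labels[value] = "rare"
        elif f <= common_min:
            labels[value] = "infrequent"
        else:
            labels[value] = "common"
    return labels


if __name__ == "__main__":
    table = ["a"] * 1 + ["b"] * 100 + ["c"] * 10000
    print(classify_values(table, p=0.01, eps=0.5, delta=0.1))
```

In this example k = 3, so the two thresholds are about 6.8 and 340 rows; the number of values labelled rare is the parameter t.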
Rare Values • If a rare value v is observed in a random sample, • Pr[S|T’] > (1 + ε/log(k/δ)) Pr[S|T]
Common Values • For a common value v, • Pr[S|T] ≈ Pr[S|T’] • Typically, the number of rows with a common value is close to its expectation
Infrequent Values • For an infrequent value v, • Pr[S|T] ≈ Pr[S|T’] • Typically, the number of rows with an infrequent value is at most log(k/δ) away from its expected value
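The contrast between rare and frequent values can be made concrete with a likelihood ratio for a single value. The sketch assumes T has f rows with value v, T’ is T with one of those rows removed, and the sample contains j rows with value v; under that assumption the binomial probabilities simplify to f(1 - p)/(f - j). The function name `likelihood_ratio` is illustrative.

```python
def likelihood_ratio(f, j, p):
    """Ratio Pr[j sampled rows of v | f rows] / Pr[j sampled rows of v | f-1 rows].

    With binomial sampling this is
        C(f, j) p^j (1-p)^(f-j) / (C(f-1, j) p^j (1-p)^(f-1-j))
      = f (1 - p) / (f - j),
    which requires j <= f - 1.
    """
    if not 0 <= j <= f - 1:
        raise ValueError("need 0 <= j <= f - 1")
    return f * (1 - p) / (f - j)


if __name__ == "__main__":
    p = 0.01
    # Rare value: 5 rows in the table, 1 of them shows up in the sample.
    print(likelihood_ratio(5, 1, p))        # noticeably above 1
    # Common value: 10000 rows, sample count equal to its expectation p*f.
    print(likelihood_ratio(10000, 100, p))  # essentially 1
```

When j is close to its expectation p·f the ratio is close to 1, which matches the slides' claim for common and infrequent values; a single observed row of a rare value already pushes the ratio away from 1 by about 1/f.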
Properties of a Good Sample • A sample S is ε-indistinguishable if: • No rare values • The number of rows with common value v is within a constant factor of expectation • The number of rows with infrequent value v is at most an additive O(log(k/δ)) more than its expected value
When does Random Sampling preserve Privacy? • Such a sample occurs with probability at least 1 - δ if • p < ε (if t=0) • p < Õ(εδ/t)
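One ingredient of this condition is easy to compute directly: the chance that the sample contains any row holding a rare value. The sketch below assumes R denotes the total number of rows with rare values (a hypothetical parameter, not named in the slides); since each row is released independently, that chance is 1 - (1 - p)^R. It does not reproduce the theorem's bound, whose hidden logarithmic factors the slides leave unspecified.

```python
def prob_rare_observed(num_rare_rows, p):
    """Probability that at least one of `num_rare_rows` rows is released
    when each row is released independently with probability p."""
    return 1.0 - (1.0 - p) ** num_rare_rows


if __name__ == "__main__":
    for rows in (0, 1, 10, 100):
        print(rows, prob_rare_observed(rows, p=0.1))
```

With no rare rows the probability is 0; as the number of rare rows grows, p has to shrink to keep this failure probability below δ.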
Utility of Random Sampling • Assuming no rare values: • Error in the frequency of each value : additive 1/√n • [DMNS06] Estimates histogram with an additive error of 1/n in each frequency • Sampling may give a compact representation of the histogram
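The utility claim can be explored with a small simulation. This is a sketch under assumptions: the table contents, the seed, and the helper name `estimate_frequency` are made up for illustration, and the estimator simply uses the fraction of sampled rows that carry the value.

```python
import random


def estimate_frequency(table, value, p, rng):
    """Estimate the relative frequency of `value` in `table` from a sample
    in which each row is released independently with probability p.

    Returns None if the sample happens to be empty.
    """
    sample = [row for row in table if rng.random() < p]
    if not sample:
        return None
    return sum(1 for row in sample if row == value) / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    table = ["x"] * 3000 + ["y"] * 7000  # true frequency of "x" is 0.3
    est = estimate_frequency(table, "x", p=0.5, rng=rng)
    print(est, abs(est - 0.3))
```

The sampling error of such an estimate shrinks roughly like one over the square root of the sample size, which is the kind of additive error the slide contrasts with the 1/n error of [DMNS06].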
Conclusions • Random sampling preserves privacy only when there are few rare values • With rare values, the probability of failure can be high • δ on the order of 1/n, as opposed to 1/2^n [DKMMiNa06, BDMN05] • Error in estimating the frequency of each value can be high • Additive 1/√n as opposed to 1/n of [DMNS06]
The Problem • What are the conditions under which this sanitizer preserves privacy?