1 / 40

The State of the Art

The State of the Art. Cynthia Dwork, Microsoft Research . TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A. Pre-Modern Cryptography. Propose. Break. Modern Cryptography. Propose STRONGER. Propose STRONGER Definition.

saima
Download Presentation

The State of the Art

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The State of the Art Cynthia Dwork, Microsoft Research TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAA

  2. Pre-Modern Cryptography Propose Break

  3. Modern Cryptography Propose STRONGER Propose STRONGERDefinition Propose Definition Algs algorithms satisfying definition BreakDefinition Break Definition [GoldwasserMicali82,GoldwasserMicaliRivest85]

  4. Modern Cryptography Propose STRONGER Propose STRONGERDefinition Propose Definition Algs algorithms satisfying definition BreakDefinition Break Definition [GoldwasserMicali82,GoldwasserMicaliRivest85]

  5. No Algorithm? Propose Definition ? Why?

  6. Provably No Algorithm? Propose WEAKER/DIFFDefinition Propose Definition Alg / ? ? BadDefinition

  7. ? The Privacy Dream • Census, financial, medical data; OTC drug purchases; social networks; MOOCs data; call and text records; energy consumption; loan, advertising, and applicant data; ad clicks product correlations, query logs,… C Original Database Sanitized data set

  8. Fundamental Law of Info Recovery “Overly accurate” estimates of “too many” statistics is blatantly non-private DinurNissim03; DworkMcSherryTalwar07; DworkYekhanin08, De12, MuthukrishnanNikolov12,…

  9. Anonymization(aka De-Identification) • Remove “personally identifying information” Name sex DOB zip symptoms previous admissions medicationsfamily history …

  10. Anonymization(aka De-Identification) • Remove “personally identifying information” Name sex DOB zip symptoms previous admissions medicationsfamily history …

  11. Anonymization(aka De-Identification) • Remove “personally identifying information” Name sex DOB zip symptoms previous admissions medicationsfamily history …

  12. Anonymization(aka De-Identification) • Remove “personally identifying information” ? Namesex DOB zip symptoms previous admissions medicationsfamily history …

  13. William Weld’s Medical Records HMO data voter registration data name ethnicity address ZIP visit date date reg. diagnosis birth date procedure party affiliation sex medication total charge last voted Sweeney97

  14. William Weld’s Medical Records HMO data voter registration data name ethnicity Can “name” by (zip, birth, sex) address ZIP visit date date reg. diagnosis birth date procedure party affiliation sex medication total charge last voted Sweeney97

  15. Anonymization(aka De-Identification) • Remove “personally identifying information” ? ? Name sex DOB zip symptoms previous admissions medicationsfamily history …

  16. Anonymization(aka De-Identification) • Remove “personally identifying information” ? ? Name sex DOBzipsymptoms previous admissions medicationsfamilyhistory …

  17. NarayananShmatikov08

  18. Can “name” by 3 (title, approx date) pairs NarayananShmatikov08

  19. Re-ID Not the Only Worry m-invariance t-closeness m,t l-diversity k-anonymity Algs algorithms satisfying definition BreakDefinition Break Definition SamaratiSweeney98, MakanvajjhalaGehrkeKiferVenkitasubramaniam06,XiaoTao07,LiLiVenkatasubmraminan07

  20. Culprit: Diverse Background Info Voter Weld: M, DOB, zip IMDb 2014 2012 2013

  21. Billing for Targeted Advertisements + + [Korolova12]

  22. Product Recommendations • X’s preferences influence Y’s experience • Combining evolving similar items lists with a little knowledge (from your blog) of what you bought, an adversary can infer purchases you did not choose to publicize Blog People who bought this also bought… CalandrinoKilzerNarayananFeltenShmatikov11

  23. GWAS Statistics … … SNP: Single Nucleotide (A,C,G,T) polymorphism T C … … T T SNP statistics of Case Group “Can” Test Case Group Membership using target’s DNA and HapMap Homer+08

  24. Culprit: Diverse Background Info Voter Weld: M, DOB, zip IMDb 2014 2012 HapMap 2013 Blog People who bought this… Billing

  25. How Should We Approach Privacy? • “Computer science got us into this mess, can computer science get us out of it?” (Sweeney, 2012)

  26. How Should We Approach Privacy? • “Computer science got us into this mess, can computer science get us out of it?” (Sweeney, 2012) • Complexity of this type requires a mathematically rigorous theory of privacy and its loss.

  27. How Should We Approach Privacy? • “Computer science got us into this mess, can computer science get us out of it?” (Sweeney, 2012) • Complexity of this type requires a mathematically rigorous theory of privacy and its loss. • We cannot discuss tradeoffs between privacy and statistical utility without a measure that captures cumulative harm over multiple uses. • Other fields -- economics, ethics, policy -- cannot be brought to bear without a “currency,” or measure of privacy, with which to work.

  28. Useful Databases that Teach • Database teaches that smoking causes cancer. • Smoker S’s insurance premiums rise. • Premiums rise even if S not in database! • Learning that smoking causes cancer is the whole point. • Smoker S enrolls in a smoking cessation program. • Differential privacy: limit harms to the teachings, not participation The outcome of any analysis is essentially equally likely, independent of whether any individual joins, or refrains from joining, the database.

  29. Useful Databases that Teach • Database teaches that smoking causes cancer. • Smoker S’s insurance premiums rise. • Premiums rise even if S not in database! • Learning that smoking causes cancer is the whole point. • Smoker S enrolls in a smoking cessation program. • Differential privacy: limit harms to the teachings, not participation The likelihood of any possible harm to ME is essentially independent of whether I join, or refrain from joining, the database.

  30. Useful Databases that Teach • Database teaches that smoking causes cancer. • Smoker S’s insurance premiums rise. • Premiums rise even if S not in database! • Learning that smoking causes cancer is the whole point. • Smoker S enrolls in a smoking cessation program. • Differential privacy: limit harms to the teachings, not participation High premiums, busted, purchases revealed to co-worker… Essentially equally likely when I’m in as when I’m out

  31. Differential Privacy [D.,McSherry,Nissim,Smith ‘06] gives -differential privacy if for all pairs of data sets differing in one element, and all subsets of possible outputs Randomness introduced by Randomness introduced by If a bad event is very unlikely when I’m not in dataset () then it is still very unlikely when I am () Impossible to know the actual probabilities of bad events. Can still control change in risk due to joining the database.

  32. Differential Privacy [D.,McSherry,Nissim,Smith ‘06] gives -differential privacy if for all pairs of data sets differing in one element, and all subsets of possible outputs “Privacy Loss” If a bad event is very unlikely when I’m not in dataset () then it is still very unlikely when I am () Impossible to know the actual probabilities of bad events. Can still control change in risk due to joining the database.

  33. Differential Privacy • Nuanced measure of privacy loss • Captures cumulative harm over multiple uses, multiple databases • Adversary’s background knowledge is irrelevant • Immune to re-identification attacks, etc. • “Programmable” • Construct complicated private analyses from simple private building blocks

  34. Recall: Fundamental Law “Overly accurate” estimates of “too many” statistics is blatantly non-private DinurNissim03; DworkMcSherryTalwar07; DworkYekhanin08, De12, MuthukrishnanNikolov12,…

  35. Answer Only Questions Asked q1 a1 q2 C a2 q3 a3 Database data analysts curator

  36. Intuition • Want to compute • Adding pulls • Add random noise to obscure difference vs

  37. Intuition Algorithms, geometry, learning theory, complexity theory, cryptography, statistics, machine learning, programming languages, verification, databases, economics,… • Want to compute • Adding pulls • Add random noise to obscure difference

  38. Not a Panacea • Fundamental Law of Information Recovery still holds

  39. Challenge: The Meaning of Loss • Sometimes the theory gives exactly the right answer • Large loss in differential privacy translates to “obvious” real life privacy breach, under circumstances known to be plausible • Other times? • Do all large losses translate to such realizable privacy breaches, or is the theory too pessimistic?

  40. Policy Recommendation • Publish all Epsilons! • Penalize when Combines motivation for data breach notification statutes and environmental laws requiring disclosures of toxic releases with an incentive to start using (minimal) differential privacy DworkMulligan14

More Related