Setting the Stage: How De-Identification Came into U.S. Law, and Why the Debate Matters Today

Explore the history and significance of de-identification in U.S. law, its impact on data privacy, and the ongoing debate surrounding its effectiveness. Learn about different threat models and the potential risks and benefits of anonymization.


Presentation Transcript


  1. Setting the Stage: How De-Identification Came into U.S. Law, and Why the Debate Matters Today • Professor Peter Swire, Ohio State University/Future of Privacy Forum • FPF Conference on DeIdentification, National Press Club • December 5, 2011

  2. Overview • U.S. history: Census, federal agency statistics, & HIPAA • Why Deidentification (DeID) matters today • The debate – it works or it doesn’t • Three threat models • Analogy to law enforcement • Big picture – useful for many tasks, even with the limits shown by scientists

  3. Census, Statistics & DeID • Many years of Census experience • Highly useful data • Deidentified • Periodic opposition to mandatory reporting • Needed strong confidentiality promises • Suppress small cell size • Only home in a census tract • Fuzz data • Strict rules against release even for national security purposes
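The "suppress small cell size" bullet can be illustrated with a minimal sketch. It is not the Census Bureau's actual procedure: the function name, the record layout, and the threshold of 5 are all assumptions chosen for illustration. The idea is that a published table withholds any count so small that it could point to a single household, such as the only home in a census tract.

```python
from collections import Counter

def suppress_small_cells(records, keys, min_cell_size=5):
    """Tabulate records by the given keys and withhold small counts.

    min_cell_size=5 is a hypothetical threshold, not a documented rule.
    Cells below the threshold are reported as None (suppressed).
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }

# Hypothetical data: six homes in tract A, a single home in tract B.
homes = [{"tract": "A"}] * 6 + [{"tract": "B"}]
table = suppress_small_cells(homes, keys=["tract"])
print(table)
```

Here the count for tract A is published, while the lone home in tract B is suppressed rather than released as a count of one.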

  4. Federal Agency Statistics • Codification in Confidential Information Protection & Statistical Efficiency Act of 2002 (CIPSEA) • Good history by Sylvester & Lohr • Basic rule: if collect data for statistical purposes, use only for statistical purposes, don’t ReID • Funny thing: same culture & practice for years in private sector polling (Gallup-style) and market research • Many years of practice here • Perhaps a basic guideline going forward?

  5. HIPAA • 1999-2000 regs informed by Sweeney research • Safe harbor – delete a lot of specified data fields • Expert (I pushed for this) – where statistical basis, can achieve DeID based on risk, not safe harbor • Data use agreements – release for research, with enforceable promise not to ReID • In short: • If scrubbed enough, can release publicly • If scrubbed less, then enforceable promise not to ReID

  6. Why It Matters Today • Now data mining far beyond specialized researchers • The Internet (commercial since only 1993) gives me access to data • Storage & processing on my laptop > mainframe of 25 years ago • Search is way better • The erosion of practical obscurity – “they” really may figure out who “we” are

  7. The Debate is Joined • Ohm (and others) draw on Sweeney-type research • DeID likely to lead to ReID • Yakowitz (and others) respond • Benefits of public data enormous • Practical risk/harm from ReID low • Anonymization creates huge risks or low risks? • Worth doing anonymization/DeID at all? • Today’s conference to shed light on this …

  8. Threat Models – Which Attackers? • Three types of attackers on “anonymized” data: • Insiders “peeping” • Outside hackers intruding • The public who doesn’t get into the database • DeID often effective for first two • Ohm/Yakowitz debate primarily on the third

  9. Insiders Peeping • Swire 2009 Peeping article, at peterswire.net • Threat: employee or employee of sub-contractor sees the data and “peeps” • Sees celebrity information - Clooney • Sees information about friend/family/ex • Sees information to create harm (ID theft, blackmail) • Anonymization useful part of anti-peeping strategy • Employee doesn’t search or stumble upon Clooney • Employee may lack tools to do Sweeney-type analysis • Audit logs catch employees who try • Give employees access to statistical data, not PII

  10. Outside Hackers • Hacker may intrude for a short while • Anonymization may prevent “ah hah” – Clooney • Hacker may download database • If so, then hacker becomes similar to the public • May or may not be good at Sweeney-type tricks • May be focused on specific types of information, and not try to ReID • Less-than-perfect DeID may substantially reduce incidence of ReID

  11. Re-ID by “The Public” • So, masking may help against some threats • The debate, though, is whether “the public” (i.e., the experts) can ReID • Sweeney & other research provides startling & important results of ReID • Can everything be ReIdentified?

  12. ReID & 2 Famous Studies • Date of birth, ZIP code, & gender -> 80%+ of people unique • Yes • BUT, DOB is off-the-charts different • Gender – splits population in half • DOB = 366 (days) x 80 (years) = 29,280 cells • Moral – DOB ridiculously strong to ReID • Netflix study: can ReID over 60% of movie reviews • BUT, takes known IMDb reviewers and matches to Netflix • Can ReID a lot, but not a big effect
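The cell arithmetic on this slide can be checked with a short sketch. The 366 days and 80 birth years come from the slide; the ZIP code population of 30,000 is a hypothetical figure used only to show why so many people end up alone in their cell.

```python
def quasi_identifier_cells(days_per_year=366, birth_years=80, genders=2):
    """Return (DOB cells, DOB x gender cells) for the slide's assumptions."""
    dob_cells = days_per_year * birth_years
    return dob_cells, dob_cells * genders

def expected_people_per_cell(population, cells):
    """Average number of people per cell if spread evenly (a simplification)."""
    return population / cells

dob_cells, dob_gender_cells = quasi_identifier_cells()

# Hypothetical ZIP code with 30,000 residents (assumption, not from the slides).
avg = expected_people_per_cell(30_000, dob_gender_cells)

print(dob_cells)         # 366 * 80
print(dob_gender_cells)  # doubled by gender
print(round(avg, 2))     # well under one person per cell
```

Gender only doubles the number of cells, while date of birth multiplies it by tens of thousands. With fewer residents than cells, most occupied cells hold a single person, which is the slide's point about DOB being unusually strong for ReID.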

  13. Law Enforcement Analogy • So, is ReID generally easy or hard, useful or useless? • Consider cop with a bunch of clues (male, tall, red hair, etc.) • Enough to ReID? No • Helpful to ReID? Yes • A matter of how much legwork, analysis, extra data is available and accurate • Very big range for difficulty of finding the suspect • Same is true for ability of “the public” to ReID, to name the suspect

  14. Conclusion • Issue matters today -- more data potentially available to “the public” • History of useful anonymization in statistics • If collect data for statistical purposes, use only for statistical purposes, store that way, don’t ReID • DeID helps against insider & hacker threats • ReID by “the public” varies widely in the effort needed to find the “suspect” • Our conference today to help policymakers learn where DeID likely to be most useful
