“Mortgages, Privacy, and Deidentified Data”
Professor Peter Swire
Ohio State University
Center for American Progress
Consumer Financial Protection Bureau Conference on “New Research on Sustainable Mortgages & Access to Credit”
October 6, 2011
Overview
• Federal experience to date with deidentification (“DeID”)
• Why DeID is technically harder over time
• Technical & administrative measures to protect identity
• Court records: public records and privacy
• Conclusion: technology alone often cannot succeed, so the choice becomes to make data public, keep it private, or create effective data use agreements
Federal DeID to Date
• 2000 HIPAA rule
  • Recognized that reidentification (“ReID”) is possible
  • Either remove 18 specified data fields (the “safe harbor” method), or have an expert determine that the risk of ReID is “very small”
  • Current HHS study in progress on DeID, with issues similar to those for financial data
• Data.gov
  • Administration push for transparency
  • Privacy & DeID more challenging than many had hoped
• Census data
  • Long history of sensitivity about census data, whose collection is legally required
  • Suppress small cell sizes; technical limits on researchers’ access
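The census practice of suppressing small cells can be illustrated with a minimal Python sketch. The threshold of 5, the function name `tabulate_with_suppression`, and the loan records are all hypothetical; statistical agencies set their own thresholds and rules. The idea shown is only that a tabulated count below the threshold is withheld because it describes too few people.

```python
from collections import Counter

# Hypothetical threshold; real statistical agencies set their own rules.
MIN_CELL_SIZE = 5


def tabulate_with_suppression(records, keys, min_cell=MIN_CELL_SIZE):
    """Count records by the attributes in `keys`, suppressing small cells.

    Each attribute combination maps to its count, or to None when fewer
    than `min_cell` records share that combination.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


# Hypothetical loan records: six fixed-rate loans and two adjustable-rate
# loans in the same county.
loans = [{"county": "Franklin", "loan_type": "fixed"}] * 6 + [
    {"county": "Franklin", "loan_type": "adjustable"}
] * 2

table = tabulate_with_suppression(loans, ["county", "loan_type"])
print(table)
# {('Franklin', 'fixed'): 6, ('Franklin', 'adjustable'): None}
```

Suppressing one cell is not enough by itself: if a county total of 8 is also published, the hidden value of 2 can be recovered by subtraction, which is why agencies pair suppression with complementary suppression or added noise.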
Why DeID Is Harder over Time
• Two tech trends
  • Search vastly improved: Google incorporated in 1998
  • Increase in (almost) unique publicly available facts
• Mortgages
  • Street View pictures of each house
  • Public records on each house’s likely market value & date of sale
  • Social networks, blogs, and marketing information available for purchase:
    • “We got our new house today, and Bank X did a great/lousy job”
• How hard is it for forensic, automated efforts to ReID?
  • Sweeney’s “k-anonymity” work: linking to public facts can shrink a “deID mortgage” to one or a few properties
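The linkage risk can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical deidentified mortgage file and a hypothetical public-records table that share three quasi-identifiers (ZIP code, sale month, sale price); the field names, addresses, and the `candidate_properties` helper are invented for the example. It computes the data set’s k in Sweeney’s sense (the size of the smallest group sharing the same quasi-identifier values) and lists which public properties each “deID mortgage” could belong to.

```python
from collections import Counter

# Hypothetical quasi-identifiers present in both data sets.
QUASI_IDENTIFIERS = ["zip", "sale_month", "sale_price"]


def qi_key(record, quasi_identifiers=QUASI_IDENTIFIERS):
    """Tuple of the record's quasi-identifier values."""
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Size of the smallest group of records sharing the same quasi-identifier
    values; the data set is k-anonymous for this k."""
    groups = Counter(qi_key(r, quasi_identifiers) for r in records)
    return min(groups.values())


def candidate_properties(deid_record, public_records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Addresses of public records matching the deidentified record on every
    quasi-identifier (a simple linkage attack)."""
    key = qi_key(deid_record, quasi_identifiers)
    return [p["address"] for p in public_records if qi_key(p, quasi_identifiers) == key]


# Hypothetical "deidentified" mortgages: no names or addresses, but a
# sensitive attribute (default status) remains attached.
deid_mortgages = [
    {"zip": "43210", "sale_month": "2011-06", "sale_price": 250000, "defaulted": False},
    {"zip": "43210", "sale_month": "2011-06", "sale_price": 250000, "defaulted": False},
    {"zip": "43201", "sale_month": "2011-07", "sale_price": 180000, "defaulted": True},
]

# Hypothetical public sale records.
public_sales = [
    {"address": "12 Elm St", "zip": "43210", "sale_month": "2011-06", "sale_price": 250000},
    {"address": "14 Elm St", "zip": "43210", "sale_month": "2011-06", "sale_price": 250000},
    {"address": "9 Oak Ave", "zip": "43201", "sale_month": "2011-07", "sale_price": 180000},
]

print(k_anonymity(deid_mortgages))  # 1
for i, mortgage in enumerate(deid_mortgages):
    print(i, candidate_properties(mortgage, public_sales))
# 0 ['12 Elm St', '14 Elm St']
# 1 ['12 Elm St', '14 Elm St']
# 2 ['9 Oak Ave']
```

The third record has k = 1 and matches exactly one house, so its default status is effectively reidentified even though the file contains no name or address.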
Technical Measures
• Technical measures to DeID may:
  • Be subject to ReID (previous slide);
  • Introduce noise into the data; or
  • Both
• Add noise (or subtract signal)
  • Census approach
    • Public data set: suppress small cell sizes, add lots of noise; or
    • Researchers can run regressions using somewhat better data
  • Cynthia Dwork’s “differential privacy” (Microsoft Research)
    • Limits queries into the database based on tolerance for ReID
  • Agrawal and other IBM research
    • “Hippocratic Database” adds noise with the goal of allowing analysis while minimizing the risk of linkage
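The differential-privacy idea of answering queries with noise while capping total disclosure can be sketched with the Laplace mechanism, the standard textbook construction for counting queries. This is a minimal illustration, not Dwork’s or Microsoft’s code: the `PrivateCounter` class, the epsilon values, and the loan data are hypothetical. A count changes by at most 1 when one person is added or removed, so noise with scale 1/epsilon is added, and each query spends part of a fixed epsilon budget until further queries are refused.

```python
import random


class PrivateCounter:
    """Answers counting queries with Laplace noise under a total epsilon
    budget (a minimal sketch of the Laplace mechanism with sequential
    composition; not a hardened implementation)."""

    def __init__(self, records, total_epsilon, rng=None):
        self.records = records
        self.remaining = total_epsilon
        self.rng = rng or random.Random()

    def _laplace(self, scale):
        # The difference of two independent exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def count(self, predicate, epsilon):
        """Noisy count of records satisfying `predicate`, spending `epsilon`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            # Refuse queries once the tolerance for disclosure is used up.
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for r in self.records if predicate(r))
        # Sensitivity of a count is 1, so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1 / epsilon)


# Hypothetical loan file: 30 defaults out of 100 loans.
loans = [{"defaulted": True}] * 30 + [{"defaulted": False}] * 70

db = PrivateCounter(loans, total_epsilon=1.0, rng=random.Random(2011))
noisy_defaults = db.count(lambda r: r["defaulted"], epsilon=0.5)
print(f"noisy default count: {noisy_defaults:.1f}")  # near 30; varies with the seed
try:
    db.count(lambda r: r["defaulted"], epsilon=0.75)  # exceeds the remaining 0.5
except RuntimeError as err:
    print(err)  # privacy budget exhausted
```

Smaller epsilon values mean more noise and stronger protection, which is the tradeoff the census approach also faces. Naive floating-point noise sampling like this is known to leak information in some settings, so production systems rely on carefully engineered libraries.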
Administrative Measures
• HIPAA data use agreements
  • Agreements apply to a “limited data set,” with obvious identifiers (name, address) stripped out
• Data use agreement
  • Contractual guarantees to use data only for limited purposes, such as research
  • Promise to use appropriate safeguards on data
  • Promise not to ReID the data
• 2009 CDT conference report on DeID and health data emphasized the importance of administrative safeguards
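How a technical step (stripping obvious identifiers) and an administrative step (a data use agreement) fit together can be sketched in Python. Everything here is hypothetical: the `DIRECT_IDENTIFIERS` field names, the `DataUseAgreement` record, and the `release` gate are illustrative stand-ins, and the field list does not reproduce the identifiers actually enumerated in the HIPAA limited-data-set rule. The three checks mirror the three promises on the slide: limited purposes, appropriate safeguards, and no ReID.

```python
from dataclasses import dataclass

# Hypothetical field names standing in for direct identifiers; the HIPAA
# limited-data-set rule defines its own list, which this sketch does not reproduce.
DIRECT_IDENTIFIERS = frozenset({"name", "street_address", "phone", "email", "account_number"})


@dataclass(frozen=True)
class DataUseAgreement:
    """Hypothetical record of the promises a data recipient has signed."""
    recipient: str
    permitted_purposes: frozenset
    safeguards_in_place: bool
    no_reidentification: bool


def limited_data_set(records, direct_identifiers=DIRECT_IDENTIFIERS):
    """Copy each record without its direct identifiers."""
    return [{k: v for k, v in r.items() if k not in direct_identifiers} for r in records]


def release(records, agreement, purpose):
    """Release a limited data set only under an agreement that covers the
    stated purpose and includes the safeguard and no-ReID promises."""
    if purpose not in agreement.permitted_purposes:
        raise PermissionError(f"purpose {purpose!r} not permitted for {agreement.recipient}")
    if not (agreement.safeguards_in_place and agreement.no_reidentification):
        raise PermissionError("agreement lacks a required safeguard or no-ReID promise")
    return limited_data_set(records)


# Hypothetical loan record and agreement.
loans = [{"name": "A. Borrower", "street_address": "9 Oak Ave", "zip": "43201",
          "sale_month": "2011-07", "defaulted": True}]
dua = DataUseAgreement("University research team", frozenset({"research"}), True, True)

print(release(loans, dua, "research"))
# [{'zip': '43201', 'sale_month': '2011-07', 'defaulted': True}]
```

The code can only confirm that the promises were made. The retained ZIP code and sale month are exactly the kind of quasi-identifiers that enable linkage, so whether the recipient refrains from ReID is enforced by the contract, not by the software.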
Public Records & Privacy
• Court records have been the subject of intense study on the tradeoffs between public records and privacy
  • Strong reasons for public access
  • Privacy: juvenile court, financial account info, etc.
• Annual Williamsburg conference each November
• Many state task forces on the subject
Conclusion
• Some records are or should be public
• Some records are or should be private
• Ability to ReID is large and growing
• Technical measures to mask data exist but have limited applicability
• Administrative measures are often essential for researchers to get meaningful results
• Technology alone often cannot succeed, so the choice becomes to make data public, keep it private, or create effective data use agreements