1 / 13

SAFE Method for Anonymizing German Census Data

Learn about SAFE method that ensures data confidentiality by creating anonymous microdata files with advantages & disadvantages in statistical data confidentiality sessions. The method involves a mathematical model and tests with census data sets. Discover how the State Statistical Institute in Berlin-Brandenburg adapts SAFE for different census units, facilitates splitting microdata sets, and maintains data structure. Explore interpretations and results of SAFE solutions.

dgillespie
Download Presentation

SAFE Method for Anonymizing German Census Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SAFE –a method for anonymising the German Census Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality (Tarragona, Spain)

  2. SAFE - a pretabularmethod(1) SAFE creates an anonymous micro data file. What are anonymous micro data? • Micro data are anonymous (confidential), “if they cannot be matched to the concerned person.”(German law of Statistics §16). • Identification can be avoided by ambiguous records in the micro data file. • SAFE - micro data files are confidential because of ambiguity in the record set. State Statistical InstituteBerlin-Brandenburg

  3. SAFE - a pretabularmethod (2) Advantages: • Solution in one step • Analysis can reidentify only the anonymous micro data file • All analysis based on the same source and are consistent • No cell suppression (primary or secondary) necessary Disadvantages: • Change from cell suppression to uncertainty in cell values • New interpretation of tables • No easy extension of analysis to not controlled tables • Calculation effort may be relatively great State Statistical InstituteBerlin-Brandenburg

  4. 96-... 96-... region region 91-95 91-95 86-90 86-90 5 5 4 4 81-85 81-85 3 3 76-80 76-80 2 2 1 1 71-75 71-75 66-70 66-70 61-65 61-65 56-60 56-60 51-55 51-55 46-50 46-50 41-45 41-45 36-40 36-40 31-35 31-35 26-30 26-30 21-25 21-25 16-20 16-20 11-15 11-15 6-10 6-10 0-5 0-5 15 10 5 0 0 5 10 15 15 10 5 0 0 5 10 15 male female male female Idea of SAFE - solution (1) original anonymous • only triplets • data attacks (reidentification) lead to more than two objects State Statistical InstituteBerlin-Brandenburg

  5. 96-100 Inhabitants by region and sex region 91-95 (left: original right: anonymous) original 86-90 80 5 4 81-85 female 3 70 76-80 2 male 71-75 1 60 66-70 50 61-65 56-60 40 persons 51-55 46-50 30 41-45 20 36-40 31-35 10 26-30 0 21-25 16-20 1 2 3 4 5 11-15 female male region 6-10 52% 48% 0-5 15 10 5 0 0 5 10 15 male female 96-100 region 91-95 25 86-90 5 4 inhabitants by age and sex 81-85 3 (left: original right: anonymous) 76-80 2 20 1 71-75 66-70 female 61-65 male 56-60 15 51-55 46-50 persons 41-45 10 36-40 31-35 26-30 5 21-25 16-20 11-15 0 6-10 0-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50 51-55 56-60 61-65 66-70 71-75 76-80 81-85 86-90 91-95 96-100 0-5 15 10 5 0 0 5 10 15 male female Age in 5 years Idea of SAFE - solution (2) • no real data, only triplets • but the analysis should be as similar as possible to the original micro data file anonymous State Statistical InstituteBerlin-Brandenburg

  6. Mathematical model of SAFE (1) with: y - vector of frequencies of category combinations in the micro data a - vector of original frequencies in controlled table cells A - linear relation matrix d - vector of deviation trough tabulation of anonymised micro data instead of original data di as deviation in table cell j w - vector of weights for table cells State Statistical InstituteBerlin-Brandenburg

  7. Adaption for the census different units: persons, housings, buildings are different units of the census with hierarchical dependencies. • Splitting the micro data set in different variable blocks(persons: region, age, sex, profession, ...housing: region, heating (oil, gas,...), bath, ...building: construction year, size, ...) • Counting variable for person, housing, building. Create control-table with counting variables. Persons were splitted from housing and building information. Housing and building in one data set with different counting variables. State Statistical InstituteBerlin-Brandenburg

  8. SAFE – tests with census 1987 West Germany (1) micro data file for persons register State Statistical InstituteBerlin-Brandenburg

  9. SAFE – tests with census 1987 West Germany (2) micro data file for persons register State Statistical InstituteBerlin-Brandenburg

  10. SAFE – tests with census 1987 West Germany (1) micro data file for housing and building State Statistical InstituteBerlin-Brandenburg

  11. SAFE – tests with census 1987 West Germany (2) micro data file for housing and building State Statistical InstituteBerlin-Brandenburg

  12. Interpretation of results SAFE solutions are: • Like “noise” added to table cells. • Maximal deviation is known and documented. • Relative deviation in cells decreases in greater table cell values. • Missing combinations are unlikely but not sure not existing. • Unique combinations in table row (line or column) do not allow an information gain (group disclosure problem) through not sure uniqueness in the original data. • Good preservation of structure in the data. No missing information through complementary cell suppression. State Statistical InstituteBerlin-Brandenburg

  13. Thank you for your attention! State Statistical InstituteBerlin-Brandenburg

More Related