
The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM. Researcher Janika Konnu Manchester, United Kingdom 17-19 December 2007. Outline. Data SDC-methods Results Conclusions Forthcoming research. Data used in the study.


Presentation Transcript


  1. The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM • Researcher Janika Konnu • Manchester, United Kingdom, 17-19 December 2007

  2. Outline • Data • SDC-methods • Results • Conclusions • Forthcoming research

  3. Data used in the study • The teacher data were originally collected for administrative purposes. • Only high school teachers (N = 7798) were included in our study. • The data included information about • teachers: age, gender, position, etc. • the schools in which those teachers taught: location of the school, number of students, etc.

  4. SDC Methods: Microaggregation • First, the data are divided into groups of k observations, and the group averages are released instead of the original values of the variable. • The MDAV algorithm was used for grouping: it finds the average observation with respect to the values and forms groups using the distance from this average observation. • Grouping is the crucial step of this method: when each group contains the most similar observations, information loss is minimised. • In our study, microaggregation was applied to categorical data, although it is intended for numerical data.
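The grouping-and-averaging idea on slide 4 can be illustrated with a deliberately simplified sketch. It is not the MDAV algorithm used in the study: it assumes a single numerical variable, sorts the values, and cuts them into consecutive groups of k. The function name `microaggregate` and the example ages are hypothetical.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours.

    Simplified univariate sketch (NOT MDAV): groups are consecutive runs of
    k values in sorted order; a short final group is merged into the previous one.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the observations, ordered by value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Every released group must contain at least k observations.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    protected = [0.0] * len(values)
    for group in groups:
        average = mean(values[i] for i in group)
        for i in group:
            protected[i] = average
    return protected


# Hypothetical ages, not taken from the teacher data.
ages = [25, 61, 27, 58, 40, 42, 26]
protected_ages = microaggregate(ages, k=3)
print(protected_ages)
```

Because every value is replaced by its group mean, the total (and therefore the overall mean) of the variable is unchanged, while individual values are only released in groups of at least k.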

  5. SDC Methods: The Post RAndomization Method (PRAM) • The method changes the values of a variable according to a probability matrix (Markov matrix). • Example: [matrix not included in the transcript] • When PRAM is applied, the data user must take the probability matrix into account in order to obtain correct results. • In our study, we tested the usefulness of PRAM when the probability matrix is not used in the analysis.
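A minimal sketch of how PRAM perturbs a categorical variable is given here. The matrix `P`, the category labels, and the function name `pram` are hypothetical and do not come from the study; each row of `P` gives the probabilities of releasing each category for a given original category.

```python
import random


def pram(values, transition, rng):
    """Replace each category by a random draw from its row of the Markov matrix."""
    protected = []
    for value in values:
        row = transition[value]
        categories = list(row)
        weights = [row[c] for c in categories]
        protected.append(rng.choices(categories, weights=weights, k=1)[0])
    return protected


# Hypothetical 2x2 Markov matrix: each value is kept with probability 0.9.
P = {
    "F": {"F": 0.9, "M": 0.1},
    "M": {"F": 0.1, "M": 0.9},
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = ["F"] * 700 + ["M"] * 300
protected = pram(original, P, rng)
changed = sum(o != p for o, p in zip(original, protected))
print(f"{changed} of {len(original)} values were changed")
```

With an identity matrix no value changes, so the amount of protection is governed entirely by the off-diagonal probabilities.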

  6. Empirical work: μ-Argus software • The software includes disclosure risk measurement and the following methods: global recoding, local suppression, top and bottom coding, PRAM, numerical microaggregation, numerical rank swapping and Sullivan masking. • The software produces protected data if suppressions are allowed. • In our case, only the SDC-methods PRAM and numerical microaggregation were studied. No suppressions were made, because we needed information on the difference between the original and the protected data.

  7. Results: Data protected by microaggregation • Group sizes used in protection: 2, 5, 8, 10 and 15 • Microaggregation does not have an effect on frequencies. • Unfortunately, this implies that hardly any changes occur in the values. • Conclusion: microaggregation does not give strong enough protection when it comes to categorical data.

  8. Results: Data protected by PRAM (no bandwidth) • Changing probabilities: 0.05, 0.10, 0.20, 0.30 and 0.40 • PRAM changes the values of variables, and in that way the data are protected. • Unfortunately, PRAM leads to problems when the categories differ greatly in frequency: the larger frequencies keep shrinking and the smaller ones keep growing.
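The frequency drift described on slide 8 can be reproduced with expected values alone. The sketch below assumes that a value is changed with probability p and then moves to each of the other categories with equal probability; the transcript does not state how the matrix in the study was built, so this is an assumption, and the counts are hypothetical.

```python
def expected_counts(counts, p):
    """Expected category counts after PRAM without bandwidth.

    Assumed matrix: a value stays with probability 1 - p and otherwise moves
    uniformly to one of the other categories.
    """
    categories = list(counts)
    m = len(categories)
    if m < 2:
        return dict(counts)
    expected = {}
    for target in categories:
        stay = counts[target] * (1 - p)
        inflow = sum(counts[c] for c in categories if c != target) * p / (m - 1)
        expected[target] = stay + inflow
    return expected


# Hypothetical, strongly unbalanced frequencies.
counts = {"A": 900, "B": 80, "C": 20}
for p in (0.05, 0.10, 0.20, 0.30, 0.40):
    rounded = {c: round(v, 1) for c, v in expected_counts(counts, p).items()}
    print(p, rounded)
```

Under this assumption the dominant category loses far more observations than it receives, and the small categories gain, which matches the pattern reported on the slide.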

  9. Results: Data protected by PRAM (bandwidth is 2) • Changing probabilities: 0.05, 0.10, 0.20, 0.30 and 0.40 • Restricting the change of values cannot solve the problem with differences in frequencies. • Our study shows that the frequencies of the categories next to the one with the largest frequency still grow too fast.
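The effect of a bandwidth restriction can be sketched in the same way. The sketch assumes that "bandwidth is 2" means a changed value may only move to a category at most two positions away in the ordered category list, with equal probability for each allowed target; this reading, the function names, and the counts are assumptions for illustration.

```python
def band_matrix(m, p, bandwidth):
    """Row-stochastic matrix over m ordered categories.

    A value stays with probability 1 - p; otherwise it moves uniformly to a
    category at most `bandwidth` positions away (assumed definition).
    """
    matrix = []
    for i in range(m):
        neighbours = [j for j in range(m) if j != i and abs(i - j) <= bandwidth]
        row = [0.0] * m
        if not neighbours:
            row[i] = 1.0  # nowhere to move, so the value always stays
        else:
            row[i] = 1.0 - p
            for j in neighbours:
                row[j] = p / len(neighbours)
        matrix.append(row)
    return matrix


def expected_after_pram(counts, matrix):
    """Expected counts per category: column j collects inflow from every row i."""
    m = len(counts)
    return [sum(counts[i] * matrix[i][j] for i in range(m)) for j in range(m)]


# Hypothetical ordered categories with one dominant category in position 2.
counts = [10, 50, 800, 50, 10, 10, 10]
banded = expected_after_pram(counts, band_matrix(len(counts), p=0.2, bandwidth=2))
print([round(v, 1) for v in banded])
```

In this example the categories within two positions of the dominant one absorb all of its outflow and grow, while the categories further away stay close to their original frequencies.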

  10. Results: Data protected by PRAM • No bandwidth • Bandwidth is 2

  11. Conclusion: Microaggregation • Microaggregation performs well with numerical data, but its application to categorical data needs more research. • Data protected by microaggregation contain almost the same information as the original data. • Are we sure that microaggregation is able to protect categorical data properly?

  12. Conclusion: PRAM • PRAM seems to perform quite well when it comes to protecting data, but there are some issues to overcome. • PRAM can protect data even with small changing probabilities, because its protection is based on uncertainty of identification. • In this case, our concern is information loss: are the protected data useful without using the probability matrix?
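To show what "using the probability matrix" can mean in tabulation, the sketch below corrects a two-category frequency table. It relies on the standard relation that the expected protected counts equal the transposed matrix times the original counts, so solving that linear system gives an unbiased estimate of the original counts. The function name and the numbers are hypothetical, and only the 2x2 case is covered.

```python
def correct_two_category_counts(observed, P):
    """Estimate original counts t from protected counts by solving P^T t = observed.

    P is a 2x2 row-stochastic matrix: P[i][j] is the probability that original
    category i is released as category j.
    """
    (a, b), (c, d) = P
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular; counts cannot be corrected")
    o0, o1 = observed
    # Inverse of P^T = [[a, c], [b, d]] is (1 / det) * [[d, -c], [-b, a]].
    t0 = (d * o0 - c * o1) / det
    t1 = (-b * o0 + a * o1) / det
    return t0, t1


# Hypothetical example: true counts (900, 100), changing probability 0.2.
P = [[0.8, 0.2], [0.2, 0.8]]
expected_protected = (0.8 * 900 + 0.2 * 100, 0.2 * 900 + 0.8 * 100)  # (740, 260)
print(correct_two_category_counts(expected_protected, P))
```

Tabulating the protected data directly would report 740 and 260 in expectation, so ignoring the matrix understates the dominant category; the correction recovers the original table on average, at the cost of higher variance.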

  13. Forthcoming research • Include more methods • rank swapping • noise addition • Include disclosure risk measures • Include a more precise measure of information loss

  14. Some references • Domingo-Ferrer, J., Torra, V. 2001. A Quantitative Comparison of Disclosure Control Methods for Microdata. In Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Amsterdam: North-Holland. • Gouweleeuw, J., Kooiman, P., Willenborg, L., and de Wolf, P. 1998. Post Randomisation for Statistical Disclosure Control: Theory and Implementation. Journal of Official Statistics, Vol. 14, No. 4, pp. 463-478. • Group Crises. 2004. Research Reports: Microaggregation for Privacy Protection in Statistical Databases. Accessed July 2005. <http://vneumann.etse.urv.es/publications/reports/>. • Thank You!
