
WP 9 Assessing Disclosure Risk in Microdata using Record Level Measures

This presentation covers disclosure risk assessment for microdata using record-level risk measures that are aggregated to file-level measures. It discusses estimation under a log-linear model, model sensitivity, model choice (a bias criterion and over- or under-dispersion), and results for samples from the 2001 UK Census.


Presentation Transcript


  1. WP 9 Assessing Disclosure Risk in Microdata using Record Level Measures • Chris Skinner, University of Southampton, C.J.Skinner@soton.ac.uk • Natalie Shlomo, University of Southampton; Office for National Statistics, n.shlomo@soton.ac.uk

  2. Disclosure Risk Assessment for Microdata • Assume: sample; categorical key variables; no measurement error • Seek: record-level risk measures, aggregated to file-level measures

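  3. Record Level Measures • Each record has a combination of key variable values • Sample count: number of sample records with the same combination • Population count: number of population records with the same combination • Only sample-unique records are considered, i.e. records whose sample count is 1 • First measure: Pr(population unique) • Second measure: Pr(correct match)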
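
The slide text gives these measures in words only. As a sketch, assuming the notation common in this literature (the labels $k$, $f_k$, $F_k$, $r_{1k}$ and $r_{2k}$ are assumptions, not taken from the slides), the two record-level measures for a sample-unique record could be written as:

```latex
\begin{align*}
f_k &= \text{sample count in cell } k, \qquad
F_k = \text{population count in cell } k,\\
r_{1k} &= \Pr(F_k = 1 \mid f_k = 1) && \text{(population unique)},\\
r_{2k} &= \mathrm{E}\!\left(1/F_k \mid f_k = 1\right) && \text{(correct match)}.
\end{align*}
```

The second line reflects the idea that a sample-unique record matched at random to one of the $F_k$ population records sharing its key is matched correctly with probability $1/F_k$.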

  4. Aggregated File-level Measures • Expected number of population uniques in the sample • Expected number of correct matches among sample uniques to the population • Note: both measures are taken over sample uniques
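
To make the two file-level quantities concrete, here is a minimal sketch that computes their true values when the whole population is known. The toy records, the function name `file_level_measures`, and the reading of the second measure as a sum of 1/(population count) over sample uniques are all assumptions made for illustration.

```python
from collections import Counter

def file_level_measures(population, sample_idx):
    """Return (tau1, tau2) for a sample drawn from a fully known population.

    tau1: number of sample uniques that are also population uniques.
    tau2: sum over sample uniques of 1 / (population count of the same key),
          i.e. the assumed 'correct match' measure added up over the file.
    """
    pop_counts = Counter(population)                       # population count per key
    sample_counts = Counter(population[i] for i in sample_idx)  # sample count per key
    sample_uniques = [key for key, f in sample_counts.items() if f == 1]
    tau1 = sum(1 for key in sample_uniques if pop_counts[key] == 1)
    tau2 = sum(1.0 / pop_counts[key] for key in sample_uniques)
    return tau1, tau2

# Hypothetical toy population: each record is a (sex, age band) key.
population = [
    ("M", "20-29"), ("M", "20-29"), ("F", "20-29"),
    ("F", "30-39"), ("F", "30-39"), ("F", "30-39"), ("M", "30-39"),
]
sample_idx = [0, 2, 3, 6]   # positions of the sampled records
tau1, tau2 = file_level_measures(population, sample_idx)
print(tau1, tau2)
```

In practice the population counts are unknown, which is why the presentation turns to model-based estimation of these quantities.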

  5. Estimation Problem • To make inference about: record-level measures for each sample-unique record; file-level measures

  6. Log-linear Model • Counts are assumed independent given the model parameters • The sampling fraction enters the model • Parameters are estimated by maximum likelihood
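
The slide's formulas are not reproduced here, so the following is only a sketch under a commonly used assumption: population counts in a cell are Poisson with mean `lam`, and each unit is sampled independently with probability `pi`. Under that assumption the unsampled count in the cell is Poisson with mean m = (1 - pi) * lam, which gives Pr(population unique | sample unique) = exp(-m) and E(1/population count | sample unique) = (1 - exp(-m)) / m. The function and variable names are hypothetical.

```python
import math

def record_level_risks(lam, pi):
    """Record-level risks for a sample-unique cell under an assumed Poisson model.

    lam: (estimated) population mean count for the cell.
    pi:  sampling fraction, 0 < pi <= 1.
    Returns (r1, r2) = (Pr(population unique), Pr(correct match)).
    """
    m = (1.0 - pi) * lam                 # mean number of unsampled units in the cell
    r1 = math.exp(-m)                    # no unsampled unit shares the key
    r2 = 1.0 if m == 0 else -math.expm1(-m) / m   # E[1 / (1 + Poisson(m))]
    return r1, r2

def file_level_risks(lams, pi):
    """Sum the record-level risks over the sample-unique cells."""
    pairs = [record_level_risks(lam, pi) for lam in lams]
    return sum(r1 for r1, _ in pairs), sum(r2 for _, r2 in pairs)

# Hypothetical fitted means for three sample-unique cells, 2% sampling fraction.
tau1_hat, tau2_hat = file_level_risks([0.5, 2.0, 10.0], pi=0.02)
print(tau1_hat, tau2_hat)
```

In a real application the `lam` values would come from the fitted log-linear model rather than being supplied by hand.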

  7. Some Literature • Skinner and Holmes (1998, JOS): good properties of the estimated risk measure under the all two-way interactions log-linear model • Elamir and Skinner (2006, JOS): good properties of both estimated risk measures under the all two-way interactions model, but no need for the additional term

  8. Model Sensitivity • The all two-way interactions model performs well, but there is still evidence of some model dependence of the risk estimates in the neighborhood of this model • Tendency for estimated risk to decrease as model complexity increases

  9. Model Choice • Goodness of fit tests? • Pearson? • Likelihood ratio? • AIC, BIC? • Problems with very large and sparse tables
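
As an illustration of the candidate criteria listed above, the sketch below computes the Pearson statistic, the likelihood ratio (deviance) statistic, AIC and BIC for Poisson cell counts given fitted means. The function name, the toy counts, and the choice of `n_obs` in the BIC penalty are assumptions; conventions for `n_obs` differ between applications.

```python
import math

def fit_statistics(counts, fitted, n_params, n_obs):
    """Goodness-of-fit summaries for Poisson counts with positive fitted means."""
    pairs = list(zip(counts, fitted))
    pearson = sum((f - mu) ** 2 / mu for f, mu in pairs)
    # Cells with f == 0 contribute only the (f - mu) part of the deviance.
    deviance = 2.0 * sum(
        (f * math.log(f / mu) if f > 0 else 0.0) - (f - mu) for f, mu in pairs
    )
    loglik = sum(f * math.log(mu) - mu - math.lgamma(f + 1) for f, mu in pairs)
    return {
        "pearson": pearson,
        "deviance": deviance,
        "aic": -2.0 * loglik + 2.0 * n_params,
        "bic": -2.0 * loglik + n_params * math.log(n_obs),
    }

# Hypothetical four-cell table fitted by a constant-mean model (mean = 6 / 4).
counts = [3, 0, 1, 2]
fitted = [1.5, 1.5, 1.5, 1.5]
stats = fit_statistics(counts, fitted, n_params=1, n_obs=sum(counts))
print(stats)
```

With hundreds of thousands of cells and only a few thousand sampled records, most fitted means are far below 1, so the usual chi-squared reference distributions for the Pearson and likelihood ratio statistics become unreliable.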

  10. Bias Criterion • Allow for small departures from the model • Estimate the bias of the risk estimate • Choose the model to minimise the estimated bias • Similar to choosing the model to minimise over- or under-dispersion

  11. Minimising Over- (Under-) Dispersion • Under the model, one statistic estimates the degree of over- or under-dispersion • A second statistic tests the hypothesis of equal dispersion • Cameron and Trivedi (1998)
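
One well-known way to estimate and test dispersion in count models is the regression-based test associated with Cameron and Trivedi. Whether the slide uses exactly this form is an assumption; the sketch below is illustrative only. It considers the alternative Var(y) = mu + alpha * g(mu), regresses z = ((y - mu)^2 - y) / mu on g(mu) / mu without an intercept, and reports the slope `alpha` with its t statistic. Positive `alpha` suggests over-dispersion, negative `alpha` under-dispersion. All names are hypothetical.

```python
import math

def dispersion_test(counts, fitted, alternative="linear"):
    """Auxiliary-regression dispersion test for Poisson counts.

    alternative="linear":    g(mu) = mu,    regressor is 1.
    alternative="quadratic": g(mu) = mu**2, regressor is mu.
    Returns (alpha, t_statistic); t is NaN when the residual variance is zero.
    """
    z = [((y - mu) ** 2 - y) / mu for y, mu in zip(counts, fitted)]
    if alternative == "linear":
        w = [1.0] * len(z)
    elif alternative == "quadratic":
        w = [float(mu) for mu in fitted]
    else:
        raise ValueError("alternative must be 'linear' or 'quadratic'")
    sww = sum(wi * wi for wi in w)
    alpha = sum(wi * zi for wi, zi in zip(w, z)) / sww      # OLS slope, no intercept
    resid = [zi - alpha * wi for wi, zi in zip(w, z)]
    s2 = sum(e * e for e in resid) / (len(z) - 1)           # residual variance
    t_stat = alpha / math.sqrt(s2 / sww) if s2 > 0 else float("nan")
    return alpha, t_stat

# Hypothetical counts with a constant fitted mean of 2.
alpha_over, t_over = dispersion_test([0, 0, 0, 10, 0], [2.0] * 5)
alpha_even, t_even = dispersion_test([0, 1, 2, 3, 4], [2.0] * 5)
print(alpha_over, t_over, alpha_even, t_even)
```

A model search could then prefer candidate models whose estimated `alpha` is closest to zero, in the spirit of the slide's title.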

  12. Samples from 2001 UK Census • Two areas with population of 944,793 • ‘Large’ key: Area (2), Sex (2), Age (101), Marital Status (6), Ethnicity (17), Economic Activity (10); 412,080 cells • ‘Small’ key: same except Age (18); 73,440 cells
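
The cell counts follow from multiplying the numbers of categories of the key variables. A short check, using the category counts given on the slide (the dictionary and function names are illustrative):

```python
from math import prod

# Number of categories per key variable, as listed on the slide.
large_key = {
    "area": 2, "sex": 2, "age": 101,
    "marital_status": 6, "ethnicity": 17, "economic_activity": 10,
}
small_key = {**large_key, "age": 18}   # same key with age grouped into 18 bands

def n_cells(key):
    """Number of cells in the full cross-classification of the key variables."""
    return prod(key.values())

print(n_cells(large_key), n_cells(small_key))
```

Both tables have far more cells than any of the samples has records, so most cells are empty in the sample.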

  13. Small key, simple random sample of size 18,896 • True values: number of population uniques in the sample; sum of the correct-match measure over sample uniques

  14. Large key, simple random sample of size 4,724 • True values

  15. Model Search Algorithm • Starting solution: all 2-way interactions log-linear model • Search by: • Removing terms • Adding terms • Swapping terms • TABU method of Drezner, Marcoulides and Salhi (1999)
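
The search described above can be sketched as a simple tabu search over sets of model terms. This is not the algorithm of Drezner, Marcoulides and Salhi (1999) itself, only a minimal variant that keeps a short list of recently visited models. The `score` function is a hypothetical stand-in for whatever criterion is minimised (for example a dispersion measure from a fitted model), and the variable names and target model are invented for the example.

```python
from itertools import combinations

def neighbours(model, candidates):
    """Yield models reached by removing, adding, or swapping a single term."""
    inside = sorted(model)
    outside = sorted(candidates - model)
    for term in inside:
        yield model - {term}                         # remove a term
    for term in outside:
        yield model | {term}                         # add a term
    for term_out in inside:
        for term_in in outside:
            yield (model - {term_out}) | {term_in}   # swap two terms

def tabu_search(start, candidates, score, n_iter=50, tabu_len=5):
    """Move to the best non-tabu neighbour each step; remember the best model seen."""
    current = frozenset(start)
    best, best_score = current, score(current)
    tabu = [current]                                 # recently visited models
    for _ in range(n_iter):
        options = [m for m in neighbours(current, candidates) if m not in tabu]
        if not options:
            break
        current = min(options, key=lambda m: (score(m), sorted(m)))
        tabu.append(current)
        if len(tabu) > tabu_len:
            tabu.pop(0)                              # forget the oldest entry
        if score(current) < best_score:
            best, best_score = current, score(current)
    return best, best_score

variables = ["area", "sex", "age", "marital"]
two_way = frozenset(combinations(variables, 2))      # starting solution
three_way = frozenset(combinations(variables, 3))
candidates = two_way | three_way

# Hypothetical criterion: distance from an invented 'ideal' model.
target = (two_way - {("age", "marital")}) | {("area", "sex", "age")}
score = lambda model: len(model ^ target)

best_model, best_value = tabu_search(two_way, candidates, score)
print(sorted(best_model), best_value)
```

The tabu list stops the search from stepping straight back to a model it has just left, which lets it move through worse models without cycling.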

  16. Large key, simple random sample of size 9,448 • True values
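
The sample sizes quoted on slides 13, 14 and 16 correspond to round sampling fractions if 944,793 is taken as the total population of the two areas combined (an assumption about how the slide 12 figure should be read). A quick check:

```python
population_size = 944_793   # from slide 12, assumed to be the combined total

sample_sizes = {
    "slide 13 (small key)": 18_896,
    "slide 14 (large key)": 4_724,
    "slide 16 (large key)": 9_448,
}

# Sampling fraction = sample size / population size.
fractions = {label: n / population_size for label, n in sample_sizes.items()}

for label, frac in fractions.items():
    print(f"{label}: {frac:.4%}")
```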

  17. True values

  18. Record Level Risk Measures • Preferred model: {ea}{s*a}{s*m}{s*et}{s*ec}{a*m}{a*et}{a*ec}{m*et}{m*ec} • True global risk • Estimated global risk

  19. Record Level Risk Measures • Preferred model: {ea}{s*a}{s*m}{s*et}{s*ec}{a*m}{a*et}{a*ec}{m*et}{m*ec} • True global risk • Estimated global risk

  20. Conclusions • Model selection by assessing over- and under-dispersion • Similar risk estimates for models with nearly Poisson dispersion • Further work: stratification of files; complex survey designs
