Demonstrating practical evidence of disclosure protection in the 2011 UK Census through intruder testing. Covers considerations, intruder feedback, validation of claims, results, and conclusions, in the context of targeted record swapping and statistical data confidentiality.
Intruder Testing: Demonstrating practical evidence of disclosure protection in 2011 UK Census
Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Ottawa, 28-30 October 2013
Keith Spicer, Caroline Tudor and George Cornish
Forthcoming Attractions
• 2011 UK Census
• SDC method: targeted record swapping
• Sufficient uncertainty
• Intruder testing:
  • Considerations
  • The intruders
  • Feedback
  • Validating claims
• Results
• Conclusions
2011 UK Census
• Context of user criticism in 2001
• Small cell adjustment
• Poor utility of some outputs
• Needed additivity and consistency
• Evaluation of possible SDC methods
• Record swapping selected
• Swap households (individuals in communal establishments)
• Targeted to ‘risky’ records
• Swap rate sufficiently low to maintain utility
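The idea of targeted record swapping can be illustrated with a minimal sketch. It is not the ONS implementation: the `Household` fields, the numeric `risk` score, the rule of matching partners on household size, and the deterministic choice of the riskiest households are all assumptions made for illustration (a production method would more likely select records probabilistically).

```python
import random
from dataclasses import dataclass


@dataclass
class Household:
    hid: int
    area: str    # geographic code the household is published under
    size: int    # number of residents, used here as the matching variable
    risk: float  # hypothetical risk score: higher means easier to identify


def targeted_swap(households, swap_rate, seed=0):
    """Swap area codes of the riskiest households with size-matched
    households from a different area. Returns the ids of swapped households."""
    rng = random.Random(seed)
    n_targets = int(len(households) * swap_rate)
    # Targeting step: take the highest-risk households first.
    targets = sorted(households, key=lambda h: h.risk, reverse=True)[:n_targets]
    swapped = set()
    for h in targets:
        if h.hid in swapped:
            continue
        partners = [
            p for p in households
            if p.hid not in swapped
            and p.hid != h.hid
            and p.area != h.area
            and p.size == h.size
        ]
        if not partners:
            continue  # no suitable partner: leave this household unswapped
        p = rng.choice(partners)
        h.area, p.area = p.area, h.area
        swapped.update({h.hid, p.hid})
    return swapped
```

Because two households only exchange area codes, the number of households in each area is unchanged, which is one reason swapping keeps outputs additive and consistent.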
Level of Protection: Sufficient Uncertainty
• Statistics and Registration Service Act (SRSA) 2007 – Personal information must not be disclosed
• Impossible to get zero risk
• There will be 1s, 2s and attribute disclosures in tables
  • Some will be real
  • Some will be fake
• Census White Paper: “no statistics will be produced that allow the identification of an individual ... with a high degree of confidence”
• Needs to be “sufficient uncertainty”
ICO Code of Practice
• ICO = Information Commissioner’s Office, who oversee interpretation of Data Protection and Freedom of Information Acts
• Issued Code of Practice in light of abortions FOI case
• Encouraged empirical evidence of disclosure risk
• Intruder testing of reconviction data by Ministry of Justice provided a steer
Intruder Testing – Considerations
• Recruitment of intruders
• Security of Census database
• Creation of pre-publication tables
• Tables for own Output Area (c. 300 population)
• Tables for own MSOA (Middle Layer Super Output Area, c. 7,500 population)
• Maps for local areas
• Unrestricted internet access (2nd laptop)
• Briefing material
• Validating claims
• Ethical considerations
Intruder Testing – The intruders
• 18 intruders
• ONS staff or contractors with security clearance
• Few with SDC experience
• All with excellent IT skills, adept with data
• Range of grades up to Divisional Director
• Range of local areas in England & Wales
• Availability for at least ½ day
Intruder Testing – Other issues
• Intruders’ claims
• Only general feedback given
• No specific claim confirmed or denied
• Checking claims
• Potentially of people the checker knows (e.g. a self-identification made by a work colleague)
• Consent of intruders
• Websites
• Paying for access
• Retaining search details (intruder’s identity)
• Laptops wiped after each intruder
Intruder Feedback
• For each claim:
  • Name of person
  • Address
  • Table and cell reference
  • Type of claim: identification or attribute (and which attribute)
  • Reasoning, variables, tables, websites used
  • Level of confidence in claim
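The per-claim feedback listed above maps naturally onto a small record type. The sketch below is one possible way to capture it; the class and field names, and the choice to store confidence as a percentage, are assumptions rather than the form actually used in the exercise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ClaimType(Enum):
    IDENTIFICATION = "identification"
    ATTRIBUTE = "attribute"


@dataclass
class Claim:
    name: str             # name of the person the intruder claims to have found
    address: str
    table: str            # table the claim is based on
    cell_ref: str         # cell within that table
    claim_type: ClaimType
    reasoning: str        # variables, tables and websites used
    confidence_pct: int   # intruder's own confidence, assumed to be 0-100
    attribute: Optional[str] = None  # required only for attribute claims

    def __post_init__(self):
        if not 0 <= self.confidence_pct <= 100:
            raise ValueError("confidence_pct must be between 0 and 100")
        if self.claim_type is ClaimType.ATTRIBUTE and not self.attribute:
            raise ValueError("an attribute claim must name the attribute")
```

Rejecting incomplete records at creation time means every claim that reaches the checking stage carries the information needed to validate it.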
Intruder Feedback
• Intruders took between 1.5 and 6 hours
• 16 of 18 intruders made at least one claim
• More than 50 claims made in total
• Tables looked sensible for their areas
• Swap rate looked low
• Generally intruders felt utility was preserved
Validating Claims
• Cell reference and table used to obtain form id
• Form id used to find the Census image on the image database (very restricted access)
• Claim is correct if name and address match
• Check of logic used by intruder
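The checking chain described above (table and cell, then form id, then captured image, then name and address comparison) might be sketched as follows. The two dictionaries stand in for the Census database and the restricted image database; their shapes, the `normalise` helper and the sample identifiers are hypothetical, and the final review of the intruder's logic is a manual step that is not modelled.

```python
def normalise(text):
    """Lower-case, drop commas and collapse whitespace before comparing."""
    return " ".join(text.lower().replace(",", " ").split())


def validate_claim(claim, cell_to_forms, form_images):
    """Return True if any form behind the claimed cell matches the claimed
    name and address.

    claim         -- dict with keys: name, address, table, cell_ref
    cell_to_forms -- hypothetical lookup: (table, cell_ref) -> list of form ids
    form_images   -- hypothetical lookup: form id -> dict with name, address
    """
    form_ids = cell_to_forms.get((claim["table"], claim["cell_ref"]), [])
    for form_id in form_ids:
        record = form_images.get(form_id)
        if record is None:
            continue
        if (normalise(record["name"]) == normalise(claim["name"])
                and normalise(record["address"]) == normalise(claim["address"])):
            return True
    return False
```

A cell is mapped to a list of form ids because small cells may contain a count of 2 as well as 1; a claim is treated as correct if it matches any record in the cell.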
Results
[Figure: results by level of confidence in intruder’s claim; content not recoverable from the extracted text]
Results
• 48% of claims correct overall
• Best success rate for claims made with 60-79% confidence (67% correct)
• Self / family 61% correct (v 36% other)
• Very few attribute disclosure claims (<10%)
• Tables used most:
  • Age x sex x industry
  • Age x sex x marital status
  • Age x sex x economic activity
  • Sex x industry x economic activity
  • Age x sex x health x disability
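Success rates of the kind reported above can be tallied from validated claims grouped by the intruder's stated confidence. The sketch below shows the calculation only; the 20-point band edges are an assumption (only the 60-79% band is named in the slides), and any data passed in would need to come from the validated claims, which are not reproduced here.

```python
from collections import defaultdict

# Assumed lower edges of the confidence bands; only 60-79% is named in the slides.
BAND_EDGES = (0, 20, 40, 60, 80)


def band_label(confidence_pct):
    """Return the label of the band containing confidence_pct (0-100)."""
    lower = max(edge for edge in BAND_EDGES if edge <= confidence_pct)
    upper = 100 if lower == BAND_EDGES[-1] else lower + 19
    return f"{lower}-{upper}%"


def success_rates(outcomes):
    """outcomes: iterable of (confidence_pct, was_correct) pairs.
    Returns {band label: share of claims in that band that were correct}."""
    totals = defaultdict(lambda: [0, 0])  # band -> [correct, all claims]
    for confidence_pct, was_correct in outcomes:
        counts = totals[band_label(confidence_pct)]
        counts[0] += int(was_correct)
        counts[1] += 1
    return {band: correct / n for band, (correct, n) in totals.items()}
```

Reporting the rate per band, rather than only the overall rate, is what allows a statement such as "fewer than half of high-confidence claims were correct".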
How could so many claims be wrong?
• Non-response
• Imputation (both person and item)
• Capture error (e.g. write-in date of birth)
• Processing (esp. coding from free text)
• Respondent error
• Record swapping
• Intruder error
Conclusions for Census
• Fewer than half of claims correct
• Fewer than half of “high confidence” claims correct
• How much uncertainty is “sufficient”?
• ICO have endorsed this work and said “risk is manageable”
• Special attention to the most used tables and their “close relatives”
• National Statistician is content
• Communication strategy important
Conclusions for Intruder Testing
• Useful for assessing risk empirically
• Considerable resource needed
• Need a lot of support
• Wouldn’t suggest doing for every output
• Need assessment of what “success” looks like
• Use in conjunction with theoretical work