Intruder Testing: Demonstrating practical evidence of disclosure protection in the 2011 UK Census • Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Ottawa, 28-30 October 2013 • Keith Spicer, Caroline Tudor and George Cornish
Forthcoming Attractions • 2011 UK Census • SDC method: targeted record swapping • Sufficient uncertainty • Intruder testing: • Considerations • The intruders • Feedback • Validating Claims • Results • Conclusions
2011 UK Census • Context of user criticism in 2001 • Small cell adjustment • Poor utility of some outputs • Needed additivity and consistency • Evaluation of possible SDC methods • Record swapping selected • Swap households (individuals in communal establishments) • Targeted to ‘risky’ records • Swap rate sufficiently low to maintain utility
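The targeted swapping idea on this slide can be sketched as follows. This is a minimal illustration, not the ONS implementation: the `Household` fields, the `risky` flag, the choice of household size as the only matching variable, and the `targeted_swap` function are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Household:
    hh_id: int
    area: str     # small geography code (hypothetical)
    size: int     # matching variable assumed here: number of residents
    risky: bool   # hypothetical flag, e.g. household is unique on some key

def targeted_swap(households, swap_rate, seed=0):
    """Swap area codes between pairs of same-size households in different areas.

    Risky households are considered first; swapping stops once roughly
    swap_rate * len(households) households have been swapped.
    Returns the perturbed list and the set of swapped household ids.
    """
    rng = random.Random(seed)
    n_target = int(round(swap_rate * len(households)))
    # Targeting: risky households first, the rest in random order.
    order = sorted(households, key=lambda h: (not h.risky, rng.random()))
    area_of = {h.hh_id: h.area for h in households}
    swapped = set()
    for h in order:
        if len(swapped) >= n_target:
            break
        if h.hh_id in swapped:
            continue
        # Candidate partners: not yet swapped, same size, different area.
        partners = [p for p in households
                    if p.hh_id not in swapped
                    and p.size == h.size
                    and p.area != h.area]
        if not partners:
            continue
        p = rng.choice(partners)
        area_of[h.hh_id], area_of[p.hh_id] = p.area, h.area
        swapped.update({h.hh_id, p.hh_id})
    perturbed = [replace(h, area=area_of[h.hh_id]) for h in households]
    return perturbed, swapped
```

Because partners are matched on size and only area codes are exchanged, the number of households and the population count in each area are unchanged, which is one reason swapping keeps tables additive and consistent.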
Level of Protection: Sufficient Uncertainty • SRSA 2007 (Statistics and Registration Service Act 2007) – Personal information must not be disclosed • Impossible to get zero risk • There will be 1s, 2s and attribute disclosures in tables • Some will be real • Some will be fake • Census White Paper: “no statistics will be produced that allow the identification of an individual ... with a high degree of confidence” • Needs to be “sufficient uncertainty”
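To make "1s, 2s and attribute disclosures" concrete, here is a small sketch that scans a frequency table for them. The table layout (row label mapped to column counts), the `scan_table` function and the toy counts are assumptions for illustration only. An apparent attribute disclosure is taken here to mean a row whose whole population falls in a single column.

```python
def scan_table(table):
    """Return (small_cells, attribute_rows) for a dict-of-dicts frequency table."""
    small_cells = []
    attribute_rows = []
    for row, cols in table.items():
        nonzero = [(col, n) for col, n in cols.items() if n > 0]
        # Cells containing a count of 1 or 2.
        small_cells += [(row, col, n) for col, n in nonzero if n <= 2]
        # Everyone in the row shares one category: apparent attribute disclosure.
        if len(nonzero) == 1:
            attribute_rows.append((row, nonzero[0][0]))
    return small_cells, attribute_rows

# Toy counts, invented for the example (not census data).
toy_table = {
    "16-24": {"Employed": 12, "Unemployed": 1, "Inactive": 7},
    "25-64": {"Employed": 40, "Unemployed": 0, "Inactive": 0},
    "65+":   {"Employed": 2,  "Unemployed": 0, "Inactive": 25},
}
small, attribute = scan_table(toy_table)
```

After swapping, a flagged cell in a published table may or may not correspond to a real person in that area, which is the source of the uncertainty the slide describes.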
ICO Code of Practice • ICO = Information Commissioner’s Office, which oversees interpretation of the Data Protection and Freedom of Information Acts • Issued Code of Practice in light of the abortions FOI case • Encouraged empirical evidence of disclosure risk • Intruder testing of reconviction data by Ministry of Justice provided a steer
Intruder Testing - Considerations • Recruitment of intruders • Security of Census database • Creation of pre-publication tables • Tables for own Output area (c. 300 population) • Tables for own MSOA (c. 7,500 population) • Maps for local areas • Unrestricted internet access (2nd laptop) • Briefing material • Validating claims • Ethical considerations
Intruder Testing – The intruders • 18 intruders • ONS staff or contractors with security clearance • Few with SDC experience • All with excellent IT skills and adept with data • Range of grades up to Divisional Director • Range of local areas in England & Wales • Availability for at least ½ day
Intruder Testing – Other issues • Intruders’ claims • Only general feedback given • No specific claim confirmed or denied • Checking claims • Potentially about people the checker knows (e.g. a self-identification made by a work colleague) • Consent of intruders • Websites • Paying for access • Retaining search details (intruder’s identity) • Laptops wiped after each intruder
Intruder Feedback • For each claim: • Name of person • Address • Table and cell reference • Type of claim: identification or attribute (and which attribute) • Reasoning, variables, tables, websites used • Level of confidence in claim
Intruder Feedback • Intruders took between 1.5 and 6 hours • 16 of 18 intruders made at least one claim • >50 claims made in total • Tables looked sensible for their areas • Swap rate looked low • Generally intruders felt utility preserved
Validating Claims • Cell reference and table used to obtain form id • Form id used to find the Census image on the image database (very restricted access) • Claim counted as correct if name and address match • Check of logic used by intruder
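The validation chain above (table and cell to form id, form id to captured record, then a name and address comparison) might look like the sketch below. The `Claim` fields follow the feedback items listed on the earlier slide; the two lookup dictionaries, the `normalise` helper and the exact-match rule are assumptions, standing in for the restricted Census systems and for the checker's judgement.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    name: str
    address: str
    table: str
    cell: str
    claim_type: str   # "identification" or "attribute"
    confidence: int   # intruder's stated confidence, in percent

def normalise(text):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())

def validate(claim, cell_to_forms, form_to_record):
    """Return True if any form behind the claimed cell matches name and address.

    cell_to_forms:  hypothetical lookup {(table, cell): [form_id, ...]}
    form_to_record: hypothetical lookup {form_id: {"name": ..., "address": ...}}
    """
    for form_id in cell_to_forms.get((claim.table, claim.cell), []):
        record = form_to_record[form_id]
        if (normalise(record["name"]) == normalise(claim.name)
                and normalise(record["address"]) == normalise(claim.address)):
            return True
    return False
```

A real check would also review the intruder's reasoning, as the last bullet notes; a lucky guess and a sound inference both return `True` here.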
Results • [Chart: level of confidence in intruder’s claim]
Results • 48% of claims correct overall • Best success rate for claims made with 60-79% confidence (67% correct) • Self / family claims 61% correct (v 36% for others) • Very few attribute disclosure claims (<10%) • Tables used most: • Age x sex x industry • Age x sex x marital status • Age x sex x economic activity • Sex x industry x economic activity • Age x sex x health x disability
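Breakdowns like those above are simple grouped proportions over validated claims. The sketch below shows one way to compute them; the record layout, the `band` and `success_rate` helpers, and the 20-point confidence bands (only the 60-79% band is named in the slides) are assumptions, and any data passed in would be the analyst's own.

```python
from collections import defaultdict

def band(confidence):
    """Map a confidence percentage to an assumed 20-point band label."""
    lo = min(confidence // 20 * 20, 80)
    hi = 100 if lo == 80 else lo + 19
    return f"{lo}-{hi}%"

def success_rate(outcomes, key):
    """Proportion of correct claims within each group defined by key(outcome)."""
    totals = defaultdict(lambda: [0, 0])   # group -> [n_correct, n_claims]
    for outcome in outcomes:
        counts = totals[key(outcome)]
        counts[0] += outcome["correct"]
        counts[1] += 1
    return {group: c / n for group, (c, n) in totals.items()}
```

For example, `success_rate(outcomes, lambda o: band(o["confidence"]))` groups by confidence band, and `success_rate(outcomes, lambda o: o["about"])` would separate self / family claims from claims about others, given an `about` field on each record.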
How could so many claims be wrong? • Non-response • Imputation (both person and item) • Capture error (e.g. write-in date of birth) • Processing (esp. coding from free text) • Respondent error • Record swapping • Intruder error
Conclusions for Census • Fewer than half of claims correct • Fewer than half of “high confidence” claims correct • How much uncertainty is “sufficient”? • ICO have endorsed this work and said “risk is manageable” • Special attention to the most used tables and their “close relatives” • National Statistician is content • Communication strategy important
Conclusions for Intruder Testing • Useful for assessing risk empirically • Considerable resource needed • Need a lot of support • Would not suggest doing this for every output • Need assessment of what “success” looks like • Use in conjunction with theoretical work