WP. 46 Providing access to data and making microdata safe: experiences of the ONS. Jane Longhurst, Paul Jackson (ONS)
The Statistical Disclosure Control Problem [Figure: disclosure risk plotted against data utility, marking Original Data, Accessed Data, No data, and the Maximum Tolerable Risk threshold.]
Protecting and Providing Access to Microdata Legal Issues Policy Issues The Data Test and Evaluate Risk Assessment Risk Management Output
Legal Issues Legal Context • No general statistics act • No comprehensive business register • No population register • Registrations of Births, Marriages and Deaths are public – including cause of death • A system of common law • An Information Commissioner – a privacy and an access to information champion with court powers • Data Protection / Human Rights • Freedom of Information
Legal Issues Legal Context continued: • Business Surveys have statutory protection • But ONS has the lawful authority to disclose identified business survey data to any central government department for any purpose, and to any local authority for their planning purposes.
Legal Issues Legal Context continued: • Census records have statutory protection • But ONS has lawful authority to disclose personal census information to any person for statistical purposes.
Legal Issues Legal Context continued: • Household survey records are protected by the civil “common law duty of confidence.” • But ONS has lawful authority to disclose identifying household survey data to any person where there is informed consent. • And ONS survey pledges obtain consent for disclosures of ‘detailed but anonymised data’ to any genuine researcher.
Legal Issues Legal Context This extraordinary authority to disclose identifying microdata to certain persons, departments and authorities only delays the real issue: • The access needs management, hence the Microdata Release Panel (MRP) • When it is not ONS applying the SDC standards for outputs, then someone else has to • Therefore usable standards and guidance are essential
Legal Issues Legal Context • So when ONS has so many options, how does it decide – • i) who should have controlled access under what conditions, and • ii) what ONS or other users’ outputs should look like. So we need Policy
Protecting and Providing Access to Microdata Legal Issues Policy Issues The Data Test and Evaluate Risk Assessment Risk Management Output
Policy Issues So we need Policy • National Statistics Code of Practice for the GSS • Protocol for data access and confidentiality • A Confidentiality Guarantee: National Statistics are guaranteed to be unlikely to identify an individual, assuming an intruder is prepared to use a proportionate amount of time, effort and expertise. • Departmental policy • Variations according to considerations of: data source type; risk analysis and management; methodology; access / release options
Protecting and Providing Access to Microdata Legal Issues Policy Issues The Data Test and Evaluate Risk Assessment Risk Management Output
Risk Assessment • An element of disclosure risk comes from records that are unique in the sample and in a known population • Several approaches to assessing the disclosure risk in microdata: • Disclosure risk scenarios • Variable checklist • Quantitative risk measures
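A minimal sketch of the first step, counting records that are unique in the sample on a chosen set of key variables. The toy records, the variable names and the function name `sample_uniques` are illustrative assumptions, not ONS data or tooling.

```python
from collections import Counter

def sample_uniques(records, key_vars):
    """Return the indices of records whose key-variable combination occurs exactly once."""
    keys = [tuple(r[v] for v in key_vars) for r in records]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] == 1]

# Hypothetical toy sample (assumption: these variables act as key variables)
sample = [
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 71, "sex": "M", "region": "South"},
    {"age": 29, "sex": "M", "region": "North"},
    {"age": 29, "sex": "M", "region": "North"},
]

unique_idx = sample_uniques(sample, ["age", "sex", "region"])
print(unique_idx)  # [2]
```

A sample unique is only a disclosure concern to the extent that it is also rare or unique in the population, which is what the quantitative measures try to estimate.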
Disclosure Risk Scenarios • Identify possible situations where disclosure risk could occur • Assumptions concerning prior knowledge of intruder and information available to him, e.g. private database, journalist, nosy neighbour • Identify key variables - indirectly identifying variables • Use this process to decide what needs to be protected against • can be complex • requires discussion and judgement
SDC Checklist for Microdata Release • Level of geography • Ethnic classification • Detail of occupation • Visible variables • Traceable variables • Survey design • Dissemination
Quantitative Risk Assessment • Recognised need for quantitative risk measures • Research project initiated • Need for individual and global risk measures • Problem for sample microdata is that population is an unknown parameter • Different methods for estimating the disclosure risk measures • Heuristics • Probabilistic models
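One common way to write individual and global risk measures is sketched below. The notation is an assumption introduced here for illustration and is not taken from the slides: cells k are the combinations of key variables, f_k is the sample count in cell k and F_k is the (unknown) population count.

```latex
% f_k: sample count in cell k of the cross-classified key variables
% F_k: population count in the same cell (unknown for sample microdata)
\[
  r_k = \frac{1}{F_k}
  \quad \text{(individual risk for a sample-unique record in cell } k\text{)}
\]
\[
  \tau_1 = \sum_k I(f_k = 1,\ F_k = 1), \qquad
  \tau_2 = \sum_k I(f_k = 1)\,\frac{1}{F_k}
\]
```

Here the first global measure counts sample uniques that are also population uniques, and the second is the expected number of correct matches if each sample unique were matched at random within its population cell. Because F_k is unobserved, both have to be estimated.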
Probabilistic Modelling • Estimate the disclosure risk based on natural assumptions about the distribution of the population • Provides linked estimates of individual and global risk measures • Research focused on • Model selection techniques • Robustness of estimates • Goodness of fit criteria • Tested on ONS social surveys
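As a hedged illustration of the probabilistic approach, the sketch below assumes a Poisson model: the population count in a cell is Poisson with mean `lam`, and each population unit is sampled independently with probability `pi`. Under those assumptions the population count given a sample unique is one plus a Poisson variable with mean `lam * (1 - pi)`. The cell means are taken as given here; in practice they would come from a fitted model, and the function names are hypothetical.

```python
import math

def risk_given_sample_unique(lam, pi):
    """Return (P(F=1 | f=1), E[1/F | f=1]) under the assumed Poisson model."""
    mu = lam * (1.0 - pi)  # mean number of unsampled population units in the cell
    p_pop_unique = math.exp(-mu)
    exp_inv_f = 1.0 if mu == 0 else (1.0 - math.exp(-mu)) / mu
    return p_pop_unique, exp_inv_f

def global_risk(lams_for_sample_uniques, pi):
    """Sum the per-record measures over all sample-unique records."""
    per_record = [risk_given_sample_unique(lam, pi) for lam in lams_for_sample_uniques]
    tau1 = sum(p for p, _ in per_record)  # expected number of population uniques
    tau2 = sum(m for _, m in per_record)  # expected number of correct matches
    return tau1, tau2

# Hypothetical fitted cell means for three sample-unique records, 10% sample
tau1, tau2 = global_risk([0.5, 2.0, 10.0], pi=0.1)
print(round(tau1, 3), round(tau2, 3))
```

The same per-record quantities feed both the individual and the global measures, which is one sense in which the estimates are linked.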
Heuristics • DIS/SUDA method consists of two elements • DIS - file level assessment of risk • SUDA - grades and orders records within a file according to level of risk • Provides the contribution of each variable and variable value to the risk • Implemented by ONS for 2001 Census SAR
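The record-grading idea can be sketched by searching for the minimal sets of key variables on which a record is unique in the sample. This is a simplified brute-force illustration, not the SUDA algorithm itself; the toy data, the function name and the scoring weight are hypothetical assumptions.

```python
from collections import Counter
from itertools import combinations

def minimal_sample_uniques(records, key_vars):
    """Map each record index to the minimal variable subsets on which it is unique."""
    msus = {i: [] for i in range(len(records))}
    for size in range(1, len(key_vars) + 1):  # smallest subsets first
        for subset in combinations(key_vars, size):
            keys = [tuple(r[v] for v in subset) for r in records]
            counts = Counter(keys)
            for i, k in enumerate(keys):
                if counts[k] != 1:
                    continue
                # Keep the subset only if it contains no smaller unique subset
                if not any(set(found) <= set(subset) for found in msus[i]):
                    msus[i].append(subset)
    return msus

# Hypothetical toy file
records = [
    {"age": "30-39", "sex": "F", "occ": "teacher"},
    {"age": "30-39", "sex": "M", "occ": "teacher"},
    {"age": "40-49", "sex": "F", "occ": "nurse"},
    {"age": "40-49", "sex": "M", "occ": "nurse"},
    {"age": "30-39", "sex": "F", "occ": "nurse"},
]
key_vars = ["age", "sex", "occ"]
msus = minimal_sample_uniques(records, key_vars)

# Hypothetical score: smaller and more numerous unique subsets weigh more
scores = {i: sum(2 ** (len(key_vars) - len(s)) for s in subsets)
          for i, subsets in msus.items()}
ranked = sorted(scores, key=lambda i: -scores[i])
print(ranked, scores)
```

Counting how often each variable appears in these minimal subsets gives a rough view of which variables contribute most to the risk.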
Evaluation of Quantitative Risk Measures • Simulate sample surveys from Census data • Compare risk measures with true risk • Practical considerations • How to set thresholds • Incorporate risk measures into MRP decision process
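The evaluation design can be sketched as follows: treat a complete file as the population, draw repeated samples from it, and compute the true risk for each sample so that an estimator's output can be compared against it. The synthetic population, the sampling fraction and the function name are assumptions made for illustration; no real Census data is involved.

```python
import random
from collections import Counter

def simulate_true_risk(population_keys, sampling_fraction, n_reps=200, seed=0):
    """Draw repeated simple random samples and return the true risk for each.

    Each result is (number of sample uniques, tau1, tau2), where tau1 counts
    sample uniques that are population uniques and tau2 sums 1/F_k over them.
    """
    rng = random.Random(seed)
    pop_counts = Counter(population_keys)
    n = round(len(population_keys) * sampling_fraction)
    results = []
    for _ in range(n_reps):
        sample_counts = Counter(rng.sample(population_keys, n))
        uniques = [k for k, c in sample_counts.items() if c == 1]
        tau1 = sum(1 for k in uniques if pop_counts[k] == 1)
        tau2 = sum(1.0 / pop_counts[k] for k in uniques)
        results.append((len(uniques), tau1, tau2))
    return results

# Synthetic stand-in for a census file: (age band, sex, area) per person
pop_rng = random.Random(1)
population = [(pop_rng.randint(0, 9), pop_rng.choice("MF"), pop_rng.randint(0, 19))
              for _ in range(2000)]

results = simulate_true_risk(population, sampling_fraction=0.1, n_reps=50)
mean_tau2 = sum(t2 for _, _, t2 in results) / len(results)
print(round(mean_tau2, 2))
```

An estimated risk measure computed from each sample alone would then be set against these true values to judge bias and variability.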
Protecting and Providing Access to Microdata Legal Issues Policy Issues The Data Test and Evaluate Risk Assessment Risk Management Output
SDC for Microdata • Perturbative methods • Record swapping • Adding noise • Non-perturbative methods • Recoding • Suppression • Sub-sampling • Mixed strategies • ONS mainly implements recoding • PRAM implemented for 2001 Census SAR
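Two of the listed methods can be sketched briefly: recoding (a non-perturbative method, here banding age) and PRAM (a perturbative method that replaces a categorical value at random according to a transition matrix). The band width, the categories and the transition probabilities below are hypothetical and are not the values used for any ONS release.

```python
import random

def recode_age(age, width=10):
    """Global recoding: replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def pram(values, transition, rng):
    """Replace each value by a draw from its row of the transition matrix."""
    out = []
    for v in values:
        categories = list(transition[v].keys())
        weights = list(transition[v].values())
        out.append(rng.choices(categories, weights=weights, k=1)[0])
    return out

# Hypothetical transition matrix: each row sums to 1
transition = {
    "single":  {"single": 0.90, "married": 0.05, "widowed": 0.05},
    "married": {"single": 0.05, "married": 0.90, "widowed": 0.05},
    "widowed": {"single": 0.05, "married": 0.05, "widowed": 0.90},
}

ages = [34, 7, 71]
marital = ["single", "married", "widowed", "married"]

print([recode_age(a) for a in ages])  # ['30-39', '0-9', '70-79']
print(pram(marital, transition, random.Random(0)))
```

Recoding reduces detail for every record in the same way, whereas PRAM leaves the level of detail unchanged and introduces uncertainty about whether any given value is the original one.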
Access Options - SPECIALISTS Data Laboratory • Only government can use identifying business micro-data • Identified census data is high risk • Hence the on-site lab and the employment contracts • Only safe data can leave the laboratory. • Approx 150 users/yr
Access Options - GOVERNMENT Access Agreements in central and local government. • UK is a devolved statistical system • ONS discloses identifying survey micro-data to other government departments for statistics and research purposes • Users are professionals like us, subject to the same Code of Practice, and the same laws. • We don’t screen for research validity • We don’t check outputs • Approximately 300 disclosures of confidential micro-data every year • No known breaches of confidentiality.
Access Options - RESEARCH For the academic researchers, the UK Data Archive • If it didn’t exist, we’d have to invent it. • All ONS household survey datasets are deposited with UKDA • Year of birth, regional geography, all other variables (limited coding) • Some large households removed • Academic researchers and government departments can download the dataset upon signing a user license. Takes about an hour. • This year, 16,600 downloads have taken place. Each can have up to 10 users in the institution. • ONS does not screen the license applications • ONS does not vet the research proposals • ONS does not check outputs • In place for 30 years now • No known instance of wrongful identification.
Access Options The UK Data Archive, cont’d But this is not enough. • So ONS has now created the ‘Special License’ • Month of birth • Local authority geography • All households • Still access by downloading the data. • ONS does check each Special License application • But not for valid research, only data needs • And we still don’t check any outputs
Access Options - PUBLIC For the Public, Freedom of Information • ONS can only withhold microdata where its disclosure to an applicant would be likely to result in: • A breach of any law it was collected under • An actionable breach of confidence • A breach of a data protection principle • The Scottish Information Commissioner has instructed the Scottish Health Service to disclose to an applicant the counts of leukaemia in under-14-year-olds by Ward (average ward population approx 4,000) • The table was all 1s and zeros: effectively microdata, and ‘safe’.
Access Options Are ONS access options and practices reasonable? • They follow the constructs used by the Courts and Information Commissioners, in that policies are written in plain English • Licensed academic users are, in 30 years of experience, not intruders. They are trusted colleagues – and like us they can make mistakes sometimes. • Other civil service professionals are not intruders – they are as reliable and trustworthy as we are. They too have professional codes of conduct, ethics, and moral principles • All statisticians and researchers need clear rules, and should be trusted to follow them.
Protecting and Providing Access to Microdata Legal Issues Policy Issues The Data Test and Evaluate Risk Assessment Risk Management Output
OUTPUTS • Whatever access privileges • Whatever research topic • Whoever you are • Outputs must be protected to the same standards • The best research is carried out when the richest microdata is made available to those who can be trusted to apply these standards for outputs