This document provides an overview of the disclosure control measures implemented in the UK Census, including methods such as record swapping and small cell adjustment to protect confidentiality. It also discusses the challenges and lessons learnt in ensuring the privacy of individuals and households.
Disclosure Control in the UK Census Keith Spicer 11 January 2005
2 Contents • National Statistics Code of Practice • Background • 2001 Census Disclosure Control – tables • 2001 Samples of Anonymised Records • Summary and lessons learnt
3 “The information you provide is protected by law and treated in strict confidence” 2001 Census form “Precautions will be taken so that published tabulations and abstracts of statistical data do not reveal any information about identifiable individuals or households” 2001 Census White Paper Cm4523, para 120
4 National Statistics Code of Practice “The National Statistician will set standards for protecting confidentiality, including a guarantee that no statistics will be produced that are likely to identify an individual unless specifically agreed with them” “It would take a disproportionate amount of time, effort and expertise for an intruder to identify a statistical unit to others, or to reveal information about that unit not already in the public domain”
5 National Statistics Code of Practice The purpose of disclosure control is to ensure that no unauthorised individual, technically competent with public data and private information could: identify information on an individual that has been supplied in confidence to ONS (such as in census or survey returns) with a reasonable degree of confidence
6 National Statistics Code of Practice Identity Disclosure – the association of a respondent’s identity with a disseminated data record Attribute Disclosure – the association of a respondent with an attribute value in the disseminated data (or an estimated attribute value based on the disseminated data)
7 Background Disclosure Example 1 (table for widowed males aged 45-59, COB = not UK) The table is disclosive because: (1) The person who is not economically active and has no LLTI (limiting long-term illness) can be identified in the table, both by themselves and by others who know all the information (Identity Disclosure). (2) Any of these could then deduce that any other widowed male aged 45-59, with COB (country of birth) not UK and not economically active, has an LLTI.
8 Background Disclosure Example 2 The table is disclosive because: If you know someone who is Separated, Widowed or Divorced in Area B, you can deduce they have 1 Car. Information being disclosed (Attribute Disclosure)
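The attribute disclosure in Example 2 arises when every unit in a known group falls into a single category. A minimal sketch of that check follows; the function name and all counts are hypothetical, invented for illustration, and are not the figures from the slide.

```python
def attribute_disclosures(table):
    """Return {group: category} for groups whose members all share one category.

    `table` maps a row label (e.g. marital status) to a dict of
    column label -> count. A row with exactly one non-zero cell lets
    anyone who knows a person is in that row deduce the column value.
    """
    found = {}
    for group, counts in table.items():
        nonzero = [category for category, n in counts.items() if n > 0]
        if len(nonzero) == 1:
            found[group] = nonzero[0]
    return found


# Hypothetical counts for "Area B" (illustration only, not Census data).
area_b = {
    "Single": {"No car": 5, "1 car": 9, "2+ cars": 3},
    "Married": {"No car": 2, "1 car": 14, "2+ cars": 11},
    "Separated, widowed or divorced": {"No car": 0, "1 car": 4, "2+ cars": 0},
}

flagged = attribute_disclosures(area_b)
```

Here `flagged` contains only the "Separated, widowed or divorced" row, because all of its members have 1 car.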
9 Background Disclosure Example 3 (Area C contains two smaller areas, D and E) The tables are disclosive because, although no table is disclosive by itself, in combination they allow a similar table to be derived for Area E. The Area E table would have a 1 in the LLTI / Qual cell (Disclosure by Differencing).
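Disclosure by differencing can be sketched as a cell-by-cell subtraction. The example below assumes tables are published for Area C and for Area D, so that Area E is recovered as C minus D; the counts, the function names and the small-cell threshold of 2 are all assumptions made for illustration.

```python
def difference_tables(larger, smaller):
    """Subtract one table from another, cell by cell (same cell labels assumed)."""
    return {cell: larger[cell] - smaller[cell] for cell in larger}


def small_cells(table, threshold=2):
    """List cells with a non-zero count at or below the (assumed) threshold."""
    return sorted(cell for cell, n in table.items() if 0 < n <= threshold)


# Hypothetical counts (illustration only). No cell in C or D is small.
area_c = {("LLTI", "Qual"): 4, ("LLTI", "No qual"): 7,
          ("No LLTI", "Qual"): 12, ("No LLTI", "No qual"): 9}
area_d = {("LLTI", "Qual"): 3, ("LLTI", "No qual"): 4,
          ("No LLTI", "Qual"): 7, ("No LLTI", "No qual"): 5}

area_e = difference_tables(area_c, area_d)
risky = small_cells(area_e)
```

Neither input table contains a small cell, yet the derived Area E table has a count of 1 in the LLTI / Qual cell, which is the situation the slide describes.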
10 Background 1991 Census Barnardisation: adjustment of cells in tables by -1, 0 or +1, so that an observed 1 is not certain to be a true 1. However, there was still a good chance that an observed 1 was a ‘true’ 1. A degree of uncertainty about the accuracy of information apparently disclosed about an individual does not ensure that confidentiality has been completely protected.
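The weakness noted above can be illustrated with a small simulation. This is a sketch only: the perturbation probability `p`, the rule that counts never go below zero, and the mix of true counts are assumptions, since the slide does not give the 1991 parameters.

```python
import random


def barnardise(count, rng, p=0.1):
    """Add -1, 0 or +1 to a cell with probabilities p, 1 - 2p, p (p is assumed)."""
    delta = rng.choices([-1, 0, 1], weights=[p, 1 - 2 * p, p])[0]
    return max(0, count + delta)  # assumption: published counts are never negative


def share_of_true_ones(true_counts, rng, p=0.1):
    """Among cells published as 1, return the fraction whose true count was 1."""
    observed_ones = 0
    true_ones = 0
    for count in true_counts:
        if barnardise(count, rng, p) == 1:
            observed_ones += 1
            true_ones += count == 1
    return true_ones / observed_ones if observed_ones else float("nan")


rng = random.Random(0)
# Hypothetical mix of true cell values: equal numbers of 0s, 1s and 2s.
share = share_of_true_ones([0, 1, 2] * 1000, rng)
```

Under these assumptions most published 1s are still true 1s (the expected share is 0.8), which is why this kind of adjustment alone leaves "a good chance that an observed 1 was a true 1".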
11 Background Since 1991: Increased risk of disclosure in 2001:- • 2001 Census results more widely accessible, allowing Census data to be downloaded more freely • Electronic storage of other data sets now much easier – increased risk of Census data being matched with other sources
12 Background • More detail in 2001 Census outputs, as users wanted smaller areas and more flexible boundaries. The smallest areas for which data were provided were considerably smaller than the lowest level provided in 1991 • Changing attitudes to the trust in which public agencies are held • 2001 Census data 100% coded, as opposed to 10% (for some) in 1991; the 10% added a level of uncertainty to published results
13 2001 Census Disclosure Control PRE-TABULATION Changes made to data records prior to preparing tables. 2001 Census the first to consider pre-tabulation methods as part of disclosure control. Record swapping • Entire household record, except geographic variables, swapped with another in a neighbouring area (paired on number, sex and grouped age of persons) • Within LA, so does not affect stats at LA or above • No need for additional edit checks • Statistical differences less than volume of changes • Creates uncertainty about accuracy of identity
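A minimal sketch of record swapping, under stated assumptions: swapping two households' area codes is treated as equivalent to swapping everything except geography; any different area within the same LA stands in for a "neighbouring" area; and the swap `rate`, the `Household` fields and the function name are hypothetical, as the slide does not give them.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Household:
    hh_id: int
    la: str         # local authority (swaps never cross this boundary)
    area: str       # small area within the LA
    profile: tuple  # pairing key: (number of persons, sexes, grouped ages)
    cars: int       # example of a non-geographic variable


def swap_records(households, rate, rng):
    """Swap area codes between matched households; return how many were swapped."""
    groups = defaultdict(list)
    for hh in households:
        groups[(hh.la, hh.profile)].append(hh)

    swapped = 0
    for members in groups.values():
        rng.shuffle(members)
        used = set()
        for hh in members:
            if hh.hh_id in used or rng.random() >= rate:
                continue
            # Partner: same LA and profile, different area, not yet swapped.
            partner = next(
                (o for o in members
                 if o.hh_id not in used and o.hh_id != hh.hh_id and o.area != hh.area),
                None,
            )
            if partner is None:
                continue
            hh.area, partner.area = partner.area, hh.area
            used.update({hh.hh_id, partner.hh_id})
            swapped += 2
    return swapped
```

Because each swap exchanges one household for another with the same profile inside one LA, household counts per area and all totals at LA level are unchanged; only the small-area distribution of other variables (here `cars`) can move.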
14 2001 Census Disclosure Control POST-TABULATION Changes made subsequent to preparing tables. Generally time-consuming as each output has to be checked. Small Cell Adjustment • Only cells containing small counts are adjusted, so level of adjustment considerably less than that imposed under rounding • Adjustment usually has little impact on the conclusions that can be validly drawn from the data • Each table internally additive, though some totals from different breakdowns may be different
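The additivity point can be illustrated with a sketch. The slide does not state the adjustment rule, so the rule used here is an assumption chosen purely for illustration: counts of 1 or 2 are moved to 0 or 3 with probabilities that preserve the expected value, and totals are then recomputed from the adjusted cells.

```python
import random


def adjust_cell(count, rng, base=3):
    """Move a small count (1..base-1) to 0 or `base`; leave other counts alone.

    Assumed rule, not taken from the source: P(round up) = count / base,
    so the expected value of the adjusted cell equals the true count.
    """
    if count == 0 or count >= base:
        return count
    return base if rng.random() < count / base else 0


def adjust_table(rows, rng):
    """Adjust every cell, then recompute totals so the table stays additive."""
    adjusted = [[adjust_cell(c, rng) for c in row] for row in rows]
    row_totals = [sum(row) for row in adjusted]
    return adjusted, row_totals, sum(row_totals)


rng = random.Random(7)
# Two hypothetical breakdowns of the same 40 people (illustration only).
by_age = [[1, 12, 2], [10, 1, 14]]
by_tenure = [[2, 2, 16], [17, 1, 2]]
_, _, total_age = adjust_table(by_age, rng)
_, _, total_tenure = adjust_table(by_tenure, rng)
```

Each adjusted table sums to its own published total, but `total_age` and `total_tenure` need not agree with each other or with the true 40, which is the inconsistency between breakdowns that the slide mentions.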
15 2001 Census Disclosure Control 2001 Census disclosure control used:- • Record swapping – to introduce a degree of uncertainty into identity without affecting figures at LA and above • Small cell adjustment – in addition, so that highly unusual people and households are significantly less visible in the outputs • Thresholds for Output Areas – minimum 40 households, 100 persons (recommended size 125 households); Standard Tables minimum 400 households, 1000 persons • Use of Output Areas as building blocks
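The thresholds above amount to a simple check on household and person counts. The sketch below uses the minimums quoted on the slide; the dictionary keys, function names and example area sizes are hypothetical.

```python
# Minimum (households, persons) per output type, as quoted on the slide.
THRESHOLDS = {
    "output_area": (40, 100),
    "standard_table": (400, 1000),
}


def meets_threshold(households, persons, output_type):
    """True if an area is large enough for the given type of output."""
    min_households, min_persons = THRESHOLDS[output_type]
    return households >= min_households and persons >= min_persons


def combine(output_areas):
    """Sum (households, persons) over Output Areas used as building blocks."""
    return (
        sum(households for households, _ in output_areas),
        sum(persons for _, persons in output_areas),
    )


# Four hypothetical Output Areas of the recommended size (125 households each).
block = combine([(125, 300)] * 4)
ok_for_standard_tables = meets_threshold(*block, "standard_table")
```

A single Output Area of 125 households passes the Output Area threshold but not the Standard Table one; four such areas combined (500 households, 1200 persons) pass both.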
16 2001 Census Disclosure Control Effects:- • Small cells in tables will not necessarily be ‘true’ figures • Each table internally additive, but totals may appear inconsistent between different tables • Time consuming for ONS to check each set of tables produced – particularly for Commissioned Output, for small areas; possibility of disclosure by differencing
17 2001 Census Disclosure Control Advice for users • Use highest level of geography with fewest breakdowns and fewest number of cells summed • Sources of error not only in disclosure control but in coverage error, respondent error and other processing error, e.g. One Number Census adjustment, data capture and coding, edit and imputation
18 Samples of Anonymised Records Licensed Samples of Anonymised Records (SARs) from 2001 Census • 3% sample of individual records to Regional level (Version 1 available October 04) • 1% sample of household records to Country level (due to be available Spring 05) • Version 2 of individual SAR due to be available February 05
19 Samples of Anonymised Records • Licensed Individual SAR – available through CCSR • All researchers must sign agreement not to attempt to identify any individual from the SAR • Disclosure may be inadvertent by differencing between a number of tables
20 Samples of Anonymised Records • Initial approach to restrict sample uniques by recoding • Version 1 Individual SAR – • grouped age individual years to 15, 16-18, 8 bands 18-74, individual year 75+, • grouped ethnic group variable to 5 categories, • occupation group to 25 categories, • country of birth E, W, S, NI, Rep Ire, EU, Other • Post-Randomisation (PRAMming) – perturbation of some variables, normally by one category, only on a percentage of ‘risky’ records
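Post-randomisation as described above can be sketched as a one-category shift applied to a share of the records flagged as risky. The rate, the way records are flagged, and the function name are assumptions; the slide gives none of these details.

```python
import random


def pram(values, risky, n_categories, rate, rng):
    """Perturb category codes (0..n_categories-1) on a share of risky records.

    Each risky record is changed with probability `rate` (assumed), moving
    one category up or down; at the ends of the scale the move is reflected
    so the result stays in range. Non-risky records are never changed.
    Assumes n_categories >= 2.
    """
    result = []
    for value, is_risky in zip(values, risky):
        if is_risky and rng.random() < rate:
            step = rng.choice([-1, 1])
            if not 0 <= value + step < n_categories:
                step = -step
            value += step
        result.append(value)
    return result


rng = random.Random(0)
codes = [0, 3, 5, 2]                    # hypothetical category codes
flags = [True, False, True, True]       # hypothetical 'risky' markers
perturbed = pram(codes, flags, n_categories=6, rate=1.0, rng=rng)
```

With `rate=1.0` every flagged record moves by exactly one category while the unflagged record keeps its value; a realistic rate would be well below 1 so that most records are untouched.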
21 Samples of Anonymised Records • Any observed ‘1’ in a SAR table is unlikely to be a real population ‘1’: • The 1 is 1 from a 3% sample (members unknown) • PRAMming will have the effect of ‘moving’ members into / out of cells • Version 2 Individual SAR will have:- • 81 occupational categories (25 in Version 1) • the full 16 ethnic group categories (5) • breakdown of country of birth to 16 categories (7) Due February 05
22 Samples of Anonymised Records • In-house Controlled Access SARs with full detail on 3% individuals • Labs in Titchfield and London • Access through application, form available through ONS – applications assessed by Census Research Access Board (CRAB) • All lab outputs assessed for disclosure (normally within one week)
23 Summary and lessons learnt • Tables protected by both pre-tabulation (record swapping) and post-tabulation (small cell adjustment) • SARs available for bespoke analysis • Licensed through CCSR • Controlled access through ONS data lab
24 Lessons learnt • Protection of confidentiality of individual details becomes more difficult with each Census • Disclosure risk assessment should have been carried out earlier to allow earlier consultation and more time to conduct research and develop different options
25 Lessons Learnt • Need to provide users with information about the measurement and other errors that exist within Census data • Review of 2001 disclosure control in preparation for 2011
26 Contact details Keith Spicer Office for National Statistics Segensworth Road Titchfield Fareham PO15 5RR 01329 813062 keith.spicer@ons.gov.uk sars@ons.gov.uk