Statistical Disclosure Control (SDC) for 2011 Census Progress Update Keith Spicer – ONS SDC Methodology 23 April 2009
CONTENTS • 2011 Census: Context • Progress • Tabular outputs: • Short-listed methods • Risk Utility Framework and measures • Registrars General statements • Microdata: • Reflection on 2001 use of SDC • Issues arising
2011 Census - Context • SDC for 2011 Census outputs is a major concern for users • Different SDC methodologies were adopted for tabular 2001 Census outputs across UK • Late addition of small cell adjustment by ONS/NISRA resulted in high level of user confusion and dissatisfaction • Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs
Progress • Development of SDC Strategy • UK SDC working group, with representatives from Wales, Northern Ireland and Scotland, established to take forward methodological work • UKCDMAC subgroup set up to QA work • Methodological research: • Determine the short-list of SDC methods (Aug ‘07) • Quantitative evaluation of short-list (continuing)
Short-listed methods • PRE-TABULAR: Record swapping; Over-imputation • POST-TABULAR: IACP (Invariant ABS Cell Perturbation) • Using 2001 Census tables to assess SDC methods
Record Swapping • Record A: Age: 22, Sex: Male, Marital Status: Married, No of Cars: 3, Region: Area A (unique as only person with 3 cars in Area A) • Record B: Age: 22, Sex: Male, Marital Status: Married, No of Cars: 1, Region: Area B (matches all variables except No of Cars) • Treatment: • Find a different geographical area • Identify another individual in a different area with virtually all the same characteristics • Swap the records
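The record swapping treatment on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the ONS implementation: the function name `swap_records`, the field names, and the rule of matching exactly on a fixed set of variables are all assumptions made for the example.

```python
import random

def swap_records(records, targets, match_keys, area_key="area", seed=0):
    """Swap the area of each targeted record with a matching record elsewhere.

    Hypothetical helper: a donor must sit in a different area and agree
    exactly on every variable in match_keys (an assumed matching rule).
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # work on copies, leave input untouched
    used = set()                      # each record takes part in one swap at most
    for i in targets:
        if i in used:
            continue
        candidates = [
            j for j, other in enumerate(out)
            if j != i and j not in used
            and other[area_key] != out[i][area_key]
            and all(other[k] == out[i][k] for k in match_keys)
        ]
        if not candidates:
            continue                  # no suitable partner: record stays put
        j = rng.choice(candidates)
        out[i][area_key], out[j][area_key] = out[j][area_key], out[i][area_key]
        used.update((i, j))
    return out

# The two records from the slide: A is unique in Area A because of its 3 cars.
people = [
    {"age": 22, "sex": "M", "marital": "married", "cars": 3, "area": "A"},
    {"age": 22, "sex": "M", "marital": "married", "cars": 1, "area": "B"},
]
swapped = swap_records(people, targets=[0], match_keys=("age", "sex", "marital"))
```

After the swap the 3-car record is reported in Area B and the 1-car record in Area A, so the number of people in each area is unchanged while the unique combination no longer points to the right place.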
Over-Imputation • Select set of records to be protected – either random or targeted • Blank out age from record • Find a donor to impute age • Distance based nearest neighbour to use as a donor based on a set of matching variables
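The over-imputation steps above can be sketched as follows. This is an illustration under stated assumptions: the function name `over_impute` is hypothetical, and the distance used here (the number of matching variables on which two records disagree) is just one simple choice; the slide does not say which distance was used.

```python
def over_impute(records, targets, variable, match_keys):
    """Replace `variable` in each targeted record with a nearest donor's value.

    Assumed distance: count of matching variables that differ. Other
    targeted records are not used as donors (also an assumption).
    """
    out = [dict(r) for r in records]
    for i in targets:
        rec = records[i]
        best, best_dist = None, None
        for j, donor in enumerate(records):
            if j == i or j in targets:
                continue
            dist = sum(donor[k] != rec[k] for k in match_keys)
            if best_dist is None or dist < best_dist:
                best, best_dist = j, dist
        if best is not None:
            # "Blank out" the value and impute it from the donor in one step.
            out[i][variable] = records[best][variable]
    return out

sample = [
    {"age": 22, "sex": "M", "marital": "married"},
    {"age": 35, "sex": "M", "marital": "married"},
    {"age": 70, "sex": "F", "marital": "widowed"},
]
protected_sample = over_impute(sample, targets={0}, variable="age",
                               match_keys=("sex", "marital"))
```

Here the first record's age is replaced by the age of the second record, which is its closest donor on sex and marital status.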
Invariant ABS Cell Perturbation (IACP) Method • Based on method developed by Australian Bureau of Statistics (ABS) • Perturb each cell value in a table to create uncertainty around the true value • This new post-tabular method preserves consistency: the same cell has the same value in every table in which it appears; however, small inconsistencies arise when cells are broken down further
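The consistency property described on this slide can be illustrated with a key-based sketch: each record carries a permanent random key, and the noise added to a cell depends only on the keys of the records in it. This is a simplified assumption-laden illustration, not the ABS or ONS algorithm; the names `assign_record_keys` and `perturb_cell`, the uniform noise range, and the choice to leave empty cells at zero are all hypothetical.

```python
import random

BIG = 2**32  # modulus for combining record keys (arbitrary choice)

def assign_record_keys(n, seed=0):
    """Give each of n records a permanent random key."""
    rng = random.Random(seed)
    return [rng.randrange(BIG) for _ in range(n)]

def perturb_cell(record_keys_in_cell, max_noise=2):
    """Perturb a cell count using noise derived from the cell's record keys.

    The same set of records always yields the same cell key, hence the
    same perturbed count, whichever table the cell appears in.
    """
    count = len(record_keys_in_cell)
    if count == 0:
        return 0  # assumption: empty cells are left at zero
    cell_key = sum(record_keys_in_cell) % BIG
    noise = random.Random(cell_key).randint(-max_noise, max_noise)
    return max(0, count + noise)

keys = assign_record_keys(10)
cell = keys[:4]                                 # a cell containing four records
in_table_1 = perturb_cell(cell)
in_table_2 = perturb_cell(list(reversed(cell)))  # same records, another table
```

Because each sub-cell draws its own noise, the perturbed values of `keys[:2]` and `keys[2:4]` need not add up to the perturbed value of `keys[:4]`, which is the kind of small inconsistency the slide mentions.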
Risk Utility Framework • Minimising risk of disclosure is important (in fact probably the most important aspect of SDC) • But so is maintaining utility of data
The Statistical Disclosure Control Problem • [Figure: risk-utility map. Axes: Disclosure Risk (information about confidential units) and Data Utility (information about legitimate items). Labels: No data, Released Data, Original Data, Maximum Tolerable Risk.]
Risk and Utility Measures • Risk measures (original v protected): • Attribute disclosure - % protected • Group disclosure • Within group disclosure • Negative attribute disclosure • % of zeros left unchanged • Identity disclosure - % small cells unperturbed
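Two of the risk measures listed above are simple proportions and can be computed directly from the original and protected cell counts. The sketch below assumes that "small cells" means counts of 1 or 2; the slide does not define the threshold, and the function names are hypothetical.

```python
def pct_small_cells_unperturbed(original, protected, small=(1, 2)):
    """Percentage of small original cells whose value is unchanged."""
    idx = [i for i, v in enumerate(original) if v in small]
    if not idx:
        return 0.0
    same = sum(original[i] == protected[i] for i in idx)
    return 100.0 * same / len(idx)

def pct_zeros_unchanged(original, protected):
    """Percentage of zero cells in the original that are still zero."""
    idx = [i for i, v in enumerate(original) if v == 0]
    if not idx:
        return 0.0
    same = sum(protected[i] == 0 for i in idx)
    return 100.0 * same / len(idx)

original_cells = [1, 2, 0, 5, 1, 2]
protected_cells = [1, 3, 0, 5, 0, 2]
small_risk = pct_small_cells_unperturbed(original_cells, protected_cells)
zero_risk = pct_zeros_unchanged(original_cells, protected_cells)
```

In this toy table two of the four small cells survive unchanged, and the single zero cell is also unchanged.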
Risk and Utility Measures • Utility measures (original v protected table): • Ratio of variances across variables • Association between variables – Cramér's V • Hellinger distance metric • Absolute Deviation – Relative & Absolute • Impact on totals & sub-totals
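Three of the utility measures named above have standard textbook forms, sketched here with the standard library only. The exact variants used in the evaluation are not given on the slide, so the normalised Hellinger distance and the plain mean absolute deviation below are assumptions, as are the function names. `cramers_v` assumes every row and column total is non-zero.

```python
import math

def hellinger(original, protected):
    """Hellinger distance between two tables treated as distributions."""
    to, tp = sum(original), sum(protected)
    return math.sqrt(0.5 * sum(
        (math.sqrt(o / to) - math.sqrt(p / tp)) ** 2
        for o, p in zip(original, protected)
    ))

def mean_absolute_deviation(original, protected):
    """Average absolute difference per cell."""
    return sum(abs(o - p) for o, p in zip(original, protected)) / len(original)

def cramers_v(table):
    """Cramér's V for a two-way table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))

before = [[10, 0], [0, 10]]   # perfectly associated variables
after = [[9, 1], [1, 9]]      # the same table after some perturbation
loss_of_association = cramers_v(before) - cramers_v(after)
```

Comparing Cramér's V before and after protection, as in the last line, gives a rough indication of how much of the association between two variables the SDC method has damaged.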
Registrars General statements • Commitment to aim for common UK SDC methodology • Small counts could be included in publicly disseminated tables provided that • Sufficient uncertainty that count is true value • Creating that uncertainty does not significantly damage the data • Key risk for 2011 output is attribute disclosure • Their preference is for pre-tabular method
SDC for Tabular Outputs: Next steps • Intention to go to UKCC in July 2009 with broad strategy • Additional work on level of protection necessary
Microdata: Issues arising I • Protection through either access (CAMS), data perturbation (EUL samples) or a bit of both (SL-HSAR) • PRAM involved post-randomisation of variables – transition probability matrix; most values perturbed, if at all, by one or two categories – goal to treat sample uniques that are also population uniques • How much protection is offered by EUL, SDS, VML? • Onus on researchers to comply with conditions as well as ONS to provide access
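The post-randomisation idea behind PRAM can be sketched as drawing each value's replacement from the row of a transition probability matrix. The matrix below is invented for illustration: it keeps most values unchanged and only ever moves a value to an adjacent category. The function name `pram` and the age bands are hypothetical.

```python
import random

def pram(values, categories, transition, seed=0):
    """Replace each value by a draw from its row of the transition matrix.

    transition[i][j] is the probability that category i is released as
    category j; each row should sum to 1.
    """
    rng = random.Random(seed)
    index = {c: i for i, c in enumerate(categories)}
    return [
        rng.choices(categories, weights=transition[index[v]])[0]
        for v in values
    ]

age_bands = ["0-15", "16-29", "30-44", "45-64", "65+"]
# Illustrative banded matrix: a value can only move to a neighbouring band.
banded = [
    [0.9, 0.1, 0.0, 0.0, 0.0],
    [0.05, 0.9, 0.05, 0.0, 0.0],
    [0.0, 0.05, 0.9, 0.05, 0.0],
    [0.0, 0.0, 0.05, 0.9, 0.05],
    [0.0, 0.0, 0.0, 0.1, 0.9],
]
true_values = ["16-29", "30-44", "65+", "0-15", "45-64"] * 20
released = pram(true_values, age_bands, banded)
```

Because an intruder cannot tell which released values were changed, a record that looks unique in the sample no longer identifies a particular person with certainty.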
Microdata: Issues arising II • Smaller sample does help (uncertainty that an individual or household is in the microdata) • Want tabular outputs to provide “sufficient uncertainty” at all geographies – cf. record swapping in Scotland 2001 • Over-imputation and IACP would offer some protection to microdata • After decision on tabular outputs, need to consider any additional SDC needed for microdata products
Summary • UK SDC Working Group in mid-June; UKCC in late July to agree strategy for tabular outputs • Three short-listed methods • Effect on microdata is among assessment criteria • Choice of method for tables will influence how we protect microdata • Likely to be a range of microdata samples – making use of either/both SDC and access conditions • Work on specific SDC methods for microdata will progress further after decision on tabular methods
Thank you • Any questions?