Statistical Disclosure Control (SDC) for 2011 Census Progress Update

Statistical Disclosure Control (SDC) for 2011 Census Progress Update Keith Spicer – ONS SDC Methodology 23 April 2009. CONTENTS. 2011 Census: Context : Progress Tabular outputs: Short-listed methods Risk Utility Framework and measures Registrars General statements Microdata:

Presentation Transcript


  1. Statistical Disclosure Control (SDC) for 2011 Census Progress Update Keith Spicer – ONS SDC Methodology 23 April 2009

  2. CONTENTS • 2011 Census: Context • : Progress • Tabular outputs: • Short-listed methods • Risk Utility Framework and measures • Registrars General statements • Microdata: • Reflection on 2001 use of SDC • Issues arising

  3. 2011 Census - Context • SDC for 2011 Census outputs is a major concern for users • Different SDC methodologies were adopted for tabular 2001 Census outputs across UK • Late addition of small cell adjustment by ONS/NISRA resulted in high level of user confusion and dissatisfaction • Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs

  4. Progress • Development of SDC Strategy • UK SDC working group, consisting of representatives from Wales, Northern Ireland and Scotland, established to take forward methodological work • UKCDMAC subgroup set up to QA work • Methodological research: • Determine the short-list of SDC methods (Aug '07) • Quantitative evaluation of short-list (continuing)

  5. Short-listed methods • PRE-TABULAR: Record swapping; Over-imputation • POST-TABULAR: IACP (Invariant ABS Cell Perturbation) • Using 2001 Census tables to assess SDC methods

  6. Record Swapping • Record A characteristics: Age: 22, Sex: Male, Marital Status: Married, No of Cars: 3, Region: Area A (unique as only person with 3 cars in Area A) • Record B characteristics: Age: 22, Sex: Male, Marital Status: Married, No of Cars: 1, Region: Area B (matches all variables except No of Cars) • Treatment: • Find a different geographical Area • Identify another individual in a different area with virtually all the same characteristics • Swap the records
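
The treatment on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the ONS implementation: the function name `swap_unique_records`, the field names, and the rule "a record is risky if its value of one variable is unique within its area" are all assumptions made for the example.

```python
import random

def swap_unique_records(records, risk_key, match_keys, area_key="region", seed=0):
    """Swap the area of each risky record with a matching record elsewhere.

    Assumption for this sketch: a record is 'risky' when its value of
    `risk_key` occurs only once within its area.
    """
    rng = random.Random(seed)

    # Count how often each (area, risk value) combination occurs.
    counts = {}
    for rec in records:
        key = (rec[area_key], rec[risk_key])
        counts[key] = counts.get(key, 0) + 1

    swapped = [dict(rec) for rec in records]  # leave the input untouched
    used = set()
    for i, rec in enumerate(records):
        if i in used or counts[(rec[area_key], rec[risk_key])] != 1:
            continue
        # Partners: records in a different area that agree on all matching variables.
        candidates = [
            j for j, other in enumerate(records)
            if j != i and j not in used
            and other[area_key] != rec[area_key]
            and all(other[k] == rec[k] for k in match_keys)
        ]
        if not candidates:
            continue  # no suitable partner, record stays where it is
        j = rng.choice(candidates)
        swapped[i][area_key] = records[j][area_key]
        swapped[j][area_key] = records[i][area_key]
        used.update((i, j))
    return swapped

people = [
    {"age": 22, "sex": "M", "marital": "Married", "cars": 3, "region": "Area A"},
    {"age": 22, "sex": "M", "marital": "Married", "cars": 1, "region": "Area B"},
    {"age": 40, "sex": "F", "marital": "Single", "cars": 1, "region": "Area B"},
    {"age": 35, "sex": "F", "marital": "Married", "cars": 1, "region": "Area A"},
    {"age": 35, "sex": "M", "marital": "Single", "cars": 1, "region": "Area A"},
]
protected = swap_unique_records(people, "cars", ["age", "sex", "marital"])
```

In the example data, only the first person is unique (3 cars in Area A), and the only record in another area that matches on age, sex and marital status is the second person, so those two exchange areas.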

  7. Over-Imputation • Select set of records to be protected – either random or targeted • Blank out age from record • Find a donor to impute age • Distance based nearest neighbour to use as a donor based on a set of matching variables
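
A minimal sketch of the over-imputation steps on this slide, under stated assumptions: the function name `over_impute_age` is hypothetical, the targets are passed in explicitly (the slide allows random or targeted selection), and "distance" is taken to be the number of matching variables on which two records disagree. The slide does not say which distance function was actually used.

```python
def over_impute_age(records, targets, match_keys):
    """Blank out age on the target records and impute it from the nearest donor.

    Assumption for this sketch: distance is the count of matching variables
    that differ (a Hamming distance); ties go to the first donor found.
    """
    targets = set(targets)
    result = [dict(rec) for rec in records]

    # Step 1: blank out age on every record selected for protection.
    for i in targets:
        result[i]["age"] = None

    # Step 2: donors are the records that were not selected.
    donors = [rec for idx, rec in enumerate(records) if idx not in targets]
    if not donors:
        return result  # nothing to impute from

    # Step 3: impute age from the nearest neighbour on the matching variables.
    for i in targets:
        target = records[i]
        donor = min(donors, key=lambda d: sum(d[k] != target[k] for k in match_keys))
        result[i]["age"] = donor["age"]
    return result

sample = [
    {"age": 22, "sex": "M", "marital": "Married"},
    {"age": 25, "sex": "M", "marital": "Married"},
    {"age": 60, "sex": "F", "marital": "Widowed"},
]
imputed = over_impute_age(sample, targets=[0], match_keys=["sex", "marital"])
```

Here the first record loses its true age and receives the age of the second record, which agrees with it on both matching variables.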

  8. Invariant ABS Cell Perturbation (IACP) Method • Based on method developed by Australian Bureau of Statistics (ABS) • Perturb each cell value in a table to create uncertainty around the true value • This new post-tabular method preserves consistency: the same cell always takes the same value in different tables; however, small inconsistencies arise when cells are broken down further
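
The consistency property can be illustrated with a simplified cell-key scheme: each record carries a fixed random key, a cell's key is derived from the keys of the records it contains, and the perturbation is looked up from that cell key. This is only a sketch of the general idea. `MODULUS`, `NOISE_TABLE`, the function names and the choice to leave empty cells at zero are assumptions for the example, and the "invariant" adjustment in IACP is not modelled.

```python
import random

MODULUS = 2**31 - 1          # assumed modulus for combining record keys
NOISE_TABLE = (-1, 0, 0, 1)  # illustrative perturbations, not the real lookup table

def assign_record_keys(n_records, seed=0):
    """Give every record a fixed pseudo-random key."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(n_records)]

def perturb_cell(record_keys, members):
    """Return the perturbed count for the cell containing the records in `members`.

    The noise depends only on which records are in the cell, so the same
    cell gets the same perturbed value in every table it appears in.
    """
    count = len(members)
    if count == 0:
        return 0  # assumption: empty cells are left at zero
    cell_key = sum(record_keys[i] for i in members) % MODULUS
    noise = NOISE_TABLE[cell_key % len(NOISE_TABLE)]
    return max(0, count + noise)

keys = assign_record_keys(10)
cell = [0, 3, 7]                      # record indices falling in one cell
value_table_1 = perturb_cell(keys, cell)
value_table_2 = perturb_cell(keys, [7, 0, 3])   # same cell met in another table
parts = perturb_cell(keys, [0, 3]) + perturb_cell(keys, [7])
```

`value_table_1` and `value_table_2` always agree. `parts` need not equal either of them, which is the kind of small inconsistency the slide mentions when a cell is broken down further.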

  9. Risk Utility Framework • Minimising risk of disclosure is important (in fact probably the most important aspect of SDC) • But so is maintaining utility of data…

  10. The Statistical Disclosure Control Problem [Figure: Disclosure Risk (information about confidential units) against Data Utility (information about legitimate items), marking Original Data, Released Data and No data, with a Maximum Tolerable Risk level]

  11. Risk and Utility Measures • Risk measures (original v protected): • Attribute disclosure - % protected • Group disclosure • Within group disclosure • Negative attribute disclosure • % of zeros left unchanged • Identity disclosure - % small cells unperturbed
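
Two of these measures are simple enough to sketch directly. The code below is an illustration under assumptions: the function name `risk_measures` is hypothetical, tables are flattened to lists of cell counts in the same order, and "small cells" are taken to be counts of 1 or 2, a threshold the slide does not state.

```python
def risk_measures(original, protected, small=(1, 2)):
    """Compare original and protected cell counts, given in the same cell order.

    Assumption for this sketch: a 'small cell' is an original count of 1 or 2.
    """
    pairs = list(zip(original, protected))
    small_cells = [(o, p) for o, p in pairs if o in small]
    zero_cells = [(o, p) for o, p in pairs if o == 0]

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        # Small cells published with their true value.
        "pct_small_cells_unperturbed": pct(
            sum(o == p for o, p in small_cells), len(small_cells)
        ),
        # True zeros that are still zero after protection.
        "pct_zeros_unchanged": pct(
            sum(p == 0 for _, p in zero_cells), len(zero_cells)
        ),
    }

measures = risk_measures([0, 1, 2, 5, 0, 1], [0, 1, 3, 5, 1, 0])
```

In this example one of the three small cells is unperturbed and one of the two zero cells is unchanged.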

  12. Risk and Utility Measures • Utility measures (original v protected table): • Ratio of variances across variables • Association between variables – Cramér's V • Hellinger distance metric • Absolute Deviation – Relative & Absolute • Impact on totals & sub-totals
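
Three of these utility measures have standard textbook definitions that can be sketched in plain Python. The function names are hypothetical, and exactly how the measures were applied in the evaluation (for example, which tables and which normalisation) is not stated in the slides. The code assumes non-empty tables with non-zero row and column totals.

```python
import math

def hellinger_distance(original, protected):
    """Hellinger distance between two tables given as flat lists of cell counts."""
    total_o, total_p = sum(original), sum(protected)
    return math.sqrt(0.5 * sum(
        (math.sqrt(o / total_o) - math.sqrt(p / total_p)) ** 2
        for o, p in zip(original, protected)
    ))

def cramers_v(table):
    """Cramér's V for a two-way table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row_totals), len(col_totals)) - 1)))

def mean_absolute_deviation(original, protected):
    """Average absolute difference per cell."""
    return sum(abs(o - p) for o, p in zip(original, protected)) / len(original)

same = hellinger_distance([4, 6, 10], [4, 6, 10])
disjoint = hellinger_distance([1, 0], [0, 1])
perfect_association = cramers_v([[10, 0], [0, 10]])
no_association = cramers_v([[5, 5], [5, 5]])
deviation = mean_absolute_deviation([1, 2, 3], [1, 3, 1])
```

One way to use Cramér's V as a utility measure is to compute it on the original and the protected table and compare the two values: a large change suggests the protection has distorted the association between the variables.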

  13. Registrars General statements • Commitment to aim for common UK SDC methodology • Small counts could be included in publicly disseminated tables provided that • Sufficient uncertainty that count is true value • Creating that uncertainty does not significantly damage the data • Key risk for 2011 output is attribute disclosure • Their preference is for pre-tabular method

  14. SDC for Tabular Outputs: Next steps • Intention to go to UKCC in July 2009 with broad strategy • Additional work on level of protection necessary

  15. Microdata: reflection on 2001 use of SDC

  16. Microdata: Issues arising I • Protection through either access (CAMS), data perturbation (EUL samples) or a bit of both (SL-HSAR) • PRAM involved post-randomisation of variables – transition probability matrix; most values perturbed, if at all, by one or two categories – goal to treat sample uniques that are also population uniques • How much protection is offered by EUL, SDS, VML? • Onus on researchers to comply with conditions as well as ONS to provide access
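
The PRAM step mentioned on this slide can be sketched as follows: each value is replaced by a category drawn from the row of a transition probability matrix that corresponds to its current category. The function name `pram`, the age bands and the probabilities are invented for illustration and are not the matrix used for the 2001 samples.

```python
import random

def pram(values, categories, transition, seed=0):
    """Post-randomise a categorical variable.

    transition[i][j] is the probability that category i is released as
    category j; each row is assumed to sum to 1.
    """
    rng = random.Random(seed)
    index = {cat: i for i, cat in enumerate(categories)}
    return [
        rng.choices(categories, weights=transition[index[v]])[0]
        for v in values
    ]

age_bands = ["0-15", "16-29", "30-59", "60+"]
# Illustrative matrix: most values stay put, the rest move to a neighbouring band.
matrix = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
    [0.00, 0.00, 0.10, 0.90],
]
original = ["0-15", "16-29", "30-59", "60+"] * 25
released = pram(original, age_bands, matrix)
```

Because the off-diagonal probability sits only on adjacent bands, no released value is more than one category away from its original, mirroring the slide's remark that values were perturbed, if at all, by only one or two categories.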

  17. Microdata: Issues arising II • Smaller sample does help (uncertainty that an individual or household is in the microdata) • Want tabular outputs to provide “sufficient uncertainty” at all geographies – c.f. record swapping in Scotland 2001 • Over-imputation and IACP would offer some protection to microdata • After decision on tabular outputs, need to consider any additional SDC needed for microdata products

  18. Summary • UK SDC Working Group in mid-June; UKCC in late July to agree strategy for tabular outputs • Three short-listed methods • Effect on microdata is among assessment criteria • Choice of method for tables will influence how we protect microdata • Likely to be a range of microdata samples – making use of either/both SDC and access conditions • Work on specific SDC methods for microdata will progress further after decision on tabular methods

  19. Thank you • Any questions?
