Statistical Disclosure Control for the 2011 UK Census Jane Longhurst, Caroline Young and Caroline Miller (ONS)
Outline • Context • Workplan • Progress • Short-listing the SDC Methods • Quantitative Evaluation • Description of the Methods (Advantages and Disadvantages) • Example Evaluation (Risk-Utility Framework) • Summary
Context The UK takes a census every 10 years. Next census due in 2011. This will comprise separate, simultaneous Censuses for England & Wales (ONS), Scotland (GROS) and Northern Ireland (NISRA).
Context • SDC for 2011 Census outputs is a major concern for users • Different SDC methodologies were adopted for standard tabular 2001 Census outputs across UK • Late addition of small cell adjustment by ONS/NISRA resulted in high level of user confusion and dissatisfaction • Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs
Workplan • Phase 1 (March ’06 – Jan ’07) • UK agreement of key SDC policy issues • Phase 2 (Jan ’07 – Sept ’08) • Evaluation of all methods complying with agreed SDC policy position in terms of risk/utility framework and feasibility of implementation • Phase 3 (Sept ’08 – Spring/Summer ’09) • Recommendations and UK agreement of SDC methodologies for 2011 Census tabular outputs • Phase 4 (Feb ’09 onwards) • Evaluate and develop SDC methods for microdata, future work on output specification, system specification, development and testing
Progress • The UK SDC Policy Position (Nov ’06) highlighted: • Key risk is attribute disclosure • Consideration of pre-tabular and post-tabular methods • Small cell counts can be included in tables provided uncertainty about the true value is created • Different access agreements for tabular outputs that are seriously compromised by SDC • Tolerable threshold not yet determined, but steer towards less conservative approach
Progress • Development of SDC Strategy • UK SDC working group established to take forward methodological work • UKCDMAC subgroup set up to QA work • Initial stage of methodological research: • Review of SDC in census context (May ’07) • Qualitative evaluation of SDC methods for 2011 Census outputs • Focus on tabular outputs whilst considering impact on other outputs
Progress • UK SDC working group met in August • Produced short-list of SDC methods • SDC methods assessed against criteria in line with Registrars General policy statement • Formal QA and sign-off of criteria and short-listed SDC methods • Short-listed methods will undergo thorough quantitative evaluation and should maximise data utility whilst minimising disclosure risk
Short-listing: Criteria • Method should: • prevent new information being derived • prevent disclosure by differencing and enable flexible table generation • Could use special access arrangements if disclosure control seriously compromises some tabular outputs • Table design methods applied alongside chosen method
Short-listing: Criteria • Trade off between risk and utility needs to be evaluated quantitatively • Many potential SDC methods which could be used but not possible to conduct quantitative evaluation of each method • Need to consider qualitative aspect using high-level review of advantages and disadvantages of SDC methods • Qualitative and subsequent quantitative evaluations used in combination to establish recommended SDC method(s) for 2011 Census
Short-listing: Criteria • Each method assessed against a set of 7 qualitative criteria (primary and secondary): • Primary criteria • Additivity and consistency • Overall user acceptability • Protection against differencing • Feasibility of implementation • Secondary criteria • Impact on microdata releases • Simple to understand • Easy to account for in analyses
Short-listing: Scoring • Following methods considered for short-listing: • Record Swapping • Over-Imputation • Data Switching • Post Randomisation Method (PRAM) • Sampling • Conventional Rounding • Random Rounding • Small Cell adjustment • Controlled Rounding • Semi-Controlled Rounding • Suppression • Barnardisation • ABS Cell Perturbation Method
Short-listing: Scoring • For each criterion, method assigned a score: • 0 = method does not meet criterion • 1 = method partly meets criterion • 2 = method meets criterion • Primary criteria given double weighting • Overall score and ranking assigned to each method • Methods failing on primary criteria were discounted
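The scoring scheme above can be sketched in a few lines of Python. This is an illustration only: the criterion identifiers, the method names and their scores are hypothetical, and treating a score of 0 on any primary criterion as "failing" is an assumption, since the slides do not define the failure rule.

```python
# Criterion identifiers are hypothetical labels for the 7 criteria listed earlier.
PRIMARY = ["additivity_consistency", "user_acceptability",
           "differencing_protection", "feasibility"]
SECONDARY = ["microdata_impact", "simple_to_understand", "easy_to_account_for"]


def overall_score(scores):
    """Weighted total: primary criteria count double, secondary count once."""
    primary = sum(scores[c] for c in PRIMARY)
    secondary = sum(scores[c] for c in SECONDARY)
    return 2 * primary + secondary


def fails_primary(scores):
    """ASSUMPTION: a method 'fails' if it scores 0 on any primary criterion."""
    return any(scores[c] == 0 for c in PRIMARY)


def shortlist(methods):
    """Drop methods failing a primary criterion, rank the rest by score."""
    kept = {name: overall_score(s) for name, s in methods.items()
            if not fails_primary(s)}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


def make_scores(primary_value, secondary_value):
    """Helper for the made-up example: same score on every criterion of a kind."""
    scores = {c: primary_value for c in PRIMARY}
    scores.update({c: secondary_value for c in SECONDARY})
    return scores


# Entirely made-up methods and scores, for illustration only.
example_methods = {
    "method_a": make_scores(2, 2),
    "method_b": make_scores(1, 2),
    "method_c": dict(make_scores(2, 2), feasibility=0),
}
example_ranking = shortlist(example_methods)
```

With these made-up inputs, `method_c` is discounted despite high scores elsewhere, which mirrors the rule that a primary failure overrides the total.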
Short-listing: Scoring • Majority of SDC methods failed primary criteria and were discounted from short-list. • For example: • PRAM - difficult to implement and not proven for Census data • Sampling - low user acceptance of weighted tables • Rounding - low user acceptance of rounding methods • Suppression - extremely difficult to implement to protect against differencing
Short-listed SDC Methods • Record swapping • Over-imputation • ABS Cell Perturbation method • Small cell adjustment with record swapping (to provide comparison with 2001)
Quantitative Evaluation • Examine how methods protect and manage risk and how they impact on data utility • Plan to use range of 2001 Census tables, varying parameters, different geographies • Information Loss software will be used to evaluate each short-listed method • Consideration will be given to other issues, e.g. comparisons over time, communal establishments, imputation rates
What do the methods do? The short-list: • Record Swapping • ABS Cell Perturbation • Over-imputation
Record Swapping - Summary • 2001 Random Record Swapping method: • % households swapped across OAs • Swap within LA to preserve marginal distributions at this level • Matches found using control variables • Age • Gender • Hard to Count Index (census enumeration) • Household Size • All non-geographic fields swapped • Random / Targeted
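A minimal sketch of random record swapping as summarised above, not the 2001 production method. The field names (`la`, `oa`, `size`, `htc`), the swap rate and the example data are hypothetical. Swapping all non-geographic fields between two households is equivalent to exchanging their OA codes, which is how the sketch implements it.

```python
import random


def swap_records(households, rate, controls, seed=0):
    """Swap OA codes between matched households in the same LA.

    households: list of dicts with keys "la", "oa" and the control variables.
    rate: approximate proportion of households to swap (both partners count).
    controls: names of the control variables that a partner must match on.
    Returns a protected copy and the set of swapped indices.
    """
    rng = random.Random(seed)
    out = [dict(h) for h in households]          # leave the input untouched
    n_target = round(rate * len(out))
    order = list(range(len(out)))
    rng.shuffle(order)                           # random (not targeted) selection
    swapped = set()
    for i in order:
        if len(swapped) >= n_target:
            break
        if i in swapped:
            continue
        a = out[i]
        partners = [j for j in range(len(out))
                    if j != i and j not in swapped
                    and out[j]["la"] == a["la"]            # stay within the LA
                    and out[j]["oa"] != a["oa"]            # move across OAs
                    and all(out[j][c] == a[c] for c in controls)]
        if not partners:
            continue                             # no match: household not swapped
        j = rng.choice(partners)
        a["oa"], out[j]["oa"] = out[j]["oa"], a["oa"]
        swapped.update((i, j))
    return out, swapped


# Made-up households: two OAs in one LA, two control variables.
example_households = [
    {"id": k, "la": "LA1", "oa": "OA1" if k < 5 else "OA2",
     "size": 1 + k % 2, "htc": 1}
    for k in range(10)
]
protected, swapped_ids = swap_records(example_households, 0.4, ["size", "htc"], seed=1)
```

Because partners match on the control variables and stay within the LA, OA-level counts of those variables and all LA-level totals are unchanged; only variables outside the control set can shift between OAs.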
ABS Cell Perturbation - Summary • Developed by the Australian Bureau of Statistics • In use for their 2006 Census data • Based on random numbers assigned to each record • Then each table is adjusted independently in two stages: • (1) Adding perturbations to each cell • (2) Restoring additivity of whole table
ABS Cell Perturbation - Summary • Assign each microdata record a random number between 1 and m, called an rkey • For each cell in a particular table: • Calculate the cell key (ckey) as a function of the rkeys of the records in the cell • Using a look-up table, read off the perturbation to add, where ckeys index the columns and original cell values index the rows • Perturbation added to original cell value • ABS additivity module not yet evaluated
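The first stage (adding perturbations) can be sketched as follows. Everything numeric here is an assumption for illustration: the value m = 8, the cell-key function (sum of rkeys modulo m), the contents of the look-up table, and the choice to share one row for counts of 3 or more. The slides do not give the actual ABS function or table, and the additivity stage is not covered.

```python
import random

M = 8  # ASSUMPTION: size of the rkey range; the slides only say "between 1 and m"

# HYPOTHETICAL look-up table: rows are original cell values (3 stands for "3 or
# more"), columns are cell keys 0..M-1, entries are the perturbation to add.
LOOKUP = {
    0: [0, 0, 0, 0, 0, 0, 0, 0],          # assumed: empty cells stay empty
    1: [-1, -1, 0, 0, 0, 0, 1, 1],
    2: [-2, -1, 0, 0, 0, 0, 1, 2],
    3: [-2, -1, -1, 0, 0, 1, 1, 2],
}


def assign_rkeys(n_records, m=M, seed=0):
    """Give every microdata record a random integer rkey in 1..m."""
    rng = random.Random(seed)
    return [rng.randint(1, m) for _ in range(n_records)]


def cell_key(rkeys_in_cell, m=M):
    """ASSUMED cell-key function: sum of the cell's rkeys, modulo m."""
    return sum(rkeys_in_cell) % m


def perturb_cell(rkeys_in_cell):
    """Return the perturbed count for a cell, given the rkeys of its records."""
    original = len(rkeys_in_cell)
    row = min(original, 3)
    return original + LOOKUP[row][cell_key(rkeys_in_cell)]


example_rkeys = assign_rkeys(20)
```

Since the cell key depends only on which records fall in the cell, a cell built from the same records receives the same perturbation in every table in which it appears, under this sketch's assumptions.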
Over-imputation - Summary • Involves randomly selecting a percentage of microdata records, which then have certain variables erased • Donors matching on control variables are selected and the erased variables are imputed from them • Various approaches to over-imputation will be considered
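One possible approach to over-imputation, sketched under stated assumptions: donors are drawn only from records that were not selected for erasure, a single random donor supplies all erased variables, and a record with no matching donor keeps its original values. The variable names and data are hypothetical; the slides leave the actual approach open.

```python
import random


def over_impute(records, rate, erase_vars, controls, seed=0):
    """Erase and re-impute selected variables for a random subset of records.

    records: list of dicts (microdata).
    rate: proportion of records selected for erasure.
    erase_vars: variables to erase and impute.
    controls: variables on which a donor must match the recipient.
    Returns a protected copy and the set of selected indices.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]
    n_selected = round(rate * len(records))
    selected = set(rng.sample(range(len(records)), n_selected))
    for i in sorted(selected):
        # ASSUMPTION: donors come from the unselected records only.
        donors = [j for j in range(len(records))
                  if j not in selected
                  and all(records[j][c] == records[i][c] for c in controls)]
        if not donors:
            continue  # ASSUMPTION: no donor found, original values retained
        donor = records[rng.choice(donors)]
        for var in erase_vars:
            out[i][var] = donor[var]
    return out, selected


# Made-up microdata with two control variables and one variable to impute.
example_records = [
    {"age_band": k % 2, "sex": "F", "occupation": "occ%d" % k}
    for k in range(10)
]
imputed, selected_ids = over_impute(
    example_records, 0.3, ["occupation"], ["age_band", "sex"], seed=1)
```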
Quantitative Evaluation • An example of how the quantitative evaluation will be carried out… • Preliminary study comparing swapping and ABS cell perturbation using ideas developed by Natalie Shlomo (framework of balancing risk and utility)
Preliminary Evaluation: Tables used • 2001 UK Census Tables • EA: Southampton, Eastleigh, Test Valley (SJ)
Measuring Disclosure Risk • Main risk • small cells in tables • small cells in differenced tables • Disclosure Risk = proportion of records in the small cells that have not been perturbed
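The risk measure above can be computed directly from record-level data. In this sketch, treating cells with a count of 1 or 2 as "small" is an assumption (the slides do not state a threshold), as are the input layout and the made-up example values.

```python
from collections import Counter


def disclosure_risk(cells, perturbed, small=2):
    """Proportion of records in small cells that have not been perturbed.

    cells: the table cell each record falls into, one entry per record.
    perturbed: True/False per record, whether the SDC method changed it.
    small: ASSUMED threshold; cells with count <= small are "small".
    """
    counts = Counter(cells)
    flags = [p for c, p in zip(cells, perturbed) if counts[c] <= small]
    if not flags:
        return 0.0            # no small cells, so nothing at risk on this measure
    return sum(1 for p in flags if not p) / len(flags)


# Made-up example: cell "a" has 3 records, "b" has 1, "c" has 2.
example_cells = ["a", "a", "a", "b", "c", "c"]
example_perturbed = [False, False, False, True, False, False]
example_risk = disclosure_risk(example_cells, example_perturbed)
```

Here three records sit in small cells and one of them was perturbed, so the measure is 2/3. The same function could be applied to a differenced table by passing the cells of that table.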
Measuring Information Loss • Utility (information loss) measures compare the statistical quality of original and protected tables • Measure distortion to internal cell distributions • Compare variance of cell counts • Measure impact on rank correlations
Summary • Ongoing progress made for 2011 Census • Thorough quantitative evaluation of short-list over next year, using 2001 method as benchmark • Important to strike balance between minimising disclosure risk and maximising data utility • Qualitative and quantitative evaluations used in combination to establish recommended approach to SDC for 2011 Census • User communication and consultation will take place throughout the work programme
Contact Details • Jane.Longhurst@ons.gov.uk • Caroline.Miller@ons.gov.uk • Caroline.Young@ons.gov.uk