1 / 15

Using Targeted Perturbation of Microdata to Protect Against Intelligent Linkage

Using Targeted Perturbation of Microdata to Protect Against Intelligent Linkage. Mark Elliot, University of Manchester Mark.Elliot@manchester.ac.uk Cathie Marsh Centre for Census and Survey Research, University of Manchester . Overview. Linkage experiments using:

Download Presentation

Using Targeted Perturbation of Microdata to Protect Against Intelligent Linkage

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using Targeted Perturbation of Microdata to Protect Against Intelligent Linkage Mark Elliot, University of Manchester Mark.Elliot@manchester.ac.uk Cathie Marsh Centre for Census and Survey Research, University of Manchester 

  2. Overview • Linkage experiments using: • Individual microdata from the UK census • microdata from the UK Labour Force Survey (LFS). • Objective • to assess the disclosure risk impact of the statistical disclosure control methods on the used on the census microdata in 2001

  3. Data • Three datasets were used in this study: • The spring 2001 quarter of the standard Labour Force Survey (LFS). • The standard release version of the 2001 individual level SAR (post-SDC SAR). • The pre-SDC version of the 2001 individual level SAR (Pre-SDC SAR).

  4. Background • The 2001 SARS were subject to extensive statistical disclosure risk assessment and targeted control methods. The risk assessment was carried through collaboration of Manchester University and ONS using a variety of approaches. • The disclosure control was a mixture of global recoding and local suppression and reimputation based on a variant of the PRAM (post randomisation) method.

  5. Procedure 1) The variables were selected and then the codings of these variables on the different datasets were harmonised. 2) The matching was conducted. 3) The SUDA Software was run over the SARs to obtain DIS-SUDA scores for the matches. 4) All the unique matches (one-to-one) were sent to ONS. 5) The matches were verified by ONS Census and LFS divisions 6) The matches were returned with an indicator placed on the match file indicating whether the match was true or false. 7) The proportion of correct matches was generated under several different assumptions.

  6. Variables • Age (95 categories for the pre SDC SARS-LFS match, 44 categories for the post-SDC SARs-LFS match) • Sex (2 Categories) • Marital Status (5 Categories) • Region of residence. (11 Categories) • Number of Residents (7 categories) • Primary economic status (9 categories) • Country of birth (14 Categories) • Ethnic group (15 categories) • Tenure (5 categories)

  7. Matching • In principle, we could have used fuzzy matching methods to allow for data divergence • However, the number of direct one to one matches was very large on both files and therefore this was deemed to cause an unnecessary administrative burden at the match verification stage. • Therefore, a simple combine and sort algorithm was used for the matching.

  8. Matching • 6085 one to one matches between the pre-SDC SAR and the LFS • 3130 one to one matches between the released SAR and the LFS. • matches sent to ONS for verification.

  9. Verification problem • a significant number of matches there was no address linkable to the LFS identifying variables. • This affected 1602 matches (26.32%) against the pre-SDC SARS file and 895 matches (28.95%) against the post-SDC SARS file. • However, no strong relationships with match key variables.

  10. Results • 2.74% correct match rate with PRE-SDC SARS • 1.63% correct match rate with PRE-SDC SARS

  11. Pre SDC-SAR

  12. Pre SDC-SAR

  13. Post SDC-SAR

  14. Post SDC-SAR

  15. Concluding Remarks • Study provides evidence that disclosure control method use in the 2001 SARS provided protection against targeted intrusion. Caveats: • Data divergence, coverage, secondary attacks • Alternative method for identifying risky records

More Related