Access routes to 2001 UK Census Microdata: Issues and Solutions

Access routes to 2001 UK Census Microdata: Issues and Solutions. Jo Wathan SARs support Unit, CCSR University of Manchester, UK Jo.wathan@manchester.ac.uk. UK Census context. Traditional 10 yearly census at present Medium length form (c. 30 person questions, c. 10 household questions)


Presentation Transcript


  1. Access routes to 2001 UK Census Microdata: Issues and Solutions Jo Wathan SARs support Unit, CCSR University of Manchester, UK Jo.wathan@manchester.ac.uk

  2. UK Census context • Traditional 10-yearly census at present • Medium-length form (c. 30 person questions, c. 10 household questions) • Ethnicity + optional religion question • No income question • Legal framework in GB is the Census Act 1920 • No Statistics Act • Legislation only deals with confidentiality restrictions – up to 2 years' imprisonment!

  3. 1991 SARs • Samples of Anonymised Records (SARs) from 1991 were the first to be released • Highly successful: c. 400 research papers used the data between 1993 & 2002. Also used in teaching. • SARs are a commissioned output, paid for by the UK Economic and Social Research Council. • The SARs Support Unit at CCSR represents the client, and disseminates and supports the data.

  4. Disclosure Control 1991 After work had been undertaken to demonstrate low risk of disclosure: • Users had to register to use them • Some ‘broadbanding’ or grouping of rare categories • Very large households (12+ residents) had individual detail suppressed • 2 non-overlapping files for different interest groups: • One for geographers • One for sociologists/demographers
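
As a rough illustration of two of the 1991 measures above (grouping rare categories and suppressing individual detail in households of 12+ residents), here is a minimal Python sketch. The record layout, the field names (`household_id`, `age`, `occupation`), the `min_count` cut-off and the "Other" label are all assumptions made for this example; the slides do not say how the Census Offices actually implemented these steps.

```python
from collections import Counter

LARGE_HOUSEHOLD = 12  # from the slide: 12+ residents


def group_rare(values, min_count, other="Other"):
    """Replace categories seen fewer than min_count times with one label.

    min_count and the "Other" label are hypothetical choices.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def suppress_large_households(records, threshold=LARGE_HOUSEHOLD,
                              keep=("household_id",)):
    """Blank individual detail for everyone in a very large household.

    records is a list of dicts; only the fields named in keep survive
    for households with threshold or more residents.
    """
    sizes = Counter(r["household_id"] for r in records)
    out = []
    for r in records:
        if sizes[r["household_id"]] >= threshold:
            out.append({k: (v if k in keep else None) for k, v in r.items()})
        else:
            out.append(dict(r))
    return out
```

Both functions return new lists and leave the input untouched, so the detailed and the protected versions of a file can be compared side by side.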

  5. What did the 91 SARs look like?
     Household SAR • Household hierarchy • 1% (c. 0.6M cases) • Regional geography • Individual year of age • 10 ethnicity categories • 358 categories of occupation
     Individual SAR • Individual level file • 2% (c. 1.2M cases) • Geography population threshold 120k = 278 SAR areas • Individual year of age • 10 ethnicity categories • 73 categories of occupation

  6. Request for 2001 SARs • New work on disclosure control showed that we had previously overestimated the risk of disclosure • Requested larger sample size • Slightly more geography • A 3rd SAR for small areas • However, a new, stricter interpretation of the degree of disclosure risk was required • The level of detail initially available would not have provided files of sufficient use for research

  7. Why? • Census Office concerns: • Perceived increased levels of concern amongst respondents • Increased data processing power • Increased levels of storage of personal information that might be used to match to the data • Major strategic review of data stewardship issues at the time that Census outputs were due for release

  8. Principles • Ongoing need for user consultation • Recognise that different users require different levels of detail (and may be able to accept different conditions) – trading detail and access against each other • Trading different types of detail against each other: geography against socio-demographic detail etc. • Flexible approach to combining a range of access and disclosure approaches: • Safe Data • Safe Users • Safe Setting • International role models were very helpful

  9. Where we are now • Have succeeded in obtaining access to: • End User License – Safe Data: 2 datasets which are accessible in the same way as in 1991, with less detail on some variables but enough detail for research purposes • Special License – Safe Users: 1 dataset available for distribution but with extra access conditions • Controlled Access Microdata – Safe Setting: much more detailed versions of 2 datasets available in a safe setting

  10. Safe Data: End User License Files • Standard online application procedure for those with electronic signature (otherwise equivalent paper system) • Not public data! • Available for very low risk files • Risk reduced by: • Broadbanding (e.g. age, geography) • Perturbing data
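
Broadbanding of age, as mentioned above, can be sketched as a simple lookup from a single year of age to a band. The band boundaries below are hypothetical and are not the ones used in the 2001 files. Perturbation is not sketched, because the slides do not say which method was applied.

```python
from bisect import bisect_right

# Hypothetical lower bounds for each band, in ascending order.
# These are NOT the bands used in the 2001 SARs.
BAND_LOWER_BOUNDS = [0, 16, 25, 35, 45, 55, 65, 75]


def band_age(age, lower_bounds=BAND_LOWER_BOUNDS):
    """Return a label such as "16-24" or "75+" for a single year of age."""
    if age < lower_bounds[0]:
        raise ValueError(f"age {age} is below the lowest band")
    # Index of the last lower bound that is <= age.
    i = bisect_right(lower_bounds, age) - 1
    low = lower_bounds[i]
    if i + 1 < len(lower_bounds):
        return f"{low}-{lower_bounds[i + 1] - 1}"
    return f"{low}+"  # open-ended top band


ages = [15, 16, 24, 25, 80]
banded = [band_age(a) for a in ages]
```

Wider bands mean more people share each value, which lowers the chance that a combination of characteristics is unique; the price is the loss of detail that the later slides describe.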

  11. EUL Files
     Individual SAR • Individual level file • 3% (c. 1.8M cases) • Regional (13 categories) • Ages 16-74 banded • 16 categories of ethnicity • 81 categories of occupation
     Small area microdata • Individual level file • 5% (c. 3M cases) • Local authority geography (< 90k) • 13 age bands (c. 10 years) • 13 categories of ethnicity • Only broad social class variable (economic activity 3 groups)

  12. Safe Users: The 2001 S-L Household SAR • Additional complexity of a household SAR required special license • No geography at all, and not available for Northern Ireland or Scotland • Age in 2-year bands • 16 categories of ethnicity • 81 categories of occupation

  13. Safe setting • To compensate for loss of detail in the end user and special license files • Same records as Individual and Household SARs but with MUCH more detail • Managed by the Census offices • Access currently at only a handful of census office sites • Virtual microdata laboratory environment, outputs manually checked prior to release to user • Access only permitted if this is the only available data source, for work in keeping with the aims of the Census Office

  14. Controlled Access Microdata
     Individual CAM • Individual level file • 3% (c. 1.8M cases) • Local authority – with context at lower level • Individual year of age to 90+ • 16 ethnicity categories • Over 200 categories of occupation
     Household CAM • Household hierarchy • 1% (c. 0.6M cases) • Local authority – with context at lower level • Individual year of age to 90+ • 16 ethnicity categories • Over 200 categories of occupation

  15. Conclusion • Have a range of research-worthy datasets by treating different user groups differently • Traded off: • Safe data • Safe users • Safe setting • http://www.ccsr.ac.uk/sars
