Access routes to 2001 UK Census Microdata: Issues and Solutions
Jo Wathan, SARs Support Unit, CCSR, University of Manchester, UK
Jo.wathan@manchester.ac.uk
UK Census context
• Traditional 10-yearly census at present
• Medium-length form (c. 30 person questions, c. 10 household questions)
• Ethnicity question plus an optional religion question
• No income question
• Legal framework in GB is the Census Act 1920
• No Statistics Act
• Legislation only deals with confidentiality restrictions: up to 2 years' imprisonment!
1991 SARs
• Samples of Anonymised Records (SARs) from 1991 were the first to be released
• Highly successful: c. 400 research papers used the data between 1993 and 2002. Also used in teaching.
• SARs are a commissioned output, paid for by the UK Economic and Social Research Council.
• The SARs Support Unit at CCSR represents the client, and disseminates and supports the data.
Disclosure Control 1991
After work had been undertaken to demonstrate low risk of disclosure:
• Users had to register to use them
• Some ‘broadbanding’ or grouping of rare categories
• Very large households (12+ residents) had individual detail suppressed
• 2 non-overlapping files for different interest groups:
  • One for geographers
  • One for sociologists/demographers
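The two record-level measures listed above (grouping rare categories, and suppressing individual detail in households of 12 or more residents) can be sketched as follows. This is only an illustration under stated assumptions, not the Census Offices' actual procedure: the record layout (`household_id` plus arbitrary detail fields), the `min_count` cut-off, and the `"Other"` label are all hypothetical. Only the 12-resident threshold comes from the slide.

```python
from collections import Counter

LARGE_HOUSEHOLD_THRESHOLD = 12  # "12+ residents", as stated on the slide


def group_rare_categories(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label.

    min_count and other_label are hypothetical; the slides do not say
    how 'rare' was defined.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def suppress_large_households(records, detail_fields,
                              threshold=LARGE_HOUSEHOLD_THRESHOLD):
    """Return copies of records with detail fields blanked for large households.

    records: list of dicts, each with a 'household_id' key (assumed layout).
    detail_fields: names of the individual-level fields to suppress.
    """
    sizes = Counter(r["household_id"] for r in records)
    result = []
    for record in records:
        copy = dict(record)  # leave the input untouched
        if sizes[record["household_id"]] >= threshold:
            for field in detail_fields:
                if field in copy:
                    copy[field] = None
        result.append(copy)
    return result
```

For example, `suppress_large_households(records, ["age", "occupation"])` would blank those two fields for every member of a 12-person household and leave smaller households unchanged.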
What did the 1991 SARs look like?
Household SAR
• Household hierarchy
• 1% (c. 0.6M cases)
• Regional
• Individual year of age
• 10 ethnicity categories
• 358 categories of occupation
Individual SAR
• Individual level file
• 2% (c. 1.2M cases)
• Geography: population threshold 120k = 278 SAR areas
• Individual year of age
• 10 ethnicity categories
• 73 categories of occupation
Request for 2001 SARs
• New work on disclosure control showed that we had previously overestimated the risk of disclosure
• Requested larger sample size
• Slightly more geography
• A 3rd SAR for small areas
• However, a new, stricter interpretation of the degree of disclosure risk was required
• The level of detail initially available would not have provided files of sufficient use for research
Why?
• Census Office concerns:
  • Perceived increased levels of concern amongst respondents
  • Increased data processing power
  • Increased levels of storage of personal information that might be used to match to the data
• Major strategic review of data stewardship issues at the time that Census outputs were due for release
Principles
• Ongoing need for user consultation
• Recognise that different users require different levels of detail (and may be able to accept different conditions): trading detail and access against each other
• Trading different types of detail against each other: geography against socio-demographic detail, etc.
• Flexible approach to combining a range of access and disclosure approaches:
  • Safe Data
  • Safe Users
  • Safe Setting
• International role models were very helpful
Where we are now
• Have succeeded in obtaining access to:
  • End User License (Safe Data): 2 datasets which are accessible in the same way as in 1991, with less detail on some variables but enough detail for research purposes
  • Special License (Safe Users): 1 dataset available for distribution but with extra access conditions
  • Controlled Access Microdata (Safe Setting): much more detailed versions of 2 datasets available in a safe setting
Safe Data: End User License Files
• Standard online application procedure for those with an electronic signature (otherwise an equivalent paper system)
• Not public data!
• Available for very low risk files
• Risk reduced by:
  • Broadbanding (e.g. age, geography)
  • Perturbing data
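As a rough illustration of broadbanding, the sketch below collapses a single year of age into a band and top-codes the oldest ages. The function name and its defaults are hypothetical; the slides mention bands of about 10 years, 2-year bands and a "90+" top code for different files, but do not give the exact band boundaries that were used.

```python
def band_age(age, width=10, top_code=90):
    """Map a single year of age to a band label such as '30-39' or '90+'.

    width and top_code are illustrative defaults, not the actual
    banding scheme applied to the SARs.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top_code:
        return f"{top_code}+"
    lower = (age // width) * width
    # Stop the last band just below the top code.
    upper = min(lower + width - 1, top_code - 1)
    return f"{lower}-{upper}"


print(band_age(37))           # 10-year band
print(band_age(37, width=2))  # 2-year band
print(band_age(93))           # top-coded
```

Wider bands put more people in each cell, which is what lowers disclosure risk; the cost is the loss of detail that the other access routes are designed to recover.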
EUL Files
Individual SAR
• Individual level file
• 3% (c. 1.8M cases)
• Regional (13 categories)
• Ages 16-74 banded
• 16 categories of ethnicity
• 81 categories of occupation
Small area microdata
• Individual level file
• 5% (c. 3M cases)
• Local authority geography (< 90k)
• 13 age bands (c. 10 years)
• 13 categories of ethnicity
• Only broad social class variable (economic activity 3 groups)
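A quick arithmetic check, not part of the original slides, shows that the sampling fractions and approximate case counts quoted for the 1991 SARs and the 2001 EUL files are mutually consistent. The figures are the rounded "c." values from the slides, and the Household SAR is a sample of households rather than individuals, so this is only a rough sanity check.

```python
# (label, sampling percentage, approximate case count) as quoted on the slides
SAMPLES = [
    ("1991 Household SAR", 1, 600_000),
    ("1991 Individual SAR", 2, 1_200_000),
    ("2001 Individual SAR (EUL)", 3, 1_800_000),
    ("2001 Small area microdata (EUL)", 5, 3_000_000),
]


def implied_population(percent, cases):
    """Population implied by a sample of `cases` records drawn at `percent`%."""
    return cases * 100 // percent


for label, percent, cases in SAMPLES:
    print(f"{label}: {percent}% -> implied population {implied_population(percent, cases):,}")
```

Each quoted sample implies a base population of roughly 60 million.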
Safe Users: The 2001 Special License (S-L) Household SAR
• Additional complexity of a household SAR required a special license
• No geography at all, and not available for Northern Ireland or Scotland
• Age in 2-year bands
• 16 categories of ethnicity
• 81 categories of occupation
Safe setting
• To compensate for loss of detail in the End User and Special License files
• Same records as Individual and Household SARs but with MUCH more detail
• Managed by the Census Offices
• Access currently at only a handful of Census Office sites
• Virtual microdata laboratory environment; outputs manually checked prior to release to the user
• Access only permitted if this is the only available data source, for work in keeping with the aims of the Census Office
Controlled Access Microdata
Individual CAM
• Individual level file
• 3% (c. 1.8M cases)
• Local authority, with context at lower level
• Individual year of age to 90+
• 16 ethnicity categories
• Over 200 categories of occupation
Household CAM
• Household hierarchy
• 1% (c. 0.6M cases)
• Local authority, with context at lower level
• Individual year of age to 90+
• 16 ethnicity categories
• Over 200 categories of occupation
Conclusion
• Have a range of research-worthy datasets by treating different user groups differently
• Traded off:
  • Safe data
  • Safe users
  • Safe setting
• http://www.ccsr.ac.uk/sars