Large-scale Microdata workshop: An introduction to the SARs and ESDS Government Surveys
University of Plymouth, 15 April 2005
Jo Wathan & Reza Afkhami
Today
SARs:
• Introduction to 2001 Individual Licensed SARs
• Hands-on: Accessing the SARs in Nesstar
• Lunch 12:30
• Working with the Individual Licensed SAR: Data quality and analysis issues
• Hands-on: The SARs in SPSS
• Coffee 14:45
• Further SARs Issues – CAMS, Household data, SAMs files, User support
ESDS Government Data
End 16:00
Introduction to the 2001 Licensed Individual SAR
• Background to data development
• Licensing
• Accessing the data
Census Microdata
• Census outputs have historically been aggregate tables – safe but inflexible
  • Can be obtained from:
    • ONS: http://www.statistics.gov.uk
    • Casweb: http://www.census.ac.uk/cdu
  • Well suited to analyses at small geographical detail
• Microdata permits more flexibility
• Longitudinal Survey links data from 1971: good for studying process, but has to be kept secure (http://www.celsius.lshtm.ac.uk/)
• Demand for a cross-sectional dataset that can be used on the user's own desktop
The 1991 Samples of Anonymised Records
• Available for the first time after research into the confidentiality risk
• Two samples
  • Individual SAR: detailed geography (large LAs), 2% sample
  • Household SAR: hierarchical, linked individuals, detailed occupational information, 1% sample
The Request for the 2001 Individual SAR
• Request sent in autumn 2001
• Following consultation with users and a confidentiality assessment, we asked for similar detail to 1991, e.g.:
  • 16 categories of ethnic group (or national equivalent)
  • SOC 2000 minor (81 categories)
• But with a 3% sample and more LADs
• ONS had greater concerns over confidentiality
• ‘Controlled Access Microdata Sample’: a more detailed file, available in a safe setting
Safe Data
• Subject to extensive disclosure control
  • Broad banding
  • Special uniques analysis
  • Further recodes
• Less detail than 1991 on:
  • Geography
  • Industry/occupation
  • Age
  • Country of birth
• Released October 2004
Second version of SARs
• ONS reconsidered confidentiality of SARs
• Current version of the data is version 2: it contains more detail than version 1
• Users must undertake to destroy version 1 before downloading version 2
Licensed file content: geographical
• Regional geography
  • GOR Region PLUS:
    • Inner/Outer London
    • Northern Ireland
    • Scotland
    • Wales
• Country of birth
  • 16 categories
  • Increased from version 1
Licensed file content: demographic
• Age banded (v.2)
  • Individual year to 15
  • 16-19; 20-24; 25-29; 30-44; 45-59; 60-64; 65-69
  • 70-74; 75-94 single years; 95+
• Ethnic group (v.2)
  • 16 categories (England and Wales)
  • 14 Scotland
  • 2 Northern Ireland
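The age banding above can be sketched as a small recode function. This is a minimal illustration, not the ONS recode: it assumes that "75-94 single years" means ages 75 to 94 are kept as individual years, and the function name `sar_age_band` and the label strings are hypothetical.

```python
# Grouped bands listed on the slide (inclusive lower and upper bounds).
BANDS = [(16, 19), (20, 24), (25, 29), (30, 44),
         (45, 59), (60, 64), (65, 69), (70, 74)]


def sar_age_band(age: int) -> str:
    """Map an exact age to a band label following the slide's scheme."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= 95:
        return "95+"  # top-coded
    if age <= 15 or age >= 75:
        return str(age)  # single year of age (assumed reading for 75-94)
    for low, high in BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise AssertionError("unreachable: bands cover ages 16 to 74")


for example in (7, 16, 44, 80, 101):
    print(example, "->", sar_age_band(example))
```

A user who needs coarser groups can collapse these labels further, but cannot recover exact ages inside the grouped bands.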
Licensed file content: socio-economic
• Occupation
  • SOC 2000 minor categories
• NS-SEC: 38 valid categories
• Industry
  • 15 categories A-O, P, Q
• Hours of work – single hours to 80+
New or Improved Data
• Improved highest qualification
  • 4 categories
• Religion – varies considerably by nation (v.2)
  • 9 categories in England and Wales
  • 7 in Scotland – current only
  • 7 in Northern Ireland, plus religion brought up in
• General health
  • Good / fairly good / not good
• Caring
  • Hours caring, 3 bands
  • Number of carers in household
Research value
• Ability to recode variables as wished
• Ability to select populations and variables
• Ability to conduct multivariate analysis
• Learning and Teaching
• Preliminary work before using in-house file (CAMS)
The Licence
• All users need to be licensed
• Academics complete the licence as part of the Census Registration System process
• Non-academic users sign the licence as part of the data registration process
• Cannot pass the data to an unlicensed user
• Cannot attempt to identify an individual
The licence – good practice
• Keep your data password protected
• Destroy your data when you have finished using it
• Remove SAR files before passing on your PC to someone else
• Tell CCSR about your publications
• Tell CCSR if you leave your institution
Access Arrangements
• Data distributed by CCSR
• Academics: no charge
  • Register for the data under the Census Registration System
  • Access the data online from the CCSR website
• Non-academics
  • Not for profit: £500 per file
  • Business users: £1000 per file
  • 10 users per application, incl. software
  • Download End User Licence from web
Accessing the data
• Non-academic users
  • Data available in NSDstat
  • Other formats available on CD
  • Can arrange direct download
• Academic users
  • Direct download (SPSS/Stata/tab delimited)
  • Nesstar: explore online and subset (wider range of formats available)
  • NSDstat available
Working with the 2001 Licensed Individual SAR
• Coverage and quality
• SAR data issues
• Analysing SAR data
• Software
Census coverage
• Major effort to improve coverage in 2001
• One Number Census (ONC)
  • Use of a large Census Coverage Survey (CCS) of 300K households to correct census results
  • Design independent of census
  • Used matched census and CCS data to estimate total population in each area
  • Adjusted all results for census non-response using imputation of households and individuals
• Results in a final database for the UK adjusted for non-response
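How matched census and coverage-survey counts can yield a population estimate is easiest to see with a small calculation. The slides do not name the estimator, so the sketch below assumes the standard dual-system (capture-recapture) formula, total ≈ census count × survey count / matched count. The function name and all figures are hypothetical and are not taken from the 2001 Census.

```python
def dual_system_estimate(census_count: int, survey_count: int, matched: int) -> float:
    """Estimate an area's true population from two independent counts.

    census_count: people counted by the census in the area
    survey_count: people counted by the coverage survey in the same area
    matched:      people found in both sources after record matching
    """
    if matched <= 0:
        raise ValueError("at least one matched record is needed")
    if matched > min(census_count, survey_count):
        raise ValueError("matched cannot exceed either count")
    return census_count * survey_count / matched


# Hypothetical figures for one area (illustration only).
estimate = dual_system_estimate(census_count=900, survey_count=100, matched=90)
census_coverage = 900 / estimate  # share of the estimated population the census counted
print(estimate, census_coverage)
```

The estimate relies on the two counts being independent, which is why the slide stresses that the survey design was independent of the census.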
Census coverage
• Coverage before imputation:
  • 94% of households returned forms, with another 4% estimated to be in households identified by enumerators
• Response rate lowest for:
  • Young people in their early 20s (men aged 20-24: response rate of 87%)
  • Inner London (response rate of 78%)
• Once imputed cases are included, coverage is estimated to be 100%
Population base
• One population base: usual residents
  • Differs from 1991, when users had to choose either the present or the usual resident base
• Students enumerated at term-time address
• Communal establishments are included
Implications for 2001 SARs
• 1991 SARs selected from 10% sample
  • Did not include imputed households
  • 96% coverage
• 2001 SARs selected from 100% ONC database
  • 94% response; 6% imputed
  • Imputed individuals/households are identified
  • Imputed items are flagged
Two kinds of imputation
• Entire individual or household may be imputed as part of ONC
  • Complete records copied from enumerated individuals/households
  • Variable oncperim
• Variables imputed when information missing
Edit
• 13.7 million edit procedures undertaken
• 28% of the population had 1+ items imputed
• Common:
  • Missing professional qualifications set to none
  • Carer set to no where missing (unless economic activity also missing)
  • Travel to work set to ‘work mainly at/from home’ where workplace was ‘mainly at/from home’
• Others:
  • 14k people multi-ticked ‘sex’ (so imputed)
  • 6k children had marital status changed to single
  • Impossible values set to missing then imputed
• Missing values are imputed on the basis of similar local cases
  • Does not remove unlikely values
Item imputation
For census output database as a whole:
• One or more items imputed for 28% of the population
• Employment variables most affected:
  • Industry ever worked: 18%
  • Occupation ever worked: 14%
  • Workplace size: 9%
• Under-enumerated groups are most imputed, esp. single people
Can I tell what/who has been imputed?
• oncperim records whether an individual has been imputed as part of the ONC
  • Copies entire record from census database
• ‘z’ variables identify whether an individual has imputed information on a specific variable
  • Parallel set of variables
  • zethew, zage0