The 2011 Census: Estimating the Population
Alexa Courtney
Overview • Background • New topics on questionnaire • Internet Data Collection • Delivery of data • Overview of 2011 processing • Discuss “downstream” processes
2011 Census • 27th March 2011 • Census Rehearsal: 11th October 2009 • Most complex UK Census • More questions and topics than previous censuses • Range of delivery and completion options • Similar to 2001 – keep what worked • Some new/modified methodologies • Operational and statistical • Wide range of outputs
Questionnaire • New topics • Citizenship • Second address • National identity • Language • Full census returns from short-term migrants • In UK for 3 months or more • Identified through intention to stay question
Questionnaire completion • Internet completion being offered for first time • Internet Access Code provided on front of paper questionnaire • Offers opportunities to improve data quality and reduce respondent burden • Automatic routing • Validation rules • Use of radio buttons • No unnecessary changes from paper questionnaire to minimise modal bias • Advantages and disadvantages • Reduces amount of editing required • Increases possibility of multiple responses
Data delivery • Can be split into three groups • Questionnaires returned within 6 weeks of Census day • Majority of data • Fully processed across UK • Matched to CCS • Questionnaires returned within 10 weeks of Census day • Fully processed in England, Wales & Northern Ireland • Questionnaires returned more than 10 weeks after Census day • May be used in coverage adjustment
Removing false persons • Problem identified in 2001 Census • Records created in error • Pages crossed out • Dust on scanner • “Two of Five” rule: a person record is kept only if at least two of five items are present • At least one of: Name (from individual questions) or Date of Birth, AND • At least one other of: Name (from individual questions), Date of Birth, Sex, Marital Status, or Name (from household members table) • Important for data quality and matching
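The “Two of Five” rule can be sketched as a small check. This is a minimal illustration, not the production rule: the field names (`name_individual`, `date_of_birth`, `sex`, `marital_status`, `name_household_table`) are hypothetical, and the reading that two distinct items are needed, one of them a name from the individual questions or a date of birth, is an assumption based on the slide wording.

```python
from typing import Mapping, Optional

# Hypothetical field names for the five items named on the slide.
PRIMARY = ("name_individual", "date_of_birth")
ALL_FIVE = PRIMARY + ("sex", "marital_status", "name_household_table")


def is_genuine_person(record: Mapping[str, Optional[str]]) -> bool:
    """Return True if the record passes the assumed 'Two of Five' check."""
    # An item counts as present only if it holds non-blank text.
    filled = {key for key in ALL_FIVE if (record.get(key) or "").strip()}
    has_primary = any(key in filled for key in PRIMARY)
    # At least two distinct items, one of which is a primary item.
    return has_primary and len(filled) >= 2
```

A record holding only Sex and Marital Status fails because neither primary item is present, which is the kind of stray record a crossed-out page or scanner dust might produce.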
Multiple response resolution • Overcount • Several types of multiple response • Two questionnaires from same household • Two paper questionnaires • Paper and Internet • Person on same questionnaire twice (or more!) • Person on Household and Individual questionnaire • Person on Household and Internet questionnaire • Needs to be a quick process
Multiple response resolution • Duplicate households identified when receipted • Questionnaire tracking for England, Wales & Northern Ireland • Matched questionnaire IDs and address in Scotland • Resolved by matching people within household • Key variables: Name (or soundex), Date of Birth, Sex • If Age <30, name must match exactly • Minimise risk of matching twins • If no people match, two household records created • If any people match, questionnaires merged
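The person-matching step above might look like the following sketch. The `Person` fields, the separate forename and surname comparison, and the requirement that Date of Birth and Sex agree exactly are assumptions for illustration; the Soundex function is the common American Soundex variant, which may differ from the one used in processing.

```python
from dataclasses import dataclass
from typing import List

# Letter-to-digit table for standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return ("".join(out) + "000")[:4]


@dataclass(frozen=True)
class Person:  # hypothetical record layout
    forename: str
    surname: str
    dob: str
    sex: str
    age: int


def same_person(a: Person, b: Person) -> bool:
    if a.dob != b.dob or a.sex != b.sex:
        return False
    fa, fb = a.forename.strip().lower(), b.forename.strip().lower()
    sa, sb = a.surname.strip().lower(), b.surname.strip().lower()
    if fa == fb and sa == sb:
        return True
    if min(a.age, b.age) < 30:  # under 30: exact name only, to avoid matching twins
        return False
    return soundex(fa) == soundex(fb) and soundex(sa) == soundex(sb)


def resolve(q1: List[Person], q2: List[Person]) -> str:
    """'merge' if any person appears on both questionnaires, else 'separate'."""
    matched = any(same_person(a, b) for a in q1 for b in q2)
    return "merge" if matched else "separate"
```

Two ten-year-olds named Jon and John Smith with the same date of birth stay separate under this sketch, while the same pair of names at age 40 would be treated as one person.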
Multiple response resolution • Merging questionnaires • “Most complete” response kept • Missing variables copied from duplicate record(s) • Priority given to individual questionnaires • Process for within postcode multiples • People completing neighbour’s questionnaire • Similar principles for resolution
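The merging step can be illustrated with a short sketch. The record layout, the `source` field, and the exact ordering (individual questionnaires first, then the number of filled variables) are assumptions; the slide only states that the most complete response is kept, gaps are filled from duplicates, and individual questionnaires take priority.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def filled_count(record: Record) -> int:
    """Number of answered variables, ignoring the hypothetical 'source' tag."""
    return sum(1 for key, value in record.items()
               if key != "source" and value not in (None, ""))


def merge_duplicates(records: List[Record]) -> Record:
    """Merge duplicate records for one person (expects at least one record)."""
    # Assumed priority: individual questionnaire first, then most complete.
    ranked = sorted(
        records,
        key=lambda r: (r.get("source") == "individual", filled_count(r)),
        reverse=True,
    )
    merged = dict(ranked[0])
    for duplicate in ranked[1:]:
        for key, value in duplicate.items():
            # Copy only variables that are still missing in the kept record.
            if merged.get(key) in (None, "") and value not in (None, ""):
                merged[key] = value
    return merged
```

Answers already present in the kept record are never overwritten, so conflicting values between duplicates are left for the later edit stage in this sketch.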
Filter Rules • Based on 2001 Rules • Used to identify incorrect/unnecessary responses • Deterministic – based on other responses • Used to prepare data for main edit & imputation • e.g. Person aged <16, economically inactive (student) • e.g. Person employed, not looking for work
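A deterministic filter rule is simply a fixed mapping from some responses to others. The sketch below encodes one possible reading of the two examples above; the variable names and category labels are hypothetical, and the interpretation (under-16s set to economically inactive student, employed people set to not looking for work) is an assumption rather than the documented rule set.

```python
def apply_filter_rules(person: dict) -> dict:
    """Return a copy of the record with the two example rules applied."""
    out = dict(person)
    age = out.get("age")
    # Assumed rule 1: anyone under 16 is treated as an economically inactive student.
    if age is not None and age < 16:
        out["economic_activity"] = "inactive_student"
    # Assumed rule 2: an employed person is recorded as not looking for work.
    if out.get("economic_activity") == "employed":
        out["looking_for_work"] = "no"
    return out
```

Because each rule depends only on other responses in the same record, running it twice gives the same result, which is what makes it suitable as a preparation step before edit and imputation.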
Edit and Imputation • Will use CANCEIS system • Resolves inconsistent data • Probabilistic • Programmed with all possible inconsistencies • Imputes missing data • Based on complete records • Searches for similar donor • Ensures complete and consistent data
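The donor search can be illustrated with a much simplified nearest-neighbour sketch. This is not the CANCEIS algorithm: the distance used here (a count of disagreeing variables), the tie-break (first donor found), and the field names are all assumptions chosen to show the idea of borrowing values from a similar complete record.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    return value in (None, "")


def impute_from_donor(recipient: Record, pool: List[Record],
                      fields: List[str]) -> Record:
    """Fill the recipient's missing fields from the most similar complete record."""
    donors = [r for r in pool if not any(is_missing(r.get(f)) for f in fields)]
    if not donors:
        return dict(recipient)  # nothing to borrow from
    known = [f for f in fields if not is_missing(recipient.get(f))]

    def distance(donor: Record) -> int:
        # Number of observed variables on which donor and recipient disagree.
        return sum(donor[f] != recipient[f] for f in known)

    donor = min(donors, key=distance)  # ties resolved by pool order
    result = dict(recipient)
    for f in fields:
        if is_missing(result.get(f)):
            result[f] = donor[f]
    return result
```

Only complete records are eligible as donors, which mirrors the "based on complete records" point above; observed values in the recipient are left unchanged.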
Output flags • Non-standard outputs possible for England & Wales • Use information on Second Residences • Population staying in UK 3-12 months identified • Exclusion from standard outputs • Production of specific outputs • England, Wales and Northern Ireland only • Considering including this population in coverage adjustment • Mark records now to enable easy production of these outputs
Coverage assessment process • Diagram components: 2011 Census, Census Coverage Survey, Matching, Estimation, Adjustment, Quality Assurance
Disclosure Control – Options • Necessary to protect confidentiality of respondents • Three options were short-listed • Pre-tabular: record swapping • Small number of records swapped across areas • Adds uncertainty to “unique” records • Pre-tabular: over-imputation • Some variables deleted and re-imputed • Post-tabular: Invariant ABS Cell Perturbation (IACP) • Small counts can be altered • Two stage process to ensure “additivity”
Disclosure Control – Chosen Methodology • Pre-tabular method recommended • User preference for consistency between tables • IACP method rejected • Record swapping chosen instead of over-imputation • No persons or data items removed • Outputs at national level and high geographies unaffected
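Record swapping can be sketched as exchanging the area codes of a small random sample of household records. This is a toy illustration only: the `area` field, the swap rate, and purely random pairing are assumptions, and a real scheme would normally pair households with similar characteristics in different areas.

```python
import random
from typing import Dict, List


def swap_records(households: List[Dict], rate: float, seed: int = 0) -> List[Dict]:
    """Return copies of the records with a fraction of area codes swapped in pairs."""
    rng = random.Random(seed)
    out = [dict(h) for h in households]  # leave the input untouched
    n = int(len(out) * rate)
    n -= n % 2  # swapping works on pairs, so use an even number of records
    chosen = rng.sample(range(len(out)), n)
    for i, j in zip(chosen[::2], chosen[1::2]):
        # Exchange geography only; every other data item stays with its record.
        out[i]["area"], out[j]["area"] = out[j]["area"], out[i]["area"]
    return out
```

Each swap is one-for-one, so no records are added or removed and the number of households per area is unchanged, which is consistent with outputs at higher geographies being unaffected. In this sketch a randomly chosen pair may share an area, in which case that swap has no effect.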
Outputs • Main base will be Usual Residents • All people living in UK for 12 months or more • Consistent across UK • First outputs – September 2012 • Other standard outputs by Spring 2013 • ONS producing non-standard outputs • e.g. Weekday population, Majority of time • Consultation to decide exactly what • Outputs on short-term migrant population • All people living in UK for 3-12 months • England, Wales and Northern Ireland
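The two population bases above can be expressed as a simple flag on each record. This is a simplified sketch: the input (`months_in_uk`, the total actual or intended length of stay) and the label names are assumptions, and people staying under 3 months are labelled `other` because the slides do not say how they are treated.

```python
def population_base(months_in_uk: float) -> str:
    """Assign an output flag from total (actual or intended) length of stay."""
    if months_in_uk >= 12:
        return "usual_resident"      # main output base
    if months_in_uk >= 3:
        return "short_term_migrant"  # 3-12 months, reported separately
    return "other"                   # not covered by the slides
```

Setting this flag once during processing lets the standard outputs filter on `usual_resident` while the short-term migrant outputs use the second category.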