
Beyond 2011 The future for population statistics? IMA Mathematics 2012 Pete Benton


Presentation Transcript


  1. Beyond 2011: The future for population statistics? IMA Mathematics 2012. Pete Benton, Beyond 2011 Programme Director, Office for National Statistics

  2. Outline • Background to the Census • The Beyond 2011 Programme • Statistical options for the future • Key mathematical challenges • Timeframes • Next steps

  3. The purpose of the census • The basis for national decision making: • Service planning • where to locate schools, hospitals, etc. • housing plans • transport • Resource allocation • health and local government: £100bn each per year • Policy making and monitoring • Equality – age, sex, ethnicity, disability • Ageing population – pensions etc. • Academic and social research

  4. Key Census outputs • Benchmark statistics on: • Population units: • people and housing • with key demographics (age, sex, ethnicity) • Population structures: • households, families • Population and housing attributes • For small areas and small population groups • With multivariate analysis • Consistent and comparable

  5. The 2011 Census • Very successful • - 94% response overall • - Over 90% across London • - Over 80% response in every Local Authority • Significant improvement in key Local Authorities • The result of extensive mathematical modelling • - Response targets to achieve required output quality • - Predicted initial response from key groups / areas • - Numbers of field staff required to reach final targets • - Daily live response rate modelling to support operational decisions

  6. The Beyond 2011 Programme • Why change? – Why look beyond 2011? • Rapidly changing society • Evolving user requirements • New opportunities – data sharing • Traditional census – costly and infrequent?? • UK Statistics Authority to Minister for Cabinet Office • “As a Board we have been concerned about the increasing costs and difficulties of traditional Census-taking. We have therefore already instructed the ONS to work urgently on the alternatives, with the intention that the 2011 Census will be the last of its kind.”

  7. Beyond 2011: Statistical options • Census options: Traditional Census (long form to everyone); Rolling Census (over a 5/10 year period); Short form (everyone), long form (sample); Short form + annual survey (US model) • Administrative data options: Aggregate analysis; (Intermediate) sample linkage, e.g. 1% of postcodes; 100% linkage to create a ‘statistical population spine’ • Survey option(s): Address register + survey

  8. Beyond 2011 – statistical options (diagram running from frame and sources, through data and estimation, to outputs) • Frame: maintained national address gazetteer (address register), providing the frame for population data & surveys • Sources: census; administrative sources; commercial sources? (increasing later?); socio-demographic survey(s); surveys to fill gaps • Data: population data; socio-demographic attribute data; longitudinal data • Estimation: adjusting for missing data and error; coverage assessment, including under- & over-coverage, by survey and admin data?; population distribution provides weighting for attributes; adjusting for non-response bias in survey (or sources); quality measurement • Outputs (all, national to small area): population estimates; household structure etc.; household attribute estimates; socio-demographic; communal; interactional analysis, e.g. TTWA (travel to work areas)

  9. Potential data sources • Population data • NHS Patient Register • DWP/HMRC Customer Information System • Electoral roll (> 17 yrs) • School Census (5-16 yrs) • Higher Education Statistics Agency data (Students) • Birth and Death registrations • Socio-demographic sources • Surveys • DVLA? • Commercial sources? • Utilities? • TV licensing?

  10. DWP CIS population counts compared with ONS Mid Year population estimates

  11. Patient Register population counts compared with ONS Mid Year population estimates

  12. Electoral Roll population counts compared with ONS Mid Year population estimates

  13. Coverage of main administrative sources (diagram comparing each source with the resident population) • Patient Register Data (PRD): missing includes migrants not (yet) registered; newborn babies; some private-only patients • Higher Education Students (HESA): missing includes non higher education students; independent university students; extras include some duplicates; international students on short-term courses; students who have ceased studying but not formally deregistered • UK Driving Licence (DVLA): missing includes non-drivers; under 17s; some foreign-licence holders • Customer Information System (CIS): missing includes some migrant worker dependants; some international students; undocumented asylum seekers • Electoral Roll (ER): missing includes under 17s; ineligible voters; non-responders • School Census (SC): missing includes non school-aged people; independent school children; home-schooled children; extras include short-term migrant children • Extras for the other sources include: multiple registrations or duplicates; some ex-pats; some deceased; short-term migrants

  14. Key risks of non census alternatives • Public opinion • Technical challenge • Changes in administrative datasets • UK harmonisation • Getting a decision

  15. Key mathematical challenges • Methods for Production of statistics • Coverage assessment and adjustment • Data matching • Correcting for missing data • Small area population attribute modelling • Methods for Protection of confidentiality • Data pre-processing and encryption • Statistical Disclosure Control • Evaluation • Quantifying financial benefits • Defining what is an ‘acceptable’ level of quality

  16. Coverage assessment • How many fish in your pond? • Day 1, catch 100, tag them, put them back • Day 2, catch 50, find 25 already tagged • How many fish in your pond? • Answer: 200 (ish) • According to day 2, half of the fish in the pond are marked • We marked 100, so there must be about 200 altogether • “Dual System Estimation”
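The pond arithmetic is the classic two-sample capture-recapture (Lincoln-Petersen) estimator. A minimal Python sketch, with a hypothetical function name, reproduces the slide's numbers:

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Estimate total population size from a two-sample capture-recapture study.

    The share of tagged fish in the second catch (recaptured / caught_second) is
    taken as an estimate of the share of the whole pond that is tagged
    (marked_first / N); solving for N gives N = marked_first * caught_second / recaptured.
    """
    if recaptured == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return marked_first * caught_second / recaptured


# Day 1: catch and tag 100; Day 2: catch 50, of which 25 are already tagged.
estimate = lincoln_petersen(marked_first=100, caught_second=50, recaptured=25)
print(estimate)  # 200.0
```

The estimate leans on the same assumptions as the census application: no fish arrive or leave between the two catches, and being caught on day 2 does not depend on having been tagged on day 1.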

  17. Application to the census • We ‘fish’ twice, in 1% of postcodes • Census • Then census coverage survey (CCS) 6 weeks later • No need for tags • They have names, addresses, dates of birth • We match the two separate lists of people (500k) to work out • What percentage of people in the CCS had first been ‘caught’ in the census • Thus, the total population in each postcode
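A toy version of the matching step, in Python. The records, postcodes and function name are invented for illustration, and people are matched on exact (postcode, name, date of birth) keys, whereas real census-to-CCS matching has to cope with misspellings, missing fields and people who have moved. The per-postcode estimate is the same dual system formula as the fish example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (postcode, name, date of birth)

# Hypothetical records from the census and the CCS for two sampled postcodes.
census: List[Record] = [
    ("PC-A", "John Smith", "1970-01-01"),
    ("PC-A", "Mary Smith", "1972-05-09"),
    ("PC-A", "Ali Khan", "1985-11-30"),
    ("PC-B", "Sara Jones", "1990-03-14"),
    ("PC-B", "Tom Jones", "1988-07-21"),
]
ccs: List[Record] = [
    ("PC-A", "John Smith", "1970-01-01"),
    ("PC-A", "Mary Smith", "1972-05-09"),
    ("PC-A", "Li Wei", "1995-02-02"),  # missed by the census
    ("PC-B", "Sara Jones", "1990-03-14"),
]


def dse_by_postcode(census: List[Record], ccs: List[Record]) -> Dict[str, float]:
    """Dual system estimate for each postcode, matching people on exact keys."""
    census_keys = set(census)
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"census": 0, "ccs": 0, "both": 0})
    for record in census:
        counts[record[0]]["census"] += 1  # n1+: counted by the census
    for record in ccs:
        counts[record[0]]["ccs"] += 1  # n+1: counted by the CCS
        if record in census_keys:
            counts[record[0]]["both"] += 1  # n11: found in both lists
    # n++ = n1+ * n+1 / n11; postcodes with no matches are skipped (estimate undefined)
    return {pc: c["census"] * c["ccs"] / c["both"] for pc, c in counts.items() if c["both"] > 0}


print(dse_by_postcode(census, ccs))  # {'PC-A': 4.5, 'PC-B': 2.0}
```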

  18. Coverage adjustment • Apply the adjustment factor to the other 99% of postcodes where we did no CCS • With appropriate stratification • Add ‘synthetic’ records • Extra households • Extra people • With the right key characteristics • In roughly the right locations • Using ‘Donor imputation’ to complete each record • So that all the final tables add up to the right number
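One simple way to carry the CCS results over to unsampled postcodes is a ratio estimator within each stratum: the ratio of estimated to counted population in a stratum's sampled postcodes is applied to the census counts of its unsampled postcodes. The sketch below uses hypothetical postcodes, strata and counts, and covers only this count adjustment, not the synthetic records or donor imputation described on the slide.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical postcodes: stratum label, census count, and (only for postcodes in the
# coverage survey sample) a dual system estimate; None marks an unsampled postcode.
postcodes: Dict[str, dict] = {
    "P1": {"stratum": "easy", "census": 40, "dse": 42.0},
    "P2": {"stratum": "easy", "census": 60, "dse": 63.0},
    "P3": {"stratum": "hard", "census": 50, "dse": 60.0},
    "P4": {"stratum": "easy", "census": 80, "dse": None},
    "P5": {"stratum": "hard", "census": 30, "dse": None},
}


def adjustment_factors(postcodes: Dict[str, dict]) -> Dict[str, float]:
    """Ratio of estimated to counted population over the sampled postcodes of each stratum."""
    dse_total: Dict[str, float] = defaultdict(float)
    census_total: Dict[str, float] = defaultdict(float)
    for pc in postcodes.values():
        if pc["dse"] is not None:
            dse_total[pc["stratum"]] += pc["dse"]
            census_total[pc["stratum"]] += pc["census"]
    return {s: dse_total[s] / census_total[s] for s in dse_total}


def adjusted_estimates(postcodes: Dict[str, dict]) -> Dict[str, float]:
    """Keep the DSE where one exists; otherwise scale the census count by its stratum's factor."""
    factors = adjustment_factors(postcodes)
    return {
        name: pc["dse"] if pc["dse"] is not None else pc["census"] * factors[pc["stratum"]]
        for name, pc in postcodes.items()
    }


print(adjustment_factors(postcodes))  # easy about 1.05, hard about 1.2
print(adjusted_estimates(postcodes))  # P4 about 84, P5 about 36
```

Stratifying matters because undercount differs between kinds of area: a single national factor would overstate the population of easy-to-count postcodes and understate that of hard-to-count ones.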

  19. Dual system estimation - formulae
  • Counts cross-classified by whether each person was counted by the Census and by the CCS:

                             Counted by CCS?
                             Yes     No      TOTAL
      Counted by     Yes     n11     n10     n1+
      Census?        No      n01     n00     n0+
                     TOTAL   n+1     n+0     n++

  • Total population (estimated): $\hat{n}_{++} = \dfrac{n_{1+}\, n_{+1}}{n_{11}}$
  • We can make life very complicated for people who aren’t mathematicians!
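The formula can be checked by filling in the whole table. Only n11, n10 and n01 are observed; under the usual DSE assumption that being counted by the Census is independent of being counted by the CCS, the unobserved cell is estimated as n00 = n10 n01 / n11, and the four cells then add up to n1+ n+1 / n11. A small Python sketch (the function name is hypothetical):

```python
from typing import Dict


def complete_dse_table(n11: int, n10: int, n01: int) -> Dict[str, float]:
    """Fill in the 2x2 dual system table from its three observed cells.

    n11: counted by both Census and CCS
    n10: counted by the Census only
    n01: counted by the CCS only
    """
    n1_plus = n11 + n10                # total counted by the Census
    n_plus1 = n11 + n01                # total counted by the CCS
    n00_hat = n10 * n01 / n11          # estimated number missed by both
    n_total = n1_plus * n_plus1 / n11  # DSE total, n++ = n1+ n+1 / n11
    return {"n1+": n1_plus, "n+1": n_plus1, "n00": n00_hat, "n++": n_total}


table = complete_dse_table(n11=80, n10=20, n01=8)
print(table)  # {'n1+': 100, 'n+1': 88, 'n00': 2.0, 'n++': 110.0}
# The four cells add up to the DSE total: 80 + 20 + 8 + 2.0 == 110.0
```

In the fish example, n1+ = 100 tagged, n+1 = 50 caught on day 2 and n11 = 25 recaptured, giving the same 200.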

  20. Application to administrative data • Administrative data sources also have undercount • But the bigger problems are due to time lags • - Emigration; deaths • Results in overcount in administrative sources • - Internal migration • Results in people recorded in the wrong location • - overcount in one area, undercount in another • Just applying Dual System Estimation would result in significant over-estimation
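Why overcount matters can be seen with invented numbers. Suppose an area truly has 1,000 residents. An administrative source correctly records 950 of them but also holds 100 records for people who have emigrated, died or moved away. A coverage survey finds 800 residents, 760 of whom match to the source (800 × 95%, assuming independence). The erroneous records can never match a survey respondent, so they inflate the source total without inflating the matches. The function name below is hypothetical.

```python
def dual_system_estimate(source_total: float, survey_total: float, matched: float) -> float:
    """n++ = n1+ * n+1 / n11."""
    return source_total * survey_total / matched


admin_correct = 950    # residents correctly recorded in the admin source
admin_erroneous = 100  # emigrants, deceased, people who have moved away
survey_total = 800     # residents found by the coverage survey
matched = 760          # survey residents also found in the admin source (800 * 0.95)

clean = dual_system_estimate(admin_correct, survey_total, matched)
naive = dual_system_estimate(admin_correct + admin_erroneous, survey_total, matched)
print(round(clean), round(naive))  # 1000 1105
```

The 100 stale records become roughly 105 extra people in the estimate, because DSE also scales them up for undercount.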

  21. Potential overcount estimation approaches (1) • Redesigned coverage survey asking: • who usually lives here? • when did you move in? • where are you registered to vote? • where are you registered with a GP? • who lived here before you? • where do they live now? • does John Smith still live here? • Moving down this list: increasing sensitivity, reducing appropriateness / legality

  22. Potential overcount estimation approaches (2) • Match new coverage survey to admin data • Measure coverage patterns, develop models • Intermediate model • Match records only in CS postcodes • Full linkage model • Match records in all sources across all postcodes • Keep records if same location on all datasets • => more likely to be correct • Particularly if recently recorded ‘activity’ • Develop intelligent rules to resolve residual records • Reduces scale of overcount - but increases undercount
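The "keep records if same location on all datasets" rule can be sketched in Python as below. The linked records, person IDs and postcodes are hypothetical, the source labels follow the abbreviations used earlier (PRD, CIS, ER), and the "intelligent rules" for residual records are not attempted. Note how person_3, who is absent from the electoral roll (for example an ineligible voter), ends up in the residual: the strict rule trades overcount for undercount.

```python
from typing import Dict, List, Optional, Tuple

Locations = Dict[str, Optional[str]]  # source -> postcode (None if not found in that source)

# Hypothetical output of full linkage across three sources.
linked: Dict[str, Locations] = {
    "person_1": {"PRD": "P1", "CIS": "P1", "ER": "P1"},
    "person_2": {"PRD": "P1", "CIS": "P2", "ER": None},  # sources disagree
    "person_3": {"PRD": "P3", "CIS": "P3", "ER": None},  # not on the electoral roll
    "person_4": {"PRD": "P2", "CIS": "P2", "ER": "P2"},
}


def resolve(linked: Dict[str, Locations],
            sources: Tuple[str, ...] = ("PRD", "CIS", "ER")) -> Tuple[Dict[str, str], List[str]]:
    """Keep a person only if every source records them at the same postcode."""
    kept: Dict[str, str] = {}
    residual: List[str] = []
    for person, locations in linked.items():
        values = [locations.get(source) for source in sources]
        if None not in values and len(set(values)) == 1:
            kept[person] = values[0]
        else:
            residual.append(person)  # left for further rules or modelling
    return kept, residual


kept, residual = resolve(linked)
print(kept)      # {'person_1': 'P1', 'person_4': 'P2'}
print(residual)  # ['person_2', 'person_3']
```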

  23. Small Area Estimation • Surveys only give sufficient precision at relatively high levels of geography • Users require information at lower levels • Census ‘output area’ ~ 125 households / 300 people • SAE - family of methods to increase precision of survey estimates at lower geographies • by “borrowing strength” from other, more detailed data sources, or neighbouring areas • Widely used by National Statistical Institutes • e.g. unemployment, income, households in poverty • - but generally univariate, estimating means
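"Borrowing strength" can be illustrated with an area-level composite estimator of the Fay-Herriot type: a weighted average of the direct survey estimate and a synthetic estimate predicted from auxiliary data, where the weight on the direct estimate shrinks as its sampling variance grows. This is one member of the SAE family, not necessarily the method ONS would adopt; the synthetic estimate and model variance are assumed to have been estimated already, and all numbers and names are hypothetical.

```python
from typing import Tuple


def composite_estimate(direct: float, direct_var: float,
                       synthetic: float, model_var: float) -> Tuple[float, float, float]:
    """Shrink a direct small-area estimate towards a synthetic (model-based) one.

    gamma = model_var / (model_var + direct_var) is the weight on the direct estimate,
    so areas with small samples (large direct_var) lean on the synthetic estimate.
    Returns (estimate, gamma, approx_mse), where approx_mse = gamma * direct_var is the
    mean squared error when model_var and the synthetic estimate's regression
    coefficients are treated as known (it ignores the error from estimating them).
    """
    gamma = model_var / (model_var + direct_var)
    estimate = gamma * direct + (1 - gamma) * synthetic
    return estimate, gamma, gamma * direct_var


# Hypothetical small area: a noisy direct estimate of 30% from a tiny sample,
# and a synthetic estimate of 20% predicted from administrative covariates.
estimate, gamma, mse = composite_estimate(direct=0.30, direct_var=0.01,
                                          synthetic=0.20, model_var=0.0025)
print(f"weight on direct = {gamma:.2f}, estimate = {estimate:.3f}, MSE = {mse:.4f}")
# weight on direct = 0.20, estimate = 0.220, MSE = 0.0020
```

Under these assumptions the composite's approximate MSE (0.002) is a fifth of the direct estimate's variance (0.01); how much is gained in practice depends on how well the auxiliary data predict the variable, which is the slide's point about data quality.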

  24. Precision of direct survey outputs
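The slide's chart is not reproduced here, but the underlying arithmetic is simple: for a simple random sample, the standard error of an estimated proportion p is sqrt(p(1 - p) / n), so precision falls quickly as a survey's sample is split across smaller areas. The sample sizes below are hypothetical, and the calculation ignores design effects and finite population corrections.

```python
import math


def direct_se(p: float, n: int) -> float:
    """Standard error of a proportion estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical respondents per area, for a characteristic held by 20% of people.
for area, n in [("region", 10_000), ("local authority", 400), ("output area", 5)]:
    se = direct_se(0.2, n)
    print(f"{area:<16} n={n:>6}  SE={se:.3f}  95% CI = 0.200 +/- {1.96 * se:.3f}")
```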

  25. Potential components • (Very?) Large survey • Administrative sources • aggregate (area based) or unit record • available for lower geographic levels than survey outputs • Possible models • Generalised Linear Models (GLM): • multi-level models • spatial / temporal extensions can add power • Bayesian or frequentist estimation frameworks • Micro-simulation

  26. Small area modelling - issues • Quality of ancillary data is absolutely critical • Most existing applications use census covariates • More powerful models incorporate time and space effects, but are more complex • Every variable is different, and requires different models • There’s often no substitute for geography as a predictor • ‘similar people gather in similar areas’ • BUT clear academic view – the methods exist, it just depends on data

  27. Beyond 2011 - Timeline - the key decision (chart, 2011 to 2023) • 2011 to 2014: Beyond 2011 ‘Phase 1’: initiation, research / definition • September 2014: recommendation & decision point • Traditional census solution, 2015 to 2023: detailed design; procure / develop; develop / test; rehearse; run; outputs • Admin data solution, 2015 to 2023: population estimates; population characteristics; detailed design; procure / develop; develop / test; outputs

  28. Beyond 2011 - Timeline (non census solution) (chart, 2011 to 2024) • Stages: initiation; research / definition; population estimates; population characteristics; detailed design; procure / develop; develop / test; outputs • Coverage surveys: testing; continuous assessment • Attribute surveys: info from existing surveys, e.g. Labour Force Survey, Integrated Household Survey etc., supplemented by new targeted surveys as required • Linkage: increasing linkage over time • Modelling: increasing modelling over time • Address register: required on an ongoing basis, ideally the National Address Gazetteer, subject to confirmation of quality • Admin sources: public sector & commercial (?), developing over time

  29. Beyond 2011 - and into the future (chart, 2023 to 2038) • Regular production of population and attribute estimates • Ongoing methodology refinement • Address register required on an ongoing basis • Administrative sources will change and disappear, and be added & develop over time • Continuous coverage survey • Need for attribute surveys declines over time • Existing surveys (?) • Increasing linkage over time • Increasing modelling over time

  30. Improving quality & quantity (chart with milestones 2013, 2021 and 2031) • Improving over time: accuracy of population estimates; accuracy of characteristics estimates; range of topics; small area detail; multivariate small area detail • Experimental statistics develop to become national statistics

  31. Statistical benefit profile (chart): Census vs alternative method, 2011 to 2041, showing periods of loss and gain

  32. Cost profile, real terms (chart): Census vs alternative method (alternative method's cost marked ???), 2011 to 2041

  33. Next steps • Research potential methods and models • Using census data • To understand coverage patterns in admin data • To simulate new survey designs • As a gold standard – how well can we replicate census results? • Assess quality, costs, benefits, risks • Discuss with stakeholders (!) • Public acceptability research • Report progress every six months • Make recommendations in 2014

  34. Advice and assistance very welcome!
