
Enhancing Census Accuracy: 2011 Survey Improvements Review

Explore key changes and methodology updates for improving UK census estimates. Learn about coverage assessment strategies, overcount analysis, and bias adjustments for more precise data.


Presentation Transcript


  1. 2011 CENSUS Coverage Assessment – What’s new? OWEN ABBOTT

  2. AGENDA
  • Background
  • Coverage in the 2001 Census
  • 2011 methodology overview
  • Key changes
  • Summary

  3. WHAT IS THE PROBLEM?
  • Despite best efforts, the census will not count every household or person
  • It will also count some people twice
  • Users need robust census estimates – raw counts are not enough
  • In 2001:
    • the One Number Census (ONC) methodology was developed to measure undercount
    • an estimated 1.5 million households were missed
    • around 3 million persons were missed (most from the missing households, but some from counted households)
    • subsequent studies estimated a further 0.3 million missed
  • In 2011 we want to build on the ONC, as it was broadly successful

  4. 2001 CENSUS UNDERCOUNT BY AGE-SEX

  5. RESPONSE RATES BY LOCAL AUTHORITY

  6. COVERAGE ASSESSMENT PROCESS OVERVIEW
  • Census Coverage Survey
  • 2011 Census
  • Matching
  • Quality Assurance
  • Estimation
  • Adjustment

  7. AREAS OF IMPROVEMENT
  • Elements of the CCS design
  • Estimation methodology
  • Measuring overcount
  • Adjustments for bias in the dual system estimator (DSE)
  • Imputation
  • Motivated by:
    • lessons learnt from 2001
    • the 2011 Census design, e.g. use of the internet

  8. THE CCS DESIGN
  • Similar to the 2001 CCS:
    • 300,000 households
    • a sample of small areas (postcodes)
    • carried out 6 weeks after Census Day
    • fieldwork almost identical
  • Improvements:
    • designed at LA level, not for groups of LAs
    • refined hard-to-count (HtC) index (5 levels) using up-to-date data sources
    • use Output Areas (OAs) as primary sampling units (PSUs)
    • select 3 postcodes per OA
    • revised allocation of the sample (using 2001 coverage patterns)

  9. THE CCS DESIGN (2)
  • What does this mean?
    • Each LA will have its own sample – at least 1 OA for each hard-to-count level
    • The sample is more heavily skewed towards LAs with the hardest-to-count populations (with an upper limit of 60 OAs per LA)
    • More LAs will have estimates based on their own data, especially in London and the big cities
    • The HtC index will be up to date
    • Most LAs will have 3 HtC levels (most London areas had only one in 2001)
    • Looking at a 40%, 40%, 10%, 8%, 2% population distribution across the five HtC levels
  • A code sketch of this two-stage selection follows below
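
A minimal sketch of how such a stratified two-stage selection could be expressed, assuming a simple in-memory frame. The field names ("oa", "htc", "postcodes"), the allocation dictionary (which would encode the 40/40/10/8/2 split and the 60-OA cap per LA) and the seed are illustrative placeholders, not the actual ONS sample design.

    import random

    # la_frame: {la_name: [{"oa": id, "htc": level 1-5, "postcodes": [...]}, ...]}
    # allocation: {(la_name, htc_level): target number of OAs} -- assumed inputs
    def select_ccs_sample(la_frame, allocation, postcodes_per_oa=3, seed=2011):
        """Two-stage selection: sample OAs within each LA by hard-to-count
        stratum, then sample 3 postcodes within each selected OA."""
        rng = random.Random(seed)
        sample = []
        for la, oas in la_frame.items():
            for htc_level in range(1, 6):
                stratum = [oa for oa in oas if oa["htc"] == htc_level]
                if not stratum:
                    continue
                # At least one OA per HtC level present in the LA.
                n_oas = max(1, min(allocation.get((la, htc_level), 1), len(stratum)))
                for oa in rng.sample(stratum, n_oas):
                    chosen = rng.sample(oa["postcodes"],
                                        min(postcodes_per_oa, len(oa["postcodes"])))
                    sample.append({"la": la, "oa": oa["oa"], "htc": htc_level,
                                   "postcodes": chosen})
        return sample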

  10. ESTIMATION
  • Obtained a lot of data from 2001 so we could explore whether improvements can be made
  • One key issue was whether we should group LAs by geography or by 'type'
  • Improvements:
    • confirmed that using the DSE at OA level is sensible
    • confirmed that we should group LAs by geography
    • use a simple ratio estimator (a worked example follows below)
    • confirmed that the LA-level estimation method is still best
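
A toy worked example of the two steps named above: a dual system estimate within one sampled area, then a simple ratio estimator to carry the sampled areas up to the LA. All counts are invented for illustration.

    def dual_system_estimate(census_count, ccs_count, matched_count):
        """Dual system (capture-recapture) estimate of the true population in
        one sampled area: N-hat = (census * ccs) / matched."""
        return census_count * ccs_count / matched_count

    # Invented example for one sampled OA:
    # 310 people counted by the census, 290 by the CCS, 270 found in both.
    n_hat = dual_system_estimate(310, 290, 270)      # about 333 people

    # Simple ratio estimator: scale the census total for the whole LA by the
    # ratio of DSE totals to census totals over the sampled OAs.
    sampled_dse_total = 333 + 420 + 510               # DSEs for three sampled OAs
    sampled_census_total = 310 + 400 + 470            # census counts in those OAs
    la_census_total = 98_000                          # census count for the whole LA
    la_estimate = la_census_total * sampled_dse_total / sampled_census_total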

  11. ESTIMATION (2)
  • What does this mean?
    • The estimation methodology is much the same as it was
    • It should be slightly easier to explain
    • We will group LAs that do not have enough sample with their neighbours until that group has enough sample
    • More LAs will have enough sample to produce direct estimates

  12. OVERCOUNT
  • In 2001, overcount (duplication) was estimated at around 0.4%
    • no adjustments were made
    • it was not integrated into the methodology
  • For 2011, overcount is expected to be higher:
    • a more complex population
    • use of the internet in the 2011 Census
  • The strategy is to:
    • a) identify and remove obvious cases (multiple response resolution)
    • b) measure and make net adjustments for the remainder
    • i.e. for the latter we are NOT removing duplicates

  13. OVERCOUNT (2)
  • Methodology:
    • select targeted samples of census records: second residences, students, children
    • a very large sample (~600,000 records)
    • automatic matching algorithm to identify duplicates
    • clerical checking of matches – expect to see ~13,000 duplicates
    • also use the Longitudinal Study (LS) to QA the estimates
    • estimation of duplication rates by Government Office Region (GOR) and characteristics, estimating which is the correct record
  • Why not match the whole database and remove the duplicates?
    • high risk of false positives, and thus of removing too many records!
  • A sketch of the net adjustment arithmetic follows below
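
A rough sketch of what "measure and make net adjustments" could look like once duplication rates have been estimated from the targeted samples. The regions, groups and rates below are placeholders, not measured figures, and the actual estimation by GOR and characteristic is more involved.

    # Placeholder duplication rates by (region, group), e.g. estimated from the
    # clerically checked matching sample; 0.012 means 1.2% of census records in
    # that cell are judged to be duplicates of another record.
    duplication_rates = {
        ("London", "students"): 0.012,
        ("London", "other"): 0.004,
        ("North West", "students"): 0.009,
        ("North West", "other"): 0.003,
    }

    def net_overcount_adjustment(census_counts, rates):
        """Reduce each cell's count by its estimated duplication rate instead of
        deleting individual records from the database."""
        return {cell: count * (1 - rates.get(cell, 0.0))
                for cell, count in census_counts.items()}

    adjusted = net_overcount_adjustment(
        {("London", "students"): 250_000, ("London", "other"): 7_500_000},
        duplication_rates)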

  14. OVERCOUNT (3)
  • What does this mean?
    • Population estimates will be reduced where there is overcount
    • We will be able to say how much adjustment was made due to overcount
    • The duplicates will still be in the data; we just won't impute as much for undercount

  15. DSE BIAS ADJUSTMENTS
  • Assumptions underpinning the DSE:
    • homogeneity
    • independence
    • accurate matching
    • closure
  • DSEs usually carry some bias, mostly due to failure of the homogeneity assumption
  • In the 2001 Census we made a 'dependence' adjustment
  • This showed that we need a strategy for measuring this bias (the estimator and the direction of the bias are shown below)
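
For reference, the standard dual system estimator and the usual direction of its bias when the independence and homogeneity assumptions fail; this is the generic capture-recapture result, not an ONS-specific figure.

    \[
      \hat{N} \;=\; \frac{n_1 \, n_2}{m}
    \]

where n1 is the census count in a post-stratum, n2 the CCS count, and m the number of people matched in both. If the same kinds of people tend to be missed by both the census and the CCS (positive dependence, or heterogeneity in capture probabilities), m is larger than independence would imply and the estimator underestimates the true population. That downward bias is what the post-stratification, mitigation and adjustment strategies on the following slides aim to minimise and measure.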

  16. DSE BIAS ADJUSTMENTS (2)
  • Mitigate as much as possible:
    • post-stratify the DSE so that heterogeneity is minimised
    • independence in the CCS field processes
    • design the matching to achieve accuracy
    • collect the CCS on the same basis as the Census
  • Measure the remaining bias:
    • specific adjustments – e.g. movers, overcount
    • a global adjustment for residual biases
    • improved adjustment using the Census address register
    • looking at improving the age-sex distribution

  17. DSE BIAS ADJUSTMENTS (3)
  • What does this mean?
    • We will be making adjustments to the estimates based on plausible external data, e.g. household counts and sex ratios (an illustrative sketch follows below)
    • This will be part of the methodology
    • It can also be used if QA determines that estimates are implausible
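
Purely as an illustration of using "plausible external data" such as sex ratios, a minimal sketch that scales a male estimate up when the implied sex ratio falls below an externally expected benchmark. The rule and the benchmark value are assumptions for illustration, not the ONS adjustment method.

    def sex_ratio_adjustment(male_est, female_est, expected_ratio):
        """If the estimated males-per-female ratio for an age group falls below
        an externally expected ratio, scale the male estimate up to meet it.
        Illustrative rule only."""
        observed_ratio = male_est / female_est
        if observed_ratio < expected_ratio:
            male_est = female_est * expected_ratio
        return male_est, female_est

    # Invented example: estimates for males aged 25-29 look implausibly low
    # against an assumed benchmark of 1.02 males per female.
    adjusted_males, females = sex_ratio_adjustment(148_000, 150_000, 1.02)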

  18. COVERAGE ADJUSTMENT
  • The imputation methodology had problems converging, which sometimes produced poor-quality results
  • Improvements:
    • model characteristics at higher geographies, which allows more detail to be modelled
    • some additional topics in the CCS are included in the models: a migration variable (internal, international) and country of birth (UK and non-UK)
    • non-controlled variables are imputed by CANCEIS
  • What does this mean?
    • better imputation quality
    • the characteristics of imputed records are improved
    • (a simplified sketch of the adjustment arithmetic follows below)
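
A simplified sketch of the arithmetic behind coverage adjustment: how many extra person records must be imputed in each (area, age-sex) cell so that the adjusted database sums to the coverage-assessed estimates. The real 2011 system imputes whole records with characteristics (non-controlled variables via CANCEIS); this only shows the cell-level targets, and the area codes and counts are invented.

    def imputation_targets(census_counts, coverage_estimates):
        """Number of person records to add per (area, age-sex) cell so the
        adjusted database matches the coverage-assessed estimates."""
        return {cell: max(0, round(coverage_estimates[cell] - census_counts.get(cell, 0)))
                for cell in coverage_estimates}

    targets = imputation_targets(
        {("E06000001", "M25-29"): 2_950, ("E06000001", "F25-29"): 3_100},
        {("E06000001", "M25-29"): 3_120, ("E06000001", "F25-29"): 3_210},
    )
    # {('E06000001', 'M25-29'): 170, ('E06000001', 'F25-29'): 110}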

  19. SUMMARY
  • Coverage assessment is an integral part of the 2011 Census
  • It will again define the key census outputs (estimates at LA level by age and sex) and adjust the database
  • We learnt a lot of lessons in 2001 and have been working to address them

  20. Questions? owen.abbott@ons.gov.uk
