Measuring Coverage: Post Enumeration Surveys. Owen Abbott, Office for National Statistics, UK
Agenda • Introduction • Why have a PES? • Essential features of a PES • Survey Design • Fieldwork • Analysing the data • Matching • Estimation • Results from 2001 UK Census • Discussion
Why do we need a PES? • Census won’t count every household or person • Undercount causes bias in estimates • In the UK in 2001, we estimated that 3 million persons (6%) did not fill in the form • Increasing problem from 1981 to 1991 to 2001 • The undercount is not evenly spread • Inner Cities • Deprived areas • Young persons
Why do we need a PES? • Census counts alone not good enough • UK Users demand robust census population estimates • Central Government resource allocation • Yearly demographic population estimates • Government Policy • So we need to measure how many households and persons the census misses, and work out: • where they are missed from • their characteristics
Basic Methodology • The PES is called the Census Coverage Survey (CCS) in the UK • In the UK it covers approx. 1% of the population • Match the PES to the Census • Use the people found by the PES but missed by the census to estimate how many were missed • where they were missed and their characteristics • Add the estimate to the Census counts (either at aggregate level or, as in the UK, by imputation)
Post Enumeration Survey Key features: A - Design • Sample survey • Sample size dependent on accuracy (and geographic level) requirements B - Fieldwork • Conducted after the census has finished • Independent re-enumeration • Area based • Door to door interview • Focused on measuring coverage
Post Enumeration Survey - Design • Multi-stage Stratified sample • Select a sample of (small) geographical areas that can be re-enumerated • UK uses Postcodes (about 20 hhs) • US uses blocks (about ????100 hhs) • Sample stratified by: • Geography • Area type • Demography
2001 UK PES Design Geographical Strata: • Local Authorities (mean pop 120k) grouped into contiguous groups called Estimation areas (EAs), each having 500k pop Area Type and Demographic strata: • Within every EA a sample of 1991 Enumeration Districts was selected, stratified using a hard-to-count index and the 1991 age-sex structure • (1991 EDs have about 200 households)
2001 UK PES Design • Hard to count index was a national stratification using a combination of variables associated with undercount e.g: • Unemployed • Multi-occupied • Private rented • Language difficulty • 3 level index, split into 40%, 40%, 20% nationally • Within each selected ED a sample of 3, 4 or 5 postcodes was selected
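The 40% / 40% / 20% split can be illustrated with a minimal sketch. It assumes each area already has a single numeric hard-to-count score (how the listed variables were combined into a score is not given in the slides, so the `scores` input is hypothetical), ranks areas from easiest to hardest, and cuts the ranking at 40% and 80%.

```python
def assign_htc_levels(scores):
    """Assign a 3-level hard-to-count index from a hypothetical score per area.

    Areas are ranked from lowest score (easiest to count) to highest.
    The first 40% get level 1, the next 40% level 2, the last 20% level 3.
    """
    ordered = sorted(scores, key=scores.get)  # easiest areas first
    n = len(ordered)
    levels = {}
    for rank, area in enumerate(ordered, start=1):
        # Integer comparisons avoid floating-point issues at the cut points.
        if 5 * rank <= 2 * n:      # rank / n <= 0.4
            levels[area] = 1
        elif 5 * rank <= 4 * n:    # rank / n <= 0.8
            levels[area] = 2
        else:
            levels[area] = 3
    return levels


if __name__ == "__main__":
    # Ten illustrative areas with made-up scores.
    example = {f"ED{i}": i for i in range(10)}
    print(assign_htc_levels(example))
```

Sampling can then be done separately within each level, so that harder areas are represented explicitly.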
Post Enumeration Survey - Field • Aim: enumerate all the people and households in the sampled areas • Carry out the survey after the Census • Census fieldwork finished • Independence critical (see later) • Interview based • Independent re-enumeration • Separate fieldforce and management • No address list (the UK has an address list for the Census) • Difficult if measuring quality at the same time, as the survey is then not independent
Post Enumeration Survey - Field • In UK, focused on measuring coverage • Previously measured quality as well • Found that separate surveys more effective • Can focus on getting maximal response in sampled areas • UK 2001 PES used very short interview • key household and demographic questions only • Accommodation type • Tenure • Name • Gender • Date of Birth (or Age) • Student • Ethnicity • Activity last week
Post Enumeration Survey - Field • Other initiatives to maximise response: • Pairwork and teamwork • Refusal avoidance training • Calling strategy • Up to 10 attempts to interview • Last attempt deliver form to return in post
Post Enumeration Survey • Interviewer Duties: • Establish the postcode boundaries • Conduct independent listing of all residential and non-residential addresses • Seek out obscure accommodation • Deliver advance notification cards • Identify/probe for all households at an address • Make contact with householders • Conduct doorstep interviews • Persuade potential refusals • Report Progress
Post Enumeration Survey • Map
Post Enumeration Survey • Property Listing
Analysing the data - Matching • Match Census returns to CCS returns • Require very high quality • Minimise false negative matches (missed matches, see later) • In 2001, we used hierarchical nature of data to help match • Match within sampled areas (geographical blocking) • First match household • Then match persons within households
Analysing the data - Matching • Used a five stage strategy, designed to minimise false negative matches: • Exact matching • High probability matching • Clerical assisted probability matching • Clerical matching • Final expert review of non-matches • Developed our own in-house system • Allowed access to scanned form images (this was crucial)
Example matched pair (from the matching screen) • Census record: PO155RR, 29, ERIC SMITH, 13, MALE, SINGLE • CCS record: PO155RR, 29, ERIC SMITH, 13, MALE, SINGLE
Analysing the data - Matching • Output: • Match between Census and CCS • Census only • CCS only
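As an illustration of the first (exact matching) stage and of the three output groups, here is a minimal sketch. It is not the in-house system described above: the record layout and the choice of key fields (`postcode`, `name`, `age`, `sex`) are assumptions, and including the postcode in the key stands in for geographical blocking.

```python
def exact_match(census, ccs, key_fields=("postcode", "name", "age", "sex")):
    """Exact matching on key_fields (an assumed, illustrative key).

    Returns three lists: matched (census, ccs) pairs, census-only records,
    and CCS-only records.
    """
    # Index the CCS records by their key so lookups stay within a postcode.
    index = {}
    for rec in ccs:
        key = tuple(rec[f] for f in key_fields)
        index.setdefault(key, []).append(rec)

    matched, census_only = [], []
    for rec in census:
        key = tuple(rec[f] for f in key_fields)
        candidates = index.get(key, [])
        if candidates:
            # Each CCS record can be matched at most once.
            matched.append((rec, candidates.pop(0)))
        else:
            census_only.append(rec)

    # Whatever is left in the index was seen by the CCS only.
    ccs_only = [rec for recs in index.values() for rec in recs]
    return matched, census_only, ccs_only


if __name__ == "__main__":
    census = [
        {"postcode": "PO155RR", "name": "ERIC SMITH", "age": 13, "sex": "MALE"},
        {"postcode": "PO155RR", "name": "ANN SMITH", "age": 40, "sex": "FEMALE"},
    ]
    ccs = [
        {"postcode": "PO155RR", "name": "ERIC SMITH", "age": 13, "sex": "MALE"},
        {"postcode": "PO155RR", "name": "JOHN JONES", "age": 22, "sex": "MALE"},
    ]
    m, c_only, s_only = exact_match(census, ccs)
    print(len(m), len(c_only), len(s_only))
```

Records left unmatched here would pass to the later probabilistic and clerical stages rather than being treated as final non-matches.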
Analysing the data – Estimation • Dual System Estimation (DSE) • Capture-recapture as used for wildlife • Simple example: How many fish in a lake? • Catch as many as possible on day 1 • Count them (N1) • Mark with a red dot • Return them to the lake • Catch as many as possible on day 2 • Count them (N2) • Count how many have red dots (N12) • Number of fish in lake= (N1 * N2)/N12
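The fish example can be written as a few lines of code. This is a minimal sketch of the formula above; the function name and the example counts are illustrative.

```python
def capture_recapture(n1, n2, n12):
    """Estimate population size as (N1 * N2) / N12.

    n1  -- number caught and marked on day 1
    n2  -- number caught on day 2
    n12 -- number caught on day 2 that carry a mark
    """
    if n12 <= 0:
        raise ValueError("need at least one marked fish in the second catch")
    return n1 * n2 / n12


if __name__ == "__main__":
    # 100 marked on day 1, 80 caught on day 2, 20 of them marked.
    print(capture_recapture(100, 80, 20))  # prints 400.0
```

The intuition: a quarter of the day 2 catch is marked, so the 100 marked fish are taken to be about a quarter of the lake.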
Analysing the data - Estimation • Use matched Census+CCS data • DSE estimates adjustment for those missed in both Census and CCS

                          Counted by CCS
                          Yes    No     Total
Counted by Census  Yes    n11    n10    n1+
                   No     n01    n00    n0+
                   Total  n+1    n+0    n++

DSE count (for a postcode): n++ = (n1+ x n+1) / n11
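To show how the table cells feed the DSE, the sketch below takes the three observable cells (n11, n10, n01) from the matching output, forms the margins, and returns the estimated total together with the implied count missed by both sources (n00). The function name and the example counts are illustrative, not from the 2001 data.

```python
def dual_system_estimate(n11, n10, n01):
    """Dual system estimate for one postcode.

    n11 -- counted by both Census and CCS (matched)
    n10 -- counted by Census only
    n01 -- counted by CCS only
    Returns (estimated total n++, implied n00 missed by both).
    """
    if n11 <= 0:
        raise ValueError("DSE is undefined when there are no matched records")
    census_total = n11 + n10          # n1+
    ccs_total = n11 + n01             # n+1
    total = census_total * ccs_total / n11
    missed_by_both = total - (n11 + n10 + n01)
    return total, missed_by_both


if __name__ == "__main__":
    # 16 matched, 4 Census-only, 4 CCS-only.
    print(dual_system_estimate(16, 4, 4))  # prints (25.0, 1.0)
```

In this example 24 people were seen by at least one source, and the estimator adds one more for those seen by neither.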
Analysing the data - Estimation • DSE assumptions • Independence • Homogeneity of capture probabilities • Perfect matching • Closure • No list inflation • Violation of these assumptions leads to bias (in both directions) • Lots of literature on DSE
Analysing the data – Estimation • DSE can only be used within the sample • Need additional step to get to population totals • In 2001, we used DSE at postcode level • Then used a ratio estimator to predict for non-sampled postcodes (again lots of literature)
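The step from sampled postcodes to a population total can be sketched with a plain ratio estimator. This is an assumption-laden illustration: the 2001 method used a modified ratio estimator whose details are not given here, and the function name and figures are made up. The idea is to scale the census count of all postcodes in a stratum by the ratio of DSE totals to census counts observed in the sampled postcodes.

```python
def ratio_estimate(sample_dse, sample_census, census_count_all):
    """Simple ratio estimator for one stratum (illustrative, not the 2001 method).

    sample_dse       -- DSE counts for the sampled postcodes
    sample_census    -- census counts for the same sampled postcodes
    census_count_all -- census count over all postcodes in the stratum
    """
    census_in_sample = sum(sample_census)
    if census_in_sample <= 0:
        raise ValueError("sampled census count must be positive")
    ratio = sum(sample_dse) / census_in_sample
    return ratio * census_count_all


if __name__ == "__main__":
    # Three sampled postcodes where the DSE is 25% above the census count.
    print(ratio_estimate([25, 50, 50], [20, 40, 40], 8000))  # prints 10000.0
```

Applying this separately within each hard-to-count and age-sex stratum lets the adjustment differ between groups with different undercount.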
Analysis – Getting to small areas • Ratio estimator produced estimates for 500k population blocks • Needed estimates for Local Authorities (about 120k population) • Sample size not sufficient to do directly • So used small area estimation techniques • these borrow strength across areas • We used a fixed effect to model LA differences • LA population estimates from the model then constrained to EA totals
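The final constraining step can be illustrated with pro-rata scaling. This is only a sketch: the slides do not say how the constraint was applied, so proportional scaling is an assumption, and the LA names and figures are invented.

```python
def constrain_to_total(la_estimates, ea_total):
    """Scale modelled LA estimates so they sum to the EA total.

    Pro-rata scaling is an assumed, illustrative choice.
    la_estimates -- dict of LA name -> modelled population estimate
    ea_total     -- Estimation Area total from the ratio estimator
    """
    model_total = sum(la_estimates.values())
    if model_total <= 0:
        raise ValueError("modelled LA estimates must sum to a positive number")
    factor = ea_total / model_total
    return {la: est * factor for la, est in la_estimates.items()}


if __name__ == "__main__":
    modelled = {"LA_A": 100_000, "LA_B": 150_000, "LA_C": 250_000}
    constrained = constrain_to_total(modelled, 510_000)
    print(constrained)
    print(sum(constrained.values()))
```

Each LA keeps its modelled share of the EA, while the EA total stays equal to the more reliable direct estimate.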
Quick summary of 2001 UK method • In 2001, One Number Census methodology was developed • Large CCS (320,000 households) • Matching • Capture Recapture • Modified ratio estimator • Small area estimation to get LA totals • Imputation • Estimated 1.5 million households missed • 3 million persons missed (most from the missing households but some from counted households)
Results • England and Wales population about 50m individuals in 20m households • Estimated 1.5 million households missed • 3 million persons missed (most from the missing households but some from counted households)
Summary • Fundamental that the census is good • A PES does not make a bad census good, it makes a good census better! • US, Australia, NZ, Canada, UK all measure coverage (and most use a PES) • All aim to measure coverage in order to assess census quality; most do not fully adjust the outputs • Coverage for most is around 96-98% • Increasing problems of overcoverage • The design and fieldwork of the PES are important to get right
More info • Brown, J.J., Diamond, I.D., Chambers, R.L., Buckner, L.J., and Teague, A.D. (1999), “A methodological strategy for a one-number census in the UK,” Journal of the Royal Statistical Society A, 162, 247-267. • www.statistics.gov.uk/census2001/onc.asp • owen.abbott@ons.gov.uk