Matching of administrative data to validate the 2011 Census in England and Wales
NRS & RSS Edinburgh, October 2012
AGENDA
• Context: 2011 Census quality assurance and the role of administrative data
• Data matching challenges and solutions
• Data to be matched
• Matching methods and interpretation
• Substantive results so far ...
An overview of the methods
[Flow diagram; labels recovered from the slide]
• Method: DSE, bias adjustment, overcount; ratio estimator, national adjustment; coverage imputation
• Product: 5-year age/sex for CCS areas; 5-year age/sex at EA/LA level; 1-year age/sex at OA level
• Quality assurance: core checks; supplementary analysis; QA review and sign-off (Main QA Panel, High Level QA Panel); First Release
Methods
• Data cleaning, de-duplication, standardisation, quality analysis
• Definitional alignment with Census enumeration base
• Exact matching (dwelling: address; person: name, DoB, gender and postcode)
• Score-based address matching
• Probabilistic person matching
• Clerical resolution of candidate pairs from automatch
• Clerical search for unmatched residuals
• Resolution of unmatched residuals against the Address Register History file and Census ‘associated addresses’
• Evidence-based assessment of residuals
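The person-matching steps above can be illustrated with a minimal sketch. It is not the procedure used for the Census; it assumes a Fellegi-Sunter-style agreement weight for the probabilistic step, and the `Person` fields, the m/u probabilities in `M_U` and the `threshold` value are all hypothetical choices made for illustration. Records are standardised, matched exactly on name, DoB, gender and postcode, and any unmatched administrative record is scored against the Census records; pairs above the threshold become candidates for clerical resolution, and the rest are residuals.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    dob: str        # ISO date string, e.g. "1990-04-27"
    gender: str
    postcode: str


def standardise(p: Person) -> Person:
    """Normalise case and whitespace so that exact comparison is meaningful."""
    return Person(
        name=" ".join(p.name.upper().split()),
        dob=p.dob.strip(),
        gender=p.gender.strip().upper()[:1],
        postcode=p.postcode.upper().replace(" ", ""),
    )


# Hypothetical (m, u) probabilities per field: m = P(agree | true match),
# u = P(agree | non-match). Illustrative values, not estimated from data.
M_U = {
    "name": (0.95, 0.01),
    "dob": (0.98, 0.005),
    "gender": (0.99, 0.5),
    "postcode": (0.90, 0.001),
}


def match_weight(a: Person, b: Person) -> float:
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if getattr(a, field) == getattr(b, field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(census, admin, threshold=5.0):
    """Return (exact pairs, candidate pairs for clerical review, residual admin indices)."""
    census = [standardise(p) for p in census]
    admin = [standardise(p) for p in admin]
    # Assumes the census list has already been de-duplicated.
    exact_index = {c: i for i, c in enumerate(census)}
    exact, candidates, residuals = [], [], []
    for j, a in enumerate(admin):
        if a in exact_index:
            exact.append((exact_index[a], j))
            continue
        scored = [(match_weight(c, a), i) for i, c in enumerate(census)]
        if scored and max(scored)[0] >= threshold:
            weight, i = max(scored)
            candidates.append((i, j, weight))
        else:
            residuals.append(j)
    return exact, candidates, residuals
```

In practice the full pairwise comparison would be restricted by blocking (for example on postcode sector), and the weights would be estimated rather than fixed by hand; both are omitted here to keep the sketch short.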
Female students living in halls in April 2011 by NHS Authority acceptance date
Male students living in halls in April 2011 by NHS Authority acceptance date
LA summary: proportion of F4s and proportion unresolved, within CCS postcode clusters
Further investigations
• Planned analysis of the PR residuals’ addresses and households to identify ‘ghost’ records
• Longitudinal matching of the 2012 Patient Register to 2011 data to identify registrations that have been cancelled by GP practices in the year following Census
• Cluster analysis of all E&W LAs to see whether the typology of LAs identified through matching is mirrored in list inflation patterns nationally
• Multi-level modelling to summarise results, with individual and area level explanatory variables
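The longitudinal matching idea can be sketched as a simple comparison of identifiers across the two extracts. This assumes, hypothetically, that each Patient Register record carries a stable person identifier in both years; the function name and the example identifiers are invented for illustration. A 2011 residual that is absent from the 2012 extract is only a candidate for a cancelled registration, since a record may also leave the register for other reasons.

```python
def flag_possible_cancellations(residual_ids_2011, register_ids_2012):
    """Return 2011 residual identifiers that no longer appear in the 2012 extract.

    Both arguments are iterables of person identifiers (hypothetical field).
    """
    still_registered = set(register_ids_2012)
    return sorted(r for r in residual_ids_2011 if r not in still_registered)


# Usage: two of the three 2011 residuals are missing a year later.
flagged = flag_possible_cancellations(["P003", "P001", "P002"], ["P002", "P010"])
```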