This article discusses the Census Data Enhancement Project, the value gained from linkage, and issues to consider. It also explores the outcomes of linking census and administrative data, including health, education, and labor force outcomes.
Enhancing the usefulness of census data through linking census and administrative data
Dr Paul Jelfs, Assistant Statistician, Australian Bureau of Statistics
Scope • Census background • Census Data Enhancement Project • Results • Value gained from linkage • Issues to consider in linkage • Admin data and outcomes • Linkage outcomes to date
Census background • Census of Population and Housing - every 5 years, cross-sectional view • Census data - valued by community, planners, researchers, media, govt • What is missing are transitions through time – i.e. a longitudinal view • Census is a large data collection enterprise that balances many demands to collect community data • some topics don't make it in, e.g. transport, attitudes, behaviours • Linkage applied to census, survey and administrative data provides options • ABS is doing pilot work to build a longitudinal census data set that allows for linkage to other administrative data sets
Census data enhancement project • Initiated before the 2006 Census • options were put forward to stakeholders • retain the full Census and use it for data linkage purposes • PIA community were more comfortable with a sample approach • Establish a 5% sample Census file, with the intention of linking Censuses 2006-2011 and beyond • Pilot tests to establish a linkage approach using names and other identifiers vs a non-identifier matching approach • 5% Census linked with • Census dress rehearsal • Post enumeration survey • Deaths data • Labour force survey data
Census linkage standards • Gold Standard - name, address, MB and other variables • Silver Standard - hash-encoded name, MB and other variables • Bronze Standard - MB and other variables
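The three standards differ only in which fields go into the linkage key. A minimal sketch of that idea follows; it is an illustration, not the ABS implementation. The field names, the choice of date of birth and sex as the "other variables", the SHA-256 hash and the salt value are all assumptions made for the example. MB is taken here to mean Mesh Block.

```python
import hashlib


def normalise(text):
    """Lower-case and keep letters only, so trivial spelling variants agree."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def hash_name(name, salt="example-salt"):
    """One-way encode a normalised name (salt is a placeholder assumption)."""
    return hashlib.sha256((salt + normalise(name)).encode("utf-8")).hexdigest()


def linkage_key(record, standard):
    """Build a comparison key for one record under a given linkage standard.

    record is a dict with hypothetical fields: name, address, mb, dob, sex.
    """
    # "Other variables" are assumed to be date of birth and sex.
    other = (record["dob"], record["sex"])
    if standard == "gold":
        address = " ".join(record["address"].lower().split())
        return (normalise(record["name"]), address, record["mb"]) + other
    if standard == "silver":
        # Name is present only in hash-encoded form; address is not used.
        return (hash_name(record["name"]), record["mb"]) + other
    if standard == "bronze":
        # No name or address information at all.
        return (record["mb"],) + other
    raise ValueError("unknown standard: " + standard)
```

Two records are candidate links under a standard when their keys agree. The bronze key carries the least identifying information, so more distinct people share the same key.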
Results of CDE • Using names, date of birth and other socio-demographic variables gives a better match rate and level of specificity than non-identified matching • Specific groups in the population are less well represented under the non-identifier matching approach than others • Unemployed, Youth, Australian born, employed in Agriculture, Indigenous • Post-linkage weighting techniques cannot compensate for poor original data linkage • Quality of pre-processing + the quality of the original data have a strong influence over the final linkage outcomes
Issues to consider in linking census and administrative data • Beyond the usual technical issues these were the key issues: • Validity of data • ensure data are collected in an appropriate manner and are of good quality • Time frames and scope • ensure time frames of the data correspond • reasonable chance for the same population to be captured • Legislation and privacy • ensure linkage respects the privacy of individuals • community is made aware of the linkage activity • that there is a clear public benefit • ensure linkage adheres to legislation, guidelines, protocols and ethics that govern the data
Value gained from linkage • Combining census and administrative data • Extension of census and administrative data • Collection of additional variables • Strengthening depth of variables • Potentially reducing further collections of data • Use combined data to answer policy questions • Use capacity to extend data collections • Improved validation and completion of data • Cross check on vital status • Cross check on key variables (including fill in) • Contrast variables (e.g. Indigenous status)
Admin data and outcomes • Health – linkage of hospital data to census data • determine risk patterns by socioeconomic status, Indigenous status, movement towards major hospital establishments • Education – linkage of schools and preschool data to census data • identify transport patterns, income differentials for families by public and private school attendance, qualifications of preschool or school teachers in various locations • linkage to standardised numeracy and literacy tests assists in understanding the relationship between school performance and other local characteristics • Labour force – linkage of the census data with labour force survey • identify long term transitions through education, employment/unemployment and income levels contrasted with socioeconomic characteristics and geographical location
CDE Outcomes to date • Labour force survey, Census post enumeration survey, Dress rehearsal • Deaths-census data linkage study • Examine how well Indigenous deaths are reported by testing against census data • The context for this study is that Indigenous: • deaths are under-reported differentially • people identify differently depending on the data collection instrument • populations change their name between their Indigenous title and aliases • date of birth recording is highly variable • Key findings were that neither the census nor the deaths data identified all Indigenous persons independently • Approx. 60% overlap - remaining 40% drawn from either the census or deaths data • Consequences • analysing deaths data alone shows death rates significantly below the “truth” • the denominator population is under-enumerated, inflating death rates
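The overlap finding can be read as a simple cross-tabulation of linked records by how each source identifies the person. The sketch below shows that calculation on invented data. The counts are chosen so the overlap comes out at 60%, but the split of the remainder between the two sources is an assumption for illustration only, since the slide does not give it.

```python
def identification_overlap(linked_records):
    """Share of linked records identified as Indigenous in both sources,
    in the census only, and in the deaths data only.

    linked_records is an iterable of (census_flag, deaths_flag) booleans,
    one pair per linked death record. Records flagged in neither source
    are ignored.
    """
    both = census_only = deaths_only = 0
    for census_flag, deaths_flag in linked_records:
        if census_flag and deaths_flag:
            both += 1
        elif census_flag:
            census_only += 1
        elif deaths_flag:
            deaths_only += 1
    total = both + census_only + deaths_only
    if total == 0:
        raise ValueError("no records identified in either source")
    return {
        "both": both / total,
        "census_only": census_only / total,
        "deaths_only": deaths_only / total,
    }


# Invented toy data: 6 identified in both, 2 census only, 2 deaths only,
# 5 identified in neither source.
toy = ([(True, True)] * 6 + [(True, False)] * 2
       + [(False, True)] * 2 + [(False, False)] * 5)
shares = identification_overlap(toy)
```

Counting from the deaths data alone would miss the "census only" share, which is why analysing either source on its own understates the identified population.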
The gain • The value of this type of analysis... • level of under-enumeration in deaths and population can now be reasonably estimated (maintenance over time and distributional issues notwithstanding) • this understanding can be transferred to the mortality analysis • lessons learnt from this level of under-enumeration can then be transferred to the collection instruments and their proponents, who may then seek to put in place more robust or specialised collection mechanisms
Outputs • Base data files to facilitate research • Utilising the data effectively and communicating these data are critical • Data managed/delivered to researchers guided by legislation, protocols or guidelines to manage privacy or sensitivities or restrict the level of disaggregation • Manage the analysis in house alongside the data linkage • Make these files available through data laboratories • Analysis and reporting • appropriate skills and knowledge of the use of linked data are critical • reporting on these analyses is critical; placing findings in the public domain and using the information to inform policy and programs and to support evaluations are an important return on the investment in the data linkage program
Conclusion • The use of Census and administrative data is fraught with challenges; however, there are significant benefits in bringing these data sets together • Beyond the technical benefits described, there are organisational benefits in bringing representatives of the organisations that administer administrative data sets together with the central statistical agencies • This stakeholder engagement has potential beyond the analysis and results, extending to influence over policy and program delivery
Record pairs • Classified by linkage decision (links, non-links) and by true status (matches, non-matches) • incorrect non-links = P(non-link | match) • false links = P(link | non-match)
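The two error rates on this slide are conditional proportions over record pairs whose true match status is known, for example from a clerically reviewed sample. A minimal sketch, with hypothetical function and key names and invented toy counts:

```python
def linkage_error_rates(pairs):
    """Estimate linkage error rates from record pairs with known truth.

    pairs is an iterable of (linked, is_match) booleans:
      linked   - the linkage process declared the pair a link
      is_match - the pair truly refers to the same person
    """
    matches = non_matches = missed = false_links = 0
    for linked, is_match in pairs:
        if is_match:
            matches += 1
            if not linked:
                missed += 1       # a match left unlinked
        else:
            non_matches += 1
            if linked:
                false_links += 1  # a non-match wrongly linked
    return {
        # P(non-link | match)
        "incorrect_non_link_rate": missed / matches if matches else None,
        # P(link | non-match)
        "false_link_rate": false_links / non_matches if non_matches else None,
    }


# Invented toy data: 10 true matches (8 linked, 2 missed) and
# 10 true non-matches (1 wrongly linked, 9 correctly left unlinked).
toy_pairs = ([(True, True)] * 8 + [(False, True)] * 2
             + [(True, False)] * 1 + [(False, False)] * 9)
rates = linkage_error_rates(toy_pairs)
```

Each rate is conditioned on the true status, so the two denominators differ: matches for incorrect non-links, non-matches for false links.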
Census Data Enhancement • Formation of Statistical Longitudinal Census Data set • 5% 2006 Census linked to • 2011, 2016,... • augmented with 5% sample • intercensal births, immigrants • Statistical studies • approved projects involving linking SLCD and other data sets • Births and deaths • Long-term migrations • Disease registers
Quality Studies • Quality studies use the whole Census • with name and address during census processing period • without name and address at other times • Two types during the 2006 census processing period • Assess feasibility and quality of linking without names and addresses • Improve ABS outputs
Quality Studies for 2006 • Indigenous mortality • Linking deaths since Census and Census • Jeff Wright Thursday • Assessing automatic matching for Post Enumeration Survey 2011 • Linking 2006 PES and Census • Undercoverage in Labour Force Survey • Linking LFS August 2006 and Census • Conditions of entry and settlement outcomes for immigrants • Linking migrant settlements database and Census. • Simulation of SLCD formation