This project focuses on developing statistical matching algorithms to integrate socio-economic data, improve data quality, and provide comprehensive and coherent socio-economic statistics. The project aims to identify suitable criteria for assessing validity and produce methodological guidelines for implementation.
Statistical Matching in the framework of the modernization of social statistics
Aura Leulescu & Emilio Di Meglio
EUROSTAT Unit F3 - Living conditions and social protection statistics
Key priorities in the EU context
• Demand for a comprehensive and coherent system of socio-economic statistics: to respond to cross-cutting and complex user needs by providing broad indicators on economic well-being and Quality of Life (Stiglitz Report, Europe 2020, "GDP and beyond" communication, OECD initiative on measuring well-being, etc.);
• Demand for micro-level statistical information that encompasses both social and economic aspects: to go beyond aggregates and capture heterogeneity in the population (multivariate distributions, sub-national statistics, vulnerable sub-groups).
Premises
• No single survey can provide all the necessary information;
• No common identifiers allow record linkage at EU level;
• There is a need for micro- (or meso-) level integrated statistical information from a coordinated network of surveys and data collection processes at EU level.
Statistical matching?
• High potential benefits:
  • Increased and better use of existing data at minimum cost;
  • Enhanced conceptual and statistical consistency across surveys;
  • Development of in-house expertise in data matching, transferable to other projects.
• But also high risks:
  • Inherent limitations of statistical matching techniques and model-based imputation;
  • Need to consider both micro-level and meso-level data matching (small sub-populations could also be matched).
Matching project: 1) Scope
This project should:
• carry out methodological work: identify and test statistical matching algorithms based on the "fitness for purpose" principle;
• identify suitable criteria for assessing the validity of findings, based on both input quality and the robustness of the proposed matching methods;
• produce methodological guidelines and recommendations for further implementation in Eurostat and/or the Member States (MSs).
Matching project: 2) Investigation streams
The project should assess the quality of the results and the relevance of the approach for specific needs:
• Material well-being estimates based on wealth, consumption and income (matching of the Household Finance and Consumption Survey (HFCS), the Household Budget Survey (HBS) and EU-SILC);
• Quality of Life indicators that go beyond monetary resources (matching of SILC with the Labour Force Survey (LFS), the European Health Interview Survey (EHIS) and outside sources such as the European Social Survey (ESS) and the European Quality of Life Survey (EQLS));
• Poverty estimates at regional level, linked to the monitoring of Europe 2020 (matching of data from SILC, EHIS and LFS).
Matching project: 3) Timeline
• Phase I: preliminary analysis, focused especially on setting the boundaries of the project
  • Dec 2010 - July 2011: external contract for matching EU-SILC, ESS and EQLS
  • Dec 2010 - April 2011: in-house matching exercise (review of the state of the art and preliminary analysis focused on the reconciliation of datasets)
• Phase II
  • May 2011 - Dec 2012: follow-up of the in-house exercise
  • May 2011: launch of the call for tender (according to the preliminary results of the three investigation streams)
  • November 2011: signature of contract(s)
  • December 2012: recommendations for implementation
Matching project: 4) Organizational aspects
The project is expected:
• to draw on both external contracts and the development of in-house expertise on matching techniques;
• to involve various stakeholders: concerned units in Eurostat, the ECB, Eurofound, Commission users (DG EMPL, DG SANCO, DG REGIO) and academic experts;
• to develop synergies with European Statistical System (ESS) initiatives:
  • Core social variables
  • ESSnet on Data Integration
  • ESSnet on Small Area Estimation
Matching exercise: ex-ante reconciliation (1)
Main purpose: identify specific, realistic objectives
Identify target variables:
a) Income, consumption and wealth
  • HFCS: value of assets and liabilities;
  • EU-SILC: material deprivation, detailed income;
  • HBS: food expenditure, leisure goods and services, transport expenditure;
b) Quality of life indicators
  • EQLS/ESS: social capital, quality of society, satisfaction variables;
  • LFS: job quality, training...;
  • SILC: standards of living;
c) Regional estimates
  • Impute equivalized household disposable income in the LFS.
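The regional target (imputing equivalized income into the LFS) could be approached with a regression-based method. The sketch below is a minimal, hypothetical illustration using only the Python standard library, not the project's actual method: it fits a weighted least-squares model of the target on the common variables in the donor file (e.g. EU-SILC, using its survey weights), then imputes the predicted value plus a random residual draw into the recipient file (e.g. the LFS), i.e. stochastic regression imputation. The function names (`solve`, `fit_wls`, `impute`), the variables and the toy data are assumptions. Like most matching techniques, it implicitly relies on the conditional independence assumption: the target is taken to be independent of the recipient-only variables given the common variables.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not an Eurostat implementation.


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_wls(X: Sequence[Sequence[float]], y: Sequence[float],
            w: Sequence[float]) -> Tuple[List[float], float]:
    """Weighted least squares with an intercept, fitted on the donor file.

    Returns the coefficients [intercept, b1, b2, ...] and the weighted residual SD.
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w)) for i in range(k)]
    beta = solve(xtwx, xtwy)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sd = (sum(wi * e * e for wi, e in zip(w, resid)) / sum(w)) ** 0.5
    return beta, sd


def impute(X_recipient: Sequence[Sequence[float]], beta: Sequence[float],
           sd: float, rng: random.Random) -> List[float]:
    """Stochastic regression imputation: linear prediction plus a normal residual draw."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) + rng.gauss(0.0, sd)
            for x in X_recipient]


if __name__ == "__main__":
    # Invented donor data: common variables (age, years of education) and the
    # target (log equivalized income), with donor survey weights.
    donor_X = [[25, 12], [40, 16], [55, 10], [35, 14], [60, 18]]
    donor_y = [9.6, 10.4, 9.8, 10.1, 10.7]
    donor_w = [1.0, 2.0, 1.5, 1.0, 0.5]
    beta, sd = fit_wls(donor_X, donor_y, donor_w)
    recipient_X = [[30, 13], [50, 17]]  # same common variables in the recipient file
    print(impute(recipient_X, beta, sd, random.Random(42)))
```

Adding a residual draw, rather than imputing the bare prediction, is meant to keep the imputed income distribution from being artificially compressed; a fixed `random.Random` seed makes the draws reproducible.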
Matching exercise: ex-ante reconciliation (2)
Select matching/stratification variables based on:
• Predictive power (econometric models, correlations, multivariate analysis);
• Data quality;
• Consistency of concepts and statistical content.
Deal with the different weights of the various surveys.
Define the observation level: individual, household or sub-population.
What type of auxiliary information can we use to validate results?
• overlapping samples (NL);
• (partially) overlapping variables (income classes in EQLS; some material deprivation items; food consumption in HFCS).
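Two of the selection criteria above, predictive power and consistency of a common variable across surveys that each carry their own weights, can be screened with simple weighted statistics. The following is a minimal Python sketch under stated assumptions, not a prescribed procedure: candidate matching variables are ranked by the absolute weighted Pearson correlation with the target in the donor file, and the coherence of a common categorical variable between two surveys is measured by the Hellinger distance between its weighted distributions (0 = identical, 1 = no overlap). All function names, variable names and data are hypothetical. Correlation only captures linear association, so it is a first screen rather than a substitute for the econometric models mentioned above.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical helpers for illustration only.


def weighted_corr(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Pearson correlation of x and y computed with survey weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def rank_candidates(candidates: Dict[str, Sequence[float]], target: Sequence[float],
                    w: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank candidate matching variables by |weighted correlation| with the target."""
    scores = {name: abs(weighted_corr(vals, target, w)) for name, vals in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def weighted_shares(categories: Sequence[Hashable], w: Sequence[float]) -> Dict[Hashable, float]:
    """Weighted relative frequencies of a categorical variable within one survey."""
    total = sum(w)
    shares: Dict[Hashable, float] = {}
    for c, wi in zip(categories, w):
        shares[c] = shares.get(c, 0.0) + wi / total
    return shares


def hellinger(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2.0)


if __name__ == "__main__":
    # Invented donor data: target = equivalized income (thousand EUR).
    income = [12.0, 18.0, 25.0, 31.0, 40.0]
    weights = [1.0, 1.2, 0.8, 1.0, 1.5]
    candidates = {"age": [23, 45, 38, 52, 41],
                  "years_education": [9, 11, 13, 15, 17],
                  "household_size": [4, 3, 3, 2, 2]}
    print(rank_candidates(candidates, income, weights))

    # The same common variable (e.g. tenure status) in two surveys, each with its own weights.
    survey_a = weighted_shares(["owner", "owner", "tenant", "owner"], [1.0, 2.0, 1.0, 1.0])
    survey_b = weighted_shares(["owner", "tenant", "tenant"], [3.0, 1.0, 1.0])
    print(round(hellinger(survey_a, survey_b), 3))
```

Computing each survey's shares with its own weights before comparing them is one simple way to make the comparison reflect population distributions rather than sample composition.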
Matching exercise: methods and quality assessment - preliminary ideas
• Matching algorithms
  • Hot deck techniques, regression-based methods, multiple imputation?
  • Deal with complex survey designs (constraints)
  • Create synthetic datasets versus estimate parameters (e.g. estimate frequencies by class of income and wealth)
• How to assess quality/validity?
  • Check the marginal and joint distributions of the donor and fused datasets;
  • Assess the probability of a good match (e.g. distribution distances between donor and recipient);
  • Assess the sensitivity of the results to changes in assumptions: simulation exercises, auxiliary information, theoretical validation.
• Some applications: SPSD Canada (Liu & Kovacevic, 1997), ISTAT (Coli et al., 2006)
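Two of the ideas above, a hot deck algorithm that respects strata and a check of the imputed variable's marginal distribution in the fused file against the donor file, can be combined in a short sketch. The Python code below is a minimal, hypothetical example rather than a recommended method: it applies an unconstrained nearest-neighbour distance hot deck (each donor may be used more than once) within donation classes defined by a stratification variable, using Manhattan distance on the matching variables, then compares the weighted distribution of the donated variable in the donor and fused records with the Hellinger distance. Field names (`region`, `age`, `hh_size`, `quintile`, `w`), the distance function and the data are assumptions. In practice, matching variables would usually be standardized first, and a constrained variant would limit how often each donor is reused.

```python
import math
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence

Record = Dict[str, Any]  # hypothetical record layout: field name -> value


def nn_hot_deck(recipients: Sequence[Record], donors: Sequence[Record], stratum: str,
                match_vars: Sequence[str], target: str) -> List[Record]:
    """Unconstrained nearest-neighbour distance hot deck within strata.

    Each recipient receives the target value of the closest donor (Manhattan
    distance on match_vars) in the same stratum; donors may be reused.
    """
    pools: Dict[Any, List[Record]] = defaultdict(list)
    for d in donors:
        pools[d[stratum]].append(d)
    fused = []
    for r in recipients:
        pool = pools.get(r[stratum])
        if not pool:
            raise ValueError(f"no donor available in stratum {r[stratum]!r}")
        best = min(pool, key=lambda d: sum(abs(d[v] - r[v]) for v in match_vars))
        fused.append({**r, target: best[target]})  # copy recipient, add donated value
    return fused


def weighted_distribution(records: Sequence[Record], var: str,
                          weight: str) -> Dict[Hashable, float]:
    """Weighted relative frequencies of a categorical variable."""
    total = sum(rec[weight] for rec in records)
    shares: Dict[Hashable, float] = {}
    for rec in records:
        shares[rec[var]] = shares.get(rec[var], 0.0) + rec[weight] / total
    return shares


def hellinger(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Hellinger distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2.0)


if __name__ == "__main__":
    donors = [  # e.g. an EU-SILC-like file; 'quintile' is the variable to donate
        {"region": "A", "age": 30, "hh_size": 2, "quintile": 1, "w": 1.0},
        {"region": "A", "age": 60, "hh_size": 1, "quintile": 3, "w": 1.0},
        {"region": "B", "age": 40, "hh_size": 4, "quintile": 2, "w": 1.0},
    ]
    recipients = [  # e.g. an LFS-like file without the target variable
        {"region": "A", "age": 35, "hh_size": 2, "w": 2.0},
        {"region": "B", "age": 50, "hh_size": 3, "w": 1.0},
    ]
    fused = nn_hot_deck(recipients, donors, "region", ["age", "hh_size"], "quintile")
    print([rec["quintile"] for rec in fused])
    h = hellinger(weighted_distribution(donors, "quintile", "w"),
                  weighted_distribution(fused, "quintile", "w"))
    print(round(h, 3))
```

A large distance in this check signals that the fused file does not reproduce the donor distribution of the imputed variable; a small one is necessary but not sufficient, since it says nothing about the joint distributions that matching under conditional independence cannot observe.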