Challenges in small area estimation of poverty indicators. Risto Lehtonen, Ari Veijanen, Maria Valaste (University of Helsinki), and Mikko Myrskylä (Max Planck Institute for Demographic Research, Rostock). Ameli 2010 Conference, 25-26 February 2010, Vienna.
Outline • Background • Material and methods • Results • Discussion • References
EU/FP7 Project AMELI • Advanced Methodology for European Laeken Indicators (2008-2011) • The project is supported by European Commission funding from the Seventh Framework Programme for Research • DoW: The study will include research on data quality including • Measurement of quality • Treatment of outliers and nonresponse • Small area estimation • The measurement of development over time
Material and methods • Investigation of statistical properties (bias and accuracy) of estimators of selected Laeken indicators for population subgroups or domains and small areas • Method: Design-based Monte Carlo simulation experiments based on real data • Data: Statistical register data based on merging of administrative register data at the unit level (Finland)
Laeken indicators based on binary variables • At-risk-of-poverty rate • Direct estimators • Horvitz-Thompson estimators HT • Indirect estimators • Model-assisted GREG and MC estimators • Model-based EBLUP and EB estimators • Modelling framework • Generalized linear mixed models GLMM • Lehtonen and Veijanen (2009) • Rao (2003), Jiang and Lahiri (2006)
Laeken indicators based on medians or quantiles • Indicators based on medians or quantiles of the cumulative distribution function of the underlying continuous variable • Relative median at-risk-of-poverty gap • Quintile share ratio (S20/S80 ratio) • Gini coefficient • Direct estimators DEFAULT • Synthetic estimators SYN • Expanded prediction SYN estimators EP-SYN • Composite estimators COMP • Simulation-based methods
Poverty gap for domains • Relative median at-risk-of-poverty gap • The poverty gap in domain d describes the difference between the at-risk-of-poverty threshold t and the median income of the poor people in the domain, expressed relative to t
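The gap definition above can be illustrated with a small unweighted sketch. It assumes the common convention that the threshold t is 60% of the population median of equivalized income and that "poor" means income strictly below t; the function names and the toy numbers are illustrative and do not come from the slides. A survey-based version would use design weights when computing the medians.

```python
from statistics import median


def poverty_threshold(population_incomes, fraction=0.6):
    """At-risk-of-poverty threshold t (assumed: 60% of the population median)."""
    return fraction * median(population_incomes)


def relative_median_poverty_gap(domain_incomes, threshold):
    """(t - median income of the poor in the domain) / t, unweighted.

    Returns None when the domain contains no poor persons, because the
    gap is not defined in that case.
    """
    poor = [y for y in domain_incomes if y < threshold]
    if not poor:
        return None
    return (threshold - median(poor)) / threshold


# Toy data (hypothetical): population median is 30, so t = 18.
population = [10, 20, 30, 40, 50]
t = poverty_threshold(population)
gap = relative_median_poverty_gap([10, 12, 16, 40], t)
print(round(gap, 4))
```

In the toy domain the poor persons have incomes 10, 12 and 16, so their median is 12 and the gap is (18 - 12) / 18.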
Monte Carlo simulation • Fixed finite population of 1,000,000 persons • D = 70 domains of interest • Cross-classification of NUTS 3 with sex and age group (7 × 2 × 5) • Y-variables • Equivalized income (based on register data) • Binary indicator for persons in poverty • X-variables (binary or continuous variables) • house_owner (binary) • education_level (7 classes) and educ_thh • lfs_code (3 classes) and empmohh • socstrat (6 classes) • sex_class and age_class (5 age classes) • NUTS3
Sampling designs • SRSWOR sampling • Sample size n = 5,000 persons • Stratified SRSWOR • Sample size n = 5,000 persons • Stratification by education level of HH head • H = 7 strata • Unequal inclusion probabilities • Design weights vary between strata • Min: 185, Max: 783 • K = 1000 independent samples
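As a rough illustration of the stratified SRSWOR design and the design weights it implies, the sketch below draws a sample without replacement within each stratum and attaches the weight N_h / n_h to every sampled unit. The stratum sizes and allocations are toy values, not those of the study, and the helper names are hypothetical. The Horvitz-Thompson poverty rate shown here is for the whole population; a domain version would sum only over sampled units in the domain.

```python
import random


def stratified_srswor(strata, n_h, seed=None):
    """Draw an SRSWOR sample of size n_h[h] from each stratum h.

    strata maps a stratum label to a list of unit values (here: 0/1
    poverty indicators). Returns (value, design_weight) pairs, where the
    design weight is N_h / n_h, the inverse inclusion probability.
    """
    rng = random.Random(seed)
    sample = []
    for h, units in strata.items():
        weight = len(units) / n_h[h]
        for value in rng.sample(units, n_h[h]):
            sample.append((value, weight))
    return sample


def ht_poverty_rate(sample, population_size):
    """Horvitz-Thompson estimate of the poverty rate: weighted total / N."""
    return sum(w * y for y, w in sample) / population_size


# Toy population (hypothetical): two strata with different poverty rates.
strata = {"low_educ": [1] * 30 + [0] * 70, "high_educ": [1] * 5 + [0] * 195}
N = sum(len(units) for units in strata.values())

sample = stratified_srswor(strata, {"low_educ": 10, "high_educ": 10}, seed=1)
print(ht_poverty_rate(sample, N))
```

Because the sampling fractions differ between strata, the weights differ too (10 and 20 in this toy case), which mirrors the unequal inclusion probabilities noted in the design above.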
Quality measures of estimators • Design bias • Absolute relative bias ARB (%) • Accuracy • Relative root mean squared error RRMSE (%)
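The formulas for these two measures are not reproduced in this transcript. A minimal sketch, assuming the usual Monte Carlo definitions over K simulated samples (mean of the estimates compared with the true domain value for ARB, root of the mean squared deviation for RRMSE), could look like this; the function names and numbers are illustrative.

```python
from math import sqrt


def arb_percent(estimates, true_value):
    """Absolute relative bias (%): |mean of estimates - truth| / |truth|."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * abs(mean_estimate - true_value) / abs(true_value)


def rrmse_percent(estimates, true_value):
    """Relative root mean squared error (%): sqrt(MSE) / |truth|."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100.0 * sqrt(mse) / abs(true_value)


# Toy example (hypothetical): three simulated estimates of a true rate of 0.10.
estimates = [0.09, 0.11, 0.13]
print(arb_percent(estimates, 0.10), rrmse_percent(estimates, 0.10))
```

In a simulation such as the one described here, each function would be applied per domain to the K estimates obtained from the independent samples.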
Discussion: Poverty rate • Indirect design-based estimator MLGREG • Design unbiased • Large variance in small domains • Small variance in large domains • Indirect model-based estimator EB • Design biased • Small variance also in small domains • Accuracy: EB outperformed MLGREG • Might be the best choice at least for small domains unless it is important to avoid design bias
Discussion: Poverty gap • Direct estimator DEFAULT • Small design bias but large variance • Indirect model-based SYN • Very large bias but small variance • Indirect model-based EP-SYN based on expanded predictions • Much smaller bias and variance than in SYN • Composite (DEFAULT with EP-SYN) • Small domains: good compromise • Large domains: bias can still dominate the MSE
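The composite idea can be sketched as a convex combination of the direct and the synthetic estimate. The slides do not state how the weight is chosen, so the weight `lam` below is a hypothetical input, and the function name is illustrative only.

```python
def composite_estimate(direct, synthetic, lam):
    """Convex combination lam * direct + (1 - lam) * synthetic.

    lam close to 1 trusts the direct estimator (low bias, high variance);
    lam close to 0 trusts the synthetic estimator (low variance, possibly
    biased). The choice of lam is an assumption, not taken from the slides.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * direct + (1.0 - lam) * synthetic


# Toy values (hypothetical): a small domain where the direct estimate is noisy.
print(composite_estimate(direct=0.20, synthetic=0.30, lam=0.25))
```

A small weight on the direct part suits small domains, where its variance is large; in large domains the direct estimator is already precise, so any bias carried over from the synthetic part can dominate the MSE, as noted above.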