Small Area Models for Unemployment Rate Estimation at Sub-Provincial Areas in Italy

D’Aló M., Di Consiglio L., Falorsi S., Solari F. ~ Istat
Pratesi M., Salvati N. ~ University of Pisa
Ranalli M.G. ~ University of Perugia

OUTLINE
• Italian Labour Force Survey
• Standard and current small area estimators
• Enhanced small area estimators
• Experimental study
• Analysis of results
• Final remarks

Labour Force Survey description
• The Labour Force Survey (LFS) is a quarterly two-stage survey with partial overlap of sampling units according to a 2-2-2 rotation scheme.
• In each province the municipalities are classified as Self-Representing Areas (SRAs) or Non Self-Representing Areas (NSRAs).
• From each SRA a sample of households is selected.
• In NSRAs the sample is based on a stratified two-stage sampling design: the municipalities are the Primary Sampling Units (PSUs) and the households are the Secondary Sampling Units (SSUs).
• Each quarterly sample involves about 1350 municipalities and 200,000 individuals.

Small area estimation on LFS
• Since 2000, ISTAT has disseminated yearly LFS estimates of employed and unemployed counts for the 784 Local Labour Market Areas (LLMAs).
• LLMAs are unplanned domains obtained as clusters of municipalities that cut across provinces, which are the finest planned domains of the LFS.
• The direct estimates are unstable because LLMA sample sizes are very small (more than 100 LLMAs have zero sample size), so SAE methods are necessary.
• Until 2003, a design-based composite-type estimator was adopted.
• Since 2004, after the redesign of the LFS sampling strategy, a unit-level EBLUP estimator with spatially autocorrelated random area effects has been used.

Standard small area estimators – design based
Direct and GREG estimator
• The direct estimator uses only the survey weights and the observations sampled in the area.
• The GREG estimator is based on the standard linear model and can be expressed as an adjustment of the direct estimator for differences between the sample and population area means of the covariates.
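The adjustment described above can be sketched as follows. This is a minimal illustration, not the production estimator: it assumes a single covariate, a weighted-mean (Hájek-type) direct estimator, and a slope fitted by survey-weighted least squares on the whole sample. The function names and the toy data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean, used here as the direct estimator of an area mean."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def wls_slope(x, y, w):
    """Slope of the survey-weighted least squares line of y on x (with intercept)."""
    x_bar = weighted_mean(x, w)
    y_bar = weighted_mean(y, w)
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for xi, yi, wi in zip(x, y, w))
    s_xx = sum(wi * (xi - x_bar) ** 2 for xi, wi in zip(x, w))
    return s_xy / s_xx


def greg_mean(y_d, x_d, w_d, x_pop_mean_d, slope):
    """Direct estimate plus a correction for the gap between the known
    population mean of x in the area and its direct estimate."""
    direct = weighted_mean(y_d, w_d)
    return direct + (x_pop_mean_d - weighted_mean(x_d, w_d)) * slope


# Toy data (assumed): four sampled units, the first two belong to area d.
x_all = [1.0, 3.0, 2.0, 4.0]
y_all = [2.0, 6.0, 4.0, 8.0]
w_all = [1.0, 1.0, 1.0, 1.0]

slope = wls_slope(x_all, y_all, w_all)
y_d, x_d, w_d = y_all[:2], x_all[:2], w_all[:2]
direct_d = weighted_mean(y_d, w_d)
greg_d = greg_mean(y_d, x_d, w_d, x_pop_mean_d=2.5, slope=slope)
```

With an intercept in the model and a weighted-mean direct estimator, the intercept cancels in the correction term, which is why only the slope appears.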

Standard small area estimators – model based
Unit level EBLUP
• The EBLUP assumes a standard linear mixed model with unit-specific auxiliary variables and with random area-specific effects and errors that are independently normally distributed.
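A minimal sketch of how a unit level EBLUP of an area mean is usually assembled under a nested error model, assuming the regression coefficients and the two variance components have already been estimated (for example by REML) and ignoring the sampling fraction. All names and numbers are hypothetical and the slide's own formula may differ in detail.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def shrinkage_factor(n_d, sigma2_u, sigma2_e):
    """Share of the area residual retained: grows with the area sample size."""
    if n_d == 0:
        return 0.0
    return sigma2_u / (sigma2_u + sigma2_e / n_d)


def eblup_mean(xbar_pop, beta, ybar_s, xbar_s, n_d, sigma2_u, sigma2_e):
    """Synthetic part plus a shrunken area-level residual."""
    synthetic = dot(xbar_pop, beta)
    if n_d == 0:
        # No sample in the area: the predictor reduces to the synthetic part.
        return synthetic
    gamma = shrinkage_factor(n_d, sigma2_u, sigma2_e)
    return synthetic + gamma * (ybar_s - dot(xbar_s, beta))


# Toy values (assumed): intercept plus one covariate.
beta = [1.0, 2.0]
estimate = eblup_mean(
    xbar_pop=[1.0, 0.5], beta=beta,
    ybar_s=3.0, xbar_s=[1.0, 0.5], n_d=4,
    sigma2_u=1.0, sigma2_e=4.0,
)
unsampled = eblup_mean([1.0, 0.5], beta, None, None, 0, 1.0, 4.0)
```

The second call shows why the many LLMAs with zero sample size can still receive an estimate: with no area residual available, only the regression part is used.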

Standard small area estimators – model based
Unit level Synthetic estimator
Two synthetic estimators have been considered.
• The first assumes a standard linear model, as in the GREG.
• The second assumes a linear mixed model with unit-specific auxiliary variables, as for the EBLUP.
In both cases the estimate is the fitted fixed part of the model evaluated at the population area means of the covariates.

Enhanced small area estimators
1. Unit level EBLUP with spatial correlation of area effects
• The SEBLUP estimator is based on a unit level linear mixed model in which the random area effects are spatially correlated through a matrix A.
• The matrix A depends on the distances among the areas and on an unknown parameter connected to the spatial correlation coefficient among the areas.
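The slide does not preserve the exact form of A. As an illustration only, the sketch below builds a distance-based correlation matrix with an exponential decay, which is one common choice and is an assumption here, not necessarily the form used in the LFS model. The parameter name alpha and the centroid coordinates are hypothetical.

```python
import math


def spatial_correlation_matrix(centroids, alpha):
    """Correlation between areas i and j decays as exp(-distance / alpha).

    centroids: list of (x, y) area centroids; alpha > 0 controls how fast
    the correlation fades with distance (assumed functional form).
    """
    m = len(centroids)
    a = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            dx = centroids[i][0] - centroids[j][0]
            dy = centroids[i][1] - centroids[j][1]
            a[i][j] = math.exp(-math.hypot(dx, dy) / alpha)
    return a


# Toy centroids (assumed): areas 0 and 1 are 5 units apart.
A = spatial_correlation_matrix([(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)], alpha=5.0)
```

In a model of this kind the unknown parameter would be estimated together with the variance components, so that neighbouring areas borrow more strength from each other than distant ones.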

Enhanced small area estimators
2. Model Based Direct Estimator (Chambers & Chandra, 2006)
• The MBDE estimator is based on a unit level linear mixed model. It is a weighted mean of the sampled values of the area, with weights chosen so that the weighted sample sum is the (E)BLUP of the population total under the model (Royall, 1976).
• Calibrated with respect to the total of x.
• Reduces bias compared with the EBLUP.
• Does not allow estimation for non-sampled areas.
• Less efficient than the EBLUP.

Enhanced small area estimators
3. Nonparametric EBLUP (Opsomer et al., 2008)
The literature offers many nonparametric regression methods (kernel, local polynomial, wavelets…), but they are difficult to incorporate in a small area model.
Methods based on penalized splines (Eilers and Marx, 1996; Ruppert et al., 2003) can be estimated by means of mixed models, which makes them a promising candidate for SAE methods.
• Great flexibility in the definition of the model
• Estimable with existing software using REML
• Hard to estimate efficiency and to test the significance of terms (via bootstrap?)
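The link between penalized splines and mixed models can be sketched as follows, assuming a truncated linear basis and a fixed smoothing parameter. In the mixed model reading the spline coefficients are random effects and the smoothing parameter is the ratio of the error variance to their variance, which REML would estimate. Here it is simply passed in as lam, and the names and data are hypothetical.

```python
def spline_design(x, knots):
    """Rows [1, x, (x - k1)+, ..., (x - kK)+]: truncated linear spline basis."""
    return [[1.0, xi] + [max(xi - k, 0.0) for k in knots] for xi in x]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_pspline(x, y, knots, lam):
    """Penalized least squares: only the knot coefficients are shrunk,
    exactly as random effects would be in the equivalent mixed model."""
    c = spline_design(x, knots)
    p = len(c[0])
    ctc = [[sum(row[j] * row[k] for row in c) for k in range(p)] for j in range(p)]
    cty = [sum(row[j] * yi for row, yi in zip(c, y)) for j in range(p)]
    for j in range(2, p):  # intercept and slope are left unpenalized
        ctc[j][j] += lam
    return solve(ctc, cty)


# Toy data (assumed): an exactly linear signal, so no knot term is needed.
x = [i / 10 for i in range(11)]
y = [1.0 + 2.0 * xi for xi in x]
theta = fit_pspline(x, y, knots=[0.25, 0.5, 0.75], lam=1.0)
```

On exactly linear data the penalized fit returns the straight line and sets the knot coefficients to zero, which shows how the penalty pulls the model back to the parametric form when the data do not call for more flexibility.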

Enhanced small area estimators
4. Logistic models
The estimator of the mean value is built from the estimated probability of being unemployed.
This probability has been estimated under different models, both fixed effects models and mixed effects models.
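One common way to turn estimated probabilities into an area mean is sketched below. It assumes a predictive form that keeps the observed values of the sampled units and uses model probabilities for the non-sampled ones. The slide's own formula is not preserved and may differ; the coefficients, the optional area effect and the data are hypothetical.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def predicted_probability(x, beta, area_effect=0.0):
    """P(unemployed) for a unit with covariates x. area_effect is 0 for a
    fixed effects model, or the predicted random effect for a mixed model."""
    eta = sum(xj * bj for xj, bj in zip(x, beta)) + area_effect
    return expit(eta)


def logistic_area_mean(y_sampled, x_nonsampled, beta, area_effect=0.0):
    """Observed 0/1 values for sampled units, predicted probabilities for
    the remaining population units of the area (assumed predictive form)."""
    n_pop = len(y_sampled) + len(x_nonsampled)
    predicted = sum(
        predicted_probability(x, beta, area_effect) for x in x_nonsampled
    )
    return (sum(y_sampled) + predicted) / n_pop


# Toy area (assumed): two sampled units, two non-sampled units.
rate = logistic_area_mean(
    y_sampled=[1, 1],
    x_nonsampled=[[1.0, 0.0], [1.0, 0.0]],
    beta=[0.0, 1.0],
)
```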

LFS empirical study
A simulation study on the LFS has been carried out to estimate the unemployment rate at LLMA level.
• 500 two-stage LFS samples have been drawn from the 2001 census data set.
• The performance of the methods has been evaluated for the estimation of the unemployment rate in the 127 LLMAs belonging to the geographical area “Center of Italy”.
• GREG, Synthetic and EBLUP have been applied considering two different sets of auxiliary variables:
  LFS = real covariates: sex by 14 age classes + employment indicator at previous census;
  LFS+C = real covariates + geographic coordinates (latitude and longitude of the municipality the sampling unit belongs to).

Enhanced Small area estimators
• SEBLUP: a spatial correlation in the variance matrix of the random effects has been considered in the LFS model.
• MBDE: model based direct estimation is performed on sampled LLMAs, while a synthetic estimator based on a unit level linear mixed model is used for non-sampled LLMAs (LFS covariates).
• Nonparametric EBLUP: two semiparametric representations based on penalized splines have been applied (fitted as additional random effects):
  • geographical coordinates of the municipality (EBLUP-LFS+SplineC): this allows for a finer representation of the spatial component than SEBLUP-LFS (at municipality level instead of LLMA level);
  • age (EBLUP-SplineA and SEBLUP-SplineA).
• LOGIT: logistic models are implemented with LFS covariates in both a fixed and a mixed effects model, and with a spline for age in a fixed effects model.

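Evaluation Criteria
For each LLMA, over the simulated samples:
• % Relative Bias (RB)
• % Relative Root Mean Squared Error (RRMSE)
Summaries over the LLMAs:
• Average Absolute RB (AARB)
• Average RRMSE (ARRMSE)
• Maximum Absolute RB
• Maximum RRMSE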
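These criteria can be computed as in the sketch below, which uses the usual simulation definitions: relative error averaged over replicates for RB, and root mean squared error divided by the true value for RRMSE, both in percent. The slide's formulas are not preserved, so the exact definitions are an assumption. The keys MARB and MRRMSE are labels introduced here for the two maxima.

```python
import math


def relative_bias(estimates, true_value):
    """% relative bias of one area over the simulation replicates."""
    errors = [(e - true_value) / true_value for e in estimates]
    return 100.0 * sum(errors) / len(errors)


def relative_rmse(estimates, true_value):
    """% relative root mean squared error of one area over the replicates."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / true_value


def summarise(estimates_by_area, true_by_area):
    """Average and maximum of |RB| and RRMSE across areas."""
    rb = [abs(relative_bias(e, t)) for e, t in zip(estimates_by_area, true_by_area)]
    rrmse = [relative_rmse(e, t) for e, t in zip(estimates_by_area, true_by_area)]
    return {
        "AARB": sum(rb) / len(rb),
        "ARRMSE": sum(rrmse) / len(rrmse),
        "MARB": max(rb),       # label assumed, not from the slides
        "MRRMSE": max(rrmse),  # label assumed, not from the slides
    }


# Toy simulation (assumed): two areas, two replicates each.
summary = summarise(
    estimates_by_area=[[9.0, 11.0], [22.0, 22.0]],
    true_by_area=[10.0, 20.0],
)
```

In the toy data the first area is unbiased but variable and the second is biased but stable, so both end up with the same RRMSE while only the second contributes to the bias summaries.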

Results (LFS = LFS covariates; LFS+C = LFS covariates + geographic coordinates of the municipality)

Analysis of results
• As expected, the DIRECT estimator has the lowest AARB and the highest ARRMSE.
• With respect to the DIRECT estimator, the two GREG estimators increase bias and decrease variance.
• When geographical information is included as a fixed effect, the estimators perform better in terms of bias.
• The purely model-based SYNTH-LFS estimator performs worse than the synthetic estimators based on the mixed model.
• The EBLUP estimators show a larger bias but a much lower variance than the GREG estimators, and a lower bias than the corresponding SYNTH-EB-LFS and SYNTH-EB-LFS+C.
• The close performances of the EBLUP and SYNTH-EB estimators highlight the importance of introducing random area effects in the model.

Analysis of results
• The performance of the MBDE-LFS estimator is a compromise between GREG-LFS and EBLUP-LFS.
• The estimators EBLUP-LFS+C, SEBLUP-LFS and EBLUP-LFS+SplineC, which include the spatial information in different ways, display similar results in terms of bias and MSE. EBLUP-LFS+C is to be preferred for its simplicity and because it seems to keep the bias under better control.
• EBLUP-LFS and EBLUP-SplineA exhibit similar outcomes, but the latter is more parsimonious.
• LOGIT-LFS underperforms the corresponding SYNTH-LFS.
• LOGIT-SplineA shows results very similar to those of LOGIT-LFS, which again suggests that a model simpler than LFS should be sought.
• With respect to the EBLUP-LFS estimator, the MLOGIT-LFS estimator is better in terms of bias but performs poorly in terms of MSE.

Final remarks
• The model group is a small portion of Italy (the Center); hence the area-specific effects are smaller than they would be if an overall model were considered for the whole country. The introduction of geographical information should be analyzed with a larger model group.
• Sensitivity to the choice of the smoothing parameters in the spline approach has to be investigated.
• The introduction of the sampling weights should be considered, to try to achieve benchmarking with the direct estimates produced at regional level.
