Variance estimation for Generalized Entropy and Atkinson inequality indices: the complex survey data case. Martin Biewen (Goethe University Frankfurt) Stephen Jenkins (University of Essex). Presentation at 4th German Stata User Group Meeting, Mannheim, 31 March 2006.
Inequality indices: measures of the dispersion of a distribution
• Imposing a small number of axioms substantially restricts the functional form that indices may have
• Axioms:
  • Anonymity
  • Scale invariance
  • Replication invariance
  • Normalization
  • Principle of Transfers: a mean-preserving spread in incomes increases measured inequality
Classes of inequality measures satisfying the axioms
• Generalized Entropy class, GE(α)
  • The parameter α governs transfer sensitivity
  • Advantage: subgroup decomposability
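For reference, the GE class is usually written in the following textbook form. The notation is an assumption made here, not taken from the slides: y_i is the income of person i, μ is mean income, n is the number of persons, and weights are omitted for readability.

```latex
\[
GE(\alpha) = \frac{1}{\alpha(\alpha-1)}
  \left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i}{\mu}\right)^{\alpha} - 1\right],
  \qquad \alpha \neq 0, 1
\]
\[
GE(0) = \frac{1}{n}\sum_{i=1}^{n}\log\frac{\mu}{y_i}
  \quad\text{(mean logarithmic deviation, MLD)}
\]
\[
GE(1) = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{\mu}\log\frac{y_i}{\mu}
  \quad\text{(Theil index)}
\]
```

Larger values of α make the index more sensitive to income differences at the top of the distribution; smaller (more negative) values make it more sensitive to differences at the bottom.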
Classes of inequality measures satisfying the axioms
• Atkinson index, A(ε)
  • The parameter ε measures inequality aversion
  • Advantage: welfare interpretation
• Gini coefficient
  • Advantage: most well-known inequality index
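For reference, the Atkinson class is usually written in the following textbook form. The notation is an assumption made here, not taken from the slides: y_i is the income of person i, μ is mean income, n is the number of persons, and weights are omitted for readability.

```latex
\[
A(\varepsilon) = 1 -
  \left[\frac{1}{n}\sum_{i=1}^{n}
  \left(\frac{y_i}{\mu}\right)^{1-\varepsilon}\right]^{\frac{1}{1-\varepsilon}},
  \qquad \varepsilon > 0,\ \varepsilon \neq 1
\]
\[
A(1) = 1 - \frac{1}{\mu}\exp\!\left(\frac{1}{n}\sum_{i=1}^{n}\log y_i\right)
\]
```

The term subtracted from 1 is the ratio of the 'equally distributed equivalent' income to mean income, which is what gives the index its welfare interpretation.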
Estimation of inequality indices
• These indices are routinely calculated by many analysts …
• The most commonly used programs among Stata users are ineqdeco and inequal7 (available using ssc)
• But analysts only rarely report estimates of the associated sampling variances (or SEs) of the estimates!
Estimation of inequality indices
• Analytical derivations to date have omitted some important situations (and indices)
• Most derivations assume i.i.d. observations (cf. survey clustering or other sample dependencies!), and don't consider probability weighting (cf. stratification!)
• The methods that do exist are not ‘well known’
• Lack of available software
• But cf. geivars (Cowell (1989), linearization methods; i.i.d. assumptions) and ineqerr (bootstrap), both available using ssc
What we provide
• Estimates of indices and associated sampling variances for all members of the GE and Atkinson classes, while also …
• Accounting for clustering and stratification, and for the i.i.d. case
• Analytical results (see our paper) and new Stata programs (version 8.2): svygei and svyatk
• Based on Taylor-series linearization methods combined with a result from Woodruff (JASA, 1971)
• Results don't apply to the Gini coefficient
Overview of analytical derivation
• Write the estimator of each index as a function of population totals (involves sums over clusters, weights etc.)
• (Taylor-series approximation) The variance of each estimator can be approximated by the variance of a first-order ‘residual’
• As is, each expression is not easily calculated …
• But (Woodruff): reversing the order of summation in the ‘residual’ → estimation is equivalent to deriving the sampling variance of a total estimator, for which one can apply standard svy methods
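For the GE class with α ≠ 0, 1, the steps above can be sketched as follows. The symbols w_i (sample weight), y_i (income), U_k (population totals) and z_i (the per-observation ‘residual’) are notation introduced here for illustration; the authors' full derivation, including the α = 0, 1 and Atkinson cases, is in the paper.

```latex
% Step 1: the index as a function f of estimated population totals
\[
\hat U_k = \sum_i w_i\, y_i^{k}, \quad k \in \{0, 1, \alpha\},
\qquad
\widehat{GE}(\alpha) = f(\hat U_0, \hat U_1, \hat U_\alpha)
 = \frac{1}{\alpha(\alpha-1)}
   \left[\hat U_0^{\alpha-1}\,\hat U_1^{-\alpha}\,\hat U_\alpha - 1\right]
\]
% Step 2: first-order Taylor approximation of the variance
\[
\operatorname{Var}\!\big(\widehat{GE}(\alpha)\big) \approx
\operatorname{Var}\!\Big(\sum_{k} \frac{\partial f}{\partial U_k}\,\hat U_k\Big)
\]
% Step 3 (Woodruff): reverse the order of summation, so that the
% expression becomes the variance of a single estimated total
\[
\sum_{k} \frac{\partial f}{\partial U_k}\,\hat U_k = \sum_i w_i\, z_i,
\qquad
z_i = \frac{1}{\alpha(\alpha-1)}\Big[
   (\alpha-1)\,U_0^{\alpha-2} U_1^{-\alpha} U_\alpha
   - \alpha\,U_0^{\alpha-1} U_1^{-\alpha-1} U_\alpha\, y_i
   + U_0^{\alpha-1} U_1^{-\alpha}\, y_i^{\alpha}\Big]
\]
```

In practice the unknown totals in z_i are replaced by their estimates, and the variance of the total of z_i is computed with the usual survey formulas for clustered, stratified designs.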
The programs: svygei and svyatk
svygei varname [if exp] [in range] [, alpha(#) subpop(varname) level(#)]
svyatk varname [if exp] [in range] [, epsilon(#) subpop(varname) level(#)]
• Where, of course, the data have first been svyset.
• How the data are organised, and described using svyset, is of crucial importance …
• svygei: calculations for GE(α) with α = −1, 0, 1, 2, 3 in the default output (use the alpha(#) option to choose a value other than the default)
• svyatk: calculations for A(ε) with ε = 0.5, 1, 1.5, 2, 2.5 in the default output (use the epsilon(#) option to choose a value other than the default)
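A minimal usage sketch of the two syntax diagrams, using only the options listed there. The variable names are assumptions: eq is taken to hold income, female is a hypothetical 0/1 indicator, and the data are assumed to have been svyset already.

```stata
* Sketch only: eq (income) and female (0/1 indicator) are assumed names,
* and the data are assumed to be svyset before these commands are run.

* GE indices, requesting alpha = 0.5 and 90% confidence intervals
svygei eq, alpha(0.5) level(90)

* Atkinson indices for the female subpopulation, requesting epsilon = 0.25
svyatk eq, epsilon(0.25) subpop(female)
```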
Survey data set-up for estimation of inequality among individuals
1) Observation unit is person; sampling unit is household; all persons in each household are attributed the equivalised income of the household to which they belong; individual sample weight available (‘xwgt’) but no information about PSU or strata:
svyset [pw=xwgt], psu(hh_id)
2) As 1), except that PSU and strata information is also known (includes allowance for within-household correlation):
svyset [pw=xwgt], psu(PSU_id) strata(STRATA_id)
3) Observation unit is household; sampling unit is household; weight (‘xhhwgt’) = household sample weight × household size; no information about PSU or strata:
svyset [pw=xhhwgt] → i.i.d. case
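Set-up 3) could be built from a person-level file roughly as follows. This is a sketch under stated assumptions: hh_id and hhwgt are hypothetical names for the household identifier and the household sample weight, every household member is assumed to be present in the file, and income is assumed to be constant within household.

```stata
* Sketch only: hh_id and hhwgt are assumed variable names.
* Count the persons in each household
bysort hh_id: generate hhsize = _N

* Household sample weight x household size
generate double xhhwgt = hhwgt * hhsize

* Keep one record per household
bysort hh_id: keep if _n == 1

* Declare the design with weights only (version 8.2 syntax, as on the slide)
svyset [pw=xhhwgt]
```

Because only one record per household remains, no within-household clustering is left to account for, which is why this set-up corresponds to the i.i.d. case.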
Illustration
• German Socio-Economic Panel (GSOEP), wave 18 data (2001) used as a cross-section
• 12,939 individuals in 5,195 households; 1,004 PSUs (‘psu’), 169 strata (‘strata’)
• Equivalized (‘square-root equivalence scale’) post-tax post-benefit household income (‘eq’)
• Each individual is attributed the equivalised income of her household (→ ‘clustering’ within households)
• Even if the survey does not include PSU and strata identifiers, you should account for this clustering (use the household identifier as the PSU variable)
Generalized Entropy indices
. ssc install svygei_svyatk
. version 8.2
. svyset [pweight=xwgt], psu(psu) strata(strata)
. svygei eq

Complex survey estimates of Generalized Entropy inequality indices

pweight: xwgt                                Number of obs    = 12939
Strata:  strata                              Number of strata = 169
PSU:     psu                                 Number of PSUs   = 1004
                                             Population size  = 31487411
---------------------------------------------------------------------------
Index    |  Estimate   Std. Err.      z      P>|z|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1)   |  .1179647   .00614786    19.19    0.000     .1059151    .1300143
MLD      |  .1020797   .00495919    20.58    0.000     .0923599    .1117996
Theil    |  .1027892   .0058706     17.51    0.000     .091283     .1142954
GE(2)    |  .1201693   .00962991    12.48    0.000     .101295     .1390436
GE(3)    |  .1713159   .02301064     7.45    0.000     .1262159    .2164159
---------------------------------------------------------------------------
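One way to cross-check a single row of this table by hand is to apply the linearization idea directly to GE(2). The sketch below is an illustration, not the code inside svygei. It assumes that eq is positive and non-missing for every observation, that the data have been svyset as above, and that a Stata release with the svy: prefix is used (in Stata 8 the last command would be svytotal z). The names w0, w1, w2, U0, U1, U2 and z are introduced here.

```stata
* Sketch only: hand calculation of GE(2) and its linearized standard error.
* Weighted totals U_k = sum of xwgt * eq^k for k = 0, 1, 2
generate double w0 = xwgt
generate double w1 = xwgt * eq
generate double w2 = xwgt * eq^2
quietly summarize w0
scalar U0 = r(sum)
quietly summarize w1
scalar U1 = r(sum)
quietly summarize w2
scalar U2 = r(sum)

* Point estimate: GE(2) = (1/2) * (U0*U2/U1^2 - 1)
display "GE(2) = " 0.5*(U0*U2/U1^2 - 1)

* Per-observation 'residual': partial derivatives of GE(2) with respect to
* U0, U1 and U2, multiplied by 1, eq and eq^2 respectively
generate double z = 0.5*(U2/U1^2 - 2*U0*U2/U1^3*eq + U0/U1^2*eq^2)

* The estimated total of z is zero by construction; its standard error
* is the linearized standard error of GE(2)
svy: total z
```

If the assumptions hold, the displayed point estimate and the standard error of the total should be close to the GE(2) row reported by svygei.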
Atkinson indices
. svyset [pweight=xwgt], psu(psu) strata(strata)
. svyatk eq

Complex survey estimates of Atkinson inequality indices

pweight: xwgt                                Number of obs    = 12939
Strata:  strata                              Number of strata = 169
PSU:     psu                                 Number of PSUs   = 1004
                                             Population size  = 31487411
---------------------------------------------------------------------------
Index    |  Estimate   Std. Err.      z      P>|z|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
A(0.5)   |  .0496963   .0025263     19.67    0.000     .0447448    .0546477
A(1)     |  .0970424   .00447794    21.67    0.000     .0882658    .105819
A(1.5)   |  .1434968   .00616915    23.26    0.000     .1314055    .1555881
A(2)     |  .1908923   .00804946    23.71    0.000     .1751157    .206669
A(2.5)   |  .2432834   .01237288    19.66    0.000     .219033     .2675338
---------------------------------------------------------------------------
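The Atkinson and GE point estimates are linked: under the standard definitions of the two classes, A(1) is a transformation of the MLD and A(2) is a transformation of GE(−1). Plugging in the GE estimates reported above gives a quick consistency check on the two tables (the rounding is ours):

```latex
\[
A(1) = 1 - \exp\{-GE(0)\} = 1 - \exp(-0.1020797) \approx 0.09704
\]
\[
A(2) = 1 - \frac{1}{1 + 2\,GE(-1)} = 1 - \frac{1}{1 + 2(0.1179647)}
     = 1 - \frac{1}{1.2359294} \approx 0.19089
\]
```

Both values agree with the reported estimates A(1) = .0970424 and A(2) = .1908923.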
Subpopulation option
. gen female = sex==2
. svygei eq, subpop(female)

Complex survey estimates of Generalized Entropy inequality indices

pweight: xwgt                                Number of obs    = 12939
Strata:  strata                              Number of strata = 169
PSU:     psu                                 Number of PSUs   = 1004
                                             Population size  = 31487411
Subpop:  female, subpop. size = 16499055
---------------------------------------------------------------------------
Index    |  Estimate   Std. Err.      z      P>|z|     [95% Conf. Interval]
---------+-----------------------------------------------------------------
GE(-1)   |  .112828    .00573308    19.68    0.000     .1015914    .1240646
MLD      |  .0994741   .00471331    21.10    0.000     .0902362    .1087121
Theil    |  .0998958   .00543287    18.39    0.000     .0892476    .110544
GE(2)    |  .1151464   .00877057    13.13    0.000     .0979564    .1323364
GE(3)    |  .1596125   .02029283     7.87    0.000     .1198392    .1993857
---------------------------------------------------------------------------
Empirical illustration in our paper
• GSOEP income data for 2001 (same as used here)
• British Household Panel Survey for 2001 (9,979 individuals in 4,058 households; 250 PSUs, 75 strata)
• Results:
  • Inequality is larger in Britain than in Germany for all indices, and the difference is statistically significant
  • z-ratios (index ÷ SE) vary from 7.5 to 23.9 (DE) and from 5.1 to 31.9 (GB), being smallest for top-sensitive indices and largest for middle-sensitive indices
  • Although the sample is larger in Germany, its z-ratios are not always larger (→ different sample designs)
Empirical illustration (ctd.)
• Effects of different assumptions about survey design on sampling variance estimates?
• For each index, the estimated standard error is larger if one accounts for survey clustering and stratification (unsurprising), but …
• Results suggest that accounting for survey design features per se has little (additional) effect on variance estimates as long as the replication of incomes within multi-person households is accounted for
Conclusions
• Researchers now have the means to estimate sampling variances for most of the inequality indices in common use, accommodating a range of potential assumptions about design effects
Topics for future research:
• GE indices are additively decomposable by population subgroup (→ ineqdeco): extend results here to the components of decompositions
• Extend results to the Gini coefficient and other measures based on order statistics (Lorenz curves etc.)
Selected references
• Biewen, M. and Jenkins, S.P. (2006): Estimation of Generalized Entropy and Atkinson indices from complex survey data, forthcoming in: Oxford Bulletin of Economics and Statistics
• Cowell, F.A. (2000): Measurement of inequality, in A.B. Atkinson and F. Bourguignon (eds), Handbook of Income Distribution, Vol. 1, Elsevier, Amsterdam
• Woodruff, R.S. (1971): A simple method for approximating the variance of a complicated estimate, Journal of the American Statistical Association, 66, 411-414