Regression method (basic level)
Jože Sambt
NTA Hands-On Workshop
Berkeley, CA, January 14, 2009
Why do we need a regression method?
Education and health care expenditures are usually reported at the household level, but in the NTA context everything has to be assigned to individuals.
The idea of the regression analysis
For example:
• The slope coefficient is about 0.50, suggesting that an increase in real income of 1 dollar leads, on average, to an increase of about 50 cents in real consumption expenditure.
• Constant: about 679 dollars is the level of autonomous consumption, i.e. the consumption of a person who receives no income (the value of the independent variable is 0).
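Written as an equation, the example above corresponds to a fitted line of roughly the following form. The symbols $\hat{C}$ (predicted real consumption expenditure) and $Y$ (real income) are notation assumed here for illustration; the slide gives only the two rounded estimates.

```latex
\[
\hat{C} \approx 679 + 0.50\,Y
\]
```

Under these rounded values, a person with a real income of 1,000 dollars would have a predicted consumption of about $679 + 0.50 \times 1000 = 1179$ dollars.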
Allocating health expenditure
Private health expenditure of household j is regressed on the number of household members in each age group x.
Using broader age groups can be a good idea, because some age groups contain only a small number of observations and every additional group costs a degree of freedom. Don't worry, your age profile will most likely not look like stairs because of that.
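One way to write this regression is sketched below. The notation is an assumption made for illustration: $CFH_j$ is the reported private health expenditure of household $j$, $M_j(x)$ is the number of members of household $j$ in age group $x$, $\beta(x)$ is the coefficient for age group $x$, and $\varepsilon_j$ is the error term. There is no intercept, in line with the Stata code on the following slides.

```latex
\[
CFH_j = \sum_{x} \beta(x)\, M_j(x) + \varepsilon_j
\]
```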
STATA code
• Grouping into 5-year age groups:
    gen agegrp=age
    recode agegrp (0/4=2.5) (5/9=7.5) (10/14=12.5) (15/19=17.5) … (90/max=90)
• Calculating the number of individuals in each age group (by household):
    * bysort sorts the data by hhid first; total() is the current name of egen's sum()
    bysort hhid: egen p4=total(agegrp==2.5)
    bysort hhid: egen p9=total(agegrp==7.5)
    bysort hhid: egen p14=total(agegrp==12.5)
    bysort hhid: egen p19=total(agegrp==17.5)
    bysort hhid: egen p24=total(agegrp==22.5)
    …
    bysort hhid: egen p90=total(agegrp==90)
Household health expenditures are regressed on the number of individuals in each age group within the household, without an intercept:
    * noconstant: intercept suppressed!
    reg cfhc p4 p9 p14 p19 p24 p29 p34 p39 p44 p49 p54 p59 p64 p69 p74 p79 p84 p89 p90 [w=weight], robust noconstant
… and the coefficients are stored for future use:
    gen bp4=_b[p4]
    gen bp9=_b[p9]
    gen bp14=_b[p14]
    …
    gen bp90=_b[p90]
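The regression step can be tried out on a small made-up data set. The sketch below is only an illustration: the five households and their values are hypothetical, expenditure is constructed as exactly 20*p9 + 80*p34 + 100*p39 so that the true coefficients are known, and the weights and the robust option are left out for brevity.

```stata
* Toy household-level data (hypothetical): number of members aged 5-9,
* 30-34 and 35-39, and household health expenditure cfhc, constructed
* as 20*p9 + 80*p34 + 100*p39 so the true coefficients are known.
clear
input hhid p9 p34 p39 cfhc
1 1 1 1 200
2 0 1 1 180
3 2 1 0 120
4 1 0 1 120
5 0 2 0 160
end

* Regression without an intercept (weights and robust omitted in this toy case)
reg cfhc p9 p34 p39, noconstant

* Store the estimated coefficients for later use
gen bp9  = _b[p9]
gen bp34 = _b[p34]
gen bp39 = _b[p39]
list hhid bp9 bp34 bp39 in 1, noobs
```

Because the toy expenditures were built without noise, the regression should return the coefficients used to construct them; with real survey data the fit will of course not be exact.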
However,
• summing up the obtained values for all members of a household gives a different amount of health expenditure than the household reported in the survey.
• Therefore, we need a further adjustment in which we use only the relative size of those coefficients among household members, i.e. we treat them as within-household shares (weights).
For example:
Assume a household with three individuals:
• child, aged 6 years
• mother, aged 33 years
• father, aged 36 years
Let's further assume that the coefficient obtained from the regression is 20 for the age group 5-9 years, 80 for the age group 30-34 years, and 100 for the age group 35-39 years. These sum to 200. However, in the survey the household reported 300 dollars of health expenditures. This means we have to rescale those values so that they sum to 300.
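The rescaling for this household can be written out as follows, using only the numbers given above: each coefficient is divided by the household sum of 200 and multiplied by the reported 300 dollars.

```latex
\[
\text{child: } \frac{20}{200}\times 300 = 30,\qquad
\text{mother: } \frac{80}{200}\times 300 = 120,\qquad
\text{father: } \frac{100}{200}\times 300 = 150
\]
```

The three amounts add up to the 300 dollars reported by the household.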
STATA code
Each individual is assigned the coefficient of his or her own age group (the coefficient of each age group is multiplied by an indicator of whether the individual belongs to that group):
    gen hp4=bp4*(agegrp==2.5)
    gen hp9=bp9*(agegrp==7.5)
    gen hp14=bp14*(agegrp==12.5)
    …
    gen hp90=bp90*(agegrp==90)
    * rowtotal() is the current name of egen's rsum()
    egen sum=rowtotal(hp4 hp9 hp14 hp19 hp24 hp29 hp34 hp39 hp44 hp49 hp54 hp59 hp64 hp69 hp74 hp79 hp84 hp89 hp90)
The sum of weights is calculated for each household (i.e. the household's estimated expenditure on health):
    * bysort sorts the data by hhid first; total() is the current name of egen's sum()
    bysort hhid: egen total=total(sum)
The relative share of each individual's weight in the total sum of household weights is used to distribute the reported household health expenditure (CFHj) among household members.
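A possible way to express this allocation rule is given below. The notation is assumed for illustration: $CFH_{ij}$ is the amount allocated to individual $i$ in household $j$, $x_i$ is the age group of individual $i$, and $\hat{\beta}(x)$ is the estimated coefficient for age group $x$; the sum in the denominator runs over all members $k$ of household $j$.

```latex
\[
CFH_{ij} = CFH_j \cdot \frac{\hat{\beta}(x_i)}{\sum_{k \in j} \hat{\beta}(x_k)}
\]
```

Summing $CFH_{ij}$ over the members of a household gives back $CFH_j$, so the reported amount is allocated in full.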
… rewritten as STATA code:
• The relative share of each household member is calculated (by dividing the individual's coefficient by the total sum of coefficients of all household members):
    gen rhp4=hp4/total
    replace rhp4=0 if rhp4==.
    gen rhp9=hp9/total
    replace rhp9=0 if rhp9==.
    …
    gen rhp90=hp90/total
    replace rhp90=0 if rhp90==.
• Finally, the relative shares of the members of each household are multiplied by the reported health expenditures of that household to obtain health expenditures by individual:
    gen th4=cfhc*rhp4
    gen th9=cfhc*rhp9
    …
    gen th90=cfhc*rhp90
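The whole allocation step can be checked on the three-person household from the earlier example. This is a minimal sketch, not the workshop code: the coefficients 20, 80 and 100 are the assumed values from that example, and the variable names coef, hhsum, share and th are hypothetical. It keeps a single coefficient variable per person instead of one variable per age group.

```stata
* Toy data: one household with the child (6), mother (33) and father (36);
* cfhc is the reported household health expenditure, repeated on each row.
clear
input hhid age cfhc
1  6 300
1 33 300
1 36 300
end

* Assign each person the (assumed) coefficient of his or her age group
gen double coef = .
replace coef = 20  if inrange(age, 5, 9)
replace coef = 80  if inrange(age, 30, 34)
replace coef = 100 if inrange(age, 35, 39)

* Household sum of coefficients, within-household share, allocated amount
bysort hhid: egen double hhsum = total(coef)
gen double share = coef / hhsum
gen double th = cfhc * share

list hhid age coef share th, noobs
```

The allocated amounts th add up to cfhc within the household, which is the property the rescaling is meant to guarantee.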
As has already been said during the workshop, if you have information …
• about the subcategories of the expenditures (for example, on which household members use inpatient (IN) or outpatient (OUT) services), use that information.
• about which household members are enrolled (E) and not enrolled (NE) in education, use that information.
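One plausible way to bring such information into the regression is to count household members by age group separately for each status and to estimate a separate set of coefficients for each. The equations below are a sketch under that assumption, not a formula taken from the slides: $CFE_j$ (household education expenditure), the coefficient symbols, and the age-specific counts $E_j(x)$, $NE_j(x)$, $IN_j(x)$ and $OUT_j(x)$ are notation introduced here.

```latex
\[
CFE_j = \sum_{x} \alpha(x)\, E_j(x) + \sum_{x} \beta(x)\, NE_j(x) + \varepsilon_j
\]
\[
CFH_j = \sum_{x} \gamma(x)\, IN_j(x) + \sum_{x} \delta(x)\, OUT_j(x) + \varepsilon_j
\]
```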
… or if you …
• have an external profile of per capita utilization by age (U) and the number of household members by age (M), use that information.
• have detailed data available (for example, separately reported expenditures on primary, secondary and tertiary education), use them: limit the analysis to those age groups for which the expenditures are relevant. This is especially relevant for education expenditures.
Some final details
• There is no constant term (i.e. the model is estimated in homogeneous form), so the household consumption is fully allocated.
• Constrain negative values of the coefficients to zero.
• In the case of education, do not apply smoothing.
• You might want to have two age groups (age 0 and ages 1-4) instead of the 0-4 age group, to capture the higher health expenditures in the first year of life that are reported in countries where such detailed data are available.
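The second point above, constraining negative coefficients to zero, could be done with a short loop over the stored coefficient variables. The sketch below uses three made-up coefficient values (one of them negative) in place of real estimates; only the bp naming follows the earlier slides.

```stata
* Hypothetical stored coefficients; bp9 is negative on purpose
clear
set obs 1
gen bp4  = 35
gen bp9  = -12
gen bp14 = 20

* Set negative coefficients to zero, leave all others unchanged
foreach v of varlist bp4 bp9 bp14 {
    replace `v' = 0 if `v' < 0
}
list bp4 bp9 bp14, noobs
```

With the full set of age groups, the variable list in the loop would name all stored coefficient variables instead of these three.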