Research Method Lecture 6 (Ch7) Multiple regression with qualitative variables
Dummy variables • Often, our data contain qualitative variables, such as gender. These are qualitative rather than quantitative variables.
However, such qualitative variables are also important in analyzing data. For example, you may want to answer the following question: “Is there any gender wage gap?”
To incorporate such a qualitative variable into the OLS equation, we first convert the qualitative information into a quantitative variable called a “dummy variable”. • If you would like to incorporate gender information in your model, create the following dummy variable:
Female = 1 if the person is female
Female = 0 if the person is male
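The conversion above can be sketched in a few lines of Python. This is only an illustration: the list of gender labels and the variable names are hypothetical and are not taken from Wage1.dta.

```python
# Hypothetical qualitative data: one gender label per person.
gender = ["female", "male", "male", "female", "male"]

# Dummy variable: 1 if the person is female, 0 if the person is male.
female = [1 if g == "female" else 0 for g in gender]

print(female)  # [1, 0, 0, 1, 0]
```

The mean of the dummy is the share of females in the sample, which is a quick check that the coding went as intended.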
Incorporating a dummy variable as an independent variable • Suppose you are interested in the gender wage gap. Then you include the dummy variable for female: Log(wage)=β0+δ0(female)+β1(experience)+u where wage is the hourly wage rate and experience is in years. • Then δ0 shows the difference in log wage between females and males who have the same experience. To understand this, see the next slide.
For males, the predicted log wage at a given experience is
Log(wage)=β0+β1(experience)
For females, the predicted log wage at a given experience is
Log(wage)=(β0+δ0)+β1(experience)
Therefore, the gender difference in log wage at a given experience is given by δ0. If females earn less than males, δ0 will be negative.
Using a graph, the gender wage gap is described as an intercept shift because:
Intercept for males = β0
Intercept for females = β0+δ0
• Assuming that females earn a lower wage (that is, δ0 is negative), the predicted wage-experience profiles would look like the ones in the next slide.
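The estimated wage-experience profiles by gender
[Figure: Log(wage) on the vertical axis against Experience on the horizontal axis, with two lines of equal slope; the line labelled Male lies above the line labelled Female.]
Note that δ0 is usually negative, so the wage-experience profile for females lies below that for males.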
The base group • When you include (female), you do not include (male). • The predicted wage for males is given by setting female=0. • Thus the wage gap is estimated relative to males. • This means that, in our example, we set males as the base group. This group is also called the benchmark group, the excluded group, or the excluded category.
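The role of the base group can be illustrated with a small, self-contained sketch. The data below are synthetic and noise-free (generated from assumed coefficients 1.5, -0.4 and 0.02, not estimated from Wage1.dta), and `ols` is a hypothetical helper that solves the normal equations directly. The sketch fits the same model twice, once with males and once with females as the base group.

```python
def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic, noise-free data (assumed values, for illustration only).
female = [0, 0, 0, 1, 1, 1]
exper = [1, 5, 10, 2, 6, 12]
lwage = [1.5 - 0.4 * f + 0.02 * e for f, e in zip(female, exper)]

# Males as the base group: regressors are constant, female, experience.
b_male_base = ols([[1, f, e] for f, e in zip(female, exper)], lwage)

# Females as the base group: regressors are constant, male, experience.
b_female_base = ols([[1, 1 - f, e] for f, e in zip(female, exper)], lwage)

print([round(b, 4) for b in b_male_base])
print([round(b, 4) for b in b_female_base])
```

Switching the base group flips the sign of the dummy coefficient and moves the intercept to the other group's intercept; the slope on experience is unchanged.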
Example • Using Wage1.dta, estimate the following model. Is there a gender wage gap? How big is the wage gap? Log(wage)=β0+δ0(female)+β1(experience)+u
Females earn approximately 39% lower wages than males after controlling for experience.
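Because the dependent variable is Log(wage), reading the coefficient directly as a percentage is an approximation that becomes less accurate for large coefficients. The exact percentage difference implied by a coefficient δ0 is 100·(exp(δ0)-1). The sketch below assumes δ0 = -0.39, a value inferred from the 39% quoted above rather than read from the regression output.

```python
import math

def exact_percent_change(coef):
    """Exact percentage difference implied by a dummy coefficient in a log-wage model."""
    return 100.0 * (math.exp(coef) - 1.0)

delta0 = -0.39  # assumed value, inferred from the 39% quoted in the text

print(round(100.0 * delta0, 1))                 # approximate reading of the coefficient
print(round(exact_percent_change(delta0), 1))   # exact percentage difference
```

With this assumed coefficient the exact difference is roughly 32%, noticeably smaller in magnitude than the approximate 39%.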
Policy analysis using a dummy variable • The State of Michigan provided a job training grant program for manufacturing companies. Did this grant help firms provide more training to their employees? • To answer this question, you may estimate the following model. (Hours of training per employee)=β0+δ0(grant)+β1log(sales)+u where (grant) is a dummy variable taking the value 1 if the firm received the grant, and 0 otherwise.
Grant appears to have a significant effect on employee training.
Using dummy variables for multiple categories • When you examine the gender gap, there are only two groups: males and females. • However, in some situations there are more than two categories. For example, you may want to examine the wage differences among the following four groups:
Married men
Married women
Single men
Single women
Then, the solution is to create dummy variables for all the categories except one. For example, you estimate Log(wage)=β0+δ0(Married men) +δ1(Married women) +δ2(Single women) +β1(Education) +β2(experience) +β3(experience)²+u The excluded group is single men. So the differences in wage among the four groups are estimated relative to single men.
Exercise • Using WAGE1.dta, estimate the model on the previous slide.
Married men earn about 24.6% more than single men. Married women earn about 21.8% less than single men. Single women earn about 12.1% less than single men.
Here is the do file I used to obtain the results.
use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\WAGE1.DTA", clear
* Create dummy for married men
gen marriedmen=0
replace marriedmen=1 if female==0 & married==1
* Create dummy for married women
gen marriedwomen=0
replace marriedwomen=1 if female==1 & married==1
* Create dummy for single women
gen singlewomen=0
replace singlewomen=1 if female==1 & married==0
* Estimate the model (single men are the excluded group)
reg lwage marriedmen marriedwomen singlewomen educ exper expersq
Incorporating ordinal information by using dummy variables • Some information is ordinal, like credit ratings or law school rankings. • For concreteness, consider estimating the effect of a municipality's credit rating on its municipal bond interest rate. • You have a credit rating variable that takes values from 1 to 5. A rating of 1 is the worst rating, and 5 is the best rating.
How do we incorporate this information? One possibility is to estimate (Municipal bond interest rate) =β0+β1(Credit rating)+(other factors) Then β1 shows the change in municipal bond interest when credit rating increases by 1.
But this assumes that the effect of improving the credit rating from 1 to 2 is the same as the effect of improving the rating from 2 to 3, and so on. • There is no reason why the effect of an improvement from 1 to 2 should be the same as that from 2 to 3. • In this situation, it is better to create a dummy variable for each rating, excluding one category, and then include them in the model.
That is, create the following 4 dummies:
CR1 = 1 if credit rating=1, = 0 otherwise
CR2 = 1 if credit rating=2, = 0 otherwise
CR3 = 1 if credit rating=3, = 0 otherwise
CR4 = 1 if credit rating=4, = 0 otherwise
The excluded category is credit rating=5.
(Municipal bond interest rate) =β0+β1CR1+β2CR2+β3CR3+β4CR4 +(other factors) Then, β1 shows the effect of getting credit rating 1 on the bond interest rate relative to credit rating 5. Other coefficients are interpreted in the same way.
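The construction of CR1 to CR4 can be sketched as follows. The ratings are hypothetical and `rating_dummies` is an illustrative helper, not part of any dataset or package used in the lecture.

```python
def rating_dummies(rating, base=5, levels=(1, 2, 3, 4, 5)):
    """Return one dummy per rating level, leaving out the base (excluded) category."""
    return [1 if rating == level else 0 for level in levels if level != base]

# Hypothetical credit ratings for six municipalities.
ratings = [1, 3, 5, 2, 4, 5]

# Each row is [CR1, CR2, CR3, CR4]; a rating of 5 gives all zeros.
rows = [rating_dummies(r) for r in ratings]

for r, row in zip(ratings, rows):
    print(r, row)
```

A municipality in the excluded category has all four dummies equal to zero, so its predicted interest rate is determined by β0 and the other factors alone.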
Exercise • Using beauty.dta, examine whether one’s physical attractiveness affects wage. Use the variables for ‘below average looks’ and ‘above average looks’. Include other variables where it makes sense to do so. Also try estimating the model separately for males and females.
Interactions involving dummy variables: Example 1 • Suppose that you are interested in the gender wage gap, but you suspect that the gap may change with experience. • Then you would estimate the following. Log(wage)=β0+δ0(female) +δ1(female)(experience) +β1(experience)+u
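Then the male log wage at a given experience is written as
Log(wage)=β0+β1(experience)
The female log wage at a given experience is written as
Log(wage)=(β0+δ0)+(β1+δ1)(experience)
Thus, the gender gap at a given experience is:
δ0+δ1(experience)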
Thus δ0 is the gender wage gap at hiring (i.e., experience=0). Usually it is negative. So, if the coefficient for the interaction term, δ1, is positive, then the gender gap is decreasing with experience. If δ1 is negative, the gender gap is increasing with experience. • The case where the gender gap is increasing with experience is described in the following slide.
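Case where the gender gap is increasing with experience (i.e., δ1 is negative)
[Figure: Log(wage) on the vertical axis against Experience on the horizontal axis; the line labelled Male lies above the line labelled Female, and the vertical distance between them widens as experience increases.]
Gender gap at a given experience = δ0+δ1(experience)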
Exercise • Using Wage1.dta, estimate the following model. Log(wage)=β0+δ0(female) +δ1(female)(experience) +β1(experience)+u Q1. Is the gender gap increasing or decreasing with experience? Q2. What is the gender gap at hiring (exp=0)? Q3. What is the gender gap at experience equal to 10? Is the gender gap significant at this experience?
Answer • Gender gap is increasing with experience since the coefficient on the interaction term is negative • Gender gap at hiring =-0.29 • Gender gap at experience equal to 10 = -0.293+(-.00586)*10=-0.35 This gap is significant at 5% level.
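The arithmetic in the answer can be reproduced with a short sketch. It uses the two point estimates quoted above (-0.293 and -0.00586); the function name is illustrative.

```python
delta0 = -0.293    # coefficient on female, as quoted in the answer
delta1 = -0.00586  # coefficient on female x experience, as quoted in the answer

def gender_gap(experience):
    """Estimated gender gap in log wage at a given level of experience."""
    return delta0 + delta1 * experience

print(round(gender_gap(0), 2))   # gap at hiring
print(round(gender_gap(10), 2))  # gap at 10 years of experience
```

Testing whether the gap at 10 years is significant requires the standard error of the linear combination δ0+10·δ1, which cannot be read off from the two individual t-statistics. In Stata this is typically done with `lincom` after the regression; the exact variable names depend on how the interaction term was generated.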
The interaction between two dummy variables • Suppose that you are interested in whether the gender wage gap is concentrated in a particular group of people. For example, you want to know if the gender wage gap is concentrated among married people.
Then you can estimate the following model. Log(wage)=β0+δ0(female) +δ1(female)(married) +β1(experience) +β2(married) +u • Then we have the following:
Gender gap for married people = δ0+δ1
Gender gap for single people = δ0
Exercise • Using Wage1.dta, estimate the following model. Log(wage)=β0+δ0(female) +δ1(female)(married) +β1(experience) +β2(married) +u • What is the gender wage gap within married people? Is it statistically significant? • What is the gender wage gap within single people? Is it statistically significant?
1. Gender wage gap within married people = (-0.133)+(-0.372) = -0.505. It is significant at the 5% level. 2. Gender wage gap within single people = -0.133. It is significant at the 5% level. (This is based on the usual t-test.)
Testing for differences in regression functions across groups (The Chow test) • Consider initially that you are interested in examining the determinants of GPA of college students. So you have the following equation in mind. (Cumulative GPA) =β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u Where SAT is the SAT score, Hispanic is the dummy for Hispanics and (total hours) is the total hours of college courses.
But suppose that you wonder if all the explanatory variables have different effects on GPA depending on gender. • That is, you wonder if males and females have different coefficients. • We can test if this is the case by estimating the following model.
(Cumulative GPA) =β0+β1(SAT)+β2(Hispanic)+β3(total hours) +δ0(female) +δ1(female)(SAT) +δ2(female)(Hispanic) +δ3(female)(total hours)+u Then we can test whether males and females have different coefficients by testing the following hypotheses using an F-test.
H0: δ0=0, δ1=0, δ2=0, δ3=0
H1: H0 is not true
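The regressors of the fully interacted model can be built mechanically from the original regressors and the female dummy. The sketch below shows one way to do this; `interacted_row` and the example values are hypothetical and are not taken from GPA3.dta.

```python
def interacted_row(regressors, female):
    """Return [x1, ..., xk, female, female*x1, ..., female*xk] for one observation."""
    return list(regressors) + [female] + [female * x for x in regressors]

# Hypothetical observation: SAT score, Hispanic dummy, total hours.
x = [1200, 0, 60]

print(interacted_row(x, female=1))  # [1200, 0, 60, 1, 1200, 0, 60]
print(interacted_row(x, female=0))  # [1200, 0, 60, 0, 0, 0, 0]
```

For a male observation the last k+1 entries are all zero, so δ0 to δ3 only affect the predicted GPA of females. This is why H0 amounts to the statement that males and females share the same coefficients.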
This particular F-test is called the Chow test. • Now, using GPA3.dta, conduct the Chow test described above.
We reject the null hypothesis that males and females have the same regression function (the same coefficients) at the 5% significance level.
Chow test: What to do when you have a lot of variables. • Chow test is easy when your initial model contains 3 or 4 variables. • But if your model contains many variables, creating interaction terms takes a lot of time. • Here is another way to do the same Chow test.
The equivalent procedure for the Chow test: (Let me explain this by using the same example) Step 1: Estimate the initial model using only the male sample. (Cumulative GPA) =β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u Then obtain the SSR. Call this SSR1.
Step 2: Estimate the initial model using only the female sample. (Cumulative GPA) =β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u Then obtain the SSR. Call this SSR2.
Step 3: Estimate the initial model using the pooled sample (both males and females included). (Cumulative GPA) =β0+β1(SAT)+β2(Hispanic)+β3(total hours)+u Then obtain the SSR. Call this SSRp.
Step 4: Compute the following statistic
F = {[SSRp-(SSR1+SSR2)]/(k+1)} / {(SSR1+SSR2)/[n-2(k+1)]}
where k is the number of slope parameters in the initial model and n is the number of observations. Note that k does not include female, so in our example k=3. This F-statistic follows an F distribution with degrees of freedom [k+1, n-2(k+1)]. You reject the null hypothesis that males and females have the same coefficients if the F-stat falls in the rejection region. This particular F-stat is called the Chow statistic. It is the same as the F-stat you obtain when you include the interaction terms as described before.
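Step 4 can be sketched as a small function, assuming the standard form of the Chow statistic, F = {[SSRp-(SSR1+SSR2)]/(k+1)} / {(SSR1+SSR2)/[n-2(k+1)]}. The SSR values in the example are hypothetical placeholders, not results from GPA3.dta; only n=724 and k=3 follow the example in the text.

```python
def chow_statistic(ssr_pooled, ssr1, ssr2, n, k):
    """Return the Chow F-statistic and its (numerator, denominator) degrees of freedom."""
    ssr_unrestricted = ssr1 + ssr2   # separate regressions for the two groups
    df_num = k + 1                   # number of restrictions (intercept plus k slopes)
    df_den = n - 2 * (k + 1)         # observations minus parameters in both regressions
    f_stat = ((ssr_pooled - ssr_unrestricted) / df_num) / (ssr_unrestricted / df_den)
    return f_stat, df_num, df_den

# Hypothetical SSR values, for illustration only.
f_stat, df_num, df_den = chow_statistic(
    ssr_pooled=110.0, ssr1=50.0, ssr2=50.0, n=724, k=3
)
print(round(f_stat, 2), df_num, df_den)
```

The computed value is then compared with the critical value of the F distribution with (df_num, df_den) degrees of freedom.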
Exercise • Conduct Chow test again using the alternative method described above.
[Regression output: male-only sample, which gives SSR1]
[Regression output: female-only sample, which gives SSR2]
[Regression output: pooled sample (both males and females), which gives SSRp]
The resulting Chow statistic follows F[3+1, 724-2(3+1)]=F(4, 716). The cutoff at the 5% significance level is 2.37. Thus we reject the null hypothesis that males and females have the same coefficients. Also note that this F-stat is the same as the F-stat you obtained by using the other method.
Always consider whether the policy variable is endogenous • Suppose that you are interested in estimating the effect of employee training grants on employee productivity. Then you may estimate (Productivity)=β0+β1(grant)+β2(sales)+(Other factors)+u