Multiple Regression Applications Lecture 15
Today’s Plan • Two topics and how they relate to multiple regression • Multicollinearity • Dummy variables
Multicollinearity • Suppose we have the following regression equation: Y = a + b1X1 + b2X2 + e • Multicollinearity occurs when some or all of the independent X variables are linearly related • Different forms of multicollinearity: • Perfect: OLS estimation will not work • Non-perfect: arises in applied work and creates problems for inference and interpretation of the results • There is no formal test for detecting it; we can only compare alternatively specified forms of the model
Multicollinearity Example • Again we’ll use returns to education where: • the dependent variable Y is (log) wages • the independent variables (X’s) are age, experience, and years of schooling • Experience is defined as years in the labor force, or the difference between age and years of schooling • this can be written: Experience = Age - Years of school • What’s the problem with this?
Multicollinearity Example (2) • Note that we’ve expressed experience as the difference of two of our other independent variables • by constructing experience in this manner we create a collinear dependence between age and experience • the relationship between age and experience is a linear relationship such that: as age increases, for given years of schooling, experience also increases • We can write our regression equation for this example: ln(Wages) = a + b1Experience + b2Age + e
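Since the lecture works from spreadsheets, here is a minimal Python sketch (synthetic data, with made-up ranges for age and schooling) of how defining experience as age minus schooling builds in near-collinearity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
schooling = rng.integers(8, 18, size=n).astype(float)   # years of schooling
age = rng.integers(25, 60, size=n).astype(float)        # age in years
experience = age - schooling                            # Experience = Age - Years of school

# Schooling varies far less than age, so experience tracks age closely
# and the two regressors are nearly collinear (correlation close to 1).
print(np.corrcoef(age, experience)[0, 1])
```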
Multicollinearity Example (3) • Recall that our estimate for b1 (with variables in deviations from their means) is: b1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²) where x1 = experience and x2 = age • The problem is that x1 and x2 are linearly related • as we get closer to perfect collinearity, the denominator goes to zero • OLS won’t work!
Multicollinearity Example (4) • Recall that the estimated variance for b1 is: Var(b1) = s²·Σx2² / (Σx1²·Σx2² − (Σx1x2)²) • So as x1 and x2 approach perfect collinearity, the denominator will go to zero and the estimated variance of b1 will increase • Implications: • with multicollinearity, you will get large standard errors on partial coefficients • your t-ratios, given the null hypothesis that the value of the coefficient is zero, will be small
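A small numerical sketch (synthetic data) of the variance formula above: as the correlation between x1 and x2 rises, the denominator Σx1²·Σx2² − (Σx1x2)² collapses toward zero and the estimated variance of b1 blows up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
for rho in (0.0, 0.9, 0.99, 0.999):
    # build x2 with correlation roughly rho to x1
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    x1d, x2d = x1 - x1.mean(), x2 - x2.mean()            # deviations from means
    denom = (x1d @ x1d) * (x2d @ x2d) - (x1d @ x2d) ** 2
    var_b1 = (x2d @ x2d) / denom                         # Var(b1) up to the factor s^2
    print(f"rho={rho:5.3f}  denominator={denom:14.1f}  Var(b1)/s^2={var_b1:.6f}")
```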
More Multicollinearity Examples • In L15_1.xls we have individual data on age, years of education, weekly earnings, school age, and experience • we can perform a regression to calculate returns given age and experience • we can also estimate bivariate models including only age, only experience, and only years of schooling • we expect a problem because experience is closely related to age (to check this, we can regress age on experience) • if the slope coefficient on experience is 1, there is perfect multicollinearity
More Multicollinearity Examples (2) • On L15_2.xls there is a made-up example of perfect multicollinearity • OLS is unable to calculate the slope coefficients • calculating the products and cross-products, we find that the denominator for the slope coefficients is zero, as predicted • If an applied problem has these properties: 1) OLS is still unbiased 2) Large variance, standard errors, and difficult hypothesis testing 3) Few significant coefficients but a high R2
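A minimal sketch (made-up data, in the spirit of L15_2.xls) of the perfect case: one regressor is an exact multiple of another, the design matrix loses a rank, and least squares can no longer pin down unique slope coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1                                   # x2 is an exact multiple of x1
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X))                 # 2, not 3: perfect multicollinearity

# lstsq still returns an answer but flags the rank deficiency;
# the coefficients on x1 and x2 are identified only up to their sum.
b, ssr, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(rank, b)
```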
More Multicollinearity Examples (3) • What can we do about L15_1.xls? • There’s simply not enough independent variation in the X’s • We can collect more data or rethink the model • We can check the partial correlations between the X variables • Always try specification checks • Alternatively, try re-scaling or redefining variables so that they are less highly correlated
Dummy variables • Dummy variables allow you to include qualitative variables (or variables that otherwise cannot be quantified) in your regression • examples include: gender, race, marital status, and religion • also becomes important when looking at “regime shifts” which may be new policy initiatives, economic change, or seasonality • We will look at some examples: • using female as a qualitative variable • using marital status as a qualitative variable • using the Phillips curve to demonstrate a regime shift
Qualitative example: female • We’ll construct a dummy variable: Di = 0 if not female, Di = 1 if female, for i = 1, …, n • We can do this with any qualitative variable • Note: assigning the values for the dummy variable is an arbitrary choice • On L15_3.xls there is a sample from the current CPS • to create the dummy variable “female” we recode the CPS sex variable (coded 1 for male, 2 for female) into one for female and zero for male • we can include the dummy variable in the regression equation like we would any other variable
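A minimal sketch of the recoding step, assuming the CPS convention described above (sex coded 1 = male, 2 = female); the column layout is illustrative, not the actual L15_3.xls format:

```python
import numpy as np

sex = np.array([1, 2, 2, 1, 2])              # CPS coding: 1 = male, 2 = female
female = (sex == 2).astype(int)              # dummy: 1 if female, 0 otherwise
print(female)                                # [0 1 1 0 1]

# The dummy enters the design matrix like any other regressor:
log_wage = np.array([6.1, 5.4, 5.6, 6.0, 5.3])
X = np.column_stack([np.ones(len(sex)), female])
b, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
print(b)                                     # [intercept, coefficient on female]
```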
Qualitative example: female (2) • We estimate the following equation: Yi = a + bDi + ei, with estimates a = 5.975 and b = −0.485 • Now we can ask: what are the expected earnings given that a person is male? E(Yi | Di = 0) = a = 5.975 • Similarly, what are the expected earnings given that a person is female? E(Yi | Di = 1) = a + b(1) = a + b = 5.975 − 0.485 = 5.490
Qualitative example: female (4) • We can use other variables to extend our analysis • for example we can include age to get the equation: Y = a + b1Di + b2Xi + e • where Xi can be any or all relevant variables • Di and the related coefficient b1 will indicate how much less, on average, females earn than males • for males the intercept will be a • for females the intercept will be a + b1
Qualitative example: female (5) • The estimated regression found on the spreadsheet takes the form Yi = a + b1Di + b2Agei • The expected weekly earnings for men are: E(Yi | Di = 0) = a + b2Agei • The expected weekly earnings for women are: E(Yi | Di = 1) = (a + b1) + b2Agei
Qualitative example: female (6) • An important note: • We cannot include dummy variables for both male and female in the same regression equation • suppose we have Y = a + b1D1i + b2D2i + e • where: D1i = 0 if male, D1i = 1 if female; D2i = 0 if female, D2i = 1 if male • OLS won’t be able to estimate the regression coefficients because D1i + D2i = 1 for every observation, which is perfectly collinear with the intercept a • So if a qualitative variable has m categories, you should include (m − 1) dummy variables in the regression equation
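A short sketch of this dummy-variable trap: with an intercept plus dummies for both male and female, D1i + D2i equals the constant column for every observation, so the design matrix is rank-deficient; dropping one dummy restores full rank:

```python
import numpy as np

female = np.array([0, 1, 1, 0, 1])
male = 1 - female                                  # D1 + D2 = 1 for everyone
X = np.column_stack([np.ones(5), female, male])
print(np.linalg.matrix_rank(X))                    # 2, not 3: perfect multicollinearity

# Keeping (m - 1) = 1 of the m = 2 categories restores full column rank:
X_ok = np.column_stack([np.ones(5), female])
print(np.linalg.matrix_rank(X_ok))                 # 2 = number of columns
```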
Example: marital status • The spreadsheet (L15_3.xls) also estimates the following regression equation using two distinct dummy variables: Y = a + b1D1i + b2D2i + e • where: D1i = 0 if male, D1i = 1 if female; D2i = 0 if other, D2i = 1 if married • Using the regression equation we can create four categories: married males, unmarried males, married females, and unmarried females
Example: marital status (2) • Expected earnings for unmarried males: E(Yi | D1i = 0, D2i = 0) = a • Expected earnings for unmarried females: E(Yi | D1i = 1, D2i = 0) = a + b1 • Expected earnings for married males: E(Yi | D1i = 0, D2i = 1) = a + b2 • Expected earnings for married females: E(Yi | D1i = 1, D2i = 1) = a + b1 + b2
Interactive terms • So far we’ve only used dummy variables to change the intercept • We can also use dummy variables to alter the partial slope coefficients • Let’s think about this model: ln(Wi) = a + b1Agei + b2Marriedi + e • we could argue that b1 (the partial effect of age) would be different for males and females • we want to think about two sub-sample groups: males and females • we can test the hypothesis that the partial slope coefficients will be different for these two groups
Interactive terms (2) • To test our hypothesis we’ll estimate the regression equation for the whole sample and then for the two sub-sample groups • We test to see if our estimated coefficients are the same between males and females • Our null hypothesis is: H0: aM = aF, b1M = b1F, b2M = b2F
Interactive terms (3) • We have an unrestricted form and a restricted form • unrestricted: used when we estimate for the sub-sample groups separately • restricted: used when we estimate for the whole sample • What type of statistic will we use to carry out this test? • F-statistic: F* = [(SSRR − SSRU)/q] / [SSRU/(n − 2k)] where q = k, the number of parameters in the model, and n = n1 + n2 is the complete sample size
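A hedged sketch of the computation, implementing the formula above; the SSR values and sample sizes below are illustrative stand-ins, not the actual spreadsheet output:

```python
def chow_f(ssr_restricted, ssr_m, ssr_f, n1, n2, k):
    """F* for H0: the coefficients are identical in the two sub-samples."""
    ssr_u = ssr_m + ssr_f                      # unrestricted: separate fits
    numerator = (ssr_restricted - ssr_u) / k   # q = k restrictions
    denominator = ssr_u / (n1 + n2 - 2 * k)    # unrestricted residual df
    return numerator / denominator

# Illustrative numbers only: n1 + n2 - 2k = 18 + 15 - 6 = 27, matching F(3, 27)
print(chow_f(ssr_restricted=25.0, ssr_m=10.0, ssr_f=11.0, n1=18, n2=15, k=3))
```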
Interactive terms (4) • The sum of squared residuals for the unrestricted form will be: SSRU = SSRM + SSRF • In L15_4.xls: • the data are sorted according to the dummy variable “female” • there is a second dummy variable for marital status • there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample
Interactive terms (5) • The output provides the sums of squared residuals and sample sizes needed to compute the test statistic F* • Since F0.05, 3, 27 = 2.96 > F*, we cannot reject the null hypothesis that the partial slope coefficients are the same for males and females
Interactive terms (6) • What if F* > F0.05, 3, 27? How should we read the results? • There’s a difference between the two sub-samples and therefore we should estimate the wage equations separately • Or we could interact the dummy variable with the other variables • To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by each of them to get: ln(Wi) = a + b1Agei + b2Marriedi + b3Di + b4(Di·Agei) + b5(Di·Marriedi) + ei
Interactive terms (7) • Using L15_4.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns (sketched below) • one way to see whether the two sub-samples differ is to look at the t-ratios on the interactive terms • in this example, neither of the t-ratios is statistically significant, so we cannot reject the null hypothesis • We now know how to use dummy variables to indicate the importance of sub-sample groups within the data • dummy variables are also useful for testing for structural breaks or regime shifts
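A sketch of the construction on synthetic data (the coefficients and columns are made up, not taken from L15_4.xls): build Di·Agei and Di·Marriedi, fit by least squares, and inspect the t-ratios on the interaction terms:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
age = rng.uniform(20, 60, size=n)
married = rng.integers(0, 2, size=n)
female = rng.integers(0, 2, size=n)
log_wage = 5.0 + 0.01 * age + 0.1 * married - 0.3 * female + rng.normal(0, 0.3, n)

X = np.column_stack([
    np.ones(n), age, married, female,
    female * age,                         # Di * Agei
    female * married,                     # Di * Marriedi
])
b, *_ = np.linalg.lstsq(X, log_wage, rcond=None)

# t-ratios: coefficient over standard error; the last two entries
# belong to the interaction terms.
resid = log_wage - X @ b
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
print(b / se)
```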
Interactive terms (8) • If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero: E(Wi | Di = 0) = a + b1Agei + b2Marriedi • We can do the same for the second sub-sample (females): E(Wi | Di = 1) = (a + b3) + (b1 + b4)Agei + (b2 + b5)Marriedi • We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample
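Continuing the sketch above, the two sub-sample equations drop straight out of the single fitted coefficient vector; the values here are purely illustrative:

```python
# Illustrative coefficient vector [a, b1, b2, b3, b4, b5] from the
# interacted model (made-up values, not the spreadsheet estimates)
a, b1, b2, b3, b4, b5 = 5.00, 0.012, 0.090, -0.310, 0.002, 0.010

print(f"males   (D=0): {a:.3f} + {b1:.3f}*Age + {b2:.3f}*Married")
print(f"females (D=1): {a + b3:.3f} + {b1 + b4:.3f}*Age + {b2 + b5:.3f}*Married")
```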