Multiple Regression Applications III Lecture 18
Dummy variables • Include qualitative indicators in the regression: e.g. gender, race, regime shifts. • So far, we have only seen the change in the intercept of the regression line. • Suppose now we wish to investigate whether the slope changes as well as the intercept. • This can be written as a general equation: $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$ • Suppose first we wish to test for the difference between males and females.
Interactive terms • For females and males separately, the model would be: $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$ • in so doing we argue that the coefficients $a$, $b_1$, $b_2$ would be different for males and females • we want to think about two sub-sample groups: males and females • we can test the hypothesis that the intercept and partial slope coefficients will be different for these 2 groups
Interactive terms (2) • To test our hypothesis we’ll estimate the regression equation above ($W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$) for the whole sample and then for the two sub-sample groups • We test whether our estimated coefficients are the same for males and females • Our null hypothesis is: $H_0: a_M = a_F,\ b_{1M} = b_{1F},\ b_{2M} = b_{2F}$
Interactive terms (3) • We have an unrestricted form and a restricted form • unrestricted: used when we estimate for the sub-sample groups separately • restricted: used when we estimate for the whole sample • What type of statistic will we use to carry out this test? • F-statistic: $q = k$, the number of parameters in the model; $n = n_1 + n_2$, where $n$ is the complete sample size
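With these definitions, the test statistic can be sketched in its standard Chow form. The denominator degrees of freedom $n - 2k$ are an assumption here (two sub-sample regressions with $k$ parameters each); they agree with the $F_{0.05,3,27}$ critical value quoted on a later slide if $k = 3$ and $n = 33$.

```latex
F^* = \frac{(SSR_R - SSR_U)/k}{SSR_U/(n - 2k)},
\qquad SSR_U = SSR_M + SSR_F
```

Here $SSR_R$ is the sum of squared residuals from the whole-sample (restricted) regression and $SSR_U$ is the sum over the two sub-sample (unrestricted) regressions.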
Interactive terms (4) • The sum of squared residuals for the unrestricted form will be: $SSR_U = SSR_M + SSR_F$ • L17_2.xls • the data is sorted according to the dummy variable “female” • there is a second dummy variable for marital status • there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample
Interactive terms (5) • The output allows us to gather the necessary sums of squared residuals and sample sizes to construct the test statistic $F^*$ • Since $F_{0.05,\,3,\,27} = 2.96 > F^*$, we cannot reject the null hypothesis that the intercept and partial slope coefficients are the same for males and females
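The procedure on the last few slides can be sketched in Python. This is a minimal illustration under stated assumptions: the data are synthetic (not L17_2.xls), the helper names `ssr`, `chow_f` and `make_group` are hypothetical, and the statistic uses the standard Chow form with $(k, n - 2k)$ degrees of freedom, which gives the slide's $(3, 27)$ when $k = 3$ and $n = 33$.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X (X includes a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(X_m, y_m, X_f, y_f):
    """Chow F-statistic for equal coefficients in two groups; returns (F, q, df)."""
    k = X_m.shape[1]
    n = len(y_m) + len(y_f)
    # Restricted form: one regression on the whole (pooled) sample.
    ssr_r = ssr(np.vstack([X_m, X_f]), np.concatenate([y_m, y_f]))
    # Unrestricted form: separate regressions, SSR_U = SSR_M + SSR_F.
    ssr_u = ssr(X_m, y_m) + ssr(X_f, y_f)
    f_stat = ((ssr_r - ssr_u) / k) / (ssr_u / (n - 2 * k))
    return f_stat, k, n - 2 * k

# Synthetic wage data (an assumption for illustration only).
rng = np.random.default_rng(0)

def make_group(n, a, b1, b2):
    age = rng.uniform(20, 60, n)
    married = rng.integers(0, 2, n).astype(float)
    X = np.column_stack([np.ones(n), age, married])
    y = a + b1 * age + b2 * married + rng.normal(0, 1, n)
    return X, y

X_m, y_m = make_group(17, 5.0, 0.20, 1.0)   # males
X_f, y_f = make_group(16, 5.0, 0.20, 1.0)   # females, same true coefficients
F, q, df = chow_f(X_m, y_m, X_f, y_f)
print(f"F* = {F:.3f} with ({q}, {df}) degrees of freedom; compare with 2.96")
```

Because the pooled regression can never fit better than the two separate regressions, $SSR_R \ge SSR_U$ and the statistic is non-negative; a large value signals that the sub-samples differ.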
Irene O. Wong: Interactive terms (6) • What if $F^* > F_{0.05,\,3,\,27}$? How do we read the results? • There is a difference between the two sub-samples and therefore we should estimate the wage equations separately • Or we could interact the dummy variable with the other variables • To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by each of them to get: $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$
Interactive terms (7) • Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns • one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms • in this example, neither of the t-ratios is statistically significant, so we can’t reject the null hypothesis
Interactive terms (8) • If we want to estimate the equation for the first sub-sample (males), we take the expectation of the wage equation where the dummy variable for female takes the value zero: $E(W_i \mid D_i = 0) = a + b_1 \text{Age}_i + b_2 \text{Married}_i$ • We can do the same for the second sub-sample (females): $E(W_i \mid D_i = 1) = (a + b_3) + (b_1 + b_4)\text{Age}_i + (b_2 + b_5)\text{Married}_i$ • We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample
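The claim that one interacted regression lets the intercept and slopes vary by sub-sample can be checked numerically. The sketch below uses synthetic data (an assumption, not L17_2.xls) and a hypothetical helper `ols`; it compares the coefficients from the single interacted equation with those from separate male and female regressions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n).astype(float)
female = np.repeat([0.0, 1.0], n // 2)   # dummy D, sorted as in the spreadsheet
wage = 4 + 0.3 * age + 1.5 * married - 1.0 * female + rng.normal(0, 1, n)

def ols(X, y):
    """OLS coefficient vector for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# One equation with the dummy and its interactions with Age and Married.
X_full = np.column_stack(
    [const, age, married, female, female * age, female * married]
)
a, b1, b2, b3, b4, b5 = ols(X_full, wage)

# Separate regressions for each sub-sample.
X_sub = np.column_stack([const, age, married])
is_male = female == 0
coef_m = ols(X_sub[is_male], wage[is_male])
coef_f = ols(X_sub[~is_male], wage[~is_male])

# Males:   a,      b1,      b2
# Females: a + b3, b1 + b4, b2 + b5
males_match = np.allclose(coef_m, [a, b1, b2])
females_match = np.allclose(coef_f, [a + b3, b1 + b4, b2 + b5])
print("males match:", males_match, "| females match:", females_match)
```

The two sets of estimates should agree up to floating-point error, because the fully interacted equation has a free intercept and free slopes for each group.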
Phillips Curve example • Phillips curve as an example of a regime shift. • Data points from 1950 - 1970: There is a downward sloping, reciprocal relationship between wage inflation and unemployment • [Figure: wage inflation (W) plotted against unemployment (UN)]
Phillips Curve example (2) • But if we look at data points from 1971 - 1996: • From the data we can detect an upward sloping relationship • ALWAYS graph the data between the 2 main variables of interest • [Figure: wage inflation (W) plotted against unemployment (UN)]
Phillips Curve example (3) • There seems to be a regime shift between the two periods • note: this is an arbitrary choice of regime shift - it was not dictated by a specific change • We will use the Chow Test (F-test) to test for this regime shift • the test will use a restricted form and an unrestricted form • D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996
Phillips Curve example (4) • L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test • The null hypothesis will be: $H_0: b_1 = b_3 = 0$ • we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient • The F-statistic uses $q = 2$ restrictions (* indicates the restricted form)
Phillips Curve example (5) • The expectation of wage inflation for the first time period: • The expectation of wage inflation for the second time period: • You can use the spreadsheet data to carry out these calculations
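The equations for this example can be sketched as follows. This is a reconstruction under assumptions, not the slides' own notation: the reciprocal regressor $1/UN_t$ is inferred from the "reciprocal relationship" described earlier, and the placement of $b_1$ on the dummy and $b_3$ on the interaction is inferred from the null hypothesis $H_0: b_1 = b_3 = 0$. The F-statistic is written in the standard form for $q$ linear restrictions, with $k$ the number of parameters in the unrestricted equation.

```latex
\text{Restricted:}\quad   W_t = a + b_2 \frac{1}{UN_t} + e_t \\
\text{Unrestricted:}\quad W_t = a + b_1 D_t + b_2 \frac{1}{UN_t}
                               + b_3 \left(D_t \cdot \frac{1}{UN_t}\right) + e_t \\[6pt]
F = \frac{(SSR^{*} - SSR)/q}{SSR/(n - k)}, \qquad q = 2 \\[6pt]
E(W_t \mid D_t = 0) = a + b_2 \frac{1}{UN_t} \\
E(W_t \mid D_t = 1) = (a + b_1) + (b_2 + b_3) \frac{1}{UN_t}
```

Under this form, the first period (1950-1970) has intercept $a$ and slope $b_2$, while the second period (1971-1996) has intercept $a + b_1$ and slope $b_2 + b_3$.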
Relaxing Assumptions Lecture 18
Today’s Plan • A review of what we have learned in regression so far and a look forward to what will happen when we relax the assumptions around the regression line • Introduction to new concepts: • Heteroskedasticity • Serial correlation (also known as autocorrelation) • Non-independence of independent variables
CLRM Revision • Calculating the linear regression model (using OLS) • Use of the sum of squared residuals: calculate the variance for the regression line and the mean squared deviation • Hypothesis tests: t-tests, F-tests, $\chi^2$ test. • Coefficient of determination ($R^2$) and the adjustment. • Modeling: use of log-linear, logs, reciprocal. • Relationship between F and $R^2$ • Imposing linear restrictions: e.g. $H_0: b_2 = b_3 = 0$ ($q = 2$); $H_0: a + b = 1$. • Dummy variables and interactions; Chow test.
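The relationship between F and $R^2$ listed above can be written out as a reminder. This is the standard identity for the overall-significance F-test, assuming $k$ counts all estimated parameters including the intercept and $n$ is the sample size (the slide does not spell out its notation).

```latex
F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}
```

A higher $R^2$ for a given $n$ and $k$ therefore means a higher F, and the adjusted $\bar{R}^2$ penalizes additional parameters.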
Relaxing assumptions • What are the assumptions we have used throughout? • Two assumptions about the population for the bivariate case: 1. $E(Y \mid X) = a + bX$ (the conditional expectation function is linear); 2. $V(Y \mid X) = \sigma^2$ (conditional variances are constant) • Assumptions concerning the sampling procedure ($i = 1, \dots, n$): 1. Values of $X_i$ (not all equal) are prespecified; 2. $Y_i$ is drawn from the subpopulation having $X = X_i$; 3. The $Y_i$’s are independent. • Consequences are: 1. $E(Y_i) = a + bX_i$; 2. $V(Y_i) = \sigma^2$; 3. $C(Y_h, Y_i) = 0$ • How can we test to see if these assumptions don’t hold? • What can we do if the assumptions don’t hold?
Homoskedasticity • We would like our estimates to be BLUE • We need to look out for three potential violations of the CLRM assumptions: heteroskedasticity, autocorrelation, and non-independence of X (or simultaneity bias). • Heteroskedasticity: usually found in cross-section data (and longitudinal) • In earlier lectures, we saw that the variance of $Y_i$ is $V(Y_i) = \sigma^2$ • This is an example of homoskedasticity, where the variance is constant
Homoskedasticity (2) • Homoskedasticity can be illustrated like this: • [Figure: Y plotted against X at X1, X2, X3, with constant variance around the regression line]
Heteroskedasticity • But we don’t always have constant variance $\sigma^2$ • We may have a variance that varies with each observation, i.e. $\sigma_i^2$ • When there is heteroskedasticity, the variance around the regression line varies with the values of X
Heteroskedasticity (2) • The non-constant variance around the regression line can be drawn like this: • [Figure: Y plotted against X at X1, X2, X3, with the spread around the regression line changing with X]
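The contrast between the two pictures can be reproduced with a small simulation. The sketch below is an illustration on synthetic data with a hypothetical helper `resid_var_by_half`; the choice of an error standard deviation proportional to X is only one assumed form of heteroskedasticity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(1, 10, n)

e_homo = rng.normal(0, 2.0, n)            # constant variance: V(e|X) = 4
e_hetero = rng.normal(0, 1.0, n) * 0.5 * x  # sd = 0.5 * X, so variance grows with X

def resid_var_by_half(x, e):
    """Fit Y = a + bX by OLS; return residual variance for low-X and high-X halves."""
    y = 1.0 + 0.5 * x + e
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    low = x < np.median(x)
    return resid[low].var(), resid[~low].var()

homo_low, homo_high = resid_var_by_half(x, e_homo)
het_low, het_high = resid_var_by_half(x, e_hetero)
print(f"homoskedastic:   low-X var {homo_low:.2f}, high-X var {homo_high:.2f}")
print(f"heteroskedastic: low-X var {het_low:.2f}, high-X var {het_high:.2f}")
```

Under homoskedasticity the two halves show similar residual variance; under this form of heteroskedasticity the high-X half is clearly noisier.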
Serial (auto) correlation • Serial correlation can be found in time series data (and longitudinal data) • Under serial correlation, we have covariance terms $C(Y_h, Y_i) \neq 0$ • where $Y_i$ and $Y_h$ are correlated, or each $Y_i$ is not independently drawn • This results in nonzero covariance terms
Serial (auto) correlation (2) • Example: We can think of this using time series data such that unemployment at time $t$ is related to unemployment in the previous time period $t-1$ • If we have a model with unemployment as the dependent variable $Y_t$, then • $Y_t$ and $Y_{t-1}$ are related • $e_t$ and $e_{t-1}$ are also related
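A short simulation shows what related errors look like in practice. This sketch assumes a first-order autoregressive error, $e_t = \rho e_{t-1} + u_t$ with $\rho = 0.8$, which is one common way to model serial correlation and is not specified on the slide; the data and the helper `lag1_corr` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
rho = 0.8

# AR(1) errors: each e_t depends on the previous period's error.
u = rng.normal(0, 1, T)
e_ar = np.zeros(T)
for t in range(1, T):
    e_ar[t] = rho * e_ar[t - 1] + u[t]

e_white = rng.normal(0, 1, T)   # independent errors, for comparison

def lag1_corr(e):
    """Fit Y = a + bX by OLS and return the correlation of residuals with their lag."""
    x = np.linspace(0, 10, len(e))
    y = 2.0 + 0.5 * x + e
    X = np.column_stack([np.ones(len(e)), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(np.corrcoef(resid[1:], resid[:-1])[0, 1])

r_ar = lag1_corr(e_ar)
r_white = lag1_corr(e_white)
print(f"lag-1 residual correlation, AR(1) errors:       {r_ar:.2f}")
print(f"lag-1 residual correlation, independent errors: {r_white:.2f}")
```

With independent errors the lag-1 correlation of the residuals is close to zero; with AR(1) errors it is close to $\rho$, which is the nonzero covariance the slides describe.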
Non-independence • The non-independence of independent variables is the third violation of the ordinary least squares assumptions • Remember from the OLS derivation that we minimized the sum of the squared residuals • we needed independence between the X variable and the error term • if not, the values of X are not pre-specified • without independence, the estimates are biased
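The bias from non-independence can also be seen in a simulation. The sketch below uses synthetic data and a hypothetical helper `ols_slope`; the way X is made to depend on the error term (adding 0.8 times the error) is an assumption chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
b_true = 0.5

z = rng.normal(0, 1, n)
e = rng.normal(0, 1, n)

x_indep = z              # X independent of the error term
x_corr = z + 0.8 * e     # X correlated with the error term

def ols_slope(x, e):
    """Generate Y = 1 + b_true * X + e and return the OLS slope estimate."""
    y = 1.0 + b_true * x + e
    X = np.column_stack([np.ones(len(x)), x])
    return float(np.linalg.lstsq(X, y, rcond=None)[0][1])

slope_indep = ols_slope(x_indep, e)
slope_corr = ols_slope(x_corr, e)
print(f"true slope: {b_true}")
print(f"estimate with X independent of e: {slope_indep:.3f}")
print(f"estimate with X correlated with e: {slope_corr:.3f}")
```

When X is independent of the error, the estimate stays near the true slope; when X is correlated with the error, the estimate is pulled away from it and does not improve with a larger sample.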
Summary • Heteroskedasticity and serial correlation • make the estimates inefficient • and make the estimated standard errors incorrect • Non-independence of independent variables • makes estimates biased • instrumental variables and simultaneous equations are used to deal with this third type of violation • Starting next lecture we’ll take a more in-depth look at the three violations of the CLRM assumptions