Multiple Regression Applications III

Multiple Regression Applications III Lecture 18

Dummy variables • Dummy variables bring qualitative indicators into the regression: e.g. gender, race, regime shifts. • So far, we have only seen the dummy change the intercept of the regression line. • Suppose now we wish to investigate whether the slope changes as well as the intercept. • This can be written as a general equation: $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$ • Suppose first we wish to test for the difference between males and females.

Interactive terms • For females and males separately, the model would be: $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$ • in so doing we argue that the coefficients $a$, $b_1$ and $b_2$ would be different for males and females • we want to think about two sub-sample groups: males and females • we can test the hypothesis that the intercept and partial slope coefficients differ between these 2 groups

Interactive terms (2) • To test our hypothesis we'll estimate the regression equation above ($W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$) for the whole sample and then for the two sub-sample groups • We test to see if our estimated coefficients are the same between males and females • Our null hypothesis is: $H_0: a_M = a_F,\ b_{1M} = b_{1F},\ b_{2M} = b_{2F}$

Interactive terms (3) • We have an unrestricted form and a restricted form • unrestricted: used when we estimate for the sub-sample groups separately • restricted: used when we estimate for the whole sample • What type of statistic will we use to carry out this test? • F-statistic: $F^* = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n - 2k)}$, where $q = k$ is the number of parameters in the model and $n = n_1 + n_2$ is the complete sample size

Interactive terms (4) • The sum of squared residuals for the unrestricted form will be: $SSR_U = SSR_M + SSR_F$ • In L17_2.xls: • the data is sorted according to the dummy variable "female" • there is a second dummy variable for marital status • there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample

Interactive terms (5) • The output allows us to gather the necessary sums of squared residuals and sample sizes to construct the test statistic $F^*$ • Since $F_{0.05,3,27} = 2.96 > F^*$ we cannot reject the null hypothesis that the intercept and partial slope coefficients are the same for males and females
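The test built up over the last few slides can be sketched in Python. This is a minimal illustration, not the spreadsheet's own calculation: the data are synthetic (the contents of L17_2.xls are not reproduced here), and the helper names `ssr` and `chow_f` are hypothetical. The sample size of 33 is chosen only so that the degrees of freedom come out as (3, 27), matching the critical value quoted above.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(X, y, group):
    """F-statistic for H0: all k coefficients are equal across the two groups."""
    n, k = X.shape
    ssr_r = ssr(X, y)  # restricted: one regression on the whole sample
    # unrestricted: separate regressions, SSR_U = SSR_M + SSR_F
    ssr_u = ssr(X[group == 0], y[group == 0]) + ssr(X[group == 1], y[group == 1])
    return ((ssr_r - ssr_u) / k) / (ssr_u / (n - 2 * k))

# Synthetic data, for illustration only (assumed values, not the course data)
rng = np.random.default_rng(0)
n = 33
female = (np.arange(n) >= 17).astype(int)  # sorted by the dummy, as in the spreadsheet
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n)
wage = 5 + 0.2 * age + 1.5 * married + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), age, married])
f_star = chow_f(X, wage, female)
print(f"F* = {f_star:.3f} on ({X.shape[1]}, {n - 2 * X.shape[1]}) degrees of freedom")
```

Compare the printed F* with the 5% critical value for (3, 27) degrees of freedom to decide whether to reject the null.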

Interactive terms (6) • What if $F^* > F_{0.05,3,27}$? How to read the results? • There's a difference between the two sub-samples and therefore we should estimate the wage equations separately • Or we could interact the dummy variable with the other variables • To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by each of them to get: $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$

Interactive terms (7) • Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns • one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms • in this example, neither of the t-ratios is statistically significant, so we can't reject the null hypothesis that the slope coefficients are the same for the two sub-samples

Interactive terms (8) • If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero: $E(W_i \mid D_i = 0) = a + b_1 \text{Age}_i + b_2 \text{Married}_i$ • We can do the same for the second sub-sample (females): $E(W_i \mid D_i = 1) = (a + b_3) + (b_1 + b_4) \text{Age}_i + (b_2 + b_5) \text{Married}_i$ • We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample
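The claim that one interacted regression reproduces both sub-sample equations can be checked numerically. The sketch below uses synthetic data with assumed coefficient values (nothing here comes from L17_2.xls) and a hypothetical helper `ols`. It fits the fully interacted model once, then compares the implied male and female coefficients with those from separate sub-sample regressions.

```python
import numpy as np

def ols(X, y):
    """OLS coefficient vector for y regressed on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, n)            # D_i: 1 = female, 0 = male
age = rng.uniform(20, 60, n)
married = rng.integers(0, 2, n)
wage = (5 + 0.2 * age + 1.5 * married
        - 1.0 * d - 0.05 * d * age + 0.5 * d * married
        + rng.normal(0, 2, n))

# One regression with the dummy and both interactive terms
X_full = np.column_stack([np.ones(n), age, married, d, d * age, d * married])
a, b1, b2, b3, b4, b5 = ols(X_full, wage)

# Separate regressions on each sub-sample
X_sub = np.column_stack([np.ones(n), age, married])
male_coef = ols(X_sub[d == 0], wage[d == 0])
female_coef = ols(X_sub[d == 1], wage[d == 1])

print("male   (D=0):", np.round([a, b1, b2], 4), np.round(male_coef, 4))
print("female (D=1):", np.round([a + b3, b1 + b4, b2 + b5], 4), np.round(female_coef, 4))
```

The two columns printed for each group agree up to floating-point error, because a model that interacts the dummy with every regressor is algebraically equivalent to fitting each group on its own.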

Phillips Curve example • Phillips curve as an example of a regime shift. • Data points from 1950 - 1970: There is a downward sloping, reciprocal relationship between wage inflation and unemployment [Figure: wage inflation (W) plotted against unemployment (UN)]

Phillips Curve example (2) • But if we look at data points from 1971 - 1996: • From the data we can detect an upward sloping relationship • ALWAYS graph the data between the 2 main variables of interest [Figure: wage inflation (W) plotted against unemployment (UN)]

Phillips Curve example (3) • There seems to be a regime shift between the two periods • note: this is an arbitrary choice of regime shift - it was not dictated by a specific change • We will use the Chow Test (F-test) to test for this regime shift • the test will use a restricted form, which leaves out the regime-shift dummy • it will also use an unrestricted form, which lets the dummy alter both the intercept and the slope • D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996

Phillips Curve example (4) • L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test: • The null hypothesis will be: $H_0: b_1 = b_3 = 0$ • we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient • The F-statistic is (* indicates restricted): $F = \frac{(SSR^* - SSR)/q}{SSR/(n - k)}$, where $q = 2$ and $k$ is the number of parameters in the unrestricted form

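Phillips Curve example (5) • The expectation of wage inflation for the first time period is taken with D = 0, so the dummy terms drop out • The expectation of wage inflation for the second time period is taken with D = 1, so the dummy coefficients are added to the intercept and slope • You can use the spreadsheet data to carry out these calculations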
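The equations on these slides did not survive in this transcript. The block below is one specification consistent with what the slides do state: a reciprocal relationship between wage inflation and unemployment, and a null hypothesis of $b_1 = b_3 = 0$. The exact functional form (in particular the use of $1/UN_t$) is an assumption, not a reconstruction of the original slides.

```latex
% Restricted form (imposes b_1 = b_3 = 0):
W_t = a + b_2 \frac{1}{UN_t} + e_t

% Unrestricted form (D_t = 0 for 1950-1970, D_t = 1 for 1971-1996):
W_t = a + b_1 D_t + b_2 \frac{1}{UN_t} + b_3 \left( D_t \cdot \frac{1}{UN_t} \right) + e_t

% Expected wage inflation in each period:
E(W_t \mid D_t = 0) = a + b_2 \frac{1}{UN_t}
E(W_t \mid D_t = 1) = (a + b_1) + (b_2 + b_3) \frac{1}{UN_t}
```

Under this form the unrestricted model has four parameters and the test imposes two restrictions, which matches q = 2.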

Relaxing Assumptions Lecture 18

Today's Plan • A review of what we have learned in regression so far and a look forward to what will happen when we relax the assumptions around the regression line • Introduction to new concepts: • Heteroskedasticity • Serial correlation (also known as autocorrelation) • Non-independence of independent variables

CLRM Revision • Calculating the linear regression model (using OLS) • Use of the sum of squared residuals: calculate the variance for the regression line and the mean squared deviation • Hypothesis tests: t-tests, F-tests, $\chi^2$ test. • Coefficient of determination ($R^2$) and the adjustment. • Modeling: use of log-linear, logs, reciprocal. • Relationship between F and $R^2$ • Imposing linear restrictions: e.g. $H_0: b_2 = b_3 = 0$ (q = 2); $H_0: a + b = 1$. • Dummy variables and interactions; Chow test.

Relaxing assumptions • What are the assumptions we have used throughout? • Two assumptions about the population for the bi-variate case: 1. $E(Y \mid X) = a + bX$ (the conditional expectation function is linear); 2. $V(Y \mid X) = \sigma^2$ (conditional variances are constant) • Assumptions concerning the sampling procedure (i = 1..n): 1. Values of $X_i$ (not all equal) are prespecified; 2. $Y_i$ is drawn from the subpopulation having $X = X_i$; 3. The $Y_i$'s are independent. • Consequences are: 1. $E(Y_i) = a + bX_i$; 2. $V(Y_i) = \sigma^2$; 3. $C(Y_h, Y_i) = 0$ for $h \neq i$ • How can we test to see if these assumptions don't hold? • What can we do if the assumptions don't hold?

Homoskedasticity • We would like our estimates to be BLUE • We need to look out for three potential violations of the CLRM assumptions: heteroskedasticity, autocorrelation, and non-independence of X (or simultaneity bias). • Heteroskedasticity: usually found in cross-section data (and longitudinal) • In earlier lectures, we saw that the variance of the OLS slope estimate is $\sigma^2 / \sum_i (X_i - \bar{X})^2$ • This is an example of homoskedasticity, where the variance $\sigma^2$ is constant

Homoskedasticity (2) • Homoskedasticity can be illustrated like this: [Figure: Y against X, showing constant variance around the regression line at X1, X2 and X3]

Heteroskedasticity • But, we don't always have constant variance $\sigma^2$ • We may have a variance that varies with each observation: $V(Y_i) = \sigma_i^2$ • When there is heteroskedasticity, the variance around the regression line varies with the values of X

Heteroskedasticity (2) • The non-constant variance around the regression line can be drawn like this: [Figure: Y against X, showing the spread around the regression line changing across X1, X2 and X3]
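In place of the two pictures, the contrast can be simulated. The sketch below is illustrative only: the data-generating values are assumptions, and `residual_variance_by_third` is a hypothetical helper that fits OLS and reports the residual variance in the low, middle and high thirds of X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000
x = rng.uniform(1, 10, n)

# Homoskedastic errors: the same standard deviation at every X
y_homo = 1 + 0.5 * x + rng.normal(0, 2.0, n)
# Heteroskedastic errors: standard deviation grows in proportion to X
y_hetero = 1 + 0.5 * x + rng.normal(0, 0.5 * x)

def residual_variance_by_third(x, y):
    """Fit OLS, then return the residual variance in each third of the X range."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    order = np.argsort(x)
    return [float(np.var(part)) for part in np.array_split(resid[order], 3)]

var_homo = residual_variance_by_third(x, y_homo)
var_hetero = residual_variance_by_third(x, y_hetero)
print("homoskedastic  :", np.round(var_homo, 2))
print("heteroskedastic:", np.round(var_hetero, 2))
```

In the homoskedastic case the three variances are roughly equal; in the heteroskedastic case they rise with X.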

Serial (auto) correlation • Serial correlation can be found in time series data (and longitudinal data) • Under serial correlation, we have covariance terms $C(Y_h, Y_i) \neq 0$ for $h \neq i$ • that is, $Y_i$ and $Y_h$ are correlated, or each $Y_i$ is not independently drawn • This results in nonzero covariance terms

Serial (auto) correlation (2) • Example: We can think of this using time series data such that unemployment at time t is related to unemployment in the previous time period t-1 • If we have a model with unemployment as the dependent variable $Y_t$ then • $Y_t$ and $Y_{t-1}$ are related • $e_t$ and $e_{t-1}$ are also related
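A short simulation shows what related errors look like in the residuals. This is a sketch under an assumed first-order autoregressive error process, $e_t = \rho e_{t-1} + u_t$ with $\rho = 0.8$; the slides do not specify the form of the dependence, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 2000, 0.8

# Build errors where e_t depends on e_{t-1} (assumed AR(1) process)
shocks = rng.normal(0, 1, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + shocks[t]

x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + e

# Fit OLS and measure the correlation between successive residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
lag1_corr = float(np.corrcoef(resid[1:], resid[:-1])[0, 1])
print(f"lag-1 residual correlation: {lag1_corr:.3f}")
```

With independent errors this correlation would be close to zero; here it is close to the assumed value of rho.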

Non-independence • The non-independence of independent variables is the third violation of the ordinary least squares assumptions • Remember from the OLS derivation that we minimized the sum of the squared residuals • we needed independence between the X variable and the error term • if not, the values of X are not pre-specified • without independence, the estimates are biased
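The bias can be seen in a simulation. The sketch below is illustrative only: the true slope of 0.5 and the strength of the link between X and the error term are assumptions. It compares an X drawn independently of the error with an X that shares a component with it.

```python
import numpy as np

rng = np.random.default_rng(4)
n, true_b = 5000, 0.5
e = rng.normal(0, 1, n)

x_indep = rng.normal(0, 1, n)             # independent of the error term
x_dep = rng.normal(0, 1, n) + 0.8 * e     # correlated with the error term

def slope(x, y):
    """OLS slope from a regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

b_indep = slope(x_indep, 1 + true_b * x_indep + e)
b_dep = slope(x_dep, 1 + true_b * x_dep + e)
print(f"true slope {true_b}, independent X: {b_indep:.3f}, dependent X: {b_dep:.3f}")
```

The estimate with independent X sits near the true slope, while the estimate with dependent X is pushed well away from it and does not improve as the sample grows.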

Summary • Heteroskedasticity and serial correlation • make the estimates inefficient • make the estimated standard errors incorrect • Non-independence of independent variables • makes the estimates biased • instrumental variables and simultaneous equations are used to deal with this third type of violation • Starting next lecture we'll take a more in-depth look at the three violations of the CLRM assumptions
