
Marietta College

Marietta College . Spring 2011 Econ 420: Applied Regression Analysis Dr. Jacqueline Khorassani. Week 13. Tuesday, April 5. Exam 3 : Monday, April 25, 12- 2:30PM. Leadership Q&A. Cosponsored by McDonough Center for Leadership & Business and the Economic Roundtable of the Ohio Valley.


Presentation Transcript


  1. Marietta College Spring 2011 Econ 420: Applied Regression Analysis Dr. Jacqueline Khorassani Week 13

  2. Tuesday, April 5 Exam 3: Monday, April 25, 12-2:30 PM

  3. Leadership Q&A with David Leonhardt, Economics Journalist, Washington Bureau, The New York Times • Cosponsored by McDonough Center for Leadership & Business and the Economic Roundtable of the Ohio Valley • TONIGHT, 7:30 pm, McDonough Gallery • Is anyone interested in going to breakfast with him tomorrow at 8 am at the Lafayette Hotel?

  4. This is the last bonus opportunity of this semester • 2 points for attending • 2-5 points per question • 2-10 points per summary • Summaries are due before 5 pm on Friday, April 8 via an email attachment to me • Total bonus points will be divided by 3 and added to your exams.

  5. Return and discuss Asst 18 # 12 Page 240 a) • The estimated coefficients are all in the expected direction • Adjusted R² (R-bar squared) seems fairly low. • Always check significance at the 10 percent level or better • The coefficients of A, A², and S are significant. • You can only interpret the magnitude of a coefficient if it passes the t-test of significance • Significance has to do with the t-test • Importance has to do with the absolute value of the coefficient.

  6. b) • It implies that wages rise at a declining rate with respect to age and eventually fall. • Including both A and A² does not imply perfect collinearity, because the relationship between them is nonlinear. c) • Semilog (ln W) is a possibility • The slope coefficient represents the percentage change in wage caused by a one-unit increase in the independent variable (holding constant all the other independent variables). • Since pay raises are often discussed in percentage terms, this makes sense. • Phil & Yuan ask: but what about the meaning of the coefficient of A² in a semilog function? (great point) • Linda says it depends on the purpose of the study (great point)

  7. d) • It’s a good habit to ignore the estimated intercept (except to make sure that one exists), even if it looks too large or too small. • The intercept picks up the mean of the error term, and that is affected by omitted variables. e) • The poor fit and the insignificant estimated coefficient of union membership are both reasons for being extremely cautious about using this regression to draw any conclusions about union membership.

  8. Collect Asst 19 # 5 Page 234 • Including Part e (data is available online under STOCK in Chapter 7)

  9. Imperfect Multicollinearity Problem • What is it? • Let’s say you estimate a regression equation. What makes you suspicious about a possible multicollinearity problem? • What are the two formal tests we talked about before?

  10. Sometimes 3 or more independent variables are correlated • Example • Income = f (wage rate, tax rate, hours of work, ….) • Wage rate, tax rate and hours of work may all be highly correlated with each other • Problem: a simple correlation coefficient between two variables may not capture this.

  11. Test of Multicollinearity among 3 or more independent variables • Regress each independent variable (say X1) on the other independent variables (X2, X3, X4) • Then calculate VIF • VIF = 1 / (1 − R²), where R² comes from this auxiliary regression • If VIF > 5 then X1 is highly correlated with the other independent variables • Do the same for all of the other independent variables
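The VIF procedure on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: the data are simulated (x3 is built to be close to x1 + x2), and the helper names `solve_linear_system`, `r_squared`, and `vif` are hypothetical, not part of EViews or the course materials.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(y, regressors):
    """R-squared from an OLS regression of y on a constant plus the regressors."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """For each variable: regress it on all the others, then VIF = 1 / (1 - R^2)."""
    result = {}
    for name, col in columns.items():
        others = [c for other, c in columns.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(col, others))
    return result

# Hypothetical data: x3 is nearly a linear mix of x1 and x2; x4 is unrelated.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [random.gauss(0, 1) for _ in range(200)]
x3 = [a + b + random.gauss(0, 0.3) for a, b in zip(x1, x2)]
x4 = [random.gauss(0, 1) for _ in range(200)]

vifs = vif({"x1": x1, "x2": x2, "x3": x3, "x4": x4})
for name, value in vifs.items():
    print(name, round(value, 2), "high" if value > 5 else "ok")
```

In this setup x3 is related to x1 and x2 jointly rather than to either one alone, which is the situation where a pairwise correlation coefficient can understate the problem and the VIF is more informative.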

  12. Asst 20 • Data set: DRUGS (Chapter 5, P 157) • Estimate Equation 5.10 1. Before you run any formal tests, do you suspect an imperfect multicollinearity problem? Why or why not? 2. Examine the absolute values of the correlation coefficients between the independent variables included in Equation 5.10. Do you find any evidence of a multicollinearity problem? Discuss. 3. Examine the VIF of the two most suspicious independent variables in Equation 5.10 based on what you found in Section 2 above. Do you find any evidence of a multicollinearity problem? Discuss.

  13. Thursday, April 7 • Exam 3: Monday, April 25, 12-2:30 PM • If you asked David Leonhardt questions on Tuesday night, write them down and give them to me today. • Summaries are due before 5 pm tomorrow via an email attachment. • Bring laptops to class on Tuesday

  14. Return and discuss Asst 19 # 5 Page 234 • Including Part e (data is available online under STOCK in Chapter 7)

  15. (a) • You are correct but note that the null and alternative hypotheses are not about beta hats, they are about betas. (b) • It’s unusual to have a lagged variable in a cross-sectional model. • BETA is lagged.

  16. Part C • Should we include EARN in the set of our independent variables? • Does the theory call for its inclusion? Yes, but a version of it is already in the dependent variable → exclude EARN • Is the estimated coefficient of EARN significant and in the expected direction? No → exclude EARN • As you include EARN, does the adjusted R-squared go up? Yes → include EARN • As you include EARN, do the other variables’ coefficients change significantly? They change somewhat • As you include EARN, do AIC and SC go down? No → exclude EARN

  17. (d) • The functional form is a semilog with the log on the left-hand side (the dependent variable is in logs), which is appropriate both on a theoretical basis and also because two of the independent variables are expressed as percentages.

  18. (e) • EARN, DIV, and BETA can all be negative, so we cannot take their logs.

  19. Return and discuss Asst 20 • Data set: DRUGS (Chapter 5, P 157) • Estimate Equation 5.10 1. Before you run any formal tests, do you suspect an imperfect multicollinearity problem? Why or why not? 2. Examine the absolute values of the correlation coefficients between the independent variables included in Equation 5.10. Do you find any evidence of a multicollinearity problem? Discuss. 3. Examine the VIF of the two most suspicious independent variables in Equation 5.10 based on what you found in Section 2 above. Do you find any evidence of a multicollinearity problem? Discuss.

  20. Do we suspect a multicollinearity problem? What should we look for? • R-bar squared is high but we have two insignificant variables

      Dependent Variable: P
      Method: Least Squares
      Sample: 1 32
      Included observations: 32

      Variable   Coefficient   Std. Error   t-Statistic   Prob.
      C            38.22131     6.387304      5.983951    0.0000
      GDPN          1.433680    0.214395      6.687108    0.0000
      CVN          -0.594732    0.223947     -2.655679    0.0133
      PP            7.311330    6.123084      1.194060    0.2432
      DPC         -15.62864     6.932635     -2.254359    0.0328
      IPC         -11.38456     7.159258     -1.590187    0.1239

      R-squared            0.811223
      Adjusted R-squared   0.774920

  21. Correlation Matrix

              GDPN    CVN     PP      DPC     IPC
      GDPN    1       0.86    0.21    0.17   -0.05
      CVN             1       0.12    0.31    0.06
      PP                      1      -0.13   -0.21
      DPC                             1       0.38
      IPC                                     1

      • Why are the diagonal values all 1? • Why did I eliminate the values in the bottom half of the table? • Is multicollinearity a problem?
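To pick out the most suspicious pair from the correlation matrix above, one option is to rank the reported coefficients by absolute value. The sketch below only re-uses the numbers shown on the slide; the 0.80 cutoff is an assumed rule of thumb and is not stated in the slides.

```python
# Pairwise correlations as reported in the correlation matrix on the slide.
reported_correlations = {
    ("GDPN", "CVN"): 0.86,
    ("GDPN", "PP"): 0.21,
    ("GDPN", "DPC"): 0.17,
    ("GDPN", "IPC"): -0.05,
    ("CVN", "PP"): 0.12,
    ("CVN", "DPC"): 0.31,
    ("CVN", "IPC"): 0.06,
    ("PP", "DPC"): -0.13,
    ("PP", "IPC"): -0.21,
    ("DPC", "IPC"): 0.38,
}

# Assumed rule-of-thumb cutoff; it does not come from the slides.
CUTOFF = 0.80

# Rank pairs from strongest to weakest linear association.
ranked = sorted(reported_correlations.items(), key=lambda item: abs(item[1]), reverse=True)
flagged = [(pair, r) for pair, r in ranked if abs(r) > CUTOFF]

for pair, r in ranked[:3]:
    print(pair, r)
print("Pairs above cutoff:", flagged)
```

Under that assumed cutoff only GDPN and CVN stand out, which is why those two variables are the natural candidates for the VIF check.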

  22. VIF = 1 / (1 − 0.77) ≈ 4.35 • VIF < 5 → no serious multicollinearity problem

      Dependent Variable: GDPN
      Method: Least Squares
      Sample: 1 32
      Included observations: 32

      Variable   Coefficient   Std. Error   t-Statistic   Prob.
      C            12.68648     5.187712      2.445486    0.0213
      CVN           0.901037    0.101695      8.860167    0.0000
      PP            4.343201    5.432426      0.799496    0.4310
      DPC          -4.112512    6.172508     -0.666263    0.5109
      IPC          -3.883886    6.382854     -0.608487    0.5479

      R-squared   0.766913

  23. Dependent Variable: CVN
      Method: Least Squares
      Sample: 1 32
      Included observations: 32

      Variable   Coefficient   Std. Error   t-Statistic   Prob.
      C            -4.639597    5.415845     -0.856671    0.3992
      DPC           8.402889    5.733910      1.465473    0.1543
      GDPN          0.825806    0.093204      8.860167    0.0000
      IPC           2.662434    6.130964      0.434260    0.6676
      PP           -1.333922    5.255631     -0.253808    0.8016

      R-squared   0.775957

      VIF = 1 / (1 − 0.78) ≈ 4.55 • VIF < 5 → no serious multicollinearity problem
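The slides round R² to two decimals before computing VIF. As a quick check, the same formula can be applied to the unrounded R-squared values reported in the two auxiliary regressions; `vif_from_r2` is a hypothetical helper name used only for this sketch.

```python
def vif_from_r2(r2):
    """Variance inflation factor from an auxiliary-regression R-squared."""
    return 1.0 / (1.0 - r2)

# R-squared values as reported in the two auxiliary regression outputs.
reported_r2 = {"GDPN": 0.766913, "CVN": 0.775957}

vif_values = {name: vif_from_r2(r2) for name, r2 in reported_r2.items()}
for name, value in vif_values.items():
    verdict = "above 5" if value > 5 else "below 5"
    print(f"{name}: VIF = {value:.2f} ({verdict})")
```

Using the unrounded values gives slightly smaller VIFs than the rounded arithmetic, and both remain below the threshold of 5, so the conclusion on the slides is unchanged.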

  24. Remedies for Multicollinearity 1. If your main goal is to use the equation for forecasting and you don’t need to run t-tests on individual estimated coefficients, then do nothing. • This is because multicollinearity does not affect the predictive power of your equation. 2. If it seems that you have a redundant variable, drop it. • Examples • You don’t need both real and nominal interest rates in your model • You don’t need both nominal and real GDP in your model

  25. Remedies for Multicollinearity 3. If all variables need to stay in the equation, transform the multicollinear variables • Example: • Number of domestic cars sold = B0 + B1 (average price of domestic cars) + B2 (average price of foreign cars) + … + ε • Problem: Prices of domestic and foreign cars are highly correlated • Solution: • Number of domestic cars sold = B0 + B1 (ratio of the average price of domestic cars to the average price of foreign cars) + … + ε 4. Increase the sample size or choose a different random sample

  26. Asst 21: Due Tuesday in class Use the data set FISH in Chapter 8 (P 274) to run the following regression equation: F = f (PF, PB, Yd, P, N) • Conduct all 3 tests of the imperfect multicollinearity problem and report your results. • If you find evidence of an imperfect multicollinearity problem, suggest and implement a reasonable solution.
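The ratio transformation in remedy 3 can be sketched as follows. Everything here is hypothetical: the two price series are simulated around a shared trend purely to illustrate the mechanics, and the variable names are made up for this example.

```python
import random

def pearson(x, y):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical price series: both follow the same upward trend plus noise.
rng = random.Random(1)
trend = [20000 + 500 * t for t in range(30)]
price_domestic = [p + rng.gauss(0, 300) for p in trend]
price_foreign = [1.1 * p + rng.gauss(0, 300) for p in trend]

print("corr(domestic, foreign):", round(pearson(price_domestic, price_foreign), 3))

# Replace the two price regressors with a single relative-price regressor.
relative_price = [d / f for d, f in zip(price_domestic, price_foreign)]
print("first relative prices:", [round(r, 3) for r in relative_price[:3]])
```

The regression would then use `relative_price` in place of the two separate price variables, so the two collinear regressors no longer appear side by side in the equation.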

  27. Chapter 9 (Autocorrelation or Serial Correlation) • Suppose we are using time series data to estimate consumption (C) as a function of income (Y) and other factors: C_t = β1 + β2 Y_t + … + ε_t • where t = 1, 2, 3, …, T • This means that • C_1 = β1 + β2 Y_1 + … + ε_1, and • C_2 = β1 + β2 Y_2 + … + ε_2 • … • C_T = β1 + β2 Y_T + … + ε_T • One of the classical assumptions regarding the error terms is • No correlation among the error terms in the theoretical equation • If this assumption is violated then there is a problem of pure serial correlation (autocorrelation).

  28. First Order Pure Autocorrelation ε_2 = ρ ε_1 + u_2 • That is, the error term in period 2 depends on the error term in period 1 • where u_2 is a normally distributed error with a mean of zero and constant variance
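A small simulation can make the first-order process concrete. This is a sketch under assumptions: the values of ρ (0.8 and -0.8), the sample size, and the function names are chosen only for illustration, and the first error is drawn from a standard normal rather than from the stationary distribution.

```python
import random

def simulate_ar1_errors(rho, n, seed=0):
    """Generate errors where each one equals rho times the previous error plus u."""
    rng = random.Random(seed)
    errors = [rng.gauss(0, 1)]  # starting value (simplifying assumption)
    for _ in range(n - 1):
        u = rng.gauss(0, 1)  # normal shock with mean zero and constant variance
        errors.append(rho * errors[-1] + u)
    return errors

def lag1_correlation(series):
    """Correlation between each value and the value one period earlier."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

positive = simulate_ar1_errors(rho=0.8, n=2000)
negative = simulate_ar1_errors(rho=-0.8, n=2000)

print("rho = 0.8  -> lag-1 correlation:", round(lag1_correlation(positive), 2))
print("rho = -0.8 -> lag-1 correlation:", round(lag1_correlation(negative), 2))
```

With a positive ρ the simulated errors tend to keep the same sign from one period to the next, and with a negative ρ they tend to alternate.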

  29. Second Order Pure Autocorrelation ε_3 = ρ1 ε_2 + ρ2 ε_1 + u_3 • That is, the error term in period 3 depends on the error term in period 2 and the error term in period 1. • where u_3 is a normally distributed error with a mean of zero and constant variance

  30. Higher Order Pure Autocorrelation ε_t = ρ1 ε_(t-1) + ρ2 ε_(t-2) + ρ3 ε_(t-3) + … + u_t • That is, the error term in period t depends on the error term in period t-1, the error term in period t-2, the error term in period t-3, etc. • where u_t is a normally distributed error with a mean of zero and constant variance

  31. What is Impure Serial Correlation? • When the true (theoretical) regression equation does not have an autocorrelation problem but our estimated equation does. • Why? • Specification error • Wrong functional form • Data error

  32. Types of Serial Correlation 1. Positive • Errors form a pattern • A positive error is usually followed by another positive error • A negative error is usually followed by another negative error • More common

  33. Example of positive autocorrelation

  34. Types of Serial Correlation 2. Negative • A positive error is usually followed by a negative error or vice versa • Less common

  35. Example of negative autocorrelation

  36. EViews allows you to see the residuals’ graph • After you estimate the regression equation • Click on View on your regression output • Click on Actual, Fitted, Residual Table

  37. What type of serial correlation may we have? • Negative residuals seem to be followed by other negative residuals → suspect positive autocorrelation

      Actual     Fitted     Residual    Residual Plot
      23.5200    27.0183    -3.49829    |* . | . |
      25.9500    28.4120    -2.46201    | *. | . |
      25.9400    28.6804    -2.74042    | * . | . |
      27.2200    28.8564    -1.63643    | .* | . |
      27.8200    29.5686    -1.74857    | .* | . |
      29.7700    30.3005    -0.53051    | . * | . |
      32.0800    30.4455     1.63447    | . | *. |
      32.6200    31.9419     0.67814    | . | * . |
      32.8800    32.3974     0.48264    | . | * . |
      34.9000    32.2590     2.64105    | . | . * |
      36.8800    33.3502     3.52980    | . | . *|
      36.7400    35.0438     1.69615    | . | *. |
      38.4900    35.3550     3.13501    | . | . * |
      37.0100    33.6742     3.33580    | . | . * |
      36.9300    37.2539    -0.32386    | . *| . |
      36.7000    38.0231    -1.32312    | . * | . |
      39.8400    37.4913     2.34866    | . | .* |
      40.7100    39.4460     1.26398    | . | * . |
      43.1000    42.2249     0.87505    | . | * . |
      46.6400    44.7560     1.88398    | . | *. |
      46.9100    48.2692    -1.35924    | . * | . |
      48.4500    50.1672    -1.71724    | .* | . |
      ………
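The visual impression from the residual table can be checked informally by counting sign changes and computing the lag-1 correlation of the residuals. The sketch below uses only the 22 residuals listed on the slide (the table is truncated, so this is not the full sample), and it is an informal diagnostic, not a formal test.

```python
# The 22 residuals listed on the slide, in time order (the table is truncated).
residuals = [
    -3.49829, -2.46201, -2.74042, -1.63643, -1.74857, -0.53051,
    1.63447, 0.67814, 0.48264, 2.64105, 3.52980, 1.69615,
    3.13501, 3.33580, -0.32386, -1.32312, 2.34866, 1.26398,
    0.87505, 1.88398, -1.35924, -1.71724,
]

# Count how often consecutive residuals have opposite signs.
pairs = list(zip(residuals[:-1], residuals[1:]))
sign_changes = sum(1 for prev, curr in pairs if prev * curr < 0)

# Lag-1 correlation between each residual and the one before it.
x, y = residuals[:-1], residuals[1:]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
var_x = sum((a - mean_x) ** 2 for a in x)
var_y = sum((b - mean_y) ** 2 for b in y)
lag1_corr = cov / (var_x * var_y) ** 0.5

print("transitions:", len(pairs))
print("sign changes:", sign_changes)
print("lag-1 correlation is positive:", lag1_corr > 0)
```

Few sign changes relative to the number of transitions, together with a positive lag-1 correlation, is the pattern the slide describes as suggesting positive autocorrelation.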
