
Problems in Applying the Linear Regression Model Appendix 4A

The assumptions of the linear regression model don’t always hold in the real world. We now examine statistical problems, which are the central focus of the economic sub-field called econometrics.

alexh


  1. Problems in Applying the Linear Regression Model (Appendix 4A)
     • The assumptions of the linear regression model don’t always hold in the real world.
     • We now examine statistical problems, which are the central focus of the economic sub-field called econometrics:
       • Autocorrelation
       • Heteroscedasticity
       • Specification and Measurement Error
       • Multicollinearity
       • Simultaneous equation relationships and the identification problem
       • Nonlinearities

  2. Autocorrelation (also known as serial correlation)
     • Problem:
       • Coefficients are unbiased,
       • but t-values are unreliable
     • Symptoms:
       • Look at a scatter of the error terms to see if there is a pattern, or
       • see if the Durbin-Watson statistic is far from 2.
     • Cures:
       • Find more variables that explain these patterns
       • Take first differences of the data: ΔQ = a + b•ΔP
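The Durbin-Watson symptom and the first-difference cure can be sketched in a few lines. This is a minimal illustration, not the textbook's procedure: the residual series and the quantity series below are hypothetical, and the function names are chosen here for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def first_differences(series):
    """Change from one period to the next (one observation is lost)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical residuals: long runs of the same sign (positive autocorrelation)
trending = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]
# Hypothetical residuals: sign flips every period (negative autocorrelation)
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

dw_trending = durbin_watson(trending)        # well below 2
dw_alternating = durbin_watson(alternating)  # well above 2

# Hypothetical quantity series, differenced before re-estimating
quantity = [100, 104, 109, 115, 122]
delta_q = first_differences(quantity)

print(dw_trending, dw_alternating, delta_q)
```

A value near 2 suggests no first-order serial correlation; the two hypothetical series land on either side of it.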

  3. Scatter of Error Terms: Positive Autocorrelation (Figure 4A.1, page 171)
     [Figure: Y plotted against X, with points numbered 1 to 8.]

  4. 2. Heteroscedasticity
     • Problem:
       • Coefficients are unbiased
       • t-values are unreliable
     • Symptoms:
       • Different variances for different sub-samples
       • Scatter of error terms shows increasing or decreasing dispersion
     • Cures:
       • Transform data, e.g., transform them into logs
       • Take averages of each sub-sample and use weighted least squares
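The first symptom, different variances in different sub-samples, can be checked with a simple variance ratio. The sketch below assumes the residuals are already sorted by the explanatory variable; the residual values are hypothetical and the function name is chosen here for illustration. It is a rough comparison, not a formal test.

```python
from statistics import variance


def variance_ratio(residuals_sorted_by_x):
    """Sample variance of the upper half divided by that of the lower half."""
    half = len(residuals_sorted_by_x) // 2
    low = residuals_sorted_by_x[:half]
    high = residuals_sorted_by_x[-half:]
    return variance(high) / variance(low)


# Hypothetical residuals whose dispersion grows with X
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
ratio = variance_ratio(residuals)

print(ratio)  # far above 1: dispersion is much larger in the upper sub-sample
```

A ratio close to 1 would be consistent with constant variance; a ratio far from 1 points to increasing or decreasing dispersion.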

  5. Scatter of Error Terms: Heteroscedasticity
     [Figure: Height plotted against AGE.]
     • Alternative specification: log Ht = a + b•AGE

  6. 3. Specification & Measurement Error
     • Salary = a + b (Strike Outs) in baseball
       • b is positive!!!
     • Why?
       • There is an omitted variable: the number of Hits
     • Salary = c + d (Strike Outs) + e (Hits)
       • Here d is negative and e is positive

  7. Specification & Measurement Error
     • Problem:
       • Coefficients are biased; we can even have the wrong sign, as in the baseball example
       • Even adding more observations will not cure this bias
     • Symptoms:
       • The results don’t make economic sense
     • Cure:
       • Think through the relationships and find the missing variables in the specification
       • See if the new specification improves the fit (higher R²) and makes economic sense.
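The sign flip in the baseball example can be reproduced with made-up numbers. In this sketch every figure is hypothetical: salary is constructed as exactly 100 + 20·Hits - 10·Strike Outs, and strike outs rise with hits because both grow with playing time. The helper names are chosen here for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def cross(a, b):
    """Sum of cross products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def simple_slope(y, x):
    """Slope from regressing y on x alone."""
    return cross(x, y) / cross(x, x)


def two_regressor_slopes(y, x1, x2):
    """Slopes from regressing y on x1 and x2 (normal equations, Cramer's rule)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Hypothetical players
hits = [50, 80, 110, 140, 170, 200]
strike_outs = [30, 35, 60, 65, 90, 95]
salary = [100 + 20 * h - 10 * k for h, k in zip(hits, strike_outs)]

b = simple_slope(salary, strike_outs)                   # Hits omitted
d, e = two_regressor_slopes(salary, strike_outs, hits)  # Hits included

print(b, d, e)  # b is positive; d is negative and e is positive
```

With Hits left out, Strike Outs picks up the effect of Hits and its coefficient takes the wrong sign; including Hits recovers the constructed coefficients.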

  8. 4. Multicollinearity
     • Sometimes independent variables aren’t independent.
     • EXAMPLE: Let Q = eggs sold, and estimate Q = a + b Pd + c Pg, where Pd is the price for a dozen eggs and Pg is the price for a gross of eggs.
     • Regression output (t-values in parentheses):
         Q = 22 - 7.8 Pd - .9 Pg
                  (1.2)    (1.45)
         R-square = .87, N = 100 observations
     • Notice that R-square is 87%, but that neither coefficient is statistically significant.

  9. Multicollinearity
     • Problem:
       • Coefficients are unbiased
       • The t-values are small, often insignificant
     • Symptoms:
       • High R-squares but low t-values
     • Cures:
       • Drop a variable. Usually the remaining variable becomes significant.
       • Do nothing if forecasting, since the added R-square of more variables is worthwhile
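In the egg example the two prices move almost in lockstep, since a gross is twelve dozen. A quick way to see the problem is the correlation between the two regressors and the variance inflation factor 1/(1 - r²) for the two-regressor case. The price data below are hypothetical, invented only to illustrate the calculation.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical prices: a gross costs roughly twelve times a dozen
price_dozen = [0.90, 1.00, 1.10, 1.20, 1.30, 1.40]
price_gross = [10.9, 11.9, 13.3, 14.3, 15.7, 16.7]

r = correlation(price_dozen, price_gross)
vif = 1 / (1 - r ** 2)  # variance inflation factor with two regressors

print(r, vif)  # r is very close to 1, so the VIF is very large
```

A large inflation factor means the coefficient standard errors are blown up, which is why the t-values are small even though the R-square is high.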

  10. 5. Identification Problem and the Simultaneity Problem
      • Problem:
        • Coefficients are biased
      • Symptom:
        • Independent variables are known to be part of a system of equations
      • Cure:
        • Use as many independent variables as possible

  11. Graphical Explanation of the Identification Problem
      • Suppose we estimate the following demand curve: Q = a + b P.
      • Suppose Supply varies and Demand is FIXED.
      • All points lie on the demand curve.
      • The demand curve is said to be identified.
      [Figure: price P against quantity; supply curves S1, S2, S3 cross a single Demand curve.]

  12. Suppose instead that SUPPLY is Fixed
      • Let DEMAND shift while supply is fixed and doesn’t change.
      • All points are on the SUPPLY curve.
      • We say that the SUPPLY curve is identified.
      [Figure: price P against quantity; demand curves D1, D2, D3 cross a single Supply curve.]

  13. When both Supply and Demand Vary
      • Often both supply and demand vary.
      • Equilibrium points are in the shaded region.
      • A regression of Q = a + b P will be neither a demand nor a supply curve.
      [Figure: price P against quantity; supply curves S1, S2 and demand curves D1, D2 bound a shaded region marked “?”.]

  14. Simultaneous Systems
      • Demand is Qd = a + b P + c Y + e1
      • Supply is Qs = d + e P + f W + e2
      • Where P is price, Y is income, W is the wage rate, and each equation has an error term (e1, e2).
      • Notice that P is in both the demand and the supply functions. P is “endogenously” determined by both demand and supply.
      • The simultaneity problem is that price is not independent, as it is determined by the whole system.
      • The cure for this problem is usually to have as many independent variables as possible in the demand regression to make demand act like it is “fixed”.
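The three graphical cases can be checked numerically. The sketch below assumes a hypothetical system, demand Q = 10 - P + u and supply Q = 2 + P + v, where the shift terms u and v stand in for variables such as income and the wage rate. It solves for the equilibrium at each pair of shifts and then regresses Q on P.

```python
def ols_slope(x, y):
    """Slope from regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def equilibrium(u, v):
    """Solve 10 - P + u = 2 + P + v for price, then read quantity off demand."""
    price = (8 + u - v) / 2
    quantity = 10 - price + u
    return price, quantity


def slope_of_q_on_p(shifts):
    points = [equilibrium(u, v) for u, v in shifts]
    prices = [p for p, _ in points]
    quantities = [q for _, q in points]
    return ols_slope(prices, quantities)


# Only supply shifts: the points trace out the demand curve (slope -1)
slope_supply_shifts = slope_of_q_on_p([(0, v) for v in (-2, -1, 0, 1, 2)])
# Only demand shifts: the points trace out the supply curve (slope +1)
slope_demand_shifts = slope_of_q_on_p([(u, 0) for u in (-2, -1, 0, 1, 2)])
# Both shift by the same amounts: the fitted line is neither curve
slope_both_shift = slope_of_q_on_p([(u, v) for u in (-1, 0, 1) for v in (-1, 0, 1)])

print(slope_supply_shifts, slope_demand_shifts, slope_both_shift)
```

In this constructed case the regression recovers the demand slope only when demand stays fixed, and it returns a slope of zero when both curves shift equally.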

  15. 6. Nonlinear Forms
      • Semi-logarithmic transformations. Sometimes taking the logarithm of the dependent variable or an independent variable improves the R².
      • Examples are:
        • log Y = α + β·X. Here, Y grows exponentially in X at rate β; that is, approximately 100·β percent growth per period when β is small.
        • Y = α + β·log X. Here, Y - α doubles each time X is squared.
      • E.g., Ln Y = .01 + .05X
      [Figure: Y plotted against X.]
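Both semi-log forms can be checked with a few lines of arithmetic. The first part uses the slide's example Ln Y = .01 + .05X; the coefficients in the second part (intercept 1, slope 3) are hypothetical, chosen only to illustrate the shape.

```python
import math

ALPHA, BETA = 0.01, 0.05  # from the example Ln Y = .01 + .05X


def y_log_linear(x, alpha=ALPHA, beta=BETA):
    """Y implied by Ln Y = alpha + beta * X."""
    return math.exp(alpha + beta * x)


# Growth from one period to the next is exp(beta) - 1, close to beta
growth_per_period = y_log_linear(11) / y_log_linear(10) - 1

A2, B2 = 1.0, 3.0  # hypothetical coefficients for Y = alpha + beta * Ln X


def y_linear_log(x, alpha=A2, beta=B2):
    """Y implied by Y = alpha + beta * Ln X."""
    return alpha + beta * math.log(x)


gap_at_x = y_linear_log(4.0) - A2           # Y minus intercept at X = 4
gap_at_x_squared = y_linear_log(16.0) - A2  # Y minus intercept at X = 16

print(growth_per_period, gap_at_x, gap_at_x_squared)
```

The growth rate comes out slightly above 5 percent, and squaring X doubles the part of Y above the intercept.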

  16. Reciprocal Transformations
      • The relationship between variables may be inverse. Sometimes taking the reciprocal of a variable improves the fit of the regression, as in the example:
        • Y = α + β·(1/X)
      • Shapes can be:
        • declining slowly, if beta is positive
        • rising slowly, if beta is negative
      • E.g., Y = 500 + 2 (1/X)
      [Figure: Y plotted against X.]

  17. Polynomials
      • Quadratic, cubic, and higher degree polynomial relationships are common in business and economics.
        • Profit and revenue are cubic functions of output.
        • Average cost is a quadratic function, as it is U-shaped
        • Total cost is a cubic function, as it is S-shaped
      • TC = α·Q + β·Q² + γ·Q³ is a cubic total cost function.
      • If higher order polynomials improve the R-square, then the added complexity may be worth it.
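The link between the cubic total cost function and the U-shaped quadratic average cost can be worked out directly. The derivation below writes the three cost coefficients as α, β, and γ (the symbol names are an assumption here) and assumes β < 0 and γ > 0, which is what makes average cost U-shaped.

```latex
\begin{aligned}
TC(Q) &= \alpha Q + \beta Q^{2} + \gamma Q^{3} \\
AC(Q) &= \frac{TC(Q)}{Q} = \alpha + \beta Q + \gamma Q^{2} \\
MC(Q) &= \frac{d\,TC}{dQ} = \alpha + 2\beta Q + 3\gamma Q^{2} \\
\frac{d\,AC}{dQ} &= \beta + 2\gamma Q = 0
  \;\Longrightarrow\; Q^{*} = -\frac{\beta}{2\gamma} \\
AC(Q^{*}) &= MC(Q^{*}) = \alpha - \frac{\beta^{2}}{4\gamma}
\end{aligned}
```

Under these assumptions average cost reaches its minimum at Q*, and marginal cost equals average cost at that output.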

  18. Multiplicative or Double Log
      • With the double log form, the coefficients are elasticities.
      • Q = A • P^b • Y^c • Ps^d is the multiplicative functional form.
      • So: Ln Q = a + b•Ln P + c•Ln Y + d•Ln Ps, where a = Ln A.
      • Transform all variables into natural logs.
      • Called the double log, since logs are on the left and the right hand sides. Ln and Log are used interchangeably. We use only natural logs.
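A short sketch shows why the log transformation turns the exponent into a regression coefficient. It uses a one-variable version of the multiplicative form, Q = A·P^b, with hypothetical values A = 100 and b = -2 and made-up prices; regressing Ln Q on Ln P should return b as the slope.

```python
import math


def ols_line(x, y):
    """Intercept and slope from regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


A, B = 100.0, -2.0  # hypothetical scale factor and price elasticity
prices = [1.0, 1.5, 2.0, 2.5, 3.0]
quantities = [A * p ** B for p in prices]

ln_p = [math.log(p) for p in prices]
ln_q = [math.log(q) for q in quantities]

a_hat, b_hat = ols_line(ln_p, ln_q)
A_hat = math.exp(a_hat)  # undo the log on the intercept

print(b_hat, A_hat)
```

The fitted slope is the constant price elasticity, and exponentiating the intercept gives back the scale factor A.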

  19. Soft Drink Case, pp. 167-168: a cross section of 50 states
      Linear Specification
      Cans = 515 - 242 Price + 1.19 Income + 2.91 Temp

      Predictor      Coef     StDev       T       P
      Constant      514.8     113.2    4.55   0.000
      Price       -241.80     43.65   -5.54   0.000
      Income        1.195     1.688    0.71   0.483
      Temp         2.9136    0.7071    4.12   0.000

      R-Sq = 69.8%   R-Sq(adj) = 67.7%
      The price elasticity in Wyoming is (ΔQ/ΔP)(P/Q) = -241.8 (2.31/102) = -5.476
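The Wyoming calculation is a point elasticity: the estimated slope on Price times the ratio of price to quantity at that observation. The sketch below reuses the slide's figures (slope -241.8, price 2.31, quantity 102); the function name is chosen here for illustration.

```python
def point_elasticity(slope, price, quantity):
    """Elasticity at one point of a linear demand curve: (dQ/dP) * (P/Q)."""
    return slope * price / quantity


wyoming = point_elasticity(-241.8, 2.31, 102)
print(round(wyoming, 3))
```

In a linear specification this value changes from state to state because P/Q differs at each observation.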

  20. Double Log Soft Drink Case
      Ln Cans = 2.47 - 3.17 Ln Price + 0.202 Ln Income + 1.12 Ln Temp

      Predictor      Coef    Std Dev       T       P
      Constant      2.466     1.385    1.78   0.082
      Ln Price    -3.1695    0.6485   -4.89   0.000
      Ln Income    0.2020    0.1834    1.10   0.277
      Ln Temp      1.1196    0.2611    4.29   0.000

      R-Sq = 67.4%   R-Sq(adj) = 65.1%
      • Characterize the demand for soft drinks in the US.
      • Are soft drinks inelastic?
      • Are they luxuries?
      • Which specification fits the data better?
