Econometric Problems • Econometric problems, detection & fixes? • Hands on problems
Regression Diagnostics
• The required conditions for the model assessment to apply must be checked.
• Is the error variance constant?
• Are the errors independent?
• Is the error variable normally distributed?
• Is multicollinearity a problem?
Econometric Problems
• Heteroskedasticity (standard errors not reliable)
• Non-normal distribution of the error term (t- and F-stats not reliable)
• Autocorrelation or serial correlation (standard errors not reliable)
• Multicollinearity (t-stats may be biased downward)
Is the Error Variance Constant? (Homoskedasticity)
• When the requirement of a constant variance is met we have homoskedasticity.
[Figure: residuals vs. predicted y; the spread of the data points does not change much across the fitted values.]
Heteroskedasticity
• When the requirement of a constant variance is violated we have heteroskedasticity.
• The plot of the residuals vs. the predicted value of y will exhibit a cone shape.
[Figure: residuals vs. predicted y; the spread increases with ŷ.]
Heteroskedasticity
• When the variance of the error term differs for different values of X you have heteroskedasticity.
• Problem: the OLS estimators for the β's are no longer minimum variance. You can no longer be sure that the value you get for bi lies close to the true βi.
Detection / Fix for Heteroskedasticity
• Detection: the plot of residuals vs. predicted y exhibits a cone or megaphone shape.
• Advanced test: the White test (uses a Chi-square stat).
• Fix: the White correction (still uses OLS, but keeps heteroskedasticity from inflating the variance of the OLS estimators).
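A minimal sketch of both steps in Python with statsmodels, on simulated data (the variable names and numbers are illustrative, not from the slides): plot the residuals against the predicted y, then refit with White-type heteroskedasticity-robust standard errors.

```python
# Sketch: residual plot + White-corrected (robust) standard errors.
# Simulated data so the example is self-contained.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, size=(n, 2))
e = rng.normal(0.0, 0.5 + 0.4 * X[:, 0])   # error variance grows with X1
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + e

Xc = sm.add_constant(X)                    # add the intercept column
ols = sm.OLS(y, Xc).fit()

# Detection: residuals vs. predicted y; look for a cone/megaphone shape
plt.scatter(ols.fittedvalues, ols.resid, s=10)
plt.axhline(0, color="gray")
plt.xlabel("Predicted y")
plt.ylabel("Residual")
plt.show()

# Fix: identical OLS coefficients, heteroskedasticity-robust standard errors
robust = sm.OLS(y, Xc).fit(cov_type="HC1")
print(robust.summary())
```

The coefficients are unchanged between the two fits; only the standard errors (and hence the t-stats) differ.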
Heteroskedasticity Ex. 1
• No megaphone pattern in the residual plot => no heteroskedasticity.
White Test
• H0: no heteroskedasticity. H1: heteroskedasticity is present.
• White's Chi-square stat is N·R-square from the regression of the squared residuals (ei)² on X1, ..., Xk and their squared terms.
White Test
• White's test stat > Chi-square from the table (d.f. = number of slope coefficients) => reject homoskedasticity; you have a problem with heteroskedasticity.
• White's test stat < Chi-square from the table (d.f. = number of slope coefficients) => fail to reject homoskedasticity; you do not have a problem with heteroskedasticity.
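In practice the statistic is rarely computed by hand; here is a sketch using statsmodels' het_white, reusing `ols` and `Xc` from the earlier sketch. (Note: statsmodels' auxiliary regression also includes cross products of the X's, a slightly more general form than the squared-terms-only version above.)

```python
# Sketch: White's test on the residuals from the earlier sketch.
from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(ols.resid, Xc)
print(f"White LM stat = {lm_stat:.2f}  (p-value = {lm_pvalue:.4f})")
# Compare lm_stat with the Chi-square critical value, or just read the
# p-value: p < 0.05 => reject homoskedasticity.
```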
12.9 < 16.92 (Chi-Square, 95% Confidence, 9 df) Fail to reject homoskedasticity
White Test Ex. 2
• 33.75 > 11.07 (Chi-square, 95% confidence, d.f. = 5) => heteroskedasticity is present.
Non-normality of the Error Term
• If the assumption that e is normally distributed is called into question, we cannot use the t-tests, F-tests, or R-square, because those tests rest on the assumption that e is normally distributed. Their results become meaningless.
Non-normality of the Error Term
• Indication of a non-normal distribution: the histogram of the residuals does not look normal.
• Formal test: the Jarque-Bera test.
Non-normality of the Error Term
• If the JB stat is smaller than a Chi-square with 2 degrees of freedom (5.99 at the 5% significance level) then you can relax: your error term follows a normal distribution.
• If not, get more data or transform the dependent variable: log(y), y², √y, 1/y, etc.
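A sketch of the test in Python, again on the residuals from the simulated example above; statsmodels' jarque_bera computes JB = (n/6)·(S² + (K − 3)²/4) from the residuals' skewness S and kurtosis K.

```python
# Sketch: Jarque-Bera normality test on the earlier sketch's residuals.
from statsmodels.stats.stattools import jarque_bera

jb_stat, jb_pvalue, skew, kurt = jarque_bera(ols.resid)
print(f"JB = {jb_stat:.2f}  (p-value = {jb_pvalue:.4f})")
# JB < 5.99 (Chi-square, 2 d.f., 5% level) => do not reject normality.
```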
Jarque-Bera Test for Non-normality of the Error Term
• JB stat = 45 > 5.99 (Chi-square, 95% confidence, 2 d.f.) => the error term is not normally distributed.
Additional Fixes for Normality
• Bootstrapping: ask your advisor if this approach is right for you.
• Appeal to large-sample (asymptotic) theory: if your sample is large enough, the t- and F-tests are approximately valid.
Serial Correlation or Autocorrelation
• Patterns in the appearance of the residuals over time indicate that autocorrelation exists.
[Figure: two residual-vs-time plots. Left: runs of positive residuals followed by runs of negative residuals. Right: residuals oscillating around zero.]
First-Order Autocorrelation
• Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2).
• Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2).
[Figure: residual-vs-time plots; positive autocorrelation shows slowly drifting runs of residuals, negative autocorrelation shows residuals alternating in sign.]
Autocorrelation or Serial Correlation
• The Durbin-Watson test detects first-order autocorrelation between consecutive residuals in a time series.
• The statistic is d = Σ(eₜ − eₜ₋₁)² / Σeₜ², which ranges from 0 to 4 and is near 2 when there is no first-order autocorrelation.
• If autocorrelation exists, the error variables are not independent.
One-Tail Test for Positive First-Order Autocorrelation
• If d < dL there is enough evidence to show that positive first-order correlation exists.
• If d > dU there is not enough evidence to show that positive first-order correlation exists.
• If d is between dL and dU the test is inconclusive.
One-Tail Test for Negative First-Order Autocorrelation
• If d > 4 − dL, negative first-order correlation exists.
• If d < 4 − dU, negative first-order correlation does not exist.
• If d falls between 4 − dU and 4 − dL the test is inconclusive.
Two-Tail Test for First-Order Autocorrelation
• If d < dL or d > 4 − dL, first-order autocorrelation exists.
• If d falls between dL and dU, or between 4 − dU and 4 − dL, the test is inconclusive.
• If d falls between dU and 4 − dU, there is no evidence of first-order autocorrelation.
[Figure: the 0-to-4 scale for d marked at dL, dU, 2, 4 − dU, and 4 − dL, showing where correlation exists, where the test is inconclusive, and where correlation does not exist.]
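A sketch of computing d in Python (shown on the earlier simulated residuals purely for illustration; a real Durbin-Watson test needs time-ordered residuals, with dL and dU looked up for your n and k).

```python
# Sketch: Durbin-Watson statistic. d ≈ 2(1 − r), where r is the first-order
# autocorrelation of the residuals, so d near 2 suggests no autocorrelation.
from statsmodels.stats.stattools import durbin_watson

d = durbin_watson(ols.resid)
print(f"Durbin-Watson d = {d:.2f}")
# Apply the decision rules above using dL and dU from a Durbin-Watson table.
```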
Example
• How does the weather affect the sales of lift tickets at a ski resort?
• Data on the past 20 years of ticket sales, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
• The model hypothesized was TICKETS = b0 + b1·SNOWFALL + b2·TEMPERATURE + e.
• Regression analysis yielded the following results:
• The model seems to be very poor:
• The fit is very low (R-square = 0.12).
• It is not valid (significance F = 0.33).
• No variable is linearly related to sales.
Diagnosis of the required conditions yielded the following findings:
[Figure: diagnostic plots. Residuals vs. predicted y: the error variance is constant. Histogram of the errors: the errors may be normally distributed. Residuals over time: the errors are not independent.]
Test for Positive First-Order Autocorrelation
• n = 20, k = 2. From the Durbin-Watson table: dL = 1.10, dU = 1.54. The statistic d = 0.59.
• Conclusion: because d < dL, there is sufficient evidence to infer that positive first-order autocorrelation exists.
Using the computer (Excel):
• Tools > Data Analysis > Regression (check the residual option, then OK).
• Tools > Data Analysis Plus > Durbin-Watson Statistic > highlight the range of the residuals from the regression run > OK.
The Modified Regression Model
• The autocorrelation has occurred over time, so a time-dependent variable added to the model may correct the problem:
TICKETS = b0 + b1·SNOWFALL + b2·TEMPERATURE + b3·YEARS + e
• All the required conditions are met for this model.
• The fit of this model is high: R-square = 0.74.
• The model is useful: significance F = 5.93E-5.
• SNOWFALL and YEARS are linearly related to ticket sales.
• TEMPERATURE is not linearly related to ticket sales.
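A hypothetical sketch of this fix; the data below are simulated stand-ins (the slides' actual ski-resort data set is not reproduced here), but the model structure mirrors the slide.

```python
# Sketch: correcting time-driven autocorrelation by adding a trend variable.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "YEARS": np.arange(1, 21),              # time trend, 20 years
    "SNOWFALL": rng.uniform(50, 150, 20),
    "TEMPERATURE": rng.uniform(-10, 5, 20),
})
# TICKETS drifts upward over time, so a model without YEARS leaves a
# time pattern in the residuals
df["TICKETS"] = 100 + 2 * df["SNOWFALL"] + 10 * df["YEARS"] + rng.normal(0, 20, 20)

m1 = smf.ols("TICKETS ~ SNOWFALL + TEMPERATURE", data=df).fit()
m2 = smf.ols("TICKETS ~ SNOWFALL + TEMPERATURE + YEARS", data=df).fit()
print("Without trend: R2 =", round(m1.rsquared, 2),
      " d =", round(durbin_watson(m1.resid), 2))
print("With trend:    R2 =", round(m2.rsquared, 2),
      " d =", round(durbin_watson(m2.resid), 2))
```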
Multicollinearity
• When two or more X's are correlated you have multicollinearity.
• Symptoms: insignificant t-stats (due to inflated standard errors on the coefficients) alongside a good R-square.
• Test: run a correlation matrix of all X variables.
• Fix: get more data, or combine variables.
Multicollinearity Ex: Xm 19-02
• Diagnostic: correlation matrix of the X variables.
• A correlation coefficient over 0.5 => problem with multicollinearity.
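A sketch of the diagnostic with simulated, deliberately correlated regressors, adding variance inflation factors (VIFs) as a common complementary check that is not on the slide.

```python
# Sketch: correlation matrix of the X's, plus VIFs as a second opinion.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
X = pd.DataFrame({
    "X1": x1,
    "X2": 0.9 * x1 + rng.normal(scale=0.3, size=100),  # built to correlate with X1
    "X3": rng.normal(size=100),
})

print(X.corr().round(2))   # off-diagonal values above ~0.5 flag trouble

Xc = sm.add_constant(X)
for i, name in enumerate(Xc.columns):
    if name != "const":
        print(name, "VIF =", round(variance_inflation_factor(Xc.values, i), 2))
# Rule of thumb: VIF above ~10 signals serious multicollinearity.
```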