A short introduction to applied econometrics Part B: Regression Analysis presented by Dipl. Volkswirt Gerhard Kling
The basic concept of regressions • Aim: Identify the partial impact of an explanatory variable on the dependent variable • Coefficient = partial derivative • Evaluate the significance of the impact • Null hypothesis: coefficient = 0 • Derive the t-statistic and p-values • Total fit of the model – measured by R2 • Check whether the assumptions are fulfilled
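The mechanics behind these bullets can be sketched in a few lines. The data below are simulated (the variable names and the 0.055 effect size are hypothetical, chosen to echo the growth example used later):

```python
import numpy as np

# Simulated data: growth explained by saving rate and population growth
rng = np.random.default_rng(0)
n = 500
saving = rng.normal(20, 5, n)        # saving rate, in percent
popgrowth = rng.normal(1, 0.5, n)    # population growth, in percent
growth = 1.0 + 0.055 * saving - 0.5 * popgrowth + rng.normal(0, 1, n)

# OLS: each slope coefficient is the partial derivative of the
# dependent variable with respect to one explanatory variable
X = np.column_stack([np.ones(n), saving, popgrowth])
beta, *_ = np.linalg.lstsq(X, growth, rcond=None)
print(beta)  # intercept, partial impact of saving, partial impact of popgrowth
```

Holding population growth fixed, a one-unit increase in the saving rate moves growth by roughly the second coefficient.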
Estimated Coefficients • Interpretation • Partial impact of an explanatory variable • Slope of the regression line • The interpretation changes if interaction terms are included (see dummies!) • We can assess the magnitude of the impact • If the saving rate increases by one marginal unit, the growth rate will increase by 0.055 units • First impression: the model overstates the influence of population growth and the saving rate
Significance • Null hypothesis: the variable has no impact • Test for: coefficient = 0 • Derive a pivotal quantity by standardizing • Student's t-distribution and t-statistic • Compare the theory (null) with the empirical finding • Test total model quality • F-test: joint hypothesis that all variables have no impact • R2 and adjusted R2: what portion of the total variation in the dependent variable can be explained?
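These test statistics can all be computed by hand from the OLS output. A minimal sketch on simulated data (scipy is assumed for the t and F distributions; in this toy setup x2 truly has no impact):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 + rng.normal(size=n)      # x2 truly has no impact

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
s2 = resid @ resid / (n - k)                            # error variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))      # standard errors

# Pivotal quantity: standardize each coefficient under H0 (coefficient = 0)
t_stat = beta / se
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - k)

# R^2: portion of the total variation in y explained by the model
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# F-test of the joint null that all slope coefficients are zero
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
f_p = stats.f.sf(f_stat, k - 1, n - k)
```

The coefficient on x1 comes out highly significant, while the whole-model F-test rejects the joint null because x1 carries explanatory power.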
Interpretation of the first model • Only saving rate has a significant impact • Direction of impact is confirmed • But: Model overstates the influence of both variables on growth rates • We can only explain 4% of the variability in growth using this model • But: F-test tells us that the model possesses explanatory power
The CLR assumptions • Non-linearity: histograms, scatter plots, test procedures (Ramsey RESET) • Spherical error terms • Heteroscedasticity: variance is not constant • Autocorrelation: the error term depends on its lagged values (see panel analysis) • Inspecting residual plots and several tests • Multicollinearity • Correlation matrix • Auxiliary regression – use of residuals • Additional issues: exogeneity etc. (skipped)
What happens if the assumptions are violated? • Non-linearity: • We assume a false functional form • Biased results • Non-spherical error terms • t-statistics and p-values are biased • We draw false conclusions • Multicollinearity • Difficult to detect the partial effect of each variable • Sometimes of minor importance – e.g. if only the total fit matters
Non-linearity • Idea: your assumed linear relation is false • This causes "strange" patterns in the residuals • Formal test procedure: Ramsey RESET • First tool: histograms, scatter plots, and transformations • Second tool: residual plots • Third tool: Ramsey RESET test
Residual plot: no strange pattern detectable! The plot looks similar for the saving rate.
Ramsey RESET test: two different procedures (adding powers of the fitted values, or powers of the regressors)
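The fitted-values variant of the RESET test can be written out directly: re-estimate the model with powers of the fitted values added, then F-test their joint significance. A sketch on simulated data where the true relation is quadratic, so the linear specification should be rejected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 3, size=n)
y = 1 + x**2 + rng.normal(0, 0.5, size=n)   # true relation is quadratic

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

# Restricted model: linear only
X = np.column_stack([np.ones(n), x])
beta, resid = ols(X, y)
rss_r = resid @ resid

# Unrestricted model: add powers of the fitted values
fitted = X @ beta
Xa = np.column_stack([X, fitted**2, fitted**3])
_, resid_a = ols(Xa, y)
rss_u = resid_a @ resid_a

q = 2                                        # number of added regressors
df = n - Xa.shape[1]
f_stat = ((rss_r - rss_u) / q) / (rss_u / df)
p_val = stats.f.sf(f_stat, q, df)
# a small p-value rejects the linear specification
```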
Heteroscedasticity • Inspecting residual plots or squared residuals – not always useful! • Several test procedures • White test: very general test statistic • Breusch-Pagan: tests for a specific form of heteroscedasticity • Breusch-Pagan / Cook-Weisberg
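The Breusch-Pagan idea – regress the squared residuals on the regressors and compare the LM statistic n·R² to a chi-squared distribution – can be sketched as follows (simulated data where the error variance rises with x; scipy is assumed for the chi-squared p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 5, size=n)
y = 2 + 0.5 * x + rng.normal(0, 1, size=n) * x   # error variance grows with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Auxiliary regression of squared residuals on the regressors; LM = n * R^2
u2 = resid**2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
aux_resid = u2 - X @ g
r2_aux = 1 - aux_resid @ aux_resid / ((u2 - u2.mean()) @ (u2 - u2.mean()))
lm = n * r2_aux
p_val = stats.chi2.sf(lm, df=1)   # df = regressors excluding the constant
# a small p-value rejects homoscedasticity
```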
Have we detected heteroscedasticity? • Ambiguous results – they depend on the power of the tests • Practical hint: look at the residual plot (residuals plotted against fitted values) • If the variability stays constant, the problem may lie elsewhere (non-linearity) • Hint: use a square-root or logarithmic transformation of the dependent variable – then test again!
Residual plot: no evidence for an increase in variability!
What to do now? • Note that the White test is very general (see Greene (2000)) • Advantage: you can test for every form of heteroscedasticity • Disadvantage: the power of the test is often low • Moreover, the test may reject for the wrong reasons: it is also sensitive to omitted-variable bias (especially omitted squared explanatory variables) • Note: the Ramsey RESET test tells us that omitted variables are a problem!
Multicollinearity: the correlation between the explanatory variables is negative and only –0.29. Rule of thumb: multicollinearity becomes a concern when the correlation exceeds 0.8 or 0.9 in absolute value.
Multicollinearity • Calculate the correlation matrix • If a correlation exceeds 0.8 in absolute value, multicollinearity is likely to affect the results • With many explanatory variables: auxiliary regressions are helpful – they can also help to specify a model via the residuals of the auxiliary regression! • If multicollinearity is high, then... • Do nothing – maybe only the model fit matters • Use additional information – increase the sample size! • Note: a high R2 combined with insignificant coefficients is a clue for multicollinearity
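Both diagnostics – the correlation matrix and an auxiliary regression, whose R² yields the variance inflation factor VIF = 1/(1 − R²) – fit in a few lines. A sketch on simulated data where two regressors are nearly collinear by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)

# Correlation matrix: rule of thumb, worry above 0.8-0.9 in absolute value
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)

# Auxiliary regression: regress one regressor on all the others;
# a high R^2 (equivalently a high VIF) signals multicollinearity
X_others = np.column_stack([np.ones(n), x2, x3])
g, *_ = np.linalg.lstsq(X_others, x1, rcond=None)
res = x1 - X_others @ g
r2_aux = 1 - res @ res / ((x1 - x1.mean()) @ (x1 - x1.mean()))
vif = 1 / (1 - r2_aux)
```

A common rule of thumb treats a VIF above 10 as a warning sign; here the near-collinear pair pushes it far past that.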
Conclusion: Is our model good enough? • Non-linearity can be excluded • But: the White test and Ramsey RESET point in the direction of omitted-variable (OV) bias! • Problem: biased coefficients because important variables are neglected! • Heteroscedasticity and multicollinearity are not observed
Extension: Dummy variables • Allowing heterogeneity across observations • Individual intercepts • Individual slope coefficients • Danger of overspecification • Importance of the reference group
Interpretation of dummies: up to this point, only a shift in the intercept was allowed – but dummies can also capture different slopes!
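An intercept dummy and a slope dummy (the interaction d·x) can be estimated in one regression. A sketch on simulated data with two groups, where both the intercept and the slope differ:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)
d = (rng.random(n) < 0.5).astype(float)     # group dummy (reference group: d = 0)
# both intercept and slope differ across the two groups
y = 1 + 2 * d + 0.5 * x + 1.5 * d * x + rng.normal(0, 0.5, size=n)

# The dummy d shifts the intercept; the interaction d * x shifts the slope
X = np.column_stack([np.ones(n), d, x, d * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0], beta[2]: intercept and slope of the reference group
# beta[1], beta[3]: shift in intercept and slope for the dummy group
```

All coefficients are interpreted relative to the reference group – which is why the choice of reference group matters.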
What should we do now? • A model with both intercept and slope dummies • Too few observations – this is like splitting the sample! • The accuracy of the estimation is low • Hence, I would prefer to include just an intercept dummy • Does this dummy solve the OV problem? • But: the Ramsey RESET test tells you that there is still a problem to be solved!