Autocorrelation in Regression Analysis
• What is Autocorrelation?
• What causes Autocorrelation?
• Tests for Autocorrelation
• Examples
• Durbin-Watson Tests
• Modeling Autoregressive Relationships
Bush 632 Lecture 12a

What is Autocorrelation?
• Correlation between values of the same variable across observations
• Violation of the assumption:
• where:
• In the presence of autocorrelation, the function of Y can be expressed as:
• the function:
• where
• defined as:

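The equations for this slide are not reproduced in the text. As a sketch only, assuming the standard first-order (AR(1)) error structure that the later slides on rho appear to rely on, the setup is commonly written as below. The symbols $\beta_0$, $\beta_1$, $\varepsilon_t$, and $u_t$ are assumptions introduced for illustration, not recovered from the slide.

```latex
\begin{aligned}
\text{Assumption (no autocorrelation):}\quad & E(\varepsilon_i \varepsilon_j) = 0 \quad \text{for } i \neq j \\
\text{Regression:}\quad & Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t \\
\text{Autocorrelated error:}\quad & \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad |\rho| < 1 \\
\text{where}\quad & u_t \text{ is an independent error with mean zero and constant variance}
\end{aligned}
```

Under this form, $\rho$ is the correlation between adjacent errors: $\rho > 0$ gives positive autocorrelation and $\rho < 0$ gives negative autocorrelation.
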
Where do we find Autocorrelation?
• Autocorrelation is most often a problem in time series or geographic data
• It reflects changes in data that are a function of proximity in time or space
• Examples
  • Energy market price shocks
    • Transitions depend on prior states
  • Economic consequences of LULUs (locally unwanted land uses)
    • Distance from hazard influences magnitude of price effect

Federal Budget Example:
• Incrementalists argue that the federal budget shifts only incrementally from the prior year’s budget.
• Partial Effects
• Calculating partial effects; interpretation
• Variable selection and model building
• Risks in model building

Two types of Autocorrelation
• Positive autocorrelation
  • This is what we normally find. If the autocorrelation is positive, then we expect the sign of the residual at t to be the same as at t-1.

Negative Autocorrelation
• We find that the sign of the residual at t is the opposite of that at t-1
• Example: a drunken amble

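The sign behavior described for the two types can be illustrated with a small simulation. This is a minimal sketch, assuming errors follow e_t = rho * e_{t-1} + u_t with standard normal u_t; the function names `simulate_ar1_errors` and `share_same_sign`, the values rho = ±0.8, and the sample size are illustrative assumptions, not part of the lecture.

```python
import random

def simulate_ar1_errors(rho, n, seed=0):
    """Generate n errors following e_t = rho * e_{t-1} + u_t, with u_t ~ N(0, 1)."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, 1.0))
    return errors

def share_same_sign(errors):
    """Fraction of adjacent pairs (e_{t-1}, e_t) that have the same sign."""
    pairs = list(zip(errors[:-1], errors[1:]))
    same = sum(1 for prev, cur in pairs if (prev > 0) == (cur > 0))
    return same / len(pairs)

positive = share_same_sign(simulate_ar1_errors(rho=0.8, n=2000))
negative = share_same_sign(simulate_ar1_errors(rho=-0.8, n=2000))
print(f"rho = +0.8: {positive:.2f} of adjacent errors share a sign")
print(f"rho = -0.8: {negative:.2f} of adjacent errors share a sign")
```

With positive rho, most adjacent errors share a sign; with negative rho, most adjacent errors flip sign.
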
What causes autocorrelation?
• Misspecification
• Data Manipulation
  • Before receipt
  • After receipt
• Event Inertia
• Spatial ordering

Checking for Autocorrelation
• Test: Durbin-Watson statistic: $d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$, where $e_t$ is the residual at $t$
• Regions of the statistic, which ranges from 0 to 4:
  • 0 to d-lower: positive autocorrelation (autocorrelation is clearly evident)
  • d-lower to d-upper: zone of indecision (ambiguous; cannot rule out autocorrelation)
  • d-upper to 4-d-upper: no autocorrelation (autocorrelation is not evident)
  • 4-d-upper to 4-d-lower: zone of indecision
  • 4-d-lower to 4: negative autocorrelation

Consider the following regression:
• From the Statistics option in SPSS [regression output table not preserved]

Find the D-upper and D-lower
• Check a Durbin-Watson table for the numbers for d-upper and d-lower.
• In Hamilton that’s on pp. 355-356
• For n=20 and k=2, α = .05 the values are:
  • Lower = 1.20
  • Upper = 1.41
• Because our value falls between zero and d-lower, we have positive autocorrelation

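The Durbin-Watson check can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard form of the statistic (sum of squared changes in the residuals over the sum of squared residuals). The helper names `durbin_watson` and `dw_zone` are hypothetical, and the residuals are made up for the example; they are not the residuals from the lecture's regression. Only the critical values 1.20 and 1.41 come from the slide.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared first differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def dw_zone(d, d_lower, d_upper):
    """Place d in one of the five regions between 0 and 4."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "indecision"
    if d < 4 - d_upper:
        return "no autocorrelation evident"
    if d <= 4 - d_lower:
        return "indecision"
    return "negative autocorrelation"

# Made-up residuals: a long run of positives followed by a run of negatives.
example_residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
d = durbin_watson(example_residuals)

# Critical values quoted on the slide for n=20, k=2, alpha=.05.
print(round(d, 3), dw_zone(d, d_lower=1.20, d_upper=1.41))
```

Because the table values depend on n and k, the 1.20 and 1.41 bounds are only appropriate for a sample like the one on the slide; the six-point series here is purely to show the arithmetic.
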
The Runs Test
• An alternative to the D-W test is a formalized examination of the signs of the residuals. We would expect that the signs of the residuals will be random in the absence of autocorrelation.
• The first step is to estimate the model and predict the residuals.
• Next, order the signs of the residuals against time (or spatial ordering in the case of cross-sectional data) and see if there are excessive “runs” of positives or negatives. Alternatively, you can graph the residuals and look for the same trends.

Runs test continued
• The final step is to use the expected mean and standard deviation of the number of runs in a standard t-test

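The slide's formulas for the expected mean and deviation are not reproduced in the text. The sketch below assumes the standard Wald-Wolfowitz runs-test formulas, which may differ in notation from the ones shown in lecture. The function name `runs_test` and the residuals are illustrative.

```python
import math

def runs_test(residuals):
    """Return (runs, expected_runs, std_dev, statistic) for the residual signs."""
    signs = [e > 0 for e in residuals if e != 0]  # drop exact zeros
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    # A new run starts every time the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs[:-1], signs[1:]) if a != b)
    expected = 2 * n_pos * n_neg / n + 1
    variance = (
        2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n * n * (n - 1))
    )
    std_dev = math.sqrt(variance)
    statistic = (runs - expected) / std_dev
    return runs, expected, std_dev, statistic

# Made-up residuals ordered in time: five positives, then five negatives.
ordered = [0.9, 1.2, 0.4, 0.7, 1.1, -0.8, -0.3, -1.0, -0.6, -0.2]
runs, expected, std_dev, statistic = runs_test(ordered)
print(runs, expected, round(std_dev, 3), round(statistic, 3))
```

Far fewer runs than expected (a large negative statistic) points toward positive autocorrelation; far more runs than expected points toward negative autocorrelation.
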
More on The D-W
• D-W is not appropriate for auto-regressive (AR) models, where:
• In this case, we use the Durbin alternative test
• For AR models, need to explicitly estimate the correlation between Y_i and Y_{i-1} as a model parameter
• Techniques:
  • AR1 models (closest to regression; 1st order only)
  • ARIMA (any order)

Dealing with Autocorrelation
• There are several approaches to resolving problems of autocorrelation.
  • Lagged dependent variables
  • Differencing the dependent variable
  • GLS
  • ARIMA

Lagged dependent variables
• The most common solution
  • Simply create a new variable that equals Y at t-1, and use it as a RHS variable
• This correction should be based on a theoretic belief for the specification
• Can, at times, cause more problems than it solves
• Also costs a degree of freedom (lost observation)
• There are several advanced techniques for dealing with this as well

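Building the lagged variable and using it on the right-hand side can be sketched as follows. This is a minimal illustration on synthetic, noise-free data, so the fit recovers the generating coefficients; the helper names `make_lagged_design` and `ols`, and the coefficients 1.0, 0.5, and 0.6, are assumptions for the example rather than anything from the lecture data.

```python
def make_lagged_design(x, y):
    """Rows [1, x_t, y_{t-1}] and targets y_t; the first observation is lost."""
    rows = [[1.0, x[t], y[t - 1]] for t in range(1, len(y))]
    target = y[1:]
    return rows, target

def ols(rows, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(k):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [v - factor * p for v, p in zip(aug[i], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Synthetic series: y_t = 1.0 + 0.5 * x_t + 0.6 * y_{t-1}, with no noise.
x = [float((7 * t) % 5) for t in range(30)]
y = [2.0]
for t in range(1, 30):
    y.append(1.0 + 0.5 * x[t] + 0.6 * y[t - 1])

rows, target = make_lagged_design(x, y)
intercept, b_x, b_lag = ols(rows, target)
print(len(y), len(rows))  # one observation is lost to the lag
print(round(intercept, 4), round(b_x, 4), round(b_lag, 4))
```

The design matrix has one fewer row than the original series, which is the lost observation mentioned on the slide.
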
Differencing
• Differencing is simply the act of subtracting the previous observation value from the current observation.
• This process is effective; however, it is an EXPENSIVE correction
  • This technique “throws away” long-term trends
  • Assumes that rho = 1 exactly

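A short sketch of first differencing, using a made-up trending series, shows why the long-term trend disappears. The function name `first_difference` and the numbers are illustrative only.

```python
def first_difference(series):
    """Subtract the previous observation from the current one (y_t - y_{t-1})."""
    return [cur - prev for prev, cur in zip(series[:-1], series[1:])]

# A series with a steady upward trend of 2 per period.
trending = [3.0, 5.0, 7.0, 9.0, 11.0]
differenced = first_difference(trending)
print(differenced)  # the trend is gone; only the period-to-period change remains
```

The differenced series is one observation shorter than the original, and the level and trend of the original series can no longer be read from it.
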
GLS and ARIMA
• GLS approaches use maximum likelihood to estimate Rho and correct the model
  • These are good corrections, and can be replicated in OLS
• ARIMA is an acronym for Autoregressive Integrated Moving Average
  • This process is a univariate “filter” used to cleanse variables of a variety of pathologies before analysis

Corrections based on Rho
• There are several ways to estimate rho, the simplest being to calculate it from the residuals
• We then estimate the regression by transforming the regressors so that:
• and
• This gives the regression:

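The slide's formulas are not reproduced in the text. The sketch below assumes the usual approach: estimate rho by regressing each residual on the previous residual (no intercept), then quasi-difference each variable as value_t - rho * value_{t-1} before rerunning the regression. The function names `estimate_rho` and `quasi_difference` and the residuals are hypothetical; the lecture's own formulas may differ in detail.

```python
def estimate_rho(residuals):
    """Slope from regressing e_t on e_{t-1} without an intercept."""
    numerator = sum(
        residuals[t] * residuals[t - 1] for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals[:-1])
    return numerator / denominator

def quasi_difference(series, rho):
    """Transform a series to value_t - rho * value_{t-1}; drops the first point."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

# Made-up residuals in which each value is half of the previous one.
example_residuals = [1.0, 0.5, 0.25, 0.125]
rho = estimate_rho(example_residuals)
print(rho)

# The same transformation would be applied to Y and to each X before refitting.
print(quasi_difference(example_residuals, rho))
```

Setting rho to exactly 1 in `quasi_difference` reduces it to ordinary first differencing, which is why differencing is described as assuming rho = 1.
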
Estimating the relationship between X and Y
• First, we can estimate the lagged dependent variable model.

Now the regression correcting for Rho
• We can estimate Rho by calculating it.
• ρ = .587

Final thoughts
• Each correction has a “best” application.
• If we wanted to evaluate a mean shift (dummy variable only model), calculating rho will not be a good choice. Then we would want to use the lagged dependent variable.
• Also, where we want to test the effect of inertia, it is probably better to use the lag
• In small-N samples, calculating rho tends to be more accurate

Homework
• Using the data that accompany this lecture, estimate the effect of X on Y.
  • Run the regular regression, a lagged dependent variable model, and calculate rho.
• Next, test the effect of dummy variable X2 on the series Y2.
  • Run a regular regression, then run a regression with a lagged dependent variable.
• Write a brief description of what problems neglecting the effect of time in the second model might cause a decision-maker.

Break
Coming up…
• Review for Exam
• Exam Posting
  • Available on Wednesday Morning, 10am