DSCI 5340: Predictive Modeling and Business Forecasting Spring 2013 – Dr. Nick Evangelopoulos. Lecture 7: Box-Jenkins Models – Part II (Ch. 9). Material based on: Bowerman-O’Connell-Koehler, Brooks/Cole. Homework in Textbook. Page 438 Ex 9.2, Ex 9.3, Ex 9.4. Ex 9.2 Page 438.
Homework in Textbook Page 438 Ex 9.2, Ex 9.3, Ex 9.4
Ex 9.3a Page 438 Autocorrelations Die Down Slowly: Series is Not Stationary
Ex 9.3d Page 438 Autocorrelations Cut off Quickly – Series is Stationary
Ex 9.3e Page 438 Interpret SAC & SPAC • SAC dies down exponentially and • SPAC cuts off after lag 1, therefore…
Ex 9.4a Page 438 • …the series is AR(1)
Ex 9.4d Page 439 part 1
• y3hat = 3.06464 + (.64774 + 1)*y2 - .64774*y1
• y3hat = 3.06464 + (.64774 + 1)*239 - .64774*235
• y3hat = 244.6556
• y3 - y3hat = 244.090 - 244.6556 = -.5656
Ex 9.4d Page 439 part 2
At time origin 90:
• y91hat = 3.06464 + (.64774 + 1)*y90 - .64774*y89
• y91hat = 3.06464 + (.64774 + 1)*1029.480 - .64774*1018.420
• y91hat = 1039.708
• y92hat = 3.06464 + (.64774 + 1)*y91hat - .64774*y90
• y92hat = 3.06464 + (.64774 + 1)*1039.708 - .64774*1029.480
• y92hat = 1049.398
• y93hat = 3.06464 + (.64774 + 1)*y92hat - .64774*y91hat
• y93hat = 3.06464 + (.64774 + 1)*1049.398 - .64774*1039.708
• y93hat = 1058.739
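The hand calculations in Ex 9.4d can be checked with a short script. This is a minimal sketch, assuming the forecast equation yhat_t = 3.06464 + (1 + .64774)*y_{t-1} - .64774*y_{t-2} used on the slides; the function name `forecast` is hypothetical. Because the script carries full precision between steps while the slides round each forecast to three decimals, the later values may differ from the slide values in the last digit.

```python
# Coefficients taken from the worked example (Ex 9.4d).
DELTA = 3.06464   # constant term
PHI = 0.64774     # AR(1) coefficient on the first differences


def forecast(y_prev2, y_prev1, steps):
    """Recursive point forecasts from
    yhat_t = DELTA + (1 + PHI) * y_{t-1} - PHI * y_{t-2}."""
    history = [y_prev2, y_prev1]
    out = []
    for _ in range(steps):
        nxt = DELTA + (1 + PHI) * history[-1] - PHI * history[-2]
        out.append(nxt)
        history.append(nxt)  # later steps reuse earlier forecasts
    return out


# Part 1: one-step forecast of y3 from y1 = 235, y2 = 239, and its residual.
y3_hat = forecast(235, 239, 1)[0]
residual = 244.090 - y3_hat

# Part 2: forecasts of y91, y92, y93 made at time origin 90.
f91, f92, f93 = forecast(1018.420, 1029.480, 3)

print(round(y3_hat, 4), round(residual, 4))
print(round(f91, 3), round(f92, 3), round(f93, 3))
```

Forecasts beyond one step ahead substitute earlier forecasts for the unobserved values, which is why `history` is extended inside the loop.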
Autoregressive Moving Average Models • A time series that is a linear function of p past values plus a linear combination of q past errors is called an autoregressive moving average process of order (p,q), denoted ARMA(p,q). It is also denoted ARIMA(p,0,q).
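Written out, the definition takes the following form. This is a sketch under an assumed notation: $\delta$ is a constant, $a_t$ is the random shock at time $t$, and the moving average terms carry minus signs (a common Box-Jenkins convention; some software uses plus signs instead).

```latex
y_t = \delta
      + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p}
      + a_t
      - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \dots - \theta_q a_{t-q}
```

Setting $q = 0$ leaves a pure AR(p) model, and setting $p = 0$ leaves a pure MA(q) model.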
Box-Jenkins ARIMAX Models
• ARIMAX: AutoRegressive Integrated Moving Average with eXogenous variables
• AR: Autoregressive. The time series is a function of its own past.
• MA: Moving Average. The time series is a function of past shocks (deviations, innovations, errors, and so on).
• I: Integrated. Differencing provides stochastic trend and seasonal components, so forecasting requires integration (undifferencing).
• X: Exogenous. The time series is influenced by external factors. (These input variables can actually be endogenous or exogenous.)
Determine Whether the SAC or the SPAC is Cutting Off More Abruptly
What if SAC and SPAC Are Not Significant for any Lags? • This could happen if the time series is white noise:
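To see what this looks like in practice, the sketch below simulates white noise and counts how many sample autocorrelations fall outside a rough band of two standard errors. The band 2/sqrt(n), the series length, the seed, and the function name `sample_acf` are assumptions made for illustration; they do not come from the slides.

```python
import math
import random


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


random.seed(0)  # fixed seed so the run is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(500)]

bound = 2 / math.sqrt(len(noise))  # rough two-standard-error band
acf = sample_acf(noise, 20)
n_outside = sum(abs(r) > bound for r in acf)
print(f"{n_outside} of {len(acf)} lags fall outside +/-{bound:.3f}")
```

For white noise, only about one lag in twenty would be expected to cross the band by chance, so an isolated spike is not by itself evidence of AR or MA structure.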
The Backshift Operator
The backshift operator $B^k$ (sometimes $L^k$ is used) shifts a time series by k time units.
• Shift 1 time unit: $B y_t = y_{t-1}$
• Shift 2 time units: $B^2 y_t = y_{t-2}$
• Shift k time units: $B^k y_t = y_{t-k}$
The backshift operator notation is a convenient way to write ARMA models.
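The backshift operator can be mimicked on a list of observations. This is a minimal sketch, assuming 0 <= k <= len(y); the helper names `backshift` and `difference` are hypothetical. It also shows that first differencing is the operator (1 - B) applied to the series.

```python
def backshift(y, k=1):
    """Return B^k y: position t holds y[t-k]; the first k entries are
    undefined and are marked with None."""
    return [None] * k + list(y[:len(y) - k])


def difference(y):
    """First difference (1 - B) y_t = y_t - y_{t-1}, dropping the
    undefined first term."""
    lagged = backshift(y, 1)
    return [cur - prev for cur, prev in zip(y[1:], lagged[1:])]


y = [1, 4, 9, 16]
print(backshift(y, 1))
print(backshift(y, 2))
print(difference(y))
```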
ACF and PACF after 1-Lag Differencing • Indication of MA(1) or MA(2), with sharp cut-off after lag 2 • Damping pattern eliminates AR possibility
Classical Decomposition (Box-Jenkins) Procedure
• Verify presence of any seasonal or time-based trends
• Achieve data stationarity using techniques such as “differencing”, where you difference consecutive data points up to N-lag
• Use the sample Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) to see if the data follows a Moving Average (MA) or Autoregressive (AR) process, respectively
• p: AR order; d: differencing order; q: MA order
• Apply “goodness of fit” measures (e.g., Akaike Information Criterion) to the selected model orders to choose among candidate model fits
Stationarity of the AR process • If an AR model is not stationary, this implies that previous values of the error term will have a non-declining effect on the current value of the dependent variable. • This implies that the coefficients on the MA process would not converge to zero as the lag length increases. • For an AR model to be stationary, the coefficients on the corresponding MA process decline with lag length, converging on 0.
AR Process • The test for stationarity in an AR model (with p lags) is that all roots of the characteristic equation lie outside the unit circle (i.e., have modulus > 1), where the characteristic equation is: $1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = 0$
Unit Root • Testing any variable for stationarity is described as testing for a ‘unit root’, which is based on this same idea. • The most basic AR model is the AR(1) model, on which most tests for stationarity are based, such as the Dickey-Fuller test.
Unit Root Test • $y_t = y_{t-1} + a_t$ • $(1 - L)\,y_t = a_t$ (L is the backshift operator) • $1 - z = 0$ (This is the characteristic equation)
Unit Root Test • For the AR(1) model with ϕ1 = 1, the characteristic equation (1 - z) = 0 has a root of z = 1. This lies on the unit circle, rather than outside it, so we conclude that the process is non-stationary. • As we increase the number of lags in the AR model, the potential number of roots increases. With 2 lags we have a quadratic equation producing 2 roots; for the model to be stationary, both need to lie outside the unit circle.
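The root condition can be checked numerically for one or two lags. This is a minimal sketch, assuming the characteristic equation is written as 1 - ϕ1*z - ϕ2*z^2 = 0; the function names `ar2_roots` and `is_stationary` and the coefficient values are hypothetical.

```python
import cmath


def ar2_roots(phi1, phi2=0.0):
    """Roots of 1 - phi1*z - phi2*z**2 = 0 for an AR(1) or AR(2) model."""
    if phi2 == 0:
        # AR(1): a single root z = 1/phi1 (no root at all if phi1 is 0).
        return [] if phi1 == 0 else [complex(1 / phi1)]
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)  # may be complex
    return [(phi1 + disc) / (-2 * phi2), (phi1 - disc) / (-2 * phi2)]


def is_stationary(phi1, phi2=0.0):
    """True when every root lies strictly outside the unit circle."""
    return all(abs(z) > 1 for z in ar2_roots(phi1, phi2))


print(is_stationary(0.5))        # AR(1), root z = 2
print(is_stationary(1.0))        # unit root, z = 1
print(is_stationary(0.5, 0.3))   # AR(2), both roots outside
print(is_stationary(0.5, 0.6))   # AR(2), one root inside
```

Using `cmath.sqrt` keeps the check valid when the two roots are a complex-conjugate pair, in which case `abs` returns their modulus.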
One Example: The Dickey-Fuller Single Mean Test
Model: $y_t = \mu + \phi_1 y_{t-1} + a_t$
Null Hypothesis: $H_0: \phi_1 = 1$
Alternative Hypothesis: $H_a: |\phi_1| < 1$
Mean of an AR(1) Process • The (unconditional) mean for an AR(1) process with a constant (μ) is given by: $E(y_t) = \dfrac{\mu}{1 - \phi_1}$ • For ϕ1 = 1, the mean drifts to infinity and the process is non-stationary
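Variance of an AR(1) Process • The (unconditional) variance for an AR process of order 1 (excluding the constant), where $\sigma^2$ is the variance of the error term, is: $\operatorname{Var}(y_t) = \dfrac{\sigma^2}{1 - \phi_1^2}$ • For ϕ1 = 1, the variance drifts to infinity and the process is non-stationary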
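The behavior near ϕ1 = 1 can be illustrated numerically. This is a minimal sketch, assuming the standard results mean = μ/(1 - ϕ1) and variance = σ²/(1 - ϕ1²) for |ϕ1| < 1, where σ² is the error variance; the function names and the input values are hypothetical.

```python
def ar1_mean(mu, phi1):
    """Unconditional mean of y_t = mu + phi1*y_{t-1} + a_t."""
    if abs(phi1) >= 1:
        raise ValueError("no finite unconditional mean when |phi1| >= 1")
    return mu / (1 - phi1)


def ar1_variance(sigma2, phi1):
    """Unconditional variance, where sigma2 is the error variance."""
    if abs(phi1) >= 1:
        raise ValueError("no finite unconditional variance when |phi1| >= 1")
    return sigma2 / (1 - phi1 ** 2)


# Both quantities grow without bound as phi1 approaches 1.
for phi1 in (0.5, 0.9, 0.99):
    print(phi1, ar1_mean(1.0, phi1), ar1_variance(1.0, phi1))
```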
ADF: Augmented Dickey-Fuller Test for Unit Root

    proc arima data = TowelSales;
      identify var = y(1) nlag = 15 stationarity = (adf = (2));
      title "ARIMA Stationarity Analysis";
    run;

    Type         Lags       Rho   Pr < Rho     Tau   Pr < Tau       F   Pr > F
    Zero Mean       2  -112.754     0.0001   -6.09     <.0001
    Single Mean     2  -112.743     0.0001   -6.07     <.0001   18.40   0.0010
    Trend           2  -120.735     0.0001   -6.18     <.0001   19.12   0.0010

Reject unit root: conclude AR(2) is stationary.
Scan Procedure: Use for Preliminary Estimate

    proc arima data = TowelSales;
      identify var = y(1) nlag = 15 scan;
      title "ARIMA Analysis";
    run;

In this example, ARIMA(2,2) is the simplest model that yields insignificant terms: Model notation:
Tentative Model from Output: MA(1) or ARIMA(0,0,1) • The simplest model that has a high probability is MA(1): • AR(1) has a low probability • AR(2) is more complex • ARMA(1,1) is more complex
Forecast Model Building: Fit and Holdout Samples
Fit Sample
• Used to estimate model parameters for accuracy evaluation
• Used to forecast values in holdout sample
Holdout Sample
• Used to evaluate model accuracy
• Simulates retrospective study
Full = Fit + Holdout data is used to fit deployment model
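The fit/holdout workflow can be sketched in a few lines. This is an illustration only: the data are made up, the helper names are hypothetical, and a random-walk-with-drift rule stands in for a fitted ARIMA model so the example stays self-contained.

```python
def split_fit_holdout(series, n_holdout):
    """Split a series into a fit sample and the last n_holdout points."""
    return series[:-n_holdout], series[-n_holdout:]


def drift_forecast(fit, steps):
    """Random-walk-with-drift forecasts estimated on the fit sample
    (a placeholder for a fitted ARIMA model)."""
    drift = (fit[-1] - fit[0]) / (len(fit) - 1)
    return [fit[-1] + drift * h for h in range(1, steps + 1)]


def mean_absolute_error(actual, predicted):
    """Average absolute forecast error over the holdout sample."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [10, 12, 14, 16, 18, 20, 22, 24]  # made-up illustrative data
fit, holdout = split_fit_holdout(series, 2)
forecasts = drift_forecast(fit, len(holdout))
mae = mean_absolute_error(holdout, forecasts)
print(forecasts, mae)
```

Once a model has been chosen on holdout accuracy, it would be re-estimated on the full series (fit plus holdout) before deployment.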
Homework in Textbook Pages 443-445 Ex 9.5, Ex 9.6, Ex 9.7