Forecasting: Box-Jenkins method (tseries) By Mike Jadoo
Special Thank You: MLK Library, Digital Commons lab
Purpose • Bring about an awareness • Enable individuals to properly create analysis • Select the most appropriate model(s).
Lecture Structure • Slides, code, and datasets are in the group's Meetup files section
Lecture Structure • Using R i386 3.2.3 or RStudio • You can follow the lecture and hack along, or hack afterwards • There will be some web exercises
Data series • Data: Job openings, not seasonally adjusted (NSA) • Time period: 12-01-2000 to 05-01-2015 • Frequency: Monthly • Source: Job Openings and Labor Turnover Survey (JOLTS); Bureau of Labor Statistics
Forecasting “a process of making statements about events or predicting values whose actual outcomes have not yet been observed” There is no such thing as a perfect forecast.
Some types of Forecasting • Naive method: forecast based solely on the previous period • Average • Application of a growth rate • Judgmental methods (based on subjective probability)
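The first three methods above can be sketched in a few lines of base R. The numbers are invented for illustration and are not JOLTS data; the variable names are likewise assumptions.

```r
# Toy monthly values, invented purely for illustration (not JOLTS data).
y <- c(100, 104, 103, 108, 110, 115)

# Naive method: the next value equals the last observed value.
naive_fc <- tail(y, 1)

# Average method: the next value equals the mean of all observations.
mean_fc <- mean(y)

# Growth-rate method: apply the average period-over-period growth
# rate to the last observed value.
growth <- mean(diff(y) / head(y, -1))
growth_fc <- tail(y, 1) * (1 + growth)

print(c(naive = naive_fc, average = mean_fc, growth = growth_fc))
```

These simple forecasts are useful as benchmarks: a Box-Jenkins model that cannot beat them is probably not worth the extra complexity.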
History of Forecasting ARIMA • George Box: became an accidental statistician. Started off as a chemist in school but joined the British Army in WWII and eventually became a statistician. • Gwilym Meirion Jenkins: was a Welsh statistician and systems engineer. Met Box in Wisconsin and started sharing ideas about time series analysis.
Technical Note • ARMA: autoregressive moving average (“AR” = autoregressive, “MA” = moving average) • ARIMA: autoregressive integrated moving average • Autoregressive: the current value depends on past values of the series, so past values have an impact on the forecast. • Moving average: past unobserved values (the random shocks, or errors) have an impact on the forecast.
Technical Note • ARMA(p,q) “p” is AR order “q” is the MA order • ARIMA(p,d,q) “d” is the difference order
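A minimal sketch of how the (p,d,q) orders map onto a call to `arima()` from base R's stats package. The series is simulated and the coefficients are arbitrary assumptions, not estimates from the openings data.

```r
set.seed(1)

# Simulated ARMA(1,1) series; the coefficients 0.6 and 0.3 are arbitrary.
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)

# ARMA(1,1) is ARIMA(1,0,1): order = c(p, d, q) with d = 0.
fit_arma <- arima(x, order = c(1, 0, 1))
print(coef(fit_arma))

# Cumulating the series makes it non-stationary; d = 1 tells arima()
# to difference once before fitting the ARMA part.
y <- cumsum(as.numeric(x))
fit_arima <- arima(y, order = c(1, 1, 1))
print(coef(fit_arima))
```

Note that `arima()` drops the intercept when d > 0, so the second fit reports only the AR and MA coefficients.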
Seasonal autoregressive integrated moving average: SARIMA(P,D,Q) • P = number of seasonal autoregressive (SAR) terms • D = number of seasonal differences • Q = number of seasonal moving average (SMA) terms • The capital P and Q denote the seasonal AR and MA orders.
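As a rough illustration, the seasonal orders (P,D,Q) go into the `seasonal` argument of base R's `arima()`. The monthly series below is simulated with an AR term at lag 1 and a seasonal AR term at lag 12; all coefficients are assumptions chosen for the example.

```r
set.seed(2)

# (1 - 0.4B)(1 - 0.5B^12) expands to AR coefficients at lags 1, 12 and 13.
ar_coefs <- c(0.4, rep(0, 10), 0.5, -0.2)

# Simulated monthly series (illustrative only, not the openings data).
x <- ts(arima.sim(model = list(ar = ar_coefs), n = 240), frequency = 12)

# Non-seasonal order (p,d,q) = (1,0,0); seasonal order (P,D,Q) = (1,0,0)
# with a period of 12 months.
fit <- arima(x, order = c(1, 0, 0),
             seasonal = list(order = c(1, 0, 0), period = 12))
print(coef(fit))
```

The coefficient named `sar1` is the seasonal AR term; seasonal MA terms would appear as `sma1`, and so on.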
Methodology (diagram) • Stages: data sources, data exploration, model building, reporting • Box labels: accessing data sources; scatter/line plot; summary statistics; histogram; missing observations; create model(s); testing/checking assumptions; statistics of fit; interpretation of coefficients; report results; scoring; interpretation; analysis
Forecasting Flow Chart • Review the topic's theory and methodology • Plot series • Apply transformation • Test series for stability • Model selection • Create the model • Reformulate the model • Check the parameter estimates and residuals • Forecast
TASK • What are you trying to forecast? (data series, cycle) • How many periods ahead? (static/dynamic) • Go over the topic's relevant theory. This may involve extensive reading, but it is a good first step!
Finding the data sources • Government entities (a good first step!). Can't find the data you're looking for? Staff are there to help. • There are more providers of data; some charge a fee. • Construct it yourself
Finding the data sources • Review the data series methodology • Number of observations- 20 to 25 minimum
Forecaster's Checklist • Plot your series • Check for normality • Check for deterministic/seasonal trend • Data smoothing methods • Create the correlogram / model selection method • Create the forecast • Check residuals for white noise • Statistics of fit
Line Plot
# Line plot of the job openings series
plot(df$JTUJOL, type="l", main="Job Openings (NSA)", xlab="OBSERVATION (MONTHLY)", ylab="JOB OPENINGS")
Descriptive Statistics • library(psych) • describe(df$JTUJOL)
Missing Observations • Data is never clean; there will always be a need to fix the series you are working with. • There are many ways to fix missing observations; the choice is at the researcher's discretion. Document your process.
Dealing with Missing Observations • Some options (functions from the timeSeries package):
# Remove NAs from the matrix
removeNA(x)
# Substitute NAs with zeros, the mean, or the median
substituteNA(x, type = "zeros")
substituteNA(x, type = "mean")
substituteNA(x, type = "median")
# Linear interpolation
interpNA(x, method = "linear")
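For readers without the timeSeries package, the same options can be approximated in base R. This is a sketch on a toy vector; it is a swapped-in base R equivalent, not the package functions named on the slide.

```r
# Toy vector with gaps, invented for illustration.
x <- c(5, NA, 7, 8, NA, 10)

# Option 1: drop the missing observations.
x_removed <- x[!is.na(x)]

# Option 2: substitute the mean or the median of the observed values.
x_mean   <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
x_median <- ifelse(is.na(x), median(x, na.rm = TRUE), x)

# Option 3: linear interpolation between neighbouring observed values.
# approx() ignores the NA pairs and interpolates at every position.
x_linear <- approx(seq_along(x), x, xout = seq_along(x))$y

print(x_linear)
```

For a time series, dropping observations shifts every later date, so substitution or interpolation is usually the safer choice.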
Test for Normality • Create a histogram • Jarque-Bera test (jarqueberaTest() from the fBasics package)
jarqueberaTest(df$JTUJOL, title = NULL, description = NULL)
Title: Jarque - Bera Normalality Test
Test Results: STATISTIC: X-squared: 1.1813 P VALUE: Asymptotic p Value: 0.554
Hypothesis • H0: normally distributed • Ha: not normally distributed • With p = 0.554, normality is not rejected at the 5% level.
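To see what the test measures, the Jarque-Bera statistic can be computed by hand from the sample skewness and kurtosis. This is a sketch on simulated normal data (an assumption, not the openings series), using the standard formula JB = n/6 * (S^2 + (K - 3)^2 / 4) with a chi-squared(2) reference distribution.

```r
set.seed(9)

# Simulated normal data standing in for the real series.
x <- rnorm(200)

n  <- length(x)
m  <- mean(x)
s2 <- mean((x - m)^2)                 # variance with divisor n

skew <- mean((x - m)^3) / s2^1.5      # sample skewness S
kurt <- mean((x - m)^4) / s2^2        # sample kurtosis K (3 for a normal)

# Jarque-Bera statistic and its asymptotic p-value.
jb      <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)

print(c(statistic = jb, p.value = p_value))
```

A large p-value means the skewness and kurtosis are close enough to those of a normal distribution that H0 is not rejected.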
Trends • Deterministic and seasonal • Use the ts() function, then the tslm() function (forecast package)
# monthly series
myts <- ts(df$JTUJOL, start=c(2000, 12), end=c(2015, 5), frequency=12)
# fit a linear model to check for seasonal and deterministic trend
fit <- tslm(myts ~ trend + season)
summary(fit)
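The same check can be sketched without the forecast package by building the trend and season regressors by hand and passing them to `lm()`. The series below is simulated with a known trend and seasonal pattern; it is not the openings data, and the variable names are assumptions.

```r
set.seed(3)
n <- 120

# Simulated monthly series: linear trend + seasonal sine wave + noise.
y <- ts(50 + 0.2 * (1:n) + 3 * sin(2 * pi * (1:n) / 12) + rnorm(n),
        start = c(2005, 1), frequency = 12)

# Regressors: time in years as the trend, month of year as a factor.
trend  <- as.numeric(time(y))
season <- factor(cycle(y))

# One intercept, one trend slope, and 11 monthly dummies.
fit <- lm(as.numeric(y) ~ trend + season)
print(summary(fit)$coefficients)
```

Significant `trend` or `season` coefficients suggest a deterministic or seasonal component that should be handled before fitting an ARIMA model.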
Data Smoothing Methods • Moving average: used to smooth a time series that appears to have either large outliers or patterns • Exponential smoothing methods: the choice depends on the type of data you are using. Simple exponential: for data with no trend and no seasonality. Double exponential: for data with a trend but no seasonality. Triple exponential: for data with trend and seasonal components.
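A brief sketch of these smoothers using base R only: `stats::filter()` for a centered moving average and `HoltWinters()` for the three exponential variants. The series is simulated (an assumption), and the smoothing parameters are left for `HoltWinters()` to estimate.

```r
set.seed(4)
n <- 120

# Simulated monthly series with trend and seasonality (illustrative only).
y <- ts(50 + 0.2 * (1:n) + 3 * sin(2 * pi * (1:n) / 12) + rnorm(n),
        start = c(2005, 1), frequency = 12)

# Centered 3-term moving average; the first and last values are NA.
ma3 <- stats::filter(y, rep(1 / 3, 3), sides = 2)

# Simple exponential smoothing: no trend, no seasonal component.
hw_simple <- HoltWinters(y, beta = FALSE, gamma = FALSE)

# Double exponential smoothing: trend, no seasonal component.
hw_double <- HoltWinters(y, gamma = FALSE)

# Triple exponential smoothing: trend and seasonal components.
hw_triple <- HoltWinters(y)

print(c(simple = hw_simple$SSE, double = hw_double$SSE, triple = hw_triple$SSE))
```

Comparing the sum of squared one-step errors (`SSE`) gives a first indication of which variant suits the data.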
Data Smoothing Methods • Seasonal Adjustments: X-12-ARIMA X-13-ARIMA-SEATS (there is no R package that replicates this process, just a wrapper) • Weather adjustment algorithm
Checking for stationarity • Augmented Dickey-Fuller (ADF) test: tests the null hypothesis that the series has a unit root (is non-stationary). • Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test: tests the null hypothesis that the series is stationary around a deterministic trend.
Checking for stationarity • ADF
Checking for stationarity • KPSS
Checking for stationarity • KPSS using the first-differenced variable
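A sketch of both tests with the tseries package, run on a simulated random walk (an assumption standing in for the openings series) and then on its first difference. The two tests have opposite null hypotheses, so they are read in opposite directions.

```r
library(tseries)  # provides adf.test() and kpss.test()

set.seed(42)

# Simulated random walk: non-stationary by construction (illustrative only).
x <- cumsum(rnorm(200))

# ADF: H0 = unit root. KPSS: H0 = stationary around a deterministic trend.
# Both functions warn when the p-value falls outside their lookup table.
adf_level  <- suppressWarnings(adf.test(x))
kpss_level <- suppressWarnings(kpss.test(x, null = "Trend"))

# First difference, then re-test (KPSS now with a level-stationary null).
dx <- diff(x)
adf_diff  <- suppressWarnings(adf.test(dx))
kpss_diff <- suppressWarnings(kpss.test(dx, null = "Level"))

print(c(adf_level = adf_level$p.value, kpss_level = kpss_level$p.value,
        adf_diff = adf_diff$p.value, kpss_diff = kpss_diff$p.value))
```

A series is treated as stationary when the ADF test rejects its null (small p-value) and the KPSS test does not reject its null (large p-value).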
Correlogram • R code:
par(mfrow=c(2,1))
acf(gdf$fdJTUJOL)
pacf(gdf$fdJTUJOL)
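To show how a correlogram guides order selection, the sketch below simulates an integrated AR(1) series, differences it, and inspects the ACF and PACF numerically. The data and the 0.7 coefficient are assumptions for illustration.

```r
set.seed(21)

# Simulated ARIMA(1,1,0) series; needs one difference to become stationary.
level <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.7), n = 300)

# First-differenced variable (the analogue of fdJTUJOL on the slide).
fd <- diff(level)

# plot = FALSE returns the values instead of drawing the chart.
a <- acf(fd, lag.max = 12, plot = FALSE)
p <- pacf(fd, lag.max = 12, plot = FALSE)

# Approximate 95% significance bound drawn on the correlogram.
bound <- 2 / sqrt(length(fd))

print(round(as.numeric(p$acf)[1:4], 3))
print(bound)
```

For an AR(p) process the PACF cuts off after lag p while the ACF decays gradually; for an MA(q) process the roles are reversed.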
ARIMA(p,d,q) (p = AR order, d = differencing order, q = MA order) • The AIC and SIC help identify the most parsimonious model(s) • Run several ARIMA models and collect the AIC/SIC in an AR-by-MA matrix • Or run auto.arima()
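One way to build the AR-by-MA matrix is a double loop over candidate orders, sketched here with base R's `arima()`. The data are simulated from an AR(2) process (an assumption), and the search range of 0 to 2 is arbitrary. `BIC()` gives the Schwarz criterion (SIC).

```r
set.seed(7)

# Simulated stationary AR(2) series (illustrative only).
x <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 300)

max_p <- 2
max_q <- 2
aic_mat <- matrix(NA_real_, max_p + 1, max_q + 1,
                  dimnames = list(paste0("AR", 0:max_p), paste0("MA", 0:max_q)))
bic_mat <- aic_mat

for (p in 0:max_p) {
  for (q in 0:max_q) {
    # Some orders may fail to converge; leave those cells as NA.
    fit <- tryCatch(arima(x, order = c(p, 0, q)), error = function(e) NULL)
    if (!is.null(fit)) {
      aic_mat[p + 1, q + 1] <- AIC(fit)
      bic_mat[p + 1, q + 1] <- BIC(fit)
    }
  }
}

print(round(aic_mat, 1))
best <- which(aic_mat == min(aic_mat, na.rm = TRUE), arr.ind = TRUE)
print(best)
```

The cell with the lowest value is the preferred model; the SIC penalizes extra parameters more heavily than the AIC, so it tends to pick smaller models.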
ARIMA selection auto.arima()
ARIMA(4,0,2) • Out-of-sample forecast: forecast(fit, h=?), where fit is the fitted model and h is the number of periods ahead
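A base R sketch of an out-of-sample forecast using `predict()` on a fitted `arima()` model, as an alternative to the forecast package. The series is simulated and the AR(1) order is an assumption; the intervals use the usual normal approximation.

```r
set.seed(11)

# Simulated AR(1) series standing in for the real data.
x <- arima.sim(model = list(ar = 0.6), n = 150)
fit <- arima(x, order = c(1, 0, 0))

# Forecast h periods beyond the end of the sample.
h <- 12
fc <- predict(fit, n.ahead = h)

# Approximate 95% prediction interval from the forecast standard errors.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

print(cbind(forecast = fc$pred, lower, upper))
```

The interval widens as the horizon grows, which is a useful reminder that far-ahead forecasts carry more uncertainty.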
White noise (residual analysis) • Create an ACF correlogram of the residuals • Construct the Ljung-Box test
White noise (residual analysis) • Create an ACF correlogram of the residuals acf(forecast$residuals, lag.max=10)
White noise (residual analysis) • Construct the Ljung-Box test Box.test(forecast$residuals, lag=10, type="Ljung-Box")
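The residual check can be sketched end to end in base R. The data are simulated from an AR(1) process (an assumption), so a correctly specified model should leave white-noise residuals. The `fitdf` argument, which subtracts the number of estimated ARMA coefficients from the degrees of freedom, is an optional refinement not shown on the slide.

```r
set.seed(5)

# Simulated AR(1) series and a matching AR(1) fit (illustrative only).
x <- arima.sim(model = list(ar = 0.6), n = 200)
fit <- arima(x, order = c(1, 0, 0))
res <- residuals(fit)

# Residual autocorrelations up to lag 10 (values only, no plot).
res_acf <- acf(res, lag.max = 10, plot = FALSE)

# Ljung-Box test. H0: the residuals are independently distributed.
# fitdf = 1 accounts for the single estimated AR coefficient.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
print(lb)
```

A large p-value means H0 is not rejected, so the residuals are consistent with white noise; a small p-value suggests the model should be reformulated.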
Statistics of fit • A way of grading your model against others
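Common statistics of fit compare forecasts with held-out actual values. The sketch below computes three of them (MAE, RMSE, MAPE) on toy numbers invented for illustration; the slide does not say which statistics the lecture used, so this selection is an assumption.

```r
# Toy held-out actuals and the forecasts made for them (illustrative).
actual   <- c(100, 110, 120)
forecast <- c(98, 113, 119)

err <- actual - forecast

mae  <- mean(abs(err))                 # mean absolute error
rmse <- sqrt(mean(err^2))              # root mean squared error
mape <- mean(abs(err / actual)) * 100  # mean absolute percentage error

print(c(MAE = mae, RMSE = rmse, MAPE = mape))
```

Lower values indicate a better fit, and computing the same statistics for every candidate model on the same hold-out period makes the comparison fair.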
Model Interpretation • AR: the previous values of the openings data have an ____ effect on the forecasted estimate. • MA: the previous unobserved values (errors) of the openings data have an ____ effect on the forecasted estimate.
Presentation • Use a combination of tabular form and line charts
Summary • Orientation • History • Statistical model process • Data sources • Forecasting model in R • Interpretation of model parameter estimates • Reporting
Final Note • George Box- Accidental Statistician
Coming Soon • Forecasting stock prices • Vector Auto-regression (VAR)
Announcements • Last session: Geospatial Analysis with R • Survey: your feedback is welcome