IT Applications in Business Analytics
Lecture 09 – Time Series Regression
Thomas Zeutschler
Business Analytics (M.Sc.), IT in Business Analytics
IT Applications in Business Analytics - 09. Time Series Regression
Let’s get started…
“Prediction is very difficult, especially when it’s about the future.” (Niels Bohr)
Regression Analysis
• Regression analysis is a class of statistical methods for describing the relationship between one dependent variable and one or more independent variables.
• Many economic time series are robustly related to each other, for example:
  • Oil price → fuel price
  • US 12-month average fuel price → engine displacement of US cars
  • Average US income → engine displacement of US cars
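A minimal sketch of the first relation (oil price → fuel price) as a simple linear regression. The data are simulated for illustration only and are not real prices; the names `oil`, `fuel` and `fit` and all numbers are assumptions.

```r
# Simulated example data (NOT real prices): fuel price depends linearly on oil price
set.seed(123)
oil  <- runif(100, min = 40, max = 120)            # hypothetical oil price
fuel <- 0.5 + 0.01 * oil + rnorm(100, sd = 0.05)   # hypothetical fuel price with noise

# Fit fuel (dependent variable) as a linear function of oil (independent variable)
fit <- lm(fuel ~ oil)
coef(fit)   # estimated intercept and slope
```

The estimated slope should land close to the value 0.01 used to generate the data.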
Time Series Dependencies
Nonlinear Time Series
Nonlinear Time Series
• A nonlinear time series (process) is any stochastic process that is not linear.
• Nonlinear time series are generated by nonlinear dynamic equations. They display features that cannot be modelled by linear processes:
  • time-changing variance,
  • asymmetric cycles,
  • higher-moment structures,
  • thresholds and breaks.
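To make "time-changing variance" concrete, here is a small sketch that simulates an ARCH(1)-type process, one common textbook example of a nonlinear series (chosen here for illustration, not taken from the lecture). The parameters 0.5 and 0.5 are arbitrary assumptions.

```r
# Simulate a series whose variance depends on the previous squared value (ARCH(1))
set.seed(11)
n  <- 5000
e  <- numeric(n)    # the simulated series
s2 <- numeric(n)    # its conditional variance
s2[1] <- 1
e[1]  <- rnorm(1)
for (i in 2:n) {
  s2[i] <- 0.5 + 0.5 * e[i - 1]^2      # variance changes over time
  e[i]  <- sqrt(s2[i]) * rnorm(1)
}

# Lag-1 autocorrelation of the levels and of the squared values
a_lvl <- acf(e,   plot = FALSE)$acf[2]
a_sq  <- acf(e^2, plot = FALSE)$acf[2]
c(levels = a_lvl, squares = a_sq)
```

The levels look almost uncorrelated, while the squared values are clearly autocorrelated. A linear model of the levels would miss this structure.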
Time Series in R
• astsa R package
  • A collection of time series analysis methods
  • A package containing some sample data sets
  • By David Stoffer, “Data sets and scripts for Time Series Analysis and Its Applications: With R Examples”, http://www.stat.pitt.edu/stoffer/tsa3/
Time Series – Use Case
• El Niño and the Fish
• The Southern Oscillation Index (SOI) gives an indication of the development and intensity of El Niño or La Niña events in the Pacific Ocean.
• The SOI is calculated using the pressure differences between Tahiti and Darwin:
  SOI = 10 × (Pdiff − Pdiffav) / SD(Pdiff)
  where
  Pdiff = (average Tahiti MSLP for the month) − (average Darwin MSLP for the month),
  Pdiffav = long-term average of Pdiff for the month in question, and
  SD(Pdiff) = long-term standard deviation of Pdiff for the month in question.
  (MSLP = mean sea level pressure.)
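The SOI formula can be sketched in R as follows. The helper `compute_soi()` is hypothetical, the pressure values are simulated and not real observations, and the "long-term" statistics are simply taken over the whole simulated sample (an assumption; in practice a fixed base period is used).

```r
# Hypothetical helper: SOI from monthly MSLP series and a calendar-month index
compute_soi <- function(tahiti, darwin, month) {
  pdiff    <- tahiti - darwin                  # Pdiff
  pdiff_av <- ave(pdiff, month, FUN = mean)    # Pdiffav, per calendar month
  pdiff_sd <- ave(pdiff, month, FUN = sd)      # SD(Pdiff), per calendar month
  10 * (pdiff - pdiff_av) / pdiff_sd
}

# Simulated pressure data (made up for illustration), 30 years of monthly values
set.seed(1)
n_years <- 30
month   <- rep(1:12, times = n_years)
tahiti  <- 1012 + rnorm(12 * n_years)
darwin  <- 1010 + rnorm(12 * n_years)

soi_demo <- compute_soi(tahiti, darwin, month)
head(round(soi_demo, 1), 12)
```

By construction, the result has mean 0 and standard deviation 10 within each calendar month.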
Time Series – Use Case
Time Series – Use Case
• El Niño and the Fish
• Fish recruitment: a measure of the fish population in the southern hemisphere.

    library(astsa)  # R package with data sets and scripts for time series analysis
    # (astsa also ships both series as the data sets `soi` and `rec`)

    # Southern Oscillation Index (SOI) for a period of 453 months
    # ranging over the years 1950-1987.
    soi = scan("soi.dat")
    soi = ts(soi)

    # Fish recruitment (number of new fish) for a period of 453 months
    # ranging over the years 1950-1987.
    rec = scan("recruit.dat")
    rec = ts(rec)
Time Series – Use Case
• El Niño and the Fish
• Let’s take a look at auto-covariance and correlation…
• What does it tell us?

    # Autocorrelation function (ACF) estimation for REC
    acf(rec)

    # Partial autocorrelation function (PACF) estimation for REC
    pacf(rec)

    # Cross-correlation function (CCF) estimation for SOI & REC
    ccf(soi, rec)
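One way to read a cross-correlation function is to look for the lag with the strongest correlation. The sketch below uses two simulated series (not the SOI/REC data) in which `y` follows `x` with a delay of 3 steps. In R, the lag k value of `ccf(x, y)` estimates the correlation between x[t+k] and y[t], so a peak at a negative lag means that `x` leads `y`.

```r
# Simulated series: y is x delayed by 3 steps, plus noise (hypothetical data)
set.seed(42)
n <- 300
x <- rnorm(n)
y <- c(rep(0, 3), x[1:(n - 3)]) + rnorm(n, sd = 0.3)

# Cross-correlation without plotting, then pick the lag with the largest |correlation|
cc <- ccf(x, y, lag.max = 10, plot = FALSE)
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag   # negative: x leads y
```

The same reading applies to ccf(soi, rec): strong correlations at negative lags indicate that SOI leads REC.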
Time Series – Use Case
• El Niño and the Fish
• Let’s do a visual analysis of SOI and REC

    # Visual correlation analysis: scatterplots of REC against lagged SOI (lags 0 to 10)
    lag2.plot(soi, rec, 10)
Time Series – Use Case
• El Niño and the Fish
• Data preparation for the setup of a prediction model…

    # Create a table with shifted time series.
    # Keep only the periods for which all series have a value, using 'ts.intersect()'.
    alldata = ts.intersect(rec,
                           reclag1  = lag(rec, -1),
                           reclag2  = lag(rec, -2),
                           soilag5  = lag(soi, -5),
                           soilag6  = lag(soi, -6),
                           soilag7  = lag(soi, -7),
                           soilag8  = lag(soi, -8),
                           soilag9  = lag(soi, -9),
                           soilag10 = lag(soi, -10))

    # Show the table
    alldata
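How `lag()` and `ts.intersect()` work together is easiest to see on a tiny series. This is a toy sketch; the series `x` and the name `aligned` are made up for illustration. `stats::lag` is called explicitly because other packages (for example dplyr) define their own `lag()`.

```r
# Toy series with values 1..6 at times 1..6
x <- ts(1:6)

# lag(x, -k) moves the series k steps forward in time,
# so at time t the lagged column holds the value x[t - k]
aligned <- ts.intersect(x,
                        xlag1 = stats::lag(x, -1),
                        xlag2 = stats::lag(x, -2))
aligned
```

Only times 3 to 6 remain, because these are the only periods where all three columns have a value. Each row then holds x[t], x[t-1] and x[t-2] side by side, which is exactly the layout `lm()` needs.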
Time Series – Use Case
• El Niño and the Fish
• Build a linear model based on SOI at lags -5 to -10 (into the past)

    # Build a linear model (using the 'lm()' function).
    # 1st try: a multiple regression in which the REC variable is a linear function
    # of (past) lags 5, 6, 7, 8, 9, and 10 of the SOI variable.
    # info: lm(formula, data) -> format for formula := [response] ~ [terms] -> terms is
    tryit1 = lm(formula = rec ~ soilag5 + soilag6 + soilag7 + soilag8 + soilag9 + soilag10,
                data = alldata)
    summary(tryit1)

    # Visual analysis of the prediction model (diagnostic plots)
    plot(tryit1)
Time Series – Use Case
• El Niño and the Fish
• Let’s take a look at the model’s residuals

    # Plot and print the ACF (autocorrelation function) and PACF (partial ACF)
    # of REC and of the model residuals.
    # info: residuals() is a generic function which extracts model residuals
    # from objects returned by modeling functions.
    acf2(rec)
    acf2(residuals(tryit1))
Time Series – Use Case
• El Niño and the Fish
• PACF: high values at t-1 and t-2 indicate autocorrelation
• Adjust the model and introduce REC at t-1 and t-2…

    # 2nd try: a multiple regression in which the REC variable is a linear function
    # of (past) lags 5, 6, 7, 8, 9, and 10 of the SOI variable + 2 past values of REC.
    tryit2 = lm(formula = rec ~ reclag1 + reclag2 + soilag5 + soilag6 + soilag7 +
                                soilag8 + soilag9 + soilag10,
                data = alldata)
    summary(tryit2)
    acf2(residuals(tryit2))
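Why two large PACF values suggest adding two past values can be checked on a simulated series. The sketch below generates an AR(2) process with arbitrarily chosen coefficients (0.6 and 0.25, an assumption for illustration) and lists the lags whose partial autocorrelation lies outside the approximate 95% bounds.

```r
# Simulated AR(2) process: z[t] = 0.6 * z[t-1] + 0.25 * z[t-2] + noise
set.seed(7)
z <- arima.sim(model = list(ar = c(0.6, 0.25)), n = 2000)

# Partial autocorrelations for lags 1..10, without plotting
p <- pacf(z, lag.max = 10, plot = FALSE)

# Approximate 95% bounds for a partial autocorrelation of zero
bound <- qnorm(0.975) / sqrt(length(z))
signif_lags <- which(abs(p$acf[, 1, 1]) > bound)
signif_lags
```

Lags 1 and 2 stand out clearly, while higher lags stay close to zero (an occasional higher lag may cross the bound by chance). This is the pattern that motivates adding reclag1 and reclag2 as regressors.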
Time Series – Use Case
• El Niño and the Fish
• Can we optimize or simplify the model?
• Remove the variables without significance: SOI at t-7, t-8, t-9 and t-10

    # 3rd try: a multiple regression in which the REC variable is a linear function
    # of only 2 (past) lags, 5 and 6, of the SOI variable + 2 past values of REC.
    tryit3 = lm(formula = rec ~ reclag1 + reclag2 + soilag5 + soilag6,
                data = alldata)
    summary(tryit3)
    acf2(residuals(tryit3))
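Besides reading the p-values in summary(), a reduced model can be compared with the full model directly. The sketch below does this on simulated data (not the SOI/REC data); the variables `x1`, `x2`, `x3` and all coefficients are assumptions, and `x3` is deliberately irrelevant.

```r
# Simulated data: y depends on x1 and x2 only, x3 is pure noise (hypothetical)
set.seed(3)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 + 1.5 * d$x1 - 0.8 * d$x2 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3, data = d)   # includes the irrelevant x3
reduced <- lm(y ~ x1 + x2,      data = d)   # simplified model

# F-test for the dropped variable and AIC of both models
cmp <- anova(reduced, full)
cmp
AIC(reduced, full)
```

A large p-value in the anova() table and a similar or lower AIC for the reduced model support dropping the variable. The same two calls could be applied to tryit2 and tryit3, since both are fitted on the same rows of alldata.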
Time Series – Use Case
• El Niño and the Fish
• Congratulations!
• We have built a reliable model of the supposed dependency between El Niño and fish recruitment.
Lecture Summary & Homework
Homework
• Take the course “Applied Time Series Analysis” by Pennsylvania State University: https://onlinecourses.science.psu.edu/stat510/node/33
Literature
• Take a look at “Nonlinear time series modelling. An Introduction”: https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr87.pdf
• Take a look at “Nonlinear Time Series, Theory, Methods and Application with R Examples”: http://www.stat.pitt.edu/stoffer/nltsa/chs3_9_10.pdf
• Books worth spending money on…
  • “Time Series Analysis: Forecasting and Control”, Box, Jenkins, 5th Ed. 2015: http://www.amazon.com/Time-Analysis-Forecasting-Probability-Statistics/dp/1118675029
  • “New Introduction to Multiple Time Series Analysis”: http://www.amazon.com/New-Introduction-Multiple-Time-Analysis/dp/3540262393
Any Questions?