Stand out from the crowd with a Data Science Course at ExcelR, where we provide all the necessary support and placement assistance to all our students. Learn from experienced, expert faculty. https://www.excelr.com/data-science-course-training-in-pune/
Forecasting Time Series
My Introduction
Name: Bharani Kumar
Education: IIT Hyderabad; Indian School of Business
Professional certifications:
• PMP: Project Management Professional
• PMI-ACP: Agile Certified Practitioner
• PMI-RMP: Risk Management Professional
• CSM: Certified Scrum Master
• LSSGB: Lean Six Sigma Green Belt
• LSSBB: Lean Six Sigma Black Belt
• SSMBB: Six Sigma Master Black Belt
• ITIL: Information Technology Infrastructure Library
• Agile PM: Dynamic System Development Methodology Atern
© 2013 ExcelR Solutions. All Rights Reserved
My Introduction
• Research in analytics, deep learning & IoT (data scientist)
• Deloitte: driven using US policies
• Infosys: driven using Indian policies under large enterprises
• ITC Infotech: driven using Indian policies (SME)
• HSBC: driven using UK policies
AGENDA
• Why Forecasting? Learn about the various examples of forecasting
• Forecasting Strategy: Learn about decomposing, forecasting & combining
• EDA & Graphical Representation: Learn about exploratory data analysis, scatter plot, time plot, lag plot, ACF plot
• Forecasting components: Learn about Level, Trend, Seasonal, Cyclical, Random components
• Forecasting Models & Errors: Learn about the various forecasting models to be discussed & the various error measures
Why Forecasting
• Why forecast, when you would know the outcome eventually?
• Early knowledge is the key, even if that knowledge is imperfect
– For setting production schedules, one needs to forecast sales
– For staffing call centers, a company needs to forecast the demand for service
– For dealing with epidemic emergencies, nations should forecast the various flu outbreaks
Types of forecast
Forecasting Classification:
• Short Term or Long Term
• Micro Scale or Macro Scale
• Point Forecast, Interval Forecast or Density Forecast
• Qualitative or Quantitative
• Data or Judgment
Who generates Forecast?
Who generates Forecast?
Time series vs Cross-sectional data
01 Cross-sectional Data
02 Time Series Data
Dataset for further discussion
Monthly footfalls of customers from Jan 1991 to March 2004 (first 18 months shown)
Month    Footfall in thousands
Jan-91   1709
Feb-91   1621
Mar-91   1973
Apr-91   1812
May-91   1975
Jun-91   1862
Jul-91   1940
Aug-91   2013
Sep-91   1596
Oct-91   1725
Nov-91   1676
Dec-91   1814
Jan-92   1615
Feb-92   1557
Mar-92   1891
Apr-92   1956
May-92   1885
Jun-92   1623
Notation:
• t = 1, 2, 3, … = time period index
• Y_t = value of the series at time period t
• Y_{t+k} = forecast for time period t+k, given data until time t
• e_t = forecast error for period t
Forecasting Strategy
01 Define Goal
02 Data Collection
03 Explore & Visualize Series
04 Pre-Process Data
05 Partition Series
06 Apply Forecasting Method(s)
07 Evaluate & Compare Performance
08 Implement Forecasts / System
Forecasting Strategy – Step 1: Define Goal
#1 Is the goal descriptive or predictive?
• Descriptive = Time Series Analysis
• Predictive = Time Series Forecasting
#2 What is the forecast horizon?
• How far into the future? k in Y_{t+k}
• Rolling forward or at a single time point?
#3 How will the forecast be used?
• Who are the stakeholders?
• Numerical or event forecast?
• Cost of over-prediction & under-prediction
#4 Forecasting expertise & automation
• In-house forecasting or consultants?
• How many series? How often?
• Data & software availability
Forecasting Strategy – Step 2: Data Collection
#1 Data Quality
• Typically small sample, so need good quality
• Data same as series to be forecasted
#2 Temporal Frequency
• Should we use real-time ticket collection data?
• Balance between signal & noise
• Aggregation / Disaggregation
#3 Series Granularity
• Coverage of the data – geographical, population, time, …
• Should be aligned with goal
#4 Domain expertise
• Necessary information source
• Affects modeling process from start to end
• Level of communication / coordination between forecasters & domain experts
Forecasting Strategy – Step 3 (Explore Series)
SYSTEMATIC PART: Level, Trend, Seasonal Patterns
NON-SYSTEMATIC PART: Noise
Additive: Y_t = Level + Trend + Seasonality + Noise
Multiplicative: Y_t = Level x Trend x Seasonality x Noise
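The difference between the two decompositions can be sketched in a few lines of Python. All component values below (level, trend, seasonal offsets and indices) are hypothetical numbers chosen only for illustration; they do not come from the slides. Noise is switched off so the pattern is easy to see.

```python
# Hypothetical component values, chosen only for illustration.
level = 100.0
trend_per_period = 2.0                    # additive trend: +2 units per period
growth_per_period = 1.02                  # multiplicative trend: +2% per period
seasonal_add = [10.0, -5.0, -10.0, 5.0]   # additive seasonal offsets (sum to 0)
seasonal_mul = [1.10, 0.95, 0.90, 1.05]   # multiplicative seasonal indices (average 1)


def additive(t, noise=0.0):
    """Y_t = Level + Trend + Seasonality + Noise."""
    return level + trend_per_period * t + seasonal_add[t % 4] + noise


def multiplicative(t, noise=1.0):
    """Y_t = Level x Trend x Seasonality x Noise."""
    return level * growth_per_period ** t * seasonal_mul[t % 4] * noise


# Two "years" of quarterly values from each model.
additive_series = [additive(t) for t in range(8)]
multiplicative_series = [multiplicative(t) for t in range(8)]


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# Within-year spread: constant for the additive model, growing for the multiplicative one.
additive_spreads = (spread(additive_series[:4]), spread(additive_series[4:]))
multiplicative_spreads = (spread(multiplicative_series[:4]), spread(multiplicative_series[4:]))
```

In this sketch the seasonal swing stays the same size from year to year under the additive form, while under the multiplicative form it grows with the level of the series. That is one practical cue for choosing between the two when looking at a time plot.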
Trend Component
• Persistent, overall upward or downward pattern
• Due to population, technology etc.
• Overall upward or downward movement
• Several years' duration
[Figure: response against time (Mo., Qtr., Yr.)]
Seasonal Component
• Regular pattern of up & down fluctuations
• Due to weather, customs etc.
• Occurs within one year
• Example: Passenger traffic during 24 hours
[Figure: response against time (Mo., Qtr.), with summer marked]
Irregular/Random/Noise Component
• Erratic, unsystematic, 'residual' fluctuations
• Due to random variation or unforeseen events
– Union strike
– War
• Short duration & nonrepeating
Time Series Components
Time Plot
• Plots a variable against the time index
• Appropriate for visualizing serially collected data (time series)
• Brings out many useful aspects of the structure of the data
• Example: Electrical usage for Washington Water Power (quarterly data from 1980 to 1991)
Time plot: Electrical power usage for Washington Water Power, 1980-1991
[Figure: power usage (kilowatts) against year]
Observations
• There is a cyclic (seasonal) pattern
• Maximum demand in the first quarter; minimum in the third quarter
• There may also be a slowly increasing trend (to be examined)
• Any reasonable forecast should have cyclic fluctuations
• Trend (if any) needs to be utilized for forecasting
• The forecast would not be exact – there would be some error
Time plot
Quarterly Sales of Ice-cream
Scatter Diagram
• Plots one variable against another
• One of the simplest tools for visualization
Example: Maintenance cost and age for nine buses (Spokane Transit)
Cost   Age
859    8
682    5
471    3
708    9
1094   11
224    2
320    1
651    8
1049   12
This is an example of cross-sectional data (observations collected at a single point of time)
Scatter Plot
[Figure: yearly cost of maintenance (US $) against age of bus]
Observations
• Older buses have a higher cost of maintenance
• There is some variation (case to case)
• The rise in cost is about $80 per year of age
• It may be possible to use 'age' to forecast maintenance cost
• The forecast would not be a 'certain' prediction – there would be some error
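One way to put a number on the rise in cost per year of age is an ordinary least-squares line through the nine buses. The sketch below is an illustration rather than the method used on the slides: the figure quoted above appears to be read off the plot, so a fitted slope need not match it exactly. The helper name `fit_line` is hypothetical.

```python
# Bus data from the scatter diagram slide (Spokane Transit).
ages = [8, 5, 3, 9, 11, 2, 1, 8, 12]
costs = [859, 682, 471, 708, 1094, 224, 320, 651, 1049]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, costs)

# Point forecast of yearly maintenance cost for a 10-year-old bus.
predicted_cost_age_10 = intercept + slope * 10

# Residuals show the case-to-case variation the line does not explain.
residuals = [c - (intercept + slope * a) for a, c in zip(ages, costs)]
```

The residuals make the last observation concrete: the fitted line gives a point forecast, and the scatter around it is the error any such forecast carries.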
Lag plot
• Plots a variable against its own lagged sample
• Brings out possible association between successive samples
• Example: Monthly sale of VCRs by a music store in a year
Y_t = number of VCRs sold in time period t
Y_{t-k} = number of VCRs sold in time period t – k
Example of lagged variables
Number of VCRs sold in a month
Time   Original   Lagged one step   Lagged two steps
1      123        -                 -
2      130        123               -
3      125        130               123
4      138        125               130
5      145        138               125
6      142        145               138
7      141        142               145
8      146        141               142
9      147        146               141
10     157        147               146
11     150        157               147
12     160        150               157
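The lagged columns in the table can be built by shifting the original series. A minimal sketch, using the VCR figures from the slide; the helper name `lag` and the use of `None` for the unavailable leading values are choices made here for illustration.

```python
# Monthly VCR sales from the slide (time periods 1 to 12).
sales = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]


def lag(series, k):
    """Shift the series k steps later; the first k entries have no value."""
    return [None] * k + series[:len(series) - k]


lag1 = lag(sales, 1)   # "Lagged one step" column
lag2 = lag(sales, 2)   # "Lagged two steps" column

# Points of the lag plot for k = 1: (Y_{t-1}, Y_t) for every t that has a lagged value.
pairs_lag1 = [(prev, cur) for cur, prev in zip(sales, lag1) if prev is not None]
```

Each lag costs one usable observation, so a 12-month series yields 11 points for the k = 1 lag plot and 10 for k = 2.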
Lag plot (k = 1)
[Figure: scatter plot of VCR sales against 1-step lagged VCR sales]
Observations
• There is a reasonable degree of association between the original variable and the lagged one
• The value of the lagged variable is known beforehand, so it is useful for prediction
• The association between the original and the lagged variable may be quantified through a correlation
Autocorrelation
• Correlation between a variable and its lagged version (one time-step or more)
$r_k = \dfrac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$
Y_t = observation in time period t
Y_{t-k} = observation in time period t – k
Ȳ = mean of the values of the series
r_k = autocorrelation coefficient for k-step lag
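As a worked example, the autocorrelation coefficient can be computed for the VCR sales series from the lag-plot slides. This sketch assumes the usual sample definition: the numerator sums the products of deviations from the overall mean for observations k steps apart, and the denominator is the total sum of squared deviations. The function name `autocorrelation` is a hypothetical helper.

```python
def autocorrelation(series, k):
    """Sample autocorrelation r_k of a series at lag k (k >= 1)."""
    n = len(series)
    mean = sum(series) / n
    # Products of deviations for pairs (Y_t, Y_{t-k}); only n - k pairs exist.
    numerator = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    # Total sum of squared deviations over the whole series.
    denominator = sum((y - mean) ** 2 for y in series)
    return numerator / denominator


# Monthly VCR sales from the slides (mean is 142).
sales = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]

r1 = autocorrelation(sales, 1)   # about 0.57
r2 = autocorrelation(sales, 2)   # about 0.46
```

The lag-1 value is moderately positive, which is the numerical counterpart of the upward-sloping cloud in the lag plot.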
Standard error of r_k
• The standard error is $SE(r_k) = \sqrt{\dfrac{1 + 2\sum_{i=1}^{k-1} r_i^2}{n}}$, where n is the number of observations
• The standard error of the mean estimates the variability between samples, whereas the standard deviation measures the variability within a single sample
• SE(r_k) increases progressively with k, but eventually reaches a maximum value
• If the 'true' autocorrelation is 0, then the estimate r_k should lie in the interval (–2 SE(r_k), 2 SE(r_k)) 95% of the time
• Sometimes SE(r_k) is approximated by $1/\sqrt{n}$
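The two-standard-error check can be sketched for the VCR series. The slide's formula images did not survive extraction, so this sketch assumes the common large-sample form in which SE(r_k) is the square root of (1 + 2 × the sum of squared autocorrelations at lags below k) / n, which reduces to 1/sqrt(n) at lag 1. The helper names are hypothetical.

```python
import math


def autocorrelation(series, k):
    """Sample autocorrelation r_k of a series at lag k (k >= 1)."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    denominator = sum((y - mean) ** 2 for y in series)
    return numerator / denominator


def standard_error(series, k):
    """Assumed form: sqrt((1 + 2 * sum of r_i^2 for i < k) / n)."""
    n = len(series)
    earlier = sum(autocorrelation(series, i) ** 2 for i in range(1, k))
    return math.sqrt((1 + 2 * earlier) / n)


sales = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]

# For each lag: (lag, r_k, SE(r_k), whether r_k falls outside the +/- 2 SE band).
results = []
for k in (1, 2, 3):
    r_k = autocorrelation(sales, k)
    se_k = standard_error(sales, k)
    results.append((k, r_k, se_k, abs(r_k) > 2 * se_k))
```

With only 12 observations the band is wide: at lag 1 the limit is 2/sqrt(12), roughly 0.58, so an autocorrelation of about 0.57 falls just inside it.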
Correlogram or ACF plot
• Plots the ACF or autocorrelation function (r_k) against the lag (k)
• Limits of plus and minus two standard errors are displayed; an autocorrelation must exceed them to be statistically significant
• Reveals lagged variables that can be potentially useful for forecasting
Correlogram for VCR data
ACF plot for electricity usage data
Observations
• Every alternate autocorrelation is large, and many of them are also statistically significant
• ACFs at lags 4, 8, 12, etc. are positive
• ACFs at lags 2, 6, 10, etc. are negative
• All these pick up the seasonal aspect of the data
• The data may be re-examined after 'removing' seasonality
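The sign pattern described above is what a purely quarterly seasonal series produces. The sketch below uses a synthetic series, not the Washington Water Power data: a hypothetical four-quarter pattern (high in the first quarter, low in the third) repeated for 12 years, with no trend or noise.

```python
def autocorrelation(series, k):
    """Sample autocorrelation r_k of a series at lag k (k >= 1)."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    denominator = sum((y - mean) ** 2 for y in series)
    return numerator / denominator


# Hypothetical quarterly pattern: maximum in Q1, minimum in Q3 (illustrative values only).
quarterly_pattern = [1000, 700, 500, 750]
usage = quarterly_pattern * 12            # 12 synthetic years, 48 quarters

# Autocorrelations at lags 1 to 12.
acf = {k: autocorrelation(usage, k) for k in range(1, 13)}

# Lags that line up the same quarter (4, 8, 12) versus opposite quarters (2, 6, 10).
same_quarter = [acf[k] for k in (4, 8, 12)]
opposite_quarter = [acf[k] for k in (2, 6, 10)]
```

Lags that are multiples of four pair each quarter with the same quarter of another year, so the products of deviations are positive; lags of 2, 6 and 10 pair the high quarter with the low one, so they are negative.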
ACF of de-seasoned KW data
Observations
• The de-seasoned series has small ACFs
• This part of the data has little forecasting value
Typical questions in exploratory analysis
• Is there a TREND?
• Is there a SEASONALITY?
• Are the data RANDOM?
All the plots contain information regarding these questions
Time series plots
Effect of omission of data on the Time series plot
Effect of omission of data on the Time series plot
Confusing kind of trend due to other type of scaling
[Figure: four panels plotting y against t, y against log t, log y against t, and log y against log t]
Few points on Plots
• A plot helps us to summarize & reveal patterns in data
• Graphics help us to identify anomalies in data
• A plot helps us to present a huge amount of data in a small space & makes a huge data set coherent
• To get all the advantages of a plot, the "Aspect Ratio" of the plot is crucial
• The ratio of height to width of a plot is called the ASPECT RATIO
Aspect Ratio
• Generally, the aspect ratio should be around 0.618
• However, for long time series data, the aspect ratio should be around 0.25
To understand the impact of the aspect ratio, see the two plots in the next two slides
Aspect ratio
Aspect ratio
Preliminaries for Step 3 of 8-Step forecasting strategy
• Should we use all historical data for forecasting?
• Solution = DATA PARTITIONING
• Training Data: fit the model only to the TRAINING period
• Validation Data: assess performance on the VALIDATION period
Partitioning
Deploy the model by joining Training + Validation to forecast the Future
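The partition, assess and deploy sequence can be sketched on the first 18 months of the footfall series shown earlier. Several things here are assumptions made only for illustration: the six-month validation period, the naive forecast (repeat the last observed value) standing in for a real forecasting method, and mean absolute error as the performance measure.

```python
# First 18 monthly footfall values (in thousands), Jan-91 to Jun-92.
footfall = [1709, 1621, 1973, 1812, 1975, 1862, 1940, 2013, 1596,
            1725, 1676, 1814, 1615, 1557, 1891, 1956, 1885, 1623]

n_validation = 6                          # hypothetical validation length
training = footfall[:-n_validation]       # earliest periods: fit the model here
validation = footfall[-n_validation:]     # most recent periods: assess here


def naive_forecast(history, horizon):
    """Placeholder method: repeat the last observed value for every future period."""
    return [history[-1]] * horizon


# Fit on training only, then forecast the validation period.
forecasts = naive_forecast(training, len(validation))

# Forecast errors on the validation period: actual minus forecast.
errors = [actual - forecast for actual, forecast in zip(validation, forecasts)]
mae = sum(abs(e) for e in errors) / len(errors)

# Deployment: rejoin training + validation, then forecast genuinely future periods.
full_history = training + validation
future_forecasts = naive_forecast(full_history, 3)
```

The split is by time, not random: the validation period is always the most recent block, because the deployed model will have to forecast forward from the end of the series.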
How to choose a Validation Period?
Strategy to choose the Validation Data Period:
• Forecast horizon
• Seasonality
• Length of series
• Underlying conditions affecting the series