Data Science Course

Stand out from the crowd with a Data Science Course at ExcelR, where we provide all the necessary support and placement assistance to all our students. Learn with experienced and expert faculty.

https://www.excelr.com/data-science-course-training-in-pune/

expertdigi

Presentation Transcript


  1. Forecasting Time Series

  2. My Introduction
     Name: Bharani Kumar
     Education: IIT Hyderabad; Indian School of Business
     Professional certifications:
     •  PMP: Project Management Professional
     •  PMI-ACP: Agile Certified Practitioner
     •  PMI-RMP: Risk Management Professional
     •  CSM: Certified Scrum Master
     •  LSSGB: Lean Six Sigma Green Belt
     •  LSSBB: Lean Six Sigma Black Belt
     •  SSMBB: Six Sigma Master Black Belt
     •  ITIL: Information Technology Infrastructure Library
     •  Agile PM: Dynamic System Development Methodology Atern
     © 2013 ExcelR Solutions. All Rights Reserved

  3. My Introduction
     •  Research in analytics, deep learning & IoT
     •  Data Scientist
     •  Deloitte: driven using US policies
     •  Infosys: driven using Indian policies (large enterprises)
     •  ITC Infotech: driven using Indian policies (SME)
     •  HSBC: driven using UK policies

  4. AGENDA
     •  Why Forecasting: learn about the various examples of forecasting
     •  Forecasting Strategy: learn about decomposing, forecasting & combining
     •  EDA & Graphical Representation: learn about exploratory data analysis, scatter plot, time plot, lag plot, ACF plot
     •  Forecasting Components: learn about Level, Trend, Seasonal, Cyclical, Random components
     •  Forecasting Models & Errors: learn about the various forecasting models to be discussed & the various error measures

  5. Why Forecasting
     •  Why forecast, when you would know the outcome eventually?
     •  Early knowledge is the key, even if that knowledge is imperfect
        –  For setting production schedules, one needs to forecast sales
        –  For staffing of call centers, a company needs to forecast the demand for service
        –  For dealing with epidemic emergencies, nations should forecast various flu outbreaks

  6. Types of forecast
     Forecasting classification:
     •  Short term or long term
     •  Micro scale or macro scale
     •  Density forecast, point forecast or interval forecast
     •  Qualitative or quantitative
     •  Data or judgment

  7. Who generates Forecast?

  8. Who generates Forecast?

  9. Time series vs Cross-sectional data
     01 Cross-sectional data
     02 Time series data

  10. Dataset for further discussion
      Monthly footfalls of customers from Jan 1991 to March 2004 (first 18 months shown)

      | Month  | Footfall in thousands |
      |--------|-----------------------|
      | Jan-91 | 1709 |
      | Feb-91 | 1621 |
      | Mar-91 | 1973 |
      | Apr-91 | 1812 |
      | May-91 | 1975 |
      | Jun-91 | 1862 |
      | Jul-91 | 1940 |
      | Aug-91 | 2013 |
      | Sep-91 | 1596 |
      | Oct-91 | 1725 |
      | Nov-91 | 1676 |
      | Dec-91 | 1814 |
      | Jan-92 | 1615 |
      | Feb-92 | 1557 |
      | Mar-92 | 1891 |
      | Apr-92 | 1956 |
      | May-92 | 1885 |
      | Jun-92 | 1623 |

      •  $t = 1, 2, 3, \ldots$ = time period index
      •  $Y_t$ = value of the series at time period t
      •  $Y_{t+k}$ = forecast for time period t+k, given data until time t
      •  $e_t$ = forecast error for period t
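The notation on this slide can be made concrete with a small sketch. It uses the first six footfall values and a naive one-step forecast (each month is forecast by the previous month's value). The naive rule and the names `footfall` and `naive_forecast` are assumptions chosen purely for illustration, not a method the slides prescribe.

```python
# First six monthly footfall values (thousands) from the slide, Jan-91 to Jun-91.
footfall = [1709, 1621, 1973, 1812, 1975, 1862]

def naive_forecast(series, k=1):
    """Forecast for period t+k given data until t: repeat the value at time t.

    Illustrative assumption only; returns one forecast per period k+1..n.
    """
    return [series[t] for t in range(len(series) - k)]

# Forecasts for periods 2..6 and the matching errors e_t = actual - forecast.
forecasts = naive_forecast(footfall, k=1)
errors = [actual - f for actual, f in zip(footfall[1:], forecasts)]
print(errors)
```

A positive error means the forecast was too low for that month, a negative one that it was too high.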

  11. Forecasting Strategy
      01 Define Goal
      02 Data Collection
      03 Explore & Visualize Series
      04 Pre-Process Data
      05 Partition Series
      06 Apply Forecasting Method(s)
      07 Evaluate & Compare Performance
      08 Implement Forecasts / System

  12. Forecasting Strategy – Step 1: Define Goal
      #1 Is the goal descriptive or predictive?
      •  Descriptive = Time Series Analysis
      •  Predictive = Time Series Forecasting
      #2 What is the forecast horizon?
      •  How far into the future? (k in $Y_{t+k}$)
      •  Rolling forward or at a single time point?
      #3 How will the forecast be used?
      •  Who are the stakeholders?
      •  Numerical or event forecast?
      •  Cost of over-prediction & under-prediction
      #4 Forecasting expertise & automation
      •  In-house forecasting or consultants?
      •  How many series? How often?
      •  Data & software availability

  13. Forecasting Strategy – Step 2: Data Collection
      #1 Data Quality
      •  Typically small sample, so need good quality
      •  Data same as series to be forecasted
      #2 Temporal Frequency
      •  Should we use real-time ticket collection data?
      •  Balance between signal & noise
      •  Aggregation / Disaggregation
      #3 Series Granularity
      •  Coverage of the data: geographical, population, time, …
      •  Should be aligned with goal
      #4 Domain Expertise
      •  Necessary information source
      •  Affects modeling process from start to end
      •  Level of communication / coordination between forecasters & domain experts

  14. Forecasting Strategy Step 3 (Explore Series)
      •  SYSTEMATIC PART: Level, Trend, Seasonal Patterns
      •  NON-SYSTEMATIC PART: Noise
      •  Additive: $Y_t = \text{Level} + \text{Trend} + \text{Seasonality} + \text{Noise}$
      •  Multiplicative: $Y_t = \text{Level} \times \text{Trend} \times \text{Seasonality} \times \text{Noise}$
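As a rough illustration of the two forms, the sketch below builds one additive and one multiplicative series from made-up components. All component values (level 100, a trend of +2 or +2% per period, a period-4 seasonal pattern, noise switched off) are invented for the example. It also checks that taking logarithms turns the multiplicative form into an additive one.

```python
import math

periods = 8
level = 100.0

# Additive components (hypothetical values).
trend_add = [2.0 * t for t in range(periods)]        # +2 units per period
season_add = [5.0, -5.0, 0.0, 0.0] * 2               # seasonal offsets, period 4
noise_add = [0.0] * periods                          # noise switched off for clarity
additive = [level + trend_add[t] + season_add[t] + noise_add[t]
            for t in range(periods)]

# Multiplicative components (hypothetical values).
trend_mult = [1.02 ** t for t in range(periods)]     # +2% per period
season_mult = [1.05, 0.95, 1.0, 1.0] * 2             # seasonal factors, period 4
noise_mult = [1.0] * periods
multiplicative = [level * trend_mult[t] * season_mult[t] * noise_mult[t]
                  for t in range(periods)]

# log(Y_t) of the multiplicative series equals the sum of the logged components.
log_sum = [math.log(level) + math.log(trend_mult[t])
           + math.log(season_mult[t]) + math.log(noise_mult[t])
           for t in range(periods)]
```

In the additive series the seasonal swing stays at ±5 units throughout, while in the multiplicative series it grows along with the level of the series.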

  15. Trend Component
      •  Persistent, overall upward or downward pattern
      •  Due to population, technology etc.
      •  Overall upward or downward movement
      •  Several years duration
      [Figure: response plotted against time (month, quarter, year)]

  16. Seasonal Component
      •  Regular pattern of up & down fluctuations
      •  Due to weather, customs etc.
      •  Occurs within one year
      •  Example: Passenger traffic during 24 hours
      [Figure: response plotted against time (month, quarter), with "Summer" marked]

  17. Irregular/Random/Noise Component
      •  Erratic, unsystematic, ‘residual’ fluctuations
      •  Due to random variation or unforeseen events
         –  Union strike
         –  War
      •  Short duration & nonrepeating

  18. Time Series Components

  19. Time Plot
      •  Plots a variable against time index
      •  Appropriate for visualizing serially collected data (time series)
      •  Brings out many useful aspects of the structure of the data
      •  Example: Electrical usage for Washington Water Power (quarterly data from 1980 to 1991)

  20. Time plot
      [Figure: electrical power usage for Washington Water Power, 1980-1991; y-axis: power usage (kilowatts), x-axis: year]

  21. Observations
      •  There is a cyclic trend
      •  Maximum demand in first quarter; minimum in third quarter
      •  There may also be a slowly increasing trend (to be examined)
      •  Any reasonable forecast should have cyclic fluctuations
      •  Trend (if any) needs to be used for forecasting
      •  Forecast would not be exact; there would be some error

  22. Time plot

  23. Quarterly Sales of Ice-cream

  24. Scatter Diagram
      •  Plots one variable against another
      •  One of the simplest tools for visualization
      •  Example: Maintenance cost and age for nine buses (Spokane Transit)
      •  This is an example of cross-sectional data (observations collected at a single point of time)

      | Cost (US $) | Age (years) |
      |-------------|-------------|
      | 859  | 8  |
      | 682  | 5  |
      | 471  | 3  |
      | 708  | 9  |
      | 1094 | 11 |
      | 224  | 2  |
      | 320  | 1  |
      | 651  | 8  |
      | 1049 | 12 |

  25. Scatter Plot
      [Figure: yearly cost of maintenance (US $) plotted against age of bus]

  26. Observations
      •  Older buses have higher cost of maintenance
      •  There is some variation (case to case)
      •  The rise in cost is about $80 per year of age
      •  It may be possible to use ‘age’ to forecast maintenance cost
      •  Forecast would not be a ‘certain’ prediction; there would be some error
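One way to check the "rise in cost per year of age" is an ordinary least-squares line through the nine Spokane Transit points from the scatter diagram slide. The slides do not say how their figure was obtained, so the least-squares fit below is an assumption. On these nine points it gives a slope of roughly $71 per year, the same order as the slide's rounded figure.

```python
# Maintenance cost (US $) and age (years) for the nine buses on the slide.
cost = [859, 682, 471, 708, 1094, 224, 320, 651, 1049]
age = [8, 5, 3, 9, 11, 2, 1, 8, 12]

n = len(age)
mean_age = sum(age) / n
mean_cost = sum(cost) / n

# Least-squares slope = S_xy / S_xx; the intercept makes the line pass through the means.
s_xy = sum((a - mean_age) * (c - mean_cost) for a, c in zip(age, cost))
s_xx = sum((a - mean_age) ** 2 for a in age)
slope = s_xy / s_xx
intercept = mean_cost - slope * mean_age

def predict_cost(bus_age):
    """Point forecast of yearly maintenance cost for a bus of the given age."""
    return intercept + slope * bus_age

print(round(slope, 2), round(intercept, 2))
```

As the slide notes, such a forecast is not a certain prediction: individual buses scatter around the fitted line.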

  27. Lag plot
      •  Plots a variable against its own lagged sample
      •  Brings out possible association between successive samples
      •  Example: Monthly sales of VCRs by a music store in a year
         –  $Y_t$ = number of VCRs sold in time period t
         –  $Y_{t-k}$ = number of VCRs sold in time period t – k

  28. Example of lagged variables
      Number of VCRs sold in a month

      | Time | Original | Lagged one step | Lagged two steps |
      |------|----------|-----------------|------------------|
      | 1  | 123 |     |     |
      | 2  | 130 | 123 |     |
      | 3  | 125 | 130 | 123 |
      | 4  | 138 | 125 | 130 |
      | 5  | 145 | 138 | 125 |
      | 6  | 142 | 145 | 138 |
      | 7  | 141 | 142 | 145 |
      | 8  | 146 | 141 | 142 |
      | 9  | 147 | 146 | 141 |
      | 10 | 157 | 147 | 146 |
      | 11 | 150 | 157 | 147 |
      | 12 | 160 | 150 | 157 |
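The lagged columns on this slide can be produced by shifting the series, as in the minimal sketch below. The helper name `lagged` is hypothetical, and `None` marks the periods for which no lagged value exists.

```python
# Monthly VCR sales from the slide, periods 1..12.
sales = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]

def lagged(series, k):
    """Shift the series k steps later; the first k periods have no lagged value."""
    return [None] * k + series[:len(series) - k]

lag1 = lagged(sales, 1)
lag2 = lagged(sales, 2)

# Pairs (Y_{t-1}, Y_t): the points a lag plot with k = 1 would display.
pairs_k1 = [(prev, cur) for prev, cur in zip(lag1, sales) if prev is not None]
print(pairs_k1[:3])
```

Each extra lag step costs one usable observation: 11 pairs remain at k = 1 and 10 at k = 2.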

  29. Lag plot (k = 1)
      [Figure: scatter plot of VCR sales against 1-step lagged VCR sales]

  30. Observations
      •  There is a reasonable degree of association between the original variable and the lagged one
      •  Value of lagged variable is known beforehand, so it is useful for prediction
      •  Association between original and lagged variable may be quantified through a correlation

  31. Autocorrelation
      •  Correlation between a variable and its lagged version (one time-step or more):
         $r_k = \dfrac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$
      •  $Y_t$ = observation in time period t
      •  $Y_{t-k}$ = observation in time period t – k
      •  $\bar{Y}$ = mean of the values of the series
      •  $r_k$ = autocorrelation coefficient for k-step lag
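A minimal sketch of the calculation, assuming the usual sample autocorrelation: the sum of lag-k cross-products of deviations from the mean, divided by the total sum of squared deviations. It is applied to the VCR sales figures from the lagged-variables slide; the function name `autocorr` is hypothetical.

```python
def autocorr(series, k):
    """Sample autocorrelation coefficient r_k of a series at lag k."""
    n = len(series)
    mean = sum(series) / n
    # Numerator: cross-products of deviations for observations k steps apart.
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    # Denominator: total sum of squared deviations over the whole series.
    den = sum((y - mean) ** 2 for y in series)
    return num / den

# Monthly VCR sales, periods 1..12.
sales = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]

r1 = autocorr(sales, 1)
r2 = autocorr(sales, 2)
print(round(r1, 3), round(r2, 3))
```

For this series the coefficients come out at about 0.572 for lag 1 and 0.463 for lag 2.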

  32. Standard error of $r_k$
      •  The standard error is $SE(r_k) = \sqrt{\dfrac{1 + 2\sum_{i=1}^{k-1} r_i^2}{n}}$, where n is the number of observations
      •  The standard error of the mean estimates the variability between samples, whereas the standard deviation measures the variability within a single sample
      •  $SE(r_k)$ increases progressively with k, but eventually reaches a maximum value
      •  If the ‘true’ autocorrelation is 0, then the estimate $r_k$ should be in the interval $(-2\,SE(r_k),\ 2\,SE(r_k))$ 95% of the time
      •  Sometimes $SE(r_k)$ is approximated by $1/\sqrt{n}$
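The sketch below puts numbers on the standard error for the VCR sales series. It assumes the commonly used form in which SE(r_k) is the square root of (1 + 2 × the sum of squared autocorrelations at lags 1 to k–1) divided by n, with 1/√n as the simpler approximation. The transcript does not preserve the slide's own formula images, so treat this form as an assumption. The function names are hypothetical.

```python
import math

def autocorr(series, k):
    """Sample autocorrelation coefficient r_k of a series at lag k."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    den = sum((y - mean) ** 2 for y in series)
    return num / den

def se_autocorr(series, k):
    """Standard error of r_k (assumed form): sqrt((1 + 2 * sum_{i<k} r_i^2) / n)."""
    n = len(series)
    lower_lags = sum(autocorr(series, i) ** 2 for i in range(1, k))
    return math.sqrt((1 + 2 * lower_lags) / n)

sales = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]

r1 = autocorr(sales, 1)
se1 = se_autocorr(sales, 1)        # equals 1 / sqrt(n) at lag 1
se2 = se_autocorr(sales, 2)        # larger, because r_1^2 enters the sum
significant_lag1 = abs(r1) > 2 * se1
print(round(se1, 3), round(se2, 3), significant_lag1)
```

With only 12 observations the band is wide: under this formula r1 of about 0.572 falls just inside ±2·SE(r1) of about ±0.577, so it would not quite count as statistically significant.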

  33. Correlogram or ACF plot
      •  Plots the ACF or autocorrelation function ($r_k$) against the lag (k)
      •  Plus-and-minus two standard errors are displayed as limits to be exceeded for statistical significance
      •  Reveals lagged variables that can be potentially useful for forecasting

  34. Correlogram for VCR data

  35. ACF plot for electricity usage data

  36. Observations
      •  Every alternate ACF value is large, and many of them are also statistically significant
      •  ACFs at lags 4, 8, 12, etc. are positive
      •  ACFs at lags 2, 6, 10, etc. are negative
      •  All these pick up the seasonal aspect of the data
      •  The data may be re-examined after ‘removing’ seasonality
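The sign pattern described here can be reproduced with a small synthetic example. The series below is invented (it is not the Washington Water Power data): a quarterly pattern that is high in the first quarter and low in the third, repeated for 12 years with no noise.

```python
def autocorr(series, k):
    """Sample autocorrelation coefficient r_k of a series at lag k."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    den = sum((y - mean) ** 2 for y in series)
    return num / den

# Synthetic quarterly data (hypothetical): Q1 high, Q3 low, 12 years = 48 quarters.
quarter_effect = [10.0, 0.0, -10.0, 0.0]
series = [100.0 + quarter_effect[t % 4] for t in range(48)]

acf = {k: autocorr(series, k) for k in range(1, 13)}
for k in (2, 4, 6, 8, 10, 12):
    print(k, round(acf[k], 3))
```

Observations four quarters apart sit at the same point of the cycle, which gives positive ACFs at lags 4, 8 and 12. Observations two quarters apart sit at opposite points, which gives negative ACFs at lags 2, 6 and 10.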

  37. ACF of de-seasoned KW data

  38. Observations
      •  De-seasoned series has small ACFs
      •  This part of the data has little forecasting value

  39. Typical questions in exploratory analysis
      •  Is there a TREND?
      •  Is there a SEASONALITY?
      •  Are the data RANDOM?
      All the plots contain information regarding these questions

  40. Time series plots

  41. Effect of omission of data on the Time series plot

  42. Effect of omission of data on the Time series plot

  43. Confusing kind of trend due to other type of scaling
      [Figure: four panels with different axis scalings: y against t, y against log t, log y against t, log y against log t]

  44. Few points on Plots
      •  Plots help us to summarize & reveal patterns in data
      •  Graphics help us to identify anomalies in data
      •  Plots help us to present a huge amount of data in a small space & make a huge data set coherent
      •  To get all the advantages of a plot, its “aspect ratio” is crucial
      •  The ratio of height to width of a plot is called the ASPECT RATIO

  45. Aspect Ratio
      •  Generally aspect ratio should be around 0.618
      •  However, for long time series data aspect ratio should be around 0.25. To understand the impact of aspect ratio see the two plots in the next two slides
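As a small worked example, the helper below converts a chosen plot width into a height for a given aspect ratio (height divided by width). The value 0.618 coincides with the reciprocal of the golden ratio, (√5 – 1)/2. The function name `plot_height` and the widths used are illustrative assumptions.

```python
import math

# Reciprocal of the golden ratio, about 0.618.
GOLDEN_ASPECT = (math.sqrt(5) - 1) / 2

def plot_height(width, aspect_ratio=GOLDEN_ASPECT):
    """Height of a plot of the given width, where aspect ratio = height / width."""
    return width * aspect_ratio

general_height = plot_height(8)              # general-purpose plot, 8 units wide
long_series_height = plot_height(8, 0.25)    # flatter plot for a long time series
print(round(general_height, 2), long_series_height)
```

For an 8-unit-wide figure this gives a height of about 4.94 units in the general case and 2 units for a long time series.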

  46. Aspect ratio

  47. Aspect ratio

  48. Preliminaries for Step 3 of 8-Step forecasting strategy
      Should we use all historical data for forecasting?
      Solution = DATA PARTITIONING
      •  Training Data: fit the model only to the TRAINING period
      •  Validation Data: assess performance on the VALIDATION period

  49. Partitioning
      Deploy model by joining Training + Validation to forecast the Future
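A minimal sketch of the partitioning workflow on the 18 footfall values shown earlier. The split keeps time order, so the last six months form the validation period. The "model" is simply the training-period mean, a placeholder assumption chosen to keep the example short and not a method the slides recommend. The names `partition` and `n_valid` are hypothetical.

```python
# Monthly footfall (thousands), Jan-91 to Jun-92, from the dataset slide.
footfall = [1709, 1621, 1973, 1812, 1975, 1862, 1940, 2013, 1596,
            1725, 1676, 1814, 1615, 1557, 1891, 1956, 1885, 1623]

def partition(series, n_valid):
    """Chronological split: the last n_valid points (n_valid >= 1) are validation."""
    return series[:-n_valid], series[-n_valid:]

train, valid = partition(footfall, n_valid=6)

# Fit on the training period only (placeholder model: the training mean).
train_mean = sum(train) / len(train)

# Assess performance on the validation period with the mean absolute error.
mae = sum(abs(actual - train_mean) for actual in valid) / len(valid)

# For deployment, rejoin training + validation and refit before forecasting the future.
full = train + valid
deploy_forecast = sum(full) / len(full)
print(round(train_mean, 1), round(mae, 1), round(deploy_forecast, 1))
```

Unlike a random split for cross-sectional data, the validation points here are always the most recent ones, which mimics forecasting the future from the past.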

  50. How to choose a Validation Period?
      Strategy to choose the validation data period:
      •  Forecast horizon
      •  Seasonality
      •  Length of series
      •  Underlying conditions affecting series