TIME SERIES ANALYSIS EECS 731: INTRODUCTION TO DATA SCIENCE Lecturer: Prof. Nicole Beckage Team One members: Al-Smadi, Adi; De Berner, Aime; Duckworth, Ryan; Nelakurthi, Pavan; Nguyen, Phuong Aime
CONTENTS • Introduction • Methodology • Prior • Markov Model • Seasonality • Time Series Models • Conclusion Aime
INTRODUCTION • “From the earliest times man has measured the passage of time with candles or clepsydras or clocks, has constructed calendars, sometimes with remarkable accuracy, and had recorded the progress of his race in the form of annals” Quoted from Maurice Kendall, “Time Series”, Hafner Press, (1976), pg 1 Aime
TIME SERIES • A set of quantitative observations arranged in chronological order (assumption: time is a discrete variable) • A collection of observations of well-defined data items obtained through repeated measurements over time • A sequence of numerical data points in successive order Aime
Time Series Analysis / FORECASTING • Time Series Analysis: • Use of methods for analyzing time series data to extract their characteristics and meaningful statistics • A branch of Statistics that generally deals with structural dependencies between observed data of random phenomena and related parameters (observed phenomena indexed by time) • Time Series Forecasting: • Use of a model to predict future values based on previous observed values Aime
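TERMINOLOGY • Ergodicity: the assumption that sample moments calculated from a time series with a finite number of observations converge to the population moments as $T \to \infty$ (“consistency properties”) • Stationarity: a statistical equilibrium condition needed for a stochastic process to be ergodic. The statistical properties do not change over time: the mean and variance are constant, and the covariance between two observations depends only on the distance (lag) between them, not on when they occur. • Stochastic process (data generating process): the set of all possible realizations (probability theory); an observed time series is one realization. Aime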
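The following is a minimal simulation sketch of these terms. It assumes an AR(1) process $X_t = \phi X_{t-1} + e_t$ with standard normal noise. The model, the helper name `simulate_ar1`, and all parameter values are illustrative choices, not from the slides.

With $|\phi| < 1$ the series is stationary, and the time average of one long realization settles near the ensemble mean (0), which is what ergodicity promises. With $\phi = 1$ the process is a random walk whose variance grows with $t$, so it is not stationary.

```python
import random
import statistics


def simulate_ar1(phi, n, seed=None):
    """Simulate X_t = phi * X_{t-1} + e_t, e_t ~ N(0, 1), starting from X_0 = 0.

    |phi| < 1 gives a stationary series; phi = 1 gives a random walk.
    """
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Ergodicity: the time average of ONE long stationary realization
# approaches the ensemble mean (0 for this process).
stationary = simulate_ar1(0.5, 100_000, seed=1)
time_average = statistics.fmean(stationary)

# Non-stationarity: across many random-walk realizations the spread keeps
# growing with t (Var(X_t) = t), so the distribution changes over time.
walks = [simulate_ar1(1.0, 400, seed=s) for s in range(500)]
var_early = statistics.pvariance([w[39] for w in walks])   # X_40
var_late = statistics.pvariance([w[399] for w in walks])   # X_400

print(f"time average of stationary series: {time_average:.3f}")
print(f"random-walk variance at t=40: {var_early:.1f}, at t=400: {var_late:.1f}")
```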
TYPES OF TIME SERIES • Continuous vs. Discrete • Observations made continuously vs. observations made only at specific times (at discrete time intervals, or aggregated over intervals) • Stationary vs. Non-Stationary • Data that fluctuate around a constant level • Series whose cycle parameters (e.g., length, amplitude, phase) change over time • Deterministic vs. Stochastic • Data that can be predicted exactly • Data only partly determined by past values, so future values must be described by a probability distribution Aime
TYPES OF TIME SERIES • Seasonal vs Non-seasonal • Linear vs Non-linear • Univariate vs Multivariate • Chaotic (apparently random and non-periodic) Aime
GOALS OF TIME SERIES ANALYSIS • Descriptive Analysis: identify the trends and patterns in a time series by plotting it or by using more complex techniques • Spectral Analysis: determine how much of the variation in a time series is accounted for by cyclic components (estimating frequencies and separating them from noise) • Forecasting: predict future values from previous behavior (models are built to give predictions within certain confidence limits) • Intervention Analysis: “Change in a time series before and after a certain event” • Explanative Analysis (Cross Correlation): understand the mechanisms behind the observed values. “What is the relationship between two time series datasets?” Aime
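Spectral analysis can be sketched with a periodogram. The squared magnitude of the discrete Fourier transform at each frequency shows how much variance each cyclic component explains.

A few assumptions apply:
- The helper `periodogram` is hypothetical.
- The synthetic monthly series is made up for the example.
- A plain O(n²) DFT is used only to keep the example dependency-free. A real analysis would use an FFT routine.

```python
import cmath
import math


def periodogram(series):
    """Return (frequency, power) pairs for k = 1 .. n//2 using a plain DFT.

    Frequency is in cycles per observation. The mean is removed first so
    the zero-frequency term does not dominate.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        result.append((k / n, abs(coeff) ** 2 / n))
    return result


# Hypothetical monthly series: a strong 12-month cycle plus a weaker 4-month cycle.
data = [5 + 3 * math.sin(2 * math.pi * t / 12) + math.sin(2 * math.pi * t / 4)
        for t in range(120)]
freq, power = max(periodogram(data), key=lambda fp: fp[1])
print(f"dominant period: {1 / freq:.1f} observations")
```

The weaker 4-month cycle also appears as a smaller peak at frequency 1/4, so the periodogram separates the two cyclic components.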
Forecasting Methodology • Forecasting • Causal Models: Regression • Time Series Models: Trend, Seasonal, Cyclical, Random Phuong
Time-Series Method Structure • Time Series Models: Trend Models, Cyclical Variation, Seasonal Variation, Random Variation • Related topics: Markov Model, “Prior”, Error/Noise, Seasonality Phuong
THE NOTION OF “PRIOR” • The prior probability of an event reflects established beliefs about the event before new evidence or information arrives. • It is the unconditional probability assigned before any relevant evidence is taken into account. • It is the mathematical basis for prediction. Pavan
Posterior Probability • Prior probabilities are the original probabilities of an outcome; they are updated with new information to create posterior probabilities. • Bayes' theorem calculates the renormalized pointwise product of the prior and the likelihood function to produce the posterior probability distribution, which is the conditional distribution of the uncertain quantity given the data. Pavan
Bayes' Theorem • $P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$ • This relates the probability of the hypothesis before getting the evidence, $P(H)$, to the probability of the hypothesis after getting the evidence, $P(H \mid E)$. For this reason, $P(H)$ is called the prior probability, while $P(H \mid E)$ is called the posterior probability. The factor that relates the two, $P(E \mid H)/P(E)$, is called the likelihood ratio. Pavan
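Here is a small numeric sketch of the update, using made-up probabilities:
- H is “demand rises next quarter”, with prior P(H) = 0.3.
- The evidence E is a leading indicator turning up, with P(E|H) = 0.8 and P(E|not H) = 0.2.

P(E) comes from the law of total probability. The function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a binary hypothesis H given evidence E.

    prior             = P(H)
    likelihood        = P(E | H)
    likelihood_if_not = P(E | not H)
    """
    # P(E) by the law of total probability
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    # P(H | E) = P(E | H) * P(H) / P(E)
    return likelihood * prior / evidence


p = posterior(prior=0.3, likelihood=0.8, likelihood_if_not=0.2)
print(f"P(H|E) = {p:.3f}")  # prints P(H|E) = 0.632
```

Here P(E) = 0.38. The factor P(E|H)/P(E) ≈ 2.1 therefore roughly doubles the prior, from 0.3 to about 0.63. When more evidence arrives, this posterior can serve as the prior for the next update.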
Example Pavan
Forecasting bias • A forecast bias occurs when there are consistent differences between actual outcomes and previously generated forecasts of those quantities; that is, forecasts may have a general tendency to be too high or too low. • Bias often arises when human judgment or personal beliefs are added to the data. • https://www.youtube.com/watch?v=gn4nRCC9TwQ Pavan
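One simple way to quantify bias is the mean forecast error, using the sign convention e_t = F_t - D_t from the forecasting-error slide:
- A persistent positive mean means forecasts run too high.
- A persistent negative mean means they run too low.

The function and the demand figures below are hypothetical.

```python
def forecast_bias(forecasts, actuals):
    """Mean forecast error with e_t = F_t - D_t.

    Near 0: no systematic bias. Positive: forecasts tend to be too high.
    Negative: forecasts tend to be too low.
    """
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [f - d for f, d in zip(forecasts, actuals)]
    return sum(errors) / len(errors)


actual = [100, 110, 105, 120]       # hypothetical demand D_t
optimistic = [108, 115, 112, 124]   # always too high
scattered = [95, 115, 100, 125]     # misses in both directions

print(forecast_bias(optimistic, actual))  # 6.0 -> consistent over-forecasting
print(forecast_bias(scattered, actual))   # 0.0 -> no bias, though still inaccurate
```

Bias measures a systematic tendency, not accuracy. The scattered forecasts have zero bias yet miss by 5 units every period.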
N-step ahead • Let $D_1, D_2, \ldots, D_n, \ldots$ be the past values of the series to be predicted (e.g., demands). If we are making a forecast during period $t$ (for the future), assume we have observed $D_t, D_{t-1}$, etc. • Let $F_{t,t+\tau}$ = forecast made in period $t$ for the demand in period $t+\tau$, where $\tau = 1, 2, 3, \ldots$ • Then $F_{t-1,t}$ is the forecast made in $t-1$ for $t$, and • $F_{t,t+1}$ is the forecast made in $t$ for $t+1$ (one step ahead). Use the shorthand notation $F_t = F_{t-1,t}$. Pavan
Forecasting Error • The forecast error in period $t$, $e_t$, is the difference between the forecast for demand in period $t$ and the actual value of demand in $t$. • For a multiple-step-ahead forecast: $e_t = F_{t-\tau,t} - D_t$. • For a one-step-ahead forecast: $e_t = F_t - D_t$. Pavan
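Here is a sketch of the notation, assuming a hypothetical naive forecasting rule that repeats the last observed value: F_{t,t+τ} = D_t for every horizon τ. Periods are 1-indexed as on the slide, so D_t is stored at index t - 1. The function names and demand figures are illustrative.

```python
def naive_forecast(demand, t, tau):
    """F_{t,t+tau}: forecast made in period t for period t + tau.

    Hypothetical rule for illustration: repeat the last observed value D_t,
    whatever the horizon tau. Periods are 1-indexed, so D_t is demand[t - 1].
    """
    return demand[t - 1]


def forecast_errors(demand, tau):
    """e_t = F_{t-tau,t} - D_t for every period t that has a tau-step forecast."""
    return {t: naive_forecast(demand, t - tau, tau) - demand[t - 1]
            for t in range(tau + 1, len(demand) + 1)}


demand = [120, 130, 125, 140, 150]      # hypothetical D_1 .. D_5
print(forecast_errors(demand, tau=1))   # {2: -10, 3: 5, 4: -15, 5: -10}
print(forecast_errors(demand, tau=2))   # {3: -5, 4: -10, 5: -25}
```

With τ = 1 this reduces to the one-step error e_t = F_t - D_t. Larger τ leaves fewer periods with a forecast and typically larger errors.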
MARKOV MODEL • Named after the Russian mathematician Andrey Markov [1856 – 1922] • The future state depends only on the current state, not on the events that occurred before it. • The future is independent of the past, given the present. • If you know the exact state of the world now and want to predict the future, knowledge about the past isn't useful, because all knowledge about the past is wrapped up in the current state. • Applications to temporal (sequential) data: • Weather • Finance • Language • Music • Assume discrete time and a discrete state space Ryan
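A minimal Markov chain sketch uses a hypothetical two-state weather model. The distribution over tomorrow's weather comes from today's state and the transition matrix alone. Repeating the step gives multi-step predictions. The transition probabilities are invented for illustration.

```python
STATES = ["Sunny", "Rainy"]
# Hypothetical transition matrix: P[i][j] = P(next = j | current = i).
P = [[0.8, 0.2],
     [0.4, 0.6]]


def step(dist, matrix):
    """One Markov step: the next distribution depends only on the current one."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def predict(dist, matrix, steps):
    """Distribution `steps` periods ahead (dist multiplied by matrix**steps)."""
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist


today = [1.0, 0.0]  # it is sunny now; earlier history adds no information
print(dict(zip(STATES, predict(today, P, 2))))
print(dict(zip(STATES, predict(today, P, 50))))
```

The two-day-ahead forecast is 0.72 sunny and 0.28 rainy. After many steps the prediction no longer depends on today's weather. It converges to the chain's stationary distribution, which for this matrix is 2/3 sunny and 1/3 rainy.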
Hidden Markov Model • "Hidden Markov model is a Markov chain for which the state is only partially observable." • One common use is for speech recognition. • Observed data is speech audio. • Hidden state is the spoken text. • Viterbi algorithm finds the most likely sequence of spoken words from the speech audio. Ryan
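Real speech recognizers are far larger, but the Viterbi recursion itself fits in a few lines. This sketch replaces speech with a standard toy HMM: the weather is hidden and daily activities are observed. The state names, observations, and probabilities are illustrative only.

At each step the algorithm keeps, for every hidden state, two things:
- the probability of the best path ending in that state;
- a back-pointer to the previous state on that path.

It then backtracks from the most probable final state.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, its probability) for an HMM."""
    # Each layer maps state -> (best path probability ending here, previous state).
    layers = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = layers[-1]
        layer = {}
        for s in states:
            # Best predecessor r for reaching state s at this step.
            prob, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (prob * emit_p[s][obs], back)
        layers.append(layer)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: layers[-1][s][0])
    path = [last]
    for layer in reversed(layers[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path, layers[-1][last][0]


states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

path, prob = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path, round(prob, 5))  # ['Sunny', 'Rainy', 'Rainy'] 0.01344
```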
Markov Decision Process • A Markov chain in which state transitions depend on the current state and on an action applied to the system • Closely related to reinforcement learning • Can be solved by value iteration Ryan
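Value iteration repeatedly applies the Bellman update V(s) ← max_a Σ P(s'|s,a)[r + γV(s')] until the values stop changing. It then reads off the greedy policy. The two-state MDP, its rewards, the discount γ = 0.9, and the helper names below are hypothetical.

```python
# Hypothetical MDP: transitions[state][action] = [(probability, next_state, reward), ...]
transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "move": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 1.0)], "move": [(1.0, "A", 0.0)]},
}


def q_value(values, outcomes, gamma):
    """Expected reward plus discounted value of the next state for one action."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Apply the Bellman optimality update until values change by less than tol."""
    values = {s: 0.0 for s in transitions}
    while True:
        new = {s: max(q_value(values, outs, gamma) for outs in actions.values())
               for s, actions in transitions.items()}
        delta = max(abs(new[s] - values[s]) for s in transitions)
        values = new
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: q_value(values, actions[a], gamma))
              for s, actions in transitions.items()}
    return values, policy


values, policy = value_iteration(transitions)
print({s: round(v, 4) for s, v in values.items()}, policy)
# {'A': 9.0, 'B': 10.0} {'A': 'move', 'B': 'stay'}
```

Staying in B earns 1 per step, so V(B) = 1/(1 - 0.9) = 10. The best action in A is to move to B, giving V(A) = 0.9 × 10 = 9.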
Partially Observable Markov Decision Process (POMDP) • The state of the system is only partially observed. • Computationally hard: solving a finite-horizon POMDP exactly is PSPACE-complete (no fast general solution is known) • Useful for agents and robotics. • Markov Random Field (Markov Network) • Generalization of a Markov chain to multiple dimensions. • Each state depends on its neighbors' states in multiple directions, whereas in a Markov chain only the previous state is considered. Ryan
Hierarchical Markov Models • Can be applied to categorize human behavior. • Example: Observations of time & location on campus can be interpreted to determine activity. • At Allen Field House in the afternoon or evening → Watching a basketball event. • At a cafe around noon → Eating lunch Ryan
SEASONALITY Definitions: • Seasonality is a characteristic of a time series in which the data experience regular and predictable changes that recur every calendar year • A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week): days (Monday, Tuesday, …), seasons (Fall, Winter, …), months (Jan, Feb, …), quarters (First, Second, …) Adi
Seasonality • Example of quarterly seasons (figure) • “Seasonal Variation: In time series, that part of the movement which is assigned to the effect of the seasons on the year” Adi
Seasonality • General: • Many time series display seasonality, by which we mean periodic fluctuations. • If seasonality is present, it must be incorporated into the time series model. • For example, glaciers tend to melt in summer, and melting declines after the summer. Thus, a time series of a glacier's mass will typically show mass reduction during summers. Adi
Seasonality • Seasonality Detection Techniques: • The following graphical techniques can be used to detect seasonality: • Run sequence plot; • Seasonal subseries plot; • Multiple box plots; • The autocorrelation plot; Adi
SEASONALITY DETECTION TECHNIQUES • Run sequence plot • A run sequence plot can be used to answer the following questions: • Are there any shifts in location? • Are there any shifts in variation? • Purpose: check for shifts in location and scale, and for outliers • Example figure: “Last Third of Data Shows a Shift of Location” Adi
SEASONALITY DETECTION TECHNIQUES • A seasonal subseries plot can provide answers to the following questions: • Do the data exhibit a seasonal pattern? • What is the nature of the seasonality? • Is there a within-group pattern (e.g., do January and July exhibit similar patterns)? • Are there any outliers once seasonality has been accounted for? • Purpose: a tool for detecting seasonality in a time series; it allows you to detect both between-group and within-group patterns. • Seasonal subseries plot example: a peak in May, a steady decrease through September, then a rise until the May peak. Adi
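The numbers behind a seasonal subseries plot come from grouping observations by their position in the cycle and comparing group means. This is a minimal sketch; the helper `subseries_means` and the two years of monthly sales are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def subseries_means(observations, period=12):
    """Group values by position in the seasonal cycle and average each group.

    `observations` is in time order and its first value falls in season 0
    (e.g., January). Returns {season_index: mean}.
    """
    groups = defaultdict(list)
    for t, value in enumerate(observations):
        groups[t % period].append(value)
    return {season: fmean(vals) for season, vals in sorted(groups.items())}


# Hypothetical two years of monthly data (Jan..Dec, then Jan..Dec).
sales = [10, 12, 15, 18, 22, 20, 17, 15, 13, 12, 11, 10,
         11, 13, 16, 19, 24, 21, 18, 16, 14, 12, 12, 11]
means = subseries_means(sales)
peak = max(means, key=means.get)
print("monthly means:", means)
print("peak month index:", peak)  # 4, i.e. May (0 = January)
```

A plot would draw each month's values as its own small subseries around its mean. Comparing the means across months shows the between-group pattern, and the spread within each month shows the within-group pattern.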
SEASONALITY DETECTION TECHNIQUES • Multiple box plots • Multiple box plots can be used to answer the following questions: • Does the location differ between subgroups? • Does the variation differ between subgroups? • Purpose: check for shifts in location and variation; multiple box plots can be drawn together to compare multiple data sets or to compare groups in a single data set. • Example: this box plot reveals that the machine has a significant effect on energy with respect to location and possibly variation. Adi
SEASONALITY DETECTION TECHNIQUES • The autocorrelation plot • The autocorrelation plot can provide answers to the following questions: • Are the data random? • Is an observation related to an observation twice-removed? (etc.) • Is the observed time series autoregressive? • What is an appropriate model for the observed time series? • Purpose: Check Randomness. This plot shows that the time series is not random, but rather has a high degree of autocorrelation Adi
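The autocorrelation plot shows the sample autocorrelation r_k against the lag k. The sketch below computes r_k and adds a hypothetical helper, `estimate_period`, that takes the first local peak of the ACF as a period estimate. That helps when the seasonal period is not known in advance.

For purely random data, r_k for k ≥ 1 should mostly stay within roughly ±1.96/√n. The series here is synthetic.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def estimate_period(series, max_lag):
    """Hypothetical helper: first lag k >= 1 where the ACF has a local peak."""
    r = acf(series, max_lag)
    for k in range(1, max_lag):
        if r[k - 1] < r[k] > r[k + 1]:
            return k
    return None


# Synthetic monthly series with a 12-month cycle (10 years of data).
data = [5 + math.sin(2 * math.pi * t / 12) for t in range(120)]
r = acf(data, 24)
band = 1.96 / math.sqrt(len(data))  # approximate 95% band for white noise
print(f"r_1 = {r[1]:.2f}, r_12 = {r[12]:.2f}, band = ±{band:.2f}")
print("estimated period:", estimate_period(data, 24))
```

Here r_k lies far outside the white-noise band and rises again at lag 12. That pattern signals a series that is not random and has a 12-month seasonal cycle.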
SEASONALITY DETECTION TECHNIQUES • Considering the best technique: • The run sequence plot is a recommended first step for analyzing any time series. • Seasonality is shown more clearly by the seasonal subseries plot or the box plot. • Both the seasonal subseries plot and the box plot assume that the seasonal periods are known. • If the period is not known, the autocorrelation plot can help. Adi • Reference: Engineering Statistics Handbook, online reference, http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc443.htm
SEASONALITY DETECTION TECHNIQUES: Example • First, run sequence plot: no obvious periodic patterns are apparent. • Second, seasonal subseries plot: the means for each month are relatively close and show no obvious pattern. • Third, box plot: due to the rather large number of observations, the box plot shows the difference between months better than the seasonal subseries plot. Adi
Time Series Models • Additive Model: $X = T + S + C + R$ Where: $X$ = Original Data, $T$ = Trend Value, $S$ = Seasonal Variation, $C$ = Cyclical Variation, $R$ = Random Variation Phuong
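Multiplicative Model • The observed value in a time series is the product of its components • For annual data: $X_i = T_i \times C_i \times R_i$ • For quarterly data: $X_i = T_i \times S_i \times C_i \times R_i$ Where: $T_i$ = Trend, $C_i$ = Cyclical, $R_i$ = Random, $S_i$ = Seasonal Phuong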
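This is a sketch of classical decomposition for both the additive and multiplicative models, under common textbook assumptions:
- A centred moving average over one full period estimates the combined trend-cycle. T and C are not separated here.
- The seasonal index for each position in the cycle is the average of X - trend (additive) or X / trend (multiplicative).
- The indices are normalized to sum to 0 (additive) or average to 1 (multiplicative).

The function names and synthetic quarterly data are hypothetical.

```python
def centred_moving_average(x, period):
    """Trend-cycle estimate; None where the full window does not fit."""
    half = period // 2
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        if period % 2:  # odd period: plain average of `period` values
            trend[t] = sum(window) / period
        else:           # even period: 2 x period MA, end points weighted 1/2
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def seasonal_indices(x, period, model="additive"):
    """Average detrended value for each position in the seasonal cycle."""
    trend = centred_moving_average(x, period)
    groups = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            groups[t % period].append(value - tr if model == "additive" else value / tr)
    raw = [sum(g) / len(g) for g in groups]
    mean_raw = sum(raw) / period
    if model == "additive":
        return [s - mean_raw for s in raw]  # indices sum to 0
    return [s / mean_raw for s in raw]      # indices average to 1


# Synthetic quarterly data (hypothetical): trend plus a fixed seasonal pattern.
additive = [100 + 2 * t + [3, -1, -4, 2][t % 4] for t in range(16)]
multiplicative = [50 * [1.1, 0.9, 0.8, 1.2][t % 4] for t in range(16)]
print([round(s, 3) for s in seasonal_indices(additive, 4)])
print([round(s, 3) for s in seasonal_indices(multiplicative, 4, "multiplicative")])
```

The random component can then be estimated as X - T - S (additive) or X / (T × S) (multiplicative).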
Time Series Methodologies • Time Series → Trend? • No → Smoothing Methods: Moving Average, Exponential Smoothing • Yes → Trend Models: Linear, Quadratic, Exponential, Auto-Regressive Phuong
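The trend-model branch can be illustrated with ordinary least squares:
- Linear and quadratic trends are polynomial fits in t.
- An exponential trend X_t = b0 · b1^t becomes linear after taking logarithms.

The solver (normal equations with Gaussian elimination), its name, and the test data are illustrative assumptions. The autoregressive branch is not covered here.

```python
import math


def polyfit(t, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] of y ≈ Σ b_j t^j."""
    m = degree + 1
    # Normal equations (A^T A) b = A^T y, where A[i][j] = t_i ** j.
    ata = [[sum(ti ** (i + j) for ti in t) for j in range(m)] for i in range(m)]
    aty = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, m):
            f = ata[r][col] / ata[col][col]
            for c in range(col, m):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (aty[i] - sum(ata[i][j] * coef[j] for j in range(i + 1, m))) / ata[i][i]
    return coef


def exponential_trend(t, y):
    """Fit y = b0 * b1**t by a linear fit to ln(y); requires y > 0."""
    c0, c1 = polyfit(t, [math.log(v) for v in y], 1)
    return math.exp(c0), math.exp(c1)


t = list(range(10))
print(polyfit(t, [10 + 3 * ti for ti in t], 1))                 # ≈ [10, 3]
print(polyfit(t, [3 + 2 * ti + 0.5 * ti ** 2 for ti in t], 2))  # ≈ [3, 2, 0.5]
print(exponential_trend(t, [5 * 1.1 ** ti for ti in t]))        # ≈ (5, 1.1)
```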
Moving Average • Graph: sales by year, actual vs. moving average Phuong Source: https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
Moving Average: Example • 3-month MA: (Oct + Nov + Dec)/3 = 258.33 • 6-month MA: (Jul + Aug + … + Dec)/6 = 249.33 • 12-month MA: (Jan + Feb + … + Dec)/12 = 205.33
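Here is the same calculation as a sketch. The n-month moving average is the mean of the last n observations, used as the forecast for the next month. The monthly sales figures behind 258.33, 249.33, and 205.33 are not given. The values below are therefore hypothetical and will not reproduce those numbers.

```python
def moving_average_forecast(values, window):
    """Mean of the last `window` observations (forecast for the next period)."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return sum(values[-window:]) / window


def moving_average_series(values, window):
    """Trailing moving average at every point where a full window is available."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


monthly = [180, 190, 200, 185, 210, 220, 240, 250, 245, 250, 255, 262]  # Jan..Dec, hypothetical
for n in (3, 6, 12):
    print(f"{n}-month MA: {moving_average_forecast(monthly, n):.2f}")
```

A shorter window reacts faster to recent changes. A longer window smooths more but lags behind a trend, which is why the 12-month value sits well below the 3-month value for rising data.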
What about Weighted Moving Averages? • This method looks at past data and, based on logical reasoning, gives some data more importance than others • The weighting factors must add to one • Recent data can be weighted more heavily than older data, or specific data above others • Example: when forecasting staffing for a Tuesday, we could use data from the last four weeks. • Weights: T-1 is .25; T-2 is .20; T-3 is .15; T-4 is .10; and the average of all other days is weighted .30.
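The staffing example can be sketched as a weighted sum using the slide's weights (.25, .20, .15, .10, .30). The staffing figures and the function name are hypothetical. The check that weights add to one mirrors the rule stated above.

```python
import math


def weighted_forecast(values, weights):
    """Weighted moving average: sum of w_i * x_i, with weights that add to one."""
    if len(values) != len(weights):
        raise ValueError("need one weight per value")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must add to one")
    return sum(w * x for w, x in zip(weights, values))


# Weights from the slide: T-1, T-2, T-3, T-4, then the average of all other days.
weights = [0.25, 0.20, 0.15, 0.10, 0.30]
# Hypothetical staffing data in the same order: last four Tuesdays, then other-day average.
history = [20, 18, 16, 14, 10]
print(round(weighted_forecast(history, weights), 2))  # 15.4
```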
Exponential Smoothing • Graph: attendance by year, actual vs. smoothed Phuong Source: https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
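Simple exponential smoothing can be sketched under one common convention:
- S_t = αX_t + (1 - α)S_{t-1}, initialized with S_0 = X_0.
- The last smoothed value is used as the next forecast.

Older observations get geometrically decreasing weights (1 - α)^k. That is what separates this method from the equally weighted moving average. The smoothing constant α and the attendance values are hypothetical.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: S_t = alpha * x_t + (1 - alpha) * S_{t-1}.

    S_0 is initialised to the first observation (one common convention).
    The last smoothed value serves as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


attendance = [10, 20, 30]  # hypothetical
print(exponential_smoothing(attendance, 0.5))  # [10, 15.0, 22.5]
```

A larger α tracks recent changes more closely. A smaller α smooths more heavily.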
Characteristics of Time Series Data • Data are NOT necessarily INDEPENDENT and NOT necessarily IDENTICALLY distributed • ORDERING is very important. • Changing the order could change the meaning of the data -> DEPENDENCY Phuong
Time Series Forecasting Horizons • Long Term • Five years or more into the future • E.g., plant location and product planning • Medium Term • 1 season to 2 years • E.g., sales forecasts • Short Term • 1 day to 1 year or less than 1 season • E.g., staffing levels and inventory levels Phuong
When Should Time Series Analysis Best Be Used? • Deterministic factors are NOT READILY AVAILABLE. • Consider a UNIVARIATE time series – the same variable collected over time. Phuong
How to Apply Time Series Analysis? • Sample a continuous signal at equal time intervals • E.g., human electrocardiography • Accumulate the value of the state variable over each time interval • E.g., daily rainfall • Record processes that are inherently discrete • E.g., trains arriving at the station at discrete moments in time Phuong
Applications of Time Series Analysis • Economic Forecasting • Sales Forecasting • Budgetary Analysis • Stock Market Analysis • Yield Projections • Process and Quality Control • Inventory Studies • Workload Projections • Utility Studies • Census Analysis Phuong
Application Software • Spreadsheets • Microsoft Excel, Quattro Pro, Lotus 1-2-3, etc. • Statistical packages • SPSS, SAS, NCSS, Minitab, etc. • Specialty forecasting packages • Forecast Master, Forecast Pro, etc. Phuong
Examples of Time series data • Number of babies born each hour • Daily closing price of a stock • The U.S. trade balance, recorded each month • A country's GDP, measured each year Phuong
Marketing Example: wine sales of a company • State variable: monthly wine sales (figure: sales by month) Phuong Source: http://home.vicnet.net.au/~norca/Red_Wine.htm