Chapter 2 – ECON 325 Basic Forecasting Tools
Time series data • To reprise this concept, often our historical data will consist of a sequence of observations over time. We call such a sequence a time series. • For example: monthly sales figures, daily stock prices, weekly interest rates, yearly profits, daily maximum temperatures, annual crop production, and electrocardiograph measurements are all time series.
Time series data • In forecasting, we are trying to estimate how the sequence of observations will continue into the future. To make things simple, we will assume that the times of observation are equally spaced. • This is not a great restriction because most business series are measured daily, monthly, quarterly, or yearly and so will be equally spaced.
Graphical summaries • The basic features of the data including patterns and unusual observations are most easily seen through graphs. • Sometimes graphs also suggest possible explanations for some of the variation in the data. • For example: industrial disputes will often affect time series of production; changes in government will affect economic time series; changes in definitions may result in identifiable changes in time series patterns.
Graphical summaries • The type of data will determine which type of graph is most appropriate. • Three common plots are time plots, seasonal plots, and scatter plots.
Time plots • Time plots and time series patterns: For time series, the most obvious graphical form is a time plot in which the data are plotted over time. • A time plot immediately reveals any trends over time, any regular seasonal behavior, and other systematic features of the data.
Time plots • An important step in selecting an appropriate forecasting method is to consider the types of data patterns, so that the methods most appropriate to those patterns can be utilized. • Four types of time series data patterns can be distinguished: horizontal, seasonal, cyclical, and trend.
Time plots • A horizontal (H) pattern exists when the data values fluctuate around a constant mean. • (Such a series is called “stationary” in its mean.) • A product whose sales do not increase or decrease over time would be of this type. • A quality control situation involving sampling from a continuous production process that theoretically does not change would also show a horizontal pattern.
Time plots • A seasonal (S) pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). • Sales of soft drinks and ice cream, as well as household electricity consumption, all exhibit this type of pattern. • Seasonal series are sometimes also called “periodic”, although they do not exactly repeat themselves over each period.
Time plots • A cyclical (C) pattern exists when the data exhibit rises and falls that are not of a fixed period. • For economic series, these are usually due to economic fluctuations such as those associated with the business cycle. • The sales of products such as automobiles, steel, and major appliances exhibit this type of pattern. • The major distinction between a seasonal and a cyclical pattern is that the former is of a constant length and recurs on a regular periodic basis, while the latter varies in length. • Moreover, the average length of a cycle is usually longer than that of seasonality and the magnitude of a cycle is usually more variable than that of seasonality.
Time plots • A trend (T) pattern exists when there is a long-term increase or decrease in the data. • The sales of many companies, the gross national product (GNP), and many other business or economic indicators follow a trend pattern in their movement over time.
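As a rough illustration of three of these patterns, the sketch below simulates a horizontal, a seasonal, and a trend series in base R and draws their time plots. The series length, level, amplitude, slope, noise level, and start date are all arbitrary assumptions chosen for illustration; none of them come from the chapter's data. A cyclical pattern is left out because its varying period is harder to mimic in a few lines.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n_obs <- 48                       # four years of monthly observations (assumed)
time_index <- seq_len(n_obs)
noise <- rnorm(n_obs, sd = 1)

horizontal <- 50 + noise                                       # constant mean
seasonal   <- 50 + 10 * sin(2 * pi * time_index / 12) + noise  # fixed 12-month period
trend      <- 50 + 0.5 * time_index + noise                    # long-term increase

# Combine into one monthly multivariate series; the start date is hypothetical.
patterns <- ts(cbind(horizontal, seasonal, trend),
               frequency = 12, start = c(2000, 1))
plot(patterns, main = "Simulated horizontal, seasonal and trend patterns")
```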
Seasonal plots • Seasonal plots: for time series data that are seasonal, it is often useful to also produce a seasonal plot. • This graph consists of the data plotted against the individual “seasons” in which the data were observed. This is something like a time plot except that the data from each season are overlapped.
Seasonal plots • A seasonal plot enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified. • Seasonal subseries plots are an alternative plot where the data for each season are collected together in separate mini time plots.
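Both kinds of plot can be sketched with base R alone. The example below uses AirPassengers, a monthly series that ships with R; it is an assumed stand-in, not one of the chapter's data sets.

```r
# AirPassengers: monthly totals for 1949-1960, shipped with base R.
y <- AirPassengers

# Seasonal plot: one line per year, all drawn over the same January-December axis.
by_year <- matrix(as.numeric(y), nrow = 12)   # rows = months, columns = years
matplot(by_year, type = "l", lty = 1,
        xlab = "Month", ylab = "Passengers (thousands)", main = "Seasonal plot")

# Seasonal subseries plot: one mini time plot per month.
monthplot(y, ylab = "Passengers (thousands)", main = "Seasonal subseries plot")
```

Overlapping the years makes the shared within-year shape visible, while the subseries plot shows how each month has changed from year to year.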
Scatter plots • Some forecast data are not time series, and therefore time or seasonal plots are inappropriate. However, these data are well suited to a scatterplot, in which the variable we wish to forecast is plotted against one of the explanatory variables. • In some circumstances, categorical (qualitative) variables can also be incorporated into a scatter plot; points in the plot would be categorized by color or icon shape/size in these cases.
Scatter plots • When there are several potential predictor variables, it is useful to plot each variable against each other variable. These plots can be arranged in a scatterplot matrix. • The value of the scatterplot matrix is that it enables a quick view of the relationships between all pairs of variables. Outliers can also be seen.
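A minimal sketch of a scatterplot matrix in base R follows. It uses the built-in mtcars data and an arbitrary choice of four columns purely for illustration; these are not the chapter's variables.

```r
# mtcars ships with base R; the four columns are an arbitrary choice.
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# One scatterplot for every pair of variables.
pairs(vars, pch = 19, main = "Scatterplot matrix")

# The matching matrix of pairwise correlations.
cor_matrix <- round(cor(vars), 2)
cor_matrix
```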
Numerical summaries • In addition to graphics, it is also helpful to provide numerical summaries. A summary number for a data set is called a statistic. • For a single data set (univariate data) or a single time series, the most common descriptive statistics are the mean, the standard deviation, and the variance. • In addition, in forecasting we also frequently make use of the median (or other percentile value), as well as related concepts such as the inter-quartile range (IQR).
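Numerical summaries
# Keep only the rows with Litres below 2
fuel2 <- fuel[fuel$Litres < 2, ]
# Minimum, quartiles, median, mean, and maximum of Carbon
summary(fuel2[, "Carbon"])
# Standard deviation of Carbon
sd(fuel2[, "Carbon"])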
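To make the univariate statistics concrete, here is a small worked example on a made-up vector (the seven values are hypothetical and chosen so the arithmetic is easy to check by hand).

```r
x <- c(2, 4, 4, 5, 7, 9, 11)   # hypothetical data

x_mean   <- mean(x)     # 6: the sum 42 divided by 7 observations
x_median <- median(x)   # 5: the middle value of the sorted data
x_var    <- var(x)      # 10: squared deviations sum to 60, divided by n - 1 = 6
x_sd     <- sd(x)       # square root of the variance
x_iqr    <- IQR(x)      # 4: third quartile (8) minus first quartile (4)
```

The mean and median describe the center of the data; the variance, standard deviation, and IQR describe its spread.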
Numerical summaries • For a pair of random variables (bivariate data) it is of interest to describe how the two data sets relate to each other. The most widely used summary numbers (statistics) for this purpose are the covariance and the correlation.
Numerical summaries • Correlation (r) measures the strength of the linear relationship between two variables, x and y. It is possible for data to have a strong non-linear relationship, but low correlation, so you should always plot the data you’re analyzing. • The values for r always lie between -1 and 1, with values closer to -1 indicating a stronger negative relationship, and values closer to 1 indicating a stronger positive relationship.
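The warning about non-linear relationships can be checked with a tiny made-up example: y is completely determined by x, yet the correlation is zero because the relationship is symmetric rather than linear.

```r
x <- -3:3      # hypothetical values, symmetric around zero
y <- x^2       # y depends exactly on x, but not linearly

r_xy <- cor(x, y)   # zero, up to floating-point error
plot(x, y, pch = 19, main = "Strong non-linear relationship, zero correlation")
```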
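Numerical summaries • The correlation equation is: $r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$ • Note that the equation can also be written as $r = \mathrm{Cov}_{xy}/(S_x S_y)$, where $\mathrm{Cov}_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$ is the covariance and $S_x$ and $S_y$ are the standard deviations of x and y.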
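The covariance-over-standard-deviations form can be verified numerically. The sketch below uses five made-up pairs of values and assumes the usual sample definitions (dividing by n - 1), which is what R's cov() and sd() use.

```r
x <- c(1, 2, 3, 4, 5)              # hypothetical data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)   # hypothetical data
n <- length(x)

# Sample covariance: average cross-product of deviations, divided by n - 1.
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation: covariance scaled by the two standard deviations.
r_by_hand <- cov_xy / (sd(x) * sd(y))

all.equal(r_by_hand, cor(x, y))    # the built-in function agrees
```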
Numerical summaries • For a single time series, it is very useful to compare the observation at one time period with the observation at another time period. The two most common statistics here are the autocovariance and the autocorrelation. • Autocorrelation measures the linear relationship between lagged values of a time series. There are several autocorrelation coefficients depending on the length of the lag selected. For example, $r_1$ measures the relationship between $y_{t-1}$ and $y_t$.
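Numerical summaries • Autocorrelation equation: $r_k = \frac{\sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T}(y_t-\bar{y})^2}$ • Autocovariance: $c_k = \frac{1}{T}\sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y})$, so that $r_k = c_k/c_0$.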
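A short sketch of the lag-k autocorrelation computed by hand, assuming the common definition in which the lag-k autocovariance sums the products of deviations from the overall mean and divides by the series length, and the autocorrelation is that value divided by the lag-0 autocovariance. The series and the helper name autocorr are made up for illustration; the result is compared with base R's acf().

```r
# Hypothetical helper: lag-k autocorrelation of a numeric vector.
autocorr <- function(y, k) {
  n_obs <- length(y)
  y_bar <- mean(y)
  # Autocovariance at lag k and at lag 0, both divided by the series length.
  c_k <- sum((y[(k + 1):n_obs] - y_bar) * (y[1:(n_obs - k)] - y_bar)) / n_obs
  c_0 <- sum((y - y_bar)^2) / n_obs
  c_k / c_0
}

y <- c(5, 7, 6, 9, 12, 10, 13, 15)   # hypothetical series

r_by_hand <- sapply(1:3, function(k) autocorr(y, k))
r_builtin <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]   # element 1 is lag 0

all.equal(r_by_hand, r_builtin)
```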
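Numerical summaries
# Quarterly beer production from 1992 up to (but not including) 2006
beer2 <- window(ausbeer, start=1992, end=2006-.1)
# Scatterplots of the series against its own lags 1 to 9
lag.plot(beer2, lags=9, do.lines=FALSE)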
Numerical summaries • The autocorrelation coefficients are typically plotted in a correlogram or autocorrelation function (ACF). The R code for this example is simply Acf(beer2). • Time series that show no autocorrelation are called “white noise”.
set.seed(30)
x <- ts(rnorm(50))
plot(x, main="White noise")
Acf(x)
• The dashed lines shown on the correlogram are equal to $\pm 2/\sqrt{T}$, where T is the length of the series; for white noise, one would expect 95% of all of the autocorrelations to lie within this band.
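The white-noise check can be sketched in base R. This assumes the band sits at plus or minus 2 divided by the square root of the series length, and uses stats::acf() rather than the forecast package's Acf(); the choice of 15 lags is arbitrary.

```r
set.seed(30)
x <- rnorm(50)            # simulated white noise
n_obs <- length(x)

# Assumed band: +/- 2 / sqrt(T), with T the length of the series.
bound <- 2 / sqrt(n_obs)

# Sample autocorrelations at lags 1 to 15 (lag 0 is dropped; it is always 1).
r <- acf(x, lag.max = 15, plot = FALSE)$acf[-1]

# Share of the autocorrelations that fall inside the band.
share_inside <- mean(abs(r) < bound)
share_inside
```

For a white-noise series this share should be close to 0.95; a much lower share suggests the series is not white noise.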
Exercise #3.1 • Consider the data in “running”, showing running times and maximal aerobic capacity for 14 female runners. • Calculate the mean, median, interquartile range, and standard deviation for each variable. • Which of these statistics give a measure of the center of data and which give a measure of the spread of data? • Calculate the correlation of the two variables and produce a scatterplot. • Why is it inappropriate to calculate the autocorrelation of these data?
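Exercise #3.1
# Quartiles, median, and mean for each variable
summary(running)
# Standard deviation of each variable
sd(running[, "capacity"])
sd(running[, "times"])
# Correlation matrix and scatterplot
cor(running)
plot(times ~ capacity, data = running, pch = 19, col = 2)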
Simple forecasting methods • There are a few very simple, and yet often quite effective, forecasting methods. • Average method • Naïve method • Seasonal naïve • Drift method
Simple forecasting methods Average (mean) method: the forecasts of all future values are equal to the mean of the historical data. If we let the historical data be denoted by y1,…,yT, then $\hat{y}_{T+h|T} = \bar{y} = (y_1 + \dots + y_T)/T$.
meanf(y, h)
# y contains the time series
# h is the forecast horizon
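What the average method produces can be reproduced by hand in base R. The series and horizon below are made up for illustration.

```r
y <- c(10, 12, 14, 16)   # hypothetical historical data
h <- 3                   # forecast horizon

# Every future value is forecast as the mean of the history.
mean_forecast <- rep(mean(y), h)
mean_forecast            # 13 13 13
```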
Simple forecasting methods Naïve method: appropriate only for time series data; all forecasts are simply set to the value of the last observation. That is, the forecasts for all future values are set to be yT, where yT is the last observed value.
naive(y, h)
rwf(y, h) # Alternative
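The naïve point forecasts are equally easy to write out in base R; the data and horizon are hypothetical.

```r
y <- c(10, 12, 14, 16)   # hypothetical historical data
h <- 3                   # forecast horizon

# Every future value is forecast as the last observed value.
naive_forecast <- rep(y[length(y)], h)
naive_forecast           # 16 16 16
```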
Simple forecasting methods Seasonal naïve: A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season of the year (e.g., the same month of the previous year).
snaive(y, h)
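A hand-rolled sketch of the seasonal naïve point forecasts, assuming quarterly data (seasonal period m = 4). The eight values are made up; the idea is simply to repeat the last full year of observations.

```r
y <- c(5, 9, 12, 7,    # hypothetical year 1, quarters 1-4
       6, 10, 13, 8)   # hypothetical year 2, quarters 1-4
m <- 4                 # seasonal period (quarterly data, assumed)
h <- 6                 # forecast horizon

# Take the last m observations and recycle them for h steps, so each forecast
# equals the most recent observation from the same quarter.
last_season <- tail(y, m)
snaive_forecast <- rep(last_season, length.out = h)
snaive_forecast        # 6 10 13 8 6 10
```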
Simple forecasting methods Drift method: a variation on the naïve method is to allow forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in historical data. This is equivalent to drawing a line between the first and last observation, and extrapolating it into the future.
rwf(y, h, drift=TRUE)
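The equivalence between "average historical change" and "line through the first and last observation" can be checked on a small made-up series.

```r
y <- c(10, 12, 11, 15, 16)   # hypothetical historical data
h <- 3                       # forecast horizon
n_obs <- length(y)

# Drift as the average of the period-to-period changes ...
drift_avg <- mean(diff(y))
# ... equals the slope of the line joining the first and last observations.
drift_line <- (y[n_obs] - y[1]) / (n_obs - 1)

# Forecasts extend that line from the last observation.
drift_forecast <- y[n_obs] + (1:h) * drift_line
drift_forecast               # 17.5 19.0 20.5
```

The two drift estimates agree because the intermediate changes cancel when summed, leaving only the difference between the last and first observations.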
Transformations and adjustments • Adjusting the historical data can often lead to a simpler forecasting model. • The purpose of transformations and adjustments is to simplify the patterns in the historical data by removing known sources of variation or by making the pattern more consistent across the whole data set. • Simpler patterns usually lead to more accurate forecasts.
Transformations and adjustments • Mathematical transformations: If the data show variation that increases or decreases with the level of the series, then a transformation can be useful. • For example, a logarithmic transformation is often useful. If we denote the original observations as y1,…,yT and the transformed observations as w1,…,wT, then wt=log(yt). • Logarithms are useful because they are interpretable: changes in a log value are relative (or percentage) changes on the original scale. So if log base 10 is used, then an increase of 1 on the log scale corresponds to a multiplication of 10 on the original scale. • Another useful feature of log transformations is that they constrain the forecasts to stay positive on the original scale.
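These properties of logarithms can be illustrated with a few lines of R. The series is hypothetical and grows by exactly 10% per period, so each step on the log scale is the same size.

```r
y <- c(100, 110, 121)            # hypothetical series growing 10% per period

# Equal relative changes become equal absolute changes on the log scale.
log_steps <- diff(log(y))        # both steps equal log(1.1), about 0.0953

# With base 10, a step of 1 on the log scale is a factor of 10 on the original scale.
log10_step <- log10(1000) - log10(100)

# Back-transforming any log-scale value, even a negative one, gives a positive number.
back_transformed <- exp(-2)
```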
Transformations and adjustments Sometimes other transformations are also used (although they are not so interpretable). For example, square roots and cube roots can be used. These are called power transformations because they can be written in the form $w_t = y_t^p$.
Transformations and adjustments A useful family of transformations that includes logarithms and power transformations is the family of "Box-Cox transformations", which depend on the parameter λ and are defined as follows: $w_t = \log(y_t)$ if $\lambda = 0$, and $w_t = (y_t^{\lambda} - 1)/\lambda$ otherwise. The logarithm in a Box-Cox transformation is always a natural logarithm (i.e., to base e). So if λ=0, natural logarithms are used, but if λ≠0, a power transformation is used followed by some simple scaling.
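A minimal base-R sketch of the transformation, assuming the standard Box-Cox definition (natural log when λ = 0, otherwise the power minus one, divided by λ). The function name box_cox and the test values are made up for illustration; it also shows why the log is the natural choice at λ = 0, since the power form approaches it as λ shrinks.

```r
# Hypothetical helper implementing the standard Box-Cox definition.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(2, 5, 10, 50)             # hypothetical positive observations

w_log   <- box_cox(y, 0)         # identical to log(y)
w_sqrt  <- box_cox(y, 0.5)       # a rescaled, shifted square root
w_small <- box_cox(y, 1e-6)      # tiny lambda: very close to log(y)

max(abs(w_small - w_log))        # small, so the family is continuous at lambda = 0
```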
Transformations and adjustments
plot(log(elec), ylab="Transformed electricity demand", xlab="Year", main="Transformed monthly electricity demand")
title(main="Log", line=-1)
A good value of λ is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler. In this case, λ=0.30 works quite well, although any value of λ between 0 and 0.5 would give similar results.
Transformations and adjustments
# The BoxCox.lambda() function will choose a value of lambda for you.
lambda <- BoxCox.lambda(elec) # = 0.27
plot(BoxCox(elec,lambda))
Transformations and adjustments Having chosen a transformation, we need to forecast the transformed data. Then, we need to reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformation is given by: $y_t = \exp(w_t)$ if $\lambda = 0$, and $y_t = (\lambda w_t + 1)^{1/\lambda}$ otherwise.
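The back-transformation can be sketched as a round trip: transform, then reverse, and confirm the original values come back. This assumes the standard Box-Cox definition and its algebraic inverse; the helper names and data are hypothetical.

```r
# Hypothetical helpers: standard Box-Cox transformation and its inverse.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(2, 5, 10, 50)             # hypothetical positive observations

# Transform and back-transform for a few values of lambda.
round_trip_ok <- sapply(c(0, 0.3, 1), function(lambda) {
  w <- box_cox(y, lambda)
  isTRUE(all.equal(inv_box_cox(w, lambda), y))
})
round_trip_ok                    # TRUE for every lambda tried
```

In a forecasting workflow the same inverse would be applied to forecasts made on the transformed scale.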
Transformations and adjustments • For many series, transformations have little effect on forecast accuracy. • This is because most forecast methods place more weight on the most recent data. Therefore, earlier, typically smaller, variations are unlikely to influence the forecast very much. • Only when the series is rapidly changing in variation will mathematical transformations make a larger difference to the forecast. • However, some of the measures of forecast accuracy give equal weight to all data and so prediction intervals will be affected by transformations.
Transformations and adjustments • Calendar transformations: Some variation seen in seasonal data may be due to simple calendar effects. In such cases, it is usually much easier to remove the variation before fitting a forecasting model. • For example, if you are studying monthly milk production on a farm, then there will be variation between the months simply because of the different numbers of days in each month in addition to seasonal variation across the year.
Transformations and adjustments
# Days in each month, repeated for 14 years of monthly data
monthdays <- rep(c(31,28,31,30,31,30,31,31,30,31,30,31), 14)
# Set February to 29 days in the three leap years covered
monthdays[26 + (4*12)*(0:2)] <- 29
par(mfrow=c(2,1))
plot(milk, main="Monthly milk production per cow", ylab="Pounds", xlab="Years")
plot(milk/monthdays, main="Average milk production per cow per day", ylab="Pounds", xlab="Years")
Transformations and adjustments • A similar adjustment can be done for sales data when the number of trading days in each month will vary. In this case, the sales per trading day can be modelled instead of the total sales for each month. • Population adjustments: Any data that are affected by population changes can be adjusted to give per-capita data. • Inflation adjustments: Data that are affected by the value of money are best adjusted before modelling. Financial time series are usually adjusted so all values are stated in dollar values from a particular year.
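Population and inflation adjustments are simple divisions. The sketch below uses entirely made-up figures (nominal sales, a price index with the first year as base, and a population count) to show the arithmetic; none of the numbers come from real data.

```r
# All values are hypothetical, one per year.
nominal_sales <- c(100, 110, 121)       # sales in current dollars
price_index   <- c(100, 105, 110.25)    # price index, first year = 100
population    <- c(50, 52, 54)          # population in thousands

# Inflation adjustment: restate every year in first-year dollars.
real_sales <- nominal_sales / price_index * price_index[1]

# Population adjustment: sales per thousand people.
sales_per_capita <- nominal_sales / population

round(real_sales, 2)   # real growth is slower than nominal growth
```

Here nominal sales grow 10% per year while prices grow 5% per year, so the adjusted series grows by only about 4.8% per year.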