640 likes | 1.28k Views
2. 04/05/2011ESTP course. Identification of type of OutliersCalendar Effects and its determinantsX-12 ARIMA and TRAMO/SEATS. Session 2 May 4th. Dario Buono, Enrico Infante. 3. Outliers. Outliers are data which do not fit in the tendency of the Time Series observed, which fall outside the range
E N D
2. 2 Identification of type of Outliers
Calendar Effects and its determinants
X-12 ARIMA and TRAMO/SEATS
3. 3 Outliers Outliers are data which do not fit in the tendency of the Time Series observed, which fall outside the range expected on the basis of the typical pattern of the Trend and Seasonal Components
Additive Outlier (AO): the value of only one observation is affected. AO may either be caused by random effects or due to an identifiable cause as a strike, bad weather or war
Temporary Change (TC): the value of one observation is extremely high or low, then the size of the deviation reduces gradually (exponentially) in the course of the subsequent observations until the Time Series returns to the initial level
Level Shift (LS): starting from a given time period, the level of the Time Series undergoes a permanent change. Causes could include: change in concepts and definitions of the survey population, in the collection method, in the economic behavior, in the legislation or in the social traditions
4. 4 Outliers
5. 5 Outliers
6. 6 Outliers
7. 7 Outliers
8. 8 Outliers
9. 9 TRAMO & X-12 ARIMA use regARIMA models where Outliers are regressors
Xt is the raw series, Zt is the “linearized” series (corrected from Outliers and other effects)
Zt follows an ARIMA(p,d,q):
where ? is the rupture date, ? is the rupture impact and ?(B) models the kind of rupture (its “shape”) Outliers
10. 10 Outliers
Automatic and iterative detection and estimation process are available in TRAMO and X-12 ARIMA
11. 11 Outliers The smoothness of series can be decided by statisticians and the policy must be defined in advance
Consult the users
This choice can influence dramatically the credibility
Outliers in last quarter are very difficult to be identified
Some suggestions:
Look at the growth rates
Conduct a continuous analysis of external sources to identify reasons of Outliers
Where possible always add an economic explanation
Be transparent (LS, AO,TC)
12. 12 Calendar Effects Time Series: usually a daily activity measured on a monthly or quarterly basis only
Flow: monthly or quarterly sum of the observed variable
Stock: the variable is observed at a precise date (example: first or last day of the month)
Some movements in the series are due to the variation in the calendar from a period to another
Can especially be observed in flow series
Example: the production for a month often depends on the number of days
13. 13 Calendar Effects Trading Day Effect
Can be observed in production activities or retail sale
Trading Days (Working Days) = days usually worked according to the business uses
Often these days are non public holiday weekdays (Monday, Tuesday, Wednesday, Thursday, Friday)
Production usually increases with the number of working days in the month
14. 14 “Day of the week” effect
Example: Retail sale turnover is likely to be more important on Saturdays than on other weekdays
Statutory (Public) Holidays and Moving Holidays
Most of statutory holidays are linked to a date, not to a day of the week (Christmas)
Some holidays can move across the year (Easter, Ramadan) and their effect is not completely seasonal
? Months and quarters are not equivalent and not directly comparables Calendar Effects
15. 15 Calendar Effects – Trading Days
16. 16 Calendar Effects – Trading Days
17. 17 Calendar Effects – Trading Days
18. 18 Calendar Effects can be partly considered seasonal:
The number of Trading Days is almost always smaller in February than in March
Some months may have more public holidays (ex: May in France), and therefore less trading days than others
Thus, we can:
Either estimate the non seasonal Calendar Effects
Make the correction of both Seasonal and Calendar Effects
Spectra of the “Weekday” series: number of Mondays, Tuesdays, …, Friday per month Calendar Effects – Trading Days
19. 19 Average length of a year: 365.25 days
Average length of a month:
Average number of weeks per month: Calendar Effects – Trading Days
20. 20 Calendar Effects – Trading Days
21. 21 Calendar Effects – Trading Days
22. 22 Hypothesis: for the most common models, it is assumed that the effect of a day is constant over the span of the series
Basic model:
Ni,t = # of Mondays (i=1), …, Sundays (i=7) in month t
?i is the effect of day i (ex: average production for a Monday (i=1)) Calendar Effects – Trading Days
23. 23 In some cases (particular activities) you could have to make a distinction between working and non working Mondays, Tuesdays etc.
You can of course adapt the model to other more simple or more complex cases Calendar Effects – Trading Days
24. 24 Certain groupings could be more sensible than others:
Public holidays
Non working weekdays
Trading days and weekends
Etc.
Much better to use contrasts (W. Bell):
where:
? is the average effect of a day (mean of the ? and ?)
mt is the length of month t
Calendar Effects – Trading Days
25. 25 The variables take into account a part of Seasonality: the length of the month is seasonal, the White Monday is always a Monday etc.
If we want to estimate the non seasonal Calendar Effects, we can remove from the variables their long-term average
In this case, the explanatory variables do not have any Trend or Seasonality. Thus, the dependant variable must be stationary, an Irregular Component for example (X-11), or a regARIMA model should be used
X-12 ARIMA and TRAMO-SEATS Calendar Effects – Trading Days
26. 26 Coffee-Break!!!
27. 27 Calendar Effects – Moving Holidays The date of some public holidays (usually religious events) moves across the year and can affect different months or quarters according to the year
Examples: Ramadan, Chinese New Year, Easter (Catholic or Orthodox) etc.
The effect of the event could be gradual and could affect the days before or after the event itself: Chocolate sales around Easter
We focus in the following on the treatment of Easter but the methodology and the models can be easily adapted
28. 28 Calendar Effects – Moving Holidays Catholic Easter
Sunday Easter can fall between March 22 (last time: 1818, next time: 2285) and April 25 (last time: 1943, next time: 2038) i.e. during the first or second quarter
Orthodox Easter
Between April 4 (last time: 1915, next time: 2010) and May 9 (last time before 1600, next time: 2173) i.e. always during the third quarter
Easter affects the production of many economic sectors:
Statutory holidays (transportation, hotel frequentation), consumption habits (chocolate, flowers, lamb meat, etc.)
29. 29 Calendar Effects – Moving Holidays
30. 30 Calendar Effects – Moving Holidays
31. 31 Calendar Effects – Moving Holidays Usually models are based on the number of days “affected” by Easter in the various months (March or April for Catholic Easter)
Immediate Impact
Gradual Impact: effect on the n days preceding Easter
Easter can affect the w preceding days; the impact can therefore affect 2 months
The impact is supposed constant on the w days
32. 32 Calendar Effects – Moving Holidays The 2 Easter models are estimated using a regARIMA model:
Where:
Xt represents a regressor modeling Calendar Effects, Outliers (AO, LS, TS) and other user-specified effects including Easter Effects
Dt represents a priori adjustments such as leap year, strikes, etc.
33. 33 Calendar Effects – Moving Holidays The w-day period, during which the activity is affected, does NOT include the Sunday Easter. The activity level remains constant on the w-day period
34. 34 Calendar Effects – Moving Holidays If i denotes the year and j the month, let us note and the number of the w days falling in month j of year i
The regressor associated to this model is:
The regressor values are 0 except for February, March and April depending on the value of w
35. 35 Calendar Effects – Moving Holidays
36. 36 Calendar Effects – Moving Holidays
37. 37 Orthodox Easter falls more in April than in May (83% versus 17%)
Catholic Easter falls more in April than in March (78% versus 22%)
There is a “Seasonal Effect” one can correct by removing the long-term monthly average: Calendar Effects – Moving Holidays
38. 38 Calendar Effects
Definitions:
The Seasonally Adjusted series is logically defined as the raw series from which the Seasonality (St) has been removed
The “Working Day” adjusted series is defined as the raw data from which the Calendar Effects (TDt, MHt) have been removed
This definition can vary if you include or not the Calendar Effects
39. 39 X-12 ARIMA VS TRAMO/SEATS Seasonal Adjustment is usually done with an off-the-shelf program. Three popular tools are:
X-12 ARIMA (Census Bureau)
TRAMO/SEATS (Bank of Spain)
DEMETRA+ (Eurostat), interface X-12 ARIMA and Tramo/Seats
X-12 ARIMA is Filter based: always estimate a Seasonal Component and remove it from the series even if no Seasonality is present, but not all the estimates of the Seasonally Adjusted series will be good
TRAMO/SEATS is model based method variants of decomposition of Time Series into non-observed components
40. 40 X-12 ARIMA Auto-projective Models
What is f ?
Under certain hypothesis (Stationary), it is possible to find an “ARMA function” which gives a good approximation of f
These models usually depend on a few number of parameters only
Box et Jenkins methodology allows to estimate the parameters and gives quality measures of the adjustment
41. 41 X-12 ARIMA AutoRegressive Model of order p AR(p):
Moving Average Model of order q MA(q):
42. 42 X-12 ARIMA ARMA(p,q):
Corner method, triangle method, ODQ etc.
ARIMA(p,d,q):
Non stationary Time Series, with a Trend
SARIMA(p,d,q)(P,D,Q)S:
Non stationary Time Series, with Trend and Seasonality
43. 43 X-12 ARIMA A regARIMA model is a regression model with ARIMA errors. When we use regression models to estimate some of the components in a Time Series, the errors from the regression model are correlated, and we use ARIMA models to model the correlation in the errors. ARIMA models are one way to describe the relationships between points in a Time Series
Besides using regARIMA models to estimate regression effects (such as outliers, Trading Day, and Moving Holidays), we also use ARIMA models to forecast the series. Research has shown that using forecasted values gives smaller revisions at the end of the series
44. 44 X-12 ARIMA X-12 ARIMA runs through the following steps:
The series is modified by any user defined prior adjustments
The program fits a regARIMA model to the series in order to detect and adjust for outliers and other distorting effects to improve forecasts and Seasonal Adjustment
It detect and estimates additional component (e.g. Calendar Effects) and extrapolate forward (forecast) and backwards (backcast)
The program then uses a series of Moving Averages to decompose a Time Series into three components. It does this in three iterations, getting successively better estimates of the components. During these iterations extreme values are identified and replaced
In the last step a wider range of diagnostic statistics are produced, describing the final Seasonal Adjustment, and giving pointers to possible improvements which could be made
45. 45 Moving Averages A Moving Average of order p+f+1 and coefficients {?i} is defined by:
The value at date t is therefore replaced by a weighted average of p past values, the current value and f future values
Examples: simple moving averages of order 3
46. 46 Moving Averages Let us suppose the following decomposition model:
One can want remove the Seasonality and the Irregular by using a Moving Average:
So, we want to cancel some frequencies!!
47. 47 Moving Averages We would like to find a Moving Average which preserve the Trend and removes both Seasonality and Irregular
Example (preservation of constants): let us define a Series
and a Moving Average M
The coefficients must sum to 1
Property: if you want to preserve polynomials of degree d, the Moving Average coefficients must verify:
48. 48 A Filter is a weighted average where the weights sum to 1
Seasonal Filters are the filters used to estimate the Seasonal Component. Ideally, Seasonal Filters are computed using values from the same month or quarter, for example, an estimate for January would come from a weighted average of the surrounding Januaries
The Seasonal Filters available in X-12 ARIMA consist of seasonal Moving Averages of consecutive values within a given month or quarter. An n x m Moving Average is an m-term simple average taken over n consecutive sequential spans X-12 ARIMA
49. 49 An example of a 3x3 filter (5 terms) for January 2003 (or Quarter 1, 2003) is:
2001.1 + 2002.1 + 2003.1 +
2002.1 + 2003.1 + 2004.1 +
2003.1 + 2004.1 + 2005.1
9
An example of a 3x5 filter for January 2003 (or Quarter 1, 2003) is:
2000.1 + 2001.1 + 2002.1 + 2003.1 + 2004.1 +
2001.1 + 2002.1 + 2003.1 + 2004.1 + 2005.1 +
2002.1 + 2003.1 + 2004.1 + 2005.1 + 2006.1
15 X-12 ARIMA
50. 50 X-12 ARIMA Trend Filters are weighted averages of consecutive months or quarters used to estimate the trend component
An example of a 2x4 filter (5 terms) for First Quarter 2005:
2004.3 + 2004.4 + 2005.1 + 2005.2
2004.4 + 2005.1 + 2005.2 + 2005.3 _______________________________________
8
Notice that we are using the closest points, not just the closest points within the First Quarter like with the Seasonal Filters above
Notice also that every quarter has a weight of 1/4, though the Third Quarter uses values in both 2004 and 2005
51. 51 X-12 ARIMA Keep in mind that the data used in the Seasonal and the Trend Filters can go back several years. Let's look at an example using X-12 ARIMA's seasonal Moving Average Filters: if the last point in the series is January 2006, and you're using 3x5 Seasonal Filters, the value at January 2006 will effect the estimates for Januaries in 2004, and 2005. You can see the value for January 2003 in the equations below
The 3x5 filter for January 2004:
2001.1+2002.1+2003.1+2004.1+2005.1
2002.1+2003.1+2004.1+2005.1+2006.1
2003.1+2004.1+2005.1+2006.1+2007.1
_____________________________________________________________
15
52. 52 X-12 ARIMA Step 1.1: Initial Trend Estimate
Compute a centred 12 term (13 term) Moving Average as a first estimate of the Trend:
The first Trend estimation is always 2x12 or 2x4
Step 1.2: Initial Seasonal-Irregular component or “SI Ratio”
The ratio of the original series to the estimated trend is the first estimate of the de-trended series:
53. 53 X-12 ARIMA Step 1.3: Initial Preliminary Seasonal Factor
5 term weighted Moving Average (3x3) is calculated for each month of the Seasonal-Irregular ratios (SI) to obtain preliminary estimates of the Seasonal Factors:
Step 1.4: Initial Seasonal Factor
Crude “unbiased” seasonal from step 1.3 via centering:
Step 1.5: Initial Seasonal Adjustment
54. 54 X-12 ARIMA Step 2.1: Intermediate Trend
Calculate an intermediate Trend (Henderson) of length 2H+1 for data-determined H
where are the (2H+1) term Henderson weights
Step 2.2: Final SI Ratios
Calculate the de-trended series from Henderson trend:
55. 55 X-12 ARIMA Step 2.3: Preliminary Seasonal Factor
Calculate final “biased” seasonal factors via a “3x5” Seasonal Moving Average:
Step 2.4: Seasonal Factor
Calculate final “unbiased” seasonal factors via centering:
Step 2.5: Seasonal Adjustment
56. 56 X-12 ARIMA Step 3.1: Final Trend
Final Trend from a Henderson Trend Filter determined from data:
Step 3.2: Final Irregular
Final Irregular Factors as ratios between the Seasonally Adjusted series from Stage 2 and the final Trend from Step 3.1:
57. 57 X-12 ARIMA
58. 58 X-12 ARIMA The first Trend estimation (in Stage 1) is always 2x12 or 2x4
The Henderson Trend Filter choices (Step 2.1, 3.1) are based on noise-to-signal ratios, the size of the irregular variations relative to those of the Trend and labelled I/C in the X-12 ARIMA output
If I/C<1, the 9 term Henderson Filter (H=4) is used; otherwise, in Stage 2, the 13 term filter (H=6) is used
In Stage 3, the 13 term Filter is used when 1<I/C<3.5, but the 23 term Henderson Filter (H=11) is used when I/C=3.5
59. 59 X-12 ARIMA The criterion for selection of the seasonal Moving Average is based on the global I/S, which measures the relative size of irregular movements and seasonal movements averaged over all months or quarters. It is used to determine what seasonal Moving Average is applied using the following criteria:
The global I/S ratio is calculated using data that ends in the last full calendar year available
If 2.5<I/S<3.5 or 5.5<I/S<6.5 then the I/S ratio will be calculated using one year less of data to see if the I/S ratio than falls into one of the ranges given above. The year removing is repeated either until the I/S ratio falls into one of the ranges or after five years a 3x5 Moving Average will be used
60. 60 TRAMO/SEATS The objective of the procedure is to automatically identify the model fitting the Time Series and estimate the model parameters. This includes:
The selection between additive and multiplicative model types (log-test)
Automatic detection and correction of outliers, eventual interpolation of missing values
Testing and quantification of the Trading Day effect
Regression with user-defined variables
Identification of the ARIMA model fitting the Time Series, that is selection of the order of differentiation (unit root test) and the number of autoregressive and Moving Average parameters, and also the estimation of these parameters
61. 61 TRAMO/SEATS The application belongs to the ARIMA model-based method variants of decomposition of Time Series into non-observed components
The decomposition procedure of the SEATS method is built on spectrum decomposition
Components estimated using Wiener-Kolmogorov Filter
SEATS assumes that:
The Time Series to be Adjusted Seasonally is linear, with normal White Noise innovations
If this assumption is not satisfied, SEATS has the capability to interwork with TRAMO to eliminate special effects from the series, identify and eliminate outliers of various types, and interpolate missing observations
Then the ARIMA model is also borrowed from TRAMO
62. 62 TRAMO/SEATS The application decomposes the series into several various components. The decomposition may be either multiplicative or additive
The components are characterized by the spectrum or the pseudo spectrum in a non-stationary case:
The Trend Component represents the long-term development of the Time Series, and appears as a spectral peak at zero frequency. One could say that the trend is a cycle with an infinitely long period
The effect of the Seasonal Component is represented by spectral peaks at the seasonal frequencies
The Irregular Component represents the irregular White Noise behaviour, thus its spectrum is flat (constant)
The Cyclic Component represents the various deviations from the trend of the Seasonally Adjusted series, different from the pure White Noise
63. 63 First SEATS decomposes the ARIMA model of the Time Series observed, that is, identifies the ARIMA models of the components. This operation takes place in the frequency domain. The spectrum is divided into the sum of the spectra related to the various components
Actually SEATS decides on the basis of the argument of roots, which is mostly located near to the frequency of the spectral peak
The roots of high absolute value related to 0 frequency are assigned to the Trend Component
The roots related to the seasonal frequencies to the Seasonal Component
The roots of low absolute value related to 0 frequency and the cyclic (between 0 and the first seasonal frequency) and those related to frequencies between the seasonal ones are assigned to the Cyclic Component
The Irregular Component is always deemed as white noise TRAMO/SEATS
64. 64 Questions?