Time Series Analysis by Descriptive Statistics. R. Werner, Solar Terrestrial Influences Institute - BAS. Def.: A time series is a sequence of data points measured at successive times, (often) spaced at uniform time intervals.
Time series analysis comprises methods that attempt to understand time series, often either to understand the underlying context of the data points (where did they come from, what generated them?) or to make forecasts (predictions). From Wikipedia.org
Using methods of descriptive statistics from quantitative cross-section analysis, important measures are:
• arithmetic mean
• variance
• correlation coefficient
Do not forget visualization, for example:
• scatter plots
• histograms
For time series, these measures are meaningful only under stationarity!
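The three measures listed above can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, and the variance uses the 1/n normalization, which is an assumption (the slide's own normalization is not recoverable; 1/(n-1) is equally common).

```python
import math

def mean(xs):
    """Arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)

def variance(xs):
    """Variance with 1/n normalization (assumption, see lead-in)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # exactly linear in x
print(mean(x), variance(x), correlation(x, y))
```

The correlation coefficient does not depend on the chosen normalization, because the factor cancels between numerator and denominator.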
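For the time series: autocorrelation
• auto-covariance at lag k: $\gamma_k = \mathrm{Cov}(x_t, x_{t+k})$
• autocorrelation: $\rho_k = \gamma_k / \gamma_0$, symmetric in k ($\rho_{-k} = \rho_k$)
• in practice, empirical estimates computed from the observed series are used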
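A minimal sketch of an empirical autocorrelation estimate, assuming the common estimator $c_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t-\bar{x})(x_{t+k}-\bar{x})$ and $r_k = c_k/c_0$. The 1/n factor and the function names are assumptions; the estimator shown on the slide may differ (for example by using 1/(n-k)).

```python
def autocovariance(xs, k):
    """Empirical auto-covariance c_k with 1/n normalization (assumption)."""
    n = len(xs)
    m = sum(xs) / n
    k = abs(k)  # c_k is symmetric in k
    return sum((xs[t] - m) * (xs[t + k] - m) for t in range(n - k)) / n

def autocorrelation(xs, k):
    """Empirical autocorrelation r_k = c_k / c_0."""
    return autocovariance(xs, k) / autocovariance(xs, 0)

# Strictly alternating series: strong negative correlation at lag 1,
# strong positive correlation at lag 2.
xs = [1.0, -1.0] * 6
for lag in range(4):
    print(lag, autocorrelation(xs, lag))
```

With the 1/n normalization the estimates shrink toward zero as the lag grows, since fewer products enter the sum.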
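Cross-correlation: the counterpart of the correlation in quantitative cross-section analysis
Relation of two time series x and y:
• cross-covariance with lag k: $\gamma_{xy}(k) = \mathrm{Cov}(x_t, y_{t+k})$ or $\mathrm{Cov}(x_{t+k}, y_t)$, because it is not known which series is the leading series
• cross-correlation: $\rho_{xy}(k) = \gamma_{xy}(k) / \sqrt{\gamma_{xx}(0)\,\gamma_{yy}(0)}$, not symmetric in k (in general $\rho_{xy}(k) \neq \rho_{xy}(-k)$)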
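Time series decomposition into components
Time series are often non-stationary (they contain trends) and show periodic variations.
Models, with T: trend, S: seasonal component, R: rest (noise):
• additive: $x_t = T_t + S_t + R_t$
• multiplicative: $x_t = T_t \cdot S_t \cdot R_t$; taking logarithms gives the transition to the additive model, $\ln x_t = \ln T_t + \ln S_t + \ln R_t$
• mixed: for example $x_t = T_t \cdot S_t + R_t$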
Step by step:
1. Trend determination
2. Subtraction of the trend from the series and determination of the seasonal component
3. After removing the seasonal component, the rest remains
After this: analysis of the rest for correlation, remaining seasonality or other periodicities, or a residual trend
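Determination of the trend
Global trend (over the entire observation interval):
• linear or polynomial regression model of order p: $T_t = \sum_{j=0}^{p} a_j t^j$
• splines
The order p can be assessed from the sum of squared errors, $SSE_p = \sum_t (x_t - \hat{T}_t)^2$, using an F-test.
Not to be used for prognoses (the more so, the higher the order p).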
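The polynomial trend fit and the F-test on the sum of squared errors can be sketched without external libraries. Assumptions: the coefficients are obtained from the normal equations (adequate only for low orders and short time axes), and the test compares order p-1 against order p with $F = (SSE_{p-1} - SSE_p) / (SSE_p / (n - p - 1))$. All function names are hypothetical.

```python
def fit_polynomial(ts, xs, p):
    """Least-squares coefficients a_0..a_p of sum_j a_j * t**j."""
    m = p + 1
    # Augmented normal-equation matrix [A^T A | A^T x]
    aug = [[sum(t ** (i + j) for t in ts) for j in range(m)]
           + [sum(x * t ** i for t, x in zip(ts, xs))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def sse(ts, xs, coeffs):
    """Sum of squared errors of the fitted polynomial."""
    return sum((x - sum(a * t ** j for j, a in enumerate(coeffs))) ** 2
               for t, x in zip(ts, xs))

def f_statistic(ts, xs, p):
    """F statistic for adding the order-p term to an order-(p-1) fit."""
    n = len(xs)
    sse_small = sse(ts, xs, fit_polynomial(ts, xs, p - 1))
    sse_big = sse(ts, xs, fit_polynomial(ts, xs, p))
    return (sse_small - sse_big) / (sse_big / (n - p - 1))

ts = list(range(10))
xs = [1.0 + 2.0 * t + 0.5 * (-1) ** t for t in ts]  # linear trend + oscillation
print(f_statistic(ts, xs, 1))  # large: the linear term is needed
print(f_statistic(ts, xs, 2))  # near zero: the quadratic term adds nothing
```

The statistic would be compared with the F distribution with 1 and n-p-1 degrees of freedom; looking up the critical value is left out of this sketch.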
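Other linear models:
• exponential model
• logistic trend functions (with A > 0, C > 0)
Local trend: moving average (running mean), to remove oscillations (seasonality)
• odd number of points, 2q+1: $\hat{T}_t = \frac{1}{2q+1}\sum_{i=-q}^{q} x_{t+i}$
• even number of points, 2q: $\hat{T}_t = \frac{1}{2q}\left(\tfrac{1}{2}x_{t-q} + \sum_{i=-q+1}^{q-1} x_{t+i} + \tfrac{1}{2}x_{t+q}\right)$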
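How does the variance change?
For uncorrelated values with variance $\sigma^2$, the weighted running mean $\hat{T}_t = \sum_{i=-q}^{q} b_i x_{t+i}$ has the variance $\sigma^2 \sum_{i=-q}^{q} b_i^2$, where 2q+1 is the number of sampling points and $b_i$ are the weights.
Besides, for removing the seasonal means, we have to calculate the running mean over 13 months, with $b_i = 1/24$ for the first and the last month, otherwise $b_i = 1/12$!
For the given examples: equal weights $b_i = 1/(2q+1)$ give $\sigma^2/(2q+1)$; the 13-month running mean gives $\tfrac{23}{288}\sigma^2 \approx 0.08\,\sigma^2$.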
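The 13-month running mean and the variance reduction can be checked with a short sketch. The helper names are hypothetical, and the variance factor $\sum b_i^2$ holds only under the assumption of uncorrelated values with equal variance.

```python
def centered_weights(period):
    """Weights of a centered running mean covering one full period."""
    if period % 2 == 1:
        return [1.0 / period] * period
    # Even period: period + 1 points, half weight on the first and last point
    return [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]

def running_mean(xs, weights):
    """Weighted running mean; the first and last q points are dropped."""
    q = len(weights) // 2
    return [sum(b * xs[t + i - q] for i, b in enumerate(weights))
            for t in range(q, len(xs) - q)]

def variance_factor(weights):
    """Variance reduction factor sum(b_i^2) for uncorrelated values."""
    return sum(b * b for b in weights)

w = centered_weights(12)      # 13 weights: 1/24, 1/12 (11 times), 1/24
print(len(w), variance_factor(w))

# Monthly series: linear trend plus a zero-mean seasonal pattern
season = [5, 3, 1, -1, -3, -5, -5, -3, -1, 1, 3, 5]
xs = [0.1 * t + season[t % 12] for t in range(36)]
trend = running_mean(xs, w)   # recovers 0.1 * t for t = 6 .. 29
print(trend[:3])
```

The half weights at both ends make the 13 points cover exactly one year, so a constant seasonal pattern cancels while a linear trend passes through unchanged.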
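Trend removal by calculation of differences
• Linear trend: first differences, $\Delta x_t = x_t - x_{t-1}$
• Polynomial trend of order p: differences of order p, obtained by the recursive formula $\Delta^p x_t = \Delta^{p-1} x_t - \Delta^{p-1} x_{t-1}$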
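A short sketch of the recursive differencing, with a hypothetical helper name. It shows that first differences turn a linear trend into a constant and that second differences do the same for a quadratic trend.

```python
def difference(xs, order=1):
    """Apply the first-difference operator `order` times (recursively)."""
    for _ in range(order):
        xs = [b - a for a, b in zip(xs, xs[1:])]
    return xs

linear = [3 + 2 * t for t in range(8)]
quadratic = [t * t for t in range(8)]
print(difference(linear, 1))      # constant 2: the slope
print(difference(quadratic, 2))   # constant 2: twice the leading coefficient
```

Each differencing step shortens the series by one point.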
Problems related to the trend determination
• For short time series, the determined trend will not be equal to the long-term trend, and it cannot be distinguished from longer periodicities
• Smoothing shifts the reversal points of the time series
• Smoothing with running averages produces autocorrelations (quasi-periodicities: the Slutzky effect)
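The autocorrelation produced by smoothing can be quantified for the idealized case of uncorrelated input values with equal variance. For a running mean $y_t = \sum_i b_i e_{t+i}$ the induced autocorrelation is $\rho_k = \sum_i b_i b_{i+k} / \sum_i b_i^2$. The sketch below evaluates this expression; the function name is hypothetical, and it illustrates only the induced correlation, not the resulting quasi-periodic appearance.

```python
def induced_autocorrelation(weights, k):
    """rho_k of a running mean applied to uncorrelated, equal-variance input."""
    k = abs(k)
    num = sum(weights[i] * weights[i + k] for i in range(len(weights) - k))
    return num / sum(b * b for b in weights)

w5 = [0.2] * 5   # simple 5-point running mean
for lag in range(7):
    print(lag, induced_autocorrelation(w5, lag))
```

For equal weights over 2q+1 points the correlation falls linearly from 1 at lag 0 to 0 at lag 2q+1, so neighbouring smoothed values are strongly correlated even when the input is pure noise.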
[Figure: FFT of the basic period without trend, and FFT of the basic period with trend]
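Determination of the seasonal component
A very simple method for constant seasonal variations. Assumption: no trend!
Monthly mean, also called phase average: $\bar{x}_i = \frac{1}{k}\sum_{j=1}^{k} x_{i,j}$, where i is the month, k is the number of years and $x_{i,j}$ is the value of month i in year j.
• the perfect case: $\sum_{i=1}^{12} \bar{x}_i = 0$
• in practice: $\sum_{i=1}^{12} \bar{x}_i \neq 0$
Standardized phase average: $\bar{x}_i^{*} = \bar{x}_i - \frac{1}{12}\sum_{l=1}^{12} \bar{x}_l$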
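A minimal sketch of the phase average for a trend-free series. Standardization is assumed here to mean subtracting the mean of the phase averages, so that the seasonal figures sum to zero; this reading and the function names are assumptions.

```python
def phase_averages(xs, period=12):
    """Mean of all values that share the same phase (e.g. the same month)."""
    return [sum(xs[i::period]) / len(xs[i::period]) for i in range(period)]

def standardized_phase_averages(xs, period=12):
    """Phase averages shifted so that they sum to zero (assumed definition)."""
    avg = phase_averages(xs, period)
    overall = sum(avg) / period
    return [a - overall for a in avg]

# Three "years" of a period-4 pattern on a constant level of 10
pattern = [1, -2, 3, -2]
xs = [10 + pattern[t % 4] for t in range(12)]
print(phase_averages(xs, period=4))               # [11.0, 8.0, 13.0, 8.0]
print(standardized_phase_averages(xs, period=4))  # [1.0, -2.0, 3.0, -2.0]
```

The series must contain at least one full period; incomplete final periods are handled because each phase is averaged over the values actually present.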
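Or dummy regression: $x_t = \sum_{i=1}^{12} \beta_i D_{i,t} + \varepsilon_t$, with $D_{i,t} = 1$ if observation t falls in month number i, and 0 else (12 equations!)
or together with a polynomial trend: $x_t = \sum_{j=1}^{p} a_j t^j + \sum_{i=1}^{12} \beta_i D_{i,t} + \varepsilon_t$
For a multiplicative model: the same regression is applied to the logarithms $\ln x_t$.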
Periodogram analysis
Strategies:
- step-by-step determination of the period $T_p$
- test of a theoretical hypothesis
Fourier frequencies (n odd): $f_j = j/n$, $j = 1, \dots, (n-1)/2$, with periods $T_j = n/j$. The entire time interval is used for $T_1$.
Harmonic analysis:
- non-equidistant time intervals
- choice of the basic period
Harmonic series: $x_t = a_0 + \sum_{j=1}^{m}\left[a_j \cos(2\pi j t/n) + b_j \sin(2\pi j t/n)\right] + \varepsilon_t$
• If j/n are Fourier frequencies, the regressor functions are orthogonal. All coefficients can be calculated together, and they are not changed by the choice of a new m
• If j/n are not Fourier frequencies, then all coefficients have to be calculated again when m is changed
• If the number of data points is equal to the number of calculated coefficients, then there are no degrees of freedom left and the calculated series is not an estimation. The error term is zero! → filter
For a Fourier frequency j/n it can be proven that $r_j^2 = \frac{(n/2)(a_j^2 + b_j^2)}{\sum_t (x_t - \bar{x})^2}$ is the determination coefficient, the part of the explained sum of the squared deviations; besides, $(n/2)(a_j^2 + b_j^2)$ is the explained sum of the squared deviations.
Periodogram: plot of the intensities against the periods $T_j$
Spectrogram: plot of the intensities against the frequencies $f_j$
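The share of the squared deviations explained by a single harmonic can be sketched as follows. Assumptions: equidistant data, Fourier frequencies j/n with 0 < j < n/2, and coefficients computed as $a_j = \frac{2}{n}\sum_t x_t\cos(2\pi j t/n)$, $b_j = \frac{2}{n}\sum_t x_t\sin(2\pi j t/n)$. The function names are hypothetical.

```python
import math

def harmonic_coefficients(xs, j):
    """Cosine and sine coefficients (a_j, b_j) at the Fourier frequency j/n."""
    n = len(xs)
    a = 2.0 / n * sum(x * math.cos(2 * math.pi * j * t / n) for t, x in enumerate(xs))
    b = 2.0 / n * sum(x * math.sin(2 * math.pi * j * t / n) for t, x in enumerate(xs))
    return a, b

def explained_fraction(xs, j):
    """Share of the total squared deviations explained by harmonic j."""
    n = len(xs)
    m = sum(xs) / n
    a, b = harmonic_coefficients(xs, j)
    total = sum((x - m) ** 2 for x in xs)
    return (n / 2.0) * (a * a + b * b) / total

n = 24
xs = [5.0 + 3.0 * math.cos(2 * math.pi * 2 * t / n)
      + 1.0 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]

# Periodogram as a list of (period T_j, explained fraction)
periodogram = [(n / j, explained_fraction(xs, j)) for j in range(1, (n - 1) // 2 + 1)]
for period, share in periodogram:
    print(round(period, 2), round(share, 3))
```

In this constructed example the harmonics j = 2 and j = 5 carry 90 % and 10 % of the squared deviations, and all other Fourier frequencies carry essentially none.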
Other methods are:
• Lomb-Scargle periodogram
• Wavelet analysis
How to determine which is the better model approximation, additive or multiplicative?
Analysis of the variance: spread versus level plot (SLP diagram)
- splitting the time series into intervals
- determination of the standard deviations in the intervals
- plotting the standard deviations against the means
Interpretation:
• line parallel to the x-axis → additive model
• the SLP is a straight line with non-zero slope → multiplicative model
• no decision → mixed model
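The points of a spread versus level plot can be computed as sketched below. The interval length, the 1/(n-1) normalization of the standard deviation and the function name are assumptions made for this illustration; the plotting itself is omitted.

```python
import math

def spread_level_points(xs, interval):
    """(mean, standard deviation) of consecutive, non-overlapping intervals."""
    points = []
    for start in range(0, len(xs) - interval + 1, interval):
        chunk = xs[start:start + interval]
        m = sum(chunk) / interval
        sd = math.sqrt(sum((x - m) ** 2 for x in chunk) / (interval - 1))
        points.append((m, sd))
    return points

additive = [10, 12, 10, 12, 20, 22, 20, 22]        # same spread at both levels
multiplicative = [10, 12, 10, 12, 20, 24, 20, 24]  # spread grows with the level
print(spread_level_points(additive, 4))
print(spread_level_points(multiplicative, 4))
```

In the first series the standard deviation is the same at both levels (horizontal line, additive model); in the second it doubles when the mean doubles (straight line through the origin, multiplicative model).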
Box/Cox transformation
$y_t = (x_t^{\lambda} - 1)/\lambda$ for λ ≠ 0, $y_t = \ln x_t$ for λ = 0
or in a simpler form: $y_t = x_t^{\lambda}$ for λ ≠ 0, $y_t = \ln x_t$ for λ = 0
• Determination of λ:
• plot of the standard deviations against the logarithms of the means over the time intervals
• combination with the SLP
λ = 0: multiplicative model; λ = 1: additive model
Use simple coefficients: λ = 1/4, 1/3, 1/2, ...
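A minimal sketch of the first form of the transformation for a single positive value. The function name is hypothetical, and the restriction to x > 0 is the usual requirement of the transformation rather than something stated on the slide.

```python
import math

def box_cox(x, lam):
    """Box/Cox transform of a positive value x for the parameter lam."""
    if x <= 0:
        raise ValueError("Box/Cox transformation requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

series = [1.0, 4.0, 9.0, 16.0]
print([box_cox(x, 0.5) for x in series])  # square-root type transform
print([box_cox(x, 0.0) for x in series])  # logarithm (multiplicative model)
print([box_cox(x, 1.0) for x in series])  # x - 1: only a shift (additive model)
```

With λ = 1 the series is only shifted by 1, so the additive structure is left untouched, while λ = 0 gives the logarithm used for the multiplicative model.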
Acknowledgement: I thank the Ministry of Education and Science for supporting this work under contract DVU01/0120.