600 likes | 620 Views
Explore the application of covariance in time series data, correlations between random variables, and autocorrelation concepts. Learn to estimate covariance, sample correlation coefficients, and analyze the Neuse River Hydrograph data. Understand how autocorrelation reveals patterns in time series data.
E N D
Environmental Data Analysis with MatLab Lecture 17: • Covariance and Autocorrelation
SYLLABUS Lecture 01 Using MatLabLecture 02 Looking At DataLecture 03Probability and Measurement ErrorLecture 04 Multivariate DistributionsLecture 05Linear ModelsLecture 06 The Principle of Least SquaresLecture 07 Prior InformationLecture 08 Solving Generalized Least Squares ProblemsLecture 09 Fourier SeriesLecture 10 Complex Fourier SeriesLecture 11 Lessons Learned from the Fourier Transform Lecture 12 Power Spectral DensityLecture 13 Filter Theory Lecture 14 Applications of Filters Lecture 15 Factor Analysis Lecture 16 Orthogonal functions Lecture 17 Covariance and AutocorrelationLecture 18 Cross-correlationLecture 19 Smoothing, Correlation and SpectraLecture 20 Coherence; Tapering and Spectral Analysis Lecture 21 InterpolationLecture 22 Hypothesis testing Lecture 23 Hypothesis Testing continued; F-TestsLecture 24 Confidence Limits of Spectra, Bootstraps
purpose of the lecture apply the idea of covariance to time series
Atlantic Rock Dataset Scatter plot of TiO2 and Na2O Na2O TiO2
idealization as a p.d.f. scatter plot Na2O d1 TiO2 d2
types of correlations positive correlation negative correlation uncorrelated d2 d2 d2 d1 d1 d1
the covariance matrix … σ1,2 σ1,3 σ12 … σ22 σ1,2 σ2,3 C = σ2,3 σ32 … σ1,3 … … … …
recall the covariance matrix … σ1,2 σ1,3 σ12 … σ22 σ1,2 σ2,3 C = σ2,3 σ32 … σ1,3 … … … … covariance of d1 and d2
divide (di, dj) plane into bins bin, s dj Ddj di Ddi
estimate total probability in bin from the data: Ps = p(di, dj) ΔdiΔdj≈ Ns/N bin, s with Ns data dj Ddj di Ddi
approximate integral with sum and use p(di, dj) ΔdiΔdj≈Ns/N
approximate integral with sum and use p(di, dj) ΔdiΔdj≈Ns/N shrink bins so no more than one data point in each bin
normalize to range ±1 “sample” correlation coefficient
Rij-1perfect negative correlation0no correlation1perfect positive correlation
|Rij| of the Atlantic Rock Dataset SiO2 TiO2 Al2O2 FeO MgO CaO Na2O K2O SiO2 TiO2 Al2O2 FeO MgO CaO Na2O K2O
|Rij| of the Atlantic Rock Dataset 0.73 SiO2 TiO2 Al2O2 FeO MgO CaO Na2O K2O SiO2 TiO2 Al2O2 FeO MgO CaO Na2O K2O
Atlantic Rock Dataset Scatter plot of TiO2 and Na2O Na2O TiO2
Neuse River Hydrograph A) time series, d(t) d(t), cfs time t, days
high degree of short-term correlationwhat ever the river was doing yesterday, its probably doing today, toobecause water takes time to drain away
Neuse River Hydrograph A) time series, d(t) d(t), cfs time t, days
low degree of intermediate-term correlationwhat ever the river was doing last month, today it could be doing something completely differentbecause storms are so unpredictable
Neuse River Hydrograph A) time series, d(t) d(t), cfs time t, days
moderate degree of long-term correlationwhat ever the river was doing this time last year, its probably doing today, toobecause seasons repeat
Neuse River Hydrograph A) time series, d(t) d(t), cfs time t, days
1 day 3 days 30 days
Let’s assume different samples in time series are random variables and calculate their covariance
usual formula for the covariance assuming time series has zero mean
now assume that the time series is stationary(statistical properties don’t vary with time)so that covariance depends only ontime lag between samples
using the same approximation for the sample covariance as before 1
using the same approximation for the sample covariance as before 1 autocorrelation at lag (k-1)Δt
Autocorrelation on Neuse River Hydrograph symmetric about zero a point in a time series correlates equally well with another in the future and another in the past
Autocorrelation on Neuse River Hydrograph peak at zero lag a point in time series is perfectly correlated with itself
Autocorrelation on Neuse River Hydrograph falls off rapidly in the first few days lags of a few days are highly correlated because the river drains the land over the course of a few days
Autocorrelation on Neuse River Hydrograph negative correlation at lag of 182 days points separated by a half year are negatively correlated
Autocorrelation on Neuse River Hydrograph positive correlation at lag of 360 days points separated by a year are positively correlated
Autocorrelation on Neuse River Hydrograph repeating pattern A) B) the pattern of rainfall approximately repeats annually
autocorrelation similar to convolution note difference in sign
Important Relation #1autocorrelation is the convolution of a time series with its time-reversed self
integral form of autocorrelation change of variables, t’=-t