Oceanography 569: Oceanographic Data Analysis Laboratory
Kathie Kelly, Applied Physics Laboratory, 515 Ben Hall IR Bldg
Class web site: faculty.washington.edu/kellyapl/classes/ocean569_2014/
Organization
• 1 lecture, 1 lab period (2 hrs) per week
• Exercise assigned in lab, to be finished by the following lecture
• Presentation of solution in lecture session
• One class project completed individually
• Grade based on presentations and project
• Office hours by appointment
Materials
Materials available on class web site:
• PowerPoint notes
• mfiles & mat files for exercises
• specialized functions (mfiles)
• example solutions (following week)
Text: “Modeling Methods for Marine Science” by Glover, Jenkins & Doney
• on reserve in Physics Library
• a good reference to buy
General Procedure for Data Analysis
• Define analysis goal
• Characterize data
• Prepare data
• Errors and error propagation
• Statistical analyses
• Combine data with model (prognostic, diagnostic, statistical)
Exercise 1: Aegean Sea temperatures
Analysis goal: create continuous 3-m time series
• 5 buoys (POSEIDON)
• 3-m 3-hourly temperatures (with gaps)
• Daily satellite SST maps
Exercise 1: Characterize Data
• 3-m: higher resolution, but gaps
• SST: continuous, but only daily
What happens when the data are “merged”? To make a consistent series, what is sacrificed?
Exercise 1: Data discrepancies
Compare apples & apples: average 3-m to daily
What are the characteristics of the differences? How can the differences be reconciled?
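As an illustration of averaging the 3-hourly series to daily values before comparing it with SST, here is a minimal Python sketch. The input layout (times in fractional days, NaN marking gaps), the function name `daily_mean`, and the `min_count` threshold are assumptions made for this example and are not part of the class materials.

```python
import math
from collections import defaultdict

def daily_mean(times_days, temps, min_count=1):
    """Average sub-daily samples into daily means, skipping NaN gaps.

    times_days: sample times in fractional days (assumed layout)
    temps: temperatures, with NaN marking missing samples
    Returns {day_index: mean}; days with fewer than min_count
    valid samples are left out.
    """
    buckets = defaultdict(list)
    for t, x in zip(times_days, temps):
        if not math.isnan(x):
            buckets[math.floor(t)].append(x)
    return {day: sum(v) / len(v)
            for day, v in sorted(buckets.items())
            if len(v) >= min_count}

# Two days of 3-hourly data (8 samples per day) with one gap on day 1
times = [k / 8 for k in range(16)]
temps = [20.0] * 8 + [22.0, float("nan"), 22.0, 24.0, 24.0, 22.0, 22.0, 22.0]
daily = daily_mean(times, temps)
```

A day with many gaps can give a biased daily mean when the diurnal cycle is strong, so raising `min_count` is one way to reject poorly sampled days.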
Periodic Signals
• Robust way to estimate periodic signals, especially for gappy data:
• fit_harmonics: fit to cosines with period L, L/2, etc. (cf. Fourier series)
[amp,phase,frac,offset,da]=fit_harmonics(data,time,nharm,L,cutoff);
d_periodic = amp(1)*cos(2*pi*t/L+phase(1))
           + amp(2)*cos(2*pi*2*t/L+phase(2))
           + ...
           + amp(n)*cos(2*pi*n*t/L+phase(n))
           + offset          for nharm=n
• includes jth term only if frac(tion) of variance removed > cutoff/100
• returns anomaly: da = data - d_periodic
• Note: offset is not the same as mean(data)
• Remove mean using fit_harmonics if strong seasonal cycle!
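The idea behind a harmonic fit can be sketched as an ordinary least-squares problem. The Python below is not the class's fit_harmonics mfile: `fit_harmonics_sketch` and `solve_linear` are hypothetical names, the variance-fraction cutoff and the returned anomaly are left out, and gaps are assumed to be marked with NaN. It fits an offset plus cosine and sine terms at periods L, L/2, ..., then converts each pair to the amplitude and phase convention used in the formula above.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_harmonics_sketch(data, time, nharm, L):
    """Least-squares fit of offset + nharm cosines with periods L, L/2, ...

    NaN entries in data are skipped. Returns (amp, phase, offset) such that
    d_periodic = offset + sum_j amp[j]*cos(2*pi*(j+1)*t/L + phase[j]).
    """
    m = 2 * nharm + 1
    AtA = [[0.0] * m for _ in range(m)]
    Atb = [0.0] * m
    for t, d in zip(time, data):
        if math.isnan(d):
            continue  # gap: contributes nothing to the normal equations
        row = [1.0]
        for j in range(1, nharm + 1):
            w = 2 * math.pi * j * t / L
            row += [math.cos(w), math.sin(w)]
        for i in range(m):
            Atb[i] += row[i] * d
            for k in range(m):
                AtA[i][k] += row[i] * row[k]
    coef = solve_linear(AtA, Atb)
    amp, phase = [], []
    for j in range(nharm):
        a, b = coef[1 + 2 * j], coef[2 + 2 * j]
        # a*cos(w) + b*sin(w) = amp*cos(w + phase)
        amp.append(math.hypot(a, b))
        phase.append(math.atan2(-b, a))
    return amp, phase, coef[0]

# Synthetic seasonal cycle: about 1.5 years of daily data with a 60-day gap
L = 365.0
time = [float(t) for t in range(550)]
data = [15.0 + 5.0 * math.cos(2 * math.pi * t / L - 1.0) for t in time]
for i in range(100, 160):
    data[i] = float("nan")

amp, phase, offset = fit_harmonics_sketch(data, time, 1, L)
valid = [d for d in data if not math.isnan(d)]
sample_mean = sum(valid) / len(valid)
```

In this synthetic case the fitted offset recovers the true value of 15, while the plain mean of the valid samples is pulled away from it by the incomplete cycle and the gap.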
Exercise 1: Fix discrepancies
Find & remove seasonal cycle in difference
Result: daily average temperature that matches the seasonal cycle of the 3-m series
Other goals
• Continuous SST with a diurnal cycle: use 3-m temperature to find diurnal cycle
• Correct SST for aliasing from undersampling the diurnal cycle
• Create non-seasonal temperature anomalies
Aliasing
SST sampling aliases the diurnal cycle
“Nyquist frequency”: corresponds to a period of 2*Δt, the shortest period that samples spaced Δt apart can resolve
Example: sample the diurnal temperature signal using 26-hr intervals
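The 26-hr example can be worked through numerically. This short Python sketch assumes an idealized 24-hr cosine as the diurnal signal; the variable names are illustrative only. With Δt = 26 hr the shortest resolvable period is 52 hr, so the 24-hr cycle is undersampled and shows up at a much longer apparent period.

```python
import math

signal_period = 24.0   # hours, idealized diurnal cycle
dt = 26.0              # hours, sampling interval from the example

fs = 1.0 / dt              # sampling frequency, cycles per hour
f = 1.0 / signal_period    # signal frequency, cycles per hour

# The aliased frequency is the distance from f to the nearest multiple of fs
k = round(f / fs)
f_alias = abs(f - k * fs)
alias_period_days = 1.0 / f_alias / 24.0

# At the sample times, the diurnal signal and the slow alias are identical
samples = [math.cos(2 * math.pi * f * n * dt) for n in range(24)]
slow = [math.cos(2 * math.pi * f_alias * n * dt) for n in range(24)]
max_diff = max(abs(a - b) for a, b in zip(samples, slow))
```

Here the alias has a period of 312 hr (13 days): the samples of the diurnal cycle cannot be told apart from a 13-day oscillation.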
Matlab functions
• datenum: converts yyyy,mm,dd to serial date numbers (days counted from year 0); also datestr, datevec, datetick('x')
• imagesc: bit map that shows each image pixel, scaled to colormap (cf. pcolor, which interpolates pixels to a grid)
• NaN, “not a number”: use to flag invalid data; nanmean, nansum, etc. then ignore the NaNs. NaN values are not plotted. To find valid data:
ind=find(~isnan(data));
• fit_harmonics(data,time,nharm,L,cutoff): use to find any periodic signal in the data, using the time axis, period L and a cutoff (% of variance explained)
Statistics of Observations: “random” variables
Are these observations of random variables? Will removing the mean make them random?
Statistical Definitions: mean
The sample mean is given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The mean of the parent population is given by
$$\mu = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} x_i$$
But we never know it since the sample is finite. For class, the mean will refer to the sample mean, regardless of the symbol. The factor N here is the number of degrees of freedom.
Statistical Definitions: variance
The sample variance is given by
$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$$
where $\bar{x}$ is the sample mean and s is the standard deviation of x. The variance of the parent population corresponds to an infinite number of samples, N. The N-1 factor occurs because using the sample mean “uses up” one of the degrees of freedom of the data set. In class we will refer to the sample variance.
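The sample mean and the N-1 sample variance can be checked on a small made-up data set. The values in `x` are arbitrary illustration data, not course data; the comparison with Python's `statistics.variance`, which also divides by N-1, is only a sanity check.

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example values
N = len(x)

# Sample mean: divide the sum by N
mean = sum(x) / N

# Sample variance: divide by N-1 because the sample mean has used up
# one degree of freedom
var = sum((xi - mean) ** 2 for xi in x) / (N - 1)
std = var ** 0.5

# Dividing by N instead gives a smaller (biased) estimate
var_biased = sum((xi - mean) ** 2 for xi in x) / N
```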
Exercise 2: Periodic Signals
Need to remove non-random components
Both have periodic signals (seasonal, not random)
Caution: the mean of data with a periodic component is biased if the sample contains incomplete cycles
Use “offset” from fit_harmonics instead
Exercise 2: Probability Distributions (histogram)
Both non-seasonal SST and non-seasonal rain are random variables. Is either of these normally distributed?
Normal Distribution for Random Variable
Why do we want a normal distribution?
Least-squares fits, correlations, and optimal interpolation have error estimates based on the assumption of normal distributions of random data and/or errors
Exercise 2: Making a variable more normal
[Figure: distribution of rain and distribution of log(rain)]
Exercise 2: Distributions for modified variable
[Figure: distribution of rain and distribution of rain deciles (uniform)]
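One way to see why deciles give a uniform distribution is to replace each value by the decile of its rank. The Python sketch below uses a made-up, strongly skewed series as a stand-in for rain; `to_deciles` is a hypothetical helper, not a class function, and ties (for example many zero-rain days) are broken arbitrarily by sort order.

```python
import math

def to_deciles(values):
    """Map each value to its decile (1..10) according to its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

# Made-up skewed "rain" values: many small amounts, a few large ones
rain = [math.exp((i * 37 % 100) / 20.0) for i in range(100)]
rain_deciles = to_deciles(rain)

# Every decile holds the same number of points, whatever the original shape
counts = [rain_deciles.count(d) for d in range(1, 11)]
```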
To edit or not to edit
• For a truly normal distribution, 0.3% of the data are more than 3 standard deviations from the mean
• “Three-sigma edit”: remove data more than 3 std dev from mean
• Best to justify edits in terms of
  • likely error sources and characteristics
  • spikes
  • unphysical values
  • comparisons with other variables
Exercise 2: Edit data (3-sigma outliers)
Procedure for removing suspicious data:
• remove known signals (diurnal, seasonal, trends)
• check for normal distribution
• compute σ (standard deviation)
• remove data more than 3*σ from mean
• do not iterate!
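The last two steps of the procedure can be sketched in a few lines of Python. This assumes the known signals have already been removed and that rejected points are flagged with NaN; `three_sigma_edit` is a hypothetical name and the test series is made up. The edit is applied once, with the mean and σ computed from the unedited data.

```python
import math
import statistics

def three_sigma_edit(data):
    """Flag values more than 3 standard deviations from the mean as NaN.

    Single pass: the mean and sigma come from the data as given
    (ignoring existing NaNs) and are not recomputed after editing.
    """
    valid = [x for x in data if not math.isnan(x)]
    mean = statistics.mean(valid)
    sigma = statistics.stdev(valid)  # sample standard deviation (N-1)
    return [x if math.isnan(x) or abs(x - mean) <= 3 * sigma else float("nan")
            for x in data]

# Made-up anomalies with one obvious spike
anomalies = [-1.0, 1.0] * 25 + [100.0]
edited = three_sigma_edit(anomalies)
n_removed = sum(math.isnan(x) for x in edited)
```

Repeating the edit would shrink σ each time and start removing valid data, which is one reason not to iterate.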
Central Limit Theorem
Why is the Normal distribution commonly used?
Underlying distributions may be unknown or non-Normal, BUT if a measurement (or error) is the sum of many processes, its distribution will approach Normal
Example: distribution of the mean of X for different distributions as the number of samples increases
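A small simulation illustrates the example. This Python sketch assumes an exponential parent distribution (strongly skewed, skewness 2) and uses skewness as a rough measure of departure from Normal; the seed, the number of trials, and the helper names are arbitrary choices for this illustration.

```python
import random

def skewness(values):
    """Sample skewness: mean cubed deviation in units of std dev cubed."""
    n = len(values)
    m = sum(values) / n
    s = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / s) ** 3 for v in values) / n

rng = random.Random(0)  # fixed seed so the run is repeatable

def mean_of_samples(n):
    """Mean of n draws from an exponential distribution with rate 1."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

# Skewness of the distribution of the mean, for increasing sample size n
trials = 20000
skew = {n: skewness([mean_of_samples(n) for _ in range(trials)])
        for n in (1, 5, 30)}
```

The skewness of the mean falls off roughly as 2/sqrt(n), so the distribution of the mean becomes more symmetric, and closer to Normal, as n grows.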