Significance Testing Using Monte Carlo Techniques Willem Landman and Tony Barnston
Required Reading • Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, New York, pp. 145-157.
Significance Testing • Significance testing = testing of hypotheses (see any book on statistical methods) • Two contexts in which hypothesis tests are performed: • parametric (a theoretical distribution represents the data) • non-parametric (distribution-free, e.g., resampling procedures)
One-Tailed vs. Two-Tailed Tests • A statistical test can be either one-tailed (-sided) or two-tailed (-sided) • Probabilities on the tails of the distribution (parametric or empirical) govern whether a test result is significant or not • Whether a test is one- or two-tailed depends on the nature of the hypothesis being tested: • just interested in positive correlations: one-tailed (e.g., the skill of a forecast time series) • interested in both positive and negative correlations: two-tailed (e.g., the association between any two time series)
Probabilities for N(0,1): the 95% interval. A typical distribution used in a two-tailed test.
The Central Limit Theorem • In the limit, as the sample size becomes large, the sum of a set of independent observations will be Gaussian • True regardless of the distribution from which the observations have been drawn • The observations need not even come from the same distribution • A common rule of thumb for a sample large enough for the theorem to apply is N > 30
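Below is a minimal sketch (not from the slides) illustrating the theorem with NumPy: sample means of draws from a skewed exponential distribution become increasingly symmetric (Gaussian-like) as the sample size grows. The parent distribution, sample sizes, and number of repetitions are illustrative choices only.

```python
# Illustrative sketch of the Central Limit Theorem with a non-Gaussian parent distribution.
import numpy as np

rng = np.random.default_rng(0)

for n in (2, 5, 30):
    # 10000 samples of size n from an exponential (clearly skewed, non-Gaussian) distribution
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    # Skewness of the distribution of sample means shrinks toward 0 (Gaussian) as n grows
    skew = np.mean((means - means.mean()) ** 3) / np.std(means) ** 3
    print(f"n = {n:2d}: skewness of sample means = {skew:+.3f}")
```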
Why Nonparametric Tests? • The parametric assumption(s) required for a particular test are not met (e.g., independence, Gaussian distribution) • The sampling distribution is unknown and/or cannot be derived analytically • The sample size is not large enough for the sampling distribution to be assumed Gaussian
Resampling Tests • The construction of artificial data sets from a given collection of real data by resampling the observations • Also known as rerandomization tests or Monte Carlo tests • These tests are highly adaptable: new tests can be designed to meet particular needs
Monte Carlo Test • Build up a collection of artificial data batches of the same size as the actual data at hand by using time-shuffled versions of the original data (to be described below) • Compute the test statistic of interest (e.g., a correlation) for each artificial batch • The number of artificial values of the test statistic = the number of artificially generated batches • These reference test statistics constitute an estimated sampling distribution against which the test statistic computed from the original data is compared
Why use Monte Carlo Techniques? • No assumption regarding an underlying theoretical distribution for the data is necessary • ANY statistic can form the basis for the test: • e.g., median, skill difference, correlation, etc. • the data being tested can be a scalar (e.g., number of forecast hits), or • vector-valued (e.g., a spatial correlation)
Monte Carlo: the Bootstrap Technique • Constructing artificial data batches by sampling with replacement from the data • Equivalent to writing each of the N data values on a separate piece of paper • Put all N pieces in a hat • A bootstrap sample: N pieces are drawn from the hat, one at a time, and their values recorded • Each slip is put back and mixed before the next draw • Repeat this whole process a large number of times (e.g., 1000)
Bootstrap Demo • For a data set of 10 elements (n1,…, n10) • Use a random-number generator (RNG) to generate a number from 1 to 10, e.g., 9 • n9 is the first element of the artificial data • Repeat with the RNG for the remaining 9 entries • A possible artificial data set: n9, n3, n8, n4, n10, n6, n9, n7, n6, n4
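A minimal NumPy sketch of the "hat" procedure described in the two slides above. The 10 data values and the number of bootstrap batches are made up for illustration.

```python
# Bootstrap resampling: sampling with replacement from a 10-element data set.
import numpy as np

rng = np.random.default_rng(42)
data = np.array([3.1, 2.7, 4.0, 1.8, 2.2, 3.5, 2.9, 4.4, 3.3, 2.0])  # invented values

n_boot = 1000                                   # number of artificial data batches
# Each row is one bootstrap sample: 10 draws with replacement (indices may repeat)
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_samples = data[idx]

# Example use: a bootstrap distribution of the sample mean
boot_means = np.sort(boot_samples.mean(axis=1))
print("2.5th to 97.5th percentile of bootstrap means:",
      boot_means[int(0.025 * n_boot)], boot_means[int(0.975 * n_boot)])
```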
Designing a Monte Carlo test statistic for linear correlation (1) • Two time series may have produced a linear correlation that seems useful • Whether the correlation is statistically significant can be tested using tables (the parametric approach) • Monte Carlo techniques do not require knowledge of the data's distribution (e.g., Gaussian, binomial, Poisson)
Designing a Monte Carlo test statistic for linear correlation (2) • Rerandomize one or both of the time series (year-to-year autocorrelation must be low) • Correlate the new time series, but keep the period constant • Repeat the above steps a large number of times (e.g., 500 or 1000 iterations) • Sort the correlations • For a two-tailed test, the absolute values of the correlations have to be sorted • For a one-tailed test, simply sort the correlations • For 1000 iterations, the 950th sorted value marks the 95% level of significance
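The steps above could be coded roughly as follows. The two series, the noise level, and the choice to shuffle only one series are illustrative assumptions, not the authors' actual data or code; the 1000 iterations and one-tailed 95% threshold follow the slide.

```python
# Rerandomization (Monte Carlo) test for the significance of a linear correlation.
import numpy as np

rng = np.random.default_rng(1)
n_years = 30
x = rng.normal(size=n_years)                    # e.g., a forecast time series (invented)
y = 0.5 * x + rng.normal(size=n_years)          # e.g., the observed time series (invented)

r_obs = np.corrcoef(x, y)[0, 1]                 # correlation of the original data

n_iter = 1000
r_null = np.empty(n_iter)
for i in range(n_iter):
    y_shuffled = rng.permutation(y)             # time-shuffled (rerandomized) series
    r_null[i] = np.corrcoef(x, y_shuffled)[0, 1]

r_null.sort()
crit_95 = r_null[int(0.95 * n_iter)]            # roughly the 950th of 1000 sorted values
print(f"observed r = {r_obs:.3f}, one-tailed 95% critical value = {crit_95:.3f}")
print("significant at 95%" if r_obs > crit_95 else "not significant at 95%")
```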
Using Monte Carlo to calculate the significance of correlation differences • For this example, the objective is to see if the rainfall forecast time series from Model A is a significantly better forecast than the rainfall forecast time series from Model B • Three time series are therefore considered: the two rainfall forecast time series from Model A and Model B respectively, and the observed rainfall time series • Rerandomization of time series: • only the observed time series, or • only the two forecast time series, or • all three time series • Correlate each of the two new forecast time series with the new observed time series, but keep the period constant • Calculate the difference in correlation • Repeat the above steps a large number of times (e.g., 500 or 1000 iterations) • Sort the correlation differences • A two-tailed test is required since the magnitude of the correlation differences is important • Therefore the absolute values of the correlation differences have to be sorted • For 1000 iterations, the 950th sorted value marks the 95% level of significance
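A rough sketch of this correlation-difference test with invented forecast and observed series, rerandomizing the observed series only (the first of the three options listed above); the series and the number of iterations are illustrative.

```python
# Monte Carlo test for the difference between two forecast-versus-observation correlations.
import numpy as np

rng = np.random.default_rng(2)
n_years = 40
obs = rng.normal(size=n_years)                      # observed rainfall (invented)
fc_a = 0.6 * obs + rng.normal(size=n_years)         # Model A forecasts (invented)
fc_b = 0.3 * obs + rng.normal(size=n_years)         # Model B forecasts (invented)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

diff_obs = corr(fc_a, obs) - corr(fc_b, obs)        # observed correlation difference

n_iter = 1000
diff_null = np.empty(n_iter)
for i in range(n_iter):
    obs_shuf = rng.permutation(obs)                 # rerandomize the observed series only
    diff_null[i] = corr(fc_a, obs_shuf) - corr(fc_b, obs_shuf)

# Two-tailed test: sort the absolute differences and take the 950th of 1000
crit = np.sort(np.abs(diff_null))[int(0.95 * n_iter)]
print(f"observed difference = {diff_obs:.3f}, two-tailed 95% critical value = {crit:.3f}")
print("difference significant at 95%" if abs(diff_obs) > crit
      else "difference not significant at 95%")
```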
Solid green: forecast time series; dashed green: forecast trend. Solid blue: observed time series; dashed blue: observed trend. Red: forecast 5-year running mean.
Determining the significance of linear trends • Rerandomize the series with the bootstrap approach • Calculate a new best linear fit (least squares) • Calculate the slope of the artificial data • Do this 1000 times • Sort the artificial slopes and take the 950th as the test statistic (we are only interested in the magnitude, i.e., the absolute value of the slope – therefore a two-tailed test) • If the original series has an absolute slope < the test statistic, the trend is not statistically significant at the 95% level
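A minimal sketch of this trend test on an invented series; the series length, trend size, and noise level are arbitrary assumptions.

```python
# Bootstrap test for the significance of a least-squares linear trend.
import numpy as np

rng = np.random.default_rng(3)
n_years = 50
t = np.arange(n_years)
series = 0.02 * t + rng.normal(size=n_years)        # invented series with a weak trend

slope_obs = np.polyfit(t, series, 1)[0]             # slope of the original series

n_iter = 1000
slopes = np.empty(n_iter)
for i in range(n_iter):
    resampled = rng.choice(series, size=n_years, replace=True)   # bootstrap resample
    slopes[i] = np.polyfit(t, resampled, 1)[0]       # slope of the artificial data

crit = np.sort(np.abs(slopes))[int(0.95 * n_iter)]  # 950th sorted absolute slope
print(f"|observed slope| = {abs(slope_obs):.4f}, critical value = {crit:.4f}")
print("trend significant at 95%" if abs(slope_obs) > crit
      else "trend not significant at 95%")
```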
Example of a Monte Carlo test for linear correlation. Solid line: a single 500-iteration run. Open circles: ten 500-iteration runs, averaged. Crosses: one hundred 500-iteration runs, averaged.
Increasing the number of iterations tenfold has made the distribution smoother.
Field Significance and Multiplicity • Special problems arise with statistical tests involving atmospheric fields – testing for pattern significance • Positive grid-point-to-grid-point correlation of the underlying data produces statistical dependence among the local tests • Multiplicity: the problem that arises when the results of multiple (even independent) significance tests are evaluated jointly
Stating the problem of Multiplicity through an example • Say we want to see if there is a significant pattern in the association (correlation) between the Nino3.4 index and an area (100 grid points) over the southern Indian Ocean • Local tests at each southern Indian Ocean grid-point indicate 8 of the 100 are significant at the 95% level • It may be supposed that, since 5% of 100 is 5, getting 8 locally significant correlations is evidence of field significance • The following example suggests that this assumption is flawed
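As a quick side check (not in the slides): even under the unrealistic assumption that the 100 local tests are independent, the binomial distribution shows that 8 locally significant results at the 5% level is not a particularly unlikely chance outcome. Spatial correlation reduces the effective number of independent tests and makes such counts even more likely by chance.

```python
# Probability of 8 or more "hits" out of 100 independent tests at the 5% level.
from math import comb

n, p = 100, 0.05
p_8_or_more = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8, n + 1))
print(f"P(8 or more locally significant tests by chance) = {p_8_or_more:.3f}")
```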
…and, after only a few rerandomizations of the rainfall time series…
Using a Monte Carlo approach, it was possible to design a rerandomized rainfall time series that produced an El Niño-type spatial pattern in the oceans. Clearly the real association between SON SSTs and the series of random numbers is zero (!!!), but the substantial grid-point-to-grid-point correlation among the SON SSTs yields spatially coherent areas of chance sample correlation that are deceptively high (because of the high spatial correlations, the number of spatial degrees of freedom is far smaller than the number of grid points).
Another example of multiplicity and its solution • An index of the eastern equatorial Pacific Ocean correlates well with concurrent seasonal rainfall indices of 9 contiguous regions over 50 seasons • Local significance tests show significant correlations (95% level) between the index and the regional rainfall for 7 of the 9 regions • However, there are positive region-to-region correlations in the seasonal rainfall of the regions (see next slide) • The question: is getting 7 locally significant values field significant?
Region-to-region correlations. Correlations significant at the 95% level are shaded; R refers to the regions.
Solution (1) • Since the regions are correlated spatially, rerandomization is done by resampling synchronized seasons (keeping all regions together in time) of the regional rainfall indices • Correlate the rerandomized rainfall indices with the equatorial Pacific index, and repeat 1000 times • As with a single time series, sort the correlations of each region and identify the 95th percentile for each: these are the critical values to be exceeded for a correlation to be significant at the 95% level for each region • Next question: for each iteration, how many of the regions' (unsorted) correlations exceed their respective critical values? (example on next slide)
Solution (2) • There is one count for each iteration (e.g., 10000 counts from 10000 iterations) • After sorting the counts, count the number of iterations that produced counts of 7 or more (7 regions have statistically significant correlations in the real data) • This fraction estimates the probability of getting 7 or better by chance
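A minimal sketch of the whole counting procedure from the last two slides, using an invented Pacific index and invented, spatially correlated regional rainfall series; the 1000 iterations and the one-tailed local test are illustrative assumptions.

```python
# Field-significance test: count locally significant correlations per rerandomized iteration.
import numpy as np

rng = np.random.default_rng(4)
n_seasons, n_regions = 50, 9

index = rng.normal(size=n_seasons)                           # e.g., equatorial Pacific index (invented)
common = rng.normal(size=(n_seasons, 1))                     # shared signal -> region-to-region correlation
rain = 0.5 * index[:, None] + 0.7 * common + rng.normal(size=(n_seasons, n_regions))

def corrs_with_index(x, r):
    # correlation of the index with each regional rainfall series
    return np.array([np.corrcoef(x, r[:, j])[0, 1] for j in range(r.shape[1])])

r_obs = corrs_with_index(index, rain)

n_iter = 1000
r_null = np.empty((n_iter, n_regions))
for i in range(n_iter):
    shuffled = rain[rng.permutation(n_seasons), :]           # shuffle seasons; regions kept together
    r_null[i] = corrs_with_index(index, shuffled)

# Local 95% critical value for each region (one-tailed)
crit = np.sort(r_null, axis=0)[int(0.95 * n_iter), :]

n_sig_obs = np.sum(r_obs > crit)                             # locally significant regions in the real data
counts = np.sum(r_null > crit, axis=1)                       # count of local "hits" per iteration

p_field = np.mean(counts >= n_sig_obs)                       # chance of doing this well or better
print(f"{n_sig_obs} locally significant regions; field-significance p-value = {p_field:.3f}")
```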
When we get 4 locally significant correlations, we have already reached the 95% level of field significance – getting 7 significant correlations is therefore highly significant!