510 likes | 674 Views
BSTA 670 – Statistical Computing. Lecture 15: Resampling Methods. Resampling Procedures. Resampling procedures date back to 1930s, when permutation tests were introduced by R.A. Fisher and E.J.G. Pitman. They were not feasible until the computer era. Resampling Procedures.
E N D
BSTA 670 – Statistical Computing Lecture 15: Resampling Methods
Resampling Procedures Resampling procedures date back to 1930s, when permutation tests were introduced by R.A. Fisher and E.J.G. Pitman. They were not feasible until the computer era.
Resampling Procedures The resampling method frees researchers from two limitations of conventional statistics: “the assumption that the data conform to a bell-shaped curve and the need to focus on statistical measures whose theoretical properties can be analyzed mathematically.” Diaconis, P., and B. Efron. (1983). Computer-intensive methods in statistics. Scientific American, May, 116-130.
Resampling Procedures The resampling method "addresses a key problem in statistics: how to infer the 'truth' from a sample of data that may be incomplete or drawn from an ill-defined population." Peterson, I. (July 27, 1991). Pick a sample. Science News, 140, 56-58.
Resampling Procedures Using resampling methods, “you're trying to get something for nothing. You use the same numbers over and over again until you get an answer that you can't get any other way. In order to do that, you have to assume something, and you may live to regret that hidden assumption later on” Statement by Stephen Feinberg, cited in: Peterson, I. (July 27, 1991). Pick a sample. Science News, 140, 56-58.
Permutation Tests • In classical hypothesis testing, we start with assumptions about the underlying distribution and then derive the sampling distribution of the test statistic under H0. • In Permutation testing, the initial assumptions are not needed, and the sampling distribution of the test statistic under H0 is computed by using permutations of the data.
Permutation Tests • The Permutation test is a technique that bases inference on “experiments” within the observed dataset. • Consider the following example: • In a medical experiment, rats are randomly assigned to a treatment (Tx) or control (C) group. • The outcome Xi is measured in the ith rat.
Permutation Tests • Under H0, the outcome does not depend on whether a rat carries the label Tx or C. • Under H1, the outcome tends to different, say larger for rats labeled Tx. • A test statistic T measures the difference in observed outcomes for the two groups. T may be the difference in the two group means (or medians), denoted as t for the observed data.
Permutation Tests • Under H0, the individual labels of Tx and C are unimportant, since they have no impact on the outcome. Since they are unimportant, the label can be randomly shuffled among the rats without changing the joint null distribution of the data. • Shuffling the data creates a “new” dataset. It has the same rats, but with the group labels changed so as to appear as there were different group assignments.
Permutation Tests • Let t be the value of the test statistic from the original dataset. • Let t1 be the value of the test statistic computed from a one dataset with permuted labels. • Consider all M possible permutations of the labels, obtaining the test statistics, t1, …, tM. • Under H0, t1, …, tM are all generated from the same underlying distribution that generated t.
Permutation Tests • Thus, t can be compared to the permuted data test statistics, t1, …, tM , to test the hypothesis and obtain a p-value or to construct confidence limits for the statistic.
Permutation Test Example 1 Consider data from two groups: Group 1: 20,23,30 Group 2: 10,15,18 Interested in testing if the mean of the two groups are equal. T-test is not appropriate, given the sample size.
Permutation Test Example 1 If the group means are truly equal, then shifting the group labels will not have a big impact the sum of the two groups (or mean with equal sample sizes). Some group sums will be larger than in the original data set and some will be smaller.
Permutation Test Example 1 • There are 6 observations. • There are 6!=720 permutations considering all 6 positions as unique. • Of the 720 permutations, there are 20 unique combinations (equal sample sizes). • Compute mean (or sum) for each of these.
Permutation Test Example 1 perm<-matrix( c( 20,23,30 , 10,15,18 , 20,23,10 , 30,15,18 , 20,23,15 , 30,10,18 , 20,23,18 , 30,10,15 , 20,30,10 , 23,15,18 , 20,30,15 , 23,10,18 , 20,30,18 , 23,10,15 , 23,30,10 , 20,15,18 , 23,30,15 , 20,10,18 , 23,30,18 , 20,10,15 , 10,15,18 , 20,23,30 , 10,15,20 , 23,30,18 , 10,15,23 , 20,30,18 , 10,15,30 , 20,23,18 , 10,18,20 , 23,30,15 , 10,18,23 , 20,30,15 , 10,18,30 , 20,23,15 , 15,18,20 , 23,30,10 , 15,18,23 , 20,30,10 , 15,18,30 , 20,23,10 ) , nrow=20, ncol=6, byrow=T) g1sum<-perm[,1]+perm[,2]+perm[,3] g2sum<-perm[,4]+perm[,5]+perm[,6] less<-ifelse(g1sum<g2sum,0,1) sum(less) mean(less)
Permutation Test Example 1 > perm<-matrix( c( + 20,23,30 , 10,15,18 , + 20,23,10 , 30,15,18 , + 20,23,15 , 30,10,18 , + 20,23,18 , 30,10,15 , + 20,30,10 , 23,15,18 , + 20,30,15 , 23,10,18 , + 20,30,18 , 23,10,15 , + 23,30,10 , 20,15,18 , + 23,30,15 , 20,10,18 , + 23,30,18 , 20,10,15 , + 10,15,18 , 20,23,30 , + 10,15,20 , 23,30,18 , + 10,15,23 , 20,30,18 , + 10,15,30 , 20,23,18 , + 10,18,20 , 23,30,15 , + 10,18,23 , 20,30,15 , + 10,18,30 , 20,23,15 , + 15,18,20 , 23,30,10 , + 15,18,23 , 20,30,10 , + 15,18,30 , 20,23,10 ) , nrow=20, ncol=6, byrow=T) > > g1sum<-perm[,1]+perm[,2]+perm[,3] > g2sum<-perm[,4]+perm[,5]+perm[,6] > less<-ifelse(g1sum<g2sum,1,0) > sum(less) [1] 9 > mean(less) [1] 0.45 20 orderings, 9 have Group 1 mean < Group 2 mean. Pr(Group1 Mean < Group2 / H0) = 0.45
Permutation Test Example 1 • Note that all permutations were enumerated. Thus, this was an exact permutation test. • When dataset is too large to enumerate all permutations, a large number of random permutations are selected.
Permutation Test Example 2 Two independent sample binomial test (First load Package 'exactRankTests‘) A <- c(rep(1, 61), rep(0, 19)) # Group 1 B <- c(rep(1, 45), rep(0, 35)) # Group 2 perm.test(A, B, conf.int=TRUE, exact=TRUE) # Run permutation test # Note, exact=TRUE indicates to compute exact p-value. But, this is not done if sample size exceeds 50. 2-sample Permutation Test data: A and B T = 61, p-value = 0.01180 alternative hypothesis: true mu is not equal to 0 95 percent confidence interval: 0.05263158 0.34285714
Permutation Test Example 2 Two independent sample binomial test t.test(A,B) Welch Two Sample t-test data: A and B t = 2.7198, df = 154.425, p-value = 0.007281 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 0.05473511 0.34526489 sample estimates: mean of x mean of y 0.7625 0.5625
Bootstrap Brad Efron, from Stanford University, Developed Boostrap in 1979 Efron, B. Bootstrap methods: another look at the jackknife. Annals of Statistics, 1979, 7, p.1-26.
Bootstrap (Nonparametric) Have a random sample from an unknown PDF, F. Want to estimate based on . We calculate the estimate based on . Want to know how accurate is .
Bootstrap (Nonparametric) Notation: Random sample: Empirical distribution , places mass of 1/n at each observed data value. Bootstrap sample: Random sample of size n, drawn from , denoted as Bootstrap replicate of :
Bootstrap (Nonparametric) • Bootstrap steps: • Select bootstrap sample • consisting of n data values drawn with replacement from the original data set. • Evaluate for the bootstrap sample • Repeat steps 2 and 3 B times each. • Estimate the standard error by the sample standard deviation of the B replications:
Parametric Bootstrap Resampling makes no assumptions about the population distribution. The bootstrap covered thus far is a nonparametric bootstrap. If we have information about the population distr., this can be used in resampling. In this case, when we draw randomly from the sample we can use population distr. For example, if we know that the population distr. is normal then estimate its parameters using the sample mean and variance. Then approximate the population distr. with the sample distr. and use it to draw new samples.
Parametric Bootstrap As expected, if the assumption about population distribution is correct then the parametric bootstrap will perform better than the nonparametric bootstrap. If not correct, then the nonparametric bootstrap will perform better.
Example of Bootstrap (Nonparametric) Have test scores (out of 100) for two consecutive years for each of 60 subjects. Want to obtain the correlation between the test scores and the variance of the correlation estimate. Can use bootstrap to obtain the variance estimate.
Example of Bootstrap (Nonparametric) Data year1<-c(60.05012,66.23529,66.45363,69.4353,72.35835,74.5647, 75.22353,85.08235,85.4353,85.50588,85.52,86.07059,86.80547, 87.01176,87.82437,88.04118,88.04622,88.25336,88.26,88.47059, 88.51765,88.67815,89.2916,89.54706,89.85883,89.85883,90,90.11765, 90.23529,90.51765,90.52101,90.61597,91.50588,91.64706,91.67941, 91.92941,91.97479,92.05882,92.11765,92.17857,92.29454,92.82353, 93.03866,93.27059,93.48235,93.57647,93.85966,93.92101,94.02689, 94.05462,94.12521,94.96471,96.25883,96.41512,96.61176,96.68235, 97.19832,99.37227,99.45882,99.90588) year2<-c(54.07727,71.10421,71.61901,71.64993,72.13461,72.95383, 75.10436,75.88354,76.22269,77.99503,79.89235,80.047,80.10675, 80.16948,80.38153,81.71319,81.86475,82.06805,82.7322,82.80527, 83.6807,83.91642,84.10549,84.13135,84.53588,84.71319,84.95586, 85.31587,85.88763,85.91762,86.02145,86.23808,86.32195,86.74031, 87.16442,87.31636,88.14243,88.49493,88.66029,88.82101,88.98708, 89.14404,89.60036,89.84047,90.66312,90.86598,91.18991,91.19348, 91.59389,91.78065,92.16543,92.20275,92.70338,92.87971,93.04095, 93.49293,95.54397,96.63901,98.81994,98.98661)
Example of Bootstrap (Nonparametric) R Code to perform bootstrap B<-1000 n1<-length(year1) n2<-length(year2) n<-n1 corrb<-NULL for(i in 1:B){ bs<-round(runif(n)*(n-1),0)+1 xb1<-year1[bs] xb2<-year2[bs] corrbi<-cor(xb1,xb2) corrb<-c(corrb,corrbi) } cor(year1,year2) mean(corrb) var(corrb) hist(corrb)
Example of Bootstrap (Nonparametric) Output cor(year1,year2) [1] 0.9455347 > mean(corrb) [1] 0.9406916 > var(corrb) [1] 0.000186016
Example • Bilker WB, Brensinger C, Gur RC, A two factor ANOVA-like test for correlated correlations: CORANOVA, 2004, Multivariate Behavioral Research, 39:4, pages 565-594.
Example: CORANOVA • During performance of a verbal memory task, the laterality of blood flow rate (L-R hemisphere) was measured in temporal, frontal, and subcortical regions, using PET. • Goal of study was to compare the relationships between verbal memory performance and the laterality of blood flow in each of the three brain regions, and to assess if these relationships are similar between males and females.
Bootstrap Confidence Intervals • CI’s for parameter estimates are calculated using statistics from original sample and bootstrap samples. • Multiple methods are available • Must be careful to remember that the complexity of the bootstrap method is not related to the accuracy of the answer. • Difference in results from different approaches will depend on n and the skewness of original data
Bootstrap Methods for CIs • Percentile Method • Standard Method • Bootstrap-t Method • Bias-corrected (BCa) Method
Percentile Method • Calculate the parameter θfor each bootstrap sample and select a (e.g., 0.05). • LCL = a /2 th percentile. • UCL = (1 - a /2) th percentile. • Say B=1000. Order bootstrap estimates from smallest to largest. 2.5th and 97.5th percentiles are the 25th and 975th ordered values, respectively.
Percentile Method • Consider the case where the underlying distribution is far from normal, say a survival time for mice in an experiment, but a transformation exists for normality.
Percentile Method • The normal based CI and Percentile Method CI can be very different. • In such cases, when the data are transformed to normality, the two methods provide very similar CIs, which agree with the Percentile method applied to the untransformed data.
Percentile Method • The Percentile method gets the better CI without actually applying or even knowing the transformation to normality. Why does this happen? • The percentile method “always knows the correct transformation”. • We do not need to know the correct transformation. We assume only that such a transformation exists.
Percentile Method • In terms of coverage the Percentile method generally covers well, but can under cover or over cover in some cases (tested in cases where we know the underlying distribution). • This is because the method has no knowledge of the underlying distribution and uses the empirical distribution instead.
Standard Method • Obtain B bootstrap estimates of the parameter θ • Calculate the standard error of θ based on standard deviation of B bootstrap estimates:
Bootstrap-t Method • Same as Standard Bootstrap, except obtain t-statistic from the bootstrap samples. • For each bootstrap sample, calculate tb: • Calculate the a/2 th and (1- a/2 )th percentiles tb.
Calculating SEb for Bootstrap-t • Normal Approximation Rule (Large Sample) • Nested Bootstrap • For each bootstrap sample (b), run j = 1,…,1000 bootstrap simulations to derive 1000 parameter estimates
Bootstrap-t and Percentile method • Requirements for “good” Bootstrap CI. 1. For cases where there is statistical theory available to obtain an exact answer for the CI, the Bootstrap CI should closely match the theoretical CI. 2. Should provide accurate coverage probabilities.
Bootstrap-t and Percentile method • Neither the Percentile or bootstrap-t methods meet these criteria. • The bootstrap-t CIs have erratic coverage in practice, even though they do well theoretically. • Although the percentile method CIs are less erratic in practice than the bootstrap-t CIs, the coverage is not “good enough”.
BCa Bootstrap • In previously presented bootstrap CIs, there is always the possibility of bias in as an estimator of . • There is also the possibility that SE depends on , SE ( ) . Note pivot makes the assumption the SE does not depend on • BCa makes adjustments for both of these issues. See Textbook by Efron & Tibshirani (1993)
Transformation-Respecting Property • Apply bootstrap procedure to , based on . • Apply bootstrap procedure to , based on • Apply transformation, m, to endpoints of second CI will yield first CI when the bootstrap procedure used is said to be transformation-respecting. • This can be stated as: • Compute LL(ψ) , UL(ψ) • Compute LL(θ), UL(θ) • m(LL(θ)), m(UL(θ)) should be approximately LL(ψ) , UL(ψ)
Transformation-Respecting Property • The percentile method is transformation-respecting. • The bootstrap-t method is NOT transformation-respecting (but still produces second-order valid intervals) • The BCa method is transformation-respecting (and produces second-order valid intervals)
Comparing these Bootstrap CI methods • Confidence Intervals will differ by method • In general, as n decreases and skewness increases: Bootstrap-t > BCa > percentile > standard • [LCL, UCL] can exceed the min and max from the observed data, make adjustment
When should the Bootstrap approach be used, and which approach is best? • As with other approaches for quantifying parameter uncertainty, confidence in parameter estimates improves with better data quality and increased sample size • Bootstrap is not a substitute for a weak or non-representative sample, including CIs. • Choosing the “best” bootstrap approach is a judgment call. Extent of differences in CI’s may be useful for sensitivity analysis