Resampling Methods

Peter Bruce • Resampling Stats, Cytel Software, Statistics.com • pbruce@resample.com

Resampling Methods • What is resampling • Examples • Historical perspective

What is resampling • Permutation • Bootstrap • Monte Carlo simulation

Permutation • Survival times (days) • Treated mice: 94, 38, 23, 197, 99, 16, 141 • Mean: 86.86 • Untreated mice: 52, 10, 40, 104, 51, 27, 146, 30, 46 • Mean: 56.22 (Efron & Tibshirani)

1. Calculate the difference between the means of the two observed samples – it’s 30.6 days in favor of the treated mice. 2. Consider the two samples combined (16 observations) as the relevant universe to resample from.

3. Shuffle the 16 values, draw 7 of them without replacement and designate them "Treatment"; designate the remaining 9 "Control". 4. Compute and record the difference between the means of the two samples (Treatment minus Control).

5. Repeat steps 3 and 4 perhaps 1000 times. 6. Determine how often the resampled difference equals or exceeds the observed difference of 30.6; that proportion estimates the (one-sided) p-value.
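A minimal Python sketch of steps 1 to 6 for the mouse data, assuming a one-sided test (treated mice survive longer) and using random shuffles rather than exhaustive enumeration. The function name `permutation_test`, the 10,000 repetitions and the fixed seed are illustrative choices, not part of the original procedure.

```python
import random
from statistics import mean

# Survival times (days) from the slide above (Efron & Tibshirani data)
treated = [94, 38, 23, 197, 99, 16, 141]
untreated = [52, 10, 40, 104, 51, 27, 146, 30, 46]


def permutation_test(a, b, reps=1000, seed=1):
    """Approximate (shuffling) permutation test for a difference in means.

    Returns the observed difference and the proportion of shuffles whose
    difference is at least as large (a one-sided p-value estimate).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)        # step 1: observed difference
    pooled = a + b                      # step 2: combined universe (new list)
    n_a = len(a)
    hits = 0
    for _ in range(reps):               # step 5: repeat
        rng.shuffle(pooled)             # step 3: relabel at random, no replacement
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])   # step 4
        if diff >= observed:            # step 6: count diffs >= observed
            hits += 1
    return observed, hits / reps


obs, p = permutation_test(treated, untreated, reps=10_000)
print(f"observed difference = {obs:.2f} days, estimated p-value = {p:.3f}")
```

Shuffling the pooled list and splitting it 7/9 is the same as drawing 7 observations without replacement for "Treatment" and leaving the other 9 as "Control".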

Histogram of permuted differences

The Bootstrap • A new pig-food ration is tested on twelve pigs, with six-week weight gains as follows: • 496 544 464 416 512 560 608 544 480 466 512 496 • Mean: 508 ounces • Goal: establish a confidence interval for the mean

The Classic Bootstrap • Draw simulated samples from a hypothetical universe that embodies all we know about the universe this sample came from: our sample, replicated an infinite number of times. Drawing from that universe is equivalent to sampling with replacement from the sample itself.

1. Put the observed weight gains in a hat 2. Sample 12 with replacement 3. Record the mean 4. Repeat steps 2-3, say, 1000 times 5. Record the 5th and 95th percentiles (for a 90% confidence interval)
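The five steps can be sketched in Python as a percentile bootstrap interval for the mean. This is a hedged illustration: the function name `bootstrap_percentile_ci`, the fixed seed and the way the 5th and 95th percentiles are read from the sorted means are assumptions, and other percentile conventions give slightly different endpoints.

```python
import random
from statistics import mean

# Six-week weight gains (ounces) for the twelve pigs on the slide above
gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]


def bootstrap_percentile_ci(data, reps=1000, level=0.90, seed=1):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Steps 2-4: draw n values with replacement, record the mean, repeat
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(reps))
    # Step 5: read off the lower and upper percentiles (5th/95th for 90%)
    tail = (1 - level) / 2
    lo = means[round(tail * reps)]
    hi = means[round((1 - tail) * reps) - 1]
    return lo, hi


lo, hi = bootstrap_percentile_ci(gains)
print(f"sample mean = {mean(gains):.1f} oz, 90% CI = ({lo:.1f}, {hi:.1f})")
```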

Bootstrapped sample means

Historical Perspective

1908 - W. S. Gosset ("Student")

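Fisher’s Tea Taster • A lady claims she can tell whether the tea or the milk was poured into the cup first. 8 cups of tea are prepared, four with tea poured first and four with milk poured first, and presented to her in random order. Knowing there are four of each, she must say which is which.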

Permutation solution 1. Mark a strip of paper with eight guesses about the order of the "tea-first" and "milk-first" cups, say T T T T M M M M. 2. Make a deck of eight cards, four marked "T" and four marked "M". 3. Deal out these eight cards in all possible orderings (permutations); there are 70 distinct arrangements of four T's and four M's. 4. Record how many of those arrangements show >= 6 matches with the guesses, and divide by the total number of arrangements to get the probability.
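A short Python sketch of the exact count. It enumerates the 70 distinct placements of the four "M" cards instead of all 8! = 40,320 card orderings; every distinct arrangement arises from the same number of orderings (4! × 4! = 576), so the proportion is unchanged. The helper name `count_arrangements` is hypothetical.

```python
from itertools import combinations

# Guesses written on the strip of paper: first four tea-first, last four milk-first
guesses = "TTTTMMMM"


def count_arrangements(guesses, min_matches=6):
    """Enumerate every distinct way to place the 'M' cards among the
    positions and count how many deals give at least `min_matches` matches."""
    n = len(guesses)
    total = hits = 0
    for milk_positions in combinations(range(n), guesses.count("M")):
        deal = ["M" if i in milk_positions else "T" for i in range(n)]
        matches = sum(g == d for g, d in zip(guesses, deal))
        total += 1
        if matches >= min_matches:
            hits += 1
    return hits, total


hits, total = count_arrangements(guesses)
print(f"{hits} of {total} arrangements give >= 6 matches (p = {hits / total:.3f})")
# prints: 17 of 70 arrangements give >= 6 matches (p = 0.243)
```

Matches always come in pairs here: each "M" card dealt onto a "T" guess forces a "T" card onto an "M" guess, so the only possible counts are 0, 2, 4, 6 and 8.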

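Approximate Permutation • Steps 1 and 2 as above. 3. Shuffle the deck, deal it out along the strip of paper with the marked guesses, and record the number of matches. 4. Repeat many times; the proportion of deals with >= 6 matches approximates the exact probability.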
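A minimal Python sketch of the shuffling version, assuming 10,000 deals and a fixed seed (both illustrative). The function name `approx_tea_test` is hypothetical; the exact value 17/70 comes from counting the 70 distinct arrangements.

```python
import random


def approx_tea_test(reps=10_000, min_matches=6, seed=1):
    """Estimate P(>= min_matches) by shuffling and dealing the deck."""
    rng = random.Random(seed)
    guesses = list("TTTTMMMM")   # step 1: strip of paper with the guesses
    deck = list("TTTTMMMM")      # step 2: four 'T' cards and four 'M' cards
    hits = 0
    for _ in range(reps):
        rng.shuffle(deck)                                      # step 3: shuffle and deal
        matches = sum(g == d for g, d in zip(guesses, deck))   # count matches
        if matches >= min_matches:
            hits += 1
    return hits / reps                                         # step 4: proportion of deals


p_approx = approx_tea_test()
print(f"estimated p = {p_approx:.3f} (exact value: 17/70 = 0.243)")
```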

Other names • Monte Carlo permutation • Randomization test • Sampled permutation (randomization) test

Extension to multiple samples Fisher went on to apply the same idea to agricultural experiments involving two or more samples. The question became "How likely is it that random arrangements of the observed data would produce samples differing as much as the observed samples differ?"

Extension to samples from populations • In the 1930s, Fisher and Pitman showed that the inference for a permutation test extended to cover not just random rearrangements of a fixed, finite set of elements, but also samples from larger populations.

Formula-based analogs • Fisher and Pitman showed that the t-distribution and the chi-squared distribution are good approximations to the permutation distribution for sufficiently large and/or normally distributed samples.

The bootstrap • 1969 Simon publishes the bootstrap as an example in Basic Research Methods in Social Science (the pig-food example above) • 1979 Efron names the bootstrap and publishes the first paper on it • This coincides with the advent of the personal computer

Additional examples • Myocardial infarctions

10 of 135 high-cholesterol men developed MI (.074), compared with only 21 of 470 low-cholesterol men (.045), for a difference of .029

Resampling solution 1. Constitute an urn with 31 “1’s” (MI) and 574 “2’s” (no MI) 2. Shuffle the urn and take a sample of size 135 without replacement (high-cholesterol men) 3. Take the remaining 470 as the second sample (low-cholesterol men)

Resampling solution, cont. 4. Count the number of “1’s” (MIs) in each sample 5. Divide by the sample sizes to get proportions 6. Find the difference in proportions: sample 1 (n=135) minus sample 2 (n=470) 7. Keep score of the difference 8. Repeat steps 2-7 (say, 1000 times) and find how often the resampled difference is >= .029

URN 31#1 574#2 men          'An urn called "men" with 31 "1's" (= infarctions) and 574 "2's" (= no infarction)
REPEAT 1000
  SHUFFLE men men
  TAKE men 1,135 high       'Sample 135 "men" (without replacement) from the urn
  TAKE men 136,605 low      'Same for a group of 470
  COUNT high =1 a           'Count MIs in first group
  DIVIDE a 135 aa           'Express as a proportion
  COUNT low =1 b            'Count MIs in second group
  DIVIDE b 470 bb           'Express as a proportion
  SUBTRACT aa bb c          'Find the difference
  SCORE c z                 'Keep score
END
HISTOGRAM z
COUNT z >=.029 k            'How often was the resampled difference >= the observed difference?
DIVIDE k 1000 kk            'Convert this result to a proportion
PRINT kk
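A hedged Python translation of the Resampling Stats program above, using only the standard library; the function name `mi_permutation_test` is hypothetical and the histogram step is omitted. It compares against the exact observed difference (10/135 - 21/470, about .0294) rather than the rounded .029; both thresholds select the same resamples, because only 10 or more MIs in the high-cholesterol group reach either one.

```python
import random


def mi_permutation_test(reps=1000, seed=1):
    """Shuffle the 605 men, split 135/470, and score the MI-rate difference."""
    rng = random.Random(seed)
    men = [1] * 31 + [2] * 574            # URN: 31 MIs ("1"), 574 non-MIs ("2")
    observed = 10 / 135 - 21 / 470         # observed difference, about .029
    scores = []
    for _ in range(reps):                  # REPEAT
        rng.shuffle(men)                   # SHUFFLE men men
        high, low = men[:135], men[135:]   # TAKE 1,135 and 136,605 (no replacement)
        aa = high.count(1) / 135           # MI proportion, high-cholesterol group
        bb = low.count(1) / 470            # MI proportion, low-cholesterol group
        scores.append(aa - bb)             # SCORE c z
    k = sum(c >= observed for c in scores) # COUNT z >= observed
    return k / reps                        # DIVIDE k by number of repetitions


p_mi = mi_permutation_test(reps=10_000)
print(f"estimated p-value = {p_mi:.3f}")
```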

Results (est. p-value = 0.125)

Permutation procedures • Exact: keep the Type I error rate at or below the nominal level • Increasingly built into statistical software • Tend to be conservative

Bootstrap procedures • Bootstrap variants of permutation tests with 2 x 2 contingency tables can improve power • Straight bootstrap (involving no shuffling) can produce overly narrow confidence limits • As sample sizes increase, bootstrap undercoverage decreases

Bootstrap adjustments • Formula-based adjustments (bootstrap-t, BCa: bias-corrected and accelerated) • Double (iterated) bootstrap • Parametric bootstrap

Resampling Checklist 1. Specify the relevant universe(s) 2. Specify the sampling procedure (sample size, number of samples, with or without replacement) 3. Specify how to calculate the statistic or estimate of interest 4. Score the resample results and, after all repetitions are complete, use them to calculate a numerical answer
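One way to turn the checklist into reusable code is a single generic loop whose arguments correspond to the four items. This is a sketch under stated assumptions: the function `resample` and its parameters are hypothetical names, "without replacement" is implemented as dealing values from a shuffled universe, and the usage lines reuse the mouse survival data.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def resample(universe: Sequence[float],
             sizes: Sequence[int],
             statistic: Callable[..., float],
             replace: bool,
             reps: int = 1000,
             seed: int = 1) -> List[float]:
    """Generic resampling loop organized around the four checklist items."""
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(reps):
        if replace:
            # Bootstrap-style: each sample drawn with replacement (item 2)
            samples = [rng.choices(universe, k=n) for n in sizes]
        else:
            # Permutation-style: deal sum(sizes) values without replacement, then split
            pool = rng.sample(list(universe), k=sum(sizes))
            samples, start = [], 0
            for n in sizes:
                samples.append(pool[start:start + n])
                start += n
        scores.append(statistic(*samples))   # item 3: statistic of interest
    return scores                            # item 4: scores for the final answer


# Usage: the mouse permutation test expressed through the checklist
treated = [94, 38, 23, 197, 99, 16, 141]
untreated = [52, 10, 40, 104, 51, 27, 146, 30, 46]
observed = mean(treated) - mean(untreated)
scores = resample(treated + untreated, [7, 9],
                  lambda a, b: mean(a) - mean(b), replace=False, reps=10_000)
p_value = sum(s >= observed for s in scores) / len(scores)
print(f"estimated p-value = {p_value:.3f}")
```

Passing `replace=True` with a single size equal to the sample size gives the classic bootstrap instead.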

Flexibility in test statistic • Prior to resampling, much work expended on determining sampling distribution of test statistic. • No longer necessary • Use a wide variety of “home-grown” statistics
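As one illustration of a "home-grown" statistic, a sketch that swaps the difference in means for the difference in medians in the mouse permutation test. The choice of the median, the helper names `perm_pvalue` and `median_diff`, and the one-sided direction are assumptions made for illustration; no sampling distribution for this statistic has to be derived, because the shuffles supply it.

```python
import random
from statistics import median

# Mouse survival times (days), as in the permutation example
treated = [94, 38, 23, 197, 99, 16, 141]
untreated = [52, 10, 40, 104, 51, 27, 146, 30, 46]


def median_diff(a, b):
    """Home-grown statistic: difference between the two sample medians."""
    return median(a) - median(b)


def perm_pvalue(a, b, stat, reps=10_000, seed=1):
    """One-sided permutation p-value for any two-sample statistic `stat`."""
    rng = random.Random(seed)
    observed = stat(a, b)
    pooled = a + b
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)                               # random relabeling
        if stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return observed, hits / reps


obs_md, p_md = perm_pvalue(treated, untreated, median_diff)
print(f"median difference = {obs_md} days, estimated p-value = {p_md:.3f}")
```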

Terms • Bootstrap: sampling with replacement from the observed data • Permutation test: constitute a null model, permute, and divide into resamples • Randomization test: same as permutation test • Approximate permutation test: shuffling instead of exhaustive permutation • Monte Carlo simulation • Exact tests: control of Type I error • Resampling
