Resampling Methods

Presentation Transcript


  1. Resampling Methods Peter Bruce Resampling Stats, Cytel Software, Statistics.com pbruce@resample.com

  2. Resampling Methods • What is resampling • Examples • Historical perspective

  3. What is resampling • Permutation • Bootstrap • Monte Carlo simulation

  4. Permutation • Survival times (days) • Treated mice: 94, 38, 23, 197, 99, 16, 141 • Mean: 86.9 • Untreated mice: 52, 10, 40, 104, 51, 27, 146, 30, 46 • Mean: 56.2 (Efron & Tibshirani)

  5. 1. Calculate the difference between the means of the two observed samples – it’s 30.6 days in favor of the treated mice. 2. Consider the two samples combined (16 observations) as the relevant universe to resample from.

  6. 3. Shuffle the combined observations, draw 7 without replacement and designate them "Treatment"; designate the remaining 9 "Control". 4. Compute and record the difference between the means of the two resamples.

  7. 5. Repeat steps 3 and 4 perhaps 1000 times. 6. Determine how often the resampled difference exceeds the observed difference of 30.6 days; that proportion estimates the p-value.
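The six steps above can be sketched in Python. This is a minimal illustration rather than the presenter's own code: the function name `permutation_p_value`, the fixed seed, and the use of `>=` (which counts ties with the observed difference) are assumptions made here.

```python
import random
from statistics import mean

# Survival times in days, as listed on the slide.
treated = [94, 38, 23, 197, 99, 16, 141]
untreated = [52, 10, 40, 104, 51, 27, 146, 30, 46]

# Step 1: observed difference between the two sample means.
observed_diff = mean(treated) - mean(untreated)

def permutation_p_value(a, b, n_resamples=1000, seed=0):
    """Hypothetical helper: share of shuffles whose difference in means
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)            # step 2: combined universe
    hits = 0
    for _ in range(n_resamples):          # step 5: repeat
        rng.shuffle(pooled)               # step 3: random relabelling
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])  # step 4
        if diff >= observed:              # step 6: keep score
            hits += 1
    return hits / n_resamples

p_value = permutation_p_value(treated, untreated)
print(round(observed_diff, 1), p_value)
```

Shuffling the pooled list and slicing it into groups of 7 and 9 is one simple way to sample without replacement; the estimate varies slightly with the seed and the number of resamples.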

  8. Histogram of permuted differences

  9. The Bootstrap • A new pigfood ration is tested on twelve pigs, with six-week weight gains as follows: • 496 544 464 416 512 560 608 544 480 466 512 496 • Mean: 508 ounces (establish a confidence interval)

  10. The Classic Bootstrap Draw simulated samples from a hypothetical universe that embodies all we know about the universe that this sample came from – our sample, replicated an infinite number of times

  11. 1. Put the observed weight gains in a hat. 2. Sample 12 with replacement. 3. Record the mean. 4. Repeat steps 2-3, say, 1000 times. 5. Find the 5th and 95th percentiles of the recorded means (for a 90% confidence interval).
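The hat-drawing procedure above might look like this in Python. It is a sketch under stated assumptions: `bootstrap_mean_interval` is a hypothetical name, the seed is arbitrary, and the percentiles are read off by simple index into the sorted means rather than by any interpolation rule.

```python
import random
from statistics import mean

# Six-week weight gains in ounces, as listed on the slide.
weight_gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]

def bootstrap_mean_interval(data, n_resamples=1000, lower=5, upper=95, seed=0):
    """Hypothetical helper: percentile interval for the mean."""
    rng = random.Random(seed)
    # Steps 2-4: draw len(data) values with replacement, record the mean, repeat.
    means = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    # Step 5: read the percentiles off the sorted list of resampled means.
    low = means[int(n_resamples * lower / 100)]
    high = means[int(n_resamples * upper / 100) - 1]
    return low, high

low, high = bootstrap_mean_interval(weight_gains)
print(round(low, 1), round(high, 1))
```

The interval endpoints shift a little from run to run unless the seed is fixed, which is why the slide says "say, 1000 times" rather than prescribing a number.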

  12. Bootstrapped sample means

  13. Historical Perspective

  14. 1908 - W. S. Gosset

  15. Fisher’s Tea Taster • Eight cups of tea are prepared, four with tea poured first and four with milk poured first. The cups are presented to the taster in random order.

  16. Permutation solution 1. Mark a strip of paper with eight guesses about the order of the "tea-first" and "milk-first" cups, say T T T T M M M M. 2. Make a deck of eight cards, four marked "T" and four marked "M." 3. Deal out these eight cards successively in all possible orderings (permutations). 4. Record how many of those permutations show >= 6 matches.
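The exhaustive deal can be sketched in Python. As an assumption made here, the code enumerates the 70 distinct arrangements of four "T" and four "M" cards rather than all 8! orderings of physical cards; every distinct arrangement corresponds to the same number of card orderings, so the proportion is unchanged.

```python
from itertools import combinations

guesses = "TTTTMMMM"          # step 1: the marked strip of paper
n_cups = len(guesses)

match_counts = []
# Steps 2-3: each choice of four positions for the "T" cards is one arrangement.
for tea_positions in combinations(range(n_cups), 4):
    deal = ["T" if i in tea_positions else "M" for i in range(n_cups)]
    match_counts.append(sum(g == d for g, d in zip(guesses, deal)))

# Step 4: how many arrangements show at least 6 matches?
n_orderings = len(match_counts)
n_at_least_6 = sum(m >= 6 for m in match_counts)
print(n_at_least_6, "of", n_orderings)   # prints: 17 of 70
```

Seventeen of 70 is about 0.24, so six or more matches would not be surprising under pure guessing.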

  17. Approximate Permutation 3. Shuffle the deck, deal it out along the strip of paper with the marked guesses, and record the number of matches. 4. Repeat many times.
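The shuffled version replaces exhaustive dealing with random deals. A minimal Python sketch follows; the function name `approximate_tea_test`, the threshold of 6 matches (taken from the exact version of the procedure), and the seed are assumptions for illustration.

```python
import random

def approximate_tea_test(n_shuffles=10000, threshold=6, seed=0):
    """Hypothetical helper: share of random deals with at least
    `threshold` matches against the marked guesses."""
    rng = random.Random(seed)
    guesses = list("TTTTMMMM")   # the marked strip of paper
    deck = list("TTTTMMMM")      # four "T" cards and four "M" cards
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(deck)        # step 3: shuffle and deal along the strip
        matches = sum(g == c for g, c in zip(guesses, deck))
        if matches >= threshold:
            hits += 1
    return hits / n_shuffles     # step 4: proportion over many repeats

estimate = approximate_tea_test()
print(estimate)
```

More shuffles tighten the estimate; with small problems like this one, exhaustive enumeration is cheap, and the approximate method matters mainly when the number of orderings is too large to list.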

  18. Other names • Monte Carlo permutation • Randomization test • Sampled permutation (randomization) test

  19. Extension to multiple samples Fisher went on to apply the same idea to agricultural experiments involving two or more samples. The question became "How likely is it that random arrangements of the observed data would produce samples differing as much as the observed samples differ?"

  20. Extension to samples from populations • In the 1930s, Fisher and Pitman showed that the inference for a permutation test extended to cover not just random re-arrangements of a fixed set of finite elements, but also samples from larger populations.

  21. Formula-based analogs • Fisher and Pitman showed that the t-distribution and chi-squared distribution are good approximations for sufficiently large and/or normally distributed samples.

  22. The bootstrap • 1969: Julian Simon publishes the bootstrap as an example in Basic Research Methods in Social Science (the earlier pigfood example) • 1979: Efron names the bootstrap and publishes the first paper on it • Coincides with the advent of the personal computer

  23. Additional examples • Myocardial infarctions

  24. 10 of 135 high-cholesterol men developed MI (.074), compared with 21 of 470 low-cholesterol men (.045), for a difference of .029

  25. Resampling solution 1. Constitute an urn with 31 “1’s” (MI) and 574 “2’s” (no MI) 2. Take a sample of size 135 from the urn (high cholesterol men) 3. Take a second sample of size 470 (low cholesterol men)

  26. Resampling solution, cont. 4. Count the number of “1’s” (MI’s) in each 5. Divide by sample sizes to get proportions 6. Find the difference in proportions; sample 1 (n=135) minus sample 2 (n=470) 7. Keep score of the difference

  27. URN 31#1 574#2 men       'An urn called "men" with 31 "1s" (= infarctions) and 574 "2s" (= no infarction)
      REPEAT 1000
        SHUFFLE men men
        TAKE men 1,135 high     'Sample (without replacement) from the urn 135 "men"
        TAKE men 136,605 low    'Same for a group of 470
        COUNT high =1 a         'Count MIs in first group
        DIVIDE a 135 aa         'Express as a proportion
        COUNT low =1 b          'Count MIs in second group
        DIVIDE b 470 bb         'Express as a proportion
        SUBTRACT aa bb c        'Find the difference
        SCORE c z               'Keep score
      END
      HISTOGRAM z
      COUNT z >=.029 k          'How often was the resampled difference >= the observed difference?
      DIVIDE k 1000 kk          'Convert this result to a proportion
      PRINT kk
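For readers without Resampling Stats, the same urn procedure can be sketched in Python. This is a loose translation made here, not part of the original deck: the function name `mi_resampling_p_value` and the seed are assumptions, and the histogram step is omitted.

```python
import random

def mi_resampling_p_value(n_trials=1000, observed_diff=0.029, seed=0):
    """Hypothetical helper mirroring the urn program step by step."""
    rng = random.Random(seed)
    men = [1] * 31 + [2] * 574          # URN: 31 "1s" (MI), 574 "2s" (no MI)
    hits = 0
    for _ in range(n_trials):           # REPEAT
        rng.shuffle(men)                # SHUFFLE
        high = men[:135]                # TAKE the first 135 (high cholesterol)
        low = men[135:]                 # TAKE the remaining 470 (low cholesterol)
        aa = high.count(1) / 135        # COUNT and DIVIDE: proportion with MI
        bb = low.count(1) / 470
        if aa - bb >= observed_diff:    # SUBTRACT, SCORE, then COUNT >= .029
            hits += 1
    return hits / n_trials              # DIVIDE by the number of trials

p_estimate = mi_resampling_p_value()
print(p_estimate)
```

Because the two groups are cut from one shuffled list, the sampling is without replacement, matching the TAKE commands in the original program.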

  28. Results (est. p-value = 0.125)

  29. Permutation procedures • Exact - conserve Type I error • Increasingly a part of software • Tend to be conservative

  30. Bootstrap procedures • Bootstrap variants of permutation tests with 2 x 2 contingency tables can improve power • Straight bootstrap (involving no shuffling) can produce overly narrow confidence limits • As sample sizes increase, bootstrap undercoverage decreases

  31. Bootstrap adjustments • Formula-based adjustments (bootstrap-t, BCa) • Double (iterated) bootstrap • Parametric bootstrap

  32. Resampling Checklist 1. Specify the relevant universe(s). 2. Specify the sampling procedure (sample size, number of samples). 3. Calculate the statistic or estimate of interest. 4. Score the resample results and, after completion, use them to calculate a numerical answer.
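One way to read the checklist is as the outline of a small reusable routine. The Python sketch below is an assumption-laden illustration: `run_resampling` and `with_replacement` are hypothetical names, and the final "numerical answer" (the range of the resampled means) is chosen only to show step 4.

```python
import random
from statistics import mean

def run_resampling(universe, draw_sample, statistic, n_trials=1000, seed=0):
    """Hypothetical helper: score `statistic` over repeated resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        sample = draw_sample(universe, rng)   # checklist step 2
        scores.append(statistic(sample))      # checklist step 3
    return scores                             # raw material for step 4

# Step 1: the universe here is the twelve pig weight gains (ounces).
weight_gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]

def with_replacement(universe, rng):
    """Bootstrap-style draw: same size as the universe, with replacement."""
    return rng.choices(universe, k=len(universe))

scores = run_resampling(weight_gains, with_replacement, mean)

# Step 4: turn the scored results into a numerical answer.
spread = max(scores) - min(scores)
print(len(scores), round(spread, 1))
```

Swapping in a different `draw_sample` (for example, a shuffle-and-split) or a different `statistic` changes the procedure without touching the loop.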

  33. Flexibility in test statistic • Prior to resampling, much work expended on determining sampling distribution of test statistic. • No longer necessary • Use a wide variety of “home-grown” statistics

  34. Terms • Bootstrap - sampling with replacement from observed data • Permutation test - constitute null model, permute & divide into resamples • Randomization test = permutation test • Approximate permutation test - shuffling instead of exhaustive permutation • Monte Carlo simulation • Exact tests = control of Type I error • Resampling
