
Randomization and Bootstrap Methods in the Introductory Statistics Course

Randomization and Bootstrap Methods in the Introductory Statistics Course. Kari Lock Morgan (Duke University, kari@stat.duke.edu) and Robin Lock (St. Lawrence University, rlock@stlawu.edu). Panel at 2013 Joint Mathematics Meetings, San Diego, CA.


Presentation Transcript


  1. Randomization and Bootstrap Methods in the Introductory Statistics Course • Kari Lock Morgan, Duke University, kari@stat.duke.edu • Robin Lock, St. Lawrence University, rlock@stlawu.edu • Panel at 2013 Joint Mathematics Meetings, San Diego, CA

  2. Revised Curriculum • Data production (samples/experiments) • Descriptive statistics – one and two samples • Bootstrap confidence intervals • Randomization-based hypothesis tests • Normal and t-based inference • Chi-square, ANOVA, regression

  3. Why start with Bootstrap CIs? • Minimal prerequisites: • Population parameter vs. sample statistic • Random sampling • Dotplot (or histogram) • Standard deviation and/or percentiles • Natural progression/question • Sample estimate ==> How accurate is the estimate? • Same method of randomization in most cases • Sample with replacement from original sample • Intervals are more useful? • A good debate for another session…

  4. What new content is needed to teach bootstrapping?

  5. Bootstrapping • Key ideas: • Sample with replacement from the original sample using the same sample size. • Compute the sample statistic. • Collect lots of such bootstrap statistics. • Use the distribution of bootstrap statistics to assess the sampling variability of the statistic. Why does this work?
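The key ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' StatKey implementation: the function name `bootstrap_statistics`, the seed, and the choice of 1000 bootstrap samples are assumptions made for the example. The five weights are the ones given on slide 20, used here only to show the mechanics.

```python
import random
import statistics

def bootstrap_statistics(sample, stat=statistics.mean, n_boot=1000, seed=None):
    """Return a list of n_boot bootstrap statistics computed from `sample`.

    Hypothetical helper written for this illustration.
    """
    rng = random.Random(seed)  # seeded only so the run can be reproduced
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):
        # Sample WITH replacement from the original sample, same sample size.
        resample = rng.choices(sample, k=n)
        # Compute the sample statistic on this bootstrap sample.
        boot_stats.append(stat(resample))
    # The collection of bootstrap statistics is the bootstrap distribution.
    return boot_stats

# Weights from the multiple-choice question on slide 20.
weights = [121, 136, 160, 185, 203]
boot_means = bootstrap_statistics(weights, n_boot=1000, seed=1)

# The spread of the bootstrap statistics estimates the standard error.
standard_error = statistics.stdev(boot_means)
print(f"original mean = {statistics.mean(weights)}")
print(f"bootstrap SE  = {standard_error:.1f}")
```

Passing a different `stat` (for example `statistics.median`) reuses the same resampling loop, which mirrors the slide's point that the method of randomization stays the same across parameters.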

  6. Sampling Distribution • Population (parameter µ) • BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed

  7. Bootstrap Distribution • What can we do with just one seed? Grow a NEW tree! • Bootstrap “Population” (in place of the population with parameter µ) • Estimate the distribution and variability (SE) of x̄’s from the bootstraps

  8. Golden Rule of Bootstraps The bootstrap statistics are to the original statistic as the original statistic is to the population parameter.

  9. How does teaching with randomization/bootstrap methods change technology needs?

  10. Desirable Technology Features • Ability to simulate one to many samples • Visual display of results • Help students distinguish and keep straight the original data, a single simulated data set, and the distribution of simulated statistics • Allow students to interact with the bootstrap/randomization distribution • Consistent interface for different parameters, tests, and intervals

  11. StatKey www.lock5stat.com

  12. Example: Find a 95% confidence interval for the slope when using the size of bill to predict tip at a restaurant. Data: n=157 bills at First Crush Bistro (Potsdam, NY) r=0.915
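A bootstrap interval for a slope resamples whole (bill, tip) pairs rather than the two variables separately. The sketch below shows one way this could be done with a percentile interval. The restaurant data are not included in the transcript, so the `bills` and `tips` lists are MADE-UP stand-in values (only the sample size of 157 is taken from the slide), and the resulting interval says nothing about the real First Crush Bistro data. The helper names `slope` and `bootstrap_slope_ci` are likewise assumptions for this example.

```python
import random

def slope(xs, ys):
    """Least-squares slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx  # assumes the x values are not all identical

def bootstrap_slope_ci(xs, ys, level=0.95, n_boot=1000, seed=None):
    """Percentile bootstrap CI for the slope (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    n = len(pairs)
    slopes = []
    for _ in range(n_boot):
        # Resample whole (x, y) pairs with replacement, same sample size.
        resample = rng.choices(pairs, k=n)
        boot_x, boot_y = zip(*resample)
        slopes.append(slope(boot_x, boot_y))
    slopes.sort()
    tail = (1 - level) / 2
    lower = slopes[round(tail * n_boot)]
    upper = slopes[round((1 - tail) * n_boot) - 1]
    return lower, upper

# MADE-UP stand-in data, NOT the restaurant sample from the slide.
data_rng = random.Random(0)
bills = [data_rng.uniform(5, 60) for _ in range(157)]
tips = [0.18 * bill + data_rng.gauss(0, 1.5) for bill in bills]

low, high = bootstrap_slope_ci(bills, tips, level=0.95, seed=1)
print(f"sample slope = {slope(bills, tips):.3f}")
print(f"95% bootstrap CI for slope: {low:.3f} to {high:.3f}")
```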

  13. How does the use of randomization/bootstrap methods for statistical inference change the assessments used?

  14. Assessment with Technology • Given a question and corresponding data: • Generate and interpret a CI • Generate a p-value and make a conclusion • How is this assessment different? • Answers will vary slightly • Tip: ask students to include a screenshot of their bootstrap/randomization distribution • OR provide a distribution and ask students to label the x-axis according to their bootstrap/randomization distribution

  15. Assessment: Projects • Relatively early in the course, students can do confidence intervals and hypothesis tests for many different parameters! • Can find their own data and pick their own parameter of interest

  16. Assessment: Free Response • Give students context and a picture of a bootstrap distribution and ask them to… • Explain how to generate one of the dots • Estimate the sample statistic • Estimate the standard error • Use these estimates to calculate a 95% CI • OR have them estimate a 90% (or other) CI • Interpret the CI in context
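The two interval calculations asked for here (statistic ± 2·SE for 95%, or percentiles for another level such as 90%) might be sketched as follows. The function names and the seed are assumptions for illustration, and the five weights from slide 20 serve only to show the arithmetic; a sample this small would not give a trustworthy interval in practice.

```python
import random
import statistics

def se_interval(original_stat, boot_stats):
    """95% CI as statistic +/- 2 * SE, with SE taken from the bootstrap statistics."""
    se = statistics.stdev(boot_stats)
    return original_stat - 2 * se, original_stat + 2 * se

def percentile_interval(boot_stats, level):
    """CI from the middle `level` fraction of the bootstrap distribution."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    k = len(ordered)
    return ordered[round(tail * k)], ordered[round((1 - tail) * k) - 1]

# Weights from slide 20, used only to demonstrate the calculations.
weights = [121, 136, 160, 185, 203]
rng = random.Random(2)
boot_means = [
    statistics.mean(rng.choices(weights, k=len(weights))) for _ in range(1000)
]

ci_95 = se_interval(statistics.mean(weights), boot_means)
ci_90 = percentile_interval(boot_means, 0.90)
ci_98 = percentile_interval(boot_means, 0.98)
print("95% (stat +/- 2 SE):", ci_95)
print("90% (percentiles):  ", ci_90)
print("98% (percentiles):  ", ci_98)
```

On a dotplot, the percentile version corresponds to chopping off the appropriate number of dots in each tail, which is why a manageable number of dots makes these questions answerable by hand.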

  17. Assessment: Free Response • Give students context, the sample statistic, and a picture of a randomization distribution and ask them to… • State the null and alternative hypotheses • Explain how to generate one of the dots • Estimate the p-value • Use the p-value to evaluate strength of evidence against H0 / for Ha • Use the p-value to make a formal decision • Make a conclusion in context
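"Generating one of the dots" and "estimating the p-value" can be made concrete with a small randomization test for a difference in means. This is a sketch under stated assumptions: the two-group design, the upper-tail alternative, the function name `randomization_p_value`, and the example scores are all hypothetical and do not come from the presentation.

```python
import random
import statistics

def randomization_p_value(group_a, group_b, n_rand=1000, seed=None):
    """Upper-tail randomization test for mean(a) - mean(b).

    Hypothetical helper. Returns (observed statistic, estimated p-value).
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_rand):
        # One "dot": reallocate the responses to groups at random,
        # as if the null hypothesis of no difference were true.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    # p-value: proportion of dots at or beyond the observed statistic.
    return observed, at_least_as_extreme / n_rand

# Hypothetical example scores for two groups in a randomized experiment.
treatment = [24, 31, 28, 35, 30, 27]
control = [22, 25, 29, 21, 26, 23]
stat, p_value = randomization_p_value(treatment, control, n_rand=1000, seed=3)
print(f"observed difference = {stat:.2f}, p-value = {p_value:.3f}")
```

For a lower-tail test such as the one on slide 19, the comparison would be `diff <= observed` instead.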

  18. Assessment Tips • Show bootstrap or randomization distributions as dotplots with a manageable number of dots • OR have students circle relevant part of distribution

  19. Assessment: Dotplots • Bootstrap distribution, 1000 statistics: 98% CI ≈ 30 to 75 • Randomization distribution, 100 statistics: stat = 69, lower-tail test, p-value = 0.02

  20. Assessment: Multiple Choice • You have sample data on weight consisting of these data values: 121, 136, 160, 185, 203 • Is each of the following a valid bootstrap sample? • 121, 121, 160, 185, 203 (a) Yes (b) No • 121, 121, 136, 160, 185, 203 (a) Yes (b) No • 121, 160, 185, 203 (a) Yes (b) No • 121, 142, 160, 185, 190 (a) Yes (b) No
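The rule behind this question is that a valid bootstrap sample has the same size as the original sample and contains only values from the original sample (repeats allowed). A short check, with a hypothetical function name, applied to the four options on the slide:

```python
def is_valid_bootstrap_sample(original, candidate):
    """True if `candidate` could arise by sampling with replacement
    from `original` using the same sample size."""
    same_size = len(candidate) == len(original)
    only_original_values = all(value in original for value in candidate)
    return same_size and only_original_values

original = [121, 136, 160, 185, 203]
options = [
    [121, 121, 160, 185, 203],       # same size, repeats allowed
    [121, 121, 136, 160, 185, 203],  # too many values
    [121, 160, 185, 203],            # too few values
    [121, 142, 160, 185, 190],       # 142 and 190 are not in the original
]
for option in options:
    answer = "Yes" if is_valid_bootstrap_sample(original, option) else "No"
    print(option, "->", answer)
```

Only the first option qualifies; the others fail on sample size or on containing values that were never observed.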

  21. Assessment: Multiple Choice • How would changing each of the following aspects of the study or analysis change the confidence interval? • Increase the sample size? • Increase the number of bootstrap samples? • Increase the confidence level? • (a) It would get wider • (b) It would get narrower • (c) It would stay about the same • Can do similar questions for SE, p-value, etc.
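Students (or instructors) can check their answers to these questions by simulation. The sketch below compares percentile-interval widths for the mean under each change. The normal population with mean 50 and standard deviation 10, the sample sizes, and the function name `bootstrap_ci_width` are all assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_ci_width(sample, level=0.95, n_boot=1000, seed=None):
    """Width of a percentile bootstrap CI for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    return means[round((1 - tail) * n_boot) - 1] - means[round(tail * n_boot)]

# Hypothetical population: normal with mean 50 and SD 10.
pop_rng = random.Random(0)
small_sample = [pop_rng.gauss(50, 10) for _ in range(25)]
large_sample = [pop_rng.gauss(50, 10) for _ in range(400)]

base = bootstrap_ci_width(small_sample, seed=1)
more_data = bootstrap_ci_width(large_sample, seed=1)
more_boots = bootstrap_ci_width(small_sample, n_boot=5000, seed=1)
higher_level = bootstrap_ci_width(small_sample, level=0.99, seed=1)

print(f"baseline (n=25, 1000 boots, 95%): {base:.2f}")
print(f"larger sample (n=400):            {more_data:.2f}")
print(f"more bootstrap samples (5000):    {more_boots:.2f}")
print(f"higher confidence level (99%):    {higher_level:.2f}")
```

The pattern to look for: a larger sample narrows the interval, more bootstrap samples leave it about the same, and a higher confidence level widens it.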

  22. Assessment: Multiple Choice • Randomizing in a randomized experiment breaks the link between • (a) explanatory and response variables • (b) explanatory and confounding variables • (c) response and confounding variables • Re-randomizing (reallocating) in a randomization test breaks the link between • (a) explanatory and response variables • (b) explanatory and confounding variables • (c) response and confounding variables

  23. Assessment: Connecting with Traditional • Once students have learned formulas for standard errors… • Give context, summary statistics, and an unlabeled bootstrap/randomization distribution, then ask students to label at least 3 points on the x-axis

  24. Conceptual Assessment • Almost all conceptual assessment items used in the past regarding confidence intervals and hypothesis tests still work! • The concepts we want our students to understand are the same!

  25. Thanks for listening!
