
Lecture 4

Lecture 4. An RPG approach to hypothesis testing. Everybody, roll a D20 for an implausibility check. Modifier + dice roll > 19. Guess the modifier: if it quacks like a duck … Suppose we want to know whether a character has a 0 modifier for a trait checked with D20=20.

nami

Presentation Transcript


  1. Lecture 4: An RPG approach to hypothesis testing

  2. Everybody, roll a D20 for an implausibility check

  3. Modifier + dice roll > 19

  4. Guess the modifier: If it quacks like a duck … • Suppose we want to know whether a character has a 0 modifier for a trait checked with D20=20. • Note whether the check is passed. • If passed, assume the modifier is greater than 0. • If failed, assume the modifier is 0.
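The decision rule can be sketched in a few lines of Python. This is only an illustration under the rule "modifier + dice roll > 19" from slide 3; the function names `passes_check` and `decide` and the seed are made up for this sketch.

```python
import random

THRESHOLD = 19  # the check is passed when modifier + roll > 19


def passes_check(modifier, roll):
    """Return True if a single D20 roll passes the check."""
    return modifier + roll > THRESHOLD


def decide(modifier, rng):
    """Roll one D20 and report the conclusion the test would draw."""
    roll = rng.randint(1, 20)  # fair D20, faces 1..20
    if passes_check(modifier, roll):
        return "modifier > 0"  # check passed: reject the null hypothesis
    return "modifier = 0"      # check failed: keep the null hypothesis


rng = random.Random(4)  # arbitrary seed, only for reproducibility
print(decide(0, rng))
```

With a true modifier of 0 the check is passed only on a natural 20, so the rule wrongly concludes "modifier > 0" on one roll in twenty.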

  5. A problem • Note that characters with very small non-zero modifiers will probably fail the check, so we would wrongly conclude that the modifier is 0. This is called a type 2 error. • So the test works best if the character has a large modifier. • A non-significant result does not “prove” that the character has a 0 modifier.

  6. Power • A test is powerful if it rejects with high probability when the null hypothesis is false. • Power is defined against particular alternatives. • The modifier test is powerful against the alternative that the modifier is 16 (the check is passed on 17 of the 20 faces). • The modifier test is weak against the alternative that the modifier is 4 (the check is passed on only 5 of the 20 faces).
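The power of the dice test against a given alternative is just the chance of passing the check with that modifier. A minimal sketch, assuming a fair D20 and the rule "modifier + dice roll > 19" (the helper name `pass_probability` is made up here):

```python
from fractions import Fraction


def pass_probability(modifier, sides=20, threshold=19):
    """Probability that modifier + roll > threshold on one fair die."""
    winning = sum(1 for roll in range(1, sides + 1)
                  if modifier + roll > threshold)
    return Fraction(winning, sides)


for modifier in (0, 4, 16):
    print(modifier, float(pass_probability(modifier)))
# 0 0.05   (type 1 error rate when the null is true)
# 4 0.25   (low power)
# 16 0.85  (high power)
```

The value for modifier 0 is the significance level of the test; the values for 4 and 16 are its power against those two alternatives.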

  7. Gaining power • Increase the sample size • Use a powerful test (technical stats issue) • Refine the study design to reduce variance

  8. Some problems with NHST: Multiple testing

  9. Multiple testing • If I roll the dice often enough, I will pass the implausibility check • This applies to hypothesis testing • Repeated tests on the same data set, within the same study, may yield a spurious “significant” result • This is called a type 1 error
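How quickly repeated rolls produce a pass can be worked out directly. The sketch below assumes independent rolls with a 0 modifier, so each roll passes with probability 0.05; the function name is illustrative only.

```python
def prob_at_least_one_pass(n_rolls, p_single=0.05):
    """Chance of at least one pass in n_rolls independent attempts."""
    return 1 - (1 - p_single) ** n_rolls


for n in (1, 5, 14, 20):
    print(n, round(prob_at_least_one_pass(n), 3))
```

Under these assumptions, 14 attempts are already enough to make a spurious pass more likely than not.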

  10. My recommendation • It is best to save the hypothesis test for the primary outcome • Use confidence intervals and effect sizes for secondary outcomes

  11. Other possibilities • Adjust the alpha level of significance to take into account the fact that many tests are being made.
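One common way to make such an adjustment is the Bonferroni correction, which divides alpha by the number of tests. The slide does not name a specific method, so this choice is an assumption, and the p-values below are made up purely for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values remain significant after the correction."""
    per_test = bonferroni_alpha(alpha, len(p_values))
    return [p <= per_test for p in p_values]


# Hypothetical p-values from four tests in one study.
p_values = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_alpha(0.05, len(p_values)))   # 0.0125
print(significant_after_bonferroni(p_values))  # [True, False, False, False]
```

Two of the made-up p-values are below 0.05 but no longer count as significant once the number of tests is taken into account.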

  12. Some problems with NHST: The p-value is not as informative as one might think

  13. What is p (the p-value)?

  14. The correct answer • The correct answer is c) • The p-value is the probability of getting something at least as extreme as what one got, assuming that the null hypothesis is true.
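In the dice setting this definition can be computed exactly. The sketch below assumes a hypothetical experiment that is not in the slides: the character rolls n times, and under the null hypothesis (modifier 0) each roll passes with probability 0.05. The p-value is then the binomial probability of at least as many passes as were observed.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Hypothetical data: 3 passes in 10 rolls, null pass probability 0.05.
p_value = binomial_upper_tail(3, 10, 0.05)
print(round(p_value, 4))
```

Here "at least as extreme" means three or more passes, because more passes are stronger evidence against a 0 modifier.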

  15. p-value and sample size • The p-value is a function of the sample size • If the null is false (even by a small amount) a large sample size will yield a small p-value • A large study will almost certainly yield a significant result, even when nothing interesting is happening. • A small study will very often yield a non-significant result, even when the intervention is effective.
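The dependence on sample size can be illustrated with a two-sample z-test. This is an idealised sketch: it assumes a known standard deviation of 1 in both groups, equal group sizes, and an observed mean difference exactly equal to a tiny true difference of 0.05; none of these numbers come from the slides.

```python
from math import erfc, sqrt


def two_sided_p(diff, n_per_group, sd=1.0):
    """Two-sided p-value of a two-sample z-test with known sd."""
    z = diff / (sd * sqrt(2.0 / n_per_group))
    return erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


for n in (100, 1_000, 10_000, 100_000):
    print(n, two_sided_p(0.05, n))
```

The difference between the groups never changes, yet the p-value moves from clearly non-significant at 100 per group to far below 0.05 at 10,000 per group.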

  16. Abuses of NHST • Fishing expeditions • No clear hypothesis • Many measurements of interest • Measurements with a high degree of variability, uncertain distributions • Convenience samples • Cult-like adherence to • With electronic computers, very large databases are available for analysis. • Alternatively, underpowered studies • Relying on the statistician to come up with the research question

  17. A possible solution • Quote estimate and confidence interval and/or • Quote an effect size Cohen’s d: • The effect size is independent of the sample size. • Alternate suggested effect size: This statistic falls between 0 and 1. There are rules of thumb for what constitutes a large, medium and small effect.
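Cohen's d can be computed in a few lines. The sketch below assumes the usual form, the difference in group means divided by the pooled standard deviation; the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Made-up scores for two small groups.
treatment = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(treatment, control))  # 0.5
```

Unlike the p-value, this number estimates the same quantity whether the groups have 3 or 3,000 members; only its precision changes with sample size.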

  18. Problems with the effect size • The effect size is sometimes taken to represent some sort of absolute measure of meaningfulness • Measures of meaningfulness need to come from the subject matter.

  19. Advantages of the p-value • The p-value measures the strength of the evidence that you have against the null hypothesis. • The p-value is a pure number (no unit of measurement) • A common standard across all experiments using that methodology

  20. Ideal conditions for NHST • Carefully designed experiments • Everything randomized that should be randomized • One outcome of interest • No more subjects than necessary to achieve good power • Structure of measurements known to be normal (or whatever distribution is assumed by the test)

  21. The take-away: Most important points to remember

  22. The p-value is a function of the sample size • It’s not a measure of truth; it’s a measure of evidence • A non-significant result does not prove the null hypothesis to be true • If the data are matched, analyse them as matched pairs. • The t-test is fairly tolerant of departures from normality • The t-test is sensitive to differences in variance when the sample sizes are unequal (when in doubt, use the Welch test).
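For reference, the Welch test replaces the pooled variance with separate variance estimates and adjusts the degrees of freedom (the Welch-Satterthwaite approximation). The sketch below computes only the test statistic and degrees of freedom using the standard library; turning them into a p-value needs a t-distribution, which is left out here. The data are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for Welch's test."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Made-up data with equal sizes and equal variances.
t, df = welch_t([2, 4, 6], [1, 3, 5])
print(t, df)
```

With equal group sizes and equal variances, as in this toy data, the degrees of freedom come out the same as for the ordinary t-test; the two tests diverge as sizes and variances become unequal.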

  23. Vocabulary: The following are equivalent • The significance level, α • The probability of a type 1 error. The following are related • The probability of a type 2 error, β • The power of the test, 1 − β. Difference between α and β: • α is set by the experimenter • β is a consequence of the design.

  24. Pop quiz • What is the difference between the significance level of a test and the p-value of that test?
