
Hypothesis Testing: p-value

STAT 101 Dr. Kari Lock Morgan. Hypothesis Testing: p-value. SECTION 4.2 Randomization distribution p-value. Paul the Octopus. http://www.youtube.com/watch?v=3ESGpRUMj9E. Hypotheses. In 2010, Paul the Octopus predicted 8 World Cup games, and predicted them all correctly.


Presentation Transcript


  1. STAT 101 Dr. Kari Lock Morgan Hypothesis Testing: p-value • SECTION 4.2 • Randomization distribution • p-value

  2. Paul the Octopus http://www.youtube.com/watch?v=3ESGpRUMj9E

  3. Hypotheses • In 2010, Paul the Octopus predicted 8 World Cup games, and predicted them all correctly • Is this evidence that Paul’s chance of guessing correctly, p, is really greater than 50%? • What are the null and alternative hypotheses? • H0: p ≠ 0.5, Ha: p = 0.5 • H0: p = 0.5, Ha: p ≠ 0.5 • H0: p = 0.5, Ha: p > 0.5 • H0: p > 0.5, Ha: p = 0.5

  4. Key Question • How unusual is it to see a sample statistic as extreme as that observed, if H0 is true? • If it is very unusual, we have statistically significant evidence against the null hypothesis • Today’s Question: How do we measure how unusual a sample statistic is, if H0 is true?

  5. Measuring Evidence against H0 To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

  6. Paul the Octopus • We need to know what kinds of statistics we would observe just by random chance, if the null hypothesis were true • How could we figure this out??? Simulate many samples of size n = 8 with p = 0.5

  7. Simulate! • We can simulate this with a coin! • Each coin flip = a guess between two teams (Heads = correct, Tails = incorrect) • Flip a coin 8 times, count the number of heads, and calculate the sample proportion of heads • Did you get all 8 heads (correct)? (a) Yes (b) No • How extreme is Paul’s sample proportion of 1?
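
The coin-flip activity above can also be run in software. The following is a minimal sketch, assuming a fair coin (p = 0.5) and a hypothetical class of 30 students; the function name, the class size, and the seed are illustrative choices, not part of the original slides.

```python
import random

def simulate_guesses(n_games=8, p_correct=0.5, rng=None):
    """Flip one 'coin' per game (heads = correct guess) and
    return the sample proportion of correct guesses."""
    if rng is None:
        rng = random.Random()
    correct = sum(rng.random() < p_correct for _ in range(n_games))
    return correct / n_games

# Hypothetical class of 30 students, each flipping a coin 8 times.
rng = random.Random(101)  # arbitrary seed, only for reproducibility
class_results = [simulate_guesses(rng=rng) for _ in range(30)]

print(class_results)
print("students with all 8 correct:", sum(r == 1.0 for r in class_results))
```

Each entry of `class_results` is one simulated sample proportion under H0, which is the same quantity each student computes by hand.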

  8. Paul the Octopus • Based on your simulation results, for a sample size of n = 8, do you think a sample proportion of p̂ = 1 is statistically significant? • Yes • No

  9. Randomization Distribution A randomization distribution is a collection of statistics from samples simulated assuming the null hypothesis is true • The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true

  10. Lots of simulations! • For a better randomization distribution, we need many more simulations! www.lock5stat.com/statkey
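
StatKey produces this kind of distribution interactively. As a rough stand-in, the sketch below builds a randomization distribution for Paul's setting (n = 8, null value p = 0.5) and prints a text histogram. The number of simulations (10,000), the seed, and the function name are assumptions made for illustration.

```python
import random
from collections import Counter

def randomization_distribution(n_sims=10_000, n_games=8, p_null=0.5, seed=0):
    """Simulate n_sims samples of n_games guesses assuming H0: p = p_null,
    and return the sample proportion from each simulated sample."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        correct = sum(rng.random() < p_null for _ in range(n_games))
        stats.append(correct / n_games)
    return stats

stats = randomization_distribution()
counts = Counter(stats)

# Text histogram: one '#' per 100 simulated samples.
for value in sorted(counts):
    print(f"{value:.3f}  {counts[value]:5d}  {'#' * (counts[value] // 100)}")

print("mean of simulated proportions:", sum(stats) / len(stats))
```

The printed mean should sit close to 0.5, and the bar for a proportion of 1.000 should be very short.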

  11. Randomization Distribution

  12. Paul the Octopus • Based on StatKey’s simulation results, for a sample size of n = 8, do you think a sample proportion of p̂ = 1 is statistically significant? • Yes • No

  13. Key Question • A randomization distribution tells us what kinds of statistics we would see just by random chance, if the null hypothesis is true • This makes it straightforward to assess how extreme the observed statistic is! How unusual is it to see a sample statistic as extreme as that observed, if H0 is true?

  14. Randomization Distribution In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2. What do we require about the method to produce randomization samples? • μ = 12 • μ < 12 We need to generate randomization samples assuming the null hypothesis is true.

  15. Randomization Distribution In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2. Where will the randomization distribution be centered? • 10.2 • 12 • 45 • 1.8 Randomization distributions are always centered around the null hypothesized value.

  16. Randomization Distribution Center A randomization distribution is centered at the value of the parameter given in the null hypothesis. • A randomization distribution simulates samples assuming the null hypothesis is true, so the simulated statistics cluster around the null value
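
To see the centering in action for a test about a mean, here is a sketch of one common approach (an assumption here, since the slides do not specify the mechanism): shift the sample so that its mean equals the null value, then resample from the shifted data with replacement. The data below are made up purely for illustration; only n = 45, the sample mean near 10.2, and the null value 12 come from the example above.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed

# MADE-UP DATA: 45 values with a mean near 10.2 (not from the slides).
sample = [rng.gauss(10.2, 4.0) for _ in range(45)]

null_mean = 12.0  # value of mu under H0
shift = null_mean - statistics.mean(sample)
shifted = [x + shift for x in sample]  # shifted sample now has mean 12

def randomization_means(data, n_sims, rng):
    """Resample with replacement from data and return each resample's mean."""
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_sims)]

rand_means = randomization_means(shifted, 5000, rng)
center = statistics.mean(rand_means)
print("center of randomization distribution:", round(center, 2))
```

The center comes out near 12, the null value, not near the observed sample mean.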

  17. Randomization Distribution In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2. What will we look for on the randomization distribution? • How extreme 10.2 is • How extreme 12 is • How extreme 45 is • What the standard error is • How many randomization samples we collected We want to see how extreme the observed statistic is.

  18. Randomization Distribution In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21. What do we require about the method to produce randomization samples? • μ1 = μ2 • μ1 > μ2 • 26, 21 We need to generate randomization samples assuming the null hypothesis is true.

  19. Randomization Distribution In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21. Where will the randomization distribution be centered? • 0 • 1 • 21 • 26 • 5 The randomization distribution is centered around the null hypothesized value, μ1 − μ2 = 0

  20. Randomization Distribution In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21. What do we look for on the randomization distribution? • The standard error • The center point • How extreme 26 is • How extreme 21 is • How extreme 5 is We want to see how extreme the observed difference in means is.

  21. Quantifying Evidence • We need a way to quantify evidence against the null…

  22. p-value The p-value is the chance of obtaining a sample statistic as extreme as (or more extreme than) the observed sample statistic, if the null hypothesis is true • The p-value can be calculated as the proportion of statistics in a randomization distribution that are as extreme as (or more extreme than) the observed sample statistic

  23. p-value • Paul the Octopus: the p-value is the chance of getting all 8 out of 8 guesses correct, if p = 0.5 • What proportion of statistics in the randomization distribution are as extreme as p̂ = 1?

  24. p-value [Figure: randomization distribution from 1000 simulations, with the observed statistic marked and the proportion as extreme as the observed statistic labeled as the p-value] • p-value = 0.004 • If Paul is just guessing, the chance of him getting all 8 correct is 0.004.
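
The value 0.004 can be checked two ways. Under H0 each guess is an independent fair coin flip, so the exact chance of 8 correct out of 8 is 0.5 raised to the 8th power; a simulation should land close to that. The sketch below is illustrative only: the number of simulations and the seed are arbitrary assumptions.

```python
import random

# Exact chance of 8 correct guesses out of 8 when p = 0.5.
exact_p_value = 0.5 ** 8
print("exact p-value:", exact_p_value)  # 0.00390625, which rounds to 0.004

def simulated_p_value(n_sims=100_000, n_games=8, p_null=0.5, seed=1):
    """Proportion of simulated samples with a sample proportion of 1,
    that is, as extreme as Paul's observed result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        correct = sum(rng.random() < p_null for _ in range(n_games))
        if correct == n_games:
            hits += 1
    return hits / n_sims

paul_p_value = simulated_p_value()
print("simulated p-value:", paul_p_value)
```

Because no sample proportion can exceed 1, "as extreme as observed" here means exactly 8 correct.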

  25. Calculating a p-value • What kinds of statistics would we get, just by random chance, if the null hypothesis were true? (randomization distribution) • What proportion of these statistics are as extreme as our original sample statistic? (p-value)

  26. ESP p-value • For our ESP example, the p-value is the chance of getting a sample proportion as high as 0.26, from a sample of n = 98, if p = 0.2 • Simulate a randomization distribution with p= 0.2 and n = 98, and see what proportion of simulated statistics are as extreme as 0.26 • www.lock5stat.com/statkey

  27. ESP p-value [Figure: randomization distribution with the observed statistic marked and the proportion as extreme as the observed statistic labeled as the p-value] • p-value = 0.072 • If you were all just guessing randomly, the chance of us getting a sample proportion as high as 0.26 is 0.072.
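
The ESP p-value can be approximated in code along the same lines as the StatKey demonstration. This is a sketch under stated assumptions: the slides give the observed proportion only as the rounded value 0.26, so the cutoff below treats any simulated proportion of at least 0.26 as "as extreme"; the number of simulations and the seed are arbitrary.

```python
import random

def esp_p_value(n_sims=20_000, n=98, p_null=0.2, observed=0.26, seed=7):
    """Simulate samples of size n assuming H0: p = p_null and return the
    proportion of simulated sample proportions at least as large as observed."""
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_sims):
        correct = sum(rng.random() < p_null for _ in range(n))
        if correct / n >= observed:  # upper tail, since Ha is p > 0.2
            as_extreme += 1
    return as_extreme / n_sims

p_value = esp_p_value()
print("simulated ESP p-value:", p_value)
```

The result varies from run to run with the seed and the number of simulations, so it will not match 0.072 exactly.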

  28. Randomization Distributions • p-values can be calculated by randomization distributions: • simulate samples, assuming H0 is true • calculate the statistic of interest for each sample • find the p-value as the proportion of simulated statistics as extreme as the observed statistic • Let’s do a randomization distribution for a randomized experiment…

  29. Cocaine Addiction • In a randomized experiment on treating cocaine addiction, 48 people were randomly assigned to take either Desipramine (a new drug), or Lithium (an existing drug), and then followed to see who relapsed • Question of interest: Is Desipramine better than Lithium at treating cocaine addiction?

  30. Cocaine Addiction • pD, pL: proportion of cocaine addicts who relapse after taking Desipramine or Lithium, respectively • What are the null and alternative hypotheses? H0: pD = pL, Ha: pD < pL • What are the possible conclusions? Reject H0: Desipramine is better than Lithium. Do not reject H0: we cannot determine from these data whether Desipramine is better than Lithium

  31. 1. Randomly assign units to treatment groups [Figure: 48 units, each marked R, divided at random between the Desipramine group and the Lithium group]

  32. 1. Randomly assign units to treatment groups 2. Conduct experiment 3. Observe relapse counts in each group (R = Relapse, N = No Relapse) • Desipramine: 10 relapse, 14 no relapse • Lithium: 18 relapse, 6 no relapse

  33. Measuring Evidence against H0 To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

  34. Cocaine Addiction • “by random chance” means by the random assignment to the two treatment groups • “if H0 were true” means if the two drugs were equally effective at preventing relapses (equivalently: whether a person relapses or not does not depend on which drug is taken) • Simulate what would happen just by random chance, if H0 were true…

  35. [Figure: observed outcomes, R = Relapse, N = No Relapse] • Desipramine: 10 relapse, 14 no relapse • Lithium: 18 relapse, 6 no relapse

  36. Simulate another randomization [Figure: the same 48 outcomes reassigned at random to the two groups] • Desipramine: 16 relapse, 8 no relapse • Lithium: 12 relapse, 12 no relapse

  37. Simulate another randomization [Figure: the same 48 outcomes reassigned at random to the two groups] • Desipramine: 17 relapse, 7 no relapse • Lithium: 11 relapse, 13 no relapse

  38. www.lock5stat.com/statkey [Figure: randomization distribution with the observed statistic marked and the proportion as extreme as the observed statistic labeled as the p-value] • If the two drugs are equal regarding cocaine relapse rates, we have a 1.3% chance of seeing a difference in proportions as extreme as that observed.
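
The re-randomization shown on the preceding slides can be sketched in code: keep the 48 observed outcomes fixed (28 relapses, 20 non-relapses), shuffle them into two groups of 24, and record the difference in relapse proportions each time. The counts come from the slides; the number of simulations, the seed, and the function name are assumptions for illustration.

```python
import random

# Observed outcomes pooled across both groups:
# 10 + 18 = 28 relapses (R) and 14 + 6 = 20 non-relapses (N).
outcomes = ["R"] * 28 + ["N"] * 20
observed_diff = 10 / 24 - 18 / 24  # p-hat_D minus p-hat_L

def shuffled_diff(outcomes, n_first, rng):
    """Randomly reassign outcomes to two groups and return the difference
    in relapse proportions (first group minus second group)."""
    shuffled = outcomes[:]
    rng.shuffle(shuffled)
    first, second = shuffled[:n_first], shuffled[n_first:]
    return first.count("R") / len(first) - second.count("R") / len(second)

rng = random.Random(2014)  # arbitrary seed
diffs = [shuffled_diff(outcomes, 24, rng) for _ in range(10_000)]

# Lower tail, because Ha is pD < pL; the small tolerance guards
# against floating-point ties.
p_value = sum(d <= observed_diff + 1e-12 for d in diffs) / len(diffs)
print("observed difference:", round(observed_diff, 3))
print("simulated p-value:", p_value)
```

The simulated p-value is small, in the same range as the value on the slide, though the exact figure depends on the seed and on how many randomizations are run.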

  39. Death Penalty • A random sample of people were asked “Are you in favor of the death penalty for a person convicted of murder?” • Did the proportion of Americans who favor the death penalty decrease from 1980 to 2010? “Death Penalty,” Gallup, www.gallup.com

  40. Death Penalty • p1980, p2010: proportion of Americans who favor the death penalty in 1980, 2010 • H0: p1980 = p2010, Ha: p1980 > p2010 • So the sample statistic is the difference in sample proportions, 0.02 • How extreme is 0.02, if p1980 = p2010? StatKey

  41. Death Penalty p-value = 0.164 • If the proportion supporting the death penalty has not changed from 1980 to 2010, we would see differences this extreme about 16% of the time.

  42. Alternative Hypothesis • A one-sided alternative contains either > or < • A two-sided alternative contains ≠ • For a one-sided alternative, the p-value is the proportion in the tail in the direction specified by Ha • For a two-sided alternative, the p-value is twice the proportion in the smaller tail

  43. p-value and Ha • H0: μ = 0, Ha: μ > 0: Upper-tail (Right Tail) • H0: μ = 0, Ha: μ < 0: Lower-tail (Left Tail) • H0: μ = 0, Ha: μ ≠ 0: Two-tailed
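
The tail rules can be written as one small function. This is a sketch only: the function name, the `alternative` labels, and the toy list of statistics are assumptions for illustration, not real simulation output.

```python
def tail_p_value(stats, observed, alternative):
    """p-value from a list of randomization statistics.

    alternative: "greater" (upper tail), "less" (lower tail),
    or "two-sided" (twice the smaller tail, capped at 1).
    """
    n = len(stats)
    upper = sum(s >= observed for s in stats) / n
    lower = sum(s <= observed for s in stats) / n
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Toy statistics (the integers -50..49), NOT real simulation output.
toy_stats = list(range(-50, 50))
print(tail_p_value(toy_stats, 45, "greater"))    # 5 of 100 values are >= 45
print(tail_p_value(toy_stats, 45, "less"))       # 96 of 100 values are <= 45
print(tail_p_value(toy_stats, 45, "two-sided"))  # twice the smaller tail
```

Capping the two-sided value at 1 is a design choice to keep the result a valid proportion when the observed statistic sits near the center.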

  44. Sleep versus Caffeine • Recall the sleep versus caffeine experiment from last class • μs and μc are the mean number of words recalled after sleeping and after caffeine. • H0: μs = μc, Ha: μs ≠ μc (two-tailed alternative) • Let’s find the p-value! • www.lock5stat.com/statkey

  45. Sleep or Caffeine for Memory? www.lock5stat.com/statkey p-value = 2 × 0.022 = 0.044

  46. p-value and H0 • If the p-value is small, then a statistic as extreme as that observed would be unlikely if the null hypothesis were true, providing significant evidence against H0 • The smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative

  47. p-value and H0 The smaller the p-value, the stronger the evidence against H0.

  48. Summary • The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true • A p-value is the chance of getting a statistic as extreme as that observed, if H0 is true • A p-value can be calculated as the proportion of statistics in the randomization distribution as extreme as the observed sample statistic • The smaller the p-value, the greater the evidence against H0

  49. To Do • Read Section 4.2 • Project 1 proposal (due Wednesday, 2/19)
