Sample Size and Power Calculations


Presentation Transcript


  1. Sample Size and Power Calculations Andy Avins, MD, MPH Kaiser Permanente Division of Research University of California, San Francisco

  2. Why do Power Calculations? • Underpowered studies have a high chance of missing a real and important result • Risk of misinterpretation is still very high • Overpowered studies waste precious research resources and may delay getting the answer [BTW: “Sample Size Calculations” ≡ “(Statistical) Power Calculations”]

  3. A Few Little Secrets • Why, yes, power calculations are basically crap • Nothing more than educated guesses • Playing games with power calculations has a long and glorious history • So why do them? • You got a better idea? • Review committees don’t give you a choice • Can be enlightening, even if imperfect

  4. Power Calculations • Purpose • Try to rationally guess optimal number of participants to recruit (Goldilocks principle) • Understand sample size implications of alternative study designs

  5. The Problem of Uncertainty • If everyone responded the same way to treatment, we’d only need to study two people (one intervention, one control) • Uncertainty creeps in: • When we draw a sample from a population • When we allocate participants to different comparison groups (random or not)

  6. Thought Experiment • We have 400 participants • Randomly allocate them to 2 groups • Test if Group 1 tx does better than Group 2 tx • Throw all participants back in the pool, re-allocate them, and repeat the experiment • Truth: Group 1 tx IS better than Group 2 tx • We know this

  7. Thought Experiment • Clinical Trial Run #1: 1>2 [Correct] • Clinical Trial Run #2: 1>2 [Correct] • Clinical Trial Run #3: 1=2 [Incorrect] • Clinical Trial Run #4: 1>2 [Correct] • Clinical Trial Run #5: 1=2 [Incorrect] • Clinical Trial Run #6: 2>1 [Incorrect] • Clinical Trial Run #7: 1>2 [Correct] • Clinical Trial Run #8: 1>2 [Correct] • ………

  8. Thought Experiment • Suppose we repeated the thought experiment an infinite number of times • Result: • 70% of runs show 1>2 (correct result) • 30% of runs show 1=2 or 1<2 (incorrect result)

  9. Thought Experiment • POWER of doing this clinical trial with 400 participants is 70% • Any ONE run of this clinical trial (with 400 participants) has about a 70% chance of producing the correct result • Any ONE run of this clinical trial (with 400 participants) has about a 30% chance of producing the wrong result
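A minimal Python sketch (not from the slides) of this thought experiment: simulate many runs of a 400-participant trial in which Group 1's treatment truly is better, and count the fraction of runs that reach the correct conclusion. The outcome distribution, effect size, and number of runs are arbitrary illustrative choices; with these particular numbers the empirical power comes out near the 70% figure above.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_runs, n_per_group = 2000, 200              # 400 participants per run
    correct = 0
    for _ in range(n_runs):
        # Truth: Group 1's treatment is better (illustrative means and SD)
        g1 = rng.normal(loc=0.25, scale=1.0, size=n_per_group)
        g2 = rng.normal(loc=0.00, scale=1.0, size=n_per_group)
        _, p = ttest_ind(g1, g2)
        if p < 0.05 and g1.mean() > g2.mean():   # this run concludes "1 > 2"
            correct += 1

    print(correct / n_runs)                      # empirical power, roughly 0.70 here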

  10. Bottom Line • If you only have a 70% chance of showing what you want to show (and you only have $$ for 400 participants): • Should you bother doing the study??

  11. Power Calculations • Sample size calculations are all about making educated guesses to help ensure that our study: A) Has a sufficiently high chance of finding an effect when one exists B) Is not “over-powered” and wasteful

  12. Error Terminology • Two types of statistical errors: • “Type I Error” ≡ “Alpha Error” • Probability of finding a statistically significant effect when the truth is that there is no effect • “Type II Error” ≡ “Beta Error” • Probability of not finding a statistically significant effect when one really does exist • Goal is to minimize both types of errors

  13. Error Terminology

                                Truth of the Association
      Observed Association      Association exists     No association
      Reject Ho                 Correct                Type I Error (α)
      Don't reject Ho           Type II Error (β)      Correct

  • Type I Error (α): reject Ho when we shouldn't (this is fixed at 5%)
  • Type II Error (β): don't reject Ho when we should (not fixed; this is a function of sample size)

  14. Hypothesis Testing • Power calculations are based on the principles of hypothesis testing • Research question ≠ hypothesis • The hypothesis is a device for forcing you to be explicit about what you want to show

  15. Hypothesis Testing: Mechanics 1) Define the “Null Hypothesis” (Ho) • Generally Ho = “no effect” or “no association” • Assume it’s true • Basically, a straw man (set it up so we can knock it down) • Analogous to reductio ad absurdum in geometry • Example: There is no difference in the risk of stroke between statin-treated participants and placebo-treated participants.

  16. Hypothesis Testing: Mechanics • 2) Define the “Alternative Hypothesis” (HA) • Finding of some effect • Can be one-sided or two-sided • One-sided: better/greater/more or worse/less • Two-sided: different • Which to choose? • One-sided: biologically impossible for other possibility, don’t care about other possibility (careful!) • Easier to get “statistical significance” with one-sided HA • When in doubt, choose a two-sided HA • Example: Statin treatment results in a different risk of stroke compared to placebo treatment

  17. Hypothesis Testing: Mechanics 3) Define a decision rule • Virtually always: reject the null hypothesis if p<.05 • This cutpoint (.05) = “alpha level”

  18. Hypothesis Testing: Mechanics 4) Calculate the “p-value” • Assume that Ho is true • Do the study / gather the data • Calculate the probability that we’d see (by chance) something at least as extreme as what we observed IF Ho was true

  19. Hypothesis Testing: Mechanics 5) Apply the decision rule: • If p < cutpoint (.05), REJECT Ho • i.e., we assert that there is an effect • If p ≥ cutpoint (.05), DO NOT REJECT Ho • i.e., we do not assert that there is an effect • Note: this is different from asserting that there is no effect (you can never prove the null)
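A minimal Python sketch (not from the slides) of these mechanics, applied to the statin/stroke example with made-up counts: state Ho (no difference in stroke risk), compute a p-value with a chi-square test, and apply the α = .05 decision rule.

    from scipy.stats import chi2_contingency

    # Hypothetical 2x2 table -- rows: statin, placebo; columns: stroke, no stroke
    table = [[12, 488],
             [25, 475]]

    chi2, p_value, dof, expected = chi2_contingency(table)
    if p_value < 0.05:
        print(f"p = {p_value:.3f} < .05: reject Ho (assert a difference in stroke risk)")
    else:
        print(f"p = {p_value:.3f} >= .05: do not reject Ho (cannot assert a difference)")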

  20. Terminology Review • Null and Alternative Hypotheses • One-Sided and Two-Sided HA • Type I Error (α) • Type II Error (β) • Power: 1 – β • p-value • Effect Size

  21. Wald Test

  22. Standard Deviation

  23. The Normal Curve Probability

  24. Ingredients Needed to Calculate the Sample Size • Need to know / decide: • Effect Size: d • SD(d) • Standardized Effect Size: d / SD(d) • Cutpoint for our decision rule • Power we want for the study • What statistical test we’re going to use • We can use all this information to calculate our optimal sample size
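A minimal Python sketch (not from the slides) showing how these ingredients combine, using the standard normal-approximation formula for comparing two means: n per group = 2(z₁₋α/₂ + z₁₋β)² / (standardized effect size)². The numbers in the example are arbitrary.

    from math import ceil
    from scipy.stats import norm

    def n_per_group(d, sd, alpha=0.05, power=0.80, two_sided=True):
        # Approximate participants needed per group to detect a mean difference d
        z_alpha = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
        z_beta = norm.ppf(power)
        std_effect = d / sd                              # standardized effect size, d / SD(d)
        return ceil(2 * (z_alpha + z_beta) ** 2 / std_effect ** 2)

    # Example: detect a difference of 5 points with SD 10, two-sided alpha .05, 80% power
    print(n_per_group(5.0, 10.0))                        # about 63 per group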

  25. Where the Pieces Come From: d • d is the “Effect Size” • d should be set as the “minimum clinically important difference” (MCID) • This is the smallest difference between groups that you would care about • Sources • Colleagues (accepted in clinical community) • Literature (what others have done) • Smaller differences require larger sample sizes to detect (danger: don’t fool yourself)

  26. Where the Pieces Come From: SD(d) • Standard deviation of d • Generally, based on prior research • Literature (often not stated); can derive • Contact other investigators • +/- pilot study

  27. Where the Pieces Come From: Cutpoint • Easy: written in stone (Thanks, RA Fisher) • Alpha = 0.05 • Need to state if one-sided or two-sided

  28. Where the Pieces Come From: Power • Higher is better • You decide • Rules of thumb: • 80% is minimum reviewers will accept • Rarely need to propose >90% • Greater power requires larger samples

  29. Where the Pieces Come From: Statistical Test • A function of the data types you will analyze

                                Outcome Variable
      Predictor Variable        Dichotomous        Continuous
      Dichotomous               Chi-square         t-test
      Continuous                t-test             Correlation coefficient

  30. Finally, Some Good News • Someone else has done these calculations for us • “Sample Size Tables” • DCR, Appendix 6A – 6E (pp. 84 – 91) • Entire books • Power Analysis Software • PASS, nQuery, Power & Precision, etc • Websites (Google search)

  31. Real-Life Example (Steroids for Acute Herniated Lumbar Disc) • Ho: There is no difference in the change in the ODI scores between two treatment groups. • Alpha: 0.0471 (two-tailed) • Beta: 0.1 (Power=90%) • Clinically relevant difference in ODI change scores: 7.0 • Standard deviation of change in ODI scores: 15.1 • Randomization ratio: 1:1 • Statistical test on which calculations are based: Student’s t-test • Number of participants required to demonstrate statistical significance = 101 per group; Total number required (two arms) = 202 • Number of participants required after accounting for 20% withdrawals = 244 • Based on a projected accrual rate of 8-10 participants per month, we anticipate that we will require approximately 2.25 years to fully recruit to this trial.
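A minimal Python sketch (not from the slides) that roughly reproduces this calculation with the statsmodels power routines (the slides themselves show PASS output for this step). The withdrawal adjustment below follows the slide's figures, inflating each arm by 20%; dividing by 0.8 instead would give a slightly larger number.

    from math import ceil
    from statsmodels.stats.power import TTestIndPower

    d, sd = 7.0, 15.1                                    # MCID and SD of ODI change scores
    n1 = TTestIndPower().solve_power(effect_size=d / sd,
                                     alpha=0.0471,       # two-sided
                                     power=0.90,
                                     ratio=1.0,          # 1:1 randomization
                                     alternative='two-sided')
    n_per_arm = ceil(n1)                                 # about 101 per group
    total = 2 * n_per_arm                                # 202
    total_with_withdrawals = 2 * ceil(n_per_arm * 1.2)   # 244 after 20% inflation per arm
    print(n_per_arm, total, total_with_withdrawals)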

  32. Examples from DCR

  33. Sample Size for a Continuous Outcome

  34. Sample Size for a Continuous Outcome

  35. Sample Size for a Continuous Outcome

  36. Sample Size for a Continuous Outcome

  37. How do we REALLY do this?

  38. PASS Output

  39. Talk about playing games…

  40. Talk about playing games...

  41. Talk about playing games…

  42. Power Calculations for a Descriptive Study • Goal: estimate a single quantity • Power: determines the precision of the estimate (i.e., the width of the 95% CI) • Greater power = better precision = narrower CI
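A minimal Python sketch (not from the slides) of this idea for estimating a single proportion: pick the expected proportion and the confidence-interval half-width you can tolerate, then solve the normal-approximation CI formula for n. The 30% prevalence and ±5-point width in the example are arbitrary.

    from math import ceil
    from scipy.stats import norm

    def n_for_proportion(p_expected, half_width, confidence=0.95):
        z = norm.ppf(1 - (1 - confidence) / 2)           # 1.96 for a 95% CI
        return ceil(z ** 2 * p_expected * (1 - p_expected) / half_width ** 2)

    # Example: expect ~30% prevalence, want the 95% CI to be +/- 5 percentage points
    print(n_for_proportion(0.30, 0.05))                  # about 323 participants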

  43. Sample Size for a Descriptive Study (Proportion)

  44. Sample Size for a Descriptive Study (Proportion)

  45. Sample Size for a Descriptive Study (Proportion)

  46. Sample Size for a Descriptive Study (Proportion)