How big should my study be? The science and art of choosing your sample size • Mark Pletcher • Designing Clinical Research • Summer 2013
Choosing sample size • A fundamental decision • A critical determinant of statistical power • A critical determinant of feasibility
Choosing sample size • “Nothing focuses the mind like a sample size calculation” • Mike Kohn
Choosing sample size • Ingredients for a sample size calculation • “Focusing the mind” on measurements, etc • Tools for making the calculation • Tables in the book, Stata, online calculators • Examples • What drives sample size? • Modifying study design to reduce sample size • Getting to a final answer for your study • Round peg/square hole? MAKE IT FIT! • Unknown assumptions? GUESS! • Persuasive writing and justification
Example 1 • Alcohol and atrial fibrillation incidence As an example, we might wish to assess alcohol as a predictor of incident atrial fibrillation. Assuming 20% of the cohort will drink 2 or more alcoholic beverages daily, we estimate that 2920 participants (584 drinking 2+/day) with full data and longitudinal follow-up over 5 years would provide 90% power to detect a 5% difference (15% vs. 10% in controls) in the incidence of AF using a two-tailed alpha of 0.05.
Example 1 (boiled down…) • If………..[assumptions] • Then……a sample size of 2920 will give us a 90% chance of ending up with a “statistically significant” result What are the key assumptions?
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis • Clear measurements • Usually phrased as a “null” hypothesis • Planned statistical test • Assumption about variability of measurements • An effect size • “Alpha” error (1-sided or 2-sided) threshold
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis “Does alcohol cause atrial fibrillation?”
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis “Does alcohol cause atrial fibrillation?” Too vague!
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis “Does alcohol cause atrial fibrillation?” “Is drinking 2+ drinks/day (vs. drinking less) associated with incident atrial fibrillation at 5 years in adults over age 65?”
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis “Does alcohol cause atrial fibrillation?” “Is drinking 2+ drinks/day (vs. drinking less) associated with incident atrial fibrillation at 5 years in adults over age 65?” Better, but not phrased as a “null” hypothesis
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis “Does alcohol cause atrial fibrillation?” “Is drinking 2+ drinks/day (vs. drinking less) associated with incident atrial fibrillation at 5 years in adults over age 65?” “H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65”
The Null Hypothesis… • Why do we need a NULL hypothesis?
The Null Hypothesis… • Why do we need a NULL hypothesis? • Theoretically speaking, we can only DISPROVE something (or say it’s unlikely), we can never PROVE something* • So we state a NULL hypothesis, and then say that it is very unlikely to be true “H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65” *Karl Popper, The Logic of Scientific Discovery, 1934
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis • Clear measurements • Usually phrased as a “null” hypothesis • Planned statistical test • Assumption about variability of measurements • An effect size • “Alpha” error (1-sided or 2-sided) threshold
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test
Predictor (rows) by outcome (columns):
             | Dichotomous | Continuous
Dichotomous  | chi-squared | t-test
Continuous   | t-test      | correlation
Need to know your variable types!
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • Dichotomous variables have only 2 values: male vs. female; dead vs. alive; hypertension vs. no hypertension; smoker vs. non-smoker
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • Continuous variables have many values: blood pressure; age; quality of life; waist circumference
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test What kind of variable is alcohol use?
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is alcohol use? • Drinks/day • Drinker vs. non-drinker • Heavy (2+) vs. light drinker (<2 drinks/day) • Non-drinker vs. occasional vs. regular vs. heavy
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is alcohol use? • Drinks/day • Drinker vs. non-drinker • Heavy (2+) vs. light drinker (<2 drinks/day) • Non-drinker vs. occasional vs. regular vs. heavy • Not normally distributed?
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is alcohol use? • Drinks/day • Drinker vs. non-drinker • Heavy (2+) vs. light drinker (<2 drinks/day) • Non-drinker vs. occasional vs. regular vs. heavy • 4-level categorical variable?
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is alcohol use? • Drinks/day • Drinker vs. non-drinker • Heavy (2+) vs. light drinker (<2 drinks/day) • Non-drinker vs. occasional vs. regular vs. heavy • Easy! For the purposes of sample size calculation, you may want to dichotomize…
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is atrial fibrillation? • Person with vs. without afib • Frequency of episodes • Beats/minute • Years to onset of afib (“time to event”) • Proportion with onset of afib at 5 years
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is atrial fibrillation? • Person with vs. without afib • Frequency of episodes • Beats/minute • Years to onset of afib (“time to event”) • Proportion with onset of afib at 5 years • Normally distributed?
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is atrial fibrillation? • Person with vs. without afib • Frequency of episodes • Beats/minute • Years to onset of afib (“time to event”) • Proportion with onset of afib at 5 years • “Survival analysis”
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test • What kind of variable is atrial fibrillation? • Person with vs. without afib • Frequency of episodes • Beats/minute • Years to onset of afib (“time to event”) • Proportion with onset of afib at 5 years • Dichotomous (easy)
Key assumptions • Assumptions (aka “ingredients”) • Planned statistical test
Predictor (rows) by outcome (columns):
             | Dichotomous | Continuous
Dichotomous  | chi-squared | t-test
Continuous   | t-test      | correlation
“H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65”
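The variable-type table can be read as a simple lookup. A minimal sketch, assuming only the four pairings shown on the slide (the names `TEST_BY_TYPES` and `planned_test` are made up for illustration):

```python
# Lookup of the simple test suggested by each (predictor, outcome) pairing,
# copied from the 2x2 table on the slide.
TEST_BY_TYPES = {
    ("dichotomous", "dichotomous"): "chi-squared",
    ("dichotomous", "continuous"): "t-test",
    ("continuous", "dichotomous"): "t-test",
    ("continuous", "continuous"): "correlation",
}


def planned_test(predictor_type, outcome_type):
    """Return the test the table suggests for a pair of variable types."""
    return TEST_BY_TYPES[(predictor_type.lower(), outcome_type.lower())]


# Heavy vs. light drinking (dichotomous) and afib at 5 years yes/no (dichotomous)
print(planned_test("dichotomous", "dichotomous"))  # chi-squared
```

Dichotomizing both alcohol use and atrial fibrillation is what lands Example 1 in the chi-squared cell.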
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis • Clear measurements • Usually phrased as a “null” hypothesis • Planned statistical test • Assumption about variability of measurements • An effect size • “Alpha” error (1-sided or 2-sided) threshold
Key assumptions • Assumptions (aka “ingredients”) • Variability and effect size for chi-squared test • Probability of outcome in each predictor group • P1 = 10% • P2 = 15%
Key assumptions • Assumptions (aka “ingredients”) • Variability and effect size for chi-squared test • Probability of outcome in each predictor group • P1 = 10% (prob afib at 5 years if <2 drinks) • P2 = 15% (prob afib at 5 years if 2+ drinks)
Key assumptions • Assumptions (aka “ingredients”) • Variability and effect size for chi-squared test • Probability of outcome in each predictor group • P1 = 10% (prob afib at 5 years if <2 drinks) • P2 = 15% (prob afib at 5 years if 2+ drinks) • Effect size clearly delineated: risk difference = 5%; relative risk = 1.5
Key assumptions • Assumptions (aka “ingredients”) • Variability and effect size for chi-squared test • Probability of outcome in each predictor group • P1 = 10% (prob afib at 5 years if <2 drinks) • P2 = 15% (prob afib at 5 years if 2+ drinks) • Variability is “embedded”: the variance of a dichotomous outcome is P × (1 − P), so it varies with P1 and needs no separate assumption
Key assumptions • Assumptions (aka “ingredients”) • Variability and effect size for chi-squared test • Probability of outcome in each predictor group • P1 = 10% (prob afib at 5 years if <2 drinks) • P2 = 15% (prob afib at 5 years if 2+ drinks) • Bottom line: Giving both probabilities is clear and unambiguous (…wait for counter-examples)
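To make the “embedded variability” point concrete, here is a small sketch that derives the effect size and the per-group variance from the two probabilities alone. The function name `effect_summary` is hypothetical; the variance formula P × (1 − P) is the standard one for a yes/no outcome.

```python
def effect_summary(p1, p2):
    """Effect size and per-group variance implied by two outcome probabilities."""
    return {
        "risk_difference": p2 - p1,
        "relative_risk": p2 / p1,
        "variance_group1": p1 * (1 - p1),  # variance of a yes/no outcome
        "variance_group2": p2 * (1 - p2),
    }


# Example 1: 10% afib with <2 drinks/day, 15% with 2+ drinks/day
for name, value in effect_summary(0.10, 0.15).items():
    print(f"{name}: {value:.4f}")
```

Nothing beyond P1 and P2 is passed in, which is why stating both probabilities pins down the effect size and the variability at once.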
Key assumptions • Assumptions (aka “ingredients”) • Testable hypothesis • Clear measurements • Usually phrased as a “null” hypothesis • Planned statistical test • Assumption about variability of measurements • An effect size • “Alpha” error (1-sided or 2-sided) threshold
Key assumptions • Assumptions (aka “ingredients”) • “Alpha” error (1-sided or 2-sided) threshold Standard p-value threshold: 0.05 (“Type I error” rate = “alpha”)
Key assumptions • Assumptions (aka “ingredients”) • “Alpha” error (1-sided or 2-sided) threshold Standard p-value threshold: 0.05 (“Type I error” rate = “alpha”) Standard choice: 2-sided test
Key assumptions • Assumptions (aka “ingredients”) • “Alpha” error (1-sided or 2-sided) threshold Standard p-value threshold: 0.05 (“Type I error” rate = “alpha”) Standard choice: 2-sided test Unless you would be uninterested in even a large effect in the direction opposite to the one you expect, choose 2-sided: it is almost always the clear, safe choice
Key assumptions • Assumptions (aka “ingredients”) • “Alpha” error (1-sided or 2-sided) threshold Standard p-value threshold: 0.05 (“Type I error” rate = “alpha”) Standard choice: 2-sided test Power = 1 − “beta” error (so 90% power = 10% beta error)
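In normal-approximation sample size formulas, alpha and beta enter as cutoffs on the standard normal curve. A minimal sketch using only the Python standard library (the helper names `z_alpha` and `z_beta` are illustrative, not from the slides):

```python
from statistics import NormalDist


def z_alpha(alpha, two_sided=True):
    """Standard normal cutoff for the Type I error threshold."""
    tail = alpha / 2 if two_sided else alpha  # a 2-sided test splits alpha across both tails
    return NormalDist().inv_cdf(1 - tail)


def z_beta(power):
    """Standard normal cutoff for the desired power (power = 1 - beta)."""
    return NormalDist().inv_cdf(power)


print(round(z_alpha(0.05), 3))                   # 2-sided alpha = 0.05
print(round(z_alpha(0.05, two_sided=False), 3))  # 1-sided alpha = 0.05
print(round(z_beta(0.90), 3))                    # 90% power, beta = 0.10
```

The 2-sided cutoff is larger than the 1-sided one, which is why a 2-sided test needs a somewhat larger sample for the same power.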
Example 1 • H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65 • 2 dichotomous variables → chi-squared test • P1 = 10% • P2 = 15% • 2-sided alpha = 0.05, beta = 0.10 • Go to page 75 of DCR (4th edition)… • Sample size = 958 per group (1916 total)
Example 1 • H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65 • 2 dichotomous variables → chi-squared test • P1 = 15% • P2 = 20% (risk difference = 5%) • 2-sided alpha = 0.05, beta = 0.10 • Go to page 86 of DCR (3rd edition)… • Sample size = 1252 × 2 = 2504 total
Example 1 • H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65 • 2 dichotomous variables → chi-squared test • P1 = 20% • P2 = 25% (risk difference = 5%) • 2-sided alpha = 0.05, beta = 0.10 • Go to page 86 of DCR (3rd edition)… • Sample size = 1504 × 2 = 3008 total
Example 1 • H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65 • 2 dichotomous variables → chi-squared test • P1 = 20% • P2 = 30% (RR = 1.5) • 2-sided alpha = 0.05, beta = 0.10 • Go to page 86 of DCR (3rd edition)… • Sample size = 412 × 2 = 824 total • Not enough to specify an effect size of “5%” or “RR = 1.5”: the same risk difference or relative risk gives different sample sizes at different baseline risks, so you need to give both probabilities
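The table lookups above can be approximated in a few lines. This is a sketch under one assumption the slides do not state: that the per-group numbers come from the usual normal-approximation formula for comparing two proportions with a continuity correction (the form given by Fleiss). The function name `n_per_group` is made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Per-group n for two equal groups, 2-sided test, continuity-corrected.

    Assumption: normal approximation with a Fleiss-style continuity
    correction; the slides only cite the book's table, not a formula.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided alpha cutoff
    z_b = NormalDist().inv_cdf(power)          # power = 1 - beta
    diff = abs(p1 - p2)
    p_bar = (p1 + p2) / 2
    # Uncorrected sample size per group
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    # Continuity correction
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n_cc)


# Same 5-point risk difference at three baselines, then RR = 1.5 at a 20% baseline
for p1, p2 in [(0.10, 0.15), (0.15, 0.20), (0.20, 0.25), (0.20, 0.30)]:
    print(f"P1={p1:.2f} P2={p2:.2f} -> {n_per_group(p1, p2)} per group")
```

Under these assumptions the sketch gives 1252, 1504 and 412 per group for the last three scenarios, matching the slides, and 957 for 10% vs. 15%, one short of the quoted 958 (possibly a rounding difference in the table). A fixed risk difference costs more participants as the baseline risk rises, and the same relative risk can need far fewer.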
Back to our paragraph… As an example, we might wish to assess alcohol as a predictor of incident atrial fibrillation. Assuming 20% of the cohort will drink 2 or more alcoholic beverages daily, we estimate that 2920 participants (584 drinking 2+/day) with full data and longitudinal follow-up over 5 years would provide 90% power to detect a 5% difference (15% vs. 10% in controls) in the incidence of AF using a two-tailed alpha of 0.05.
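The paragraph's 2920 is larger than the 1916 from the equal-groups lookup because only 20% of the cohort are expected to be heavy drinkers, so the groups are unequal (4 lighter drinkers per heavy drinker). A hedged sketch, again assuming a continuity-corrected normal-approximation formula extended to an allocation ratio; the function name `n_smaller_group` and its parameters are illustrative, not from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_smaller_group(p_small, p_large, ratio, alpha=0.05, power=0.90):
    """Size of the smaller group when the larger group is `ratio` times as big.

    p_small / p_large are the outcome probabilities in the smaller / larger
    group. Assumption: Fleiss-style formula with continuity correction.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    diff = abs(p_small - p_large)
    # Pooled probability, weighted by group size
    p_bar = (p_small + ratio * p_large) / (1 + ratio)
    n = (z_a * sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
         + z_b * sqrt(p_small * (1 - p_small)
                      + p_large * (1 - p_large) / ratio)) ** 2 / diff ** 2
    # Continuity correction for unequal groups
    n_cc = n / 4 * (1 + sqrt(1 + 2 * (ratio + 1) / (n * ratio * diff))) ** 2
    return ceil(n_cc)


# 20% heavy drinkers (15% afib) vs. 80% lighter drinkers (10% afib): ratio 4:1
drinkers = n_smaller_group(0.15, 0.10, ratio=4)
total = drinkers * (1 + 4)
print(drinkers, total)  # 584 2920
```

With these inputs the sketch reproduces the 584 heavy drinkers and 2920 total quoted in the paragraph; setting `ratio=1` reduces it to the equal-groups case.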