Learn about absolute and relative risks in biostatistics with examples of cohort and case-control studies. Discover the calculation and interpretation of risk ratios, difference in proportions, and chi-square tests. Explore concepts such as binomial distribution and statistical inferences.
The binomial applied: absolute and relative risks, chi-square
Probability speak (just shorthand!)…
• P(X) = “the probability of event X”
• P(D) = “the probability of disease”
• P(E) = “the probability of exposure”
• P(~D) = “the probability of not getting the disease”
• P(~E) = “the probability of not being exposed”
• P(D/E) = “the probability of disease given exposure” or “the probability of disease among the exposed”
• P(D/~E) = “the probability of disease given unexposed” or “the probability of disease among the unexposed”
Things that follow a binomial distribution…
Cohort study (or cross-sectional):
• The number of exposed individuals in your sample that develop the disease
• The number of unexposed individuals in your sample that develop the disease
Case-control study:
• The number of cases that have had the exposure
• The number of controls that have had the exposure
Cohort study example: • You sample 100 smokers and 100 non-smokers and follow them for 5 years to see who develops heart disease.
Seeing it as a binomial…
• The number of smokers that develop heart disease in your study follows a binomial distribution with N=100, p=P(D/E)
• The number of non-smokers that develop heart disease in your study follows a binomial distribution with N=100, p=P(D/~E)
A possible outcome:
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No Disease (~D)          79             87
Total                   100            100
Statistics for these data • 1. Risk ratio (relative risk) • 2. Difference in proportions (absolute risk) • 3. Chi-square test of independence • For 2x2 tables, mathematically equivalent to difference in proportions Z test.
1. Risk ratio (relative risk)
                     Exposure (E)   No Exposure (~E)
Disease (D)               a               b
No Disease (~D)           c               d
Total                    a+c             b+d
Risk to the exposed = a/(a+c)
Risk to the unexposed = b/(b+d)
RR = [a/(a+c)] / [b/(b+d)]
In probability terms…
RR = P(D/E) / P(D/~E) = (risk of disease in the exposed) / (risk of disease in the unexposed)
With a, b = diseased exposed/unexposed and c, d = non-diseased exposed/unexposed: P(D/E) = a/(a+c) and P(D/~E) = b/(b+d).
Risk ratio calculation:
Observed data: 21 of 100 smokers and 13 of 100 non-smokers developed heart disease.
RR = (21/100) / (13/100) = 0.21/0.13 ≈ 1.62
Interpretation: there is about a 62% increase in risk of heart disease in smokers vs. non-smokers.
Inferences about risk ratio…
• Is our observed risk ratio statistically different from 1.0? What is the p-value?
• I’m going to present statistical inference for the odds ratio; the risk ratio is similar.
• So, for now, just get the answer from SAS:
• 95% confidence interval: 0.86 to 3.04
• P-value > .05
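The slides take this interval from SAS without showing the formula. As a minimal sketch, assuming a standard large-sample (Wald) interval on the log scale, the same numbers can be approximated in Python. The function name `risk_ratio_ci` and the choice of interval method are assumptions for illustration, not something the deck specifies.

```python
import math

def risk_ratio_ci(a, b, c, d, z_crit=1.96):
    """Risk ratio and a Wald-type CI computed on the log scale.

    a, b = diseased among exposed / unexposed
    c, d = non-diseased among exposed / unexposed
    (Assumed method: large-sample standard error of ln(RR).)
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    rr = risk_exposed / risk_unexposed
    # Large-sample standard error of ln(RR)
    se_ln_rr = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))
    lower = math.exp(math.log(rr) - z_crit * se_ln_rr)
    upper = math.exp(math.log(rr) + z_crit * se_ln_rr)
    return rr, lower, upper

# Smoking / heart disease cohort table from the slides
rr, lower, upper = risk_ratio_ci(21, 13, 79, 87)
print(f"RR = {rr:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under this assumption the interval lands close to the 0.86 to 3.04 quoted from SAS, and it includes 1.0, which agrees with p > .05.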
2. Difference in proportions
                     Exposure (E)   No Exposure (~E)
Disease (D)               a               b
No Disease (~D)           c               d
Total                    a+c             b+d
Difference in proportions = a/(a+c) - b/(b+d)
2. Difference in proportions
Observed data: 21 of 100 smokers and 13 of 100 non-smokers developed heart disease.
Difference = 21/100 - 13/100 = 8%
This is an absolute, rather than relative, risk difference!
Difference in proportions test • Null hypothesis: difference in proportions = 0 Under the null, the groups have the same risk of heart disease (=overall risk in the study): • The number of smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.17 • The number of non-smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.17
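Difference in proportions test
Null hypothesis: the difference in proportions is 0.
Z = (p1 - p2 - 0) / sqrt( p(1-p)/n1 + p(1-p)/n2 ), where p1 and p2 are the two sample proportions and p is the pooled proportion.
• Z follows a normal distribution because the binomial can be approximated with a normal.
• Recall that the variance of a proportion is p(1-p)/n.
• Use the average (or pooled) proportion in the standard error formula because, under the null hypothesis, the groups have equal proportions.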
Z-test applied here…
Z = (.21 - .13) / sqrt( (.17)(.83)/100 + (.17)(.83)/100 ) = .08/.053 ≈ 1.5
Corresponding two-sided p-value is .131.
Corresponding 95% confidence interval…
.08 ± 1.96 × .053, or roughly -.02 to .18
If the 95% confidence interval crosses the null value (here = 0), then p > .05.
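The Z test and interval for the difference in proportions can be checked with a few lines of Python. This is a sketch, not the deck's SAS code: the function names are made up for illustration, the test uses the pooled standard error described in the slides, and the interval uses the unpooled standard error, which is an assumption (a common convention that gives nearly the same width here).

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Z test for a difference in proportions using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall risk under the null
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

def difference_ci(x1, n1, x2, n2, z_crit=1.96):
    """Wald CI for p1 - p2 using the unpooled standard error (assumed convention)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

z, p_value = two_proportion_z_test(21, 100, 13, 100)
ci_low, ci_high = difference_ci(21, 100, 13, 100)
print(f"Z = {z:.3f}, two-sided p = {p_value:.3f}")
print(f"95% CI for the difference: {ci_low:.3f} to {ci_high:.3f}")
```

The p-value comes out near .13 and the interval spans 0, so the two views of the result agree.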
OR, use computer simulation to make inferences… • 1. In SAS, assume infinite population of smokers and non-smokers with equal disease risk, p=.17 (UNDER THE NULL!) • 2. Use the random binomial function to randomly select n=100 smokers and n=100 non-smokers, each with p=.17 • 3. Calculate the observed difference in proportions. • 4. Repeat this 1000 times (or some large number of times). • 5. Observe the distribution of differences under the null hypothesis.
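The simulation steps above are described for SAS; the following is a rough Python equivalent using only the standard library. The function name, the seed, and the use of `random.Random` in place of a SAS random binomial function are assumptions for illustration, so the exact counts will differ from the ones reported in the slides.

```python
import math
import random

def simulate_null_count_differences(n_per_group=100, p_null=0.17,
                                    n_sims=1000, seed=1):
    """Simulate (smokers with disease - non-smokers with disease) under the null."""
    rng = random.Random(seed)
    count_diffs = []
    for _ in range(n_sims):
        # Each group is a binomial draw: n_per_group people, each diseased with p_null
        smokers = sum(rng.random() < p_null for _ in range(n_per_group))
        nonsmokers = sum(rng.random() < p_null for _ in range(n_per_group))
        count_diffs.append(smokers - nonsmokers)
    return count_diffs

count_diffs = simulate_null_count_differences()
diffs = [d / 100 for d in count_diffs]  # differences in proportions

mean_diff = sum(diffs) / len(diffs)
empirical_se = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1))

# Two-sided p-value: share of simulated studies at least as extreme as 21 vs. 13
observed_count_diff = 21 - 13
p_sim = sum(abs(d) >= observed_count_diff for d in count_diffs) / len(count_diffs)

print(f"Empirical standard error: {empirical_se:.3f}")
print(f"Simulated two-sided p-value: {p_sim:.3f}")
```

Comparing integer counts (a difference of 8 people) rather than proportions avoids floating-point ties when deciding whether a simulated result is "as big or bigger than 8%".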
Computer Simulation Results Empirical standard error is about 5.3%
P-value from our simulation…
When we ran this study 1000 times, by chance, we got 72 results as big or bigger than 8%. We also got 82 results as small or smaller than -8%.
P-value From our simulation, we estimate the p-value to be: 154/1000 or .154
3. Chi-square test of independence
Null hypothesis: smoking and heart disease are independent.
Observed data: 21 of 100 smokers and 13 of 100 non-smokers developed heart disease.
What does it mean to be “independent” in stats? Under independence, P(A&B)=P(A)*P(B) In words the “joint probability” equals the product of the “marginal probabilities.” OR The probability of both A and B happening is equal to the probability of A times the probability of B. If smoking and heart disease are independent, then P(smoker&heart disease)=P(smoker)*P(heart disease)
Calculate expected counts under independence…
Observed data: 21 of 100 smokers and 13 of 100 non-smokers developed heart disease (34 of 200 overall).
IF smoking and heart disease are independent, THEN: P(HeartDisease&Smoker) = P(HeartDisease)*P(Smoker)
P(HeartDisease) = 34/200 = 17%
P(Smoker) = 100/200 = 50%
IF INDEPENDENT, then P(HeartDisease&Smoker) should be 8.5%; 8.5% of 200 = 17
Fill in the expected table…
                     Smoker (E)   Non-smoker (~E)   Total
Heart disease (D)        17             17            34
No Disease (~D)          83             83           166
Total                   100            100           200
Marginals are fixed! Notice that the rest of the table is determined after you fill in 17 for cell A. There are no degrees of freedom left! (This table has only 1 degree of freedom.)
Compare expected and observed counts…
Expected:
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        17             17
No Disease (~D)          83             83
Observed:
                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No Disease (~D)          79             87
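Chi-Square test
Chi-square = sum over the four cells of (observed - expected)²/expected = (21-17)²/17 + (13-17)²/17 + (79-83)²/83 + (87-83)²/83 ≈ 2.27
Degrees of freedom = (rows-1)*(columns-1) = (2-1)*(2-1) = 1
The chi-square test produces exactly the square of the Z-test statistic and the same p-value (squaring the rounded Z of 1.5 gives 2.25).
Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom, this indicates statistical significance. Here the statistic is not quite big enough: p ≈ .13.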
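To make the expected-count and chi-square arithmetic concrete, here is a small Python sketch for a 2x2 table. The function name is illustrative, and the p-value line relies on the fact that a 1-degree-of-freedom chi-square is a squared standard normal, so it is only valid for 2x2 tables.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table = [[a, b], [c, d]] with rows = disease status, columns = exposure.
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    # Expected count under independence = row total * column total / n
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # For 1 df only: P(chi-square > x) = P(|Z| > sqrt(x))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value

expected, statistic, p_value = chi_square_2x2([[21, 13], [79, 87]])
print("Expected counts:", expected)
print(f"Chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

The unrounded statistic is the square of the unrounded Z statistic from the difference-in-proportions test, which is why the two tests give the same p-value.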
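Bonus material: The Chi-Square distribution is the sum of squared normal deviates: a chi-square with df degrees of freedom is Z1² + Z2² + … + Zdf², where each Z is an independent standard normal.
The expected value and variance of a chi-square: E(x) = df, Var(x) = 2(df)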
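A quick simulation can illustrate the bonus claim that a chi-square is a sum of squared standard normal deviates with mean df and variance 2(df). This is only a sketch; the function name, seed, and number of draws are arbitrary choices.

```python
import random

def simulate_chi_square(df, n_draws=20000, seed=1):
    """Draw chi-square values by summing df squared standard normal deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

def mean_and_variance(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance

for df in (1, 3):
    mean, variance = mean_and_variance(simulate_chi_square(df))
    print(f"df={df}: simulated mean {mean:.2f} (theory {df}), "
          f"simulated variance {variance:.2f} (theory {2 * df})")
```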
Case-control study example: • You sample 50 stroke patients and 50 controls without stroke and ask about their smoking in the past.
Possible study results:
                     Smoker (E)   Non-smoker (~E)   Total
Stroke (D)               15             35            50
No Stroke (~D)            8             42            50
Statistics for these data • 1. Odds ratio (relative risk) • 2. Difference in proportions exposed (absolute risk) • 3. Chi-square
What’s the risk ratio here?
Observed data: 15 of 50 stroke cases and 8 of 50 controls were smokers.
Tricky: There is no risk ratio, because we cannot calculate the risk of disease!!
The odds ratio… • We cannot calculate a risk ratio from a case-control study. • BUT, we can calculate a measure called the odds ratio…
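Odds vs. Risk
Risk (probability)   Odds
1/2 (50%)            1:1
3/4 (75%)            3:1
1/10 (10%)           1:9
1/100 (1%)           1:99
Note: An odds is always higher than its corresponding probability, unless the probability is 0 (at a probability of 100% the odds are undefined).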
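The conversion behind the odds listed on this slide is odds = p/(1-p). A short Python sketch using exact fractions shows the probabilities implied by 1:1, 3:1, 1:9 and 1:99; the helper name is made up for illustration.

```python
from fractions import Fraction

def odds_from_probability(p):
    """Convert a probability to odds, p / (1 - p), using exact fractions."""
    p = Fraction(p)
    if p == 1:
        raise ValueError("odds are undefined when the probability is 1")
    return p / (1 - p)

for p in (Fraction(1, 2), Fraction(3, 4), Fraction(1, 10), Fraction(1, 100)):
    odds = odds_from_probability(p)
    print(f"probability {p} -> odds {odds.numerator}:{odds.denominator}")
```

For small probabilities the odds and the probability are nearly equal (1/100 vs. 1:99), which is why an odds ratio approximates a risk ratio when the disease is rare.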
The Odds Ratio (OR)
                     Exposure (E)   No Exposure (~E)   Total
Disease (D)               a               b             a+b = cases
No Disease (~D)           c               d             c+d = controls
Odds of exposure in the cases = a/b
Odds of exposure in the controls = c/d
OR = (a/b) / (c/d) = ad/bc
The proportion of cases to controls is set by the investigator; therefore, it does not represent the risk (probability) of developing disease.
The Odds Ratio (OR)
OR = (odds of exposure in the cases) / (odds of exposure in the controls)
Backward from what we want…
This expression is mathematically equivalent to:
OR = (odds of disease in the exposed) / (odds of disease in the unexposed)
The direction of interest!
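Proof via Bayes’ Rule (optional)
(odds of exposure in the cases) / (odds of exposure in the controls)
= [P(E/D) / P(~E/D)] / [P(E/~D) / P(~E/~D)]
= (apply Bayes’ Rule to each term)
= [P(D/E) / P(~D/E)] / [P(D/~E) / P(~D/~E)]
= (odds of disease in the exposed) / (odds of disease in the unexposed)
What we want!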
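The intermediate algebra is not preserved in the slide text. One way to fill it in, written with the usual vertical-bar notation for conditional probability in place of the slash used in the slides, is sketched below; this is a standard derivation, not necessarily the exact sequence of steps the author showed.

```latex
\begin{align*}
\frac{P(E \mid D)\,/\,P(\sim E \mid D)}{P(E \mid \sim D)\,/\,P(\sim E \mid \sim D)}
&= \frac{\dfrac{P(D \mid E)P(E)}{P(D)} \Big/ \dfrac{P(D \mid \sim E)P(\sim E)}{P(D)}}
        {\dfrac{P(\sim D \mid E)P(E)}{P(\sim D)} \Big/ \dfrac{P(\sim D \mid \sim E)P(\sim E)}{P(\sim D)}}
   && \text{(Bayes' Rule on each term)} \\[6pt]
&= \frac{P(D \mid E)P(E)}{P(D \mid \sim E)P(\sim E)}
   \cdot \frac{P(\sim D \mid \sim E)P(\sim E)}{P(\sim D \mid E)P(E)}
   && \text{($P(D)$ and $P(\sim D)$ cancel)} \\[6pt]
&= \frac{P(D \mid E)\,P(\sim D \mid \sim E)}{P(D \mid \sim E)\,P(\sim D \mid E)}
   && \text{($P(E)$ and $P(\sim E)$ cancel)} \\[6pt]
&= \frac{P(D \mid E)\,/\,P(\sim D \mid E)}{P(D \mid \sim E)\,/\,P(\sim D \mid \sim E)}
\end{align*}
```

The last line is the odds of disease in the exposed divided by the odds of disease in the unexposed.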
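The odds ratio
Observed data: 15 smokers and 35 non-smokers among the 50 stroke cases; 8 smokers and 42 non-smokers among the 50 controls.
OR = (15/35) / (8/42) = (15 × 42) / (35 × 8) = 2.25
Interpretation: there is a 2.25-fold higher odds of stroke in smokers vs. non-smokers.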
Inferences about the odds ratio… • Does the sampling distribution follow a normal distribution? • What is the standard error?
Simulation… • 1. In SAS, assume infinite population of cases and controls with equal proportion of smokers (exposure), p=.23 (UNDER THE NULL!) • 2. Use the random binomial function to randomly select n=50 cases and n=50 controls each with p=.23 chance of being a smoker. • 3. Calculate the observed odds ratio for the resulting 2x2 table. • 4. Repeat this 1000 times (or some large number of times). • 5. Observe the distribution of odds ratios under the null hypothesis.
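The steps above are described for SAS; a comparable simulation can be sketched in Python with the standard library. The function name, the seed, and the rule of redrawing any table with an empty cell (which would make the odds ratio undefined) are assumptions added for illustration.

```python
import math
import random

def simulate_null_log_odds_ratios(n_cases=50, n_controls=50, p_exposed=0.23,
                                  n_sims=1000, seed=1):
    """Simulate ln(OR) when cases and controls share the same exposure probability."""
    rng = random.Random(seed)
    log_ors = []
    while len(log_ors) < n_sims:
        a = sum(rng.random() < p_exposed for _ in range(n_cases))     # exposed cases
        c = sum(rng.random() < p_exposed for _ in range(n_controls))  # exposed controls
        b, d = n_cases - a, n_controls - c
        if 0 in (a, b, c, d):
            continue  # assumed handling: redraw tables with an empty cell
        log_ors.append(math.log((a * d) / (b * c)))
    return log_ors

log_ors = simulate_null_log_odds_ratios()
mean_ln = sum(log_ors) / len(log_ors)
empirical_se = math.sqrt(sum((x - mean_ln) ** 2 for x in log_ors) / (len(log_ors) - 1))

# Two-sided p-value: simulated ln(OR) at least as far from 0 as the observed ln(2.25)
observed_ln_or = math.log(2.25)
p_sim = sum(abs(x) >= observed_ln_or - 1e-9 for x in log_ors) / len(log_ors)

print(f"Empirical SE of ln(OR): {empirical_se:.2f}")
print(f"Simulated two-sided p-value: {p_sim:.3f}")
```

Working on the log scale makes the comparison symmetric: an odds ratio of 1/2.25 counts as just as extreme as 2.25.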
Properties of the OR (simulation) (50 cases/50 controls/23% exposed)
Under the null, this is the expected variability of the sample OR. Note the right skew.
Properties of the lnOR Normal!
Properties of the lnOR
From the simulation, we can get the empirical standard error (~0.5) and p-value (~.10).
Properties of the lnOR
Or, in general, standard error of lnOR = sqrt(1/a + 1/b + 1/c + 1/d)
Inferences about the ln(OR)
Observed table: a = 15, b = 35, c = 8, d = 42.
Standard error = sqrt(1/15 + 1/35 + 1/8 + 1/42) ≈ 0.494
Z = (ln(2.25) - 0) / 0.494 = 0.81/0.494 ≈ 1.64
p = .10
Confidence interval…
95% CI for the OR = exp(ln(2.25) ± 1.96 × 0.494) = exp(0.81 ± 0.97) = (0.85, 5.92)
Final answer: 2.25 (0.85, 5.92)
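The odds ratio, its standard error on the log scale, the Z test, and the confidence interval can be reproduced in a few lines of Python. The function name is illustrative; the formulas are the ln(OR) standard error and normal approximation used in the slides.

```python
import math

def odds_ratio_inference(a, b, c, d, z_crit=1.96):
    """Odds ratio with a Z test and CI computed on the ln(OR) scale.

    a, b = exposed / unexposed cases; c, d = exposed / unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    ln_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = ln_or / se                              # null value of ln(OR) is 0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    ci = (math.exp(ln_or - z_crit * se), math.exp(ln_or + z_crit * se))
    return odds_ratio, se, z, p_value, ci

odds_ratio, se, z, p_value, ci = odds_ratio_inference(15, 35, 8, 42)
print(f"OR = {odds_ratio:.2f}, SE(lnOR) = {se:.3f}, Z = {z:.2f}, p = {p_value:.2f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

Exponentiating the endpoints of the ln(OR) interval gives an interval that is asymmetric around 2.25, reflecting the right skew of the odds ratio itself.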
2. Difference in proportions exposed
Observed data: 15 of 50 stroke cases and 8 of 50 controls were smokers.
Difference = 15/50 - 8/50 = 30% - 16% = 14%
3. Chi-square test of independence
                     Smoker (E)   Non-smoker (~E)
Stroke (D)               15             35
No Stroke (~D)            8             42
Expected count for cell A:
proportion: 0.5*.23 = .115
count: .115*100 = 11.5
Expected and observed counts…
Expected:
                     Smoker (E)   Non-smoker (~E)
Stroke (D)              11.5           38.5
No Stroke (~D)          11.5           38.5
Observed:
                     Smoker (E)   Non-smoker (~E)
Stroke (D)               15             35
No Stroke (~D)            8             42