Comparing Two Proportions (p1 vs. p2)
Inferential Methods
• Large independent samples
  1. z-test for comparing p1 vs. p2
  2. CI for (p1 – p2)
  3. Effect size (see below)
• Quantifying Risk (or Benefit)
  1. Relative Risk (RR) ~ tests and CI
  2. Odds Ratio (OR) ~ tests and CI
  3. Number Needed to Treat (NNT) & Number Needed to Harm (NNH)
• Small independent samples - Fisher’s Exact Test (use software)
Inferential Methods (cont’d) • Small dependent samples - McNemar’s test (binomial) • Large dependent samples - McNemar’s test (chi-square)
Example 1: Nutrition Education for Pregnant Teens • Research Question: “Do the pregnant teens who receive nutrition education produce a lower proportion of low birth weight babies than do pregnant teens who do not receive such instruction?” • Study: To conduct the study 314 pregnant teens were randomly assigned to receive the nutrition education and 316 pregnant teens were assigned to the non-instruction group.
Experimental Comparative Study
Population (e.g. pregnant teens) → randomly assign to two groups:
• Population 1 (p1): sample size = n1, calculate the sample proportion
• Population 2 (p2): sample size = n2, calculate the sample proportion
Compare p1 vs. p2. To make inferences use: hypothesis test, CI for difference in proportions (and possibly RR, OR, & NNT/NNH).
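Test Statistic for Large Independent Samples
For testing equality of the two proportions only
Ho: (p1 – p2) = 0
HA: (p1 – p2) > 0 (upper-tail), (p1 – p2) < 0 (lower-tail), or (p1 – p2) ≠ 0 (two-tail, use CI approach)
Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ is the pooled sample proportion and $x_1$, $x_2$ are the observed counts in each sample.
Provided n1p1 > 10 & n1q1 > 10 and n2p2 > 10 & n2q2 > 10 (where q = 1 – p)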
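Test Statistic for Large Independent Samples
For testing to see if the difference is at least D
Ho: (p1 – p2) = D
HA: (p1 – p2) > D (upper-tail) or (p1 – p2) < D (lower-tail)
Test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}}$
Provided n1p1 > 10 & n1q1 > 10 and n2p2 > 10 & n2q2 > 10
Most important case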
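Confidence Interval for (p1 – p2) for Large Independent Samples
Provided n1p1 > 10 & n1q1 > 10 and n2p2 > 10 & n2q2 > 10
The confidence interval for (p1 – p2) has the general form (estimate) ± (z-value)(standard error):
$(\hat{p}_1 - \hat{p}_2) \pm z\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
z-values: 90% z = 1.645, 95% z = 1.960, 99% z = 2.576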
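The z-values listed for each confidence level can be reproduced from the standard normal distribution. The following is a minimal sketch using only Python's standard library; the helper name `z_critical` is introduced here for illustration and is not part of the original slides.

```python
from statistics import NormalDist


def z_critical(conf_level):
    """Two-sided standard normal critical value for a confidence level.

    For a 95% interval we need the point that leaves 2.5% in the upper tail,
    i.e. the 97.5th percentile of the standard normal distribution.
    """
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

Any other confidence level (say 98%) can be handled the same way without a table.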
Effect Size for Large Independent Samples
There are three main ways to quantify effect size when we are comparing proportions across two populations or treatment groups.
• Difference in sample proportions = $\hat{p}_1 - \hat{p}_2$ (this may also be referred to as the risk difference)
• Relative Risk or Risk Ratio (RR) = $\hat{p}_1 / \hat{p}_2$
• Odds Ratio (OR) (see Probability ppt)
Effect Size for Large Independent Samples Which measure you use depends on the context of the experiment/study and more importantly how the data was collected. In observational studies (e.g. case-control studies) the odds ratio (OR) is primarily used because the outcome of interest is NOT random. Therefore we cannot talk about the proportion of people with the disease, we can only talk about the proportion with the risk factor. (See example later in this Powerpoint)
Example 1: Nutrition Education for Pregnant Teens Here we are interested in determining if pregnant teens who receive the nutrition education have a lower prevalence of low birth weight infants, but we are not necessarily looking for a certain size (D) for that difference. Let, pE = proportion of babies with low birth weight born to teens who underwent nutrition education. pN = proportion of babies with low birth weight born to teens who did not receive nutrition education.
Example 1: Nutrition Education for Pregnant Teens
STEP 1) State Hypotheses
Ho: pE = pN or equivalently (pE – pN) = 0
HA: pE < pN or equivalently (pE – pN) < 0
STEP 2) Determine Test Criteria
a) Choose α = .05 (we could use something else)
b) From the CDC website we find that around 9% of infants born in the U.S. are classified as having low birth weights. For teen mothers that percentage is probably higher, but smaller p’s require larger samples, so we use p = .09 as a conservative value to check sample size considerations. Here n1 = 314 and n2 = 316, so n1p1 = 28, n1q1 = 286, n2p2 = 28, n2q2 = 288 (i.e. samples are LARGE).
THUS WE USE THE LARGE SAMPLE TEST FOR COMPARING POPULATION PROPORTIONS ASSUMING EQUALITY UNDER THE NULL, i.e. D = 0.
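The sample size check in Step 2 is simple arithmetic and can be sketched as below. The function name `large_sample_ok` and the `threshold` parameter are assumptions made for this illustration; the value p = .09 is the planning value taken from the slide.

```python
def large_sample_ok(n, p, threshold=10):
    """Return True when both n*p and n*q (q = 1 - p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


p_guess = 0.09  # planning value from the slide (roughly the U.S. low birth weight rate)

for label, n in [("education", 314), ("non-instruction", 316)]:
    np_, nq = n * p_guess, n * (1 - p_guess)
    print(f"{label}: np = {np_:.0f}, nq = {nq:.0f}, large enough: {large_sample_ok(n, p_guess)}")
```

With a much smaller group (for example n = 50) the same check fails, which is the situation where Fisher’s Exact Test would be preferred.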
Example 1: Nutrition Education for Pregnant Teens
STEP 3) Collect Data and Compute Test Statistic
In the study, 23 of the 314 teen mothers receiving nutrition education had low birth weight babies compared to 39 of the 316 mothers in the non-instruction group.
Sample Proportion Calculations: using these results we can calculate all the proportions needed for the z-test statistic.
• $\hat{p}_E = 23/314 = .0732$
• $\hat{p}_N = 39/316 = .1234$
• Pooled proportion $\hat{p} = (23 + 39)/(314 + 316) = 62/630 = .0984$
Example 1: Nutrition Education for Pregnant Teens
STEP 3, cont’d) Test Statistic Calculations
$z = \dfrac{.0732 - .1234}{\sqrt{.0984(.9016)\left(\frac{1}{314} + \frac{1}{316}\right)}} = \dfrac{-.0502}{.0237} = -2.11$
Calculating the test statistic, we see that the difference in the sample proportions is over 2 SE’s below 0.
Example 1: Nutrition Education for Pregnant Teens
STEPS 4 & 5) Compute p-value and make decision
Our observed test statistic value is z = -2.11. To find the p-value we use the fact that our test statistic has a standard normal distribution. From a standard normal table or computer, P(Z < -2.11) = .0172.
The probability that chance variation alone would produce an observed proportion for the education group this small or smaller when compared to the non-instruction group is 1.72%. Thus we have evidence to suggest that the proportion of low birth weight babies born to teen mothers in the education group is smaller than that for the non-instruction group (p = .0172).
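Steps 3 through 5 can be checked numerically. The sketch below assumes the pooled standard error (equality under the null, D = 0) and a lower-tail alternative, as in this example; the function name `pooled_z_test` is hypothetical. Because it works from the raw counts rather than rounded proportions, its p-value may differ from the slide's in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z_test(x1, n1, x2, n2):
    """Large-sample z-test of Ho: p1 - p2 = 0 using the pooled proportion.

    Returns the z statistic and the lower-tail p-value P(Z < z),
    which is the relevant tail for HA: p1 < p2.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)


# Group 1 = nutrition education (23 of 314), group 2 = non-instruction (39 of 316)
z, p_lower = pooled_z_test(23, 314, 39, 316)
print(f"z = {z:.2f}, lower-tail p-value = {p_lower:.4f}")
```

For an upper-tail alternative the p-value would be `1 - NormalDist().cdf(z)` instead.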
Example 1: Nutrition Education for Pregnant Teens
STEP 6) Quantify Significant Effects
• 95% CI for Difference in Proportions (necessary computations)
Example 1: Nutrition Education for Pregnant Teens • 95% CI for (pE – pN) = (-.0985, -.0039) or (-9.85%, -.39%) One potential interpretation of CI: We estimate that the percentage of low birth weight babies born to teen mothers who participate in a nutrition education program is between .39 and 9.85 percentage points smaller than that for teen mothers who are not given this instruction.
Example 1: Nutrition Education for Pregnant Teens • 95% CI for (pE – pN) = (-.0985, -.0039) or (-9.85%, -.39%) Another potential interpretation of CI: For pregnant teens participating in the nutrition education program we estimate that the prevalence of low birth weight is between .39 and 9.85 percentage points smaller than that for teen mothers receiving no such education.
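The interval for (pE – pN) can be recomputed from the raw counts as a check. This is a sketch under the usual large-sample assumption, using the unpooled standard error; the function name `diff_ci` is introduced here for illustration. Its upper limit agrees with the -.0039 shown above, while the lower limit it returns may not match the printed -.0985 in the third decimal place, so it is worth comparing against software output.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(x1, n1, x2, n2, conf_level=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Group 1 = nutrition education, group 2 = non-instruction
lower, upper = diff_ci(23, 314, 39, 316)
print(f"95% CI for (pE - pN): ({lower:.4f}, {upper:.4f})")
```

Both limits are negative, which matches the conclusion that the education group has the lower proportion of low birth weight babies.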
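Relative Risk or Risk Ratio
• Recall from the probability presentation that risk ratio or relative risk is defined as: $RR = \dfrac{P(\text{disease} \mid \text{risk factor present})}{P(\text{disease} \mid \text{risk factor absent})}$
• We can use this in the study of potentially beneficial treatments by computing it as follows: $RR = \dfrac{P(\text{adverse outcome} \mid \text{control})}{P(\text{adverse outcome} \mid \text{treatment})}$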
Example 1: Nutrition Education for Pregnant Teens • Using Relative Risk or Risk Ratio (RR) We have $\hat{p}_N = 39/316 = .1234$ and $\hat{p}_E = 23/314 = .0732$. The relative risk (RR) of low birth weight associated with being in the control (non-instruction) group is given by: Relative Risk or Risk Ratio (RR) = .1234/.0732 = 1.686, i.e., teen mothers not participating in the nutrition education program are 1.686 times as likely to have a baby with a low birth weight. Another way to state this is that their risk of having a low birth weight baby is 68.6% higher.
Example 1: Nutrition Education for Pregnant Teens • Using Relative Risk or Risk Ratio (RR) We have $\hat{p}_E = .0732$ and $\hat{p}_N = .1234$. Another way to look at this is in terms of the benefit associated with being in the education vs. the non-instruction (control) group. This is achieved by reciprocating the RR: RR for the education group = .0732/.1234 = .5932, so the “risk reduction” is 1 – .5932 = .4068, a roughly 41% reduction in the risk of having a low birth weight baby associated with receiving the nutrition education.
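The two ways of reporting the relative risk (harm from no instruction, benefit from instruction) can be sketched as follows. The function name `relative_risk` is an assumption for this illustration; the results differ from the slide only in rounding because the raw counts are used.

```python
def relative_risk(x_group, n_group, x_ref, n_ref):
    """Risk in one group divided by risk in a reference group."""
    return (x_group / n_group) / (x_ref / n_ref)


# Risk of low birth weight: control (39 of 316) relative to education (23 of 314)
rr_control = relative_risk(39, 316, 23, 314)

# Reciprocal: education relative to control, and the implied relative risk reduction
rr_education = 1 / rr_control
risk_reduction = 1 - rr_education

print(f"RR (control vs. education) = {rr_control:.3f}")
print(f"RR (education vs. control) = {rr_education:.3f}")
print(f"Relative risk reduction    = {risk_reduction:.1%}")
```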
Example 1: Nutrition Education for Pregnant Teens • But wait… there is more! For situations where we are looking at a potentially beneficial treatment we can report the NNT. • NNT (Number Needed to Treat): the number of patients who need to be treated to prevent 1 adverse outcome. To find the NNT we simply compute the reciprocal of the risk difference: $NNT = \dfrac{1}{|\hat{p}_1 - \hat{p}_2|}$
Example 1: Nutrition Education for Pregnant Teens
• NNT (Number Needed to Treat): the number of patients who need to be treated to prevent 1 adverse outcome. Here NNT = 1/|.0732 – .1234| = 1/.0502 = 19.9, which we round to 20.
• Thus we estimate that we would need to have 20 teen mothers participate in the nutrition education program to see 1 fewer baby born with a low birth weight amongst teen moms.
• Note: If we reciprocate the confidence limits of the CI for the “risk difference” we obtain a 95% CI for the NNT. Here we would have (1/.0985, 1/.0039) = (10.15, 256.41). So we estimate, with 95% confidence, that we would need to have between 10 and 256 teen mothers participate in the program to see 1 fewer low birth weight baby.
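The NNT arithmetic can be sketched as below. The function name `nnt` is hypothetical, and the interval simply reciprocates the risk-difference limits quoted above (.0985 and .0039) rather than recomputing them. Rounding up to a whole person is a common convention and is assumed here.

```python
from math import ceil


def nnt(p_control, p_treated):
    """Number needed to treat: reciprocal of the absolute risk difference."""
    return 1 / abs(p_control - p_treated)


point_estimate = nnt(39 / 316, 23 / 314)

# Reciprocate the (absolute) risk-difference limits quoted in the text.
risk_diff_limits = (0.0985, 0.0039)
nnt_ci = tuple(1 / limit for limit in risk_diff_limits)

print(f"NNT = {point_estimate:.2f}, reported as {ceil(point_estimate)}")
print(f"95% CI for NNT: ({nnt_ci[0]:.2f}, {nnt_ci[1]:.2f})")
```

The wide interval reflects how close the upper limit of the risk difference is to zero.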
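Example 1: Nutrition Education for Pregnant Teens • There is no reason to use the odds ratio (OR) here because the “disease” outcome (i.e. low birth weight) is random. We can still calculate it, however. • Recall from the probability presentation: $OR = \dfrac{\text{odds of disease with risk factor}}{\text{odds of disease without risk factor}} = \dfrac{p_1/(1 - p_1)}{p_2/(1 - p_2)}$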
Example 1: Nutrition Education for Pregnant Teens Here, $OR = \dfrac{.1234/.8766}{.0732/.9268} = 1.78$. So teen mothers who received no nutrition education during pregnancy have 1.78 times higher odds of having a baby with low birth weight when compared to teen mothers who did receive nutrition instruction.
Example 1: Nutrition Education for Pregnant Teens
We can display data from this study in a 2 X 2 contingency table format.
Study Results: In the study, 23 of the 314 teen mothers receiving nutrition education had low birth weight babies compared to 39 of the 316 mothers in the non-instruction group.

Group                     Low birth weight    Normal birth weight    Total
No education (control)    39 (a)              277 (b)                316
Nutrition education       23 (c)              291 (d)                314
Example 1: Nutrition Education for Pregnant Teens Recall from the probability presentation that the OR has an easy formula when our data are displayed in a 2 X 2 table. Easier Formula! OR = ad/(bc) = (39)(291)/[(277)(23)] = 1.78
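Both routes to the odds ratio (ratio of odds, and the ad/bc shortcut) give the same number, which the short sketch below illustrates. The cell labels a, b, c, d follow the shortcut formula above, with the control group as the first row; the function names are introduced here only for illustration.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/(bc) for a 2 x 2 table.

    Row 1 (risk / control group):  a = outcome present, b = outcome absent
    Row 2 (reference / treatment): c = outcome present, d = outcome absent
    """
    return (a * d) / (b * c)


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


a, b, c, d = 39, 277, 23, 291

or_shortcut = odds_ratio(a, b, c, d)
or_from_odds = odds(a / (a + b)) / odds(c / (c + d))

print(f"OR via ad/bc      = {or_shortcut:.2f}")
print(f"OR via odds ratio = {or_from_odds:.2f}")
```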
Example 1: Nutrition Education for Pregnant Teens Whew! Let’s stop and summarize our findings to this point in a table. Perhaps we should have confidence intervals for the RR and OR as well!
Confidence Intervals for Relative Risk (RR) and Odds Ratio (OR)
Before we look at the computational procedures for finding these CI’s we must note that the 2 X 2 table for our data MUST BE in the format below:

                                     Outcome present    Outcome absent
Risk factor present (or control)     a                  b
Risk factor absent (or treatment)    c                  d

The key is identifying which cell is “a” and that risk or treatment is always the row variable!
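Confidence Intervals for RR
• Take natural log of estimated RR, ln(RR)
• Compute standard error of ln(RR): $SE(\ln(RR)) = \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}}$
• Find CI for ln(RR): $\ln(RR) \pm z \cdot SE(\ln(RR)) = (L, U)$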
Confidence Intervals for RR • Find CI for RR by taking the antilog ($e^x$) of the endpoints (L, U) of the CI for ln(RR): LCL for RR = $e^L$, UCL for RR = $e^U$, i.e., CI for RR = ($e^L$, $e^U$)
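Confidence Intervals for OR
• Take natural log of estimated OR, ln(OR)
• Compute standard error of ln(OR): $SE(\ln(OR)) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$
• Find CI for ln(OR): $\ln(OR) \pm z \cdot SE(\ln(OR)) = (L, U)$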
Confidence Intervals for OR • Find CI for OR by taking the antilog ($e^x$) of the endpoints (L, U) of the CI for ln(OR): LCL for OR = $e^L$, UCL for OR = $e^U$, i.e., CI for OR = ($e^L$, $e^U$)
Hypothesis Testing for RR and OR • In general we are interested in identifying situations where the RR/OR is greater than 1 (increased risk) or less than 1 (decreased risk). • For either measure, the null hypothesis says that the RR or OR is equal to 1, or equivalently that ln(RR) or ln(OR) is 0, because ln(1) = 0.
Hypothesis Testing for RR and OR • The test statistic in either case, provided our sample sizes are “large”, is $z = \dfrac{\ln(RR)}{SE(\ln(RR))}$ or $z = \dfrac{\ln(OR)}{SE(\ln(OR))}$. • So we use the standard normal distribution to find the p-value associated with the test statistic. • A better approach in practice is simply to check whether the CI for RR/OR contains 1; if it does not contain 1 we reject Ho.
Example 1: Nutrition Education for Pregnant Teens
• Find a 95% CI for RR
1) RR = .1234/.0732 = 1.68
2) ln(RR) = .519
3) Find SE(ln(RR)) = $\sqrt{\frac{1}{39} - \frac{1}{316} + \frac{1}{23} - \frac{1}{314}}$ = .251
4) Find confidence limits for ln(RR): .519 ± (1.96)(.251) = (.027, 1.011)
5) Take antilog ($e^x$) of endpoints: ($e^{.027}$, $e^{1.011}$) = (1.027, 2.748)
The CI contains only values above 1, thus we conclude the lack of nutritional education is associated with increased risk and nutritional education is associated with decreased risk of low birth weight.
Example 1: Nutrition Education for Pregnant Teens
• Find a 95% CI for OR
1) OR = (39)(291)/[(277)(23)] = 1.78
2) ln(OR) = .577
3) Find SE(ln(OR)) = .280
4) Find confidence limits for ln(OR): .577 ± (1.96)(.280) = (.029, 1.125)
5) Take antilog ($e^x$) of endpoints: ($e^{.029}$, $e^{1.125}$) = (1.029, 3.082)
Again we see the CI contains only values above 1, thus we conclude the lack of nutritional education is associated with increased risk and nutritional education is associated with decreased risk of low birth weight.
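The log-scale procedure for both intervals can be sketched in a few lines. This assumes the standard large-sample standard errors for ln(RR) and ln(OR) computed from the 2 X 2 cell counts (a = 39, b = 277, c = 23, d = 291); the function name `log_scale_ci` is hypothetical. Because it carries full precision instead of rounded intermediate values, its limits can differ from the hand calculation in the second decimal place.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_scale_ci(estimate, se_log, conf_level=0.95):
    """CI for a ratio: build the interval for ln(estimate), then exponentiate."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    centre = log(estimate)
    return exp(centre - z * se_log), exp(centre + z * se_log)


# Row 1 = no education (control), row 2 = nutrition education
a, b, c, d = 39, 277, 23, 291

rr = (a / (a + b)) / (c / (c + d))
se_ln_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

odds_ratio = (a * d) / (b * c)
se_ln_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

rr_ci = log_scale_ci(rr, se_ln_rr)
or_ci = log_scale_ci(odds_ratio, se_ln_or)

print(f"RR = {rr:.3f}, 95% CI ({rr_ci[0]:.3f}, {rr_ci[1]:.3f})")
print(f"OR = {odds_ratio:.3f}, 95% CI ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
```

Either way, both intervals lie entirely above 1, so the conclusion is unchanged.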
Example 1: Nutrition Education for Pregnant Teens
Hypothesis Tests for RR and OR (even though the CI’s were enough!)
• Test statistic for RR: z = ln(RR)/SE(ln(RR)) = .519/.251 = 2.07
• Test statistic for OR: z = ln(OR)/SE(ln(OR)) = .577/.280 = 2.06
Both p-values are less than α = .05, therefore we reject the null and conclude there is increased risk associated with being a control and hence decreased risk of low birth weight associated with the nutritional education program for pregnant teens. This agrees with our conclusion from the CI’s.
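As a check on the claim that both p-values fall below .05, the sketch below divides each log estimate by its standard error, using the hand-calculated values from the preceding slides (.519/.251 and .577/.280). The function name `log_ratio_test` is an assumption; it reports both an upper-tail and a two-tail p-value, since the slides do not say which was used.

```python
from statistics import NormalDist


def log_ratio_test(ln_estimate, se_log):
    """z-test of Ho: ratio = 1, i.e. ln(ratio) = 0.

    Returns (z, upper-tail p-value, two-tail p-value).
    """
    z = ln_estimate / se_log
    upper_tail = 1 - NormalDist().cdf(z)
    two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, upper_tail, two_tail


z_rr, p_rr_upper, p_rr_two = log_ratio_test(0.519, 0.251)
z_or, p_or_upper, p_or_two = log_ratio_test(0.577, 0.280)

print(f"RR: z = {z_rr:.2f}, upper-tail p = {p_rr_upper:.4f}, two-tail p = {p_rr_two:.4f}")
print(f"OR: z = {z_or:.2f}, upper-tail p = {p_or_upper:.4f}, two-tail p = {p_or_two:.4f}")
```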
Example 1: Nutrition Education for Pregnant Teens That’s it! Let’s summarize our final findings in a table. All of this is much easier to do using statistical software, e.g. JMP.
Example 1: Nutrition Education for Pregnant Teens Enter data table as shown below
Example 1: Nutrition Education for Pregnant Teens You can see the three options related to what we just discussed below. Check them all.
Example 1: Nutrition Education for Pregnant Teens All the CI’s we calculated “by hand” are shown below.
Fisher’s Exact Test • When sample sizes are “small” or when it is available, one should use Fisher’s Exact Test for comparing p1 vs. p2. • The computations are tedious and finding a p-value requires special tables but it is implemented in many statistical software packages. • By default JMP will always calculate p-values for Fisher’s Exact Test when 2 X 2 contingency tables are analyzed.
Fisher’s Exact Test • The results from JMP are shown below: • The alternatives are communicated verbally alongside the p-values; the one we are interested in is boxed. It states that the probability of having a baby with a normal birth weight is greater for those in the group that received nutritional education (p = .0235). This p-value is EXACT and does not come from a normal approximation!
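For readers without JMP, the one-sided exact p-value can be sketched directly from the hypergeometric distribution, which is what Fisher’s Exact Test is built on. This is a minimal illustration using only the standard library, not JMP’s implementation; the function name `fisher_lower_tail` is hypothetical. It conditions on the table margins and sums the probabilities of tables with as few or fewer low birth weight babies in the education group as were observed.

```python
from math import comb


def fisher_lower_tail(x1, n1, x2, n2):
    """One-sided Fisher exact p-value: P(X <= x1) with all margins fixed.

    X is the number of events landing in group 1 when the x1 + x2 total
    events are spread at random over the n1 + n2 subjects.
    """
    events = x1 + x2
    total = n1 + n2
    denominator = comb(total, n1)
    smallest = max(0, events - n2)  # fewest events group 1 could possibly have
    numerator = sum(
        comb(events, k) * comb(total - events, n1 - k)
        for k in range(smallest, x1 + 1)
    )
    return numerator / denominator


# Group 1 = nutrition education (23 of 314), group 2 = non-instruction (39 of 316)
p_exact = fisher_lower_tail(23, 314, 39, 316)
print(f"One-sided exact p-value = {p_exact:.4f}")
```

The exact p-value is somewhat larger than the normal-approximation p-value from the z-test, which is typical.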
Preliminary Summary of Independent Sample Comparisons (p1 vs. p2) • When sample sizes are “large” one can use a z-test and CI to make inferences about (p1 – p2), otherwise use Fisher’s Exact Test. • To further quantify and discuss effect size one can use RR and OR, along with inferential methods for them. • If it makes sense for the given situation, NNT can also be calculated from (p1 – p2).
Observational Comparative Study (e.g. case-control)
• Population 1 (p1): sample size = n1, calculate the sample proportion
• Population 2 (p2): sample size = n2, calculate the sample proportion
Compare p1 vs. p2. To make inferences use: hypothesis test, CI for difference in proportions (and possibly RR, OR, & NNT/NNH).
Example 2: Age at 1st Pregnancy and Cervical Cancer (Case-Control Study) • In a case-control study, we sample individuals who have a “disease” of interest (cases) and individuals who do not have the “disease” (controls) and compare these two populations in terms of potential risk factors. • In this study, samples of women who have cervical cancer and women who did not have cervical cancer were independently taken. The proportions of women who had their first child at or before the age of 25 were compared for these two populations of women.
Example 2: Age at 1st Pregnancy and Cervical Cancer (Case-Control Study) In conducting the study 49 women with cervical cancer and 317 women of similar age & background without cervical cancer were sampled. The number of women having their first child at or before the age of 25 was determined for both samples.
Example 2: Age at 1st Pregnancy and Cervical Cancer (Case-Control Study) • Because the number of women with the disease was chosen by the researchers we cannot consider P(disease | risk), thus RR cannot be calculated. (RR test and CI) • We will only compare the proportion of women with the “risk factor” in each group.(z-test, Fisher’s Exact test, CI for (p1 – p2) ) • If the prevalence of the risk factor is greater for the disease group then we have evidence of an association or link between the factor and the disease.
Example 2: Cervical Cancer & Age at 1st Pregnancy
Enter data into JMP like this. Notice that risk factor presence is the Y because it is the random outcome variable, and X is case-control status.
• 85.7% of those in the case group had the risk factor.
• 64.0% of those in the control group had the risk factor.
The proportion of women without the risk factor is greater for the control group than for the case group (p = .0014). Women who have their first pregnancy at or before 25 years of age have 3.369 times higher odds for developing cervical cancer.
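The odds ratio for the case-control data can be sketched as below. The slides give only the percentages and group sizes, so the cell counts used here (42 of 49 cases and 203 of 317 controls with the risk factor) are an assumption inferred from 85.7% and 64.0%; they are not stated in the original. The function name `odds_ratio` is likewise introduced for illustration.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/(bc) for a 2 x 2 table."""
    return (a * d) / (b * c)


# ASSUMED counts, inferred from the reported percentages and group sizes:
cases_with, cases_total = 42, 49          # 42/49   is about 85.7%
controls_with, controls_total = 203, 317  # 203/317 is about 64.0%

cases_without = cases_total - cases_with
controls_without = controls_total - controls_with

# In a case-control study we compare the odds of having the risk factor
# between cases and controls.
or_hat = odds_ratio(cases_with, cases_without, controls_with, controls_without)

print(f"Cases with risk factor:    {cases_with / cases_total:.1%}")
print(f"Controls with risk factor: {controls_with / controls_total:.1%}")
print(f"Odds ratio = {or_hat:.3f}")
```

Under these assumed counts the result agrees with the 3.369 reported above. The same value is obtained whether the odds are computed for the risk factor given disease status or for disease given the risk factor, which is why the OR remains usable when the RR is not.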