CHAPTER 22

CHAPTER 22 INFERENCE FOR TWO PROPORTIONS

Comparing Two Proportions • In a two-sample problem, the groups we want to compare are Population 1 and Population 2. Comparing two populations: • We compare populations by doing inference about the difference, p1 - p2, between the population proportions. The statistic that estimates this difference is the difference between the two sample proportions, p̂1 - p̂2.

Sampling Distribution of p̂1 - p̂2 • The difference of sample proportions, p̂1 - p̂2, is an unbiased estimator of the difference of population proportions, p1 - p2. • The variance of p̂1 - p̂2 is p1(1 - p1)/n1 + p2(1 - p2)/n2, provided that the sample proportions are independent. • For large samples, the distribution of p̂1 - p̂2 is approximately normal.
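The three properties above can be checked by simulation. The sketch below is illustrative only: the population proportions (0.5 and 0.3), the sample sizes (100 each), the number of repetitions, and the seed are all assumed values, not figures from the chapter.

```python
import random
from statistics import mean, variance

random.seed(22)  # fixed seed so the run is repeatable

# Assumed (hypothetical) population proportions and sample sizes
p1, n1 = 0.5, 100
p2, n2 = 0.3, 100

def sample_proportion(p, n):
    """Draw n independent success/failure outcomes and return the proportion of successes."""
    return sum(random.random() < p for _ in range(n)) / n

# Repeat the two-sample experiment many times and record p1_hat - p2_hat each time
diffs = [sample_proportion(p1, n1) - sample_proportion(p2, n2) for _ in range(2000)]

# Variance formula for independent samples
theory_var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

print("mean of simulated differences:", round(mean(diffs), 4), "vs p1 - p2 =", round(p1 - p2, 4))
print("variance of simulated differences:", round(variance(diffs), 5), "vs formula:", round(theory_var, 5))
```

The simulated mean should sit close to p1 - p2 and the simulated variance close to the formula value; a histogram of `diffs` would look roughly bell-shaped.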

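C. I. for p1-p2 (comparing two proportions) • Take an SRS of size n1 from a population with proportion p1 of successes, and an SRS of size n2 from a population with proportion p2 of successes. • When n1 and n2 are large, a level C confidence interval for p1 - p2 is (p̂1 - p̂2) ± z* SE, where z* is the standard normal critical value for confidence level C. • In this formula the standard error SE of p̂1 - p̂2 is SE = sqrt( p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2 ).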
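A minimal sketch of the interval formula in Python, using only the standard library. The function name `two_prop_ci` and the counts in the usage line (60 of 100 versus 45 of 100) are hypothetical and chosen only to exercise the formula.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # SE = sqrt( p1_hat(1 - p1_hat)/n1 + p2_hat(1 - p2_hat)/n2 )
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves (1 - level)/2 in each tail of the standard normal distribution
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts, for illustration only
low, high = two_prop_ci(60, 100, 45, 100)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```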

Significance Test for Comparing Two Proportions • To test the hypothesis H0: p1 – p2 = p0 (most times it will be H0: p1 – p2 = 0), calculate the z statistic. • First find the pooled sample proportion, p̂ = (X1 + X2)/(n1 + n2), where X1 and X2 are the numbers of successes in the two samples. • Then z = (p̂1 - p̂2) / sqrt( p̂(1 - p̂)(1/n1 + 1/n2) ). • We always pool when we use SE for significance tests of H0: p1 = p2!!! Why? Since we assume that p1 = p2, there should be only one p-hat. Which one do we use? The pooled proportion!
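The pooled calculation can be sketched as follows. The function name `two_prop_z` is hypothetical, the sketch covers only the usual case H0: p1 – p2 = 0, and the counts in the usage line are made up for illustration.

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled sample proportion."""
    # Under H0 both samples estimate the same proportion, so combine them
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se_pooled

# Hypothetical counts, for illustration only: 60 of 100 vs 45 of 100
z = two_prop_z(60, 100, 45, 100)
print(f"z = {z:.2f}")
```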

Alternative Hypothesis • In terms of a variable Z having the standard normal distribution, the P-value for a test of H0 against: • HA: p1 – p2 > 0 is P(Z ≥ z) • HA: p1 – p2 < 0 is P(Z ≤ z) • HA: p1 - p2 ≠ 0 is 2P(Z ≥ |z|) • Earlier conditions have to be met!
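As a sketch, the three P-value rules translate directly into code with the standard normal distribution from Python's `statistics` module. The function name `p_value` and the labels "greater", "less", and "two-sided" are naming choices made here, not part of the chapter.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value for a z statistic under the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # HA: p1 - p2 > 0, P(Z >= z)
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # HA: p1 - p2 < 0, P(Z <= z)
        return std_normal.cdf(z)
    if alternative == "two-sided":    # HA: p1 - p2 != 0, 2P(Z >= |z|)
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(p_value(1.96, "two-sided"), 3))
```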

Assumptions and Conditions • Independence Assumption: Within each group the data should be based on results for independent individuals. We can’t check that for certain, but we can check the following: • Randomization condition: Data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment. An SRS is ideal. • 10% condition: The samples should not be more than 10% of their respective populations. This matters when we sample without replacement: if each sample is a small fraction of its population, the individuals are still approximately independent. • Independent groups: The two groups should be independent of each other.

Assumptions and Conditions • Normality Assumption (Large Enough Sample): The sampling distribution of p̂1 - p̂2 should be approximately Normal, which requires both samples to be large enough. • Success/Failure condition: Both samples are big enough that at least 10 successes and at least 10 failures are expected in each. Often it’s easier just to check the observed numbers of successes and failures. If they are all at least 10, then we’re good to go.
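The two numeric conditions are simple enough to check in code. This is only a sketch: the helper names are hypothetical, the counts 46 of 143 and 30 of 151 are taken from Example 1 later in the chapter, and the population size passed to the 10% check is an assumed figure for illustration.

```python
def success_failure_ok(successes, n, minimum=10):
    """True if both the observed successes and the observed failures are at least `minimum`."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

def ten_percent_ok(n, population_size):
    """True if the sample is no more than 10% of its population."""
    return 10 * n <= population_size

# Counts from Example 1: 46 of 143 quit in Group 1, 30 of 151 quit in Group 2
print(success_failure_ok(46, 143), success_failure_ok(30, 151))

# Assumed population size, for illustration only
print(ten_percent_ok(143, 100_000))
```

Randomization and independence between groups cannot be checked numerically; they depend on how the data were collected.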

Mnemonics for Memory • We still use the following acronym for Hypothesis Tests: • Parameter • Hypotheses • Assumptions • Name the inference • Test Statistics – (z-score, etc.) • Obtain a p-value • Make a decision • State the conclusion in context of the problem

Example 1 • Would being part of a support group that meets regularly help people who are wearing the nicotine patch actually quit smoking? A county health department tries an experiment using several hundred volunteers who were planning to use the patch to help them quit smoking. The subjects were randomly divided into two groups. People in Group 1 were given the patch and attended a weekly discussion meeting with counselors and others trying to quit. People in Group 2 also used the patch but did not participate in the counseling groups. After six months 46 of the 143 smokers in Group 1 and 30 of the 151 smokers in Group 2 had successfully stopped smoking. Do these results suggest that such support groups could be an effective way to help people stop smoking?

Example 1 • Step 1: Identify population Parameter, state the null and alternative Hypotheses, determine what you are trying to do (and determine what the question is asking). • We want to know whether, among smokers wearing the patch, there is a DIFFERENCE in the proportion who quit between those who attend a support group and those who don’t. The parameter of interest is the difference in the proportions of people who quit smoking. The population is all smokers who are trying to quit while wearing a patch. We assume that the proportions are the same whether they attend a support group or not. Let p1 represent the proportion of smokers who quit in Group 1 and p2 represent the proportion of smokers who quit in Group 2. • H0: p1 – p2 = 0. There is no difference in the proportions of the two groups. • HA: p1 – p2 > 0. The proportion who quit is higher for those who attend a support group than for those who don’t attend.

Example 1 • Step 2: Verify the Assumptions by checking the conditions • Independence Assumption • Randomization condition: The subjects were volunteers, which may introduce bias; if our sample is biased, our results may not be valid for all smokers. The volunteers were, however, randomly divided into the two groups. • 10% condition: We can reasonably assume that we observed fewer than 10% of all smokers trying to quit in both samples. • Normality Assumption (Large Enough Sample) • Success/Failure condition: Both groups have at least 10 successes and 10 failures (46 and 97 in Group 1, 30 and 121 in Group 2).

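Example 1 • Step 3: If conditions are met, Name the inference procedure, find the Test statistic, and Obtain the p-value in carrying out the inference: • Name the test: We will use a two-proportion z-test. • Test Statistic: p̂1 = 46/143 ≈ 0.322, p̂2 = 30/151 ≈ 0.199, and the pooled proportion is p̂ = (46 + 30)/(143 + 151) = 76/294 ≈ 0.259, so z = (0.322 - 0.199) / sqrt( 0.259(1 - 0.259)(1/143 + 1/151) ) ≈ 2.41. • Obtain the p-value: P(Z ≥ 2.41) ≈ 0.008.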
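The arithmetic for this step can be reproduced with a short script. It is a sketch using only the counts given in the example (46 of 143 and 30 of 151); the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 1
x1, n1 = 46, 143   # Group 1: patch plus support group
x2, n2 = 30, 151   # Group 2: patch only

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled proportion, because H0 assumes p1 = p2
p_pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / se_pooled

# One-sided alternative HA: p1 - p2 > 0, so the p-value is P(Z >= z)
p_val = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_val:.3f}")
```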

Example 1 • Step 4: Make a decision (reject or fail to reject H0). State your conclusion in context of the problem using p-value. • The p-value, 0.008, is small enough that we reject the null hypothesis in favor of the alternative at the 0.05 alpha level. There is sufficient evidence to conclude that the true proportion of smokers wearing the patch who quit is higher for those who attend a support group than for those who do not attend the support group.

Example 1 (part 2) • Now that we’ve concluded that support groups are beneficial, can we convince the government (or some philanthropist) to fund it? Let’s approximate the true difference with a 95% confidence interval. Having already done a hypothesis test, we will assume that the assumptions and conditions are satisfied.

Stay calm and don’t PANIC! • We still use the following acronym for Confidence Intervals: • Parameter • Assumptions • Name the inference • Interval • Conclusion in context of the problem

Example 1 (part 2) • Construct a 95% CI (or Confidence Interval) for the difference in proportions of successfully quitting smoking using a nicotine patch with a support group versus without a support group. • First, state what you want to know in terms of the Parameter and determine what the question is asking • We want to find an interval that is likely, with 95% confidence, to contain the true difference in proportions, p1 – p2, of those trying to quit smoking with a nicotine patch in conjunction with a support group and without a support group.

Example 1 (part 2) • Second, examine the Assumptions and check the conditions: • All conditions were checked and satisfied when we performed the hypothesis test.

Example 1 (part 2) • Third, Name the inference, do the work, and state the Interval. Since we know that we satisfy our conditions, the sampling distribution of p̂1 - p̂2 is approximately normal. • We will construct a 95% Two-Proportion Z-Interval. • (p̂1 - p̂2) ± z* SE = (0.322 - 0.199) ± 1.96 sqrt( 0.322(0.678)/143 + 0.199(0.801)/151 ) ≈ 0.123 ± 0.100. • This is our Interval: (0.023, 0.223).
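A sketch of the interval calculation for this example, again using only the counts from the problem statement. Note that the interval uses the unpooled standard error, unlike the hypothesis test; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 1
x1, n1 = 46, 143   # Group 1: patch plus support group
x2, n2 = 30, 151   # Group 2: patch only

p1_hat = x1 / n1
p2_hat = x2 / n2

# Unpooled standard error: we are no longer assuming p1 = p2
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Critical value for 95% confidence (about 1.96)
z_star = NormalDist().inv_cdf(0.975)

diff = p1_hat - p2_hat
low, high = diff - z_star * se, diff + z_star * se

print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```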

Example 1 (part 2) • Fourth, last but not least, state your Conclusion in context of the problem: • We are 95% confident that the support group program could raise the proportion of smokers who manage to quit by using the patch by between 2 and 22 percentage points. • Is that enough for us to get funding? Possibly. Those who are in favor of it will look at the 22%, while those against it will look at the 2%...talk about being biased!!! • The interval is so wide that it may not be very useful. How could we rectify this?
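One way to explore the closing question is to see how the margin of error responds to sample size. The sketch below keeps the sample proportions from Example 1 fixed and multiplies both group sizes by a factor k; holding the proportions fixed is an assumption made purely for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p1_hat, n1, p2_hat, n2, level=0.95):
    """Margin of error z* x SE for a two-proportion z-interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Sample proportions from Example 1
p1_hat, p2_hat = 46 / 143, 30 / 151

base = margin_of_error(p1_hat, 143, p2_hat, 151)
quadrupled = margin_of_error(p1_hat, 143 * 4, p2_hat, 151 * 4)

for k in (1, 2, 4):
    # Assumption: same sample proportions, k times as many subjects per group
    me = margin_of_error(p1_hat, 143 * k, p2_hat, 151 * k)
    print(f"k = {k}: margin of error = {me:.3f}")
```

Under this assumption, quadrupling both sample sizes halves the margin of error, because the standard error shrinks with the square root of the sample size.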

Example 2 • Consumer Reports compared the presence of bacteria in name brand and store brand packages of frozen chicken. They randomly selected 75 name brand and 75 store brand packages from different stores and different states around the nation. They found campylobacter contamination in 33% of the 75 name brand packages and 45% of the 75 store brand packages. Does this indicate that shoppers would be safer in buying the name brand product?

Example 2 • Step 1: Identify population Parameter, state the null and alternative Hypotheses, determine what you are trying to do (and determine what the question is asking). • We want to know whether there is a DIFFERENCE in the proportions of campylobacter contamination between name brand and store brand products. The parameter of interest is the difference in the proportions of campylobacter contamination. The population is all frozen chicken. We assume that the proportions are the same whether name brand or not. Let N = Name Brand and S = Store Brand. • H0: pN – pS = 0. There is no difference in the proportions of contaminated chicken. • HA: pN – pS < 0. The proportion of contamination in the name brand is less than in the store brand.

Example 2 • Step 2: Verify the Assumptions by checking the conditions • Independence Assumption • Randomization condition: We are told that both samples were randomly selected. • 10% condition: We can reasonably assume that we observed fewer than 10% of all frozen chickens in the US from both samples. • Normality Assumption (Large Enough Sample) • Success/Failure condition: Both samples have at least 10 successes and 10 failures.
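The transcript does not show the Step 3 arithmetic for this example. A sketch of the two-proportion z-test, working directly from the reported percentages (33% and 45% of 75 packages each) rather than from counts, should land close to the p-value quoted in Step 4. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Reported sample proportions and sample sizes from Example 2
pN_hat, nN = 0.33, 75   # name brand
pS_hat, nS = 0.45, 75   # store brand

# Pooled proportion: total contaminated over total packages
p_pooled = (pN_hat * nN + pS_hat * nS) / (nN + nS)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / nN + 1 / nS))

z = (pN_hat - pS_hat) / se_pooled

# One-sided alternative HA: pN - pS < 0, so the p-value is P(Z <= z)
p_val = NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_val:.3f}")
```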

Example 2 • Step 4: Make a decision (reject or fail to reject H0). State your conclusion in context of the problem using p-value. • The p-value, approximately 0.066, is NOT small enough to reject the null hypothesis in favor of the alternative at the 0.05 alpha level. We fail to reject the null hypothesis: there is insufficient evidence to conclude that the true proportion of contaminated packages of frozen chicken is lower for the name brand than for the store brand.

Example 3 • Colton Joint Unified School District (CJUSD) was named one of the four AP Districts of the Year in 2011. Data released by the College Board stated that 210 out of 500 AP exams in CJUSD were passed with a score of 3 or higher in 2011. In the same year, Bonita Unified School District (BUSD) recorded 647 out of 1451 exams passed with a score of 3 or higher. Even though BUSD has not been named AP District of the Year, Mr. Kim thinks that BUSD still performs better than CJUSD. Is there evidence to support Mr. Kim’s claim? Assume that 2011 is a random sample of all years in which AP exams are taken.

Example 3 • Step 1: Identify population Parameter, state the null and alternative Hypotheses, determine what you are trying to do (and determine what the question is asking). • We want to know whether there is a DIFFERENCE in the proportions of AP exams passed in CJUSD and BUSD. The parameter of interest is the difference in the proportions of AP exams that were passed. The population is all AP exams taken in CJUSD and BUSD in all years. We assume that the proportions are the same in CJUSD and BUSD. Let C = CJUSD and B = BUSD. • H0: pC – pB = 0. There is no difference in the proportions of AP exams that are passed in CJUSD and BUSD. • HA: pC – pB < 0. The proportion of AP exams that are passed in CJUSD is lower than in BUSD.

Example 3 • Step 2: Verify the Assumptions by checking the conditions • Independence Assumption • Randomization condition: We are told to assume that 2011 is a random sample of all years. Therefore, we can assume that BOTH samples are randomly selected. • 10% condition: We can reasonably assume that we observed fewer than 10% of all AP exams taken in both CJUSD and BUSD. • Normality Assumption (Large Enough Sample) • Success/Failure condition: Both samples have at least 10 successes and 10 failures.
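The Step 3 arithmetic is not shown in the transcript for this example either. The sketch below runs the two-proportion z-test on the counts given in the problem (210 of 500 and 647 of 1451). The result should be close to the p-value quoted in Step 4; a small difference in the third decimal place can arise if z is rounded before a table lookup. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 3
xC, nC = 210, 500     # CJUSD: exams passed, exams taken
xB, nB = 647, 1451    # BUSD: exams passed, exams taken

pC_hat = xC / nC
pB_hat = xB / nB

# Pooled proportion, because H0 assumes pC = pB
p_pooled = (xC + xB) / (nC + nB)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / nC + 1 / nB))

z = (pC_hat - pB_hat) / se_pooled

# One-sided alternative HA: pC - pB < 0, so the p-value is P(Z <= z)
p_val = NormalDist().cdf(z)

print(f"z = {z:.3f}, p-value = {p_val:.3f}")
```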

Example 3 • Step 4: Make a decision (reject or fail to reject H0). State your conclusion in context of the problem using p-value. • The p-value of 0.156 is so high that we fail to reject the null hypothesis at the 0.05 alpha level. There is insufficient evidence to conclude that the true proportion of AP exams passed is lower in CJUSD than in BUSD. In other words, there is not enough evidence to support Mr. Kim’s claim. • Does this mean that Mr. Kim is wrong?

Assignment
