Section 11.1 Inference about Two Population Proportions
Parallel Example 1: Distinguish between Independent and Dependent Sampling
For each of the following, determine whether the sampling method is independent or dependent.
• A researcher wants to know whether the price of a one-night stay at a Holiday Inn Express is less than the price of a one-night stay at a Red Roof Inn. She randomly selects 8 towns where the location of the hotels is close to each other and determines the price of a one-night stay.
• A researcher wants to know whether the “state” quarters (introduced in 1999) have a mean weight that is different from “traditional” quarters. He randomly selects 18 “state” quarters and 16 “traditional” quarters and compares their weights.
Solution
• The sampling method is dependent since each of the 8 Holiday Inn Express hotels can be matched with one of the 8 Red Roof Inn hotels by town.
• The sampling method is independent since the “state” quarters that were sampled had no bearing on which “traditional” quarters were sampled.
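Sampling Distribution of the Difference between Two Proportions (Independent Samples)
Suppose that a simple random sample of size n1 is taken from a population where x1 of the individuals have a specified characteristic, and a simple random sample of size n2 is independently taken from a different population where x2 of the individuals have a specified characteristic. The sampling distribution of $\hat{p}_1 - \hat{p}_2$, where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$, is approximately normal, with mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and standard deviation $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$, provided that $n_1\hat{p}_1(1 - \hat{p}_1) \ge 10$ and $n_2\hat{p}_2(1 - \hat{p}_2) \ge 10$ and each sample size is no more than 5% of the population size.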
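Sampling Distribution of the Difference between Two Proportions
The standardized version of $\hat{p}_1 - \hat{p}_2$ is then written as
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$$
which has an approximate standard normal distribution.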
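The claim about the sampling distribution can be checked by simulation. The sketch below assumes the usual formulas for the difference of two independent sample proportions (mean p1 − p2 and standard deviation sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2)); the population proportions, sample sizes, and the helper name `simulate_differences` are hypothetical choices made only for illustration.

```python
import math
import random
import statistics

def simulate_differences(p1, n1, p2, n2, reps, seed=0):
    """Simulate p1_hat - p2_hat for `reps` pairs of independent samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Count "successes" in each independent sample.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

# Hypothetical population proportions and sample sizes.
p1, n1, p2, n2 = 0.40, 200, 0.30, 150
diffs = simulate_differences(p1, n1, p2, n2, reps=2000)

theoretical_mean = p1 - p2
theoretical_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
empirical_mean = statistics.mean(diffs)
empirical_sd = statistics.stdev(diffs)

print(f"mean: theory {theoretical_mean:.4f}, simulation {empirical_mean:.4f}")
print(f"sd:   theory {theoretical_sd:.4f}, simulation {empirical_sd:.4f}")
```

With a few thousand replicates the simulated mean and standard deviation should land close to the theoretical values, which is what the normal approximation relies on.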
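Under the null hypothesis p1 = p2 = p, the best point estimate of p is called the pooled estimate of p, denoted $\hat{p}$, where
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Test Statistic for Comparing Two Population Proportions
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$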
Hypothesis Test Regarding the Difference between Two Population Proportions
To test hypotheses regarding two population proportions, p1 and p2, we can use the steps that follow, provided that:
• the samples are independently obtained using simple random sampling,
• $n_1\hat{p}_1(1 - \hat{p}_1) \ge 10$ and $n_2\hat{p}_2(1 - \hat{p}_2) \ge 10$, and
• n1 ≤ 0.05N1 and n2 ≤ 0.05N2 (the sample size is no more than 5% of the population size); this requirement ensures the independence necessary for a binomial experiment.
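The numeric requirements can be checked mechanically before running the test. This is a minimal sketch, assuming the usual textbook condition n·p̂(1 − p̂) ≥ 10 for each sample together with the 5% rule stated above; the function name `requirements_met` and the counts used are hypothetical, and the random-sampling requirement cannot be verified in code.

```python
def requirements_met(x1, n1, x2, n2, N1=None, N2=None):
    """Check the sample-size conditions for the two-proportion z-test.

    N1 and N2 are the population sizes; pass None when they are unknown
    but assumed to be large relative to the samples.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Normal-approximation condition for each sample.
    large_enough = (n1 * p1_hat * (1 - p1_hat) >= 10
                    and n2 * p2_hat * (1 - p2_hat) >= 10)
    # Each sample is no more than 5% of its population.
    small_fraction = ((N1 is None or n1 <= 0.05 * N1)
                      and (N2 is None or n2 <= 0.05 * N2))
    return large_enough and small_fraction

# Hypothetical counts: 60 of 100 and 45 of 100.
print(requirements_met(60, 100, 45, 100))            # population sizes unknown
print(requirements_met(60, 100, 45, 100, N1=1000))   # 100 > 5% of 1000
print(requirements_met(3, 20, 5, 25))                # samples too small
```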
Hypothesis Test steps: • State hypothesis, left-tail, right-tail, two-tail. • State level of significance, α. • Compute the test statistic. • Find the critical value. • Find the P-value. • Compare test-stat with CV, and α with P-value. Reject or Fail to reject. • Write a conclusion statement.
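Classical Approach
Step 3: Compute the test statistic
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$.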
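As an illustration of Step 3, the pooled test statistic can be computed directly from the four counts. The sketch assumes the standard pooled form z0 = (p̂1 − p̂2) / sqrt(p̂(1 − p̂)(1/n1 + 1/n2)) with p̂ = (x1 + x2)/(n1 + n2); the function name and the counts are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100.
z0 = two_proportion_z(60, 100, 45, 100)
print(f"z0 = {z0:.2f}")
```

Swapping the two samples only changes the sign of z0, so the order matters for one-tailed tests.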
P-Value Approach
Technology Step 3: Use a statistical spreadsheet or calculator with statistical capabilities to obtain the P-value. The directions for obtaining the P-value using the TI-83/84 Plus graphing calculator, Excel, MINITAB, and StatCrunch are in the Technology Step-by-Step in the text.
Parallel Example 2: Testing Hypotheses Regarding Two Population Proportions
An economist believes that the percentage of urban households with Internet access is greater than the percentage of rural households with Internet access. He obtains a random sample of 800 urban households and finds that 338 of them have Internet access. He obtains a random sample of 750 rural households and finds that 292 of them have Internet access. Test the economist’s claim at the α = 0.05 level of significance.
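Solution
We must first verify that the requirements are satisfied:
1. The samples are simple random samples that were obtained independently.
2. x1 = 338, n1 = 800, x2 = 292 and n2 = 750, so $\hat{p}_1 = 338/800 = 0.4225$ and $\hat{p}_2 = 292/750 \approx 0.3893$. Then $n_1\hat{p}_1(1 - \hat{p}_1) = 800(0.4225)(1 - 0.4225) \approx 195.2 \ge 10$ and $n_2\hat{p}_2(1 - \hat{p}_2) = 750(0.3893)(1 - 0.3893) \approx 178.3 \ge 10$.
3. The sample sizes are less than 5% of the population size.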
Solution
Step 1: We want to determine whether the percentage of urban households with Internet access is greater than the percentage of rural households with Internet access. So, H0: p1 = p2 versus H1: p1 > p2 or, equivalently, H0: p1 – p2 = 0 versus H1: p1 – p2 > 0.
Step 2: The level of significance is α = 0.05.
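Solution
Step 3: The pooled estimate of p is:
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{338 + 292}{800 + 750} = \frac{630}{1550} \approx 0.4065$$
The test statistic is:
$$z_0 = \frac{0.4225 - 0.3893}{\sqrt{0.4065(1 - 0.4065)}\sqrt{\frac{1}{800} + \frac{1}{750}}} \approx 1.33$$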
Solution: Classical Approach This is a right-tailed test with α = 0.05. The critical value is z0.05 = 1.645.
Solution: Classical Approach Step 4: Since the test statistic, z0 = 1.33, is less than the critical value z0.05 = 1.645, we fail to reject the null hypothesis.
Solution: P-Value Approach Because this is a right-tailed test, the P-value is the area under the standard normal curve to the right of the test statistic z0 = 1.33. That is, P-value = P(Z > 1.33) ≈ 0.09.
Solution: P-Value Approach Step 4: Since the P-value is greater than the level of significance α = 0.05, we fail to reject the null hypothesis.
Solution Step 5: There is insufficient evidence at the α = 0.05 level to conclude that the percentage of urban households with Internet access is greater than the percentage of rural households with Internet access.
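The critical value and P-value used in this solution can be reproduced with Python's standard library instead of a table. The sketch below takes z0 = 1.33 and α = 0.05 from the example; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

alpha = 0.05
z0 = 1.33  # test statistic from the example

# Right-tailed test: the critical value cuts off an area of alpha on the right.
critical_value = std_normal.inv_cdf(1 - alpha)
# P-value: area to the right of the observed test statistic.
p_value = 1 - std_normal.cdf(z0)

reject_classical = z0 > critical_value
reject_p_value = p_value < alpha

print(f"critical value = {critical_value:.3f}")
print(f"P-value = {p_value:.4f}")
print(f"reject H0? classical: {reject_classical}, P-value: {reject_p_value}")
```

Both approaches lead to the same decision, since z0 exceeds the critical value exactly when the P-value falls below α.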
Objective 3 • Construct and Interpret Confidence Intervals for the Difference between Two Population Proportions
Constructing a (1 – α)•100% Confidence Interval for the Difference between Two Population Proportions
To construct a (1 – α)•100% confidence interval for the difference between two population proportions, the following requirements must be satisfied:
• the samples are obtained independently using simple random sampling,
• $n_1\hat{p}_1(1 - \hat{p}_1) \ge 10$ and $n_2\hat{p}_2(1 - \hat{p}_2) \ge 10$, and
• n1 ≤ 0.05N1 and n2 ≤ 0.05N2 (the sample size is no more than 5% of the population size); this requirement ensures the independence necessary for a binomial experiment.
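Constructing a (1 – α)•100% Confidence Interval for the Difference between Two Population Proportions
Provided that these requirements are met, a (1 – α)•100% confidence interval for p1 – p2 is given by
Lower bound: $(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
Upper bound: $(\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$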
Parallel Example 3: Constructing a Confidence Interval for the Difference between Two Population Proportions An economist obtains a random sample of 800 urban households and finds that 338 of them have Internet access. He obtains a random sample of 750 rural households and finds that 292 of them have Internet access. Find a 99% confidence interval for the difference between the proportion of urban households that have Internet access and the proportion of rural households that have Internet access.
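Solution We have already verified the requirements for constructing a confidence interval for the difference between two population proportions in the previous example. Recall $\hat{p}_1 = 338/800 = 0.4225$ and $\hat{p}_2 = 292/750 \approx 0.3893$. For 99% confidence, $z_{\alpha/2} = z_{0.005} = 2.575$.
Solution Thus,
Lower bound = $(0.4225 - 0.3893) - 2.575\sqrt{\frac{0.4225(1 - 0.4225)}{800} + \frac{0.3893(1 - 0.3893)}{750}} \approx 0.0332 - 0.0642 \approx -0.03$
Upper bound = $(0.4225 - 0.3893) + 2.575\sqrt{\frac{0.4225(1 - 0.4225)}{800} + \frac{0.3893(1 - 0.3893)}{750}} \approx 0.0332 + 0.0642 \approx 0.10$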
Solution We are 99% confident that the difference between the proportion of urban households that have Internet access and the proportion of rural households that have Internet access is between –0.03 and 0.10. Since the confidence interval contains 0, we are unable to conclude that the proportion of urban households with Internet access is greater than the proportion of rural households with Internet access.
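The interval in this example can be reproduced with a short function. This is a sketch assuming the usual unpooled standard error sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2); the function name is hypothetical, and the critical value comes from the exact normal quantile rather than a rounded table entry, which does not change the bounds at two decimal places.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Critical value z_{alpha/2} for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Unpooled standard error of the difference in sample proportions.
    margin = z * math.sqrt(p1_hat * (1 - p1_hat) / n1
                           + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin

# Urban versus rural households with Internet access, 99% confidence.
lower, upper = two_proportion_ci(338, 800, 292, 750, confidence=0.99)
print(f"99% CI for p1 - p2: ({lower:.2f}, {upper:.2f})")
contains_zero = lower < 0 < upper
```

Because the interval contains 0, it is consistent with the failure to reject H0 in the hypothesis test on the same data.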
Objective 4 • Test Hypotheses Regarding Two Proportions from Dependent Samples
McNemar’s Test can be used to compare two proportions with matched-pairs data (i.e., dependent samples).
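Testing a Hypothesis Regarding the Difference of Two Population Proportions: Dependent Samples
To test hypotheses regarding two population proportions p1 and p2, where the samples are dependent, arrange the data in a contingency table as follows, where fij is the number of matched pairs with outcome i on the first treatment and outcome j on the second:
Treatment A \ Treatment B | Success | Failure
Success | f11 | f12
Failure | f21 | f22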
Testing a Hypothesis Regarding the Difference of Two Population Proportions: Dependent Samples
We can use the steps that follow provided that:
• the samples are dependent and are obtained randomly, and
• the total number of observations where the outcomes differ is greater than or equal to 10. That is, f12 + f21 ≥ 10.
Hypothesis Test steps: • State hypothesis, left-tail, right-tail, two-tail. • State level of significance, α. • Compute the test statistic. • Find the critical value. • Find the P-value. • Compare test-stat with CV, and α with P-value. Reject or Fail to reject. • Write a conclusion statement.
Step 1:Determine the null and alternative hypotheses. H0: the proportions between the two populations are equal (p1 = p2) H1: the proportions between the two populations differ (p1 ≠ p2)
Parallel Example 4: Analyzing the Difference of Two Proportions from Matched-Pairs Data A recent General Social Survey asked the following two questions of a random sample of 1483 adult Americans under the hypothetical scenario that the government suspected that a terrorist act was about to happen: • Do you believe the authorities should have the right to tap people’s telephone conversations? • Do you believe the authorities should have the right to detain people for as long as they want without putting them on trial?
Parallel Example 4: Analyzing the Difference of Two Proportions from Matched-Pairs Data The results of the survey are shown below: Do the proportions who agree with each scenario differ significantly? Use the α =0.05 level of significance.
Solution The sample proportion of individuals who believe that the authorities should be able to tap phones is . The sample proportion of individuals who believe that the authorities should have the right to detain people is . We want to determine whether the difference in sample proportions is due to sampling error or to the fact that the population proportions differ.
Solution
The samples are dependent and were obtained randomly. The total number of individuals who agree with one scenario but disagree with the other is 237 + 224 = 461, which is greater than 10. We can proceed with McNemar’s Test.
Step 1: The hypotheses are as follows:
H0: the proportions between the two populations are equal (pT = pD)
H1: the proportions between the two populations differ (pT ≠ pD)
Step 2: The level of significance is α = 0.05.
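Solution
Step 3: The test statistic is:
$$z_0 = \frac{|f_{12} - f_{21}| - 1}{\sqrt{f_{12} + f_{21}}} = \frac{|237 - 224| - 1}{\sqrt{237 + 224}} = \frac{12}{\sqrt{461}} \approx 0.56$$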
Solution: Classical Approach The critical value with an α = 0.05 level of significance is z0.025 = 1.96.
Solution: Classical Approach Step 4: Since the test statistic, z0 = 0.56, is less than the critical value z0.025 = 1.96, we fail to reject the null hypothesis.
Solution: P-Value Approach The P-value is two times the area under the standard normal curve to the right of the test statistic z0 = 0.56. That is, P-value = 2•P(Z > 0.56) ≈ 0.5754.
Solution: P-Value Approach Step 4: Since the P-value is greater than the level of significance α = 0.05, we fail to reject the null hypothesis.
Solution Step 5: There is insufficient evidence at the α = 0.05 level to conclude that the proportion of adult Americans who believe the authorities should have the right to tap phones differs from the proportion who believe the authorities should have the right to detain people for as long as they want without putting them on trial, in the event that the government suspected a terrorist act was about to happen.
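The McNemar calculation in this example can be sketched in a few lines. The code assumes the continuity-corrected statistic z0 = (|f12 − f21| − 1)/sqrt(f12 + f21), which reproduces the z0 = 0.56 reported above; the function name is hypothetical, and only the two discordant counts (237 and 224) are needed.

```python
import math
from statistics import NormalDist

def mcnemar_z(f12, f21):
    """Continuity-corrected McNemar statistic from the discordant counts."""
    if f12 + f21 < 10:
        raise ValueError("requires f12 + f21 >= 10")
    return (abs(f12 - f21) - 1) / math.sqrt(f12 + f21)

alpha = 0.05
z0 = mcnemar_z(237, 224)

# Two-tailed P-value: twice the area to the right of z0.
p_value = 2 * (1 - NormalDist().cdf(z0))
reject = p_value < alpha

print(f"z0 = {z0:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

The P-value computed from the unrounded z0 differs slightly in the third decimal place from the table-based 0.5754, which uses z0 rounded to 0.56; the decision is unchanged.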
Objective 5 • Determine the Sample Size Necessary for Estimating the Difference between Two Population Proportions within a Specified Margin of Error
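Sample Size for Estimating p1 – p2
The sample size required to obtain a (1 – α)•100% confidence interval with a margin of error, E, is given by
$$n = n_1 = n_2 = \left[\hat{p}_1(1 - \hat{p}_1) + \hat{p}_2(1 - \hat{p}_2)\right]\left(\frac{z_{\alpha/2}}{E}\right)^2$$
rounded up to the next integer, if prior estimates of p1 and p2, $\hat{p}_1$ and $\hat{p}_2$, are available. If prior estimates of p1 and p2 are unavailable, the sample size is
$$n = n_1 = n_2 = 0.5\left(\frac{z_{\alpha/2}}{E}\right)^2$$
rounded up to the next integer.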
Parallel Example 5: Determining Sample Size
A doctor wants to estimate the difference between the proportion of 15-19 year old mothers who received prenatal care and the proportion of 30-34 year old mothers who received prenatal care. What sample size should be obtained if she wishes the estimate to be within 2 percentage points with 95% confidence, assuming:
a) she uses the results of the National Vital Statistics Report, in which 98% of the 15-19 year old mothers received prenatal care and 99.2% of the 30-34 year old mothers received prenatal care.
b) she does not use any prior estimates.
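Solution We have E = 0.02 and zα/2 = z0.025 = 1.96.
a) Letting $\hat{p}_1 = 0.98$ and $\hat{p}_2 = 0.992$,
$$n = \left[0.98(1 - 0.98) + 0.992(1 - 0.992)\right]\left(\frac{1.96}{0.02}\right)^2 \approx 264.5$$
The doctor must sample 265 randomly selected 15-19 year old mothers and 265 randomly selected 30-34 year old mothers.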
Solution b) Without prior estimates of p1 and p2, the sample size is
$$n = 0.5\left(\frac{1.96}{0.02}\right)^2 = 4802$$
The doctor must sample 4802 randomly selected 15-19 year old mothers and 4802 randomly selected 30-34 year old mothers. Note that having prior estimates of p1 and p2 reduces the number of mothers that need to be surveyed.
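Both parts of this example can be reproduced with one function. The sketch assumes the formula n = [p̂1(1 − p̂1) + p̂2(1 − p̂2)](zα/2 / E)², with the bracket replaced by 0.5 when no prior estimates are available; the function name is hypothetical, and zα/2 is taken from the exact normal quantile (about 1.95996) instead of the rounded 1.96, which gives the same sample sizes here.

```python
import math
from statistics import NormalDist

def sample_size(margin, confidence, p1_prior=None, p2_prior=None):
    """Per-group sample size for estimating p1 - p2 within `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    if p1_prior is None or p2_prior is None:
        # Worst case: each p(1 - p) is at most 0.25, so the sum is 0.5.
        variance_term = 0.5
    else:
        variance_term = (p1_prior * (1 - p1_prior)
                         + p2_prior * (1 - p2_prior))
    # Always round up so the margin of error is not exceeded.
    return math.ceil(variance_term * (z / margin) ** 2)

n_with_priors = sample_size(0.02, 0.95, p1_prior=0.98, p2_prior=0.992)
n_without_priors = sample_size(0.02, 0.95)
print(n_with_priors, n_without_priors)
```

The gap between the two results shows how much a prior estimate far from 0.5 can shrink the required sample.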
Section 11.2 Inference about Two Means: Dependent Samples