Chapter 11: Comparisons Involving Proportions and a Test of Independence

Chapter 11: Comparisons Involving Proportions and a Test of Independence • Inferences About the Difference Between Two Population Proportions • Hypothesis Test for Proportions of a Multinomial Population • Test of Independence: Contingency Tables

Inferences About the Difference Between Two Population Proportions • Interval Estimation of p1 - p2 • Hypothesis Tests About p1 - p2

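Sampling Distribution of $\bar{p}_1 - \bar{p}_2$
• Expected Value: $E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$
• Standard Deviation (Standard Error): $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
where: n1 = size of sample taken from population 1, n2 = size of sample taken from population 2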

Sampling Distribution of $\bar{p}_1 - \bar{p}_2$
If the sample sizes are large, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution. The sample sizes are sufficiently large if all of these conditions are met: n1p1 ≥ 5, n1(1 - p1) ≥ 5, n2p2 ≥ 5, n2(1 - p2) ≥ 5
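The standard error and the large-sample conditions can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names and the population values (30% and 25%, samples of 200 and 100) are hypothetical, and treating a count of exactly 5 as acceptable is an assumption about the boundary case.

```python
import math

def std_error_diff(p1, n1, p2, n2):
    """Standard error of the difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def large_sample_ok(p1, n1, p2, n2, threshold=5):
    """True when n*p and n*(1 - p) reach the threshold in both samples."""
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(c >= threshold for c in counts)

# Hypothetical population proportions and sample sizes (illustration only)
print(std_error_diff(0.30, 200, 0.25, 100))
print(large_sample_ok(0.30, 200, 0.25, 100))  # all four counts are well above 5
print(large_sample_ok(0.02, 100, 0.25, 100))  # n1 * p1 = 2, so the check fails
```

In practice the population proportions are unknown, so the sample proportions are usually substituted when checking these conditions.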

Sampling Distribution of $\bar{p}_1 - \bar{p}_2$ [figure not reproduced]

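Interval Estimation of p1 - p2
• Interval Estimate
$$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$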

Interval Estimation of p1 - p2 • Example: Market Research Associates is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product. The new campaign has been initiated with TV and newspaper advertisements running for three weeks.

Interval Estimation of p1 - p2 A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product. Does the data support the position that the advertising campaign has provided an increased awareness of the client’s product?

Point Estimator of the Difference Between Two Population Proportions
$$\bar{p}_1 - \bar{p}_2 = \frac{120}{250} - \frac{60}{150} = .48 - .40 = .08$$
where:
p1 = proportion of the population of households “aware” of the product after the new campaign
p2 = proportion of the population of households “aware” of the product before the new campaign
$\bar{p}_1$ = sample proportion of households “aware” of the product after the new campaign
$\bar{p}_2$ = sample proportion of households “aware” of the product before the new campaign

Interval Estimation of p1 - p2
For α = .05, z.025 = 1.96:
$$.48 - .40 \pm 1.96 \sqrt{\frac{.48(.52)}{250} + \frac{.40(.60)}{150}}$$
.08 ± 1.96(.0510)
.08 ± .10
Hence, the 95% confidence interval for the difference in before and after awareness of the product is -.02 to +.18.
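The interval above can be reproduced numerically. The following is a sketch using only the Python standard library; the function name `diff_proportion_ci` is hypothetical, and the z value comes from `statistics.NormalDist` rather than a printed table, so it is 1.95996... instead of the rounded 1.96.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Interval estimate of p1 - p2 from counts of successes x1, x2."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    point = p1_bar - p2_bar
    # Unpooled standard error, as used for interval estimation
    se = math.sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return point, se, (point - margin, point + margin)

# Market Research Associates: 120 of 250 aware after, 60 of 150 aware before
point, se, (low, high) = diff_proportion_ci(120, 250, 60, 150)
print(round(point, 2), round(se, 4), round(low, 2), round(high, 2))
```

Because the interval contains 0, the data are also consistent with no change in awareness.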

Hypothesis Tests about p1 - p2
• Hypotheses: We focus on tests involving no difference between the two population proportions (i.e., p1 = p2).
Left-tailed: $H_0: p_1 - p_2 \ge 0$, $H_a: p_1 - p_2 < 0$
Right-tailed: $H_0: p_1 - p_2 \le 0$, $H_a: p_1 - p_2 > 0$
Two-tailed: $H_0: p_1 - p_2 = 0$, $H_a: p_1 - p_2 \ne 0$

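Hypothesis Tests about p1 - p2
• Pooled Estimate of Standard Error of $\bar{p}_1 - \bar{p}_2$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where:
$$\bar{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$$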

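Hypothesis Tests about p1 - p2
• Test Statistic
$$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$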

Hypothesis Tests about p1 - p2 • Example: Market Research Associates Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?

Hypothesis Tests about p1 - p2
1. Develop the hypotheses.
$H_0: p_1 - p_2 \le 0$
$H_a: p_1 - p_2 > 0$
p1 = proportion of the population of households “aware” of the product after the new campaign
p2 = proportion of the population of households “aware” of the product before the new campaign

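Hypothesis Tests about p1 - p2
2. Specify the level of significance. α = .05
3. Compute the value of the test statistic.
$$\bar{p} = \frac{250(.48) + 150(.40)}{250 + 150} = \frac{180}{400} = .45$$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{.45(.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = .0514$$
$$z = \frac{.48 - .40}{.0514} = \frac{.08}{.0514} = 1.56$$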

Hypothesis Tests about p1 - p2
• Using the Critical Value Approach
4. Determine the critical value and rejection rule. For α = .05, z.05 = 1.645. Reject H0 if z ≥ 1.645.
5. Compare the test statistic with the critical value. Because 1.56 < 1.645, we cannot reject H0. We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.

Hypothesis Tests about p1 - p2
• Using the p-Value Approach
4. Compute the p-value. For z = 1.56, the p-value = .0594.
5. Compare the p-value with the significance level. Because p-value > α = .05, we cannot reject H0. We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
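The right-tailed test above can be sketched in Python with the standard library. The function name `two_proportion_z_test` is hypothetical. Note that the code carries z at full precision (about 1.557), so its p-value is roughly .0597 rather than the table-based .0594 obtained from the rounded z = 1.56; the conclusion is the same.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Right-tailed test of H0: p1 - p2 <= 0 using the pooled standard error."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # same as (n1*p1_bar + n2*p2_bar)/(n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / se
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail
    return z, p_value

# Market Research Associates: 120 of 250 aware after, 60 of 150 aware before
z, p_value = two_proportion_z_test(120, 250, 60, 150)
alpha = 0.05
print(round(z, 2), round(p_value, 4), "reject H0" if p_value <= alpha else "cannot reject H0")
```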

Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, fi, for each of the k categories.
3. Assuming H0 is true, compute the expected frequency, ei, in each category by multiplying the category probability by the sample size.

Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where: fi = observed frequency for category i, ei = expected frequency for category i, k = number of categories
Note: The test statistic has a chi-square distribution with k - 1 df provided that the expected frequencies are 5 or more for all categories.

Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
5. Rejection rule:
p-value approach: Reject H0 if p-value ≤ α
Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
where α is the significance level and there are k - 1 degrees of freedom

Multinomial Distribution Goodness of Fit Test • Example: Finger Lakes Homes manufactures four models of prefabricated homes, a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.

Multinomial Distribution Goodness of Fit Test
The number of homes sold of each model for 100 sales over the past two years is shown below.

Model     Colonial   Log   Split-Level   A-Frame
# Sold    30         20    35            15

Multinomial Distribution Goodness of Fit Test
• Hypotheses
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not pC = .25, pL = .25, pS = .25, and pA = .25
where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame

Multinomial Distribution Goodness of Fit Test
• Rejection Rule
With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom, reject H0 if p-value ≤ .05 or $\chi^2 \ge 7.815$. (Do not reject H0 for $\chi^2$ below 7.815; the rejection region is the upper tail.)

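Multinomial Distribution Goodness of Fit Test
• Expected Frequencies
e1 = .25(100) = 25, e2 = .25(100) = 25, e3 = .25(100) = 25, e4 = .25(100) = 25
• Test Statistic
$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$$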

Multinomial Distribution Goodness of Fit Test
• Conclusion Using the Critical Value Approach
$\chi^2 = 10 > 7.815$
We reject, at the .05 level of significance, the assumption that there is no home style preference.

Multinomial Distribution Goodness of Fit Test
• Conclusion Using the p-Value Approach

Area in Upper Tail     .10     .05     .025    .01      .005
χ² Value (df = 3)      6.251   7.815   9.348   11.345   12.838

Because $\chi^2 = 10$ is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01. The p-value ≤ α. We can reject the null hypothesis.
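The goodness of fit calculation for the Finger Lakes Homes data can be sketched as follows. The function names are hypothetical. The helper `chi_square_upper_tail_df3` uses a closed-form expression that holds only for exactly 3 degrees of freedom; it is included here to avoid a third-party dependency, and a statistics library would normally be used for general df.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (f - e)^2 / e over all categories."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

def chi_square_upper_tail_df3(x):
    """Upper-tail area of the chi-square distribution, valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

observed = [30, 20, 35, 15]             # colonial, log, split-level, A-frame
n = sum(observed)                       # 100 sales
expected = [0.25 * n] * len(observed)   # H0: each style has proportion .25

stat = chi_square_statistic(observed, expected)
p_value = chi_square_upper_tail_df3(stat)
reject = stat >= 7.815                  # critical value for alpha = .05, df = 3
print(stat, round(p_value, 4), reject)
```

The computed p-value falls between .01 and .025, in agreement with the bracketing obtained from the table.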

Test of Independence: Contingency Tables
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, fij, for each cell of the contingency table.
3. Compute the expected frequency, eij, for each cell.
$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$

Test of Independence: Contingency Tables
4. Compute the test statistic.
$$\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
5. Determine the rejection rule. Reject H0 if p-value ≤ α or $\chi^2 \ge \chi^2_{\alpha}$, where α is the significance level and, with n rows and m columns, there are (n - 1)(m - 1) degrees of freedom.

Contingency Table (Independence) Test • Example Each home sold by Finger Lakes Homes can be classified according to price and to style. Finger Lakes’ manager would like to determine if the price of the home and the style of the home are independent variables.

Contingency Table (Independence) Test
The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either $99,000 or less or more than $99,000.

Price        Colonial   Log   Split-Level   A-Frame
≤ $99,000    18         6     19            12
> $99,000    12         14    16            3

Contingency Table (Independence) Test • Hypotheses H0: Price of the home is independent of the style of the home that is purchased Ha: Price of the home is not independent of the style of the home that is purchased

Contingency Table (Independence) Test
• Expected Frequencies (computed from the row and column totals of the observed frequencies below)

Price     Colonial   Log   Split-Level   A-Frame   Total
≤ $99K    18         6     19            12        55
> $99K    12         14    16            3         45
Total     30         20    35            15        100

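Contingency Table (Independence) Test
• Rejection Rule
With α = .05 and (2 - 1)(4 - 1) = 3 d.f., reject H0 if p-value ≤ .05 or $\chi^2 \ge 7.815$.
• Test Statistic
$$\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.149$$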

Contingency Table (Independence) Test
• Conclusion Using the Critical Value Approach
$\chi^2 = 9.149 > 7.815$
We reject, at the .05 level of significance, the assumption that the price of the home is independent of the style of home that is purchased.

Contingency Table (Independence) Test
• Conclusion Using the p-Value Approach

Area in Upper Tail     .10     .05     .025    .01      .005
χ² Value (df = 3)      6.251   7.815   9.348   11.345   12.838

Because $\chi^2 = 9.149$ is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025. The p-value ≤ α. We can reject the null hypothesis.
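The independence test for the price-by-style table can be sketched in plain Python. The function names are hypothetical, and the expected frequencies use the usual row total times column total divided by sample size. As a cross-check on the table lookup, the closed-form upper-tail helper below is valid only for 3 degrees of freedom, which is what this 2 x 4 table has.

```python
import math

def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the chi-square statistic, degrees of freedom, and expected counts."""
    expected = expected_frequencies(table)
    stat = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

def chi_square_upper_tail_df3(x):
    """Upper-tail area of the chi-square distribution, valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Rows: $99,000 or less, more than $99,000
# Columns: colonial, log, split-level, A-frame
homes = [[18, 6, 19, 12],
         [12, 14, 16, 3]]

stat, df, expected = chi_square_independence(homes)
p_value = chi_square_upper_tail_df3(stat)
print(expected[0], expected[1])
print(round(stat, 3), df, round(p_value, 4), stat >= 7.815)
```

Summing all eight terms at full precision gives a statistic that rounds to 9.149, and the p-value falls between .025 and .05.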
