Chapter 24. Two-Way Tables and the Chi-square Test. Thought Question 1.
Chapter 24: Two-Way Tables and the Chi-square Test
Thought Question 1: A random sample of registered voters was asked whether they preferred balancing the budget or cutting taxes. Each voter was then categorized as either a Democrat or a Republican. Of the 30 Democrats, 12 preferred cutting taxes, while of the 40 Republicans, 24 preferred cutting taxes. How would you display the data in a table?
Categorical Variables
• In this chapter we study the relationship between two categorical variables (variables whose values fall in groups or categories).
• To analyze categorical data, use the counts or percents of individuals that fall into the various categories.
Two-Way Table
• When there are two categorical variables, the data are summarized in a two-way table.
• Each row represents a value of the row variable.
• Each column represents a value of the column variable.
• The number of observations falling into each combination of categories is entered into the corresponding cell of the table.
• Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table.
• Using percents prevents misleading comparisons caused by unequal sample sizes in different groups.
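As a minimal sketch of these ideas, the Python snippet below lays out the Thought Question 1 counts as a two-way table and converts each row to percents. The variable names and the choice of party as the row variable are assumptions made for illustration; the counts for "Balance budget" (18 and 16) are obtained by subtracting from the group totals given in the question.

```python
# Two-way table for Thought Question 1.
# Rows: party (explanatory variable); columns: preferred agenda (response).
columns = ["Cut taxes", "Balance budget"]
counts = {
    "Democrat": [12, 30 - 12],    # 12 of 30 preferred cutting taxes
    "Republican": [24, 40 - 24],  # 24 of 40 preferred cutting taxes
}


def row_percents(row):
    """Convert one row of counts to percents of that row's total."""
    total = sum(row)
    return [100 * count / total for count in row]


percents = {party: row_percents(row) for party, row in counts.items()}

# Print each cell as "count (percent of row)".
print(f"{'':12}" + "".join(f"{c:>16}" for c in columns))
for party, row in counts.items():
    cells = "".join(f"{n:>9} ({p:3.0f}%)" for n, p in zip(row, percents[party]))
    print(f"{party:12}{cells}")
```

Row percents are used here because the two party groups have different sizes (30 and 40), so raw counts alone would not give a fair comparison.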
Case Study: Helped Pick Up Pencils? Which Is More Likely: Females or Males? Source: D. C. Howell, Statistical Methods for Psychology, 3rd edition, 1992, Belmont, CA: Duxbury Press, p. 154.
Case Study: Drop the Pencils. The researcher dropped a handful of pencils, apparently by accident, in an elevator in the presence of either a female subject or a male subject. The subject’s response was observed: did the subject help pick up the pencils or not?
Case Study: The Question. The question was whether the males or the females who observed this mishap would be more likely to help pick up the pencils.
• Explanatory variable: gender
• Response variable: “pick up” action (Y/N)
Both variables are categorical.
Case Study: Display the Results: Contingency (Two-Way) Table
Case Study: Display the Results: Percentages
Case Study: Statistical Significance. Is the difference between the percentages for males and females statistically significant? One of the following must be true:
• The percentages are really the same in the population, and the observed difference is due to chance; or
• The percentages are really different in the population, and the observed difference reflects this.
Assessing Statistical Significance for a Two-Way Table
• Strength of the relationship: measured by the difference in the sample percentages.
• Sample size: it is much easier to rule out chance with large samples.
Measuring the Difference with the Chi-Square Statistic
The chi-square statistic measures the magnitude of the difference in the sample percentages, incorporating sample size in its calculation.
• If the percentages in the population are the same, the chi-square statistic tends to be small (near 0).
• If the percentages in the population are different, the chi-square statistic tends to be large.
Make the Decision: Is the relationship statistically significant?
• The “critical value” for 2×2 tables is 3.84.
• If the chi-square value (for a 2×2 table) is larger than 3.84, the relationship is considered statistically significant.
• Note: Z = square root of the chi-square (for 2×2 tables).
• The critical value 3.84 is (1.96)², roughly 2².
*Note that the procedure given here is specifically for 2×2 tables (2 rows and 2 columns); the general procedure for any two-way table (with any number of rows and columns) is given later in this chapter (see slide 32).
Case Study: Statistical Significance
• Is the difference between the percentages for males and females statistically significant?
• Chi-square statistic = 8.65
• Since our chi-square is 8.65 > 3.84, we conclude there is a statistically significant relationship between gender and helping to pick up the pencils.
Case Study: Thought Question 1, Agenda versus Political Party. Chi-square = 2.75 (significant?)
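The value 2.75 can be checked with a short calculation. The sketch below uses the standard shortcut form of the chi-square statistic for a 2×2 table (without continuity correction); the function name and the cell layout are assumptions chosen for illustration, and the counts 18 and 16 come from subtracting from the group totals in Thought Question 1.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Shortcut form n(ad - bc)^2 / (product of the four margin totals),
    with no continuity correction.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_VALUE = 3.84  # cutoff for 2x2 tables quoted on the slides

# Democrats: 12 cut taxes, 18 balance budget.
# Republicans: 24 cut taxes, 16 balance budget.
chi_sq = chi_square_2x2(12, 18, 24, 16)
z = math.sqrt(chi_sq)  # for 2x2 tables, Z is the square root of chi-square
significant = chi_sq > CRITICAL_VALUE

print(f"chi-square = {chi_sq:.2f}, z = {z:.2f}, significant: {significant}")
```

Because the statistic is below 3.84, this sample does not show a statistically significant relationship between party and preferred agenda.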
Case Study: Quitting Smoking with Nicotine Patches (JAMA, Feb. 23, 1994, pp. 595-600)
Two categorical variables:
• Explanatory: treatment assignment (nicotine patch or control patch)
• Response: still smoking after 8 weeks? (yes or no)
Case Study: Display the Results: Contingency (Two-Way) Table
Case Study: Statistically Significant Relationship?
• Chi-square = 19.2
• Since 19.2 > 3.84, there is a statistically significant relationship between the type of patch used and the cessation of smoking for at least 8 weeks.
Case Study: Popular Ad: Seldane-D Allergy Tablets (Time, 27 March 1995, p. 18)
• Double-blind study of side effects
• Seldane-D: 374 subjects; 27 (7.2%) reported drowsiness, 347 did not
• Placebo: 193 subjects; 22 (11.4%) reported drowsiness, 171 did not
Case Study: Summaries
• Chi-square = 2.58
• Baseline risk of drowsiness using placebo = 11.4%
• Risk of drowsiness using Seldane-D = 7.2%
• Relative risk of drowsiness using Seldane-D versus placebo = 0.63
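The risk figures above follow directly from the counts in the ad. A minimal sketch of the arithmetic, with variable and function names chosen here only for illustration:

```python
def risk(events, group_size):
    """Proportion of a group that experienced the event."""
    return events / group_size


baseline_risk = risk(22, 193)   # placebo: 22 of 193 reported drowsiness
treatment_risk = risk(27, 374)  # Seldane-D: 27 of 374 reported drowsiness

# Relative risk: risk in the treatment group divided by the baseline risk.
relative_risk = treatment_risk / baseline_risk

print(f"baseline risk  = {baseline_risk:.1%}")
print(f"treatment risk = {treatment_risk:.1%}")
print(f"relative risk  = {relative_risk:.2f}")
```

A relative risk below 1 means the reported risk of drowsiness was lower in the Seldane-D group than in the placebo group.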
Case Study: Conclusion
• This was a randomized, controlled experiment.
• The relationship between use of Seldane-D allergy tablets and the presence of drowsiness is not statistically significant.
• The evidence does not support the claim that Seldane-D causes drowsiness in some people.
A Caution About Sample Size, Statistical Significance, and Chi-Square
• Sample size affects the chi-square statistic even when the table percentages stay the same.
• For example, if n = 850 (instead of 567) and all percentages remain the same, the chi-square would be 3.87 (instead of 2.58). Would the conclusion change?
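The reason for this caution is that, with the percentages held fixed, the chi-square statistic grows in proportion to the sample size. The sketch below first checks that property on the Thought Question 1 counts (doubling every cell doubles the statistic), then applies the same proportional scaling to the chi-square value quoted on the slide. The helper function is an illustrative assumption, not part of the original material.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]] (no correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Same percentages, twice the sample size: every cell count is doubled.
base_cells = [12, 18, 24, 16]
doubled_cells = [2 * count for count in base_cells]

base_chi = chi_square_2x2(*base_cells)
doubled_chi = chi_square_2x2(*doubled_cells)
print(f"n = 70:  chi-square = {base_chi:.2f}")
print(f"n = 140: chi-square = {doubled_chi:.2f}")

# Proportional scaling applied to the value quoted on the slide.
quoted_chi = 2.58
scaled_chi = quoted_chi * 850 / 567
print(f"n = 850: chi-square = {scaled_chi:.2f} (critical value 3.84)")
```

With the larger sample the scaled statistic crosses 3.84, so the same percentages would be declared statistically significant.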
Inference for Relative Risk
• Confidence intervals: if two risks are the same, the relative risk is 1, so check whether the confidence interval contains 1.
• Hypothesis tests: to test whether two individual risks are equal, test whether the relative risk is 1, using the chi-square value to find the P-value.
Case Study: Relationship between breast cancer and induced abortion. Daling et al. (1994), “Risk of breast cancer among young women: relationship to induced abortion,” Journal of the National Cancer Institute, Vol. 86, No. 21, pp. 1584-1592. Is the risk of breast cancer among women who have had an induced abortion different from the risk among those who have not?
Case Study: Sample. Relationship between breast cancer and induced abortion
• 845 breast cancer cases were identified in Washington State from 1983 to 1990.
• 910 control women were identified using random-digit dialing in the same area.
• Women born prior to 1944 were excluded.
Case Study: C.I. Results. Relationship between breast cancer and induced abortion
• The relative risk for breast cancer was 1.5, with the higher risk for women who had an induced abortion.
• A 95% confidence interval for the relative risk was 1.2 to 1.9 (given).
• Note that the confidence interval does not contain the value 1 (the risks are different).
Case Study: C.I. Results. Relationship between breast cancer and induced abortion
• No increased risk was found for women who had spontaneous abortions; the relative risk was 0.9.
• A 95% confidence interval for the relative risk was 0.7 to 1.2 (given).
• Note that the confidence interval does contain the value 1 (the risks are not significantly different).
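The decision rule used on these two slides is simply a check of whether the reported interval covers 1. A small sketch, where the function name is an assumption for illustration and the intervals are the ones quoted from the study:

```python
def interval_contains(low, high, value=1.0):
    """Return True if the interval [low, high] contains the given value."""
    return low <= value <= high


# 95% confidence intervals for the relative risk, as quoted from the study.
induced_contains_one = interval_contains(1.2, 1.9)
spontaneous_contains_one = interval_contains(0.7, 1.2)

print(f"induced abortion:     contains 1? {induced_contains_one}")
print(f"spontaneous abortion: contains 1? {spontaneous_contains_one}")
```

An interval that excludes 1 indicates a statistically significant difference between the two risks; an interval that includes 1 does not.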
Case Study (continued): Relationship between breast cancer and induced abortion. Is the risk of breast cancer among women who have had an induced abortion different from the risk among those who have not?
Case Study: The Hypotheses
• Null: The risk of developing breast cancer for women who have had an induced abortion is the same as the risk for women who have not had an induced abortion. [RR = 1]
• Alt: The risk of developing breast cancer for women who have had an induced abortion is different from the risk for women who have not had an induced abortion. [RR ≠ 1]
Case Study: Test Statistic and P-value
• Relative risk = 1.5
• We could also display the data in a 2×2 table and compute the chi-square value (9.75).
• The P-value (we will not compute this one, just take it from the study) is 0.002.
• Recall: Z = square root of the chi-square (for 2×2 tables).
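Although the slide takes the P-value from the study, the relationship Z = square root of chi-square makes it possible to sketch where such a value comes from. The snippet below assumes the chi-square value applies to a 2×2 table (1 degree of freedom) and uses only the standard library; the function name is an illustrative assumption.

```python
import math


def p_value_2x2(chi_sq):
    """Two-sided P-value for a chi-square statistic from a 2x2 table (1 df).

    Uses Z = sqrt(chi-square) and the standard normal tail:
    P(|Z| > z) = erfc(z / sqrt(2)).
    """
    z = math.sqrt(chi_sq)
    return math.erfc(z / math.sqrt(2))


chi_sq = 9.75  # chi-square value quoted on the slide
p_value = p_value_2x2(chi_sq)
print(f"z = {math.sqrt(chi_sq):.2f}, P-value = {p_value:.4f}")

# Sanity check: the critical value (1.96)^2 corresponds to P close to 0.05.
print(f"P-value at 1.96 squared = {p_value_2x2(1.96 ** 2):.3f}")
```

Under that assumption the result is close to the 0.002 reported by the study.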
Case Study: Decision
• Since the P-value is small, we reject chance as the reason for the relative risk (1.5) being different from 1.0.
• We find the result to be statistically significant.
• We reject the null hypothesis. The data provide evidence that the two population risks (of developing breast cancer) are not the same.
Two-Way Table: General Procedure
• The remainder of this chapter presents the general procedure for determining whether a significant relationship exists between two categorical variables with any number of levels.
• It shows how to analyze two-way tables with any number of rows and columns.
• The results also apply to the special case of 2×2 tables.
Case Study: Health Care in Canada and the U.S. Mark, D. B. et al., “Use of medical resources and quality of life after acute myocardial infarction in Canada and the United States,” New England Journal of Medicine, 331 (1994), pp. 1130-1135. The data are patients’ own assessments of their quality of life relative to what it had been before their heart attack (from patients who survived at least a year).
Case Study: Health Care in Canada and the U.S.
Case Study: Health Care in Canada and the U.S. Compare the Canadian group to the U.S. group in terms of feeling much better: 75 Canadians reported feeling much better, compared to 541 Americans. The groups appear greatly different, but look at the group totals.
Case Study: Health Care in Canada and the U.S. Compare the Canadian group to the U.S. group in terms of feeling much better: change the counts to percents. With this fairer comparison using percents, the groups appear very similar in terms of feeling much better.
Case Study: Health Care in Canada and the U.S. Is there a relationship between the explanatory variable (Country) and the response variable (Quality of life)? For each level of the explanatory variable (Country), look at the percents across all levels of the response variable (Quality of life). Conclude that a relationship exists if these distributions look significantly different.
Hypothesis Test
• In tests for two categorical variables, we are interested in whether a relationship observed in a single sample reflects a real relationship in the population.
• Null: the percentages for one variable are the same for every level of the other variable (no real relationship).
• Alt: the percentages for one variable vary over the levels of the other variable (there is a real relationship).
Case Study: Health Care in Canada and the U.S. Null hypothesis: the percentages for one variable are the same for every level of the other variable (no real relationship). For example, we could look at the differences in percentages between Canada and the U.S. for each level of “Quality of life”: 24% vs. 25% for those who felt ‘Much better’, 23% vs. 23% for ‘Somewhat better’, etc. *We want to do all of these comparisons as one overall test.
Hypothesis Test
• H0: there is no real relationship between the two categorical variables that make up the rows and columns of a two-way table.
• To test H0, compare the observed counts in the table (the original data) with the expected counts (the counts we would expect if H0 were true).
• If the observed counts are far from the expected counts, that is evidence against H0 in favor of a real relationship between the two variables.
Expected Counts
• The expected count in any cell of a two-way table (when H0 is true) is
expected count = (row total × column total) / table total
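As an illustration, the expected counts can be computed cell by cell using the standard rule: row total times column total, divided by the table total. The sketch below applies it to the Thought Question 1 table, since those counts are fully stated in the text; the function name and table layout are assumptions for illustration.

```python
def expected_counts(observed):
    """Expected cell counts under H0 for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [
        [row_total * col_total / table_total for col_total in col_totals]
        for row_total in row_totals
    ]


# Rows: Democrat, Republican. Columns: cut taxes, balance budget.
observed = [[12, 18], [24, 16]]
expected = expected_counts(observed)

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```

The expected counts keep the same row and column totals as the observed table; only the way those totals are spread over the cells changes.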
Case Study: Health Care in Canada and the U.S. For the observed data, find the expected count for each cell. For example, the expected count of Canadians who feel ‘Much better’ is the expected count for Row 1, Column 1, found from the Row 1 total, the Column 1 total, and the table total.
Case Study: Health Care in Canada and the U.S. Compare the observed counts with the expected counts to see if the data support the null hypothesis.
Chi-Square Statistic
• To determine whether the differences between the observed counts and expected counts are statistically significant (showing a real relationship between the two categorical variables), we use the chi-square statistic:
X² = Σ (observed count − expected count)² / expected count
where the sum is over all cells in the table.
Chi-Square Statistic
• The chi-square statistic is a measure of the distance of the observed counts from the expected counts.
• It is always zero or positive.
• It is zero only when the observed counts are exactly equal to the expected counts.
• Large values of X² are evidence against H0 because they show that the observed counts are far from what would be expected if H0 were true.
• The chi-square test is one-sided (any violation of H0 produces a large value of X²).
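These properties can be seen in a short sketch of the general calculation: sum (observed − expected)² / expected over all cells, for a table with any number of rows and columns. The function names are illustrative assumptions; the first table is Thought Question 1, and the second is a hypothetical table constructed so that both rows have identical percentages.

```python
def expected_counts(observed):
    """Expected counts under H0: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell in the table."""
    expected = expected_counts(observed)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Thought Question 1 (Democrat / Republican by cut taxes / balance budget).
thought_question = chi_square_statistic([[12, 18], [24, 16]])

# Hypothetical table in which both rows split 1/3 to 2/3, so observed = expected.
identical_rows = chi_square_statistic([[10, 20], [20, 40]])

print(f"Thought Question 1: X^2 = {thought_question:.2f}")
print(f"Identical row percentages: X^2 = {identical_rows:.2f}")
```

Each term in the sum is a squared difference divided by a positive count, which is why the statistic can never be negative and is zero only when every observed count matches its expected count.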
Case Study: Health Care in Canada and the U.S. Observed counts and expected counts.
Chi-Square Test
• Calculate the value of the chi-square statistic, either by hand (cumbersome) or using technology (computer software, etc.).
• Find the P-value in order to reject or fail to reject H0, either from a chi-square table for the chi-square distribution (next few slides) or from computer output.
• If a significant relationship exists (small P-value): compare appropriate percents in the data table, compare individual observed and expected cell counts, and look at individual terms in the chi-square statistic.
Case Study: Health Care in Canada and the U.S. Using technology.
Chi-Square Distributions
• A family of distributions that take only positive values and are skewed to the right.
• A specific chi-square distribution is specified by giving its degrees of freedom (formula on next slide).
Chi-Square Distributions
• The chi-square test for a two-way table with r rows and c columns uses critical values from a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
• The P-value is the area to the right of X² under the density curve of the chi-square distribution.
• Use a chi-square table to find it.
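A short sketch of both ideas follows. The degrees-of-freedom function is the rule stated above. The tail-area function is a swapped-in shortcut rather than the table lookup the slides describe: it uses a closed form that holds only when the degrees of freedom are even, so a chi-square table or statistical software is still needed in general. The 2-row, 5-column table and the value 9.49 (the standard tabled 0.05 critical value for 4 degrees of freedom) are illustrative assumptions.

```python
import math


def degrees_of_freedom(rows, cols):
    """Degrees of freedom for a two-way table: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


def upper_tail_even_df(x, df):
    """Area to the right of x under a chi-square curve with even df.

    Closed form for even df only:
    exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


df = degrees_of_freedom(2, 5)  # hypothetical table: 2 rows, 5 columns
p_value = upper_tail_even_df(9.49, df)

print(f"degrees of freedom = {df}")
print(f"area to the right of 9.49 = {p_value:.3f}")
```

For a 2×2 table the same rule gives (2 − 1)(2 − 1) = 1 degree of freedom, which is the case behind the 3.84 critical value used earlier in the chapter.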