More Contingency Tables & Paired Categorical Data

Lecture 8

A Larger Contingency Table
• A 4-by-2 contingency table.
• (Made-up data filled into empty cells from last class.)

Estimated Distributions
• The Conditional Distributions are the distributions of the response within each level of the predictor. For example:
  • No Exercise: 79/217 = .364 experienced a cold/flu; 138/217 = .636 did not.
  • Light Exercise: 96/222 = .432 did; 126/222 = .568 did not.
  • Etc.
• The Marginal Distribution is the distribution of the response if we ignore information about the predictor:
  • Cold/flu: 320/750 = .427
  • No cold/flu: 430/750 = .573
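A minimal Python sketch of the two kinds of distribution, using only the counts quoted on this slide (the No Exercise and Light Exercise rows plus the column totals). The function names are illustrative and not taken from the course materials.

```python
# Counts quoted on the slide: (cold/flu, no cold/flu) for two of the four
# exercise levels. The other two rows of the 4-by-2 table are not shown here.
rows = {
    "No Exercise": (79, 138),
    "Light Exercise": (96, 126),
}
# Column totals over all 750 subjects.
col_totals = (320, 430)


def conditional(row):
    """Distribution of the response within one level of the predictor."""
    total = sum(row)
    return tuple(count / total for count in row)


def marginal(totals):
    """Distribution of the response when the predictor is ignored."""
    grand = sum(totals)
    return tuple(t / grand for t in totals)


for level, counts in rows.items():
    p_cold, p_none = conditional(counts)
    print(f"{level}: {p_cold:.3f} cold/flu, {p_none:.3f} no cold/flu")

p_cold, p_none = marginal(col_totals)
print(f"Marginal: {p_cold:.3f} cold/flu, {p_none:.3f} no cold/flu")
```

The printed proportions (.364/.636, .432/.568, and .427/.573) match the values on the slide.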

To Summarize Distributions in a Table

Expected Values Under the Null
• Under independence, the expected count in each cell is (row total × column total)/(grand total).
• The approximate values are due to round-off error in the estimated probabilities.
• Note that we avoided some round-off error by calculating 92.59 directly from the totals as 217*320/750.
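The expected-count rule can be sketched in Python as follows. Only the margins quoted on the slides are used (750 subjects, 320 with a cold/flu, and the row totals 217 and 222); the helper name `expected_count` is our own. The last lines contrast the direct calculation with the round-off that appears when the rounded probabilities are multiplied instead.

```python
def expected_count(row_total, col_total, n):
    """Expected cell count under independence: row total * column total / n."""
    return row_total * col_total / n


n = 750                                    # all subjects
col_totals = {"cold/flu": 320, "no cold/flu": 430}
# Row totals for the two exercise levels whose counts appear on the slides.
row_totals = {"No Exercise": 217, "Light Exercise": 222}

for level, r in row_totals.items():
    cells = {resp: round(expected_count(r, c, n), 2) for resp, c in col_totals.items()}
    print(level, cells)

# Going through rounded probabilities instead (n * p_row * p_col, each p
# rounded to 3 places) introduces the round-off the slide mentions.
approx = n * round(217 / n, 3) * round(320 / n, 3)
print("via rounded probabilities:", round(approx, 2))
```

The direct route reproduces 92.59 for the No Exercise/cold-flu cell, while the rounded-probability route gives about 92.55.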

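Test Statistic and Sampling Distribution
• A test of independence of the two variables (Exercise Level and Cold/Flu) will be carried out using a chi-square test statistic with (r-1)(c-1) = (4-1)(2-1) = 3 degrees of freedom.
• The test statistic is calculated as
  \chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
  where O is the observed count and E is the expected count under the null hypothesis in each cell.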
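A short Python sketch of the Pearson chi-square statistic and its degrees of freedom for any r-by-c table. The complete 4-by-2 lecture table is not reproduced in these notes, so the example uses a hypothetical table; applied to the full class table, the same computation would be expected to give the 6.69 reported on the next slide.

```python
def pearson_chi_square(observed):
    """Pearson chi-square statistic and d.f. for an r-by-c table of counts.

    `observed` is a list of rows, e.g. one row per exercise level and one
    column per response category.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under H0
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 4-by-2 table in which every row has the same 1:2 split,
# so observed equals expected in every cell.
toy = [[1, 2], [2, 4], [3, 6], [4, 8]]
print(pearson_chi_square(toy))
```

For this toy table the statistic is 0 with 3 d.f., since the conditional distributions are identical across rows.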

Hypothesis Test
• Assumptions
  • Random, independent sample
  • Groups collected independently
  • “Large sample”
• Hypotheses
  • H0: the conditional distributions are all equal
  • HA: the conditional distributions are not all equal
• Test Statistic
  • Chi-square = 6.69, compared to a chi-square distribution with 3 d.f.

Hypothesis Test, cont.
• P-value/Rejection Region
  • The critical values are 7.815 for .05 significance, 7.407 for .06, 7.060 for .07, and 6.251 for .10.
  • Since 6.69 < 7.815, we fail to reject at the 0.05 level. The p-value is between .07 and .10.
• Conclusion
  • At the Type I error rate of .05, we fail to reject the null hypothesis. There is not enough evidence to say that the probability of getting a cold or flu depends on exercise level.
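Instead of bracketing the p-value with a table, the upper tail of a chi-square distribution with 3 d.f. has a closed form, P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/π)·exp(−x/2), which needs only Python's standard `math` module. This sketch (the function name is our own) checks the slide's critical values and then evaluates the tail area beyond 6.69.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 d.f.

    Closed form valid for 3 d.f. only:
    erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Critical values quoted on the slide and their tail areas.
for crit in (7.815, 7.407, 7.060, 6.251):
    print(crit, round(chi2_sf_df3(crit), 3))

print("p-value for 6.69:", round(chi2_sf_df3(6.69), 3))
```

The critical values come out at tail areas of about .05, .06, .07, and .10, and the p-value for 6.69 is a little above .08, consistent with the bracket of .07 to .10 on the slide.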

Matched Categorical Data
• Data may be matched/paired with respect to the risk factor or the response.
• Matching on the risk factor (not directly discussed in the text)
  • Differences of proportions, relative risks, and odds ratios are all appropriate, but the formulas and the set-up of the contingency table will be different. We will focus on odds ratios, which will be calculated in the same way as for the matched case-control study.
• Matching on the response (matched case-control study)
  • Only the odds ratio is an appropriate measure of the association between the risk factor and the response.
• In both cases, inference focuses on the pair.

A Matched Case-Control Study on CAD
• Each of 59 adults with coronary artery disease (CAD) was matched with an adult who did not develop CAD but was of the same gender, age, ethnicity, and socio-economic status.
• Of interest was whether drinking 2 or more glasses of red wine per week (on average) was associated with development of CAD.

Table for Matched Case-Control Data
• Do NOT use the standard contingency table that summarizes information about the individual subjects.
• Instead, use the following table to summarize information about the pairs.
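A minimal Python sketch of building a table of pairs rather than of individuals. The slide's table layout is not reproduced in these notes, so the cell labels below are our own assumption; the point is only that each pair, not each person, lands in exactly one cell. The data are a hypothetical toy example, not the CAD study pairs.

```python
from collections import Counter


def pair_table(pairs):
    """Cross-classify matched pairs by the exposure status of each member.

    `pairs` is an iterable of (case_exposed, control_exposed) values, one per
    pair. Every count in the result is a number of pairs, not of people.
    """
    counts = Counter((bool(case), bool(control)) for case, control in pairs)
    return {
        "both exposed": counts[(True, True)],
        "case exposed only": counts[(True, False)],
        "control exposed only": counts[(False, True)],
        "neither exposed": counts[(False, False)],
    }


# Hypothetical toy data: five pairs, not the 59 CAD pairs from the study.
toy_pairs = [(True, False), (True, True), (False, True), (True, False), (False, False)]
print(pair_table(toy_pairs))
```

The "case exposed only" and "control exposed only" cells are the discordant pairs on which inference is based.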

Physician Adherence Study: Matching on Predictor
• Suppose that investigators were interested in whether a particular educational intervention had an effect on whether physicians prescribe a particular treatment plan for their asthma patients.
• 75 physicians are rated on whether they prescribe the treatment plan both before and after the educational intervention.

Estimation and Inference for Matched Categorical Data
• We CANNOT use the formulas for the CI of the odds ratio given before, because the two groups of subjects (whether “exposure” groups or case/control groups) are not chosen independently.
• Inferences will be based on the discordant pairs, that is, the pairs in which the members “disagree”:
  • on the predictor variable, for case-control studies
  • on the response variable, when subjects are matched with respect to the predictor

Labeling Cell (Pair) Counts & Estimation of Odds Ratio
• The odds ratio is estimated as R/S.
• Interpretation: The odds that a person in group 2 is “exposed” are R/S times the odds that a group 1 member is “exposed.”
• Or: The odds that an “exposed” person is in group 2 are R/S times the odds that an “unexposed” person is in group 2.

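CI for Odds Ratio
• The 95% confidence interval for the (natural) log of the odds ratio is
  \ln(R/S) \pm 1.96\sqrt{\frac{1}{R} + \frac{1}{S}}
• Exponentiating the two endpoints gives the 95% CI for the odds ratio itself.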
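The estimate R/S and its log-scale interval can be sketched in Python as below; the function name is our own, and `z = 1.96` corresponds to 95% confidence. The usage loop plugs in the discordant-pair counts from the two examples in this lecture (14 and 10 for CAD, 25 and 12 for the physicians).

```python
import math


def matched_odds_ratio_ci(r, s, z=1.96):
    """Odds ratio R/S from discordant pair counts, with a CI built on the log scale.

    r, s: counts of the two kinds of discordant pairs (R and S on the slide).
    z: normal quantile; 1.96 gives a 95% interval.
    """
    or_hat = r / s
    log_or = math.log(or_hat)
    se = math.sqrt(1 / r + 1 / s)           # standard error of ln(R/S)
    lo_log, hi_log = log_or - z * se, log_or + z * se
    return or_hat, (lo_log, hi_log), (math.exp(lo_log), math.exp(hi_log))


for label, r, s in [("CAD", 14, 10), ("Physicians", 25, 12)]:
    or_hat, log_ci, ci = matched_odds_ratio_ci(r, s)
    print(f"{label}: OR = {or_hat:.3f}, "
          f"log-OR CI = ({log_ci[0]:.3f}, {log_ci[1]:.3f}), "
          f"OR CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For CAD this gives (-.475, 1.148) on the log scale and (.622, 3.152) for the OR. For the physicians it gives (.046, 1.422) and (1.047, 4.147); exponentiating the unrounded upper endpoint gives about 4.147, whereas exponentiating the rounded value 1.422 gives 4.145.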

CAD Example – Odds Ratio
• There are more pairs in which the case drinks fewer than 2 and the control drinks 2 or more than pairs in which the case drinks 2 or more and the control drinks fewer than 2. Thus, in this sample, drinking 2 or more appears to have a “protective effect.”
• The odds ratio is 14/10 = 1.4.
• The odds of not developing CAD for someone who has at least two drinks per week are 1.4 times the odds for someone who has fewer than two drinks per week.
• Equivalently, the odds of developing CAD for those who drink fewer than two drinks per week are 1.4 times the odds for someone who drinks at least two drinks per week.

CAD Eg. – CI for Odds Ratio
• The 95% CI for the log of the OR is log(1.4) +/- 1.96*sqrt(1/14 + 1/10) = (-.475, 1.148).
• The 95% CI for the OR is (.622, 3.152).
• With 95% confidence, the odds of developing CAD for those who drink fewer than two drinks per week are between .622 and 3.152 times the odds for someone who drinks at least two drinks per week.
• This interval includes 1; therefore, the effect of drinking at least two drinks per week is not significant at the .05 level.
• However, the interval is very wide, so…

Physician Intervention: Odds Ratio
• Note that there are more pairs in which the physician prescribes the treatment plan after the intervention but not before than pairs in which the physician prescribes the treatment plan before but not after.
• The odds ratio is calculated as 25/12 = 2.083.
• The odds that a physician will prescribe the treatment plan after the intervention are 2.083 times the odds that a physician will prescribe it before the intervention.

Physicians – CI for Odds Ratio
• The 95% CI for the log of the odds ratio is ln(2.083) +/- 1.96*sqrt(1/25 + 1/12) = (.046, 1.422).
• The 95% CI for the odds ratio is (1.047, 4.147).
• There is a significant effect of the intervention, since 1 is not included in the interval.

Hypothesis Testing in Matched Designs
• Again, the test involves comparing the discordant pairs.
• In particular, if the predictor and response are independent, one would expect the two types of discordant pairs to have equal population proportions. If there is inequality in the sample, is it possible that the inequality is just due to chance?

Hypothesis Test – The Steps
• Assumptions
  • Random, independent selection of pairs
  • Large sample (R + S > 10)
• Hypotheses
  • H0: Predictor and response are independent variables
  • HA: Predictor and response are associated
• Test Statistic
  \chi^2 = \frac{(R - S)^2}{R + S}
  With Yates’ continuity correction,
  \chi^2 = \frac{(|R - S| - 1)^2}{R + S}
• P-value: Compare to a chi-square distribution with 1 d.f.
• Conclusion: per usual
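A minimal Python sketch of this test (often called McNemar's test) using only the standard library. For 1 d.f. the chi-square upper tail is erfc(sqrt(x/2)), so no table lookup is needed; the function name and the `yates` flag are our own. The usage lines use the discordant-pair counts from the two examples in this lecture.

```python
import math


def mcnemar(r, s, yates=False):
    """Chi-square statistic for matched pairs from the discordant counts R and S.

    Returns (statistic, p_value); the p-value is the upper tail of a
    chi-square distribution with 1 d.f., computed as erfc(sqrt(x / 2)).
    """
    diff = abs(r - s)
    if yates:
        diff -= 1                  # Yates' continuity correction: (|R - S| - 1)^2
    stat = diff ** 2 / (r + s)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


for label, r, s in [("CAD", 14, 10), ("Physicians", 25, 12)]:
    stat, p = mcnemar(r, s)
    print(f"{label}: chi-square = {stat:.3f}, p = {p:.4f}")
```

Without the correction this gives .667 (p about .41) for CAD and 4.568 (p about .033) for the physicians, inside the table brackets used on the next two slides. With `yates=True` the statistics shrink to .375 and about 3.89.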

CAD – Hypothesis Test
• Assumptions
  • Random, independent selection of pairs
  • Large sample (R + S = 24 > 10)
• Hypotheses
  • H0: Drinking and CAD are independent variables
  • HA: Drinking and CAD are associated
• Test Statistic (without Yates’ correction): (14-10)²/(14+10) = 16/24 = .667
• P-value: From Table A5.7, the p-value is between .4028 and .4386.
• Conclusion: There is insufficient evidence to reject the null hypothesis that drinking and CAD are independent.

Physician – Hypothesis Test
• Assumptions
  • Random, independent selection of pairs
  • Large sample (R + S = 37 > 10)
• Hypotheses
  • H0: Participation in the intervention and prescription of the treatment plan are independent variables
  • HA: Participation in the intervention and prescription of the treatment plan are associated
• Test Statistic (without Yates’ correction): (25-12)²/(25+12) = 169/37 = 4.568
• P-value: between .0320 and .0339
• Conclusion: At the 0.05 significance level, reject the null in favor of the alternative that the intervention does have an effect on whether physicians prescribe the treatment plan.

Homework
• Textbook Reading
  • Chapter 29, first two sections
  • Reread Chapter 9 (has info about the OR for paired case-control studies)
  • (Last time: Chapter 8, Chapter 26)
• When doing calculations for this class, you may ignore the Yates’ continuity correction.
• Homework Problems
