Chapter 18 Cross-Tabulated Counts
In Chapter 18: • 18.1 Types of Samples • 18.2 Naturalistic and Cohort Samples • 18.3 Chi-Square Test of Association • 18.4 Test for Trend • 18.5 Case-Control Samples • 18.6 Matched Pairs
§18.1 Types of Samples • The prior chapter considered categorical response variables with two possible outcomes • This chapter considers categorical variables with any number of possible outcomes
Types of Samples, cont. Data may be generated by: I. Naturalistic Samples. An SRS with data then cross-classified according to the explanatory variable and response variable. II. Purposive Cohort Samples. Fixed numbers of individuals selected according to the explanatory factor. III. Case-Control Samples. Fixed numbers of individuals selected according to the outcome variable.
Naturalistic Samples Take an SRS from the population; then cross-classify individuals with respect to explanatory and response variables.
Purposive Cohort Samples Select predetermined numbers of exposed and nonexposed individuals; then ascertain outcomes in individuals.
Case-Control Samples Identify individuals who are positive for the outcome (cases); then sample the population for negative (controls).
§18.2 Naturalistic and Cohort Samples • Data from a naturalistic sample are shown in this 5-by-2 table • For uniformity, always put the explanatory variable in the rows of such tables • Totals are tallied in the table margins
Marginal Distributions • For naturalistic samples (only) describe marginal distributions • These may be reported graphically or in terms of percentages • Top figure: column marginal distribution • Bottom figure: row marginal distribution
Conditional Percents • The relationship between the row variable and column variable is explored with conditional percents. There are two types of conditional percents: • Row percents are used in cohort and naturalistic samples (they describe prevalence and incidence) • Column percents are used in case-control samples
Incidence and Prevalence (Naturalistic and Cohort Samples only) • The top table demonstrates R-by-C table notation (R rows and C columns) • For naturalistic and cohort samples, row percents in column 1 represent group incidences or prevalences
Prevalences - Example This table shows prevalence by education level Example of calculation, prevalence group 1:
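The row-percent calculation can be sketched in a few lines of Python. The counts below are hypothetical (they are not the education-level data from the slide) and the group labels are placeholders; the point is only that each group's prevalence is its column-1 count divided by its row total.

```python
# Hypothetical R-by-2 counts (NOT the slide's data): each row is an exposure
# group, columns are (outcome present, outcome absent).
counts = {
    "group 1": (30, 70),
    "group 2": (20, 80),
    "group 3": (10, 90),
}


def row_percents(table):
    """Return, for each row, the percent of the row total found in column 1."""
    result = {}
    for group, (positive, negative) in table.items():
        row_total = positive + negative
        result[group] = 100 * positive / row_total
    return result


prevalence = row_percents(counts)
for group, pct in prevalence.items():
    print(f"{group}: prevalence = {pct:.1f}%")
```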
Relative Risks, R-by-2 Tables Let group 1 represent the least exposed group Relative risks are calculated as follows: $RR_i = p_i / p_1$, where $p_i$ is the incidence or prevalence in group $i$
RRs, R-by-2 Tables, Example This table lists RR for the illustrative data Notice the downward dose-response in RRs Example of calculation
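A minimal sketch of the relative-risk calculation, assuming each RR is the group's proportion divided by the proportion in group 1 (the least exposed group). The counts are hypothetical and are not the illustrative data from the slide.

```python
# Hypothetical counts (NOT the slide's data): (outcome present, row total),
# ordered so that the first row is the least exposed (reference) group.
groups = [(30, 100), (20, 100), (10, 100)]


def relative_risks(rows):
    """Return each group's proportion divided by the reference group's proportion."""
    proportions = [positive / total for positive, total in rows]
    reference = proportions[0]
    return [p / reference for p in proportions]


rrs = relative_risks(groups)
for index, rr in enumerate(rrs, start=1):
    print(f"RR for group {index} vs. group 1: {rr:.2f}")
```

With these made-up counts the RRs fall as the group number rises, which is the pattern a downward dose-response would produce.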
Responses with More than Two Levels of Outcome Efficacy of Echinacea. A randomized controlled clinical trial pitted echinacea vs. placebo in the treatment of upper respiratory symptoms in children. The response variable was severity of illness classified as: mild, moderate or severe. Source: JAMA 2003, 290(21), 2824-30
Echinacea, Conditional Distributions • Row percents are calculated to determine the incidence of each outcome. • Example of calculation, top right table cell (data on prior slide): % severe w/echinacea = 48 / 329 × 100% = 14.6% • Conclusion: the treatment group fared slightly worse than the control group: 14.6% of the treatment group experienced severe symptoms compared to 10.9% of the control group.
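§18.3 Chi-Square Test of Association A. Hypotheses. H0: no association in population versus Ha: association in population B. Test statistic. $X^2_{stat} = \sum (O_i - E_i)^2 / E_i$, where $O_i$ is the observed count in table cell $i$ and $E_i$ = (row total × column total) / table total is its expected count, with (R − 1)(C − 1) degrees of freedom. C. P-value. Convert the X2stat to a P-value with Table E or a software program.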
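Step B can be sketched as follows, assuming the usual Pearson form in which each expected count is (row total × column total) / table total. The table is hypothetical, not the smoking-by-education data.

```python
def chi_square_stat(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for an R-by-C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 3-by-2 table (NOT the slide's data).
example_table = [[30, 70], [20, 80], [10, 90]]
x2_stat, df = chi_square_stat(example_table)
print(f"X2stat = {x2_stat:.2f} with {df} df")
```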
Chi-Square Test - Example Data below reveal a negative association between smoking and education level. Let us test H0: no association in the population vs. Ha: association in the population.
Chi-Square Test, P-value • X2stat = 13.20 with 4 df • Using Table E, find the row for 4 df • Find the chi-square values in this row that bracket 13.20 • Bracketing values are 11.14 (P = .025) and 13.28 (P = .01). • Thus, .01 < P < .025 (closer to .01)
Illustrative example X2stat= 13.20 with 4 df The P-value = AUC in the tail beyond X2stat
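As a cross-check on the table lookup, the tail area can be computed directly. This sketch uses the closed-form upper-tail probability that holds when the degrees of freedom are even; the function name is made up for illustration, and a statistics package would normally be used instead.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail area of a chi-square distribution; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    series = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * series


p_value = chi2_upper_tail_even_df(13.20, 4)
print(f"P = {p_value:.4f}")
```

The result falls between .01 and .025, close to .01, in agreement with the bracketing values from Table E.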
Chi-Square By Computer Here are results for the illustrative data from WinPepi > Compare2.exe > Program F Categorical Data
Yates’ Continuity Corrected Chi-Square Statistic • Two different chi-square statistics are used in practice • Pearson’s chi-square statistic (covered) is $X^2 = \sum (O_i - E_i)^2 / E_i$ • Yates’ continuity-corrected chi-square statistic is $X^2_c = \sum (|O_i - E_i| - 0.5)^2 / E_i$ • The continuity-corrected method produces smaller chi-square statistics and larger P-values.
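The two statistics can be compared side by side. This is a sketch on a hypothetical 2-by-2 table, assuming the standard forms: Pearson sums (O − E)² / E, and Yates subtracts 0.5 from each |O − E| before squaring.

```python
def chi_square(table, yates=False):
    """Pearson chi-square for a table; with yates=True apply the 0.5 continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            deviation = abs(observed - expected)
            if yates:
                deviation -= 0.5
            stat += deviation ** 2 / expected
    return stat


# Hypothetical 2-by-2 table (NOT from the slides).
two_by_two = [[20, 80], [10, 90]]
pearson = chi_square(two_by_two)
corrected = chi_square(two_by_two, yates=True)
print(f"Pearson = {pearson:.3f}, Yates = {corrected:.3f}")
```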
Chi-Square, cont. 1. How the chi-square works. When observed values = expected values, the chi-square statistic is 0. As the differences between observed and expected values get large, the chi-square statistic gets large and evidence against H0 mounts. 2. Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values that are less than 5.
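The small-sample rule can be checked mechanically before running the test. The sketch below computes the share of cells with an expected value below 5; the table and the function name are hypothetical.

```python
def share_of_small_expected(table, cutoff=5):
    """Return the fraction of cells whose expected count is below the cutoff."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    expected = [
        row_totals[i] * col_totals[j] / grand_total
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    ]
    return sum(1 for e in expected if e < cutoff) / len(expected)


# Hypothetical small-sample table (NOT from the slides).
small_table = [[3, 7], [2, 8]]
share = share_of_small_expected(small_table)
chi_square_ok = share <= 0.20
print(f"{share:.0%} of cells have expected counts below 5; chi-square OK: {chi_square_ok}")
```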
Chi-Square, cont. 3. Supplement chi-squares with measures of association. Chi-square statistics do not measure the strength of association. Use descriptive statistics or RRs to quantify “strength”. 4. For 2-by-2 tables, chi-square and z tests (Ch 17) produce identical P-values. The relationship between the statistics is: $X^2_{stat} = (z_{stat})^2$
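Point 4 can be verified numerically on a 2-by-2 table. This sketch assumes the Chapter 17 z test is the two-proportion test with a pooled standard error; the counts are hypothetical.

```python
import math

# Hypothetical 2-by-2 table (NOT from the slides): rows are groups,
# columns are (outcome present, outcome absent).
a, b = 20, 80   # group 1
c, d = 10, 90   # group 2

n1, n2 = a + b, c + d
p1, p2 = a / n1, c / n2
pooled = (a + c) / (n1 + n2)

# Two-proportion z statistic with pooled standard error (assumed form).
z_stat = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Pearson chi-square statistic for the same table.
total = n1 + n2
x2_stat = 0.0
for observed, row_total, col_total in [
    (a, n1, a + c), (b, n1, b + d), (c, n2, a + c), (d, n2, b + d)
]:
    expected = row_total * col_total / total
    x2_stat += (observed - expected) ** 2 / expected

print(f"z^2 = {z_stat ** 2:.4f}, X2 = {x2_stat:.4f}")
```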
§18.4 Test for Trend See pp. 431 – 436
§18.5 Case-Control Samples Case-control sampling method • Identify all cases in the population • From the same source population, randomly select a series of non-cases (controls) • Ascertain the exposure status of cases and controls • Cross-tabulate the exposure status of cases and controls This provides an efficient way to study rare outcomes
Incidence Density Sampling As cases are identified in the population, select at random one or more noncases (controls) for each case at its time of occurrence. This advanced concept allows students to see that case-control studies are a type of longitudinal “time-failure” design.
Case-Control Illustrative Example • Cases: men diagnosed with esophageal cancer • Controls: noncases selected at random from electoral lists in the same region • Exposure = alcohol consumption dichotomized at 80 g/day Interpretation: The rate ratio associated with high alcohol consumption is about 5.6.
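The odds ratio from a 2-by-2 case-control table is the cross-product ratio. In the sketch below the four counts are assumed for illustration (they are not read from the slide) and were chosen so that the result lands near the 5.6 quoted above.

```python
# Counts ASSUMED for illustration only; they are not taken from the slide.
exposed_cases, unexposed_cases = 96, 104
exposed_controls, unexposed_controls = 109, 666


def odds_ratio(a, b, c, d):
    """Cross-product ratio: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


or_hat = odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
print(f"OR = {or_hat:.2f}")
```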
(1– α)100% CI for the OR Note use of the natural logarithmic scale
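One common way to build this interval on the natural-log scale is the Woolf-type formula, exp(ln OR ± z × SE) with SE = sqrt(1/a + 1/b + 1/c + 1/d). That this matches the slide's formula is an assumption, and the counts are illustrative rather than taken from the slide.

```python
import math
from statistics import NormalDist


def or_confidence_interval(a, b, c, d, alpha=0.05):
    """(1 - alpha)100% CI for the odds ratio, computed on the natural-log scale."""
    or_hat = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = math.exp(math.log(or_hat) - z * se_ln_or)
    upper = math.exp(math.log(or_hat) + z * se_ln_or)
    return or_hat, lower, upper


# Counts ASSUMED for illustration only: exposed cases, unexposed cases,
# exposed controls, unexposed controls.
or_hat, lower, upper = or_confidence_interval(96, 104, 109, 666)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Working on the log scale keeps the lower limit above zero and gives an interval that is asymmetric around the OR, as it should be for a ratio.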
Case-Control - Example Results from WinPepi > Compare2.exe > A. WinPepi uses a slightly different formula than ours; the Mid-P results are similar to ours.
Case-Control Studies with Multiple Levels of Exposure With an ordinal exposure, compare each exposure level to the non-exposed group (next slide):
Case-Control, Ordinal Levels of Exposure Note the dose-response relationship
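Comparing each exposure level to the non-exposed group amounts to a series of 2-by-2 odds ratios that share one reference row. A sketch with hypothetical counts (not the data from the slide):

```python
# Hypothetical (cases, controls) per exposure level, NOT the slide's data.
# The first entry is the non-exposed reference level.
levels = [(20, 100), (30, 75), (40, 50)]


def odds_ratios_vs_reference(rows):
    """Return the OR of each exposure level relative to the first (reference) level."""
    ref_cases, ref_controls = rows[0]
    return [
        (cases * ref_controls) / (controls * ref_cases)
        for cases, controls in rows
    ]


ors = odds_ratios_vs_reference(levels)
for level, value in enumerate(ors):
    print(f"exposure level {level}: OR = {value:.2f}")
```

ORs that rise steadily with exposure level, as they do for these made-up counts, are what a dose-response relationship looks like in this layout.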
§18.6 Matched Pairs • With matched-pair samples, each participant is carefully matched to a unique individual as part of the selection process • This technique is used to mitigate confounding by the matching factor • Both cohort and case-control samples may avail themselves of matching
Here’s the notation for matched-pair case-control data: The odds ratio associated with exposure is: The confidence interval is:
Matched Pairs - Example A matched case-control study found 45 pairs in which the case but not the control had a low fruit/veg diet; it found 24 pairs in which the control but not the case had a low fruit/veg diet. The odds ratio of 45 / 24 = 1.88 suggests 88% higher risk in low fruit/veg consumers.
Matched Pair Example, cont. Data are compatible with ORs between 1.14 and 3.07. WinPepi’s PairEtc.exe program A calculates exact confidence intervals for ORs from matched-pair data. Hand-calculated limits will be similar except in small samples.
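The matched-pair example can be reproduced from the two discordant-pair counts alone. The sketch assumes the usual formulas: the OR is the ratio of discordant pairs, and the interval is exp(ln OR ± z × sqrt(1/b + 1/c)). The variable names are illustrative and may not match the slide's notation.

```python
import math
from statistics import NormalDist

case_only_exposed = 45      # pairs where only the case had a low fruit/veg diet
control_only_exposed = 24   # pairs where only the control had a low fruit/veg diet

# Matched-pair OR: ratio of the two discordant-pair counts.
or_hat = case_only_exposed / control_only_exposed

# Log-scale confidence interval (assumed form; see lead-in).
se_ln_or = math.sqrt(1 / case_only_exposed + 1 / control_only_exposed)
z = NormalDist().inv_cdf(0.975)
lower = math.exp(math.log(or_hat) - z * se_ln_or)
upper = math.exp(math.log(or_hat) + z * se_ln_or)

print(f"OR = {or_hat:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The limits agree with those on the slide to within rounding.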
Hypothesis Test, Matched Pairs A. H0: OR = 1 B. McNemar’s test statistic. C. P-values. Convert zstat to a P-value with Table B or Table F. If fewer than 5 discordancies are expected, use an exact binomial procedure (see text).
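A sketch of the test for the fruit/veg example, assuming the uncorrected z form of McNemar's statistic, z = (b − c) / sqrt(b + c), where b and c are the two discordant-pair counts. Some texts apply a continuity correction, so the exact form in the book may differ.

```python
import math
from statistics import NormalDist

b = 45  # pairs where only the case was exposed
c = 24  # pairs where only the control was exposed

# McNemar's statistic in z form, without continuity correction (assumed form).
z_stat = (b - c) / math.sqrt(b + c)

# Two-sided P-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"zstat = {z_stat:.2f}, two-sided P = {p_value:.4f}")
```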