1. HSRP 734: Advanced Statistical Methods, May 29, 2008
2. Finish talking about Association Measures: Odds Ratio OR=2 of Disease for Exposed vs. Not exposed
What is the interpretation?
“Exposed patients have twice the odds of disease versus patients that were not exposed.”
3. Finish talking about Association Measures: Relative Risk RR=2.5 of Disease for Exposed vs. Not exposed
What is the interpretation?
“Exposed patients are 2.5 times as likely to have the disease versus patients that were not exposed.”
4. Finish talking about Association Measures OR is not close to RR
unless Pr(disease) is low in both the Exposed and Not exposed groups
(“rare” disease)
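A quick numerical check of the rare-disease point. This is a minimal sketch: the two pairs of risks (0.02 vs. 0.01 and 0.60 vs. 0.30) are made-up illustrations, not data from the lecture.

```python
def relative_risk(p_exposed, p_unexposed):
    """RR = Pr(disease | exposed) / Pr(disease | not exposed)."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """OR = odds of disease among exposed / odds of disease among not exposed."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks: both pairs have RR = 2, but the OR stays near the RR
# only when the disease is rare in both groups.
for label, p1, p0 in [("rare", 0.02, 0.01), ("common", 0.60, 0.30)]:
    print(label,
          "RR =", round(relative_risk(p1, p0), 3),
          "OR =", round(odds_ratio(p1, p0), 3))
```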
5. Finish talking about Association Measures Confidence intervals for Odds Ratio
Confidence intervals for Relative Risk
6. Measures of Disease Association
7. Confidence Interval for Odds Ratio
8. Confidence Interval for Odds Ratio
9. Confidence Interval for Risk Ratio
10. Confidence Interval for Risk Ratio
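The interval formulas on these slides did not survive in the transcript. As a hedged sketch, the code below assumes the common large-sample approach: build a Wald interval on the log scale and exponentiate. The cell labels a, b, c, d and the counts in the usage line are hypothetical.

```python
import math


def or_rr_with_ci(a, b, c, d, z=1.96):
    """Point estimates and approximate 95% CIs for a 2x2 table.

    a, b = exposed with / without disease
    c, d = not exposed with / without disease
    Assumes the log-scale (Wald) method; all cells must be > 0.
    """
    or_hat = (a * d) / (b * c)
    rr_hat = (a / (a + b)) / (c / (c + d))

    # Standard errors of log(OR) and log(RR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    or_ci = (math.exp(math.log(or_hat) - z * se_log_or),
             math.exp(math.log(or_hat) + z * se_log_or))
    rr_ci = (math.exp(math.log(rr_hat) - z * se_log_rr),
             math.exp(math.log(rr_hat) + z * se_log_rr))
    return or_hat, or_ci, rr_hat, rr_ci


# Hypothetical counts, for illustration only
print(or_rr_with_ci(20, 80, 10, 90))
```

If either interval contains 1, the data are compatible with no association at the chosen confidence level.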
11. SAS Enterprise: or_rr.sas7bdat
12. SAS websites Online help:
http://support.sas.com/onlinedoc/913/docMainpage.jsp
UCLA: http://www.ats.ucla.edu/stat/SAS/
SAS SUGI:
http://support.sas.com/events/sasglobalforum/previous/index.html
13. Categorical Data Analysis Understand the Multinomial probability mass function
Compute Goodness-of-fit tests and chi-squared tests for association
Test for association in the presence of a possibly confounding third factor
(e.g., disease versus exposure from 3 sites)
14. Categorical Data Analysis Motivation
How do we estimate and test the magnitude of posited relationship when the outcome of interest is categorical?
e.g., An international study examines the relationship between age at first birth and the development of breast cancer
Age = categorized into age groups
15. Categorical Data Analysis
16. Categorical Data Analysis Research question
Is there a relationship between age at first birth and Cancer status?
Better to convert the table into percentages (easier to see)
Turns out that there is a significant relationship (p<0.001)
17. Categorical Data Analysis
Statistical techniques involve
Probability distribution for categorical data
Tests for relationship in a RxC table
R = # of Rows in Table
C = # of Columns in Table
18. Probability Distribution for Categorical Outcomes Fun for Friday night:
Go home and flip a quarter 10,000 times. Determine if there is evidence that one side lands face up more often than the other.
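For anyone who would rather not spend Friday night flipping, the experiment can be simulated. This is a sketch under assumptions: the seed is arbitrary, and the test used is the large-sample z-test for a proportion of 0.5, which is one reasonable choice rather than the method prescribed in the lecture.

```python
import math
import random


def fair_coin_z_test(heads, n):
    """Two-sided large-sample test of H0: Pr(heads) = 0.5."""
    z = (heads - n / 2) / math.sqrt(n / 4)         # (observed - expected) / SD under H0
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail area
    return z, p_value


rng = random.Random(734)                           # arbitrary seed for reproducibility
n = 10_000
heads = sum(rng.random() < 0.5 for _ in range(n))  # simulate a fair quarter
z, p = fair_coin_z_test(heads, n)
print(f"heads={heads}, z={z:.2f}, p={p:.3f}")
```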
19. Probability Distributions for Categorical Data Bernoulli (1 toss of a coin, outcome=H,T)
Binomial (10 tosses of a coin, outcome=0,1,2..,10 heads)
Multinomial (throw 10 balls into 4 pigeon holes ABCD, outcome= (3A,2B,1C,4D))
20. Why use multinomial for testing? Relationship between 2 categorical variables
RxC table analysis
Based on multinomial distribution
21. Why use multinomial for testing? Example:
2 level exposure status (Exposed, Not exposed),
3 level outcome (severe, mild, no disease)
Treat 2x3=6 outcomes as categorical or a multinomial distribution with 6 pigeon holes
The expected probabilities of the pigeon holes are specified under some assumption (e.g., independence)
22. Level of Measurement Categorical response
dichotomous
ordinal (>2 categories, ordered)
nominal (>2 categories, not ordered)
Dichotomous responses use the Binomial distribution
Ordinal and nominal responses use the Multinomial distribution
23. Multinomial Distribution Multinomial experiment:
Experiment consists of n identical and independent trials
Each trial results in one of K outcomes
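Let p_i be the probability of outcome i
a. Each p_i remains constant from trial to trial
b. $\sum_{i=1}^{K} p_i = 1$
The pmf for K outcomes, where $n_i$ is the number of trials resulting in outcome i and $n = \sum_{i} n_i$, is:
$P(n_1, \ldots, n_K) = \dfrac{n!}{n_1! \, n_2! \cdots n_K!} \; p_1^{n_1} p_2^{n_2} \cdots p_K^{n_K}$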
Notes:
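The multinomial pmf can be evaluated directly. The sketch below uses the pigeon-hole outcome from slide 19 (3A, 2B, 1C, 4D from 10 balls); the equal hole probabilities of 0.25 are an assumption made for illustration. With K = 2 the same function reduces to the binomial pmf.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Pr(n_1, ..., n_K) for n = sum(counts) independent trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)   # n! / (n_1! n_2! ... n_K!), exact at every step
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


# 10 balls into holes A, B, C, D; equal probabilities are assumed here
print(multinomial_pmf((3, 2, 1, 4), (0.25, 0.25, 0.25, 0.25)))

# K = 2 is the binomial case: 3 heads in 10 tosses of a fair coin
print(multinomial_pmf((3, 7), (0.5, 0.5)))
```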
24. Example of a Multinomial Experiment Consider an unfair die and 6 tosses:
Let
Find the probability of this outcome
25. Simple Multinomial Experiments Classical example: Mendel
Sample from the second generation of seeds resulting from crossing yellow round peas and green wrinkled peas (N=556)
26. Mendel’s Laws of Inheritance suggest that we should expect the following ratios:
9/16, 3/16, 3/16, 1/16
For N = 556, the expected number of each outcome is:
E(YR) = 556 x 9/16 = 312.75
E(YW) = 556 x 3/16 = 104.25
E(GR) = 556 x 3/16 = 104.25
E(GW) = 556 x 1/16 = 34.75
28. Multinomial distribution The observed cell counts are not identical to the expected cell counts
Under the assumption of a multinomial model with the stated probabilities, how might we determine how unlikely it is to observe these data?
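29. Chi-square GOF Test Hypothesis: observed cell counts are consistent with the multinomial probabilities
Theoretical result: for large n, with $O_i$ the observed and $E_i = n p_i$ the expected count in cell i,
$\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$ is approximately chi-squared distributed with k - 1 df under H0
Requires that the expected cell counts are not too small
Expected counts > 5.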
30. Chi-square distribution Remarks about Chi-squared distribution:
Nonsymmetric
Strictly positive
Different chi-squared distribution for each df.
32. Chi-square GOF Test Applying this test to Mendel’s peas example yields
H0: pYR = 9/16, pYW = 3/16, pGR = 3/16, pGW = 1/16
H1: at least one pi differs from hypothesized value
33. Chi-square GOF Test
34. Chi-square GOF Test Therefore, we observed χ² = 0.47 from a multinomial experiment with k = 4. Thus, df = k - 1 = 3.
For α = 0.05, the critical value of the chi-squared distribution with 3 df is 7.81.
Thus, the observed chi-squared statistic is not greater than the critical value for α = 0.05 and df = 3.
We fail to find evidence that these data depart from the hypothesized probabilities, i.e., the model fits the data well.
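The goodness-of-fit calculation for the peas can be reproduced in a few lines. One caveat: the observed counts are not in the surviving slide text, so the sketch assumes Mendel's classic published counts (315, 101, 108, 32; they sum to 556). The critical value 7.815 is taken from a standard chi-squared table for 3 df.

```python
def chi_square_gof(observed, probs):
    """Pearson goodness-of-fit statistic for hypothesized cell probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, expected, df


# Order: YR, YW, GR, GW. Counts assumed from Mendel's published data.
observed = [315, 101, 108, 32]
probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

statistic, expected, df = chi_square_gof(observed, probs)
critical_value = 7.815   # chi-squared table value, alpha = 0.05, df = 3

print("expected counts:", expected)
print(f"chi-square = {statistic:.2f} on {df} df")
print("reject H0" if statistic > critical_value else "fail to reject H0")
```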
35. Testing association in 2x2 table This method translates to testing cross-tabulation tables for RxC cases
Here the cells are formed by cross-classification of 2 variables
Null hypothesis is the 2 variables are independent
Simplest case : 2x2 table
36. Testing association in 2x2 table Testing for independence or no association
Similar idea to checking goodness-of-fit
Compare what you see to what you hypothesized to be true
You did, in fact, hypothesize “independence”
37. Basic Inference for 2x2 Tables 2x2 Contingency Table
38. Chi-square GOF Test for 2x2 Tables H0: There is no association between row and columns
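Under H0, the expected cell counts are the product of the marginal probabilities and the sample size. Why? Independence means Pr(row i, column j) = Pr(row i) x Pr(column j), so with row totals $n_{i+}$ and column totals $n_{+j}$, $E_{ij} = n \cdot \dfrac{n_{i+}}{n} \cdot \dfrac{n_{+j}}{n} = \dfrac{n_{i+} \, n_{+j}}{n}$
The classic Pearson’s chi-squared test of independence: $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
df = (2-1) x (2-1) = 1
Conservatively, we require $E_{ij} \ge 5$ for all i, j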
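A minimal sketch of the 2x2 test of independence follows. The counts are hypothetical. The p-value uses the fact that a chi-squared variable with 1 df is a squared standard normal, so its upper tail can be written with `math.erfc`; this shortcut applies to df = 1 only.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d

    # Expected count = (row total x column total) / n under independence
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared(1)
    return chi2, p_value, min(expected)


# Hypothetical counts: rows = exposed / not exposed, columns = disease / no disease
chi2, p_value, smallest_expected = pearson_chi2_2x2(20, 80, 10, 90)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}, "
      f"smallest expected count = {smallest_expected:.1f}")
```

The smallest expected count is returned so that the "at least 5" condition can be checked before trusting the chi-squared approximation.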
39. Other Tests for 2x2 Tables
Two alternative tests
Yates’ continuity-corrected chi-square statistic
Mantel-Haenszel chi-square statistic
For sufficiently large sample size, all three Chi-squared statistics are approximately equal and all have a Chi-squared distribution with 1 df
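The slide formulas for the two alternatives are not in the transcript, so the sketch below assumes their usual definitions: Yates' statistic subtracts 0.5 from each |observed - expected| before squaring, and the Mantel-Haenszel chi-square for a single 2x2 table equals (n - 1)/n times Pearson's statistic. The counts are hypothetical; the second call scales them by 100 to show the three statistics converging.

```python
def three_chi2_statistics(a, b, c, d):
    """Pearson, Yates-corrected, and Mantel-Haenszel chi-square for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]

    pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Continuity correction: shrink each deviation by 0.5, but never below 0
    yates = sum(max(0.0, abs(o - e) - 0.5) ** 2 / e
                for o, e in zip(observed, expected))
    mantel_haenszel = (n - 1) / n * pearson
    return pearson, yates, mantel_haenszel


print(three_chi2_statistics(20, 80, 10, 90))          # modest sample: visible differences
print(three_chi2_statistics(2000, 8000, 1000, 9000))  # large sample: nearly identical
```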
40. When to use Chi-square vs. Fisher’s Exact
When the expected cell counts are less than 5, it is better to use the Fisher’s exact test.
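Fisher's exact test conditions on the row and column totals, so the top-left cell follows a hypergeometric distribution. The sketch below assumes the common two-sided definition (sum the probabilities of all tables no more likely than the observed one); other two-sided conventions exist. The example tables are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)    # smallest feasible top-left count
    highest = min(row1, col1)       # largest feasible top-left count
    p_observed = prob(a)

    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


print(fisher_exact_two_sided(3, 1, 1, 3))
print(fisher_exact_two_sided(8, 2, 1, 5))
```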
41. Summary of the Use of χ² test Test of goodness-of-fit
Determine whether or not a sample of observed values of some random variable is compatible with the hypothesis that the sample was drawn from a population with a specified distributional form (e.g., specified probabilities of certain events)
42. Summary of the Use of χ² test
Test of independence
Test the null hypothesis that two criteria of classification (variables) are independent
43. Summary of the Use of χ² test
Test of homogeneity
Test the null hypothesis that the samples are drawn from populations that are homogeneous with respect to some factor (i.e., no association between group and factor)
44. Summary of the Use of χ² test
Could consider this test as answering:
“Are the Row factor and Column factor associated?”
45. Categorical Data Analysis Ideas of multinomial and chi-squared test generalize to testing RxC association and RxCxK association
Example:
2 exposure status, 2 disease status, 3 sites
2x2x3 association analysis
46. Test of General Association (R x C Table) Consider a study designed to test whether there exists an association between political party affiliation and residency within specific counties
47. Notation for general RxC table
48. Test of General Association H0: There is no association between rows and columns
H1: There exists a dependence between rows and columns
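Under H0, the expected cell counts are the product of the corresponding marginal probabilities and the sample size: $E_{ij} = n_{i+} \, n_{+j} / n$, where $n_{i+}$ and $n_{+j}$ are the row and column totals.
The classic Pearson’s chi-squared test of independence: $\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, with df = (R-1) x (C-1)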
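The same computation works for any R x C table. A minimal sketch, assuming made-up party-by-county counts (the study's data are not in the transcript); it returns the statistic and degrees of freedom, which can then be compared with a chi-squared table.

```python
def chi2_independence(table):
    """Pearson chi-squared statistic and df for an R x C table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # under independence
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = party affiliation, columns = county
party_by_county = [[50, 30, 20],
                   [30, 40, 30],
                   [20, 30, 50]]
print(chi2_independence(party_by_county))
```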
49. SAS Enterprise: chisq.sas7bdat
50. Mantel-Haenszel test Often, there are other factors in a RxC test
Mantel-Haenszel test (or Cochran-Mantel-Haenszel, CMH) can be used to control for “nuisance” factors
Typically used for rxcx2 table
e.g., 2x2x2 cross classification
e.g., Association between disease status and exposure controlling for age group (strata)
51. Stratified Analysis Examples of commonly used strata
Age group
Gender
Study site (hospital, country)
ethnic group
52. Stratified Analysis Myocardial infarction and anticoagulant use by Coronary Care Unit
53. Stratified Analysis Idea: test for an association while controlling for CCU effects
Denote the count in the first cell of the hth subtable as $n_{h11}$
Construct the CMH test of association controlling for CCU
54. Stratified Analysis Test assumes the direction of effect within each table is the same
The Cochran-Mantel-Haenszel approach partially removes the confounding influences of the explanatory variable (e.g., CCU)
May improve power
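55. Mantel-Haenszel Test The expected value of $n_{h11}$ for h = 1,2,…,g is
$E(n_{h11}) = \dfrac{n_{h1+} \, n_{h+1}}{n_h}$
and the variance of $n_{h11}$ is
$\mathrm{Var}(n_{h11}) = \dfrac{n_{h1+} \, n_{h2+} \, n_{h+1} \, n_{h+2}}{n_h^2 \, (n_h - 1)}$
where $n_{h1+}, n_{h2+}$ are the row totals, $n_{h+1}, n_{h+2}$ the column totals, and $n_h$ the total count of the hth subtable
This leads to the Cochran-Mantel-Haenszel test
$Q_{CMH} = \dfrac{\left[ \sum_{h=1}^{g} \left( n_{h11} - E(n_{h11}) \right) \right]^2}{\sum_{h=1}^{g} \mathrm{Var}(n_{h11})}$, approximately chi-squared with 1 df under H0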
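A sketch of the CMH statistic for a set of 2x2 strata. It assumes the standard form of the test: within each stratum the first cell has hypergeometric mean (row 1 total x column 1 total) / n and variance (product of the four margins) / (n^2 (n - 1)), and the summed deviation is squared and divided by the summed variance, with no continuity correction. The two strata are hypothetical counts, not the CCU data.

```python
import math


def cmh_test(strata):
    """CMH chi-squared statistic (1 df) for a list of (n11, n12, n21, n22) tables."""
    sum_deviation = 0.0
    sum_variance = 0.0
    for n11, n12, n21, n22 in strata:
        n = n11 + n12 + n21 + n22
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22

        expected = row1 * col1 / n
        variance = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))

        sum_deviation += n11 - expected   # deviations add up across strata
        sum_variance += variance

    q_cmh = sum_deviation ** 2 / sum_variance
    p_value = math.erfc(math.sqrt(q_cmh / 2))   # upper tail of chi-squared(1)
    return q_cmh, p_value


# Hypothetical strata, e.g. two study sites
example_strata = [(10, 20, 5, 30), (15, 10, 8, 20)]
print(cmh_test(example_strata))
```

Because the deviations are summed before squaring, strata with effects in opposite directions can cancel, which is why the test assumes a common direction of effect.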
56. Direction of effects across Strata Note that if directions of conditional ORs are not the same, discrepancies between observed and expected from different strata may cancel out one another
This leads to poor power and biased results
57. MH “Pooled” Odds Ratio
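The formula on this slide is missing from the transcript. The sketch below assumes the usual Mantel-Haenszel estimator, which weights each stratum by its size: the sum over strata of a x d / n divided by the sum of b x c / n, with each table written as [[a, b], [c, d]]. The strata are hypothetical counts.

```python
def odds_ratio_2x2(a, b, c, d):
    """Odds ratio for a single table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata, for illustration only
example_strata = [(10, 20, 5, 30), (15, 10, 8, 20)]

# Crude OR: collapse the strata into one table by adding cells
crude_cells = [sum(cells) for cells in zip(*example_strata)]

print("stratum ORs:", [odds_ratio_2x2(*t) for t in example_strata])
print("crude OR:", odds_ratio_2x2(*crude_cells))
print("MH pooled OR:", mantel_haenszel_or(example_strata))
```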
58. MH test decision list Z = strata of potential confounder
-> If OR_c ≈ (OR_{Z=1} ≈ OR_{Z=2} ≈ …) Z is not a confounder, report crude OR (OR_c)
-> If OR_c ≠ (OR_{Z=1} ≈ OR_{Z=2} ≈ …) Z is a confounder, report adjusted OR (OR_MH)
-> If OR_{Z=1} ≠ OR_{Z=2} ≠ … Z is an effect modifier, report strata specific OR’s (don’t adjust!)
59. Breslow-Day test (More formal approach) Can also test for homogeneity of the odds ratio across strata
If the Breslow-Day test is significant, the odds ratios within strata are not homogeneous, so OR_MH would be inappropriate!
60. SAS Enterprise: cmh.sas7bdat
61. Results from cmh.sas7bdat ORcrude = 3.76 (2.01, 7.05)
ORcenter1 = 4.01 (1.67, 9.66)
ORcenter2 = 4.05 (1.55, 10.60)
ORMH = 4.03 (2.11, 7.71)
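The decision list from slide 58 can be applied mechanically to these point estimates. This is only a rough sketch: the function name and the 10% tolerance for "approximately equal" are assumptions (a common rule of thumb, not something stated in the slides), a different tolerance can change the verdict, and the comparison ignores the confidence intervals. A formal check of homogeneity would use the Breslow-Day test.

```python
def confounding_verdict(or_crude, or_strata, or_mh, tolerance=0.10):
    """Apply the slide-58 decision list with a relative tolerance for 'approximately equal'.

    tolerance is a hypothetical cut-off, not a value from the lecture.
    """
    # Stratum-specific ORs differ from each other: effect modification
    if max(or_strata) / min(or_strata) - 1 > tolerance:
        return "effect modifier: report stratum-specific ORs"
    # Strata agree, but adjusting moves the estimate away from the crude OR
    if abs(or_mh - or_crude) / or_crude > tolerance:
        return "confounder: report adjusted OR (OR_MH)"
    return "not a confounder: report crude OR"


# Point estimates reported on the slide
print(confounding_verdict(or_crude=3.76, or_strata=[4.01, 4.05], or_mh=4.03))
```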
62. Take home messages Multinomial and the Chi-square test are the “workhorse” for testing of goodness-of-fit
Idea is to compare expected counts (calculated from a pre-determined set of probabilities) and the observed counts
The same idea can be applied to testing statistical assumptions such as no association
CMH test is for testing association when a confounding effect (strata) may be present
63. For Next Class 6/5 HW #1 key posted
HW #2 will be due
Read Kleinbaum Ch. 1,2