Comparing Margins of Multivariate Binary Data
Bernhard Klingenberg, Assoc. Prof. of Statistics, Williams College, MA
www.williams.edu/~bklingen
Challenges:
• Associations of various degrees among binary variables
• Simultaneous inference
• Sparse and/or unbalanced data:
  • Test statistics with discrete support
  • Asymptotic theory questionable
Outline
• Setup:
  • Two independent groups
  • Response: vector of k correlated binary variables (multivariate binary)
• Goal:
  • Inference about the k margins:
    • Marginal risk differences
    • Marginal risk ratios
Outline • Motivating Examples • From drug safety or animal toxicity/carcinogenicity studies Source: http://us.gsk.com/products/assets/us_advair.pdf
Source: http://www.pfizer.com/files/products/uspi_lipitor.pdf
Outline
• Example: AEs from a vaccine trial (flu shot):
> head(Y1)  # ACTIVE treatment, n1 = 1971
 ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
  2        1    1       1          1       1       1      1
  4        0    1       1          0       0       1      0
  5        1    0       0          0       0       0      0
  6        1    1       1          1       1       1      1
  7        0    0       0          0       0       1      0
  9        1    0       1          1       1       1      1
> head(Y2)  # PLACEBO treatment, n2 = 1554
 ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
  1        0    0       0          0       0       0      0
  3        0    0       0          0       0       0      0
  8        0    0       0          0       1       0      0
 10        0    0       0          0       0       0      0
 11        0    0       0          0       0       0      0
 15        0    0       1          0       0       1      0
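Each AE's marginal incidence rate is the column mean of the 0/1 matrix. The Python sketch below uses only the six ACTIVE rows printed above, not the full n1 = 1971 sample, so the numbers are illustrative only. The helper name `marginal_rates` is hypothetical.

```python
from fractions import Fraction

# AE names in the column order printed by head(Y1).
AES = ["HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA", "MALAISE", "FATIGUE", "CHILLS"]

# Only the six ACTIVE-group rows shown above (IDs 2, 4, 5, 6, 7, 9), ID column dropped.
Y1_HEAD = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1],
]


def marginal_rates(rows):
    """Hypothetical helper: proportion of 1's in each column (the k margins)."""
    n = len(rows)
    k = len(rows[0])
    return [Fraction(sum(row[j] for row in rows), n) for j in range(k)]


for name, rate in zip(AES, marginal_rates(Y1_HEAD)):
    print(f"{name:<11} {float(rate):.4f}")
```

Applied to the complete Y1 and Y2 matrices, the same column means give the marginal incidence rates compared later in the talk.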
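Notation and Setup
• k-dimensional binary response vectors in Group 1 and in Group 2
• Random sample in each group: n1 response vectors from Group 1, n2 from Group 2
• The joint distribution in each group is multinomial over the 2^k possible 0/1 patterns, so it depends on 2^k − 1 parameters (separately in Group 1 and Group 2)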
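The 2^k − 1 count comes from treating each k-vector as one of 2^k possible 0/1 patterns, which are the cells of a multinomial whose probabilities sum to 1. The Python sketch below again uses only the six displayed ACTIVE rows. `pattern_counts` and `n_free_parameters` are hypothetical helper names.

```python
from collections import Counter

# Six ACTIVE-group rows from head(Y1) (ID column dropped); illustration only.
ROWS = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1],
]


def pattern_counts(rows):
    """Hypothetical helper: frequency of each observed 0/1 pattern."""
    return Counter(tuple(row) for row in rows)


def n_free_parameters(k):
    """2**k cell probabilities that sum to 1 leave 2**k - 1 free parameters."""
    return 2 ** k - 1


counts = pattern_counts(ROWS)
print(len(counts), "distinct patterns among", sum(counts.values()), "rows")
print(n_free_parameters(7), "free parameters per group for k = 7")
```

With k = 7 AEs, each group's joint distribution has 127 free parameters, while the margins of interest number only 7.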
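Comparing Margins
• Usually we are only interested in the k margins (the marginal probability of each event) in Group 1 and Group 2.
• With just two (k = 2) adverse events, Headache and Pain, each group's joint distribution is a 2 × 2 table of Headache by Pain, and the margins are its row and column totals.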
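For k = 2 the margins are sums of joint cell probabilities. Below, π_g(ab) = P(Headache = a, Pain = b) in Group g. This notation is introduced for illustration and is not taken from the slides.

```latex
\begin{aligned}
&\pi_g(11) + \pi_g(10) + \pi_g(01) + \pi_g(00) = 1, \qquad g = 1, 2,\\
&P_g(\text{Headache}=1) = \pi_g(11) + \pi_g(10), \qquad
 P_g(\text{Pain}=1) = \pi_g(11) + \pi_g(01),\\
&\delta_{\text{Headache}} = P_1(\text{Headache}=1) - P_2(\text{Headache}=1).
\end{aligned}
```

Each group has 2^2 − 1 = 3 free parameters. The two margins leave π_g(11), and hence the association between Headache and Pain, undetermined.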
Comparing Margins
• Differences in marginal incidence rates between Group 1 (Treatment) and Group 2 (Control):
                     Group1  Group2    Diff
HEADACHE             0.2603  0.2407  0.0196
INJECTION SITE PAIN  0.6088  0.1384  0.4705
MYALGIA              0.2588  0.1088  0.1500
ARTHRALGIA           0.0893  0.0579  0.0314
MALAISE              0.2085  0.1332  0.0753
FATIGUE              0.2476  0.2098  0.0378
CHILLS               0.0928  0.0463  0.0465
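The Diff column is the marginal risk difference. The marginal risk ratio named in the outline comes from the same two columns. The minimal Python sketch below uses the rounded rates copied from the table, and its function names are hypothetical.

```python
# Marginal incidence rates (Group 1, Group 2) copied, rounded, from the table above.
RATES = {
    "HEADACHE": (0.2603, 0.2407),
    "INJECTION SITE PAIN": (0.6088, 0.1384),
    "MYALGIA": (0.2588, 0.1088),
    "ARTHRALGIA": (0.0893, 0.0579),
    "MALAISE": (0.2085, 0.1332),
    "FATIGUE": (0.2476, 0.2098),
    "CHILLS": (0.0928, 0.0463),
}


def risk_difference(p1, p2):
    """Hypothetical helper: marginal risk difference, Group 1 minus Group 2."""
    return p1 - p2


def risk_ratio(p1, p2):
    """Hypothetical helper: marginal risk ratio, Group 1 over Group 2."""
    return p1 / p2


for ae, (p1, p2) in RATES.items():
    print(f"{ae:<20} diff = {risk_difference(p1, p2):.4f}  ratio = {risk_ratio(p1, p2):.3f}")
```

Because the inputs are already rounded, the PAIN difference comes out as 0.4704 rather than 0.4705. The table value was presumably computed from unrounded rates.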
Family of Tests
• j-th null hypothesis:
• Unrestricted and restricted MLEs:
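Consider the special case H0j: π1j = π2j, meaning no difference in the j-th margin. There the unrestricted MLEs are the two sample proportions, and the restricted MLE of the common margin is the pooled proportion. The score statistic then reduces to the familiar pooled two-proportion z. This Python sketch covers only that special case. The talk's statistics may test other null values and account for all k margins jointly. `score_z` is a hypothetical name, and the counts are made up.

```python
import math


def score_z(x1, n1, x2, n2):
    """Hypothetical helper: score (pooled) z for H0: pi1 = pi2, binomial counts.

    Unrestricted MLEs: x1/n1 and x2/n2.
    Restricted MLE under H0: pooled proportion (x1 + x2) / (n1 + n2).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    var0 = pooled * (1 - pooled) * (1 / n1 + 1 / n2)  # variance under H0
    if var0 == 0:  # no events (or only events) in both groups: no information
        return 0.0
    return (p1 - p2) / math.sqrt(var0)


print(score_z(30, 100, 20, 100))  # made-up counts for one margin
```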
Comparing Margins
• Estimates of marginal incidence rates and test statistics comparing Group 1 (Treatment) and Group 2 (Control)
Asymptotic Test • Note: • Asymptotically, multivariate normal with covariance matrix determined by
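Within one group, the multinomial model gives Cov(p̂_j, p̂_l) = (π_jl − π_j π_l)/n, where π_jl is the probability that events j and l both occur. Because the two groups are independent, the covariance matrix of the vector of marginal differences is the sum of the two group matrices. The sketch below assumes unrestricted sample proportions are plugged in. The Sigma used in the talk may be built from restricted estimates instead. `group_cov`, `cov_diff` and `cov2cor` are hypothetical Python helpers (the last mimics R's `cov2cor`), and the data are made-up toy data.

```python
import math


def group_cov(rows):
    """Hypothetical helper: plug-in covariance of the k sample proportions in one group."""
    n, k = len(rows), len(rows[0])
    p = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for j in range(k):
        for l in range(k):
            pjl = sum(r[j] * r[l] for r in rows) / n  # estimate of P(Y_j = 1, Y_l = 1)
            cov[j][l] = (pjl - p[j] * p[l]) / n
    return cov


def cov_diff(rows1, rows2):
    """Cov(p1_hat - p2_hat) = Cov(p1_hat) + Cov(p2_hat) for independent groups."""
    c1, c2 = group_cov(rows1), group_cov(rows2)
    k = len(c1)
    return [[c1[j][l] + c2[j][l] for l in range(k)] for j in range(k)]


def cov2cor(cov):
    """Scale a covariance matrix to a correlation matrix (like R's cov2cor)."""
    k = len(cov)
    sd = [math.sqrt(cov[j][j]) for j in range(k)]  # fails if a margin has zero variance
    return [[cov[j][l] / (sd[j] * sd[l]) for l in range(k)] for j in range(k)]


# Made-up toy data: 4 subjects per group, k = 2 binary events.
T1 = [[1, 1], [1, 0], [0, 0], [1, 1]]
T2 = [[0, 0], [1, 0], [0, 1], [0, 0]]
print(cov2cor(cov_diff(T1, T2)))
```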
Asymptotic Test
• Correlation matrix:
> library(mvtnorm)  # provides qmvnorm()
> round(cov2cor(Sigma), 2)
     d1   d2   d3   d4   d5   d6   d7
d1 1.00 0.04 0.29 0.26 0.38 0.41 0.27
d2      1.00 0.18 0.09 0.08 0.10 0.01
d3           1.00 0.46 0.35 0.36 0.30
d4                1.00 0.33 0.33 0.32
d5                     1.00 0.51 0.44
d6                          1.00 0.37
d7                               1.00
> qmvnorm(0.95, tail = "both.tails", corr = cov2cor(Sigma))
$quantile
[1] 2.656222
Asymptotic Test
• Correlation matrix:
> round(cov2cor(Sigma), 2)
     d1   d2   d3   d4   d5   d6   d7
d1 1.00 0.06 0.33 0.28 0.41 0.41 0.29
d2      1.00 0.28 0.11 0.15 0.12 0.09
d3           1.00 0.46 0.41 0.36 0.35
d4                1.00 0.32 0.34 0.28
d5                     1.00 0.50 0.47
d6                          1.00 0.37
d7                               1.00
> qmvnorm(0.95, tail = "both.tails", corr = cov2cor(Sigma))
$quantile
[1] 2.653783
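`qmvnorm` (R package mvtnorm) returns the equicoordinate quantile c with P(|Z_1| ≤ c, …, |Z_k| ≤ c) = 0.95, where Z is multivariate normal with the given correlation matrix. mvtnorm uses a dedicated numerical integration method. The Python sketch below swaps in plain Monte Carlo simulation (a Cholesky factor times independent normals) just to show what is being computed. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def equicoordinate_quantile(corr, level=0.95, n_sim=100_000, seed=1):
    """Monte Carlo c with P(max_j |Z_j| <= c) = level, Z ~ N(0, corr).

    Hypothetical stand-in for qmvnorm(level, tail = "both.tails", corr = corr).
    """
    rng = random.Random(seed)
    L = cholesky(corr)
    k = len(corr)
    maxima = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        z = [sum(L[i][m] * e[m] for m in range(i + 1)) for i in range(k)]  # Z = L e
        maxima.append(max(abs(v) for v in z))
    maxima.sort()
    return maxima[math.ceil(level * n_sim) - 1]  # empirical level-quantile


def independent_quantile(k, level=0.95):
    """Closed form when corr is the identity (independent margins)."""
    return NormalDist().inv_cdf((1 + level ** (1 / k)) / 2)


print(round(independent_quantile(7), 3))
```

To use it with a correlation matrix printed above, first fill in the lower triangle by symmetry. The simulated quantile should then be close to the corresponding qmvnorm value, up to Monte Carlo error and the two-decimal rounding of the printed correlations.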
Permutation Approach
• When testing the marginal null hypotheses, one can use a permutation approach
• This assumes the two distributions are exchangeable (i.e., identical), a much stronger assumption than the null hypothesis of equal margins
• Two extra conditions are needed:
  • The sequence of all 0's is at least as likely under Group 2 (Control) as under Group 1
  • The sequence of all 1's is at least as likely under Group 1 (Treatment) as under Group 2
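In a permutation approach, whole response vectors are reassigned to groups, which keeps the within-subject association among the k events. The statistic is recomputed for each reassignment. The Python sketch below assumes the maximum over margins of the absolute pooled z statistic. That choice of statistic is hypothetical, as are the function names and the toy data. The code does not check the two extra conditions listed above.

```python
import math
import random


def max_abs_z(rows1, rows2):
    """Hypothetical statistic: max over margins of |pooled two-proportion z|."""
    n1, n2 = len(rows1), len(rows2)
    best = 0.0
    for j in range(len(rows1[0])):
        x1 = sum(r[j] for r in rows1)
        x2 = sum(r[j] for r in rows2)
        pooled = (x1 + x2) / (n1 + n2)
        var0 = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
        if var0 > 0:  # margins with no events (or all events) carry no information
            best = max(best, abs(x1 / n1 - x2 / n2) / math.sqrt(var0))
    return best


def permutation_max_dist(rows1, rows2, n_perm=2000, seed=1):
    """Permutation distribution of max_abs_z.

    Whole response vectors are shuffled between groups, so the association
    among the k binary variables within a subject is kept intact.
    """
    rng = random.Random(seed)
    pooled_rows = list(rows1) + list(rows2)
    n1 = len(rows1)
    dist = []
    for _ in range(n_perm):
        rng.shuffle(pooled_rows)
        dist.append(max_abs_z(pooled_rows[:n1], pooled_rows[n1:]))
    return dist


def perm_critical_value(dist, alpha=0.05):
    """Empirical (1 - alpha) quantile of the permutation distribution."""
    s = sorted(dist)
    return s[math.ceil((1 - alpha) * len(s)) - 1]


# Made-up toy data: 6 subjects per group, k = 3 binary events.
TOY1 = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
TOY2 = [[0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]

t_obs = max_abs_z(TOY1, TOY2)
dist = permutation_max_dist(TOY1, TOY2)
c_perm = perm_critical_value(dist)
p_value = sum(d >= t_obs for d in dist) / len(dist)
print(f"t_obs = {t_obs:.3f}, c_perm = {c_perm:.3f}, p = {p_value:.3f}")
```

The pooled proportion of each margin does not change under shuffling. Only the between-group difference varies from one permutation to the next.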
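Permutation vs. Asymptotic
• Permutation vs. asymptotic distribution of the maximum (in absolute value) of the k test statistics
[Figure: permutation distribution and asymptotic distribution]
• Critical values (α = 0.05): c_perm = 2.655, c_asympt = 2.654, c_Bonf = 2.690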
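The Bonferroni cutoff needs no simulation: it is the standard normal quantile at 1 − α/(2k). The Šidák cutoff, which would be exact for independent statistics, is shown for comparison. This Python sketch uses the standard library's `statistics.NormalDist`, and its function names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_critical_value(k, alpha=0.05):
    """Two-sided Bonferroni cutoff z_{1 - alpha/(2k)}; ignores all correlation."""
    return NormalDist().inv_cdf(1 - alpha / (2 * k))


def sidak_critical_value(k, alpha=0.05):
    """Exact cutoff if the k statistics were independent standard normals."""
    return NormalDist().inv_cdf((1 + (1 - alpha) ** (1 / k)) / 2)


print(f"c_Bonf  = {bonferroni_critical_value(7):.3f}")
print(f"c_Sidak = {sidak_critical_value(7):.3f}")
```

For the k = 7 AEs the first line reproduces c_Bonf = 2.690. Both cutoffs exceed c_perm and c_asympt, as expected, because neither uses the correlation among the margins.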
Family of Tests • Results: Raw and Adjusted P-values
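The results table itself is not reproduced here. One common way to obtain adjusted p-values from a max-type procedure is the single-step max-T adjustment: the adjusted p-value for margin j is the share of the max-statistic distribution, permutation or simulated asymptotic, that is at least |t_j|. Whether the talk uses exactly this adjustment is an assumption. The Python sketch below shows it next to raw and Bonferroni p-values, with hypothetical names and made-up inputs.

```python
from statistics import NormalDist


def raw_p_values(z_stats):
    """Two-sided normal p-values, one per margin."""
    nd = NormalDist()
    return [2 * (1 - nd.cdf(abs(z))) for z in z_stats]


def bonferroni_adjust(p_raw):
    """Bonferroni: multiply by the number of margins k, capped at 1."""
    k = len(p_raw)
    return [min(1.0, k * p) for p in p_raw]


def maxT_adjust(z_stats, max_sample):
    """Single-step max-T: share of sampled max |T| values that are >= |z_j|."""
    n = len(max_sample)
    return [sum(m >= abs(z) for m in max_sample) / n for z in z_stats]


# Made-up statistics and a made-up sample of the max statistic, for illustration.
z_demo = [0.5, 3.0]
max_demo = [i / 4 for i in range(16)]  # 0.0, 0.25, ..., 3.75
print(raw_p_values(z_demo))
print(bonferroni_adjust(raw_p_values(z_demo)))
print(maxT_adjust(z_demo, max_demo))
```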
Simultaneous Confidence Intervals • Invert family of tests: • Confidence Region: • Simplifies to simultaneous confidence intervals if
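Inverting a test means collecting every null value δ that the test does not reject. With a common critical value c taken from the joint distribution of the k statistics, the per-margin intervals hold simultaneously. The Python sketch below is a simplified, hypothetical version. It inverts each margin separately with the two-binomial score statistic (Farrington–Manning type), finds the restricted MLE under π1 − π2 = δ by ternary search instead of the closed-form cubic, and takes c as given. It is not necessarily the construction used for the results table. The counts are made up.

```python
import math


def restricted_mle(x1, n1, x2, n2, delta, iters=100):
    """Hypothetical helper: MLE of (pi1, pi2) subject to pi1 - pi2 = delta.

    The two-binomial log-likelihood is concave in pi2 on the feasible
    interval, so a ternary search locates its maximum.
    """
    def loglik(p2):
        p1 = p2 + delta
        return (x1 * math.log(p1) + (n1 - x1) * math.log(1 - p1)
                + x2 * math.log(p2) + (n2 - x2) * math.log(1 - p2))

    lo = max(0.0, -delta) + 1e-12
    hi = min(1.0, 1.0 - delta) - 1e-12
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1  # maximum lies to the right of m1
        else:
            hi = m2  # maximum lies to the left of m2
    p2 = (lo + hi) / 2
    return p2 + delta, p2


def score_stat(x1, n1, x2, n2, delta):
    """Score statistic for H0: pi1 - pi2 = delta, variance at the restricted MLE."""
    d_hat = x1 / n1 - x2 / n2
    p1, p2 = restricted_mle(x1, n1, x2, n2, delta)
    return (d_hat - delta) / math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def score_interval(x1, n1, x2, n2, c, tol=1e-10):
    """All delta with |score_stat| <= c, found by bisection on each side of d_hat."""
    d_hat = x1 / n1 - x2 / n2

    def solve(a, b, target):
        # score_stat decreases in delta; bisect for score_stat == target on [a, b].
        while b - a > tol:
            mid = (a + b) / 2
            if score_stat(x1, n1, x2, n2, mid) > target:
                a = mid
            else:
                b = mid
        return (a + b) / 2

    return solve(-1 + 1e-6, d_hat, c), solve(d_hat, 1 - 1e-6, -c)


C_SIM = 2.653783  # borrowed from the qmvnorm output above, for illustration only
print(score_interval(30, 100, 20, 100, C_SIM))  # made-up counts for one margin
```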
Simultaneous Confidence Intervals
• Results: inverting the score test
              diff      LB      UB
HEADACHE    0.0196 -0.0196  0.0583
PAIN        0.4705  0.4323  0.5069
MYALGIA     0.1500  0.1162  0.1835
ARTHRALGIA  0.0314  0.0078  0.0547
MALAISE     0.0753  0.0416  0.1086
FATIGUE     0.0378 -0.0002  0.0752
CHILLS      0.0465  0.0239  0.0692
Simultaneous Confidence Intervals
• We used (and recommend) the score statistic
• One could use the Wald statistic instead
• This is equivalent to fitting a marginal model via GEE:
  • the estimated marginal differences are asymptotically multivariate normal, with (sandwich) covariance matrix (same as before)
  • use the distribution of the Wald statistics for the multiplicity adjustment
Simultaneous Confidence Intervals
• Results: GEE approach (= inverting the Wald test)
              diff      LB      UB
HEADACHE    0.0196 -0.0194  0.0586
PAIN        0.4705  0.4331  0.5078
MYALGIA     0.1500  0.1164  0.1836
ARTHRALGIA  0.0314  0.0082  0.0546
MALAISE     0.0753  0.0419  0.1087
FATIGUE     0.0378  0.0001  0.0755
CHILLS      0.0465  0.0241  0.0689
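The GEE intervals have Wald form, d̂_j ± c·se_j. For a single binary margin, the usual (uncorrected) sandwich standard error of the group difference equals sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2). This Python sketch uses the rounded HEADACHE rates and the sample sizes given earlier (n1 = 1971, n2 = 1554). It assumes that c is the first qmvnorm quantile, 2.656222. Under that assumption it prints -0.0194 and 0.0586, matching the HEADACHE row above. Other rows may differ in the last digit because the inputs are rounded. `wald_interval` is a hypothetical name.

```python
import math


def wald_interval(p1, n1, p2, n2, c):
    """Hypothetical helper: d_hat +/- c * se with the unrestricted standard error.

    For one binary margin this se equals the uncorrected GEE sandwich standard
    error of the group difference (identity link, independence working model).
    """
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d - c * se, d + c * se


C_ASSUMED = 2.656222  # assumption: the first qmvnorm quantile reported in the talk
lb, ub = wald_interval(0.2603, 1971, 0.2407, 1554, C_ASSUMED)
print(round(lb, 4), round(ub, 4))  # HEADACHE row
```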