Lecture 18 Matched Case Control Studies. BMTRY 701 Biostatistical Methods II. Matched case control studies. References: Hosmer and Lemeshow, Applied Logistic Regression http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf http://staff.pubhealth.ku.dk/~bxc/Talks/Nested-Matched-CC.pdf
Lecture 18: Matched Case Control Studies. BMTRY 701 Biostatistical Methods II
Matched case control studies • References: • Hosmer and Lemeshow, Applied Logistic Regression • http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf • http://staff.pubhealth.ku.dk/~bxc/Talks/Nested-Matched-CC.pdf • http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap49/sect35.htm • http://www.ats.ucla.edu/stat/sas/library/logistic.pdf (beginning page 5)
Matched design • Matching on important factors is common • OP cancer: • age • gender • Why? • forces the distribution of those variables to be the same in cases and controls • removes the effects of those variables on the case-control comparison • eliminates confounding by the matched factors
1-to-M matching • For each ‘case’, there is a matched ‘control’ • Process usually dictates that the case is enrolled, then a control is identified • For particularly rare diseases or when large N is required, more than one control per case is often used
Logistic regression for matched case control studies • Recall that standard logistic regression assumes the observations are independent • But, if cases and controls are matched, are they still independent?
Solution: treat each matched set as a stratum • one-to-one matching: 1 case and 1 control per stratum • one-to-M matching: 1 case and M controls per stratum • Logistic model per stratum: within stratum, independence holds. • We assume that the OR for x and y is constant across strata
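The per-stratum model described above can be written out as follows. This is a sketch in assumed notation (the symbols \alpha_k for the stratum-specific intercept and \beta for the common slope are not given on the slide): each stratum k gets its own intercept, while the log odds ratio for x is shared across strata.

```latex
\operatorname{logit} P(y = 1 \mid x, \text{stratum } k)
  = \log \frac{p_k(x)}{1 - p_k(x)}
  = \alpha_k + \beta x,
\qquad k = 1, \dots, n .
```

Under this form, the odds ratio for a one-unit difference in x is e^{\beta} in every stratum, which is the constant-OR assumption stated in the last bullet.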
How many parameters is that? • Assume sample size is 2n and we have 1-to-1 matching: • n strata + p covariates = n+p parameters • This is problematic: • as n gets large, so does the number of parameters • too many parameters to estimate and a problem of precision • but, do we really care about the strata-specific intercepts? • “NUISANCE PARAMETERS”
Conditional logistic regression • To avoid estimation of the intercepts, we can condition on the study design. • Huh? • Think about each stratum: • how many cases and controls? • what is the probability that the case is the case and the control is the control? • what is the probability that the control is the case and the case the control? • For each stratum, the likelihood contribution is based on this conditional probability
Conditioning • For 1 to 1 matching: with two individuals in stratum k where y indicates case status (1 = case, 0 = control) • Write as a likelihood contribution for stratum k:
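The conditional probability for a 1-to-1 matched stratum can be sketched as below. The notation is an assumption, not taken from the slide: p_{ik} = P(y_{ik} = 1 \mid x_{ik}) for individual i in stratum k, with individual 1 the observed case and individual 2 the observed control. The conditioning event is that exactly one of the two individuals is a case.

```latex
L_k
  = P(y_{1k} = 1,\; y_{2k} = 0 \mid y_{1k} + y_{2k} = 1)
  = \frac{p_{1k}\,(1 - p_{2k})}
         {p_{1k}\,(1 - p_{2k}) + (1 - p_{1k})\,p_{2k}} .
```

The numerator is the probability of the arrangement actually observed; the denominator adds the probability of the reversed arrangement, in which the control is the case and the case is the control.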
Likelihood function for CLR Substitute in our logistic representation of p and simplify:
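A sketch of the substitution, in assumed notation (\alpha_k is the stratum intercept, x_{1k} the case's covariate, x_{2k} the control's covariate). Dividing the numerator and denominator of the conditional probability by (1 - p_{1k})(1 - p_{2k}) expresses it in terms of odds, and the logistic form gives odds of e^{\alpha_k + \beta x_{ik}}:

```latex
\frac{p_{ik}}{1 - p_{ik}} = e^{\alpha_k + \beta x_{ik}}
\quad\Longrightarrow\quad
L_k
  = \frac{e^{\alpha_k + \beta x_{1k}}}
         {e^{\alpha_k + \beta x_{1k}} + e^{\alpha_k + \beta x_{2k}}}
  = \frac{e^{\beta x_{1k}}}
         {e^{\beta x_{1k}} + e^{\beta x_{2k}}} .
```

The factor e^{\alpha_k} cancels, so the stratum-specific intercept drops out of the contribution.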
Likelihood function for CLR • Now, take the product over all the strata for the full likelihood • This is the likelihood for the matched case-control design • Notice: • there are no strata-specific parameters • cases are defined by subscript ‘1’ and controls by subscript ‘2’ • Theory for 1-to-M follows similarly (but not shown here)
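As an illustration of the full likelihood for 1-to-1 matching, the following is a minimal sketch for a single covariate. The function names and the pair counts are hypothetical, and the Newton-Raphson fit is one possible way to maximize the likelihood, not necessarily what any particular package does. It relies on the fact that each stratum's contribution depends only on the case-minus-control difference in x.

```python
import math

def pair_log_likelihood(beta, pairs):
    """Conditional log-likelihood for 1-to-1 matched pairs, one covariate.

    pairs is a list of (x_case, x_control) tuples, one per stratum.
    """
    total = 0.0
    for x_case, x_control in pairs:
        d = x_case - x_control
        # log[ exp(b*x1) / (exp(b*x1) + exp(b*x2)) ] = -log(1 + exp(-b*d))
        total += -math.log1p(math.exp(-beta * d))
    return total

def fit_beta(pairs, iterations=50):
    """Maximize the conditional likelihood by Newton-Raphson from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score = 0.0
        info = 0.0
        for x_case, x_control in pairs:
            d = x_case - x_control
            p = 1.0 / (1.0 + math.exp(-beta * d))  # stratum contribution
            score += d * (1.0 - p)                 # first derivative
            info += d * d * p * (1.0 - p)          # minus second derivative
        if info == 0.0:  # only concordant pairs: beta is not identifiable
            break
        beta += score / info
    return beta

# Hypothetical data with a binary exposure:
# 20 pairs with case exposed / control unexposed,
# 5 pairs with case unexposed / control exposed,
# 30 concordant pairs (which contribute nothing to the fit).
pairs = [(1, 0)] * 20 + [(0, 1)] * 5 + [(1, 1)] * 15 + [(0, 0)] * 15

beta_hat = fit_beta(pairs)
print("beta_hat =", beta_hat, " OR =", math.exp(beta_hat))
```

For a binary exposure the conditional estimate of the odds ratio reduces to the ratio of discordant pairs, here 20/5 = 4, which gives a quick check on the fit.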
Interpretation of β • Same as in ‘standard’ logistic regression • β is the log odds ratio of disease for a one-unit difference in x
When to use matched vs. unmatched? • Some papers use both for a matched design • Tradeoffs: • bias • precision • Sometimes a matched design is used to ensure balance, but the analysis is then unmatched • They WILL give you different answers • Gillison paper
Another approach to matched data • use random effects models • CLR is elegant and simple • can identify the estimates using a ‘transformation’ of logistic regression results • But, with new age of computing, we have other approaches • Random effects models: • allow strata specific intercepts • not problematic estimation process • additional assumptions: intercepts follow normal distribution • Will NOT give identical results
. xi: clogit control hpv16ser, group(strata) or

Iteration 0:   log likelihood = -72.072957
Iteration 1:   log likelihood = -71.803221
Iteration 2:   log likelihood = -71.798737
Iteration 3:   log likelihood = -71.798736

Conditional (fixed-effects) logistic regression   Number of obs   =        300
                                                  LR chi2(1)      =      76.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -71.798736                       Pseudo R2       =     0.3465

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   13.16616   4.988492     6.80   0.000      6.26541    27.66742
------------------------------------------------------------------------------
. xi: logistic control hpv16ser

Logistic regression                               Number of obs   =        300
                                                  LR chi2(1)      =      90.21
                                                  Prob > chi2     =     0.0000
Log likelihood =  -145.8514                       Pseudo R2       =     0.2362

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |    17.6113   6.039532     8.36   0.000     8.992582     34.4904
------------------------------------------------------------------------------
. xi: gllamm control hpv16ser, i(strata) family(binomial)

number of level 1 units = 300
number of level 2 units = 100

Condition Number = 2.4968508

gllamm model

log likelihood = -145.8514

------------------------------------------------------------------------------
     control |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   2.868541   .3429353     8.36   0.000       2.1964    3.540681
       _cons |  -1.464547   .1692104    -8.66   0.000    -1.796193     -1.1329
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------

***level 2 (strata)

    var(1): 4.210e-21 (2.231e-11)
------------------------------------------------------------------------------

OR = exp(2.868541) = 17.61
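The gllamm output reports hpv16ser on the log odds scale, so the coefficient and its confidence limits have to be exponentiated to compare with the odds ratios from clogit and logistic. A minimal sketch of that conversion, with the three numbers copied from the gllamm table (the variable names are illustrative):

```python
import math

# Values copied from the gllamm output for hpv16ser (log odds scale).
coef = 2.868541
ci_low, ci_high = 2.1964, 3.540681

# Exponentiate to move from log odds ratios to odds ratios.
odds_ratio = math.exp(coef)
or_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"OR = {odds_ratio:.2f}, 95% CI = ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
# prints: OR = 17.61, 95% CI = (8.99, 34.49)
```

These values agree with the ordinary logistic output (17.6113; 8.992582 to 34.4904), which is consistent with the estimated random-effect variance being essentially zero and the two log likelihoods being equal.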