
Lecture 18 Matched Case Control Studies

Lecture 18 Matched Case Control Studies. BMTRY 701 Biostatistical Methods II. Matched case control studies. References: Hosmer and Lemeshow, Applied Logistic Regression http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf http://staff.pubhealth.ku.dk/~bxc/Talks/Nested-Matched-CC.pdf

jeb

Presentation Transcript


  1. Lecture 18: Matched Case Control Studies • BMTRY 701 Biostatistical Methods II

  2. Matched case control studies • References: • Hosmer and Lemeshow, Applied Logistic Regression • http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf • http://staff.pubhealth.ku.dk/~bxc/Talks/Nested-Matched-CC.pdf • http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap49/sect35.htm • http://www.ats.ucla.edu/stat/sas/library/logistic.pdf (beginning page 5)

  3. Matched design • Matching on important factors is common • OP cancer: • age • gender • Why? • forces the distribution of those variables to be the same in cases and controls • removes any effects of those variables on the outcome • eliminates confounding by those variables

  4. 1-to-M matching • For each ‘case’, there is a matched ‘control’ • The process usually dictates that the case is enrolled first, then a control is identified • For particularly rare diseases, or when a large N is required, more than one control per case is often used

  5. Logistic regression for matched case control studies • Recall: standard logistic regression assumes the observations are independent • But, if cases and controls are matched, are they still independent?

  6. Solution: treat each matched set as a stratum • one-to-one matching: 1 case and 1 control per stratum • one-to-M matching: 1 case and M controls per stratum • Logistic model per stratum: within stratum, independence holds. • We assume that the OR for x and y is constant across strata
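The stratum-specific model on this slide appears to have been an image and is missing from the transcript. A common way to write it, assuming the notation \alpha_k for the intercept of stratum k and \beta for the coefficients shared by all strata (these symbols are assumptions, not recovered from the slide):

```latex
\operatorname{logit}\, p_k(x) = \log\frac{p_k(x)}{1 - p_k(x)} = \alpha_k + \beta' x,
\qquad k = 1, \dots, n
```

Under this form the intercept differs by stratum while \beta, and therefore the odds ratio, is the same in every stratum, which is the constancy assumption stated in the slide.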

  7. How many parameters is that? • Assume sample size is 2n and we have 1-to-1 matching: • n strata + p covariates = n+p parameters • This is problematic: • as n gets large, so does the number of parameters • too many parameters to estimate and a problem of precision • but, do we really care about the strata-specific intercepts? • “NUISANCE PARAMETERS”

  8. Conditional logistic regression • To avoid estimation of the intercepts, we can condition on the study design. • Huh? • Think about each stratum: • how many cases and controls? • what is the probability that the case is the case and the control is the control? • what is the probability that the control is the case and the case the control? • For each stratum, the likelihood contribution is based on this conditional probability

  9. Conditioning • For 1 to 1 matching: with two individuals in stratum k where y indicates case status (1 = case, 0 = control) • Write as a likelihood contribution for stratum k:
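The formula on this slide appears to have been an image and did not survive extraction. A sketch of the conditional probability it describes, assuming p_{1k} and p_{2k} denote the disease probabilities of the case and the control in stratum k (notation introduced here for illustration):

```latex
L_k = P\left(y_{1k} = 1,\; y_{2k} = 0 \,\middle|\, y_{1k} + y_{2k} = 1\right)
    = \frac{p_{1k}\,(1 - p_{2k})}{p_{1k}\,(1 - p_{2k}) + (1 - p_{1k})\,p_{2k}}
```

The numerator is the probability of the observed arrangement (the case is the case); the denominator adds the probability of the reversed arrangement, so the ratio conditions on exactly one case per stratum.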

  10. Likelihood function for CLR Substitute in our logistic representation of p and simplify:
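The simplified expression is missing from the transcript. Assuming the stratum-specific logistic model with intercept \alpha_k and common coefficients \beta, and writing x_{1k} for the case and x_{2k} for the control (matching the subscript convention of the next slide), the substitution can be sketched as:

```latex
p_{ik} = \frac{e^{\alpha_k + \beta' x_{ik}}}{1 + e^{\alpha_k + \beta' x_{ik}}}
\quad\Longrightarrow\quad
\frac{p_{1k}(1 - p_{2k})}{p_{1k}(1 - p_{2k}) + (1 - p_{1k})\,p_{2k}}
= \frac{e^{\alpha_k + \beta' x_{1k}}}{e^{\alpha_k + \beta' x_{1k}} + e^{\alpha_k + \beta' x_{2k}}}
= \frac{e^{\beta' x_{1k}}}{e^{\beta' x_{1k}} + e^{\beta' x_{2k}}}
```

The factor e^{\alpha_k} appears in every term and cancels, which is why the stratum intercepts never need to be estimated.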

  11. Likelihood function for CLR • Now, take the product over all the strata for the full likelihood • This is the likelihood for the matched case-control design • Notice: • there are no strata-specific parameters • cases are defined by subscript ‘1’ and controls by subscript ‘2’ • Theory for 1-to-M follows similarly (but not shown here)
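A minimal sketch of the 1-to-1 conditional likelihood, assuming a single covariate and the standard stratum contribution exp(βx_case) / (exp(βx_case) + exp(βx_control)). The function names `cond_loglik` and `fit_clr` and the toy data are hypothetical; this is a plain Newton iteration for illustration, not the routine any statistical package uses.

```python
import math

def cond_loglik(beta, pairs):
    """Conditional log-likelihood for 1-to-1 matched pairs.

    pairs: list of (x_case, x_control); each pair is one stratum.
    """
    total = 0.0
    for x_case, x_control in pairs:
        d = x_case - x_control
        # log[e^{b*x1} / (e^{b*x1} + e^{b*x2})] = -log(1 + e^{-b*d})
        total += -math.log1p(math.exp(-beta * d))
    return total

def fit_clr(pairs, tol=1e-10, max_iter=50):
    """Maximize the conditional log-likelihood by Newton's method."""
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log-likelihood
        info = 0.0   # negative second derivative
        for x_case, x_control in pairs:
            d = x_case - x_control
            p = 1.0 / (1.0 + math.exp(-beta * d))  # stratum contribution
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        if info == 0.0:
            raise ValueError("no informative (discordant) pairs")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data (made up): binary exposure, 20 matched pairs.
pairs = [(1, 0)] * 12 + [(0, 1)] * 3 + [(1, 1)] * 3 + [(0, 0)] * 2
beta_hat = fit_clr(pairs)
print("beta:", beta_hat, "OR:", math.exp(beta_hat))
```

With a binary exposure, only discordant pairs carry information: concordant pairs contribute log(1/2) whatever β is. In the toy data the estimated odds ratio should therefore equal the ratio of discordant pairs, 12/3 = 4, and no stratum intercept appears anywhere in the code.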

  12. Interpretation of β • Same as in ‘standard’ logistic regression • β is the log odds ratio for disease associated with a one-unit difference in x

  13. When to use matched vs. unmatched? • Some papers use both for a matched design • Tradeoffs: • bias • precision • Sometimes matched design to ensure balance, but then unmatched analysis • They WILL give you different answers • Gillison paper

  14. Another approach to matched data • use random effects models • CLR is elegant and simple • can identify the estimates using a ‘transformation’ of logistic regression results • But, with new age of computing, we have other approaches • Random effects models: • allow strata specific intercepts • not problematic estimation process • additional assumptions: intercepts follow normal distribution • Will NOT give identical results
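The slide does not write out the random effects model. One common formulation, assuming a random intercept \alpha_k per stratum with mean \alpha and variance \sigma^2 (symbols chosen here for illustration), is:

```latex
\operatorname{logit}\, p_{ik} = \alpha_k + \beta' x_{ik},
\qquad \alpha_k \sim N(\alpha, \sigma^2)
```

Here only \alpha, \sigma^2 and \beta are estimated rather than one intercept per stratum, at the cost of the normality assumption listed in the slide.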

  15. . xi: clogit control hpv16ser, group(strata) or

      Iteration 0:   log likelihood = -72.072957
      Iteration 1:   log likelihood = -71.803221
      Iteration 2:   log likelihood = -71.798737
      Iteration 3:   log likelihood = -71.798736

      Conditional (fixed-effects) logistic regression   Number of obs   =        300
                                                        LR chi2(1)      =      76.12
                                                        Prob > chi2     =     0.0000
      Log likelihood = -71.798736                       Pseudo R2       =     0.3465

      ------------------------------------------------------------------------------
           control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          hpv16ser |   13.16616   4.988492     6.80   0.000      6.26541    27.66742
      ------------------------------------------------------------------------------
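To connect the clogit output above to the interpretation of β, the following sketch recovers the z statistic and the confidence interval from the reported odds ratio and its standard error. It assumes the reported standard error is on the odds-ratio scale (delta method, SE(OR) = OR × SE(β)) and that the interval is a Wald interval built on the log scale; the variable names are illustrative.

```python
import math

odds_ratio = 13.16616     # hpv16ser odds ratio from the clogit output
se_odds_ratio = 4.988492  # reported Std. Err. (odds-ratio scale)

log_or = math.log(odds_ratio)           # beta, the log odds ratio
se_log_or = se_odds_ratio / odds_ratio  # delta method: SE(beta) = SE(OR) / OR
z = log_or / se_log_or                  # Wald z statistic

z_crit = 1.959964                       # 97.5th percentile of the standard normal
lower = math.exp(log_or - z_crit * se_log_or)
upper = math.exp(log_or + z_crit * se_log_or)

print(f"beta = {log_or:.4f}, SE = {se_log_or:.4f}, z = {z:.2f}")
print(f"95% CI for OR: ({lower:.3f}, {upper:.3f})")
```

The interval is asymmetric around the odds ratio because it is symmetric on the log scale and then exponentiated.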

  16. . xi: logistic control hpv16ser

      Logistic regression                               Number of obs   =        300
                                                        LR chi2(1)      =      90.21
                                                        Prob > chi2     =     0.0000
      Log likelihood =  -145.8514                       Pseudo R2       =     0.2362

      ------------------------------------------------------------------------------
           control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          hpv16ser |    17.6113   6.039532     8.36   0.000     8.992582     34.4904
      ------------------------------------------------------------------------------

  17. . xi: gllamm control hpv16ser, i(strata) family(binomial)

      number of level 1 units = 300
      number of level 2 units = 100

      Condition Number = 2.4968508

      gllamm model

      log likelihood = -145.8514

      ------------------------------------------------------------------------------
           control |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          hpv16ser |   2.868541   .3429353     8.36   0.000       2.1964    3.540681
             _cons |  -1.464547   .1692104    -8.66   0.000    -1.796193     -1.1329
      ------------------------------------------------------------------------------

      Variances and covariances of random effects
      ------------------------------------------------------------------------------
      ***level 2 (strata)

          var(1): 4.210e-21 (2.231e-11)
      ------------------------------------------------------------------------------

      OR = exp(2.868541) = 17.61
