Design of Statistical Investigations
14 Case Control Studies
Stephen Senn
SJS SDI_14
Case-Control Study: Definition
“The observational epidemiologic study of persons with the disease (or other outcome variable) of interest and a suitable control (comparison, reference) group of persons without the disease. The relationship of an attribute to the disease is examined by comparing the diseased and nondiseased groups with regard to how frequently the attribute is present or, if quantitative, the levels of the attribute in each group. In short, the past history of exposure to a suspected risk factor is compared between “cases” and “controls”, persons who resemble the cases in such respects as age and sex but do not have the disease or condition of interest.”
Last, J.M. A Dictionary of Epidemiology
Schematic Representation of Cohort Study
Each point represents a member of the cohort of 10,000 persons.
200 cases and 200 controls are sampled from the diseased and healthy persons respectively.
The numbers of cases and controls are fixed in advance by the sampling. Exposure becomes the random variable and is studied as a function of disease status. Note that the axes have been exchanged to reflect this.
Smoking and Lung Cancer (Obs_7)
• Famous study of Hill and Doll
• Sampled 1357 cases of lung cancer from four hospitals in the United Kingdom
• Sampled 1357 hospital-based controls
• Compared the two groups as regards smoking history
Doll and Hill Data (Obs_7)
            Smoker   Non-smoker   Total
Cases         1350            7    1357
Controls      1296           61    1357
In General
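A conventional way to write the general 2×2 table and its odds ratio is sketched below. The cell labels a, b, c and d are an assumption made for illustration and are not necessarily the lecture's own notation.

```latex
% Requires amsmath for \text. Cell labels a, b, c, d are assumed, not from the source.
\[
\begin{array}{l|cc}
                & \text{Exposed} & \text{Unexposed} \\ \hline
\text{Cases}    & a              & b                \\
\text{Controls} & c              & d
\end{array}
\qquad
\widehat{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}
\]
```

With the Doll and Hill frequencies (a = 1350, b = 7, c = 1296, d = 61) this gives 1350 × 61 / (7 × 1296) ≈ 9.08.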
A Model for Case-Control Studies
• Number exposed
• Number unexposed
• Probability case if exposed
• Probability case if unexposed
• Probability recorded if case
• Probability recorded if control
Expectations etc.
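A sketch of the expected cell counts under this model follows. It uses nE and nU for the numbers exposed and unexposed and l and q for the probabilities of being recorded as a case and as a control (the case/control assignment of l and q is assumed from the order in which the quantities are listed). The symbols p_E and p_U for the probability of being a case if exposed or unexposed are hypothetical, introduced here for illustration.

```latex
% Requires amsmath for \text. p_E and p_U are hypothetical symbols.
\[
\begin{array}{l|cc}
                         & \text{Exposed}     & \text{Unexposed}   \\ \hline
\text{Cases recorded}    & l\,n_E\,p_E        & l\,n_U\,p_U        \\
\text{Controls recorded} & q\,n_E\,(1 - p_E)  & q\,n_U\,(1 - p_U)
\end{array}
\]
\[
\frac{\bigl[l\,n_E\,p_E\bigr]\,\bigl[q\,n_U\,(1 - p_U)\bigr]}
     {\bigl[l\,n_U\,p_U\bigr]\,\bigl[q\,n_E\,(1 - p_E)\bigr]}
  = \frac{p_E/(1 - p_E)}{p_U/(1 - p_U)}
\]
```

The factors nE, nU, l and q cancel from the cross-product ratio of the expected counts, leaving the odds ratio of disease for exposed versus unexposed persons.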
Notes
• Thus the odds ratio can be estimated even though nE, nU, l and q are unknown.
• However, although l and q need not be equal, they must not vary with exposure.
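A worked illustration with made-up numbers (none of them from the lecture) may help. Suppose nE = 4000 and nU = 6000, the probability of being a case is 0.05 if exposed and 0.02 if unexposed, and cases and controls are recorded with probabilities l = 0.5 and q = 0.02.

```latex
% All numbers are hypothetical and chosen only for illustration.
\[
\text{True odds ratio: }
\frac{0.05/0.95}{0.02/0.98} = \frac{0.049}{0.019} \approx 2.58
\]
\[
\begin{array}{l|cc}
\text{Expected counts} & \text{Exposed} & \text{Unexposed} \\ \hline
\text{Cases}    & 0.5 \times 4000 \times 0.05 = 100 & 0.5 \times 6000 \times 0.02 = 60 \\
\text{Controls} & 0.02 \times 4000 \times 0.95 = 76 & 0.02 \times 6000 \times 0.98 = 117.6
\end{array}
\]
\[
\frac{100 \times 117.6}{60 \times 76} = \frac{11760}{4560} \approx 2.58
\]
% Now let recording vary with exposure: exposed controls recorded with probability 0.04
\[
\frac{100 \times 117.6}{60 \times 152} = \frac{11760}{9120} \approx 1.29
\]
```

The unequal recording probabilities for cases and controls leave the odds ratio untouched, but doubling the recording probability for exposed controls only (0.04 instead of 0.02) halves the apparent odds ratio.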
Sources for Controls (Rothman)
• Population
  - using a population register
• Neighbourhood
  - for example, one or two controls from the neighbourhood of each case
  - not suitable for environmental exposures
• Random digit dialing
• Hospitals or clinics
Cohort and Case Control Studies
Cohort                                  | Case Control
Complete population                     | Sampled population
Can calculate incidence rates           | Can calculate ratios only
Usually expensive                       | Usually less expensive
Convenient for studying many diseases   | Convenient for studying many exposures
Can be prospective or retrospective     | Can be prospective or retrospective
Rothman p 91
The “Delta” Method
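The standard first-order form of the delta method is sketched below, assuming a random variable X with mean μ and variance σ² and a differentiable function f. These symbols are chosen here and may differ from the lecture's notation.

```latex
% First-order Taylor expansion of f about the mean of X.
\[
f(X) \approx f(\mu) + (X - \mu)\, f'(\mu)
\quad\Longrightarrow\quad
\mathrm{var}\bigl[f(X)\bigr] \approx \bigl[f'(\mu)\bigr]^{2} \sigma^{2}
\]
```

The approximation is useful when X is concentrated near μ, so that the linear term dominates.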
Variance of a Logit
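Applying the delta method, var[f(X)] ≈ [f'(μ)]² var(X), to a logit gives the familiar result. The sketch assumes a binomial count X out of n with probability p, symbols chosen here for illustration rather than taken from the slides.

```latex
% X ~ Binomial(n, p), so E[X] = np and var(X) = np(1 - p).
\[
f(X) = \log\frac{X}{n - X},
\qquad
f'(x) = \frac{1}{x} + \frac{1}{n - x} = \frac{n}{x\,(n - x)},
\qquad
f'(np) = \frac{1}{n p (1 - p)}
\]
\[
\mathrm{var}\bigl[f(X)\bigr]
  \approx \left[\frac{1}{n p (1 - p)}\right]^{2} n p (1 - p)
  = \frac{1}{n p (1 - p)}
  = \frac{1}{n p} + \frac{1}{n (1 - p)}
\]
```

Replacing np and n(1 − p) by the observed counts x and n − x gives the estimate 1/x + 1/(n − x), the sum of the reciprocals of the two cell frequencies.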
Variance of the log-odds ratio
The log-odds ratio is the difference between two logits. Since these are independent, the variance of their difference is the sum of their variances. Thus, in terms of our previous table, the estimated variance is the sum of the reciprocals of the four cell frequencies.
Note the implication of the variance formula: the variance cannot be reduced below the reciprocal of the entry in any given cell by increasing the frequencies of the other cells.
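As a worked check, the four Doll and Hill cell frequencies (1350, 7, 1296 and 61) can be substituted into the sum of reciprocals. The second line is an illustration added here of the limiting case in which the two control cells grow without bound.

```latex
% Cell frequencies taken from the Doll and Hill data.
\[
\widehat{\mathrm{var}}\bigl(\log \widehat{OR}\bigr)
  = \frac{1}{1350} + \frac{1}{7} + \frac{1}{1296} + \frac{1}{61}
  \approx 0.00074 + 0.14286 + 0.00077 + 0.01639
  = 0.16076
\]
% Limit as the two control cells become arbitrarily large:
\[
\frac{1}{1350} + \frac{1}{7} \approx 0.1436 \;>\; \frac{1}{7} \approx 0.1429
\]
```

The seven non-smoking cases dominate the variance: however many controls are added, the variance stays above 1/7, so the standard error of the log-odds ratio stays above about 0.378.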
S-Plus Analysis (Obs_7)
# Doll and Hill
options(contrasts=c("contr.treatment", "contr.poly")) # set contrast options
# To analyse the famous case-control study
Outcome<-factor(c("case","case","control","control"))
Exposure<-factor(rep(c("smoker","non-smoker"),2))
Freq<-c(1350,7,1296,61)
Doll.Hill<-data.frame(Outcome, Exposure, Freq)
Doll.Hill
OR<-Freq[1]*Freq[4]/(Freq[2]*Freq[3]) # odds ratio (cross-product ratio)
l.OR<-log(OR) # log-odds ratio
var<-(1/Freq[1]+1/Freq[2]+1/Freq[3]+1/Freq[4]) # variance of the log-odds ratio
SE<-sqrt(var)
t<-l.OR/SE # test statistic
LCL<-exp(l.OR-1.96*SE) # lower 95% confidence limit for the odds ratio
UCL<-exp(l.OR+1.96*SE) # upper 95% confidence limit for the odds ratio
results.1<-data.frame(l.OR,var,SE,t,LCL,OR,UCL)
results.1
# Fit results using a log-linear model
fit1<-glm(Freq~Exposure*Outcome,family=poisson)
summary(fit1,cor=F)
# Prepare data to perform logistic regression
# Row 1 holds the cases and row 2 the controls; Y counts the smokers among the N persons in each row
Y<-c(Freq[1],Freq[3])
N<-c(Freq[1]+Freq[2],Freq[3]+Freq[4])
# Note: the labels below are attached to the case row and the control row respectively
Exposure2<-factor(c("Smoker","Non-smoker"))
P<-Y/N
DollHill.2<-data.frame(Y,N,P,Exposure2)
DollHill.2
# Logistic regression
fit2<-glm(P~Exposure2,family=binomial,weights=N)
summary(fit2,cor=F)
> Doll.Hill
  Outcome   Exposure Freq
1    case     smoker 1350
2    case non-smoker    7
3 control     smoker 1296
4 control non-smoker   61
> results.1
      l.OR       var        SE        t      LCL       OR      UCL
1 2.205786 0.1607629 0.4009525 5.501364 4.136784 9.077381 19.91857
Call: glm(formula = Freq ~ Exposure * Outcome, family = poisson)
Coefficients:
                     Value Std. Error   t value
(Intercept)       1.945910  0.3779645  5.148394
Exposure          5.261950  0.3789431 13.885857
Outcome           2.164964  0.3990621  5.425129
Exposure:Outcome -2.205786  0.4009525 -5.501364
> DollHill.2
     Y    N         P  Exposure2
1 1350 1357 0.9948416     Smoker
2 1296 1357 0.9550479 Non-smoker
Call: glm(formula = P ~ Exposure2, family = binomial, weights = N)
Coefficients:
               Value Std. Error   t value
(Intercept) 3.056164  0.1310154 23.326746
Exposure2   2.205786  0.4009483  5.501422
(Dispersion Parameter for Binomial family taken to be 1 )
Questions
• Why did Hill and Doll choose a case-control study rather than a cohort study?
• We now believe that the choice of controls used in the Hill and Doll study led to an underestimate of the odds ratio for lung cancer and smoking. Why?
• Consider the recent controversy over breast implants and connective tissue disease. What difficulty does press coverage cause for any case-control study in this field?
• Why do epidemiologists rarely use more than three controls per case?