1 / 18

Lecture 7 GLMs II Binomial Family

Learn about GLMs for binary & proportion data using the Binomial family, with hands-on examples in R analyzing dose-response toxicity data of moths. Understand model building, interpretation, and comparisons.

nmicheal
Download Presentation

Lecture 7 GLMs II Binomial Family

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Research Skills Lecture 7 GLMs IIBinomial Family Olivier MISSA, om502@york.ac.uk

  2. Outline • Continue our Introduction to Generalized Linear Models. • In this lecture: Illustrate the use of GLMs for proportion and binary data.

  3. Reminder • Binary & Proportion data tend to follow the Binomial distribution • The Canonical link of this glm family is the logit function: • The variance reaches a maximum for intermediate values of p and a minimum at either 0% or 100%.

  4. Three ways to work with binary data • In R, binary/proportion data can be entered into a model as a response in three different ways: • as a numeric vector • (holding the number or proportion of successes) • as a logical vector or a factor • (TRUE or the first factor level will be considered successes). • as a two-column matrix(the first column holding the number of successes and the second column the number of failures).

  5. Dose (micrograms) Sex 1 2 4 8 16 32 Male 1 4 9 13 18 20 Female 0 2 6 10 12 16 Number of dead moths out of 20 tested 1st Example • Toxicity to tobacco budworm (moth) of different doses of trans-cypermethrin. • Batches of 20 moths (of each sex) were put in contact for three days with increasing doses of the pyrethroid. > (dose <- rep(2^(0:5), 2)) [1] 1 2 4 8 16 32 1 2 4 8 16 32 > numdead <- c(1,4,9,13,18,20,0,2,6,10,12,16) > (sex <- factor( rep( c("M","F"), c(6,6) ) )) [1] M M M M M M F F F F F F Levels: F M > SF <- cbind(numdead, numalive=20-numdead)

  6. 1st Example What is modelled is the proportion of successes > modb <- glm(SF ~ sex*dose, family=binomial) > summary(modb) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.71578 0.32233 -5.323 1.02e-07 *** sexM -0.21194 0.51523 -0.411 0.68082 dose 0.11568 0.02379 4.863 1.16e-06 *** sexM:dose 0.18156 0.06692 2.713 0.00666 ** --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 124.876 on 11 degrees of freedom Residual deviance: 18.164 on 8 degrees of freedom AIC: 56.275

  7. 1st Example > ldose <- log2(dose) > modb2 <- glm(SF ~ sex*ldose, family=binomial) > summary(modb2) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.9935 0.5527 -5.416 6.09e-08 *** sexM 0.1750 0.7783 0.225 0.822 ldose 0.9060 0.1671 5.422 5.89e-08 *** sexM:ldose 0.3529 0.2700 1.307 0.191 --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 124.8756 on 11 degrees of freedom Residual deviance: 4.9937 on 8 degrees of freedom AIC: 43.104

  8. 1st Example > drop1(modb2, test="Chisq") Single term deletions Model: SF ~ sex * ldose Df Deviance AIC LRT Pr(Chi) <none> 4.994 43.104 sex:ldose 1 6.757 42.867 1.763 0.1842 > modb3 <- update(modb2, ~. – sex:ldose) > summary(modb3) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.4732 0.4685 -7.413 1.23e-13 *** sexM 1.1007 0.3558 3.093 0.00198 ** ldose 1.0642 0.1311 8.119 4.70e-16 *** --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 124.876 on 11 degrees of freedom Residual deviance: 6.757 on 9 degrees of freedom AIC: 42.867

  9. 1st Example > drop1(modb3, test="Chisq") Single term deletions Model: SF ~ sex + ldose Df Deviance AIC LRT Pr(Chi) <none> 6.757 42.867 sex 1 16.984 51.094 10.227 0.001384 ** ldose 1 118.799 152.909 112.042 < 2.2e-16 *** > shapiro.test(residuals(modb3), type="deviance") Shapiro-Wilk normality test data: residuals(modb3, type = "deviance") W = 0.9666, p-value = 0.8725

  10. 1st Example > par(mfrow=c(2,2)) > plot(modb3)

  11. 1st Example > plot( c(0,1) ~ c(1,32), type="n", log="x", xlab="dose", ylab="Probability") > text(dose, numdead/20, labels=as.character(sex) ) > ld <- seq(0,32,0.5) > lines (ld, predict(modb3, data.frame(ldose=log2(ld), sex=factor(rep("M", length(ld)), levels=levels(sex))), type="response") ) > lines (ld, predict(modb3, data.frame(ldose=log2(ld), sex=factor(rep("F", length(ld)), levels=levels(sex))), type="response"), lty=2, col="red" )

  12. 1st Example > modbp <- glm(SF ~ sex*ldose, family=binomial(link="probit")) > AIC(modbp) [1] 41.87836 > modbc <- glm(SF ~ sex*ldose, family=binomial(link="cloglog")) > AIC(modbc) [1] 43.8663 > AIC(modb3) [1] 42.86747

  13. 1st Example > summary(modb3) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.4732 0.4685 -7.413 1.23e-13 *** sexM 1.1007 0.3558 3.093 0.00198 ** ldose 1.0642 0.1311 8.119 4.70e-16 *** --- > exp(modb3$coeff) ## careful it may be misleading (Intercept) sexM ldose 0.031019 3.006400 2.898560 ## odds ration: p / (1-p) > exp(modb3$coeff[1]+modb3$coeff[2]) ## odds for males (Intercept) 0.09325553 logit scale Every doubling of the dose will lead to an increase in the odds of dying over surviving by a factor of 2.899

  14. 2nd Example • Erythrocyte Sedimentation Rate in a group of patients. • Two groups : <20 (healthy) or >20 (ill) mm/hour • Q: Is it related to globulin & fibrinogen level in the blood ? > data("plasma", package="HSAUR") > str(plasma) 'data.frame': 32 obs. of 3 variables: $ fibrinogen: num 2.52 2.56 2.19 2.18 3.41 2.46 3.22 2.21 ... $ globulin : int 38 31 33 31 37 36 38 37 39 41 ... $ ESR : Factor w/ 2 levels "ESR < 20","ESR > 20": 1 1 ... > summary(plasma) fibrinogen globulin ESR Min. :2.090 Min. :28.00 ESR < 20:26 1st Qu.:2.290 1st Qu.:31.75 ESR > 20: 6 Median :2.600 Median :36.00 Mean :2.789 Mean :35.66 3rd Qu.:3.167 3rd Qu.:38.00 Max. :5.060 Max. :46.00

  15. 2nd Example > stripchart(globulin ~ ESR, vertical=T, data=plasma, xlab="Erythrocyte Sedimentation Rate (mm/hr)", ylab="Globulin blood level", method="jitter" ) > stripchart(fibrinogen ~ ESR, vertical=T, data=plasma, xlab="Erythrocyte Sedimentation Rate (mm/hr)", ylab="Fibrinogen blood level", method="jitter" )

  16. 2nd Example factor > mod1 <- glm(ESR~fibrinogen, data=plasma, family=binomial) > summary(mod1) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -6.8451 2.7703 -2.471 0.0135 * fibrinogen 1.8271 0.9009 2.028 0.0425 * --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 30.885 on 31 degrees of freedom Residual deviance: 24.840 on 30 degrees of freedom AIC: 28.840 > mod2 <- glm(ESR~fibrinogen+globulin, data=plasma, family=binomial) > AIC(mod2) [1] 28.97111

  17. 2nd Example > anova(mod1, mod2, test="Chisq") Analysis of Deviance Table Model 1: ESR ~ fibrinogen Model 2: ESR ~ fibrinogen + globulin Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 30 24.8404 2 29 22.9711 1 1.8692 0.1716 > summary(mod2) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -12.7921 5.7963 -2.207 0.0273 * fibrinogen 1.9104 0.9710 1.967 0.0491 * globulin 0.1558 0.1195 1.303 0.1925 --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 30.885 on 31 degrees of freedom Residual deviance: 22.971 on 29 degrees of freedom AIC: 28.971 The difference in terms of Deviance between these models is not significant, which leads us to select the least complex model

  18. 2nd Example > shapiro.test(residuals(mod1, type="deviance")) Shapiro-Wilk normality test data: residuals(mod1, type = "deviance") W = 0.6863, p-value = 5.465e-07 > par(mfrow=c(2,2)) > plot(mod1)

More Related