260 likes | 540 Views
Logistic Regression Homework Solutions. EPP 245/298 Statistical Analysis of Laboratory Data. Exercise 11.1. Predict risk of malaria from age and log transformed antibody level using logistic regression First examine some plots Then fit the logistic regression Interpret the results
E N D
Logistic Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data
Exercise 11.1 • Predict risk of malaria from age and log transformed antibody level using logistic regression • First examine some plots • Then fit the logistic regression • Interpret the results • Check model assumptions EPP 245 Statistical Analysis of Laboratory Data
> library(ISwR) Loading required package: survival Loading required package: splines > data(malaria) > summary(malaria) subject age ab mal Min. : 1.00 Min. : 3.00 Min. : 2.0 Min. :0.00 1st Qu.: 25.75 1st Qu.: 5.75 1st Qu.: 29.0 1st Qu.:0.00 Median : 50.50 Median : 9.00 Median : 111.0 Median :0.00 Mean : 50.50 Mean : 8.86 Mean : 311.5 Mean :0.27 3rd Qu.: 75.25 3rd Qu.:12.00 3rd Qu.: 373.8 3rd Qu.:1.00 Max. :100.00 Max. :15.00 Max. :2066.0 Max. :1.00 > attach(malaria) > plot(age,mal) > plot(log(ab),mal) > plot(age, log(ab)) EPP 245 Statistical Analysis of Laboratory Data
> mal.glm <- glm(mal ~ age+log(ab),binomial) > summary(mal.glm) Call: glm(formula = mal ~ age + log(ab), family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.8492 -0.7536 -0.4838 0.8809 2.5796 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.57234 0.95184 2.702 0.006883 ** age -0.06546 0.06772 -0.967 0.333703 log(ab) -0.68235 0.19552 -3.490 0.000483 *** --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 116.652 on 99 degrees of freedom Residual deviance: 98.017 on 97 degrees of freedom AIC: 104.02 Residual deviance shows no lack of fit EPP 245 Statistical Analysis of Laboratory Data
> summary(glm(mal ~ log(ab),binomial)) Call: glm(formula = mal ~ log(ab), family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.9159 -0.7339 -0.4854 0.8813 2.4722 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.1552 0.8401 2.565 0.010305 * log(ab) -0.7122 0.1932 -3.686 0.000228 *** --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 116.652 on 99 degrees of freedom Residual deviance: 98.968 on 98 degrees of freedom AIC: 102.97 Number of Fisher Scoring iterations: 4 EPP 245 Statistical Analysis of Laboratory Data
Exercise 11.2 • The 'gvhd' data frame has 37 rows and 7 columns. It contains data from patients receiving a nondepleted allogenic bone marrow transplant, with the purpose of finding variables associated with the development of acute graft-versus-host disease. EPP 245 Statistical Analysis of Laboratory Data
pnr a numeric vector. Patient number. • rcpage a numeric vector. Age of recipient (years). • donage a numeric vector. Age of donor (years). • type a numeric vector, type of leukaemia coded 1: AML, 2: ALL, 3: CML for acute myeloid, acute lymphatic, and chronic myeloid leukaemia. • preg a numeric vector code, indicating whether donor has been pregnant. 0: no, 1: yes. • index a numeric vector giving an index of mixed epidermal cell-lymphocyte reactions. • gvhd a numeric vector code, graft versus host disease. 0: no, 1: yes. • time a numeric vector. Follow-up time • dead a numeric vector code 0: no (censored), 1: yes EPP 245 Statistical Analysis of Laboratory Data
> summary(graft.vs.host) pnr rcpage donage type preg Min. : 1 Min. :13.00 Min. :14.00 Min. :1.000 Min. :0.0000 1st Qu.:10 1st Qu.:20.00 1st Qu.:20.00 1st Qu.:1.000 1st Qu.:0.0000 Median :19 Median :23.00 Median :23.00 Median :2.000 Median :0.0000 Mean :19 Mean :25.43 Mean :25.81 Mean :1.973 Mean :0.2703 3rd Qu.:28 3rd Qu.:29.00 3rd Qu.:34.00 3rd Qu.:3.000 3rd Qu.:1.0000 Max. :37 Max. :43.00 Max. :43.00 Max. :3.000 Max. :1.0000 index gvhd time dead Min. : 0.270 Min. :0.0000 Min. : 41.0 Min. :0.0000 1st Qu.: 0.920 1st Qu.:0.0000 1st Qu.: 177.0 1st Qu.:0.0000 Median : 2.010 Median :0.0000 Median : 667.0 Median :0.0000 Mean : 2.556 Mean :0.4595 Mean : 669.8 Mean :0.4865 3rd Qu.: 3.730 3rd Qu.:1.0000 3rd Qu.:1105.0 3rd Qu.:1.0000 Max. :10.110 Max. :1.0000 Max. :1504.0 Max. :1.0000 > attach(graft.vs.host) > hist(index) > hist(sqrt(index)) > hist(log(index)) EPP 245 Statistical Analysis of Laboratory Data
> summary(glm(gvhd ~ rcpage+donage+type+preg+log(index),binomial)) Deviance Residuals: Min 1Q Median 3Q Max -2.0007 -0.3646 -0.0956 0.5796 1.6828 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.47420 2.59647 -2.108 0.0350 * rcpage 0.01597 0.08205 0.195 0.8457 donage 0.10724 0.08262 1.298 0.1943 type2 -0.42824 1.25389 -0.342 0.7327 type3 1.61517 1.30743 1.235 0.2167 preg 1.66236 1.18103 1.408 0.1593 log(index) 1.81874 0.88919 2.045 0.0408 * --- Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 26.252 on 30 degrees of freedom AIC: 40.252 Number of Fisher Scoring iterations: 6 EPP 245 Statistical Analysis of Laboratory Data
> drop1(glm(gvhd ~ rcpage+donage+type+preg+log(index), binomial),test="Chisq") Single term deletions Model: gvhd ~ rcpage + donage + type + preg + log(index) Df Deviance AIC LRT Pr(Chi) <none> 26.252 40.252 rcpage 1 26.291 38.291 0.038 0.84495 donage 1 28.082 40.082 1.829 0.17622 type 2 28.959 38.959 2.706 0.25841 preg 1 28.447 40.447 2.195 0.13849 log(index) 1 32.343 44.343 6.090 0.01359 * EPP 245 Statistical Analysis of Laboratory Data
> drop1(glm(gvhd ~ donage+type+preg+log(index),binomial), test="Chisq") Single term deletions Model: gvhd ~ donage + type + preg + log(index) Df Deviance AIC LRT Pr(Chi) <none> 26.291 38.291 donage 1 28.832 38.832 2.541 0.11092 type 2 29.403 37.403 3.113 0.21092 preg 1 28.812 38.812 2.522 0.11228 log(index) 1 32.630 42.630 6.340 0.01181 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 EPP 245 Statistical Analysis of Laboratory Data
> drop1(glm(gvhd ~ donage+preg+log(index),binomial),test="Chisq") Single term deletions Model: gvhd ~ donage + preg + log(index) Df Deviance AIC LRT Pr(Chi) <none> 29.403 37.403 donage 1 33.654 39.654 4.251 0.0392375 * preg 1 31.068 37.068 1.665 0.1968922 log(index) 1 41.723 47.723 12.320 0.0004482 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 EPP 245 Statistical Analysis of Laboratory Data
> drop1(glm(gvhd ~ donage+log(index),binomial),test="Chisq") Single term deletions Model: gvhd ~ donage + log(index) Df Deviance AIC LRT Pr(Chi) <none> 31.068 37.068 donage 1 37.740 41.740 6.671 0.0097993 ** log(index) 1 45.476 49.476 14.407 0.0001473 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 EPP 245 Statistical Analysis of Laboratory Data
> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 ** donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 31.068 on 34 degrees of freedom AIC: 37.068 > summary(glm(gvhd ~ donage+sqrt(index),binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -8.28968 2.77653 -2.986 0.00283 ** donage 0.14353 0.06218 2.308 0.02100 * sqrt(index) 2.96830 1.08381 2.739 0.00617 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 31.591 on 34 degrees of freedom AIC: 37.591 EPP 245 Statistical Analysis of Laboratory Data
> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 ** donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 31.068 on 34 degrees of freedom AIC: 37.068 > summary(glm(gvhd ~ donage+index,binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -6.08600 2.10453 -2.892 0.00383 ** donage 0.14075 0.06002 2.345 0.01902 * index 0.94439 0.35914 2.630 0.00855 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 32.362 on 34 degrees of freedom AIC: 38.362 EPP 245 Statistical Analysis of Laboratory Data
Other Issues • If we want to analyze time to GVHD, we could use the time and the censoring variable • This is complex because someone who does not have GVHD may develop it later (one type of censoring) and cannot develop it if dead (another type) • We have competing risks • Some of these issues will be addressed next quarter EPP 245 Statistical Analysis of Laboratory Data
Exercise 11.3 • The function confint() gives parameter confidence intervals for nonlinear models that are more accurate than the ones given by default • This uses a profile likelihood technique, which varies the parameter until the change in likelihood/deviance is too big EPP 245 Statistical Analysis of Laboratory Data
> mal.glm <- glm(mal ~ age+log(ab),binomial) > summary(mal.glm) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.57234 0.95184 2.702 0.006883 ** age -0.06546 0.06772 -0.967 0.333703 log(ab) -0.68235 0.19552 -3.490 0.000483 *** -.68235 +- (1.960)(.19552) = (-1.066, -.2991) > library(MASS) > confint(mal.glm) Waiting for profiling to be done... 2.5 % 97.5 % (Intercept) 0.7811646 4.54544887 age -0.2024561 0.06556186 log(ab) -1.0985746 -0.32397509 EPP 245 Statistical Analysis of Laboratory Data
> gvhd.glm <- glm(gvhd ~ donage+log(index),binomial) > summary(gvhd.glm) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 ** donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 **2.17773 +- (1.960)(0.78986) = (0.6296, 3.7259) 0.14594 +- (1.960)(0.06465) = (0.0192, 0.2727) > confint(gvhd.glm) Waiting for profiling to be done... 2.5 % 97.5 % (Intercept) -10.38957922 -1.9553765 donage 0.03262422 0.2946919 log(index) 0.89230108 4.0712645 EPP 245 Statistical Analysis of Laboratory Data