
Introduction to Logistic Regression Analysis



Presentation Transcript


  1. Introduction to Logistic Regression Analysis Dr Tuan V. Nguyen Garvan Institute of Medical Research Sydney, Australia

  2. Introductory example 1 • Gender difference in preference for white wine. A group of 57 men and 167 women were asked to state their preference for a new white wine. The results were as follows:

      Preference   Men   Women
      Like          23      35
      Dislike       34     132
      Total         57     167

  Question: Is there a gender effect on the preference?

  3. Introductory example 2 • Fat concentration and preference. 435 samples of a sauce of various fat concentrations were tasted by consumers. There were two outcomes: like or dislike. The results were as follows:

      Concentration   Like   Dislike
      1.35             13       0
      1.60             19       0
      1.75             67       2
      1.85             45       5
      1.95             71       8
      2.05             50      20
      2.15             35      31
      2.25              7      49
      2.35              1      12

  Question: Is there an effect of fat concentration on the preference?

  4. Consideration … • The question in example 1 can be addressed by a “traditional” analysis such as the z-statistic or Chi-square test (see the sketch below). • The question in example 2 is a bit more difficult to handle, as the factor (fat concentration) was a continuous variable and the outcome was a categorical variable (like or dislike). • However, there is a much better and more systematic method to analyze these data: logistic regression.
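As a quick illustration of the “traditional” route, the question in example 1 can be checked in R with a Chi-square test or a two-sample test of proportions; a minimal sketch using the counts from the table in example 1:

  # 2 x 2 table of counts: rows = gender, columns = preference
  tab <- matrix(c(23, 34,     # men: like, dislike
                  35, 132),   # women: like, dislike
                nrow = 2, byrow = TRUE,
                dimnames = list(c("men", "women"), c("like", "dislike")))
  chisq.test(tab)   # Pearson Chi-square test of independence
  prop.test(tab)    # equivalent comparison of two proportions (z-type test)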

  5. Odds and odds ratio • Let P be the probability of preference; then the odds of preference is: O = P / (1-P) • O(men) = 0.403 / 0.597 = 0.676 • O(women) = 0.209 / 0.791 = 0.265 • Odds ratio: OR = O(men) / O(women) = 0.676 / 0.265 = 2.55 (Meaning: the odds of preference among men are 2.55 times the odds among women)
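These quantities are easy to verify in R; a minimal sketch using the counts from example 1:

  p_men <- 23/57; p_women <- 35/167        # proportions preferring the wine
  odds_men <- p_men / (1 - p_men)          # 0.676
  odds_women <- p_women / (1 - p_women)    # 0.265
  odds_men / odds_women                    # odds ratio = 2.55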

  6. Meanings of odds ratio • OR > 1: the odds of preference are higher in men than in women • OR < 1: the odds of preference are lower in men than in women • OR = 1: the odds of preference in men are the same as in women • How do we assess the “significance” of OR?

  7. Computing variance of odds ratio • The significance of OR can be tested by estimating its variance. • The variance of OR is most easily obtained by working on the logarithmic scale: • Convert OR to log(OR) • Calculate the variance of log(OR) • Calculate the 95% confidence interval of log(OR) • Convert back to the 95% confidence interval of OR

  8. Computing variance of odds ratio
  • OR = (23/34) / (35/132) = 2.55
  • log(OR) = log(2.55) = 0.937
  • Variance of log(OR): V = 1/23 + 1/34 + 1/35 + 1/132 = 0.109
  • Standard error of log(OR): SE = sqrt(0.109) = 0.330
  • 95% confidence interval of log(OR): 0.937 ± 1.96 × 0.330 = 0.289 to 1.584
  • Convert back to the 95% confidence interval of OR: exp(0.289) = 1.33 to exp(1.584) = 4.87
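The same arithmetic in R, for checking the hand calculation; a minimal sketch:

  log_or <- log((23/34) / (35/132))          # 0.937
  se <- sqrt(1/23 + 1/34 + 1/35 + 1/132)     # 0.330
  ci_log <- log_or + c(-1.96, 1.96) * se     # 0.289 to 1.584
  exp(ci_log)                                # 1.33 to 4.87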

  9. Logistic analysis by R
  sex <- c(1, 2)            # 1 = men, 2 = women
  like <- c(23, 35)
  dislike <- c(34, 132)
  total <- like + dislike
  prob <- like/total
  logistic <- glm(prob ~ sex, family = "binomial", weights = total)
  > summary(logistic)
  Coefficients:
              Estimate Std. Error z value Pr(>|z|)
  (Intercept)   0.5457     0.5725   0.953  0.34044
  sex          -0.9366     0.3302  -2.836  0.00456 **
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  (Dispersion parameter for binomial family taken to be 1)

      Null deviance: 7.8676e+00 on 1 degrees of freedom
  Residual deviance: 2.2204e-15 on 0 degrees of freedom
  AIC: 13.629
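Since sex is coded 1 for men and 2 for women, the slope -0.9366 is the change in log-odds going from men to women, and exponentiating it recovers the odds ratio from slide 5:

  exp(coef(logistic)["sex"])       # 0.39 = odds ratio, women vs men
  1 / exp(coef(logistic)["sex"])   # 2.55 = odds ratio, men vs women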

  10. Logistic regression model for continuous factor

  11. Analysis by using R
  conc <- c(1.35, 1.60, 1.75, 1.85, 1.95, 2.05, 2.15, 2.25, 2.35)
  like <- c(13, 19, 67, 45, 71, 50, 35, 7, 1)
  dislike <- c(0, 0, 2, 5, 8, 20, 31, 49, 12)
  total <- like + dislike
  prob <- like/total
  plot(prob ~ conc, pch = 16, xlab = "Concentration")
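To judge visually how well a logistic curve describes these points, the fitted probabilities can be overlaid on the plot; a minimal sketch, using the same model that is fitted formally on slide 13:

  fit <- glm(prob ~ conc, family = "binomial", weights = total)
  xs <- seq(min(conc), max(conc), length.out = 100)
  lines(xs, predict(fit, newdata = data.frame(conc = xs), type = "response"))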

  12. Logistic regression model for continuous factor – model • Let p = probability of preference • The logit of p is: logit(p) = log[p / (1-p)] • Model: logit(p) = a + b(FAT), where a is the intercept and b is the slope; both have to be estimated from the data
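The logit is simply the log of the odds, and its inverse maps any value of a + b(FAT) back to a probability between 0 and 1; a minimal sketch of the two transforms:

  logit <- function(p) log(p / (1 - p))
  inv_logit <- function(x) 1 / (1 + exp(-x))   # the inverse (expit) transform
  inv_logit(logit(0.7))                        # recovers 0.7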

  13. Analysis by using R
  logistic <- glm(prob ~ conc, family = "binomial", weights = total)
  summary(logistic)
  Deviance Residuals:
       Min        1Q    Median        3Q       Max
  -1.78226  -0.69052   0.07981   0.36556   1.36871

  Coefficients:
              Estimate Std. Error z value Pr(>|z|)
  (Intercept)   22.708      2.266  10.021   <2e-16 ***
  conc         -10.662      1.083  -9.849   <2e-16 ***
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  (Dispersion parameter for binomial family taken to be 1)

      Null deviance: 198.7115 on 8 degrees of freedom
  Residual deviance:   8.5568 on 7 degrees of freedom
  AIC: 37.096

  14. Logistic regression model for continuous factor – interpretation • The odds ratio associated with each 0.1 increase in fat concentration was 2.90 (95% CI: 2.34 to 3.59). • Interpretation: each 0.1 increase in fat concentration multiplied the odds of disliking the product by 2.90. Since the 95% confidence interval excludes 1, this association was statistically significant at the p < 0.05 level.
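The reported odds ratio follows directly from the conc coefficient: per 0.1 unit, the log-odds of liking changes by -10.662 × 0.1, and the sign is flipped to express the odds of disliking. A minimal sketch, assuming the fit from the previous slide:

  b  <- coef(summary(logistic))["conc", "Estimate"]      # -10.662
  se <- coef(summary(logistic))["conc", "Std. Error"]    # 1.083
  exp(-0.1 * b)                           # 2.90: odds of disliking per 0.1 increase
  exp(-0.1 * (b + c(1.96, -1.96) * se))   # 95% CI: 2.34 to 3.59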

  15. Multiple logistic regression
  Outcome: fracture (fx: 0 = no, 1 = yes)
  Predictor variables: age, bmi, bmd, ictp, pinp
  Question: Which variables are important for fracture?

   id fx age     bmi   bmd  ictp   pinp
    1  1  79 24.7252 0.818 9.170 37.383
    2  1  89 25.9909 0.871 7.561 24.685
    3  1  70 25.3934 1.358 5.347 40.620
    4  1  88 23.2254 0.714 7.354 56.782
    5  1  85 24.6097 0.748 6.760 58.358
    6  0  68 25.0762 0.935 4.939 67.123
    7  0  70 19.8839 1.040 4.321 26.399
    8  0  69 25.0593 1.002 4.212 47.515
    9  0  74 25.6544 0.987 5.605 26.132
   10  0  79 19.9594 0.863 5.204 60.267
  ...
  137  0  64 38.0762 1.086 5.043 32.835
  138  1  80 23.3887 0.875 4.086 23.837
  139  0  67 25.9455 0.983 4.328 71.334

  16. Multiple logistic regression: R analysis
  setwd("c:/works/stats")
  fracture <- read.table("fracture.txt", header = TRUE, na.strings = ".")
  names(fracture)
  fulldata <- na.omit(fracture)    # keep only complete cases
  attach(fulldata)
  temp <- glm(fx ~ . - id, family = "binomial", data = fulldata)  # exclude the id column
  search <- step(temp)             # stepwise selection by AIC
  summary(search)
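Once step() has settled on a model, the retained coefficients can be reported on the odds-ratio scale; a minimal sketch:

  exp(coef(search))             # odds ratio per one-unit increase in each retained variable
  exp(confint.default(search))  # Wald-type 95% confidence intervals on the OR scale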

  17. Bayesian Model Average (BMA) analysis
  library(BMA)
  xvars <- fulldata[, 3:7]      # age, bmi, bmd, ictp, pinp
  y <- fx
  bma.search <- bic.glm(xvars, y, strict = FALSE, OR = 20, glm.family = "binomial")
  summary(bma.search)
  imageplot.bma(bma.search)

  18. Bayesian Model Average (BMA) analysis
  > summary(bma.search)
  Call:
  Best 5 models (cumulative posterior probability = 0.8836):

             p!=0    EV       SD      model 1  model 2  model 3  model 4  model 5
  Intercept  100   -2.85012  2.8651   -3.920   -1.065   -1.201   -8.257   -0.072
  age         15.3  0.00845  0.0261    .        .        .        0.063    .
  bmi         21.7 -0.02302  0.0541    .        .       -0.116    .       -0.070
  bmd         39.7 -1.34136  1.9762    .       -3.499    .        .       -2.696
  ictp       100.0  0.64575  0.1699    0.606    0.687    0.680    0.554    0.714
  pinp         5.7 -0.00037  0.0041    .        .        .        .        .

  nVar                                  1        2        2        2        3
  BIC                               -525.044 -524.939 -523.625 -522.672 -521.032
  post prob                            0.307    0.291    0.151    0.094    0.041

  19. Bayesian Model Average (BMA) analysis > imageplot.bma(bma.search)

  20. Summary of main points • The logistic regression model is used to analyze the association between a binary outcome and one or more determinants. • The determinants can be binary, categorical, or continuous measurements. • The model is logit(p) = log[p / (1-p)] = a + bX, where X is a factor, and a and b must be estimated from observed data.

  21. Summary of main points • Exp(b) is the odds ratio associated with a one-unit increase in the determinant X. • The logistic regression model can be extended to include many determinants: logit(p) = log[p / (1-p)] = a + b1X1 + b2X2 + b3X3 + …
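As a compact illustration of both points, a sketch with a hypothetical data frame d containing a binary outcome y and determinants x1, x2, x3 (all names are placeholders, not from the slides):

  fit <- glm(y ~ x1 + x2 + x3, family = "binomial", data = d)
  exp(coef(fit))   # exp of each slope = adjusted odds ratio for that determinant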
