Logistic Regression
November 2, 2004
Curtis A. Parvin, Ph.D.
Associate Professor and Director of Informatics and Statistics
Division of Laboratory Medicine
Phone: 454-8699 email: parvin@wustl.edu
Regression
• Relate one or more independent (predictor) variables to a dependent (outcome) variable
• Ordinary linear regression
  - Continuous outcome variable
  - Determine the relationship between a continuous outcome variable and the predictor variable(s)
• Logistic regression
  - Binary outcome variable
  - Determine the relationship between the probability of the outcome occurring and the predictor variable(s)
Example: Relationship between gestational age at birth and whether an infant is breast feeding at time of hospital discharge
Probability, Odds, and the Logit Transform
• Probabilities range between zero and one
• Odds = P/(1-P)
• Odds range between zero and infinity
• Logit = ln(P/(1-P))
• The logit transform ranges between negative infinity and infinity
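The three quantities on this slide can be sketched in a few lines of Python. The function names below are illustrative choices, not part of the original lecture.

```python
import math

def odds_from_prob(p: float) -> float:
    """Odds = P / (1 - P); defined for 0 <= P < 1."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Logit = ln(P / (1 - P)); defined for 0 < P < 1."""
    return math.log(odds_from_prob(p))

def prob_from_logit(z: float) -> float:
    """Inverse of the logit transform: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities below 0.5 give negative logits, above 0.5 positive logits.
for p in (0.1, 0.5, 0.8):
    print(p, round(odds_from_prob(p), 3), round(logit(p), 3))
```

The inverse function is what turns a fitted linear predictor back into a probability.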
Logistic Regression
• Model the logarithm of the odds of an outcome as a linear combination of predictor variables
• Logit = ln(P/(1-P)) = b0 + b1X1 + b2X2 + ...
• Estimate the coefficients b0, b1, b2, ... from a random sample of subjects' data
• Determine which of the predictors are "good"
• Assess model fit
• Use the model to predict future cases
Odds and Odds Ratios
• Odds is the probability of an event occurring divided by the probability of the event not occurring
• An odds ratio is the ratio of the odds for two different groups
• An odds ratio = 1 implies equal risk in the two groups
• Example: the calculated odds ratio for breast feeding at hospital discharge for GA=32 compared to GA=28 is 4.0/0.5 = 8.0
Logistic Regression Coefficients and Odds Ratios
• If ln(P/(1-P)) = b0 + b1X1 + b2X2 + ..., then b1, b2, ... are slope coefficients reflecting rates of change
  - ln(odds(X1+1)) - ln(odds(X1)) = b1
  - ln(odds(X1+1)/odds(X1)) = b1
  - odds(X1+1)/odds(X1) = exp(b1)
• exp(b1) represents the odds ratio associated with a 1 unit increase in X1
• exp(k*b1) = odds ratio for a k unit increase in X1
• Breast feeding example: the odds of breast feeding at hospital discharge increase by a factor of exp(0.577) = 1.78 for each additional week of GA
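As a quick check of the exp(k*b1) rule, here is a minimal Python sketch using the breast feeding coefficient quoted on the slide. The helper name `odds_ratio` and the two-week case are illustrative assumptions.

```python
import math

def odds_ratio(b: float, k: float = 1.0) -> float:
    """Odds ratio for a k-unit increase in a predictor with coefficient b."""
    return math.exp(k * b)

b1 = 0.577  # coefficient for gestational age (per week), from the slide

per_week = odds_ratio(b1)          # one additional week of GA
per_two_weeks = odds_ratio(b1, 2)  # hypothetical two-week increase

print(round(per_week, 2))  # 1.78, as on the slide
print(round(per_two_weeks, 2))
```

Because the model is linear on the log-odds scale, the two-week odds ratio is the one-week odds ratio squared.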
One Binary Outcome and One Binary Predictor
• Case-Control Study

                        Disease
                    Cases   Controls
  Risk      Yes       a        b
  Factor    No        c        d

• Odds Ratio (OR) = (a/c)/(b/d) = (a/b)/(c/d) = ad/bc
Example: CHD and Age (Dichotomized at 55 Years)
• 2x2 table calculation: OR = (21/22)/(6/51) = 8.11
• Logistic regression: ln(odds of CHD) = -0.841 + 2.094 * Age, with Age coded 0/1
• OR = exp(2.094) = 8.11
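With a single binary predictor, the logistic coefficients can be recovered directly from the 2x2 counts. The sketch below is a hedged illustration. The cell layout (21 cases and 6 controls aged 55 or older, 22 cases and 51 controls under 55) is an assumption inferred from the arithmetic on the slide, since the table itself is not reproduced in the text.

```python
import math

# Assumed cell counts, inferred from OR = (21/22)/(6/51) and the intercept -0.841.
cases_old, controls_old = 21, 6        # age >= 55 (Age = 1)
cases_young, controls_young = 22, 51   # age < 55  (Age = 0)

odds_old = cases_old / controls_old        # odds of CHD when Age = 1
odds_young = cases_young / controls_young  # odds of CHD when Age = 0
odds_ratio = odds_old / odds_young

# For one binary predictor the fitted model reproduces the table exactly:
b0 = math.log(odds_young)  # intercept = log odds in the reference group
b1 = math.log(odds_ratio)  # slope = log odds ratio

print(round(odds_ratio, 2), round(b0, 3), round(b1, 3))
```

Under this layout the intercept and slope agree with the -0.841 and 2.094 reported on the slide.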
Multiple Predictor Variables
• The independent variables (predictors, risk factors) can be categorical or continuous
• Example: TDx-FLM II and gestational age as predictors of risk for respiratory distress syndrome (RDS)
  - TDx-FLM II measures mg surfactant/g of albumin in amniotic fluid
Logistic Regression Parameter Estimates
------------------------------------------------------------------------------
         rds |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tdxflm |  -.1136873   .0159786    -7.11   0.000    -.1450048   -.0823699
          ga |  -.2912549   .1129665    -2.58   0.010    -.5126652   -.0698446
       _cons |    12.8149   3.879407     3.30   0.001     5.211399    20.41839
------------------------------------------------------------------------------
ln(P(RDS)/(1-P(RDS))) = 12.81 - 0.114*TDxFLM - 0.291*GA
Odds ratio for a 1 mg/g increase in TDxFLM: exp(-0.114) = 0.89
Odds ratio for a 1 week increase in GA: exp(-0.291) = 0.75
------------------------------------------------------------------------------
         rds | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tdxflm |    .892537   .0142615    -7.11   0.000     .8650182    .9209313
          ga |   .7473252   .0844227    -2.58   0.010     .5988973    .9325387
------------------------------------------------------------------------------
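The odds ratio table is the coefficient table with the estimates and confidence limits exponentiated. A minimal Python sketch of that conversion, using the values from the output above (the dictionary layout is an illustrative choice):

```python
import math

# (coefficient, lower 95% limit, upper 95% limit) copied from the Stata output
estimates = {
    "tdxflm": (-0.1136873, -0.1450048, -0.0823699),
    "ga": (-0.2912549, -0.5126652, -0.0698446),
}

# Exponentiate each value to move from the log-odds scale to the odds ratio scale.
odds_ratios = {
    name: tuple(math.exp(v) for v in values)
    for name, values in estimates.items()
}

for name, (or_, lower, upper) in odds_ratios.items():
    print(f"{name}: OR={or_:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

Both intervals lie entirely below 1, which matches the significant negative coefficients: higher TDx-FLM II values and greater gestational age are each associated with lower odds of RDS.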
Using the Logistic Model to Predict Risk of RDS
• We can use the logistic model equation to:
  - Identify variables that are significant predictors
  - Calculate the absolute risk (probability) of RDS (may give biased estimates)
  - Calculate the relative risk (odds ratio) of RDS
  - Develop a classifier for diagnosing RDS
Absolute Risk of RDS based on TDX FLM II and gestational age (for RDS prevalence of 8.5%)
Odds ratios for RDS relative to a TDX FLM II ratio of 70 mg/g at 37 weeks gestational age
Logistic Regression Predicted Probabilities and Classification with 0.20 cutoff

TDxFLM   GA   RDS   Logistic P   Classify   Result
  75     30    0     .0115517       0         TN
   7     31    1     .9521286       1         TP
  14.8   31    1     .8912354       1         TP
  18.3   31    1     .8462539       1         TP
  27     31    1     .6718219       1         TP
  22     31    0     .7832782       1         FP
  29     31    0     .6198854       1         FP
 135     31    0     .0000095       0         TN
   4     32    1     .9543484       1         TP
  15     32    1     .8568574       1         TP
  16.5   32    1     .8346432       1         TP
  25     32    1     .6575863       1         TP
  44.2   32    1     .1779585       0         FN
  35.5   32    0     .3679177       1         FP
  41     32    0     .2374989       1         FP
  48     32    0     .1232235       0         TN
  49     32    0     .1114575       0         TN
  55.8   32    0     .0547323       0         TN
  59     32    0     .0386864       0         TN
  59     32    0     .0386864       0         TN
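The predicted probabilities and classifications in this table can be approximately reproduced from the fitted coefficients. The sketch below is illustrative only. The function names are hypothetical, and treating a probability exactly equal to the cutoff as positive is an assumption (no row in the table sits on the boundary).

```python
import math

# Coefficients from the RDS logistic regression output
B0, B_TDXFLM, B_GA = 12.8149, -0.1136873, -0.2912549
CUTOFF = 0.20

def predicted_probability(tdxflm: float, ga: float) -> float:
    """Inverse logit of the linear predictor b0 + b1*TDxFLM + b2*GA."""
    z = B0 + B_TDXFLM * tdxflm + B_GA * ga
    return 1.0 / (1.0 + math.exp(-z))

def classify(p: float, cutoff: float = CUTOFF) -> int:
    """1 = predicted RDS; p >= cutoff counts as positive (assumed convention)."""
    return 1 if p >= cutoff else 0

def result_label(predicted: int, actual: int) -> str:
    """True/false positive/negative label for one case."""
    if predicted == 1:
        return "TP" if actual == 1 else "FP"
    return "FN" if actual == 1 else "TN"

# A few (TDxFLM, GA, observed RDS) rows taken from the table
rows = [(75, 30, 0), (7, 31, 1), (44.2, 32, 1), (41, 32, 0)]
for tdxflm, ga, rds in rows:
    p = predicted_probability(tdxflm, ga)
    c = classify(p)
    print(tdxflm, ga, rds, round(p, 4), c, result_label(c, rds))
```

Lowering the cutoff trades false negatives for false positives, which is why a value well below 0.5 may be chosen when missing a case of RDS is the costlier error.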
Other Prediction Methods
• Artificial Neural Networks
  - Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225-31.
• Linear or Quadratic Discriminant Analysis
• Classification and Regression Trees (CART)
• Multivariate Adaptive Regression Splines (MARS)
Other Flavors of Logistic Regression
• Ordinal Logistic Regression
  - More than two ordered groups
• Multinomial Logistic Regression (Polychotomous, Polytomous, Discrete Choice)
  - More than two unordered groups
• Conditional Logistic Regression
  - Matched pairs data (1:1 or 1:M matching)
Software Packages that Perform Logistic Regression
• STATA
• SAS
• SPSS
• Others
References
• Hosmer DW, Lemeshow S. Applied logistic regression, 2nd ed. New York, NY: John Wiley & Sons, 2000.
• Kleinbaum DG. Logistic regression: a self-learning text. New York, NY: Springer-Verlag, 1994.
• Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001;54:979-85.
  (http://www.sciencedirect.com/science/publications/journal)
• Ostir GV, Uchida T. Logistic regression: a nontechnical review. Am J Phys Med Rehabil 2000;79:565-72.
  (pdf file available online through Ovid gateway)
• http://www.ioa.pdx.edu/newsom/pa551/lectur21.htm
• http://personal.ecu.edu/whiteheadj/data/logit/