190 likes | 1.07k Views
Logistic Regression. In logistic regression the outcome variable is binary, and the purpose of the analysis is to assess the effects of multiple explanatory variables, which can be numeric and/or categorical, on the outcome variable. Requirements for Logistic Regression.
E N D
Logistic Regression In logistic regression the outcome variable is binary, and the purpose of the analysis is to assess the effects of multiple explanatory variables, which can be numeric and/or categorical, on the outcome variable.
Requirements for Logistic Regression The Following need to be specified: • An outcome variable with two possible categorical outcomes (1=success; 0=failure). • A way to estimate the probability P of the outcome variable. • A way of linking the outcome variable to the explanatory variables. • A way of estimating the coefficients of the regression equation, as well as their confidence intervals. • A way to test the goodness of fit of the regression model.
Measuring the Probability of Outcome The probability of the outcome is measured by the odds of occurrence of an event. If P is the probability of an event, then (1-P) is the probability of it not occurring. Odds of success = P / 1-P
The Logistic Regression The joint effects of all explanatory variables put together on the odds is Odds = P/1-P = e α+ β1X1 + β2X2 + …+βpXp Taking the logarithms of both sides Log{P/1-P} = log α+β1X1+β2X2+…+βpXp Logit P = α+β1X1+β2X2+..+βpXp The coefficients β1, β2, βp are such that the sums of the squared distance between the observed and predicted values (i.e. regression line) are smallest.
The Logistic Regression Logit p = α+ β1X1 +β2X2 + .. + βpXp α represents the overall disease risk β1 represents the fraction by which the disease risk is altered by a unit change in X1 β2 is the fraction by which the disease risk is altered by a unit change in X2 ……. and so on. What changes is the log odds. The odds themselves are changed by eβ If β= 1.6 the odds are e1.6 = 4.95
Analysis in Logistic Regression - 1 The study to be analysed is about the use of radioisotope thallium while the subject is made to exercise. 100 subjects underwent both thallium exercise and cardiac catheterisation. Some were on propranol. Change in heart rate if more than 85% of maximum, E.C.G. and occurrence of pain during exercise were recorded.
Interpreting the Computer Printout Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant -0.9349 0.5165 -1.81 0.070 Stn 0.03080 0.01482 2.08 0.038 1.03 1.00 1.06 Propran 0.6000 0.4844 1.24 0.215 1.82 0.71 4.71 HrtRate -0.4234 0.4735 -0.89 0.371 0.65 0.26 1.66 IscExr -0.6322 0.6601 -0.96 0.338 0.53 0.15 1.94 Sex -0.2996 0.4780 -0.63 0.531 0.74 0.29 1.89 PnExr 0.6953 0.4009 1.73 0.083 2.00 0.91 4.40 Log-Likelihood = -57.650 Test that all slopes are zero: G = 12.907, DF = 6, P-Value = 0.045 Goodness-of-Fit Tests Method Chi-Square DF P Pearson 60.350 38 0.012 Deviance 66.811 38 0.003 Hosmer-Lemeshow 14.243 6 0.027
Interpreting the Computer Printout - 2 Table of Observed and Expected Frequencies: (See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic) Group Value 1 2 3 4 5 6 7 8 Total 1 Obs 2 1 2 5 5 6 9 4 34 Exp 2.4 4.1 2.9 3.1 3.7 5.0 5.7 7.0 0 Obs 15 18 9 5 5 6 2 6 66 Exp 14.6 14.9 8.1 6.9 6.3 7.0 5.3 3.0 Pairs Number Percent Summary Measures Concordant 1664 74.2% Somers' D 0.51 Discordant 524 23.4% Goodman-Kruskal Gamma 0.52 Ties 56 2.5% Kendall's Tau-a 0.23 Total 2244 100.0%
Regression Diagnostics • In logistic regression Residual = 1− Estimated probability. Residuals for each subject are calculated standardised and plotted against probability. Eight diagnostic plots are available, four dealing with residuals and four with leverage. • These plots are demonstrated in the slides that follow.