
Logistic Regression Analysis

Logistic Regression Analysis. Gerrit Rooks, 30-03-10. This lecture: • Why do we have to know and sometimes use logistic regression? • What is the model? • What is maximum likelihood estimation? • Logistics of logistic regression analysis • Estimate coefficients • Assess model fit


Presentation Transcript


  1. Logistic Regression Analysis. Gerrit Rooks, 30-03-10

  2. This lecture • Why do we have to know and sometimes use logistic regression? • What is the model? • What is maximum likelihood estimation? • Logistics of logistic regression analysis • Estimate coefficients • Assess model fit • Interpret coefficients • Check residuals • An SPSS example

  3. Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD)

  4. A graphic representation of the data

  5. Suppose, as a researcher, I am interested in the relation between age and the probability of CHD

  6. To try to predict the probability of CHD, I can regress CHD on age: pr(CHD|age) = -.54 + .0218107*Age

  7. However, linear regression is not a suitable model for probabilities. pr(CHD|age) = -.54 + .0218107*Age
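
A minimal Python sketch of why the linear model is unsuitable: it plugs a few ages into the fitted line quoted on the slide. The function name `linear_probability` is an illustrative label, not something from the lecture.

```python
def linear_probability(age):
    """Linear probability model from the slide: pr(CHD|age) = -.54 + .0218107*Age."""
    return -0.54 + 0.0218107 * age


for age in (20, 50, 80):
    print(age, round(linear_probability(age), 3))
# Age 20 yields a negative value and age 80 a value above 1,
# neither of which can be a probability.
```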

  8. In this graph, for 8 age groups, I plotted the probability of having heart disease (proportion)

  9. Instead of a linear probability model, I need a non-linear one

  10. Something like this

  11. This is the logistic regression model

  12. Predicted probabilities are always between 0 and 1. The linear predictor is similar to classic regression analysis.
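
Written out in the standard form (with b0 and b1 as the intercept and slope for age, the two parameters the later slides estimate), the logistic regression model for these data is:

```latex
\Pr(\mathrm{CHD} \mid \mathrm{age}) = \frac{1}{1 + e^{-(b_0 + b_1 \cdot \mathrm{age})}}
```

Because the exponential term is always positive, the right-hand side lies strictly between 0 and 1 for any age, while the exponent itself is an ordinary linear expression in age.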

  13. Logistics of logistic regression • How do we estimate the coefficients? • How do we assess model fit? • How do we interpret coefficients? • How do we check regression assumptions?

  14. Logistics of logistic regression • How do we estimate the coefficients? • How do we assess model fit? • How do we interpret coefficients? • How do we check regression assumptions?

  15. Maximum likelihood estimation • The method of maximum likelihood yields values for the unknown parameters which maximize the probability of obtaining the observed set of data. The unknown parameters here are b0 and b1.

  16. Maximum likelihood estimation • First we have to construct the likelihood function (the probability of obtaining the observed set of data). Likelihood = pr(obs1)*pr(obs2)*pr(obs3)*…*pr(obsn), assuming that observations are independent

  17. The likelihood function (for the CHD data). Given that we have 100 observations, I summarize the function
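
A standard way to write this likelihood for a binary outcome, assuming y_i = 1 when person i has CHD (0 otherwise) and p_i is the model's predicted probability for that person, is:

```latex
L(b_0, b_1) = \prod_{i=1}^{100} p_i^{\,y_i}\,(1 - p_i)^{1 - y_i},
\qquad
p_i = \frac{1}{1 + e^{-(b_0 + b_1 \cdot \mathrm{age}_i)}}
```

Each person contributes p_i if they have CHD and 1 - p_i if they do not, which is the product pr(obs1)*pr(obs2)*…*pr(obsn) in compact form.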

  18. Log-likelihood • For technical reasons the likelihood is transformed into the log-likelihood: LL = ln[pr(obs1)] + ln[pr(obs2)] + ln[pr(obs3)] + … + ln[pr(obsn)]
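
The relation between the likelihood and the log-likelihood can be sketched in Python. The five observations and the values of b0 and b1 below are hypothetical, chosen only for illustration; they are not the CHD data or the SPSS estimates.

```python
import math


def predicted_probability(b0, b1, age):
    """Logistic model: probability of CHD at a given age."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


def likelihood(b0, b1, ages, chd):
    """Product of the probabilities of each observed outcome."""
    result = 1.0
    for age, y in zip(ages, chd):
        p = predicted_probability(b0, b1, age)
        result *= p if y == 1 else 1.0 - p
    return result


def log_likelihood(b0, b1, ages, chd):
    """Sum of the log-probabilities of each observed outcome."""
    total = 0.0
    for age, y in zip(ages, chd):
        p = predicted_probability(b0, b1, age)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


# Hypothetical toy data (not the lecture's 100 observations).
ages = [25, 35, 45, 55, 65]
chd = [0, 0, 1, 0, 1]
b0, b1 = -5.0, 0.1  # illustrative parameter values

print(likelihood(b0, b1, ages, chd))
print(log_likelihood(b0, b1, ages, chd))
```

One of the technical reasons is numerical: a product of many probabilities quickly becomes a tiny number, whereas the sum of their logarithms stays on a manageable scale and has its maximum at the same parameter values.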

  19. The likelihood function (for the CHD data). A clever algorithm gives us values for the parameters b0 and b1 that maximize the likelihood of this data

  20. Estimation of coefficients: SPSS Results

  21. This function fits very well; other values of b0 and b1 give worse results

  22. Illustration 1: suppose we chose .05X instead of .11X

  23. Illustration 2: suppose we chose .40X instead of .11X
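
The estimation idea from slides 19 to 23 can be sketched as follows. The lecture does not say which algorithm SPSS uses, so this sketch swaps in Newton-Raphson, a common choice for logistic regression. The data are hypothetical toy values, so the fitted slope will not equal the .11 from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson for a one-predictor logistic regression."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p           # gradient of LL with respect to b0
            g1 += (y - p) * x     # gradient of LL with respect to b1
            h00 += w              # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data (not the CHD data from the lecture).
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 30, 45, 60]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

b0_hat, b1_hat = fit_logistic(ages, chd)
print("estimates:", b0_hat, b1_hat)
print("LL at estimates:  ", log_likelihood(b0_hat, b1_hat, ages, chd))
print("LL, slope halved: ", log_likelihood(b0_hat, b1_hat * 0.5, ages, chd))
print("LL, slope tripled:", log_likelihood(b0_hat, b1_hat * 3.0, ages, chd))
```

As in the two illustrations, moving the slope away from its estimate in either direction lowers the log-likelihood.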

  24. Logistics of logistic regression • Estimate the coefficients • Assess model fit • Interpret coefficients • Check regression assumptions

  25. Logistics of logistic regression • Estimate the coefficients • Assess model fit • Between-model comparisons • Pseudo R2 (similar to multiple regression) • Predictive accuracy • Interpret coefficients • Check regression assumptions

  26. Model fit: between-model comparison • The log-likelihood ratio test statistic can be used to test the fit of a model: chi-square = 2[LL(full model) - LL(reduced model)] • The test statistic has a chi-square distribution

  27. Between-model comparisons: likelihood ratio test (full model vs. reduced model) • The model including only an intercept is often called the empty model. SPSS uses this model as a default.

  28. Between-model comparisons: the test can be used for individual coefficients (full model vs. reduced model)

  29. Between-model comparison: SPSS output • This is the test statistic and its associated significance • 29.31 = -107.35 - 2LL(baseline), so -2LL(baseline) = 136.66
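
The arithmetic behind this test statistic can be checked with a short Python sketch. The two -2LL values come from the slide; the 1 degree of freedom is an assumption that follows from the full model adding a single predictor (age) to the intercept-only model.

```python
import math

neg2ll_baseline = 136.66  # -2LL of the intercept-only (empty) model
neg2ll_model = 107.35     # -2LL of the model that includes age

# Likelihood ratio statistic: the drop in -2LL when age is added.
chi_square = neg2ll_baseline - neg2ll_model

# For a chi-square distribution with 1 degree of freedom,
# P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2.0))

print(round(chi_square, 2))
print(p_value)
```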

  30. Overall model fit: pseudo R2 • Just like in multiple regression, pseudo R2 ranges from 0.0 to 1.0 • Cox and Snell cannot theoretically reach 1 • Nagelkerke is adjusted so that it can reach 1 • The formulas use the log-likelihood of the model that you want to test and the log-likelihood of the model before any predictors were entered • NOTE: R2 in logistic regression tends to be (even) smaller than in multiple regression
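
A sketch of the two pseudo R2 measures using their standard textbook formulas. The -2LL values are taken from the SPSS output slide (136.66 for the baseline, 107.35 for the model) and n = 100 from the data description; the variable names are illustrative.

```python
import math

n = 100                   # number of observations
neg2ll_baseline = 136.66  # -2LL before any predictors were entered
neg2ll_model = 107.35     # -2LL of the model that you want to test

# Cox and Snell: 1 - (L_baseline / L_model)^(2/n)
cox_snell = 1.0 - math.exp(-(neg2ll_baseline - neg2ll_model) / n)

# The largest value Cox and Snell can take for these data (below 1).
max_cox_snell = 1.0 - math.exp(-neg2ll_baseline / n)

# Nagelkerke rescales Cox and Snell so that the maximum becomes 1.
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Dividing by the maximum is what lets Nagelkerke's measure reach 1, and it is why Nagelkerke is always at least as large as Cox and Snell.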

  31. Overall model fit: classification table • We correctly predict 74% of our observations

  32. Overall model fit: classification table • 14 cases had CHD while according to our model this shouldn't have happened.

  33. Overall model fit: classification table • 12 cases didn't have CHD while according to our model this should have happened.
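
The three classification-table slides fit together arithmetically, as this small sketch shows. The counts are from the slides; the variable names are illustrative.

```python
n_cases = 100      # observations in the CHD data
missed_chd = 14    # had CHD, but the model predicted no CHD
false_alarms = 12  # had no CHD, but the model predicted CHD

correct = n_cases - missed_chd - false_alarms
accuracy = correct / n_cases

print(correct, accuracy)
```

The two kinds of error add up to 26 cases, which leaves the 74% correctly predicted.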

  34. Logistics of logistic regression • Estimate the coefficients • Assess model fit • Interpret coefficients • Check regression assumptions

  35. Logistics of logistic regression • Estimate the coefficients • Assess model fit • Interpret coefficients • Direction • Significance • Magnitude • Check regression assumptions

  36. Interpreting coefficients: direction • We can rewrite our LRM as follows: into:

  37. Interpreting coefficients: direction • The original b reflects changes in the logit: b > 0 -> positive relationship • The exponentiated b reflects changes in the odds: exp(b) > 1 -> positive relationship
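
The rewriting referred to here is presumably the standard move from the probability form of the model to its logit and odds forms. With p = pr(CHD|age) and X = age, a sketch in the usual notation:

```latex
\ln\!\left(\frac{p}{1 - p}\right) = b_0 + b_1 X
\quad\Longleftrightarrow\quad
\frac{p}{1 - p} = e^{b_0 + b_1 X} = e^{b_0}\left(e^{b_1}\right)^{X}
```

A one-unit increase in X adds b1 to the logit and multiplies the odds by exp(b1), so b > 0 and exp(b) > 1 describe the same positive relationship.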

  38. Interpreting coefficients: direction • We can rewrite our LRM as follows: into:

  39. Interpreting coefficients: direction • The original b reflects changes in the logit: b > 0 -> positive relationship • The exponentiated b reflects changes in the odds: exp(b) > 1 -> positive relationship

  40. Testing significance of coefficients • In linear regression analysis this statistic is used to test significance: t = estimate / standard error of estimate (t-distribution) • In logistic regression something similar exists • However, when b is large, the standard error tends to become inflated, hence the statistic is underestimated (Type II errors are more likely) • Note: this is not the Wald statistic SPSS presents!

  41. Interpreting coefficients: significance SPSS presents While Andy Field thinks SPSS presents this:
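
A hedged sketch of the two versions of the statistic that this slide contrasts. To the best of my knowledge, SPSS's logistic regression output reports the Wald statistic as the squared ratio of the estimate to its standard error, evaluated against a chi-square distribution with 1 degree of freedom, while the unsquared ratio is a z-type statistic. The estimate and standard error below are hypothetical values, not the lecture's SPSS output.

```python
b = 0.111               # hypothetical slope estimate
standard_error = 0.024  # hypothetical standard error of that estimate

z = b / standard_error  # ratio form, analogous to the t statistic
wald = z ** 2           # squared form, compared with chi-square (1 df)

print(round(z, 2), round(wald, 2))
```

Both forms lead to the same conclusion about significance; they differ only in the reference distribution they are compared with.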

  42. Interpreting coefficients: magnitude • The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful. • exp(b) is the effect of the independent variable on the odds, more useful for calculating the size of an effect

  43. Magnitude of association: percentage change in odds = (exponentiated coefficient_i - 1.0) * 100

  44. Magnitude of association • For our age variable: percentage change in odds = (exponentiated coefficient - 1) * 100 = 12% • A one-unit increase in age will result in a 12% increase in the odds that the person will have CHD • So if a soccer player is one year older, the odds that (s)he will have CHD are 12% higher
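
The 12% figure can be reproduced with a short sketch, assuming the slope for age is the .11 used in the earlier ".11X" illustrations (the exact SPSS estimate is not in the transcript).

```python
import math

b_age = 0.11  # slope for age, rounded as in the ".11X" illustrations

odds_ratio = math.exp(b_age)
percent_change = (odds_ratio - 1.0) * 100

print(round(odds_ratio, 2), round(percent_change))
```

The effect is multiplicative rather than additive: ten extra years multiply the odds by exp(10 * 0.11), roughly 3, which is not the same as ten times 12%.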

  45. Another way: calculating predicted probabilities • So, for somebody 20 years old, the predicted probability is .04 • For somebody 70 years old, the predicted probability is .91
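
A sketch of how such predicted probabilities are computed. The slope .11 comes from the earlier illustrations; the intercept is not given in the transcript, so the value -5.4 below is an assumption chosen only so that the results round to the .04 and .91 on the slide.

```python
import math


def predicted_probability(age, b0=-5.4, b1=0.11):
    """Logistic model; b0 is an assumed intercept, b1 the slope for age."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


for age in (20, 70):
    print(age, round(predicted_probability(age), 2))
```

Unlike the odds, the change in probability per year of age is not constant: it is small at the extremes and largest where the probability is near .5.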

  46. Checking assumptions • Influential data points & residuals • Follow Samantha's tips • Hosmer & Lemeshow • Divides sample into subgroups • Checks whether there are differences between observed and predicted between subgroups • Test should not be significant; if so: indication of lack of fit

  47. Hosmer & Lemeshow • Test divides sample into subgroups, checks whether the difference between observed and predicted is about equal in these groups • Test should not be significant (indicating no difference)
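
The grouping logic of the test can be sketched as below. This is a simplified illustration under stated assumptions, not SPSS's implementation: cases are sorted by predicted probability, split into equally sized groups (10 is the conventional choice), and the statistic is compared with a chi-square distribution with groups - 2 degrees of freedom. The function name and the toy data are hypothetical.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return the Hosmer-Lemeshow statistic and its degrees of freedom.

    Assumes no group has a mean predicted probability of exactly 0 or 1.
    """
    pairs = sorted(zip(probs, outcomes))  # order cases by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)  # observed number of events
        expected = sum(p for p, _ in chunk)  # expected number of events
        mean_p = expected / n_g
        statistic += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return statistic, groups - 2


# Hypothetical, perfectly calibrated toy data in two risk groups.
probs = [0.2] * 5 + [0.8] * 5
outcomes = [0, 0, 0, 0, 1] + [1, 1, 1, 1, 0]
print(hosmer_lemeshow(probs, outcomes, groups=2))
```

When observed and expected counts agree in every group, as in this toy example, the statistic is close to zero and the test is far from significant, which is the desired result.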

  48. Examining residuals in LR • Isolate points for which the model fits poorly • Isolate influential data points
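
One common way to isolate poorly fitted points is to compute Pearson (standardized) residuals and flag the large ones. The sketch below makes several assumptions: the coefficients and the five cases are hypothetical, and the cutoff of 2 in absolute value is a rule of thumb rather than something stated in the lecture.

```python
import math


def pearson_residual(y, p):
    """Observed minus predicted, scaled by the binomial standard deviation."""
    return (y - p) / math.sqrt(p * (1.0 - p))


b0, b1 = -5.4, 0.11  # hypothetical coefficients
cases = [(25, 0), (30, 1), (50, 0), (65, 1), (69, 0)]  # (age, CHD), made up

flagged = []
for age, y in cases:
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))
    r = pearson_residual(y, p)
    if abs(r) > 2:  # rule-of-thumb cutoff (assumption)
        flagged.append(age)
    print(age, y, round(p, 3), round(r, 2))

print("poorly fitted ages:", flagged)
```

The flagged cases are a young person with CHD and an older person without it, the observations the model finds most surprising. Influence measures such as Cook's distance require additional quantities and are not covered by this sketch.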
