
Logistic regression


dareh

Presentation Transcript


  1. Logistic regression Who survived Titanic?

  2. The sinking of Titanic • Titanic sank on the night of April 14th to 15th 1912 with 2228 souls on board; 705 survived. • A dataset of 1309 passengers has been preserved. • Who survived?

  3. The data • Sibsp is the number of siblings and/or spouses accompanying the passenger • Parch is the number of parents and/or children accompanying the passenger • Some values are missing • Can we predict who will survive Titanic II?

  4. Analyzing the data in a (too) simple manner • Associations between factors without considering interactions

  5. Analyzing the data in a (too) simple manner • Associations between factors without considering interactions

  6. Analyzing the data in a (too) simple manner • Associations between factors without considering interactions

  7. Analyzing the data in a (too) simple manner • Associations between factors without considering interactions

  8. Analyzing the data in a (too) simple manner • Associations between factors without considering interactions

  9. Could we use multiple linear regression to predict survival?

  10. Logit transformation is modeled linearly • The logistic function
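
The formulas on this slide did not survive in the transcript. As a reminder, the standard textbook form of the relationship is sketched below; the symbols $p$ (probability of survival), $\beta_0$ (intercept), $\beta_i$ (regression coefficients) and $x_i$ (explanatory variables) are assumed notation, not taken from the slide.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
```

The left-hand form is the logit that is modeled linearly; the right-hand form is the logistic function that maps the linear predictor back to a probability between 0 and 1.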

  11. The sigmoidal curve

  12. The sigmoidal curve • The intercept basically just shifts the curve along the input variable

  13. The sigmoidal curve • The intercept basically just shifts the curve along the input variable • Large regression coefficient → risk factor strongly influences the probability

  14. The sigmoidal curve • The intercept basically just shifts the curve along the input variable • Large regression coefficient → risk factor strongly influences the probability • Positive regression coefficient → risk factor increases the probability
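
The three statements about the curve can be checked with a minimal Python sketch for a single input variable. The function names and all numeric values here are hypothetical and chosen only for illustration; none of them come from the Titanic fit.

```python
import math


def logistic(x, intercept, coef):
    """Probability given by the logistic function for one input variable."""
    z = intercept + coef * x
    return 1.0 / (1.0 + math.exp(-z))


def midpoint(intercept, coef):
    """Input value at which the predicted probability is exactly 0.5."""
    return -intercept / coef


# Hypothetical values, for illustration only.
# Changing the intercept moves the point where p = 0.5 along the x axis.
mid_a = midpoint(intercept=0.0, coef=1.0)
mid_b = midpoint(intercept=-2.0, coef=1.0)

# A larger coefficient makes the same change in x matter more.
p_small_coef = logistic(1.0, intercept=0.0, coef=0.5)
p_large_coef = logistic(1.0, intercept=0.0, coef=3.0)

# A positive coefficient raises the probability as x grows,
# a negative coefficient lowers it.
p_positive = logistic(1.0, intercept=0.0, coef=1.0)
p_negative = logistic(1.0, intercept=0.0, coef=-1.0)
```

With these made-up numbers the midpoint moves from 0 to 2 when the intercept changes from 0 to -2, while the shape of the curve is governed by the coefficient.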

  15. Logistic regression of the Titanic data

  16. Logistic regression of the Titanic data • Summary of data • Coding of the dependent variable • Coding of the categorical explanatory variable: • First class: 1 • Second class: 2 • Third class: reference

  17. Logistic regression of the Titanic data • A fit of the null-model, basically just the intercept. Usually not interesting. • The overall probability of survival is 500/1309 = 0.382. The cutoff is 0.5, so all passengers are classified as non-survivors. • Basically tests whether the null-model is sufficient. It almost certainly is not. • Shows that survival is related to pclass (which is not in the null-model).
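
The null-model arithmetic can be reproduced with a short Python sketch. Only the counts 500 and 1309 and the 0.5 cutoff come from the slide; the variable names are made up, and the intercept is computed on the assumption that an intercept-only logistic model reproduces the overall survival proportion.

```python
import math

survivors = 500      # from the slide
passengers = 1309    # from the slide
cutoff = 0.5         # classification cutoff stated on the slide

# Overall probability of survival: the only thing the null-model knows.
p_null = survivors / passengers

# Assumption: with no predictors, the fitted intercept is the log-odds
# of the overall survival proportion.
intercept_null = math.log(p_null / (1.0 - p_null))

# Every passenger gets the same probability, so all receive the same label.
predicted_survivor = p_null >= cutoff

# Share of passengers labelled correctly when everyone is a "non-survivor".
accuracy_null = (passengers - survivors) / passengers
```

Because the shared probability of about 0.382 is below the cutoff, nobody is predicted to survive, and the null-model is right only for the passengers who actually died.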

  18. Logistic regression of the Titanic data • Omnibus test: uses the likelihood ratio (LR) to test whether adding the pclass variable improves the model. It did, but only compared with the null-model, so that is no surprise. • Model summary: other measures of the goodness of fit. • Classification table: by including pclass, 67.7% of passengers were correctly categorized. • Variables in the equation: the first line repeats that pclass has a significant effect on survival. B is the fitted logistic parameter. Exp(B) is the odds ratio, so the odds of survival for first-class passengers are 4.7 (3.6-6.3) times those of passengers in third class (the reference class).
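
The link between B and Exp(B) can be made concrete with a small Python sketch. The odds ratio 4.7 and its interval (3.6-6.3) are taken from the slide; the helper `apply_odds_ratio` and the baseline probability of 0.25 are hypothetical and serve only to show how an odds ratio acts on a probability.

```python
import math

odds_ratio = 4.7          # Exp(B) from the slide
ci_low, ci_high = 3.6, 6.3  # interval from the slide

# B is the natural logarithm of Exp(B), roughly 1.55 here.
b = math.log(odds_ratio)
b_low, b_high = math.log(ci_low), math.log(ci_high)


def apply_odds_ratio(p_reference, ratio):
    """Probability obtained by multiplying the reference odds by `ratio`."""
    odds = p_reference / (1.0 - p_reference)
    new_odds = odds * ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical reference-group probability, for illustration only.
p_example = apply_odds_ratio(0.25, odds_ratio)
```

Note that an odds ratio of 4.7 multiplies the odds, not the probability, so the resulting probability always stays below 1.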

  19. Logistic regression of the Titanic data now adding family relations • ‘3 or more’ is set as the reference group by SPSS

  20. Logistic regression of the Titanic data now adding family relations • The model correctly classifies 79.1% of the passengers

  21. Logistic regression of the Titanic data now adding family relations • Basically all factors seem to affect the probability of survival.

  22. How was it with age? • Linear associations are easy to model, because the factor enters the predictive value directly. • But the association does not really look linear; maybe a third-order polynomial? • Three new factors for age are calculated: the first, second, and third order of the age divided by the standard deviation.
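
One way to build the three age factors is sketched below in Python. The slide only says that age is divided by the standard deviation; subtracting the mean first is an assumption made here. The function name and the example ages are hypothetical.

```python
import statistics


def age_polynomial_terms(ages):
    """Return (first, second, third)-order terms of standardized age.

    Assumption: age is centered on the sample mean before it is divided
    by the sample standard deviation.
    """
    mean = statistics.mean(ages)
    sd = statistics.stdev(ages)
    rows = []
    for age in ages:
        z = (age - mean) / sd
        rows.append((z, z ** 2, z ** 3))
    return rows


# Hypothetical ages, for illustration only.
example_terms = age_polynomial_terms([20, 30, 40])
```

Each passenger then contributes three columns to the regression instead of one, and the fit decides how much weight the second- and third-order terms deserve.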

  23. How was it with age? • The third-order age factor did not add significantly to the model. • By adding the third-order polynomial, the model correctly categorizes 79.4% of passengers vs 79.1% before. • ParChild is no longer a significant factor and can be omitted from the model.

  24. Using the model to predict survival • Omitting the second- and third-order age and ParChild factors • What is the probability that a 25-year-old woman holding a second-class ticket and accompanied only by her husband would survive Titanic? • z = -3.929 - 0.589 * (-5)/14.41 + 1.718 + 2.552 + 0.926 = 1.4714
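
The slide stops at the linear predictor z; the probability follows by passing z through the logistic function. A minimal Python sketch of that last step is given here. All numbers are copied from the slide, but the variable names are made up, and the transcript does not say which factor each of the three remaining terms belongs to, so they are kept as an unlabeled list.

```python
import math

intercept = -3.929
age_coef = -0.589
age_offset = -5      # the (-5) on the slide, presumably age relative to a reference age
age_sd = 14.41       # divisor used on the slide

# Remaining terms exactly as listed on the slide; the transcript does not
# label which factor each one corresponds to.
other_terms = [1.718, 2.552, 0.926]

# Linear predictor, reproducing the sum on the slide.
z = intercept + age_coef * age_offset / age_sd + sum(other_terms)

# Logistic function turns the linear predictor into a probability.
p_survival = 1.0 / (1.0 + math.exp(-z))
```

With z = 1.4714 the predicted probability of survival comes out at roughly 0.81, well above the 0.5 cutoff.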

  25. Analysing interaction of selected factors • pclass * sex, age * sex, pclass * Siblings/Parents • But the model does not converge…

  26. Analysing interaction of selected factors • Collapsing the sibling/spouse number eradicated their mutual interaction

  27. Is it realistic that Leonardo survives and the chick dies?
