
Logistic Regression

Discover the fundamentals of logistic regression modeling with dichotomous dependent variables. Learn how to estimate probabilities and interpret odds ratios in predicting outcomes. This approach is essential when traditional OLS regression falls short.

rtalmage

Presentation Transcript


  1. Logistic Regression Modeling with Dichotomous Dependent Variables

  2. A New Type of Model… • Dichotomous Dependent Variable: • Why did someone vote for Bush or Kerry? • Why did residents own or rent their houses? • Why do some people drink alcohol and others don’t? • What determined if a household owned a car?

  3. Dependent Variable… • Is binary, with a yes or a no answer • Can be coded 1 for yes and 0 for no. • There are no other valid responses.

  4. Problem: OLS regression does not model the relationship well; a straight line can predict values below 0 or above 1, which are not valid probabilities

  5. Solution: Use a Different Functional Form • The properties we need: • The model should be bounded by 0 and 1 • The model should estimate a value for the dependent variable in terms of the probability of being in one category or the other, e.g., an owner or a renter, or a Bush voter or a Kerry voter

  6. Solution, cont. • We want to know the probability, p, that a particular case falls in the 0 or the 1 category. • We want to derive a model which gives good estimates of the probability that a particular case is a 0 or a 1.

  7. Solution: A Logistic Curve

  8. The Logistic Function • Probability that a case is a 0 or a 1 is distributed according to the logistic function.

  9. Remember probabilities… • Probabilities range from 0 to 1. • Probability: frequency of being in one category relative to the total of all categories. • Example: the probability that the first card dealt in a card game is the queen of hearts is 1/52 (one in 52). • A linear regression model can “predict” values outside the 0–1 range, which does us no good.

  10. But can we manipulate probabilities to estimate the logistic function? • Steps: • Convert probabilities to odds • Convert odds to log odds, or logits

  11. Manipulating probabilities to estimate the logistic function

      Case   P      1-P    P/(1-P)   ln(P/(1-P))
      1      0.010  0.990   0.010    -4.595
      2      0.050  0.950   0.053    -2.944
      3      0.100  0.900   0.111    -2.197
      4      0.200  0.800   0.250    -1.386
      5      0.300  0.700   0.429    -0.847
      6      0.400  0.600   0.667    -0.405
      7      0.500  0.500   1.000     0.000
      8      0.600  0.400   1.500     0.405
      9      0.700  0.300   2.333     0.847
      10     0.800  0.200   4.000     1.386
      11     0.900  0.100   9.000     2.197
      12     0.950  0.050  19.000     2.944
      13     0.990  0.010  99.000     4.595
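The table on this slide can be reproduced with a few lines of Python (a minimal sketch; the probability values are taken directly from the slide):

```python
import math

# For each probability P, compute the odds P/(1-P)
# and the log odds (logit) ln(P/(1-P)), as in the slide's table.
probs = [0.010, 0.050, 0.100, 0.200, 0.300, 0.400, 0.500,
         0.600, 0.700, 0.800, 0.900, 0.950, 0.990]

for p in probs:
    odds = p / (1 - p)
    logit = math.log(odds)
    print(f"P={p:.3f}  1-P={1 - p:.3f}  odds={odds:.3f}  logit={logit:.3f}")
```

Note the symmetry: probabilities below .5 give negative logits, .5 gives a logit of exactly 0, and the logit is unbounded in both directions, which is what makes a linear model of it workable.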

  12. Logistic Function

  13. Logistic Function

  14. Steps…. • Log odds = a + bx • Odds = exp(a + bx) • Probability = odds / (1 + odds); probability is distributed according to the logistic function
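The three-step chain above can be sketched as small helper functions (hypothetical names, shown for illustration):

```python
import math

def log_odds(a, b, x):
    """Linear predictor: the log odds (logit) is a + b*x."""
    return a + b * x

def odds_from_log_odds(lo):
    """Exponentiate the log odds to recover the odds."""
    return math.exp(lo)

def probability_from_odds(odds):
    """Solve odds = P/(1-P) for P."""
    return odds / (1 + odds)

def logistic(lo):
    """Equivalently, P is the logistic function of the linear predictor."""
    return 1 / (1 + math.exp(-lo))
```

Composing the first three functions gives exactly the same probability as applying the logistic function directly to the linear predictor.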

  15. An Example • Determinants of Homeownership: • Age of the householder • Age of the householder squared • Building Type • Year house was built • Householder’s Ethnicity • Occupational status scale

  16. Calculating the Model • Maximum Likelihood Estimation (not OLS) • Estimates of the b’s, standard errors, t ratios and p values for coefficients • Coefficients are estimates of the impact of the independent variable on the logit of the dependent variable
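The quantity that maximum likelihood estimation maximizes can be sketched directly (a minimal illustration of the Bernoulli log likelihood, not the slide's actual estimation software):

```python
import math

def log_likelihood(y, p):
    """Bernoulli log likelihood: sum over cases of
    y*ln(p) + (1-y)*ln(1-p), where y is the observed 0/1 outcome
    and p is the model's predicted probability for that case."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# MLE chooses the coefficients so that the fitted probabilities
# maximize this sum; unlike OLS there is no closed-form solution,
# so the maximum is found iteratively.
```

For example, predictions of .9 for an observed 1 and .1 for an observed 0 yield a higher log likelihood than uninformative predictions of .5 for both.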

  17. Logistic Regression Model

      Parameter               Estimate   S.E.    t-ratio   p-value
      1 CONSTANT              -6.976     1.501   -4.647    0.000
      2 AGE                    0.250     0.060    4.132    0.000
      3 AGESQ                 -0.002     0.001   -3.400    0.001
      4 BLDGTYP2$_cottage      0.036     0.277    0.131    0.895
      5 BLDGTYP2$_duplex      -1.432     0.328   -4.363    0.000
      6 YEAR                   0.061     0.022    2.757    0.006
      7 GERMAN                 0.706     0.264    2.677    0.007
      8 POLISH                 0.777     0.422    1.841    0.066
      9 OCCSCALE               0.190     0.091    2.074    0.038

  18. Logistic Regression Model, cont.

      Parameter               Odds Ratio   Upper   Lower
      2 AGE                   1.284        1.445   1.140
      3 AGESQ                 0.998        0.999   0.997
      4 BLDGTYP2$_cottage     1.037        1.784   0.603
      5 BLDGTYP2$_duplex      0.239        0.454   0.125
      6 YEAR                  1.063        1.109   1.018
      7 GERMAN                2.026        3.398   1.208
      8 POLISH                2.175        4.972   0.951
      9 OCCSCALE              1.209        1.446   1.011

      Log likelihood of constants-only model = LL(0) = -303.864
      2*[LL(N)-LL(0)] = 85.180 with 8 df, Chi-sq p-value = 0.000
      McFadden's Rho-Squared = 0.140

  19. Converting Odds Ratios to Probabilities • Odds = P/(1-P); an odds ratio compares the odds of two groups. • For Germans, compared with the omitted category (Americans and other ethnicities), controlling for other variables, the odds of ownership are 2.026 times higher. • Germans are more likely to own houses than Americans. • Can we be more specific?

  20. Calculating Probability of a Case • Log odds of homeownership = -6.976 + .250Age - .002Agesquared + .036 cottage – 1.432 duplex + .061Year + .706 German + .777 Polish + .190 occscale • Plug in values and solve the equation. • Exponentiate the result to create the odds • Convert the odds to a probability for the case.

  21. Calculations • Log odds of homeownership = -6.976 + .250 Age - .002 Agesquared + .036 cottage - 1.432 duplex + .061 Year + .706 German + .777 Polish + .190 occscale • For a 40-year-old, skilled, American-born worker living in a residence built in 1892 (Year coded 5, occupational scale 3, all dummies 0): • Log odds of homeownership = -6.976 + .250*40 - .002*1600 + .061*5 + .190*3 • Log odds = .699
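The slide's arithmetic can be checked directly (a minimal sketch; the coefficient names mirror the model table, and Age = 40, Year = 5, occscale = 3 are the values the slide plugs in):

```python
# Fitted coefficients from the logistic regression model table.
coef = {"constant": -6.976, "age": 0.250, "agesq": -0.002,
        "cottage": 0.036, "duplex": -1.432, "year": 0.061,
        "german": 0.706, "polish": 0.777, "occscale": 0.190}

# 40-year-old American-born skilled worker, residence built in 1892;
# all dummy variables (cottage, duplex, german, polish) are 0.
age, year, occ = 40, 5, 3
lo = (coef["constant"] + coef["age"] * age + coef["agesq"] * age ** 2
      + coef["year"] * year + coef["occscale"] * occ)
print(round(lo, 3))  # 0.699
```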

  22. Calculations, cont. • Log odds = .699 • Odds = antilog (exponential) of .699 = 2.012 • Odds = P/(1-P) = 2.012 • Solve for P: P = 2.012/3.012 = .67.
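Converting that log odds to a probability, in code (a minimal sketch of the slide's two conversion steps):

```python
import math

log_odds = 0.699
odds = math.exp(log_odds)           # antilog: about 2.012
p = odds / (1 + odds)               # solve odds = P/(1-P) for P
print(round(odds, 3), round(p, 2))  # 2.012 0.67
```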

  23. More calculations…. • How about a 40-year-old German skilled worker in an 1892 residence? • Log odds = -6.976 + .250*40 - .002*1600 + .061*5 + .706 + .190*3 = 1.405 • Note that .699 + .706 = 1.405: the German coefficient simply adds to the log odds. • Equivalently, on the odds scale the effect is multiplicative: 2.012 * 2.026 (the odds ratio for the variable “German”) = 4.076, which is the antilog of 1.405.

  24. More calculations • Convert the log odds to odds, i.e., take the antilog of 1.405 = 4.076. • Odds = 4.076 = P/(1-P). • Solve for P: P = .803. • So the probability of homeownership rises from .67 for the American worker to .803 for the German worker, an increase of about 13 percentage points.
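The odds-ratio shortcut can be verified in code: multiplying the American worker's odds by the German odds ratio gives the same result as adding the German coefficient on the log-odds scale (a minimal sketch using the values from the slides):

```python
import math

odds_american = math.exp(0.699)      # ~2.012
odds_german = odds_american * 2.026  # the German odds ratio multiplies the odds

# Same as exponentiating 0.699 + 0.706 = 1.405 (small rounding difference):
assert abs(odds_german - math.exp(0.699 + 0.706)) < 0.01

p_american = odds_american / (1 + odds_american)  # ~0.67
p_german = odds_german / (1 + odds_german)        # ~0.803
print(round(p_german - p_american, 3))  # 0.135, roughly the 13-point gap
```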

  25. More calculations • For a 30-year-old American worker in a residence built in 1892: • Log odds = -6.976 + .250*30 - .002*900 + .061*5 + .190*3 = -0.401 • Odds = antilog of (-.401) = 0.670 • Probability of ownership = .670/1.670 = 0.401

  26. Classification Table

      Model Prediction Success Table

                       Predicted Choice
      Actual Choice    Response   Reference   Actual Total
      Response         281.647     85.353     367.000
      Reference         85.353     58.647     144.000
      Pred. Tot.       367.000    144.000     511.000
      Correct            0.767      0.407
      Success Ind.       0.049      0.125
      Tot. Correct       0.666

      Sensitivity: 0.767   Specificity: 0.407
      False Reference: 0.233   False Response: 0.593
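The summary statistics can be recomputed from the table's cell counts (a minimal sketch; the counts are fractional because they sum predicted probabilities rather than hard 0/1 classifications):

```python
# Cell counts from the prediction success table.
resp_resp, resp_ref = 281.647, 85.353  # actual Response row
ref_resp, ref_ref = 85.353, 58.647     # actual Reference row

sensitivity = resp_resp / (resp_resp + resp_ref)  # correct among actual responses
specificity = ref_ref / (ref_resp + ref_ref)      # correct among actual references
total_correct = (resp_resp + ref_ref) / 511       # overall proportion correct

print(round(sensitivity, 3), round(specificity, 3), round(total_correct, 3))
# 0.767 0.407 0.666
```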

  27. Extending the Logic… • Logistic regression can be extended to dependent variables with more than two categories, for multinomial (multi-response) models • Classification tables can be used to understand misclassified cases • Results can be analyzed for patterns across different values of the independent variables.
