D/RS 1013 Logistic Regression
Some Questions • Do children have a better chance of surviving a severe illness than adults? • Can income, credit history, & education distinguish those who will repay a loan from those who will not? • Are clients with high scores on a personality test more likely to respond to psychotherapy than clients with low scores? • Can scores on a math pretest predict who will pass or fail a course?
Answering these questions • Linear regression? • Why not? The outcome is categorical (e.g., pass/fail), so a straight line can predict impossible values below 0 or above 1 • Logistic regression answers the same questions as discriminant analysis, but without discriminant's strict distributional assumptions about the data
Logistic regression • expect a nonlinear relationship • S-shaped (sigmoidal) curve • curve never falls below 0 or rises above 1 • predicted values are interpreted as the probability of group membership
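The bounded S-shaped curve can be sketched with the standard logistic function; this is a generic illustration of the curve's properties, not the fitted model from the math-pretest example:

```python
import math

def sigmoid(u):
    """Logistic (sigmoidal) function: maps any real u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

# The curve never goes below 0 or above 1, so its output can be
# read as a probability of group membership.
for u in (-10, -1, 0, 1, 10):
    p = sigmoid(u)
    assert 0.0 < p < 1.0
```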
Logistic Curve • math data: scores of 1-11 on pretest, fail = 0, pass = 1
Residuals • generally small; largest in the middle of the curve • residual = actual value − predicted value • for a student with a pretest score of 5 who passed the test: • 1 (actual value) − .21 (predicted value) = .79 (residual, or estimation error) • two possible residual values for each value of the predictor (one for pass, one for fail)
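A minimal sketch of the residual arithmetic above, using the .21 predicted pass probability from the pretest-score-of-5 example:

```python
# Residual = actual value - predicted probability.
# A student with a pretest score of 5 who passed (actual = 1),
# where the model predicts a .21 probability of passing:
actual, predicted = 1, 0.21
residual = actual - predicted      # .79, the estimation error

# Had a student with the same score failed (actual = 0), the
# residual would instead be -.21 -- the second of the two
# possible residual values for this predictor value.
residual_fail = 0 - predicted
```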
Assumptions • outcomes on the DV are mutually exclusive and exhaustive • sample size recommendations range from 10-50 cases per IV • too small a sample can lead to: • extremely high parameter estimates and standard errors • failure to converge
Assumptions (cont.) • either increase cases or decrease predictors • large samples required for maximum likelihood estimation
Testing the Overall Model • "constant only" model • no IVs entered • first -2 log likelihood • full model • all IVs entered • second -2 log likelihood • the difference is the overall "model" chi-square; if p < .05, the predictors as a set improve classification over the constant-only model
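The model chi-square test above can be sketched as follows. The two -2 log likelihood values are hypothetical, invented for illustration, and the closed-form p-value shortcut applies only to df = 1 (a single predictor entered):

```python
import math

# Hypothetical -2 log likelihood values for illustration:
neg2ll_constant_only = 41.59   # constant-only model (no IVs)
neg2ll_full = 20.65            # full model (all IVs entered)

# The difference is the overall model chi-square:
model_chi_square = neg2ll_constant_only - neg2ll_full   # 20.94
df = 1                         # one predictor entered

# For df = 1, the chi-square survival function reduces to
# erfc(sqrt(x / 2)), so no statistics library is needed:
p_value = math.erfc(math.sqrt(model_chi_square / 2.0))
assert p_value < 0.05          # model improves on constant-only
```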
Coefficients and Testing • each B coefficient is the natural log of the odds ratio associated with its variable • convert to an odds ratio by raising "e" to the B power • the significance of each coefficient is tested via the associated Wald statistic • similar to the t test of coefficients in linear regression; p < .05 indicates that the coefficient differs from zero
Coefficient Interpretation • interpret odds ratios rather than the raw coefficients; still, the sign of the B coefficient is informative • positive B coefficient: odds increase as the predictor increases • negative B coefficient: odds decrease as the predictor increases
Coefficient Interpretation (cont.) • taking exp(B) converts the coefficient to an odds ratio • this is the change in odds associated with a one-unit increase in the predictor • to see the change for a two-unit increase in the predictor • multiply B by 2 prior to raising e to that power • i.e., calculate e^(2Bi)
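The exp(B) arithmetic can be checked directly, using the B = 2.69 slope from the math-pretest example that appears later in these slides:

```python
import math

B = 2.69                            # slope from the math-pretest example

odds_ratio_1unit = math.exp(B)      # change in odds per 1-unit increase
odds_ratio_2unit = math.exp(2 * B)  # multiply B by 2 before exponentiating

# e^(2B) is just (e^B)^2: two one-unit steps compound multiplicatively.
assert abs(odds_ratio_2unit - odds_ratio_1unit ** 2) < 1e-9
```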
The Logistic Model • Ŷi = e^u / (1 + e^u) • where: Ŷi = estimated probability of group membership • u = A + BX (in our math example) • or, more generally (multiple predictors): • u = A + B1X1 + B2X2 + … + BkXk (k = # of predictors)
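A minimal sketch of the model above, applied with the A and B values from the math-pretest example that follows:

```python
import math

def logistic_prob(u):
    """Estimated probability: Y-hat = e^u / (1 + e^u)."""
    return math.exp(u) / (1.0 + math.exp(u))

def linear_predictor(A, Bs, Xs):
    """u = A + B1*X1 + ... + Bk*Xk (k predictors)."""
    return A + sum(b * x for b, x in zip(Bs, Xs))

# Math-pretest example: A = -14.79, B = 2.69, one predictor.
u = linear_predictor(-14.79, [2.69], [5])   # u = -1.34
p = logistic_prob(u)                        # ~ .2075, prob. of passing
```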
Applying the Model • math data: the intercept and slope were found to be • A = -14.79 and B = 2.69 • for a pretest score of 5, we want to find the probability of passing • u = -14.79 + 2.69(5) = -1.34 • Ŷ = e^-1.34 / (1 + e^-1.34) = .2075
Converting to Odds • p(target)/p(other) = • .2075/.7925 = .2618
Applying the Model (cont.) • Pretest score of 7, u = -14.79 + 2.69(7) = 4.04 • odds are .9827/.0173 = 56.8263
Crosschecking • 56.8263/.2618 = 217.03, which not coincidentally equals (within rounding error): • e^(2 × 2.69) = e^5.38 = 217.022 • since we moved 2 units, we multiply B by 2 prior to finding exp(B)
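The crosscheck can be verified directly; this sketch recomputes the odds at pretest scores 5 and 7 and compares their ratio to e^(2B):

```python
import math

A, B = -14.79, 2.69          # intercept and slope from the math example

def odds(x):
    """Odds of passing = p / (1 - p) = e^(A + B*x)."""
    return math.exp(A + B * x)

# Pretest scores 5 and 7 are two units apart, so the ratio of
# their odds equals e^(2B) = e^5.38:
ratio = odds(7) / odds(5)
assert abs(ratio - math.exp(2 * B)) < 1e-6
```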
Confidence Intervals for Coefficients • odds ratios for coefficients are presented with 95% confidence intervals • if 1 falls inside the CI, the coefficient is not statistically significant at the .05 level
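The CI-based significance check can be sketched as follows; the standard error here is hypothetical, invented purely for illustration:

```python
import math

# Hypothetical coefficient and standard error for illustration:
B, SE = 2.69, 0.95

# 95% CI for the odds ratio: exponentiate B +/- 1.96 * SE.
lower = math.exp(B - 1.96 * SE)
upper = math.exp(B + 1.96 * SE)

# If 1 falls inside (lower, upper), the coefficient is not
# significant at the .05 level; here the whole CI exceeds 1.
significant = not (lower <= 1.0 <= upper)
```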
Classification Table • Same idea as classification results (confusion matrix) in discriminant analysis. • Overall % accuracy=N(on diagonal)/total N • Sensitivity - % of target group accurately classified • Specificity - % of "other group" correctly classified
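A sketch of the classification-table statistics described above; the cell counts are hypothetical, chosen only to illustrate the formulas:

```python
# Hypothetical counts for a 2x2 classification table (rows = actual
# group, columns = predicted group); the target group is "pass".
pass_correct, pass_missed = 18, 4   # actual pass: predicted pass / fail
fail_correct, fail_missed = 15, 3   # actual fail: predicted fail / pass

total = pass_correct + pass_missed + fail_correct + fail_missed
accuracy = (pass_correct + fail_correct) / total        # N on diagonal / total N
sensitivity = pass_correct / (pass_correct + pass_missed)  # target group
specificity = fail_correct / (fail_correct + fail_missed)  # "other" group
```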
Final Points • general procedure • fit the model • remove nonsignificant predictors • rerun • report only the significant predictors • cross-validation • generate/modify the model with half the sample, then test its classification accuracy on the other half