
Logistic Regression & Discriminant Analysis




  1. Logistic Regression & Discriminant Analysis • If the dependent variable Z is categorical rather than continuous, the population regression models become non-linear. The two primary non-linear models used are logistic regression and linear discriminant analysis. • Logistic regression is also known as logit analysis when the Xs are categorical, and as multinomial logit analysis when Z contains more than 2 categories. • Discriminant analysis is usually linear discriminant analysis (LDA); occasionally quadratic discriminant analysis (QDA) is used.

  2. Dichotomous Dependent Variable • Here we provide some basics for logistic regression and discriminant analysis when the dependent variable Z is dichotomous, say Z = 1, 0. The conditional expectation of Z given X then simplifies to the probability that Z=1: • E(Z|X) = Prob(Z=1|X)*1 + Prob(Z=0|X)*0 = Prob(Z=1|X) • Since probabilities are bounded between 0 and 1, the linear regression model Prob(Z=1|X) = XB is no longer appropriate.

  3. [Figure: the linear probability model, with the fitted line XB plotted against the probability axis PROB running from 0 to 1.0] Note: Probabilities estimated by OLS or GLS can be less than 0 or greater than 1.
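To see this boundary problem concretely, the short sketch below (not from the original slides; the simulated data and seed are illustrative assumptions) fits an ordinary least-squares line to a dichotomous Z and shows that the fitted "probabilities" fall below 0 and above 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
p = 1 / (1 + np.exp(-2 * x))          # true logistic probabilities
z = rng.binomial(1, p)                # dichotomous outcome Z

# OLS fit of Z on X: the linear probability model Prob(Z=1|X) = XB
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, z, rcond=None)
fitted = X @ b

print("smallest fitted 'probability':", fitted.min())   # typically below 0
print("largest fitted 'probability':", fitted.max())    # typically above 1
```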

  4. Logit or Log-Odds Units • In dealing with dichotomous dependent variables it is useful to work in units of odds (and log-odds) rather than probabilities in order to express the non-linear model in a linear form. • Definitions: Odds = PROB(Z=1|X) / PROB(Z=0|X) and Logit = ln(Odds). • Some useful equations for the logistic distribution: PROB(Z=1|X) = e^(XB) / (1 + e^(XB)) and PROB(Z=0|X) = 1 / (1 + e^(XB)). • Thus, Odds = e^(XB) and Logit = ln(Odds) = XB.
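As a quick numeric check of these identities, the following sketch (the values of XB are arbitrary choices for illustration) confirms that the odds equal e^(XB) and the log-odds recover XB exactly:

```python
import numpy as np

xb = np.array([-2.0, 0.0, 1.5])            # some linear-predictor values XB
p1 = np.exp(xb) / (1 + np.exp(xb))         # PROB(Z=1|X)
p0 = 1 / (1 + np.exp(xb))                  # PROB(Z=0|X)

odds = p1 / p0                             # Odds = PROB(Z=1|X) / PROB(Z=0|X)
logit = np.log(odds)                       # Logit = ln(Odds)

print(np.allclose(odds, np.exp(xb)))       # True: Odds = e^(XB)
print(np.allclose(logit, xb))              # True: Logit = XB
```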

  5. Logistic Regression • PROB(Z=1|X) = e^(XB) / (1 + e^(XB)) • [Figure: logistic curve of PROB(Z=1|X) against XB (the logit), rising from 0 through 0.5 to 1.0] • Logistic regression is a non-linear probability model but linear (as a logit model) when re-expressed in logit (log-odds) units. • Prob = .5 translates into Odds = 1.0 or Logit = 0.
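A minimal sketch of fitting this model, assuming simulated data and the statsmodels Logit routine (the true coefficients 0.5 and 1.2 are arbitrary illustrative choices, not values from the slides):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))     # true model on the logit scale
z = rng.binomial(1, p)

X = sm.add_constant(x)                     # intercept column plus x
fit = sm.Logit(z, X).fit(disp=0)           # maximum likelihood fit

print(fit.params)                          # intercept and slope in logit units
print(fit.predict(X)[:5])                  # fitted probabilities, always in (0, 1)
```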

  6. Correct Classification Rates: Classification Tables Measure Accuracy of Prediction • Given the predicted logit score Li for respondent i from a given model, predict (classify) that respondent as follows: Predict Z=1 if Li > c; Predict Z=0 if Li ≤ c, where the cut-point c = 0 usually (corresponding to predicted prob = .5). • Simulated data example: models estimated on training data (N1 = N2 = 25), and applied to validation data (N1 = N2 = 2,500). • Sensitivity = 92.2% = P(Li > c | ZPC1 = 1) • Specificity = 91.8% = P(Li ≤ c | ZPC1 = 0) • Accuracy = (2295 + 2305)/5000 = 92% • AUC = .978
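The counts above come from the slide's own simulated validation data, which are not reproduced here; the sketch below simply shows how sensitivity, specificity, and accuracy follow from a cut-point c applied to logit scores (the scores and outcomes are made up, and classification_table is a hypothetical helper name):

```python
import numpy as np

def classification_table(scores, z, c=0.0):
    """Classify as Z=1 when the logit score exceeds cut-point c, then summarize."""
    pred = (scores > c).astype(int)
    tp = np.sum((pred == 1) & (z == 1))    # correctly predicted Z=1
    tn = np.sum((pred == 0) & (z == 0))    # correctly predicted Z=0
    fp = np.sum((pred == 1) & (z == 0))
    fn = np.sum((pred == 0) & (z == 1))
    sensitivity = tp / (tp + fn)           # P(L > c | Z = 1)
    specificity = tn / (tn + fp)           # P(L <= c | Z = 0)
    accuracy = (tp + tn) / len(z)
    return sensitivity, specificity, accuracy

scores = np.array([-1.2, 0.4, 2.1, -0.3, 1.7, -2.0])   # made-up logit scores
z = np.array([0, 1, 1, 0, 1, 0])                        # observed groups
print(classification_table(scores, z))                  # (1.0, 1.0, 1.0) here
```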

  7. Area Under Curve (AUC) Statistic • The ROC curve plots the sensitivity and specificity for every possible cut-point. • An irrelevant model score yields a diagonal ROC (green line), with AUC = .5. • AUC represents the probability that a randomly selected case from group 1 will have a higher model score than a randomly selected case from group 2. • Above, AUC = .978 for the blue ROC curve. • [Figure annotation: at cut-point = 0, Sensitivity = 92.2%, Specificity = 91.8%]
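A sketch of that pairwise interpretation of the AUC, using the same made-up scores as above: it counts the proportion of (group 1, group 0) pairs in which the group 1 case scores higher, with ties counted as one half.

```python
import numpy as np

def auc_by_pairs(scores, z):
    """AUC as the share of (group 1, group 0) pairs where group 1 scores higher."""
    s1 = scores[z == 1]
    s0 = scores[z == 0]
    greater = (s1[:, None] > s0[None, :]).sum()
    ties = (s1[:, None] == s0[None, :]).sum()
    return (greater + 0.5 * ties) / (len(s1) * len(s0))

scores = np.array([-1.2, 0.4, 2.1, -0.3, 1.7, -2.0])
z = np.array([0, 1, 1, 0, 1, 0])
print(auc_by_pairs(scores, z))   # 1.0: these scores separate the groups perfectly
```

In practice this hand-rolled count should match a standard library routine such as scikit-learn's roc_auc_score.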

  8. Linear Discriminant Analysis • An alternative approach to estimating the regression coefficients in the logistic regression model. • Appropriate when the Xs are treated as random and continuous.

  9. Assumptions Made in Linear Discriminant Analysis (LDA) • The vector of predictor variables X follows a multivariate normal distribution within each group Z=1 & Z=0 • The variance-covariance matrix Σ is identical within each Z group • Note: If Σ is not identical within each group, the discriminant function becomes quadratic (QDA)

  10. Comparison of LDA vs. Logit Analysis • Under the LDA assumptions it follows that: • the discriminant function is linear in X • the predicted probability of Z=1 follows the logistic distribution • the coefficient estimates are more efficient than those from the logistic regression approach (see the sketch below). • However, if X contains some dichotomous (or polytomous) variables, LDA may result in serious biases, since the discriminant function will in general not be linear but will include interaction terms.
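As a rough illustration of this comparison (simulated data only, with arbitrary group means and a common covariance matrix; scikit-learn's LinearDiscriminantAnalysis and LogisticRegression stand in for whatever software the slides used), both methods yield similar linear decision rules when the LDA assumptions hold:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 500
cov = np.array([[1.0, 0.3], [0.3, 1.0]])            # common within-group covariance
x0 = rng.multivariate_normal([0.0, 0.0], cov, n)    # group Z = 0
x1 = rng.multivariate_normal([1.0, 0.5], cov, n)    # group Z = 1
X = np.vstack([x0, x1])
z = np.repeat([0, 1], n)

lda = LinearDiscriminantAnalysis().fit(X, z)
logit = LogisticRegression().fit(X, z)

print("LDA coefficients:     ", lda.coef_)
print("Logistic coefficients:", logit.coef_)
print("Agreement of predicted classes:",
      np.mean(lda.predict(X) == logit.predict(X)))
```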

  11. Logistic Regression Alternative • Estimate the probabilities of group membership directly (bypass the discriminant function) • No assumption of normality • Recommended by some statisticians over discriminant analysis even when X is multivariate normal • If all predictors are categorical, log-linear modeling software can be used (“logit analysis”)

  12. Numerical Problems • Collinearity and near-collinearity can occur in logistic regression and LDA just as in ordinary regression (a quick diagnostic is sketched below). • Logistic regression has an additional problem called (perfect) separation.
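A near-collinearity diagnostic, sketched with hypothetical predictors x1–x3 (x2 is constructed to be nearly a copy of x1); large variance inflation factors or a large condition number of the design matrix flag the problem for either logistic regression or LDA:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=300)    # nearly a copy of x1
x3 = rng.normal(size=300)
X = np.column_stack([np.ones(300), x1, x2, x3]) # design matrix with intercept

vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print("VIFs for x1, x2, x3:", vifs)             # x1 and x2 show very large VIFs
print("Condition number of X:", np.linalg.cond(X))
```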

  13. Data Illustrating Perfect Separation • X treated as continuous. • X values below 6 all have Y=1; X values at or above 6 all have Y=0.

  14. Fitted Model • Estimates for B are not unique: the likelihood keeps increasing as B moves toward –infinity, so no finite ML estimate exists. • SPSS Logistic Regression issues a 'failure to converge' warning. • Latent GOLD output produces a warning message telling you that the solution is not identifiable (i.e., not unique): “1-Class Ordinal Regression Model Estimation Warnings! See Iteration Detail”
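The non-convergence can be reproduced with a tiny dataset patterned after slide 13 (X = 1,…,10 with Y determined entirely by whether X < 6); the exact behavior depends on the statsmodels version, which may raise a perfect-separation error or return huge, unstable estimates with a convergence warning:

```python
import numpy as np
import statsmodels.api as sm

x = np.arange(1, 11, dtype=float)          # X = 1, ..., 10
y = (x < 6).astype(int)                    # Y = 1 below 6, Y = 0 at or above 6

X = sm.add_constant(x)
try:
    fit = sm.Logit(y, X).fit(maxiter=100, disp=0)
    print(fit.params)                      # if it runs: huge-magnitude, unstable slope
except Exception as err:                   # some versions raise a perfect-separation error
    print("Estimation failed:", err)
```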
