Logistic Regression Chongming Yang Research Support Center FHSS College
Rules of Logarithm • $\log(uv) = \log(u) + \log(v)$ • $\log(u/v) = \log(u) - \log(v)$ • $\log(u^v) = v \log(u)$
Rules of Exponentiation (0<a<1) • $a^m a^n = a^{m+n}$ • $a^m / a^n = a^{m-n}$ • $(a^m)^n = a^{mn}$
Exponential & Logarithmic Functions • Inverses of one another • $Y = a^X$ • $X = \log_a(Y)$
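The rules on the last three slides can be checked numerically. The sketch below is only an illustration; the function name `check_rules` and the input values are arbitrary choices, not part of the original slides.

```python
import math

def check_rules(u, v, a, m, n):
    """Return a dict mapping each rule to True if it holds numerically."""
    close = math.isclose
    return {
        "log(uv) = log(u) + log(v)": close(math.log(u * v), math.log(u) + math.log(v)),
        "log(u/v) = log(u) - log(v)": close(math.log(u / v), math.log(u) - math.log(v)),
        "log(u^v) = v log(u)": close(math.log(u ** v), v * math.log(u)),
        "a^m a^n = a^(m+n)": close(a ** m * a ** n, a ** (m + n)),
        "a^m / a^n = a^(m-n)": close(a ** m / a ** n, a ** (m - n)),
        "(a^m)^n = a^(mn)": close((a ** m) ** n, a ** (m * n)),
        # Inverse relation: if Y = a^X then X = log_a(Y)
        "log_a(a^m) = m": close(math.log(a ** m, a), m),
    }

# Arbitrary illustrative values with 0 < a < 1, as on the slide
results = check_rules(u=8.0, v=2.0, a=0.5, m=3.0, n=2.0)
for rule, holds in results.items():
    print(rule, holds)
```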
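Assumptions of Linear Regression • $Y_i = \alpha + \beta X_i + \varepsilon_i$ • $Y_i$ is continuous and unbounded • Expected value (mean) of the errors: $E(\varepsilon_i) = 0$ • $\varepsilon_i$ is normally distributed • $\varepsilon_i$ is not correlated with the predictors • Absence of perfect multicollinearity • No measurement error in any variable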
Violations of Linear Regression Assumptions • Dichotomous dependent variable (DV) • Unordered categorical (nominal) DV • Ordered categorical (ordinal) DV
Natural Logarithmic Transformation(Binary DV) Let p = probability of an event
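The formula for this slide did not survive in the transcript. A reasonable reading, assuming the standard logit link with coefficients written $\beta_0, \beta_1, \dots, \beta_k$ (the number of predictors k is a symbol introduced here), is the log of the odds of the event modeled as a linear function of the predictors:

```latex
\[
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k,
\qquad
p = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k}}
         {1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k}}
\]
```

The second expression is the inverse transformation: it maps any real value of the linear predictor back to a probability between 0 and 1, which is why the logit sidesteps the unbounded-DV assumption of linear regression.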
Interpretation of Coefficients (Odds Ratio) • Dichotomous predictor X1: The predicted odds of a positive response for group A are ? times the odds for group B. Equivalently, the odds of a positive response for group A are ?% higher than the odds for group B. • Continuous predictor X2: A one-unit increase in X2 is associated with a ?% increase in the predicted odds of a positive response.
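The "?" placeholders above are filled in from the estimated coefficients: the odds ratio is exp(coefficient), and the percent change in the odds is (exp(coefficient) - 1) × 100. A minimal sketch, using made-up coefficient values rather than estimates from any real model:

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

def percent_change_in_odds(beta):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0

# Hypothetical coefficients, chosen only for illustration
beta_x1 = math.log(2.0)  # dichotomous predictor (group A vs. group B)
beta_x2 = 0.10           # continuous predictor

print("Odds for group A are", odds_ratio(beta_x1), "times the odds for group B")
print("One-unit increase in X2 changes the odds by",
      round(percent_change_in_odds(beta_x2), 1), "%")
```

A negative coefficient gives an odds ratio below 1 and a negative percent change, that is, a decrease in the odds.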
Interpretation • See Handout
Interpretation of Interaction • Definition: The effect of a covariate depends on the level of another covariate. • Interpretation: Plug in some values of the two variables • Plot the estimated logit • Interpret the interaction effect only when the main effects are present
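The "plug in some values" step can be sketched as follows. The coefficients and the names `estimated_logit` and `effect_of_x1` are hypothetical, introduced only to show how the effect of one covariate changes with the level of the other.

```python
import math

# Hypothetical coefficients for illustration only (not estimates from any data set)
B0, B1, B2, B12 = -1.0, 0.5, 0.8, -0.3

def estimated_logit(x1, x2):
    """Estimated logit with main effects and an x1-by-x2 interaction."""
    return B0 + B1 * x1 + B2 * x2 + B12 * x1 * x2

def effect_of_x1(x2):
    """Change in the logit for a one-unit increase in x1, at a given x2."""
    return estimated_logit(1, x2) - estimated_logit(0, x2)

# The effect of x1 (and its odds ratio) differs at each level of x2
for x2 in (0, 1, 2):
    effect = effect_of_x1(x2)
    print("x2 =", x2, "effect of x1 on logit =", round(effect, 3),
          "odds ratio =", round(math.exp(effect), 3))
```

Plotting `estimated_logit` against x1 separately for each value of x2 gives non-parallel lines when the interaction coefficient is nonzero.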
Likelihood Ratio Test of $\beta_0, \beta_1, \dots$ • Likelihood ratio test = deviance = $-2 \log(L_{\text{fitted}} / L_{\text{saturated}})$ • Likelihood of the saturated model = 1, so deviance = $-2 \log(L_{\text{fitted}})$
$\chi^2$ Test of $\beta_0, \beta_1, \dots$ 1. $\chi^2 = -2 \ln(L_{\text{without } x} / L_{\text{with } x})$ 2. Degrees of freedom = $j - (p+1)$, where j = (# of categories) + (# of continuous variables) and p = # of parameters
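For the common case of comparing two nested models that differ by a single predictor x, the statistic is referred to a chi-square distribution with 1 degree of freedom. The sketch below assumes that case only; the function name `lr_test_df1` and the log-likelihood values are hypothetical. It uses the identity that a chi-square variable with 1 df is a squared standard normal, so no statistics library is needed.

```python
import math

def lr_test_df1(loglik_without_x, loglik_with_x):
    """Likelihood ratio test for adding ONE parameter to a nested model.

    Returns (chi-square statistic, p-value). Valid for 1 degree of freedom only.
    """
    stat = -2.0 * (loglik_without_x - loglik_with_x)
    if stat < 0:
        raise ValueError("the model with x should not fit worse than the model without x")
    # P(chi2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only
stat, p_value = lr_test_df1(loglik_without_x=-105.0, loglik_with_x=-100.0)
print("chi-square =", stat, "p =", p_value)
```

With more than one added parameter the degrees of freedom increase accordingly, and a general chi-square routine would be needed in place of the 1-df shortcut.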
Hosmer-Lemeshow Test ($\chi^2$) (grouping by percentiles of estimated p) • where g = 10, k = 1, …, 10, $n'_k$ = number of subjects in the kth group, $c_k$ = # of covariate patterns in the kth group, $\bar{p}_k$ = average estimated probability in the kth group, df = g - 2
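The formula itself is missing from the transcript. In its usual form the statistic sums, over the g groups, (o_k - n'_k p̄_k)² / (n'_k p̄_k (1 - p̄_k)), where o_k is the observed number of events in group k. The sketch below assumes that form and, as a simplification, treats every subject as its own covariate pattern; the function name and the toy data are hypothetical.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Return (statistic, df) for the Hosmer-Lemeshow test.

    y     : observed 0/1 outcomes
    p_hat : estimated probabilities from the fitted model
    g     : number of percentile groups of the estimated probabilities
    Note: a group whose average probability is exactly 0 or 1 would
    cause a division by zero; this sketch does not guard against it.
    """
    pairs = sorted(zip(p_hat, y))  # order subjects by estimated probability
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]  # k-th percentile group
        n_k = len(group)
        if n_k == 0:
            continue
        o_k = sum(yi for _, yi in group)           # observed events
        p_bar = sum(pi for pi, _ in group) / n_k   # average estimated probability
        stat += (o_k - n_k * p_bar) ** 2 / (n_k * p_bar * (1.0 - p_bar))
    return stat, g - 2

# Toy data with two groups, for illustration only
y_obs = [1, 1, 1, 1, 0, 1, 1, 1]
p_est = [0.5, 0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 0.75]
print(hosmer_lemeshow(y_obs, p_est, g=2))
```

A small statistic relative to the chi-square distribution with g - 2 degrees of freedom indicates no evidence of poor fit.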
Wald Test of $\beta_0, \beta_1, \dots$ • $W = \hat{\beta} / \mathrm{se}(\hat{\beta})$ (se = standard error) • Tested against the normal distribution
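As a rough illustration of the Wald test, the sketch below computes W and a two-sided p-value from the standard normal distribution. The function name `wald_test` and the estimate and standard error passed to it are hypothetical.

```python
import math

def wald_test(beta_hat, se):
    """Return (W, two-sided p-value) for H0: beta = 0."""
    w = beta_hat / se
    # Two-sided tail area of the standard normal: P(|Z| > |w|)
    p_value = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p_value

# Hypothetical estimate and standard error, for illustration only
w, p_value = wald_test(beta_hat=0.8, se=0.25)
print("W =", w, "p =", p_value)
```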
Multinomial Logistic Regression (non-ordered categorical DV) • P = probability of a response category • $P_{i1} + P_{i2} + P_{i3} = 1$
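One common way to guarantee that the category probabilities sum to 1 is the baseline-category formulation: one category serves as the reference (its linear predictor is fixed at 0), and each remaining category has its own linear predictor. The sketch below assumes that formulation; the function name and the input values are hypothetical.

```python
import math

def multinomial_probs(linear_predictors):
    """Probabilities for the reference category followed by the other categories.

    linear_predictors: one value per NON-reference category, i.e. the log odds
    of that category versus the reference category for a given subject.
    """
    exps = [1.0] + [math.exp(z) for z in linear_predictors]  # exp(0) = 1 for the reference
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear predictors for categories 2 and 3 versus category 1
probs = multinomial_probs([0.4, -1.1])
print(probs, "sum =", sum(probs))
```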
Interpretation • See handout
Ordinal Logistic Models • Adjacent Categories Model • Compares two adjacent categories
Adjacent Categories Model • Let j index the categories of an ordinal scale, j = 1… • j and j+1 = two adjacent categories
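The model equation is missing from the transcript. One common way to write an adjacent categories model, assuming a total of J categories, category-specific intercepts $\alpha_j$, and a single slope $\beta$ shared across comparisons (all three symbols are introduced here, not taken from the slides), is:

```latex
\[
\ln\!\left(\frac{P(Y = j+1 \mid x)}{P(Y = j \mid x)}\right) = \alpha_j + \beta x,
\qquad j = 1, \dots, J-1
\]
```

Some texts place category j in the numerator instead, which reverses the sign of the coefficients, so the direction of the comparison should be checked against the software output.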
Practice • Run Logistic Regression Using ‘binary.sav’ • DV = Admit • IV = gre, gpa, rank • Annotated output: http://www.ats.ucla.edu/stat/spss/dae/logit.htm
Pseudo R-squared(based on Likelihood) • Explained Variability • Improvement from null model to fitted model • Square of correlation (predicted and observed)
Pseudo R-Squared • Cox & Snell: improvement of the full model over the intercept-only model • Nagelkerke: improvement of the full model over the intercept-only model • McFadden (adjusted): analogous to adjusted R-squared in OLS, penalizing a model with too many predictors http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
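These measures can all be computed from the log-likelihoods of the intercept-only (null) model and the fitted (full) model. The sketch below uses the commonly cited formulas; the function name `pseudo_r2`, its argument names, and the input values are hypothetical.

```python
import math

def pseudo_r2(ll_null, ll_full, n, k=0):
    """Pseudo R-squared measures from two log-likelihoods.

    ll_null : log-likelihood of the intercept-only model
    ll_full : log-likelihood of the fitted model
    n       : sample size
    k       : number of predictors (used only for the adjusted McFadden measure)
    """
    mcfadden = 1.0 - ll_full / ll_null
    mcfadden_adj = 1.0 - (ll_full - k) / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {
        "mcfadden": mcfadden,
        "mcfadden_adjusted": mcfadden_adj,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }

# Hypothetical values, for illustration only
r2 = pseudo_r2(ll_null=-100.0, ll_full=-50.0, n=200, k=3)
for name, value in r2.items():
    print(name, round(value, 4))
```

The measures are on different scales and generally do not agree with one another, so they are best compared across models fitted to the same data rather than read as a proportion of explained variance.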
Practice (continued) • Run Multinomial Logistic Regression Using ‘mlogit.sav’ • DV= Brand • IV = female, age • Annotated output: http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm
Practice (continued) • Run Ordinal Logistic Regression Using ologit.sav • DV= admit • IV = gre, gpa, topnotch • Annotated output: http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm
Practical Issues 1. Low Ratio of Cases to Variables • Problem: Extremely large parameter estimates and standard errors • Solution: Collapse categories; delete the offending category; delete discrete predictors
Practical Issues 2. Inadequacy of Expected Frequencies & Power • Problem: Lower power when cells have small frequencies • Solution: Accept low power; collapse categories or delete discrete predictors; evaluate model fit with $\chi^2$
Practical Issues 3. Presence of Multicollinearity • Problem: Large standard errors or estimates • Solution: Run multiway frequency tables to identify the categorical variables involved; run correlations to identify the continuous variables involved; delete theoretically less important predictors or combine with other procedures
Practical Issues • Rare events may be better suited to Poisson regression or negative binomial regression.
References • Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc. • Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc. • Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc. • Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press. • Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.