LOGISTIC REGRESSION. BINARY LOGISTIC REGRESSION (BLR). Literature.
Literature • Applied Logistic Regression / Hosmer, Lemeshow (1989, 2000, 375 p.) • Regression Models for Categorical and Limited Dependent Variables / Scott Long (1997, 296 p.) • Logistic Regression: A Primer / Fred C. Pampel (2000, 85 p.) • Categorical Data Analysis / Agresti (2002, 710 p.) • In Czech: Řeháková, Nebojte se logistické regrese ("Don't be afraid of logistic regression", Sociologický časopis 4: 475-492) • LR in SPSS: Discovering Statistics Using SPSS for Windows: Advanced Techniques for the Beginner / Andy Field (2000, 2005, 2009) and Norusis
Assumptions, variables • Dependent variable: a) binary (binary logistic regression) - today; b) ordinal (ordinal regression) - only briefly at the end; c) nominal (polytomous logistic regression) - later • Independent variables: all types • Related technique: discriminant analysis (DA has more assumptions, e.g. normality of the independent variables)
Binary logistic regression: the model • Probability • Odds • Odds ratio • Logit: the natural logarithm of the odds • Equation of the binary logistic regression model and the back-transformation (see the formulas below)
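For reference, these quantities can be written out as follows (standard definitions, with p denoting P(Y = 1 | x)):

p = P(Y = 1 \mid x), \qquad \text{odds} = \frac{p}{1-p}, \qquad OR = \frac{\text{odds}_1}{\text{odds}_0}

\text{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k

Back-transformation: \quad p = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}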
Equation, independent variables, interactions • If independent variables are binary or nominal: use dummy coding (SPSS does it for us; the last category is the reference category) • Interactions: see linear regression (a syntax sketch follows below)
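A minimal syntax sketch, assuming the variables of the example used later in this deck (inet as the dependent variable, age continuous, edu nominal). The Indicator contrast with no argument should take the last category as the reference, and BY requests an interaction term:

LOGISTIC REGRESSION VAR=inet
  /METHOD=ENTER age edu age BY edu
  /CONTRAST (edu)=Indicator
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).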
Basic questions in BLR • Does the model fit the data? (LR: F-test and R2; BLR: pseudo R2 and the LR test or the Hosmer-Lemeshow test) • Evaluate the importance and statistical significance of the independent variables (LR: t-tests and beta coefficients; BLR: Wald tests and standardized coefficients) • Does my data fulfill the assumptions? (LR: linear relationship between independent and dependent variables; BLR: linear relationship between independent variables and the logit)
Estimation • OLS is not used • Basic technique: ML, maximum likelihood (also used for loglinear models, structural equation modelling, etc.) • Iterative solution in several steps, practically impossible without a computer (the maximized criterion is shown below)
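The quantity maximized by ML is the log-likelihood of the binary outcomes:

\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \right], \qquad p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})}}

There is no closed-form solution, hence the iterative (Newton-Raphson type) algorithm.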
Example • Internet use (variable inet) - WIP 2006 survey • Introduction to the data set, selection of variables and exploration
BLR in SPSS • Equation, Wald tests • Menu: Analyze > Regression > Binary Logistic
Automated inclusion of variables • forward (1) • backward (2) • a) LR (likelihood-ratio): based on the overall test • b) Wald: based on the partial test • c) conditional: a simpler version of LR • 6 combinations (1 or 2 combined with a)-c)) • Recommendation: use forward with LR or conditional (see the sketch below)
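A minimal sketch of the recommended combination (forward selection with the likelihood-ratio criterion), again assuming the inet/age/edu example:

LOGISTIC REGRESSION VAR=inet
  /METHOD=FSTEP(LR) age edu
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).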
SPSS menu options • Categorical: define nominal variables and the reference category • Save: residuals and influential statistics • Hosmer-Lemeshow test • Classification plot
Syntax
LOGISTIC REGRESSION VAR=inet
  /METHOD=ENTER age edu
  /CLASSPLOT
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
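The Save options from the previous slide correspond to a /SAVE subcommand. A hedged sketch extending the syntax above; the listed keywords store predicted probabilities and the usual residual and influence statistics:

LOGISTIC REGRESSION VAR=inet
  /METHOD=ENTER age edu
  /SAVE=PRED ZRESID COOK LEVER DFBETA
  /CLASSPLOT
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).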
Outputs • Estimates (Variables in the Equation table) • Interpretation of a coefficient: the change in the logit when the independent variable increases by one unit (continuous variables) or in comparison with the reference category (dummy or binary variables) • exp(B): the change in the odds when the independent variable increases by one unit (if the scale is long, use more than one unit, e.g. 10, 100, 1000; see the relation below)
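Why a longer step can be useful: for a change of c units in a continuous predictor the odds are multiplied by

\frac{\text{odds}(x + c)}{\text{odds}(x)} = e^{cB} = \left(e^{B}\right)^{c}

so, for example, a 10-unit increase multiplies the odds by exp(B) raised to the 10th power.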
Other outputs • Wald test • CI for exp(B): 95% confidence interval for exp(B) • LR tests in the Changes of Goodness-of-Fit table: • Model: comparison of our model with the intercept-only model • Block: changes between blocks • Step: changes between steps if more models are fitted
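The LR (likelihood-ratio) test behind these tables compares two nested models via

LR = -2\ln\frac{L_0}{L_1} = (-2\ln L_0) - (-2\ln L_1) \sim \chi^2_{df}

with degrees of freedom equal to the number of added parameters.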
Pseudo R2 • Analogous to the coefficient of determination in LR • Cannot be interpreted as explained variance (the dependent variable is binary) • Several formulae exist (see below) • Cox and Snell R2: range (0;1), never reaches 1 • Nagelkerke R2: modified Cox and Snell, can reach 1 • McFadden R2: range (0;1), never reaches 1 • The most frequently used is Nagelkerke R2
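The usual formulas, with L_0 the likelihood of the intercept-only model, L_1 the likelihood of the fitted model, and n the sample size:

R^2_{CS} = 1 - \left(\frac{L_0}{L_1}\right)^{2/n}, \qquad R^2_{N} = \frac{R^2_{CS}}{1 - L_0^{\,2/n}}, \qquad R^2_{McF} = 1 - \frac{\ln L_1}{\ln L_0}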
Logistic regression - outputs • Classification table – percentage of correctly classified cases • Histogram of estimated probabilities
Probability curve in BLR • How to prepare this curve? • A tool in MS Excel (an SPSS sketch is shown below)
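A hedged SPSS alternative to the Excel tool: plug the estimated intercept and slope into the back-transformation and plot. The coefficient values below are illustrative placeholders, not estimates from the WIP 2006 data.

* Illustrative coefficients only; replace -2.5 and 0.08 with the estimates from the output.
COMPUTE phat = 1 / (1 + EXP(-(-2.5 + 0.08*age))).
EXECUTE.
GRAPH /SCATTERPLOT(BIVAR)=age WITH phat.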
Standardized coefficients • Not included in the SPSS output, and there is no special option • It is necessary to standardize the predictors and fit the model again (see the sketch below) • They make it possible to compare the strength of the predictors
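A minimal sketch, assuming the inet/age/edu example: DESCRIPTIVES with /SAVE stores Z-scored copies of the predictors (Zage, Zedu), which are then entered instead of the raw variables.

DESCRIPTIVES VARIABLES=age edu /SAVE.
LOGISTIC REGRESSION VAR=inet
  /METHOD=ENTER Zage Zedu
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).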
Recommendations for publishing • Differentiate clearly between logits and odds • Keep in mind which categories are compared • Report coefficients and standardized coefficients • Report the LR test, Wald tests, the Hosmer-Lemeshow test and a pseudo R2 (mostly Nagelkerke) • It is good to publish the classification table, or at least the percentage of correctly classified cases
PLR (polytomous/multinomial logistic regression) • Dependent variable: nominal • More than one equation (see below) • Comparison with the last category of the dependent variable • Menu: Analyze > Regression > Multinomial Logistic • Data: library.sav
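With J categories of the dependent variable and the last one as the reference, the model consists of J - 1 equations:

\ln\frac{P(Y = j)}{P(Y = J)} = \beta_{0j} + \beta_{1j} x_1 + \dots + \beta_{kj} x_k, \qquad j = 1, \dots, J - 1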
SPSS options • Categorical (factor) vs. covariate • Statistics: classification table • Model: independent variables and interactions • Predicted category • Options: transformations
Syntax
NOMREG election BY gender WITH rightor libaut astat intaln extaln anomie age edyrs
  /CRITERIA=CIN(95) DELTA(0) MXITER(100) MXSTEP(5) LCONVERGE(0) PCONVERGE(1.0E-6) SINGULAR(1.0E-8)
  /MODEL
  /INTERCEPT=INCLUDE
  /PRINT=CLASSTABLE FIT PARAMETER SUMMARY LRT.
Ordinal regression - assumptions • One equation with thresholds (see the formula below) • It is necessary to test whether the lines are parallel (if not, use PLR)
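The ordinal (proportional-odds) model fits one set of slopes plus a threshold for each cut-point; SPSS PLUM parameterizes it with the slopes subtracted:

\text{logit}\, P(Y \le j) = \tau_j - (\beta_1 x_1 + \dots + \beta_k x_k), \qquad j = 1, \dots, J - 1

The parallel-lines assumption is that the betas do not depend on j; the TPARALLEL option in the syntax below requests the test of parallel lines.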
Syntax
PLUM q30crec BY q36rec
  /CRITERIA=CIN(95) DELTA(0) LCONVERGE(0) MXITER(100) MXSTEP(5) PCONVERGE(1.0E-6) SINGULAR(1.0E-8)
  /LINK=LOGIT
  /PRINT=CELLINFO FIT PARAMETER SUMMARY TPARALLEL.
* PLUM = PoLytomous Universal Model (since SPSS 10).
Ordinal regression - outputs • Estimates (Variables in the Equation) • Interpretation of coefficients: the change in the logit when the independent variable increases by 1 unit (continuous variables) or in comparison with the reference category (dummy or binary variables); use exp(B), meaning the change in the odds of a higher category compared with a lower one
Pseudo R2 • Cox and Snell R2: range (0;1), never reaches 1 • Nagelkerke R2: modified Cox and Snell, can reach 1 • McFadden R2: range (0;1), never reaches 1
Related techniques • Discriminant analysis • Loglinear analysis • Logit analysis • Classification and regression trees • Follow-up procedures: ROC curves