1. Advanced Research Methods II, 04/13/2009: Logistic Regression

2. Topic Overview
• What is it?
• Basic procedures:
  • Estimating the model coefficients
  • Interpreting the logistic regression coefficients
  • Assessing model fit
  • Evaluating association
• Logistic regression vs. discriminant analysis
• Conducting LR using SPSS

3. What is it?
• An analysis examining the relationship between one or more IVs (continuous or categorical) and a categorical DV.
• Why can't we use multiple linear regression (MLR)? Recall that MLR predicts the value of the DV from a linear combination of the IVs:
  Y = b0 + b1X1 + b2X2 + ... + bkXk + e
• The predicted value of the DV is the mean of all cases that have the same values on the IVs.
• When the DV is categorical (0, 1), the mean of all cases with the same IV values becomes the probability of being in category 1 for cases with those values.
• So, can MLR be used to predict the probability of being in one category? Problem: a probability ranges from 0.00 to 1.00, while the linear combination of the IVs can range from -∞ to +∞.

4. What is it? (cont.)
• Why can't we use MLR? Using it would violate the assumptions of linearity and homoscedasticity.
• Solution: find a function that transforms the probability so that the resulting variable can vary freely (i.e., take values from -∞ to +∞) and can therefore serve as the dependent variable predicted by the linear combination of the IVs.
• That function is ln[p/(1-p)] = logit(p), the natural logarithm of the odds.
• z is the natural logarithm of O (i.e., z = ln(O)) when e^z = O (e ≈ 2.71828).
• O takes values from 0 to +∞, while z takes values from -∞ to +∞ (z is negative when O < 1; z = 0 when O = 1). See the sketch below.
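To make the transform concrete, here is a minimal Python sketch (not part of the original slides) of the logit function and its inverse, showing how probabilities near 0 and 1 map to arbitrarily large negative and positive values:

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the whole real line via ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map any real number z back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Probabilities near 0 and 1 map to large negative / positive logits,
# so the transformed variable can range over (-inf, +inf).
for p in (0.001, 0.2, 0.5, 0.8, 0.999):
    print(f"p = {p:>5}  ->  logit(p) = {logit(p):+.2f}")
```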

5. What is it? (cont.)
• Odds: the ratio of two probabilities, O = p/(1-p); O ranges from 0 to +∞.
  E.g., DV: dropout (0 = staying, 1 = leaving). If the probability of leaving is 0.20, the probability of staying is 1 - 0.20 = 0.80, and the odds of leaving are 0.20/0.80 = 0.25.
• Log odds: logit(p) = ln(O) = ln[p/(1-p)]; logit(p) ranges from -∞ to +∞. For the example above, logit(0.20) = -1.39.
• Other examples:
  p = .30  O = 0.43  logit(p) = -0.85
  p = .40  O = 0.67  logit(p) = -0.41
  p = .50  O = 1.00  logit(p) =  0.00
  p = .60  O = 1.50  logit(p) = +0.41
  p = .70  O = 2.33  logit(p) = +0.85
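The table above can be reproduced in a few lines of Python (an illustrative sketch, not part of the original deck):

```python
import math

for p in (0.20, 0.30, 0.40, 0.50, 0.60, 0.70):
    odds = p / (1 - p)          # O = p / (1 - p)
    log_odds = math.log(odds)   # logit(p) = ln(O)
    print(f"p = {p:.2f}   O = {odds:.2f}   logit(p) = {log_odds:+.2f}")
```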

6. What is it? (cont.)
• Logistic regression: determining a linear combination of the IVs such that it has the highest correlation with the logit of the probability of belonging to the category coded 1 in the original DV. That means estimating a set of coefficients b0 through bk:
  logit(p) = ln(O) = ln[p/(1-p)] = b0 + b1x1 + b2x2 + ... + bkxk

7. Basic Procedures: Estimating the Coefficients
• Based on the maximum likelihood (ML) procedure: estimating, through iterations, a set of coefficients (weights) for the IVs that maximizes the likelihood (probability) of observing the pattern of data.
• For computational convenience, the ML procedure minimizes the value -2LL (the natural logarithm of the likelihood of observing the current data, multiplied by -2).
• -2LL is distributed as a chi-square with n - k degrees of freedom (n = sample size, k = number of IVs). A crude illustration of the iterative fitting follows.
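To show what the iterations are doing, here is a deliberately crude Python sketch that minimizes -2LL by gradient descent on an invented toy dataset; real packages such as SPSS use Newton-Raphson (iteratively reweighted least squares) instead:

```python
import math

# Toy data: one IV (x) and a binary DV (y); values invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0,   0,   1,   0,   1,   1]

def neg2ll(b0, b1):
    """-2 * log-likelihood of the data under the model logit(p) = b0 + b1*x."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -2 * ll

# Crude gradient descent on -2LL; each pass nudges the coefficients toward
# values that make the observed data more likely (i.e., a smaller -2LL).
b0 = b1 = 0.0
for _ in range(20000):
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        g0 += -2 * (yi - p)        # d(-2LL)/db0
        g1 += -2 * (yi - p) * xi   # d(-2LL)/db1
    b0 -= 0.01 * g0
    b1 -= 0.01 * g1

print(f"b0 = {b0:.3f}  b1 = {b1:.3f}  -2LL = {neg2ll(b0, b1):.3f}")
```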

8. Basic Procedures: Interpreting the Logistic Regression Coefficients
• Meaning of the regression coefficients in MLR: Y = b0 + b1X1 + b2X2 + ... + bkXk, where b1 is the average change in Y associated with a one-unit change in X1.
• In logistic regression: ln(O) = b0 + b1X1 + b2X2 + ... + bkXk, where b1 is the average change in ln(O) associated with a one-unit change in X1.
• It follows that when X1 increases by one unit, the new odds equal the current odds multiplied by e^b1. e^b1 is called the odds ratio (OR; the ratio of the new odds to the old odds), as illustrated below.
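A quick numerical check of this interpretation, using hypothetical coefficients (b0 = -2.0 and b1 = 0.8 are invented for illustration):

```python
import math

b0, b1 = -2.0, 0.8   # hypothetical fitted coefficients

def odds(x1):
    """Odds of belonging to category 1 when the IV equals x1: e^(b0 + b1*x1)."""
    return math.exp(b0 + b1 * x1)

# A one-unit increase in X1 multiplies the odds by e^b1 (the odds ratio):
print(odds(3.0) / odds(2.0))   # ratio of new odds to old odds
print(math.exp(b1))            # e^b1 -- the same value, about 2.2255
```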

9. Basic Procedures: Interpreting the Logistic Regression Coefficients (cont.)
• When b1 > 0, OR > 1: an increase in X1 leads to an increase in the odds (probability) of belonging to category 1.
• When b1 < 0, OR < 1: an increase in X1 leads to a decrease in the odds (probability) of belonging to category 1.
• When X1 is a categorical variable (e.g., male = 0, female = 1), e^b1 is the odds ratio of females to males (e.g., the ratio of the odds that a female student will drop out to the odds that a male student will drop out).
• Note: the odds ratio (a ratio of odds) is different from the risk ratio (a ratio of probabilities).

10. Basic Procedures: Assessing Model Fit
• Uses -2LL (distributed as a chi-square; larger when the model fits poorly).
• Based on hierarchically nested models: compare the -2LL of the model with all the IVs to that of the "null" model, i.e., the model with no IVs.
• Null hypothesis H0: both models fit equally well (i.e., the IVs are not good predictors).
• Alternative hypothesis H1: the model with all the IVs fits better (i.e., the IVs are good predictors).
• If the chi-square is significant, reject H0. A sketch of this test follows.
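A minimal sketch of this likelihood-ratio test, assuming invented -2LL values and k = 3 IVs (7.81 is the chi-square critical value for df = 3 at alpha = .05):

```python
# Hypothetical -2LL values from a fitted run (numbers invented for illustration):
neg2ll_null = 120.5   # model with the intercept only
neg2ll_full = 101.3   # model with all k = 3 IVs

# Model chi-square: the drop in -2LL, with df = number of IVs added.
chi_sq = neg2ll_null - neg2ll_full   # 19.2
df = 3
CRITICAL_05 = 7.81                   # chi-square(3) cutoff at alpha = .05

if chi_sq > CRITICAL_05:
    print(f"chi-square({df}) = {chi_sq:.1f}: reject H0; the IVs improve fit.")
else:
    print(f"chi-square({df}) = {chi_sq:.1f}: fail to reject H0.")
```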

11. Basic Procedures: Evaluating the Association ("Effect Size")
• "R-square-like" indices: based on -2LL, and conceptually analogous to the R-square in MLR in the sense that they reflect the proportion of information in the data explained by the IVs.
• Cox and Snell's R-square (R²M):
  • Range: 0 ≤ R²M ≤ 1 (though its attainable maximum falls below 1)
  • Affected by the base rate
• Nagelkerke's R-square (R²N):
  • Computed by dividing R²M by its maximum possible value
  • Range: 0 ≤ R²N ≤ 1
  • Also affected by the base rate

12. Basic Procedures: Evaluating the Association ("Effect Size") (cont.)
• Likelihood R-square: R²L = (L0 - LM) / L0, where L0 = -2LL for the null model and LM = -2LL for the full model with all IVs.
  • Range: 0 ≤ R²L ≤ 1
  • Not affected by the base rate; probably the best index
  • Not provided directly by SPSS, but easily calculated from the -2LL of the null model and that of the full model (see the sketch below)
• Classification accuracy (based on the classification table):
  • Same as in DA
  • Can be used to compare DA and LR
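All three indices can be computed by hand from the two -2LL values; here is a sketch with invented numbers (L0, LM, and n are hypothetical):

```python
import math

# Hypothetical values: L0 = -2LL of the null model, LM = -2LL of the full
# model, n = sample size (all invented for illustration).
L0, LM, n = 120.5, 101.3, 100

r2_cs = 1 - math.exp(-(L0 - LM) / n)   # Cox & Snell's R-square
r2_max = 1 - math.exp(-L0 / n)         # its maximum attainable value
r2_n = r2_cs / r2_max                  # Nagelkerke: rescaled to top out at 1
r2_l = (L0 - LM) / L0                  # likelihood R-square

print(f"Cox & Snell = {r2_cs:.3f}  Nagelkerke = {r2_n:.3f}  Likelihood = {r2_l:.3f}")
```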

13. Discriminant Analysis and Logistic Regression
• LR requires less stringent assumptions, so it can be used when the IVs include both continuous and categorical variables, or when Box's test suggests that the assumption of equal within-group variance-covariance matrices is not tenable.
• LR uses maximum likelihood estimation and therefore requires larger sample sizes.
• Comparisons of the solutions provided by LR and DA are inconclusive (Fan & Wang, 1999).

14. Conducting LR using SPSS: Data
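The transcript captures only this slide's title; the remaining slides presumably show the SPSS data view, dialogs, and output. As a rough stand-in, here is a Python analogue using statsmodels on simulated data (the variable names, coefficients, and data are hypothetical, not the course dataset):

```python
import numpy as np
import statsmodels.api as sm

# Simulate data echoing the slides' example: predict dropout
# (0 = staying, 1 = leaving) from GPA and gender (0 = male, 1 = female).
rng = np.random.default_rng(0)
gpa = rng.uniform(1.0, 4.0, size=200)
gender = rng.integers(0, 2, size=200)
true_logit = 2.0 - 1.0 * gpa + 0.5 * gender        # invented "true" model
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Fit by maximum likelihood, as SPSS's logistic regression procedure does.
X = sm.add_constant(np.column_stack([gpa, gender]))
result = sm.Logit(y, X).fit()

print(result.summary())                            # coefficients, LLR chi-square, pseudo R-square
print("Odds ratios:", np.exp(result.params[1:]))   # e^b for each IV
```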
