Logistic Regression


Conceptual Framework - LR
• Dependent variable: two categories with an underlying propensity
  • (yes/no)
  • (absent/present)
• Independent variables:
  • Continuous
  • Categorical
  • (Treated the same way as in linear regression)

Maximum Likelihood
• The equations are too difficult to solve algebraically
• You have to “plug in” numbers until you minimize the error
• Minimizing the error is the same thing as maximizing the likelihood
• This approach can handle equations that cannot be solved algebraically but still have some kind of best solution in theory
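The "plug in numbers" idea can be sketched in a few lines. This is only an illustration, not the routine a statistics package actually uses: it assumes a single predictor, a made-up toy data set, and plain gradient ascent with a hand-picked step size (the names `fit_by_gradient_ascent`, `rate`, and `steps` are hypothetical).

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Sum of log P(observed y | x) under a one-predictor logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_by_gradient_ascent(xs, ys, rate=0.01, steps=20000):
    """Repeatedly nudge b0 and b1 in the direction that raises the likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # slope of the log-likelihood in b0
            g1 += (y - p) * x    # slope of the log-likelihood in b1
        b0 += rate * g0
        b1 += rate * g1
    return b0, b1

# Hypothetical toy data: x is a 1-7 scale score, y is the yes/no outcome.
xs = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_by_gradient_ascent(xs, ys)
print("B0 =", round(b0, 3), "B1 =", round(b1, 3))
print("log-likelihood at start:", round(log_likelihood(0.0, 0.0, xs, ys), 3))
print("log-likelihood at end:  ", round(log_likelihood(b0, b1, xs, ys), 3))
```

The log-likelihood at the end is higher (less negative) than at the starting guess, which is the sense in which the search has "minimized the error".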

Two Categories
• Natural categories vs. underlying propensity
• Categories and attenuation
• Logistic regression infers the variance that is removed by shrinking the categories to two

The Odds Ratio
• Odds are different from the probability of an event occurring (p)
• The odds are the probability of an event occurring divided by the probability of that event not occurring: $\text{odds} = p / (1 - p)$
• Odds of 1 are the same as p = .5 (just as likely to happen as not happen!)
• An odds ratio compares two sets of odds by dividing one by the other
• An example: if men steal the remote 60% of the time (and women the other 40%), the odds are .6 / .4 = 1.5, so men are 1.5 times as likely as women to steal the remote
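A quick arithmetic check of the two numbers on this slide, as a minimal sketch (the helper name `odds` is just for illustration):

```python
def odds(p):
    """Odds of an event: probability it happens over probability it does not."""
    return p / (1.0 - p)

print(odds(0.5))            # p = .5: as likely to happen as not
print(round(odds(0.6), 2))  # the remote example, p = .6
```

A probability of .5 gives odds of 1, and a probability of .6 gives odds of 1.5.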

The Logistic Model

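Logistic Equation
$$P(Y) = \frac{e^{B_0 + B_1 X_1 + \dots + B_k X_k}}{1 + e^{B_0 + B_1 X_1 + \dots + B_k X_k}}$$
Which is really just:
$$\ln\left(\frac{P(Y)}{1 - P(Y)}\right) = B_0 + B_1 X_1 + \dots + B_k X_k$$
You are not predicting Y… you are predicting the (log) odds of Y!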
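To make "predicting the (log) odds" concrete, here is a small sketch that converts between log odds and probabilities. The coefficients `b0` and `b1` are hypothetical values chosen for illustration, not estimates from any data set.

```python
import math

def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def probability(z):
    """Inverse of log_odds: turn a log-odds value back into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-predictor model: log odds of Y = b0 + b1 * x
b0, b1 = -2.0, 0.5
for x in (0, 4, 8):
    z = b0 + b1 * x
    print("x =", x, " log odds =", z, " P(Y) =", round(probability(z), 3))
```

The right-hand side is a straight line in x, exactly as in linear regression; only the final conversion to a probability is curved.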

Coefficients
• Same as linear regression:
  • Make the best weight for each variable. We call these B coefficients.
  • Figure out how good your guess for B is based on the data. We call this the standard error.
• Different from linear regression:
  • Interpretation of B: need to take exp(B), because we are using a different relationship to Y.
  • We are asking whether exp(B) is different from 1.
  • (And we have to pick between the likelihood ratio test and the Wald test)

Test of Coefficients
• Wald test:
  • Just uses the maximum likelihood standard error to ask questions about the parameters. Very analogous to the tests used on coefficients in linear regression.
• Likelihood ratio test:
  • A general way of testing an arbitrary hypothesis in maximum likelihood using a chi-squared distribution.
• These two tests are asymptotically equivalent (i.e., as your sample size goes to infinity).
• Both are really just testing whether exp(B) is different from 1, and they usually agree.
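The two tests can be compared side by side in a short sketch. All inputs below (the coefficient, its standard error, and the two log-likelihoods) are hypothetical numbers picked for illustration, and the p-value helper only covers the one-degree-of-freedom case that applies when testing a single coefficient.

```python
import math

def chi2_p_value_df1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def wald_test(b, se):
    """Wald chi-square: the squared ratio of the coefficient to its standard error."""
    stat = (b / se) ** 2
    return stat, chi2_p_value_df1(stat)

def likelihood_ratio_test(ll_without, ll_with):
    """Twice the gain in log-likelihood from adding the one variable being tested."""
    stat = 2.0 * (ll_with - ll_without)
    return stat, chi2_p_value_df1(stat)

# Hypothetical values for a single coefficient
wald_stat, wald_p = wald_test(b=0.80, se=0.30)
lr_stat, lr_p = likelihood_ratio_test(ll_without=-120.0, ll_with=-116.2)

print("Wald:", round(wald_stat, 2), "p =", round(wald_p, 4))
print("LR:  ", round(lr_stat, 2), "p =", round(lr_p, 4))
```

With these made-up inputs the two statistics are close but not identical, and both lead to the same conclusion, which is the usual situation the slide describes.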

Interpretation of LR
• Omnibus test: Chi-square
  • “Do these variables, collectively, do a better job than the mean-only model?”
• Explained variance: Nagelkerke R2
  • Maximum likelihood version of R-squared: “What percentage of the model misfit have I explained?”
• Model fit: Hosmer & Lemeshow test
  • “Is this model correctly classifying the number of cases you would expect?”
• Classification: Percentage Correct
  • This will make sense to a lot of people, but might not be the statistical measure you are looking for.
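Three of these summaries can be computed directly from a fitted model's log-likelihoods and predicted probabilities. The sketch below uses the standard Nagelkerke formula (the Cox and Snell R2 rescaled by its maximum possible value); the log-likelihoods, sample size, predicted probabilities, and the .5 cutoff are all hypothetical inputs, and the Hosmer & Lemeshow test is left out.

```python
import math

def omnibus_chi_square(ll_null, ll_model):
    """Improvement of the full model over the mean-only (intercept-only) model."""
    return 2.0 * (ll_model - ll_null)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox & Snell R2 divided by the largest value it can reach for this sample."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

def percent_correct(probs, ys, cutoff=0.5):
    """Share of cases whose predicted category matches the observed one."""
    hits = sum((p >= cutoff) == (y == 1) for p, y in zip(probs, ys))
    return 100.0 * hits / len(ys)

# Hypothetical fit: log-likelihoods of the null and full models, n cases
chi_square = omnibus_chi_square(ll_null=-130.0, ll_model=-110.0)
r2 = nagelkerke_r2(ll_null=-130.0, ll_model=-110.0, n=200)
pct = percent_correct(probs=[0.9, 0.7, 0.4, 0.2, 0.6], ys=[1, 1, 1, 0, 0])

print("chi-square =", chi_square)
print("Nagelkerke R2 =", round(r2, 3))
print("percent correct =", pct)
```

Percentage correct depends on the cutoff chosen, which is one reason it may not be the statistical measure you are looking for.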

Interpretation of Odds Ratio
• Odds Ratio (OR) or Exp(B)
  • This is the odds of the event occurring after a one-unit increase in the independent variable divided by the odds of the event occurring before that increase.
  • For example, an OR of 1.2 means that for every unit increase in the independent variable, the odds of the dependent variable occurring are multiplied by 1.2 (that is, the odds are 20% higher).
• An OR greater than one means the DV is more likely to occur with increases in the IV
• An OR less than one means the DV is less likely to occur with increases in the IV
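An odds ratio multiplies the odds, not the probability, so its effect on the probability depends on where you start. The sketch below applies the OR of 1.2 from the example to three hypothetical baseline probabilities (the function name `apply_odds_ratio` is illustrative).

```python
def apply_odds_ratio(p, odds_ratio, units=1):
    """Probability after increasing the IV by `units`, starting from probability p."""
    odds = p / (1.0 - p)
    new_odds = odds * odds_ratio ** units
    return new_odds / (1.0 + new_odds)

for baseline in (0.10, 0.50, 0.90):
    print(baseline, "->", round(apply_odds_ratio(baseline, 1.2), 3))
```

From a baseline of .50, for instance, one unit moves the probability to about .545, an increase smaller than 20% of the baseline probability even though the odds rose by exactly 20%.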

Some Variations
• Binary LR:
  • DV has 2 attributes
  • IVs can be categorical or continuous
• Multinomial LR:
  • DV has 3 or more attributes (not ordered)
  • IVs can be categorical or continuous
• Ordinal LR:
  • DV has 3 or more ordered attributes
  • IVs can be categorical or continuous

Example of LR Presentation
• Research question:
  • What is the effect of goal attainment, social integration, academic integration, and sex on extrinsic motivation?

Introduction
• Extrinsic motivation – going to college to receive external rewards, such as increased finances & status (high=1; low=2).
• Goal attainment – the development of educational goals, such as grade attainment (scale 1-7).
• Academic integration – college integration related to academic success, such as doing homework, studying, asking professors questions (scale 1-7).
• Social integration – college integration related to social success, such as making friends or ‘hanging out’ at school (scale 1-7).

Binary Logistic Regression

Binary Logistic Regression of Goal Attainment, Social & Academic Integration, & Sex on Extrinsic Motivation

Variable            B        Wald      Exp(B)
Goal              -0.139     1.958     0.87
Social            -0.273     5.318     0.76*
Academic          -0.502    10.523     0.60**
Sex (M=1; F=0)    -1.070    13.936     0.34***
Constant           4.142

χ2 = 50.65, df = 4, p < .001; n = 337; Nagelkerke R2 = .213
High Extrinsic Motivation = 1; Low Extrinsic Motivation = 2
*p < .05; **p < .01; ***p < .001
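The Exp(B) column and the significance stars can be cross-checked from the B and Wald columns alone. This is a rough sketch that assumes each Wald statistic is a chi-square with 1 degree of freedom, which is the usual case for a single coefficient; the numbers are copied from the table above.

```python
import math

# (variable, B, Wald, Exp(B) as reported in the table)
rows = [
    ("Goal",     -0.139,  1.958, 0.87),
    ("Social",   -0.273,  5.318, 0.76),
    ("Academic", -0.502, 10.523, 0.60),
    ("Sex",      -1.070, 13.936, 0.34),
]

def wald_p_value(wald):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))

for name, b, wald, reported in rows:
    print(f"{name:<9} exp(B) = {math.exp(b):.3f} (reported {reported:.2f})   "
          f"p = {wald_p_value(wald):.4f}")
```

The recomputed exp(B) values agree with the reported ones to within rounding, and the p-values fall in the bands the stars indicate: Goal is not significant at .05, Social is below .05, Academic is below .01, and Sex is below .001.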

Discussion
• Multivariate analysis suggests that college students who:
  • were socially integrated
  • were academically integrated
  • and were male
  were more likely to be highly extrinsically motivated
