
Logistic Regression


radley

Presentation Transcript


  1. Logistic Regression

  2. Conceptual Framework - LR • Dependent variable: two categories with underlying propensity • (yes/no) • (absent/present) • Independent variables: • Continuous • Categorical • (Treat the same way as linear regression)

  3. Maximum Likelihood • Equations are too difficult to solve algebraically • Have to “plug in” numbers until you minimize the error • Minimizing error is the same thing as maximizing the likelihood • Can solve equations that cannot be solved algebraically but still have some kind of best solution in theory.
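
The “plug in numbers until you minimize the error” idea can be sketched as plain gradient ascent on the log-likelihood. This is only a minimal illustration: statistics packages typically use faster Newton-type methods, and the toy data, learning rate, and function names below are assumptions made up for the example.

```python
import math

def sigmoid(z):
    """Inverse logit: map log-odds z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of a one-predictor logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Repeatedly nudge (b0, b1) uphill on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (observed - predicted).
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy data (not from the slides): x = hours studied, y = passed.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
start_ll = log_likelihood(0.0, 0.0, xs, ys)  # likelihood of the starting guess
final_ll = log_likelihood(b0, b1, xs, ys)    # likelihood after fitting
print(f"B0 = {b0:.3f}, B1 = {b1:.3f}, log-likelihood {start_ll:.3f} -> {final_ll:.3f}")
```

Each pass makes the predicted probabilities match the observed outcomes a little better, which is the same thing as raising the likelihood.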

  4. Two Categories • Natural Categories vs. Underlying Propensity • Categories and Attenuation • Logistic infers the variance that is removed by shrinking the categories to two

  5. The Odds Ratio • Different from the probability of an event occurring (p) • The odds are the probability of an event occurring divided by the probability of that event not occurring: odds = p / (1 - p) • Odds of 1 are the same as p = .5 (just as likely to happen as not happen!) • An example: if men steal the remote 60% of the time (and women the other 40%), the odds are .6 / .4 = 1.5, so men are 1.5 times as likely as women to steal the remote
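
The conversion between probability and odds on this slide can be checked with a few lines of Python. The function names are illustrative, not from the slides.

```python
def odds(p):
    """Odds of an event: P(event) / P(no event)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)

def probability(odds_value):
    """Convert odds back to a probability."""
    return odds_value / (1 + odds_value)

even_odds = odds(0.5)    # p = .5: just as likely to happen as not
remote_odds = odds(0.6)  # the slide's remote-control example, about 1.5
print(even_odds, round(remote_odds, 3), round(probability(remote_odds), 3))
```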

  6. The Logistic Model

  7. Logistic Equation Which is really just: You are not predicting Y…you are predicting the (log) odds of Y!
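
The equations shown on this slide did not survive in the transcript. As a stand-in, and assuming the usual textbook notation (p is the probability that Y = 1, and B_0 through B_k are the coefficients), the logistic equation and its equivalent log-odds form are commonly written as:

```latex
p = \frac{e^{B_0 + B_1 X_1 + \dots + B_k X_k}}{1 + e^{B_0 + B_1 X_1 + \dots + B_k X_k}}
\qquad\Longleftrightarrow\qquad
\ln\!\left(\frac{p}{1 - p}\right) = B_0 + B_1 X_1 + \dots + B_k X_k
```

The right-hand form is linear in the predictors, which is why the model is said to predict the log odds of Y rather than Y itself.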

  8. Coefficients • Same as linear regression: • Make the best weight for each variable. We call these B coefficients • Figure out how good your guess for B is based on the data. We call this the standard error. • Different from linear regression: • Interpretation of B: need to take exp(B), because we are using a different relationship to Y. • We are asking whether exp(B) is different from 1. • (And we have to pick between the likelihood ratio test and Wald)

  9. Test of Coefficients • Wald Test: • Just using the Maximum Likelihood standard error to ask questions about the parameters. Very analogous to the tests used on coefficients in linear regression • Likelihood ratio test • This is a generalized way of testing an arbitrary hypothesis in maximum likelihood using a chi-squared distribution. • These two tests are asymptotically equivalent (i.e., as your sample size goes to infinity). • Both are really just testing if exp(B) is different from 1, and they usually agree
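
A rough sketch of both tests for a single coefficient follows. It assumes each test has 1 degree of freedom, in which case the chi-squared tail probability can be computed with the standard library alone. All numbers are hypothetical and chosen only for illustration.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def wald_test(b, se):
    """Wald chi-squared statistic (B / SE)^2 and its p-value for H0: B = 0."""
    stat = (b / se) ** 2
    return stat, chi2_sf_1df(stat)

def likelihood_ratio_test(ll_reduced, ll_full):
    """LR statistic 2 * (LL_full - LL_reduced) for dropping one coefficient."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf_1df(stat)

# Hypothetical coefficient, standard error, and log-likelihoods.
wald_stat, wald_p = wald_test(b=0.80, se=0.30)
lr_stat, lr_p = likelihood_ratio_test(ll_reduced=-120.0, ll_full=-116.2)
print(f"Wald: chi2 = {wald_stat:.2f}, p = {wald_p:.4f}")
print(f"LR:   chi2 = {lr_stat:.2f}, p = {lr_p:.4f}")
```

Testing B = 0 is the same question as testing exp(B) = 1, since exp(0) = 1.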

  10. Interpretation of LR • Omnibus test: Chi-square • “Do these variables, collectively, do a better job than the mean-only model?” • Explained variance: Nagelkerke R² • Maximum likelihood version of R-squared: “What percentage of the model misfit have I explained?” • Model fit: Hosmer & Lemeshow test • “Is this model correctly classifying the number of cases you would expect?” • Classification: Percentage Correct • This will make sense to a lot of people, but might not be the statistical measure you are looking for.
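
The omnibus chi-square and Nagelkerke R² are both computed from the log-likelihoods of the mean-only (null) model and the fitted model. A minimal sketch, with hypothetical log-likelihoods and sample size that are not taken from the slides:

```python
import math

def omnibus_chi2(ll_null, ll_model):
    """Model chi-square: improvement of the fitted model over the null model."""
    return 2.0 * (ll_model - ll_null)

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R-squared, which cannot reach 1 for a binary outcome."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox & Snell R-squared rescaled by its maximum possible value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Hypothetical values for illustration only.
ll_null, ll_model, n = -200.0, -175.0, 300
chi2 = omnibus_chi2(ll_null, ll_model)
cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(f"chi2 = {chi2:.1f}, Cox & Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")
```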

  11. Interpretation of Odds Ratio • Odds Ratio (OR) or Exp(B) • This is the odds of the event after a one-unit increase in the independent variable divided by the odds before the increase. • For example, an OR of 1.2 means that for every unit increase in the independent variable, the odds of the dependent variable occurring are multiplied by 1.2 (i.e., the odds are 20% higher) • An OR more than one means the DV is more likely to occur with increases in the IV • An OR less than one means the DV is less likely to occur with increases in the IV
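
An odds ratio multiplies the odds, not the probability, so its effect on the probability depends on where you start. The sketch below works through the slide's OR of 1.2; the starting probability of .5 and the function names are assumptions for illustration.

```python
import math

def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per unit increase in the IV."""
    return math.exp(b)

def percent_change_in_odds(b):
    """Percentage change in the odds per unit increase in the IV."""
    return (math.exp(b) - 1.0) * 100.0

def shifted_probability(p_start, b, delta=1.0):
    """Probability after increasing the IV by delta, starting from p_start."""
    start_odds = p_start / (1.0 - p_start)
    new_odds = start_odds * math.exp(b * delta)
    return new_odds / (1.0 + new_odds)

b = math.log(1.2)  # the coefficient that corresponds to an OR of 1.2
p_after = shifted_probability(0.5, b)
print(f"OR = {odds_ratio(b):.2f}, odds change = {percent_change_in_odds(b):.0f}%")
print(f"p goes from 0.500 to {p_after:.3f}")
```

Starting from p = .5, a 20% increase in the odds moves the probability to 1.2 / 2.2, which is noticeably less than a 20% jump in the probability itself.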

  12. Some Variations • Binary LR: • DV has 2 attributes • IVs can be categorical or continuous • Multinomial LR: • DV has 3 or more attributes (not ordered) • IVs can be categorical or continuous • Ordinal LR: • DV has 3 or more ordered attributes • IVs can be categorical or continuous

  13. Example of LR Presentation • Research question: • What is the effect of goal attainment, social integration, academic integration, and sex on extrinsic motivation?

  14. Introduction • Extrinsic motivation – going to college to receive external rewards, such as increased finances & status (high=1; low=2). • Goal attainment – the development of educational goals, such as grade attainment (scale 1-7). • Academic integration – college integration related to academic success, such as doing homework, studying, asking professors questions (scale 1-7). • Social integration – college integration related to social success, such as making friends or ‘hanging out’ at school (scale 1-7).

  15. Binary Logistic Regression
      Binary Logistic Regression of Goal Attainment, Social & Academic Integration, & Sex on Extrinsic Motivation

      Variable             B         Wald      Exp(B)
      Goal                -0.139     1.958     0.87
      Social              -0.273     5.318     0.76*
      Academic            -0.502    10.523     0.60**
      Sex (M=1; F=0)      -1.070    13.936     0.34***
      Constant             4.142

      χ² = 50.65, df = 4, p < .001; n = 337; Nagelkerke R² = .213
      High Extrinsic Motivation = 1; Low Extrinsic Motivation = 2
      *p < .05; **p < .01; ***p < .001
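
As a sanity check, the Exp(B) column and the significance stars can be recomputed from the B and Wald values reported on this slide. The sketch assumes each Wald value is a chi-squared statistic with 1 degree of freedom, which the slide does not state explicitly.

```python
import math

# (B, Wald, reported Exp(B)) copied from the table on this slide.
table = {
    "Goal": (-0.139, 1.958, 0.87),
    "Social": (-0.273, 5.318, 0.76),
    "Academic": (-0.502, 10.523, 0.60),
    "Sex": (-1.070, 13.936, 0.34),
}

def wald_p_value(wald):
    """p-value for a Wald chi-squared statistic, assuming 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))

# Recompute Exp(B) and the p-value for every row.
results = {
    name: (math.exp(b), wald_p_value(wald), reported)
    for name, (b, wald, reported) in table.items()
}

for name, (exp_b, p, reported) in results.items():
    print(f"{name:<9} exp(B) = {exp_b:.3f} (reported {reported:.2f})   p = {p:.4f}")
```

The recomputed values agree with the reported Exp(B) column to within about 0.01, and the p-values fall in the bands indicated by the stars.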

  16. Discussion • Multivariate analysis suggests that college students who: • were socially integrated • were academically integrated • and were male were more likely to be highly extrinsically motivated
