
Sociology 709 Martin Lecture 6: March 5, 2009





    1. Sociology 709 (Martin) Lecture 6: March 5, 2009. Models with multiple nominal (multinomial) outcomes: alligators! Outline: logit approaches for multinomial outcomes, and their weaknesses; constructing and applying the multinomial logit model; limitations of the multinomial model; alternatives to the multinomial model; practice.

    2. Today’s topic: alligators. Increases in development and human activity in Florida have caused concern about how alligators might be affected. Suppose we are interested in how the feeding habits of alligators change as they grow. Alligators tend to specialize in a given type of prey, although they may change specializations. Researchers examined the stomach contents of 219 alligators to see what they tend to eat (Delany and Moore, 1987).

    3. Alligator data.
       Outcome categories (what alligators eat): fish, invertebrates, reptiles, birds, other.
       Explanatory variables:
         size: 1 = > 2.3 meters, 0 = < 2.3 meters
         sex:  1 = male, 2 = female
         lake: Hancock, Oklawaha, Trafford, George

    4. Using logit models for more than two outcomes. We could construct logit models to compare dichotomous outcomes (e.g. “eats fish” = 1 or 0). Problem: we lose a lot of information that way. Example: if we fit a logit on “eats fish” = 1 or 0, we may not get a statistically significant effect of size even if big alligators are less likely to eat invertebrates (“eats fish” = 0) and more likely to eat reptiles (also “eats fish” = 0), because both shifts happen inside the 0 category.
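
The information loss described above can be illustrated with a small arithmetic sketch. The prey shares below are hypothetical numbers chosen for illustration, not values from the alligator data: the share eating fish is the same for small and large alligators, so the binary "eats fish" comparison shows no size effect even though the other two categories trade places.

```python
import math

# Hypothetical prey shares (NOT from the alligator data), by size group.
small = {"fish": 0.5, "inverts": 0.4, "reptiles": 0.1}
large = {"fish": 0.5, "inverts": 0.1, "reptiles": 0.4}


def logit(p):
    """Log odds of an event with probability p."""
    return math.log(p / (1 - p))


def log_odds_vs_fish(shares, outcome):
    """Log odds of one outcome relative to eating fish."""
    return math.log(shares[outcome] / shares["fish"])


# Binary view: "eats fish" = 1 or 0. The size effect on the log odds is zero.
binary_effect = logit(large["fish"]) - logit(small["fish"])

# Category-by-category view: size shifts both non-fish outcomes relative to fish.
inverts_effect = log_odds_vs_fish(large, "inverts") - log_odds_vs_fish(small, "inverts")
reptiles_effect = log_odds_vs_fish(large, "reptiles") - log_odds_vs_fish(small, "reptiles")

print(binary_effect, inverts_effect, reptiles_effect)
```

In this constructed case the binary effect is zero while the two relative log odds move in opposite directions, which is the pattern a dichotomous model cannot see.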

    5. Using logit models for more than two outcomes. We could limit the data set to only two outcomes, then fit logit models to compare the dichotomous outcomes (e.g. remove reptile, bird, and other, so that “eats fish” = 1 or 0 refers only to fish and invertebrates). This approach is called the baseline logit approach, also called the separate fitting approach.

    6. Definition of a multinomial logit. In a multinomial logit model, we have a set of covariates that predicts ln(p2/p1), ln(p3/p1), … all at the same time, where p1, p2, p3, … are the probabilities of all possible outcome categories and p1 is the probability of the comparison category. This is also called the simultaneous fitting approach.

    7. Definition of a multinomial logit. Equation for a multinomial logit model:
         ln(p2/p1) = β02 + β12x1 + β22x2 + β32x3 + …
         ln(p3/p1) = β03 + β13x1 + β23x2 + β33x3 + …
       β12 refers to the change in the log odds of outcome p2 relative to outcome p1 associated with a unit change in x1. β13 refers to the change in the log odds of outcome p3 relative to outcome p1 associated with a unit change in x1. For p1 = fish, p2 = inverts, x1 = size, β12 tells you the change in the log odds of eating inverts relative to eating fish associated with a one-unit increase in the size of the alligator. (Not the change in the log odds of eating inverts!)
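
A minimal sketch of how these equations turn into outcome probabilities, assuming a single covariate (size) and three outcomes with fish as the comparison category. The coefficient values and the function name `mlogit_probs` are hypothetical, chosen only to show the arithmetic; they are not estimates from the alligator data.

```python
import math


def mlogit_probs(x, coefs, baseline="fish"):
    """Outcome probabilities implied by baseline-category logit equations.

    coefs maps each non-baseline outcome to (intercept, slope); the
    baseline outcome has a linear predictor fixed at 0.
    """
    eta = {baseline: 0.0}
    for outcome, (b0, b1) in coefs.items():
        eta[outcome] = b0 + b1 * x
    denom = sum(math.exp(v) for v in eta.values())
    return {outcome: math.exp(v) / denom for outcome, v in eta.items()}


# Hypothetical coefficients (intercept, slope on size), NOT real estimates.
coefs = {"inverts": (0.5, -1.2), "reptiles": (-1.5, 0.8)}

small = mlogit_probs(0, coefs)  # size = 0
large = mlogit_probs(1, coefs)  # size = 1

# Change in the log odds of inverts RELATIVE TO fish for a one-unit change
# in size; this recovers the slope in the inverts equation.
delta = (math.log(large["inverts"] / large["fish"])
         - math.log(small["inverts"] / small["fish"]))

# Change in the log odds of inverts versus everything else; this is a
# different quantity and also depends on the reptiles equation.
delta_vs_rest = (math.log(large["inverts"] / (1 - large["inverts"]))
                 - math.log(small["inverts"] / (1 - small["inverts"])))

print(round(delta, 6), round(delta_vs_rest, 6))
```

The two printed numbers differ, which is the point of the parenthetical warning: the slope describes inverts relative to fish, not inverts relative to all other outcomes.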

    8. How does a multinomial logit involve simultaneous comparisons?
         ln(p2/p1) = β02 + β12x1 + β22x2 + β32x3 + …
         ln(p3/p1) = β03 + β13x1 + β23x2 + β33x3 + …
       In a multinomial model, β12 (the coefficient on x1, size) reflects both the change in p1 (eats fish) and the change in p2 (eats inverts) for a one-unit increase in size.

    9. A limitation of the multinomial logit model. One limitation of multinomial models is that they assume “independence of irrelevant alternatives”, or IIA. In other words, the odds of outcome j vs. outcome k do not depend on what other outcomes (l, m, n) are available. Example with these data: the effect of size on eating fish vs. eating invertebrates is assumed not to depend on whether reptiles (e.g. turtles) are also available.

    10. Independence of irrelevant alternatives: the red bus – blue bus problem.
        Outcome of interest = modes of transportation.
          option     probability
          car        1/3
          red bus    1/3
          blue bus   1/3
          p(car)/p(red bus) = 1
        If we now remove the third option (the “blue bus” line, whose schedules and prices were identical to those of the “red bus” line, stops doing business in the area), we would predict the following based on our previous mlogit results.
          option     probability
          car        1/2
          red bus    1/2
          p(car)/p(red bus) = 1

    11. Assuming independence: the red bus – blue bus problem.
        If the red bus and the blue bus are competing directly, a more realistic prediction could be:
          option     probability
          car        1/3   (no change in p(car))
          red bus    2/3
          p(car)/p(red bus) = 1/2
        Alternatively, if the blue bus only serves routes not served by the red bus, a realistic prediction could be:
          option     probability
          car        2/3
          red bus    1/3   (no change in p(red bus))
          p(car)/p(red bus) = 2
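
The prediction that the mlogit model makes in the red bus / blue bus example amounts to renormalizing the remaining probabilities, which keeps every remaining odds ratio unchanged. The sketch below shows that arithmetic with exact fractions; the function name `drop_alternative` is introduced here only for illustration.

```python
from fractions import Fraction


def drop_alternative(probs, removed):
    """Remove one option and rescale the rest so they sum to 1.

    This is what independence of irrelevant alternatives implies:
    the odds between any two remaining options stay the same.
    """
    kept = {k: v for k, v in probs.items() if k != removed}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


before = {
    "car": Fraction(1, 3),
    "red bus": Fraction(1, 3),
    "blue bus": Fraction(1, 3),
}
after = drop_alternative(before, "blue bus")

odds_before = before["car"] / before["red bus"]
odds_after = after["car"] / after["red bus"]

print(after["car"], after["red bus"], odds_before, odds_after)
```

The rescaled probabilities are 1/2 and 1/2 with the odds fixed at 1. The two "more realistic" scenarios (odds of 1/2 or 2) cannot come out of this renormalization, which is why they count as violations of the assumption.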

    12. Implications of assuming independence. What does it matter if we have violated the independence of irrelevant alternatives assumption in an mlogit model? 1.) The coefficients for the model are still unbiased. 2.) The interpretations of the coefficients may be incorrect (e.g. will the alligators switch to turtles if the invertebrates all disappear? The answer might depend on the size of the fish population in the lake).

    13. Another weakness of multinomial logits. Difficulty in interpretation: in a model with fish as the baseline, β12 for size in the model comparing with inverts refers to the change for “eats inverts” relative to “eats fish” for a unit increase in size. An unprepared reader will probably interpret β12 as referring to a change in the odds of “eats inverts” for a unit increase in size. Wrong, wrong. (There are no good procedures for extracting predicted proportions from a multinomial logit model.) Your job is to explain your results to the reader, and that takes time and effort and descriptive stats.

    14. Alternatives to multinomial logits. Collapse the categories to only two, then fit a logit model. This loses much of the information. Sometimes you have strong theoretical and empirical evidence that the categories should be in a specific order, but not evidence to support a linear regression model. In such cases you might use ordered logit models.

    15. Alternatives to multinomial logits. Sometimes when you have richer, more individualized data, you encounter problems. The multinomial model can cope if x-variables have different effects on different outcomes, but what if the actual values of the x-variables depend on the outcomes? For example, in a red bus/blue bus model, a key factor in the choice of transportation could be the commute time (xc), but xc could vary by individual and by mode of transportation. In such cases you can use variants of logit models called conditional logit models. See Long, pp. 178-181.
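
A rough sketch of the conditional logit idea, assuming one alternative-specific covariate (commute time in minutes) with a single coefficient shared across modes. The coefficient, the commute times, and the function name `conditional_logit_probs` are all hypothetical and serve only to show how the covariate value differs by person and by mode.

```python
import math


def conditional_logit_probs(times, beta):
    """Choice probabilities when the covariate varies by alternative.

    times maps each mode to this person's commute time for that mode;
    beta is one coefficient applied to every mode's commute time.
    """
    utilities = {mode: beta * t for mode, t in times.items()}
    denom = sum(math.exp(u) for u in utilities.values())
    return {mode: math.exp(u) / denom for mode, u in utilities.items()}


beta_time = -0.05  # hypothetical: longer commutes make a mode less attractive

# Hypothetical commute times (minutes) for two commuters.
commuter_a = {"car": 20, "red bus": 40, "blue bus": 40}
commuter_b = {"car": 45, "red bus": 30, "blue bus": 35}

probs_a = conditional_logit_probs(commuter_a, beta_time)
probs_b = conditional_logit_probs(commuter_b, beta_time)

print(max(probs_a, key=probs_a.get), max(probs_b, key=probs_b.get))
```

With these made-up numbers the first commuter is most likely to drive and the second is most likely to take the red bus, even though both share the same coefficient, because xc itself differs across people and modes.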

    16. Practice with multinomial models. Since 1972, religious beliefs and activities have been changing in the United States. “Mainstream” religions have had declining attendance. Fundamentalist religions have experienced some resurgence. There has also been an increase in the proportion of Americans who do not report any religious beliefs.

    17. Practice with multinomial models. It is possible to use the GSS to categorize religious groups in several ways.
        . *1 = fundamentalist Christian (mostly Protestant)
        . *2 = nonfundamentalist Christian
        . *3 = no religion reported
        . *4 = Catholic
        . *5 = Jewish
        . *6 = other valid answers
        We wish to analyze shifts in these categories over time, since 1972.
        . generate year72 = year-1972

    18. Readout from multinomial models

        . * mlogit version 1: linear term for year
        . mlogit relcat year72, basecategory(2)

        Multinomial logistic regression                   Number of obs   =      46016
                                                          LR chi2(5)      =     749.20
                                                          Prob > chi2     =     0.0000
        Log likelihood = -67134.049                       Pseudo R2       =     0.0055

        ------------------------------------------------------------------------------
              relcat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        1            |
              year72 |   .0124607   .0012327    10.11   0.000     .0100446    .0148768
               _cons |  -.1551288   .0228966    -6.78   0.000    -.2000054   -.1102523
        -------------+----------------------------------------------------------------
        3            |
              year72 |   .0425831   .0018637    22.85   0.000     .0389303    .0462359
               _cons |  -1.901194   .0384445   -49.45   0.000    -1.976544   -1.825844
        -------------+----------------------------------------------------------------
        4            |
              year72 |   .0102774   .0013077     7.86   0.000     .0077143    .0128405
               _cons |  -.3500655   .0242397   -14.44   0.000    -.3975745   -.3025566
        -------------+----------------------------------------------------------------
        5            |
              year72 |   .0037489   .0034239     1.09   0.274    -.0029619    .0104597
               _cons |  -2.708437    .062884   -43.07   0.000    -2.831688   -2.585187
        -------------+----------------------------------------------------------------
        6            |
              year72 |   .0553417   .0033317    16.61   0.000     .0488116    .0618717
               _cons |   -3.44513   .0730876   -47.14   0.000    -3.588379   -3.301881
        ------------------------------------------------------------------------------
        (Outcome relcat==2 is the comparison group)
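
One way to make the readout more concrete is to translate the coefficients into odds multipliers relative to the comparison group (relcat 2, nonfundamentalist Christian) and into the category shares the fitted equations imply. The sketch below copies the point estimates from the readout; the function names and the choice of year72 = 0 and 30 are illustrative assumptions, and the result is a rough aid to interpretation, not a replacement for descriptive statistics.

```python
import math

# (constant, year72 coefficient) copied from the readout above.
# Outcome 2 is the comparison group, so its linear predictor is 0.
coefs = {
    1: (-0.1551288, 0.0124607),  # fundamentalist Christian
    3: (-1.901194, 0.0425831),   # no religion reported
    4: (-0.3500655, 0.0102774),  # Catholic
    5: (-2.708437, 0.0037489),   # Jewish
    6: (-3.44513, 0.0553417),    # other valid answers
}


def odds_multiplier(outcome, years=1):
    """Factor by which the odds of `outcome` vs. outcome 2 change over `years`."""
    return math.exp(coefs[outcome][1] * years)


def implied_shares(year72):
    """Category shares implied by the fitted equations at a given year72."""
    eta = {2: 0.0}
    for outcome, (b0, b1) in coefs.items():
        eta[outcome] = b0 + b1 * year72
    denom = sum(math.exp(v) for v in eta.values())
    return {k: math.exp(v) / denom for k, v in sorted(eta.items())}


shares_1972 = implied_shares(0)
shares_2002 = implied_shares(30)

print(round(odds_multiplier(3), 4))
for category in shares_1972:
    print(category, round(shares_1972[category], 3), round(shares_2002[category], 3))
```

The multiplier for category 3 says the odds of reporting no religion, relative to nonfundamentalist Christian, grow by a bit over 4 percent per year under this linear-in-year specification. It does not say the odds of reporting no religion versus all other categories grow at that rate.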

    19. Summary of multinomial logits. Multinomial models allow you to estimate models with several outcomes at once. The Stata commands are similar to the commands for logit models. You must be careful in interpreting your results: the precise definitions of the beta coefficients are confusing, and because the assumption of independence is often wrong, the real-world implications of the coefficients may be hard to judge.
