1. Sociology 709 (Martin)
Lecture 6: March 5, 2009
Models with multiple nominal (multinomial) outcomes: alligators!
Logit approaches for multinomial outcomes, and their weaknesses.
Constructing and applying the multinomial logit model
Limitations of the multinomial model.
Alternatives to the multinomial model.
Practice
2. Today’s topic: alligators
Increases in development and human activity in Florida have caused concern about how alligators might be affected.
Suppose we are interested in how the feeding habits of alligators change as they grow.
Alligators tend to specialize in a given type of prey, although they may change specializations.
Researchers examined the stomach contents of 219 alligators to see what they tend to eat. (Delany and Moore, 1987)
3. Alligator data
Outcome categories (what alligators eat): fish, invertebrates, reptiles, birds, other
Explanatory variables:
size: 1 = longer than 2.3 meters, 0 = shorter than 2.3 meters
sex: 1 = male, 2 = female
lake: Hancock, Oklawaha, Trafford, George
4. Using logit models for more than two outcomes
We could construct logit models to compare dichotomous outcomes (e.g. “eats fish” = 1 or 0).
Problem: we lose a lot of information that way.
Example: if we do a logit on “eats fish” = 1 or 0, we may not get a statistically significant effect of size if big alligators:
are less likely to eat invertebrates (“eats fish” = 0)
and more likely to eat reptiles (“eats fish” = 0)
Both shifts happen inside the “eats fish” = 0 category, so they offset each other and the logit cannot see them.
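The information loss described above can be sketched numerically. The diet shares below are hypothetical values chosen for illustration, not the Delany and Moore data, and the helper names are made up for this sketch. With these shares the collapsed “eats fish” comparison shows no size effect at all, while the comparisons against fish for individual prey types move in opposite directions.

```python
import math

# Hypothetical diet shares by size class (illustrative values, not the study data).
small = {"fish": 0.50, "invertebrates": 0.30, "reptiles": 0.05, "birds": 0.05, "other": 0.10}
large = {"fish": 0.50, "invertebrates": 0.10, "reptiles": 0.25, "birds": 0.05, "other": 0.10}


def binary_log_odds(shares, outcome):
    """Log odds of `outcome` versus all other outcomes combined."""
    p = shares[outcome]
    return math.log(p / (1 - p))


def pairwise_log_odds(shares, outcome, baseline):
    """Log odds of `outcome` versus one specific baseline outcome."""
    return math.log(shares[outcome] / shares[baseline])


# Size effect in the collapsed "eats fish" = 1 or 0 comparison (large minus small).
fish_effect = binary_log_odds(large, "fish") - binary_log_odds(small, "fish")

# Size effects when each prey type is compared with fish on its own.
invert_effect = (pairwise_log_odds(large, "invertebrates", "fish")
                 - pairwise_log_odds(small, "invertebrates", "fish"))
reptile_effect = (pairwise_log_odds(large, "reptiles", "fish")
                  - pairwise_log_odds(small, "reptiles", "fish"))

print("collapsed fish effect:", fish_effect)
print("inverts vs fish effect:", invert_effect)
print("reptiles vs fish effect:", reptile_effect)
```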
5. Using logit models for more than two outcomes
We could limit the data set to only two outcomes, then do logit models to compare the dichotomous outcomes
(e.g. remove reptile, bird, and other, so that “eats fish = 1 or 0” refers only to fish and invertebrates).
This approach is called the baseline logit approach, also called the separate fitting approach.
6. Definition of a multinomial logit
In a multinomial logit model, we have a set of covariates that predicts $\ln(p_2/p_1)$, $\ln(p_3/p_1)$, … all at the same time,
where $p_1, p_2, p_3, \dots$ refer to the probabilities of all possible outcome categories
and where $p_1$ refers to the comparison category.
Also called the simultaneous fitting approach.
7. Definition of a multinomial logit
Equation for a multinomial logit model:
$\ln(p_2/p_1) = \beta_{02} + \beta_{12}x_1 + \beta_{22}x_2 + \beta_{32}x_3 + \dots$
$\ln(p_3/p_1) = \beta_{03} + \beta_{13}x_1 + \beta_{23}x_2 + \beta_{33}x_3 + \dots$
$\beta_{12}$ refers to the change in the log odds of outcome 2 relative to outcome 1 associated with a unit change in $x_1$.
$\beta_{13}$ refers to the change in the log odds of outcome 3 relative to outcome 1 associated with a unit change in $x_1$.
For $p_1$ = fish, $p_2$ = inverts, $x_1$ = size, $\beta_{12}$ tells you the change in the log odds of eating inverts relative to eating fish associated with a one-unit increase in the size of the alligator.
(Not the change in the log odds of eating inverts!)
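Solving the logit equations for the probabilities themselves may make that last warning more concrete. The shorthand $\eta_j$ for the right-hand side of the equation for outcome $j$ is introduced here for convenience and is not in the original slides; the only extra fact used is that the probabilities sum to one.

```latex
\begin{aligned}
\eta_j &= \beta_{0j} + \beta_{1j}x_1 + \beta_{2j}x_2 + \beta_{3j}x_3 + \dots, \qquad j = 2, 3, \dots \\
p_1 &= \frac{1}{1 + \sum_{k \ge 2} e^{\eta_k}}, \qquad
p_j = \frac{e^{\eta_j}}{1 + \sum_{k \ge 2} e^{\eta_k}}
\end{aligned}
```

Dividing $p_j$ by $p_1$ recovers $e^{\eta_j}$, which matches the logit equations. Every $p_j$ depends on all of the $\eta_k$, so a positive coefficient raises the odds of outcome $j$ relative to outcome 1 without guaranteeing that $p_j$ itself rises.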
8. How does a multinomial logit involve simultaneous comparisons?
$\ln(p_2/p_1) = \beta_{02} + \beta_{12}x_1 + \beta_{22}x_2 + \beta_{32}x_3 + \dots$
$\ln(p_3/p_1) = \beta_{03} + \beta_{13}x_1 + \beta_{23}x_2 + \beta_{33}x_3 + \dots$
In a multinomial model, $\beta_{12}$ (the coefficient for an increase in size) reflects both the change in $p_1$ (eats fish) and the change in $p_2$ (eats inverts) for a one-unit increase in size.
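A small numeric sketch of this point, assuming a three-outcome model with fish as the comparison category. The coefficients are hypothetical values picked for illustration, and `predicted_probs` is a helper name invented here. The odds of inverts relative to fish change by exactly the exponentiated size coefficient, while the individual probabilities of fish and inverts both move.

```python
import math

# Hypothetical coefficients, fish as the comparison category (illustrative only).
# Each value is (intercept, coefficient on size).
coefs = {"inverts": (-0.5, -1.5), "reptiles": (-2.0, 1.0)}


def predicted_probs(size):
    """Outcome probabilities implied by the logits against the comparison category."""
    exp_eta = {name: math.exp(b0 + b1 * size) for name, (b0, b1) in coefs.items()}
    denom = 1.0 + sum(exp_eta.values())  # the 1.0 is the comparison category (fish)
    probs = {"fish": 1.0 / denom}
    probs.update({name: value / denom for name, value in exp_eta.items()})
    return probs


small = predicted_probs(0)
large = predicted_probs(1)

# Ratio of the inverts-vs-fish odds for large relative to small alligators.
odds_ratio = (large["inverts"] / large["fish"]) / (small["inverts"] / small["fish"])

print("odds ratio:", odds_ratio, "exp(coefficient):", math.exp(coefs["inverts"][1]))
print("p(fish):", small["fish"], "->", large["fish"])
print("p(inverts):", small["inverts"], "->", large["inverts"])
```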
9. A limitation of the multinomial logit model
One limitation of multinomial models is that they assume “independence from irrelevant alternatives”, or IIA.
In other words, the odds of outcome j vs. outcome k do not depend on what other outcomes (l, m, n) are available.
Example with this data: the model assumes that the effect of size on eating fish vs. eating invertebrates is the same whether or not reptiles (e.g. turtles) are also available.
10. Independence of irrelevant alternatives: the red bus – blue bus problem
Outcome of interest = modes of transportation
option      probability
car         1/3
red bus     1/3
blue bus    1/3
p(car)/p(red bus) = 1
If we now remove the third option (the “blue bus” line, whose schedules and prices were identical to the “red bus” line, stops doing business in the area), we would predict the following based on our previous mlogit results.
option      probability
car         1/2
red bus     1/2
p(car)/p(red bus) = 1
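The prediction above amounts to dropping one option and rescaling the remaining probabilities so that their odds stay fixed. A minimal sketch of that arithmetic, using exact fractions; the function name `drop_option_under_iia` is invented for this example.

```python
from fractions import Fraction


def drop_option_under_iia(probs, removed):
    """Remove one option and rescale the rest, leaving all remaining odds unchanged."""
    remaining = {name: p for name, p in probs.items() if name != removed}
    total = sum(remaining.values())
    return {name: p / total for name, p in remaining.items()}


before = {"car": Fraction(1, 3), "red bus": Fraction(1, 3), "blue bus": Fraction(1, 3)}
after = drop_option_under_iia(before, "blue bus")

for name, p in after.items():
    print(name, p)
print("p(car)/p(red bus):", after["car"] / after["red bus"])
```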
11. Assuming independence: the red bus – blue bus problem
If the red bus and the blue bus are competing directly, a more realistic prediction could be…
option      probability
car         1/3 (no change in p(car))
red bus     2/3
p(car)/p(red bus) = 1/2
Alternately, if the blue bus only serves routes not served by the red bus, a realistic prediction could be…
option      probability
car         2/3
red bus     1/3 (no change in p(red bus))
p(car)/p(red bus) = 2
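The scenarios differ only in where the former blue bus riders go. The sketch below makes that explicit with a hypothetical `switch_shares` argument (not part of any mlogit output) giving the fraction of blue bus riders who move to each remaining option. Splitting them in proportion to the existing shares gives the prediction implied by independence from irrelevant alternatives.

```python
from fractions import Fraction


def reallocate(probs, removed, switch_shares):
    """Drop `removed` and hand its probability to the other options in the given shares."""
    assert sum(switch_shares.values()) == 1, "switch shares must sum to 1"
    freed = probs[removed]
    return {
        name: p + freed * switch_shares.get(name, 0)
        for name, p in probs.items()
        if name != removed
    }


before = {"car": Fraction(1, 3), "red bus": Fraction(1, 3), "blue bus": Fraction(1, 3)}

# Direct competitors: every blue bus rider switches to the red bus.
direct_competitors = reallocate(before, "blue bus", {"red bus": 1})

# Separate routes: every blue bus rider switches to the car.
separate_routes = reallocate(before, "blue bus", {"car": 1})

# Proportional split: the odds of car vs. red bus stay at 1.
proportional = reallocate(
    before, "blue bus", {"car": Fraction(1, 2), "red bus": Fraction(1, 2)}
)

for label, result in [("direct competitors", direct_competitors),
                      ("separate routes", separate_routes),
                      ("proportional", proportional)]:
    print(label, {name: str(p) for name, p in result.items()})
```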
12. Implications of assuming independence
What does it matter that we have violated the independence of irrelevant alternatives assumption in an mlogit model?
1.) The coefficients for the model are still unbiased.
2.) The interpretations of the coefficients may be incorrect. (e.g. Will the alligators switch to turtles if the invertebrates all disappear? The answer might depend on the size of the fish population in the lake.)
13. Another weakness of multinomial logits
Difficulty in interpretation: in a model with fish as the baseline, $\beta_{12}$ for size in the equation comparing inverts with fish refers to the change in the log odds of “eats inverts” relative to “eats fish” for a unit increase in size.
An unprepared reader will probably interpret $\beta_{12}$ as referring to a change in the odds of “eats inverts” for a unit increase in size. Wrong, wrong.
(Predicted proportions can be computed from a multinomial logit model, but they depend on the values of all the covariates, so no single number summarizes the effect of one variable.)
Your job is to explain your results to the reader, and that takes time and effort and descriptive stats.
14. Alternatives to multinomial logits
Collapse the categories to only two, then do a logit model.
Loses much of the information.
Sometimes you have strong theoretical and empirical evidence that the categories should be in a specific order, but not evidence to support a linear regression model.
In such cases you might use ordered logit models.
15. Alternatives to multinomial logits
Sometimes when you have richer, more individualized data, you encounter problems.
The multinomial model can cope if x-variables have different effects on different outcomes, but what if the actual values of the x-variables depend on the outcomes?
For example, in a red bus/blue bus model, a key factor in the choice of transportation could be the commute time (xc), but xc could vary by individual and by mode of transportation.
In such cases you can use variants of logit models called conditional logit models.
See Long, pp. 178-181.
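A minimal sketch of how a conditional logit turns mode-specific commute times into choice probabilities: one coefficient on commute time is shared across modes, and each person's probability for a mode is exp(coefficient × time for that mode) divided by the sum of the same quantity over all modes. The coefficient value, the commute times, and the names `BETA_TIME` and `choice_probs` are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical coefficient on commute time in minutes (negative: longer trips are less attractive).
BETA_TIME = -0.05


def choice_probs(commute_minutes):
    """Conditional logit probabilities from one person's mode-specific commute times."""
    exp_utility = {mode: math.exp(BETA_TIME * minutes)
                   for mode, minutes in commute_minutes.items()}
    denom = sum(exp_utility.values())
    return {mode: value / denom for mode, value in exp_utility.items()}


# Two hypothetical commuters facing different times for the same three modes.
commuter_a = choice_probs({"car": 20, "red bus": 40, "blue bus": 45})
commuter_b = choice_probs({"car": 50, "red bus": 30, "blue bus": 30})

print("commuter A:", commuter_a)
print("commuter B:", commuter_b)
```

The same coefficient produces different predicted choices for the two commuters because the covariate itself differs by person and by mode, which is the situation the multinomial model is not built for.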
16. Practice with multinomial models
Since 1972, religious beliefs and activities have been changing in the United States.
“Mainstream” religions have had declining attendance.
Fundamentalist religions have experienced some resurgence.
There has also been an increase in the proportion of Americans who do not report any religious beliefs.
17. Practice with multinomial models
It is possible to use the GSS to categorize religious groups in several ways.
. *1 = fundamentalist Christian (mostly Protestant)
. *2 = nonfundamentalist Christian
. *3 = no religion reported
. *4 = Catholic
. *5 = Jewish
. *6 = other valid answers
We wish to analyze shifts in these categories over time, since 1972.
. generate year72 = year-1972
18. Readout from multinomial models
. * mlogit version 1: linear term for year
. mlogit relcat year72, basecategory(2)
Multinomial logistic regression Number of obs = 46016
LR chi2(5) = 749.20
Prob > chi2 = 0.0000
Log likelihood = -67134.049 Pseudo R2 = 0.0055
------------------------------------------------------------------------------
relcat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1 |
year72 | .0124607 .0012327 10.11 0.000 .0100446 .0148768
_cons | -.1551288 .0228966 -6.78 0.000 -.2000054 -.1102523
-------------+----------------------------------------------------------------
3 |
year72 | .0425831 .0018637 22.85 0.000 .0389303 .0462359
_cons | -1.901194 .0384445 -49.45 0.000 -1.976544 -1.825844
-------------+----------------------------------------------------------------
4 |
year72 | .0102774 .0013077 7.86 0.000 .0077143 .0128405
_cons | -.3500655 .0242397 -14.44 0.000 -.3975745 -.3025566
-------------+----------------------------------------------------------------
5 |
year72 | .0037489 .0034239 1.09 0.274 -.0029619 .0104597
_cons | -2.708437 .062884 -43.07 0.000 -2.831688 -2.585187
-------------+----------------------------------------------------------------
6 |
year72 | .0553417 .0033317 16.61 0.000 .0488116 .0618717
_cons | -3.44513 .0730876 -47.14 0.000 -3.588379 -3.301881
------------------------------------------------------------------------------
(Outcome relcat==2 is the comparison group)
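One way to read this output is to convert it into predicted probabilities. The sketch below copies the coefficients and constants from the readout and evaluates them at year72 = 0 and year72 = 36. The second value corresponds to 2008 and is chosen here only for illustration; it is an assumption that it lies within the range of the data. The helper name `predicted_probs` is invented for this example.

```python
import math

# (coefficient on year72, constant) copied from the readout; category 2 is the comparison group.
coefs = {
    1: (0.0124607, -0.1551288),
    3: (0.0425831, -1.901194),
    4: (0.0102774, -0.3500655),
    5: (0.0037489, -2.708437),
    6: (0.0553417, -3.44513),
}


def predicted_probs(year72):
    """Predicted probability of each religious category at a given value of year72."""
    exp_eta = {cat: math.exp(const + slope * year72)
               for cat, (slope, const) in coefs.items()}
    denom = 1.0 + sum(exp_eta.values())  # the 1.0 is exp(0) for comparison category 2
    probs = {2: 1.0 / denom}
    probs.update({cat: value / denom for cat, value in exp_eta.items()})
    return probs


# Multiplicative change per year in the odds of each category relative to category 2.
odds_multiplier = {cat: math.exp(slope) for cat, (slope, _) in coefs.items()}

p_start = predicted_probs(0)   # 1972
p_later = predicted_probs(36)  # 2008, illustrative choice

for cat in sorted(p_start):
    print(cat, round(p_start[cat], 3), round(p_later[cat], 3))
```

With these coefficients, the predicted share for category 4 (Catholic) is lower at year72 = 36 than at year72 = 0 even though its coefficient is positive. The coefficient only says that category 4 gains relative to category 2, and category 2 is shrinking faster.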
19. Summary of multinomial logits
Multinomial models allow you to estimate models with several outcomes at once.
The Stata commands are similar to the commands for logit models.
You must be careful in interpreting your results:
the precise definitions of the beta coefficients are confusing
because the assumption of independence from irrelevant alternatives is often wrong, the real-world implications of the coefficients may be hard to judge.