1. Sociology 709 (Martin), Lecture 12: April 23, 2009. Statistical models for rates: part 2 of 2.
Advantages and disadvantages of models we already know
Exponential models
Other baseline models you might read about
Practice with rates and survivor proportions
Possible problems with rate models
Readings to look at:
Vaupel and Yashin. 1985. “Heterogeneity’s Ruses.” American Statistician 39(3):176-185.
4. Piecewise constant exponential model
To specify a piecewise constant model, set aside one duration interval as the comparison (omitted) interval, then define each of the other intervals with a time-varying dummy covariate.
Once you have done this, you can estimate a coefficient for an interval just as you estimate a coefficient for any other covariate.
In this example, β1 and β2 estimate coefficients for time intervals, not x variables, and β0 estimates an intercept for the omitted time interval.
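The slide's equation did not survive, so here is one plausible form of the model, assuming three duration intervals (interval 0 omitted) and a single ordinary covariate x. The dummies D_1(t), D_2(t), the coefficient β3, and the interval-time notation Δ_k(t) are introduced here for illustration. Because the hazard is constant within each interval, the survivor proportion follows directly from the interval rates:

```latex
\begin{align*}
\log h(t) &= \beta_0 + \beta_1 D_1(t) + \beta_2 D_2(t) + \beta_3 x,
  \qquad D_k(t) = 1 \text{ if duration } t \text{ falls in interval } k, \text{ else } 0 \\
h_0 &= e^{\beta_0 + \beta_3 x}, \qquad
h_1 = e^{\beta_0 + \beta_1 + \beta_3 x}, \qquad
h_2 = e^{\beta_0 + \beta_2 + \beta_3 x} \\
S(t) &= \exp\Big(-\sum_{k=0}^{2} h_k \, \Delta_k(t)\Big),
  \qquad \Delta_k(t) = \text{time spent in interval } k \text{ by duration } t
\end{align*}
```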
5. Data set-up for a piecewise constant model
6. Data set-up for a piecewise constant model
A few notes about the data set-up:
Starting duration = 9 means 9.0000 (the beginning of month 9)
Ending duration = 20 means 20.9999 (the end of month 20)
It is possible to start in month 21 and have a second birth in month 21.
It is not possible to start in month 0 and have a second birth in month 0. (In other words, STATA will drop cases with a twin first birth.)
7. How on earth do we set up data like this? Answer: tell STATA to do it (or do it in SAS).
. * set up for a piecewise exponential model
. stset dur, fail(birth2) id(id)
. stsplit durcat, at(9 21 33 69)
(6790 observations created)
. egen durgroup = group(durcat)
(34 missing values generated)
. gen dur0008 = durgroup==1
. gen dur0920 = durgroup==2
. gen dur2132 = durgroup==3
. gen dur3368 = durgroup==4
. gen dur69p = durgroup==5
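For readers without Stata, the episode splitting that stsplit performs can be sketched in Python. This is a minimal sketch, not Stata's implementation. It assumes each woman contributes one spell from duration 0 to `dur` with a 0/1 second-birth indicator, uses Stata-style (t0, t1] records and the same cut points as above, and the function name `split_spell` is hypothetical.

```python
from typing import Dict, List

# Cut points matching: stsplit durcat, at(9 21 33 69)
CUTS = [9, 21, 33, 69]
# One dummy per interval; the regression below omits dur3368 as the reference
LABELS = ["dur0008", "dur0920", "dur2132", "dur3368", "dur69p"]


def split_spell(dur: float, event: int, cuts: List[int] = CUTS) -> List[Dict[str, float]]:
    """Split one spell (0, dur] into records (t0, t1], one per duration interval.

    Only the final record keeps the event indicator; earlier records are
    censored at a cut point. A spell of length 0 (a twin first birth) yields
    no records, similar to how stset excludes it.
    """
    if dur <= 0:
        return []
    bounds = [0] + [c for c in cuts if c < dur] + [dur]
    records = []
    for i in range(len(bounds) - 1):
        t0, t1 = bounds[i], bounds[i + 1]
        is_last = i == len(bounds) - 2
        group = sum(1 for c in cuts if c <= t0)  # 0-based interval index (durgroup - 1)
        record = {"t0": t0, "t1": t1, "event": event if is_last else 0}
        record.update({lab: int(k == group) for k, lab in enumerate(LABELS)})
        records.append(record)
    return records


for rec in split_spell(25, 1):
    print(rec)
```

A 25-month spell ending in a second birth becomes three records, (0, 9], (9, 21], and (21, 25], with the birth recorded only on the last one. Summed over records, the exposure (t1 − t0) still equals the original duration, which is why splitting does not change the likelihood.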
8. . *event history model with piecewise exponential baseline for duration
. * since first birth
. streg age1_15 age1_16 age1_18 age1_25 hispanic nhblack nhother dur0008 dur0920 dur2132 dur69p, dist(exp) nohr
Exponential regression -- log relative-hazard form
No. of subjects = 2918 Number of obs = 9708
No. of failures = 1741
Time at risk = 124953
LR chi2(11) = 882.26
Log likelihood = -2785.6657 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
age1_15 | -.1538871 .1253363 -1.228 0.220 -.3995418 .0917676
age1_16 | .0472311 .0719714 0.656 0.512 -.0938302 .1882924
age1_18 | .0901565 .0592117 1.523 0.128 -.0258964 .2062093
age1_25 | -.016424 .0898077 -0.183 0.855 -.1924439 .1595959
hispanic | .1417642 .0705131 2.010 0.044 .0035609 .2799674
nhblack | .0476215 .0676632 0.704 0.482 -.0849961 .180239
nhother | .3124815 .109814 2.846 0.004 .0972501 .527713
dur0008 | -3.444608 .2618926 -13.153 0.000 -3.957908 -2.931308
dur0920 | -.2045279 .0635638 -3.218 0.001 -.3291106 -.0799452
dur2132 | .3671508 .0596125 6.159 0.000 .2503124 .4839892
dur69p | -.8130398 .097524 -8.337 0.000 -1.004183 -.6218962
_cons | -4.0599 .0517029 -78.524 0.000 -4.161236 -3.958565
------------------------------------------------------------------------------
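Because the baseline is exponential within each interval, the coefficients translate directly into monthly rates and a survivor proportion. This is a minimal sketch. It assumes durations are in months, and it uses only `_cons` and the duration coefficients from the output above, so the rates apply to women in the reference category of every other covariate. dur3368 (months 33 to 68) is the omitted interval.

```python
import math

CONS = -4.0599
# Duration-interval coefficients from the streg output; dur3368 is the omitted interval
DUR_COEF = {
    "dur0008": -3.444608,
    "dur0920": -0.2045279,
    "dur2132": 0.3671508,
    "dur3368": 0.0,
    "dur69p": -0.8130398,
}
# Interval widths in months implied by stsplit at(9 21 33 69)
WIDTH = {"dur0008": 9, "dur0920": 12, "dur2132": 12, "dur3368": 36}


def monthly_rates() -> dict:
    """Second-birth rate per month in each interval for the reference group."""
    return {k: math.exp(CONS + b) for k, b in DUR_COEF.items()}


def survivor_at_69() -> float:
    """Proportion still without a second birth at 69 months: exp(-cumulative hazard)."""
    rates = monthly_rates()
    cum_hazard = sum(rates[k] * w for k, w in WIDTH.items())
    return math.exp(-cum_hazard)


for k, h in monthly_rates().items():
    print(f"{k}: {h:.5f} per month")
print(f"S(69) = {survivor_at_69():.3f}")
```

The rate peaks in months 21 to 32 (about 0.025 per month), and the survivor proportion at 69 months comes out near 0.335: about one-third of the reference group has not had a second birth by then.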
23. Right censoring
Causes of right censoring:
#1: experiencing the event, which makes the respondent no longer at risk of the event (strictly speaking, an observed failure rather than a censored case).
#2: experiencing some competing event which makes the respondent no longer at risk of the event.
#3: interview occurs, and no further information is available for that respondent
Effects of right censoring:
increases standard errors of estimates, particularly survivor estimates.
makes models subject to bias if the baseline is improperly specified.
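Under the exponential model, censored cases still contribute time at risk but no event. The maximum-likelihood rate is events divided by total exposure, and by the delta method the standard error of the log rate is 1/√(events). The spells below are hypothetical. They show how an interview at month 12 censors most cases, leaving fewer events and a wider standard error.

```python
import math
from typing import List, Tuple


def exp_rate(spells: List[Tuple[float, int]]) -> Tuple[float, float]:
    """MLE of a constant hazard from (duration, event) pairs.

    Censored spells (event == 0) add exposure but no events.
    Returns (rate, standard error of the log rate); requires at least one event.
    """
    events = sum(e for _, e in spells)
    exposure = sum(t for t, _ in spells)
    rate = events / exposure
    se_log_rate = 1 / math.sqrt(events)
    return rate, se_log_rate


# Hypothetical data: ten spells, all ending in the event if fully observed
full = [(3, 1), (5, 1), (8, 1), (10, 1), (14, 1), (18, 1), (22, 1), (30, 1), (41, 1), (55, 1)]
# The same spells cut off by an interview at month 12 (cause #3 above)
interview = 12
censored = [(min(t, interview), int(e == 1 and t <= interview)) for t, e in full]

print(exp_rate(full))
print(exp_rate(censored))
```

Censoring at month 12 cuts the number of events from 10 to 4. This raises the standard error of the log rate from about 0.32 to 0.50.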
24. Losing data at the left side of the duration function
Left-censoring: we do not have information dating back to the starting duration for a person, but we know the duration at which we begin to have valid observations.
Example: a health study which examines death rates due to tuberculosis. The key explanatory variable is a genetic marker. The death rate due to tuberculosis is duration-dependent, and the survey asks the respondents to recall the duration since first symptoms.
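When the entry duration is known, as in this example, the usual remedy is to condition each person's likelihood on having survived to that duration. Time before entry then contributes nothing. The notation is introduced here: entry duration e_i (for example, the recalled time since first symptoms at the start of observation), exit duration t_i, and event indicator d_i.

```latex
\begin{align*}
L_i &= \frac{h(t_i)^{d_i}\, S(t_i)}{S(e_i)} \\
\log L_i &= d_i \log h(t_i) - \int_{e_i}^{t_i} h(u)\, du
\end{align*}
```

This correction requires knowing e_i, which is exactly what is missing under left-truncation as defined here. In Stata, a known entry time is supplied through stset's enter() option.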
25. A more serious problem: left-truncation
Left-truncation: we do not have information dating back to the starting duration for a person, and we do not know the duration at which we begin to have valid observations.
Example: a health study which examines death rates due to tuberculosis. The key explanatory variable is a genetic marker. The death rate due to tuberculosis is duration-dependent, and there is no reliable measure of the duration since first symptoms.
28. Next possible problem: nonproportional hazards
The problem: what happens if a covariate has a certain effect at some durations but a different effect, or no effect, at other durations?
If this happens, the estimated coefficient depends on which durations are observed (for example, how long respondents are followed). This leads to confusion and makes it possible to manipulate the results.
Example: compared with other mothers, teen mothers may have higher 2nd birth rates at certain durations but not at others.
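Here is a small numerical illustration of why a single coefficient becomes window-dependent. The hazards are hypothetical. The reference group has a constant second-birth rate of 0.02 per month. Teen mothers have double that rate in months 0 to 23 and the same rate afterward. The sketch computes the large-sample rate ratio (expected events over expected exposure, per group) that a proportional exponential model would estimate if everyone were followed to month T with no other censoring.

```python
import math

# Hypothetical piecewise hazards: (start, end, monthly rate); end=None means open-ended
REFERENCE = [(0, None, 0.02)]
TEEN = [(0, 24, 0.04), (24, None, 0.02)]


def expected_rate(pieces, T: float) -> float:
    """Expected events / expected exposure over (0, T] for a piecewise constant hazard."""
    surv = 1.0      # survivor proportion at the start of the current piece
    exposure = 0.0  # integral of S(t) dt over (0, T]
    for start, end, h in pieces:
        if start >= T:
            break
        stop = T if end is None else min(end, T)
        width = stop - start
        exposure += surv * (1 - math.exp(-h * width)) / h
        surv *= math.exp(-h * width)
    return (1 - surv) / exposure


def rate_ratio(T: float) -> float:
    """Teen-mother rate ratio a proportional model would report for window (0, T]."""
    return expected_rate(TEEN, T) / expected_rate(REFERENCE, T)


for T in (12, 24, 60, 120):
    print(T, round(rate_ratio(T), 3))
```

The estimated "teen effect" falls from a rate ratio of 2 to roughly 1.5 as the observation window lengthens, even though the underlying hazards never change. Interacting the covariate with the duration-interval dummies of the piecewise model is one way to let the effect vary instead.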
41. Practice with hazard models: How women’s suffrage movements succeeded
“An event history analysis provides evidence that gendered opportunity structures helped bring about the political successes of the suffragists. Results suggest the need for a broader understanding of opportunity structure than one rooted simply in formal political opportunities.”
McCammon et al., 2001
42. How women’s suffrage movements succeeded
Outcome of interest:
State or territory passes major suffrage legislation
Gendered opportunity structures include…
new-woman index (college students, doctors, lawyers, and women’s movements)
proportion of neighboring states with women’s suffrage
World War I years (lagged)
43. Interpreting the coefficients from the Women’s Suffrage Paper
By how much does the rate of passage of suffrage laws change in a state if (all else equal):
The new-woman index increases by a point?
What is the new-woman index?
2 of 5 neighboring states have granted suffrage?
The state is a Western state?
There is a state prohibition law?
Suffragists are using “separate spheres” arguments?
The year is 1905 instead of 1915?
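Each of these questions is answered the same way. With a log-rate coefficient b, a change of Δx in a covariate multiplies the rate by exp(b·Δx), which is a change of 100·(exp(b·Δx) − 1) percent. The coefficients below are hypothetical placeholders, not the values reported by McCammon et al.; substitute the published estimates. The sketch also assumes that year enters linearly.

```python
import math


def pct_change(b: float, dx: float = 1.0) -> float:
    """Percent change in the rate when x changes by dx, given log-rate coefficient b."""
    return 100 * (math.exp(b * dx) - 1)


# Hypothetical coefficients, for illustration only
b_new_woman = 0.10   # per point on the new-woman index
b_neighbors = 2.0    # per unit of the proportion of neighboring states with suffrage
b_year = 0.05        # per calendar year, assuming a linear year term

print(pct_change(b_new_woman, 1))        # new-woman index up one point
print(pct_change(b_neighbors, 2 / 5))    # 2 of 5 neighbors vs. none
print(pct_change(b_year, 1905 - 1915))   # 1905 instead of 1915
```

For the yes/no questions (Western state, prohibition law, "separate spheres" arguments), use dx = 1 with the corresponding dummy coefficient.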
44. Possible problems: Problems of causal inference: How do we know that x caused y?
new woman index
proportion of neighboring states
World War I years
State prohibition laws and other time-forward variables
If we accept a causal inference, do we accept the substantive interpretation?
Is proportion of neighboring states a “gendered opportunity structure”?
45. Possible problems: Whether versus when
“Whether” is a problem, because key explanatory variables are strongly correlated with time.
How much does the “new woman index” vary across states in a single year?
“When” is a problem because there is little variation in the timing of enactment:
4 states enacted suffrage from 1869 to 1909
8 from 1910 to 1914
17 from 1917 to 1919
Do controls for decade help?
46. Possible problems: Sample size and degrees of freedom
Table 2 lists 25 covariates, and the text mentions more that were tried and discarded
Number of cases varies from 1,161 to 2,358 (in state-years).
However, Table 1 lists only 31 “events”, 2 of them outside the time frame of the study, so the model is severely overfitted: it has nearly as many covariates as events.
(What p-values did the authors use?)
Multiple events from the same case.
Should we drop duplicate events?
If so, should we drop the first or the second event?
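Here is the arithmetic behind the overfitting concern, using the figures on these slides: 31 events in Table 1, 2 of them outside the study window, and 25 covariates in Table 2. The comparison value of about 10 events per covariate is a commonly cited rule of thumb for hazard and logistic models, not a hard threshold.

```python
events_listed = 31   # Table 1
outside_window = 2   # events outside the study's time frame
covariates = 25      # Table 2

events = events_listed - outside_window
# Cross-check against the "whether versus when" slide: 4 + 8 + 17 enactments
assert events == 4 + 8 + 17

events_per_covariate = events / covariates
print(f"{events} events / {covariates} covariates = {events_per_covariate:.2f} events per covariate")
print(f"covariates supportable at ~10 events each: {events // 10}")
```

With about 1.2 events per covariate, the rule of thumb would support only 2 covariates.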