Multilevel Event History Models with Applications to the Analysis of Recurrent Employment Transitions. Fiona Steele. Outline. The discrete-time approach Multilevel models and examples for: Recurrent events Multiple states Handling large datasets Examples of other applications
Multilevel Event History Models with Applications to the Analysis of Recurrent Employment Transitions Fiona Steele
Outline • The discrete-time approach • Multilevel models and examples for: • Recurrent events • Multiple states • Handling large datasets • Examples of other applications • Estimation/software
Why use discrete-time methods? • Event times are often measured in discrete time units, e.g. months or years. • It is straightforward to allow and test for non-proportional hazards. • We can use familiar models for discrete response data. For more complex data structures and processes, we can use existing estimation procedures for multilevel models.
Restructuring data for a discrete-time analysis: Individual-based file
Restructuring data for a discrete-time analysis: Person-period file
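The restructuring from an individual-based file to a person-period file can be sketched as below. This is only an illustration: the field names (`id`, `duration`, `event`) and the two example records are hypothetical and are not taken from the slides' data.

```python
def to_person_period(individuals):
    """Expand one record per individual into one record per discrete interval."""
    rows = []
    for person in individuals:
        for t in range(1, person["duration"] + 1):
            is_last = t == person["duration"]
            rows.append({
                "id": person["id"],
                "t": t,
                # y = 1 only in the final interval of an uncensored duration
                "y": 1 if (is_last and person["event"] == 1) else 0,
            })
    return rows

# Hypothetical individual-based file (assumed layout, for illustration only)
individuals = [
    {"id": 1, "duration": 3, "event": 1},  # event observed in interval 3
    {"id": 2, "duration": 2, "event": 0},  # censored after interval 2
]

person_period = to_person_period(individuals)
for row in person_period:
    print(row)
```

Each individual contributes as many rows as intervals observed, and the binary response y is the outcome that the discrete-time model is fitted to.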
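A simple discrete-time logit model We can fit a logit regression model of the form: $\operatorname{logit}(p_{tj}) = \log\left[ p_{tj} / (1 - p_{tj}) \right] = \alpha^{T} z_{tj} + \beta^{T} x_{tj}$, where $p_{tj}$ is the probability that individual j has the event in interval t, given no event before t. The covariates $x_{tj}$ can be constant over time or time-varying. $z_{tj}$ is a vector of functions of time (e.g. polynomials or dummy variables) and $\alpha^{T} z_{tj}$ is the logit of the baseline hazard function. Other link functions are possible, e.g. clog-log or probit.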
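As a rough sketch of how the logit model translates into hazards and survival probabilities, the snippet below evaluates the inverse logit of a linear predictor over five intervals. The coefficient values, the linear-in-t baseline and the name `beta` for the covariate coefficients are assumptions made for illustration; they are not estimates from the slides.

```python
import math

def hazard(alpha, z, beta, x):
    """Inverse logit of the linear predictor alpha'z + beta'x."""
    eta = sum(a * zi for a, zi in zip(alpha, z)) + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only for illustration.
alpha = [-2.0, -0.1]  # intercept and linear duration term, so z_t = (1, t)
beta = [0.5]          # effect of a single binary covariate

# Discrete-time hazard in intervals t = 1..5 for an individual with x = 1
hazards = [hazard(alpha, [1.0, float(t)], beta, [1.0]) for t in range(1, 6)]

# Survivor function: probability of no event up to and including interval t
survival = []
s = 1.0
for h in hazards:
    s *= 1.0 - h
    survival.append(s)

print(hazards)
print(survival)
```

Replacing the linear term in z_t with interval dummies would give a step-function baseline instead; the rest of the calculation is unchanged.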
Recurrent events • Analyse duration of periods of continuous exposure (episodes), e.g. employment episodes, birth intervals, partnerships • There may be unobserved individual-specific (i.e. time-invariant) factors which affect the probability of an event for all of an individual’s episodes • referred to as unobserved heterogeneity or frailty
Hierarchical data structure Repeated events lead to a two-level hierarchical structure Level 2: Individuals Level 1: Episodes
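2-level model for recurrent events $\operatorname{logit}(p_{tij}) = \alpha^{T} z_{tij} + \beta^{T} x_{tij} + u_{j}$ where $p_{tij}$ is the probability of an event in time interval t during episode i of individual j; $x_{tij}$ are covariates, which might be time-varying or defined at the episode or individual level; $u_{j}$ is a random effect representing unobserved characteristics of individual j (unobserved heterogeneity or frailty). Assume $u_{j} \sim N(0, \sigma_{u}^{2})$.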
Example: women’s employment • Duration of non-employment spells; event is (re)entry into employment • Data are a subsample from the British Household Panel Survey: 1401 women, 2290 episodes and 15314 person-year records • Employment, birth and union histories were collected retrospectively at wave 2 and linked to subsequent panel data to form continuous histories • Focus on effects of duration of non-employment and time-varying indicators of number and age of children, adjusting for age and characteristics of previous job (if any)
Unobserved individual heterogeneity • Estimated standard deviation of woman-level random effect is 0.65 (se=0.09) • Significant variation between women in log-odds of entering employment due to unmeasured time-invariant characteristics • Failure to account for unobserved heterogeneity (UH) leads to overstatement of negative duration effects and understatement of positive duration effects • After accounting for UH, effects of time-varying covariates (e.g. duration and number/age of children) are subject-specific, i.e. within-woman effects
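The reason ignoring unobserved heterogeneity overstates negative duration effects can be illustrated with a small deterministic calculation. This is a simplified sketch, not the model on the slides: it assumes two equally sized latent groups with constant hazards (0.4 and 0.1, both hypothetical) in place of a normally distributed random effect.

```python
def population_hazard(group_hazards, weights, n_intervals):
    """Hazard observed in the pooled population when each latent group
    has its own constant hazard."""
    out = []
    for t in range(1, n_intervals + 1):
        # Share of each group still at risk at the start of interval t
        at_risk = [w * (1.0 - h) ** (t - 1) for h, w in zip(group_hazards, weights)]
        # Expected events in interval t from each group
        events = [r * h for r, h in zip(at_risk, group_hazards)]
        out.append(sum(events) / sum(at_risk))
    return out

# Hypothetical high- and low-hazard groups of equal size
pop = population_hazard([0.4, 0.1], [0.5, 0.5], 6)
print([round(h, 3) for h in pop])
```

Although the hazard is flat within each group, the pooled hazard declines with duration because high-hazard individuals leave the risk set first. A model without a random effect would read this selection effect as negative duration dependence.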
Duration effects before and after allowing for unobserved heterogeneity * p<0.05
Estimates from multilevel logit model of entry into employment * p<0.05
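Modelling transitions between multiple states An individual may pass through various ‘states’, e.g. employment and non-employment. Suppose there are 2 states, and denote by $p_{stij}$ the probability of a transition from state s in interval t of episode i for individual j. $\operatorname{logit}(p_{stij}) = \alpha_{s}^{T} z_{stij} + \beta_{s}^{T} x_{stij} + u_{sj}, \quad s = 1, 2$ where $(u_{1j}, u_{2j})$ ~ bivariate normal. Note: Generalises to multinomial logit for > 2 states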
Multiple states: data structure (1) Start with an episode-based file, e.g. States are employment (E) and non-employment (NE) Notes: (i) t in years; (ii) EVENTij=1 if uncensored, 0 if censored; (iii) age, in years, at start of episode.
Multiple states: data structure (2) Convert to discrete-time format: Eij dummy for Employment, NEij dummy for Non-Employment
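The conversion from an episode-based file to discrete-time format with state dummies might look like the sketch below. The field names and the two example episodes are hypothetical; only the E and NE dummies and the meaning of EVENT follow the slides.

```python
def episodes_to_discrete(episodes):
    """Expand each episode into one row per interval, with state dummies."""
    rows = []
    for ep in episodes:
        for t in range(1, ep["t"] + 1):
            rows.append({
                "j": ep["j"],   # individual
                "i": ep["i"],   # episode within individual
                "t": t,         # interval within episode
                # y = 1 in the last interval of an uncensored episode
                "y": int(t == ep["t"] and ep["event"] == 1),
                "E": int(ep["state"] == "E"),    # dummy for employment
                "NE": int(ep["state"] == "NE"),  # dummy for non-employment
            })
    return rows

# Hypothetical episode-based file: t in years, event = 0 if censored
episodes = [
    {"j": 1, "i": 1, "state": "E", "t": 2, "event": 1},
    {"j": 1, "i": 2, "state": "NE", "t": 3, "event": 0},
]

discrete = episodes_to_discrete(episodes)
for row in discrete:
    print(row)
```

Interacting the E and NE dummies with the covariates and random effects lets a single binary-response model hold a separate equation for each origin state.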
Example: transitions between employment and non-employment • corr(u1j, u2j)=0.58, se=0.13, so large positive residual correlation between E→NE and NE→E • Women with high (low) chance of entering E tend to have a high (low) chance of leaving E • Positive correlation arises from two sub-groups: short spells of E and NE, and longer spells of both types • BUT little impact on estimates for child indicators on (re)entry into employment
Handling large datasets • Although flexible, a drawback of the discrete-time approach is that the analysis file can be very large. This is a particular problem when we wish to fit complex models with multiple correlated random effects. • Two possible approaches: • Group time intervals • More efficient algorithms, e.g. reparameterisation in MCMC estimation (Browne et al. 2009)
Grouped time intervals Suppose we analyse 6-month rather than monthly intervals. Need to allow for different lengths of exposure time. In any 6-month interval, some will have the event or be censored after 1st month while others will be exposed for full 6 months. Denote by ntij exposure time in grouped interval t. Estimate binomial logit model with response ytij and denominator ntij Note: intervals do not need to be the same width.
Example of grouped time intervals Suppose an individual is observed to have an event during the 17th month, and we wish to group durations into 6-month intervals (t).
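The grouping in this example can be worked through with a short sketch. It assumes the month in which the event occurs counts as a month of exposure; the function and variable names are hypothetical.

```python
def group_exposure(duration, event, width):
    """Return (t, n, y) for each grouped interval: interval index,
    exposure time within the interval, and event indicator."""
    records = []
    t = 1
    remaining = duration
    while remaining > 0:
        n = min(width, remaining)  # exposure time in this grouped interval
        remaining -= n
        # The event falls in the interval where the exposure runs out
        y = 1 if (remaining == 0 and event == 1) else 0
        records.append((t, n, y))
        t += 1
    return records

# Event during the 17th month, grouped into 6-month intervals
grouped = group_exposure(duration=17, event=1, width=6)
print(grouped)  # [(1, 6, 0), (2, 6, 0), (3, 5, 1)]
```

The 17 monthly records collapse to three binomial records, with the third interval contributing only 5 months of exposure.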
Implications of aggregation • Need to assume that hazard function is constant within the grouped intervals. • Need to fix values of time-varying covariates within intervals, e.g. value at start. • In practice, aggregation has little impact on estimated baseline hazard or effects of episode/individual-level covariates. But impact on coefficients of time-varying covariates can be substantial.
Examples of other applications • Hospital admissions: length of stay or duration between admissions • Repeated episodes nested within patients if multiple admissions • Hospital and GP effects using cross-classified multilevel model (GPs refer to multiple hospitals, and hospitals take patients from multiple GPs) • Area effects on mortality or fertility • Repeated birth intervals (for fertility) for individuals nested within areas
Area effects on mortality: alternative approaches • As in the employment example, set up a person-period file with multiple records per person, e.g. Kravdal (2006) • Define a single binary response for each person and include number of years of exposure as an offset in a Poisson regression, e.g. Tarkiainen et al. (2009). Could also treat as a binomial response (as for grouped time intervals). • If there are only a few categorical covariates, apply Poisson regression to aggregate data (1 record for each combination of t and covariate values)
Area effects on mortality: Multilevel Poisson modelling of aggregate data (1) • Suppose we want to estimate effect of age, sex and area characteristics on individual mortality risk • Suppose we group age into four 5-year age categories. Then for each area define 8 cells, one for each age-sex combination • For area j denote by yij the observed number of deaths for age-sex cell i • Denote the total population at risk of mortality in cell i of area j by nij, or might use expected number of deaths Eij
Area effects on mortality: Multilevel Poisson modelling of aggregate data (2) • Analyse (yij, nij) using 2-level Poisson model • Define age and sex dummies characterising cells and include these and area-level variables as predictors • Application to cancer mortality: Langford and Day (2001) - No. deaths for small areas (i) within regions (j) within EC nations (k). Covariates at regional level • Application to teenage conception: Diamond et al. (2002) • No. conceptions for age-year cell (i) within electoral wards (j). Deprivation indicators at ward level
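One common way to write the 2-level Poisson model for the cell counts is given below. The notation beyond $y_{ij}$ and $n_{ij}$ is an assumption made for this sketch: $x_{ij}$ collects the age and sex dummies and the area-level variables, $\beta$ their coefficients, and $u_{j}$ is an area-level random effect.

```latex
\begin{align*}
  y_{ij} &\sim \text{Poisson}(\mu_{ij}) \\
  \log(\mu_{ij}) &= \log(n_{ij}) + \beta^{T} x_{ij} + u_{j} \\
  u_{j} &\sim N(0, \sigma_{u}^{2})
\end{align*}
```

Here $\log(n_{ij})$ enters as an offset with coefficient fixed at 1, so $\beta^{T} x_{ij} + u_{j}$ models the log mortality rate. If expected deaths $E_{ij}$ are used instead of $n_{ij}$, the same form models the log relative risk.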
Software • Recurrent events and multiple states. Any software for multilevel binary responses • Binomial models for grouped intervals. GLLAMM, MLwiN, WinBUGS • Simultaneous equations models for correlated processes. aML, GLLAMM, MLwiN, Sabre, WinBUGS. aML is the most general (mixed response types at different levels)
References
Browne, W. J., Steele, F., Golalizadeh, M. and Green, M. (2009) The use of simple reparameterisations in MCMC estimation of multilevel models with applications to discrete-time survival models. Journal of the Royal Statistical Society, Series A, 172: 579-598.
Diamond, I., Clements, S., Stone, N. and Ingham, R. (2002) Spatial variation in teenage conceptions in south and west England. Journal of the Royal Statistical Society, Series A, 162: 273-289.
Goldstein, H., Pan, H. and Bynner, J. (2004) A flexible procedure for analysing longitudinal event histories using a multilevel model. Understanding Statistics, 3: 85-99.
Kravdal, Ø. (2006) Does place matter for cancer survival in Norway? A multilevel analysis of the importance of hospital affiliation and municipality socio-economic resources. Health and Place, 12: 527-537.
Langford, I. H. and Day, R. J. (2001) Poisson regression. In A. H. Leyland and H. Goldstein (eds) Multilevel Modelling of Health Statistics. London: Wiley. Chapter 4.
References
Steele, F., Goldstein, H. and Browne, W. (2004) A general multistate competing risks model for event history data, with an application to a study of contraceptive use dynamics. Statistical Modelling, 4: 145-159.
Steele, F. (2011) Multilevel discrete-time event history models with applications to the analysis of recurrent employment transitions (with discussion). Australian and New Zealand Journal of Statistics (to appear).
Tarkiainen, L., Martikainen, P., Laaksonen, M. and Leyland, A. H. (2009) Comparing the effects of neighbourhood characteristics on all-cause mortality using two hierarchical areal units in the capital region of Helsinki. Health and Place, 16: 409-412.
See also downloadable materials:
http://www.cmm.bris.ac.uk/MLwiN/tech-support/workshops/materials/models.shtml
http://www.cmm.bris.ac.uk/MLwiN/tech-support/workshops/materials/eha.shtml