
Dynamic analysis of binary longitudinal data

Explores the modelling of missingness, dynamic covariates, and martingale residual processes in the analysis of longitudinal binary data, with a case study using Blue Bay project data from Salvador, Brazil.



Presentation Transcript


  1. Dynamic analysis of binary longitudinal data. Ørnulf Borgan, Department of Mathematics, University of Oslo. Based on joint work with Rosemeire L. Fiaccone, Robin Henderson and Mauricio L. Barreto.

  2. Outline: - An example of binary longitudinal data: The Blue Bay project - Modelling missingness for longitudinal binary data (including the relation to independent censoring in event history analysis) - An additive model for longitudinal binary data - Dynamic covariates - Martingale residual processes - Concluding comments

  3. Blue Bay project: Bahia State, Brazil (size of France). State capital: Salvador (pop. 2.5 million).

  4. Public works and education in the areas of sanitation and environment, executed by the Bahia State Government since 1997. Cost: more than $1 billion. [Photos: Belgica 1996 and Belgica 2002]

  5. Data: Daily data on diarrhoea for almost a thousand children (one per family). Collected at home visits, Oct 2000 to Jan 2002. Children less than 3 years of age at entry. Diarrhoea: three or more fluid motions a day. Episode of diarrhoea: sequence of days with diarrhoea until at least two consecutive clear days.
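
The episode definition on this slide can be turned into a small routine. What follows is a minimal sketch in Python, assuming a complete daily 0/1 record with no missing days. The function name `episode_starts`, the parameter `clear_days`, and the convention that diarrhoea on the first recorded day opens a new episode are illustrative assumptions, not taken from the talk.

```python
from typing import List


def episode_starts(daily: List[int], clear_days: int = 2) -> List[int]:
    """Mark the first day of each diarrhoea episode in a daily 0/1 series.

    An episode runs until at least `clear_days` consecutive clear days occur,
    so a single clear day between two diarrhoea days does not end it.
    Assumption (hypothetical convention): the child is not in an episode
    before the first recorded day.
    """
    starts = []
    in_episode = False
    clear_run = 0  # number of consecutive clear days seen so far
    for has_diarrhoea in daily:
        if has_diarrhoea:
            # A diarrhoea day starts a new episode only if no episode is ongoing.
            starts.append(0 if in_episode else 1)
            in_episode = True
            clear_run = 0
        else:
            starts.append(0)
            clear_run += 1
            if clear_run >= clear_days:
                in_episode = False  # the ongoing episode (if any) has ended
    return starts


# Day index:            0  1  2  3  4  5  6  7  8
prevalence_series =    [0, 1, 1, 0, 1, 0, 0, 1, 0]
incidence_series = episode_starts(prevalence_series)
```

The input series corresponds to prevalence (has diarrhoea at day t) and the output to incidence (starts a new episode at day t), the two binary outcomes analysed in the talk.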

  6. The reduced prevalence/incidence over time may reflect improved health over the study period, or may be an artefact due to ageing of the cohort

  7. Social, demographic and economic characteristics collected at entry to the study:

  8. Follow-up information on 10 children. [Figure legend: days under observation; new episode (X); ongoing episode (X); drop-out (O)]

  9. Pattern of missing observations for all 926 children. Labelled causes of missingness: non-available data collector, police strike, Carnival, St. John's day, Christmas Day.

  10. Three types of missingness: - Late entries (16% of children) - Drop-outs (21% of children) - Intermittent missingness (20% of observations)

  11. Features of the data: Longitudinal binary data. Four time scales: calendar, age, study, episode. Calendar time is used as the basic time scale. Aims: Study factors of importance for incidence and prevalence of diarrhoea, and how diarrhoea incidence and prevalence vary over calendar time. Ignored (for this talk): spatial associations; other non-independence.

  12. Modelling missingness: We need to relate the models for three situations (starting with models for one individual): - Model for binary data without missingness: parameters of interest are defined for this model - Joint model for binary data and missingness: conditions on the missingness are defined for this model - Model for observed data: statistical methods are derived and studied for this model

  13. Model without missingness: The observations for child i form a binary time series $Y_{i1}, Y_{i2}, \ldots$ Here $Y_{it} = 1$ if the child starts a new episode of diarrhoea at day t (has diarrhoea at day t), and $Y_{it} = 0$ otherwise. A σ-algebra is generated by the fixed and external time-varying covariates for child i, and the history of child i at day t is the information that would have been available on child i by day t had there been no missingness.

  14. Introduce the conditional probabilities of $Y_{it} = 1$ given the history up to day t - 1. The aim of our analysis is to study how these conditional probabilities vary over time and how they depend on covariates, including dynamic covariates that are functions of $Y_{is}$ for s < t. This differs from the common approach in longitudinal data analysis, where the focus is on the marginal probabilities.

  15. Joint model for binary longitudinal data and missingness: Introduce the observation process for individual i. We need to consider a larger filtration, generated by the history without missingness and by external aspects of the observation process for child i.

  16. We make two assumptions on the missingness: • These assumptions correspond to: • sequential MAR in longitudinal data analysis • independent censoring in event history analysis

  17. Modelling the observable data Binary observations for individual i : Observed filtration: (Note that we for convenience have included in the definition of )

  18. Then: We will assume that is predictable, implying that the time-dependent dynamic covariates used for regression modelling depend only on observables Thus:

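  19. Introduce the differences between each observed binary outcome and its conditional expectation given the past. These are martingale differences, and their cumulative sum over days is a discrete-time martingale. Predictable variation process: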

  20. Modelling the relation between individuals: Denote by $\mathcal{F}_t$ the information available to the researcher on all children by day t. We impose the following assumptions: (i) (ii) The assumptions are weaker than independence. Nevertheless they are debatable [(i) in particular] for the diarrhoea data. Note that (ii) implies that the martingales of two different children are orthogonal.

  21. An additive model for longitudinal binary data: Have the decomposition Let $x_{i1t}, \ldots, x_{ipt}$ be predictable covariates for child i at day t. Consider the model

  22. Conditional on "the past" $\mathcal{F}_{t-1}$, we have at day t a linear regression model. We may estimate the regression coefficients by ordinary least squares at each day t (quick!). The estimates for each day will be quite unstable, but they may be accumulated over time to get stable estimates for the cumulative regression coefficients.
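
The day-by-day least squares fit and its accumulation over time can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation. The model formula on the slide was lost in extraction, so the sketch assumes the standard additive form, in which the conditional probability for an observed child at day t is an intercept plus a linear combination of the covariates $x_{i1t}, \ldots, x_{ipt}$ with day-specific coefficients. The function name `cumulative_coefficients` and the array layout (`Y`, `X`, `R`) are hypothetical.

```python
import numpy as np


def cumulative_coefficients(Y, X, R):
    """Accumulate daily OLS estimates into cumulative regression coefficients.

    Y : (n, T) array, binary outcome of child i at day t
    X : (n, T, p) array, predictable covariates (no intercept column)
    R : (n, T) array, 1 if child i is observed at day t, else 0
    Returns a (T, p + 1) array; row t holds the sum of the daily estimates
    (intercept first) over days 0..t.
    """
    n, T = Y.shape
    p = X.shape[2]
    increments = np.zeros((T, p + 1))
    for t in range(T):
        observed = R[:, t].astype(bool)
        if observed.sum() <= p:
            continue  # too few children observed: no increment on this day
        design = np.column_stack([np.ones(observed.sum()), X[observed, t, :]])
        if np.linalg.matrix_rank(design) < p + 1:
            continue  # collinear covariates: skip the day (a simplification)
        beta_t, *_ = np.linalg.lstsq(design, Y[observed, t], rcond=None)
        increments[t] = beta_t
    return np.cumsum(increments, axis=0)


# Small simulated example (all values invented for illustration).
rng = np.random.default_rng(0)
n_children, n_days = 200, 60
X_sim = rng.normal(size=(n_children, n_days, 1))
R_sim = (rng.random((n_children, n_days)) < 0.8).astype(int)
prob = np.clip(0.05 + 0.01 * X_sim[:, :, 0], 0.0, 1.0)
Y_sim = (rng.random((n_children, n_days)) < prob).astype(int)
B_hat = cumulative_coefficients(Y_sim, X_sim, R_sim)
```

Each daily fit is one small least squares problem, which is why the approach is fast. Plotting a column of `B_hat` against day gives a cumulative regression curve whose slope indicates the size of the effect at that time.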

  23. Some estimated cumulative regression coefficients for a model for incidence with fixed covariates (may be interpreted as expected numbers)

  24. We have (using "obvious" matrix notation) a martingale transformation. Properties may be derived using martingale methods, as for Aalen's additive hazards model for time-continuous event history data. In particular, the estimator of the cumulative regression coefficients is approximately multivariate normal, with a covariance matrix that may be estimated from the data.

  25. Dynamic covariates How can past episodes of diarrhoea be used to predict future episodes?

  26. Consider dynamic covariates built from the past outcomes $Y_{is}$, the incidence (prevalence) of diarrhoea. Use τ = 30 days and ρ = 0.01 below.

  27. A dynamic covariate may be on the causal pathway between a fixed covariate and the event process. The inclusion of dynamic covariates in the analysis may distort the estimation of the effects of the fixed covariates. To avoid such distortion, at each time t we regress the dynamic covariates on the fixed covariates and use the residuals from these fits as new covariates. This procedure keeps the effect of the fixed covariates the same as in the model without the dynamic covariates.
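
The residual step on this slide can be sketched for a single day t. This is a minimal illustration in Python with NumPy, assuming ordinary least squares with an intercept is used for the auxiliary regression; the function name `residualize` and the simulated data are hypothetical. The check at the end illustrates why the effects of the fixed covariates are unchanged: the residuals are orthogonal to the fixed covariates, so adding them to a least squares fit leaves the other coefficients as they were.

```python
import numpy as np


def residualize(dynamic, fixed):
    """Replace dynamic covariates by their residuals given the fixed covariates.

    dynamic : (m, q) array, dynamic covariates of the m children observed at day t
    fixed   : (m, p) array, fixed covariates of the same children
    Returns an (m, q) array of residuals from regressing each dynamic covariate
    on an intercept and the fixed covariates.
    """
    design = np.column_stack([np.ones(fixed.shape[0]), fixed])
    coef, *_ = np.linalg.lstsq(design, dynamic, rcond=None)
    return dynamic - design @ coef


# Simulated data for one day (all values invented for illustration).
rng = np.random.default_rng(1)
fixed = rng.normal(size=(40, 2))
dynamic = 0.5 * fixed[:, :1] + rng.normal(size=(40, 1))  # correlated with fixed
y = (rng.random(40) < 0.3).astype(float)

resid = residualize(dynamic, fixed)
design_fixed = np.column_stack([np.ones(40), fixed])
beta_without, *_ = np.linalg.lstsq(design_fixed, y, rcond=None)
beta_with, *_ = np.linalg.lstsq(np.column_stack([design_fixed, resid]), y, rcond=None)
# The first three entries of beta_with agree with beta_without up to rounding.
```

In the dynamic model this step would be repeated at every day t, using only the children under observation on that day.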

  28. Cumulative regression coefficients for incidence. Panels: average number of days with diarrhoea; average number of diarrhoea episodes. Also in the model: male, 3 or more per bedroom, contaminated water source, open sewerage, rain affected accommodation, young mother.

  29. Martingale residual processes: a martingale transformation. Examples of standardized martingale residual processes (standardized by model-based SDs).
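
The formula on this slide was lost in extraction. As a rough sketch of the idea, a residual process for each child can be formed by accumulating, over the days the child is observed, the difference between the binary outcome and its fitted value from that day's least squares fit. The Python code below assumes this standard construction for additive models; the function name `martingale_residual_processes` and the array layout are hypothetical, and the standardization by model-based SDs mentioned on the slide is not implemented.

```python
import numpy as np


def martingale_residual_processes(Y, X, R):
    """Cumulative observed-minus-fitted processes, one row per child.

    Y : (n, T) array, binary outcomes
    X : (n, T, p) array, predictable covariates (no intercept column)
    R : (n, T) array, 1 if child i is observed at day t, else 0
    Returns an (n, T) array; entry (i, t) is the sum over days 0..t of the
    OLS residual of child i, counting only days on which the child is observed.
    """
    n, T = Y.shape
    p = X.shape[2]
    increments = np.zeros((n, T))
    for t in range(T):
        observed = R[:, t].astype(bool)
        if observed.sum() <= p:
            continue  # too few children observed: no fit, no increment
        design = np.column_stack([np.ones(observed.sum()), X[observed, t, :]])
        beta_t, *_ = np.linalg.lstsq(design, Y[observed, t], rcond=None)
        increments[observed, t] = Y[observed, t] - design @ beta_t
    return np.cumsum(increments, axis=1)


# Simulated data (all values invented for illustration).
rng = np.random.default_rng(2)
n_children, n_days = 30, 5
X_sim = rng.normal(size=(n_children, n_days, 2))
R_sim = (rng.random((n_children, n_days)) < 0.8).astype(int)
Y_sim = (rng.random((n_children, n_days)) < 0.3).astype(int)
M_hat = martingale_residual_processes(Y_sim, X_sim, R_sim)
```

Because each daily fit includes an intercept, the residuals of the observed children sum to zero on every day, so the processes in `M_hat` sum to zero across children at all times. Departures from the model show up in how individual processes, or their empirical spread, evolve over time.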

  30. Empirical standard deviations of the martingale residual processes:

  31. Cumulative regression coefficients for prevalence. Panels: baseline; diarrhoea previous day (lag 1); lag 2 (residual effect); lag 3 (residual effect); lag 4 (residual effect); average number of days with diarrhoea. Also in the model: male, age, 3 or more per bedroom, poor street, contaminated water storage and source, standing water, open sewerage, rain affected accommodation, young mother.

  32. Prevalence: empirical standard deviations of the martingale residual processes

  33. Not Markovian!

  34. Concluding comments: A dynamic additive model provides a flexible framework for analyzing longitudinal binary data. The method illustrates how ideas and approaches from event history analysis may be useful for the analysis of longitudinal data. Advantage: the method is computationally very quick. Drawback: estimated incidence and prevalence are not restricted to the range 0 to 1. Methodological work is needed, in particular on methods for model selection and goodness-of-fit.
