
Relating models to data: A review

P.D. O’Neill, University of Nottingham


Presentation Transcript


  1. Relating models to data: A review. P.D. O’Neill, University of Nottingham

  2. Caveats: Scope is strictly limited. Review with a view to future challenges.

  3. Outline: 1. Why relate models to data? 2. How to relate models to data 3. Present and future challenges

  4. Outline: 1. Why relate models to data? 2. How to relate models to data 3. Present and future challenges

  5. 1. Why relate models to data? 1. Scientific hypothesis testing. e.g. Can within-host heterogeneity of susceptibility to HIV explain decreasing prevalence? e.g. Did control measures alone control SARS in Hong Kong?

  6. 1. Why relate models to data? 2. Estimation. e.g. What is R0? e.g. What is the efficacy of a vaccine?

  7. 1. Why relate models to data? 3. What-if scenarios. e.g. What would have happened if transport restrictions had been in place sooner in the UK foot and mouth outbreak? e.g. By how much would school closure reduce the spread of influenza?

  8. 1. Why relate models to data? 4. Real-time analyses e.g. Has the epidemic finished yet? e.g. Are control measures effective?

  9. 1. Why relate models to data? 5. Calibration/parameterisation e.g. What range of parameter values are sensible for simulation studies?

  10. Outline: 1. Why relate models to data? 2. How to relate models to data 3. Present and future challenges

  11. 2. How to relate models to data 2.1 Fitting deterministic models. Options include: (i) “estimation from the literature”; (ii) least squares / minimising a metric; (iii) Bayesian approaches (Elderd, Dukic and Dwyer, 2006)
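
As an illustration of option (ii), here is a minimal sketch of least-squares fitting for a deterministic SIR model. Everything in it is an assumption made for the example rather than something taken from the slides: the SIR form, the Euler integration, the grid search, and the function names `simulate_sir`, `sum_of_squares` and `fit_by_grid`.

```python
def simulate_sir(beta, gamma, s0, i0, n_days, steps_per_day=10):
    """Euler-integrate a deterministic SIR model; return I(t) at whole days."""
    n = s0 + i0
    s, i = float(s0), float(i0)
    dt = 1.0 / steps_per_day
    infectives = [i]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
        infectives.append(i)
    return infectives


def sum_of_squares(params, observed, s0, i0):
    """Squared distance between the model trajectory and the observed series."""
    beta, gamma = params
    fitted = simulate_sir(beta, gamma, s0, i0, len(observed) - 1)
    return sum((f - o) ** 2 for f, o in zip(fitted, observed))


def fit_by_grid(observed, s0, i0, betas, gammas):
    """Return the (beta, gamma) pair on the grid that minimises the metric."""
    candidates = [(b, g) for b in betas for g in gammas]
    return min(candidates, key=lambda p: sum_of_squares(p, observed, s0, i0))


if __name__ == "__main__":
    # Synthetic "data" generated from the model itself, purely for illustration.
    data = simulate_sir(0.5, 0.2, 990, 10, 30)
    print(fit_by_grid(data, 990, 10, [0.3, 0.4, 0.5, 0.6], [0.1, 0.2, 0.3]))
```

A grid search keeps the sketch dependency-free; in practice a numerical optimiser would usually replace it, and the choice of metric matters as much as the optimiser.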

  12. 2. How to relate models to data 2.2 Fitting stochastic models Available methods depend heavily on the model and the data.

  13. 2. How to relate models to data 2.2 Fitting stochastic models (i) Explicit likelihood e.g. Longini-Koopman model for household data (Longini and Koopman, 1982)

  14. 2. How to relate models to data. SEIR model within household. P(avoid infection from a housemate) = p; P(avoid infection from outside) = q. Given data on final outcome in (independent) households, can formulate likelihood L(p, q)
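
The likelihood L(p, q) can be sketched in code. The slides do not spell out the formula, so this assumes the standard Longini-Koopman final-size recursion, in which the probability that j of k household members are ever infected is C(k, j) m_jj q^(k-j) p^(j(k-j)), with m_kk fixed by the probabilities summing to one. The function names and the layout of `counts` are hypothetical.

```python
from math import comb, log


def final_size_probs(k, p, q):
    """Return [P(j of k members ever infected) for j = 0..k].

    p: probability of escaping infection from one infected housemate
    q: probability of escaping infection from outside the household
    """
    diag = []  # diag[n] = P(all n members infected in a household of size n)
    for n in range(k + 1):
        partial = sum(
            comb(n, j) * diag[j] * q ** (n - j) * p ** (j * (n - j))
            for j in range(n)
        )
        diag.append(1.0 - partial)
    probs = [
        comb(k, j) * diag[j] * q ** (k - j) * p ** (j * (k - j))
        for j in range(k)
    ]
    probs.append(diag[k])
    return probs


def log_likelihood(p, q, counts):
    """Log-likelihood for independent households.

    counts maps (household size k, number infected j) -> number of households.
    """
    cache = {}
    total = 0.0
    for (k, j), n_households in counts.items():
        if k not in cache:
            cache[k] = final_size_probs(k, p, q)
        total += n_households * log(cache[k][j])
    return total
```

Because households are treated as independent, the log-likelihood is just a sum over household outcomes, which is what makes maximum likelihood, EM, MCMC and rejection sampling all feasible for this model.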

  15. 2. How to relate models to data 2.2 Fitting stochastic models (i) Explicit likelihood (continued). Examples of related household models: Bayesian analysis (O’Neill et al., 2000); multi-type models (van Boven et al., 2007)

  16. 2. How to relate models to data 2.2 Fitting stochastic models (i) Explicit likelihood (continued). Methods include: maximum likelihood (e.g. Longini and Koopman, 1982); EM algorithm (e.g. Becker, 1997); MCMC (e.g. O’Neill et al., 2000); rejection sampling (e.g. Clancy and O’Neill, 2007)

  17. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Can arise due to model complexity and/or insufficient data

  18. 2. How to relate models to data [Figure: two-level mixing model; labels: Ever-infected, Never-infected, Sample, Unseen]

  19. 2. How to relate models to data. Individual-based transmission models involve unseen infection times.

  20. 2. How to relate models to data. Even detailed data from studies generally only give bounds on unseen infection times, e.g. infection occurs between the last negative test and the first positive test.

  21. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood. Solutions include: use a simpler approximating model, e.g. a pseudolikelihood (e.g. Ball, Mollison and Scalia-Tomba, 1997)

  22. 2. How to relate models to data [Figure: two-level mixing model with explicit interactions between households; labels: Ever-infected, Never-infected]

  23. 2. How to relate models to data [Figure: two-level mixing model -> independent households model; labels: Ever-infected, Never-infected] In a large population, households are approximately independent.

  24. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood. Solutions include: use a simpler approximating model, e.g. a discrete-time model instead of a continuous-time model (e.g. Lekone and Finkenstädt, 2006)
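
To show why a discrete-time approximation helps, here is a minimal sketch of a chain-binomial SIR likelihood, in which the number of new infections in each time step is binomial given the current state. This is a simpler stand-in chosen for illustration, not the SEIR model with control intervention of Lekone and Finkenstädt; the infection probability 1 - exp(-beta * I / N) and the function name are assumptions.

```python
from math import comb, exp, log


def chain_binomial_loglik(beta, s0, infectives, new_cases, n):
    """Log-likelihood of per-step new-case counts under a chain-binomial model.

    beta:        transmission parameter
    s0:          number of susceptibles at the start of the first step
    infectives:  infectives present during each step (all assumed > 0)
    new_cases:   new infections observed in each step
    n:           population size
    """
    s = s0
    total = 0.0
    for i_t, c_t in zip(infectives, new_cases):
        escape_log = -beta * i_t / n           # log P(a susceptible escapes)
        infect_prob = 1.0 - exp(escape_log)    # P(a susceptible is infected)
        total += log(comb(s, c_t))
        total += c_t * log(infect_prob)
        total += (s - c_t) * escape_log
        s -= c_t                               # deplete the susceptible pool
    return total
```

Once the state is observed (or assumed known) at each step, the likelihood is a product of binomial terms, so no integration over unseen event times is needed.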

  25. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Solutions include: Direct approach – e.g. Martingale methods (Becker, 1989)

  26. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Solutions include: Data augmentation: add in “missing data” or extra model parameters to formulate a likelihood

  27. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood: Data augmentation (continued). Common example: the model describes individual-to-individual transmission; we observe times of case ascertainment, test results, etc., but not times of infection/exposure; so augment the data with the missing infection/exposure times.
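
As a worked instance of this idea, consider a Markov SIR model in a population of size N with infection rate βS(t)I(t)/N and removal rate γI(t), observed up to time T. The model and notation here are assumptions made for illustration and are not taken from the slides. If the removal times r are observed and the infection times i are imputed, with κ denoting the initial infective, the augmented likelihood takes a tractable product form:

```latex
L(\beta, \gamma \mid \mathbf{i}, \mathbf{r})
  = \prod_{j \neq \kappa} \frac{\beta}{N}\, S(i_j^-)\, I(i_j^-)
    \;\prod_{j} \gamma\, I(r_j^-)
    \;\exp\!\left( -\int_{i_\kappa}^{T}
        \left[ \frac{\beta}{N}\, S(t)\, I(t) + \gamma\, I(t) \right] dt \right)
```

Here S(t^-) and I(t^-) denote the numbers of susceptibles and infectives just before time t. Without the infection times, the likelihood of r alone would require integrating this expression over all configurations of i, which is what makes it intractable.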

  28. 2. How to relate models to data [Figure: timeline with labels TE, TI, Exposure time, Infectivity starts, Infectivity ends; legend: Not observed, Observed data, +ve test, -ve test] Höhle et al. (2005)

  29. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood: Data augmentation (continued). Data-augmentation methods include: MCMC (e.g. Gibson and Renshaw, 1998; O’Neill and Roberts, 1999; Auranen et al., 2000); EM algorithm (e.g. Becker, 1997)

  30. 2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood: Data augmentation (continued) Data-augmentation methods can also be used in less “obvious” settings e.g. final size data for complex models

  31. 2. How to relate models to data [Figure: two-level mixing model and data; labels: Ever-infected, Never-infected] Augment parameter space using links to describe potential infections (Demiris and O’Neill, 2005)

  32. Outline: 1. Why relate models to data? 2. How to relate models to data 3. Present and future challenges

  33. 3. Present & future challenges 3.1 Large populations/complex models. Current methods often struggle with large-scale problems, e.g. a large population, many missing data, many hard-to-estimate parameters/covariates.

  34. 3. Present & future challenges 3.1 Large populations/complex models. e.g. UK foot and mouth outbreak, 2001: Keeling et al. (2001) use a stochastic discrete-time model, parameterised via likelihood estimation and tuning/simulation. Attempting to fit this kind of model using “standard” Bayesian/MCMC methods does not work well.

  35. 3. Present & future challenges Large data set and many missing data can cause problems for standard (and also non-standard) MCMC

  36. 3. Present & future challenges 3.1 Large populations/complex models. e.g. Measles data: Cauchemez and Ferguson (2008) discuss the problems that arise when fitting a standard SIR model to large-scale temporally aggregated data in a large population using standard methods.

  37. 3. Present & future challenges 3.1 Large populations/complex models Problems of this kind are usually tackled via approximations (e.g. of the model itself). Challenge: Can generic non-approximate methods be found?

  38. 3. Present & future challenges 3.2 Data augmentation Comment: this technique is surprisingly powerful and is (probably) under-developed.

  39. 3. Present & future challenges 3.2 Data augmentation e.g. Cauchemez and Ferguson (2008) use a novel MCMC data-augmentation scheme using a diffusion model to approximate an SIR epidemic model.

  40. 3. Present & future challenges 3.2 Data augmentation. e.g. For final size data, instead of imputing a graph describing infection pathways, one could impute generations of infection (joint work with Simon White). This can lead to much faster MCMC algorithms.

  41. 3. Present & future challenges [Figure: two-level mixing model, imputing edges in graph; labels: Ever-infected, Never-infected]

  42. 3. Present & future challenges [Figure: two-level mixing model with numbered individuals; labels: Ever-infected, Never-infected] Infection chain = {1, 3, 1, 2, 1}

  43. 3. Present & future challenges 3.2 Data augmentation e.g. Augmented data can also (sometimes) be used to bound quantities of interest. Clancy and O’Neill (2008) show how to obtain stochastic bounds on R0 and other quantities by considering “minimal” and “maximal” configurations of unobserved infection times in an SIR model.

  44. 3. Present & future challenges 3.2 Data augmentation [Figure: timeline marking observed removal times and imputed infection times]

  45. 3. Present & future challenges 3.2 Data augmentation [Figure: timeline marking observed removal times and imputed infection times; configuration: “Soon as possible”]

  46. 3. Present & future challenges 3.2 Data augmentation [Figure: timeline marking observed removal times and imputed infection times; configuration: “Late as possible”] Can show that “Soon as possible” maximises R0 but that the minimal value is not necessarily given by “Late as possible”; use linear programming to find the actual solution. The general idea is also applicable to final outcome data.

  47. 3. Present & future challenges 3.3 Model fit and model choice Various methods are used in the literature to assess model fit, e.g. Simulation-based methods; use of Bayesian predictive distribution; standard methods where applicable; Bayesian p-values

  48. 3. Present & future challenges 3.3 Model fit and model choice. Likewise for model choice: methods include AIC, RJMCMC. Challenge: better understanding of the pros and cons of such methods.
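
For reference, AIC is the simplest of these to compute: AIC = 2k - 2 log L, where k is the number of fitted parameters and L is the maximised likelihood, with lower values preferred. A minimal sketch follows; the function names and the layout of the `models` dictionary are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(models):
    """Pick the model with the smallest AIC.

    models maps a model name -> (maximised log-likelihood, number of parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))
```

AIC needs only a maximised likelihood, so it is unavailable when the likelihood is intractable, which is one reason methods such as RJMCMC are used for the augmented-data models discussed earlier.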

  49. References
      B.D. Elderd, V.M. Dukic and G. Dwyer (2006) Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases. PNAS 103, 15693-15697.
      I.M. Longini, Jr and J.S. Koopman (1982) Household and community transmission parameters from final distributions of infections in households. Biometrics 38, 115-126.
      P.D. O’Neill, D.J. Balding, N.G. Becker, M. Eerola and D. Mollison (2000) Analyses of infectious disease data from household outbreaks by Markov Chain Monte Carlo methods. Applied Statistics 49, 517-542.
      M. Van Boven, M. Koopmans, M.D.R. van Beest Holle, A. Meijer, D. Klinkenberg, C.A. Donnelly and H.A.P. Heesterbeek (2007) Detecting emerging transmissibility of Avian Influenza virus in human households. PLoS Computational Biology 3, 1394-1402.
      D. Clancy and P.D. O’Neill (2007) Exact Bayesian inference and model selection for stochastic models of epidemics among a community of households. Scandinavian Journal of Statistics 34, 259-274.
      N.G. Becker (1997) Uses of the EM algorithm in the analysis of data on HIV/AIDS and other infectious diseases. Statistical Methods in Medical Research 6, 24-37.
      F.G. Ball, D. Mollison and G-P. Scalia-Tomba (1997) Epidemic models with two levels of mixing. Annals of Applied Probability 7, 46-89.
      M. Höhle, E. Jørgensen and P.D. O’Neill (2005) Inference in disease transmission experiments by using stochastic epidemic models. Applied Statistics 54, 349-366.

  50. References (continued)
      N.G. Becker (1989) Analysis of Infectious Disease Data. Chapman and Hall, London.
      G. Gibson and E. Renshaw (1998) Estimating parameters in stochastic compartmental models using Markov chain methods. IMA Journal of Mathematics Applied in Medicine and Biology 15, 19-40.
      P.D. O’Neill and G.O. Roberts (1999) Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A 162, 121-129.
      K. Auranen, E. Arjas, T. Leino and A.K. Takala (2000) Transmission of pneumococcal carriage in families: a latent Markov process model for binary longitudinal data. Journal of the American Statistical Association 95, 1044-1053.
      P.E. Lekone and B.F. Finkenstädt (2006) Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics 62, 1170-1177.
      M.J. Keeling, M.E.J. Woolhouse, D.J. Shaw, L. Matthews, M. Chase-Topping, D.T. Haydon, S.J. Cornell, J. Kappey, J. Wilesmith and B.T. Grenfell (2001) Dynamics of the 2001 UK Foot and Mouth Epidemic: Stochastic Dispersal in a Heterogeneous Landscape. Science 294, 813-817.
      S. Cauchemez and N.M. Ferguson (2008) Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. Journal of the Royal Society Interface 5, 885-897.
      D. Clancy and P.D. O’Neill (2008) Bayesian estimation of the basic reproduction number in stochastic epidemic models. Bayesian Analysis, in press.
