
Data and Statistics: New methods and future challenges Phil O’Neill University of Nottingham



Presentation Transcript


  1. Data and Statistics: New methods and future challenges
     Phil O’Neill, University of Nottingham

  2. Professors: How they spend their time


  4. 1. High-resolution genetic data
     2. Model assessment


  6. Gardy 2011 NEJM

  7. “High-resolution genetic data”: what are they?
     - individual-level data on the pathogen
     - can be taken at single or multiple time points
     - high-dimensional, e.g. whole genome sequences
     - proportion of individuals sampled could be high/low
     - becoming far more common due to cost reduction

  8. “High-resolution genetic data”: what use are they?
     - better inference about transmission paths
     - more reliable estimates of epidemiological quantities?
     - understand evolution of the pathogen


  10. A C C C T T G G G A A A ...

  11. Modelling and Data Analysis methods
      Two kinds of approaches exist:
      1. Separate genetic and epidemic components (e.g. Volz, Rasmussen)
      2. Combine genetic and epidemic components (e.g. Ypma, Worby, Morelli)

  12. 1. Separate genetic and epidemic components, e.g.:
      - estimate phylogenetic tree
      - given the tree, fit epidemic model
      or
      - cluster individuals into genetically similar groups
      - given the groups, fit multi-type epidemic model
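The "cluster individuals into genetically similar groups" step could be sketched as follows. This is only an illustration under stated assumptions: sequences are aligned strings of equal length, similarity is measured by Hamming distance, and two individuals share a group whenever a chain of pairwise distances at or below a threshold connects them. The names `hamming`, `cluster_by_threshold` and `max_distance` are hypothetical and do not come from the talk.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_threshold(sequences, max_distance):
    """Group sequence indices by single linkage (assumed clustering rule):
    any pair within max_distance is merged into the same group."""
    n = len(sequences)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if hamming(sequences[i], sequences[j]) <= max_distance:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The resulting groups would then act as the "types" in a multi-type epidemic model. The threshold is a tuning choice that this sketch leaves entirely to the user.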

  13. 1. Separate genetic and epidemic components
      + “Simple” approach
      + Avoids complex modelling
      - Ignores any relationship between transmission and genetic information

  14. 2. Combine genetic and epidemic components, e.g.:
      - model genetic evolution explicitly
      - define model featuring both genetic and epidemic parts

  15. 2. Combine genetic and epidemic components
      + “Integrated” approach
      - Is modelling too detailed?
      - Initial conditions: typical sequence?
      +/- Model differences between individuals instead?

  16. 1. High-resolution genetic data
      2. Model assessment

  17. “Model assessment”: what is it?
      - Does our model fit the data?
      - Is there a better model?

  18. “Model assessment”: why do it?
      - Poor fit casts doubt on conclusions from modelling
      - Model choice can be a tool for directly addressing questions of interest

  19. Linear regression: $y_k = a x_k + b + e_k$, $e_k \sim N(0, v)$
      Minimise distance of model mean from observed data
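For the linear regression case, "minimise distance of model mean from observed data" is ordinary least squares, and the residuals are the observed values minus the fitted mean. A minimal sketch, assuming NumPy is available (the function name `fit_line` is hypothetical):

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = a*x + b + e; returns (a, b, residuals, v_hat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # polyfit returns coefficients from highest degree to lowest: slope, intercept.
    a, b = np.polyfit(x, y, deg=1)
    # Residuals: observed data minus the model mean at each x.
    residuals = y - (a * x + b)
    # Unbiased estimate of the noise variance v (two fitted parameters).
    v_hat = residuals @ residuals / (len(x) - 2)
    return a, b, residuals, v_hat
```

Plotting `residuals` against `x` is the usual first check of fit here. It is this step that has no obvious counterpart for outbreak data.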


  21. For outbreak data:
      - What are the right residuals?
      - Should observed or unobserved data be compared to the model? (Streftaris and Gibson)
      - Mean model may only be available via simulation
      - Is the mean the right quantity to consider?


  23. Simulation-based approaches to model fit:
      - Forward simulation – “close” to data?
      - Choice of summary statistics?
      - Close ties to ABC methods (McKinley, Neal)
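The link between forward simulation, summary statistics and ABC can be illustrated with a small rejection sampler. Everything specific here is an assumption made for illustration rather than taken from the talk: a Markov SIR model with infection rate `beta*S*I` and removal rate `gamma*I`, the final outbreak size as the single summary statistic, a Uniform(0, `beta_max`) prior on `beta`, and a fixed `gamma`.

```python
import random


def simulate_final_size(beta, gamma, n_susceptible, n_infective, rng):
    """Simulate one Markov SIR outbreak and return how many of the initial
    susceptibles were ever infected. Requires gamma > 0."""
    s, i = n_susceptible, n_infective
    while i > 0 and s > 0:
        # Infections occur at rate beta*s*i and removals at rate gamma*i,
        # so the next event is an infection with this probability.
        p_infection = beta * s / (beta * s + gamma)
        if rng.random() < p_infection:
            s, i = s - 1, i + 1
        else:
            i -= 1
    return n_susceptible - s


def abc_rejection(observed_final_size, gamma, n_susceptible, n_infective,
                  n_draws, tolerance, beta_max, seed=0):
    """Keep prior draws of beta whose simulated final size is within
    `tolerance` of the observed final size."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.0, beta_max)  # draw from the assumed uniform prior
        simulated = simulate_final_size(beta, gamma, n_susceptible,
                                        n_infective, rng)
        if abs(simulated - observed_final_size) <= tolerance:
            accepted.append(beta)
    return accepted
```

The accepted values approximate the posterior for `beta` given the summary statistic. The same simulator can be rerun at those values to ask whether simulated outbreaks look "close" to the data. Both the summary statistic and the tolerance are exactly the open choices the slide raises.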

  24. Approaches to model choice
      - Hypermodels/saturated models
      - Bayesian non-parametric methods
      - Bayesian methods e.g. RJMCMC
      - Mixture models

  25. Hypermodels/saturated models
      e.g. Infection rates $\beta S$ or $\beta S I$ or $\beta S I^{0.5}$ in an SIR model? Instead use $\beta S I^{\alpha}$ and estimate $\alpha$ (O’Neill and Wen)
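The hypermodel idea is that the three candidate infection rates are special cases of a single rate with a free exponent on I, so choosing between them becomes a matter of estimating that exponent. A small sketch, in which the name `alpha` for the exponent and the function name `infection_rate` are assumptions made here:

```python
def infection_rate(beta, s, i, alpha):
    """Overall infection rate beta * S * I**alpha.

    alpha = 0 gives beta*S, alpha = 1 gives beta*S*I and alpha = 0.5 gives
    beta*S*I**0.5, so all three candidates sit inside one model.
    """
    if i == 0:
        # Assumption of this sketch: no infectives means no new infections,
        # even when alpha = 0 (Python evaluates 0 ** 0 as 1).
        return 0.0
    return beta * s * i ** alpha
```

For example, with `beta=2.0`, `s=10`, `i=4`, the exponents 0, 0.5 and 1 give rates 20, 40 and 80. A posterior for `alpha` concentrated near one of these values would favour the corresponding candidate.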

  26. Bayesian non-parametric methods
      e.g. Infection rate $\beta(t) S I$ or $\beta(t)$ in an SIR model; estimate $\beta(t)$ in a Bayesian non-parametric manner using Gaussian process machinery (Kypraios, O’Neill and Xu; Knock and Kypraios)
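To give a feel for what a Gaussian process prior on a time-varying infection rate looks like, the sketch below draws one sample path of beta(t) on a grid of times. It shows the prior only, not the inference, and its details are assumptions that are not claimed to match the cited work: a squared-exponential covariance, a log link to keep beta(t) positive, and NumPy for the linear algebra.

```python
import numpy as np


def squared_exponential_kernel(times, variance, length_scale):
    """Covariance matrix k(t, t') = variance * exp(-(t - t')^2 / (2 * length_scale^2))."""
    diffs = times[:, None] - times[None, :]
    return variance * np.exp(-0.5 * (diffs / length_scale) ** 2)


def sample_beta_path(times, variance, length_scale, mean_log_beta, seed=0):
    """Draw one prior sample of beta(t) with log beta(t) a Gaussian process."""
    times = np.asarray(times, dtype=float)
    rng = np.random.default_rng(seed)
    cov = squared_exponential_kernel(times, variance, length_scale)
    # Small diagonal jitter keeps the Cholesky factorisation numerically stable.
    cov += 1e-6 * np.eye(len(times))
    chol = np.linalg.cholesky(cov)
    log_beta = mean_log_beta + chol @ rng.standard_normal(len(times))
    return np.exp(log_beta)
```

The `length_scale` controls how quickly beta(t) is allowed to change over time. In a full analysis it would typically be estimated or given a prior rather than fixed.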

  27. Reversible Jump MCMC
      e.g. Distinct models (usually small number), estimate Bayes factors by running MCMC on union of parameter spaces (O’Neill; Neal and Roberts; Knock and O’Neill)
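A toy version of "running MCMC on the union of parameter spaces" is sketched below. It is not an epidemic model. As a deliberately simple stand-in, model 1 says `y_i ~ N(0, 1)` with no free parameter and model 2 says `y_i ~ N(mu, 1)` with prior `mu ~ N(0, tau^2)`. Both models get equal prior probability, and all names are hypothetical.

```python
import math
import random


def log_lik(y, mu):
    """Log-likelihood of y_i ~ N(mu, 1), dropping the constant shared by both models."""
    return -0.5 * sum((yi - mu) ** 2 for yi in y)


def log_prior(mu, tau):
    """Log-density of the N(0, tau^2) prior on mu, up to an additive constant."""
    return -0.5 * (mu / tau) ** 2


def accept(log_ratio, rng):
    """Metropolis-Hastings accept/reject step on the log scale."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def rjmcmc_two_models(y, tau, n_iter, step=0.5, seed=0):
    """Return the fraction of iterations spent in model 2 (mu free)."""
    rng = random.Random(seed)
    model, mu = 1, 0.0
    visits_m2 = 0
    for _ in range(n_iter):
        if rng.random() < 0.5:
            # Between-model move. Going up, mu is proposed from its prior, so
            # prior and proposal densities cancel and the Jacobian is 1.
            if model == 1:
                mu_new = rng.gauss(0.0, tau)
                if accept(log_lik(y, mu_new) - log_lik(y, 0.0), rng):
                    model, mu = 2, mu_new
            elif accept(log_lik(y, 0.0) - log_lik(y, mu), rng):
                model, mu = 1, 0.0
        elif model == 2:
            # Within-model random-walk update of mu.
            mu_new = mu + rng.gauss(0.0, step)
            log_ratio = (log_lik(y, mu_new) + log_prior(mu_new, tau)
                         - log_lik(y, mu) - log_prior(mu, tau))
            if accept(log_ratio, rng):
                mu = mu_new
        visits_m2 += model == 2
    return visits_m2 / n_iter
```

With equal prior model probabilities, if `p` is the returned fraction then `p / (1 - p)` estimates the Bayes factor for model 2 against model 1. The estimate becomes unstable when the chain almost never visits one of the models.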

  28. Mixture models
      e.g. Given two models ($g$, $h$), create mixture model $f(x) = \alpha g(x) + (1-\alpha) h(x)$; estimation of $\alpha$ enables estimation of Bayes factors (Kypraios and O’Neill)
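One way to see why estimating the mixing weight yields a Bayes factor is the following sketch. The two candidate models are written as g and h and the mixing weight as `alpha`, a symbol assumed here. It rests on assumptions that are not claimed to be the authors' exact construction: the mixture is formed over the whole data set, so g and h enter through their marginal likelihoods `m_g` and `m_h`, and `alpha` has a Uniform(0, 1) prior. The posterior density of `alpha` is then proportional to `alpha*m_g + (1 - alpha)*m_h`, its mean is `E = (2B + 1) / (3(B + 1))` with `B = m_g / m_h`, and solving for `B` gives `B = (3E - 1) / (2 - 3E)`.

```python
def posterior_mean_alpha(m_g, m_h, n_grid=10000):
    """Posterior mean of alpha under a Uniform(0, 1) prior, by midpoint-rule
    integration of the unnormalised density alpha*m_g + (1 - alpha)*m_h."""
    numerator = 0.0
    denominator = 0.0
    for k in range(n_grid):
        alpha = (k + 0.5) / n_grid  # midpoint of the k-th grid cell
        density = alpha * m_g + (1.0 - alpha) * m_h
        numerator += alpha * density
        denominator += density
    return numerator / denominator


def bayes_factor_from_posterior_mean(mean_alpha):
    """Invert E = (2B + 1) / (3(B + 1)) to get B = m_g / m_h."""
    if not 1.0 / 3.0 < mean_alpha < 2.0 / 3.0:
        raise ValueError("under these assumptions the mean lies in (1/3, 2/3)")
    return (3.0 * mean_alpha - 1.0) / (2.0 - 3.0 * mean_alpha)
```

In practice the marginal likelihoods are unknown and the posterior mean of `alpha` would come from MCMC output for the mixture model. The grid integration here only checks the algebra: with `m_g = 0.03` and `m_h = 0.01` the recovered Bayes factor is 3, up to numerical integration error.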
