A Data-driven Epidemiological Model

A Data-driven Epidemiological Model. Stephen Eubank, Christopher Barrett, Madhav V. Marathe. GIACS Conference on "Data in Complex Systems", Palermo, Italy, April 7-9, 2008. Topics: complex systems; data-driven, individual-based simulation; privacy and accuracy issues.

Presentation Transcript


  1. A Data-driven Epidemiological Model. Stephen Eubank, Christopher Barrett, Madhav V. Marathe. GIACS Conference on "Data in Complex Systems", Palermo, Italy, April 7-9, 2008. Network Dynamics and Simulation Science Laboratory

  2. Data driven epidemiological models • Complex system • Data driven, individual-based simulation • Privacy and accuracy issues

  3. What’s so complex about epidemiology? Consider an “outbreak” among 4 people. (Figure legend: susceptible, infectious, removed.)

  4. Outbreaks can be represented as Markov processes. A given configuration of the system probabilistically transitions into any of several other configurations. Even a small system has many possible configurations.
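
To make "many possible configurations" concrete, here is a minimal Python sketch that enumerates the state space for the 4-person outbreak. The transition rule in `successors` (S may become I only if someone is infectious, I may become R, R is absorbing) is an assumption made for illustration; the slides do not spell out which transitions are allowed.

```python
from itertools import product

STATES = ("S", "I", "R")  # susceptible, infectious, removed


def configurations(n_people):
    """Every assignment of one state to each of n_people distinguishable people."""
    return list(product(STATES, repeat=n_people))


def successors(config):
    """Configurations reachable in one time step under the assumed rule."""
    any_infectious = "I" in config
    options = []
    for state in config:
        if state == "S":
            # A susceptible can only be infected if someone is infectious.
            options.append(("S", "I") if any_infectious else ("S",))
        elif state == "I":
            options.append(("I", "R"))
        else:
            options.append(("R",))  # removed is absorbing
    return list(product(*options))


configs = configurations(4)
n_edges = sum(len(successors(c)) for c in configs)
print(len(configs), "configurations;", n_edges, "possible transitions")
```

Four people already give 3^4 = 81 configurations, and every allowed transition between them is a probability that would have to be estimated.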

  5. Very little data is available to estimate this process. Historically, we (partially) observe 1 or 2 Markov chains. We want to estimate transition probabilities on every edge.

  6. Aggregation simplifies the model … at the cost of reduced information content. (Figure: aggregated state space with axes #S and #I, each running from 0 to 4.) p(C′_{t+1} | C′_t) is less informative than p(C_{t+1} | C_t) when C′ is an aggregated (coarser) description of C.
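
The trade-off can be illustrated with a short Python sketch that collapses each individual-level configuration of 4 people to the pair (#S, #I), as on the slide's axes. The function name `aggregate` is hypothetical.

```python
from collections import Counter
from itertools import product


def aggregate(config):
    """Collapse an individual-level configuration to the counts (#S, #I)."""
    return (config.count("S"), config.count("I"))


# All 3^4 individual-level configurations of 4 people.
configs = list(product("SIR", repeat=4))

# How many detailed configurations fall into each aggregated state.
sizes = Counter(aggregate(c) for c in configs)

print(len(configs), "detailed configurations")
print(len(sizes), "aggregated states")
print(sizes[(2, 1)], "detailed configurations share the aggregate (#S=2, #I=1)")
```

The aggregated model has far fewer states to estimate, but it can no longer say which individuals are susceptible or infectious.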

  7. Other assumptions further simplify the model … but are unwarranted in social systems, where components are • Heterogeneous (distinguishable) • Intentional (behavior not determined by physical laws)

  8. Aggregation naturally makes contact with observations. Observations of outbreaks often ignore heterogeneity and intention, and provide only point estimates.

  9. “An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem” - J. Tukey “All models are wrong, but some are useful” - G.E.P. Box A system is complex “if its behavior crucially depends on the details of its parts.” - G. Parisi

  10. Interaction approach simplifies process itself. Interactions among system components completely determine transition probabilities among configurations. […] replaced with […]

  11. Calibrating with unexpectedly rich data • For aerosol-borne pathogens, the probability of transmission is related to physical proximity, duration, etc. • The interaction approach reduces to estimating a social network. • There is much more data available for this than for outbreaks. • But the network itself is not directly observable.

  12. How can we estimate a social network?

  13. A possible approach we didn’t use • Consider a subset of random networks subject to certain constraints • Constraints should be relevant to the global dynamics, i.e. epidemics • But what are those? A “chicken or the egg” problem: “It would seem offhand that a taxonomy of ‘nets’ … would arise naturally from the consideration of the statistical parameters... But the statistical parameters themselves are singled out on the basis of taxonomic considerations, which have yet to be clarified.” - Anatol Rapoport and William Horvath, Behav Sci. 1961, 6, 279–291

  14. Questions to drive model development • (1) What is the optimal targeted allocation of antivirals, used prophylactically or therapeutically, to mitigate an influenza pandemic? • (2) What combination of targeted antivirals and feasible, community-based, non-pharmaceutical interventions (e.g. closing schools, allowing liberal leave from work) can best delay an outbreak from becoming epidemic for several months? Questions 1 & 2 imply that models must compare changes in the social network with changes in transmissibility. This is an example of policy informatics for complex systems.

  15. Interventions specified naturally by effect on network. No single “knob” reduces overall transmission by 50%.

  16. Step 1. Create a synthetic population • Census data • Individual demographics: age and gender • Household characteristics: size and income

  17. Successive refinement of synthetic data • Start from a proto-population, e.g. a list of ids. • Add observed data • Capture correlations in data using statistical models (iterative proportional fitting from Public Use Microdata)
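
Iterative proportional fitting can be sketched in a few lines of plain Python. This is a generic two-way version under stated assumptions, not the authors' pipeline: the seed table stands in for a cross-tabulation from a microdata sample, the targets stand in for census marginals, and all numbers and the function name `ipf` are made up for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Rescale a 2-D seed table until its row and column sums match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, row in enumerate(table):
            total = sum(row)
            if total > 0:
                factor = row_targets[i] / total
                table[i] = [value * factor for value in row]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows match too.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


# Hypothetical sample counts: age group (rows) by household size (columns).
seed = [[10.0, 5.0], [3.0, 12.0]]
fitted = ipf(seed, row_targets=[600.0, 400.0], col_targets=[450.0, 550.0])
for row in fitted:
    print([round(value, 1) for value in row])
```

The fitted table reproduces the marginals while keeping the association pattern (the odds ratio) of the seed, which is how correlations observed in the sample carry over to the synthetic population.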

  18. Step 2. Assign activities, locations & times • Locations • Dun & Bradstreet data • Activity surveys • Matched to households by demographics • Matched to locations by activity type & travel time

  19. Successive refinement of synthetic data • Surveys are very different kinds of data sources than census • This step depends on data fusion capability • Some values may be outcomes of very large games, not statistical models

  20. So far: a typical family’s day. (Figure: each household member's activities laid out along a time axis, with labels Home, Carpool, Work, Lunch, Shopping, Car, Daycare, Bus, School.)

  21. Overlapping families’ days create a social network
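
A minimal Python sketch of how overlapping schedules yield a contact network: two people are linked when they occupy the same location at the same time, and the edge weight is the shared duration. The visit records, names, and the function `contact_network` are hypothetical; the actual model works from much richer activity and location data.

```python
from collections import defaultdict
from itertools import combinations

# (person, location, start_hour, end_hour); all values are made up.
visits = [
    ("parent_a", "home_1", 0, 7), ("parent_a", "office", 8, 17),
    ("parent_b", "home_1", 0, 8), ("parent_b", "shop", 10, 11),
    ("child", "home_1", 0, 8), ("child", "school", 9, 15),
    ("neighbor", "office", 9, 12), ("classmate", "school", 9, 15),
]


def contact_network(visits):
    """Map each pair of co-located people to their total hours of overlap."""
    by_location = defaultdict(list)
    for person, location, start, end in visits:
        by_location[location].append((person, start, end))

    contacts = defaultdict(float)
    for occupants in by_location.values():
        for (p, s1, e1), (q, s2, e2) in combinations(occupants, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if p != q and overlap > 0:
                contacts[frozenset((p, q))] += overlap
    return dict(contacts)


network = contact_network(visits)
for pair, hours in sorted(network.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), hours)
```

Changing a schedule (for example, keeping the child home from school) changes the edges directly, which is why interventions can be expressed as changes to activities.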

  22. Successive refinement of synthetic data • Gives us a generative model for contacts • More powerful than traditional encapsulated agents • Note: each byte of data / person adds ~300 MB to the database

  23. Using data for purposes other than intended. Possibly the only epidemiological model that has been calibrated using automobile traffic counts! (Because the same activity model generates both transportation demand and contact networks.)

  24. Activities adapt to situation & generate network changes. (Figure labels: Home, Home.)

  25. Derive disease interaction from social network. Interactions only need to get a few things right: • Susceptibility • Infectivity as a function of time since exposure

  26. Modeling pandemic influenza • Nobody knows what pandemic flu will look like • Assume something like seasonal flu, but with less immunity • Create several “flu” bugs in silico: moderate (10% attack rate), strong (20-25% attack rate), catastrophic (> 50% attack rate) • For each, fix other characteristics: incubation period 2-3 days, infectious period 2-5 days
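
As a rough illustration of a disease interaction built from susceptibility and infectivity as a function of time since exposure, the Python sketch below uses a step-shaped infectivity profile and a per-hour transmission rate. The functional form, the `rate_per_hour` value, and the function names are assumptions and are not taken from the slides; only the incubation and infectious periods fall within the ranges quoted above.

```python
def infectivity(days_since_exposure, incubation_days=2, infectious_days=4):
    """Step profile: 1.0 while infectious, 0.0 before and after (assumed shape)."""
    start = incubation_days
    end = incubation_days + infectious_days
    return 1.0 if start <= days_since_exposure < end else 0.0


def transmission_probability(contact_hours, days_since_exposure,
                             susceptibility=1.0, rate_per_hour=0.01):
    """Chance that one contact of the given duration transmits infection.

    Assumes each hour of contact is an independent chance of transmission.
    """
    per_hour = rate_per_hour * susceptibility * infectivity(days_since_exposure)
    return 1.0 - (1.0 - per_hour) ** contact_hours


for day in (0, 3, 7):
    print(day, round(transmission_probability(8, day), 4))
```

Under this form, a different "flu" strain corresponds to a different `rate_per_hour`, while an intervention that shortens contacts acts through `contact_hours`.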

  27. Resolution, fidelity, and accuracy are different • Resolution describes level of aggregation, e.g. individuals vs populations • Fidelity describes the completeness of the representation’s features, e.g. age vs (age, gender, income, household size, education) • Accuracy describes the correctness of features and correlations, e.g. is mixing by age derived from social network correct? “Validity” (always for a particular question) depends on all 3.

  28. Effect of changes in social networks (above) on disease dynamics (below)

  29. Characterizing the resulting network

  30. Degree Distribution, location-location

  31. Degree Distribution, people-people

  32. Sensitivity to parameters

  33. Sensitivity to parameters

  34. Assortative Mixing • Static people-people projection is assortative • by degree (~0.25) • but not as strongly by age, income, household size, … This is like other social networks, and unlike technological networks, Erdős-Rényi random graphs, and Barabási-Albert networks.
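
Assortativity by degree is commonly measured as the Pearson correlation between the degrees at the two ends of each edge. The slides do not say how the ~0.25 figure was computed, so the Python sketch below shows that standard definition on two toy graphs rather than the authors' calculation.

```python
def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean = sum(xs) / n
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    return covariance / variance if variance > 0 else float("nan")


star = [(0, 1), (0, 2), (0, 3)]   # one hub linked to three leaves
path = [(0, 1), (1, 2), (2, 3)]   # four nodes in a line
print(degree_assortativity(star), degree_assortativity(path))
```

Positive values mean high-degree nodes tend to link to other high-degree nodes; a hub-and-spoke graph such as the star sits at the opposite extreme.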

  35. Removing high degree people useless

  36. Removing high degree locations better

  37. Summary • Complex systems models are hungry for detail (= data) • Privacy & extrapolation require “synthetic” data, combining observations (declarative), statistical models, and simulation results (procedural) • Validity of synthetic data depends on resolution, fidelity, accuracy, and the question it is intended to answer

  38. When is this model simpler? Notation: x and y are states of a component at time t and t+1 • Components’ states are updated independently: # parameters […] • Interactions are pairwise independent: # parameters […]

  39. When is this model simpler? • Most components do not interact directly: # parameters […] • Only one state transition, S → I, is affected by interactions: # parameters […]

  40. Architecture

  41. Computational Resources • Demonstration experiment: 8 experiments (exp ids: 1083 to 1090); 24 cells with 200 days and 25 reps • Computations performed: 291 million contacts * 200 days * 25 reps * 24 cells = 34.92 trillion transmission evaluations • Time requirements: single processor, 2 years 340 days; small cluster (10 nodes, 4 cores each), 26 days 18 hours; current IDAC cluster, > 3 hours
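
The figures on this slide can be checked with a few lines of Python. The only assumptions are a 365-day year and ideal (linear) scaling from one processor to the 40 cores of the small cluster; the variable names are illustrative.

```python
contacts_per_day = 291_000_000
days, reps, cells = 200, 25, 24

# Total transmission evaluations for the demonstration experiment.
evaluations = contacts_per_day * days * reps * cells
print(f"{evaluations:.4g} transmission evaluations")

# Single-processor time from the slide, assuming a 365-day year.
single_cpu_days = 2 * 365 + 340

# Small cluster: 10 nodes with 4 cores each, assuming ideal scaling.
cluster_cores = 10 * 4
cluster_days = single_cpu_days / cluster_cores
print(f"{cluster_days} days on {cluster_cores} cores")
```

The product comes to about 3.49 × 10^13 evaluations, and dividing the single-processor time by 40 cores gives 26.75 days, which matches the quoted 26 days 18 hours.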

  42. Example Located Synthetic Population

  43. Example Route Plans (figure panels: first person in household; second person in household)

  44. Time Slice of a Typical Family’s Day

  45. How much does detail matter? • Interaction picture: • Dynamics of outbreak depend on topology • How and how much? • What differences in network topology are relevant to prevention/mitigation? • What statistics capture difference? • Answer staring us in the face (see above): • Overall attack rate is a function of the topology of the network • Other measures for other questions • Attack rate by transmissibility as function of edges retained • Vulnerability of a subset as function of edges retained • Distribution of vulnerabilities as function of edges retained
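
One simple way to explore "attack rate by transmissibility as a function of edges retained" is a percolation-style sketch in Python: keep each edge with some probability, let infection cross each kept edge with probability equal to the transmissibility, and report the fraction of nodes reached from a seed. This is a deliberate simplification (no timing, a toy ring network, hypothetical parameter values) and not the simulation described in the slides.

```python
import random


def attack_rate(edges, n_nodes, seed_node, transmissibility, retained, rng):
    """Fraction of nodes reached from seed_node over surviving, transmitting edges."""
    neighbors = {node: [] for node in range(n_nodes)}
    for u, v in edges:
        # An edge carries infection only if it is both retained and transmits.
        if rng.random() < retained * transmissibility:
            neighbors[u].append(v)
            neighbors[v].append(u)

    infected = {seed_node}
    frontier = [seed_node]
    while frontier:
        node = frontier.pop()
        for other in neighbors[node]:
            if other not in infected:
                infected.add(other)
                frontier.append(other)
    return len(infected) / n_nodes


# Toy network: 20 people arranged in a ring.
ring = [(i, (i + 1) % 20) for i in range(20)]
rng = random.Random(0)
for retained in (1.0, 0.75, 0.5):
    runs = [attack_rate(ring, 20, 0, 0.9, retained, rng) for _ in range(200)]
    print(retained, sum(runs) / len(runs))
```

Repeating the experiment on networks with different topology but the same number of edges is one way to see how much the detail of the network matters for a given question.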
