A Data-driven Epidemiological Model. Stephen Eubank, Christopher Barrett, Madhav V. Marathe GIACS Conference on "Data in Complex Systems" Palermo, Italy, April 7-9 2008. Data driven epidemiological models. Complex system Data driven, individual-based simulation Privacy and accuracy issues.
A Data-driven Epidemiological Model Stephen Eubank, Christopher Barrett, Madhav V. Marathe GIACS Conference on "Data in Complex Systems", Palermo, Italy, April 7-9 2008 Network Dynamics and Simulation Science Laboratory
Data driven epidemiological models • Complex system • Data driven, individual-based simulation • Privacy and accuracy issues
What’s so complex about epidemiology? Consider an “outbreak” among 4 people. [Figure legend: susceptible, infectious, removed]
Outbreaks can be represented as Markov processes A given configuration of the system probabilistically transitions into any of several other configurations. Even a small system has many possible configurations.
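To make "many possible configurations" concrete, here is a minimal sketch that enumerates every configuration of the 4-person example, assuming each person is in exactly one of the three states named earlier (susceptible, infectious, removed). The names `STATES` and `configurations` are illustrative, not from the talk.

```python
from itertools import product

STATES = ("S", "I", "R")  # susceptible, infectious, removed

def configurations(n_people):
    """Return every assignment of one state to each of n_people individuals."""
    return list(product(STATES, repeat=n_people))

configs = configurations(4)
print(len(configs))       # 3**4 = 81 configurations
# A full Markov transition matrix over these configurations has one
# entry per ordered pair of configurations.
print(len(configs) ** 2)  # 6561
```

The count grows as 3^n, so even 10 people already give 59,049 configurations.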
Very little data is available to estimate this process Historically, we (partially) observe 1 or 2 Markov chains. We want to estimate transition probabilities on every edge.
Aggregation simplifies the model … [Figure: grid of aggregated configurations, with axes #S and #I each running from 0 to 4] … at the cost of reduced information content. p(C′_{t+1} | C′_t) is less informative than p(C_{t+1} | C_t) when C′ is an aggregation of C.
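The trade-off can be sketched by collapsing individual-level configurations of the 4-person example into counts (#S, #I). This is an illustration under the assumption of three states per person; `aggregate` and `aggregated_classes` are hypothetical helper names.

```python
from collections import Counter
from itertools import product

STATES = ("S", "I", "R")

def aggregate(config):
    """Map an individual-level configuration to the counts (#S, #I)."""
    return (config.count("S"), config.count("I"))

def aggregated_classes(n_people):
    """Count how many individual-level configurations fall in each (#S, #I) cell."""
    return Counter(aggregate(c) for c in product(STATES, repeat=n_people))

classes = aggregated_classes(4)
print(len(classes))     # 15 aggregated cells instead of 81 configurations
print(classes[(2, 1)])  # 12 distinct configurations share #S=2, #I=1
```

The aggregated model needs far fewer transition probabilities, but it can no longer say which of the 12 people-level arrangements behind a cell actually occurred.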
Other assumptions further simplify the model … … but are unwarranted in social systems, where components are • Heterogeneous (distinguishable) • Intentional (behavior not determined by physical laws)
Aggregation naturally makes contact with observations Observations of outbreaks often ignore heterogeneity and intention, and provide only point estimates.
“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem” - J. Tukey “All models are wrong, but some are useful” - G.E.P. Box A system is complex “if its behavior crucially depends on the details of its parts.” - G. Parisi
Interaction approach simplifies process itself Interactions among system components completely determine transition probabilities among configurations [Figure: one diagram replaced with another]
Calibrating with unexpectedly rich data • For aerosol-borne pathogens, the probability of transmission is related to physical proximity, duration, etc. • The interaction approach reduces to estimating a social network. • There is much more data available for this than for outbreaks. • But it is not directly observable.
A possible approach we didn’t use • Consider a subset of random networks subject to certain constraints • Constraints should be relevant to the global dynamics, i.e. epidemics • But what are those? A “chicken or the egg” problem: It would seem offhand that a taxonomy of “nets” … would arise naturally from the consideration of the statistical parameters... But the statistical parameters themselves are singled out on the basis of taxonomic considerations, which have yet to be clarified. - Anatol Rapoport and William Horvath, Behav Sci. 1961, 6, 279–291
Questions to drive model development • (1) What is the optimal targeted allocation of antivirals used prophylactically or therapeutically to mitigate an influenza pandemic? • (2) What combination of targeted antivirals and feasible, community-based, non-pharmaceutical interventions (e.g. closing schools, allowing liberal leave from work) can best delay an outbreak from becoming epidemic for several months? For 1 & 2: models must compare changes in social network with changes in transmissibility. This is an example of policy informatics for complex systems.
Interventions specified naturally by effect on network No single “knob” reduces overall transmission by 50%
Step 1. Create a synthetic population • Census data • Individual demographics • Age and gender • Household characteristics • Size and Income
Successive refinement of synthetic data • Start from a proto-population, e.g. a list of ids. • Add observed data • Capture correlations in data using statistical models (iterative proportional fitting from Public Use Microdata)
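Iterative proportional fitting can be sketched on a two-way table: a seed table of joint counts (standing in for a microdata sample) is rescaled until its row and column totals match known marginals (standing in for census totals). This is a generic textbook version, not the authors' implementation; the function name `ipf` and all numbers are hypothetical.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Rescale a 2-D seed table until its margins match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once rows match as well.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table

# Hypothetical example: rows = age bands, columns = household sizes.
seed = [[10.0, 20.0], [30.0, 40.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[35.0, 65.0])
```

The fitted table keeps the correlation structure of the seed while matching both sets of totals, which is what lets sampled individuals carry realistic joint attributes.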
Step 2. Assign activities, locations & times • Locations • Dun & Bradstreet data • Activity surveys • Matched to households by demographics • Matched to locations by activity type & travel time
Successive refinement of synthetic data • Surveys are very different kinds of data sources than census • This step depends on data fusion capability • Some values may be outcomes of very large games, not statistical models
So far: a typical family’s day [Figure: timelines of each family member’s activities (home, carpool, work, lunch, shopping, car, daycare, bus, school) along a time axis]
Overlapping families’ days create a social network
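A minimal sketch of how overlapping schedules yield a contact network: two people are linked when they are at the same location during overlapping time intervals, and the edge weight is the shared duration. The visit records and the `contact_network` helper are hypothetical, and treating everyone at a location as being in mutual contact is a simplifying assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical visits: (person, location, start_hour, end_hour)
visits = [
    ("parent_a", "home", 0, 7), ("parent_a", "office", 8, 17),
    ("parent_b", "home", 0, 8), ("parent_b", "shop", 12, 13),
    ("child", "home", 0, 8), ("child", "school", 9, 15),
    ("coworker", "office", 9, 18),
]

def contact_network(visits):
    """Return {(person1, person2): total hours co-located}."""
    by_location = defaultdict(list)
    for person, location, start, end in visits:
        by_location[location].append((person, start, end))
    contacts = defaultdict(float)
    for occupants in by_location.values():
        for (p1, s1, e1), (p2, s2, e2) in combinations(occupants, 2):
            overlap = min(e1, e2) - max(s1, s2)  # shared time at this location
            if p1 != p2 and overlap > 0:
                contacts[tuple(sorted((p1, p2)))] += overlap
    return dict(contacts)

network = contact_network(visits)
```

Here the household members are linked through "home", and `parent_a` is also linked to `coworker` through the office, so the workplace connects the family to the wider network.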
Successive refinement of synthetic data • Gives us a generative model for contacts • More powerful than traditional encapsulated agents • Note: each byte of data / person adds ~300 MB to the database
Using data for purposes other than intended Possibly the only epidemiological model that has been calibrated using automobile traffic counts! (Because the same activity model generates both transportation demand and contact networks)
Activities adapt to situation & generate network changes [Figure: activity timelines, labeled “Home”]
Derive disease interaction from social network Interactions only need to get a few things right: • Susceptibility • Infectivity as a function of time since exposure
Modeling pandemic influenza • Nobody knows what pandemic flu will look like • Assume something like seasonal flu, but with less immunity • Create several “flu” bugs in silico • Moderate (10% attack rate) • Strong (20 - 25% attack rate) • Catastrophic (> 50% attack rate) • For each, fix other characteristics: • Incubation period: 2-3 days • Infectious period: 2-5 days
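The sketch below shows one way such an in silico bug could be run on a fixed contact list, using the incubation (2-3 days) and infectious (2-5 days) periods from the slide. Everything else is an assumption for illustration: the daily time step, the uniform draws, the single per-contact probability `p_transmit`, and the function name `simulate`. In practice `p_transmit` would be tuned until the attack rate lands in the target band (e.g. about 10% for the moderate strain).

```python
import random

def simulate(contacts, n_people, p_transmit, seed_case=0, days=200, rng=None):
    """Daily-step outbreak on a fixed contact list; returns the attack rate."""
    rng = rng or random.Random(0)
    neighbors = {i: set() for i in range(n_people)}
    for a, b in contacts:
        neighbors[a].add(b)
        neighbors[b].add(a)
    state = {i: "S" for i in range(n_people)}
    infectious_from = {}  # person -> first infectious day
    recovered_on = {}     # person -> day the person is removed

    def expose(person, day):
        state[person] = "E"
        infectious_from[person] = day + rng.randint(2, 3)                     # incubation
        recovered_on[person] = infectious_from[person] + rng.randint(2, 5)   # infectious

    expose(seed_case, 0)
    for day in range(1, days + 1):
        # Advance disease states.
        for person in range(n_people):
            if state[person] == "E" and day >= infectious_from[person]:
                state[person] = "I"
            if state[person] == "I" and day >= recovered_on[person]:
                state[person] = "R"
        # Infectious people may expose susceptible contacts.
        for person in range(n_people):
            if state[person] != "I":
                continue
            for other in neighbors[person]:
                if state[other] == "S" and rng.random() < p_transmit:
                    expose(other, day)
    return sum(s != "S" for s in state.values()) / n_people

ring = [(i, (i + 1) % 10) for i in range(10)]  # hypothetical 10-person ring
attack_rate = simulate(ring, 10, p_transmit=0.3)
```

Because transmission is evaluated per contact, an intervention can be expressed by editing `contacts` (e.g. removing school edges) rather than by scaling one global rate.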
Resolution, fidelity, and accuracy are different • Resolution describes level of aggregation, e.g. individuals vs populations • Fidelity describes the completeness of the representation’s features, e.g. age vs (age, gender, income, household size, education) • Accuracy describes the correctness of features and correlations, e.g. is mixing by age derived from social network correct? “Validity” (always for a particular question) depends on all 3.
Effect of changes in social networks (above) on disease dynamics (below)
Characterizing the resulting network
Assortative Mixing • Static people-people projection is assortative • by degree (~0.25) • but not as strongly by age, income, household size, … This is • Like other social networks • Unlike • technological networks • Erdős-Rényi random graphs • Barabási-Albert networks
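Degree assortativity is commonly measured as the Pearson correlation between the degrees at the two ends of each edge; positive values mean well-connected people tend to contact other well-connected people. The sketch below implements that standard definition on a plain edge list. Whether the ~0.25 quoted above was computed in exactly this way is an assumption, and `degree_assortativity` is an illustrative name.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each undirected edge."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    # Count each edge in both directions so the measure is symmetric.
    xs = [degree[a] for a, b in edges] + [degree[b] for a, b in edges]
    ys = [degree[b] for a, b in edges] + [degree[a] for a, b in edges]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all nodes have the same degree; assortativity is undefined")
    return cov / var

# A star is maximally disassortative: the hub only touches degree-1 leaves.
star = [(0, 1), (0, 2), (0, 3)]
print(degree_assortativity(star))  # -1.0
```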
Summary • Complex systems models are hungry for detail (= data) • Privacy & extrapolation require “synthetic” data, combining observations (declarative), statistical models, and simulation results (procedural) • Validity of synthetic data depends on resolution, fidelity, accuracy, and the question it is intended to answer
When is this model simpler? Notation: x and y are states of a component at time t and t+1 • Components’ states are updated independently: # parameters • Interactions are pairwise independent: # parameters
When is this model simpler? • Most components do not interact directly: # parameters • Only one state transition, S → I, is affected by interactions: # parameters
Computational Resources • Demonstration experiment • 8 experiments (exp ids: 1083 to 1090) • 24 cells with 200 days and 25 reps • Computations performed • 291 million contacts * 200 days * 25 reps * 24 cells = 34.92 trillion transmission evaluations • Time Requirements • Single processor: 2 years 340 days • Small cluster (10 nodes, 4 cores): 26 days 18 hours • Current IDAC cluster: > 3 hours
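The workload and timing figures can be checked with a few lines of arithmetic. The sketch assumes 365-day years and ideal linear speedup across 10 nodes of 4 cores each; both are simplifications introduced here, not statements from the talk.

```python
contacts = 291_000_000
days, reps, cells = 200, 25, 24

evaluations = contacts * days * reps * cells
print(f"{evaluations:.4g}")      # 3.492e+13

single_cpu_days = 2 * 365 + 340  # "2 years 340 days"
cores = 10 * 4                   # 10 nodes x 4 cores
cluster_days = single_cpu_days / cores
print(cluster_days)              # 26.75, i.e. 26 days 18 hours
```

The small-cluster figure on the slide is therefore exactly the single-processor time divided by 40 cores.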
Example Located Synthetic Population
Example Route Plans [Figure panels: first person in household; second person in household]
Time Slice of a Typical Family’s Day
How much does detail matter? • Interaction picture: • Dynamics of outbreak depend on topology • How and how much? • What differences in network topology are relevant to prevention/mitigation? • What statistics capture difference? • Answer staring us in the face (see above): • Overall attack rate is a function of the topology of the network • Other measures for other questions • Attack rate by transmissibility as function of edges retained • Vulnerability of a subset as function of edges retained • Distribution of vulnerabilities as function of edges retained
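One simple way to explore "attack rate as a function of edges retained" is the limiting case where every retained contact transmits: the outbreak then reaches exactly the connected component of the index case. The sketch below uses that worst-case proxy, which is a deliberate simplification rather than the measure used in the talk; `component_fraction`, `sweep`, and the ring network are hypothetical.

```python
import random

def component_fraction(n_people, edges, seed_case=0):
    """Fraction of people reachable from seed_case along retained edges."""
    neighbors = {i: [] for i in range(n_people)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen, stack = {seed_case}, [seed_case]
    while stack:
        for other in neighbors[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) / n_people

def sweep(n_people, edges, fractions, rng):
    """Worst-case attack rate for each fraction of edges retained at random."""
    results = {}
    for f in fractions:
        kept = rng.sample(edges, round(f * len(edges)))
        results[f] = component_fraction(n_people, kept)
    return results

ring = [(i, (i + 1) % 10) for i in range(10)]
curve = sweep(10, ring, [0.0, 0.5, 1.0], random.Random(0))
```

Replacing random edge removal with targeted removal (e.g. all school contacts) would turn the same sweep into a comparison of interventions.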