Explore the use of disease signatures to better understand and predict disease outbreaks. Learn about the challenges of communication in epidemiology and the need for clear parameter definitions. Discover how disease signatures can be built and applied to different populations to improve accuracy in disease analysis.
Disease signatures – a simple combinatorial-type exploitation of them for our own evil purposes • Prof. Nina H. Fefferman • Visiting DIMACS from: • Tufts Univ. School of Medicine, Dept. Public Health and Family Medicine
Plan for today: • Looking very quickly at traditional SIR models • Communication problems • Tweaking parameter definitions • Using these definitions to clear up communication • Building disease signatures • Decomposing reported disease into component signature curves • Checking this method against reality • Where this method can take us from here…
A quick look at SIR models: I(t) = number of infected, S(t) = number of susceptibles, R(t) = number of recovered in the population at time t. And if we want spatial spread: keep R, but I(t, x, y) and S(t, x, y) become functions of position (x, y), and a is replaced by an expression involving two other constants related to the rate at which the infection diffuses through space. Go ask HHS or NIH or CDC for a and b for the next flu season so our models can predict it. Good luck. Pictures of equations stolen from: http://maven.smith.edu/~callahan/ili/pde.html
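The equations on this slide were pictures and did not survive extraction. A minimal sketch, assuming the standard non-spatial form used in Callahan’s notes (dS/dt = -aSI, dI/dt = aSI - bI, dR/dt = bI, with a the transmission constant and b the recovery constant), integrates the model with forward Euler steps. The function name `simulate_sir` and every parameter value below are hypothetical, chosen only for illustration.

```python
def simulate_sir(s0, i0, r0, a, b, days, steps_per_day=10):
    """Forward-Euler integration of the assumed SIR system:
    dS/dt = -a*S*I,  dI/dt = a*S*I - b*I,  dR/dt = b*I.
    Returns one (S, I, R) tuple per whole day, starting at day 0."""
    s, i, r = float(s0), float(i0), float(r0)
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = a * s * i * dt  # flow S -> I
            recoveries = b * i * dt          # flow I -> R
            s -= new_infections
            i += new_infections - recoveries
            r += recoveries
        history.append((s, i, r))
    return history

# Hypothetical scenario: 10,000 people, 10 initial infections.
trajectory = simulate_sir(s0=9990, i0=10, r0=0, a=0.00005, b=0.2, days=100)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"peak infections on day {peak_day}: {trajectory[peak_day][1]:.0f}")
```

Because the S, I and R flows cancel, S + I + R stays at the initial population size (up to rounding); the slide’s point is that a and b are exactly the numbers nobody can hand us.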
Leads us to: Communication Problems. Parameters/variables used by epidemiologists are warm and fuzzy and not rigorously defined, so modelers made up their own (you just saw them). These aren’t things doctors/public health people can really measure, so we can’t get accurate parameter values. Example: MANY people are worried about outbreaks, but there is no good definition of what constitutes an outbreak. BIG problem (mostly just ignored). Modelers use the concept of R0, the reproductive number of the disease: the average number of new infections caused by contact with a single current infection (in the differential equation model, it’s the ratio of S to b/a, i.e. aS/b). The disease spreads when R0 is greater than 1.
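Under the same assumed SIR form, dI/dt = I(aS - b), so the number of infected grows exactly when aS/b > 1, that is, when S exceeds b/a. A tiny sketch of that threshold check; the function names and values are hypothetical.

```python
def reproductive_number(s, a, b):
    """Threshold quantity a*S/b for the assumed model dI/dt = a*S*I - b*I.
    Infections grow when this exceeds 1 (equivalently, when S > b/a)."""
    return a * s / b

def infections_growing(s, i, a, b):
    """Sign test on dI/dt = I * (a*S - b) for the current state (S, I)."""
    return i * (a * s - b) > 0

# Hypothetical constants: here b/a = 4000 susceptibles is the threshold.
a, b = 0.00005, 0.2
for s in (2000, 6000):
    print(s, round(reproductive_number(s, a, b), 2), infections_growing(s, 10, a, b))
```

The same a and b can therefore mean "dying out" or "spreading" depending only on how many susceptibles remain, which is one reason R0 alone does not capture what public health people mean by an outbreak.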
Communication Problems cont. R0 gives us a rigorous definition of something good, but not of what we really need ‘outbreak’ to mean. Really, if we think about it, public health people want ‘outbreak’ to refer to “times when we need to pay attention to disease spread for some reason”. How can we say this mathematically?
Communication Problems cont. What can public health people/doctors measure (at least sometimes)? • Infectivity: Probability of becoming infectious after becoming exposed • Attack rate: Probability of developing disease after becoming exposed • Pathogenicity: Probability of developing disease after becoming infected • Virulence: Probability of dying after becoming ill • Immunogenicity: Attack rate for re-exposure
Tweaking Parameter Definitions So: • E(X,T) = Probability of exposure in population X at time T • I = Probability of infection from exposure • $S_T$ = Probability that infection at time 0 leads to manifestation of symptoms at time T (a distribution function which does not need to sum to one if not all of the infected develop symptoms). Really, these are all functions of time, but my journal referees got upset with functions, so most are now subscripts • $C_T$ = Probability that infection takes T days to become contagious • $M_T$ = Probability that the time from the onset of symptoms to death from the disease is T days • $N_T$ = Size of the population possibly exposed to infection on day T (this will be our disease signature curve) • $I_T$ = Probability of infection from current exposure, given previous infection T days ago
Clearing up communication. Pathogenicity: The probability of developing disease after becoming infected = $\sum_{T=0}^{n} S_T$, for n the maximum recovery time. Virulence: The probability of dying after becoming ill = $\sum_{T=0}^{n} M_T$, for n the maximum recovery time. Infectivity: The probability of becoming infectious after becoming exposed = $I \cdot \sum_{T=0}^{n} C_T$, for n the end of the window for the disease. With those we can build:
Clearing up communication cont. Attack rate: The probability of developing disease after becoming exposed = $I \cdot \sum_{T=0}^{n} S_T$, for n the end of the window for disease expression. But now we notice that, from our original list, Immunogenicity is not a truly meaningful idea, so we define instead: PseudoImmunogenicity: Probability of infection from current exposure, given previous infection T days ago = $I_T$. We won’t be using all of these today, but they’re still useful to have if you ever need to talk to health people.
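A minimal sketch of the four derived quantities above, assuming each distribution ($S_T$, $C_T$, $M_T$) is stored as a list of daily probabilities indexed by T = 0..n. The function names and the example numbers are invented purely for illustration; they are not measured values.

```python
def pathogenicity(s_dist):
    """P(disease | infected) = sum over T of S_T."""
    return sum(s_dist)

def virulence(m_dist):
    """P(death | ill) = sum over T of M_T."""
    return sum(m_dist)

def infectivity(i_prob, c_dist):
    """P(infectious | exposed) = I * sum over T of C_T."""
    return i_prob * sum(c_dist)

def attack_rate(i_prob, s_dist):
    """P(disease | exposed) = I * sum over T of S_T."""
    return i_prob * sum(s_dist)

# Invented illustrative values (not data):
I = 0.5                     # P(infection | exposure)
S = [0.0, 0.1, 0.3, 0.2]    # S_T: sums to 0.6 < 1, so 40% of infections stay silent
C = [0.0, 0.6, 0.4]         # C_T: days from infection to becoming contagious
M = [0.0, 0.01, 0.01]       # M_T: days from symptom onset to death
print(pathogenicity(S), attack_rate(I, S), infectivity(I, C), virulence(M))
```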
Clearing up communication cont. [Figure: flow chart of the parameters above, using a slightly different notation] Now both the math and health people have the same picture! But this is only one town. The SIR models could handle spatial spread with PDEs…
Clearing up communication cont. With multiple locations and central reporting: [Figure: flow chart with multiple locations and central reporting]
Clearing up communication cont. Notice: different occurrences don’t have to be separated only spatially or temporally. They can be different demographic populations, or anything that allows narrower, more accurate estimations of exposure or susceptibility. Let’s call these narrower things subpopulations.
Building Disease Signatures. So, using our definitions and our flow chart: for a given subpopulation, we can compute a ‘disease signature curve’ representing the number of cases predicted over time from a single instance of exposure. Notice: these signature curves depend on subpopulation-specific etiology, including the shape of the distribution for some parameters, not just averages.
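A minimal sketch of one signature curve, assuming no secondary transmission and that expected new symptomatic cases t days after a single exposure event are N · E · I · $S_t$ (size of the group at risk, times probability of exposure, times probability of infection, times the symptom-onset distribution). The function name and both subpopulations are hypothetical.

```python
def signature_curve(n_at_risk, e_prob, i_prob, s_dist):
    """Expected new symptomatic cases on each day t after one exposure event.

    Assumes no secondary transmission: day-t cases = N * E * I * S_t,
    where s_dist[t] is the probability an infection shows symptoms on day t.
    """
    return [n_at_risk * e_prob * i_prob * s_t for s_t in s_dist]

# Two hypothetical subpopulations: same N, E and I, different onset distributions.
children = signature_curve(1000, 0.2, 0.5, [0.0, 0.3, 0.4, 0.1])
adults = signature_curve(1000, 0.2, 0.5, [0.0, 0.0, 0.2, 0.3, 0.2])
for name, curve in (("children", children), ("adults", adults)):
    peak = max(range(len(curve)), key=curve.__getitem__)
    print(name, "total", round(sum(curve), 1), "peak day", peak)
```

The two groups share every average parameter except the shape of $S_T$, yet their curves peak on different days, which is why the slide stresses distributions rather than averages.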
Decomposing curves into signatures. So, if we have a total reported disease curve, we can iteratively define the population exposed on each day (notice that populations exposed on different days are disjoint sets, due to the definitions). Now we can think of a single reported curve $C_T$ as the sum of these curves.
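The iterative definition itself did not survive extraction. One plausible reading, sketched here under that assumption and not as the author’s formula: if $x_j$ people are infected on day j and reported cases are $C_T = \sum_{j \le T} x_j S_{T-j}$, then the $x_j$ can be solved for one day at a time, starting from the first day d with $S_d > 0$. Both function names are hypothetical.

```python
def convolve_exposures(x, s_dist, days):
    """Forward model: expected reported cases per day from daily infections x."""
    return [sum(x[j] * s_dist[t - j] for j in range(len(x)) if 0 <= t - j < len(s_dist))
            for t in range(days)]

def deconvolve_exposures(reported, s_dist):
    """Recover x_j, the number infected on each day j, from a reported case curve.

    Assumes reported[T] = sum_{j<=T} x_j * s_dist[T - j]  (no secondary spread;
    people infected on different days are disjoint groups). Solved forward,
    one day at a time, using the first day d with s_dist[d] > 0.
    """
    d = next(t for t, p in enumerate(s_dist) if p > 0)
    x = []
    for j in range(len(reported) - d):
        t = j + d
        # Cases on day t already explained by infections on earlier days
        explained = sum(x[k] * s_dist[t - k] for k in range(j) if t - k < len(s_dist))
        x.append((reported[t] - explained) / s_dist[d])
    return x

s = [0.0, 0.25, 0.5, 0.25]       # hypothetical onset distribution (at least 1 day)
true_x = [40, 0, 0, 100, 20]     # hypothetical infections per day
reported = convolve_exposures(true_x, s, days=len(true_x) + len(s) - 1)
print([round(v, 6) for v in deconvolve_exposures(reported, s)])
```

On exact synthetic data this recovers the daily infections (padded with zeros); on real, noisy reports the naive forward solve can produce negative or unstable values, so it only illustrates the iterative idea.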
Decomposing curves into signatures cont. Since we are interested in exploiting the heterogeneity of etiological response within a diverse population, we can specify these curves by subpopulation Y, yielding the total disease incidence curve:
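A minimal sketch of the subpopulation split, assuming (as in the previous sketch) that each subpopulation Y contributes its own convolution of daily infections with its own onset distribution, and that the total incidence curve is the day-by-day sum over Y. The names and numbers are hypothetical.

```python
def subpopulation_curve(infections_by_day, s_dist, days):
    """Expected reported cases per day for one subpopulation Y."""
    return [sum(n * s_dist[t - j] for j, n in enumerate(infections_by_day)
                if 0 <= t - j < len(s_dist))
            for t in range(days)]

def total_incidence(subpops, days):
    """Sum the subpopulation curves day by day into one total incidence curve.

    subpops maps a subpopulation name to (infections_by_day, s_dist).
    """
    total = [0.0] * days
    for infections, s_dist in subpops.values():
        for t, cases in enumerate(subpopulation_curve(infections, s_dist, days)):
            total[t] += cases
    return total

# Hypothetical: same exposure day, but children show symptoms a day sooner.
subpops = {
    "children": ([20, 0, 0], [0.0, 0.5, 0.5]),
    "adults":   ([40, 0, 0], [0.0, 0.0, 0.5, 0.5]),
}
print(total_incidence(subpops, days=5))
```

Only the summed curve would appear in aggregated reports; the decomposition problem is to go from that total back to plausible per-subpopulation pieces.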
Decomposing curves into signatures cont. And we can even exploit immune memory by further dividing subpopulations into classes of those with similar immune protection from previous infection, with $I_T$ = probability of infection given previous infection T days ago, and $T^*$ = the last day of the most recent prior infection. Giving us:
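A minimal sketch of the immune-memory split, assuming that someone whose most recent prior infection ended on day $T^*$ and who is exposed on day T becomes infected with probability $I_{T-T^*}$, while never-infected people use the baseline I. The waning curve `waning_protection` and the class sizes are hypothetical.

```python
def infections_from_exposure(day, exposed_by_class, i_naive, i_after_prior):
    """Expected infections from one exposure event on `day`.

    exposed_by_class maps T* (last day of most recent prior infection, or None
    for never infected) to the number exposed in that immune-memory class.
    i_after_prior(gap) plays the role of I_T with T = day - T*.
    """
    total = 0.0
    for t_star, n_exposed in exposed_by_class.items():
        if t_star is None:            # no prior infection: baseline I
            p = i_naive
        else:                         # I_T, T = days since last prior infection
            p = i_after_prior(day - t_star)
        total += n_exposed * p
    return total

def waning_protection(days_since):
    """Hypothetical I_T: infection probability climbs back toward 0.5 as memory wanes."""
    return min(0.5, 0.05 + 0.005 * days_since)

exposed = {None: 600, 90: 300, 20: 100}   # hypothetical class sizes
print(infections_from_exposure(100, exposed, 0.5, waning_protection))
```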
Decomposing curves into signatures cont. Important because public health people may trust it. [Figure: coins (5¢, 10¢, 25¢) shown alongside sub-populations] Now we can use high school math to find combinations of signature curves that make up the total reported cases curve! How many different combinations of coins can make $1.50? Similarly, we can ask how many combinations of ‘signature curves’ can go into a ‘Total Reported Cases’ curve.
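The coin question has a classic counting answer, and the same recursion works if each "coin" is a vector of daily case counts. A minimal sketch, assuming (for illustration only) that a combination of signature curves means non-negative integer multiples of day-shifted curves that match the total exactly; real, noisy case counts would need approximate matching. The function name and the example signature are hypothetical.

```python
from functools import lru_cache

def count_combinations(target, pieces):
    """Count multisets of `pieces` (non-negative integer vectors) summing to `target`.

    With 1-element vectors this is the classic coin-change count; with vectors
    of daily case counts it counts combinations of signature curves.
    """
    pieces = [tuple(p) for p in pieces]
    if any(not any(p) for p in pieces):
        raise ValueError("every piece must have at least one positive entry")

    @lru_cache(maxsize=None)
    def ways(index, remaining):
        if not any(remaining):
            return 1                        # done: zero copies of every later piece
        if index == len(pieces):
            return 0
        total, rest = 0, remaining
        while all(v >= 0 for v in rest):    # try 0, 1, 2, ... copies of this piece
            total += ways(index + 1, rest)
            rest = tuple(r - p for r, p in zip(rest, pieces[index]))
        return total

    return ways(0, tuple(target))

# Coins: ways to make $1.50 from nickels, dimes and quarters
print(count_combinations((150,), [(5,), (10,), (25,)]))   # 58

# Curves: hypothetical signature [0, 1, 2, 1] from exposure on day 0 or day 1
sig_day0 = (0, 1, 2, 1, 0)
sig_day1 = (0, 0, 1, 2, 1)
print(count_combinations((0, 1, 3, 3, 1), [sig_day0, sig_day1]))   # 1
```

When the count is exactly one, the reported curve has a unique explanation, which is the situation the next slide uses to decide whether to pay attention.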
Decomposing curves into signatures cont. Now let’s come back to the idea of an outbreak. Remember, we wanted ‘outbreak’ to mean “times when we need to pay attention to disease spread for some reason”. Suppose that the only combination of disease signature curves was to have EVERY subpopulation just beginning to show symptoms from a disease. That means that soon many, many more people will be sick, so we should probably pay attention to that. OR maybe the only combination of signature curves indicates that only one location has been exposed. We might want to use that to find the source of exposure, or to quarantine the area. No matter how we choose to define it (the definition will be arbitrary), this method can tell us WHY we should care now.
Decomposing curves into signatures cont. Let’s take a look at an example of how this can work. To begin with, let’s look at something very simple: Giardiasis, a waterborne infection causing diarrheal disease in humans, with extremely low levels of secondary transmission (which makes life simpler). There was an actual “outbreak” in MA in 1995.
Decomposing curves into signatures cont. Reported incidence for MA (all of it). HIPAA requires aggregation of data released to the public and to most researchers without special access.
Decomposing curves into signatures cont. To use this method, we need some measured parameter values. I’m cheating a little because I’m assuming values for I, but we could in theory measure this.
Decomposing curves into signatures cont. We know that most of the reporting came from 3 urban centers:
Decomposing curves into signatures cont. Then we can decompose by demographic subgroup for each town:
Decomposing curves into signatures cont. That was a really simple disease without any secondary transmission So what happens if there is secondary spread? It gets MUCH more complicated… First of all, the probability of exposure in each subpopulation can start to depend on the levels of infection in each other subpopulation Now we start getting into the social network stuff
An aside. Social Networks: Oy vey. Since this is a talk and not a course, I can’t leave this as an exercise to the reader, but I can use the ‘we only have a little over an hour’ excuse to hand-wave some of the modeling details. I’m going to talk about the concepts. If you are interested in the details, well, that’s why I’m going to be around for the year.
Again, rather than using mass averages, let’s still keep the idea of a disease signature. So exposure isn’t a simple underlying rate; it’s based on contacting an infected individual. We can think of individuals in each subgroup as having certain probabilities of interacting with others, possibly in other subgroups. (People in the room who think of social interactions as edges in a graph: this is almost the same, like weighted edges in a complete graph.) Also, membership in particular subgroups can change over time (e.g. children becoming adults). (In this case, both vertex states and edge weights can be thought of as vertex-state-dependent progressions.)
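A minimal sketch of exposure driven by contacts rather than a fixed rate, assuming a Reed-Frost-style rule (a swapped-in standard technique, not necessarily the author’s model): each infectious member of subgroup z independently exposes a given member of subgroup y with probability $w_{yz}$, the edge weight between the two subgroups in a complete graph. The weights and subgroup names are hypothetical.

```python
def exposure_probabilities(weights, infectious):
    """Daily probability that one member of each subgroup is exposed.

    weights[y][z]: hypothetical probability that a given member of subgroup y
    has an exposing contact with a given infectious member of subgroup z on one
    day (edge weights of a complete graph on subgroups, self-loops included).
    infectious[z]: number of infectious members of subgroup z that day.
    Contacts are assumed independent (Reed-Frost style).
    """
    probs = {}
    for y, row in weights.items():
        escape = 1.0                      # probability of avoiding every infectious contact
        for z, w in row.items():
            escape *= (1.0 - w) ** infectious.get(z, 0)
        probs[y] = 1.0 - escape
    return probs

weights = {
    "children": {"children": 0.02, "adults": 0.005},
    "adults":   {"children": 0.005, "adults": 0.01},
}
print(exposure_probabilities(weights, {"children": 10, "adults": 0}))
```

Here infection among children alone raises exposure for both groups, but more for children, so each subgroup’s exposure now depends on infection levels elsewhere, as the slide describes.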
Checking Reality • This all gets complicated enough that it’s nerve-wracking not to check model outcomes against some form of reality • Need to: • measure all model parameters • create disease outbreaks • check predicted spread against what actually happens • (I tried to get Thus Spake Zarathustra to play now, but I couldn’t make it work) My beautiful termites
Checking Reality cont. On Thursday, at the DIMACS Mixer, I’ll be talking to you about ‘Why Termites’. For now, just go with it. The particular details: [Figure: the fungus Metarhizium anisopliae (“Not a termite”) and the termite Zootermopsis angusticollis; labels: Spores land on termite, Allogroomed off, Temporary Immunity, Burrow through cuticle, Death]
Checking Reality cont. • So we built some cellular automaton (CA) simulation models • Including age-based differences in: • direction of wandering through the nest • interaction rates • exposure rates • susceptibility to infection from exposure • mortality from infection • efficacy/duration of induced immunity (via social vaccination) • As the model ran, individuals aged and behaved accordingly
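A toy sketch of the idea, not the authors’ termite model: agents on a ring of cells wander, age, catch infection from an infected agent sharing their cell, and may die, with wandering step size, susceptibility and mortality depending on age class. It covers only some of the listed age effects (immunity is omitted), and every parameter, name and the ring geometry are hypothetical.

```python
import random
from collections import Counter

# Hypothetical age-class parameters, chosen only for illustration
PARAMS = {
    "young": {"step": 1, "susceptibility": 0.6, "mortality": 0.10},
    "old":   {"step": 2, "susceptibility": 0.3, "mortality": 0.05},
}
MATURITY_AGE = 20  # hypothetical age (in steps) at which an agent counts as "old"

def age_class(agent):
    return PARAMS["young" if agent["age"] < MATURITY_AGE else "old"]

def run_nest(n_agents=50, width=20, steps=60, seed=1):
    """Toy cellular-automaton-style nest: agents on a ring of `width` cells.

    Each step every living agent wanders (step size by age class), ages by one,
    may be infected if an infected agent shares its cell, and, if infected,
    may die. Returns per-step counts of S (susceptible), I (infected), D (dead).
    """
    rng = random.Random(seed)
    agents = [{"pos": rng.randrange(width), "age": rng.randrange(40), "state": "S"}
              for _ in range(n_agents)]
    agents[0]["state"] = "I"  # a single initially infected individual
    history = []
    for _ in range(steps):
        living = [a for a in agents if a["state"] != "D"]
        for a in living:
            params = age_class(a)
            a["pos"] = (a["pos"] + rng.choice((-1, 1)) * params["step"]) % width
            a["age"] += 1
        # Cells holding an infected agent before this step's new infections
        infected_cells = {a["pos"] for a in living if a["state"] == "I"}
        for a in living:
            params = age_class(a)
            if a["state"] == "S" and a["pos"] in infected_cells:
                if rng.random() < params["susceptibility"]:
                    a["state"] = "I"
            elif a["state"] == "I" and rng.random() < params["mortality"]:
                a["state"] = "D"
        history.append(Counter(a["state"] for a in agents))
    return history

history = run_nest()
print(history[-1])
```

Because individuals change age class during the run, the same agent can behave differently early and late, which is the "individuals aged and behaved accordingly" feature on the slide.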
Checking Reality cont. And… thank god, all the work so far has shown that the models predict spread accurately. Whew! We’re even getting some interesting new directions.
Regardless of why specific outputs happen: now that we know the model can work, we can work backwards. Fit model outcomes to observed data and look at which sets of parameter values and behavioral mixing rates produce them. This might provide an odd way of understanding human social networks, especially since they can so dramatically affect model output. Maybe this last part is a pipe-dream. Who knows, but it’s so crazy it just might work…
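A minimal sketch of "working backwards", assuming the simple SIR model from earlier stands in for the full simulation and plain least squares over a small parameter grid stands in for whatever fitting the authors used. The "observed" curve here is synthetic, generated from known hypothetical parameters so the fit can be checked.

```python
import itertools

def sir_infected(a, b, s0, i0, days, steps_per_day=10):
    """Daily infected counts from a forward-Euler run of the assumed SIR model."""
    s, i = float(s0), float(i0)
    dt = 1.0 / steps_per_day
    curve = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf, rec = a * s * i * dt, b * i * dt
            s, i = s - new_inf, i + new_inf - rec
        curve.append(i)
    return curve

def fit_grid(observed, a_grid, b_grid, s0, i0):
    """Return every (sse, a, b) on the grid, ranked by sum of squared errors."""
    days = len(observed) - 1
    scored = []
    for a, b in itertools.product(a_grid, b_grid):
        sim = sir_infected(a, b, s0, i0, days)
        sse = sum((o - m) ** 2 for o, m in zip(observed, sim))
        scored.append((sse, a, b))
    return sorted(scored)

# Synthetic "observed" data from known (hypothetical) parameters
observed = sir_infected(a=0.00005, b=0.2, s0=9990, i0=10, days=60)
ranked = fit_grid(observed, a_grid=[0.00003, 0.00004, 0.00005, 0.00006],
                  b_grid=[0.1, 0.2, 0.3], s0=9990, i0=10)
best_sse, best_a, best_b = ranked[0]
print(best_a, best_b, best_sse)   # recovers the generating pair with zero error
```

Looking at the whole ranked list, not just the top entry, matches the slide’s goal: with real data, several parameter sets (or mixing patterns) may fit almost equally well, and that set is what carries information about the underlying social network.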
Thanks for asking me to speak to you. I hope you’ve had fun. Some of what I’ve talked about has been accomplished in collaboration with Elena Naumova, James Traniello and Rebeca Rosengaus. My thanks to the NIH for funding support for this research.