Statistical Modeling of SARS Epidemic Propagation via Branching Processes. V.Kamalesh, V.Kuralmani, Goh Li Ping, Qian Long, Fu Xiuju, Terence Hung. Software & Computing Programme Institute of High Performance Computing.
Statistical Modeling of SARS Epidemic Propagation via Branching Processes V.Kamalesh, V.Kuralmani, Goh Li Ping, Qian Long, Fu Xiuju, Terence Hung Software & Computing Programme Institute of High Performance Computing “To succeed in containing SARS in Singapore, everyone must cooperate and play his part.” - Prime Minister Goh Chok Tong
History of Branching Process The study of branching processes originated with a mathematical puzzle posed by Sir Francis Galton, the noted cousin of Charles Darwin, in the Educational Times of 1 April 1873. A branching process may be viewed as a mathematical representation of the evolution of a population in which reproduction and death are subject to the laws of chance.
Galton’s Puzzle A large nation, of whom we will only concern ourselves with the adult males, N in number, and who each bear separate surnames, colonise a district. Their law of population is such that, in each generation, P0 per cent of the adult males have no male children who reach adult life; P1 have only one such male child; P2 have 2, and so on up to P5 who have 5. Find (1) what proportion of the surnames will have become extinct after r generations; and (2) how many instances there will be of the same surname being held by m persons.
A solution was proffered by the Rev. Henry William Watson, and from his 1874 joint paper with Galton, the mathematical tool of branching emerged: the Galton-Watson process.
Examples of BP
• Propagation of human and animal species and genes
• Nuclear chain reaction
• Electronic cascade phenomena
• Epidemic models
Bienayme-Galton-Watson BP Bienayme-Galton-Watson BP can be thought of as a stochastic model of an evolving population of particles or individuals. It starts at time 0 with Z(0) particles, each of which splits into a random number of offspring that constitute the first generation, and so on. The number of “offspring” produced by a single “parent” particle at any time is independent of the history of the process, and of other particles existing at the present.
The archetypal branching process (Galton-Watson):
• Discrete reproduction periods (‘generations’): no overlap, or parents counted as equivalent to offspring
• One type of individual, with identical offspring distribution
• Individuals do not affect each other’s reproduction
• Distributions of offspring numbers do not change in time
BP as an epidemic model
Branching processes can be adopted as models for the spread of epidemic diseases.
• Infections directly due to an infective are the offspring
• One can approximate the infective population during the early stages of the epidemic by a branching process
• Minor epidemic: extinction of the branching process
• Major epidemic: non-extinction of the branching process
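Specification & standard details A Galton-Watson process $\{x_n;\ n = 0, 1, 2, \ldots\}$ is a Markov chain defined on a probability space $(\Omega, \Gamma, P)$ with state space $\Delta = \{0, 1, \ldots\}$, and it has the representation
\[
\begin{aligned}
x_0 &= N, \text{ some specified positive integer},\\
x_1 &= \xi_1 + \xi_2 + \cdots + \xi_{x_0},\\
x_2 &= \xi_{x_0+1} + \xi_{x_0+2} + \cdots + \xi_{x_0+x_1},\\
&\ \ \vdots\\
x_n &= \xi_{x_0+x_1+\cdots+x_{n-2}+1} + \cdots + \xi_{x_0+x_1+\cdots+x_{n-1}},
\end{aligned}
\]
and $x_n = 0$ if $x_{n-1} = 0$, $n \ge 1$, where $\xi_i$, $i = 1, 2, \ldots$ are independent and identically distributed (iid) non-negative integer-valued random variables on $(\Omega, \Gamma, P)$ whose common probability law is given by
\[
P(\xi_i = k) = p_k, \quad k = 0, 1, \ldots; \qquad \sum_{k} p_k = 1 .
\]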
The Model A Galton-Watson process is a Markov chain $\{X(n);\ n \ge 0\}$ on the non-negative integers, where for $n \ge 0$
\[
X(n+1) =
\begin{cases}
\xi(n+1,1) + \cdots + \xi(n+1,X(n)) & \text{if } X(n) > 0,\\
0 & \text{if } X(n) = 0,
\end{cases}
\]
and $\{\xi(n,r);\ r, n \ge 1\}$ are independent random variables, identically distributed like $\xi$ (say), with other additional assumptions. Also $E(\xi) = m$.
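The recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_galton_watson` and the offspring distribution `[0.5, 0.3, 0.2]` are hypothetical and do not come from the slides.

```python
import random

def simulate_galton_watson(offspring_probs, x0, generations, rng):
    """Return one simulated path [X(0), ..., X(generations)].

    offspring_probs[k] is P(xi = k); x0 is the initial population size.
    """
    values = list(range(len(offspring_probs)))
    sizes = [x0]
    for _ in range(generations):
        current = sizes[-1]
        if current == 0:
            # Zero is absorbing: X(n+1) = 0 whenever X(n) = 0.
            sizes.append(0)
            continue
        # Each of the X(n) individuals reproduces independently.
        offspring = rng.choices(values, weights=offspring_probs, k=current)
        sizes.append(sum(offspring))
    return sizes

# Hypothetical offspring law with mean 0.3 + 2 * 0.2 = 0.7 (sub-critical).
rng = random.Random(0)
path = simulate_galton_watson([0.5, 0.3, 0.2], x0=1, generations=10, rng=rng)
print(path)
```

Passing a seeded `random.Random` instance keeps a run reproducible without touching the global random state.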
Offspring mean (m) Since the offspring mean of a branching process indicates almost sure extinction or possible explosion of a population, there is considerable interest in knowing the value of this criticality parameter (also called the growth rate parameter or basic reproductive rate). The offspring mean (m) is also known as the infection rate, and its estimation is of great interest. The problem of estimating m arises when determining vaccination policies aimed at preventing major epidemics.
Estimation of offspring mean
A Galton-Watson BP is classified as:
• Sub-critical if m < 1 (always extinction, finite expected time to extinction)
• Critical if m = 1 (always extinction, infinite expected time to extinction)
• Super-critical if m > 1 (probability of extinction smaller than 1)
The offspring mean indicates the (almost) sure extinction or possible explosion of a population. One of the basic problems of the statistics of a G-W process is to find a ‘good’ estimator for m. Estimation methods include MLE, least-squares, ratio, moment-type and Bayes estimators.
Probability of extinction A parameter of special interest is the following:
\[
q = P\Big(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \{x_k = 0\}\Big) = P(x_n \to 0) = P(E) \ \text{(say)}.
\]
This is referred to as the probability of extinction of a G-W process with $x_0 = 1$. It can be verified that $q = 1$ if $m \le 1$, and $q < 1$ if $m > 1$. Estimation of q is relevant when one is dealing with the recognition of a new mutation in a genetic population.
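For reference, the standard way to compute q uses the probability generating function of the offspring law. The symbols $f$ and $q_n$ below are notation introduced here for illustration; they do not appear on the slides.

```latex
f(s) = \sum_{k=0}^{\infty} p_k s^k, \qquad 0 \le s \le 1,
\\[4pt]
q_n = P(x_n = 0) = f(q_{n-1}), \qquad q_0 = 0,
\\[4pt]
q = \lim_{n \to \infty} q_n = \min\{\, s \in [0,1] : f(s) = s \,\}.
```

Since the $q_n$ are non-decreasing in n, iterating $f$ from 0 gives the probability of extinction by generation n and converges to q. For a process started with $x_0 = N$ independent individuals, the extinction probability is $q^N$.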
Immigration Process Estimation of the offspring mean ‘m’ breaks down in the sub-critical case (when 0 < m < 1), because extinction is almost certain in such situations. Introducing an immigration process into the system makes it possible to estimate the offspring mean and the immigration mean in the sub-critical case. The analysis of a G-W process with immigration has some interesting conclusions: for example, if the mean of the offspring distribution is > 1, immigration makes very little difference to the eventual behaviour of the process.
BP with immigration The simple subcritical G-W process $X = \{X(t);\ t = 0, 1, 2, \ldots\}$ with immigration has the specification that $X(0)$ is a non-negative integer-valued random variable, and for $t \ge 1$,
\[
X(t) =
\begin{cases}
z(t,1) + \cdots + z(t,X(t-1)) + Y(t) & \text{if } X(t-1) > 0,\\
Y(t) & \text{if } X(t-1) = 0,
\end{cases}
\]
where $\{z(t,r);\ r, t \ge 1\}$ are independent random variables, identically distributed like $z$ (say), with other additional assumptions. $Y(t)$ is the immigration component.
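A minimal simulation sketch of this specification follows. The function name `simulate_with_immigration` and both probability vectors are hypothetical choices for illustration, and the sketch additionally assumes that the immigration counts Y(t) are iid and independent of the offspring counts.

```python
import random

def simulate_with_immigration(offspring_probs, immigration_probs, x0, steps, rng):
    """Return one simulated path [X(0), ..., X(steps)].

    offspring_probs[k] is P(z = k); immigration_probs[k] is P(Y(t) = k).
    """
    offspring_values = list(range(len(offspring_probs)))
    immigrant_values = list(range(len(immigration_probs)))
    path = [x0]
    for _ in range(steps):
        previous = path[-1]
        born = 0
        if previous > 0:
            # Offspring of the X(t-1) individuals alive at time t-1.
            born = sum(rng.choices(offspring_values, weights=offspring_probs, k=previous))
        # Immigration component Y(t), drawn whether or not X(t-1) is zero.
        arrived = rng.choices(immigrant_values, weights=immigration_probs, k=1)[0]
        path.append(born + arrived)
    return path

# Hypothetical sub-critical offspring law (mean 0.7) with immigration mean 0.5.
rng = random.Random(0)
print(simulate_with_immigration([0.5, 0.3, 0.2], [0.6, 0.3, 0.1], x0=1, steps=10, rng=rng))
```

Unlike the process without immigration, zero is not absorbing here: a path that reaches 0 can restart through Y(t).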
Data Source The data was taken from the following website: http://sarstracker.blogspot.com/ (source: Straits Times 12 April 2003). After careful study of the data, we transformed it into a format which could be used to fit the Galton-Watson branching process.
Methodology Study the links between the SARS-affected patients and identify the generation each belongs to. For example, z(0) is the initial number of patients, z(1) the next generation, and so on. Hence z(0) is the parent and z(1) is the offspring for the first generation. Similarly, z(1) is the parent and z(2) is the offspring for the second generation. The parents are the infectives and the offspring are the infections.
Methodology (Cont.)
Calculate the following probabilities:
• p(0): probability of 0 persons infected
• p(1): probability of 1 person infected
• p(2): probability of 2 persons infected
• p(3): probability of 3 persons infected
• p(4): probability of 4 or more persons infected (super spreader)
Determine the time period and fit the Galton-Watson branching process.
Generation Size Z(n) denotes the size of generation n, for generations Z(0) to Z(5). These have been colour-banded to show clearly the number of offspring at each point. For example, Z(4) = 17.
The population size of each generation is:
• Z(0) = 1 (1 female)
• Z(1) = 25 (14 females + 11 males)
• Z(2) = 36 (21 females + 15 males)
• Z(3) = 72 (46 females + 26 males)
• Z(4) = 17 (10 females + 7 males)
• Z(5) = 6 (4 females + 2 males)
• Total = 157 (96 females + 61 males)
61.1% of those infected with SARS are female (96 of 157) and 38.9% are male (61 of 157).
Probability Calculation
• p(0): probability of 0 persons infected = 0.8344
• p(1): probability of 1 person infected = 0.0927
• p(2): probability of 2 persons infected = 0.01986
• p(3): probability of 3 persons infected = 0.01986
• p(4): probability of 4 or more persons infected (super spreader) = 0.0331
The mean of the offspring distribution is 1.0331.
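One way to arrive at an offspring mean from generation sizes is a ratio-type estimator: total offspring divided by total parents. The sketch below applies it to the generation sizes Z(0) to Z(5) reported in the slides. That the authors used this particular estimator is an assumption; the function name `ratio_estimate` is hypothetical.

```python
# Generation sizes Z(0)..Z(5) as reported in the slides.
sizes = [1, 25, 36, 72, 17, 6]

def ratio_estimate(generation_sizes):
    """Ratio-type estimate of m: (Z(1)+...+Z(n)) / (Z(0)+...+Z(n-1))."""
    parents = sum(generation_sizes[:-1])    # every generation that could reproduce
    offspring = sum(generation_sizes[1:])   # every generation that was produced
    return offspring / parents

m_hat = ratio_estimate(sizes)
print(round(m_hat, 4))  # 156 / 151, which rounds to 1.0331
```

The result agrees with the quoted mean of 1.0331, which is consistent with, though not proof of, a ratio estimator having been used.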
Software To model the SARS epidemic we use a Java program which simulates a single-type BP and computes the extinction probabilities. In this program we specify the offspring distribution of the BP and "Maximum generations", the number of generations over which we wish to observe the BP. The program computes and displays the probabilities that the branching process will die out by generation g, for g = 1 to Maximum Generations. Source: Written by Julian Devlin, 8/97, for the textbook “Introduction to Probability” by Charles M. Grinstead & J. Laurie Snell.
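The computation described above can be sketched in Python by iterating the offspring generating function, d(g) = f(d(g-1)) with d(0) = 0. This is not the Java program cited; the function name `extinction_by_generation` is hypothetical. The sketch also assumes that the "4 or more" class is treated as exactly 4 offspring and renormalises the rounded probabilities so they sum to 1. Under that truncation the offspring mean is about 0.32 rather than the 1.0331 quoted, so the example illustrates the recursion only.

```python
def extinction_by_generation(offspring_probs, max_generations):
    """Return [d(1), ..., d(max_generations)], where d(g) is the probability
    that a process started from one individual has died out by generation g."""
    total = sum(offspring_probs)
    probs = [p / total for p in offspring_probs]  # guard against rounding in the inputs

    def pgf(s):
        # Probability generating function f(s) = sum_k p_k * s**k.
        return sum(p * s ** k for k, p in enumerate(probs))

    result = []
    d = 0.0
    for _ in range(max_generations):
        d = pgf(d)  # d(g) = f(d(g-1))
        result.append(d)
    return result

# Probabilities from the slides; the last class is ASSUMED to mean exactly 4.
sars_probs = [0.8344, 0.0927, 0.01986, 0.01986, 0.0331]
by_generation = extinction_by_generation(sars_probs, 30)
for g, d in enumerate(by_generation[:5], start=1):
    print(g, round(d, 4))
```

Because the truncated distribution is sub-critical, the sequence d(g) increases towards 1.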
Probability of extinction We set the maximum generations to 30 and the results are:
Some Conclusions The probability that the SARS epidemic will eventually become extinct is 1. This is likely to happen in the 14th generation. Since this data has already passed through 5 generations, there can be at most 9 more generations. Assuming each generation takes a maximum of 10 days, the given data imply that the epidemic will last at most 90 more days from 8 April 2003. This result is conditional upon the environment and quarantine conditions remaining the same.
Other related work @ IHPC
• Auto-Regressive (AR) model
• Assumptions
• Every time series consists of both deterministic and stochastic components.
• The deterministic component gives rise to trends, seasonal patterns and cycles.
• The stochastic component causes statistical fluctuations which have a short-term correlation structure.
Auto-Regressive (AR) model
• Methodology
• Step 1: determine the maximum number of the sample data
• Step 2: calculate the mean value of the sample data for previous time
• Step 3: estimate the unknown parameters from historical data
• Step 4: use the estimated parameters to predict future case numbers
• Software
• In-house software written in FORTRAN has been developed. It is compatible with Windows and UNIX systems.
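As a rough illustration of steps 2 to 4, the sketch below fits a first-order model x(t) = c + phi * x(t-1) by ordinary least squares and iterates it forward. The slides do not state the order of the AR model or the estimation method, so AR(1) and least squares are assumptions; the function names and the short series are hypothetical, and this is not the in-house FORTRAN software.

```python
def fit_ar1(series):
    """Least-squares fit of x(t) = c + phi * x(t-1); returns (c, phi)."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n   # sample means of lagged and current values
    mean_curr = sum(current) / n
    sxx = sum((a - mean_prev) ** 2 for a in previous)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(previous, current))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast(series, c, phi, steps):
    """Iterate the fitted recursion `steps` times from the last observation."""
    value = series[-1]
    predictions = []
    for _ in range(steps):
        value = c + phi * value
        predictions.append(value)
    return predictions

# Hypothetical series that follows x(t) = 2 + 0.5 * x(t-1) exactly.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
print(c, phi)
print(forecast(series, c, phi, steps=2))  # two-step-ahead prediction
```

A two-day or three-day prediction then corresponds to `steps=2` or `steps=3`.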
Auto-Regressive (AR) model
• Result: two-day prediction
• Use the previous data to predict the data two days later
[Figure: number of patients by day, starting from Mar 16, with two-day prediction]
Auto-Regressive (AR) model
• Result: three-day prediction
• Use the previous data to predict the data three days later
[Figure: number of patients by day, starting from Mar 16, with three-day prediction]
Future Research … A Time Series approach to the study of a Branching Process
Motivation: Venkataraman, K.N. (1982) A Time Series approach to the study of the simple subcritical Galton-Watson process with immigration, Adv. Appl. Prob., 14, 1-20.
Let $\varepsilon(t) = 0$ for $t < 0$; $\varepsilon(0) = X(0)$; and for $t \ge 1$,
\[
\varepsilon(t) = X(t) - m\,X(t-1) - \lambda .
\]
Heyde and Seneta (1972) were the first to observe that the above equation is analogous to the first-order autoregressive model for time series.
Vital difference: in a BP, $\varepsilon(t)$ is determined by $X(t)$, whereas in the analogous time series model $X(t)$ is determined in terms of $\varepsilon(t)$.
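The analogy can be made explicit with a short worked step. It assumes that $\lambda$ denotes the mean of the immigration component, $\lambda = E[Y(t)]$, which the slide does not state; $\phi$, $c$ and $e_t$ are generic AR(1) symbols introduced here for comparison.

```latex
\begin{aligned}
E[X(t) \mid X(t-1)] &= m\,X(t-1) + \lambda, \\
X(t) &= m\,X(t-1) + \lambda + \varepsilon(t), \qquad E[\varepsilon(t) \mid X(t-1)] = 0, \\
\text{AR(1):} \quad X_t &= \phi\,X_{t-1} + c + e_t .
\end{aligned}
```

Under that assumption m plays the role of the autoregressive coefficient $\phi$ and $\lambda$ that of the intercept $c$, with $\varepsilon(t)$ a zero-mean residual given the past.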