
Space-Time Modeling and Application to Emerging Infectious Diseases

National Health Research Institutes. Space-Time Modeling and Application to Emerging Infectious Diseases. 李正宇. July 26th, 2005. Division of Biostatistics and Bioinformatics. Outline: Introduction; STARMA Models; Methods for STARMA Modeling and Software IEAST.


Presentation Transcript


  1. National Health Research Institutes Space-Time Modeling and Application to Emerging Infectious Diseases 李正宇 July 26th, 2005 Division of Biostatistics and Bioinformatics

  2. Outline • Introduction • STARMA Models • Methods for STARMA Modeling and Software IEAST • Modeling Emerging Infectious Diseases using STARMA and IEAST • Conclusion

  3. Introduction

  4. Introduction • Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.”

  5. Introduction • Biological and ecological processes are often organized and correlated in both space and time. • Why use space-time data and space-time analyses? • Various space-time models: STKF, KKF, VARMA, STARMA, etc. • Why STARMA models? • Are emerging infectious diseases the only application?

  6. Scope of the Work • An efficient and robust STARMA modeling method • Space-time extensions of optimization algorithm and model fitness measures • Refinement of the space-time modeling procedure • Software development -- IEAST • The first general-purpose STARMA modeling and analysis software • Integrated Environment for Analyzing STARMA models • Application to the spread of WNV in an epidemic in Detroit • Modeling and analysis of Dead Crow Data • Modeling and analysis of Human Case Data • Cross analysis of Human Case Data and Dead Crow Data • Statistical inferences from these space-time analyses

  7. STARMA Models

  8. Space-Time Variables Evolving over Time • $z_{t,\mathbf{x}}$: some ecological variable at spatial coordinate vector $\mathbf{x}$ at time $t$; for each location $\mathbf{x}$, $z_{\mathbf{x}}$ forms a time series. • These time series are not independent; they influence each other via spatial proximity. [Figure: a 2D lattice (axes X, Y) shown at time $t$ and time $t-1$, with cells such as $z_{t,(0,0)}$, $z_{t,(1,2)}$, $z_{t,(2,1)}$, $z_{t,(2,2)}$ and a random-noise input.]

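  9. General STARMA Models • The general STARMA model has the stochastic equation: $z_{t,\mathbf{x}} = \sum_{k}\sum_{\mathbf{b}} \phi_{k,\mathbf{b}}\, z_{t-k,\mathbf{x}-\mathbf{b}} - \sum_{k}\sum_{\mathbf{b}} \theta_{k,\mathbf{b}}\, e_{t-k,\mathbf{x}-\mathbf{b}} + e_{t,\mathbf{x}}$, where the first double sum collects the AR terms and the second the MA terms. • Model types: • STAR model (when all $\theta_{k,\mathbf{b}} = 0$) • STMA model (when all $\phi_{k,\mathbf{b}} = 0$) • Mixed model (when some $\phi_{k,\mathbf{b}} \neq 0$ and some $\theta_{k,\mathbf{b}} \neq 0$). The strengths of the autoregressive components are measured by $\phi_{k,\mathbf{b}}$, and the strengths of the shared moving-average stochastic inputs by $\theta_{k,\mathbf{b}}$.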

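  10. A Useful Form for STARMA Modeling • By introducing the spatial weight matrices $W^{(l)}$, we can express the general STARMA model in the following form: $\mathbf{z}_t = \sum_{k=1}^{p}\sum_{l=0}^{\lambda_k} \phi_{kl}\, W^{(l)} \mathbf{z}_{t-k} - \sum_{k=1}^{q}\sum_{l=0}^{m_k} \theta_{kl}\, W^{(l)} \mathbf{e}_{t-k} + \mathbf{e}_t$ • This is the equation actually used in the implementation of IEAST and in the applications, where $l$ is the spatial lag and $k$ the temporal lag; $\mathbf{z}_t$ is the observation vector at time $t$; $W^{(l)}$ is the weight matrix of $l$-th order (with $W^{(0)} = I$); $\phi_{kl}$ are the parameters of the autoregressive terms; $\theta_{kl}$ are the parameters of the moving-average terms; $\mathbf{e}_t$ is the random noise vector at time $t$; $p$ and $q$ are the temporal orders, and $\lambda_k$ and $m_k$ the spatial orders, of the AR and MA parts.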
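To make the $W$-form concrete, here is a minimal sketch that evaluates the right-hand side for a single time step. It assumes plain Python lists for vectors and matrices, `W[0]` equal to the identity, and coefficients stored in dictionaries keyed by `(k, l)`; the names `matvec` and `starma_step` are hypothetical and are not part of IEAST.

```python
def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def starma_step(z_hist, e_hist, e_t, W, phi, theta):
    """Return z_t from the W-form STARMA equation (hypothetical helper).

    z_hist[k - 1] holds z_{t-k} and e_hist[k - 1] holds e_{t-k};
    W[l] is the l-th order weight matrix, with W[0] the identity;
    phi[(k, l)] and theta[(k, l)] are the AR and MA coefficients.
    """
    z_t = list(e_t)  # current noise term e_t
    for (k, l), coef in phi.items():  # + phi_kl W^(l) z_{t-k}
        for i, val in enumerate(matvec(W[l], z_hist[k - 1])):
            z_t[i] += coef * val
    for (k, l), coef in theta.items():  # - theta_kl W^(l) e_{t-k}
        for i, val in enumerate(matvec(W[l], e_hist[k - 1])):
            z_t[i] -= coef * val
    return z_t


# Three sites on a line; W1 averages each site's neighbours.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W1 = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
z = starma_step([[1.0, 2.0, 3.0]], [], [0.0, 0.0, 0.0], [I3, W1],
                {(1, 0): 0.5, (1, 1): 0.3}, {})
print(z)  # approximately [1.1, 1.6, 2.1]
```

Note the minus sign on the MA terms, which follows the equation above; a STAR model simply passes an empty `theta`.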

  11. Spatial Correlation Structure and Weight Matrices • Spatial weight matrices are used to construct the spatial correlation structure among locations. • The following ordering is an example of the definition of spatial correlation structure (up to 4th-order neighbors) in a 2D system. [Figure panels: 1st order, $W^{(1)}$; 2nd order, $W^{(2)}$; 3rd order, $W^{(3)}$; 4th order, $W^{(4)}$.]
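The exact neighbor ordering in the figure is not recoverable here, so the sketch below assumes one common convention: the $l$-th order neighbors of a cell are the cells at the $l$-th smallest distinct nonzero Euclidean distance on the lattice (for an interior cell, orders 1 to 4 are the 4 edge neighbors, the 4 diagonal neighbors, the 4 cells two steps away along an axis, and the 8 "knight-move" cells), and each row of $W^{(l)}$ is scaled to sum to 1 (uniform weighting). The function name `lattice_weights` is hypothetical.

```python
def lattice_weights(rows, cols, max_order):
    """Uniform weight matrices W^(0), ..., W^(max_order) for a rows x cols lattice.

    Assumed ordering: l-th order neighbours are the cells at the l-th smallest
    distinct non-zero squared Euclidean distance (1, 2, 4, 5, ... on a square grid).
    """
    n = rows * cols
    coords = [divmod(i, cols) for i in range(n)]  # cell i -> (row, col)
    sq_dists = sorted({dr * dr + dc * dc
                       for dr in range(rows) for dc in range(cols)} - {0})
    order_of = {d: l + 1 for l, d in enumerate(sq_dists[:max_order])}

    W = [[[0.0] * n for _ in range(n)] for _ in range(max_order + 1)]
    for i, (ri, ci) in enumerate(coords):
        W[0][i][i] = 1.0  # W^(0) is the identity
        for j, (rj, cj) in enumerate(coords):
            l = order_of.get((ri - rj) ** 2 + (ci - cj) ** 2)
            if l is not None:
                W[l][i][j] = 1.0
    # Uniform weighting: each row of W^(l), l >= 1, sums to 1 (if it has neighbours).
    for l in range(1, max_order + 1):
        for row in W[l]:
            total = sum(row)
            if total > 0:
                row[:] = [w / total for w in row]
    return W
```

For example, `lattice_weights(10, 10, 4)` would give five 100 × 100 matrices for a 10 × 10 grid; boundary cells simply have fewer neighbors, each weighted equally.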

  12. Some Limitations of STARMA Modeling • Raster based • Requires a massive amount of space-time data • Models generally may not be fully mechanistic. Assumptions: • Stationarity • “Spatial regularity” • Effects are “constant” • Effects are “linearly” correlated

  13. Methods for STARMA Modeling and Software IEAST

  14. Box-Jenkins Modeling Method [Flowchart: Data → Model Identification → Parameter Estimation → Diagnostic Check → Good? Yes: End. No: Modify Model and repeat.]

  15. Model Identification • To determine the model type and orders. • Conventionally, space-time autocorrelations (i.e. STACF/STPACF) are used (Pfeifer and Deutsch, 1980). • In this research, space-time extensions of model fitness measures (i.e. AIC, BIC) are used to assist identification when the method above does not work. These measures are more objective and computationally efficient.

  16. Model Identification using Space-Time Autocorrelation Functions • Example 1: STAR (MaxT=2, MaxS=1) • STACF tails off • STPACF cuts off at T-lag=2 & S-lag=1 • Example 2: STMA (MaxT=1, MaxS=1) • STACF cuts off at T-lag=1 & S-lag=1 • STPACF tails off [Figure: STACF and STPACF plots for each example.]
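The cut-off and tail-off patterns are read from sample space-time autocorrelations. One common form of the estimator between the $l$-th order neighbors and the site itself at temporal lag $s$, in the style of Pfeifer and Deutsch (the exact estimator in IEAST may differ), is $\hat\rho_{l0}(s) = \dfrac{\frac{1}{N(T-s)}\sum_{t=1}^{T-s} (W^{(l)}\mathbf{z}_t)^\top \mathbf{z}_{t+s}}{\sqrt{\frac{1}{NT}\sum_{t=1}^{T}(W^{(l)}\mathbf{z}_t)^\top W^{(l)}\mathbf{z}_t \cdot \frac{1}{NT}\sum_{t=1}^{T}\mathbf{z}_t^\top\mathbf{z}_t}}$, with $N$ sites and $T$ time points. A minimal sketch for mean-removed data follows; `sample_stacf` is a hypothetical name, not IEAST's `stacf` command.

```python
def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_stacf(Z, W, l, s):
    """Sample space-time autocorrelation rho_l0(s) (hypothetical helper).

    Z: list of T mean-removed observation vectors, each of length N.
    W: list of weight matrices with W[0] the identity.
    l: spatial lag; s: temporal lag (0 <= s < T).
    """
    T, N = len(Z), len(Z[0])
    WZ = [matvec(W[l], z) for z in Z]  # W^(l) z_t for every t
    cov = sum(dot(WZ[t], Z[t + s]) for t in range(T - s)) / (N * (T - s))
    var_l = sum(dot(wz, wz) for wz in WZ) / (N * T)
    var_0 = sum(dot(z, z) for z in Z) / (N * T)
    return cov / (var_l * var_0) ** 0.5
```

Tabulating `sample_stacf(Z, W, l, s)` over a grid of spatial lags `l` and temporal lags `s` gives the kind of STACF table that the identification rules above are applied to.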

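  17. Model Identification using Space-Time Autocorrelation Functions • Simulation Data 1, based on a STAR process: $\mathbf{z}_t = 0.50\,\mathbf{z}_{t-1} + 0.30\,W^{(1)}\mathbf{z}_{t-1} + 0.10\,\mathbf{z}_{t-2} + 0.05\,W^{(1)}\mathbf{z}_{t-2} + \mathbf{e}_t$. The STACF tails off and the STPACF cuts off. • Simulation Data 2, based on a STMA process: $\mathbf{z}_t = \mathbf{e}_t - (-0.6)\,\mathbf{e}_{t-1} - (-0.4)\,W^{(1)}\mathbf{e}_{t-1}$. The STACF cuts off and the STPACF tails off. [Figure: STACF and STPACF plots for each simulated dataset.]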
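A minimal sketch of how data like Simulation Data 1 could be generated. It assumes a rook (edge-neighbor) $W^{(1)}$ with uniform weights, standard Gaussian noise, and a burn-in period to forget the zero start; the lattice size, seed, and function names are illustrative, and the slides do not state how the original datasets were simulated.

```python
import random


def rook_neighbours(rows, cols):
    """Indices of the edge-adjacent cells of every cell (cell i = r * cols + c)."""
    nbrs = []
    for r in range(rows):
        for c in range(cols):
            cells = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            nbrs.append([rr * cols + cc for rr, cc in cells])
    return nbrs


def spatial_lag(nbrs, v):
    """W^(1) v with uniform weights: the mean of each cell's neighbours."""
    return [sum(v[j] for j in js) / len(js) for js in nbrs]


def simulate_star(rows, cols, T, burn_in=100, seed=0):
    """Simulate z_t = 0.50 z_{t-1} + 0.30 W z_{t-1} + 0.10 z_{t-2} + 0.05 W z_{t-2} + e_t.

    Requires a lattice with at least two cells so every cell has a neighbour.
    """
    rng = random.Random(seed)
    nbrs = rook_neighbours(rows, cols)
    n = rows * cols
    z1 = [0.0] * n  # z_{t-1}
    z2 = [0.0] * n  # z_{t-2}
    series = []
    for t in range(burn_in + T):
        w1, w2 = spatial_lag(nbrs, z1), spatial_lag(nbrs, z2)
        z = [0.50 * z1[i] + 0.30 * w1[i] + 0.10 * z2[i] + 0.05 * w2[i]
             + rng.gauss(0.0, 1.0) for i in range(n)]
        z1, z2 = z, z1  # shift the lags
        if t >= burn_in:  # discard the burn-in period
            series.append(z)
    return series
```

The coefficients sum to 0.95, and a row-normalized $W^{(1)}$ has eigenvalues in $[-1, 1]$, so this particular process is stationary and the burn-in is enough to reach its steady state.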

  18. Model Identification using Model Fitness Measures • Accuracies (diagonal entries) of model-type selection using (1) variance of residuals, (2) AIC, (3) BIC, and (4) -AIC*BIC, based on 150 Monte Carlo simulated datasets. Rows give the model the datasets were based on; columns give the model type identified.

(1) Using variance of residuals
Dataset    STAR   STMA   Mixed
STAR         4%     0%     96%
STMA         4%     6%     90%
Mixed        8%     6%     90%

(2) Using AIC
Dataset    STAR   STMA   Mixed
STAR        16%     0%     84%
STMA         4%     6%     90%
Mixed        8%     2%     90%

(3) Using BIC
Dataset    STAR   STMA   Mixed
STAR       100%     0%      0%
STMA         4%    86%     10%
Mixed       18%    16%     66%

(4) Using -AIC*BIC
Dataset    STAR   STMA   Mixed
STAR       100%     0%      0%
STMA         4%    78%     18%
Mixed       16%     4%     80%
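The slides do not spell out the space-time AIC/BIC formulas. A natural extension, assumed here, treats all $NT$ residuals as the sample: $\mathrm{AIC} = NT\ln\hat\sigma^2 + 2K$ and $\mathrm{BIC} = NT\ln\hat\sigma^2 + K\ln(NT)$, where $\hat\sigma^2$ is the mean squared residual and $K$ the number of estimated parameters. The sketch computes both and picks the candidate with the smallest BIC; the function names are hypothetical.

```python
import math


def st_information_criteria(residuals, n_params):
    """Space-time AIC and BIC from residuals given as T rows of N values.

    Assumed forms: AIC = NT*ln(s2) + 2K and BIC = NT*ln(s2) + K*ln(NT),
    where s2 is the mean squared residual over all N*T values.
    """
    flat = [e for row in residuals for e in row]
    m = len(flat)  # N * T
    s2 = sum(e * e for e in flat) / m
    aic = m * math.log(s2) + 2 * n_params
    bic = m * math.log(s2) + n_params * math.log(m)
    return aic, bic


def pick_by_bic(candidates):
    """candidates: {label: (residuals, n_params)}; return the label with the smallest BIC."""
    return min(candidates, key=lambda name: st_information_criteria(*candidates[name])[1])
```

Because the BIC penalty grows with $\ln(NT)$ while the AIC penalty is fixed at 2 per parameter, BIC favors smaller models on large space-time datasets.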

  19. Parameter Estimation • To calculate the coefficients of a candidate model for a given model type and orders. • Two methods are needed for the two kinds of models: • Linear models (i.e. STAR): linear ML estimator. • Non-linear models (i.e. STMA and Mixed): multivariate nonlinear optimization. • The multivariate, non-linear nature raises problems in optimization: • convergence to local optima • very time-consuming • a good starting point is crucial • Hence an extra "pre-estimation" step, for which a space-time extension of the Hannan-Rissanen algorithm is used.
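For the linear (STAR) case, maximizing the Gaussian likelihood conditional on the first observations reduces to least squares on lagged values. The sketch below assumes a STAR(1,1) model, $\mathbf{z}_t = \phi_{10}\mathbf{z}_{t-1} + \phi_{11}W^{(1)}\mathbf{z}_{t-1} + \mathbf{e}_t$, pools all sites and times, and solves the 2×2 normal equations directly; higher orders add regressors in the same way. It does not cover the non-linear STMA/mixed case or the Hannan-Rissanen pre-estimation step, and `fit_star11` is a hypothetical name.

```python
def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def fit_star11(Z, W1):
    """Conditional least-squares estimates of (phi_10, phi_11) for
    z_t = phi_10 z_{t-1} + phi_11 W1 z_{t-1} + e_t, pooled over all sites.
    With Gaussian noise this equals conditional maximum likelihood.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(Z)):
        x1 = Z[t - 1]              # temporal lag 1, spatial lag 0
        x2 = matvec(W1, Z[t - 1])  # temporal lag 1, spatial lag 1
        for a, b, y in zip(x1, x2, Z[t]):
            s11 += a * a
            s12 += a * b
            s22 += b * b
            r1 += a * y
            r2 += b * y
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] phi = [r1, r2].
    det = s11 * s22 - s12 * s12
    phi10 = (r1 * s22 - r2 * s12) / det
    phi11 = (s11 * r2 - s12 * r1) / det
    return phi10, phi11
```

Because the model is linear in the parameters, no iterative optimizer or starting point is needed here, which is why only the non-linear models call for the pre-estimation step.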

  20. Diagnostic Check • To decide the adequacy of a candidate model for representing the given data. • Methods: • Variance of residuals • Space-time autocorrelations of residuals • Significance testing of parameters • Space-time extension of AIC/BIC

  21. Modeling Procedures (Box-Jenkins method) [Flowchart: Data → Model Identification → Parameter Estimation → Diagnostic Check → Good? Yes: End. No: Modify Model and repeat.]

  22. Software for STARMA Modeling: IEAST • Developed using GNU Octave v2.1.40; runs under various popular operating systems, e.g. MS Windows, Mac OS, Unix. • Two interfaces: menu-driven mode and programming mode. • Features: • True spatio-temporal analysis software • Analyzing 2D lattice space-time datasets • Full configurability • Programming environment • Improved estimation algorithms • Improved diagnostic measures • Estimation of spatial correlation structure • Cross-correlation analysis • 2D/3D plotting abilities

  23. IEAST: Menu-Driven Mode vs Programming Mode. In menu-driven mode, users can conduct the modeling procedure by selecting a series of commands/options from the menu hierarchy.

[IEAST v1.30.01 - STARMA Modeling & Analysis]
=============== [ Main Menu ] ===============
[ 1] Setup
[ 2] Data Preprocessing
[ 3] Correlation Analyses
[ 4] Model Identification
[ 5] Parameter Estimation
[ 6] Diagnostic Analysis
[ 7] ------
[ 8] Preference
[ 9] Interpreter
[10] Exit
=============================================

========= [ Data Preprocessing ] =========
[ 1] > Remove Mean
[ 2] > De-seasonalize: (1-B^dd)Z(t)
[ 3] > Difference by one: (1-B)Z(t)
[ 4] > De-trend
[ 5] > ------
[ 6] > Subsequencing/Resampling
[ 7] > Smoothing
[ 8] > Missing Data
[ 9] > Filter with a given STARMA model
[10] > Undo previous action
[11] > Return
==========================================

========== [ Correlation Analyses ] ==========
[ 1] > AutoCorrelation (STACF)
[ 2] > Partial AutoCorrelation (STPACF)
[ 3] > Cross Correlation (STXCF)
[ 4] > Partial Cross Correlation (STPXCF)
[ 5] > Extended Cross Correlation (ExtSTXCF)
[ 6] > Plot Correlations versus T-Lag/S-Lag
[ 7] > Return
==============================================

==============================================
[ Model Identification ]
[ 1] Automatic Identification (Type,Orders)
[ 2] Artificial Identification (Type,Orders)
[ 3] Parameter Masking
[ 4] ------
[ 5] Return
==============================================

=================== [ Parameter Estimation ] ===================
[ 1] > Pre-estimate Model Param -- Linear (STAR)
[ 2] > Pre-estimate Model Param -- Non-linear (STMA,STARMA)
[ 3] > Pre-estimate Model Param -- From STACF/STPACF
[ 4] > Pre-estimate Model Param -- Specified by users
[ 5] > Estimate Model Param -- Fixed SRM
[ 6] > Estimate SRM -- Fixed Model Param
[ 7] > Estimate SRM & Model Param -- Alternatively
[ 8] > Return
================================================================

==== [ Diagnostic Analysis ] ====
[ 1] > Statistical Significance
[ 2] > AICC/BIC Analysis
[ 3] > STACF of Residuals
[ 4] > STPACF of Residuals
[ 5] > ------
[ 6] > Return
=================================

============== [ Setup ] ==============
[ 1] > Space-time dataset
[ 2] > Spatial correlation structure
[ 3] > Information of datasets
[ 4] > Return
=======================================

# list
10 load data demo.dat
20 load weight uniform.wet
30 stacf ST_ACF Z 16 3
40 plotacf ST_ACF 16 3 "ACF"
: : :

  24. IEAST: Menu-Driven Mode vs Programming Mode. In programming mode, a set of sophisticated instructions can be used to compose programs to control the modeling flow and to conduct statistical analyses.

Space-time dataset: ‘demo.dat’
# name: DatafileZ
# type: matrix
# rows: 100
# columns: 100
-0.0350001 0.00197952 -0.00635348 ....
-0.0886448 0.0504684 -0.00369402 ....
0.025101 0.00844576 -0.00743455 ....
…………………..

[IEAST v1.30.01 - STARMA Modeling & Analysis]
=============== [ Main Menu ] =============
[ 1] Setup
: :
[ 8] Preference
[ 9] Interpreter
[10] Exit
=============================================

Spatial weighting matrices: ‘uniform.wet’
# name: SOD
# type: global matrix
# rows: 21
# columns: 21
0 0 0 0 0 0 0 0 ….
0 0 0 0 0 0 0 0 ….
……………….

=============================================
|| Welcome to STARMA analyzing interpreter ||
=============================================
# load program demo.pgm
# list
10 load data demo.dat
20 load weight uniform.wet
30 stacf STACF Z 16 3
………
100 end
# run

IEAST program ‘demo.pgm’
10 load data demo.dat
20 load weight uniform.wet
30 stacf STACF Z 16 3
…….

  25. Modeling Emerging Infectious Diseases using STARMA and IEAST

  26. State of the Art in Statistical Analyses of Emerging Infectious Diseases • As far as we know, no true spatio-temporal statistical models and methods have been used. • Space-time cluster analyses are available (Theophilides et al., 2003; Mostashari et al., 2003; Hoebe et al., 2004). • Spatial models are available (Watson et al., 2004). • Temporal models are available.

  27. Limitations of Simply Observing How a Spatial Distribution Changes over Time • For example, expansion of the leading edge of a disease range. • Is the disease spreading directly over long distances but infrequently, or over short distances frequently? • This is important for projecting the future spread.

  28. STARMA Has Potential for the Early Characterization of Infectious Diseases • STARMA acts as a “prism”: it can filter the spatio-temporal correlations into direct effects with known magnitudes and known spatial and temporal lags. • It is not generally a complete, mechanistic model, but it puts critical constraints on models.

  29. West Nile Virus The West Nile Virus (WNV) was first detected in a woman with a mild fever in the West Nile District of Uganda in 1937. Since then WNV has been spreading to North Africa, Europe, West and Central Asia, and the Middle East.

  30. West Nile Virus in the United States [Figure from the CDC web site] • Outbreak in NYC in Sep 1999; the vector is Culex mosquitoes. • Wild birds (89% of them American crows) are the principal hosts; humans, horses, etc. are incidental hosts. • The incidence rate among crows is high, and most infected crows die (68%). • Surveillance of dead crows has been used as an indicator of a WNV epidemic.

  31. Dead Crow Data (DCD) & Human Case Data (HCD) in 2002 • Time: summer 2002 (April to October) • Place: Detroit metro area (Oakland, Macomb, and Wayne counties) • DCD were collected systematically before and during an outbreak among humans. The data mainly consisted of the locations and dates of reported public sightings. • HCD were obtained from clinicians in Michigan. Data on residential address and date of disease onset were obtained from the case-patient or attending physician through telephone interviews.

  32. Two Datasets Collected in 2002 [Diagram components: Human Cases; Interview; Dead Crows*; Toll-free #; WWW pages; Data Cleaning & Geocoding; Longitude/Latitude; GIS - ArcMap] * From www.rci.rutgers.edu/~insects/crowid.htm

  33. Space-Time Analysis for Dead Crow Data

  34. The Dead Crow Data • In total, 1,817 dead crow sightings scattered within the three counties (outlined in red on the map), spanning 28 weeks. • Covered area (after truncation): a rectangle of 31.6 × 25.8 mi. • The covered area is divided into 10 × 10 cells; cell size: 3.16 × 2.58 mi.
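The gridding step can be sketched as below, assuming sightings have already been projected to (x, y) distances in miles from the lower-left corner of the truncated rectangle and that weeks are numbered from 0. The slides do not describe the projection, and `bin_sightings` is a hypothetical helper.

```python
def bin_sightings(sightings, width, height, nx=10, ny=10, n_weeks=28):
    """Count point events per (week, row, col) cell.

    sightings: iterable of (x, y, week) with x in [0, width), y in [0, height)
    in miles from the lower-left corner, and week in 0..n_weeks-1.
    Returns counts[week][row][col]; points outside the area are dropped.
    """
    cw, ch = width / nx, height / ny  # cell size, e.g. 3.16 x 2.58 mi
    counts = [[[0] * nx for _ in range(ny)] for _ in range(n_weeks)]
    for x, y, week in sightings:
        col, row = int(x // cw), int(y // ch)
        if 0 <= col < nx and 0 <= row < ny and 0 <= week < n_weeks:
            counts[week][row][col] += 1
    return counts
```

Flattening each `counts[week]` row by row gives one observation vector $\mathbf{z}_t$ of length 100 per week, matching a 10 × 10 lattice.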

  35. Spatial Correlation Structure and Trends • Spatial correlation structure (uniform weighting) • Preprocessing • Remove spatio-temporal trend • Spatial trend: 4th order polynomial regression trend surface • Temporal trend: averaging over space. • Remove mean
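A sketch of the de-trending steps, under stated assumptions: the spatial trend is a least-squares polynomial surface (total degree 4 by default) fitted to the time-averaged map, and the temporal trend is the spatial average at each time step. The slide does not say in what order IEAST applies these steps or to which map the surface is fitted, and `detrend`, `poly_terms`, and `solve` are hypothetical names. Because each time slice ends up with zero spatial mean, the final "remove mean" step is already implied in this sketch.

```python
def poly_terms(u, v, degree):
    """Monomials u**a * v**b with a + b <= degree."""
    return [u ** a * v ** b for a in range(degree + 1) for b in range(degree + 1 - a)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def detrend(Z, rows, cols, degree=4):
    """Remove a polynomial trend surface and the temporal trend from Z.

    Z: list of T vectors of length rows * cols (cell index = r * cols + c).
    """
    T, n = len(Z), rows * cols
    mean_map = [sum(z[i] for z in Z) / T for i in range(n)]  # time-averaged map
    # Coordinates scaled to [-1, 1] keep the normal equations well conditioned.
    X = [poly_terms(2 * (i // cols) / max(rows - 1, 1) - 1,
                    2 * (i % cols) / max(cols - 1, 1) - 1, degree) for i in range(n)]
    p = len(X[0])
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    Xty = [sum(x[a] * y for x, y in zip(X, mean_map)) for a in range(p)]
    beta = solve(XtX, Xty)
    surface = [sum(xa * ba for xa, ba in zip(x, beta)) for x in X]  # spatial trend
    out = []
    for z in Z:
        resid = [zi - si for zi, si in zip(z, surface)]
        level = sum(resid) / n  # temporal trend: average over space at time t
        out.append([r - level for r in resid])
    return out
```

A degree-4 surface in two variables has 15 coefficients, so the grid needs well over 15 cells; the 10 × 10 lattice used here satisfies that comfortably.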

  36. Model Identification: STACF • The STACF tails off.

  37. Model Identification: STPACF • The STPACF cuts off spatially after one lag and temporally after another (both marked on the plot). • The STACF/STPACF suggest the model STAR(maxT=3, maxS=4).

  38. Parameter Estimation • The parameters ($\phi$'s) of this STAR model can be estimated in IEAST by the linear maximum likelihood estimator. • Values in dark blue are nominally significant at the 0.001 level. • Values in light blue are nominally significant at the 0.01 level.

  39. Diagnostic Check • Statistical significance of parameters: the probabilities P that the $\phi$'s are not significant • Residuals' autocorrelations [Figure panels: STACF, STPACF]

  40. Interpretations for the DCD Analysis • The STAR(3,4) model is the best-fitting one. • The maximum spatial and temporal lags that are important are smaller than the model orders: S=2 (or 6.4 km) and T=2 weeks. • Comparing S=1 with S=2, the value for S=1 is much larger, reflecting cell-boundary-length effects. • The virus is not spreading very far very fast. Crows are not spreading the virus much spatially, though they probably are amplifying it locally. • Negative autoregressive effect at S=1 and T=2, 3: • it appears to be a real effect; • it may be due to crow population depletion; • it suggests a mixture of two STAR processes, the dominant one reflecting the probability of infection, the other an echo effect from depletion.

  41. Additional Analyses and Results • Additional analyses: • using 20x20 and other cell configurations • using different lag structures (“Pfeifer’s” vs. “ring structure”) • using various polynomials for spatial de-trending • using sub-samples of the data • Results: • consistent over various methods of spatial de-trending, except that higher-order polynomials resulted in smaller AR values • consistent AR values across different lag structures and cell sizes • consistent implied spatial and temporal scales over which there are significant or substantial AR effects

  42. Distances at Which There Are Significant Spatial Correlations • Based on different cell configurations: 10x10, 16x16, and 20x20 • The effective correlation distance in the modeling results is consistently about 10.75 km regardless of cell size.

  43. Alternative Spatial Correlation Structures [Figure panels: Pfeifer’s; Ring structure]

  44. Space-Time Analysis for Human Case Data

  45. Human Case Data • Over 500 human cases spanning 13 weeks • Date of onset converted to week • Home addresses (names stripped) converted to cells, as for the DCD • Used the same arrays of cell sizes and spatial correlation structures as for the DCD • Same spatial and temporal de-trending method

  46. Model Identification: STACF

  47. Model Identification: STPACF

  48. Parameter Estimation [Table: estimated parameters by spatial lag and temporal lag (weeks)] • Values in dark blue are nominally significant at the 0.001 level. • Values in light blue are nominally significant at the 0.01 level.

  49. Diagnostic Check • Residuals' STACF and STPACF [Figure panels: STACF, STPACF]

  50. Interpretations for the HCD Analysis • Most people are getting infected at or near their homes. • The incidences are highly autocorrelated in space and time. • The distribution or probability of infection is highly “localized”. • The WNV “load” and the probability of human infection are “spreading” slowly, in the sense of not spreading very far very fast. • This suggests localized spraying could reduce cases. • With no depletion effect, the human case data show positive parameters significantly above zero at T-lag=2 and S-lag ≥ 1, especially at S-lag=1.
