
Bayesian Biosurveillance Using Multiple Data Streams

Greg Cooper, Weng-Keen Wong, Denver Dash*, John Levander, John Dowling, Bill Hogan, Mike Wagner. RODS Laboratory, University of Pittsburgh. * Intel Research, Santa Clara. Outline: Introduction, Model, Inference, Conclusions.


Presentation Transcript


  1. Bayesian Biosurveillance Using Multiple Data Streams. Greg Cooper, Weng-Keen Wong, Denver Dash*, John Levander, John Dowling, Bill Hogan, Mike Wagner. RODS Laboratory, University of Pittsburgh. * Intel Research, Santa Clara

  2. Outline • Introduction • Model • Inference • Conclusions

  3. Over-the-Counter (OTC) Data Being Collected by the National Retail Data Monitor (NRDM) • 19,000 stores • 50% market share nationally • >70% market share in large cities

  4. ED Chief Complaint Data Being Collected by RODS Chief Complaint ED Records for Allegheny County

  5. Objective Using the ED and OTC data streams, detect a disease outbreak in a given region as quickly and accurately as possible

  6. Our Approach • A detection algorithm that models each individual in the population • Combines ED and OTC data streams • The current prototype focuses on detecting an outdoor aerosolized release of an anthrax-like agent in Allegheny County. Population-wide ANomaly Detection and Assessment (PANDA)

  7. PANDA Uses a Causal Bayesian Network. Nodes: Home Location of Person, Location of Anthrax Release, Anthrax Infection of Person, Visit of Person to ED. Bayesian network: a graphical model representing the joint probability distribution of a set of random variables

  8. PANDA Uses a Causal Bayesian Network. Nodes: Home Location of Person, Location of Anthrax Release, Anthrax Infection of Person, Visit of Person to ED. The arrows convey conditional independence relationships among the variables. They also represent causal relationships.
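To make the definition on slides 7 and 8 concrete, here is a minimal Python sketch of the four-node network for one person. All probabilities, the two zone labels, and the function names are made-up assumptions for illustration; none of them come from the talk. The posterior is computed by brute-force enumeration, which is only practical for a toy network like this one.

```python
# Toy version of the four-node network on slides 7-8.
# Every number below is hypothetical and chosen only for illustration.
P_HOME = {"zone_a": 0.5, "zone_b": 0.5}                      # Home Location of Person
P_RELEASE = {"none": 0.98, "zone_a": 0.01, "zone_b": 0.01}   # Location of Anthrax Release


def p_infected(home, release):
    """P(Anthrax Infection = true | home, release), hypothetical values."""
    if release == "none":
        return 0.0
    return 0.30 if home == release else 0.02


def p_ed_visit(infected):
    """P(Visit to ED = true | infection status), hypothetical values."""
    return 0.60 if infected else 0.01


def joint(home, release, infected, visit):
    """Joint probability as the product of each node given its parents."""
    p_inf = p_infected(home, release)
    p_vis = p_ed_visit(infected)
    return (
        P_HOME[home]
        * P_RELEASE[release]
        * (p_inf if infected else 1.0 - p_inf)
        * (p_vis if visit else 1.0 - p_vis)
    )


def posterior_release(home, visit):
    """P(release location | home, visit), summing out the infection node."""
    scores = {
        r: sum(joint(home, r, infected, visit) for infected in (False, True))
        for r in P_RELEASE
    }
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}


if __name__ == "__main__":
    print(posterior_release("zone_a", visit=True))
```

With these toy numbers, observing an ED visit by a resident of zone_a raises the probability of a release in zone_a above its prior, which is the kind of belief update the full model performs across the whole population.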

  9. Outline • Introduction • Model • Inference • Conclusions

  10. A Schematic of the Generic PANDA Model for Non-Contagious Diseases. Nodes: Population Risk Factors, Population Disease Exposure (PDE), Person Model (one per person), Population-Wide Evidence

  11. A Special Case of the Generic Model. Nodes: Anthrax Release, Location of Release, Time of Release, Person Model (one per person), OTC Sales for Region. Each person in the population is represented as a subnetwork in the overall model

  12. The Person Model Location of Release Age Decile Home Zip Time Of Release Gender Anthrax Infection Other ED Disease Non-ED Acute Respiratory Infection Respiratory from Anthrax Respiratory CC From Other ED Acute Respiratory Infection Acute Respiratory Infection Respiratory CC ED Admit from Anthrax ED Admit from Other Daily OTC Purchase Respiratory CC When Admitted Last 3 Days OTC Purchase ED Admission OTC Sales for Region

  13. Why Use a Population-Based Approach? • Representational power • Spatial, temporal, demographic, and symptom knowledge of potential diseases can be coherently represented in a single model • Spatial, temporal, demographic, and symptom evidence can be combined to derive a posterior probability of a disease outbreak • Representational flexibility: new types of knowledge and evidence can be readily incorporated into the model. Hypothesis: A population-based approach will achieve better detection performance than non-population-based approaches.

  14. The Person Model Location of Release Age Decile Home Zip Time Of Release Gender Anthrax Infection Other ED Disease Non-ED Acute Respiratory Infection Respiratory from Anthrax Respiratory CC From Other ED Acute Respiratory Infection Acute Respiratory Infection Respiratory CC ED Admit from Anthrax ED Admit from Other Daily OTC Purchase Respiratory CC When Admitted Last 3 Days OTC Purchase ED Admission OTC Sales for Region

  15. The Person Model Location of Release Age Decile Home Zip Time Of Release Gender Anthrax Infection Other ED Disease Non-ED Acute Respiratory Infection Respiratory from Anthrax Respiratory CC From Other ED Acute Respiratory Infection Acute Respiratory Infection Respiratory CC ED Admit from Anthrax ED Admit from Other Daily OTC Purchase Respiratory CC When Admitted Last 3 Days OTC Purchase ED Admission Equivalence Class Example:

  16. Outline • Introduction • Model • Inference • Conclusions

  17. Inference. Nodes: Anthrax Release, Location of Release, Time of Release, Person Model (one per person), OTC Sales for Region. Derive P ( AnthraxRelease = true | OTC Sales Data & ED Data )

  18. Inference. Key term in deriving P ( AR | OTC, ED ): P ( OTC, ED | PDE ) = P ( OTC | ED, PDE ) P ( ED | PDE ), where P ( OTC | ED, PDE ) is the contribution of the OTC counts and P ( ED | PDE ) is the contribution of the ED data. Details in: Cooper GF, Dash DH, Levander J, Wong W-K, Hogan W, Wagner M. Bayesian Biosurveillance of Disease Outbreaks. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2004.

  19. Inference. Key term in deriving P ( AR | OTC, ED ): P ( OTC, ED | PDE ) = P ( OTC | ED, PDE ) P ( ED | PDE ). The OTC factor, P ( OTC | ED, PDE ), is the focus of the remainder of this talk
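The factorization on slides 18 and 19 feeds a standard application of Bayes' rule. The sketch below is a hedged illustration, not PANDA's implementation: it assumes a small discrete set of PDE hypotheses, a prior over them, and the two likelihood terms already computed as log-probabilities. The hypothesis labels, the function name, and all numbers are hypothetical.

```python
import math


def posterior_over_pde(prior, log_p_ed, log_p_otc_given_ed):
    """Return P(PDE = h | OTC, ED) for each hypothesis h.

    Uses P(OTC, ED | h) = P(OTC | ED, h) * P(ED | h), evaluated in log space
    so that very small likelihoods do not underflow.
    """
    log_joint = {
        h: math.log(prior[h]) + log_p_ed[h] + log_p_otc_given_ed[h]
        for h in prior
    }
    shift = max(log_joint.values())  # subtract the max before exponentiating
    unnorm = {h: math.exp(v - shift) for h, v in log_joint.items()}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}


# Hypothetical inputs: one "no release" hypothesis and two release hypotheses.
prior = {"no_release": 0.98, "release_zone_a": 0.01, "release_zone_b": 0.01}
log_p_ed = {"no_release": -120.0, "release_zone_a": -115.0, "release_zone_b": -121.0}
log_p_otc_given_ed = {"no_release": -6.0, "release_zone_a": -4.5, "release_zone_b": -6.2}

post = posterior_over_pde(prior, log_p_ed, log_p_otc_given_ed)
p_anthrax_release = 1.0 - post["no_release"]  # P(AR = true | OTC, ED)
print(round(p_anthrax_release, 3))
```

Working in log space is a design choice for the sketch: a likelihood over many people is a product of many small numbers, and summing logs avoids floating-point underflow.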

  20. The Person Model Location of Release Age Decile Home Zip Time Of Release Gender Anthrax Infection Other ED Disease Non-ED Acute Respiratory Infection Respiratory from Anthrax Respiratory CC From Other ED Acute Respiratory Infection Acute Respiratory Infection Respiratory CC ED Admit from Anthrax ED Admit from Other Daily OTC Purchase Respiratory CC When Admitted Last 3 Days OTC Purchase ED Admission OTC Sales for Region

  21. Incorporating the Counts of OTC Purchases Person1Zip1 OTC count Person2Zip1 OTC count Person3Zip1 OTC count Person4Zip1 OTC count Eq Class1Zip1 OTC count Eq Class2Zip1 OTC count Approximate binomial distribution with a normal distribution Zip1 OTC count

  22. The PANDA OTC Model P (OTCsales = X | ED, PDE ) Recall that: P ( OTC, ED | PDE ) = P ( OTC | ED, PDE ) P ( ED | PDE )

  23. Example Equivalence Class 1 ~ Normal(100,100)

  24. Example Equivalence Class 1 ~ Normal(100,100) Equivalence Class 2 ~ Normal(150,225)

  25. Example Equivalence Class 1 ~ Normal(100,100) Equivalence Class 2 ~ Normal(150,225) If these were the only 2 Equivalence Classes in the County then County Cough & Cold OTC ~ Normal(100+150,100+225)

  26. Example. Now suppose 260 units are sold in the county: P ( OTC Sales = 260 | ED Data, PDE ) = Normal( 260; 250, 325 ) = 0.001231
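The worked example on slides 23 to 26 can be checked with a few lines of Python. This sketch assumes, as the additions on slide 25 suggest, that the second parameter of each Normal is a variance and that the equivalence classes are independent, so means and variances add. The function names are ours. Under that reading the density at 260 evaluates to roughly 0.019 rather than the 0.001231 shown on slide 26; the transcript does not say which parameterization produced the slide's figure.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of Normal(mean, variance) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def sum_of_normals(classes):
    """Sum of independent normals: the means add and the variances add."""
    mean = sum(m for m, _ in classes)
    variance = sum(v for _, v in classes)
    return mean, variance


# (mean, variance) for the two equivalence classes on slides 23-25.
classes = [(100.0, 100.0), (150.0, 225.0)]
county_mean, county_variance = sum_of_normals(classes)

# Likelihood of observing 260 units sold in the county.
density = normal_pdf(260.0, county_mean, county_variance)
print(county_mean, county_variance, density)
```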

  27. Inference Timing. Machine: P4, 3 GHz, 2 GB RAM

  28. A Current Limitation • Problem: Currently we assume unrealistically that a person only makes OTC purchases in his or her home zip code • Approach 1: Aggregate OTC-counts (e.g., at the county level) • Approach 2: For each home zip code, model the distribution of zip codes where OTC purchases are made
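Approach 2 can be sketched as a redistribution step. The code below is an assumption-laden illustration, not something the talk describes beyond the one-line bullet: it takes the expected purchases generated by residents of each home zip code and spreads them over the zip codes where those residents are assumed to shop. The zip labels, the shares, and the function name are hypothetical, and only expected counts are handled, not variances.

```python
def expected_sales_by_purchase_zip(expected_by_home_zip, purchase_dist):
    """Redistribute expected purchases from home zips to purchase zips.

    expected_by_home_zip: {home_zip: expected number of purchases}
    purchase_dist: {home_zip: {purchase_zip: share}}, shares sum to 1 per home zip
    """
    totals = {}
    for home_zip, expected in expected_by_home_zip.items():
        for purchase_zip, share in purchase_dist[home_zip].items():
            totals[purchase_zip] = totals.get(purchase_zip, 0.0) + expected * share
    return totals


# Hypothetical inputs for two zip codes.
expected_by_home_zip = {"zip_a": 100.0, "zip_b": 150.0}
purchase_dist = {
    "zip_a": {"zip_a": 0.8, "zip_b": 0.2},  # 20% of zip_a residents shop in zip_b
    "zip_b": {"zip_a": 0.1, "zip_b": 0.9},
}

sales = expected_sales_by_purchase_zip(expected_by_home_zip, purchase_dist)
print(sales)
```

Because each home zip's shares sum to 1, the county-wide expected total is unchanged; only its allocation across store locations moves.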

  29. Outline • Introduction • Model • Inference • Conclusions

  30. Challenges in Population-Wide Modeling Include … • Obtaining good parameter estimates to use in modeling (e.g., the probability of an OTC cough medication purchase given an acute respiratory illness) • Modeling time and space in a way that is both useful and computationally tractable • Modeling contagious diseases

  31. Conclusions • PANDA is a multivariate algorithm that can combine multiple data streams • Modeling each individual in the population is computationally feasible (so far) • An evaluation of the PANDA approach to modeling multiple data streams is in progress using semi-synthetic test data

  32. Thank you. Current funding: National Science Foundation, Department of Homeland Security. Earlier funding: DARPA. http://www.cbmi.pitt.edu/panda/ gfc@cbmi.pitt.edu

  33. The PANDA OTC Model. Model the OTC purchases for each equivalence class Ei as a binomial distribution. Ei ~ Binomial(NEi, PEi)

  34. The PANDA OTC Model. Model the OTC purchases for each equivalence class Ei as a binomial distribution. Ei ~ Binomial(NEi, PEi), where NEi is the number of people in equivalence class Ei and PEi is the probability of an OTC cough medication purchase during the previous 3 days by each person in equivalence class Ei

  35. The PANDA OTC Model. Model the OTC purchases for each equivalence class Ei as a binomial distribution. Approximate the binomial distribution as a normal distribution. Ei ~ Binomial(NEi, PEi) ≈ Normal(μEi, σ²Ei)

  36. The PANDA OTC Model. Model the OTC purchases for each equivalence class Ei as a binomial distribution. Approximate the binomial distribution as a normal distribution. Ei ~ Binomial(NEi, PEi) ≈ Normal(μEi, σ²Ei), with μEi = NEi × PEi and σ²Ei = NEi × PEi × (1 − PEi)
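The approximation on slides 35 and 36 is the usual moment-matching one. As a hedged check, the sketch below computes the normal parameters for one equivalence class and compares the exact binomial probability with the normal density at the same count. The class size and purchase probability are hypothetical values, not estimates from PANDA, and the function names are ours.

```python
import math


def normal_approximation(n, p):
    """Mean and variance of the normal that approximates Binomial(n, p)."""
    return n * p, n * p * (1.0 - p)


def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability of exactly k purchases."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def normal_pdf(x, mean, variance):
    """Density of Normal(mean, variance) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# Hypothetical equivalence class: 1000 people, each buying with probability 0.1.
n_people, p_purchase = 1000, 0.1
mean, variance = normal_approximation(n_people, p_purchase)

exact = binomial_pmf(100, n_people, p_purchase)
approx = normal_pdf(100, mean, variance)
print(mean, variance, exact, approx)
```

For a class of this size the two values agree closely. The approximation is generally weaker for small classes or for purchase probabilities near 0 or 1.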

  37. Computational Cost of a Population-Wide Approach? ~1.4 million people in Allegheny County, Pennsylvania

  38. Equivalence Classes The ~1.4M people in the modeled population can be partitioned into approximately 24,240 equivalence classes
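One way to read slides 37 and 38: if people who share the same attribute values are interchangeable in the model, a per-person computation only has to run once per distinct combination and can then be weighted by the class size. The sketch below assumes that reading. The choice of attributes (age decile, gender, home zip), the toy records, and the per-class likelihood function are all hypothetical.

```python
import math
from collections import Counter


def equivalence_classes(people):
    """Count how many people share each (age_decile, gender, home_zip) key."""
    return Counter((p["age_decile"], p["gender"], p["home_zip"]) for p in people)


def population_log_likelihood(class_counts, log_lik_per_person):
    """Sum per-person log-likelihoods, evaluating each class only once."""
    return sum(count * log_lik_per_person(key) for key, count in class_counts.items())


# Hypothetical toy population: four people falling into two classes.
people = [
    {"age_decile": 3, "gender": "F", "home_zip": "zip_a"},
    {"age_decile": 3, "gender": "F", "home_zip": "zip_a"},
    {"age_decile": 3, "gender": "F", "home_zip": "zip_a"},
    {"age_decile": 6, "gender": "M", "home_zip": "zip_b"},
]

classes = equivalence_classes(people)


def toy_log_lik(key):
    """Hypothetical per-person log-likelihood; constant here for simplicity."""
    return math.log(0.99)


total = population_log_likelihood(classes, toy_log_lik)
print(len(classes), total)
```

The saving comes from the loop running over classes rather than individuals, which on the slide's figures would mean roughly 24,240 evaluations instead of about 1.4 million.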
