
Infectious Disease Informatics: Overview and The BioPortal Experience

Infectious Disease Informatics: Overview and The BioPortal Experience. Hsinchun Chen, Ph.D. Artificial Intelligence Lab, U. of Arizona NSF BioPortal Research Center Acknowledgement: NSF, CIA, DHS, CDC, NCI.


Presentation Transcript


  1. Infectious Disease Informatics: Overview and The BioPortal Experience. Hsinchun Chen, Ph.D., Artificial Intelligence Lab, U. of Arizona; NSF BioPortal Research Center. Acknowledgement: NSF, CIA, DHS, CDC, NCI

  2. Medical Informatics: The computational, algorithmic, database and information-centric approach to the study of medical and health care problems. Infectious Disease Informatics: Medical informatics for infectious disease, public health, and biodefence. (Hsinchun Chen et al., 2005; Hsinchun Chen et al., 2010)

  3. IDI and Syndromic Surveillance Systems
     • Data sources and collection strategies
       • Formal-informal (sequence to epi), standards, data entry and transmission, security
     • Data analysis and outbreak detection
       • Syndromic classification, outbreak detection methods (temporal, spatial, spatial-temporal), multiple data streams
     • Data visualization, information dissemination, and alerting
       • GIS, temporal, sequence, text, interactive
     • System assessment and evaluation
       • Algorithms, data collection, information dissemination, interface, usability

  4. Syndromic Surveillance Data Sources in Different Stages of Developing a Disease. Reaching Situational Awareness. Reproduced from Mandl et al. (2004)

  5. Syndromic Surveillance Systems
     • Generation 1, paper-based: paper, fax, TEL, TEL directory, etc.
     • Generation 2, email-based: email, Word/Access, pager, cell phone, etc.
     • Generation 3, database-driven: database, standards, messaging, tabulation, GIS, graphs, text, etc.
     • Generation 4, search engine-based: real-time, interactive, web services, visualized, GIS, graphs, texts, sequences, contact networks, etc.

  6. Syndromic Surveillance System Survey

  7. Sample Systems and Data Sources Utilized

  8. COPLINK System

  9. COPLINK News • The New York Times, November 2, 2002 • ABC News, April 15, 2003 • Newsweek Magazine, March 3, 2003

  10. Dark Web System

  11. Dark Web News • Project Seeks to Track Terror Web Posts, 11/11/2007 • Researchers say tool could trace online posts to terrorists, 11/11/2007 • Mathematicians Work to Help Track Terrorist Activity, 9/14/2007 • Team from the University of Arizona identifies and tracks terrorists on the Web, 9/10/2007

  12. BioPortal: Overview, West Nile Virus (real-time information collection, sharing, access, visualization, and analysis, Epi data across species)

  13. BioPortal Project Goals • Demonstrate and assess the technical feasibility and scalability of an infectious disease information sharing (across species and jurisdictions), alerting, and analysis framework. • Develop and assess advanced data mining and visualization techniques for infectious disease data analysis and predictive modeling. • Identify important technical and policy-related challenges in developing a national infectious disease information infrastructure.

  14. Information Sharing Infrastructure Design (diagram labels): Portal Data Store (MS SQL 2000); Data Ingest Control Module; Cleansing / Normalization; Info-Sharing Infrastructure; Adaptors; SSL/RSA; PHINMS Network; XML/HL7 Network; NYSDOH; CADHS; New

  15. Data Access Infrastructure Design (diagram labels): Public health professionals, researchers, policy makers, law enforcement agencies & other users; Browser (IE/Mozilla/…); SSL connection; Spatial-Temporal Visualization; Analysis / Prediction; HAN or Personal Alert Management; Dataset Privileges Management; Data Search and Query; Web Server (Tomcat 4.21 / Struts 1.2); WNV-BOT Portal; Access Privilege Def.; User Access Control API (Java); Data Store (MS SQL 2000)

  16. Spatial-Temporal Visualization
      • Integrates four visualization techniques
        • GIS View
        • Periodic Pattern View
        • Timeline View
        • Central Time Slider
      • Visualizes the events in multiple dimensions to identify hidden patterns
        • Spatial
        • Temporal
        • Hotspot analysis
        • Phylogenetic tree
        • Contact network analysis

  17. BioPortal Prototype Systems

  18. Outbreak Detection & Hotspot Analysis • A hotspot is a condition indicating some form of clustering in a spatial and temporal distribution (Rogerson & Sun 2001; Theophilides et al. 2003; Patil & Tailie 2004; Zeng et al. 2004; Chang et al. 2005) • For WNV, localized clusters of dead birds typically identify high-risk disease areas (Gotham et al. 2001); automatic detection of dead bird clusters can help predict disease outbreaks and allocate prevention/control resources effectively
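
As a rough illustration of the hotspot idea on this slide, the sketch below bins baseline and recent case coordinates into a grid and flags cells where the recent count is well above what the baseline would suggest. It is not one of the cited methods; the function name `grid_hotspots`, the cell size, the smoothing constant, and the thresholds are all hypothetical choices made for the example.

```python
from collections import Counter

def grid_hotspots(baseline, cases, cell=1.0, min_ratio=2.0, min_cases=3):
    """Return grid cells whose case count is at least min_ratio times the
    count expected from the baseline period (hypothetical thresholds)."""
    if not baseline:
        raise ValueError("baseline must contain at least one record")

    def to_cell(pt):
        x, y = pt
        return (int(x // cell), int(y // cell))

    base_counts = Counter(to_cell(p) for p in baseline)
    case_counts = Counter(to_cell(p) for p in cases)
    # Scale baseline counts to the size of the case period.
    scale = len(cases) / len(baseline)

    hotspots = []
    for c, n in case_counts.items():
        # Add 1 to the baseline count so cells with no baseline records
        # still have a non-zero expected value.
        expected = (base_counts.get(c, 0) + 1) * scale
        if n >= min_cases and n / expected >= min_ratio:
            hotspots.append(c)
    return sorted(hotspots)
```

Adjusting for the baseline matters because a cell with many recent cases is unremarkable if it also had many cases in the baseline period.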

  19. Retrospective Hotspot Analysis Problem Statement

  20. Risk-Adjusted Support Vector Clustering (RSVC) (diagram labels): Estimate baseline density; Feature space; High baseline density makes two points far apart in feature space; Minimum sphere; Split into several clusters

  21. Study II: NY WNV (birds, mosquitoes, and humans) • On May 26, 2002, the first dead bird with WNV was found in NY • Based on NY’s test dataset (timeline figure labels: March 5; May 26; July 2; baseline; new cases; 140 records; 224 records)

  22. Dead Bird Hotspots Identified

  23. User main page (screenshot callouts): User Login; Available dataset list; Dataset name; Choose WNV disease data; Select CA dead bird, chicken and NY dead bird data; Advanced Search criteria; Spatial / Temporal; County / State; Time range; Positive cases; Specify bird species; Results listed in table; Select background maps; Select NY / CA population, river and lakes; Start STV

  24. GIS, Timeline, and Periodic Pattern views (screenshot callouts): Control panel; Spatial distribution pattern; NY dead bird temporal distribution pattern; Year 2001 data; Move time slider, year 2; Move time slider, year 3; View all 3 year data; 1 year window in 3 year span; 2 weeks window; Overall pattern; Similar time pattern; Concentrated in May / Jun; Zoom in NY; Zoom in; Close

  25. Spatial distribution pattern (screenshot callouts): Move time slider; Dead bird cases migrate from Long Island into upstate NY; Season end; Enable population map; Overlay population map; Dead bird cases distribute along populated areas near Hudson River

  26. BioPortal HotSpot Analysis: RSVC, SaTScan, and CrimeStat Integrated (first visual, real-time hotspot analysis system for disease surveillance) • West Nile virus in California

  27. Hotspot Analysis-Enabled STV (screenshot callouts): Regular STV; Select target geographic area; Select baseline and case periods; Select algorithms; Hotspots found!; Select hotspot to highlight case points

  28. BioPortal – FMD (many species; phylogenetic tree and news)

  29. FMD Global Surveillance: Lessons Learned • Must understand risks, and nature of changing risks, in order to develop strategies for prevention and mitigation on a global scale • Must understand the global situation in order to prepare locally • United Kingdom FMD outbreak, 2001; $12B, 50-60% of 4M farm animals (cows, pigs, sheep) slaughtered

  30. International FMD BioPortal • Real time web-based situational awareness of FMD outbreaks worldwide through the establishment of an international information technology system. • FMDv characterization at the genomic level integrated with associated epidemiological information and modeling tools to forecast national, regional and/or international spread and the prospect of importation into the US and the rest of North America. • Web-based crisis management of resources—facilities, personnel, diagnostics, and therapeutics.

  31. Preliminary Global FMD Dataset • Provider: UC Davis FMD Lab • Information sources: reference labs and OIE • Coverage: 28 countries globally • Dataset size: 30,000+ records of which 6789 records are complete • Host species: Cattle, Caprine, Ovine, Bovine, Swine, NK, Elephant, Buffalo, Sheep, Camelidae, Goat

  32. Global FMD Coverage in BioPortal

  33. FMD Migration Visualization using BioPortal (cases in South Asia) FMD Cases travel back and forth between countries

  34. BioPortal-Afghanistan

  35. International FMD News • Provider: UC Davis FMD Lab • Information sources: Google, Yahoo, and open Internet sources • Time span: Oct 4, 2004 – present (real-time messaging under development) • Data size: 460 events (6/21/05) • Coverage: 51 countries (Africa:11, Asia:16, Europe:12, Americas:12)

  36. Searching FMD News • http://fmd.ucdavis.edu/ • Searchable by • Date range • Country • Keyword
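
The three search criteria on this slide (date range, country, keyword) can be sketched as a simple filter over event records. This is only an illustration: the `NewsEvent` fields and the `search_news` function are hypothetical and do not describe how the site at the URL above is implemented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

@dataclass
class NewsEvent:
    # Hypothetical record layout for one news item.
    when: date
    country: str
    text: str

def search_news(events: Iterable[NewsEvent],
                start: Optional[date] = None,
                end: Optional[date] = None,
                country: Optional[str] = None,
                keyword: Optional[str] = None) -> List[NewsEvent]:
    """Keep events matching every criterion that was supplied."""
    results = []
    for e in events:
        if start is not None and e.when < start:
            continue
        if end is not None and e.when > end:
            continue
        if country is not None and e.country.lower() != country.lower():
            continue
        if keyword is not None and keyword.lower() not in e.text.lower():
            continue
        results.append(e)
    return results
```

Leaving a criterion as `None` means it is not applied, so the same function covers a date-only, country-only, or combined search.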

  37. Visualizing FMD News on BioPortal

  38. FMD Genetic Visualization • Goal: Extend STV to incorporate 3rd dimension, phylogenetic distance • Include a phylogenetic tree. • Identify phylogenetic groups and color-code the isolate points on the map. • Leverage available NCBI tools such as BLAST. • Proof of concept: SAT 2 & 3 analysis • Data: 54 partial DNA sequence records in South Africa received from UC Davis FMD Lab (Bastos,A.D. et al. 2000, 2003) • Date range: 1978-1998 • Countries covered: South Africa, Zimbabwe, Zambia, Namibia, Botswana
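
To make "phylogenetic distance" concrete, the sketch below computes the proportion of differing sites (p-distance) between aligned sequences and groups isolates whose distance falls under a threshold, so each group could be given one map color. This is a deliberately simplified stand-in, not the procedure used by BLAST or by the tree-building software named on the later slides; the function names, the toy sequences in the usage, and the threshold are hypothetical.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)

def group_by_distance(seqs, threshold):
    """Single-linkage grouping: isolates within `threshold` of each other
    (directly or through a chain of neighbors) share a group."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Toy aligned sequences, invented for illustration only.
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "TTTTACGT"}
print(group_by_distance(toy, threshold=0.2))
```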

  39. Sample FMD Sequence Records: Color-coded View (MEGA3); Textual View of Gene Sequence

  40. Phylogenetic Tree of Sample FMD Data: Identify 6 groups within 2 major families (MEGA3; based on sequence similarity). Tree labels: Group 1, Group 2, Group 3, Group 4, Group 5, Group 6

  41. Genetic, Spatial, and Temporal Visualization of FMD Data: Phylogenetic tree color coded; Isolates’ locations color coded; Isolates’ appearances in time

  42. FMD Time Sequence Analysis: First family cases appeared throughout the period. Second family cases existed before 1993 and reappeared after 1997.

  43. FMD Periodic Pattern Analysis: Second family cases are concentrated in February, while first family cases are spread evenly.
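
A periodic pattern of this kind can be approximated by ignoring the year and counting records per calendar month. The sketch below shows that idea only; `monthly_profile` is a hypothetical helper and not part of BioPortal.

```python
from collections import Counter
from datetime import date

def monthly_profile(dates):
    """Count records per calendar month (January first), pooling all years."""
    counts = Counter(d.month for d in dates)
    return [counts.get(m, 0) for m in range(1, 13)]

# Dates invented for illustration: two February records from different
# years fall into the same bin.
print(monthly_profile([date(1990, 2, 1), date(1995, 2, 10), date(1998, 7, 4)]))
```

A profile with one dominant month suggests seasonal concentration, while a flat profile suggests cases spread evenly over the year.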

  44. Locations of Family 1 records: Selected only groups 1, 2, and 3 and found a spatial cluster

  45. Locations of Family 2 records: Selected only groups 4, 5, and 6; sparse isolate locations

  46. BioPortal: Influenza, SARS (chief complaint syndromic surveillance, contact network analysis and visualization)

  47. Existing CC Classification Methods

  48. Syndromic Categories in Different Systems

  49. Overall System Design (diagram labels): Stage 1: CC Standardization; Stage 2: Symptom Grouping; Stage 3: Syndrome Classification. Data flow: Chief Complaints; symptoms; Symptom Groups; Syndromes. Components: EMT-P; UMLS Ontology; UMLS Concepts; Synonym List; Weighted Semantic Similarity Score; Symptom Grouping Table; EARS Symptom Table; JESS; EARS Syndrome Rules
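
The three stages named on this slide (standardization, symptom grouping, syndrome classification) can be sketched as three table lookups. This is a toy illustration under loud assumptions: the real design relies on UMLS, a weighted semantic similarity score, and rule sets, all of which are replaced here by small hypothetical dictionaries with invented entries.

```python
import re

# Hypothetical lookup tables, invented for illustration only.
SYNONYMS = {
    "sob": ["shortness of breath"],
    "n/v": ["nausea", "vomiting"],
    "temp": ["fever"],
}
SYMPTOM_GROUPS = {
    "shortness of breath": "dyspnea",
    "cough": "cough",
    "fever": "fever",
    "nausea": "nausea_vomiting",
    "vomiting": "nausea_vomiting",
}
SYNDROME_RULES = {
    "respiratory": {"cough", "dyspnea"},
    "gastrointestinal": {"nausea_vomiting"},
    "constitutional": {"fever"},
}

def standardize(cc):
    """Stage 1: lowercase, split on commas/semicolons, expand abbreviations."""
    terms = []
    for part in re.split(r"[,;]", cc.lower()):
        part = part.strip()
        if part:
            terms.extend(SYNONYMS.get(part, [part]))
    return terms

def group_symptoms(terms):
    """Stage 2: map standardized terms to symptom groups; unknown terms are dropped."""
    return {SYMPTOM_GROUPS[t] for t in terms if t in SYMPTOM_GROUPS}

def classify(cc):
    """Stage 3: assign every syndrome that shares a symptom group with the input."""
    groups = group_symptoms(standardize(cc))
    return sorted(s for s, needed in SYNDROME_RULES.items() if groups & needed)

print(classify("SOB, cough; N/V"))
```

Keeping the stages as separate functions mirrors the staged design: each table can be replaced by a richer resource without touching the other two steps.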

  50. Comparing BioPortal to RODS. * p-value < 0.1; ** p-value < 0.05; *** p-value < 0.01. The statistical test is based on 2,500 bootstrap resamples.
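
The slide does not say which metric or test statistic was used, so the following is only a generic sketch of a paired bootstrap comparison between two classifiers scored on the same cases. The function name, the 0/1 correctness encoding, and the one-sided p-value definition are assumptions; only the default of 2,500 resamples comes from the slide.

```python
import random

def bootstrap_p_value(correct_a, correct_b, n_boot=2500, seed=0):
    """Paired bootstrap: resample cases with replacement and return the
    fraction of resamples in which system A does not outperform system B."""
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(correct_a)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        # Difference in number of correct answers on this resample.
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff <= 0:
            not_better += 1
    return not_better / n_boot
```

Each element of `correct_a` and `correct_b` is 1 if that system classified the case correctly and 0 otherwise; a small returned value suggests that A's advantage is stable across resamples.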
