Infectious Disease Informatics: Overview and The BioPortal Experience. Hsinchun Chen, Ph.D. Artificial Intelligence Lab, U. of Arizona NSF BioPortal Research Center Acknowledgement: NSF, CIA, DHS, CDC, NCI.
Medical Informatics: The computational, algorithmic, database and information-centric approach to the study of medical and health care problems. Infectious Disease Informatics: Medical informatics for infectious disease, public health, and biodefence. Hsinchun Chen et al., 2005 Hsinchun Chen, et al., 2010
IDI and Syndromic Surveillance Systems • Data sources and collection strategies • Formal-informal (sequence to epi), standards, data entry and transmission, security • Data analysis and outbreak detection • Syndromic classification, outbreak detection methods (temporal, spatial, spatial-temporal), multiple data streams • Data visualization, information dissemination, and alerting • GIS, temporal, sequence, text, interactive • System assessment and evaluation • Algorithms, data collection, information dissemination, interface, usability
Syndromic Surveillance Data Sources in Different Stages of Developing a Disease Reaching Situational Awareness Reproduced from Mandl et al. (2004)
Syndromic Surveillance Systems • Generation 1, paper-based: paper, fax, TEL, TEL directory, etc. • Generation 2, email-based: email, Word/Access, pager, cell phone, etc. • Generation 3, database-driven: database, standards, messaging, tabulation, GIS, graphs, text, etc. • Generation 4, search engine-based: real-time, interactive, web services, visualized, GIS, graphs, texts, sequences, contact networks, etc.
COPLINK News • The New York Times November 2, 2002 • ABC News April 15, 2003 • Newsweek Magazine March 3, 2003
Dark Web News Project Seeks to Track Terror Web Posts, 11/11/2007 Researchers say tool could trace online posts to terrorists, 11/11/2007 Mathematicians Work to Help Track Terrorist Activity, 9/14/2007 Team from the University of Arizona identifies and tracks terrorists on the Web, 9/10/2007
BioPortal: Overview, West Nile Virus (real-time information collection, sharing, access, visualization, and analysis, Epi data across species)
BioPortal Project Goals • Demonstrate and assess the technical feasibility and scalability of an infectious disease information sharing (across species and jurisdictions), alerting, and analysis framework. • Develop and assess advanced data mining and visualization techniques for infectious disease data analysis and predictive modeling. • Identify important technical and policy-related challenges in developing a national infectious disease information infrastructure.
Information Sharing Infrastructure Design [Diagram labels] • Portal Data Store (MS SQL 2000) • Data Ingest Control Module • Cleansing / Normalization • Info-Sharing Infrastructure • Adaptors (three) • SSL/RSA • PHINMS Network • XML/HL7 Network • NYSDOH • CADHS • New
Data Access Infrastructure Design [Diagram labels] • Public health professionals, researchers, policy makers, law enforcement agencies & other users • Browser (IE/Mozilla/…) • SSL connection • Spatial-Temporal Visualization • Analysis / Prediction • HAN or Personal Alert Management • Dataset Privileges Management • Data Search and Query • Web Server (Tomcat 4.21 / Struts 1.2) • WNV-BOT Portal • Access Privilege Def. • User Access Control API (Java) • Data Store (MS SQL 2000)
Spatial-Temporal Visualization • Integrates four visualization techniques • GIS View • Periodic Pattern View • Timeline View • Central Time Slider • Visualizes the events in multiple dimensions to identify hidden patterns • Spatial • Temporal • Hotspot analysis • Phylogenetic tree • Contact network analysis
Outbreak Detection & Hotspot Analysis • Hotspot is a condition indicating some form of clustering in a spatial and temporal distribution (Rogerson & Sun 2001; Theophilides et al. 2003; Patil & Tailie 2004; Zeng et al. 2004; Chang et al. 2005) • For WNV, localized clusters of dead birds typically identify high-risk disease areas (Gotham et al. 2001); automatic detection of dead bird clusters can help predict disease outbreaks and allocate prevention/control resources effectively
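To make the hotspot idea concrete, here is a minimal sketch that flags grid cells where the share of case-period records is unusually high relative to a baseline period. It is a naive illustration only, not RSVC, SaTScan, or CrimeStat; the function names, the 0.5-degree cell size, and the thresholds are all assumptions chosen for the example.

```python
from collections import Counter

def grid_cell(lon, lat, cell_size=0.5):
    """Map a coordinate to a square grid cell index (cell_size in degrees, an assumed value)."""
    return (int(lon // cell_size), int(lat // cell_size))

def naive_hotspots(baseline_points, case_points, cell_size=0.5, min_cases=3, ratio=2.0):
    """Return cells whose case-period share is at least `ratio` times their baseline share.

    Hypothetical helper: both inputs are lists of (lon, lat) tuples.
    """
    base = Counter(grid_cell(x, y, cell_size) for x, y in baseline_points)
    case = Counter(grid_cell(x, y, cell_size) for x, y in case_points)
    n_base = max(len(baseline_points), 1)
    n_case = max(len(case_points), 1)
    hot = []
    for cell, count in case.items():
        if count < min_cases:
            continue  # too few cases to call a cluster
        # Add-one smoothing so cells with an empty baseline do not divide by zero.
        base_share = (base[cell] + 1) / (n_base + 1)
        case_share = count / n_case
        if case_share / base_share >= ratio:
            hot.append(cell)
    return sorted(hot)
```

Comparing shares rather than raw counts is a crude form of risk adjustment: a cell that was already dense during the baseline needs proportionally more new cases before it is flagged.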
Risk-Adjusted Support Vector Clustering (RSVC) Feature space Minimum sphere Split into several clusters High baseline density makes two points far apart in feature space Estimate baseline density
Study II: NY WNV (birds, mosquitoes, and humans) • On May 26, 2002, the first dead bird with WNV was found in NY • Based on NY’s test dataset 140 records 224 records March 5 May 26 July 2 new cases baseline
[Screenshot walkthrough: data search and query. Callouts:] User Login • User main page • Available dataset list • Dataset name • Choose WNV disease data • Advanced Search criteria • Spatial / Temporal • County / State • Time range • Specify bird species • Positive cases • Results listed in table • Select CA dead bird, chicken and NY dead bird data • Select background maps • Select NY / CA population, river and lakes • Start STV
[Screenshots: STV views of NY dead bird data. Callouts:] GIS • Timeline • Periodic Pattern • Control panel • Spatial distribution pattern • NY dead bird temporal distribution pattern • Zoom in NY • Zoom in • Close • Year 2001 data • Move time slider, year 2 • Move time slider, year 3 • Concentrated in May / Jun • Similar time pattern • Overall pattern • 2 weeks window • View all 3 year data • 1 year window in 3 year span
[Screenshot sequence: spatial distribution pattern as the time slider moves. Callouts:] Move time slider • Dead bird cases migrate from Long Island into upstate NY • Season end • Enable population map • Overlay population map • Dead bird cases distribute along populated areas near the Hudson River
BioPortal HotSpot Analysis: RSVC, SaTScan, and CrimeStat Integrated (first visual, real-time hotspot analysis system for disease surveillance) • West Nile virus in California
[Screenshots: Regular STV and Hotspot Analysis-Enabled STV. Callouts:] Select target geographic area • Select baseline and case periods • Select algorithms • Hotspots found! • Select hotspot to highlight case points
BioPortal – FMD (many species; phylogenetic tree and news)
FMD Global Surveillance: Lessons Learned • Must understand risks, and nature of changing risks, in order to develop strategies for prevention and mitigation on a global scale • Must understand the global situation in order to prepare locally • United Kingdom FMD outbreak, 2001; $12B, 50-60% of 4M farm animals (cows, pigs, sheep) slaughtered
International FMD BioPortal • Real time web-based situational awareness of FMD outbreaks worldwide through the establishment of an international information technology system. • FMDv characterization at the genomic level integrated with associated epidemiological information and modeling tools to forecast national, regional and/or international spread and the prospect of importation into the US and the rest of North America. • Web-based crisis management of resources—facilities, personnel, diagnostics, and therapeutics.
Preliminary Global FMD Dataset • Provider: UC Davis FMD Lab • Information sources: reference labs and OIE • Coverage: 28 countries globally • Dataset size: 30,000+ records of which 6789 records are complete • Host species: Cattle, Caprine, Ovine, Bovine, Swine, NK, Elephant, Buffalo, Sheep, Camelidae, Goat
FMD Migration Visualization using BioPortal (cases in South Asia) FMD Cases travel back and forth between countries
International FMD News • Provider: UC Davis FMD Lab • Information sources: Google, Yahoo, and open Internet sources • Time span: Oct 4, 2004 – present (real-time messaging under development) • Data size: 460 events (6/21/05) • Coverage: 51 countries (Africa:11, Asia:16, Europe:12, Americas:12)
Searching FMD News • http://fmd.ucdavis.edu/ • Searchable by • Date range • Country • Keyword
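The three search dimensions listed above can be sketched as a simple in-memory filter. This is an illustration only: the `NewsEvent` record and `search_events` function are hypothetical names, and nothing here reflects how the actual FMD news site is implemented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class NewsEvent:
    """Hypothetical news record; field names are assumptions for this sketch."""
    published: date
    country: str
    text: str

def search_events(events, start=None, end=None, country=None, keyword=None):
    """Filter events by optional date range, country, and keyword (case-insensitive)."""
    results = []
    for ev in events:
        if start is not None and ev.published < start:
            continue
        if end is not None and ev.published > end:
            continue
        if country is not None and ev.country.lower() != country.lower():
            continue
        if keyword is not None and keyword.lower() not in ev.text.lower():
            continue
        results.append(ev)
    return results
```

Each criterion is optional, so a query such as `search_events(events, country="Brazil")` leaves the date range and keyword unconstrained.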
FMD Genetic Visualization • Goal: Extend STV to incorporate 3rd dimension, phylogenetic distance • Include a phylogenetic tree. • Identify phylogenetic groups and color-code the isolate points on the map. • Leverage available NCBI tools such as BLAST. • Proof of concept: SAT 2 & 3 analysis • Data: 54 partial DNA sequence records in South Africa received from UC Davis FMD Lab (Bastos,A.D. et al. 2000, 2003) • Date range: 1978-1998 • Countries covered: South Africa, Zimbabwe, Zambia, Namibia, Botswana
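As a rough illustration of "phylogenetic distance" as a third dimension, the sketch below computes the p-distance (fraction of differing sites) between aligned sequences and groups isolates by single linkage so each group can be given one map color. This is a swapped-in simplification, not the tree-building method used in MEGA or BLAST; the function names and the threshold are assumptions.

```python
def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

def group_by_distance(seqs, threshold):
    """Single-linkage grouping: isolates within `threshold` of each other share a group id.

    `seqs` maps an isolate name to its aligned sequence; returns name -> group id.
    """
    names = list(seqs)
    group = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                old, new = group[b], group[a]
                # Merge b's whole group into a's group.
                for name in names:
                    if group[name] == old:
                        group[name] = new
    return group
```

The resulting group ids could then drive the color of each isolate point on the map and on the timeline.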
Sample FMD Sequence Records Color-coded View (MEGA3) Textual View of Gene Sequence
Identify 6 groups within 2 major families (MEGA3; based on sequence similarity) Phylogenetic Tree of Sample FMD Data Group6 Group1 Group5 Group2 Group4 Group3
Genetic, Spatial, and Temporal Visualization of FMD Data Phylogenetic tree color coded Isolates’ locations color coded Isolates’ appearances in time
FMD Time Sequence Analysis • First-family cases appeared throughout the period • Second-family cases existed before 1993 and reappeared after 1997
FMD Periodic Pattern Analysis 2nd family concentrated in Feb. while 1st family spread evenly
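A periodic pattern view of this kind boils down to counting records per calendar month across all years. The sketch below is a minimal, assumed version of that aggregation; `monthly_profile` and `peak_share` are hypothetical names, not BioPortal functions.

```python
from collections import Counter

def monthly_profile(dates):
    """Count records per calendar month (index 0 = January), pooling all years."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]

def peak_share(profile):
    """Return (peak month number, share of all records in that month)."""
    total = sum(profile)
    if total == 0:
        return None, 0.0
    peak = max(range(12), key=lambda i: profile[i])
    return peak + 1, profile[peak] / total
```

A family whose records cluster in one month yields a high peak share, while an evenly spread family yields a share near 1/12.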
Locations of Family 1 records Selected only groups 1, 2, and 3 and found a spatial cluster
Locations of Family 2 records Sparse isolate locations Selected only groups 4, 5, and 6
BioPortal: Influenza, SARS (chief complaint syndromic surveillance, contact network analysis and visualization)
Stage 1 Stage 2 Stage 3 Symptom Groups CC Standardization Symptom Grouping Syndrome Classification Syndromes symptoms Weighted Semantic Similarity Score EMT-P UMLS Ontology Symptom Grouping Table JESS UMLS Concepts Synonym List EARS Syndrome Rules EMT-P EARS Symptom Table Overall System Design Chief Complaints
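The three stages named in the design (chief complaint standardization, symptom grouping, syndrome classification) can be sketched with small lookup tables. The tables below are made-up stand-ins for the synonym list, symptom grouping table, and syndrome rules; the sketch does not use EMT-P, UMLS, JESS, or the EARS rules, and it omits the weighted semantic similarity score.

```python
import re

# Hypothetical stand-in tables, for illustration only.
SYNONYMS = {"sob": "shortness of breath", "n/v": "nausea vomiting"}
SYMPTOM_GROUPS = {
    "shortness of breath": "dyspnea",
    "cough": "cough",
    "fever": "fever",
    "nausea": "nausea_vomiting",
    "vomiting": "nausea_vomiting",
    "diarrhea": "diarrhea",
}
SYNDROME_RULES = {
    "respiratory": {"dyspnea", "cough"},
    "gastrointestinal": {"nausea_vomiting", "diarrhea"},
    "constitutional": {"fever"},
}

def standardize(chief_complaint):
    """Stage 1: lowercase, tokenize, and expand abbreviations."""
    tokens = re.split(r"[\s,;]+", chief_complaint.lower().strip())
    return " ".join(SYNONYMS.get(tok, tok) for tok in tokens if tok)

def symptom_groups(text):
    """Stage 2: map standardized text to symptom groups by phrase lookup."""
    return {group for phrase, group in SYMPTOM_GROUPS.items() if phrase in text}

def classify(chief_complaint):
    """Stage 3: return every syndrome whose rule shares a symptom group with the text."""
    groups = symptom_groups(standardize(chief_complaint))
    return sorted(s for s, needed in SYNDROME_RULES.items() if groups & needed)
```

A single complaint can map to several syndromes, which is why `classify` returns a list rather than one label.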
Comparing BioPortal to RODS * p-value < 0.1 ** p-value < 0.05 *** p-value < 0.01 Statistical test is based on 2,500 bootstrap resamples.
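The slide does not say which bootstrap scheme or performance metric was used. As one common possibility, the sketch below runs a one-sided paired bootstrap over per-record correctness flags for two systems; the function name, the choice of metric, and the seed are assumptions.

```python
import random

def bootstrap_p_value(correct_a, correct_b, n_resamples=2500, seed=0):
    """One-sided paired bootstrap: share of resamples in which system A does not beat B.

    `correct_a` and `correct_b` are 0/1 flags for the same records, in the same order.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("inputs must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample record indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff <= 0:
            not_better += 1
    return not_better / n_resamples
```

Resampling record indices, rather than each system's results separately, keeps the comparison paired: both systems are always scored on the same resampled records.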