TUTORIAL T6: Theory and Practice of Outbreak Detection
II. DATA
Michael M. Wagner, MD, PhD
AMIA Annual Meeting, Saturday, November 8, 2003, 8:00 - 4:30 pm
The Basic Question: Which Data are Useful and What Does it Take to Get Them?
[Figure: 1999 Influenza. Time series by week (1999-2000) for: influenza cultures; sentinel physicians; WebMD queries about ‘cough’ etc.; school absenteeism; sales of cough and cold meds; sales of cough syrup; ER respiratory complaints; ER ‘viral’ complaints; influenza-related deaths.]
Drawing from Two Reports and Several Papers
• To AHRQ about data availability (Wagner, Aryel, Dato. 2001), 188 pages, available at www.health.pitt.edu/rods
• To DARPA about data value (Wagner, Pavlin, Brillman, Stetson), expected completion date December 2003
• Self-treatment/health seeking literature
• Published studies
See bibliography that will be on Web at www.health.pitt.edu/rods
Biosurveillance Data Space
[Figure: Biosurveillance Data Space. Axis labels: EARLY DETECTION, LATER DETECTION. Legend: Limited Utility, Some Potential, Promising. Region labels: ANIMALS; HUMAN BEHAVIORS; NON TRADITIONAL USES; CLINICAL DATA; GOLD STANDARDS; NON TRADITIONAL; MEDICAL INTELLIGENCE; BIOSENSORS. Data types shown: OTC; Pharm; Test Results; Complaints; Diagnosis; Sentinel MD; Agribusiness; Poison Centers; Influenza isolates; Environmental Investigations; Web Queries; Medical Examiner; Survey; Nurse Calls; Public Transport (bus); Wind Speed/direct.; Cloud cover; ER Visits; Radiograph Reports.]
Outline (and main points)
• Which data are required (and how do we know)?
• Discussion of primary surveillance data
  • Availability
  • Experimental results about value
• What supplementary surveillance data do we need to collect? (e.g., spatial, temporal, census, weather)
Overly Simple Answer
• It is already known: exactly the data public health departments collect at present.
  Yes, but they sometimes miss outbreaks or detect them late.
• Answer: We just need to speed up the collection and processing of the data.
  Yes, but maybe the data are inherently late. Plus there is still the problem of undetected outbreaks.
• Also, there may just be a better way. There may be highly useful data that they just do not have the infrastructure to collect.
More Complicated Answer
[Figure 2.2. Framework of Analysis. Boxes: First Principle Analysis (Figure 2.2); Health Behavior Literature Review (Figure 2.3); Review of 66 Systems (Report 1 Chapter 3); CDC Case Definitions (Appendix A Table A.2); Outbreak Review (Report 1 Chapter 4); Universe of Data Elements (Appendix A, Table A.1); Combined elements, grouped by time of availability (Table 2.4, Column 2); Information systems (Table 2.4, Column 3).]
More Complicated Answer
• Analysis of data routinely collected by public health;
• First principles analysis of early detection
• Using CDC case definitions;
• Analysis of data used in recognition and characterization of 57 recent outbreaks;
• Review of the literature on health psychology, especially the subliterature relevant to behaviors of ill individuals between the onset of symptoms and presentation (if ever) for medical care; and first principles analysis of various detection strategies.
Grouping Data by Time of Availability
• Pre-outbreak data
  • Data obtained during the period prior to the release of a biologic agent.
  • E.g., intelligence or host factors such as vaccinations that determine susceptibility.
• Attack, release, or exposure data
  • Data obtained at or very near the time of release.
  • E.g., biosensor arrays, police reports of observed explosions, unauthorized airplane flights.
• Pre-symptomatic data (incubation period data)
  • Data obtained between the time of release of an agent and the recognition of first symptoms in people.
  • E.g., serology or cultures from pre-symptomatic individuals from enhanced screening.
• Early symptom data
  • Data from the period between the onset of symptoms and when the illness becomes more fully developed.
  • E.g., diarrheal or upper respiratory symptoms, sales of over-the-counter cold medicines.
• Specific syndrome data
  • Data that either singly or in combination strongly suggest a specific agent.
  • E.g., specific symptoms, vital signs, physical findings, laboratory results, radiology results.
• Definitive data
  • Data that are sufficient on their own to conclude that a patient has a disease.
  • E.g., microbiology culture or autopsy reports.
Literature on Care-seeking Behavior and Health Psychology Zeng, Wagner et al. JAMIA 2002.
Data Currently Collected by Public Health Surveillance Systems (conventional surveillance data) • Reportable diseases • Sentinel physicians • Reports from astute clinicians • Results of enhanced surveillance or contact testing
Data Used During Outbreak Investigations • Data mentioned in CDC Case Definitions • Data items from case investigation forms • Data items mentioned in MMWR and other published reports as being pivotal in the initial detection
Universe of Data Elements • Table A.1 in Appendix A lists all data elements identified by our five methods. This list is comprehensive and contains references to sources for the data elements, where relevant. Table A.2 contains the data elements used in actual outbreak detection and in the CDC Case Definitions classified by type of outbreak (water borne, food borne etc.).
Table 2.4. Data and Data Systems for Early Detection
PREOUTBREAK
Data:
• Environmental conditions favorable for outbreaks: vegetation, climate, sea surface temperature, cloudiness, rainfall
• Information about susceptibility of population (host): immunization information
• Information about outbreaks in other regions: outbreak report from WHO
• Emergence of new infections in other regions: keywords in the Internet and electronic reports related to or indicative of outbreaks in other areas
• Information about pathogens and their occurrence in the environment: antimicrobial resistance patterns; routine testing of food and water supplies; monitoring temporal and geographic patterns of specific viruses in order to track conditions likely to cause future outbreaks (e.g., looking for serotypes of influenza likely to spread during the next year)
• Information about animals: avian morbidity and mortality; captive or free ranging sentinel animals; monitoring diseases that animals can transmit to humans, including animals which will eventually be distributed to consumers in the form of pets or food
Information systems:
• Satellite systems
• Meteorological data systems
• Immunization registries
• Public health systems
• Veterinary systems
• Food and water monitoring
• Information retrieval systems
Outline (and main points)
• Which data are required (and how do we know)?
• Discussion of primary surveillance data
  • Availability
  • Experimental results about value
• What supplementary surveillance data do we need to collect? (e.g., spatial, temporal, census, weather)
Availability • Easier to know • Methods: phone interviews with industries (hospitals, 911 services, pharmacies, schools …)
Value and Relative Importance • Much much harder to know • Methods • Observational studies of real outbreaks • Studies of what individuals do when sick with different diseases (what they buy, who they call …)
Clinical data are highly relevant to public health surveillance. Clinicians and health systems are a primary point of data collection about the sick, including data about demographics, risk factors, symptoms, signs, special testing, and diagnoses.
Where are the Clinical Data: Types of Clinical Data Systems • Paper charts • HL7 Message Routers • Registration, Scheduling, and Billing Systems • Clinical Laboratory Systems • Radiology Systems • Pathology • Dictation • Pharmacy • Orders • Data Warehouses • Clinical Event Monitors • Point-of-Care Systems • Patient Web Portals and Call Centers
Laboratory Results and Electronic Lab Reporting
• There is no need to prove the value of laboratory results for public health surveillance
• The main issue is getting them
• Studies of ELR of culture-proven notifiable diseases
  • Hawaii (1999)
  • Pittsburgh (Panackal et al. EID 2002)
• Findings (and methods) similar
  • Quicker
  • More complete reporting
Other-than-Culture-Proven Diseases
• New approach (e.g., PA NEDSS) is “form on line”
• No published comparisons of the completeness of traditional paper-based approaches versus form-on-line.
• Word of mouth is that more cases are being reported, but whether that is true, whether it is persistent, and whether it is due to fear of disease or to the system all need to be teased out.
• Timeliness also needs to be studied.
Chief Complaints • Chief complaints entered by a triage nurse upon admission to an emergency facility are available electronically from hospitals in the United States. (Paper in AMIA 2003 Proceedings) • ICD-9 coded versus free text • How to group into syndromic categories is a major major question • There exist several categorizations (CDC consensus, RODS, WRAIR)
Detection Performance from ICD-9 coded Chief Complaints
• Respiratory Case Detection (Espino 2001): Sensitivity 0.43, Specificity >95%
• Respiratory Outbreak Detection (Tsui 2001): Small sample, 1/1 detected, 1 false alarm
• Diarrhea Case Detection (Ivanov 2002): Similar results to Espino
• Main points
  • These studies provide methods
  • More studies needed of more syndromes and more outbreaks
Using Free text Chief Complaints (and Natural Language Processing)
Also needed for Web or call center queries, and radiographs
[Diagram: “cough” → NLP → “respiratory prodrome”]
CoCo Naive Bayesian Parser
Maps free-text chief complaint to one of seven prodrome categories (or an eighth category, none)
Chief complaint: “N/V/D”
CoCo Naive Bayes Classifier:
  P(Respiratory|NVD) = .05
  P(Botulinic|NVD) = .001
  P(Constitutional|NVD) = .01
  P(GI|NVD) = .9
  P(Hemorrhagic|NVD) = .001
  P(Neurologic|NVD) = .001
  P(Rash|NVD) = .001
  P(None|NVD) = .036
Output: GI prodrome
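To make the idea of a naive Bayes chief-complaint classifier concrete, here is a minimal sketch. It is not CoCo's implementation: the training examples, the tokenizer, and the add-one smoothing are all assumptions made for illustration, and only four of the eight categories appear.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set (NOT CoCo's real training data).
TRAINING = [
    ("cough", "Respiratory"),
    ("shortness of breath", "Respiratory"),
    ("n/v/d", "GI"),
    ("vomiting diarrhea", "GI"),
    ("rash", "Rash"),
    ("ankle pain", "None"),
]

def tokenize(text):
    """Lowercase and split on whitespace and slashes (an assumed tokenizer)."""
    return text.lower().replace("/", " ").split()

def train(examples):
    """Count classes and per-class words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def posterior(text, model):
    """Return P(category | text) for every category, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)  # log prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

model = train(TRAINING)
probs = posterior("N/V/D", model)
print(max(probs, key=probs.get))
```

With this toy data the most probable category for “N/V/D” is GI, mirroring the shape of the slide's output; the actual probabilities depend entirely on the training set.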
Validation: CoCo Naïve Bayes vs. UDOH Manual ED log review
Sensitivity, specificity and likelihood ratio positive (LR+) measurements for the CoCo classifier using the Utah Department of Health emergency department gold standard.

CoCo Syndrome      UDOH Syndrome                       Sensitivity   Specificity   LR+
Respiratory        Respiratory infection with fever*   0.52          0.89          5
Gastrointestinal   Gastroenteritis without blood       0.71          0.90          7
Encephalitic       Meningitis / encephalitis           0.47          0.93          7
Rash               Febrile illness with rash*          0.50          0.99          56
Botulinic          Botulism-like syndrome              0.17          0.998         104

*Required documentation of fever in the patient record.
Courtesy Per Gesteland, MD
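The LR+ column follows from the standard definition LR+ = sensitivity / (1 - specificity). The sketch below recomputes it from the rounded values shown in the table; the function and variable names are illustrative. For the first three rows the result rounds to the reported value. For Rash and Botulinic the recomputed values (about 50 and 85) differ from the reported 56 and 104, presumably because the reported figures were derived from unrounded sensitivities and specificities; that explanation is an assumption.

```python
def lr_positive(sensitivity, specificity):
    """Likelihood ratio positive: how much a positive result raises the odds of disease."""
    return sensitivity / (1.0 - specificity)

# (sensitivity, specificity) as printed in the table above.
ROWS = {
    "Respiratory": (0.52, 0.89),
    "Gastrointestinal": (0.71, 0.90),
    "Encephalitic": (0.47, 0.93),
    "Rash": (0.50, 0.99),
    "Botulinic": (0.17, 0.998),
}

for syndrome, (sens, spec) in ROWS.items():
    print(f"{syndrome}: LR+ = {lr_positive(sens, spec):.1f}")
```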
Detecting Respiratory Outbreaks by Monitoring Chief Complaints
[Figure: 7 years of data. Series: hospital P&I diagnoses; respiratory chief complaints. Vertical axis: SDs from mean. Ivanov and Gesteland.]
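Expressing a count as "SDs from mean" can be sketched as follows. The baseline counts and the alarm threshold of 2 SDs are assumptions chosen for illustration, not values from the study.

```python
from statistics import mean, stdev

def sds_from_mean(baseline, value):
    """Number of sample standard deviations that `value` lies above the baseline mean."""
    sd = stdev(baseline)
    if sd == 0:
        raise ValueError("baseline has no variation; SDs from mean is undefined")
    return (value - mean(baseline)) / sd

def is_alarm(baseline, value, threshold=2.0):
    """Flag a count that exceeds the baseline mean by more than `threshold` SDs."""
    return sds_from_mean(baseline, value) > threshold

# Hypothetical daily counts of respiratory chief complaints.
baseline_counts = [10, 12, 11, 9, 13, 10, 12]
print(sds_from_mean(baseline_counts, 18), is_alarm(baseline_counts, 18))
```

A real system would also need to account for seasonality and day-of-week effects, which this sketch ignores.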
Detecting Respiratory Outbreaks in Children by Monitoring Chief Complaints
Detection from CCs precedes that from admissions by 9 days (95% CI -5 to 23)
Kids respiratory (lower respiratory infections): pneumonia, influenza, bronchiolitis, bronchitis
Detecting GI Outbreaks in Children by Monitoring Chief Complaints
Detection from CCs precedes that from admissions by 23 days (95% CI 12 to 33)
gastroenteritis, rotavirus
WHICH IS BETTER, ICD-9 or FREE TEXT? At detecting Cases of Acute Infectious GI (Ivanov, Wagner, Chapman)
WHICH IS BETTER, ICD-9 or FREE TEXT? At detecting Acute Lower Respiratory Illness from Chief Complaints (Espino, Wagner, Dowling, Chapman)
Chest Radiograph Reports
• Radiologists dictate a report for most chest radiographs performed in the United States.
• Reports are transcribed after dictation and are available electronically with a latency of twelve to twenty-four hours.
• The reports describe specific findings important for detection of infectious diseases of the lower respiratory tract such as SARS, plague, tularemia, and inhalational anthrax.
• The granularity of the information is quite specific and allows for detection of different patterns of pneumonia, pleural effusions, and mediastinal widening.
• The data are identified at the level of the individual patient and can therefore be pinpointed to home location and correlated with other patients to detect clusters of cases.
Detecting Febrile Illness • Coded temperature (Possibly best, but rarely recorded electronically and may be normal) • From NLP of chief complaints • By NLP of Emergency Department (ED) dictation • Sensitivity = 0.98 • Specificity = 0.89 • ~1 day delay
Individual SARS Case Detection
1. Cough or other respiratory symptom
2. Temperature > 38 C
3. Chest x-ray showing pneumonia or ARDS
4. High risk of exposure
[Diagram nodes: SARS; Respiratory symptom; Travel to Asia; Fever; Positive chest x-ray]
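The four criteria can be sketched as a simple rule. The slide does not say how the criteria are combined, so requiring all four at once is an assumption, and the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    respiratory_symptom: bool       # criterion 1: cough or other respiratory symptom
    temperature_c: float            # criterion 2: temperature in degrees Celsius
    cxr_pneumonia_or_ards: bool     # criterion 3: chest x-ray shows pneumonia or ARDS
    high_exposure_risk: bool        # criterion 4: e.g., relevant travel history

def meets_sars_criteria(p: Patient) -> bool:
    """True only if all four criteria hold (conjunction is an assumption)."""
    return (
        p.respiratory_symptom
        and p.temperature_c > 38.0
        and p.cxr_pneumonia_or_ards
        and p.high_exposure_risk
    )

print(meets_sars_criteria(Patient(True, 38.6, True, True)))
```

In an automated setting each field would itself be the output of an upstream detector (chief-complaint NLP, coded temperature, radiograph-report NLP), each with its own error rate.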
Summary of an Automatic SARS “Syndromic” Strategy (A stretch!)
Lab Test Ordering • Motivation: What if you saw a large number of blood culture orders for people with home addresses in one zip code? • Availability from national laboratory companies (maybe 10-20% coverage of all tests done, perhaps less for infectious disease testing which is done in hospitals) • Barriers: need standards • Demonstrated value: no published studies!
Table 4.1. Clinical systems, data, and market penetration (estimated) Legend: ED, emergency department; LTCF, long term care facility; -, not applicable; ?, unknown
Take Home Message: OTC
• Availability is better and more fully proven than for any other data type because of the National Retail Data Monitor Project
• Value is also better understood than for any other unconventional type of data because of research, although there is still a lot to do
National Retail Data Monitor: How it Works
• OTC products are UPC bar coded
• Retail stores scan purchases
• Seven chains (18,000 stores) agreed to send daily sales data
• NRDM groups the UPC-level sales data into categories like “cough syrup, pediatric liquid”
• NRDM makes data available to health departments via
  • Web interface: 200+ accounts/33 States
  • Raw data feeds:
    • New York State, New York City, National Capital Area (MD, VA, DC), CDC, New Jersey, Georgia
    • Indiana and Norfolk under development
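The grouping step, from UPC-level sales to daily category counts, can be sketched as below. This is not NRDM code: the UPC values, the record layout, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical UPC-to-category lookup; real UPCs and assignments are not shown here.
UPC_TO_CATEGORY = {
    "000000000001": "Cough Syrup Pediatric Liquid",
    "000000000002": "Cough Syrup Pediatric Liquid",
    "000000000003": "Diarrhea Remedies",
}

def daily_category_counts(sales):
    """Sum unit sales per (date, category) from (date, store_id, upc, units) records."""
    totals = defaultdict(int)
    for date, store_id, upc, units in sales:
        category = UPC_TO_CATEGORY.get(upc)
        if category is None:
            continue  # product is not in any monitored category
        totals[(date, category)] += units
    return dict(totals)

sales = [
    ("2003-11-08", "store-A", "000000000001", 3),
    ("2003-11-08", "store-B", "000000000002", 2),
    ("2003-11-08", "store-A", "000000000003", 4),
    ("2003-11-08", "store-A", "999999999999", 7),  # unmonitored product, ignored
]
print(daily_category_counts(sales))
```

A production version would presumably also aggregate by geography (for example store zip code) so that health departments can view their own jurisdiction.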
OTC Product Categories
• There are approximately 7500 products (UPC codes) used for self-treatment of infectious diseases
• We group them into 18 analytic classes at present (“categories”)
  • Antifever Pediatric (274)
  • Antifever Adult (1340)
  • Bronchial Remedies (43)
  • Chest Rubs (78)
  • Diarrhea Remedies (165)
  • Electrolytes Pediatric (75)
  • Hydrocortisones (185)
  • Thermometer Pediatric (125)
  • Thermometer Adult (313)
  • Cold Relief Adult Liquid (709)
  • Cold Relief Adult Tablet (2467)
  • Cold Relief Pediatric Liquid (323)
  • Cold Relief Pediatric Tablet (74)
  • Cough Syrup Adult Liquid (592)
  • Cough Syrup Adult Tablet (32)
  • Cough Syrup Pediatric Liquid (24)
  • Nasal Product Internal (371)
  • Throat Lozenges (364)
Numbers in parentheses are the number of UPC codes in the category
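As a quick consistency check on the list above, the per-category UPC counts can be tallied; the dictionary below simply transcribes the slide, and the variable names are illustrative. The counts cover 18 categories and sum to 7554, in line with the "approximately 7500" stated above.

```python
# UPC counts per category, transcribed from the slide.
CATEGORY_UPC_COUNTS = {
    "Antifever Pediatric": 274,
    "Antifever Adult": 1340,
    "Bronchial Remedies": 43,
    "Chest Rubs": 78,
    "Diarrhea Remedies": 165,
    "Electrolytes Pediatric": 75,
    "Hydrocortisones": 185,
    "Thermometer Pediatric": 125,
    "Thermometer Adult": 313,
    "Cold Relief Adult Liquid": 709,
    "Cold Relief Adult Tablet": 2467,
    "Cold Relief Pediatric Liquid": 323,
    "Cold Relief Pediatric Tablet": 74,
    "Cough Syrup Adult Liquid": 592,
    "Cough Syrup Adult Tablet": 32,
    "Cough Syrup Pediatric Liquid": 24,
    "Nasal Product Internal": 371,
    "Throat Lozenges": 364,
}

n_categories = len(CATEGORY_UPC_COUNTS)
total_upcs = sum(CATEGORY_UPC_COUNTS.values())
print(n_categories, total_upcs)
```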
Detecting Cryptosporidium from Sales of OTC Diarrhea Remedies
• Diarrhea remedies = {Kaopectate, Imodium, Pepto}
• Stirling et al*
  • Large, waterborne outbreak of Cryptosporidium in late March/April 2001
  • Convenience sample of three pharmacies in North Battleford, Saskatchewan
  • Approximately 5-fold increase in all three pharmacies (relative to baseline established from Jan 2001 to early March 2001)
  • Two pharmacies provided March/April 2000 data and those data showed no similar increase
  • Sales peaked weeks before precautionary drinking water advisory and days to weeks before peak onset of diarrhea
*Stirling R, Aramini J, Ellis A, et al. Waterborne cryptosporidiosis outbreak, North Battleford, Saskatchewan, Spring 2001. Can Commun Dis Rep. Nov 15 2001;27(22):185-192.
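A fold increase relative to a baseline period can be computed as the observed sales divided by the mean of the baseline sales. The sketch below uses made-up weekly unit sales; none of the numbers come from the Stirling et al. study.

```python
from statistics import mean

def fold_increase(observed, baseline):
    """Ratio of an observed sales figure to the mean of the baseline period."""
    base = mean(baseline)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return observed / base

# Hypothetical weekly unit sales of diarrhea remedies at one pharmacy.
baseline_weeks = [9, 11, 10]   # baseline period
outbreak_week = 50             # week during the outbreak
print(fold_increase(outbreak_week, baseline_weeks))
```

Comparing against the same weeks in a prior year, as two of the pharmacies allowed, helps rule out a seasonal explanation for the increase.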
2001 Crypto in North Battleford … Precautionary water advisory issued on 4/26 Detectable peak on 4/2 in sales of over-the-counter antidiarrheals
Detecting Crypto from Sales of OTC Antidiarrheal (cont)
• Rodman et al*
  • Cryptosporidium outbreak in Collingwood, Ontario Feb/March 1996
  • 3/12 pharmacies that were asked gave data
  • Pharmacy 1: 26-fold increase in sales in Feb 1996 as compared to February 1995
  • Pharmacy 2: 1Q 1996 sales were 3-fold 1Q 1995
  • Pharmacy 3: Reported no change in sales
  • Outbreak detected 3/5
• Yet another Cryptosporidium outbreak in Canada (Kelowna and Cranbrook, British Columbia)
  • All pharmacists (10-12 of them in each city) interviewed acknowledged increased sales (but there were no data available for study)
*Rodman JS et al. Pharmaceutical sales: A method of disease surveillance. Journal of Environmental Health, Nov 1997:8-14.
Detecting Crypto from Sales of OTC Antidiarrheal (cont)
• Proctor et al**
  • Studied the famous 1993 Milwaukee Cryptosporidium outbreak
  • One pharmacy in outbreak area provided monthly counts of unit sales
  • Sales for month of March showed three-fold increase over baseline (March 1994/March 1995)
  • Public health awareness of outbreak: April 5
  • Identified need for knowledge of geographic distribution of water supply to improve outbreak detection (North Milwaukee vs. South Milwaukee)
**Proctor et al. Surveillance data for waterborne illness detection: an assessment following a massive waterborne outbreak of Cryptosporidium infection. Epidemiol Infect. 1998;120(1):43-54.