Discover how monitoring OTC medication sales can aid early detection of bio-terrorism attacks. The talk covers traditional and non-traditional data sources, the challenges faced in bio-surveillance, and a proposed automated detection system that uses daily OTC sales data, where an outbreak manifests early. The system combines de-noising and forecasting methods and is evaluated against simulated anthrax attack signatures. Future applications and ongoing research for improving bio-surveillance are discussed.
Early Statistical Detection of Bio-Terrorism Attacks by Tracking OTC Medication Sales • Galit Shmueli, Dept. of Statistics and CALD, Carnegie Mellon University • With Stephen Fienberg (Statistics), Anna Goldenberg & Rich Caruana (CS)
Overview • Current bio-surveillance systems • Monitoring traditional data • Using simple SPC methods • Early detection • Use of non-traditional data • Building a flexible, automated detection system • Evaluating the system • Results and enhancements
Traditional Data Sources • Public health sources • School absence records • Sentinel practices • Laboratory data • Medical sources • Patient visits at urgent care, outpatient clinics, emergency rooms • Speed of detection: weeks after the actual occurrence • Rate of data arrival
Why is detection slow? • Data arrives late • Projects using electronic reporting systems: • Influenza surveillance system (U of Utah) • Tracking ICD9 codes (U of Pittsburgh) • Future: increasing availability of electronic means for gathering surveillance data • Data available on weekly or monthly scale • Data are nation-wide • Signature of outbreak in data is late!
Non-Traditional Data • Data that indirectly measure symptoms • Over-the-counter medication and grocery sales • Web browsing at medical websites • Automatic body tracking devices • Different levels of availability • Regional, localized data • Confidentiality issues
Manifestation of Flu in Traditional and Non-Traditional Data • [Figure: series labeled Lab, Flu, WebMD, School, Cough & Cold, Throat, Resp, Viral, Death; time axis in weeks]
OTC Medication and Grocery Sales • Benefits • Manifestation of outbreak is very early • Timeliness in collection and reporting (daily) • Extremely detailed (basket-level) • Drawbacks • No info about epidemic manifestation in sales data • Requires knowledge about marketing efforts (sales, discounts) • If outbreak replicates sales patterns – hard to detect (Holidays are a big challenge) • Hard to model!
Prior Uses of Non-Traditional Data • Diarrheal Disease Surveillance: data from 38 drug stores in NY (Mikol et al., 2000) • Monitoring near-real-time satellite vegetation and climate data for predicting emerging Rift Valley Fever epidemics in East Africa (DoD and NASA, 2001)
Description of Our Data • Daily sales of several OTC medication groups for 541 days, from Aug 8, '99 to Jan 31, '01 • Concentrated on cough & cold medication (inhalational symptoms): • Cough medication • Tabs & Caps • Nasal medication
Hypothetical Scenario of an Inhalational Anthrax Attack • Symptoms: almost all typical of flu! • fever • fatigue • cough • mild chest discomfort • but no runny nose (!) • Death may occur within 24-36 hours
Overview • Current bio-surveillance systems • Non-traditional data • The detection system • An evaluation method • Results and Conclusions • Future work
The Detection System • Take into account special features of OTC and grocery sales data • Time series • Seasonality • Weekday/Weekend effect • Stores closed on certain days • Influence of total sales patterns • Very noisy, non-stationary • Create automated system
Layers of the Detection System • Preprocessing → De-noising → Forecasting next-day sales → Creating a threshold → New day sales: real-time sales > threshold? • NO: wait for the next day's sales • YES: WARNING! POSSIBLE BEGINNING OF AN EPIDEMIC/ATTACK
De-Noising • Target: obtain main features of data, reduce noise to improve predictability • Selected method: Discrete Cosine Transform with horizontal filtering • How much to de-noise? • Retain minimal coefficient set that • Maximizes accuracy • Optimizes predictability • Use cross-validation and MSE-based criteria
De-Noising: DCT with Horizontal Filtering • [Figure: two panels, de-noised set 1 and de-noised set 2]
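A minimal sketch of DCT-based de-noising, using only the Python standard library. The slides do not define "horizontal filtering"; this sketch assumes it means keeping the first `keep` low-frequency coefficients and zeroing the rest. The names `dct`, `idct`, `denoise`, and `keep` are hypothetical, and the cross-validation step that would choose `keep` is not shown.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def denoise(x, keep):
    """Assumed filter: keep the first `keep` coefficients, zero the others."""
    coeffs = dct(x)
    kept = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(kept)


# Illustrative values only, not the sales data from the talk.
sales = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
smoothed = denoise(sales, keep=3)
```

Keeping all coefficients returns the input unchanged, so `keep` directly controls how much detail is discarded. For long series, a library transform such as `scipy.fft.dct` with `norm="ortho"` would be much faster than this quadratic-time version.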
Forecasting • Target: Predict next day sales • Use pre-processed, de-noised data • Problem: non-stationary (ARIMA doesn’t work) • Method: 1) decompose with wavelets 2) predict each wavelet resolution 3) sum to obtain overall prediction
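The three-step method above (decompose, predict each resolution, sum) can be sketched as follows. The slides name neither the wavelet nor the per-resolution predictor, so both choices here are assumptions: a causal Haar "à trous" decomposition, which is additive by construction, and a least-squares AR(1) fit per component. All function names are hypothetical.

```python
def haar_a_trous(x, levels):
    """Additive decomposition: x[t] == smooth[t] + sum(d[t] for d in details)."""
    smooth = list(x)
    details = []
    for j in range(levels):
        lag = 2 ** j
        # Causal average with the value `lag` steps back (clamped at the start).
        nxt = [0.5 * (smooth[t] + smooth[max(t - lag, 0)]) for t in range(len(smooth))]
        details.append([s - n for s, n in zip(smooth, nxt)])
        smooth = nxt
    return smooth, details


def ar1_forecast(y):
    """One-step forecast from a least-squares fit of y[t+1] = a * y[t] + b."""
    prev, nxt = y[:-1], y[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_nxt = sum(nxt) / n
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var == 0.0:
        return y[-1]  # constant component: repeat the last value
    cov = sum((p - mean_prev) * (q - mean_nxt) for p, q in zip(prev, nxt))
    a = cov / var
    b = mean_nxt - a * mean_prev
    return a * y[-1] + b


def forecast_next(x, levels=3):
    """Forecast each resolution separately, then sum the forecasts."""
    smooth, details = haar_a_trous(x, levels)
    return ar1_forecast(smooth) + sum(ar1_forecast(d) for d in details)
```

Because the decomposition is additive, summing the per-component forecasts gives a forecast on the original sales scale without an inverse transform.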
Threshold Selection: SPC • Based on the empirical distribution of residuals (real values - predictions), we fit a “3σ” limit
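A minimal sketch of the threshold step, assuming a one-sided upper limit at the residual mean plus three sample standard deviations. The slides say only that a "3σ" limit is fit to the empirical residual distribution, so the exact form is an assumption, and `upper_limit` and `is_alarm` are hypothetical names.

```python
import statistics


def upper_limit(residuals, k=3.0):
    """Upper control limit on residuals: mean + k * sample standard deviation."""
    return statistics.mean(residuals) + k * statistics.stdev(residuals)


def is_alarm(actual, forecast, residuals, k=3.0):
    """Flag a day whose residual (actual - forecast) exceeds the limit."""
    return (actual - forecast) > upper_limit(residuals, k)


# Illustrative residuals from past days (actual minus forecast).
past_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
alarm_today = is_alarm(actual=110.0, forecast=100.0, residuals=past_residuals)
```

Only the upper side is monitored here because an outbreak is expected to raise sales, not lower them.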
Overview • Current bio-surveillance systems • Non-traditional data • The detection system • An evaluation method • Results and Conclusions • Ongoing work (basket-level data) • Future work
Evaluating the System • How fast does it detect an anthrax footprint? • Problems: • Data do not include an outbreak signature • We don’t know what a signature looks like in such data • Solution: simulated signature • [Figure: inhalational anthrax signature, a spike rising above the base level over days 1 to 3]
Constructing the Signature • Sverdlovsk outbreak, 1979 • Based on data from Meselson et al., Science (1994)
Anthrax Signature in OTC Sales • Add signature at each data point sequentially, and look at rate of detection • Try different slopes, heights • Compare different configurations of system for different signatures • Example: slope = 1/3, detects 100% of spikes within 3 days for height = 1.3 × (data range)
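The evaluation loop described above can be sketched as follows. The signature is assumed to be a linear ramp that rises by `slope` times `height` per day and then stays at `height`; the actual Sverdlovsk-based shape is not given in the slides. The detector is a simple mean-plus-3σ rule on the sales history, a stand-in for the de-noising and forecasting pipeline, and every name here is hypothetical.

```python
import statistics


def ramp_signature(height, slope, days):
    """Assumed signature: added sales rise linearly, capped at `height`."""
    return [height * min(1.0, slope * (d + 1)) for d in range(days)]


def mean_3sigma_detector(history, today):
    """Stand-in detector, not the system from the talk."""
    return today > statistics.mean(history) + 3 * statistics.stdev(history)


def detection_rate(sales, signature, detector, warmup, max_delay):
    """Inject the signature at each start day; count detections within max_delay days."""
    window = min(max_delay, len(signature))
    hits = trials = 0
    for start in range(warmup, len(sales) - len(signature) + 1):
        spiked = list(sales)
        for d, extra in enumerate(signature):
            spiked[start + d] += extra
        trials += 1
        if any(detector(spiked[:start + d], spiked[start + d]) for d in range(window)):
            hits += 1
    return hits / trials


# Illustrative series only, not the sales data from the talk.
sales = [100.0, 102.0, 98.0, 101.0, 99.0] * 6
data_range = max(sales) - min(sales)
signature = ramp_signature(height=1.3 * data_range, slope=1 / 3, days=3)
rate = detection_rate(sales, signature, mean_3sigma_detector, warmup=10, max_delay=3)
```

Running the same loop with a zero-height signature gives the share of start days on which the detector fires without any attack, which serves as a rough false-alarm check for a given configuration.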
Results and Conclusions • The detection system • works with grocery data • detects simulated footprint quickly • has low false alarm rate • The system is flexible (tools are interchangeable) • Almost fully automated, efficient computation • A “perfect bio-attack” would fall on a holiday, when sales patterns make it hardest to detect
Future Work • Combine with traditional medical and public health data sources • Aggregated data: Track several series simultaneously • Basket data: Utilize other features of grocery data such as spatial factor, customer information