Using Weather Data to Improve Detection of Aerosol Releases William Hogan, MD, MS
Wind direction 2 days ago
Bad influenza outbreak, or anthrax?
Need to look at areas outside the “linear” pattern to determine whether they also have an increased level of disease
Scan Statistics • Already described by Dr. Moore • Rectangular regions • Compare alignment of regions to recent wind directions
Goal: Automate this type of outbreak detection Meselson M et al. The Sverdlovsk Anthrax Outbreak of 1979. Science, 1994;266(5188):1202-1208.
How to Get There
• Input data: Weather data; Surveillance data
• Model (Detection Algorithm): “Inverse” Dispersion Model + Model of disease
• Output of model: Was there a release? Location of release? (in 3 dimensions) Amount of substance released?
• There are two pieces to build!
Traditional Dispersion Problem
• Inputs: Meteorological data; Amount of substance released; Location of release (in 3 dimensions); Physical characteristics of substance
• Model: Dispersion Model
• Output: Atmospheric concentration of substance at any given downwind location (see the sketch below)
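To make the forward problem concrete, here is a minimal sketch of the standard Gaussian plume equation, assuming a steady release, reflection off the ground, and a crude power-law parameterization of the dispersion coefficients. It is illustrative only: the function and parameter names are not from the talk, and real dispersion models derive the coefficients from atmospheric stability classes.

```python
import math

def gaussian_plume_concentration(q, u, x, y, z, h, a=0.08, b=0.06):
    """Steady-state Gaussian plume concentration (mass per cubic meter).

    q : release rate (g/s)        u : wind speed (m/s)
    x : downwind distance (m)     y : crosswind distance (m)
    z : receptor height (m)       h : effective release height (m)
    a, b : illustrative coefficients giving sigma_y = a*x and sigma_z = b*x
           (real models derive these from atmospheric stability classes).
    """
    if x <= 0:
        return 0.0  # receptor is not downwind of the source
    sigma_y, sigma_z = a * x, b * x
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Vertical term includes reflection off the ground (image source at -h).
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2)) +
                math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical

# Example: ground-level, centerline concentration 2 km downwind of a
# 1 g/s release at 10 m height in a 5 m/s wind.
print(gaussian_plume_concentration(q=1.0, u=5.0, x=2000, y=0, z=0, h=10))
```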
Detection Problem
• Inputs: Meteorological data; Distribution of cases; Physical characteristics of substance
• Model: “Inverse” Dispersion Model + Model of disease
• Outputs: Was there a release? Location of release? (in 3 dimensions) Amount of substance released?
“Inverse” Gaussian Plume
• Input: Atmospheric concentration at n points (usually n = 4)
• “Inverse” Gaussian Plume Model: heuristic search over a large space of possible release parameters (see the sketch below)
• Output: “Best” set of release parameters
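The “inverse” step can be pictured as a search over candidate release parameters that minimizes the mismatch between predicted and observed concentrations. The sketch below uses plain random search as a stand-in for the heuristic search named on the slide, and for brevity searches only the release rate and effective height (the real problem also includes the release location in three dimensions); the parameter bounds and names are illustrative assumptions.

```python
import math
import random

def plume(q, u, x, y, z, h, a=0.08, b=0.06):
    """Same illustrative Gaussian plume forward model as the earlier sketch."""
    if x <= 0:
        return 0.0
    sy, sz = a * x, b * x
    return (q / (2 * math.pi * u * sy * sz)
            * math.exp(-y ** 2 / (2 * sy ** 2))
            * (math.exp(-(z - h) ** 2 / (2 * sz ** 2)) +
               math.exp(-(z + h) ** 2 / (2 * sz ** 2))))

def invert_release(observations, wind_speed, n_trials=100_000, seed=0):
    """Random search for the "best" release rate q and effective height h.

    observations : list of ((x, y, z), measured_concentration) pairs, with
                   receptor coordinates already rotated into the wind frame.
    Returns the (q, h) candidate minimizing squared error against the
    observations, together with that error.
    """
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(n_trials):
        q = rng.uniform(1e-3, 1e3)    # candidate release rate (g/s)
        h = rng.uniform(0.0, 200.0)   # candidate release height (m)
        err = sum((plume(q, wind_speed, x, y, z, h) - c) ** 2
                  for (x, y, z), c in observations)
        if err < best_err:
            best, best_err = (q, h), err
    return best, best_err

# Example: n = 4 synthetic observations generated from a known release,
# then (approximately) recovered by the search.
true_q, true_h, u = 50.0, 30.0, 5.0
receptors = [(1000, 0, 0), (2000, 100, 0), (3000, -150, 0), (4000, 50, 0)]
obs = [((x, y, z), plume(true_q, u, x, y, z, true_h)) for x, y, z in receptors]
print(invert_release(obs, wind_speed=u))
```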
Inverse Gaussian Plume - Performance • Differences between actual and found values on randomly selected release scenarios:
Disease Model • Aerosol • Background level of disease
Anthrax Disease Model
• Anthrax aerosol (depends on # spores inhaled and time t) → P(resp CC due to anthrax)
• Background respiratory disease (depends on day of week, zip code, month of year) → P(resp CC due to background)
• Combined to model today’s count of resp CCs in zip code
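One minimal way to express a model of this kind is as a likelihood ratio between a “background only” hypothesis and a “background plus anthrax aerosol” hypothesis for today’s count of respiratory chief complaints (CCs) in a zip code. The sketch below assumes Poisson counts; the background rate (conditioned on day of week and month) and the expected anthrax-attributable cases (which would come from the dispersion and dose-response models upstream) are hypothetical inputs, and this is not the actual model from the talk.

```python
from math import exp, lgamma, log

def poisson_log_pmf(k, lam):
    """log P(K = k) for a Poisson(lam) count."""
    return k * log(lam) - lam - lgamma(k + 1)

def anthrax_likelihood_ratio(observed_count, background_rate, expected_anthrax_cases):
    """Likelihood ratio for "release" vs. "background only".

    observed_count         : today's count of respiratory CCs in the zip code
    background_rate        : expected background CCs (e.g., conditioned on
                             day of week and month of year)
    expected_anthrax_cases : expected anthrax-attributable CCs, derived from the
                             inferred release (spores inhaled, time t since
                             release); a hypothetical input here
    """
    log_lr = (poisson_log_pmf(observed_count, background_rate + expected_anthrax_cases)
              - poisson_log_pmf(observed_count, background_rate))
    return exp(log_lr)

# Example: 30 respiratory CCs observed where ~18 are expected from background
# and the release hypothesis predicts ~10 additional cases.
lr = anthrax_likelihood_ratio(observed_count=30, background_rate=18.0,
                              expected_anthrax_cases=10.0)
print(lr)  # values above 1 favor the release hypothesis
```

A likelihood-ratio threshold (such as the value of 1 mentioned in the preliminary results) would then be applied to this quantity to decide whether to alarm.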
Preliminary Results
• False positives
  • Ran on 7 months of data not used for training
  • No false positives with a threshold (likelihood ratio) of 1
• Simulated aerosol anthrax release: results reported as differences from the actual release values
Algorithm Evaluation William Hogan, MD, MS
Algorithm Evaluation • Performance measurements • Datasets for evaluation • Additional Considerations • Examples
Performance Measurements • Timeliness • False alarm rate • Sensitivity
Timeliness • Relatively new metric for evaluating performance • If two algorithms have identical sensitivity and false alarm rates, then: The algorithm that detects outbreaks earlier is better • Outbreak detection falls into a class of problems known as “activity monitoring”: Activity Monitoring Operating Characteristic (AMOC) analysis* *See Fawcett T, Provost F. Activity monitoring: Noticing interesting changes in behavior. In Proc. Fifth International Conference on Knowledge Discovery and Data Mining, pages 53--62, 1999.
Characteristics of Activity Monitoring Domains • Goal is to identify in a timely fashion that positive activity (e.g. outbreak) has begun • Goal is NOT to classify each individual observation or data point as representing positive or negative activity • Alarms after the first alarm may add no value • No fixed notion of true negative
False Alarm Rate • Do not need datasets that include outbreaks of interest • Procedure: • Run algorithm on time series with no outbreaks • Count false alarms • Divide by length of time series
Sensitivity • Fraction (or percentage) of outbreaks detected • NOT the fraction (or percentage) of data points correctly classified as occurring during the outbreak
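For concreteness, here is a generic sketch of how the three measurements above could be computed from an algorithm’s alarm days and the known outbreak periods of an evaluation dataset. It follows the activity-monitoring convention that only the first alarm within an outbreak contributes to sensitivity and timeliness; the function and data layout are assumptions, not any specific evaluation harness.

```python
def evaluate_detector(alarm_days, outbreaks, n_days):
    """Compute sensitivity, false alarm rate, and timeliness.

    alarm_days : sorted list of days on which the algorithm alarmed
    outbreaks  : list of (start_day, end_day) intervals (inclusive)
    n_days     : total length of the evaluation time series, in days
    """
    detected, delays = 0, []
    for start, end in outbreaks:
        hits = [d for d in alarm_days if start <= d <= end]
        if hits:                      # only the first alarm in an outbreak matters
            detected += 1
            delays.append(hits[0] - start)

    in_any_outbreak = lambda d: any(s <= d <= e for s, e in outbreaks)
    false_alarms = sum(1 for d in alarm_days if not in_any_outbreak(d))

    return {
        "sensitivity": detected / len(outbreaks),       # fraction of outbreaks detected
        "false_alarm_rate": false_alarms / n_days,      # false alarms per day
        "median_timeliness": sorted(delays)[len(delays) // 2] if delays else None,
    }

# Example: 2 of 3 outbreaks detected and one false alarm over a 365-day series.
print(evaluate_detector(alarm_days=[40, 101, 250],
                        outbreaks=[(100, 110), (200, 210), (248, 260)],
                        n_days=365))
```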
Typical AMOC Plot: comparison of Standard, WSARE 2.0, WSARE 2.5, and WSARE 3.0
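A plot like this can be generated by sweeping the alarm threshold: each threshold turns the detector’s daily scores into alarms, yielding one (false alarm rate, detection delay) point. The sketch below is a generic illustration with hypothetical score data; it is not the WSARE evaluation code, and the penalty for undetected outbreaks is an assumption.

```python
def amoc_points(daily_scores, outbreaks, thresholds):
    """One (false alarm rate, mean detection delay) point per threshold.

    daily_scores : detector output for each day (higher = more anomalous)
    outbreaks    : list of (start_day, end_day) intervals (inclusive)
    thresholds   : alarm thresholds to sweep
    """
    n_days = len(daily_scores)
    in_outbreak = lambda d: any(s <= d <= e for s, e in outbreaks)
    points = []
    for t in thresholds:
        alarms = [d for d, s in enumerate(daily_scores) if s >= t]
        false_alarms = sum(1 for d in alarms if not in_outbreak(d))
        delays = []
        for start, end in outbreaks:
            hits = [d for d in alarms if start <= d <= end]
            # Undetected outbreaks are penalized with the maximum possible delay.
            delays.append(hits[0] - start if hits else end - start + 1)
        points.append((false_alarms / n_days, sum(delays) / len(delays)))
    return points

# Example with a hypothetical 30-day score series containing one outbreak.
scores = [0.1] * 30
scores[12:18] = [0.3, 0.6, 0.9, 0.8, 0.7, 0.4]   # outbreak on days 12-17
for far, delay in amoc_points(scores, outbreaks=[(12, 17)], thresholds=[0.25, 0.5, 0.85]):
    print(f"false alarm rate={far:.3f}/day, mean delay={delay:.1f} days")
```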
Evaluation Datasets • Real outbreaks • Hard to get enough to achieve statistical significance • Getting datasets from real outbreaks is difficult • Seasonally recurring respiratory & gastrointestinal illness • Not as specific as single disease • May not be a good proxy for outbreaks we want to detect • Simulated outbreaks added to real baseline data (semisynthetic datasets) • Use epidemic curves from real outbreaks of interest to create shape of outbreak • Simulate entire outbreak (e.g., aerosol anthrax release)
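A semisynthetic dataset of the third kind can be built by scaling an epidemic-curve shape (e.g., one fitted to a real outbreak) and adding it to real baseline counts. The sketch below is schematic: the baseline series, curve shape, and noise model are all illustrative assumptions, not the dataset-generation code used in these evaluations.

```python
import random

def inject_outbreak(baseline_counts, shape, start_day, total_excess_cases, seed=0):
    """Add a simulated outbreak to real baseline counts (semisynthetic data).

    baseline_counts    : real daily counts (e.g., OTC sales or ED visits)
    shape              : epidemic-curve shape (nonnegative weights, normalized here),
                         e.g., taken from a curve fitted to a real outbreak
    start_day          : index at which the injected outbreak begins
    total_excess_cases : total number of extra cases to spread over the shape
    """
    rng = random.Random(seed)
    total = sum(shape)
    injected = list(baseline_counts)
    for offset, w in enumerate(shape):
        day = start_day + offset
        if day >= len(injected):
            break
        expected = total_excess_cases * w / total
        # Jitter each day's excess so repeated injections differ slightly.
        injected[day] += max(0, round(rng.gauss(expected, expected ** 0.5)))
    return injected

# Example: inject a 7-day triangular outbreak of ~100 excess cases at day 50.
baseline = [20 + (d % 7) for d in range(120)]        # hypothetical baseline counts
shape = [1, 2, 3, 4, 3, 2, 1]
semisynthetic = inject_outbreak(baseline, shape, start_day=50, total_excess_cases=100)
print(semisynthetic[48:60])
```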
Determining Shape From Real Outbreak • Fit curve using expectation-maximization algorithm • Peak of fitted curve: 4.6 standard deviation increase
Simulate Entire Outbreak • Models of atmospheric dispersion (aerosol) • Disease/population models • Models of illness behavior
Additional Considerations • Vary size and duration of epidemic curve to understand limits of detectability • Include effects of percent coverage • Sensitivity analysis over assumptions of simulation models is crucial
Taking Into Account Coverage
• North Battleford curve derived from one pharmacy
• Naïve approach: add a 4.6 standard deviation increase to the baseline
• Assuming 10 stores, all with the same mean and standard deviation:
  • Mean and variance increase 10-fold
  • Standard deviation increases sqrt(10) times
  • Peak of the curve is thus 14.5 standard deviations (arithmetic below)
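The arithmetic behind the 14.5 figure, assuming the 10 stores have independent counts with identical mean and standard deviation σ:

```latex
% Excess (signal) cases scale with the number of stores; noise scales with its square root.
\text{peak}_{10\ \text{stores}}
  = \frac{10 \times 4.6\,\sigma}{\sqrt{10\,\sigma^{2}}}
  = \frac{10 \times 4.6\,\sigma}{\sqrt{10}\,\sigma}
  = 4.6\sqrt{10} \approx 14.5 \ \text{standard deviations}
```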
Examples of Simulation Model Assumptions • Infectious dose of anthrax spores • Incubation period • Illness behavior - probability that affected individual, at a given time after symptom onset, will: • Purchase OTC • Go to ED • Call primary care MD
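As an illustration of how such assumptions enter a simulation, the sketch below draws one exposed person’s infection status, incubation period, and care-seeking behavior. Every numeric value is a placeholder (not a parameter from the talk or the anthrax literature), and a fuller model would let the care-seeking probabilities vary with time since symptom onset, as the slide notes.

```python
import math
import random

# Illustrative placeholders only: none of these values come from the talk
# or from the anthrax literature.
ID50_SPORES = 8000              # hypothetical dose infecting 50% of those exposed
INCUBATION_MEDIAN_DAYS = 4.0    # hypothetical lognormal incubation parameters
INCUBATION_SIGMA = 0.5
P_OTC, P_ED, P_PCP = 0.4, 0.3, 0.2   # hypothetical care-seeking probabilities

def simulate_person(spores_inhaled, rng):
    """Return (infected, incubation_days, behavior) for one exposed person."""
    # Exponential dose-response, parameterized so that the ID50 infects half.
    p_infection = 1 - 0.5 ** (spores_inhaled / ID50_SPORES)
    if rng.random() >= p_infection:
        return False, None, None
    incubation = rng.lognormvariate(math.log(INCUBATION_MEDIAN_DAYS), INCUBATION_SIGMA)
    r = rng.random()
    behavior = ("purchase OTC" if r < P_OTC
                else "go to ED" if r < P_OTC + P_ED
                else "call primary care MD" if r < P_OTC + P_ED + P_PCP
                else "no care sought")
    return True, incubation, behavior

rng = random.Random(1)
print([simulate_person(5000, rng) for _ in range(3)])
```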
Example 1: DARPA ’03 Algorithm Challenge
• All real data
• Three data sets from 5 cities over 4 years
• Data sets
  • military outpatient visits to MD
  • civilian outpatient visits to MD
  • military prescription drugs
• Cities: Charleston, Louisville, Norfolk, Pensacola, Seattle
• Years
  • Training: 1/1/1999 or 6/1/1999 through 8/31/2002
  • Test: 9/1/2002 through 5/31/2003
DARPA ’03 Challenge (cont)
• Method:
  • An outbreak detection group (public health individuals from every contractor but one, namely us) determined the gold-standard outbreaks
  • Training data given to contractors for ~6 weeks
  • Test data given for 2 weeks, after which results were submitted
  • A Java application scored the results
Pitt/CMU Algorithm Results
• GI syndrome
  • 1 false alarm per 2 weeks: best algorithm wav8ssm_max; median timeliness 1; sensitivity 7/7
  • 1 false alarm per 4 weeks: PANDA1 (change point statistic); median timeliness 1; sensitivity 6/7
  • 1 false alarm per 6 weeks: PANDA1 (change point statistic); median timeliness 1; sensitivity 6/7
• RESP syndrome
  • 1 false alarm per 2 weeks: wav8ssmtwrf_sum; median timeliness 1; sensitivity 8/8
  • 1 false alarm per 4 weeks: wav8ssmtwrf_sum; median timeliness 1; sensitivity 8/8
  • 1 false alarm per 6 weeks:
    • Sick-avail (best sensitivity): median timeliness 6; sensitivity 8/8
    • wav8ssm_max (best timeliness, tie): median timeliness 1; sensitivity 7/8
    • wav8ssmtwrf_max (best timeliness, tie): median timeliness 1; sensitivity 7/8
• Note from the original slide: “Took into account holidays but gold standard may not have”
Limitations of DARPA Eval
• Small n (only 15 outbreaks)
• Natural “outbreaks” studied were:
  • not homogeneous (seasonal respiratory and GI)
  • of uncertain similarity to the outbreaks of interest
  • subjectively determined, albeit by experts
Example 2: Injected North Battleford Outbreak • Plot: sensitivity vs. detection delay (days from start of inject) at false alarm rates of 25%, 10%, 5%, and 2%