The 2008 Artificial Intelligence Competition. Valliappa Lakshmanan National Severe Storms Laboratory & University of Oklahoma Elizabeth E. Ebert Bureau of Meteorology Research Center, Australia Sue Ellen Haupt Penn State University, State College, PA.
The 2008 Artificial Intelligence Competition
Valliappa Lakshmanan, National Severe Storms Laboratory & University of Oklahoma
Elizabeth E. Ebert, Bureau of Meteorology Research Center, Australia
Sue Ellen Haupt, Penn State University, State College, PA
Sponsored by Weather Decision Technologies
lakshman@ou.edu
Why a competition?
• AI committee organizes:
  • Conference with papers
  • Tutorial session before conference (every 2 years)
• The tutorial sessions are very popular, but:
  • They get repetitive
  • Same set of techniques presented too often
  • Often by the same speakers!
  • Not clear what the differences are
    • Different datasets, etc.
  • Can I not just use a machine intelligence or neural network toolbox?
• Purpose of competition is to replace the tutorial but still provide a learning experience
  • Same dataset, different techniques
• Competitive aspect is just a sideshow – don’t put too much stock in it!
The 2008 Artificial Intelligence Competition
• Dataset
• Results
Project 1: Skill Score By Storm Type
• Try to answer this question (posed by Travis Smith):
  Does the skill score of a forecast office, as evaluated by the NWS, depend on the type of storms that the NWS office faced that year?
• Very critical, but hard to answer based on current knowledge
  • Is it the type of weather or is it the forecaster skill?
• Initially, concentrate on tornadoes
• Based on radar imagery, classify the type of storms at every time step
• Take NWS warnings and ground truth information for a lot of cases
• Compute skill scores by type of storm
• Summer REU project
  • Eric Guillot, Lyndon State
  • Mentors: Travis Smith, Don Burgess, Greg Stumpf, V Lakshmanan
Project 2: National Storm Events Database
• Build a national storm events database
  • With high-resolution radar data combined from multiple radars
  • Derived products
  • Support spatiotemporal queries
• Collaboration between NSSL, NCDC and OU (CAPS, CSA)
Approach
• Project 1: How to classify lots and lots of radar imagery?
  • Need automated way to identify storm type
  • Technique:
    • Cluster radar fields
    • Extract storm characteristics for each cluster
    • Associate storm characteristics to human-identified storm type
    • Train learning technique (NN/decision tree) to do this automatically
    • Let it loose on entire dataset
• Project 2: How to support spatiotemporal queries on radar data?
  • Can create polygons based on thresholding data
  • But need to tie together different data sources
  • Need automated way to extract storm characteristics for querying
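The "polygons based on thresholding data" idea can be sketched as connected-component labeling on a thresholded grid. This is only an illustration: the function name `threshold_regions`, the 40 dBZ threshold, and the grid values are assumptions made up for the example, and the regions are returned as cell lists rather than true polygons.

```python
from collections import deque

def threshold_regions(grid, threshold):
    """Label 4-connected regions of cells whose value is >= threshold.

    Returns a list of regions; each region is a sorted list of (row, col) cells.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] < threshold:
                continue
            # Breadth-first flood fill from this unvisited seed cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(sorted(cells))
    return regions

# Hypothetical 4x5 reflectivity grid in dBZ (made-up values).
reflectivity = [
    [10, 45, 50, 10, 10],
    [10, 48, 12, 10, 42],
    [10, 10, 10, 10, 44],
    [10, 10, 10, 10, 10],
]
regions = threshold_regions(reflectivity, threshold=40)
for region in regions:
    print(region)
```

Each region could then be stored with its time stamp and bounding box so that a database can answer "which storms were here, then" queries.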
WDSS-II CONUS Grids
• In real-time, combine data from 130+ WSR-88Ds
  • Reflectivity and azimuthal shear fields
• Use these to derive products:
  • Reflectivity Composite
  • VIL
  • Echo top heights
  • Hail probability (POSH), Hail size estimates (MESH), etc.
  • Low-level, mid-level shear
  • Many others (90+)
• Have the 3D reflectivity and shear products archived
  • Can use these to recreate derived products
Cluster Identification Using K-means
• Hierarchical clustering using texture segmentation and K-means clustering
  • Lakshmanan, V., R. Rabin, and V. DeBrunner, 2003: Multiscale storm identification and forecast. J. Atm. Res., 67, 367-380
• Technique yields 3 different scales of clustering
  • Chose scale D to train the decision tree
  • Cluster attributes at 420 km^2 (scale D) used for our study
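For readers new to K-means, here is a minimal sketch of the basic algorithm (Lloyd's iterations) on scalar pixel values. It is not the hierarchical texture-segmentation method of the cited paper; the function name `kmeans_1d`, the initial centers, and the pixel values are assumptions chosen for illustration.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain K-means (Lloyd's algorithm) on scalar values.

    `centers` is the initial guess; returns (final_centers, labels).
    """
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels

# Made-up reflectivity values (dBZ) falling into three rough intensity bands.
pixels = [5, 8, 12, 30, 33, 35, 52, 55, 58]
centers, labels = kmeans_1d(pixels, centers=[0, 30, 60])
print(centers, labels)
```

In image segmentation the labels would be mapped back onto the grid, and spatially contiguous pixels with the same label would form a candidate cluster.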
Manual Storm Classification
• Manually classified over 1,000 storms over three days' worth of data (March 28th, May 5th, and May 28th of 2007)
• Used all the fields ultimately available to automated algorithm
  • VIL, POSH, MESH, Rotation Tracks, etc.
  • Available in real-time at http://wdssii.nssl.noaa.gov/ over entire CONUS
Hail Case (Apr. 19, 2003; Kansas)
Reflectivity Composite from KDDC, KICT, KVNX and KTWX
Echo Top
Height of echo above 18 dBZ
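As a rough sketch of how two of these derived products relate to the 3D reflectivity grid: the composite is the column maximum, and the echo top is the highest level at which reflectivity reaches 18 dBZ. The function name, the level heights, and the column values are made up, and the sketch assumes no interpolation between levels, which an operational algorithm may well do.

```python
def composite_and_echo_top(column_dbz, heights_km, threshold_dbz=18.0):
    """For one vertical column, return (max reflectivity, echo-top height).

    The echo top is the highest level whose reflectivity is at or above
    the threshold, or None when no level reaches it.
    """
    composite = max(column_dbz)
    echo_top = None
    for dbz, height in zip(column_dbz, heights_km):
        if dbz >= threshold_dbz and (echo_top is None or height > echo_top):
            echo_top = height
    return composite, echo_top

# Hypothetical column: reflectivity (dBZ) at six heights (km).
heights_km = [1, 3, 5, 7, 9, 11]
column = [35, 52, 47, 30, 19, 6]
composite, echo_top = composite_and_echo_top(column, heights_km)
print(composite, echo_top)
```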
MESH
Maximum expected size of hail
VIL
Vertically Integrated Liquid
Cluster Table
• Each identified cluster has these properties:
  • ConvectiveArea in km^2
  • MaxEchoTop and LifetimeEchoTop
  • MESH and LifetimeMESH
  • MaxVIL, IncreaseInVIL and LifetimeMaxVIL
  • Centroid, LatRadius, LonRadius, Orientation of ellipse fitted to cluster
  • MotionEast, MotionSouth in m/s
  • Size in km^2
• One set of clusters per scale
  • We used only the 420 km^2 cluster
Controlling the Cluster Table
• Can choose any gridded field for output
• From gridded field, can compute the following statistics within cluster
  • Minimum value, Maximum value
  • Average, Standard deviation
  • Area within interval (useful to create histograms)
  • Increase in value temporally
    • Does not depend on cluster association being correct
    • Computed image-to-image
  • Lifetime maximum/minimum
    • Depends on cluster association being correct, so better on larger clusters
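The statistics listed above might be computed along these lines. This is a sketch under stated assumptions: the function names, the 1 km^2 cell area, and the VIL values are hypothetical. The image-to-image increase here simply compares the same grid cells in two consecutive grids, and the lifetime maximum relies on a history that tracking has already associated with the cluster.

```python
import statistics

def cluster_stats(values, cell_area_km2, interval):
    """Summary statistics of one gridded field over the cells of a cluster."""
    low, high = interval
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
        # Area of cells with low <= value < high; one histogram bin.
        "area_in_interval_km2": cell_area_km2 * sum(1 for v in values if low <= v < high),
    }

def increase_image_to_image(current_grid, previous_grid, cells):
    """Change in the cluster maximum, using the same cells in both grids."""
    now = max(current_grid[r][c] for r, c in cells)
    before = max(previous_grid[r][c] for r, c in cells)
    return now - before

def lifetime_max(history, current_max):
    """Maximum over earlier frames associated with this cluster, plus now."""
    return max(history + [current_max])

# Hypothetical VIL grids for two consecutive times, and one cluster's cells.
vil_now = [[5, 20], [30, 45]]
vil_prev = [[5, 15], [25, 35]]
cells = [(0, 1), (1, 0), (1, 1)]

values = [vil_now[r][c] for r, c in cells]
stats = cluster_stats(values, cell_area_km2=1.0, interval=(25, 50))
increase = increase_image_to_image(vil_now, vil_prev, cells)
lifetime = lifetime_max([38, 50], stats["max"])
print(stats, increase, lifetime)
```

The split mirrors the slide: the increase needs no tracking, while the lifetime value is only as good as the cluster association that built the history.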
Input Parameters [table spanning three slides; not reproduced]
Types of Storms
• Four categories:
  • Not organized
  • Isolated supercell
  • Convective lines
    • Includes lines with embedded supercells
  • Pulse storms
Decision Tree Training
• Trained decision tree on the manually classified storms to develop a logical process for classifying them automatically
• Tested this decision tree on three additional cases (April 21st of 2007, and May 10th and 14th of 2006)
• TSS = 0.58; good enough for NWS study to continue
Decision Tree
• Why decision tree?
  • Didn’t know whether the dataset was tractable
  • Wanted to be able to analyze resulting “machine”
  • Make sure extracted rules were reasonable
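To illustrate why a decision tree yields rules that can be checked for reasonableness, here is a minimal sketch of how one node of a tree might choose its rule: try each candidate threshold on an attribute and keep the one with the lowest weighted Gini impurity. The attribute values and labels are made up, and this is not the tree or the training software used in the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up training examples: a MaxVIL-like attribute with hand-assigned labels.
max_vil = [5, 8, 12, 40, 45, 60]
storm_type = ["pulse", "pulse", "pulse", "supercell", "supercell", "supercell"]
threshold, impurity = best_split(max_vil, storm_type)
print(f"rule: MaxVIL <= {threshold} (weighted Gini {impurity})")
```

A full tree repeats this search recursively on each side of the split, so every leaf corresponds to a chain of readable threshold rules.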
The 2008 Artificial Intelligence Competition
• Dataset
• Results
Entries
• Received six official entries and one unofficial entry by competition deadline
• Unofficial entry not accompanied by abstract or AMS manuscript
  • Neil Gordon (Met Service, New Zealand): random forest
  • Not eligible for prize, but included in comparisons
• Official Entries:
  • John K. Williams and Jenny Abernethy: random forests and fuzzy logic
  • Ron Holmes: neural network
  • David Gagne and Amy McGovern: boosted decision tree
  • Jenny Abernethy and John Williams: support vector machines
  • Luna Rodriguez: genetic algorithms
  • Kimberly Elmore: discriminant analysis and support vector machines
Distribution of storm categories
Panels: Truth, Baseline, Abernethy & Williams, Elmore & Richman, Gagne & McGovern, Gordon, Holmes, Rodriguez, Williams & Abernethy
Classifications for observed class 0 (Not severe)
Panels: Baseline, Abernethy & Williams, Elmore & Richman, Gagne & McGovern, Holmes, Rodriguez, Williams & Abernethy, Gordon
Legend: Not severe, Isolated supercell, Convective line, Pulse storm
Classifications for observed class 1 (Isolated supercell)
Panels: Baseline, Abernethy & Williams, Elmore & Richman, Gagne & McGovern, Holmes, Rodriguez, Williams & Abernethy, Gordon
Legend: Not severe, Isolated supercell, Convective line, Pulse storm
Classifications for observed class 2 (Convective line)
Panels: Baseline, Abernethy & Williams, Elmore & Richman, Gagne & McGovern, Holmes, Rodriguez, Williams & Abernethy, Gordon
Legend: Not severe, Isolated supercell, Convective line, Pulse storm
Classifications for observed class 4 (Pulse storm)
Panels: Baseline, Abernethy & Williams, Elmore & Richman, Gagne & McGovern, Holmes, Rodriguez, Williams & Abernethy, Gordon
Legend: Not severe, Isolated supercell, Convective line, Pulse storm
Similarity matrix - % of identical classifications among entries
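A similarity matrix of this kind can be computed by comparing every pair of entries storm by storm. The sketch below uses made-up entry names and class labels, not the competition results.

```python
def similarity_matrix(predictions):
    """Percent of storms on which each pair of entries gives the same class.

    predictions: dict mapping entry name -> list of class labels,
    all lists in the same storm order and of the same length.
    """
    names = list(predictions)
    matrix = {}
    for a in names:
        matrix[a] = {}
        for b in names:
            same = sum(1 for x, y in zip(predictions[a], predictions[b]) if x == y)
            matrix[a][b] = 100.0 * same / len(predictions[a])
    return matrix

# Hypothetical classifications of five storms by three entries.
entries = {
    "entry_a": [0, 1, 2, 3, 0],
    "entry_b": [0, 1, 2, 0, 0],
    "entry_c": [0, 2, 2, 3, 1],
}
matrix = similarity_matrix(entries)
for name, row in matrix.items():
    print(name, row)
```

High agreement between two entries suggests they learned similar decision boundaries, regardless of whether either is right.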
Statistical results – True Skill Statistic
Entries: Baseline, Abernethy & Williams, Elmore & Richman, Gagne & McGovern, Gordon, Holmes, Rodriguez, Williams & Abernethy
Annotations: Joint First, Third
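For a multi-category problem, the True Skill Statistic (also called the Peirce or Hanssen-Kuipers score) is commonly computed from the contingency table as (proportion correct minus chance agreement) divided by (1 minus the sum of squared observed frequencies). The sketch below follows that standard form; whether the competition scored entries with exactly this formula is an assumption, and the labels are made up.

```python
def contingency_table(observed, forecast, n_classes):
    """table[f][o] counts storms forecast as class f and observed as class o."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for obs, fc in zip(observed, forecast):
        table[fc][obs] += 1
    return table

def true_skill_statistic(table):
    """Multi-category TSS from a square contingency table of counts."""
    n = sum(sum(row) for row in table)
    k = len(table)
    hits = sum(table[i][i] for i in range(k)) / n
    forecast_marginal = [sum(table[i]) / n for i in range(k)]
    observed_marginal = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    chance = sum(f * o for f, o in zip(forecast_marginal, observed_marginal))
    return (hits - chance) / (1.0 - sum(o * o for o in observed_marginal))

# Hypothetical truth and one entry's classifications for eight storms.
observed = [0, 0, 1, 1, 2, 2, 3, 3]
forecast = [0, 0, 1, 2, 2, 2, 3, 0]
tss = true_skill_statistic(contingency_table(observed, forecast, n_classes=4))
print(round(tss, 3))
```

A perfect set of classifications scores 1, and classifications no better than chance score about 0.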
Statistical results – Accuracy and Heidke Skill Score
Entries: Baseline, Abernethy & Williams, Elmore & Richman, Gagne & McGovern, Gordon, Holmes, Rodriguez, Williams & Abernethy
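Accuracy is the fraction of storms classified correctly, and the Heidke Skill Score rescales it against the agreement expected by chance given the observed and forecast class frequencies. The sketch below uses the standard multi-category definitions with made-up labels; it is not a reproduction of the competition's scoring code.

```python
def accuracy_and_hss(observed, forecast, n_classes):
    """Return (accuracy, Heidke Skill Score) for integer class labels 0..n_classes-1."""
    n = len(observed)
    accuracy = sum(1 for o, f in zip(observed, forecast) if o == f) / n
    # Agreement expected by chance from the two sets of class frequencies.
    chance = sum(
        (observed.count(c) / n) * (forecast.count(c) / n) for c in range(n_classes)
    )
    hss = (accuracy - chance) / (1.0 - chance)
    return accuracy, hss

# Hypothetical truth and one entry's classifications for eight storms.
observed = [0, 0, 0, 0, 1, 1, 2, 3]
forecast = [0, 0, 0, 1, 1, 1, 2, 0]
accuracy, hss = accuracy_and_hss(observed, forecast, n_classes=4)
print(accuracy, round(hss, 3))
```

With an unbalanced class distribution, accuracy alone can look good for an entry that mostly predicts the dominant class, which is why a chance-corrected score is reported alongside it.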
Acknowledgements
• Thanks to:
  • Weather Decision Technologies for sponsoring the prizes
  • The AMS probability and statistics committee
    • For loaning us Beth Ebert’s expertise
  • All the participants for entering competition and explaining methodology
    • Can be hard to find time to do “extra-curricular” work
    • Very grateful that you could enter this competition
Where to go from here?
• Please share with us your thoughts and suggestions
  • Is such a competition worth doing?
  • Was this session a learning experience?
  • How can it be improved in the future?
  • Is there something that you would have done differently? Why?
• Our thoughts:
  • Classification is not the only aspect of machine intelligence
    • Estimation, association finding, knowledge capture, clustering, …
  • Perhaps a future competition could address one of these areas
  • Address another aspect of AMS besides short-term severe weather