Diagnostic verification and extremes: 1st Breakout
• Discussed the need for the toolkit to build beyond current capabilities (e.g., NCEP)
• Identified (and began to address) 3 major questions:
  • How should confidence intervals and hypothesis tests be computed, especially when there are spatial and temporal correlations?
  • What methods should be included for evaluating extremes?
  • What diagnostic approaches should be included initially, in a 2nd tier, and in a 3rd tier?
Confidence intervals and hypothesis tests
• Need to appropriately account for autocorrelations
  • Reduce the sample by eliminating cases
  • Block re-sampling (Candille's results indicate spatial correlation may have more impact than temporal, at least for upper air)
  • Identify situations where parametric approaches are "ok"
• Bootstrapping approaches are computer-intensive and require a lot of data storage
  • May not always be practical in operational settings
  • Can also bootstrap on contingency tables
  • Works best with the percentile method (a block-bootstrap sketch follows below)
• Need a way to measure confidence in spatial methods (e.g., matching, shifting, etc.)
  • E.g., Caren's CIs on cluster matches
• In the future, need to include other types of uncertainty in addition to sampling variability
  • E.g., observation uncertainty; could perhaps use information from data assimilation
  • Could consider including a way to measure sensitivity to observational variations: re-sampling with random noise added to the observations, or using a parametric model. This would give an initial estimate of how sensitive the verification statistics are to observation uncertainty.
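To make the percentile-method and block re-sampling ideas concrete, here is a minimal sketch (not part of the breakout notes) of a circular block bootstrap confidence interval for a simple verification statistic. The function name, block length, and the choice of mean error as the statistic are illustrative assumptions only; in practice the block length should reflect the decorrelation time of the errors.

```python
import numpy as np

def block_bootstrap_ci(fcst, obs, stat=lambda f, o: np.mean(f - o),
                       block_len=5, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-method confidence interval for a verification statistic,
    using a circular block bootstrap to preserve temporal autocorrelation.

    fcst, obs : 1-D arrays of matched forecast/observation pairs in time order.
    stat      : function computing the statistic of interest (here, mean error).
    block_len : block length (illustrative default; tune to the error series).
    """
    rng = np.random.default_rng(seed)
    n = len(fcst)
    n_blocks = int(np.ceil(n / block_len))
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)) % n  # circular blocks
        idx = idx.ravel()[:n]
        boot_stats[b] = stat(fcst[idx], obs[idx])
    lo, hi = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return stat(fcst, obs), (lo, hi)

# Example: 95% CI on the mean error of a 90-day series of matched pairs
fcst = np.random.default_rng(1).normal(1.0, 2.0, 90)
obs = np.random.default_rng(2).normal(0.0, 2.0, 90)
print(block_bootstrap_ci(fcst, obs))
```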
Methods for evaluation of extremes
• Need to distinguish (in our minds) between extremes and high-impact weather
  • We really mean rare events here, which need to be treated differently statistically
• The user should define thresholds for extremes
  • May be based on quantiles of the sample distribution
  • Could use extreme value theory to help with this (e.g., return-level methods)
  • Need to tell the user when a threshold is not appropriate (i.e., insufficient cases)
• The extreme dependency score is appropriate in many cases
  • Also compute standard scores: Yule's Q, odds ratio, ORSS, ETS, etc. (a sketch of these scores follows below)
• Look into Rick Katz's EVT method (compares extreme value distributions)
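As an illustration of the scores listed above, the sketch below computes the extreme dependency score, odds ratio, ORSS (Yule's Q), and ETS from a 2x2 contingency table. The function name and the example counts are hypothetical; they simply show how these quantities are derived from hits, false alarms, misses, and correct negatives.

```python
import math

def rare_event_scores(hits, false_alarms, misses, correct_negs):
    """Selected categorical scores for a 2x2 contingency table
    (hits=a, false_alarms=b, misses=c, correct_negs=d)."""
    a, b, c, d = hits, false_alarms, misses, correct_negs
    n = a + b + c + d
    base_rate = (a + c) / n
    hit_rate = a / (a + c)
    # Extreme dependency score (Stephenson et al. 2008)
    eds = 2.0 * math.log((a + c) / n) / math.log(a / n) - 1.0
    # Odds ratio and odds-ratio skill score (identical to Yule's Q)
    odds_ratio = (a * d) / (b * c)
    orss = (a * d - b * c) / (a * d + b * c)
    # Equitable threat score (Gilbert skill score)
    a_random = (a + b) * (a + c) / n
    ets = (a - a_random) / (a + b + c - a_random)
    return {"base_rate": base_rate, "hit_rate": hit_rate,
            "EDS": eds, "odds_ratio": odds_ratio,
            "ORSS (Yule's Q)": orss, "ETS": ets}

# Example: a rare event observed 60 times out of 5000 cases
print(rare_event_scores(hits=25, false_alarms=40, misses=35, correct_negs=4900))
```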
Diagnostic methods
• Goal: Identify different tiers of methods/capabilities that will be implemented over time, starting with Tier 1 in the 1st release
• Initial discussion: Stratification
• Friday discussion: Specific methods
Stratification
• Tier 1: Based on metadata, including time of day, season, location, etc. The user may need to do some homework to select stratifications (a stratification sketch follows below)
• Tier 2: Based on other information from the model or observations, such as temperature, wind direction, etc. (any kind of scalar). Could also include non-meteorological data (e.g., air traffic). Should also include derived parameters, e.g., potential vorticity
• Tier 3: Based on features such as the location or strength of a jet core, a cyclone track, etc.
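As a rough illustration of Tier 1 metadata stratification, the following sketch groups hypothetical matched forecast/observation pairs by season and time of day and aggregates simple error statistics within each stratum. All column names and values are assumptions for the example, not part of the toolkit design.

```python
import pandas as pd

# Hypothetical matched forecast/observation pairs with metadata columns
pairs = pd.DataFrame({
    "valid_time": pd.to_datetime(["2005-07-01 00:00", "2005-07-01 12:00",
                                  "2005-12-01 00:00", "2005-12-01 12:00"]),
    "station": ["KDEN", "KBOS", "KDEN", "KBOS"],
    "fcst": [25.1, 18.4, 2.0, -1.5],
    "obs":  [24.0, 19.2, 0.5, -0.8],
})

# Tier-1 style stratification: derive metadata strata, then aggregate errors
pairs["error"] = pairs["fcst"] - pairs["obs"]
pairs["season"] = pairs["valid_time"].dt.month.map(
    lambda m: "DJF" if m in (12, 1, 2) else "JJA" if m in (6, 7, 8) else "other")
pairs["hour"] = pairs["valid_time"].dt.hour

summary = (pairs.groupby(["season", "hour"])["error"]
                .agg(bias="mean", mae=lambda e: e.abs().mean(), n="size"))
print(summary)
```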
Specific methods and capabilities
• Tier 1
  • NCEP operational capability
  • Allow the user to specify space and time aggregation
    • Ex: user-input masks (e.g., small region, climate zone, etc.)
    • Ex: allow individual station statistics, or groups of stations
  • Include traditional methods for:
    • Contingency tables
    • Continuous variables (a masked-aggregation sketch follows below)
    • Probability forecasts
    • (Ensemble forecasts?)
    • GO Index
  • Confidence intervals for most statistics
  • Underlying distributions of forecasts, observations, and errors
  • Basic observations (surface, ACARS, soundings, radar (Stage II, Stage IV), etc.)
  • Extract basic forecast/obs pairs or grids for use elsewhere
  • Basic plotting capabilities (e.g., some capabilities from the R toolkit, NRL)
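A minimal sketch of traditional continuous-variable statistics aggregated over a user-input mask, in the spirit of the Tier 1 capabilities listed above. The function name and the rectangular "climate zone" mask are illustrative assumptions, not the toolkit's interface.

```python
import numpy as np

def continuous_stats(fcst, obs, mask=None):
    """Traditional continuous-variable statistics over an optional user-input mask.

    fcst, obs : 2-D gridded fields on the same grid.
    mask      : boolean array selecting the verification region (True = include).
    """
    if mask is None:
        mask = np.ones_like(obs, dtype=bool)
    f, o = fcst[mask], obs[mask]
    err = f - o
    return {
        "n": f.size,
        "bias (ME)": err.mean(),
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "correlation": np.corrcoef(f, o)[0, 1],
    }

# Example: verify only within a small rectangular sub-region of a 50x50 grid
rng = np.random.default_rng(0)
obs = rng.normal(280.0, 5.0, (50, 50))
fcst = obs + rng.normal(0.5, 1.0, (50, 50))      # forecast with a warm bias
region = np.zeros((50, 50), dtype=bool)
region[10:30, 20:40] = True                      # hypothetical climate-zone mask
print(continuous_stats(fcst, obs, mask=region))
```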
Specific methods and capabilities
• Tier 2
  • Allow alternative obs/forecast pairs (e.g., along satellite tracks)
  • Additional variables
  • Additional spatial methods, based on the spatial verification (VX) intercomparison
    • Scale-dependent methods
    • Fuzzy (neighborhood) methods (a fractions skill score sketch follows below)
    • Etc.
  • "Trial" methods
  • Additional diagnostic graphical capabilities (e.g., NRL capabilities)
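One well-known example of a fuzzy (neighborhood) method is the fractions skill score (Roberts and Lean 2008). The sketch below is an illustrative implementation only, not a statement about which spatial methods the toolkit will adopt; the field values and neighborhood size are made up for the example.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fractions_skill_score(fcst, obs, threshold, neighborhood):
    """Fractions Skill Score, one example of a fuzzy/neighborhood method.

    fcst, obs    : 2-D gridded fields (e.g., precipitation) on the same grid.
    threshold    : event threshold applied to both fields (e.g., 20 mm).
    neighborhood : square neighborhood width in grid points (odd number).
    """
    # Binary event fields, then fractional event coverage in each neighborhood
    f_frac = uniform_filter((fcst >= threshold).astype(float), size=neighborhood)
    o_frac = uniform_filter((obs >= threshold).astype(float), size=neighborhood)
    mse = np.mean((f_frac - o_frac) ** 2)
    mse_ref = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

# Example: 20 mm threshold, 9x9 grid-point neighborhood, displaced forecast field
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 5.0, (100, 100))
fcst = np.roll(obs, shift=5, axis=1)
print(fractions_skill_score(fcst, obs, threshold=20.0, neighborhood=9))
```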
Specific methods and capabilities
• Tier 3
  • Integrate methods into a more user-focused framework
  • Incorporate user datasets
  • Decision-making aspects
General comments
• User training will be important, even for Tier 1
  • But we want to let users do what they want to do
  • As we develop the system, we will need to provide some guidance on what should be done to answer particular questions
• Should be able to evaluate any model on any domain
• Demonstrate equivalence to NCEP output
• Need for a verification-method testbed capability