Verification Summit
AMB verification: rapid feedback to guide model development decisions
Patrick Hofmann, Bill Moninger, Steve Weygandt, Curtis Alexander, Susan Sahm
Motivation
There is a critical need for rapid, comprehensive statistical and graphical verification of forecasts from the AMB experimental models: RUC, RR, and HRRR.
• Covers real-time parallel cycles as well as retrospective runs
• Two primary types:
  • Station verification: upper-air, surface, and clouds
  • Gridded verification: precipitation, radar reflectivity, convective probabilities
• Illuminates model biases and patterns in errors
• Essential for evaluating model/assimilation configuration changes
Rapid verification feedback enables timely improvement in forecast skill.
Design Goals
• Fast computation and display of verification results (real time for real-time cycles, within a day or two for retrospective runs)
• Simple procedures, but with sufficient options to elucidate key aspects (quantify visual impressions)
• Built-in capabilities to allow quick stratification by key parameters (metric, threshold, scale, valid time, initial time, forecast length, region)
• Easily accessible web-based presentation of verification results
• Ability to quickly examine aggregate statistics and single-case plots in a complementary manner
Verification design is driven by the needs of forecast system developers.
Design Details
• Use modified NCEP IPOLATES routines for interpolation and upscaling of input fields to multiple common grids.
• Calculate contingency table fields (YY, YN, NY, NN) for multiple scales, domains, and thresholds:
  -- database storage for statistical aggregation
  -- graphics for each event for detailed evaluation
• Web-based interface for aggregate statistics and event graphics
• Apply to multiple gridded fields (reflectivity, precipitation, probabilities) and multiple model runs (several versions each of RUC, RR, and HRRR, as well as RCPF, HCPF, etc.)
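The contingency-table step can be illustrated with a minimal Python sketch. It is not the AMB code. It assumes that the first letter of each field refers to the forecast and the second to the observation (so YY is a hit, YN a false alarm, NY a miss), that an event means "value at or above the threshold", and that aggregation sums the stored counts before computing a score. The function names are hypothetical.

```python
def contingency_counts(forecast, observed, threshold):
    """Count YY, YN, NY, NN over two equally shaped 2-D grids (lists of rows).

    Assumed convention: first letter = forecast, second letter = observation.
    Assumed event definition: value >= threshold.
    """
    counts = {"YY": 0, "YN": 0, "NY": 0, "NN": 0}
    for f_row, o_row in zip(forecast, observed):
        for f, o in zip(f_row, o_row):
            key = ("Y" if f >= threshold else "N") + ("Y" if o >= threshold else "N")
            counts[key] += 1
    return counts


def csi(counts):
    """Critical success index: hits / (hits + false alarms + misses)."""
    denom = counts["YY"] + counts["YN"] + counts["NY"]
    return counts["YY"] / denom if denom else float("nan")


def bias(counts):
    """Frequency bias: forecast-yes count / observed-yes count."""
    denom = counts["YY"] + counts["NY"]
    return (counts["YY"] + counts["YN"]) / denom if denom else float("nan")


def aggregate(list_of_counts):
    """Sum contingency counts from several cases before scoring them."""
    total = {"YY": 0, "YN": 0, "NY": 0, "NN": 0}
    for counts in list_of_counts:
        for key in total:
            total[key] += counts[key]
    return total


# Made-up reflectivity values (dBZ) on a tiny 2 x 3 grid.
forecast = [[30, 10, 40], [5, 28, 0]]
observed = [[35, 27, 10], [0, 30, 0]]
case = contingency_counts(forecast, observed, threshold=25)
print(case, csi(case), bias(case))
```

Storing the four counts rather than the finished scores is what makes later stratification cheap: any subset of cases can be summed with `aggregate` and scored without touching the gridded fields again.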
Statistics Webpages
• Composite Reflectivity
  • Time Series: http://ruc.noaa.gov/stats/radar/beta/timeseries
  • Valid Times: http://ruc.noaa.gov/stats/radar/beta/validtimes
  • Lead Times: http://ruc.noaa.gov/stats/radar/beta/leadtimes
• 24 Hour Precipitation
  • Time Series: http://ruc.noaa.gov/stats/precip/beta/timeseries
  • Thresholds: http://ruc.noaa.gov/stats/precip/beta/thresholds
• Convective Probabilities
  • Time Series: http://ruc.noaa.gov/stats/prob/beta/timeseries
  • CSI vs Bias: http://ruc.noaa.gov/stats/prob/beta/csibias
  • Reliability Diagrams: http://ruc.noaa.gov/stats/radar/prob/reliabilitydiagrams
  • ROC Curves: http://ruc.noaa.gov/stats/prob/beta/roc
Sample “time-series” stats interface
[Figure: web interface with selectors for model (many real-time runs and retros), metric, threshold, scale, region, averaging period, date range, forecast length, and valid time.]
Sample application of “time-series” stats
[Figure: “time series” mode. Metric: CSI; threshold: reflectivity > 25 dBZ; region: Eastern US; scale: 40-km grid; 3-day average. Models: HRRR-dev and HRRR, with a difference panel labeled “HRRR-dev better” and “HRRR better”. Annotations on the plot: “HRRR-dev longer time-step”; “RR-dev w/ pseudo-obs”; “Added shorter vert. length-scales in RR-dev/GSI”; “Implemented in RR-prim”; “Implemented in HRRR”.]
Sample “time-series” stats to examine scatter in forecast differences
[Figure: CSI differences for 25 dBZ on the 40-km grid, Eastern US (EUS), +6-h forecasts, 8-22 Aug. Labels: RUC, RR, “HRRR better”; time axis: August.]
Sample application of “lead-time” stats illustrating CSI and bias “die-off” for different strengths of radar heating
[Figure: two panels, CSI (x100) and bias (x100), each plotted against forecast length (0 to 10 hours).]
Sample application of “valid time” stats illustrating diurnal variation in scale-dependent skill
• Upscaled verification (especially to 40 km and 80 km) reveals “neighborhood” skill in HRRR forecasts, especially around the time of convective initiation
[Figure: HRRR 25 dBZ, 6-h forecast CSI (x100) versus valid time (GMT, 00z to 00z) at 3-km, 20-km, 40-km, and 80-km scales, with the convective initiation time marked.]
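Why upscaling reveals “neighborhood” skill can be shown with a toy example. The sketch below is an assumption-laden illustration, not the modified IPOLATES procedure used in the system: it coarsens a grid by taking the maximum over square blocks (one of several possible upscaling choices) and treats an event as "value at or above the threshold". The function names are hypothetical.

```python
def upscale_max(grid, factor):
    """Coarsen a 2-D grid by taking the maximum over factor x factor blocks.

    Block-maximum is an assumed upscaling rule chosen for illustration only.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            coarse_row.append(max(block))
        coarse.append(coarse_row)
    return coarse


def csi_at_threshold(forecast, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) for event = value >= threshold."""
    hits = misses = false_alarms = 0
    for f_row, o_row in zip(forecast, observed):
        for f, o in zip(f_row, o_row):
            f_yes, o_yes = f >= threshold, o >= threshold
            if f_yes and o_yes:
                hits += 1
            elif o_yes:
                misses += 1
            elif f_yes:
                false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


# One observed cell of 40 dBZ; the forecast places it one grid point away.
observed = [[40, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
forecast = [[0, 0, 0, 0], [0, 40, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

fine_csi = csi_at_threshold(forecast, observed, 25)
coarse_csi = csi_at_threshold(upscale_max(forecast, 2), upscale_max(observed, 2), 25)
print(fine_csi, coarse_csi)
```

On the native grid the displaced storm counts as one miss plus one false alarm, so CSI is zero. After coarsening by a factor of two, both storms fall in the same coarse cell and score as a hit.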
Reflectivity Graphics Webpage http://ruc.noaa.gov/crefVerif/Welcome.cgi
Single-case plots showing “neighborhood” skill
[Figure: observed reflectivity and HRRR forecast (12z + 6 hr), with hit, miss, and false alarm (FA) areas shown at 3-km and 40-km scales.]
Sample application of “threshold” stats to show skill for a range of precip amounts
RR vs. RUC precipitation verification: 13-km CONUS comparison, 2 x 12-h forecasts vs. CPC 24-h analysis, 1-31 Dec 2010, matched
[Figure: two panels, CSI (x100) and bias (x100, with 100 (1.0) marked), for RR and RUC at thresholds of 0.01, 0.10, 0.25, 0.50, 1.00, 1.50, 2.00, and 3.00 in.]
Precipitation Graphics Webpages http://ruc.noaa.gov/precipVerif
Single-case plots showing forecast skill for precip.
RR vs. RUC 24-h precip. verification: 2 x 12-h forecasts interpolated to 20-km grid
[Figure: panels for observed CPC 24-h precip, RUC, and RR; hit, miss, and false alarm (FA) areas shown at the 1” threshold.]
Statistics tables, in the order shown on the slide (panel labels: RUC, RR):

Thrs  CSI  Bias
1.00  .31  0.69
2.00  .21  0.58

Thrs  CSI  Bias
1.00  .45  1.22
2.00  .29  1.95
Sample display of probability verification statistics
• Work in progress: displays exist for CCFP and CoSPA probabilities
• Plan to add HCPF and RCPF, and to expand to probabilities of other hazards (fog, high echo tops, etc.)
[Figure: CSI vs. bias and ROC curve panels for 2-h, 4-h, and 6-h forecasts.]
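For readers unfamiliar with ROC curves, the following Python sketch shows one common way to obtain the points: convert the probability forecast to yes/no at a series of probability thresholds, then compute the probability of detection (POD) and probability of false detection (POFD) at each. It is an illustration under assumptions, not the system's code; the sample probabilities, the thresholds, and the function name are made up.

```python
def roc_points(probabilities, observations, prob_thresholds):
    """Return (threshold, POD, POFD) tuples for a probabilistic forecast.

    probabilities: forecast probabilities in [0, 1], one per grid point or case.
    observations:  1 if the event occurred, else 0.
    A forecast counts as "yes" when probability >= threshold (assumed convention).
    """
    points = []
    for t in prob_thresholds:
        hits = misses = false_alarms = correct_negatives = 0
        for p, o in zip(probabilities, observations):
            if p >= t and o:
                hits += 1
            elif p >= t:
                false_alarms += 1
            elif o:
                misses += 1
            else:
                correct_negatives += 1
        observed_yes = hits + misses
        observed_no = false_alarms + correct_negatives
        pod = hits / observed_yes if observed_yes else float("nan")
        pofd = false_alarms / observed_no if observed_no else float("nan")
        points.append((t, pod, pofd))
    return points


# Made-up sample: six forecast probabilities and whether the event occurred.
probs = [0.9, 0.7, 0.4, 0.2, 0.1, 0.6]
obs = [1, 0, 1, 0, 0, 1]
curve = roc_points(probs, obs, [0.25, 0.5, 0.75])
for threshold, pod, pofd in curve:
    print(threshold, round(pod, 3), round(pofd, 3))
```

Plotting POD against POFD for each threshold traces the ROC curve. The same yes/no counts at each threshold would also give a CSI and bias pair for a CSI-vs-bias display.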
Sample Reliability Diagram
[Figure: reliability diagrams for 2-h, 4-h, and 6-h forecasts.]
All plots can be zoomed.
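A reliability diagram compares forecast probability with how often the event was actually observed. The sketch below shows the usual binning calculation in Python. It assumes equal-width probability bins and uses made-up sample data; the bin count and function name are hypothetical and not taken from the system described here.

```python
def reliability_bins(probabilities, observations, n_bins=5):
    """Return (mean forecast probability, observed frequency, count) per non-empty bin.

    Forecast probabilities in [0, 1] are grouped into n_bins equal-width bins
    (an assumed binning choice); observations are 1 for event, 0 for no event.
    """
    sums = [0.0] * n_bins
    events = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probabilities, observations):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[index] += p
        events[index] += o
        counts[index] += 1
    return [
        (sums[i] / counts[i], events[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Made-up sample: forecast probabilities and whether the event occurred.
probs = [0.1, 0.15, 0.5, 0.55, 0.9, 0.95]
obs = [0, 0, 1, 0, 1, 1]
diagram = reliability_bins(probs, obs, n_bins=5)
for mean_prob, obs_freq, count in diagram:
    print(round(mean_prob, 3), obs_freq, count)
```

Points on the diagonal (observed frequency equal to mean forecast probability) indicate reliable probabilities. The per-bin count is worth keeping, since sparsely populated bins give noisy frequencies.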
Conclusion
• The verification system, including both the statistical and graphical webpages, greatly aids evaluation of model performance within AMB and facilitates rapid, real-time assessment of experimental configurations and improvements.
• We are also able to verify retrospective cases of scientific interest in quick succession for use in presentations and publications for outreach.