DET: Testing and Evaluation Plan
Barbara Brown (1), Ed Tollerud (2), and Tara Jensen (1)
(1) NCAR/RAL, Boulder, CO and DTC
(2) NOAA/GSD, Boulder, CO and DTC
Wally Clark
DTC and DET Testing and Evaluation
• T&E is one of the most important activities undertaken by the DTC
• DTC testing has involved WRF core comparisons, boundary layer schemes, and other aspects of NWP
• DTC has created “Reference Configurations” (RCs) that are to be re-tested in conjunction with model changes
• DET infrastructure is being developed to allow
  • Testing and evaluation and
  • Intercomparison of ensemble systems and system components
Major categories of testing
• Forecasting system comparisons
  • Compare forecasts based on one configuration with forecasts based on a different model configuration
  • Examples
    • Two types of model initialization
    • Two or more methods of statistical post-processing
• Individual reference configuration
  • Model “setup” is evaluated
  • Setup is re-evaluated when model changes are implemented
  • Reference configurations may be defined by
    • Operational centers
    • Users
  • RCs may also be community-contributed
• Forecasts contributed by a modeling group
  • Ex: Forecasts evaluated in HWT and HMT projects
DTC Testing and Evaluation Principles
• A formal test plan is developed, defining all of the important aspects of the testing and evaluation
  • Developer may have a role in helping to create the test plan
• Execution of test is independent of the developer
• Focus of test depends on the questions that are of interest
  • Module being used
  • Variables of interest
• Many cases evaluated for statistical significance
  • Not just a few case studies
  • Multiple seasons, times of day, etc.
• Meaningful stratifications
  • Location/region
  • Season
  • Other user-based criteria
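The principles of evaluating many cases and stratifying them meaningfully can be sketched in code. The example below is only an illustration, not the DTC's actual procedure: it assumes each case is a dict with hypothetical fields `err_a` and `err_b` (the errors of two configurations on the same case) plus a stratum label such as `season`, and it uses a percentile bootstrap as one possible way to attach a confidence interval to the mean paired difference.

```python
import random
from collections import defaultdict


def paired_bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired error differences.

    Returns (sample_mean, lower, upper). n_boot, alpha and seed are
    illustrative defaults, not values taken from the slides.
    """
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the cases with replacement and record each resample's mean.
    boot_means = sorted(
        sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lower, upper


def stratified_comparison(cases, key):
    """Summarize the A-minus-B error difference separately per stratum.

    `cases` is a list of dicts with hypothetical fields 'err_a', 'err_b'
    and a stratum field named by `key` (e.g. 'season' or 'region').
    """
    groups = defaultdict(list)
    for case in cases:
        groups[case[key]].append(case["err_a"] - case["err_b"])
    return {stratum: paired_bootstrap_ci(d) for stratum, d in groups.items()}
```

If the interval for a stratum excludes zero, the difference between the two configurations is unlikely to be a sampling artifact for that stratum; with only a few cases the interval is wide, which is the reason for not relying on a handful of case studies.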
Components of a test plan (example)
• Goals
• Experiment design
• Codes
  • Specification of the codes that will be run as part of the test
• Model output
  • What kinds of output will be produced?
  • Forecast periods
• Post-processing
• Verification
  • Statistical methods and measures
• Graphics generation and display
• Data archival and dissemination of results
• Computer resources
• Deliverables
[Figure: Example from QNSE evaluation (surface T and wind)]
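The components listed above could be captured as a simple checklist structure so that an incomplete plan is easy to spot. This is a minimal sketch: the class name `EvaluationTestPlan`, its field names, and the helper `missing_components` are hypothetical and simply mirror the slide's bullet headings; they do not come from any DTC software.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class EvaluationTestPlan:
    """Hypothetical container with one field per test-plan component."""
    goals: List[str] = field(default_factory=list)
    experiment_design: str = ""
    codes: List[str] = field(default_factory=list)
    model_output: List[str] = field(default_factory=list)
    forecast_periods: List[str] = field(default_factory=list)
    post_processing: List[str] = field(default_factory=list)
    verification: List[str] = field(default_factory=list)
    graphics: List[str] = field(default_factory=list)
    archival_and_dissemination: str = ""
    computer_resources: str = ""
    deliverables: List[str] = field(default_factory=list)


def missing_components(plan):
    """Return the names of components that have not been filled in yet."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Placeholder values for illustration only; not taken from a real plan.
draft = EvaluationTestPlan(
    goals=["placeholder goal"],
    verification=["placeholder statistical measure"],
)
print(missing_components(draft))
```

Running the check before a test begins gives a quick list of the sections that still need to be written.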
Questions to address when developing a test plan
• Which aspect(s) (or modules) of the ensemble system will be evaluated?
• What performance aspects are we trying to compare or evaluate?
• Who are the “users”?
• What are the variables of interest?
The answers to these questions determine the other aspects of the plan
Considerations for ensemble T&E
• Number of cases will likely need to be increased (over non-ensemble evaluations)
  • Many probabilistic and ensemble verification scores (e.g., reliability) require relatively large subsamples
  • Subsamples must be large enough to assess statistical significance
  • But sampling must be focused enough to be representative
• Verification approaches and metrics are somewhat specific to ensembles
• Computer resources may be a limitation
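To illustrate why reliability needs large subsamples, the sketch below bins probabilistic forecasts and reports how many cases fall into each bin. It is an assumption-laden example rather than a prescribed method: the function name `reliability_table`, the equal-width binning, and the `min_count=30` threshold are all illustrative choices, and the outcomes are assumed to be coded as 0 or 1.

```python
def reliability_table(probs, outcomes, n_bins=5, min_count=30):
    """Tabulate observed frequency against forecast probability per bin.

    probs:    forecast probabilities in [0, 1]
    outcomes: matching observations coded 0 (no event) or 1 (event)
    min_count is an arbitrary illustrative threshold for flagging bins
    whose sample is too small to trust.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # A probability of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))

    rows = []
    for i, members in enumerate(bins):
        n = len(members)
        if n == 0:
            continue  # no forecasts fell into this probability range
        rows.append({
            "bin": (i / n_bins, (i + 1) / n_bins),
            "count": n,
            "mean_forecast": sum(p for p, _ in members) / n,
            "observed_freq": sum(o for _, o in members) / n,
            "too_few_cases": n < min_count,
        })
    return rows
```

Each stratification (by region, season, and so on) divides the cases further before they are binned, so the per-bin counts shrink quickly; this is the tension between large subsamples and focused sampling noted above.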
Other considerations
• Real-time vs. post-analysis
  • DTC intensive tests are generally done in post-analysis
  • Real-time demonstrations also have many benefits (e.g., HMT, HWT)
• Subjective evaluations: should these be considered for DET T&E?
• How much rigorous end-to-end testing is required vs. evaluation of individual components?
[Figure: Example for HMT evaluation, winter 2010]