This presentation examines two approaches to performance evaluation supported by the Delta tool: rigorous statistical assessment and diagnostic exploration. It discusses how to judge model capabilities, the complexities of benchmarking, and data quality concerns in air quality assessments, and it stresses the importance of context, careful data interpretation, user-oriented performance criteria, model extrapolation, and the identification of bias.
Fairmode: Some considerations on the Delta tool and model performance evaluation
Helge Rørdam Olesen
National Environmental Research Institute (NERI), Aarhus Universitet
Two approaches to performance evaluation
• Rigorous statistical (or operational) evaluation: computing statistical performance measures according to a specific protocol. The Delta tool is intended to be useful in this respect.
• Exploratory (or diagnostic) approach: modelled and observed data are plotted in various ways, in an insightful search for clues to improve model performance. The Delta tool does have an exploration mode, but it is not as flexible as certain other tools.
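To make the first approach concrete, the sketch below computes a few textbook performance measures (mean bias, RMSE, correlation, fraction within a factor of two) for paired observed and modelled concentrations. It is a minimal illustration in Python, not the Delta tool's own metric set or protocol; the function name and the example data are hypothetical.

```python
import numpy as np

def performance_metrics(obs, mod):
    """Common operational-evaluation metrics for paired observed and
    modelled concentrations (illustrative only, not the Delta protocol)."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    bias = np.mean(mod - obs)                                # mean bias
    rmse = np.sqrt(np.mean((mod - obs) ** 2))                # root mean square error
    corr = np.corrcoef(obs, mod)[0, 1]                       # Pearson correlation
    fac2 = np.mean((mod >= 0.5 * obs) & (mod <= 2.0 * obs))  # fraction within a factor of 2
    return {"bias": bias, "rmse": rmse, "r": corr, "fac2": fac2}

# Hypothetical hourly NO2 concentrations (µg/m3)
obs = [42.0, 55.0, 38.0, 61.0, 47.0]
mod = [39.0, 60.0, 30.0, 58.0, 52.0]
print(performance_metrics(obs, mod))
```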
Comments on the approaches
• The exploratory approach is indispensable. It is necessary for ensuring that the model not only gives the right results, but gives them for the right reasons. Exploratory data analysis can detect potential errors in data and model setup, highlight notable features in the data, and reveal shortcomings of models.
• Ideally, statistical performance measures – such as those in the Delta tool – should put an administrator in a position to distinguish an adequate model from a less adequate one. However, a lot of caution is required: pure metrics can easily be misleading.
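As a simple illustration of the exploratory approach, the sketch below plots modelled against observed concentrations as a time series and as a scatter plot with a 1:1 line. The data are synthetic and the plotting choices are only an assumption about what such an exploration might look like; a timing error or a systematic offset shows up immediately in these views even when summary statistics look acceptable.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired series of observed and modelled concentrations
rng = np.random.default_rng(0)
hours = np.arange(48)
obs = 40 + 10 * np.sin(hours / 24 * 2 * np.pi) + rng.normal(0, 3, hours.size)
mod = 35 + 12 * np.sin((hours - 2) / 24 * 2 * np.pi) + rng.normal(0, 3, hours.size)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Time series: reveals timing errors a single statistic would hide
ax1.plot(hours, obs, label="observed")
ax1.plot(hours, mod, label="modelled")
ax1.set_xlabel("hour")
ax1.set_ylabel("concentration (µg/m3)")
ax1.legend()

# Scatter with 1:1 line: reveals systematic bias and outliers
ax2.scatter(obs, mod, s=15)
lim = [min(obs.min(), mod.min()), max(obs.max(), mod.max())]
ax2.plot(lim, lim, "k--", label="1:1")
ax2.set_xlabel("observed")
ax2.set_ylabel("modelled")
ax2.legend()

plt.tight_layout()
plt.show()
```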
Target plots and performance metrics
• You may be deceived both by nice-looking plots and by awful-looking plots.
• The context is important!
• What do the underlying data represent? The challenge to the model system may be really severe, or it may be trivial.
• Defining a ’band of acceptance’ for models requires much care to ensure that such criteria are not misleading. Performance statistics depend entirely on the challenge to which you expose a model!
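The sketch below shows the generic idea behind a target diagram: the RMSE is decomposed into bias and centred RMSE (RMSE² = bias² + CRMSE²), and the two components are normalized so that the distance from the origin is the normalized RMSE. This is only the general decomposition with one common sign convention; the Delta tool's own target plot uses its own normalization and acceptance band, defined in the JRC documentation.

```python
import numpy as np

def target_coordinates(obs, mod):
    """Coordinates for a generic target diagram: normalized centred RMSE (x)
    and normalized bias (y), so distance from the origin equals normalized RMSE.
    Illustrative only; not the Delta tool's normalization."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    sigma_obs = obs.std()
    bias = mod.mean() - obs.mean()
    crmse = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    # x carries the sign of the standard-deviation difference (a common convention)
    x = np.sign(mod.std() - obs.std()) * crmse / sigma_obs
    y = bias / sigma_obs
    return x, y
```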
”Model benchmarking”
• Note that the benchmarking results do not just address the performance of a dispersion model.
• The activity addresses the performance of an entire system, consisting of input data (e.g. traffic counts, car fleet characteristics etc.), a dispersion model, the user and the choices she makes on various options – and all of this is tested against measurements representing a certain scenario, which may or may not correspond to the assumptions made.
• One example of the difficulties: road construction work somewhere may affect traffic characteristics at the point of interest, so the assumed input data become obsolete.
Which type of data should quality objectives refer to?
• Option A: Research-grade data obtained somewhere in Europe, used in a common exercise?
• Option B: Your own national data, obtained through national monitoring and modelling?
An important threshold for Delta users: how much effort is required to get started with the Delta tool?
• Understand the context (the Directive).
• To some extent, read the background material (JRC papers).
• Download and install IDL and the tool.
• Get acquainted with the tool: understand the format of the data, explore the potential of the tool, understand the way it works.
• Prepare your own data.
• Get closer to understanding the meaning of the metrics. What is good performance for the case at hand?
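A large part of the effort lies in pairing your own observations with model output before they can be fed to any evaluation tool. The sketch below shows such a pairing step in Python under purely hypothetical assumptions (the file names, column names, and output layout are invented); the actual input format required by the Delta tool is defined in the JRC documentation and would need to be followed instead.

```python
import pandas as pd

# Hypothetical pairing step: columns "station", "time" and "NO2" are assumed.
obs = pd.read_csv("observations.csv")
mod = pd.read_csv("model_output.csv")

paired = obs.merge(mod, on=["station", "time"], suffixes=("_obs", "_mod"))
paired["time"] = pd.to_datetime(paired["time"])

# Write one file per station, sorted by time, ready to adapt to the tool's format
for station, group in paired.groupby("station"):
    group.sort_values("time").to_csv(f"{station}_paired.csv", index=False)
```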
The logic of the selection is not obvious. Give a hint, such as: In the left pane, please select one or more models and scenarios. In the right pane, select one or more parameters and stations. Various filters (Type, Parameter, Zone) are available.
The issue of understanding is central to an evaluation of air quality models. One wants to know if the model is adequate, or conservative or accurate enough for one's purposes. Data sets are limited and cannot possibly cover all possible conditions under which the models are expected to be used. Therefore, one is forced to extrapolate model behaviour well outside the range of veracity of the particular evaluation results.

To properly make such extrapolations requires development of an understanding of the different causes contributing to bias, or even lack of bias, in a model's predictions and relating those causes to the model's parametrization of physical processes. One needs to know if the model is producing the right or wrong answer for the right reason.

It appears that the goal of obtaining both reasonably objective and well-defined evaluations and adequate understanding is not attainable through the use of simple, rote approaches to the calculation of evaluation statistics. This conclusion comes from experience in carrying out performance evaluations.

...Statistics alone cannot produce understanding nor discern the various causes of model behaviour. They are an aid to thinking, but no replacement for it. Not only that, statistical measures can provide misleading guidance if understanding is lacking.
– Robin Dennis, 1986