Good afternoon! नमस्कार Guten Tag! Buenos días! До́брый день! Qwertzuiop asdfghjkl! Bom dia! Bonjour!
Please, verify!
Verification of continuous variables
Martin Göber, Deutscher Wetterdienst (DWD), Hans-Ertel-Centre for Weather Research (HErZ)
Acknowledgements: Thanks to Barb Brown and Barbara Casati!
Types of forecasts and observations
• Continuous
  • Temperature
  • Rainfall amount
  • 500 hPa geopotential height
• Categorical
  • Dichotomous, often formulated as Yes/No (giving the 2×2 contingency table of outcomes YY, NY, YN, NN)
    • Rain vs. no rain
    • Strong wind vs. no strong wind
    • Obtained by thresholding continuous variables (see the sketch below)
  • Multi-category
    • Cloud amount category
    • Precipitation type
Unless there is a meaningful reason to do so, continuous forecasts should not be degraded to categorical ones, because of the resulting loss of information.
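As an illustration of thresholding, here is a minimal sketch; the arrays, variable names and the 0.1 mm threshold are hypothetical, not taken from the lecture:

```python
import numpy as np

# Hypothetical 6-hourly rain amounts (mm); values and threshold are illustrative only.
forecast_mm = np.array([0.0, 0.3, 2.5, 0.0, 1.2, 0.05])
observed_mm = np.array([0.0, 0.0, 3.1, 0.4, 0.9, 0.0])

threshold = 0.1  # mm; separates "rain" from "no rain"

f_yes = forecast_mm >= threshold
o_yes = observed_mm >= threshold

# 2x2 contingency table: YY (hits), YN (false alarms), NY (misses), NN (correct negatives)
hits          = np.sum(f_yes & o_yes)
false_alarms  = np.sum(f_yes & ~o_yes)
misses        = np.sum(~f_yes & o_yes)
correct_negs  = np.sum(~f_yes & ~o_yes)

print(hits, false_alarms, misses, correct_negs)
```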
The joint probability distribution p(f, o)
(961 classes) × (100 stations) × (2 days) × (5 kinds of forecasts) = 1 million numbers to analyse: the "curse of dimensionality".
Boil this down to a few numbers, with (little?) loss of information.
[Figure: joint frequency distribution of forecast f vs. observation o, road surface temperature, winter 2011]
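A minimal sketch, with made-up road-surface temperatures, of how such a joint frequency distribution can be tabulated before it is boiled down to a few summary numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up road-surface temperatures (degC): observations and slightly biased, noisy forecasts.
obs = rng.normal(loc=2.0, scale=5.0, size=10_000)
fcst = obs + 0.5 + rng.normal(scale=1.5, size=obs.size)   # +0.5 degC bias plus random error

# Joint frequency distribution p(f, o) on a 1 degC grid.
bins = np.arange(-20, 21, 1.0)
joint, f_edges, o_edges = np.histogram2d(fcst, obs, bins=[bins, bins])
joint /= joint.sum()    # relative frequencies

print(joint.shape)      # (40, 40) -> already 1600 numbers for a single station and lead time
```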
Continuous verification: normally distributed errors
Normally distributed errors
If the errors are normally distributed, then 2 parameters are enough to answer all questions approximately.
If the systematic error ("bias") is small, then √MSE ≈ standard deviation of the error.
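The identity behind this statement (a standard decomposition, not copied from the slide) is:

```latex
\[
\mathrm{MSE} \;=\; \frac{1}{N}\sum_{i=1}^{N} (f_i - o_i)^2
\;=\; \mathrm{ME}^2 + \sigma_e^2 ,
\qquad
\mathrm{ME} = \overline{f - o}, \quad
\sigma_e^2 = \overline{\bigl((f - o) - \mathrm{ME}\bigr)^2} .
\]
\[
\text{If } \mathrm{ME} \approx 0 \text{ (small bias):} \qquad
\sqrt{\mathrm{MSE}} \;\approx\; \sigma_e .
\]
```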
Bias
• Mean error ME, ideally = 0
• "Systematic error": on average, something goes wrong in one direction, e.g. model physics wrongly tuned, missing processes, wrong interpretation of guidance
• Tells us nothing about the pairwise match of forecasts and observations
• Was large in the past, rather small nowadays on average, but may still be large e.g. for certain weather types
• Misleading for multi-modal error distributions → use the Mean Absolute Error MAE instead
ME and MAE
Q: If the ME is similar to the MAE, performing a bias correction is safe; if MAE >> ME, performing a bias correction is dangerous. Why?
A: If MAE >> ME, positive and negative errors cancel out in the bias evaluation, so the "bias" does not describe the individual errors that a correction would be applied to (see the sketch below).
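A minimal numerical sketch with made-up errors, showing why MAE >> ME signals cancelling errors for which a bias correction would achieve little:

```python
import numpy as np

def me(f, o):   # mean error (bias)
    return np.mean(f - o)

def mae(f, o):  # mean absolute error
    return np.mean(np.abs(f - o))

obs = np.zeros(6)

# Case 1: ME close to MAE -> errors all on one side, bias correction is safe.
fcst_biased = np.array([1.8, 2.1, 2.0, 1.9, 2.2, 2.0])
print(me(fcst_biased, obs), mae(fcst_biased, obs))     # ~2.0 and ~2.0

# Case 2: MAE >> ME -> errors cancel, subtracting the tiny "bias" fixes nothing.
fcst_cancel = np.array([3.0, -3.0, 2.5, -2.5, 3.0, -3.0])
print(me(fcst_cancel, obs), mae(fcst_cancel, obs))     # ~0.0 and ~2.8
```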
RMSE
• Mean squared error MSE, or root mean square error RMSE
• Accuracy measure: it measures the distance between individual forecasts and observations
• Ideally RMSE = 0
• "It might be useful on average, but when it's really important it's not good!" Not necessarily: because of the squaring, large errors dominate, e.g. (see the sketch below):
  • one five-degree error is penalised like 25 one-degree errors
  • one ten-degree error is penalised like 100 one-degree errors
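A minimal sketch with made-up numbers, illustrating the quadratic penalty of the RMSE:

```python
import numpy as np

def rmse(f, o):
    return np.sqrt(np.mean((f - o) ** 2))

obs = np.zeros(25)

# 25 one-degree errors ...
small_errors = obs + 1.0
# ... versus a single five-degree error (all other forecasts perfect).
one_big_error = obs.copy()
one_big_error[0] = 5.0

print(rmse(small_errors, obs))   # 1.0
print(rmse(one_big_error, obs))  # 1.0 as well: one 5-deg error "weighs" like 25 one-deg errors
```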
Interpretation of the RMSE
If the errors are normally distributed and the bias is small, the RMSE is approximately the standard deviation of the errors, so roughly 68% of the errors lie within ±RMSE and roughly 95% within ±2·RMSE.
Decomposition of the MSE
The bias can be subtracted! Consequence: smooth forecasts verify better (see the decomposition below).
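The decomposition referred to here is presumably the standard bias-variance-correlation identity; written out (not copied from the slide):

```latex
\[
\mathrm{MSE} \;=\; (\bar{f} - \bar{o})^2 \;+\; s_f^2 \;+\; s_o^2 \;-\; 2\, s_f\, s_o\, r_{fo}
\]
```

with sample means \(\bar{f}, \bar{o}\), standard deviations \(s_f, s_o\) and correlation \(r_{fo}\). The first term is the bias, which can be subtracted. For a fixed correlation \(r_{fo} < 1\), the remaining terms are smallest for \(s_f = r_{fo}\, s_o < s_o\), i.e. for a forecast that is smoother than the observations, which is why smooth forecasts verify better on the MSE.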
Correlation coefficient
• Measures the level of "association" between the forecasts and observations
• Related to the "phase error" of the harmonic decomposition of the forecast
• Is familiar and relatively easy to interpret
• Has a nonparametric analogue based on ranks
Correlation coefficient
What is wrong with the correlation coefficient as a measure of performance?
• It does not take biases and amplitude errors into account and can therefore inflate the performance estimate (see the sketch below)
• It is more appropriate as a measure of "potential" performance
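A minimal sketch with made-up data, showing that the correlation is blind to bias and amplitude errors while the RMSE is not:

```python
import numpy as np

rng = np.random.default_rng(1)

obs = rng.normal(size=500)
fcst = obs + rng.normal(scale=0.3, size=obs.size)   # reasonably good forecast

fcst_biased = fcst + 5.0      # huge bias
fcst_damped = 0.2 * fcst      # huge amplitude error

for f in (fcst, fcst_biased, fcst_damped):
    r = np.corrcoef(f, obs)[0, 1]
    rmse = np.sqrt(np.mean((f - obs) ** 2))
    print(f"r = {r:.3f}   RMSE = {rmse:.3f}")
# r is (almost) identical in all three cases, while the RMSE degrades drastically:
# correlation measures potential rather than actual performance.
```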
Comparative verification
• Generic skill score definition: SS = (M − M_ref) / (M_perf − M_ref), where M is the verification measure for the forecasts, M_ref is the measure for the reference forecasts, and M_perf is the measure for perfect forecasts
• Measures the percentage improvement of the forecast over the reference
• Positively oriented (larger is better)
• The choice of the reference standard matters (a lot!)
Comparative verification: skill scores
• A skill score is a measure of relative performance
• Example: How much more accurate are my temperature predictions than climatology? How much more accurate are they than the model's temperature predictions?
• Provides a comparison to a standard
• The standard of comparison can be:
  • Chance (easy?)
  • Long-term climatology (more difficult)
  • Sample climatology (difficult)
  • A competitor model / forecast (most difficult)
  • Persistence (hard or easy)
Skill scores
General skill score definition applied to the MSE: the reduction of error variance, SS = 1 − MSE / MSE_ref (also often simply called "the skill score" SS), since MSE_perf = 0. A worked example follows below.
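A worked example with hypothetical numbers (not from the lecture): suppose the forecast has MSE = 4 K² and the persistence reference has MSE_ref = 10 K², while a perfect forecast has MSE_perf = 0:

```latex
\[
SS \;=\; \frac{M - M_{\mathrm{ref}}}{M_{\mathrm{perf}} - M_{\mathrm{ref}}}
\;=\; \frac{\mathrm{MSE} - \mathrm{MSE}_{\mathrm{ref}}}{0 - \mathrm{MSE}_{\mathrm{ref}}}
\;=\; 1 - \frac{\mathrm{MSE}}{\mathrm{MSE}_{\mathrm{ref}}}
\;=\; 1 - \frac{4\,\mathrm{K}^2}{10\,\mathrm{K}^2}
\;=\; 0.6 ,
\]
```

i.e. the forecast removes 60% of the reference error variance.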
Accuracy vs. skill
[Figure: 24 h mean wind forecast; reduced variance (skill with respect to persistence), MSE(persistence) and MSE(forecast). The annotations mark where the forecast has higher skill but lower accuracy.]
„hits“ and RMSE
"hits" = percentage of "acceptable" forecast errors (e.g. the ICAO criteria: wind direction dd within ±30°, wind speed ff within ±5 kt up to 25 kt, etc.)
[Figure: "hits" in % vs. forecast error in K]
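A minimal sketch of the "hits" computation, with made-up temperatures and a ±2 K tolerance (the tolerance used later in the Potsdam example):

```python
import numpy as np

def hit_rate(f, o, tolerance):
    """Percentage of forecasts whose absolute error is within the given tolerance."""
    return 100.0 * np.mean(np.abs(f - o) <= tolerance)

rng = np.random.default_rng(2)
obs = rng.normal(loc=10.0, scale=6.0, size=1000)            # made-up maximum temperatures (degC)
fcst = obs + rng.normal(loc=0.3, scale=1.8, size=obs.size)  # made-up forecast errors

print(f"hits (|error| <= 2 K): {hit_rate(fcst, obs, 2.0):.1f} %")
```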
„hits“ and RMSE
[Figure: "hits" in % vs. forecast error in K. The reduction of the error "mass" comes from the reduction of large errors.]
Long-term trends
[Figure: "hit rate" (errors within ±2 K) in % for the maximum temperature at Potsdam. The forecasts gain roughly one day of lead time every 10 years.]
Linear Error in Probability Space (LEPS)
• LEPS is an MAE evaluated using the cumulative frequencies of the observations
• Errors in the tails of the distribution are penalised less than errors in the centre of the distribution
[Figure: observed cumulative frequency distribution with the quantile q0.75 marked]
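A minimal sketch, assuming the common basic definition LEPS = mean |F_o(f) − F_o(o)| with F_o the empirical cumulative distribution of the observations; the data are made up:

```python
import numpy as np

def empirical_cdf(sample):
    """Return a function F(x) giving the empirical cumulative frequency of `sample` at x."""
    s = np.sort(sample)
    return lambda x: np.searchsorted(s, x, side="right") / s.size

def leps(f, o, climatology=None):
    """LEPS: MAE evaluated in the probability space of the observations."""
    F = empirical_cdf(o if climatology is None else climatology)
    return np.mean(np.abs(F(f) - F(o)))

rng = np.random.default_rng(3)
obs = rng.normal(loc=0.0, scale=3.0, size=2000)
fcst = obs + rng.normal(scale=1.0, size=obs.size)

print(leps(fcst, obs))   # the same 1-degree error counts for less far out in the tails
```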
Summary
• Verification is a high-dimensional problem; it can be boiled down to a lower-dimensional one under certain assumptions or for certain interests
• If forecast errors are normally distributed, continuous verification can rely on just a few numbers, such as the bias and the RMSE
• Accuracy and skill are different things