This presentation explores the use of the Gerrity Skill Score for verifying modelled currents through a threshold exceedance approach. The study examines different current regimes, time series analysis, simple categorical metrics, and bias removal techniques. The results highlight the impact of threshold choices on skill scores and provide insights into the performance of model systems.
NPE - Cross-cutting research on verification techniques • Presentation Session Code: SCI-PS153.03 • Verifying modelled currents using a threshold exceedance approach: An exploration of the Gerrity Skill Score • Dr Ray Mahdon
Table of Contents • Introduction • Data Source & Locations • Differing Current Regimes • Time Series, Continuous Statistics & Simple Cat. Metrics • Neighbourhood Methods • Bias Removal Questions • Multi-Cat. Metric – Gerrity Skill Score & Ocean Currents • Threshold Choices
Introduction • Surface current forecasts are important for commercial or defence “weather-windows” • e.g. Current speed below 1 kt for 12 hours. • e.g. Does not exceed 1 kt more than x times • Good for site-specific & threshold-based analysis • Some questions we are trying to answer: • Does the model capture extreme events or “weather-windows”? • In which locations or times of year do the models perform best; is there a significant difference by regime, time or area?
Data Source & Locations • MyOcean - Puertos Del Estado, 26-56N, 19W-5E • [Map: station labels Donostia, Matxitxako, 62025, 6201030, 62083, 62024, 61430, 61280, 61281, 61417, 62085, 61198; regime labels Shelf Circulation, Wind & Tidal Currents, Eddies, General Ocean Circulation, Slope Current]
Data, Time Series & Continuous Statistics • Hourly frequency, Jan 2012 – Jun 2014 (30 months) • Collocated model & in-situ moored observations of surface currents • Continuous statistics are helpful to describe overall behaviour • e.g. q-q & histogram plots describe climatology • Timeseries can show seasonal patterns or significant events • But they do not quantify the performance of a system when exceeding thresholds is of interest • We focus on surface currents • validation is relatively sparse for this parameter • → Categorical Metric Assessment • Simple 2x2 (binary) contingency table per chosen threshold
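As a minimal sketch of the 2x2 contingency table described above, the following counts the four outcomes for one threshold from collocated model and observation series. The function name, the sample values and the threshold of 0.5 are illustrative assumptions, not taken from the presentation, and "exceedance" is assumed to mean strictly greater than the threshold.

```python
import numpy as np

def contingency_2x2(forecast, observed, threshold):
    """Count the four outcomes of a binary exceedance test for one threshold."""
    fc = np.asarray(forecast, dtype=float)
    ob = np.asarray(observed, dtype=float)
    # Drop pairs where either value is missing (NaN).
    valid = ~(np.isnan(fc) | np.isnan(ob))
    fc_event = fc[valid] > threshold
    ob_event = ob[valid] > threshold
    return {
        "hits": int(np.sum(fc_event & ob_event)),
        "misses": int(np.sum(~fc_event & ob_event)),
        "false_alarms": int(np.sum(fc_event & ~ob_event)),
        "correct_rejections": int(np.sum(~fc_event & ~ob_event)),
    }

# Hypothetical current speeds; the last pair is dropped because the model value is missing.
model = [0.10, 0.60, 0.40, 0.70, 0.20, np.nan]
obs = [0.20, 0.70, 0.60, 0.30, 0.10, 0.50]
counts = contingency_2x2(model, obs, threshold=0.5)
print(counts)
```

One table of this kind would be built per threshold and per location.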
Neighbourhood Sampling • Spatial Neighbourhoods: 1x1, 3x3, 5x5, ..., NxN • Temporal Neighbourhoods: T-1, T+0, T+1 • Time averaging & shifting • Combinations of spatial & temporal neighbourhoods trialled
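The slide does not say exactly how the temporal neighbourhoods were applied, so the following is only one plausible reading of "time averaging & shifting": a running mean over a window of hours, and pairing the forecast at T+shift with the observation at T. Both helper names are hypothetical.

```python
import numpy as np

def running_mean(series, window):
    """Unweighted moving average; returns len(series) - window + 1 values."""
    x = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def shifted_pairs(forecast, observed, shift):
    """Pair forecast[t + shift] with observed[t], trimming the unmatched ends."""
    fc = np.asarray(forecast, dtype=float)
    ob = np.asarray(observed, dtype=float)
    if shift > 0:
        return fc[shift:], ob[:-shift]
    if shift < 0:
        return fc[:shift], ob[-shift:]
    return fc, ob

smoothed = running_mean([1, 2, 3, 4, 5], window=3)
fc_next, ob_now = shifted_pairs([1, 2, 3, 4], [10, 20, 30, 40], shift=1)
```

The averaged or shifted series can then be passed to the same categorical metrics as the raw hourly values.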
Simple Categorical Metrics • [Figure: hits, misses, false alarms and correct rejections, with CSI and ETS] • Improvements from temporal averaging • Hour-to-hour assessment not as good as CSI → ETS says model mostly correct by chance! • CSI & ETS require un-biased input data • Over what period should a tidally dominated field be normalised: 1 tidal cycle; spring-neap cycle; astronomical cycle? • How to handle negative currents?
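For reference, CSI (Critical Success Index) and ETS (Equitable Threat Score) can be computed from the four contingency counts using their standard definitions, as sketched below. The counts are made-up numbers chosen to show how a common event can give a high CSI while ETS, which discounts hits expected by chance, stays much lower.

```python
def csi(hits, misses, false_alarms):
    """Critical Success Index: hits over all cases where the event was forecast or observed."""
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")

def ets(hits, misses, false_alarms, correct_rejections):
    """Equitable Threat Score: CSI with the hits expected by chance removed."""
    n = hits + misses + false_alarms + correct_rejections
    # Hits expected from a random forecast with the same marginal totals.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom else float("nan")

# Hypothetical counts for a frequently exceeded threshold.
csi_value = csi(600, 100, 100)
ets_value = ets(600, 100, 100, 200)
print(round(csi_value, 2), round(ets_value, 2))
```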
Multi-Categorical Metric Method The Gerrity Skill Score
Gerrity* Skill Score (GSS) • Refinement of binary categorical methods • Does not depend on the forecast distribution • Rewards/penalises for rare (extreme)/disparate events • does not reward conservative forecasting • Large choice of threshold divisions • Good observation (sample) climatology required • Contingency table distribution leads to scoring matrix • Equitable (i.e., random & constant forecasts score a value of 0) • [Figure: example contingency table × scoring matrix, GSS = 0.38] • * Gerrity, J.P., (1992), Monthly Weather Review, 120, 2709-2712.
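The slide does not give the formula, so the block below states the Gerrity score as it is commonly written in the verification literature. The symbols are assumptions introduced here: K is the number of categories, p_r the observed (sample climatology) relative frequency of category r, and p_ij the joint relative frequency of forecast category i with observed category j.

```latex
% a_r: odds ratio built from the cumulative observed frequencies
\[
a_r = \frac{1 - \sum_{k=1}^{r} p_k}{\sum_{k=1}^{r} p_k}, \qquad r = 1, \dots, K-1
\]
% Diagonal entries of the scoring matrix (rewards for correct categories)
\[
s_{ii} = \frac{1}{K-1}\left[\sum_{r=1}^{i-1} a_r^{-1} + \sum_{r=i}^{K-1} a_r\right]
\]
% Off-diagonal entries (penalties grow with the category distance j - i)
\[
s_{ij} = \frac{1}{K-1}\left[\sum_{r=1}^{i-1} a_r^{-1} - (j-i) + \sum_{r=j}^{K-1} a_r\right],
\quad 1 \le i < j \le K, \qquad s_{ji} = s_{ij}
\]
% The score is the contingency table weighted by the scoring matrix
\[
\mathrm{GSS} = \sum_{i=1}^{K}\sum_{j=1}^{K} p_{ij}\, s_{ij}
\]
```

For K = 2 this reduces to s_11 = a_1, s_22 = 1/a_1 and s_12 = s_21 = -1, so a correct forecast of the rarer category earns the larger reward.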
GSS - Threshold Choices • 1 year rolling data per point, captured from 2 ½ years (365 × 24 = 8760 pts. – a good climatology!) • Skewed Thresholds [0.10, 0.25, 0.45, 0.7] • Equal Frequency Distribution [20, 40, 60, 80] percentiles • Variability in skill versus thresholds, neighbourhood & time • Clues in events from time series & data captured
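A minimal sketch of the two threshold choices: equal-frequency thresholds taken as the [20, 40, 60, 80] percentiles of the observations, and a K x K contingency table built by binning both series. The function names and the synthetic data are assumptions for illustration; categories are assumed to include their upper bound (C <= threshold), and rows are forecast categories, columns observed categories.

```python
import numpy as np

def equal_frequency_thresholds(observed, percentiles=(20, 40, 60, 80)):
    """Thresholds that split the observed sample into equally populated categories."""
    return np.percentile(np.asarray(observed, dtype=float), percentiles)

def multicategory_table(forecast, observed, thresholds):
    """K x K counts: rows are forecast categories, columns are observed categories."""
    thresholds = np.asarray(thresholds, dtype=float)
    k = thresholds.size + 1
    # right=True puts a value equal to a threshold in the lower category.
    fc_cat = np.digitize(forecast, thresholds, right=True)
    ob_cat = np.digitize(observed, thresholds, right=True)
    table = np.zeros((k, k), dtype=int)
    np.add.at(table, (fc_cat, ob_cat), 1)
    return table

# Synthetic example: 100 observed speeds and a model that is 10% too weak.
obs = np.arange(1, 101) / 100.0
model = 0.9 * obs
eq_thresholds = equal_frequency_thresholds(obs)
eq_table = multicategory_table(model, obs, eq_thresholds)
skewed_table = multicategory_table(model, obs, [0.10, 0.25, 0.45, 0.7])
print(eq_table.sum(axis=0))      # observed counts per category, equal by construction
print(skewed_table.sum(axis=0))  # observed counts per category for the skewed thresholds
```

With equal-frequency thresholds every observed category is populated equally, whereas fixed thresholds can leave the upper categories sparse or empty.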
GSS - Threshold Choices Cont. • Daily Max/Min Current Speed - 62024 • Equal Frequency Distribution = [0.07 , 0.12 , 0.18 , 0.25] • Skewed Thresholds = [0.10 , 0.25 , 0.45 , 0.70] • Mean error = -0.03 m s⁻¹ • RMSE = 0.11 m s⁻¹
GSS - Threshold Choices Cont. • Daily Max/Min Current Speed - 62024 • Equal Frequency Distribution = [0.05 , 0.1 , 0.15 , 0.2] • Skewed Thresholds = [0.1 , 0.25 , 0.45 , 0.7] • Mean error = -0.03 m s⁻¹ • RMSE = 0.11 m s⁻¹
GSS - Threshold Choices Cont. • 1 year’s data captured from 2 ½ years (365 × 24 = 8760 pts. – a good climatology) • Equal Frequency Distribution = [0.07 , 0.12 , 0.18 , 0.25] • Regular Thresholds = [0.25 , 0.5 , 0.75 , 1.0] • CHECK YOUR ANALYSIS: Multi-Category test reduced to 2x2 in many cases!
Contingency table (counts):
                   OBS C<=0.25   OBS 0.25<C<=0.5
FC C<=0.25             272              6
FC 0.25<C<=0.5          16             19
Scoring matrix (same row and column order):
                   OBS C<=0.25   OBS 0.25<C<=0.5
FC C<=0.25             0.09           -1.00
FC 0.25<C<=0.5        -1.00           11.52
Contingency table × scoring matrix: GSS = 0.7
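The 2x2 example on this slide can be reproduced with a short sketch of the Gerrity scoring matrix as it is usually defined in the verification literature. The function names are hypothetical, rows are taken as forecast categories and columns as observed categories, and the code assumes every category is observed at least once (an empty category would cause a division by zero).

```python
import numpy as np

def gerrity_scoring_matrix(p_obs):
    """Scoring matrix built from the observed category frequencies."""
    p_obs = np.asarray(p_obs, dtype=float)
    k = p_obs.size
    cum = np.cumsum(p_obs)[:-1]
    a = (1.0 - cum) / cum  # odds ratios a_1 .. a_(K-1)
    s = np.empty((k, k))
    for i in range(k):
        for j in range(i, k):
            # Reward on the diagonal (j == i); penalty grows with distance j - i.
            s[i, j] = (np.sum(1.0 / a[:i]) - (j - i) + np.sum(a[j:])) / (k - 1)
            s[j, i] = s[i, j]
    return s

def gerrity_skill_score(table):
    """GSS from a K x K count table (rows: forecast, columns: observed)."""
    table = np.asarray(table, dtype=float)
    p = table / table.sum()
    s = gerrity_scoring_matrix(p.sum(axis=0))  # observed marginals
    return float(np.sum(p * s)), s

# Counts from the slide: FC rows, OBS columns.
gss, scoring = gerrity_skill_score([[272, 6], [16, 19]])
print(np.round(scoring, 2))
print(round(gss, 2))
```

The resulting matrix rounds to 0.09, -1.00 and 11.52, and the score to 0.70, in agreement with the values shown on the slide.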
Other trials & results • Various spatial & temporal neighbourhoods • Report similar results • Preliminary results on other model systems show similar skill scores • Met Office FOAM-Shelf system • Maximum skill versus neighbourhood size • Other binning thresholds • Lack of a firm a priori binning remains a deficiency • Decoupling tidal cycle & residual current from raw signal to highlight skill partitioning • Doodson sea surface height decoupler trialled • Separation of potentially non-parallel (orthogonal) fields not addressed
Conclusions • Hourly frequency currents, Jan 2012 – Jun 2014 (30 months) • Threshold based assessment • Continuous statistics are helpful to describe overall behaviour • Timeseries can show seasonal patterns • These do not quantify spatially or temporally coordinated model/obs values • → Categorical Metric Assessment • Gerrity Skill Score – attractive attributes for rewards/penalties
Conclusions cont. • Choice of thresholds important • Model CAN CAPTURE EXTREME EVENTS – Threshold dependent ! • Equal Frequency Distribution appears to be the fairest a priori • Can be personalised to a particular regime or current distribution • Timeseries needed alongside Gerrity • Missing data can skew results • Similar locations/regimes appear to give broadly similar Gerrity Skill Scores • Winter months tend to show better skill – more extreme events • Multi-category methods on surface ocean current speed are relatively new, so expectation of skill level is unknown
Future Work • Now that the concept is established, apply to forecast data • Include other regional models which have a long-term observation record • Bootstrapping Gerrity Skill Score • Error estimation around each score • Return to bias removal issue • Scaled currents, rather than constant removal? • Assess wind speed with Gerrity Skill Score & compare to surface currents • Potentially highlights efficiency of wind speed transmission to surface currents at the ocean-atmosphere boundary
Acknowledgement • Thank you to MyOcean for funding towards this work