Benchmark database based on surrogate climate records. Victor Venema
Goals of COST-HOME working group 1 • Literature survey • Benchmark dataset • Known inhomogeneities • Test the homogenisation algorithms (HA)
Benchmark dataset • Real (inhomogeneous) climate records • Most realistic case • Investigate whether the various HA find the same breaks • Good meta-data • Synthetic data • For example, Gaussian white noise • Insert known inhomogeneities • Test performance • Surrogate data • Empirical distribution and correlations • Insert known inhomogeneities • Compare to synthetic data: test of assumptions
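The synthetic case can be sketched in a few lines. This is a minimal illustration, assuming annual values and fully independent stations; the names `white_noise_network` and `lag1_autocorrelation` are hypothetical and not part of any COST-HOME software. The lag-1 autocorrelation shows one assumption that white noise makes and real or surrogate records need not satisfy: no memory from one time step to the next.

```python
import random


def white_noise_network(n_stations, n_steps, sigma=1.0, seed=None):
    """Synthetic network: independent Gaussian white noise per station
    (no temporal correlation, no cross-correlation between stations)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_steps)]
            for _ in range(n_stations)]


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation; close to 0 for white noise."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


network = white_noise_network(n_stations=5, n_steps=100, seed=1)
print([round(lag1_autocorrelation(s), 2) for s in network])
```

Running the same diagnostic on the surrogate networks would typically give clearly non-zero values, which is the point of comparing the two kinds of benchmark data.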
Creating the benchmark – Outline of the talk • Start with homogeneous data • Generate multiple surrogate and synthetic realisations • Mask surrogate records • Add global trend • Insert inhomogeneities in station time series • Publish on the web • Homogenisation by COST participants and third parties • Analyse the results and publish
1) Start with homogeneous data • Monthly mean temperature and precipitation (France) • Later also daily data • Later maybe other variables • Homogeneous • No missing data • Detrended • 20 to 30 years are enough for good statistics • Longer surrogates are based on multiple copies • Larger-scale correlations are small • Distribution well defined with 30 years of data • Generated networks are 50, 100 and 200 years long
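The slide does not say how the "multiple copies" are combined. A minimal sketch, assuming the detrended input record is simply concatenated with itself until it reaches the target network length before the surrogate step randomises it; `detrend` and `tile_to_length` are hypothetical helper names, and the detrending here is an ordinary least-squares linear fit.

```python
def detrend(series):
    """Remove the least-squares linear trend (and the mean) from a series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]


def tile_to_length(series, length):
    """Concatenate copies of `series` and cut the result to `length` values."""
    copies = -(-length // len(series))  # ceiling division
    return (series * copies)[:length]


# Toy values, not real station data
record = [0.3, -0.1, 0.4, 0.0, -0.2, 0.5]
template = tile_to_length(detrend(record), 15)
```

Removing the mean and trend first keeps the joins between copies from introducing artificial jumps in level or trend; on its own the tiled series just repeats, so it only makes sense as input to the surrogate generation.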
2) Multiple surrogate realisations • The surrogates reproduce the original: • Temporal correlations • Station cross-correlations • Empirical distribution function • Annual cycle removed beforehand and added back at the end • Number of stations between 5 and 20 • Cross-correlation between stations varies as much as possible across networks • (Plot: temporal structure of the surrogates) • (Plot: cross-correlations)
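The slide lists what the surrogates preserve but not the algorithm. A technique that targets exactly these three properties is the iterative amplitude adjusted Fourier transform (IAAFT) in its multivariate form; the sketch below assumes that technique and is not presented as the author's code. It expects anomalies (annual cycle already removed), uses a naive O(n²) DFT only to stay dependency-free, and all function names are hypothetical. Sharing one random phase shift per frequency across all stations keeps the cross-spectra, and hence the cross-correlations; the rank-ordering step restores each station's exact value distribution.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, keeping the real part (the spectra are Hermitian)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def rank_adjust(sorted_values, y):
    """Return the original values, reordered to follow the ranks of y."""
    order = sorted(range(len(y)), key=lambda t: y[t])
    out = [0.0] * len(y)
    for rank, t in enumerate(order):
        out[t] = sorted_values[rank]
    return out


def multivariate_iaaft(stations, n_iter=50, seed=None):
    """Surrogates with approximately the same power and cross-spectra as
    `stations` and exactly the same value distribution per station."""
    rng = random.Random(seed)
    n = len(stations[0])
    spectra = [dft(x) for x in stations]
    sorted_vals = [sorted(x) for x in stations]

    # One random phase shift per frequency, shared by all stations, with
    # alpha[n - k] = -alpha[k] so that the inverse transform is real.
    alpha = [0.0] * n
    for k in range(1, (n + 1) // 2):
        alpha[k] = rng.uniform(0.0, 2.0 * math.pi)
        alpha[n - k] = -alpha[k]

    def constrained_step():
        # Impose the original Fourier amplitudes (with shared phases alpha),
        # then the original value distribution by rank ordering.
        return [rank_adjust(sv, idft_real([X[k] * cmath.exp(1j * alpha[k]) for k in range(n)]))
                for X, sv in zip(spectra, sorted_vals)]

    surrogates = constrained_step()
    for _ in range(n_iter):
        current = [dft(y) for y in surrogates]
        # Common phase shift per frequency that best fits the current iterate
        for k in range(n):
            z = sum(Y[k] * X[k].conjugate() for Y, X in zip(current, spectra))
            alpha[k] = cmath.phase(z)
        surrogates = constrained_step()
    return surrogates


# Toy network of three correlated stations (not real data)
rng = random.Random(42)
common = [rng.gauss(0.0, 1.0) for _ in range(48)]
stations = [[c + 0.5 * rng.gauss(0.0, 1.0) for c in common] for _ in range(3)]
surrogates = multivariate_iaaft(stations, n_iter=20, seed=1)
```

Each call with a different `seed` gives a new realisation; the annual cycle would be added back to the result afterwards.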
3) Mask surrogate records • Beginning of the records is jagged (rough) • Number of stations increases linearly • Last station starts after 25% of the full period • At the end of the record all stations are measuring • Influence of the jagged edge on detection and correction • But the trend also increases over time (i.e. the periods differ)! • Is this a problem?
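The masking rule can be sketched as follows, assuming start times that grow linearly with station index (so the number of active stations grows linearly) and `None` as a placeholder for missing values; which station gets which start date, and the helper names, are assumptions.

```python
def start_indices(n_stations, n_steps, last_start_fraction=0.25):
    """First measured time step of each station. Start times grow linearly,
    so the number of active stations increases linearly; the last station
    starts after `last_start_fraction` of the record."""
    if n_stations == 1:
        return [0]
    last = int(last_start_fraction * n_steps)
    return [round(i * last / (n_stations - 1)) for i in range(n_stations)]


def mask_network(network, last_start_fraction=0.25, missing=None):
    """Replace values before each station's start by `missing`."""
    starts = start_indices(len(network), len(network[0]), last_start_fraction)
    return [[missing if t < s else v for t, v in enumerate(series)]
            for series, s in zip(network, starts)]
```

Every station measures from 25% of the record onward, so the jagged edge only affects the first quarter, which is where its influence on detection and correction would show up.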
4) Add global trend • NASA GISS Surface Temperature Analysis (GISTEMP) by J. Hansen • Global mean surface temperature • Last year of any surrogate network is 1999
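A minimal sketch of this step, assuming annual values and a hypothetical dictionary `global_anomaly` (year to anomaly) that the user would fill from GISTEMP; no GISTEMP data are included here. GISTEMP starts in 1880, so a 200-year network ending in 1999 reaches back before the record; the slide does not say how that is handled, and this sketch simply raises `KeyError` for missing years.

```python
def add_global_trend(network, global_anomaly, last_year=1999):
    """Add one global-mean anomaly series to every station of an annual
    network whose last year is `last_year`. Missing values (None) stay missing.
    Raises KeyError if `global_anomaly` lacks a year the network covers."""
    n_years = len(network[0])
    years = range(last_year - n_years + 1, last_year + 1)
    trend = [global_anomaly[y] for y in years]
    return [[None if v is None else v + g for v, g in zip(series, trend)]
            for series in network]


# Toy anomalies standing in for GISTEMP values (not real data)
toy = {1997: 0.1, 1998: 0.2, 1999: 0.3}
result = add_global_trend([[0.0, 0.0, None], [1.0, None, 1.0]], toy)
```

Because every station receives the same series, the added trend is climatic rather than an inhomogeneity and should survive homogenisation.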
5) Insert inhomogeneities in stations • Random breaks (implemented) • Frequency of breaks: one per 20 or 40 years • Sizes for temperature (additive constants): 0.25, 0.5, 1.0 °C • Sizes for precipitation (multiplicative factors): 0.8, 0.9, 1.1, 1.2 • Simultaneous breaks • Frequency of breaks: one per 50 years • In 10 to 50 % of the network
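The break scheme can be sketched as below, using the frequencies and sizes from the slide. Assumptions not in the source: breaks occur independently with the given probability each year, temperature breaks get a random sign, a simultaneous break has the same size at every affected station, and a break shifts all values from the break year onward. Function names are hypothetical.

```python
import random

# Sizes from the slide; the random sign for temperature is an assumption.
TEMP_SIZES = (-1.0, -0.5, -0.25, 0.25, 0.5, 1.0)   # additive, °C
PRECIP_FACTORS = (0.8, 0.9, 1.1, 1.2)               # multiplicative


def random_breaks(n_years, rate, sizes, rng):
    """Break years drawn independently with probability `rate` per year
    (e.g. 1/20 or 1/40); returns [(year_index, size), ...]."""
    return [(t, rng.choice(sizes)) for t in range(1, n_years) if rng.random() < rate]


def simultaneous_breaks(n_stations, n_years, sizes, rng, rate=1 / 50):
    """Breaks hitting 10-50 % of the stations in the same year.
    Returns {station_index: [(year_index, size), ...]}."""
    breaks = {}
    for t in range(1, n_years):
        if rng.random() < rate:
            k = max(1, round(rng.uniform(0.1, 0.5) * n_stations))
            size = rng.choice(sizes)  # same size at all affected stations (assumption)
            for s in rng.sample(range(n_stations), k):
                breaks.setdefault(s, []).append((t, size))
    return breaks


def apply_breaks(series, breaks, multiplicative=False):
    """Shift every value from the break year onward; None stays missing."""
    out = list(series)
    for t0, size in breaks:
        for t in range(t0, len(out)):
            if out[t] is not None:
                out[t] = out[t] * size if multiplicative else out[t] + size
    return out
```

Keeping the returned break lists is what makes the benchmark useful: they are the "truth" against which the returned detections are scored.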
5) Insert inhomogeneities in stations (continued) • Outliers • Frequency: 1–3 % of the values • Size: 99 and 99.9 percentiles • Local trends (only temperature) • Linear increase or decrease in one station • Duration: 30 or 60 years • Maximum size: 0.2 to 1.5 °C • Frequency: once in 10 % of the stations
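A sketch of the two remaining inhomogeneity types, under assumptions the slide leaves open: an outlier adds, with random sign, the 99th (or 99.9th) percentile of the absolute anomalies; a local trend is a linear ramp whose final offset persists after the ramp ends. The helper names are hypothetical and the series are assumed to be complete anomalies.

```python
import random


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def insert_outliers(anomalies, rng, fraction=0.02, q=99.0):
    """Add +/- (q-th percentile of |anomaly|) to a random `fraction` of the
    values. Returns (new_series, sorted outlier positions)."""
    size = percentile([abs(v) for v in anomalies], q)
    positions = sorted(rng.sample(range(len(anomalies)), round(fraction * len(anomalies))))
    out = list(anomalies)
    for t in positions:
        out[t] += rng.choice((-1.0, 1.0)) * size
    return out, positions


def insert_local_trend(series, start, duration, max_size):
    """Linear ramp from 0 to `max_size` over `duration` steps from `start`;
    the final offset is kept afterwards (assumption)."""
    out = list(series)
    for t in range(start, len(out)):
        out[t] += min(1.0, (t - start + 1) / duration) * max_size
    return out
```

A negative `max_size` gives the "linear decrease" case.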
6) Published on the web • Inhomogeneous data will be published on the COST-HOME homepage • Everyone is welcome to download and homogenize the data
7) Homogenize by participants • Return homogenised data • Should be in COST-HOME file format (next slide) • Return break detections • BREAK (break) • OUTLI (outlier) • BEGTR (beginning of local trend) • ENDTR (end of local trend) • Multiple breaks at one date are possible
7) Homogenize by participants • COST-HOME file format: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf • For benchmark & COST homogenisation software • One data and one quality-flag file per station • Filename encodes variable, resolution, quality and station • ASCII network-file with station names • ASCII break-file with dates and station names
8) Analyse the results • Detailed analysis will be performed in the working groups • Detection • Correction • Daily data homogenisation • Synthetic and surrogate data • RMS error • Number of breaks detected (as a function of size) • Application: reduction in the scatter of the trends • Performance difference between synthetic (Gaussian white noise) and surrogate data
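The three measures on this slide can be sketched as follows. This is an illustration, not the working groups' agreed method: the tolerance `window` for counting a detection as a hit, the use of the population standard deviation for trend scatter, and all function names are assumptions.

```python
import math
import statistics


def rmse(homogenised, truth):
    """Root-mean-square error over time steps where both values exist."""
    pairs = [(h, t) for h, t in zip(homogenised, truth) if h is not None and t is not None]
    return math.sqrt(sum((h - t) ** 2 for h, t in pairs) / len(pairs))


def detection_rate_by_size(true_breaks, detected, window=1):
    """Fraction of inserted breaks found, grouped by |size|. A break counts
    as found if a detection lies within `window` steps (assumed tolerance)."""
    totals, hits = {}, {}
    for t0, size in true_breaks:
        key = abs(size)
        totals[key] = totals.get(key, 0) + 1
        if any(abs(d - t0) <= window for d in detected):
            hits[key] = hits.get(key, 0) + 1
    return {k: hits.get(k, 0) / totals[k] for k in sorted(totals)}


def linear_trend(series):
    """Least-squares slope per time step."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx


def trend_scatter(network):
    """Spread (population std. dev.) of the station trends in a network."""
    return statistics.pstdev([linear_trend(s) for s in network])
```

Comparing `trend_scatter` for the inhomogeneous and the homogenised version of a network quantifies the "reduction in the scatter of the trends"; running the same functions on synthetic and surrogate networks gives the performance difference between the two.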
Work in progress • Monthly precipitation • Implement some inhomogeneity types • Daily data: other inhomogeneities • Synthetic data (Gaussian white noise) • More input data! • Agree on the details of the benchmark • Next meeting? • Set a deadline for the availability of the benchmark • Deadline for the return of the homogenised data
Questions • Ideas for a better benchmark • For example, other inhomogeneities or other constants • Types of inhomogeneities for daily data • Automatic processing • On the order of 100 networks
7) Homogenize by participants • COST-HOME file format: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf • For benchmark & COST homogenisation software • Regular ASCII matrix (columns) • One data and one quality-flag file per station • Yearly, daily and sub-daily data: columns for time, one column for data • Monthly data: one year column, 12 columns for data • Filename encodes variable, resolution, quality and station • ASCII network-file with station names • ASCII break-file with dates and station names
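The monthly layout (a year column followed by 12 data columns in a regular ASCII matrix) can be written and read back as sketched here. The column widths, the two decimal places and the missing-value code `-999.9` are assumptions, not taken from the COST-HOME specification; the PDF linked on this slide is the authority, and the function names are hypothetical.

```python
def format_monthly(values_by_year, missing=-999.9):
    """Regular ASCII matrix: one row per year, a year column followed by
    12 monthly values. None is written as `missing` (assumed code)."""
    lines = []
    for year in sorted(values_by_year):
        months = values_by_year[year]
        if len(months) != 12:
            raise ValueError(f"{year}: expected 12 monthly values, got {len(months)}")
        cells = [missing if v is None else v for v in months]
        lines.append(f"{year:4d}" + "".join(f"{v:9.2f}" for v in cells))
    return "\n".join(lines) + "\n"


def parse_monthly(text, missing=-999.9):
    """Inverse of format_monthly: {year: [12 values or None]}."""
    data = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            data[int(fields[0])] = [None if float(f) == missing else float(f)
                                    for f in fields[1:]]
    return data
```

Parsing by whitespace rather than fixed column positions keeps the reader tolerant of small differences in column width between contributed files.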