Quality control and homogenization of the COST benchmark dataset. Petr Štěpánek, Pavel Zahradníček, Czech Hydrometeorological Institute, regional office Brno. E-mail: petr.stepanek@chmi.cz, zahradnicek@chmi.cz
Processing before any data analysis • Software: AnClim, ProClimDB
Data quality control: finding outliers. Two main approaches: • using limits derived from interquartile ranges (time series) • comparing values with values of neighbouring stations (spatial analysis)
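The interquartile-range approach can be sketched as follows. This is a minimal illustration, assuming the common fence [Q1 - k·IQR, Q3 + k·IQR] with k = 1.5; the slides do not state the multiplier, and the function name `iqr_outliers` is hypothetical. For monthly data it would presumably be applied to each calendar month separately (all Januaries, all Februaries, ...), which is also an assumption.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed textbook default, not taken from the slides.
    NaNs (missing data) are ignored for the quartiles and never flagged.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons with NaN evaluate to False, so missing values stay unflagged
    return (x < lower) | (x > upper)
```

For example, `iqr_outliers([1, 2, 3, 2, 1, 2, 3, 2, 50])` flags only the last value.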
Creating reference series • for monthly data • weighted/unweighted mean from neighbouring stations • power of weight is 1 for temperature (1/d) and 3 for precipitation (1/d³), i.e. inverse distance weighting (IDW) • criteria used for station selection (or a combination of them): • best correlated / nearest neighbours (correlations computed from the first-differenced series) • limit correlation, limit distance • limit difference in altitudes • neighbouring series should be standardized to the test series average (AVG) and/or standard deviation (STD) / altitude • comparison with the „expected“ value (calculated as a weighted mean of the standardized neighbour values)
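The „expected“ value can be sketched as an IDW mean of standardized neighbours. This is a minimal illustration under stated assumptions: the three standardization variants (mean shift, mean and STD matching, ratio of means) are one reading of "AVG and/or STD", and the names `standardize_to_test` and `expected_values` are hypothetical, not ProClimDB functions. The weight power follows the slide: 1 for temperature, 3 for precipitation.

```python
import numpy as np


def standardize_to_test(neighbour, test, method="avg"):
    """Rescale one neighbour series to the level of the test series.

    method="avg"     : shift by the difference of means (e.g. temperature)
    method="avg_std" : match both mean and standard deviation
    method="ratio"   : scale by the ratio of means (e.g. precipitation)
    Statistics use only time steps where both series have data.
    """
    nb = np.asarray(neighbour, dtype=float)
    ts = np.asarray(test, dtype=float)
    both = ~np.isnan(nb) & ~np.isnan(ts)
    m_nb, m_ts = nb[both].mean(), ts[both].mean()
    if method == "avg":
        return nb - m_nb + m_ts
    if method == "avg_std":
        return (nb - m_nb) / nb[both].std() * ts[both].std() + m_ts
    if method == "ratio":
        return nb * (m_ts / m_nb)
    raise ValueError(f"unknown method: {method}")


def expected_values(test, neighbours, distances_km, power, method="avg"):
    """IDW mean (weights 1/d**power) of standardized neighbour values.

    Missing neighbour values (NaN) are skipped and the remaining weights
    are renormalized at each time step.
    """
    nbs = np.array([standardize_to_test(nb, test, method) for nb in neighbours])
    w = 1.0 / np.asarray(distances_km, dtype=float) ** power
    weights = np.where(np.isnan(nbs), 0.0, w[:, None])
    num = np.nansum(nbs * weights, axis=0)
    den = weights.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den  # NaN where no neighbour has data
```

A value is then an outlier candidate when the absolute difference between the measured and the expected value exceeds a chosen limit.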
Example: Proposed list of stations used for creating reference series
„Outliers“, temperature, sur1, network 1 • detected 12 „outliers“ • 10 errors for station 150 (5 in year 1909) • mean difference between the measured outliers and the expected value is about 6 °C
„Outliers“, precipitation, sur1, network 1 • detected 8 „outliers“ • mean difference between the measured outliers and the expected value is about 180 mm • maximum difference is 313 mm (station 4307012, 8/1971)
Creating reference series • for monthly data • weighted/unweighted mean from neighbouring stations • criteria used for station selection (or a combination of them): • best correlated / nearest neighbours (correlations computed from the first-differenced series) • limit correlation, limit distance • limit difference in altitudes • neighbouring series should be standardized to the test series average (AVG) and/or standard deviation (STD) (temperature: elevation, precipitation: variance); missing data are then less of a problem
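Neighbour selection by the criteria listed above can be sketched like this. It is a minimal illustration, assuming Pearson correlation of first-differenced series and hypothetical limit values (`min_corr`, `max_dist_km`, `max_alt_diff_m`, `n_max`); the slides name the criteria but not their values, and `select_neighbours` is a hypothetical name.

```python
import numpy as np


def select_neighbours(test, candidates, distances_km, altitude_diffs_m,
                      n_max=5, min_corr=0.7, max_dist_km=300.0,
                      max_alt_diff_m=500.0):
    """Return indices of the best-correlated candidates that pass all limits.

    Correlations use first-differenced series, so a break or trend in one
    series distorts the ranking less than with the raw values.
    All limit defaults are hypothetical.
    """
    d_test = np.diff(np.asarray(test, dtype=float))
    scored = []
    for i, cand in enumerate(candidates):
        # Distance and altitude limits are applied before any correlation
        if distances_km[i] > max_dist_km or abs(altitude_diffs_m[i]) > max_alt_diff_m:
            continue
        d_cand = np.diff(np.asarray(cand, dtype=float))
        ok = ~np.isnan(d_test) & ~np.isnan(d_cand)
        if ok.sum() < 3:
            continue  # too little overlap to estimate a correlation
        r = np.corrcoef(d_test[ok], d_cand[ok])[0, 1]
        if r >= min_corr:  # NaN correlations fail this test
            scored.append((r, i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:n_max]]
```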
Relative homogeneity testing • test series: 40 years • longer series are divided into several sections overlapping by 10 years • tests: SNHT, bivariate test, t-test
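The single-shift SNHT (Alexandersson's standard normal homogeneity test) and the 40-year sections with 10-year overlap can be sketched as below. This is an illustration, not AnClim's implementation: `q` is assumed to be the relative series (typically test minus reference for temperature, test divided by reference for precipitation), the critical value for T0 depends on series length and is left to the user, and anchoring the last section at the series end is an assumed choice. The bivariate test and t-test are not shown.

```python
import numpy as np


def snht(q):
    """Single-shift SNHT on a relative series q.

    Returns (k, T0): k is the index of the first value after the change,
    T0 the maximum of T(k) = k*mean(z[:k])**2 + (n-k)*mean(z[k:])**2.
    """
    q = np.asarray(q, dtype=float)
    z = (q - q.mean()) / q.std()
    n = len(z)
    t = np.array([k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
                  for k in range(1, n)])
    k = int(np.argmax(t)) + 1
    return k, float(t[k - 1])


def split_sections(n, length=40, overlap=10):
    """(start, end) index pairs covering n values with overlapping sections."""
    step = length - overlap
    sections, start = [], 0
    while start + length < n:
        sections.append((start, start + length))
        start += step
    # The last section is anchored at the series end so it keeps full length
    sections.append((max(0, n - length), n))
    return sections
```

Each section is then tested separately, e.g. `[snht(q[s:e]) for s, e in split_sections(len(q))]`.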
Example of the detected breaks: temperature, sur1, network 1 • detected 63 breaks [Figures: station no. 50, break 1928; station no. 50, break 1975. Panels: test and reference series; difference between test and reference series; test statistics]
Example of the detected breaks: precipitation, sur1, network 1 • detected 10 breaks [Figures: station no. 4309900, break 1909; station no. 4311803, break 1991]
Adjusting monthly data • using a reference series based on distance • power of weight is 0.5 for temperature and 1 for precipitation • adjustment: from differences/ratios over the 20 years before and after a change, for each month • smoothing of the monthly adjustments (low-pass filter over adjacent values) [Figures: station no. 100, break 1983; station no. 50, break 1928]
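The monthly adjustment step can be sketched as follows, assuming a reference series already built with the distance weights above. The specific formulas are assumptions, not the authors' stated method: the difference adjustment is the mean test-minus-reference difference after the break minus that before it, the ratio adjustment is the quotient of the two mean ratios, and the low-pass filter is a hypothetical circular 1-2-1 filter over adjacent months. All function names are hypothetical.

```python
import numpy as np


def monthly_adjustments(test, ref, break_idx, window=20, kind="difference"):
    """Per-month adjustments for the values before a break.

    test, ref : arrays of shape (n_years, 12), monthly values
    break_idx : index of the first year after the change
    kind      : "difference" (e.g. temperature) or "ratio" (e.g. precipitation)
    """
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    before = slice(max(0, break_idx - window), break_idx)
    after = slice(break_idx, break_idx + window)
    if kind == "difference":
        d = test - ref
        return np.nanmean(d[after], axis=0) - np.nanmean(d[before], axis=0)
    if kind == "ratio":
        q_before = np.nanmean(test[before], axis=0) / np.nanmean(ref[before], axis=0)
        q_after = np.nanmean(test[after], axis=0) / np.nanmean(ref[after], axis=0)
        return q_after / q_before
    raise ValueError(f"unknown kind: {kind}")


def smooth_monthly(adj):
    """Assumed 1-2-1 low-pass filter over adjacent months (Dec wraps to Jan)."""
    adj = np.asarray(adj, dtype=float)
    return 0.25 * np.roll(adj, 1) + 0.5 * adj + 0.25 * np.roll(adj, -1)


def apply_adjustment(test, break_idx, adj, kind="difference"):
    """Adjust all years before the break to the level of the later segment."""
    out = np.asarray(test, dtype=float).copy()
    if kind == "difference":
        out[:break_idx] += adj  # adj (12,) broadcasts over the years
    else:
        out[:break_idx] *= adj
    return out
```

Adjusting the older segment keeps the most recent data unchanged, which is the usual convention in relative homogenization.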
Adjusting values: evaluation • after adjustment, the correlation must increase; if it does not, the series is not adjusted [Figures: temperature, precipitation]
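The evaluation rule can be expressed as a small acceptance check. This sketch assumes the correlation in question is the Pearson correlation between the station series and its reference series; the slide does not specify this, and `accept_adjustment` is a hypothetical name.

```python
import numpy as np


def accept_adjustment(original, adjusted, reference):
    """Return (series, accepted).

    The adjusted series is kept only if its correlation with the reference
    is higher than that of the original series; otherwise the original is kept.
    """
    o, a, r = (np.asarray(x, dtype=float) for x in (original, adjusted, reference))
    ok = ~np.isnan(o) & ~np.isnan(a) & ~np.isnan(r)
    r_before = np.corrcoef(o[ok], r[ok])[0, 1]
    r_after = np.corrcoef(a[ok], r[ok])[0, 1]
    if r_after > r_before:
        return a, True
    return o, False
```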
Absolute values of adjustment for temperature, surg1, network 1
Iterative homogeneity testing • several iterations of testing and evaluation of results • several iterations of homogeneity testing and series adjustment (3 iterations should be sufficient) • the question of the homogeneity of the reference series is thus solved: • possible inhomogeneities should be eliminated by averaging several neighbouring stations • if this is not the case, the neighbours will already have been homogenized in the next iteration
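The iteration scheme can be illustrated with a deliberately simplified loop. Assumptions, none of them from the slides: annual temperature series without gaps, the unweighted mean of all other stations as the reference, only the single most significant SNHT break per station per pass, a mean-shift adjustment, and a hypothetical critical value `t_crit=9.0`. `homogenize_iteratively` is a hypothetical name.

```python
import numpy as np


def snht(q):
    """Single-shift SNHT: returns (k, T0), k = first index after the break."""
    z = (q - q.mean()) / q.std()
    n = len(z)
    t = np.array([k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
                  for k in range(1, n)])
    k = int(np.argmax(t)) + 1
    return k, float(t[k - 1])


def homogenize_iteratively(data, n_iter=3, t_crit=9.0):
    """Test and adjust every station against its neighbours, n_iter times.

    data   : array (n_stations, n_years) of annual temperatures, no gaps
    t_crit : hypothetical SNHT critical value (depends on n in practice)
    Stations are adjusted in place, so stations tested later, and all
    stations in the next iteration, are compared with adjusted neighbours.
    """
    x = np.asarray(data, dtype=float).copy()
    for _ in range(n_iter):
        for s in range(x.shape[0]):
            ref = np.delete(x, s, axis=0).mean(axis=0)  # unweighted mean of others
            q = x[s] - ref
            if q.std() == 0:
                continue
            k, t0 = snht(q)
            if t0 > t_crit:
                # Shift the earlier segment to the level of the later one
                x[s, :k] += q[k:].mean() - q[:k].mean()
    return x
```

Because adjustments are made in place, an inhomogeneity that contaminates a reference in the first pass is largely removed from that reference in the following passes.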
Example: homogenized temperature series [Figures: station no. 50; station no. 100]
Example: homogenized precipitation series [Figures: station no. 4309900, break 1909; station no. 4311803, break 1991]