Detection of discontinuities using an approach based on regression models and application to benchmark temperature, by Lucie Vincent, Climate Research Branch, Science and Technologies Branch, Environment Canada. Presentation to the COST meeting in Tarragona, Spain, March 9-11, 2009.
Outline
• Methodology
  - identification of changepoints in annual mean temperature
  - adjustment of monthly and daily values
• Testing methodology using simulated values
  - homogeneous series, single step, random number of steps
• Identification of biases in Canadian climate data
  - bias in relative humidity due to change in instruments
  - bias in radiosonde temperature due to introduction of correction factor
  - bias in daily minimum temperature due to a change in observing time
• Application to benchmark temperature datasets
  - monthly mean minimum temperature at Groix
• Discontinuities in precipitation due to joining station observations
Identification of changepoint in annual mean temperature
Let y be the candidate series and x the reference series (written x1 in the models).
Model 1: to identify a homogeneous series
  $y = a_1 + c_1 x_1 + e$
Model 2: to identify a trend
  $y = a_2 + b_2 i + c_2 x_1 + e$, $i = 1, \dots, n$
Model 3: to identify a step
  $y = a_3 + b_3 I + c_3 x_1 + e$, $i = 1, \dots, n$
  $I = 0$ for $i = 5, \dots, p-1$; $I = 1$ for $i = p, \dots, n-5$
Model 4: to identify a step with trends before and after
  $y = a_4 + b_4 i I_1 + a_5 I_2 + b_5 i I_2 + c_4 x_1 + e$, $i = 1, \dots, n$
  $I_1 = 1$ and $I_2 = 0$ for $i = 5, \dots, p-1$; $I_1 = 0$ and $I_2 = 1$ for $i = p, \dots, n-5$
[Figure: difference between candidate and reference]
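The Model 1 and Model 3 fits can be sketched with ordinary least squares. This is a minimal illustration on synthetic data, not the author's code: the function names, the `margin` parameter and the rule of keeping the changepoint with the smallest residual sum of squares are assumptions made for the example.

```python
import numpy as np

def fit_sse(design, y):
    """Ordinary least squares; returns the coefficients and the sum of squared errors."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, float(resid @ resid)

def best_step(y, x, margin=5):
    """Fit Model 3 for every admissible changepoint p and keep the smallest SSE.

    Assumption: p is chosen by minimum SSE and at least `margin` values are
    kept on each side of the step.
    """
    n = len(y)
    ones = np.ones(n)
    best = None
    for p in range(margin, n - margin + 1):        # 0-based index of the first value after the step
        indicator = (np.arange(n) >= p).astype(float)
        coef, sse = fit_sse(np.column_stack([ones, indicator, x]), y)
        if best is None or sse < best[2]:
            best = (p, float(coef[1]), sse)
    return best                                     # (position, step magnitude b3, SSE3)

# Synthetic candidate and reference series (hypothetical data)
rng = np.random.default_rng(0)
n = 80
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.3, size=n)
y[40:] += 1.5                                       # artificial step at index 40

p, b3, sse3 = best_step(y, x)
_, sse1 = fit_sse(np.column_stack([np.ones(n), x]), y)   # Model 1: no step
```

Comparing `sse1` with `sse3` is what the F test on the next slide formalizes.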
Statistical tests
Durbin-Watson test: to determine if the candidate series is homogeneous (autocorrelation of the residuals)
  $e_i = \rho e_{i-1} + \mu_i$
  $H_0: \rho = 0$ versus $H_a: \rho > 0$
  $D = \sum (e_i - e_{i-1})^2 / \sum e_i^2$
  if $D > d_u$ => $H_0$; if $D < d_l$ => $H_a$; if $d_l \le D \le d_u$ the test is inconclusive
F test: to determine if the introduction of additional variables improves the fit
  Model 1 and Model 3 are compared
  $H_0: b_3 = 0$ versus $H_a: b_3 \ne 0$
  $F^* = \dfrac{(SSE_1 - SSE_3)/(df_1 - df_3)}{SSE_3/df_3}$
  if $F^* > F(1-\alpha; 1, n-3)$ reject $H_0$
If there is a significant changepoint, divide the series into two segments and re-test each segment separately
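The two statistics can be computed directly from residuals and sums of squares. The sketch below is illustrative only: the function names and the numbers in the usage lines are made up, and the critical values (d_l, d_u and the F quantile) still have to come from tables or a statistics library.

```python
import numpy as np

def durbin_watson(resid):
    """D = sum (e_i - e_{i-1})^2 / sum e_i^2 for a residual series."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

def f_statistic(sse_reduced, df_reduced, sse_full, df_full):
    """F* comparing a reduced model (Model 1) with a fuller model (Model 3)."""
    return ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

# Alternating residuals give a large D (negative autocorrelation)
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Constant residuals give D = 0 (strong positive autocorrelation)
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])

# Hypothetical sums of squares for n = 20: Model 1 has n-2 and Model 3 has n-3 degrees of freedom
f_star = f_statistic(sse_reduced=10.0, df_reduced=18, sse_full=5.0, df_full=17)
```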
Example: annual mean of daily maximum temperature of Pointe-au-Père / Mont-Joli, 1915-1998
[Figure: fits of Model 1 and Model 3; trend of 1.8°C / 84 years]
Example: annual mean of daily maximum temperature
[Figure: difference between candidate and reference, with a step of 1.1°C in 1943; original series with a trend of 1.8°C / 84 years; adjusted series with a trend of 0.1°C / 84 years]
Remarks
• The first changepoint is not always associated with a “real” change
  - use a hierarchical procedure to find all changepoints until
    . convergence of the position of each changepoint
    . each segment is homogeneous
    . each segment is too short
• Reference series can contain inhomogeneities
  - a step in the neighbour series can affect the candidate series
  - a network bias is difficult to detect
  - preferable to confirm the changepoint with metadata
Adjustment of monthly temperature
Application to the 12 monthly series for the changepoint p identified in annual mean temperature, giving monthly adjustments (ai, i = 1, …, 12):
• if the ai show seasonality => apply the ai
• if the ai are randomly distributed => apply the annual adjustment
[Figure: difference between candidate and reference for January, July and December]
Example: annual mean of daily maximum temperature of Pointe-au-Père / Mont-Joli
Step of 1.1°C in 1943. Instruments on the roof.
[Figure: monthly adjustments; panels labelled before 1943 and after 1943]
Adjustment of daily temperature
Linear interpolation between mid-month target values, objectively chosen so that the average of the daily adjustments over a given month is equal to the monthly adjustment:
  $T = A^{-1} M$
where M are the monthly adjustments and A = [matrix given on the slide]
Regression model 3 applied to individual daily series for changepoint p:
  $y = a_3 + b_3 I + c_3 x_1 + e$, $i = 1, \dots, n$
  $I = 0$ for $i = 5, \dots, p-1$; $I = 1$ for $i = p, \dots, n-5$
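The interpolation idea can be illustrated numerically. The matrix A from the slide is not available in this transcript, so the sketch below builds its own A from the stated principle, under assumptions that are not from the source: a 365-day year, targets placed at the centre of each month, and a cyclic year (December connects to January). All names and the sample adjustments are hypothetical.

```python
import numpy as np

MONTH_LENGTHS = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

def interpolation_weights():
    """Weights W (365 x 12) such that daily = W @ T is the cyclic linear
    interpolation between mid-month target values T."""
    starts = np.concatenate([[0], np.cumsum(MONTH_LENGTHS)[:-1]])
    mids = starts + MONTH_LENGTHS / 2.0             # mid-month positions in days
    n_days = int(MONTH_LENGTHS.sum())
    weights = np.zeros((n_days, 12))
    for day in range(n_days):
        t = day + 0.5                               # centre of the day
        k = int(np.searchsorted(mids, t, side="right")) - 1   # last mid-month at or before t
        left, right = k % 12, (k + 1) % 12          # k = -1 wraps to December
        t_left = mids[left] - (n_days if k < 0 else 0)
        t_right = mids[right] + (n_days if k == 11 else 0)
        frac = (t - t_left) / (t_right - t_left)
        weights[day, left] += 1.0 - frac
        weights[day, right] += frac
    return weights

def daily_adjustments(monthly):
    """Solve T = A^-1 M, then interpolate the targets to daily values."""
    weights = interpolation_weights()
    ends = np.cumsum(MONTH_LENGTHS)
    starts = ends - MONTH_LENGTHS
    # Row m of A is the average interpolation weight over the days of month m
    A = np.vstack([weights[s:e].mean(axis=0) for s, e in zip(starts, ends)])
    targets = np.linalg.solve(A, np.asarray(monthly, dtype=float))
    return weights @ targets

# Hypothetical monthly adjustments (°C) with a seasonal cycle
monthly = np.array([1.4, 1.3, 1.2, 1.0, 0.9, 0.8, 0.8, 0.9, 1.0, 1.1, 1.3, 1.4])
daily = daily_adjustments(monthly)
ends = np.cumsum(MONTH_LENGTHS)
monthly_means = np.array([daily[e - m:e].mean() for e, m in zip(ends, MONTH_LENGTHS)])
```

By construction, averaging `daily` over each month returns the monthly adjustments, which is the property the slide asks of the target values.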
Simulation of annual mean temperatures
• Homogeneous series (series with no steps)
  - random numbers ~ N(0,1) with AR(1) = 0.1
  - 1000 homogeneous series of 100 values (years)
• Series with one step
  - step of magnitude 0.25, 0.50, 0.75, …, 2.00 σ
  - position 5, 10, 15, 20, 35, 50
  - 48 000 series with a single step
• Series with a random number of steps
  - step of magnitude ∂ = 0.5 to 2.0 σ; ∂ ~ N(0,1)
  - position ∆t = exp(0.05), ∆t ≥ 10
  - 25 000 series with a random number of steps (0 to 7 steps)
• Reference series
  - reference series cross-correlated with candidate series with correlation factor ~ 0.8 and re-standardized
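A single simulated pair of series could be generated along these lines. This is only a sketch: it reads "AR(1) = 0.1" as a lag-1 autoregressive coefficient of 0.1, and the way the reference is correlated with the candidate (a weighted sum with independent noise) is an assumption, since the slide does not say how the cross-correlation was produced. Function names and the seed are arbitrary.

```python
import numpy as np

def ar1_series(n, phi, rng):
    """Unit-variance AR(1) noise: e[i] = phi * e[i-1] + innovation."""
    innov = rng.normal(scale=np.sqrt(1.0 - phi ** 2), size=n)
    e = np.empty(n)
    e[0] = rng.normal()
    for i in range(1, n):
        e[i] = phi * e[i - 1] + innov[i]
    return e

def correlated_reference(series, rho, rng):
    """Reference with expected correlation rho to `series`, re-standardized."""
    noise = rng.normal(size=len(series))
    ref = rho * series + np.sqrt(1.0 - rho ** 2) * noise
    return (ref - ref.mean()) / ref.std()

rng = np.random.default_rng(42)
base = ar1_series(100, 0.1, rng)                # homogeneous series of 100 "years"
reference = correlated_reference(base, 0.8, rng)
candidate = base.copy()
candidate[50:] += 1.0                           # single step of 1.0 sigma at position 50
```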
Identification of homogeneous series
[Figure panels: SNHT, TPR, MLR, WSR] Position and magnitude of steps falsely detected when the procedure is applied to 1000 homogeneous series
Identification of a single step
[Figure panels: SNHT, TPR, MLR, WSR] Percentage of steps identified when one step is introduced in the candidate series
Identification of a random number of steps
[Figure: percentage of steps detected versus number of steps introduced]
Bias in relative humidity due to a change in instruments
75 Canadian climate stations
Example: Kuujjuaq, Québec, 1955-2004, dewcel introduced in 1978
Many very cold values are missing before the introduction of the dewcel
[Figure: original and adjusted values by season. Winter and spring panels: steps of -8.0% and -7.1%; summer and fall panels: steps of -2.8% and -3.3%]
Bias in radiosonde temperature due to the introduction of a radiation correction
25 Canadian stations
Temperature anomalies: mean for Canada, 5 pressure levels, observations at 12 UTC
During 1985-1995:
- semi-automated system implemented
- switch to Vaisala instruments
- introduction of radiation correction
[Figure panels: annual, winter, summer]
Bias in daily minimum temperature due to a change in observing time
120 Canadian climate stations
• On July 1, 1961, the climatological day was redefined to end at 06 UTC
• Prior to that, it ended at 12 UTC for maximum temperature and 00 UTC for minimum temperature
• The redefinition of the climatological day created a cold bias in daily minimum temperature
[Figure: decreasing step identified in 1961; a filled triangle indicates a step significant at the 5% level]
Bias in daily minimum temperature due to a change in observing time
[Figure: hourly temperatures at Kapuskasing from July 18 to 23, 2007, with 00 UTC, 06 UTC and 12 UTC marked]
Temperature surrogate group 1
• Calculate monthly anomalies
  - departures from the 1961-1990 reference period
• Produce the long series
  - sequence of monthly values for 100 years (1200 values)
• Produce a reference series for each station
  - average of the remaining stations
• Search for all potential changepoints
  - the first changepoint is not necessarily a real one
• When all changepoints are identified, determine the magnitude of each step
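The first three steps of this list can be sketched as follows. The data are random placeholders, the years 1901-2000 are an assumption chosen only to give 100 years, and the function names are hypothetical.

```python
import numpy as np

def monthly_anomalies(values, years, base=(1961, 1990)):
    """values: (n_years, 12) monthly means. Subtract each calendar month's
    mean over the base period."""
    in_base = (years >= base[0]) & (years <= base[1])
    climatology = values[in_base].mean(axis=0)
    return values - climatology

def reference_from_others(anoms, station):
    """anoms: (n_stations, n_months) long series. The reference for one
    station is the average of the remaining stations."""
    others = np.delete(anoms, station, axis=0)
    return others.mean(axis=0)

rng = np.random.default_rng(1)
years = np.arange(1901, 2001)                       # 100 hypothetical years
values = rng.normal(10.0, 2.0, size=(100, 12))      # placeholder monthly means
anoms = monthly_anomalies(values, years)
long_series = anoms.reshape(-1)                     # 1200 monthly values in time order

# Three toy stations: the reference for station 0 averages stations 1 and 2
stations = np.vstack([long_series, long_series + 1.0, long_series - 1.0])
reference = reference_from_others(stations, 0)
```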
Temperature surrogate group 1
[Table: position and magnitude (°C) of each step identified by the regression approach]
Temperature surrogate group 1
[Figure: monthly anomalies at Groix; adjusted monthly anomalies at Groix]
Discontinuities in precipitation due to joining station observations
Does joining precipitation records create any artificial steps?
Methodology
Let $T_i$ and $N_i$ be the monthly total rain (or snow) at the tested site and neighbour respectively for year i.
Ratios:
  . if $T_i > 0$ and $N_i > 0$ then $q_i = T_i / N_i$
  . if $T_i = 0$ and $N_i = 0$ then $q_i = 1$
  . if $T_i$ or $N_i = 0$ (or missing) then $q_i$ = missing
Outliers ($q_{0.25}$ and $q_{0.75}$ are the 25th and 75th percentiles):
  . $q_i < q_{0.25} - 3(q_{0.75} - q_{0.25})$
  . $q_i > q_{0.75} + 3(q_{0.75} - q_{0.25})$
  . for outliers, $q_i$ = missing
Standardized ratios:
  . $z_i = (q_i - Q) / s_q$, where Q is the average of the $q_i$ and $s_q$ is their standard deviation
Apply a t-test on {$z_i$} to determine if the difference in the means before and after the joining date is different from zero at the 5% significance level:
  . 30 years before and after the joining date
  . minimum of 5 years on each side of the joining date
Adjustments:
  . $A_i = q_{ai} / q_{bi}$, where $q_{bi}$ and $q_{ai}$ are the ratio means before and after the joining date
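The ratio, outlier and adjustment rules can be sketched for one calendar month as follows. The data are invented, missing values are represented by NaN, and the pooled-variance form of the t statistic is an assumption, since the slide does not say which variant of the t-test was used. The critical value at the 5% level would come from tables or a statistics library.

```python
import numpy as np

def ratios(tested, neighbour):
    """q = T/N when both are positive, 1 when both are zero, missing (NaN) otherwise."""
    t = np.asarray(tested, dtype=float)
    n = np.asarray(neighbour, dtype=float)
    q = np.full(t.shape, np.nan)
    both_pos = (t > 0) & (n > 0)
    both_zero = (t == 0) & (n == 0)
    q[both_pos] = t[both_pos] / n[both_pos]
    q[both_zero] = 1.0
    return q

def drop_outliers(q):
    """Set ratios beyond 3 interquartile ranges from the quartiles to missing."""
    q = q.copy()
    q25, q75 = np.nanpercentile(q, [25, 75])
    spread = 3.0 * (q75 - q25)
    q[(q < q25 - spread) | (q > q75 + spread)] = np.nan
    return q

def t_statistic(q, join_index):
    """Two-sample t statistic (pooled variance, an assumption) on standardized ratios."""
    z = (q - np.nanmean(q)) / np.nanstd(q, ddof=1)
    before, after = z[:join_index], z[join_index:]
    a, b = before[~np.isnan(before)], after[~np.isnan(after)]
    pooled = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return float((b.mean() - a.mean()) / np.sqrt(pooled * (1.0 / len(a) + 1.0 / len(b))))

def adjustment(q, join_index):
    """Ratio mean after the joining date divided by the ratio mean before it."""
    return float(np.nanmean(q[join_index:]) / np.nanmean(q[:join_index]))

# Invented totals for one month over 8 years; the stations are joined at index 4
tested = [9.0, 0.0, 11.0, np.nan, 19.0, 21.0, 0.0, 20.0]
neighbour = [10.0, 0.0, 10.0, 5.0, 10.0, 10.0, 3.0, 10.0]
q = drop_outliers(ratios(tested, neighbour))
t_stat = t_statistic(q, join_index=4)
adj = adjustment(q, join_index=4)
```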
Example: Digby Airport and Bear River joined in 1957
[Table: monthly, annual and long series (LS) adjustments by each neighbour. A number in bold indicates that the adjustment corresponds to a step significant at the 5% level.]
Example: Digby Airport and Bear River joined in 1957
[Figure panels: rain, snow. Monthly, annual and long series (LS) adjustments obtained using the neighbours (purple) and overlapping data (green)]
Comparing adjustments from neighbours and overlap
[Figure panels: snow, rain. Box plots of the differences between the adjustments from neighbours and the adjustments from overlapping data obtained from the 60 stations. The circle, box and whiskers indicate the median, the 10th and 90th percentiles, and the minimum and maximum values.]
Thank you! Merci! Gracias!