Break Position Errors in Climate Records
Ralf Lindau & Victor Venema, University of Bonn, Germany
Internal and External Variance
Consider the differences of one station compared to a neighbor reference. Breaks are defined by abrupt changes in the station-reference time series.
• Internal variance: variance within the subperiods
• External variance: variance between the means of different subperiods
Break criterion: maximum external variance.
12th International Meeting on Statistical Climatology, Jeju, Korea – 24. June 2013
Decomposition of Variance
• n: total number of years
• N: number of subperiods
• ni: number of years within subperiod i
The sum of external and internal variance is constant.
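The decomposition can be written out as below. This is a sketch under the usual definitions, not a copy of the slide's equation: x_{ij} (the j-th value in subperiod i), \bar{x}_i (the mean of subperiod i) and \bar{x} (the overall mean) are symbols assumed here, since the transcript does not give them.

```latex
\underbrace{\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2}_{\text{total variance}}
=
\underbrace{\frac{1}{n}\sum_{i=1}^{N} n_i\left(\bar{x}_i-\bar{x}\right)^2}_{\text{external variance}}
+
\underbrace{\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}_{\text{internal variance}}
```

The left-hand side does not depend on where the breaks are placed, so maximizing the external variance is equivalent to minimizing the internal variance.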
Position errors
Two segments of lengths n1 and n2 with means x1 and x2. A subsegment of length m with mean x0 is erroneously exchanged from segment 2 to segment 1. x1 is strongly reduced while x2 changes only slightly, so x1 and x2 converge. This reduces the external variance, and the wrong segmentation is rejected.
Change of external variance
The change of external variance Δv is a function only of the means and lengths of the two segments and of the exchanged subsegment.
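As a sketch of that dependence, the change can be computed directly from the segment lengths and means. The function names are illustrative, and the external variance is assumed here to be the length-weighted variance of the segment means around the overall mean; the slide's own closed-form expression is not reproduced.

```python
def external_variance(lengths, means):
    """Length-weighted variance of the segment means around the overall mean."""
    n = sum(lengths)
    overall = sum(k * x for k, x in zip(lengths, means)) / n
    return sum(k * (x - overall) ** 2 for k, x in zip(lengths, means)) / n


def delta_v(n1, n2, x1, x2, m, x0):
    """Change of external variance when a subsegment of length m and mean x0
    is moved from segment 2 to segment 1."""
    if not 0 < m < n2:
        raise ValueError("m must satisfy 0 < m < n2")
    # Means of the two segments after the exchange.
    x1_new = (n1 * x1 + m * x0) / (n1 + m)
    x2_new = (n2 * x2 - m * x0) / (n2 - m)
    before = external_variance([n1, n2], [x1, x2])
    after = external_variance([n1 + m, n2 - m], [x1_new, x2_new])
    return after - before
```

For example, with n1 = n2 = 50, x1 = 0, x2 = 2 and m = 5, an exchanged subsegment with x0 = 2 gives a negative change, whereas x0 = 0.5 gives a positive one, so only the second case would favor the shifted break.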
Express x0 by x2 plus scatter
The mean of the exchanged subsegment, x0, is equal to x2, the mean of the segment it stems from, plus a random scatter variable δ. δ depends on the internal variance σ² and the length m, because it is a mean over m random numbers.
Quadratic function for Δv
Replace x0 by δ and normalize by the square of the jump height d. The change of the normalized external variance v*, which is the decision criterion for break detection, is a quadratic function of a random variable ε, which depends on the signal-to-noise ratio and the length of the exchanged segment.
Zero points
If the parabola becomes positive, shifting the break position by m items increases the external variance, so this solution is preferred by mistake. The parabola has two zero points, one negative and one positive.
Simulated data
10,000 random time series of length 100, with internal σ = 1 and jump height = 2, i.e. SNR = 1. The data confirm the existence of different parabolae for different m. However, the data cover only scatter near zero and never reach the negative solution.
[Figure: (n Δv) / 4 versus scatter δ, for m = 1, m = 2 and m = 3]
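A simulation of this kind can be sketched as follows. It is a simplified stand-in, not the authors' code: it searches for a single break only, places the true break in the middle of the series, and uses hypothetical function names.

```python
import random


def best_break(series):
    """Return the split point k (1..n-1) that maximizes the external variance
    of the two segments series[:k] and series[k:]."""
    n = len(series)
    total = sum(series)
    best_k, best_score = 1, float("-inf")
    left = 0.0
    for k in range(1, n):
        left += series[k - 1]
        diff = left / k - (total - left) / (n - k)
        # For two segments the external variance is k*(n-k)/n**2 * diff**2;
        # the constant factor 1/n**2 does not affect the maximum.
        score = k * (n - k) * diff * diff
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def position_errors(trials=10000, n=100, true_break=50, jump=2.0, sigma=1.0, seed=0):
    """Simulate series with one jump and return detected minus true break position."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        series = [rng.gauss(0.0, sigma) + (jump if i >= true_break else 0.0)
                  for i in range(n)]
        errors.append(best_break(series) - true_break)
    return errors
```

Calling `position_errors()` with the defaults mirrors the setup on the slide (σ = 1, jump height 2); the share of zeros in the returned list is the simulated hit rate, and the remaining values give the distribution of position errors.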
The negative solution
Typical situation: an extremely low SNR and a drastically disturbed measurement near the break. Its exchange leads to x1’ < x2 and x2’ > x1. The two means diverge, so the external variance grows.
[Figure: segment means x1 and x2 and the shifted means x1’ and x2’]
The positive solution
A subsegment adjacent to the true break is randomly lifted by more than half of the jump height. Including it in the neighboring segment reduces the internal variance, so an erroneous break position is concluded. Criterion: maximum hatched area.
Brownian motion with drift
Mathematical formulation of the criterion (symbols on the slide: δ, σ): Drift = −SNR
Theoretical retrace
• Parabola equation
• Linear approximation around the zero point
• Inserting the known slope and the (positive) zero point
• Replacing f1 + f2 by 2m
• Multiplying by the signal-to-noise ratio
• Result: Brownian motion with drift
Distribution of the time of the maximum of a Brownian motion with drift
Strictly valid only for continuous processes (Buffet, 2003, J Appl Math Stoch Anal).
[Figure: panels for SNR = 0.5, SNR = 1 and SNR = 2. Legend: dashed line = Buffet, 2003; 0 = numerical simulation of a discrete Brownian motion with drift; + = complete break search simulation]
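The discrete simulation named in the legend can be approximated with a Gaussian random walk. The sketch below assumes unit-variance steps with mean −SNR and a finite horizon of 50 steps; both are assumptions made for illustration, as are the function names.

```python
import random
from collections import Counter


def time_of_maximum(snr, steps, rng):
    """Step index at which a random walk with drift -snr attains its maximum.
    Index 0 means the starting value is never exceeded."""
    position, best, best_t = 0.0, 0.0, 0
    for t in range(1, steps + 1):
        position += rng.gauss(-snr, 1.0)  # assumed: unit-variance steps
        if position > best:
            best, best_t = position, t
    return best_t


def argmax_distribution(snr, trials=20000, steps=50, seed=0):
    """Empirical distribution of the time of the maximum over many walks."""
    rng = random.Random(seed)
    counts = Counter(time_of_maximum(snr, steps, rng) for _ in range(trials))
    return {t: counts[t] / trials for t in sorted(counts)}
```

The entry at index 0 of the returned dictionary is the one-sided probability that the true position remains the maximum; the other entries give the probability of a one-sided shift by that many steps.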
Two more problems
• The hit rate is not accurately reproduced.
• Break errors are a two-sided symmetric process: breaks that are too early and breaks that are too late are both possible.
[Figure legend: Buffet, 2003]
Hit rate
The hit rate h can be estimated for all drifts d.
[Figure legend: true; + = estimated]
Two-sided processes
Deviations are caused by random scatter independently on both sides. The hit rate h is reduced to h². The probability of a one-sided deviation combines the cases with and without a competitor on the other side. For two-sided deviations, the probability is halved if a competitor occurs on the other side. All other probabilities are reduced as well.
Practical application
The hit rate drops from 95% for SNR = 2 to 29% for SNR = 0.5. For SNR > 1, break positions quickly become very exact; for SNR < 1, they quickly become very inexact.
[Figure: panels for SNR = 2, SNR = 1 and SNR = 0.5. Legend: true; + = estimated]
Conclusions
• Break position errors can be described by the distribution of the time of the maximum of a Brownian motion with drift.
• The drift parameter is equal to the signal-to-noise ratio, given by the ratio of half the jump height between homogeneous subperiods to the internal standard deviation within them.
Hit rate simulation
The hit rate is the probability that the initial value is never exceeded. For realistic drift sizes the value converges after a few steps.
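The convergence can be checked with a short simulation. This is a minimal sketch, assuming a Gaussian random walk with unit-variance steps and mean −SNR; the function name and the default parameters are illustrative.

```python
import random


def hit_rate_by_step(snr, max_steps=20, trials=20000, seed=0):
    """Fraction of walks that have not exceeded their initial value after
    0, 1, ..., max_steps steps (one-sided hit rate as a function of steps)."""
    rng = random.Random(seed)
    surviving = [0] * (max_steps + 1)
    for _ in range(trials):
        position = 0.0
        steps_survived = max_steps
        for t in range(1, max_steps + 1):
            position += rng.gauss(-snr, 1.0)  # assumed: unit-variance steps
            if position > 0.0:
                steps_survived = t - 1  # initial value exceeded at step t
                break
        for t in range(steps_survived + 1):
            surviving[t] += 1
    return [count / trials for count in surviving]
```

The returned list starts at 1.0 and decreases towards the one-sided hit rate h. Under the unit-variance assumption, squaring the converged value gives a two-sided estimate that can be compared with the 95% and 29% quoted for SNR = 2 and SNR = 0.5.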
Preliminary maximum
p_ik is defined as the probability that the kth member of a Brownian motion is the preliminary maximum after i steps. The probability that it is also the absolute maximum is lower by a factor of h. Instead of multiplying by h < 1, we can alternatively stop the summation earlier; k = 2 works well.
Hit rate estimate
Define the drift-dependent exceeding probability. The preliminary maxima after 1 and after 2 steps are known.