What is the correct number of break points hidden in a climate record?

Ralf Lindau, Victor Venema (Bonn University)


Presentation Transcript


  1. What is the correct number of break points hidden in a climate record? Ralf Lindau, Victor Venema, Bonn University. 7th Homogenization Seminar, Budapest, 24–27 October 2011

  2. Defining breaks. Consider the differences of one station compared to a reference (kriged ensemble of surrounding stations). Breaks are defined by abrupt changes in the station-reference time series. Internal variance: within the subperiods. External variance: between the means of different subperiods. Criterion: maximum external variance attained by a minimum number of breaks.

  3. Decomposition of Variance. n: total number of years; N: number of subperiods; n_i: number of years within subperiod i. The sum of external and internal variance is constant.
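The formula on this slide is not preserved in the transcript. The following is a minimal sketch of the standard decomposition, assuming the external variance is the n_i-weighted squared deviation of the segment means from the overall mean and the internal variance is the squared deviation within each segment, both divided by n. The function name `variance_parts` and the example break positions are illustrative only.

```python
import random

def variance_parts(x, breaks):
    """Return (external, internal, total) variance of x for given break positions.

    A break at position p means that a new segment starts at index p.
    All three quantities are divided by n, the length of the series.
    """
    n = len(x)
    mean = sum(x) / n
    edges = [0] + sorted(breaks) + [n]
    external = 0.0
    internal = 0.0
    for a, b in zip(edges, edges[1:]):
        seg = x[a:b]
        seg_mean = sum(seg) / len(seg)
        external += len(seg) * (seg_mean - mean) ** 2      # n_i * (segment mean - mean)^2
        internal += sum((v - seg_mean) ** 2 for v in seg)  # scatter inside the segment
    total = sum((v - mean) ** 2 for v in x)
    return external / n, internal / n, total / n

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(30)]
external, internal, total = variance_parts(x, breaks=[7, 19])
print(external + internal, total)  # the two numbers agree up to rounding
```

Because the total does not depend on the break positions, maximizing the external variance is the same as minimizing the internal variance.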

  4. Two questions. The title of this talk asks: How many breaks? Where are they situated? Testing all permutations is not feasible. The best solution for a fixed number of breaks can be found by Dynamic Programming.

  5. Dynamic Programming (1). Find the optimum positions for a fixed number of breaks. Consider not only the complete time series, but all possible truncated variants.

  6. Dynamic Programming (2). Find the optimum positions for a fixed number of breaks. Consider not only the complete time series, but all possible truncated variants. Find the first break by simply testing all permutations.

  7. Dynamic Programming (3). Find the optimum positions for a fixed number of breaks. Consider not only the complete time series, but all possible truncated variants. Find the first break by simply testing all permutations. Fill up all truncated variants. The internal variance now consists of two parts: that of the truncated variant plus that of the rest.

  8. Dynamic Programming (4). Find the optimum positions for a fixed number of breaks. Consider not only the complete time series, but all possible truncated variants. Find the first break by simply testing all permutations. Fill up all truncated variants. The internal variance consists of two parts: that of the truncated variant plus that of the rest. Search for the minimum out of n.

  9. Dynamic Programming (5). The 2-break optimum for the full length is found. To begin the search for 3 breaks we need, as before, the previous solutions for all lengths, including the shorter ones. This requires n²/2 searches, which for larger numbers of breaks k is far fewer than testing all permutations (n over k).
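The steps above can be sketched as a small table-filling routine. This is an illustrative reading of the slides, not the authors' code: it minimizes the internal sum of squares (equivalent to maximizing the external variance), stores the best solution for every truncated series and every break number, and reuses those entries when one more break is added. The names `optimal_breaks` and `make_cost` are hypothetical.

```python
from itertools import accumulate

def make_cost(x):
    """Return cost(a, b): sum of squared deviations of x[a:b] from its own mean."""
    s1 = [0.0] + list(accumulate(x))
    s2 = [0.0] + list(accumulate(v * v for v in x))
    def cost(a, b):
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)
    return cost

def optimal_breaks(x, k):
    """Best positions for exactly k breaks; a break at p starts a new segment at index p.

    Returns (minimal internal sum of squares, sorted list of break positions).
    """
    n = len(x)
    cost = make_cost(x)
    inf = float("inf")
    # best[j][t]: minimal internal sum of squares of the truncated series x[:t] with j breaks
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    last = [[0] * (n + 1) for _ in range(k + 1)]
    for t in range(1, n + 1):
        best[0][t] = cost(0, t)
    for j in range(1, k + 1):
        for t in range(j + 1, n + 1):      # every truncated variant
            for p in range(j, t):          # position of the last break
                c = best[j - 1][p] + cost(p, t)  # truncated part plus the rest
                if c < best[j][t]:
                    best[j][t] = c
                    last[j][t] = p
    # Walk back through the stored positions to recover the breaks.
    breaks, t = [], n
    for j in range(k, 0, -1):
        t = last[j][t]
        breaks.append(t)
    return best[k][n], breaks[::-1]

series = [0, 0, 0, 5, 5, 5, 1, 1, 1]
ss, positions = optimal_breaks(series, 2)
print(ss, positions)
```

Each added break costs one pass over all truncated lengths and all candidate positions, which is where the roughly n²/2 searches per break come from.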

  10. Position & Number. Solved: the optimum positions for a fixed number of breaks are known from Dynamic Programming. Left to do: find the optimum number of breaks. The external variance increases in any case with an increasing number of breaks. Use the behaviour of a random time series as a benchmark.

  11. Segment averages. Random data with stddev = 1. The segment averages x_i scatter randomly with mean 0 and stddev 1/√n_i, because any deviation from zero can be seen as inaccuracy due to the limited number of members.
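A quick numerical check of this statement, as a sketch: the mean of n_i independent N(0,1) values has standard deviation 1/√n_i. The segment length 16 and the number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)

def segment_mean_spread(n_i, repeats=5000):
    """Draw `repeats` segments of n_i standard-normal values; return mean and stddev of their averages."""
    means = []
    for _ in range(repeats):
        seg = [rng.gauss(0.0, 1.0) for _ in range(n_i)]
        means.append(sum(seg) / n_i)
    return sum(means) / repeats, statistics.pstdev(means)

center, spread = segment_mean_spread(16)
print(center, spread, 1 / 16 ** 0.5)  # spread should be close to 1/sqrt(16) = 0.25
```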

  12. External Variance. Weighted measure for the variability of the subperiods' means. The external variance is equal to the mean square sum of a random, normally distributed variable.

  13. χ²-distribution. n: length of time series (number of years); k: number of breaks; N = k+1: number of subsegments; [ ]: mean over several break position permutations. [var_ext] = (N-1)/n = k/n. On average, the external variance increases linearly with k. However, we consider the best member as found by DP. var_ext ~ χ²_N: the external variance is χ²-distributed. Definition: take N values out of N(0,1), square them and add them up. Repeating this yields a χ²_N distribution.
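The linear growth of the mean can be checked by simulation. The sketch below assumes white noise with true variance 1, break positions drawn at random, and an external variance that is divided by n but not normalized by the sample variance. Under these assumptions the average should come out near k/n. Function names and the trial count are illustrative.

```python
import random

def external_variance(x, breaks):
    """(1/n) * sum over segments of n_i * (segment mean - overall mean)^2."""
    n = len(x)
    mean = sum(x) / n
    edges = [0, *sorted(breaks), n]
    return sum(
        (b - a) * (sum(x[a:b]) / (b - a) - mean) ** 2
        for a, b in zip(edges, edges[1:])
    ) / n

def mean_external_variance(n, k, trials, rng):
    """Average external variance over random series and random break positions."""
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        breaks = rng.sample(range(1, n), k)  # k distinct positions between the n values
        total += external_variance(x, breaks)
    return total / trials

rng = random.Random(0)
n = 21
estimates = {k: mean_external_variance(n, k, 4000, rng) for k in (2, 4, 8)}
for k, value in estimates.items():
    print(k, value, k / n)  # simulated mean next to k/n
```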

  14. 21-years random data (1). 1000 random time series are created, only 21 years long, so that explicit tests of all permutations are possible. The mean increases linearly. However, the maximum is relevant (the best solution as found by DP). Can we describe this function? First guess:
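For a single 21-year series the explicit test is cheap enough to write down. The sketch below enumerates every placement of k = 4 breaks and compares the mean with the maximum of the external variance. Here the external variance is expressed as a fraction of the total variance of the series (an assumption made so that it cannot exceed 1); the seed and the choice k = 4 are arbitrary.

```python
import random
from itertools import combinations

def external_fraction(x, breaks):
    """External variance as a fraction of the total variance of x."""
    n = len(x)
    mean = sum(x) / n
    edges = [0, *breaks, n]
    ext = sum(
        (b - a) * (sum(x[a:b]) / (b - a) - mean) ** 2
        for a, b in zip(edges, edges[1:])
    )
    total = sum((v - mean) ** 2 for v in x)
    return ext / total

random.seed(1)
n, k = 21, 4
x = [random.gauss(0.0, 1.0) for _ in range(n)]

# Every way to place k breaks between the n values.
values = [external_fraction(x, c) for c in combinations(range(1, n), k)]
mean_v = sum(values) / len(values)
max_v = max(values)
print(len(values), mean_v, max_v)
```

The maximum over all placements is the value that an exact search such as Dynamic Programming would return for this series.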

  15. 21-years random data (2). Above, we expected the data for a fixed number of breaks to be χ²-distributed.

  16. From χ² to β distribution. The random data do not fit a χ²-distribution exactly. The reason is that χ² has no upper bound, but var_ext cannot exceed 1. A kind of confined χ² is the beta distribution. [Figure: n = 21 years, k = 7 breaks; curves: data, χ²]

  17. From χ² to β distribution. X ~ χ²(a) and Y ~ χ²(b) ⇒ X / (X+Y) ~ β(a/2, b/2). If we normalize a χ²-distributed variable by the sum of itself and another χ²-distributed variable, the result is β-distributed. The β-distribution fits the data well and is the theoretical distribution for the external variance of all break position permutations. [Figure: n = 21 years, k = 7 breaks; curves: data, β, χ²]
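The relation between the two distributions can be illustrated by sampling. The sketch below builds χ² variables as sums of squared standard normals, forms X / (X + Y), and compares the sample mean and variance with the moments of a beta distribution with parameters a/2 and b/2. The degrees of freedom a = 7 and b = 13 are an assumption, chosen to correspond to k = 7 and n - 1 - k = 13 for n = 21.

```python
import random

def chi2_sample(dof, rng):
    """One draw from a chi-square distribution with `dof` degrees of freedom."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))

def ratio_samples(a, b, size, rng):
    """Draws of X / (X + Y) with X ~ chi2(a) and Y ~ chi2(b), independent."""
    out = []
    for _ in range(size):
        x = chi2_sample(a, rng)
        y = chi2_sample(b, rng)
        out.append(x / (x + y))
    return out

rng = random.Random(7)
a, b = 7, 13
samples = ratio_samples(a, b, 20000, rng)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

# Moments of Beta(alpha, beta) with alpha = a/2, beta = b/2.
alpha, beta = a / 2, b / 2
beta_mean = alpha / (alpha + beta)
beta_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(sample_mean, beta_mean)
print(sample_var, beta_var)
```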

  18. From χ² to β distribution. We are interested in the best solution, with the highest external variance, as provided by DP. We need the exceedance probability for high var_ext. [Figure residue: β and χ² curves; numeric labels 17, 7, 15, 11]

  19. Incomplete Beta Function. The external variance v is β-distributed and depends on n (years) and k (breaks): The exceedance probability P gives the best (maximum) solution for v. Incomplete Beta Function. Solvable for even k and odd n:
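The formulas of this slide are not preserved in the transcript. As a hedged reconstruction, assume that the external variance v follows a beta distribution with parameters k/2 and (n-1-k)/2 (an inference from the surrounding slides, not a statement taken from them). For even k and odd n both parameters are integers, and the regularized incomplete beta function I_v reduces to a finite binomial sum. The index convention of the original slide may differ.

```latex
\begin{align}
p(v) &= \frac{v^{\,k/2-1}\,(1-v)^{(n-1-k)/2-1}}
             {B\!\left(\tfrac{k}{2},\,\tfrac{n-1-k}{2}\right)},
        \qquad 0 \le v \le 1, \\
P(v) &= \int_v^1 p(u)\,\mathrm{d}u
      = 1 - I_v\!\left(\tfrac{k}{2},\,\tfrac{n-1-k}{2}\right), \\
P(v) &= \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l},
        \qquad i = \tfrac{k}{2},\quad m = \tfrac{n-3}{2}.
\end{align}
```

With k = 4 and n = 21 this gives i = 2 and m = 9.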

  20. Example: 21 years, 4 breaks. k = 4 ⇒ i = 2; n = 21 ⇒ m = 9.

  21. Theory and Data. The theory (curve) fits the random data (hatched) well.

  22. Nominal Combination Number. For n = 21 and k = 4 there are C(20, 4) = 4845 break combinations. If they were all independent, we could read the maximum external variance at (4845)⁻¹ ≈ 0.0002, which gives 0.7350. However, we suspect that the break combinations are not independent. And we know the correct value of var_ext.

  23. Effective and Nominal. Remember: var_ext = 0.5876 for k = 4. The reverse reading leads to a 23 times higher exceedance probability. This shows that the break permutations are strongly dependent and that the effective number of combinations is smaller than the nominal one. However, the theoretical function is correct.
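The reverse reading can be retraced numerically. The sketch below assumes that the external variance follows a beta distribution with parameters k/2 and (n-1-k)/2, so that for even k and odd n the exceedance probability is a finite binomial sum with i = k/2 and m = (n-3)/2. The function name `exceedance` is hypothetical. If the assumption matches the slides, the ratio printed at the end should be close to the factor of 23 quoted above.

```python
from math import comb

def exceedance(v, n, k):
    """P(var_ext > v), assuming var_ext ~ Beta(k/2, (n-1-k)/2) with k even and n odd."""
    if k % 2 or not n % 2:
        raise ValueError("closed form needs even k and odd n")
    i = k // 2
    m = (n - 3) // 2
    return sum(comb(m, l) * v ** l * (1 - v) ** (m - l) for l in range(i))

n, k = 21, 4
nominal = comb(n - 1, k)            # number of ways to place k breaks in n years
p_best = exceedance(0.5876, n, k)   # exceedance probability at the observed maximum
effective = 1 / p_best              # implied number of independent combinations
ratio = p_best * nominal            # how much larger P is than 1/nominal

print(nominal, p_best, effective, ratio)
```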

  24. From 21 years to 101 years. As we now know the theoretical function, we quit the explicit check by random data and skip from unrealistically short time series (n = 21) to more realistic ones (n = 101). Again the numerical values of the external variance are known, and we can infer the effective combination numbers. Can we give a formula for […] in order to derive v(k)? [Figure: 2 to 20 breaks]

  25. dv/dk sketch. [Figure: curves for k breaks and k+1 breaks] Increasing the break number from k to k+1 has two consequences: • The probability function changes. • The number of combinations increases. Both increase the external variance.

  26. Using the Slope. [Figure: curves for k breaks and k+1 breaks] P(v) is a complicated function and hard to invert into v(P). Thus, dv is derived from dP / slope. We just derived P(v) by integrating p(v), so the slope p(v) is known.

  27. The Slope. Insert the known functions: The last summand dominates: Reduce and replace m and i:

  28. Distance between the Curves. The last summand dominates: Reduce and replace m and i:

  29. Nominal Growth Rate. (n-1-k) / k, shown as -2 ln((n-1-k) / k). ln: logarithmic sketch. Minus: the number of combinations is reciprocal to the exceedance probability. 2: the exceedance probability is only known for even break numbers. However, break combinations are not independent and we know the effective number of combinations. [Figure: effective combination growth]

  30. Ratio: nominal / effective. The ratio nominal / effective is approximately constant, with c = 0.3.

  31. Approximate Solution. Normalisation for small k*, for n = 100.

  32. Exact Solution

  33. Constancy of Solution. The solution for the exponent a is constant for different lengths of time series (21 and 101 years). [Figure panels: 101 years, 21 years]

  34. Conclusion. We have found a general mathematical formulation of how the external variance of a random time series increases when more and more breaks, as given by Dynamic Programming, are inserted. This can be used as a benchmark to define the optimum number of breaks.

  35. Integrated result. What does the function we found look like after integration? Crosses: test data. Line: theory. Error bars: 90th and 95th percentiles.

  36. Appendix (1). Consider the individual summands of the sum as defined in […]. The factor of change f between a certain summand and its successor is: where l_i runs from zero to i. The ratio of consecutive binomial coefficients can be replaced and it follows: m and i can be replaced by n and k: Inserting k instead of l_k gives a lower limit for f, because (n-1-l_k)/l_k, the rate of change of the binomial coefficients, decreases monotonically with k: normalised by 1/(n-1):

  37. Appendix (2). The approximate solution is known, with 1-v = (1-k*)^4. We can conclude that each element of the sum given above is larger than the prior element by a factor f. For small k* the factor f is greater than about 4, and it grows to infinity for large k*. Consequently, we can approximate the sum by its last summand according to:
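The approximate solution can be evaluated directly. The sketch below reads the relation as 1 - v = (1 - k*)^4 and assumes that k* is the break number normalised as k / (n - 1); both the normalisation and the function name are assumptions. For n = 21 and k = 4 the result can be compared with the value var_ext = 0.5876 quoted earlier for the random test data.

```python
def approx_external_variance(k, n, exponent=4):
    """Approximate maximum external variance of a random series with k breaks.

    Assumes 1 - v = (1 - k*)**exponent with k* = k / (n - 1).
    """
    k_star = k / (n - 1)
    return 1 - (1 - k_star) ** exponent

v_approx = approx_external_variance(4, 21)
print(v_approx, 0.5876)  # approximation next to the value from the random data

# The increment per additional break shrinks as k grows, which is what a
# stopping rule based on the growth rate relies on.
increments = [
    approx_external_variance(k + 1, 21) - approx_external_variance(k, 21)
    for k in range(0, 20)
]
```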

  38. Application (1). Insert into each of 1000 random time series 5 breaks of variance 1. The change of external variance for low break numbers (1, 2, 3 up to about 10) increases, lying above the theoretical function for random time series without any break (arrow). Variances of break numbers higher than 5 increase because the inserted 5 breaks are not always the biggest.

  39. Application (2). Stop the break search when the growth rate of the external variance first drops below the theoretical one for zero breaks. [Figure: 1 example of 1000 test time series. Crosses: observations. Thin line: inserted breaks. Fat line: detected breaks.] On average over 1000 samples: added variance: 86% (theoretically 5/6); remaining after correction: 27%; average detected break number: 5.48.
