
Internal and External Variance

Detection of inhomogeneities in Daily climate records to Study Trends in Extreme Weather. Detection of Breaks in Random Data, in Data Containing True Breaks, and in Real Data. Ralf Lindau.


Presentation Transcript


  1. Detection of inhomogeneities in Daily climate records to Study Trends in Extreme Weather. Detection of Breaks in Random Data, in Data Containing True Breaks, and in Real Data. Ralf Lindau

  2. Internal and External Variance. Consider the differences between one station and a neighbour or a reference. Breaks are defined by abrupt changes in the station-minus-reference time series. Internal variance: the variance within the subperiods. External variance: the variance between the means of different subperiods. Criterion: maximum external variance attained by a minimum number of breaks. Daily Stew Meeting, Bonn – 14. June 2012
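
The two quantities can be illustrated with a minimal sketch. The function name `split_variance`, the convention that `breaks` lists the indices where a new subperiod starts, and the sample difference series are all assumptions made for illustration; they do not come from the slides.

```python
def split_variance(series, breaks):
    """Return (internal, external) variance of `series` for a given segmentation.

    `breaks` holds the sorted indices at which a new subperiod starts
    (a hypothetical convention chosen for this sketch).
    """
    n = len(series)
    total_mean = sum(series) / n
    edges = [0] + list(breaks) + [n]
    internal = 0.0
    external = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = series[start:stop]
        seg_mean = sum(segment) / len(segment)
        # Scatter of the values around their own subperiod mean.
        internal += sum((x - seg_mean) ** 2 for x in segment)
        # Scatter of the subperiod mean around the overall mean, weighted by length.
        external += len(segment) * (seg_mean - total_mean) ** 2
    return internal / n, external / n


# Hypothetical station-minus-reference differences with one jump after 3 values.
diff = [0.1, -0.1, 0.0, 1.1, 0.9, 1.0]
internal, external = split_variance(diff, breaks=[3])
total = sum((x - sum(diff) / len(diff)) ** 2 for x in diff) / len(diff)
print(internal, external, total)
```

With the break placed at the jump, almost all of the total variance ends up in the external part, which is what the criterion rewards.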

  3. Decomposition of Variance. n: total number of years; N: number of subperiods; n_i: number of years within subperiod i. The sum of external and internal variance is constant.
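
The slide's equation is not preserved in the transcript. A standard way to write the decomposition, using the slide's n, N and n_i, is sketched below; the symbols x_{ij} (value j in subperiod i), \bar{x}_i (subperiod mean) and \bar{x} (overall mean) are notation assumed here.

```latex
\underbrace{\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^{2}}_{\text{total variance}}
=
\underbrace{\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^{2}}_{\text{internal variance}}
+
\underbrace{\frac{1}{n}\sum_{i=1}^{N} n_i\left(\bar{x}_i-\bar{x}\right)^{2}}_{\text{external variance}}
```

The left-hand side does not depend on where the breaks are placed, so any gain in external variance is an equal loss of internal variance.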

  4. Three Questions. (1) How do random data behave? This is needed as a stop criterion for the number of significant breaks. (2) How do real breaks behave theoretically? (3) How do real data behave?

  5. Segment Averages. Consider random data with stddev = 1. The segment averages x_i scatter randomly with mean 0 and stddev 1/√(n_i), because any deviation from zero can be seen as inaccuracy due to the limited number of members.
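
A quick simulation can illustrate the scatter of segment averages. This is a sketch under assumed settings: the segment length `n_i = 4`, the number of trials and the random seed are arbitrary choices, not values from the slides.

```python
import random
from math import sqrt

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_i = 4                  # hypothetical number of years in one segment
trials = 20000           # number of simulated segments

# Average n_i standard-normal values, many times over.
means = [sum(rng.gauss(0.0, 1.0) for _ in range(n_i)) / n_i for _ in range(trials)]

mean_of_means = sum(means) / trials
std_of_means = sqrt(sum((m - mean_of_means) ** 2 for m in means) / trials)
expected_std = 1.0 / sqrt(n_i)

print(mean_of_means, std_of_means, expected_std)
```

The empirical standard deviation of the segment means should lie close to 1/√(n_i), here 0.5.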

  6. χ²-distribution. The external variance is a weighted measure of the variability of the subperiods' means. It is equal to the mean square sum of a random, standard-normally distributed variable.
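
The step can be written out as follows. This is a sketch that assumes segment means \bar{x}_i with mean 0 and standard deviation 1/\sqrt{n_i}; the symbol v_ext for the external variance is notation introduced here, and the slide's own equation is not preserved.

```latex
v_{\mathrm{ext}}
= \frac{1}{n}\sum_{i=1}^{N} n_i\,\bar{x}_i^{2}
= \frac{1}{n}\sum_{i=1}^{N}\left(\frac{\bar{x}_i}{1/\sqrt{n_i}}\right)^{2},
\qquad
\frac{\bar{x}_i}{1/\sqrt{n_i}} \sim \mathcal{N}(0,1)
```

Each term is the square of a standard normal variable, so n · v_ext is a sum of squared standard normal variables, which is the situation the χ² distribution describes.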

  7. From χ² to β distribution. For independent X ~ χ²(a) and Y ~ χ²(b): X / (X+Y) ~ β(a/2, b/2). If we normalize a χ²-distributed variable by the sum of itself and another χ²-distributed variable, the result is β-distributed. [Figure: data for n = 21 years and k = 7 breaks, with β and χ² curves.]
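
The relation can be checked numerically with a small sketch. It assumes a = k and b = n - 1 - k degrees of freedom for the slide's example (n = 21, k = 7); that assignment is not stated explicitly in the transcript. The χ² variables are drawn as gamma variables with shape a/2 and scale 2.

```python
import random

rng = random.Random(1)   # fixed seed for reproducibility
n, k = 21, 7
a, b = k, n - 1 - k      # assumed degrees of freedom (not stated in the slides)

samples = []
for _ in range(20000):
    x = rng.gammavariate(a / 2, 2.0)   # chi-square with a degrees of freedom
    y = rng.gammavariate(b / 2, 2.0)   # chi-square with b degrees of freedom
    samples.append(x / (x + y))

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

# Mean and variance of a beta(a/2, b/2) distribution.
alpha, beta = a / 2, b / 2
beta_mean = alpha / (alpha + beta)
beta_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

print(sample_mean, beta_mean)
print(sample_var, beta_var)
```

Under these assumptions the expected normalized external variance of random data is k / (n - 1), here 0.35.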

  8. Incomplete Beta Function. We are interested in the best solution, the one with the highest external variance, so we need the exceeding probability for high var_ext. The external variance v is β-distributed and depends on n (years) and k (breaks). The exceeding probability P gives the best (maximum) solution for v. It is given by the incomplete beta function, which is solvable for even k and odd n.
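
Why even k and odd n help can be sketched in code. Assuming v ~ β(k/2, (n-1-k)/2), which the transcript does not spell out, both shape parameters become integers for even k and odd n, and the exceeding probability reduces to a finite binomial sum. The function names are hypothetical, and the numeric integration is included only as a cross-check.

```python
from math import comb, exp, lgamma, log


def exceedance_closed_form(v, n, k):
    """P(external variance > v), assuming v ~ beta(k/2, (n-1-k)/2), even k, odd n."""
    if k % 2 != 0 or n % 2 != 1:
        raise ValueError("closed form needs even k and odd n")
    a = k // 2
    m = (n - 3) // 2  # equals a + b - 1 with b = (n - 1 - k) / 2
    # Survival function of a beta distribution with integer shape parameters.
    return sum(comb(m, j) * v**j * (1 - v) ** (m - j) for j in range(a))


def exceedance_numeric(v, n, k, steps=20000):
    """Same probability by midpoint integration of the beta density from v to 1."""
    a, b = k / 2, (n - 1 - k) / 2
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    width = (1 - v) / steps
    total = 0.0
    for s in range(steps):
        x = v + (s + 0.5) * width
        total += exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))
    return total * width


p_closed = exceedance_closed_form(0.5, n=21, k=6)
p_numeric = exceedance_numeric(0.5, n=21, k=6)
print(p_closed, p_numeric)
```

For k = 2 the sum has a single term, so the exceeding probability is simply (1 - v) raised to the power (n - 3) / 2.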

  9. P(v) for different k. Can we give a formula from which v(k) can be derived? [Figure: P(v) for 2 to 20 breaks.] Increasing the break number from k to k+1 has two consequences: the probability function changes, and the number of combinations increases.

  10. dv/dk sketch. [Sketch: P(v) curves for k breaks and k+1 breaks.] P(v) is a complicated function and hard to invert into v(P). Thus, dv is obtained as dP divided by the slope of P(v). And the solution is: (equation not preserved in the transcript)

  11. Solution

  12. Constancy of the Solution. The solution for the exponent a is constant for different lengths of time series (21 and 101 years). [Panels: 101 years; 21 years.]

  13. The Existing Algorithm Prodige. Starting from the original formulation of Caussinus and Mestre for the penalty term in Prodige, we translate it into the terms used here, normalise by k* = k / (n - 1), and differentiate to find the minimum. Prodige thus postulates that the relative gain of external variance is a constant for a given n.
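
The slide's equations are not preserved in the transcript. A sketch of the steps it names follows, assuming the commonly cited form of the Caussinus and Mestre criterion; v denotes the external variance as a fraction of the total variance, a notation assumed here.

```latex
\begin{aligned}
C_k &= \ln\left(1 - v\right) + \frac{2k}{n-1}\,\ln n
     && \text{(criterion to be minimised)} \\
    &= \ln\left(1 - v\right) + 2k^{*}\ln n
     && \text{with } k^{*} = \frac{k}{n-1} \\
\frac{dC_k}{dk^{*}} = 0
    &\;\Longrightarrow\; \frac{1}{1-v}\,\frac{dv}{dk^{*}} = 2\ln n
     && \text{(condition at the minimum)}
\end{aligned}
```

Read this way, the search stops once the relative gain of external variance per normalised break falls below 2 ln(n), a value that depends only on n.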

  14. Our Results vs Prodige. We know the function for the relative gain of external variance. Its uncertainty, given by isolines of the exceeding probabilities 2^-i (1/4, 1/8, 1/16, 1/32, 1/64, 1/128), is characterised by constant distances. Prodige proposes a constant of 2 ln(n) ≈ 9.

  15. Wrong Direction. [Panels: n = 101 years; n = 21 years.]

  16. True Breaks

  17. Only True for Constant Lengths. True breaks at fixed distances behave identically to random data. For realistic random lengths the exponent is slightly increased. [Panels: sub-periods with constant lengths; sub-periods with random lengths; each shows data and theory.]

  18. Distribution of Lengths. The distribution of the sub-periods’ lengths as obtained by randomly inserted breaks is known. If necessary, it could be taken into account.

  19. Break vs Scatter Regime. The two governing parameters are: 1) the amount of break variance relative to the scatter variance; 2) the quotient of time series length and number of true breaks. The latter defines how much faster the internal variance decreases in the “true break regime” than in the “scatter regime”. If the relative scatter is low (10%), the transition between the regimes is clearly visible at 15 of 19 breaks.

  20. Real Data. 1050 climate stations exist in Germany. For each station, the nearest eastward neighbour (eastward to avoid identical pairs) at a distance between 10 km and 30 km is searched. 443 station pairs remain. [Maps: all stations; neighbouring pairs.]
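
The pairing rule can be sketched as below. The station names and coordinates are made up for illustration, the great-circle (haversine) distance is an assumed choice of distance measure, and "eastward" is taken to mean a larger longitude; none of these details are given in the slides.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def eastward_pairs(stations, min_km=10.0, max_km=30.0):
    """Pair each station with its nearest eastward neighbour within [min_km, max_km]."""
    pairs = []
    for name, pos in stations.items():
        candidates = []
        for other_name, other_pos in stations.items():
            if other_pos[1] <= pos[1]:
                continue  # only look east, so no pair is formed twice
            d = distance_km(pos, other_pos)
            if min_km <= d <= max_km:
                candidates.append((d, other_name))
        if candidates:
            pairs.append((name, min(candidates)[1]))
    return pairs


# Hypothetical stations as (latitude, longitude) in degrees.
stations = {
    "A": (49.0, 10.00),
    "B": (49.0, 10.20),
    "C": (49.0, 10.05),
    "D": (49.0, 11.00),
}
pairs = eastward_pairs(stations)
print(pairs)
```

Searching in one direction only means that a pair (A, B) can never reappear as (B, A), which is the purpose of the eastward restriction mentioned on the slide.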

  21. Data Focus. This project deals with daily climate data, with a focus on findings about their extremes. At the least, statements should be possible about the • distribution (moments) • percentiles • indices (number of wet days per month).

  22. Parameters.
      Per se interesting parameters P:
      • Monthly means: temperature, precipitation, etc.
      • Distribution and extremes (project focus; more sensitive?): standard deviation, skewness, kurtosis, maximum, minimum, 90th percentile
      Interesting for break detection: problem parameters PP.
      Expected physical problems:
      • Temperature at high sunshine duration
      • Temperature at high pressure
      • Temperature at high diurnal cycle
      • Temperature during snow cover
      • Temperature depending on general weather situation
      • Temperature during rain
      • Rain at high wind speed
      Expected technical problems:
      • Frequency of rainy days below 1 mm
      • Tenth of precipitation report
      • Difference between Tmean and (Tmax-Tmin)
      Problem parameters are more sensitive to breaks. Breaks in PP may help to find breaks in P.

  23. Two Parameter Pairs. 1a. Monthly mean temperature; 1b. Monthly maximum temperature (project focus). 2a. Monthly precipitation sum; 2b. Frequency of rainy days below 1 mm (problem parameter). Can the sensitive parameter help to find breaks in the mean? “Drizzle days” are often excluded from rainy days when calculating the interesting indices: • Monthly Rain Frequency • Consecutive Dry Days. “Drizzle frequency” is therefore not only a technical problem parameter but also interesting per se.
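
The drizzle parameter can be computed from daily values as in this sketch. The function name and the sample values are hypothetical; counting only days with more than 0 mm and strictly less than 1 mm is an assumed reading of "rainy days below 1 mm".

```python
def drizzle_frequency(daily_precip_mm, threshold_mm=1.0):
    """Count days with measurable precipitation below the threshold."""
    return sum(1 for p in daily_precip_mm if 0.0 < p < threshold_mm)


# Hypothetical daily precipitation sums (mm) for part of one month.
month = [0.0, 0.3, 2.4, 0.0, 0.8, 5.1, 0.0, 0.1, 1.0, 0.0]

drizzle_days = drizzle_frequency(month)
rainy_days = sum(1 for p in month if p >= 1.0)  # rainy days with drizzle days excluded
print(drizzle_days, rainy_days)
```

Because indices such as monthly rain frequency use the same 1 mm threshold, a break in drizzle frequency changes which days count as rainy at all.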

  24. Monthly Mean Temperature. The temperature difference between Ellwangen-Rindelbach and Crailsheim-Alexandersreut shows one strong break and three further significant breaks. The statistical signature confirms this: the first break contains much variance, while breaks 2, 3 and 4 are only slightly larger than the Mestre penalty.

  25. Break Statistics. [Panels: all pairs; individual pair.] r = 0.937

  26. Monthly Maximum. For the monthly temperature maximum, only the largest breaks are detectable, probably due to the reduced correlation (r = 0.865).

  27. Additional Breaks? In maximum temperature there are fewer breaks. Are they nevertheless new compared to those in mean temperature? Enhance the penalty from about 12 (i.e. 2 ln(n)) to 60. With n = 600, this means that 10% of the remaining internal variance has to be explained by each additional break; otherwise the search is stopped. Under these increased requirements, 297 breaks are found in the mean and 67 in the maximum. Nearly all breaks in Tmax also exist in Tmean. The “stddev” of the temporal distance is 1.75 years.
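
The arithmetic behind the 10% figure can be sketched as follows. It assumes that a penalty c translates into a required fraction c / (n - 1) of the remaining internal variance per additional break, which is one reading of the slide (60 / 599 ≈ 0.1); the function names are hypothetical.

```python
from math import log


def prodige_penalty(n):
    """Default penalty constant 2 ln(n) for a series of length n."""
    return 2.0 * log(n)


def required_fraction(penalty, n):
    """Assumed share of the remaining internal variance each new break must explain."""
    return penalty / (n - 1)


n = 600
default_penalty = prodige_penalty(n)
default_fraction = required_fraction(default_penalty, n)
enhanced_fraction = required_fraction(60.0, n)
print(default_penalty, default_fraction, enhanced_fraction)
```

Under this reading, the default penalty asks each break to explain roughly 2% of the remaining internal variance, and the enhanced penalty of 60 raises that to about 10%.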

  28. Answer: No. Nearly no new break is found by the sensitive parameter, monthly maximum temperature. The lower correlation (0.865 vs. 0.937 → doubled rms) obviously hampers the break-finding capability of the sensitive parameter. However, given the high correlation of break positions, the opposite direction may become possible: finding break positions in the maximum temperature by considering the mean temperature.

  29. “Drizzle Days”. Monthly frequency of rainy days below 1 mm. This parameter is highly inhomogeneous. Even for individual stations the break is evident.

  30. Drizzle vs. Mean Precip. In the drizzle parameter, more significant breaks are found (index 43.3 compared to 28.8), although the correlation is low (0.339 compared to 0.855). Are the break positions again correlated?

  31. Correlation of Break Positions. Many new breaks are found. Only 12 breaks of the drizzle parameter are found at all somewhere in the corresponding time series of mean precipitation, and mostly far away. In 93 time series pairs, one or more breaks are found for drizzle but not a single one in mean precipitation. Are these new breaks also included, but hidden, in mean precipitation? → remember
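
Comparing break positions between two parameters can be sketched as below. The break positions (in months), the 12-month tolerance and the function names are hypothetical; the slides do not say how coincidence was defined.

```python
def nearest_break_distances(breaks_a, breaks_b):
    """For each break in A, the distance to the closest break in B."""
    if not breaks_b:
        return []
    return [min(abs(a - b) for b in breaks_b) for a in breaks_a]


def count_coinciding(breaks_a, breaks_b, tolerance):
    """Number of breaks in A that have a break in B within `tolerance`."""
    return sum(1 for d in nearest_break_distances(breaks_a, breaks_b) if d <= tolerance)


# Hypothetical break positions (month index) for one station pair.
drizzle_breaks = [120, 310, 455]
precip_breaks = [118, 500]

distances = nearest_break_distances(drizzle_breaks, precip_breaks)
matches = count_coinciding(drizzle_breaks, precip_breaks, tolerance=12)
print(distances, matches)
```

A series with drizzle breaks but no precipitation breaks yields an empty distance list, which corresponds to the 93 pairs mentioned on the slide.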

  32. Forced Breaks (1)

  33. Forced Breaks (2). On average, too, the internal variance decreases only by about 1% if “drizzle breaks” are inserted into the time series of mean precipitation. 1% is the mean decrease for a random time series with n = 100, and it is β-distributed. However, here n is equal to 600. Is the result then a bit better than random?

  34. Simulated Data. 1. Purely random: blind try of 3 breaks in a 21-year random time series. 2. Pure true breaks: blind try of 3 breaks in a 21-year constant time series with 6 true breaks. 3. Realistic mix: blind try of 3 breaks in a 21-year time series with 6 true breaks plus random scatter.
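
The three test cases can be generated along these lines. This is a sketch: the break positions, the level heights, the scatter of 0.5 in the mix, the seed and the helper names are all assumptions, since the slides do not give the simulation settings.

```python
import random


def step_series(n, n_breaks, rng):
    """Piecewise-constant series of length n with n_breaks randomly placed jumps."""
    positions = sorted(rng.sample(range(1, n), n_breaks))
    edges = [0] + positions + [n]
    series = []
    for start, stop in zip(edges[:-1], edges[1:]):
        level = rng.gauss(0.0, 1.0)  # hypothetical level of this subperiod
        series.extend([level] * (stop - start))
    return series


def external_fraction(series, breaks):
    """External variance as a fraction of the total variance for given breaks."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    edges = [0] + sorted(breaks) + [n]
    external = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        seg_mean = sum(series[start:stop]) / (stop - start)
        external += (stop - start) * (seg_mean - mean) ** 2
    return external / total


rng = random.Random(2012)
n = 21
purely_random = [rng.gauss(0.0, 1.0) for _ in range(n)]
pure_breaks = step_series(n, 6, rng)
realistic_mix = [s + rng.gauss(0.0, 0.5) for s in step_series(n, 6, rng)]

# "Blind try": 3 break positions chosen without looking at the data.
blind = sorted(rng.sample(range(1, n), 3))
fractions = [external_fraction(s, blind) for s in (purely_random, pure_breaks, realistic_mix)]
print(blind, fractions)
```

Repeating the blind try many times for each case gives the distributions of external variance that the slide compares.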

  35. Realistic Mixed Data. Real data are expected to resemble a realistic mix rather than random scatter. As real data then also include real breaks, the null hypothesis is not random scatter but a realistic mix. Here the blindly found external variance is again β-distributed, but generally larger. How much larger is difficult to quantify in advance; it depends on the signal-to-noise ratio.

  36. Conclusions • The analysis of random data shows that the external variance is β-distributed, which leads to a new formulation for the penalty term. • The external variance of true breaks is also β-distributed. It increases faster than that of random scatter by a factor of n/n_k. • Are sensitive parameters helpful for finding additional breaks? Monthly maximum temperature: due to the reduced spatial correlation, Tmax “finds” fewer breaks, and those identified are even better visible in Tmean. Drizzle parameter: highly inhomogeneous → many breaks found, but they do not coincide with breaks in mean precipitation. • Vice versa, we expect that Tmean breaks are helpful for finding breaks in Tmax, but the proof of significance will be difficult.
