Extremal cluster characteristics of a regime switching model, with hydrological applications
Péter Elek, Krisztina Vasas and András Zempléni
Eötvös Loránd University, Budapest
elekpeti@cs.elte.hu
4th Conference on Extreme Value Analysis, Gothenburg, 2005
Contents
• Outline of extreme value theory (EVT) for stationary series
  • extremal index
  • limiting cluster size distribution (e.g. distribution of flood length)
  • distribution of aggregate excesses (e.g. distribution of flood volume)
• Two models:
  • a light-tailed conditionally heteroscedastic model
  • a regime switching autoregressive model
• Extremal behaviour of the regime switching model
• Application to the study of flood dynamics
Extremal index
• Conditions $D(u_n)$ or $\Delta(u_n)$ are always assumed.
• A stationary series has extremal index $\theta$ if there exists a real sequence $u_n$ for which
  • $n(1 - F(u_n)) \to \tau$
  • $P(M_{1,n} \le u_n) \to \exp(-\theta\tau)$
  • where $M_{1,n} = \max(X_1, X_2, \dots, X_n)$
• Under $D(u_n)$ the extremal index can be obtained as:
  • $\theta = \lim_{n \to \infty} P(M_{1,p(n)} \le u_n \mid X_0 > u_n)$
  • where $p(n)$ is an appropriately increasing sequence
  • $p(n)$ is regarded as the cluster size
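The conditional-probability form of the extremal index suggests a simple empirical counterpart at a fixed threshold and window length. The sketch below is one such runs-type estimate; the function name `theta_hat` and passing the threshold `u` and window `p` directly are illustrative assumptions, not something taken from the talk.

```python
from typing import Sequence


def theta_hat(x: Sequence[float], u: float, p: int) -> float:
    """Empirical version of P(max(X_1..X_p) <= u | X_0 > u).

    Counts, among all exceedances of u that are followed by at least p
    further observations, the share followed by p values at or below u.
    """
    n_exceed = 0  # exceedances with a full window of p successors
    n_quiet = 0   # of those, how many are followed by p quiet values
    for i in range(len(x) - p):
        if x[i] > u:
            n_exceed += 1
            if max(x[i + 1:i + 1 + p]) <= u:
                n_quiet += 1
    if n_exceed == 0:
        raise ValueError("no exceedance of u has p following observations")
    return n_quiet / n_exceed


# Toy series: three exceedances of u = 4 grouped into two clusters.
series = [0, 5, 6, 0, 0, 0, 7, 0, 0, 0]
print(theta_hat(series, u=4, p=2))
```

In the toy series two of the three exceedances end a cluster, so the estimate is 2/3, the reciprocal of the mean cluster size 3/2.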
Cluster size distribution and point process convergence
• Distribution of the number of exceedances in $[1, p(n)]$: $\pi_n(j) = P\big(\mathbf{1}\{X_1 > u_n\} + \dots + \mathbf{1}\{X_{p(n)} > u_n\} = j \mid M_{1,p(n)} > u_n\big)$
• The point process of exceedances: $N_n(\cdot) = \sum_{i=1}^{n} \delta_{i/n}(\cdot)\,\mathbf{1}\{X_i > u_n\}$
• Under appropriate conditions:
  • $\pi_n$ converges to some limiting distribution $\pi$
  • $N_n(\cdot)$ converges weakly to a compound Poisson process whose underlying Poisson process has intensity $\theta\tau$ and whose i.i.d. clusters are distributed as $\pi$
• High-level exceedances occur in clusters, with cluster size distribution $\pi$. Moreover, $E(\pi) = 1/\theta$.
Distribution of aggregate excess
• Aggregate excess above $u$ in time interval $[k, l]$: $W_{k,l}(u) = (X_k - u)^+ + (X_{k+1} - u)^+ + \dots + (X_l - u)^+$
• This value (called flood volume in hydrology) is a good indicator of the severity of extreme events.
• Under appropriate conditions (Smith et al., 1997): $W_{1,n}(u_n) \to_d W_1 + W_2 + \dots + W_K$, where $K \sim \mathrm{Poisson}(\theta\tau)$ and the variables $W_i$ are i.i.d., independent of $K$.
• The distribution of $W_i$ can be regarded as the limiting aggregate excess distribution during an extremal event.
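The aggregate excess is a plain sum of positive parts, which a few lines of code make concrete. This is a minimal sketch; the function name `aggregate_excess` and the 0-based, inclusive indexing of `k` and `l` are assumptions made for the example.

```python
from typing import Sequence


def aggregate_excess(x: Sequence[float], u: float, k: int, l: int) -> float:
    """Sum of (x[t] - u)^+ for t = k, ..., l (0-based, both ends inclusive)."""
    return sum(max(x[t] - u, 0.0) for t in range(k, l + 1))


# Values 5 and 7 exceed u = 4 by 1 and 3, so the aggregate excess is 4.
print(aggregate_excess([1.0, 5.0, 7.0, 2.0], u=4.0, k=0, l=3))
```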
Problems
• Estimation of the limiting quantities ($\theta$, $\pi$, $W$) is difficult.
• Often the subasymptotic behaviour is of interest, too, since the convergence to the limit is very slow.
• To overcome these problems, one can restrict attention to certain families of models.
• A large class of Markov chains behaves like a random walk at extreme levels,
  • which can be used to simulate extremal clusters in a Markov chain, see e.g. Smith et al. (1997)
Water discharge series are non-Markovian, even above high thresholds
• If the series were Markovian, $(X_t - X_{t-1} \mid X_{t-1},\, X_{t-1} - X_{t-2} > 0) \overset{d}{=} (X_t - X_{t-1} \mid X_{t-1},\, X_{t-1} - X_{t-2} < 0)$ would hold
• The plots show $X_t - X_{t-1}$ as a function of $X_{t-1}$ (if $X_{t-1}$ is above the 98% quantile), conditionally on the sign of $X_{t-1} - X_{t-2}$
• The two plots are not similar, so the next increment depends on more than the current level.
A light-tailed conditionally heteroscedastic model
$X_t - c_t = \sum_i a_i (X_{t-i} - c_{t-i}) + \varepsilon_t + \sum_j b_j \varepsilon_{t-j}$
$\varepsilon_t = \sigma_t Z_t$
$\sigma_t = [d_0 + d_1 (X_{t-1} - m)^+]^{1/2}$
• $Z_t$ is an i.i.d. sequence with zero mean and unit variance
• $c_t$ describes the deterministic seasonal behaviour in mean
• If all moments of $Z_t$ are finite, then all moments of $X_t$ are finite
• However, the exact tail behaviour is unknown (a special case of a similar model has Weibull-like tails, see Robert, 2000)
• The model approximates the extremal properties of water discharge series well (see Elek and Márkus, 2005)
A regime switching (RS) autoregressive model
$X_t = X_{t-1} + \varepsilon_{1t}$ if $I_t = 1$ (rising regime)
$X_t = a X_{t-1} + \varepsilon_{0t}$ if $I_t = 0$ (falling regime)
• $\varepsilon_{1t}$ is an i.i.d. noise, distributed as $\mathrm{Gamma}(\alpha, \lambda)$
• $\varepsilon_{0t}$ is an i.i.d. noise, distributed as $\mathrm{Normal}(0, \sigma)$
• $0 < a < 1$
• Successive regime durations are independent and distributed as
  • $\mathrm{NegBinom}(\beta_1, p_1)$ in the rising regime
  • $\mathrm{NegBinom}(\beta_0, p_0)$ in the falling regime
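A simulation helps to see how the two regimes interact. The sketch below is one possible reading of the model, under loudly stated assumptions: the Gamma noise is parametrised by shape `alpha` and rate `lam`, `sigma` is the standard deviation of the normal noise, and a negative binomial duration with integer first parameter `beta` is taken as the sum of `beta` geometric variables on {1, 2, ...}. The function names, the burn-in length and the parameter values in the usage lines are illustrative only.

```python
import random
from typing import List, Tuple


def geometric(rng: random.Random, p: float) -> int:
    """Trials up to and including the first success (support 1, 2, ...)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def regime_duration(rng: random.Random, beta: int, p: float) -> int:
    """Assumed negative binomial duration: sum of beta geometric variables."""
    return sum(geometric(rng, p) for _ in range(beta))


def simulate_rs(n: int, a: float, alpha: float, lam: float, sigma: float,
                beta1: int, p1: float, beta0: int, p0: float,
                seed: int = 0, burn_in: int = 1000) -> Tuple[List[float], List[int]]:
    """Return n values of X_t and the regime indicators I_t after a burn-in."""
    rng = random.Random(seed)
    x, regime = 0.0, 1
    remaining = regime_duration(rng, beta1, p1)
    xs: List[float] = []
    regimes: List[int] = []
    for t in range(n + burn_in):
        if remaining == 0:  # current regime has ended: switch and redraw
            regime = 1 - regime
            if regime == 1:
                remaining = regime_duration(rng, beta1, p1)
            else:
                remaining = regime_duration(rng, beta0, p0)
        if regime == 1:
            # rising regime: positive Gamma shock (gammavariate takes a scale)
            x = x + rng.gammavariate(alpha, 1.0 / lam)
        else:
            # falling regime: mean-reverting AR(1) step with normal noise
            x = a * x + rng.gauss(0.0, sigma)
        remaining -= 1
        if t >= burn_in:
            xs.append(x)
            regimes.append(regime)
    return xs, regimes


xs, regimes = simulate_rs(n=2000, a=0.5, alpha=1.0, lam=0.5, sigma=1.0,
                          beta1=1, p1=0.5, beta0=1, p0=0.1, seed=1)
print(max(xs), sum(regimes) / len(regimes))
```

With `beta1 = beta0 = 1` the durations are geometric, which corresponds to the Markov-switching special case of the model.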
Properties of the RS-model
• Heuristic explanation:
  • $X_t$ gets independent positive shocks in the rising regime
  • it develops as a mean-reverting autoregression in the falling regime
• If $\beta_1 = \beta_0 = 1$, then $I_t$ is a Markov chain and $X_t$ is a Markov-switching autoregression
• Stationarity of the model follows from the result of Brandt (1986) on stochastic difference equations
• Regime switching models have deep roots in hydrology (see e.g. Bálint and Szilágyi, 2005)
Tail behaviour of the stationary distribution
• Theorem: The process has a Gamma-like upper tail:
  • $P(X_t > u \mid I_t = 1) \sim K_1 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1 - p_1)^{1/\alpha}]\}$
  • $P(X_t > u \mid I_t = 0) \sim K_0 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1 - p_1)^{1/\alpha}] / a\}$
  • thus: $P(X_t > u) \sim K_1 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1 - p_1)^{1/\alpha}]\}$.
• The proof is based on the observations that
  • the aggregate increment during a rising regime has a Gamma-like tail
  • which becomes "negligible" during the falling regime.
• Corollary: Exceedances above high thresholds are asymptotically exponentially distributed:
  • $\lim_{u \to \infty} P(X_t > x + u \mid X_t > u) = \exp\{-\lambda x [1 - (1 - p_1)^{1/\alpha}]\}$
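As a worked special case of the corollary (a sketch, writing $\lambda$ for the rate and $\alpha$ for the shape of the Gamma noise in the rising regime, which is the notation assumed here), the exponent simplifies when $\alpha = 1$:

```latex
\[
\Bigl[\,1 - (1 - p_1)^{1/\alpha}\Bigr]_{\alpha = 1} = p_1
\quad\Longrightarrow\quad
\lim_{u \to \infty} P(X_t > x + u \mid X_t > u) = \exp(-\lambda p_1 x).
\]
```

This agrees with the elementary fact that the sum of a geometric number (parameter $p_1$, support $\{1, 2, \dots\}$) of i.i.d. exponential variables with rate $\lambda$ is again exponential, with rate $\lambda p_1$.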
Limiting cluster quantities in the model I.
• Even when the regime lengths are negative binomial,
  • the extremal index is $p_1$,
  • and the limiting cluster size distribution is geometric with parameter $p_1$.
Limiting cluster quantities in the model II.
• If $\alpha = 1$, the limiting aggregate excess distribution is $W = E_1 + 2E_2 + \dots + N E_N$
  • where $N$ is geometric with parameter $p_1$
  • the variables $E_i$ are exponential with parameter $\lambda$, independent of each other and of $N$
• The exponential moments are infinite, but all polynomial moments are finite.
• Anderson and Dancy (1992) suggested modelling the aggregate excesses of a hydrological data set with a Weibull distribution.
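The representation W = E_1 + 2E_2 + ... + N E_N is easy to sample from, which gives a quick sanity check on its moments. The sketch below assumes that N is geometric on {1, 2, ...} (so that its mean is 1/p_1) and that the exponential parameter is a rate, called `lam` here; the function names are hypothetical. Under these assumptions E[N(N+1)/2] = 1/p_1^2, so the mean of W is 1/(lam * p_1^2).

```python
import random


def sample_w(rng: random.Random, p1: float, lam: float) -> float:
    """Draw one W = E_1 + 2*E_2 + ... + N*E_N with N geometric on 1, 2, ..."""
    w = 0.0
    i = 0
    while True:
        i += 1
        w += i * rng.expovariate(lam)  # i-th term: weight i times an Exp(lam)
        if rng.random() < p1:          # stop after this term with prob. p1
            return w


def mean_w(n: int, p1: float, lam: float, seed: int = 0) -> float:
    """Monte Carlo mean of n independent draws of W."""
    rng = random.Random(seed)
    return sum(sample_w(rng, p1, lam) for _ in range(n)) / n


# Compare the simulated mean with the closed form 1 / (lam * p1**2).
p1, lam = 0.5, 1.0
print(mean_w(100_000, p1, lam, seed=42), 1.0 / (lam * p1 ** 2))
```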
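Slow convergence to the limiting quantities
• The plot gives $\theta(u, p) = P(M_{1,p} \le u \mid X_0 > u)$
  • if $\lambda = p_1 = 0.5$, $p_0 = 0.1$, $a = 0.5$ and $\alpha = \beta_0 = \beta_1 = 1$
  • for $p = 100$ and $200$ and
  • for $u$ ranging from the 99% to the 99.99% quantile
• $\theta = \lim_{p \to \infty} \lim_{u \to \infty} \theta(u, p)$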
Parameter estimation
• Estimation of the whole model with hidden regimes:
  • (reversible jump) MCMC
  • maximum likelihood if $\beta_1 = \beta_0 = 1$ (i.e. in the Markov-switching case), but it is computationally infeasible
• However, if we focus only on extremal dynamics and assume that the regime durations (at least above a high level) are geometrically distributed, we can write down the likelihood based solely on data during floods (i.e. above a high threshold)
• $\alpha = 1$ is also assumed (in accordance with the empirical data)
Exponential QQ-plot for the positive increments above the threshold 900 m³/s
Likelihood computations
• Likelihood can be determined recursively:
  • $q_t = P(I_t = 1 \mid X_t, X_{t-1}, \dots)$
  • $q_1^{\mathrm{cond}} = P(I_t = 1 \mid X_{t-1}, \dots) = (1 - p_1) q_{t-1} + p_0 (1 - q_{t-1})$
  • $q_0^{\mathrm{cond}} = P(I_t = 0 \mid X_{t-1}, \dots) = p_1 q_{t-1} + (1 - p_0)(1 - q_{t-1})$
  • $f_1 = f(X_t, I_t = 1 \mid X_{t-1}, \dots) = q_1^{\mathrm{cond}}\, f_{\mathrm{Exp}(\lambda)}(X_t - X_{t-1})$
  • $f_0 = f(X_t, I_t = 0 \mid X_{t-1}, \dots) = q_0^{\mathrm{cond}}\, f_{N(0,\sigma)}(X_t - a X_{t-1})$
  • $f(X_t \mid X_{t-1}, \dots) = f_0 + f_1$
  • $q_t = f_1 / (f_0 + f_1)$
• Some care is needed:
  • at the beginning of the floods $q_t$ is determined from the tail behaviour of the model
  • at the end of the floods the observation is censored
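The recursion above can be sketched as a short forward filter over one flood. This is a simplified illustration under stated assumptions, not the authors' implementation: `lam` is taken as the exponential rate of the rising increments, `sigma` as the standard deviation of the normal noise, and the starting probability `q_init` is a hypothetical placeholder for the value that the talk derives from the tail behaviour. The censoring at the end of a flood is not handled.

```python
import math
from typing import Sequence


def flood_loglik(x: Sequence[float], a: float, lam: float, sigma: float,
                 p1: float, p0: float, q_init: float = 0.5) -> float:
    """Log-likelihood of x[1:], given x[0], for one flood.

    q tracks P(I_t = 1 | observations up to t); q_init is an assumed
    placeholder for its value at the start of the flood.
    """
    q = q_init
    loglik = 0.0
    for t in range(1, len(x)):
        # one-step-ahead regime probabilities
        q1_cond = (1.0 - p1) * q + p0 * (1.0 - q)
        q0_cond = p1 * q + (1.0 - p0) * (1.0 - q)
        # rising regime: exponential density of the (positive) increment
        inc = x[t] - x[t - 1]
        exp_pdf = lam * math.exp(-lam * inc) if inc > 0.0 else 0.0
        f1 = q1_cond * exp_pdf
        # falling regime: normal density of the AR(1) residual
        resid = x[t] - a * x[t - 1]
        norm_pdf = math.exp(-0.5 * (resid / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        f0 = q0_cond * norm_pdf
        f = f0 + f1
        loglik += math.log(f)  # fails if both densities underflow to zero
        q = f1 / f             # filtered probability of the rising regime
    return loglik


# Illustrative discharge values (m3/s) and parameter values, not estimates.
flood = [1000.0, 1100.0, 1250.0, 1100.0, 950.0]
print(flood_loglik(flood, a=0.8, lam=0.005, sigma=100.0, p1=0.6, p0=0.03))
```

Summing this quantity over all floods and maximising it numerically in (`a`, `lam`, `sigma`, `p1`, `p0`) would give a threshold-based estimate in the spirit of the slide, subject to the simplifications noted above.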
Advantages of using only the data over a threshold
• Model dynamics may be different at lower levels
  • For physical reasons, the rate of decay in the falling regime (characterised by $a$) varies over the course of the decay
• Fast maximum likelihood estimation
  • Smaller sample size
  • Regimes separate very well at high levels
Application to flood analysis
• Data: 50 years of daily water discharge series at Tivadar (river Tisza), about 18000 observations
• We assume $\alpha = \beta_0 = \beta_1 = 1$
• Threshold: 900 m³/s (about the 98% quantile)
• Parameter estimates and asymptotic standard errors:
  • $p_1 = 0.598$ (0.037): on average 1.7 days of further increase, in accordance with the empirical value
  • $p_0 = 0.027$ (0.011): has a negligible effect on the dynamics over the threshold
  • $a = 0.823$ (0.007): high persistence even in the falling regime
  • $\lambda = 0.0044$ (0.0003)
  • $\sigma = 137.1$ (8.0)
Empirical and simulated flood dynamics
• The shapes of the empirical and simulated floods are very similar.
• Subasymptotic behaviour is important:
  • Simulated water discharge remains over the threshold for 1.4 days on average after the peak
Exceedances over a threshold
• Maximal exceedance over a threshold is approximately exponential with parameter $\lambda p_1 = 1/392$ in the model,
  • in good accordance with the empirical distribution.
• The plot shows the exceedance over the threshold 1250 m³/s.
Aggregate excess (flood volume)
• Threshold = 1250 m³/s
• Operational definition: two floods are separated when the water discharge goes below a lower threshold (900 m³/s) between them
• There are only 48 such floods in 50 years
  • Empirical mean: 72.1 million m³
  • Simulated mean: 76.9 million m³
• The QQ-plot shows the fit of the distribution, too.
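The operational definition above translates directly into a single pass over the daily series. The sketch below is one way to do it; the function name `flood_volumes` is hypothetical, and converting a daily discharge in m³/s to a volume by multiplying with 86,400 seconds (treating the daily value as constant over the day) is an assumption of this example.

```python
from typing import List, Sequence

SECONDS_PER_DAY = 86_400


def flood_volumes(discharge: Sequence[float], upper: float, lower: float) -> List[float]:
    """Aggregate excess over `upper` for each flood, in million m3.

    discharge: daily values in m3/s. A flood ends as soon as the discharge
    drops below `lower`; only floods with a positive excess are reported.
    """
    volumes: List[float] = []
    current = 0.0
    for flow in discharge:
        if flow < lower:
            # below the lower threshold: close the current flood, if any
            if current > 0.0:
                volumes.append(current)
            current = 0.0
        elif flow > upper:
            # excess in m3/s, held for one day, expressed in million m3
            current += (flow - upper) * SECONDS_PER_DAY / 1e6
    if current > 0.0:  # series ended during a flood
        volumes.append(current)
    return volumes


# Two floods: the dip to 1000 m3/s stays above 900, so it does not split the first.
daily = [800.0, 1300.0, 1000.0, 1400.0, 850.0, 1260.0, 700.0]
print(flood_volumes(daily, upper=1250.0, lower=900.0))
```

In the toy series the first flood accumulates 50 + 150 = 200 m³/s-days (17.28 million m³) and the second 10 m³/s-days (0.864 million m³).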
Conclusions
• The limiting cluster quantities can be determined in our physically motivated regime switching model
• Simulations are still needed since the subasymptotic behaviour is important at the relevant thresholds
• To determine return levels of, e.g., flood volume, the occurrence of extreme events should also be modelled, by a Poisson process.
• Further work: what parametric multivariate extreme value distribution does a reasonable multivariate regime switching model suggest?
References
• Anderson, C.W. and Dancy, G.P. (1992): The severity of extreme events, Research Report 409/92, University of Sheffield.
• Bálint, G. and Szilágyi, J. (2005): A hybrid, Markov-chain based model for daily streamflow generation, Journal of Hydrol. Engineering, in press.
• Brandt, A. (1986): The stochastic equation Yn+1=AnYn+Bn with stationary coefficients, Adv. in Appl. Prob., 18, 211-220.
• Elek, P. and Márkus, L. (2004): A long range dependent model with nonlinear innovations for simulating daily river flows, Natural Hazards and Earth Systems Sciences, 4, 277-283.
• Elek, P. and Márkus, L. (2005): A light-tailed conditionally heteroscedastic model with applications to river flows, in preparation.
• Robert, C. (2000): Extremes of alpha-ARCH models, in: Measuring Risk in Complex Stochastic Systems (ed. by Franke et al.), XploRe e-books.
• Segers, J. (2003): Functionals of clusters of extremes, Adv. in Appl. Prob., 35, 1028-1045.
• Smith, R.L., Tawn, J.A. and Coles, S.G. (1997): Markov chain models for threshold exceedances, Biometrika, 84, 249-268.