New approaches in extreme-value modeling

A. Zempléni, A. Beke, V. Csiszár (Eötvös Loránd University, Budapest)
Flood Risk Workshop, 08.07.2002

Analysis of extreme values
• Probably the most important part of the project. Its aim is to estimate return levels: values expected to be exceeded on average once in a given period (100 years, for example).
• Classical methods: based on annual maxima (other values are not used).
• Peaks-over-threshold methods: utilize all values higher than a given (high) threshold.

Extreme-value distributions (for modeling annual maxima)
Let $X_1, X_2, \dots, X_n$ be independent, identically distributed random variables. If we can find norming constants $a_n$, $b_n > 0$ such that $[\max(X_1, X_2, \dots, X_n) - a_n]/b_n$ has a nondegenerate limit distribution, then this limit is necessarily a max-stable, or so-called extreme-value, distribution.

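Characterisation of extreme-value distributions
• Limit distributions of normalised maxima:
  • Fréchet: $\Phi_\alpha(x) = \exp(-x^{-\alpha})$ for $x > 0$, where $\alpha$ is a positive parameter.
  • Weibull: $\Psi_\alpha(x) = \exp(-(-x)^{\alpha})$ for $x < 0$.
  • Gumbel: $\Lambda(x) = \exp(-e^{-x})$.
• Location and scale parameters can be incorporated.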

Estimation methods
• Maximum likelihood, based on the unified (GEV) parametrisation: $G(x) = \exp\{-[1 + \xi (x - \mu)/\sigma]^{-1/\xi}\}$ if $1 + \xi (x - \mu)/\sigma > 0$
  • the most widely used, with optimal asymptotic properties if $\xi > -0.5$
• Probability-weighted moments (PWM)
• Method of L-moments

Probability-weighted moments
• Analogous to the method of moments
• It puts more weight on the high values
• The estimators are obtained by equating the empirical and the theoretical weighted moments and solving the equations for the parameter vector.
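
For the GEV family, the PWM equations have a well-known approximate closed-form solution (Hosking, Wallis and Wood, 1985). The sketch below assumes that approximation, which is intended for shapes roughly between -0.5 and 0.5; the function names are hypothetical and the sample is simulated. Hosking's shape k equals -ξ in the notation of these slides.

```python
import math
import numpy as np

def sample_pwm(x, k):
    """Unbiased estimator b_k of the k-th PWM E[X F(X)^k]."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    i = np.arange(1, n + 1)
    w = np.ones(n)
    for j in range(1, k + 1):
        w *= (i - j) / (n - j)          # (i-1)...(i-k) / ((n-1)...(n-k))
    return float(np.mean(w * xs))

def fit_gev_pwm(x):
    """PWM estimates (mu, sigma, xi) via Hosking's approximation."""
    b0, b1, b2 = (sample_pwm(x, k) for k in range(3))
    c = (2 * b1 - b0) / (3 * b2 - b0) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2    # Hosking's shape, k = -xi
    g = math.gamma(1 + k)
    sigma = (2 * b1 - b0) * k / (g * (1 - 2 ** (-k)))
    mu = b0 + sigma * (g - 1) / k
    return mu, sigma, -k

# Simulated GEV sample by inverse transform (illustrative parameters).
rng = np.random.default_rng(0)
u = rng.uniform(size=5000)
sim = 500.0 + 80.0 * ((-np.log(u)) ** (-0.1) - 1.0) / 0.1
mu_pwm, sigma_pwm, xi_pwm = fit_gev_pwm(sim)
print(mu_pwm, sigma_pwm, xi_pwm)
```

The formulas divide by k, so an estimate of exactly k = 0 (the Gumbel case) would need separate handling; that case is ignored here.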

Method of L-moments
• The L-moment analogues of the basic characteristics (mean, variance, skewness and kurtosis) of the observed distribution are equated to their respective theoretical values.
• These values can be estimated with the help of the probability-weighted moments.
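
The link between L-moments and PWMs is a set of fixed linear combinations. A minimal sketch, assuming the standard unbiased PWM estimators b_0, ..., b_3 (the function name `sample_l_moments` is hypothetical):

```python
import numpy as np

def sample_l_moments(x):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    i = np.arange(1, n + 1)
    b0 = xs.mean()
    b1 = np.mean((i - 1) / (n - 1) * xs)
    b2 = np.mean((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * xs)
    b3 = np.mean((i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * xs)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2

# Unit exponential check: theory gives l1 = 1, l2 = 1/2, t3 = 1/3, t4 = 1/6.
rng = np.random.default_rng(0)
l1, l2, t3, t4 = sample_l_moments(rng.exponential(scale=1.0, size=20000))
print(l1, l2, t3, t4)
```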

Comparison
• Maximum likelihood is preferable, since
  • asymptotic properties are known, allowing the construction of confidence intervals
  • covariates can be incorporated into the model
• For the other methods, there is no firm theory.

Further investigations
• Estimates for return levels
• Confidence bounds should be calculated; possible methods:
  • based on asymptotic properties of the maximum likelihood estimator
  • profile likelihood
  • resampling methods (bootstrap, jackknife)
  • Bayesian approach
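
Given GEV parameter estimates, the T-year return level is the quantile of order 1 - 1/T of the fitted distribution. The following sketch inverts the GEV distribution function directly; the function names and the parameter values in the example are illustrative assumptions.

```python
import math

def gev_return_level(mu, sigma, xi, T):
    """Level exceeded by the annual maximum with probability 1/T."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-9:                      # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi

def gev_upper_endpoint(mu, sigma, xi):
    """Finite right endpoint mu - sigma/xi when xi < 0, otherwise infinity."""
    return mu - sigma / xi if xi < 0 else math.inf

# Illustrative parameters only (not estimates from the workshop data).
for T in (100, 500, 1000):
    print(T, round(gev_return_level(500.0, 80.0, -0.2, T), 1))
print("upper endpoint:", gev_upper_endpoint(500.0, 80.0, -0.2))
```

For a negative shape, every return level stays below the upper endpoint, which is how a fitted model can imply an estimated maximum water level.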

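Confidence intervals
• For maximum likelihood:
  • By asymptotic normality of the estimator: $\hat\theta_i \pm z_{1-\alpha/2}\sqrt{v_{i,i}}$, where $z_{1-\alpha/2}$ is the standard normal quantile and $v_{i,i}$ is the (i,i)th element of the inverse of the information matrix
  • By profile likelihood
• For the other, nonparametric methods: by bootstrap.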
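
The bootstrap option can be sketched generically: resample the data with replacement, recompute the estimator, and take empirical quantiles of the replicates. This percentile interval is one of several bootstrap variants and is an assumption here, not necessarily the one used in the project; `bootstrap_ci` is a hypothetical helper.

```python
import numpy as np

def bootstrap_ci(x, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(x)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    reps = np.array([
        estimator(rng.choice(x, size=x.size, replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(reps, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

# Example: interval for the empirical 90% quantile of a simulated sample.
rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=200)
q90 = lambda s: float(np.quantile(s, 0.9))
ci_lo, ci_hi = bootstrap_ci(data, q90)
print(q90(data), ci_lo, ci_hi)
```

Any estimator that maps a sample to a number (a PWM-based return level, for instance) can be passed in place of `q90`.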

Profile likelihood
• One coordinate of the parameter vector is fixed, and the log-likelihood is maximized with respect to the other components: $\ell_p(\theta_i) = \max_{\theta_{-i}} \ell(\theta_i, \theta_{-i})$
• Its main advantages:
  • The uncertainty can be visualized
  • More exact (asymmetric) confidence bounds
  • Model selection for nested models is possible by the likelihood ratio test
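
A profile-likelihood interval for the shape parameter can be sketched as follows: for each ξ on a grid, the negative log-likelihood is minimised over (μ, σ), and the interval collects the grid values whose deviance stays below the chi-squared cutoff with one degree of freedom. The grid, the starting values and the Nelder-Mead optimiser are assumptions of this sketch, and the data are simulated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def gev_nll(mu, sigma, xi, x):
    """Negative GEV log-likelihood; inf outside the parameter space."""
    if sigma <= 0:
        return np.inf
    z = (x - mu) / sigma
    if abs(xi) < 1e-8:                                  # Gumbel limit
        return x.size * np.log(sigma) + np.sum(z + np.exp(-z))
    t = 1.0 + xi * z
    if np.any(t <= 0):
        return np.inf
    return (x.size * np.log(sigma) + (1.0 + 1.0 / xi) * np.sum(np.log(t))
            + np.sum(t ** (-1.0 / xi)))

def profile_nll(xi, x):
    """Minimise the negative log-likelihood over (mu, sigma) with xi fixed."""
    s = np.std(x)
    mu0, sigma0 = np.mean(x) - 0.45 * s, 0.78 * s       # Gumbel moment start
    while not np.isfinite(gev_nll(mu0, sigma0, xi, x)):
        sigma0 *= 1.5                                   # widen until feasible
    res = minimize(lambda p: gev_nll(p[0], p[1], xi, x), [mu0, sigma0],
                   method="Nelder-Mead")
    return res.fun

def profile_interval(x, xi_grid, level=0.95):
    """Grid-based profile estimate and confidence interval for xi."""
    x = np.asarray(x, dtype=float)
    prof = np.array([profile_nll(xi, x) for xi in xi_grid])
    keep = 2.0 * (prof - prof.min()) <= chi2.ppf(level, df=1)
    inside = xi_grid[keep]
    return xi_grid[prof.argmin()], inside.min(), inside.max()

rng = np.random.default_rng(1)
u = rng.uniform(size=500)
sim = 500.0 + 80.0 * ((-np.log(u)) ** (-0.1) - 1.0) / 0.1   # true xi = 0.1
xi_best, xi_lo, xi_hi = profile_interval(sim, np.linspace(-0.2, 0.4, 25))
print(xi_best, xi_lo, xi_hi)
```

The interval is only as fine as the grid, and its endpoints need not be symmetric around the estimate, which is the asymmetry the slide refers to.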

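Model diagnostics
• Probability plot (P-P plot), the points: $(\hat G(x_{(i)}),\ i/(n+1))$, $i = 1, \dots, n$
• Quantile plot (Q-Q plot), the points: $(\hat G^{-1}(i/(n+1)),\ x_{(i)})$, $i = 1, \dots, n$
Here $x_{(1)} \le \dots \le x_{(n)}$ is the ordered sample and $\hat G$ is the fitted distribution function. Both diagrams should be close to the unit diagonal if the fit is good.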

An example: a simulated 100-element sample of unit exponentials
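
The points of both diagnostic plots for an example of this kind can be computed as in the sketch below. It assumes the plotting positions i/(n+1) and takes any frozen `scipy.stats` distribution as the fitted model; the helper name `pp_qq_points` is hypothetical, and the unit exponential is used as the model purely for illustration.

```python
import numpy as np
from scipy.stats import expon

def pp_qq_points(x, dist):
    """Return (pp, qq): arrays of plot points for a fitted distribution."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    p = np.arange(1, n + 1) / (n + 1)            # plotting positions
    pp = np.column_stack([dist.cdf(xs), p])      # model vs empirical probability
    qq = np.column_stack([dist.ppf(p), xs])      # model vs empirical quantile
    return pp, qq

# A simulated 100-element sample of unit exponentials.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=100)
pp, qq = pp_qq_points(sample, expon())
max_pp_gap = float(np.max(np.abs(pp[:, 0] - pp[:, 1])))
print("largest P-P deviation from the diagonal:", max_pp_gap)
```

The Q-Q plot is the more sensitive of the two in the upper tail, where the largest order statistics scatter most.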

Peaks over threshold methods
• Events that exceed a given (high) threshold are considered extreme
• Advantages:
  • More data can be used
  • Estimators are not affected by the small “floods”
• Disadvantages:
  • Dependence on threshold choice
  • Declustering is not always obvious

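Theoretical foundations
Let $X_1, X_2, \dots, X_n$ be independent, identically distributed random variables. If the normalised maximum of this sequence converges to an extreme value distribution (with parameters $\mu, \sigma, \xi$), then for a high threshold $u$
$P(X - u < y \mid X > u) \approx H(y) = 1 - (1 + \xi y/\tilde\sigma)^{-1/\xi}$ if $y > 0$ and $1 + \xi y/\tilde\sigma > 0$, where $\tilde\sigma = \sigma + \xi (u - \mu)$.
The approximation holds as n and u increase.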
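
Fitting the generalized Pareto distribution to threshold excesses can be sketched with `scipy.stats.genpareto`, whose shape parameter c has the same sign as ξ. The location is fixed at zero because the excesses X - u start at zero. The helper name and the simulated data are assumptions for illustration.

```python
import numpy as np
from scipy.stats import genpareto

def fit_gpd_exceedances(x, u):
    """ML fit of the GPD to excesses over u; returns (xi, sigma, n_exceed)."""
    x = np.asarray(x, dtype=float)
    exc = x[x > u] - u
    xi, _, sigma = genpareto.fit(exc, floc=0.0)   # location fixed at 0
    return xi, sigma, exc.size

# Simulated GPD data with xi = 0.2 and scale 1; the excesses over u = 1
# are again GPD with the same shape and scale 1 + 0.2 * 1 = 1.2.
rng = np.random.default_rng(0)
data = genpareto.rvs(c=0.2, scale=1.0, size=20000, random_state=rng)
xi_gpd, sigma_gpd, n_exc = fit_gpd_exceedances(data, u=1.0)
print(xi_gpd, sigma_gpd, n_exc)
```

The example also shows the threshold dependence of the scale: the shape is unchanged when the threshold is raised, while the scale shifts linearly in u.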

Inference
• Similar to the annual maxima method:
  • Maximum likelihood is to be preferred
  • Confidence bounds can be based on profile likelihood
  • Model fit can be analyzed by P-P plots and Q-Q plots
  • Return levels/upper bounds can be estimated

Threshold selection
Mean excess plot: for any threshold u, plot the mean of X-u (over those observations for which X>u) against u. If the generalized Pareto model holds above some threshold, the plot should be nearly linear beyond it. The interpretation is made difficult by the great variability near the upper endpoint of the observations, where only a few exceedances remain.
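
The quantity behind the mean excess plot can be computed as in this minimal sketch (the function name `mean_excess` is hypothetical). For the generalized Pareto distribution with ξ < 1 the theoretical mean excess is (σ + ξu)/(1 - ξ), which is linear in u; for the unit exponential used below it is constant at 1.

```python
import numpy as np

def mean_excess(x, thresholds):
    """Empirical mean of X - u over observations with X > u, for each u."""
    x = np.asarray(x, dtype=float)
    out = []
    for u in thresholds:
        exc = x[x > u] - u
        out.append(exc.mean() if exc.size else np.nan)   # no data above u
    return np.array(out)

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=50000)
us = np.array([0.0, 0.5, 1.0, 2.0])
me = mean_excess(sample, us)
print(dict(zip(us.tolist(), np.round(me, 3).tolist())))
```

Plotting `me` against `us` gives the mean excess plot; in practice the thresholds would run over a fine grid up to the largest observations.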

Another, very recent method: minimum cross-entropy
• Kullback introduced the concept of probabilistic distance (cross-entropy) of a posterior distribution h(t) from a prior f(t): $D(h, f) = \int h(t) \log[h(t)/f(t)]\,dt$
• The method (Pandey, 2002) minimizes the cross-entropy of x(t) (the observed quantile function of exceedances) with respect to its prior estimate y(t), which is chosen as the exponential (motivated by its central role within the GPD family).

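Moment conditions
PWM constraints: $\int_0^1 t^k x(t)\,dt = b_k$, $k = 0, 1, \dots, N$ (usually N=3 is used), where $b_k = \frac{1}{n}\sum_{i=1}^{n} \frac{(i-1)(i-2)\cdots(i-k)}{(n-1)(n-2)\cdots(n-k)}\, x_{(i)}$ is the unbiased estimator for the kth PWM, based on the ordered sample $x_{(1)} \le \dots \le x_{(n)}$ of size n.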

Some results for Vásárosnamény
• Six threshold values were used: 440, 480, 520, 560, 600 and 640 centimetres.
• As constraints, we considered the first four PWMs, that is, N=3.
• The estimated 100-, 500-, and 1000-year return levels

Comments
• The results were not as stable as claimed in the original paper.
• We intended to add bootstrap confidence bounds to the estimates, but this was too time-consuming in its original version, and the simplifications we used did not prove realistic.

Stationary sequences
• If independence does not hold (as is the case for the original daily observations), the limit of the normalized maxima is still a GEV distribution, provided that the dependence between observations far apart diminishes. (See the talk of S. Gáspár.) So the GEV model for annual maxima has a sound theoretical background.
• For POT models, the maxima of the clusters of exceedances may be used. (Clusters need to be defined.)
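
One common way to define clusters is runs declustering: a cluster ends once a chosen number of consecutive observations fall at or below the threshold. This rule is an assumption of the sketch below, not necessarily the definition used in the project, and `cluster_maxima` and `run_length` are hypothetical names.

```python
def cluster_maxima(x, u, run_length):
    """Maxima of clusters of exceedances of u (runs declustering)."""
    peaks = []
    current_max = None   # maximum of the cluster being built, if any
    gap = 0              # consecutive non-exceedances since the last exceedance
    for v in x:
        if v > u:
            current_max = v if current_max is None else max(current_max, v)
            gap = 0
        elif current_max is not None:
            gap += 1
            if gap >= run_length:        # the cluster is closed
                peaks.append(current_max)
                current_max = None
    if current_max is not None:          # close a cluster still open at the end
        peaks.append(current_max)
    return peaks

levels = [1, 5, 6, 1, 7, 1, 1, 1, 8, 1]
print(cluster_maxima(levels, u=4, run_length=2))   # [7, 8]
print(cluster_maxima(levels, u=4, run_length=1))   # [6, 7, 8]
```

A longer `run_length` merges nearby exceedances into one flood event, so the choice trades off independence of the peaks against the number of peaks retained.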

To cope with nonstationarity
• Linear regression-type models can be incorporated into the maximum likelihood framework
• Profile likelihood and likelihood-ratio tests can be performed for nested models
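
As an illustration of a regression-type model, the sketch below lets the GEV location depend linearly on time, μ(t) = β0 + β1·t, fits it by maximum likelihood, and compares it with the stationary model by a likelihood-ratio test. The parametrisation (log σ, starting values from least squares, Nelder-Mead) and the simulated data are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def gev_nll(mu, sigma, xi, x):
    """Negative GEV log-likelihood; mu may be a scalar or an array."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-8:                                   # Gumbel limit
        return x.size * np.log(sigma) + np.sum(z + np.exp(-z))
    t = 1.0 + xi * z
    if np.any(t <= 0):
        return np.inf
    return (x.size * np.log(sigma) + (1.0 + 1.0 / xi) * np.sum(np.log(t))
            + np.sum(t ** (-1.0 / xi)))

def fit_gev(x, t, trend):
    """ML fit; parameters are (beta0, log sigma, xi[, beta1])."""
    slope, intercept = np.polyfit(t, x, 1) if trend else (0.0, np.mean(x))
    sigma0 = 0.78 * np.std(x - (intercept + slope * t))  # Gumbel-type start
    start = [intercept - 0.58 * sigma0, np.log(sigma0), 0.1]
    if trend:
        start.append(slope)

    def objective(p):
        mu = p[0] + (p[3] * t if trend else 0.0)
        return gev_nll(mu, np.exp(p[1]), p[2], x)

    res = minimize(objective, start, method="Nelder-Mead",
                   options={"maxiter": 5000, "maxfev": 5000})
    return res.x, res.fun

# Simulated maxima with a linear trend in the location (illustrative only).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
u = rng.uniform(size=t.size)
x = 2.0 * t + ((-np.log(u)) ** (-0.1) - 1.0) / 0.1

p0, nll0 = fit_gev(x, t, trend=False)
p1, nll1 = fit_gev(x, t, trend=True)
deviance = 2.0 * (nll0 - nll1)            # likelihood-ratio statistic
p_value = chi2.sf(deviance, df=1)         # one extra parameter (beta1)
print("beta1 =", p1[3], "deviance =", deviance, "p =", p_value)
```

A small p-value favours the model with trend; the same comparison works for any pair of nested models, with the degrees of freedom equal to the number of extra parameters.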

Water-level data example: Vásárosnamény
• At least two observations per day for each station (there are approx. 50 of them) for 100 years.
• Reduction: one observation per day.

Annual maxima

The fitted model (shape=-0.4459, right-endpoint: 778 cm)

Another station (estimated max: 945 cm)
