Regional Frequency Analysis (RFA) Adapted from Hosking and Wallis (1997)

Regional Frequency Analysis (RFA)Adapted from Hosking and Wallis (1997) Professor Ke-Sheng Cheng Dept. of Bioenvironmental Systems Engineering National Taiwan University

Why regional frequency analysis (RFA) is needed? • Hydrological frequency analysis is generally conducted for sites with rainfall or flow measurements. • For areas with short record length or without rainfall or flow measurements, hydrological frequency analysis needs to be conducted using data from sites of similar hydrological characteristics.

The Index-Flood Approach for RFA – Concept • Proposed by Dalrymple (1960) for flood frequency analysis. • Let Q be the hydrological variable of interest, for example annual maximum rainfall of a specific duration or annual maximum flow. Suppose that observed data of Q are available at N different sites and ni represents the sample size for the i-th (i = 1, 2, …, N) site. Also, let Qi(F) be the quantile function of Q at site-i.

Observed data: Quantile function • Assume the quantile function of hydrological variables at different sites can be expressed by where i is the index flood (Dalrymple, 1960) and q(F), known as the regional growth curve, is an adjusted dimensionless quantile function common to every site. The index flood i is often taken to be the mean of Q at site-i.

The regional growth curve q(F) is considered as the quantile function of a common distribution Qij/i . • It is usually assumed that the distribution type for the rescaled data Qij/i (i.e. the regional frequency distribution ) is known. Thus, it is necessary to estimate parameters of this common distribution using observed data available at different sites.

The Index-Flood Approach for RFA – Estimations • Parameter estimation • Regional frequency analysis

The Index-Flood Approach for RFA – Implicit Assumptions • Observations at any given site are identically distributed. • Observations at any given site are serially independent. • Observations at different sites are independent. • The distributions of the rescaled variable at different sites are identical. • The distribution type of the rescaled variable is correctly specified. Generally valid. Unlikely to be satisfied by environmentaldata.

The assumption that distributions of the rescaled variable at different sites are identical implicitly imply the existence of a homogeneous region. • A homogeneous region is considered as an area within which rescaled variables in different sites have approximately the same probability distributions. • The homogeneous region need not to be geographically continuous.

Implicit in the definition of a homogeneous region, is the condition that all sites can be described by one common probability distribution after the site data are rescaled by their at-site mean. Thus, all sites within a homogeneous region have a common regional growth curve.

General procedures of regional frequency analysis • Data screening • Correctness check • Data should be stationary over time. • Identifying homogeneous regions • A set of characteristic variables should be chosen and used for delineation of homogeneous regions. • Characteristic variables may include geographic and hydrological variables.

Choice of an appropriate regional frequency distribution • GOF test using rescaled samples from different sites within the same homogeneous region. • The chosen distribution not only should fit the data well but also yield quantile estimates that are robust to physically plausible deviations of the true frequency distribution from the chosen frequency distribution.

Parameter estimation of the regional frequency distribution • Estimating parameters of the site-specific frequency distribution • Estimating parameters of the regional frequency distribution using weighted average.

Situations for application of RFA • General application to individual sites with observed data • Regionalization is valuable. Even though a region may be moderately heterogeneous, regional frequency analysis will still yield much more accurate quantile estimates than at-site analysis. • Application to one site of special interest. • Special care should be taken (by choosing appropriate characteristic variables) to make the site typical of the region to which it is assigned.

Application to one or more ungauged sites (PUB program – http://iahs.info/ ). • An ungauged site can be assigned to a homogeneous region based on its characteristic variables. The regional growth curve at an ungauged site is then estimated using the characteristic variables. • The index flood (or index quantity, if the variable of interest is not flood flow) can be considered as a function of characteristic variables and to calibrate the function by using data from the gauged sites.

Data screening using a measure of discordance Di • Assuming that there are N sites in a region and we want to identify those sites that are grossly discordant with the group as a whole. Hosking and Wallis (1997) proposed a measure of discordance in terms of L-moments (t, t3, and t4) of the sites’ data.

It can be shown that Di satisfies the algebraic bound . Thus, the value of Di can exceed 3 only in regions having 11 or more sites. • The criterion for discordance should be an increasing function of the number of sites in the region since regions with more sites are more likely to contain sites with large values of Di. Hosking and Wallis (1997) recommend that any site with Di >3 be regarded as discordant, as such sites have L-moments ratios that are markedly different from the average for the other sites in the region.

Defining homogeneous sub-regions • Homogeneous sub-regions (grouping of sites/gages) can be determined based on the similarity of the physical and/or meteorological characteristics of the sites. This can be done by performing cluster analysis. • L-moment statistics can then used to estimate the variability and skewness of the pooled regional data and to test for heterogeneity as a basis for accepting or rejecting the proposed sub-region formulation.

Candidates for physical features included such measures as: site elevation; elevation averaged over some grid size; localized topographic slope; macro topographic slope averaged over some grid size; distance from the coast or source of moisture; distance to sheltering mountains or ridgelines; and latitude or longitude.

Candidate climatological characteristics included such measures as: mean annual precipitation; precipitation during a given season; seasonality of extreme storms; and seasonal temperature/dewpoint indices.

Example • A review of the topographic and climatological characteristics in the Oregon study area showed only two measures, mean annual precipitation (MAP) and latitude were needed for grouping of sites/gages into homogeneous sub-regions within a given climatic region. Homogeneous sub-regions were therefore formed with gages/sites within small ranges of MAP and latitude.

The output from the cluster analysis need not, and usually should not, be final. Subjective adjustments can often be found to improve the physical coherence of the regions. Several kinds of adjustment of regions may be useful: • move a site or a few sites from one region to another; • delete a site or a few sites from the data set; • subdivide the region; • break up the region by reassigning its sites to other regions; • merge the region with another or others; • merge two or more regions and redefine groups; and • obtain more data and redefine groups.

Test of regional homogeneity • Once a set of physically plausible regions has been defined, it is desirable to assess whether the regions are meaningful. This involves testing whether a proposed region may be accepted as being homogeneous and whether two or more homogeneous regions are sufficiently similar that they should be combined into a single region. • The hypothesis of homogeneity is that the at-site frequency distributions are the same except for a site-specific scale factor.

Rationale of test of regional homogeneity • Comparing the between-site dispersion of the sample L-moment ratios for the group of sites under consideration and the expected dispersion of a homogeneous region.

Test of regional homogeneity • A heterogeneity measure proposed by Hosking and Wallis (1997). • Suppose that the proposed region has N sites, with site i having record length ni and sample L-moment ratios • Let represent the regional average L-CV, L-skewness, and L-kurtosis, weighted proportionally to the sites’ record length; for example

Calculate the weighted standard deviation of the at-site sample L-CVs, • Fit a four-parameter kappa distribution to the regional average L-moment ratios

Simulate a large number Nsim of realizations of a region with N sites, each having this kappa distribution as its frequency distribution. • The simulated regions are homogeneous and have no cross-correlation or serial correlation; sites have the same record lengths as their real-world counterparts. • For each simulated region, calculate V. • From the simulations determine mean and standard deviation of the Nsim values of V. Call these .

Calculate the heterogeneity measure

Choosing a distribution for frequency analysis • For regional frequency analysis, a single probability distribution is applied to all sites within a homogeneous region. Thus, it is necessary to choose a best-fit distribution from a set of candidate distributions. • Assume that the region is acceptably close to homogeneous. The L-moment ratios of the sites in a homogeneous region are well summarized by the regional average and the scatter of the individual sites’ L-moment ratios about the regional average represents no more than sampling variability.

The goodness-of-fit can be judged by how well the L-skewness and L-kurtosis of the fitted distribution match the regional average L-skewness and L-kurtosis of the observed data. • Assume for convenience that the candidate distribution is generalized extreme-value (GEV), which has three parameters, and the sample L-skewness and L-kurtosis are exactly unbiased.

The GEV distribution fitted by the method of L-moments has L-skewness equal to the regional average L-skewness.

Note: When fitting a three-parameter candidate distribution to at-sites L-moment ratios, we only need to estimate the L-skewness of the distribution by using the method of L-moments (L-skewness equal to the regional average L-skewness). There is no need for estimation of the L-kurtosis since the L-kurtosis is completely dependent on the L-skewness.

We thus judge the quality of fit by the difference between the L-kurtosis of the fitted GEV distribution and the regional average L-kurtosis .

Small values of ZGEV indicate that the GEV distribution can be considered as the true underlying distribution for the region.

Calculation of 4 • Theoretically, separate set of simulations must be made for each candidate distribution in order to obtain the appropriate 4 values. In practice, we can obtain a 4 value by using the same simulated realizations of a kappa distribution for a homogeneous region used in test of regional homogeneity.

Bias correction for L-kurtosis

Goodness-of-fit test • Given a set of candidate three-parameter distributions (Pearson type III, GEV, lognormal, generalized Pareto, etc.). We first need to fit each distribution to the regional average L-moment ratios . • Denote by the L-kurtosis of the fitted distribution, where DIST represents a candidate distribution.

Fit a kappa distribution to the regional average L-moment ratios . • Simulate a large number, Nsim, of realizations of a region with N sites, each having this kappa distribution as its frequency distribution. The simulated realizations are homogeneous and have no cross-correlation or serial correlation; sites have the same record lengths as their real-world counterparts. The fitting of a kappa distribution and simulation of at-site realizations of the kappa distribution can use the same simulations as those used for test of regional homogeneity.

For the m-th simulated realization, regional average L-skewness and L-kurtosis can be calculated.

Regional Frequency Analysis (RFA) Adapted from Hosking and Wallis (1997)