Bayesian Spatial Modeling of Extreme Precipitation Return Levels. Daniel COOLEY, Douglas NYCHKA, and Philippe NAVEAU (2007, JASA)
Background • July 28, 1997: a rainstorm in Fort Collins, Colorado, killed five people and caused $250 million in damage • 1976: the Big Thompson flood near Loveland, Colorado, killed 145 people • 1965: the South Platte flood caused $600 million in damages around Denver
Extreme precipitation events • Understanding their frequency and intensity is important for public safety and long-term planning • Challenges • Temporal records are limited • Distributions must be extrapolated to locations where no observations are available • Data • Precipitation amounts at some stations • Possibly some other covariates
Measure of extreme events • Return level • The r-year return level is the quantile $t_r$ that has probability 1/r of being exceeded in a particular year: $P(X > t_r) = 1/r$ • Precipitation return levels • Given in the context of the duration of the precipitation event • The r-year return level of a d-hour (e.g., 6- or 24-hour) duration interval is reported • The standard levels for the NWS’s most recent data products are quite extensive, with duration intervals ranging from 5 minutes to 60 days and return periods of 2 to 500 years • This article focuses on providing return level estimates for daily precipitation (24 hours)
Most recent precipitation atlas for Colorado • Produced in 1973 • The atlas provides point estimates of 2-, 5-, 10-, 25-, 50-, and 100-year return levels for duration intervals of 6 and 24 hours • Shortcoming • It does not provide uncertainty measures for its point estimates
Extreme value theory (EVT) • Statistical models for the tail of a probability distribution • Univariate case: generalized extreme value (GEV) distribution • Given iid continuous data $Z_1, Z_2, \ldots, Z_n$ and letting $M_n = \max(Z_1, Z_2, \ldots, Z_n)$, it is known that if the normalized distribution of $M_n$ converges as $n \to \infty$, then it converges to a GEV
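For reference, the GEV family named above has a standard three-parameter form. The symbols $\mu$ (location), $\sigma$ (scale), and $\xi$ (shape) are the conventional ones and are not taken from the slide:

```latex
G(z) = \exp\left\{ -\left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]_{+}^{-1/\xi} \right\},
\qquad \sigma > 0, \quad [a]_{+} = \max(a, 0),
```

with the case $\xi \to 0$ read as the Gumbel limit $G(z) = \exp\{-\exp[-(z - \mu)/\sigma]\}$.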
Generalized Pareto distribution (GPD) • Using only the maxima disregards other extreme data that could provide additional information • GPD • Based on the exceedances above a threshold • Exceedances (the amounts by which observations exceed a threshold u) should approximately follow a GPD as u becomes large and the sample size increases
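GPD • Tail of the distribution: $P(Z > z + u \mid Z > u) = (1 + \xi z / \sigma_u)^{-1/\xi}$ for $z > 0$ • Scale parameter $\sigma_u > 0$ • Shape parameter $\xi$ controls the tail: $\xi < 0$ gives a bounded tail, $\xi \to 0$ a light (exponential) tail, and $\xi > 0$ a heavy tail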
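More EVT • Exceedance rate: $\zeta_u = P(Z > u)$ • Unconditional tail: $P(Z > z + u) = \zeta_u (1 + \xi z / \sigma_u)^{-1/\xi}$ • Return level: with $n_y$ observations per year, the r-year return level is $z_r = u + \frac{\sigma_u}{\xi} \left[ (r n_y \zeta_u)^{\xi} - 1 \right]$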
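A minimal sketch of how the exceedance rate and the GPD parameters combine into a return level, assuming the standard peaks-over-threshold formulas. The function names and the numeric values are illustrative assumptions; only the 0.55-inch threshold and the April 1 to October 31 season (214 days) come from the slides.

```python
import math

def gpd_tail_prob(z, u, sigma, xi, zeta):
    """P(Z > z) for z >= u under a GPD tail: zeta * (1 + xi*(z-u)/sigma)^(-1/xi)."""
    y = z - u
    if abs(xi) < 1e-12:
        # xi -> 0 limit: exponential tail
        return zeta * math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        # z lies beyond the upper end point (only possible when xi < 0)
        return 0.0
    return zeta * base ** (-1.0 / xi)

def return_level(r, u, sigma, xi, zeta, n_per_year):
    """Level exceeded on average once every r years (n_per_year observations per year)."""
    m = r * n_per_year * zeta  # expected number of threshold exceedances in r years
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)

# Illustrative values (assumptions): sigma, xi, and zeta are made up.
z25 = return_level(r=25, u=0.55, sigma=0.4, xi=0.1, zeta=0.02, n_per_year=214)
print(round(z25, 3), gpd_tail_prob(z25, 0.55, 0.4, 0.1, 0.02))
```

By construction, the tail probability at the r-year return level equals 1/(r × n_per_year), which is a quick consistency check on any fitted parameters.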
Extremes of spatial data • Weather describes the state of the atmosphere at a given time • Extreme weather events can be modeled by theory on the dependence of extreme observations • Climate at a given location is the distribution of weather over a long period of time • Climatological quantities, such as return levels, and their spatial dependence must be modeled outside of the framework above • How does the distribution of precipitation vary over space?
Goal • Let Z(x) denote the total precipitation for a given period of time (e.g., 24 hours) at location x • The goal is to provide inference for the probability P(Z(x) > z + u) for all locations x in a particular domain and for u large • Given this function, one can compute return levels and other summary measures • Produce a return level map with measures of uncertainty
Basic idea • In the GPD model, we add a spatial component by considering all parameters to be functions of a location x in the study area • We assume that the values of $\sigma_u(x)$, $\xi(x)$, and $\zeta_u(x)$ result from a latent spatial process that characterizes the extreme precipitation and arises from climatological and orographic effects • The dependence of the parameters characterizes the similarity of climate at different locations
A Bayesian study • A study of 24-hour precipitation extremes for the Front Range region of Colorado • Estimate potential flooding • Apr 1 – Oct 31 • 75% of Colorado’s population lives in this area
Data • 56 weather stations • Daily total precipitation amounts during 1948-2001 • 21 stations have over 50 years of data • 14 stations have fewer than 20 years of data • All stations have some missing values • Covariates • Elevation • Mean precipitation (MSP) • Remark: covariate information is needed for the entire study region in order to interpolate over it and produce a precipitation map
Data Precision • Boulder Station • Prior to 1971, precipitation was recorded to the nearest 1/100th of an inch (0.25 mm) • After 1971, it was recorded to the nearest 1/10th of an inch (2.5 mm) • All but three stations similarly switched their level of precision around 1970 • Low precision data is a discretization of the high precision data
Treatment of discretization • The true value is assumed to be uniformly distributed around the observed value • What is the effect of such an assumption? • Adjust the likelihood • d is the length of the interval
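The adjusted likelihood itself did not survive on the slide. One common way to handle an observation recorded only to precision d is to replace the density by the probability of the whole recording interval; the sketch below shows that interval-censored form for a GPD. It is an assumption about the general shape of such an adjustment, not necessarily the authors' exact formula, and the function names and parameter values are hypothetical.

```python
import math

def gpd_cdf(y, sigma, xi):
    """CDF of the GPD for an excess y over the threshold."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        return 1.0  # beyond the upper end point (xi < 0)
    return 1.0 - base ** (-1.0 / xi)

def interval_loglik(z_obs, u, sigma, xi, d):
    """Log-probability that the excess falls in the recording interval of length d
    centered on the observed value z_obs (assumed form, see lead-in)."""
    lo = max(z_obs - d / 2.0 - u, 0.0)  # excesses cannot be negative
    hi = z_obs + d / 2.0 - u
    p = gpd_cdf(hi, sigma, xi) - gpd_cdf(lo, sigma, xi)
    return math.log(p) if p > 0.0 else float("-inf")

# Same observation under high (0.01 in) and low (0.1 in) recording precision.
print(interval_loglik(1.0, 0.55, 0.4, 0.1, 0.01))
print(interval_loglik(1.0, 0.55, 0.4, 0.1, 0.1))
```

As d shrinks, the interval probability approaches density × d, so the high-precision and low-precision records enter the likelihood on a comparable footing.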
How to choose the threshold u? • Bias-variance trade-off • If u is large, the distribution of exceedances is close to a GPD (low bias) • If u is large, fewer observations can be used (high variance) • The threshold is taken as 0.55 inches • A threshold sensitivity analysis of model runs indicates that the shape parameter is more consistently estimated above this threshold • 7789 exceedances (2% of the original data)
Residual dependence • Assumption • the precipitation observations are conditionally independent spatially and temporally given the stations’ parameters • the spatial dependence is accounted for in the stations’ parameters • This conditional independence may not be true, though.
Temporal independence • Temporal dependence • When dependence is short range and extremes do not occur in clusters, the maximum still converges in distribution to a GEV • If a station had consecutive days that exceeded the threshold, we declustered the data by keeping only the highest measurement • Declustering did not change the results much
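The declustering rule described above (keep only the highest measurement in each run of consecutive exceedance days) can be sketched as follows. The function name and the toy data are hypothetical; day indices are assumed to be sorted integers, so a missing day simply ends a run.

```python
def decluster(days, values, u):
    """Return one (day, value) peak per run of consecutive days exceeding u."""
    peaks = []
    current = None   # best (day, value) seen in the current run
    last_day = None  # day index of the most recent exceedance
    for day, val in zip(days, values):
        if val <= u:
            continue  # not an exceedance
        if last_day is not None and day == last_day + 1:
            # Same run: keep the larger measurement.
            if val > current[1]:
                current = (day, val)
        else:
            # A new run starts: store the previous run's peak, if any.
            if current is not None:
                peaks.append(current)
            current = (day, val)
        last_day = day
    if current is not None:
        peaks.append(current)
    return peaks

days = [1, 2, 3, 4, 5, 6, 7]
rain = [0.2, 0.6, 0.9, 0.7, 0.1, 0.8, 0.3]
print(decluster(days, rain, u=0.55))
```

Here days 2 to 4 form one run and day 6 forms another, so two peaks are kept.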
Spatial dependence • The authors tested for spatial dependence in the annual maximum residuals of the stations • there was a low level of dependence between stations within 24 km (15 miles) of one another and no detectable dependence beyond this distance. • there are very few stations within this distance that record data for the same time period
Seasonal effects • Restricting our analysis to the nonwinter months reduces seasonality • inspecting the data from several sites showed no obvious seasonal effect
Model for Threshold Exceedance • Hierarchical model • Layer 1: data at each station • Layer 2: the latent process that drives the climatological extreme precipitation for the region • Layer 3: the prior distributions of the parameters that control the latent process
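Data layer for return level • Exceedances follow a GPD • Reparametrization: $\phi = \log \sigma_u$, so that $\phi$ can take any real value • Let $Z_k(x_i)$ be the kth recorded precipitation amount at location $x_i$; given that it exceeds $u$, the excess $z = Z_k(x_i) - u$ has density $\frac{1}{\exp \phi(x_i)} \left( 1 + \frac{\xi(x_i) z}{\exp \phi(x_i)} \right)^{-1/\xi(x_i) - 1}$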
Process layer • A structure that relates the parameters of the data layer to the orography and climatology of the region • Move from spatial (longitude/latitude) space to climate (elevation/MSP) space • Stations are sparse in the spatial space • Stations far apart spatially can be close in the climate space • MSP: mean precipitation
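Scale parameter • $\phi(x) = \log \sigma_u(x)$: a Gaussian process with mean $\mu_\phi(x)$, a function of the covariates, and covariance $k_\phi(x, x') = \beta_{\phi,0} \exp(-\beta_{\phi,1} \lVert x - x' \rVert)$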
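A Gaussian process prior is fixed by its mean and covariance functions. The sketch below builds a covariance matrix for a few stations placed in climate space, assuming an exponential covariance with sill `beta0` and decay `beta1`. The station coordinates (elevation and MSP in rescaled units) and the parameter values are hypothetical.

```python
import math

def exp_cov_matrix(coords, beta0, beta1):
    """Exponential covariance: K[i][j] = beta0 * exp(-beta1 * ||x_i - x_j||)."""
    n = len(coords)
    return [
        [beta0 * math.exp(-beta1 * math.dist(coords[i], coords[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical stations as (elevation, MSP) pairs in rescaled climate-space units.
stations = [(1.5, 0.30), (1.6, 0.30), (2.8, 0.50)]
K = exp_cov_matrix(stations, beta0=0.02, beta1=2.0)
for row in K:
    print([round(v, 5) for v in row])
```

Stations that are close in climate space get a covariance near `beta0`, regardless of how far apart they are in longitude and latitude, which is the point of working in climate coordinates.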
Shape parameter • Candidate models • A single value for the entire study region with a Unif($-\infty$, $\infty$) prior • Two values: one for the mountain stations and one for the plain stations • A Gaussian process with a structure similar to that of the scale parameter
Priors of the process-layer parameters • Prior independence • Regression parameters: noninformative • Spatial (covariance) parameters • Noninformative priors lead to an improper posterior • Informative priors are derived from MLEs • Shape parameter
Model for Exceedance Rate • To compute the return level, we need both the GPD parameters and the exceedance rate • Assume each station’s number of exceedances is binomial with probability parameter $\zeta_u(x)$, the rate at which the threshold $u$ is exceeded at location $x$ • Logit transformation of $\zeta_u(x)$ • Model the logit-transformed parameter as a Gaussian process • Similar prior specification
MCMC • Metropolis within Gibbs • Proposal distributions are obtained from a normal approximation or a random walk • Three parallel chains • Each chain has 20,000 iterations • 2,000 burn-in steps • Test for convergence: Gelman-Rubin statistic $\hat{R}$ < 1.05 • Draws are used to perform spatial interpolation and inference
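The convergence check on parallel chains can be sketched with the basic Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variance. This is the textbook form of the statistic for a single scalar parameter, assumed here for illustration; the simulated chains are stand-ins, not output of the model.

```python
import math
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    W = mean([variance(c) for c in chains])  # average within-chain variance
    B = n * variance(chain_means)            # between-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of the posterior variance
    return math.sqrt(var_plus / W)

rng = random.Random(0)
# Three chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three chains stuck around different values (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 3.0, 6.0)]

print(gelman_rubin(mixed) < 1.05, gelman_rubin(stuck) < 1.05)
```

In practice the statistic is computed per parameter after discarding the burn-in draws, and all parameters should fall below the chosen cutoff.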
Point estimate for 25-year return level for daily precipitation
Sensitivity analysis • Sensitivity of the inference to the prior of the covariance decay parameter $\beta_1$ • Ran Model 7 with an alternative prior • Original prior for $\beta_1$: Unif[6/7, 12] • Alternative prior for $\beta_1$: Unif[0.214, 6] • The posterior of $\beta_1$ is sensitive to the prior • But the product $\beta_0 \beta_1$ (sill times decay) is less sensitive, and it is what matters for interpolation
Conclusions • A Bayesian analysis for spatial extremes • Model for exceedances • Model for threshold exceedance rate parameter • By performing the spatial analysis on locations defined by climatological coordinates, the authors were able to better model regional differences for this geographically diverse study area. • Produce a map of return levels with features not well shown by the 1973 atlas • an east–west region of higher return levels north of the Palmer Divide • a region of lower return levels around Greeley • region-wide uncertainty measures