450 likes | 610 Views
A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002. World Cup Algorithm.
E N D
A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002
World Cup Algorithm 1958 1962 1966 1970 1974 1978 1982 1986 1990 1994 1998 2002 One-time only home win Europe 25 miles apart Played in Europe Played in exotic place Played in (L) America Played in Europe Played in L. America Center of the world football ? 2006
A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani Joint work with Marina S. Paez (IM-UFRJ) Flavia Landim (IM-UFRJ) Victor de Oliveira (Arkansas) Alan Gelfand (Connecticut) Sudipto Banerjee (Minnesota) 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002
Introduction Environmental science – data in the form of a collection of time series that are geographically referenced. Some examples can be found in other areas • Examples: • 1) measurements of pollutants in time over a set of monitoring stations 2) selling price of properties around a neighborhood of interest 3) counts of morbidity/mortality events in time over a collection of geographic regions
Main Objective: spatial interpolation Example: Pollution in Rio de Janeiro Paez, M.S. and Gamerman, D. (2001). Technical report. Statistical Laboratory, UFRJ.
Picture showed mean interpolated values Other features of interest can be obtained Example: Pollution in Rio de JaneiroProb ( PM10 > 100 mg/m3 | Yobs )
SpatialInterpolation • m = number of observations • g = number of grid points • s1, ... ,sm = observed sites • s1n,...,sgn = grid points (to interpolate) • Y1n,...,Ygn = observations in the grid points
Interpolation • 1. Frequentist inference: generate Yn from • 2. Bayesian inference: i ) generate y from • ii ) generate Yn from • y- all model parameters • Ymis - missing data, treated as parameters • If y = y* with probability 1 then • Obtain P(Yn|Yobs)by simulation. • Steps to generate from Yn|Yobs :
GaussianProcess (GP) (or Gaussian Random Field) S - region of Rp (in general, p=2) { w(s) : s S } is a GP if n, s1 , ... , sm S ( w(s1) , ... , w(sn) ) ~ Nn (, ) where = ( 1, ... , n ) with i=E[w(si)] and = (ij[w(si), w(sj)] )i,j Usual simplifications: 1) Isotropy [w(si),w(sj)]= q(hij) with hij=|si– sj| 2) Homoscedasticity i = , i Notation: w(.) ~ GP((.),,q)
Statistical Analysis Starting point: regression models Yt(s) = t(s) + e t(s) where t(s) = 0 + 1 X t1(s) + ... + pXtp(s) and et(s) ~ N(0, e2) independent Suppose that Xtj(s) handles temporal autocorrelation Otherwise, we can include a temporal component t Usually et(s) remains spatially correlated In this case, et(s) = e0(s) + et1(s) e0(s) errors spatially correlated et1(s) pure residual (white noise) 0(s) = 0 + e0(s)
How to estimate 0(s) ? (c) Inference based on Traditional approach: geostatistical 0(.) ~ GP(0,,q) or e0(.) = 0(.) 0 ~ GP(0,,q) then, 0obs ~ N(0 1, , R) 0obs = (0(s1) , ... , 0(sm) ) Hiperparameters: e2, 2 and 0 Inference 1. At first (3 steps) (a) 0 , 1 , ... , p estimated in the regression model and the residuals rt0(s) = Yt (s) mt(s) are constructed (b) e2, 2 and 0estimated from rt0(s)
Problems: (a) rt0(s) et (s) (b) • 2) next step: • 0 , 1 , ... , p and estimated jointly • solves (a) • but to incorporate uncertainty about is complicated • 3) Natural solution (Kitanidis, 1986; Handcock & Stein, 1993): • specify distribution for 0 • perform Bayesian inference
Model generalization Recall: E[Yt(s)]=0(s) + 1Xt1(s) + ... + pXtp(s) Spatial heterogeneity doesn’t have to be restricted to 0 Example: site by site effect of temperature in the Rio pollution data
We can accommodate spatial variation for other coefficients j, j=1, ... , p. a) prior independence b) same spatial structure and prior correlation between the j(.)´s previous model E [Yt(s)] = 0(s) + 1 Xt1(s) + ... + pXtp(s) Extension of the previous model E [Yt(s)] = 0(s) + 1(s)Xt1(s) + ... + p(s)Xtp(s) (.) = (0(.), 1(.),..., p(.)) One possibility: (.) ~ GP(, , ) Hyperparameters: Y = (g , se , S , q0), where g = (g0, g1,..., gp) Special cases for the j(.)´s:
1) classical solution (Oehlert, 1993; Solna & Switzer, 1996): (a) 0 (s), 1 (s), ... , p (s) estimated by b0(s), b1(s), ... , bp (s) in the local regression model (b) estimated from the bj(s) (c) inference based on Problems (the same as before): (a) bj(s) j(s) (b) How to estimate j(s), j=0,1,...,p ? 2) natural solutions: Specify prior distribution for Y In general, independent and non informative priors are used
Model Summary • Parameters: obs , where = ( ,e2, ,) • jobs = (j(s1) , ... , j(sm) ), j=0, 1, ... , p • obs = (0obs , ... , pobs ) • = ( 0 , 1 , ... , p ) • Data: Yobs = (Y1(s1) , ... , YT(sm)) • Xobs = (X1(s1) , ... , XT(sm))
Simulated data Yt(s) = mt(s) + et(s), t=1,...,30mt(s) = b0(s)+ b1(s) Xt(s)et(s) ~ N(0, e2) independent with e2=1 b0 ~N(g0, s0 ,0(.)) b1 ~N(g1, s1 ,1(.)) Xt(s)~N(g2, s2 ,2(.)), for all time tExponential correlation functions: rj(x) = exp{- qj x} g0= 100 g1= 5 g2= 0 q0= 0.4 q1= 0.8 q2= 1.5s02= 0.1 s12= 1 s22= 0.333
Simulated Data b0 Y b1X e + = +
Inference Parameters: (obs ,) = ( ,e2, ,) Likelihood: L(obs ,) = p(Yobs | obs , e2 ) Prior: p(obs ,)=p( obs | Y) p()p(e2) p(S) p(q) Posterior: (obs ,) L (obs ,) p(obs ,) • Many parameters • Complicated functional form • Solution by MCMC
Full Conditionals (a) [ obs | rest ] ~ Normal (b) [ | rest] ~ Normal use jobs as if they were data (c) [ e2 | rest ] ~ [ e2 | Yobs , obs ] ~ Inverse Gamma (d) [ |rest ] ~ [ | bobs , , q] ~Inverse Wishart (e) [ | rest ] ~ j p(j | jobs , j,) p() again, use jobs as data (geostatistical analysis) hard to sample Metropolis - Hastings
Results (based on a regular grid of m=25 sites) Histogram of the parameters g0 g1 q0 q1 te t0 t1 ti = si-2
Spatial Interpolation Interpolation grid: s1n , ... , sgn jn = (j(s1n) , ... , j(sgn) ), j=0, 1, ... , p n = (0n , ... , pn ) We need to obtain the interpolation of j´s to interpolate Yn
Spatial Interpolation of ´s (n,obs,| Yobs) = ( n | obs, , Yobs) ( obs, | Yobs) = ( n | obs ,) ( obs, | Yobs) Simulation of [ n | Yobs ] in 2 steps: (a)[ obs, | Yobs ] using MCMC (b)[ n | obs ,] using Multivariate Normal • Interpolation of Y´s (Yn,n,| Yobs) = (Yn|n, , Yobs) (n,| Yobs) = (Yn| n ,) (n,| Yobs) Simulation of [Yn |Yobs] also in 2 steps: (a) [ n, | Yobs ] MCMC and Spatial Interpolation (b) [ Yn| n ,] using Multivariate Normal
Simulated data: Interpolation of b1 Simulated values Interpolated values
Simulated data: Interpolation of Y30(.) Simulated values Interpolated values
Interpolation of X´s These interpolations assume that the interpolated covariates Xj are available for j=1, ... , p Otherwise, we must interpolate them
Model may be completed with X(.) | x , x ,Sx ~ GP(x, Sx,x(.)) (Xn, x | Yobs , Xobs) = (Xn , x| Xobs ) = (Xn| x, Xobs) (x | Xobs ) Simulation of [Xn|Yobs,Xobs] in 2 steps: (a) [x | Xobs ] MCMC (b) [Xn| x, Xobs ] using Multivariate Normal
Histogram of the parameters g0 g1 te q0 q1 q2 t2 t0 t1 Results obtained by interpolating X Precisions less sparse then when X is known
Interpolation of X30(.) Simulated values Interpolated values
Interpolation of Y30(.) Known X Unknown X
Now, the temperature coefficient varies in space Yt(s)= b0 (s)+ b1(s)TEMPt + b´ Xt+ et(s) Application to the pollution data Yt (s) = square root of PM10 at site s and time tXt= (MON, TUE, WED, THU, FRI, SAT) et(s) independents N(0,se2)b0~N(g0, s0 ,0(.))b1~N(g1, s1 ,1(.))ri(.), i=0,1 are exponential correlation functions
Histogram of the hiperparameters sample b0 te q0 t0 t1 q1 where ti = si-2 Results for the pollution data in Rio
G(10-3,10-3) G(10,10c) SSDE = 0.1444 SSDE = 0.0637 where SSDE = Interpolation of the b1 coefficient Prior for t c obtained by exploratory analysis site by site (OLS) Same idea can be used for q (explanatory geostatistical analysis)
Extension of the previous model previous model Yt(s)= t(s) + et(s) where t(s)=t0(s)+t1(s)Xt1(s)+...+tp(s)Xtp(s) et(s) ~ N(0, e2) independent Yt(s)= t(s) + et(s) where t(s)= 0(s)+ 1(s)Xt1(s)+...+ p(s)Xtp(s) et(s) ~ N(0, e2) independent We can also accommodate temporal variation of the coefficients j, j=0,...,p. Natural specification t(.) | g t ~ GP(t ,S,), independent in time The model must be completed with: (a) prior for (se , S , q) as before (b) specification of the temporal evolution of the t´s
Suggestion - use dynamic models (SVP/TVM) • (Landim & Gamerman, 2000) • t | t-1 ~ N( Gt t-1 , Wt ) • unknown parameters of the evolution Model parameters: obs , , = ( 0 , e2 , S , , W ) where = ( 1, ... , T) and t= ( t0 , t1, ... , tp ), t=1, ... , T Simulation cycle has 2 changes: I) additional step to II) modified step to
Histogram of the posterior of q Application to simulated data Multivariate observations: Yt (s) = (Yt1 (s), Yt2 (s)) Yt (s) = bt0 (s) + bt1(s)Xt1(s) + et(s) bt(.) ~GP (gt, S,) gt = gt-1 + wt same spatial correlation to b0 and b1 r(.) - exponential correlation function with q=1.
Interpolation Samples from ytn|yobs are obtained through the algorithm below: 1. Sample from btobs, y| yobs - through MCMC 2. Sample from btn| btobs, y - through Gaussian process 3. Sample from ytn| btn, y - Independent Normal draws Once again, Xtn must be known, otherwise, they will have to be interpolated.
Spatially- and time-varying parameters (STVP) SVP/TVM: STVP: Another possibility: temporal evolution applied directly to the bt processes rather than to their means Yt (s) = bt0 (s) + bt1(s)Xt1(s) + et(s) bt(.) = bt-1(.) + wt(.) wt(.) ~ GP (gt, s,) independent in time Completed with: b0(.) = g0 ~ N(g0,R) Marginal Prior: bt(.)| gt ~ GP (gt, tS,) Not separable at the latent level, unlike SVP/TVM
Computations MCMC algorithm must explore the correlation structures parameters are visited in blocks (Landim and Gamerman, 2000; Fruhwirth-Schnatter, 1994) • Prediction Based on the forecast distribution of YT+h|Yobs, for YT+h = (YT+h(s1f ),..., YT+h(sFf )), and any collection (s1f,..., sFf) 1. Sample from bTobs, y| Yobs - through MCMC 2. Sample from bT+hf| bTobs, Y - obtained by introduction of bT+hobs bTobs bT+hobs by successive evolution of the b process bT+hobs bT+hfby interpolation with gaussian process 3. Sample from Ytn| bTn, Y - Independent normal draws
Time-varying locations SVP/TVM: Easily adapted STVP: requires introduction of for updated locations Assume locations st = (st1,..., stnt) at time t btobs is a nt-dimensional vector, t = 1,...,T Both densities in the integrand are multivariate normal The convolution of these two densities can be shown to be normal and required evolution equation for b can be obtained
Can be normalized after suitable transformation gl(y) Example: Rio pollution data Non-Gaussian Observations Two distinct types of non-normality data: • Continuous: l estimated jointly with other model parameters (de Oliveira, Kedem and Short, 1997) • Count data: For example, in the bernoulli or poisson form standard approach: yt(s) ~ EF(mt(s)) spatio-temporal modeling issues: similar computations: harder
Non-Gaussian Evolution Abrupt changes in the process normality is not suitable Robust alternative: wt(.) ~ GP(gt,S,rq) is replaced by wt(.)| lt ~ GP(gt,lt-1S, rq) and lt ~ G(nt, nt), independent for t=1,...,T Therefore, wt(.) ~ tP(gt,S,rq) lt’s control the magnitude of the evolution
Final Comments • More flexibility to accommodate variations in time and space. • Static coefficient models: samples from the posterior were generated in the software BUGS, with interpolation made in FORTRAN • Extensions to observations in the exponential family and estimation of the normalizing transformation. • Extension to accommodate anisotropic processes to some components of the model.
A Latent Approach to the Statistical Analysis of Space-time Data Dani Gamerman Instituto de Matemática Universidade Federal do Rio de Janeiro Brasil http://acd.ufrj.br/~dani 17th International Workshop on Statistical Modelling Chania, Crete, Greece, 8-12 July 2002