Statistics in WR: Session 20

Statistics in WR: Session 20 Introduction to Spatial Statistics Ernest To

Outline
• Basics of spatial statistics
• Kriging
• Application of spatial-temporal statistics (gravity currents in CCBay)
Ernest To 20090408

Basics

Consider the following scenario
• Two river stations, A and B, measure dissolved oxygen (DO).
• At station A
  • mean DO = µA = 5 mg/L
  • std dev at station A = σA = 2 mg/L
• At station B
  • mean DO = µB = 5 mg/L
  • std dev at station B = σB = 2 mg/L
• Correlation between measurements at stations A and B = ρAB = 0.5.

New data!
• We collected a DO measurement of 2 mg/L at Station A.
• What are the updated mean (µB|XA) and standard deviation (σB|XA) at Station B?
• (Assume that the DO distributions are normal.)
• Station A: µA = 5 mg/L, σA = 2 mg/L; new sample XA = 2 mg/L
• Station B: µB = 5 mg/L, σB = 2 mg/L; µB|XA = ?, σB|XA = ?

Let’s sketch out the distributions
• Distributions at A and B (assume normal): µA = 5 mg/L, σA = 2 mg/L; µB = 5 mg/L, σB = 2 mg/L
• Joint distribution at A and B
[Figure: marginal pdfs f(xA) and f(xB), and the joint pdf f(xA,xB) over the XA-XB plane]

Marginal and joint distributions
• µA = 5 mg/L, σA = 2 mg/L
• µB = 5 mg/L, σB = 2 mg/L
[Figure: joint pdf f(xA,xB) with the marginal pdfs f(xA) and f(xB) drawn along the XA and XB axes]

How does ρAB affect the shape of the joint distribution?
[Figure: scatter plots of XA vs XB and the joint distribution f(xA,xB) of XB and XA, for ρAB = 0.99, ρAB = -0.99, ρAB = 0.5 and ρAB = 0]
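The effect of ρAB on the scatter can be reproduced numerically. The sketch below is only an illustration and is not taken from the slides: it draws pairs (XA, XB) from a bivariate normal with the means and standard deviations of the scenario, using the standard library only. The helper names `sample_pairs` and `sample_correlation`, the sample size and the seed are all assumptions made for this example.

```python
import math
import random

def sample_pairs(rho, n, mu=5.0, sigma=2.0, seed=0):
    """Draw n pairs (x_a, x_b) from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x_a = mu + sigma * z1
        # Mixing z1 into x_b is what induces the correlation rho.
        x_b = mu + sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x_a, x_b))
    return pairs

def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

for rho in (0.99, -0.99, 0.5, 0.0):
    r = sample_correlation(sample_pairs(rho, 20000))
    print(f"target rho = {rho:+.2f}, sample correlation = {r:+.3f}")
```

Plotting the pairs for each ρ gives the narrow, tilted clouds for |ρ| near 1 and the round cloud for ρ = 0.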

Bayesian conditioning
• Prior stage: the prior pdf is the joint distribution of XA and XB.
• Conditionalization stage: observed data (xA = 2 mg/L) is used to update the distribution.
• Posterior stage: a conditional pdf for XB is generated.
[Figure: prior pdf over the XA-XB plane, sliced at xA = 2 mg/L to give the conditional pdf]

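Conditional pdf
If the prior pdf is binormal, the conditional pdf of XB given XA = xA is also normal with:
Mean: $\mu_{B|X_A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A)$
Variance: $\sigma^2_{B|X_A} = \sigma_B^2\,(1 - \rho_{AB}^2)$
• The expected value of the conditional pdf is a linear function of the conditioning data.
• The variance does not depend on the observed value xA (homoscedasticity).
[Figure: prior pdf sliced at xA = 2 mg/L, with the conditional pdf of XB|XA]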

Back to the problem
Updated mean and std. dev at Station B, given µA = µB = 5 mg/L, σA = σB = 2 mg/L, ρAB = 0.5 and the new sample XA = 2 mg/L:
• Mean: µB|XA = 5 + 0.5 × (2/2) × (2 - 5) = 3.5 mg/L
• Std. dev: σB|XA = 2 × √(1 - 0.5²) ≈ 1.7 mg/L
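The update can be checked numerically. This is a minimal sketch, assuming the standard bivariate-normal conditioning formulas (the mean shifts by ρAB·σB/σA·(xA - µA) and the variance shrinks by the factor 1 - ρAB²); the function name is illustrative only.

```python
import math

def condition_bivariate_normal(mu_a, sigma_a, mu_b, sigma_b, rho, x_a):
    """Return (mean, std dev) of X_B given X_A = x_a for a bivariate normal prior."""
    cond_mean = mu_b + rho * (sigma_b / sigma_a) * (x_a - mu_a)
    cond_std = sigma_b * math.sqrt(1.0 - rho ** 2)
    return cond_mean, cond_std

# Values from the two-station scenario: both stations 5 +/- 2 mg/L, rho = 0.5,
# and a new measurement of 2 mg/L at Station A.
mean_b, std_b = condition_bivariate_normal(5.0, 2.0, 5.0, 2.0, 0.5, 2.0)
print(f"{mean_b:.1f} mg/L, {std_b:.1f} mg/L")  # 3.5 mg/L, 1.7 mg/L
```

Setting `rho` to 0 leaves Station B at its prior (5 mg/L, 2 mg/L), which is a quick sanity check on the formulas.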

Can we do the same for any two points on the river?
Yes we can, but under the following conditions:
• Normality
• 2nd order stationarity:
  • Mean does not change with location
  • Variance does not change with location
• Know the mean and variance (here µ = 5 mg/L, σ = 2 mg/L).
• Have a function that determines the correlation between two locations.

Modeling correlation
In spatial statistics, correlation is modeled as a function of the separation distance between two points:
$\rho = f(h)$
where h = separation distance (aka lag).
Most of the time, correlation decreases with distance. (Things that are closer together tend to be more correlated with each other.)

Estimating correlation model from data
Imagine the case where we have a smattering of data along an axis. Any given pair of data points, i and j, with measured values Zi and Zj, will have two properties:
1. The semivariance, $\gamma_{ij} = \tfrac{1}{2}(Z_i - Z_j)^2$
2. The separation distance, $h_{ij}$

Estimating correlation model from data
We can plot the semivariance, γ, of all possible pairs against the lag, h. This gives us a semivariogram (often simply called a variogram).
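The pairwise calculation is easy to spell out in code. A minimal sketch follows, assuming data along a single axis; the positions and values are hypothetical and `variogram_cloud` is a name invented for this example.

```python
from itertools import combinations

def variogram_cloud(positions, values):
    """Return one (lag, semivariance) tuple for every pair of data points."""
    cloud = []
    for i, j in combinations(range(len(positions)), 2):
        lag = abs(positions[i] - positions[j])
        semivariance = 0.5 * (values[i] - values[j]) ** 2
        cloud.append((lag, semivariance))
    return cloud

# Hypothetical data: four points along an axis and their measured values.
positions = [0.0, 1.0, 3.0, 6.0]
values = [5.0, 4.0, 6.0, 2.0]
cloud = variogram_cloud(positions, values)
for lag, semivariance in sorted(cloud):
    print(f"h = {lag:.1f}, gamma = {semivariance:.2f}")
```

With n data points there are n(n - 1)/2 pairs, so the cloud grows quickly; in practice the pairs are usually grouped into lag bins and averaged before a model is fitted.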


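Estimating correlation model from data
We can fit a curve through the semivariogram to model the semivariance as a function of the lag. This is the variogram model. The sill is the level at which the model flattens out, and the range is the lag at which it reaches that level.
[Figure: fitted variogram model with the sill and range marked]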
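Fitting can be sketched with a simple grid search. The slides do not say which model or fitting method was used here, so everything below is an assumption: a spherical model with no nugget, a least-squares criterion, hypothetical binned semivariances, and hand-picked candidate values for the sill and range. A real analysis would more likely use a proper optimizer.

```python
def spherical(h, sill, rng):
    """Spherical variogram model without a nugget."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

def fit_spherical(lags, gammas, sills, ranges):
    """Pick the (sill, range) candidate pair with the smallest squared error."""
    best = None
    for sill in sills:
        for rng in ranges:
            sse = sum((spherical(h, sill, rng) - g) ** 2
                      for h, g in zip(lags, gammas))
            if best is None or sse < best[0]:
                best = (sse, sill, rng)
    return best[1], best[2]

# Hypothetical binned semivariogram values (lag, average semivariance).
lags = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
gammas = [1.1, 2.3, 3.0, 3.7, 4.1, 3.9]
sill, rng = fit_spherical(lags, gammas,
                          sills=[3.0, 3.5, 4.0, 4.5, 5.0],
                          ranges=[3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(f"fitted sill = {sill}, fitted range = {rng}")
```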

Estimating correlation model from data
Assuming that mean and variance do not change with location (assumption of stationarity), the variogram model is related to the covariance model by the equation:
$C(h) = \sigma^2 - \gamma(h)$
where σ² is the variance.

Estimating correlation model from data
Assuming that variance does not change with location (assumption of stationarity), the correlation model is related to the covariance model by the equation:
$\rho(h) = \dfrac{C(h)}{\sigma^2}$
[Figure: correlation model ρ(h) plotted against lag, with ρ axis ticks from 1 down to 0.2]
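Under stationarity the two conversions are one-liners: covariance is the variance minus the semivariance, and correlation is covariance divided by the variance. The sketch below uses the 2 mg/L standard deviation from the river example; the semivariance values are hypothetical.

```python
def covariance_from_variogram(gamma_h, variance):
    """C(h) = sigma^2 - gamma(h), valid under second-order stationarity."""
    return variance - gamma_h

def correlation_from_covariance(cov_h, variance):
    """rho(h) = C(h) / sigma^2."""
    return cov_h / variance

variance = 2.0 ** 2  # sigma = 2 mg/L
for gamma_h in (0.0, 1.0, 2.0, 4.0):  # hypothetical semivariances in (mg/L)^2
    cov_h = covariance_from_variogram(gamma_h, variance)
    rho_h = correlation_from_covariance(cov_h, variance)
    print(f"gamma = {gamma_h}, C = {cov_h}, rho = {rho_h}")
```

A semivariance equal to the variance (the sill) corresponds to zero correlation, and a semivariance of zero corresponds to perfect correlation.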

How does the correlation model affect the estimation?
[Figure: for ρAB = 0, ρAB = 0.5 and ρAB = 0.99, scatter plots of XA vs XB, the joint distribution f(xA,xB) of XA and XB, and the conditional distribution of XB|XA; an arrow is labeled "Increasing h"]

Kriging

Multivariable case
What if we have more than one location that provides conditioning data? (Assume distributions are STILL normal at all locations.)
• At stations A1, A2, A3, A4
  • µA1 = µA2 = µA3 = µA4 = 5 mg/L
  • σA1 = σA2 = σA3 = σA4 = 2 mg/L
• At station B
  • mean DO = µB = 5 mg/L
  • std dev at station B = σB = 2 mg/L
• ρ = f(h) = 0.0125h² - 0.225h + 1

Modeling correlation
ρ = f(h) = 0.0125h² - 0.225h + 1
[Figure: stations B, A4, A3, A2, A1 along the river, each 2 units apart; distance along river in hundred meters]
From correlation model:
• ρA1B = 0.0, ρA2B = 0.1, ρA3B = 0.3, ρA4B = 0.6
• ρA1A2 = 0.6, ρA1A3 = 0.3, ρA1A4 = 0.1, ρA2A3 = 0.6, ρA2A4 = 0.3, ρA3A4 = 0.6
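The listed correlations follow directly from the model once the lags are known. In the sketch below the coordinates are an assumption read off the figure labels (B at the origin, stations 2 units apart, units of hundred meters), and the variable names are illustrative.

```python
def rho(h):
    """Correlation model from the slide; h is the lag in hundreds of meters."""
    return 0.0125 * h ** 2 - 0.225 * h + 1.0

# Assumed coordinates (hundreds of meters): B at the origin, stations 2 apart.
position = {"B": 0.0, "A4": 2.0, "A3": 4.0, "A2": 6.0, "A1": 8.0}

def correlation(p, q):
    """Correlation between two named stations, from their separation."""
    return rho(abs(position[p] - position[q]))

stations = ["A1", "A2", "A3", "A4"]
data_to_data = [[round(correlation(p, q), 1) for q in stations] for p in stations]
data_to_b = [round(correlation(p, "B"), 1) for p in stations]

for row in data_to_data:
    print(row)
print(data_to_b)
```

The quadratic reaches zero at h = 8 and dips slightly below zero just beyond it, so this particular model should only be used for lags between 0 and 8 (hundred meters).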

Dealing with multiple variables
Divide locations into two groups:
• The vector, $\mathbf{X}_A = [X_{A1}, X_{A2}, X_{A3}, X_{A4}]^T$, representing the set of random variables at the locations contributing the conditioning data.
• The variable, $X_B$, representing the random variable at the point of estimation.

Concept
1. If individual distributions are normal, the joint pdf is taken to be multi-normal.
2. Group variables into two: one for points with data, one for the point of estimation.
3. Intersect pdf with conditioning data to get conditional pdf.
[Figure: prior pdf over XA1, XA2, XA3, XA4 and XB, and the resulting conditional pdf]

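Dealing with multiple variables
The updated mean and variance of the distribution at Station B are given by:
Mean: $\mu_{B|\mathbf{X}_A} = \mu_B + \mathbf{C}_{BA}\,\mathbf{C}_{AA}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_A)$
Variance: $\sigma^2_{B|\mathbf{X}_A} = \sigma_B^2 - \mathbf{C}_{BA}\,\mathbf{C}_{AA}^{-1}\,\mathbf{C}_{AB}$
Where:
• $\mathbf{C}_{AA}$ = covariance matrix among the data locations A1 to A4
• $\mathbf{C}_{AB} = \mathbf{C}_{BA}^T$ = vector of covariances between each data location and Station B
• $\mathbf{x}_A$ = vector of observed values, and $\boldsymbol{\mu}_A$ = vector of prior means, at the data locations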

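Equations in multivariable case are more generalized
• Recall the two-variable case: $\mu_{B|X_A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A)$ and $\sigma^2_{B|X_A} = \sigma_B^2\,(1 - \rho_{AB}^2)$
• Multivariable case: $\mu_{B|\mathbf{X}_A} = \mu_B + \mathbf{C}_{BA}\,\mathbf{C}_{AA}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_A)$ and $\sigma^2_{B|\mathbf{X}_A} = \sigma_B^2 - \mathbf{C}_{BA}\,\mathbf{C}_{AA}^{-1}\,\mathbf{C}_{AB}$
• The multivariable case takes into account
  • Correlation between data locations and estimated location ($\mathbf{C}_{BA}$).
  • Correlation among data locations ($\mathbf{C}_{AA}$).
• This is the most fundamental form of kriging, i.e. Simple Kriging.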

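Plug and Chug
• Recall that Cov(A,B) = ρAB σA σB
• Compute data-to-data covariance from the correlations (σ = 2 mg/L at every station, so each correlation is multiplied by 4), with rows and columns ordered A1, A2, A3, A4:
$$\mathbf{C}_{AA} = 4\begin{bmatrix} 1 & 0.6 & 0.3 & 0.1 \\ 0.6 & 1 & 0.6 & 0.3 \\ 0.3 & 0.6 & 1 & 0.6 \\ 0.1 & 0.3 & 0.6 & 1 \end{bmatrix}$$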

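Plug and Chug
• Compute data-to-estimation-point covariance from the correlations ρA1B = 0.0, ρA2B = 0.1, ρA3B = 0.3, ρA4B = 0.6 (σ = 2 mg/L at every station):
$$\mathbf{C}_{AB} = 4\begin{bmatrix} 0.0 \\ 0.1 \\ 0.3 \\ 0.6 \end{bmatrix}$$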


Plug and Chug
Weights = [λ1, λ2, λ3, …, λn]
Note: The weights attributed to each station are determined by the prior (joint distribution) among them.
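The whole plug-and-chug sequence can be sketched in a few lines. This is an illustration under stated assumptions, not the author's worked numbers: the correlations and the 5 mg/L, 2 mg/L prior come from the slides, but the four observed DO values are hypothetical because the measurements used on the slide are not in the text. A small Gaussian-elimination helper is included so the example needs only the standard library.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def simple_kriging(cov_dd, cov_de, prior_mean, prior_var, observations):
    """Return (weights, updated mean, updated variance) at the estimation point."""
    weights = solve(cov_dd, cov_de)  # lambda = C_AA^-1 C_AB
    mean = prior_mean + sum(w * (x - prior_mean)
                            for w, x in zip(weights, observations))
    variance = prior_var - sum(w * c for w, c in zip(weights, cov_de))
    return weights, mean, variance

mu, sigma = 5.0, 2.0
rho_dd = [[1.0, 0.6, 0.3, 0.1],   # correlations among A1..A4 (from the slides)
          [0.6, 1.0, 0.6, 0.3],
          [0.3, 0.6, 1.0, 0.6],
          [0.1, 0.3, 0.6, 1.0]]
rho_de = [0.0, 0.1, 0.3, 0.6]     # correlations of A1..A4 with B (from the slides)
cov_dd = [[sigma * sigma * r for r in row] for row in rho_dd]
cov_de = [sigma * sigma * r for r in rho_de]

observations = [3.0, 4.0, 4.5, 6.0]  # HYPOTHETICAL DO values at A1..A4, mg/L
weights, mean, variance = simple_kriging(cov_dd, cov_de, mu, sigma ** 2, observations)
print("weights:", [round(w, 4) for w in weights])
print("mean:", round(mean, 3), "std dev:", round(variance ** 0.5, 3))
```

The weights and the updated variance depend only on the covariances, not on the observed values, so they are fixed by the prior; only the updated mean changes when different measurements are plugged in.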


Results from Simple Kriging
The updated mean and standard deviation of the distribution at Station B are:
Mean:
Standard deviation:

Other forms of kriging
• Ordinary kriging (OK)
  • Does not require the mean to be known
  • Assumes that the mean is constant and is somewhere in the range of the conditioning data
• Universal kriging (UK)
  • Does not require the mean to be known or to be constant
  • User specifies a model for the trend in the mean; UK will then fit the model to the data
• Indicator kriging (IK)
  • Handles binary variables (0 or 1)
  • Can take care of non-normality in the data through iterative application
• Co-kriging (CK)
  • Takes into account a related secondary variable to help estimate the primary variable

Extension to 2D, 3D
• The lag can be represented by the Euclidean distance between 2 points
  • So the covariance model of the form C = f(h) can still be used
• Variables may be more correlated in one direction than the other (anisotropy)
  • A linear transformation can be applied to the distances so that the correlation distance is the same in all directions (isotropy)
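One simple version of that transformation is to rescale each coordinate axis by its own correlation range before taking the Euclidean distance. The sketch below assumes the anisotropy is aligned with the coordinate axes (no rotation), which is a simplification; the ranges and points are hypothetical and `isotropic_lag` is an illustrative name.

```python
import math

def isotropic_lag(p, q, ranges):
    """Euclidean lag between p and q after dividing each axis by its range."""
    return math.sqrt(sum(((a - b) / r) ** 2 for a, b, r in zip(p, q, ranges)))

# Hypothetical 2D case: correlation reaches 100 units along x but only 10 along y.
ranges = (100.0, 10.0)
print(isotropic_lag((0.0, 0.0), (100.0, 0.0), ranges))  # one full range along x
print(isotropic_lag((0.0, 0.0), (0.0, 10.0), ranges))   # one full range along y
print(isotropic_lag((0.0, 0.0), (30.0, 4.0), ranges))
```

After rescaling, a lag of 1 means "one correlation range away" in any direction, so a single isotropic model with unit range can be applied to the transformed lag.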

Extension to space-time
• For space and time, there is no standard space-time metric.
• A single Euclidean-style lag that treats time as one more coordinate axis is not always correct, because the temporal and spatial axes are not always orthogonal to each other.
  • Processes that happen in time usually have some dependency on processes that happen in space. (They are not independent.)
• A separate temporal lag term, τ, is usually used.
• The covariance function takes on the form: $C = f(h, \tau)$
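As an illustration of a covariance function with separate spatial and temporal lags, the sketch below uses a separable product of two exponential correlation functions. This specific form is an assumption chosen for simplicity, not the model from the slides; the factor of 3 is one common "practical range" convention, and the parameter values are hypothetical. A separable model also ignores space-time interaction, which non-separable models are designed to capture.

```python
import math

def space_time_covariance(h, tau, variance, spatial_range, temporal_range):
    """Separable covariance: variance * rho_space(h) * rho_time(tau)."""
    rho_space = math.exp(-3.0 * abs(h) / spatial_range)
    rho_time = math.exp(-3.0 * abs(tau) / temporal_range)
    return variance * rho_space * rho_time

# Hypothetical parameters: 4 (units^2) variance, 6000 m and 1 day ranges.
for h, tau in [(0.0, 0.0), (3000.0, 0.0), (0.0, 0.5), (3000.0, 0.5)]:
    c = space_time_covariance(h, tau, 4.0, 6000.0, 1.0)
    print(f"h = {h} m, tau = {tau} d, C = {c:.3f}")
```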

Application (Gravity currents in Corpus Christi Bay)

Sensors in Corpus Christi Bay
[Figure: map of TCOON stations, TCEQ stations, SERF stations, HRI stations and USGS gages around Corpus Christi Bay, Oso Bay, Laguna Madre and the Gulf of Mexico. Aerial photo from Google Earth]


Selecting a study area
[Figure: bathymetry map labeled with depressions, ridges, a channel, Oso Bay, West Laguna Madre, East Laguna Madre and three "?" marks. Legend, in m above Mean High Water Level: -5.0, -4.5, -4.0, -3.5, -2.5, -2.0, -1.5, -1.0]

Downstream of East Laguna Madre
• Plume tracking survey, July 14 to 17, 2006 (while the gravity current was on the move): Ben Hodges, University of Texas at Austin
• Water quality data, July 12 and 18, 2006 (at the birth and demise of the gravity current): Paul Montagna, Texas A&M University, Corpus Christi

Synthesis of data
Salinity profiles collected at various locations and times are synthesized into a time history of the gravity current along the direction of flow.
[Figure: salinity-versus-depth profiles at t = 0, t = 1, t = 2 and t = 3, arranged along the direction of flow]

Data Preparation
1. Salinity data from HRI are acquired using HydroGet (a GIS web service client) and combined with plume tracking data.
2. Data locations are projected onto a reference line following the general direction of flow.
• Space-time kriging is performed in 3 dimensions:
  • X = longitudinal measure (meters from origin point)
  • Y = time (days since 7/12/2006)
  • Z = elevation (meters from water surface)
[Figure: HydroGet interface; acquired data in ArcHydro II Time Series Table; HRI stations; reference line with origin at x = 0 m]
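Step 2, projecting a data location onto the reference line, amounts to finding the closest point on a polyline and measuring the distance to it along the line. The sketch below is a generic planar version of that idea, not the GIS tool used in the study; the coordinates are hypothetical and `longitudinal_measure` is an illustrative name.

```python
import math

def longitudinal_measure(point, line):
    """Distance along the polyline `line`, from its first vertex, to the
    point on the line that is closest to `point`."""
    best_dist, best_measure, walked = math.inf, 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        # Fraction along this segment of the perpendicular foot, clamped to [0, 1].
        t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        px, py = x1 + t * dx, y1 + t * dy
        dist = math.hypot(point[0] - px, point[1] - py)
        if dist < best_dist:
            best_dist, best_measure = dist, walked + t * seg_len
        walked += seg_len
    return best_measure

# Hypothetical reference line (meters) with its origin at the first vertex.
reference_line = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(longitudinal_measure((40.0, 10.0), reference_line))
print(longitudinal_measure((120.0, 50.0), reference_line))
```

The returned measure plays the role of the X coordinate in the kriging setup, with time and elevation supplying the other two axes.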


Variogram along direction of flow
Gaussian variogram model, where
• h = lag distance along direction of flow
• C0 = nugget = 2 psu²
• C1 = sill = 3.6 psu²
• a = range = 6000 m
[Figure: fitted variogram with the sill, nugget and range marked]

Variogram along depth
Gaussian variogram model, where
• h = lag distance along depth
• C0 = nugget = 0 psu²
• C1 = sill = 3.6 psu²
• a = range = 1.7 m

Variogram along time axis
Spherical variogram model, where
• h = lag along the time axis
• C0 = nugget = 0 psu²
• C1 = sill = 3 psu²
• a = range = 1 day
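The three fitted variograms can be evaluated with two small functions. The model equations themselves are not in the text, so the forms below are assumptions: the Gaussian model uses the common "practical range" convention with a factor of 3 in the exponent, and C1 is treated as the part of the sill added on top of the nugget. Only the parameter values come from the slides.

```python
import math

def gaussian_variogram(h, nugget, c1, a):
    """Assumed form: gamma(h) = C0 + C1 * (1 - exp(-3 h^2 / a^2)) for h > 0."""
    if h == 0.0:
        return 0.0
    return nugget + c1 * (1.0 - math.exp(-3.0 * (h / a) ** 2))

def spherical_variogram(h, nugget, c1, a):
    """Assumed form: gamma(h) = C0 + C1 * (1.5 h/a - 0.5 (h/a)^3), flat beyond a."""
    if h == 0.0:
        return 0.0
    if h >= a:
        return nugget + c1
    r = h / a
    return nugget + c1 * (1.5 * r - 0.5 * r ** 3)

# Parameter values from the slides (psu^2 for C0 and C1).
along_flow = lambda h: gaussian_variogram(h, 2.0, 3.6, 6000.0)   # h in meters
along_depth = lambda h: gaussian_variogram(h, 0.0, 3.6, 1.7)     # h in meters
along_time = lambda h: spherical_variogram(h, 0.0, 3.0, 1.0)     # h in days

print(f"flow,  h = 3000 m:  {along_flow(3000.0):.3f} psu^2")
print(f"depth, h = 0.85 m:  {along_depth(0.85):.3f} psu^2")
print(f"time,  h = 0.5 day: {along_time(0.5):.3f} psu^2")
```

The very different ranges (6000 m along the flow, 1.7 m in depth, 1 day in time) are what make a per-axis treatment of the lag necessary in this application.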

Interpolation results
[Figure: longitudinal profiles on 7/12/2006 18:00 and 7/13/2006 18:00. Axes: x = distance to origin point, y = time, z = elevation. Legend: 37 – 40 psu, 40 – 42 psu, 42 – 43 psu, 42 – 44 psu, 44 – 46 psu]
