ES 470: SAMPLING AND ANALYSIS OF HYDROLOGICAL DATA
Manoj K. Shukla, Ph.D., Assistant Professor, Environmental Soil Physics
February 09, 2006 (W147, 3-5 PM)
J. H. Dane and G. C. Topp (Editors), Methods of Soil Analysis, Part 4: Physical Methods. ES-470
Scales of Variability: Molecules; Particles or Pores; Aggregate; Column or Horizon; Field or Watershed; Regional; Pedosphere
Variability
• Spatial: variability with increasing distance (space) from a location
• Temporal: variability with increasing duration (time)
We will limit our discussion to the field scale.
Agricultural Field?
• In situ soil exhibits a large degree of variability, or heterogeneity
• Changes in soil type need to be accounted for in composite sampling
• The composite sample must maintain the heterogeneity of the in situ soil
Sources
• Intrinsic factors: soil-forming factors, time, soil texture, mineralogy, pedogenesis (geological, hydrological, and biological factors). The intrinsic variables have a distinct component that can be called regionalized, i.e., it varies in space, with nearby areas tending to be alike.
• Extrinsic factors: land use and management, fertilizer application, other amendments, drainage, tillage
Structure of Variability
• Random sampling is done to ensure that estimates are unbiased and to meet the criterion of independent sampling under identical conditions
Yi = μ + εi
where Yi is the realization of a soil attribute at location i, μ is the mean value for the spatial domain, and εi is a random error term.
An attribute (e.g., bulk density, nitrate concentration) is described through two statistical parameters:
E[Yi] = μ (first moment, or mean)
E[(Yi − μ)²] = σ² (second moment, or variance)
E[Yi] = μ; E[(Yi − μ)²] = σ²
The mean and variance (first and second moments) are often assumed to be the parameters of a normal (Gaussian) probability distribution function, and they allow for a series of sophisticated statistical analyses.
Arithmetic mean = (x1 + x2 + x3) / 3
Geometric mean = (x1 · x2 · x3)^(1/3)
Harmonic mean = n / (1/x1 + 1/x2 + 1/x3), with n = 3
Variance (σ²) = (1/n) · ∑(xi − xm)², where xm is the arithmetic mean
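The three means and the variance can be checked numerically. The sketch below is only an illustration: the input values are hypothetical, the harmonic mean is taken in its standard form (n divided by the sum of reciprocals), and the variance uses the 1/n divisor shown on the slide.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # n-th root of the product, computed through logs (values must be > 0)
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # n divided by the sum of reciprocals
    return len(xs) / sum(1.0 / x for x in xs)

def population_variance(xs):
    # (1/n) * sum((xi - xm)^2), matching the 1/n form on the slide
    xm = arithmetic_mean(xs)
    return sum((x - xm) ** 2 for x in xs) / len(xs)

values = [1.0, 2.0, 4.0]  # hypothetical values, not course data
print(arithmetic_mean(values), geometric_mean(values),
      harmonic_mean(values), population_variance(values))
```

For positive, unequal values the ordering is always harmonic < geometric < arithmetic, which is a quick sanity check on any implementation.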
Soil N content data
• First data set: mean = 1.35 g kg⁻¹, variance = 0, so E[Yi] = 1.35
• Second data set: mean = 1.339 g kg⁻¹, variance = 0.0003, so E[Yi] = 1.339 ± (0.0003)^0.5
Normal (Gaussian) Distribution
The function is symmetric about the mean; it attains its maximum value at the mean and approaches its minimum value at plus and minus infinity.
Figure: histogram for sand content (SigmaPlot 8.0), showing a normal distribution
Figure: histogram for saturated hydraulic conductivity, showing a positively skewed distribution
Skewed distributions: positive skew and negative skew. When one tail is longer than the other, the distribution is skewed.
Different Data Structures
So in place of E[Yi] = μ, an appropriate model is
Yi = μ + β(xi) + εi
where β(xi) can be a constant or a function, in either case dependent on a spatial or temporal scale.
• Therefore, simple randomization may not be sufficient
• Stratified sampling will be better: the area is divided into sub-areas called strata
Case Study
• Formulate objectives
• Formulate hypotheses
• Design a sampling scheme
• Collect data
• Interpret data
Objective: determine the relative magnitude of statistical and spatial variability at the field scale
Sampling Design?
• Simple random
• Stratified
• Two-stage
• Cluster
• Systematic
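Three of the designs listed above can be sketched for a rectangular field. This is an illustrative sketch only: the field dimensions, the equal rectangular strata, and the function names are assumptions made for the example, not part of the course material.

```python
import random

def simple_random(n, width, height, rng):
    """n points placed uniformly at random over the whole field."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def stratified_random(n_per_stratum, width, height, nx, ny, rng):
    """Split the field into nx * ny equal rectangular strata and draw
    n_per_stratum random points inside each stratum."""
    dx, dy = width / nx, height / ny
    points = []
    for i in range(nx):
        for j in range(ny):
            for _ in range(n_per_stratum):
                x = rng.uniform(i * dx, (i + 1) * dx)
                y = rng.uniform(j * dy, (j + 1) * dy)
                points.append((x, y))
    return points

def systematic_grid(spacing, width, height, rng):
    """Regular grid with a random starting offset inside the first cell."""
    x0, y0 = rng.uniform(0, spacing), rng.uniform(0, spacing)
    xs = [x0 + k * spacing for k in range(int((width - x0) // spacing) + 1)]
    ys = [y0 + k * spacing for k in range(int((height - y0) // spacing) + 1)]
    return [(x, y) for x in xs for y in ys]

rng = random.Random(42)  # fixed seed so the layout is reproducible
print(len(simple_random(13, 100.0, 60.0, rng)))
print(len(stratified_random(2, 100.0, 60.0, 5, 3, rng)))
print(len(systematic_grid(20.0, 100.0, 60.0, rng)))
```

Stratification guarantees that every sub-area is represented, which simple random sampling does not; the systematic grid is the layout that geostatistical analysis typically relies on.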
How many samples?
Sample size for simple random sampling:
• Relative error should be smaller than a chosen limit r (requires that the standard deviation or coefficient of variation is known):
n = (u_{1−α/2} · S / (r · ȳ))²
• Absolute error should be smaller than a chosen limit d:
n = (u_{1−α/2} · S / d)²
where u_{1−α/2} is the (1 − α/2) quantile of the standard normal distribution, S is the standard deviation of y in the area, and ȳ is the mean.
• Time and resources also limit the sample size
Student's t-table: df = degrees of freedom; p = probability level
Example: N concentration data (g kg⁻¹): 1.10, 1.11, 1.12, 1.13, 1.13, 1.14, 1.16, 1.17, 1.19, 1.20, 1.23, 1.24, 1.25
• Relative error (r) = 0.01
• Mean of Y = 1.17 g kg⁻¹
• Standard deviation = 0.05 g kg⁻¹
• Alpha = 0.05
• Degrees of freedom = 13 − 1 = 12
• Student's t (table) = 1.782
Same example with a larger tolerance:
• Relative error (r) = 0.02
• Alpha = 0.10
• Degrees of freedom = 13 − 1 = 12
• Student's t (table) = 1.782
The required sample size is compared for r = 0.02 and r = 0.01.
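The sample-size calculation for the N concentration example can be sketched as follows. The sketch assumes the usual simple-random-sampling form n = (t · S / (r · ȳ))², uses the tabulated Student's t in place of the normal quantile (as the example does), and treats r as a dimensionless fraction of the mean; the function names are hypothetical.

```python
import math

def sample_size_relative(t, sd, mean, r):
    """Samples needed so the relative error of the mean stays below r."""
    return (t * sd / (r * mean)) ** 2

def sample_size_absolute(t, sd, d):
    """Samples needed so the absolute error of the mean stays below d."""
    return (t * sd / d) ** 2

# Values taken from the example slides
t_table = 1.782   # Student's t for 12 degrees of freedom
sd = 0.05         # standard deviation, g/kg
mean = 1.17       # mean N concentration, g/kg

for r in (0.02, 0.01):
    n = sample_size_relative(t_table, sd, mean, r)
    # Round up: a fractional sample cannot be collected
    print(f"r = {r}: n = {n:.1f} -> {math.ceil(n)} samples")
```

Halving the tolerated relative error roughly quadruples the required number of samples (about 15 versus 58 here), which is why time and resources enter the decision.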
Deterministic parameters: E(Yi)s = Ym, Var(Yi) = 0
Stochastic parameters (variation in properties): a mean value and an uncertainty statistic
• Variance: Var(Yi)s = σ²s
• Semivariogram function: Var[(Yi)s − (Yi+h)s] = 2γ(h)
It is always implied that:
• The domain is first- or second-order stationary
• The process is adequately characterized by a mean value and an uncertainty statistic
We will use data collected on a 20 m × 20 m grid in a field seeded to grass for the last 20 years.
Variability can be expressed by the coefficient of variation (CV):
CV = (s / x̄) × 100%, with s = sqrt(∑(x − x̄)² / (n − 1))
where:
• x = an individual value
• n = the number of test values
• x̄ = the mean of n values
Standard deviation of two independent sets:
s = sqrt(((n1 − 1) · s1² + (n2 − 1) · s2²) / (n1 + n2 − 2))
where n1 = number of values in the first set; s1 = standard deviation of the first set of values; n2 = number of values in the second set; s2 = standard deviation of the second set of values.
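A short sketch of both statistics, applied to the N concentration example data. It assumes the sample standard deviation (n − 1 divisor) and the standard pooled form for two independent sets; both choices and the function names are assumptions of this example.

```python
import math

def sample_std(xs):
    """Sample standard deviation with the n - 1 divisor (assumed here)."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

def coefficient_of_variation(xs):
    """CV in percent: standard deviation relative to the mean."""
    return 100.0 * sample_std(xs) / (sum(xs) / len(xs))

def pooled_std(n1, s1, n2, s2):
    """Standard deviation of two independent sets, weighted by their
    degrees of freedom."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

# N concentration data (g/kg) from the sample-size example
n_conc = [1.10, 1.11, 1.12, 1.13, 1.13, 1.14, 1.16,
          1.17, 1.19, 1.20, 1.23, 1.24, 1.25]
print(f"CV = {coefficient_of_variation(n_conc):.2f} %")
print(f"pooled s = {pooled_std(5, 1.0, 5, 3.0):.3f}")
```

Because CV is dimensionless, it allows properties with different units (for example infiltration rate and sand content) to be compared on one scale.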
Statistical variability of soil properties at local scale (Shukla et al. 2004)
Figure: coefficient of variation (CV) of water transmission and textural properties
• AWC: available water content (cm)
• VTP: volume of transport pores (θs − θ6) (%)
• VSP: volume of storage pores (%)
• ic: steady-state infiltration rate (cm/min)
• Ks: saturated hydraulic conductivity (cm/min)
• I: cumulative infiltration (cm)
• I5: infiltration rate at 5 min (cm/min)
Descriptive statistics (or CV) cannot discriminate between intrinsic (natural variation) and extrinsic (imposed) sources of variability.
Geostatistical analysis requires grid-based or spatial sampling, for example on a 20 m × 20 m grid.
Figure: semivariogram plotted against lag (h; m), showing the nugget (C0), partial sill (C1), and range (a). Software: ArcView, Variowin (Pannatier, 1996)
Note:
• γ increases with increasing lag or separation distance
• A small non-zero value may exist at h = 0
• This limiting value is known as the nugget variance
• It results from various sources of unexplained error, such as measurement error or variability occurring at scales too small to characterize given the available data
• At large h, many variograms have another limiting value
• This limiting value is known as the sill
• Theoretically, it is equal to the variance of the data
• The value of h where the sill occurs is known as the range
Variogram
• The most common function used in geostatistical studies to characterize spatial correlation is the variogram
• The variogram, γ(h), is defined as one-half the variance of the difference between the sample values for all points separated by the distance h:
γ(h) = ½ var[Y(x) − Y(x + h)] = ½ E{[Y(x) − Y(x + h)]²}
where var[ ] indicates variance and E{ } expected value.
The estimator for the variogram is calculated from data using
γ̂(h) = (1 / (2 N(h))) · ∑ [Y(xi) − Y(xi + h)]², summed over i = 1, ..., N(h)
where N(h) is the total number of pairs of observations separated by a distance h.
Caution: variograms can be strongly affected by outliers in the data.
Variogram Model
• A variogram model is a mathematical description of the relationship between the variance and the separation distance (or lag), h
• There are four widely used equations
Isotropic Models: linear, spherical, exponential, and Gaussian
Figure labels: sill; C0, nugget; a, range
• Linear model: does not have a sill or range, and the variance is undefined
• Spherical model: has a precisely defined sill and range
• Exponential model: b ≈ a/3; its range parameter b is 1/3 of the range a of the spherical model
• Gaussian model: b ≈ a/√3; its range parameter b is 1/√3 of the range a of the spherical model
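The four isotropic models can be written as small functions. The forms below are the commonly used textbook parameterizations and are an assumption here, since the slide equations are not reproduced in the text; c0 is the nugget, c1 the partial sill, a the range, and b the range parameter of the exponential and Gaussian models.

```python
import math

def linear(h, c0, slope):
    """Linear model: grows without bound, so it has no sill or range."""
    return c0 + slope * h

def spherical(h, c0, c1, a):
    """Spherical model: reaches the sill c0 + c1 exactly at the range a."""
    if h >= a:
        return c0 + c1
    ratio = h / a
    return c0 + c1 * (1.5 * ratio - 0.5 * ratio ** 3)

def exponential(h, c0, c1, b):
    """Exponential model: approaches the sill asymptotically."""
    return c0 + c1 * (1.0 - math.exp(-h / b))

def gaussian(h, c0, c1, b):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    return c0 + c1 * (1.0 - math.exp(-(h / b) ** 2))

# Sand content fit from the slides: nugget 3.04, sill 16.0, range 49.77 m
# (the reported sill is assumed to be the total sill)
c0, sill, a = 3.04, 16.0, 49.77
c1 = sill - c0
for h in (10.0, 25.0, 49.77, 80.0):
    print(h, round(spherical(h, c0, c1, a), 2))
```

With b = a/3 the exponential model reaches about 95% of the partial sill at h = a, and the Gaussian model does the same with b = a/√3, which is where the two conversion factors come from.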
A variogram is constructed by
• Calculating the squared difference for each pair of observations, (xj − xk)²
• Determining the distance between each pair of observations
• Averaging the squared differences for those pairs of observations with the same separation distance
If observations are evenly spaced on a transect, separation distances are multiples of the smallest distance: h1 = 2 m; h2 = 4 m; h3 = 6 m; ...
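For evenly spaced observations on a transect, the three steps above reduce to a short loop over lag multiples. This is a minimal sketch with hypothetical data; the averaged squared difference is halved, following the one-half definition of the variogram.

```python
def transect_variogram(values, spacing, max_steps):
    """Empirical variogram for evenly spaced transect data.

    Returns a list of (lag distance, gamma, number of pairs)."""
    result = []
    for k in range(1, max_steps + 1):
        # Squared differences of all pairs that are k sampling steps apart
        sq_diffs = [(values[i + k] - values[i]) ** 2
                    for i in range(len(values) - k)]
        if not sq_diffs:
            break
        gamma = sum(sq_diffs) / (2 * len(sq_diffs))
        result.append((k * spacing, gamma, len(sq_diffs)))
    return result

# Hypothetical observations taken every 2 m along a transect
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
for lag, gamma, pairs in transect_variogram(obs, spacing=2.0, max_steps=3):
    print(lag, gamma, pairs)
```

The pair count drops as the lag grows, which is why estimates at large separations become unreliable.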
When observations are placed on an irregular pattern, variograms are constructed by assigning appropriate lag intervals (a binning procedure):
• Bins are created with interval centers at distances
• h1 = (1-2) m; h2 = (2-4) m; h3 = (4-6) m; ...
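The binning procedure can be sketched for irregularly placed points. The sketch assumes equal-width distance bins (a simplification of the intervals listed above), 2-D coordinates, and hypothetical data and function names.

```python
import math

def binned_variogram(points, values, bin_width, n_bins):
    """Empirical variogram for irregularly spaced 2-D observations.

    Each pair is assigned to a distance bin; returns a list of
    (bin center, gamma or None if the bin is empty, number of pairs)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for j in range(len(points)):
        for k in range(j + 1, len(points)):
            dist = math.dist(points[j], points[k])
            b = int(dist // bin_width)
            if b < n_bins:
                sums[b] += (values[j] - values[k]) ** 2
                counts[b] += 1
    out = []
    for b in range(n_bins):
        gamma = sums[b] / (2 * counts[b]) if counts[b] else None
        out.append(((b + 0.5) * bin_width, gamma, counts[b]))
    return out

# Hypothetical sampling locations (m) and measured values
pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0]
print(binned_variogram(pts, vals, bin_width=2.0, n_bins=2))
```

Wider bins give more pairs per bin and a smoother curve, at the cost of blurring the separation distance each estimate represents.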
Important considerations when calculating a variogram:
• When the separation distance becomes too large, spurious results occur because fewer pairs of observations exist at large separations due to the finite boundary
• The width of the lag interval can affect the sample variogram through the number of samples and the variation in separation distances that fall into a particular lag interval
• Uncorrelated and correlated data show different nugget effects
• The number of data used influences the variogram
Before you start spatial analysis: check for normal distribution
• WSA: water stability of aggregates (%)
• sand: sand content (%)
• Ic: saturated hydraulic conductivity (cm/h)
Use descriptive statistics: mean, median (the middle value), skewness, etc.
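A quick screening for normality can be done with these descriptive statistics. The sketch below uses the moment-based skewness coefficient with population (1/n) moments, which is one of several conventions, and hypothetical data sets.

```python
import statistics

def skewness(xs):
    """Moment-based skewness: mean cubed standardized deviation."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)

def describe(xs):
    """Mean, median, and skewness in one dictionary."""
    return {
        "mean": statistics.fmean(xs),
        "median": statistics.median(xs),
        "skewness": skewness(xs),
    }

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical, symmetric
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]  # hypothetical, long right tail
print(describe(symmetric))
print(describe(right_skewed))
```

A mean well above the median together with positive skewness, as in the second set, is the pattern reported for saturated hydraulic conductivity; such data are often log-transformed before variogram analysis.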
Plot the data to see the structure
Figure: saturated hydraulic conductivity plotted against the X and Y coordinates
Estimator variance example: variance = 13.7; variance = 15.5; variance = 16.1
Figure panels: Sand Content; Saturated Hydraulic Conductivity
Modeling of Variogram: Sand Content
• Spherical model: SS = 0.04598, nugget = 0, range = 37.92 m, sill = 16.0
• Spherical model: SS = 0.00994, nugget = 3.04, range = 49.77 m, sill = 16.0
Modeling of Variogram: Saturated Hydraulic Conductivity
• Spherical model: SS = 0.0494, nugget = 0, range = 19.8 m, sill = 0.0384
• Spherical model: SS = 0.04938, nugget = 0.004, range = 19.8 m, sill = 0.0384
Parameters for spherical variogram model for soil properties
Spatial variability: nugget to total sill ratio (NSR)
• A lower NSR indicates higher spatial dependence
• NSR < 0.25: strong spatial dependence
• NSR > 0.75: weak spatial dependence
Figure: nugget to total sill ratio for water transmission and textural properties (Cambardella et al., 1994; Shukla et al. 2004)
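The ratio and its interpretation can be sketched with the fitted spherical-model parameters reported earlier. The sketch assumes the reported sill is the total sill (nugget plus partial sill) and labels the range between the two thresholds "moderate", following the usual reading of Cambardella et al. (1994); the function names are hypothetical.

```python
def nugget_sill_ratio(nugget, total_sill):
    """NSR: share of the total sill that is unexplained nugget variance."""
    return nugget / total_sill

def spatial_dependence(nsr):
    """Classify spatial dependence from the NSR thresholds 0.25 and 0.75."""
    if nsr < 0.25:
        return "strong"
    if nsr <= 0.75:
        return "moderate"
    return "weak"

# Fitted models with a non-zero nugget, taken from the modeling slides
fits = {
    "sand content": (3.04, 16.0),
    "saturated hydraulic conductivity": (0.004, 0.0384),
}
for name, (nugget, sill) in fits.items():
    nsr = nugget_sill_ratio(nugget, sill)
    print(f"{name}: NSR = {nsr:.2f} ({spatial_dependence(nsr)})")
```

Both properties fall below 0.25 under these assumptions, so most of their variance is spatially structured rather than nugget noise.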