Application of Statistical Techniques to Interpretation of Water Monitoring Data

Application of Statistical Techniques to Interpretation of Water Monitoring Data
Eric Smith, Golde Holtzman, and Carl Zipper

Outline
I. Water quality data: program design (CEZ, 15 min)
II. Characteristics of water-quality data (CEZ, 15 min)
III. Describing water quality (GIH, 30 min)
IV. Data analysis for making decisions
A. Compliance with numerical standards (EPS, 45 min)
Dinner Break
B. Locational / temporal comparisons (“cause and effect”) (EPS, 45 min)
C. Detection of water-quality trends (GIH, 60 min)

III. Describing water quality (GIH, 30 min) • Rivers and streams are an essential component of the biosphere • Rivers are alive • Life is characterized by variation • Statistics is the science of variation • Statistical Thinking/Statistical Perspective • Thinking in terms of variation • Thinking in terms of distribution

The present problem is multivariate • WATER QUALITY as a function of • TIME, under the influence of co-variates like • FLOW, at multiple • LOCATIONS

WQ variable versus time [Figure: Water Variable plotted against Time in Years]

Bear Creek below Town of Wise STP

Univariate WQ Variable [Figure: Water Quality versus Time]

Univariate Perspective, Real Data (pH below STP)

The three most important pieces of information in a sample:
• Central Location: Mean, Median, Mode
• Dispersion: Range, Standard Deviation, Inter-Quartile Range
• Shape: Symmetry, skewness, kurtosis; no mode, unimodal, bimodal, multimodal

Central Location: Sample Mean • (Sum of all observations) / (sample size) • Center of gravity of the distribution • depends on each observation • therefore sensitive to outliers

Central Location: Sample Median
• Center of the ordered array
• I.e., the (0.5)(n + 1)th observation in the ordered array.
• If the sample size n is odd, then the median is the middle value in the ordered array.
• Example A: 1, 1, 0, 2, 3
• Order: 0, 1, 1, 2, 3
• n = 5, odd
• (0.5)(n + 1) = 3
• Median = 1
• If the sample size n is even, then the median is the average of the two middle values in the ordered array.
• Example B: 1, 1, 0, 2, 3, 6
• Order: 0, 1, 1, 2, 3, 6
• n = 6, even
• (0.5)(n + 1) = 3.5
• Median = (1 + 2)/2 = 1.5
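The two cases above (odd n and even n) can be sketched in a few lines of Python. This is only an illustration of the ordered-array rule on the slide; the function name `sample_median` is a hypothetical helper, not something from the presentation.

```python
def sample_median(values):
    """Median as the center of the ordered array (location 0.5 * (n + 1))."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the middle value of the ordered array.
        return ordered[mid]
    # Even n: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


median_a = sample_median([1, 1, 0, 2, 3])     # Example A, n = 5
median_b = sample_median([1, 1, 0, 2, 3, 6])  # Example B, n = 6
print(median_a, median_b)
```

Applied to Examples A and B, the helper gives the same medians as the slide, 1 and 1.5.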

Central Location: Sample Median • Center of the ordered array • depends on the magnitude of the central observations only • therefore NOT sensitive to outliers

Central Location: Mean vs. Median • Mean is influenced by outliers • Median is robust against (resistant to) outliers • Mean “moves” toward outliers • Median almost always represents the bulk of the observations • Comparison of mean and median tells us about outliers
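A small numerical sketch may help show how the mean “moves” toward an outlier while the median barely changes. The pH-like values below are made up for illustration and are not data from Bear Creek or any other monitoring station.

```python
from statistics import mean, median

# Hypothetical pH-like readings (illustrative only, not monitoring data).
clean = [7.2, 7.4, 7.5, 7.6, 7.8]
with_outlier = clean + [12.0]  # one suspicious high reading added

mean_clean, median_clean = mean(clean), median(clean)
mean_out, median_out = mean(with_outlier), median(with_outlier)

# How far each statistic shifts when the outlier is added.
mean_shift = mean_out - mean_clean
median_shift = median_out - median_clean

print(f"mean:   {mean_clean:.2f} -> {mean_out:.2f} (shift {mean_shift:.2f})")
print(f"median: {median_clean:.2f} -> {median_out:.2f} (shift {median_shift:.2f})")
```

In this made-up sample the single outlier shifts the mean by about 0.75 units and the median by about 0.05, which is the sense in which a gap between mean and median hints at outliers.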

Dispersion • Range • Standard Deviation • Inter-quartile Range

Dispersion: Range • Maximum - Minimum • Easy to calculate • Easy to interpret • Depends on sample size (biased) • Therefore not good for statistical inference

Dispersion: Standard Deviation [Figure: two panels, one with SD = 1 (±1 marked) and one with SD = 2 (±2 marked)]

Dispersion: Properties of SD • SD ≥ 0 for all data • SD = 0 if and only if all observations are the same (no variation) • Familiar intervals for a normal distribution: • 68% expected within ±1 SD, • 95% expected within ±2 SD, • 99.7% expected within ±3 SD • Exact for a normal distribution, ballpark for any distribution • For any distribution, nearly all observations lie within ±3 SD (at least 8/9 ≈ 89%, by Chebyshev's inequality)
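The familiar intervals can be checked with the Python standard library. The sketch below computes the normal-distribution coverage within ±k SD and, as an added point of comparison not on the slide, the Chebyshev lower bound 1 − 1/k² that holds for any distribution with finite variance. The helper names are hypothetical.

```python
from statistics import NormalDist, stdev


def normal_coverage(k):
    """Probability that a normal observation lies within k SD of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


def chebyshev_lower_bound(k):
    """Minimum fraction within k SD of the mean for ANY distribution."""
    return 1 - 1 / k**2


coverage = {k: normal_coverage(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within +/-{k} SD: normal {p:.4f}, any distribution >= "
          f"{chebyshev_lower_bound(k):.4f}")

# SD is zero exactly when every observation is the same.
constant_sd = stdev([7.6, 7.6, 7.6])
print("SD of a constant sample:", constant_sd)
```

The normal coverages come out near 68%, 95%, and 99.7%, while the distribution-free guarantee at ±3 SD is only about 89%.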

Interpretation of SD [Figure: distribution with n = 200, Mean = 7.6, Median = 7.6, SD = 0.41]

Quartiles, Percentiles, Quantiles, Five Number Summary, Boxplot

Quartiles (undergrad classes)
E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3; n = 10
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

5-Number Summary and Boxplot (undergrad perspective)

Terminology Warning: Quartiles, a.k.a. Percentiles, a.k.a. Quantiles
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

Terminology Warning: But Percentiles and Quantiles are more general
Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

Quantile Location and Quantiles by weighted averages (graduate classes)
E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3; n = 10
Example: Find the 20th percentile of the sample above.
Step 1: q = 0.20, n = 10, so the location is L = q(n + 1) = 0.20(10 + 1) = 2.2, indicating the “2.2th” observation in the ordered array.
Step 2: Therefore the 0.20 quantile is a weighted average of the 2nd and 3rd observations in the ordered array, which are a = −0.4 and b = 0, with weight w = 0.2:
Q = −0.4 + 0.2(0 − (−0.4)) = −0.40 + 0.08 = −0.32

Quantile Location and Quantiles by weighted averages (graduate classes)
E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3; n = 10
Step 2: a = −0.4, b = 0, w = 0.2
Q = a + w(b − a) = −0.4 + 0.2(0 − (−0.4)) = −0.4 + 0.2(0.4) = −0.40 + 0.08 = −0.32
[Figure: number line from −0.4 to 0 (length 0.4) with Q = −0.32 marked]

Quantile Location and Quantiles Example: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3; n = 10
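The weighted-average rule from the preceding slides, location L = q(n + 1) followed by interpolation Q = a + w(b − a), can be sketched as a short Python function. The name `quantile_weighted` is a hypothetical helper, and clamping to the minimum or maximum when L falls outside 1..n is an assumption, since the slides do not say how to treat the extremes.

```python
import math


def quantile_weighted(values, q):
    """Quantile by weighted average with location L = q * (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    location = q * (n + 1)             # e.g. 0.20 * 11 = 2.2
    lower_rank = math.floor(location)  # rank of observation a (1-based)
    weight = location - lower_rank     # fractional part w
    # Assumption: clamp to the extremes when the location is outside 1..n.
    if lower_rank < 1:
        return ordered[0]
    if lower_rank >= n:
        return ordered[-1]
    a = ordered[lower_rank - 1]        # the lower_rank-th observation
    b = ordered[lower_rank]            # the next observation
    return a + weight * (b - a)


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
p20 = quantile_weighted(sample, 0.20)
five_number = [quantile_weighted(sample, q) for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
print("20th percentile:", round(p20, 4))
print("five-number summary:", [round(v, 4) for v in five_number])
```

For this sample the function gives −0.32 for the 20th percentile, matching the worked example, and a five-number summary of −3.1, −0.1, 2.25, 3.825, 5.1.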

5-Number Summary and Boxplot using weighted averages for quantiles. Note the slightly different results obtained by using weighted averages.

Dispersion: IQR, Inter-Quartile Range • (3rd Quartile) − (1st Quartile) • Robust against outliers

Interpretation of IQR [Figure: distribution with n = 200, Mean = 7.6, SD = 0.41, Median = 7.6, IQR = 0.54] For a Normal distribution, Median ± 2 IQR includes 99.3%
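The 99.3% figure for a normal distribution can be reproduced with the standard library. This is a sketch of the arithmetic only: it works with a standard normal (mean 0, SD 1), for which the median is 0 and the IQR is about 1.349 SD.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean = median = 0, SD = 1

# IQR of a normal distribution, expressed in SD units.
iqr_in_sd = z.inv_cdf(0.75) - z.inv_cdf(0.25)

# Probability of falling within median +/- 2 IQR.
coverage = z.cdf(2 * iqr_in_sd) - z.cdf(-2 * iqr_in_sd)

print(f"IQR = {iqr_in_sd:.3f} SD")
print(f"median +/- 2 IQR covers {coverage:.1%}")
```

As a rough consistency check, an SD of 0.41 would imply an IQR near 0.55 under normality, close to the 0.54 reported for the n = 200 sample.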

Shape: Symmetry and Skewness • Symmetry means bilateral symmetry

Shape: Symmetry and Skewness • Symmetry means bilateral symmetry • Positive Skewness (asymmetric “tail” in positive direction)

Shape: Symmetry and Skewness • “Symmetry” means bilateral symmetry, skewness = 0 • Mean = Median (approximately) • Positive Skewness (asymmetric “tail” in positive direction) • Mean > Median • Negative Skewness (asymmetric “tail” in negative direction) • Mean < Median • Comparison of mean and median tells us about shape
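The link between the sign of the skewness and the order of mean and median can be illustrated numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which is one common definition and an assumption here, since the slides do not give a formula. The three small samples are made up for illustration.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (one common definition)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical samples, chosen only to show the three cases.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]        # long tail in the positive direction
left_skewed = [-x for x in right_skewed]  # mirror image: tail in the negative direction

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name}: mean={mean(data):.2f}, median={median(data):.2f}, "
          f"skewness={sample_skewness(data):.2f}")
```

In these samples the mean equals the median when the skewness is zero, exceeds it when the skewness is positive, and falls below it when the skewness is negative.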

Bear Creek below Town of Wise STP

Box Plot [Figure: annotated box plot showing, from top to bottom: outliers, whisker, 75th percentile = 3rd quartile, median, 25th percentile = 1st quartile, whisker, outlier; the box spans the IQR]
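The elements labeled in the box plot can be computed directly. The sketch below assumes Tukey's common convention that points beyond 1.5 × IQR from the box are drawn as outliers; the slides do not state which rule was used, so treat the 1.5 multiplier as an assumption. It relies on `statistics.quantiles` with its default `method="exclusive"`, which places quantiles at q(n + 1). The data are the quartile example sample plus one made-up high value (12.0) added so that an outlier appears.

```python
from statistics import quantiles


def boxplot_stats(values, fence_factor=1.5):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR convention."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)  # default method places cuts at q * (n + 1)
    iqr = q3 - q1
    lower_fence = q1 - fence_factor * iqr
    upper_fence = q3 + fence_factor * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0],    # most extreme values still inside the fences
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Example sample from the quartile slides plus one hypothetical high reading.
data = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, 12.0]
stats = boxplot_stats(data)
print(stats)
```

With the extra value included, the whiskers end at −3.1 and 5.1, and 12.0 is flagged as the only outlier.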

[Figure: Wise, VA, below STP: pH; TKN (mg/l)]

[Figure: Wise, VA, below STP: BOD (mg/l); DO (% saturation)]

[Figure: Wise, VA, below STP: Fecal Coliforms; Total Phosphorus (mg/l)]
