Application of Statistical Techniques to Interpretation of Water Monitoring Data

Application of Statistical Techniques to Interpretation of Water Monitoring Data. Eric Smith, Golde Holtzman, and Carl Zipper. Outline. I. Water quality data: program design (CEZ, 15 min) II. Characteristics of water-quality data (CEZ, 15 min) III. Describing water quality (GIH, 30 min)

Presentation Transcript


  1. Application of Statistical Techniques to Interpretation of Water Monitoring Data • Eric Smith, Golde Holtzman, and Carl Zipper

  2. Outline • I. Water quality data: program design (CEZ, 15 min) • II. Characteristics of water-quality data (CEZ, 15 min) • III. Describing water quality (GIH, 30 min) • IV. Data analysis for making decisions • A. Compliance with numerical standards (EPS, 45 min) • Dinner Break • B. Locational / temporal comparisons (“cause and effect”) (EPS, 45 min) • C. Detection of water-quality trends (GIH, 60 min)

  3. III. Describing water quality (GIH, 30 min) • Rivers and streams are an essential component of the biosphere • Rivers are alive • Life is characterized by variation • Statistics is the science of variation • Statistical Thinking/Statistical Perspective • Thinking in terms of variation • Thinking in terms of distribution

  4. The present problem is multivariate • WATER QUALITY as a function of • TIME, under the influence of co-variates like • FLOW, at multiple • LOCATIONS

  5. WQ variable versus time [figure: water variable plotted against time in years]

  6. Bear Creek below Town of Wise STP

  7. Univariate WQ Variable [figure: water quality plotted against time]

  8. Univariate WQ Variable [figure: water quality plotted against time, with the water-quality axis label repeated across the observations]

  9. Univariate Perspective, Real Data (pH below STP)

  10. The three most important pieces of information in a sample: • Central Location • Mean, Median, Mode • Dispersion • Range, Standard Deviation, Inter-Quartile Range • Shape • Symmetry, skewness, kurtosis • No mode, unimodal, bimodal, multimodal

  11. Central Location: Sample Mean • (Sum of all observations) / (sample size) • Center of gravity of the distribution • depends on each observation • therefore sensitive to outliers

  17. Central Location: Sample Median • Center of the ordered array • I.e., the (0.5)(n + 1)th observation in the ordered array • If the sample size n is odd, the median is the middle value in the ordered array • Example A: 1, 1, 0, 2, 3 • Ordered: 0, 1, 1, 2, 3 • n = 5, odd • (0.5)(n + 1) = 3 • Median = 1 • If the sample size n is even, the median is the average of the two middle values in the ordered array • Example B: 1, 1, 0, 2, 3, 6 • Ordered: 0, 1, 1, 2, 3, 6 • n = 6, even • (0.5)(n + 1) = 3.5 • Median = (1 + 2)/2 = 1.5
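The location rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation; the function name `sample_median` is an assumption made for the example.

```python
def sample_median(values):
    """Median as the (0.5)(n + 1)th observation of the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    position = 0.5 * (n + 1)             # 1-based location in the ordered array
    lower = ordered[int(position) - 1]   # observation at floor(position)
    if position == int(position):        # n odd: a single middle value
        return lower
    upper = ordered[int(position)]       # n even: the next observation up
    return (lower + upper) / 2           # average of the two middle values


print(sample_median([1, 1, 0, 2, 3]))     # Example A
print(sample_median([1, 1, 0, 2, 3, 6]))  # Example B
```

Run on Examples A and B, the sketch reproduces the slide's medians of 1 and 1.5.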

  18. Central Location: Sample Median • Center of the ordered array • depends on the magnitude of the central observations only • therefore NOT sensitive to outliers

  24. Central Location: Mean vs. Median • Mean is influenced by outliers • Median is robust against (resistant to) outliers • Mean “moves” toward outliers • Median almost always represents the bulk of the observations • Comparison of mean and median tells us about outliers
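A small numerical check makes the contrast concrete. The sketch below uses Python's standard `statistics` module; the second sample is hypothetical, built by replacing the largest value of a small sample with an outlier, and `location_summary` is a name chosen for this example.

```python
from statistics import mean, median


def location_summary(values):
    """Return the two measures of central location side by side."""
    return {"mean": mean(values), "median": median(values)}


clean = [0, 1, 1, 2, 3]
with_outlier = [0, 1, 1, 2, 30]   # hypothetical: largest value replaced by an outlier

print(location_summary(clean))
print(location_summary(with_outlier))
```

The mean moves from 1.4 to 6.8 while the median stays at 1, so a large gap between the two is a quick signal that outliers may be present.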

  25. Dispersion • Range • Standard Deviation • Inter-quartile Range

  26. Dispersion: Range • Maximum − Minimum • Easy to calculate • Easy to interpret • Tends to grow with sample size (biased) • Therefore not good for statistical inference

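  27. Dispersion: Standard Deviation [figure: two dot plots on a 0 to 5 axis] • Sample 0, 1, 2: deviations of −1 and +1 about the mean, SD = 1 • Sample −1, 1, 3: deviations of −2 and +2 about the mean, SD = 2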

  28. Dispersion: Properties of SD • SD ≥ 0 for all data • SD = 0 if and only if all observations are the same (no variation) • Familiar intervals for a normal distribution: • 68% expected within ± 1 SD of the mean • 95% expected within ± 2 SD • 99.7% expected within ± 3 SD • Exact for a normal distribution, ballpark for any distribution • For any distribution, nearly all observations lie within ± 3 SD (at least 8/9 of them, by Chebyshev's inequality)
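These properties can be checked numerically. The sketch below is an illustration using Python's standard `statistics` module, not part of the original slides: it computes the sample SD of three small samples and the probability that a normal observation falls within k SD of the mean. The helper name `normal_coverage` and the constant sample are assumptions for the example.

```python
from statistics import NormalDist, stdev


def normal_coverage(k):
    """Probability that a normal observation lies within k SD of the mean."""
    z = NormalDist()                 # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


print(stdev([0, 1, 2]))     # deviations -1, 0, +1
print(stdev([-1, 1, 3]))    # deviations -2, 0, +2
print(stdev([4, 4, 4]))     # hypothetical constant sample: no variation

for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4))
```

The three samples give SDs of 1, 2, and 0, and the coverage values come out near 0.6827, 0.9545, and 0.9973, so the three-SD figure rounds to 99.7%.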

  29. Interpretation of SD • n = 200 • Mean = 7.6 • Median = 7.6 • SD = 0.41

  30. Quartiles, Percentiles, Quantiles, Five Number Summary, Boxplot

  31. Quartiles (undergrad classes) • E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10 • Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

  32. 5-Number Summary and Boxplot (undergrad perspective)

  33. Terminology Warning: Quartiles, a.k.a. Percentiles, a.k.a. Quantiles • Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

  34. Terminology Warning: But Percentiles and Quantiles are more general • Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

  35. Quantile Location and Quantiles by weighted averages (graduate classes) • E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10 • Ordered array: −3.1, −0.4, 0, 0, 2.2, 2.3, 3.8, 3.8, 3.9, 5.1 • Example: Find the 20th percentile of the sample above. • Step 1: q = 0.20, n = 10, L = 0.20(10 + 1) = 2.2, indicating the “2.2th” observation in the ordered array. • Step 2: Therefore the 0.20 quantile is a weighted average of the 2nd and 3rd observations in the ordered array, which are a = −0.4 and b = 0, with weight w = 0.2: Q = −0.4 + 0.2(0 − (−0.4)) = −0.40 + 0.08 = −0.32

  36. Quantile Location and Quantiles by weighted averages (graduate classes) • E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10 • Step 2: a = −0.4, b = 0, w = 0.2 • Q = a + w(b − a) = −0.4 + 0.2(0 − (−0.4)) = −0.4 + 0.2(0.4) = −0.40 + 0.08 = −0.32 [figure: number line from −0.4 to 0, width 0.4, with Q = −0.32 marked]
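The two steps (locate L = q(n + 1), then interpolate Q = a + w(b − a)) translate directly into code. The following is a minimal sketch, not the presenters' implementation. The function name `quantile_weighted` is hypothetical, and clamping L to the range 1..n for extreme q is an assumption, since the slides do not say how to handle locations that fall outside the ordered array.

```python
import math


def quantile_weighted(values, q):
    """Quantile by weighted average of the two observations around L = q(n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quantile of an empty sample is undefined")
    location = q * (n + 1)                    # Step 1: 1-based location L
    location = min(max(location, 1), n)       # assumption: clamp to the ends
    k = math.floor(location)                  # index of the lower neighbour
    w = location - k                          # weight on the gap b - a
    a = ordered[k - 1]
    if w == 0:                                # L falls exactly on an observation
        return a
    b = ordered[k]
    return a + w * (b - a)                    # Step 2: Q = a + w(b - a)


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
print(quantile_weighted(sample, 0.20))
```

For the slide's sample and q = 0.20 this returns −0.32 up to floating-point rounding. With q = 0.5 the rule reduces to the usual median.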

  37. Quantile Location and Quantiles Example: 0, − 3.1, − 0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

  38. 5-Number Summary and Boxplot using weighted averages for quantiles • Note slightly different results by using weighted averages.

  39. Dispersion: IQR, Inter-Quartile Range • (3rd Quartile) − (1st Quartile) • Robust against outliers

  40. Interpretation of IQR • n = 200 • Mean = 7.6 • SD = 0.41 • Median = 7.6 • IQR = 0.54 • For a Normal distribution, Median ± 2 IQR includes 99.3%
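The 99.3% figure follows from the quartiles of the normal distribution, and it can be verified with Python's standard `statistics.NormalDist`. This is an illustrative check, not part of the original slides. The variable names are chosen for the example, and the last line assumes the slide's SD of 0.41 came from roughly normal data.

```python
from statistics import NormalDist

z = NormalDist()                                   # standard normal: mean 0, SD 1

# IQR of a normal distribution, measured in SD units.
iqr_in_sd = z.inv_cdf(0.75) - z.inv_cdf(0.25)

# Probability of falling within median +/- 2 IQR.
coverage = z.cdf(2 * iqr_in_sd) - z.cdf(-2 * iqr_in_sd)

# IQR that a normal distribution with SD = 0.41 would predict.
predicted_iqr = iqr_in_sd * 0.41

print(round(iqr_in_sd, 3), round(coverage, 4), round(predicted_iqr, 3))
```

The IQR of a normal distribution is about 1.349 SD, so Median ± 2 IQR spans about ± 2.7 SD and covers roughly 99.3%. With SD = 0.41, normality would predict an IQR near 0.55, close to the observed 0.54.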

  41. Shape: Symmetry and Skewness • Symmetry means bilateral symmetry

  42. Shape: Symmetry and Skewness • Symmetry means bilateral symmetry • Positive Skewness (asymmetric “tail” in positive direction)

  43. Shape: Symmetry and Skewness • “Symmetry” means bilateral symmetry, skewness = 0 • Mean = Median (approximately) • Positive Skewness (asymmetric “tail” in positive direction) • Mean > Median • Negative Skewness (asymmetric “tail” in negative direction) • Mean < Median • Comparison of mean and median tells us about shape
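The link between the sign of the skewness and the ordering of mean and median can be seen on small examples. The sketch below uses the moment coefficient of skewness, m3 / m2^1.5, which is one common definition; the slides do not say which formula they use. All three samples are hypothetical.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (one common definition)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all observations are equal")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]          # hypothetical samples
right_tail = [1, 2, 3, 4, 20]
left_tail = [-20, -4, -3, -2, -1]

for name, data in [("symmetric", symmetric),
                   ("right tail", right_tail),
                   ("left tail", left_tail)]:
    print(name, mean(data), median(data), round(skewness(data), 2))
```

The symmetric sample has skewness 0 with mean equal to median. The right-tailed sample has positive skewness with mean above median, and the left-tailed sample shows the reverse.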

  44. Bear Creek below Town of Wise STP

  45. Box Plot [figure: annotated box plot] • Labels, top to bottom: Outliers • Whisker • 75th percentile = 3rd Quartile • Median • 25th percentile = 1st Quartile • Whisker • Outlier • The box spans the IQR
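The quantities labeled on a box plot can be computed as in the sketch below. Two things here are assumptions not stated in the slides: outliers are flagged with the common 1.5 × IQR fence rule (Tukey's convention), and quartiles come from `statistics.quantiles` with `method="exclusive"`, which interpolates at location q(n + 1). The function name `boxplot_stats` and the extra value 12.0 are hypothetical.

```python
from statistics import quantiles


def boxplot_stats(values, fence_factor=1.5):
    """Quartiles, whisker ends, and outliers under an assumed 1.5 x IQR rule."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - fence_factor * iqr
    upper_fence = q3 + fence_factor * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside),    # most extreme points inside the fences
        "whisker_high": max(inside),
        "outliers": outliers,
    }


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
print(boxplot_stats(sample))
print(boxplot_stats(sample + [12.0]))   # hypothetical extra high value
```

Under these assumptions the ten-value sample has no outliers, so the whiskers run to its minimum and maximum. After 12.0 is appended, that value falls beyond the upper fence and is reported as an outlier.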

  46. Wise, VA, below STP [figure panels: pH; TKN (mg/l)]

  47. Wise, VA, below STP [figure panels: BOD (mg/l); DO (% saturation)]

  48. Wise, VA, below STP [figure panels: Fecal Coliforms; Total Phosphorus (mg/l)]
