
Key Concepts Underlying DQOs and VSP

DQO Training Course Day 1 Module 3. Key Concepts Underlying DQOs and VSP. Presenter: Sebastian Tindall. 120 minutes (75 minute lunch break). Key Points. Have fun while learning key statistical concepts using hands-on illustrations

rhea-cook

Presentation Transcript


  1. DQO Training Course, Day 1, Module 3: Key Concepts Underlying DQOs and VSP. Presenter: Sebastian Tindall. 120 minutes (75 minute lunch break)

  2. Key Points • Have fun while learning key statistical concepts using hands-on illustrations • This module prepares the way for a more in-depth look at the DQO Process and the use of VSP

  3. The Big Picture: Schedule • Health Risk • Sampling Cost • Remediation Cost • Decision Error • Waste Disposal Cost • Compliance

  4. Managing Uncertainty is a Balancing Act • PRP focus: unnecessary disposal and/or cleanup cost, weighed against sampling and analyses cost • Regulatory focus: threat to public health and environment, weighed against sampling and analyses cost

  5. Balance in Sampling Design The statistician’s aim in designing surveys and experiments is to meet a desired degree of reliability at the lowest possible cost under the existing budgetary, administrative, and physical limitations within which the work must be conducted. In other words, the aim is efficiency--the most information (smallest error) for the money. Some Theory of Sampling, Deming, W.E., 1950

  6. Visual Sample Plan. Our Methodology: Use Hands-On Illustrations of... • Basic statistical concepts needed for VSP and the DQO Process • Using...

  7. Our Methodology: Use Hands-On Illustrations of... • Basic statistical concepts needed for VSP and the DQO Process • Using coin flips • Pennies • Demo #1 • Demo #2 • Quarter

  8. How Many Samples Should We Take? 5? 50?

  9. How Many Times Should I Flip a Coin Before I Decide it is Contaminated (Biased Tails)? • One tail, 50% • Two tails, 25% • Three tails, 12.5% • Four tails, 6% • Five tails, 3% • Six tails, 1.6% • Seven tails, 0.8% • Eight tails, 0.4% • Nine tails, 0.2% • Ten tails, 0.1%
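The percentages on this slide are consistent with the chance that a fair coin lands tails on every one of k flips, which is 0.5 raised to the power k. A minimal sketch, assuming that reading (the function name `prob_all_tails` is ours, not from the course):

```python
# Chance that a fair (unbiased) coin shows tails on every one of k flips.
def prob_all_tails(k: int) -> float:
    """Return 0.5 ** k, the probability of k tails in a row."""
    return 0.5 ** k

# Print the table for 1 to 10 flips, as percentages.
for k in range(1, 11):
    print(f"{k:2d} tails in a row: {prob_all_tails(k) * 100:.2f}%")
```

The slide rounds these values (for example, 6.25% is shown as 6%). The smaller the percentage, the harder it is to blame an unbroken run of tails on chance alone.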

  10. Football Field (figure: a one-acre field shown for scale against a football field; dimension label 30'0")

  11. Example Problem • A 1-acre field was contaminated with mill tailings in the 1960s • Cleanup standard: "The true mean Ra-226 concentration in the upper 6 inches of soil must be less than 6.0 pCi/g." • There is a good chance that the actual true mean Ra-226 concentration is between 4.0 and 6.0 pCi/g

  12. Example Problem (cont.) • Historical data suggest a standard deviation of 1.6 pCi/g • It costs $1000 to collect, process, and analyze one sample • The maximum sampling budget is $5,000

  13. Simplified Decision Process • Take some number of samples • Find the average Ra-226 concentration in our samples • If we pass the appropriate QA/G-9 test, decide the site is clean • If we fail the appropriate QA/G-9 test, decide the site is dirty

  14. Marbles (Color: Ra-226, pCi/g) • Clear: 3 • White: 4 • Green: 5 • Red: 6 • Dark Yellow: 7 • Blue: 8 • Black: 9

  15. Example of Ad Hoc Sampling Design and the Results • Suppose we choose to take 5 samples for various reasons: low cost, tradition, convenience, etc. • Need volunteer to do the sampling • Need volunteer to record results • We will follow QA/G-9 One-Sample t-Test directions using an Excel spreadsheet

  16. One-Sample t-Test Equation from EPA's Practical Methods for Data Analysis, QA/G-9 • Calculated t = (sample mean - AL) / (std. dev / sqrt(n)) • If calculated t is less than table value, decide site is clean
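The calculation on this slide can be sketched in a few lines. The five measurements below are hypothetical marble draws, not course data. The table value 2.132 is the standard one-sided 95% Student's t value for n - 1 = 4 degrees of freedom. Comparing the calculated t against the negative of that value is our assumption about how "table value" is applied when the starting assumption is that the site is dirty.

```python
import math
import statistics

ACTION_LEVEL = 6.0      # pCi/g, from the example cleanup standard
T_TABLE_95_DF4 = 2.132  # one-sided 95% t-table value, 4 degrees of freedom

def one_sample_t(measurements, action_level):
    """Calculated t = (sample mean - AL) / (std. dev / sqrt(n))."""
    n = len(measurements)
    mean = statistics.mean(measurements)
    std_dev = statistics.stdev(measurements)  # sample standard deviation
    return (mean - action_level) / (std_dev / math.sqrt(n))

samples = [4.0, 5.0, 3.0, 6.0, 5.0]  # hypothetical Ra-226 results, pCi/g
t_calc = one_sample_t(samples, ACTION_LEVEL)

# Assumption: "less than table value" means below the negative critical value.
decision = "clean" if t_calc < -T_TABLE_95_DF4 else "dirty"
print(f"calculated t = {t_calc:.3f}, decision: {decision}")
```

With only five samples the table value is large, so the sample mean has to sit well below the action level before the site can be called clean.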

  17. Comparing UCL to Action Level is Like Student's t-Test (figure: four UCLs marked on a True Mean Ra-226 Concentration axis running from 2 to 8, with the Action Level at 6) • UCL = 4: 4 - 6 = -2 • UCL = 5: 5 - 6 = -1 • UCL = 7: 7 - 6 = 1 • UCL = 8: 8 - 6 = 2
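To see why the two comparisons agree, here is a small sketch that computes a one-sided 95% UCL and the calculated t from the same data and checks that "UCL below the action level" and "calculated t below the negative table value" give the same answer. The measurements are hypothetical, and the UCL formula (sample mean plus table value times std. dev / sqrt(n)) is the usual Student's t form, assumed here rather than quoted from the slides.

```python
import math
import statistics

ACTION_LEVEL = 6.0  # pCi/g
T_TABLE = 2.132     # one-sided 95% t-table value, 4 degrees of freedom

def standard_error(measurements):
    """Sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(measurements) / math.sqrt(len(measurements))

def ucl95(measurements):
    """One-sided 95% upper confidence limit on the mean (Student's t form)."""
    return statistics.mean(measurements) + T_TABLE * standard_error(measurements)

def t_calc(measurements):
    """Calculated t relative to the action level."""
    return (statistics.mean(measurements) - ACTION_LEVEL) / standard_error(measurements)

for data in ([4.0, 5.0, 3.0, 6.0, 5.0], [6.0, 7.0, 5.0, 8.0, 6.0]):  # hypothetical
    by_ucl = ucl95(data) < ACTION_LEVEL
    by_t = t_calc(data) < -T_TABLE
    print(f"UCL = {ucl95(data):.2f}, t = {t_calc(data):.2f}, clean by UCL: {by_ucl}, clean by t: {by_t}")
```

Rearranging "mean + table value x standard error < AL" gives "(mean - AL) / standard error < -table value", so the two tests are the same inequality written two ways.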

  18. Learn the Jargon • t-test • UCL - upper confidence limit • AL - action level • N - target population • n - population units sampled • μ - population mean • x-bar - sample mean • σ - population standard deviation • s - sample standard deviation • Frequency distribution • Histograms • H0 - null hypothesis • α - Alpha error rate • β - Beta error rate • Gray Region • LBGR • Δ - width of Gray Region • Coefficient of Variation • Relative Standard Deviation

  19. t-test • Calculated t = (sample mean - AL) / (std. dev / sqrt(n)) • If calculated t is less than table value, decide site is clean

  20. Upper Confidence Limit, UCL. For a 95% UCL and assuming sufficient n: If you repeatedly calculate sample means for many independent random sampling events from a population, in the long run, you would be correct 95% of the time in claiming that the true mean is less than or equal to the 95% UCL of all those sampling events. Note: Different s will produce different UCLs
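The long-run reading of a 95% UCL can be checked by simulation. This sketch assumes a normally distributed population with a true mean of 5.0 pCi/g (our choice), the standard deviation of 1.6 pCi/g from the example problem, 5 samples per event, and the one-sided 95% t-table value for 4 degrees of freedom. It draws many independent sampling events and counts how often the true mean falls at or below the UCL.

```python
import math
import random
import statistics

def ucl_coverage(true_mean, sigma, n, t_table, trials, seed=0):
    """Fraction of simulated sampling events whose UCL is >= the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        std_err = statistics.stdev(sample) / math.sqrt(n)
        ucl = statistics.mean(sample) + t_table * std_err
        if true_mean <= ucl:
            hits += 1
    return hits / trials

coverage = ucl_coverage(true_mean=5.0, sigma=1.6, n=5, t_table=2.132, trials=20000)
print(f"true mean <= UCL in {coverage:.1%} of sampling events")
```

Each event produces a different s and therefore a different UCL, yet the claim "true mean is at or below the UCL" should hold in close to 95% of events.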

  21. Upper Confidence Limit, UCL. A more common wording, though some experts dislike it: For a single, one-sided UCL, you are 95% confident that the true mean is less than or equal to your calculated UCL. (The true mean is bracketed by the lower bound, which in our case is usually zero, and the UCL.) (See Hahn and Meeker, Statistical Intervals: A Guide for Practitioners, p. 31.)

  22. Action Level A measurement threshold value of the Population Parameter (e.g., true mean) that provides the criterion for choosing among alternative actions.

  23. N and n • Target Population: the set of N population units about which inferences will be made • Population Units: the N objects (environmental units) that make up the target or sampled population • The number of population units selected and measured is n

  24. 10 x 10 Field • Population = all 100 population units

  25. 10 x 10 Field • Population = all 100 population units • Sample = 5 population units (values shown: 1.5, 1.9, 2.3, 1.7, 1.5)

  26. Population Mean $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$: the average of all N population units • Sample Mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i$: the average of the n population units actually measured

  27. Population Standard Deviation σ: the average deviation of all N population units from the population mean • Sample Standard Deviation s: the "average" deviation of the n measured units from the sample mean
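As a worked illustration, the sketch below computes the sample mean and sample standard deviation for the five values that appear on the 10 x 10 field slide (1.5, 1.9, 2.3, 1.7, 1.5), treating them as the five measured units; that reading of the slide is an assumption. Python's `statistics.stdev` uses the n - 1 divisor for a sample, while `statistics.pstdev` uses the divisor N and is only appropriate when every population unit has been measured.

```python
import statistics

# Values shown for the 5 sampled units on the 10 x 10 field slide (assumed to be measurements).
sample = [1.5, 1.9, 2.3, 1.7, 1.5]

x_bar = statistics.mean(sample)  # sample mean: sum of the n values divided by n
s = statistics.stdev(sample)     # sample standard deviation (n - 1 divisor)

print(f"n = {len(sample)}, sample mean = {x_bar:.2f}, sample std. dev = {s:.3f}")

# If these 5 values were the entire population, the N divisor would apply instead.
sigma_if_whole_population = statistics.pstdev(sample)
print(f"std. dev with the N divisor = {sigma_if_whole_population:.3f}")
```

The population mean and standard deviation of all 100 units stay unknown; the sample statistics are estimates of them.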

  28. Spatial Distribution - Football Field

  29. Probability Density Function

  30. SHOW Histogram File

  31. SHOW VDT Step by Step Histogram File

  32. The Null Hypothesis, H0: The initial assumption about how the true mean relates to the action level. Example: The site is dirty. (We'll assume this for the rest of this discussion)

  33. The Alternate Hypothesis, HA: The alternative hypothesis is accepted only when there is overwhelming proof that the Null condition is false.

  34. The Alpha Error Rate (on Type 1 or False + Errors), α. Null Hypothesis = Site is Dirty. The chance of deciding that a dirty site is clean when the true mean is greater than or equal to the action level

  35. The Alpha Error Rate (on Type 1 or False + Errors). Null Hypothesis = Site is Dirty. A false positive decision or Type 1 error occurs when a decision-maker rejects the null hypothesis (calls it false) when H0 is actually true. The size of the error is expressed as a probability, usually referred to as Alpha (α). This error occurs when the data (sample result x-bar or UCL) indicate that the site is clean when the true mean is actually at or above the Action Level. In other words, the Alpha error is the probability that your sample result is below the Action Level when the true mean is actually at or above the Action Level. That probability is usually set to between 1% and 5%.

  36. The Alpha Error Rate (on Type 1 or False + Errors). Null Hypothesis = Site is Clean. A false positive decision or Type 1 error occurs when a decision-maker rejects the null hypothesis (calls it false) when H0 is actually true. The size of the error is expressed as a probability, usually referred to as Alpha (α). This error occurs when the data (sample result x-bar or UCL) indicate that the site is dirty when the true mean is actually at or below the Action Level. In other words, the Alpha error is the probability that your sample result is above the Action Level when the true mean is at or below the Action Level. That probability is usually set to between 1% and 5%.

  37. The Beta Error Rate (on Type 2 or False - Errors), β. Null Hypothesis = Site is Dirty. The chance of deciding a clean site is dirty when the true mean is equal to the lower bound of the gray region (LBGR)

  38. The Beta Error Rate (on Type 2 or False - Errors). Null Hypothesis = Site is Dirty. A false negative decision or Type 2 error occurs when a decision-maker accepts the null hypothesis (calls it true) when H0 is actually false. The size of the error is expressed as a probability, usually referred to as Beta (β). This error occurs when the data (sample result x-bar or UCL) indicate that the site is dirty when the true mean is actually below the Action Level. In other words, the Beta error is the probability that your sample result is at or above the Action Level when the true mean is actually below the Action Level. That probability is negotiated and set to between 1% and 50%.

  39. The Beta Error Rate (on Type 2 or False - Errors). Null Hypothesis = Site is Clean. A false negative decision or Type 2 error occurs when a decision-maker accepts the null hypothesis (calls it true) when H0 is actually false. The size of the error is expressed as a probability, usually referred to as Beta (β). This error occurs when the data (sample result x-bar or UCL) indicate that the site is clean when the true mean is actually above the Action Level. In other words, the Beta error is the probability that your sample result is at or below the Action Level when the true mean is actually above the Action Level. That probability is negotiated and set to between 1% and 20%.

  40. Evaluate Alpha & Beta Errors (figure: Alpha Error and Beta Error plotted against True Mean Concentration, with the LBGR and the Action Level marked on the axis)
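Both error rates can be estimated by simulation for the ad hoc design of 5 samples. This sketch assumes Null Hypothesis = Site is Dirty, a normally distributed population with standard deviation 1.6 pCi/g, an action level of 6.0 pCi/g, and an LBGR of 4.0 pCi/g (the LBGR is our assumption, taken from the low end of the "between 4.0 and 6.0" range in the example problem). Alpha is estimated with the true mean at the action level, Beta with the true mean at the LBGR.

```python
import math
import random
import statistics

AL, LBGR = 6.0, 4.0  # pCi/g; the LBGR value is an assumption
SIGMA, N = 1.6, 5    # std. dev from the example problem; ad hoc sample count
T_TABLE = 2.132      # one-sided 95% t-table value, 4 degrees of freedom

def clean_decision_rate(true_mean, trials=20000, seed=0):
    """Fraction of simulated sampling events in which the t-test says 'clean'."""
    rng = random.Random(seed)
    clean = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
        std_err = statistics.stdev(sample) / math.sqrt(N)
        t_calc = (statistics.mean(sample) - AL) / std_err
        if t_calc < -T_TABLE:  # reject "site is dirty"
            clean += 1
    return clean / trials

alpha_est = clean_decision_rate(AL, seed=1)       # dirty site called clean
beta_est = 1 - clean_decision_rate(LBGR, seed=2)  # clean site called dirty
print(f"estimated alpha = {alpha_est:.3f}, estimated beta = {beta_est:.3f}")
```

The table value pins Alpha near 5% regardless of sample size, so with few samples the remaining uncertainty shows up as a larger Beta.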

  41. Gray Region. Gray Region = AL - LBGR. A range of values of the population parameter of interest (such as the true mean contaminant concentration, μ) where the consequences of making a decision error are relatively minor.

  42. Gray Region & LBGR. Gray Region = AL - LBGR. The Gray Region is bounded on one side by the action level, and on the other side by the parameter value where the consequences of decision error begin to be significant. This point is labeled LBGR, which stands for Lower Bound of the Gray Region.

  43. The Width of Gray Region. Δ = AL - μ1. Width of GR = AL - LBGR. The Lower Bound of the Gray Region (μ1) is defined as the hypothetical true mean concentration where the site should be declared clean with a reasonably high probability. (Null Hypothesis = Site is Dirty)

  44. The Width of Gray Region. Δ = μ1 - AL. Width of GR = UBGR - AL. The Upper Bound of the Gray Region (μ1) is defined as the hypothetical true mean concentration where the site should be declared dirty with a reasonably high probability. (Null Hypothesis = Site is Clean)
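The width of the gray region feeds directly into "how many samples?". The sketch below uses a commonly cited normal-approximation formula for a one-sample t-test design, n = σ²(z₁₋α + z₁₋β)² / Δ² + 0.5 z₁₋α². Whether the course or VSP uses exactly this form is an assumption, as are the inputs α = 5%, β = 20%, and LBGR = 4.0 pCi/g; the standard deviation, action level, and cost per sample come from the example problem.

```python
import math
from statistics import NormalDist

def samples_needed(sigma, action_level, lbgr, alpha, beta):
    """Approximate sample count for a one-sample t-test design (assumed formula)."""
    delta = action_level - lbgr              # width of the gray region
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = sigma**2 * (z_alpha + z_beta) ** 2 / delta**2 + 0.5 * z_alpha**2
    return math.ceil(n)                      # round up to whole samples

COST_PER_SAMPLE = 1000  # dollars, from the example problem

n_required = samples_needed(sigma=1.6, action_level=6.0, lbgr=4.0, alpha=0.05, beta=0.20)
total_cost = n_required * COST_PER_SAMPLE
print(f"samples needed: {n_required}, sampling cost: ${total_cost:,}")

# A narrower gray region (LBGR closer to the action level) drives the count up.
print("with LBGR = 5.0:", samples_needed(1.6, 6.0, 5.0, 0.05, 0.20))
```

Under these assumed inputs the design calls for 6 samples, one more than the $5,000 budget covers, which is the kind of trade-off between error rates, gray region width, and cost that has to be negotiated.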

  45. Coefficient of Variation: CV = s / x-bar. If CV > 1, not Normal. Relative Standard Deviation: RSD (%) = CV * 100. If RSD > 100%, not Normal
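A minimal sketch of both calculations, applied to the five values shown on the 10 x 10 field slide (treating them as measurements is an assumption) and using the slide's rule of thumb as the flag:

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation divided by sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

sample = [1.5, 1.9, 2.3, 1.7, 1.5]  # values from the 10 x 10 field slide

cv = coefficient_of_variation(sample)
rsd_percent = cv * 100               # RSD (%) = CV * 100
looks_non_normal = cv > 1            # rule of thumb stated on the slide

print(f"CV = {cv:.3f}, RSD = {rsd_percent:.1f}%, flagged as not Normal: {looks_non_normal}")
```

The rule only flags a problem. A CV at or below 1 does not by itself show that the data are normally distributed.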

  46. SHOW VST File for Coefficient of Variation and RSD

  47. Summary. Decisions about population parameters, such as the true mean, μ, and the true standard deviation, σ, are based on statistics such as the sample mean, x-bar, and the sample standard deviation, s. Since these decisions are based on incomplete information, they are subject to error.

  48. End of Module 3 Thank you Questions? We will now take a 75 minute lunch break. Please be back in 1 hour and 15 minutes.
