DQO Training Course Day 1 Module 4: How Many Samples do I Need? Part 1 Presenter: Sebastian Tindall 60 minutes (15-minute 1st Afternoon Break)
Topics to Discuss in Module 4 • How many samples based on • Census • Sampling • Types of decision error • Definitions of common statistical terms
How Many Samples do I Need? • Quick & Dirty Method: n = 5 • Budget Method:
$$n = \frac{\text{total budget in dollars}}{\text{dollars per sample}}$$
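Both shortcuts are trivial to compute, which is part of the problem: neither one looks at how the data will be used, the tolerance for mistakes, or the variation in the material. In the sketch below, the $3,000 per-sample cost borrows the figure used later in this module, and the $50,000 budget is hypothetical.

```python
import math

QUICK_AND_DIRTY_N = 5  # "Quick & Dirty Method" from the slide


def budget_method_n(total_budget, cost_per_sample):
    """'Budget Method': how many whole samples the budget can pay for."""
    if cost_per_sample <= 0:
        raise ValueError("cost_per_sample must be positive")
    return math.floor(total_budget / cost_per_sample)


# Hypothetical $50,000 budget at $3,000 per sample
print(QUICK_AND_DIRTY_N, budget_method_n(50_000, 3_000))  # 5 16
```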
How Many Samples do I Need? It depends! • How will the data be used? • What is the decision? • What is the tolerance for mistakes? • What is the underlying variation in the material being sampled?
How Many Samples do I Need? (The Real Answer) Just Enough!
How Many Samples do I Need? REMEMBER: HETEROGENEITY IS THE RULE!
Decisions with Absolute Certainty • Requires knowing the “true condition” of the population in question • Perform a census • Collect and analyze every possible member of the population in question
Decisions with Absolute Certainty (cont.) • Population • Universe of items (elements) within the spatial boundary • All the possible soil samples in the Smiths’ backyard • All the people in the U.S.A. • Translation: you have to count/measure (sample) EVERY single member of the population
[Figure: a one-acre field compared with a football field]
Number of Samples in a One-Acre Field How many surface soil samples can I take from a one-acre field? A one-acre field measuring 272.25 feet by 160 feet covers 43,560 square feet. If one surface soil sample = 2.5” x 2.5” x 6” deep, then…
$$\frac{43{,}560\ \text{ft}^2 \times 144\ \text{in}^2/\text{ft}^2}{2.5\ \text{in} \times 2.5\ \text{in}} \approx 1{,}000{,}000$$
...there are about 1,000,000 possible surface soil samples in a one-acre field.
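The arithmetic can be checked directly. The sketch below tiles the field's surface with 2.5-inch squares; it assumes samples do not overlap, and the 6-inch depth does not change the count.

```python
import math

FT_TO_IN = 12  # inches per foot

# One-acre field dimensions from the slide (feet)
field_length_ft = 272.25
field_width_ft = 160

# Surface footprint of one sample (inches); depth does not affect the count
sample_side_in = 2.5

field_area_sq_ft = field_length_ft * field_width_ft            # 43,560 ft^2 = 1 acre
field_area_sq_in = field_area_sq_ft * FT_TO_IN ** 2            # 6,272,640 in^2
sample_area_sq_in = sample_side_in ** 2                        # 6.25 in^2
n_possible = math.floor(field_area_sq_in / sample_area_sq_in)  # whole, non-overlapping samples

print(f"{field_area_sq_ft:,.0f} ft^2 -> {n_possible:,} possible samples")
# 43,560 ft^2 -> 1,003,622 possible samples
```

Without rounding the count is 1,003,622, so 1,000,000 is a fair round figure.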
Cost of Sampling Entire One-Acre Field How much would it cost to know the true condition of the one-acre field? If it costs $3,000 to test one surface soil sample, it would cost 1,000,000 × $3,000 = $3,000,000,000 to test all possible population units.
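The census cost is simply the number of population units times the cost per unit. The sketch below uses a hypothetical helper, `census_cost`, to show it for the slide's rounded count and for the unrounded tiling count of 1,003,622.

```python
def census_cost(n_population_units, cost_per_sample):
    """Cost of a census: test every possible population unit."""
    return n_population_units * cost_per_sample


# Rounded count from the slide
print(f"${census_cost(1_000_000, 3_000):,.0f}")  # $3,000,000,000
# Unrounded count (272.25 ft x 160 ft tiled by 2.5 in x 2.5 in squares)
print(f"${census_cost(1_003_622, 3_000):,.0f}")  # $3,010,866,000
```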
Testing All Possible Samples CENSUS • Testing all possible population units (samples) is the ONLY way to know the true condition of the site with absolute certainty • However, time and money considerations usually prevent us from doing this
Decisions with Absolute Certainty • Performing a census is totally impractical • Therefore, we can never make a decision with absolute certainty • So what’s left to do?
Testing a Few Samples (from the larger population) ESTIMATION • Estimates of the true condition of the site are usually made from a few (representative) samples • Taking a few samples (making a few measurements) and using them to represent the site • Make inferences (even sweeping claims) about the population of interest based on these few samples
The Process of Estimation • An estimate is just an educated guess based on incomplete information • Educated guesses will be wrong, to some degree • In other words, the process of estimation contains inherent errors
Estimation Errors • Are unavoidable! • Are NOT mistakes. They do not suggest that anything was done improperly • Are an inherent part of the process of estimation • Are simply deviations from the true condition of the site • Introduce uncertainty into the decision-making process
Consequences of Uncertainty: Estimation errors can lead to decision errors • Decision errors are true mistakes • Examples: • Walking away from a dirty site • Cleaning up a clean site • Decision errors can be managed
Decision Errors • Are acceptable or tolerable …within limits • We set tolerable limits on the percentage of time we are willing to: • Walk away from a dirty site • Clean up a clean site
Where do errors occur? [Diagram stages: Planning, Sampling, Analysis, Data vs. Decision]
Definition of Terms Population: Everyone or everything of interest. Example: all the people in this class. Sample: Some subset of the population. Example: five people randomly chosen from the class.
Definition of Terms Population Parameter: The true value of the population characteristic (e.g., age) that can only be known if all possible samples are measured. Example: the true mean age of all the people in the class, calculated using data from every member of the population. Sample Statistic: The estimated value of the population characteristic, calculated from sample data. Example: an estimate of the true mean age of all the people in the class, calculated using data from a subset (sample) of the population.
Comparison Population Parameter: • Represents “true condition” of the population • Decisions can be made with 100% certainty (0% uncertainty) Sample Statistic: • Represents “estimated condition” of the population • Decisions cannot be made with 100% certainty
Class Question? • What is the true mean age in this class? • What is the estimated mean age in this class? Randomly select 5 ages. • What is a 2nd estimated mean age in this class? Randomly select 15 ages. (See Computer Age Demo)
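The Computer Age Demo itself is not reproduced here; the sketch below is a stand-in under stated assumptions. It uses a made-up list of 20 ages (hypothetical), computes the population parameter by census, computes sample statistics from 5 and 15 randomly chosen ages, and then repeats the draws many times to show how far estimates typically stray from the truth.

```python
import random
from statistics import mean

# Hypothetical class roster; these ages are invented for illustration.
class_ages = [23, 27, 31, 34, 35, 38, 41, 42, 44, 45,
              47, 48, 50, 51, 52, 53, 55, 56, 58, 61]

rng = random.Random(4)  # fixed seed so the demo is repeatable

true_mean = mean(class_ages)               # population parameter (census of everyone)
est_5 = mean(rng.sample(class_ages, 5))    # sample statistic from 5 random ages
est_15 = mean(rng.sample(class_ages, 15))  # sample statistic from 15 random ages


def typical_error(n, draws=2000):
    """Average absolute gap between an n-age estimate and the true mean."""
    return mean(abs(mean(rng.sample(class_ages, n)) - true_mean) for _ in range(draws))


print(f"true mean           = {true_mean:.2f}")
print(f"estimate (5 ages)   = {est_5:.2f}")
print(f"estimate (15 ages)  = {est_15:.2f}")
print(f"typical error, n=5  = {typical_error(5):.2f}")
print(f"typical error, n=15 = {typical_error(15):.2f}")
```

Each estimate differs from the true mean by some amount; that gap is estimation error, not a mistake, and it tends to shrink as more of the population is sampled.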
True Mean Age of All the People in This Class • In this case - where we are only interested in measuring a small group of people who are all in the same room at the same time - it is not too difficult to determine the true mean age with 100% certainty. But: • What if some people failed to respond? • What if some people “fudged” a little? • What if some of the response forms got lost?
Types of Decision Errors • Before we can talk about acceptable limits for making decision errors, we must first understand what correct decisions and decision errors look like and define some terms • There are two types of correct decisions and two types of decision errors that can be made
Graph of Perfect Decision Making [Figure: Ideal Decision Rule curve; y-axis: chance of deciding site is dirty (0.0 to 1.0); x-axis: true mean ²²⁶Ra concentration (low to high); action level at 6 pCi/g]
Graph of Typical Decision Making [Figure: typical curve; y-axis: chance of deciding site is dirty (0.0 to 1.0); x-axis: true mean ²²⁶Ra concentration (low to high); action level at 6 pCi/g]
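A minimal sketch of the two curves, assuming (hypothetically) that the sample mean is normally distributed with a known standard deviation σ = 2.0 pCi/g, that n = 10 samples are taken, and that the rule decides "dirty" unless a one-sided 95% UCL falls below the 6 pCi/g action level. These numbers and the known-σ UCL are illustrative choices, not values from the course.

```python
from statistics import NormalDist

ACTION_LEVEL = 6.0  # pCi/g of Ra-226, from the slide


def ideal_prob_dirty(true_mean):
    """Perfect decision making: a step from 0 to 1 at the action level."""
    return 1.0 if true_mean >= ACTION_LEVEL else 0.0


def typical_prob_dirty(true_mean, sigma, n, conf=0.95):
    """Chance that the one-sided known-sigma UCL is at or above the action level."""
    se = sigma / n ** 0.5           # standard error of the sample mean
    z = NormalDist().inv_cdf(conf)  # about 1.645 for 95%
    # UCL = xbar + z*se >= AL  <=>  xbar >= AL - z*se, where xbar ~ N(true_mean, se)
    return 1.0 - NormalDist(true_mean, se).cdf(ACTION_LEVEL - z * se)


for mu in [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]:
    print(f"true mean {mu:.1f} pCi/g: ideal {ideal_prob_dirty(mu):.0f}, "
          f"typical {typical_prob_dirty(mu, sigma=2.0, n=10):.3f}")
```

Under these assumptions the typical curve passes through 0.95 at the action level: the 95% UCL rule holds the chance of walking away from a site whose true mean sits exactly at the action level to 5%. The smooth rise on either side is where decision errors live.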
Decision Performance Goal Diagram Null Hypothesis: The site is dirty. [Figure: typical curve of the probability of deciding that the site is dirty (y-axis, 0.0 to 1.0) versus true mean COPC concentration (x-axis); true state of site: clean below the action level, dirty above it; the gray region runs from its lower bound (75) to the action level (100); alternative actions: walk away from site, or clean up site]
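The diagram is what turns "How many samples?" into a question that can be answered: choose the smallest n whose curve stays within the tolerable limits. The brute-force sketch below assumes a known σ = 30 (hypothetical), normally distributed sample means, a one-sided 95% UCL rule, and a hypothetical limit of 20% on the chance of deciding to clean up a site whose true mean sits at the lower bound of the gray region (75). It illustrates the idea only; it is not presented as the course's own sample-size method.

```python
from statistics import NormalDist

ACTION_LEVEL = 100.0  # from the diagram
LBGR = 75.0           # lower bound of the gray region, from the diagram


def prob_decide_dirty(true_mean, sigma, n, conf=0.95):
    """Chance the one-sided known-sigma UCL is at or above the action level."""
    se = sigma / n ** 0.5
    z = NormalDist().inv_cdf(conf)
    return 1.0 - NormalDist(true_mean, se).cdf(ACTION_LEVEL - z * se)


def smallest_n(sigma, max_prob_clean_up_at_lbgr=0.20, conf=0.95, n_max=1000):
    """Smallest n whose chance of deciding 'dirty' at the LBGR is within the limit.

    The conf-level UCL rule already caps the chance of walking away from a site
    whose true mean equals the action level at 1 - conf.
    """
    for n in range(2, n_max + 1):
        if prob_decide_dirty(LBGR, sigma, n, conf) <= max_prob_clean_up_at_lbgr:
            return n
    raise ValueError("no n up to n_max meets the limit")


n = smallest_n(sigma=30.0)  # hypothetical sigma
print(n, round(prob_decide_dirty(LBGR, 30.0, n), 3),
      round(prob_decide_dirty(ACTION_LEVEL, 30.0, n), 3))
```

With σ = 30 this search returns n = 9, with just under a 20% chance of cleaning up a site at 75 and a 95% chance of deciding "dirty" at 100. A larger σ, a narrower gray region, or a stricter limit all push n up.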
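Decision-Making Procedure: Apply Decision Rule PSQ: Is the site clean? Is the site dirty? [Figure: sample mean X̄_A and two 95% UCLs (UCL 1A, UCL 1B) on a COPC concentration axis running from the detection limit (DL) to ∞; action level at 100; values marked: 75, 95, 110; alternative actions: walk away from site, or clean up site]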
Decision-Making Procedure: Apply Decision Rule PSQ: Is the site clean? Is the site dirty? [Figure: sample mean X̄_B and its 95% UCL (UCL B) on a COPC concentration axis running from DL to ∞; action level at 100; values marked: 110, 120; alternative actions: walk away from site, or clean up site]
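One way to apply the decision rule in code, sketched under assumptions: the data are roughly normal, the 95% UCL is the one-sided Student's t UCL (x̄ + t·s/√n), and because the null hypothesis is "the site is dirty", the site is declared clean only when the UCL falls below the action level of 100. The t-table holds standard one-sided 95% critical values for 1 to 10 degrees of freedom; the three data sets and the helper names are hypothetical, and other UCL methods exist.

```python
from math import sqrt
from statistics import mean, stdev

# One-sided 95% Student's t critical values, t(0.95; df), for df = 1..10
T95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
       6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}


def ucl95(data):
    """One-sided 95% UCL of the mean: xbar + t * s / sqrt(n)."""
    n = len(data)
    df = n - 1
    if df not in T95:
        raise ValueError("this small table only covers 2 to 11 samples")
    return mean(data) + T95[df] * stdev(data) / sqrt(n)


def apply_decision_rule(data, action_level=100.0):
    """Null hypothesis: the site is dirty. Declare it clean only if UCL < action level."""
    return "clean" if ucl95(data) < action_level else "dirty"


# Hypothetical COPC results (same units as the action level)
site_a = [70, 78, 74, 80, 73]       # mean 75
site_b = [102, 115, 108, 118, 107]  # mean 110
site_c = [80, 120, 95, 70, 110]     # mean 95, but widely scattered

for name, data in [("A", site_a), ("B", site_b), ("C", site_c)]:
    print(f"Site {name}: mean {mean(data):.1f}, UCL {ucl95(data):.1f} -> {apply_decision_rule(data)}")
```

Site C shows why the rule uses the UCL rather than the sample mean: its mean (95) is below the action level, but with that much scatter its UCL is not, so the site cannot be declared clean.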
Decision-Making Procedure: Apply Decision Rule PSQ: Is the site clean? Is the site dirty? Conclusion: Site is dirty. Action: Clean up a dirty site. A correct decision. [Figure: true mean, sample mean UCL, and the deviation between them on a COPC concentration axis from DL to ∞; action level at 100; 95% UCL; alternative actions: walk away from site, or clean up site]
Decision-Making Procedure: Apply Decision Rule PSQ: Is the site clean? Is the site dirty? Conclusion: Site is clean. Action: Walk away from a dirty site. An incorrect decision. [Figure: true mean, sample mean UCL, and the deviation between them on a COPC concentration axis from DL to ∞; action level at 100; 95% UCL; alternative actions: walk away from site, or clean up site]
Decision-Making Procedure: Apply Decision Rule PSQ: Is the site clean? Is the site dirty? Conclusion: Site is clean. Action: Walk away from a clean site. A correct decision. [Figure: true mean, sample mean UCL, and the deviation between them on a COPC concentration axis from DL to ∞; action level at 100; 95% UCL; alternative actions: walk away from site, or clean up site]
Decision-Making Procedure: Apply Decision Rule PSQ: Is the site clean? Is the site dirty? Conclusion: Site is dirty. Action: Clean up a clean site. An incorrect decision. [Figure: true mean, sample mean UCL, and the deviation between them on a COPC concentration axis from DL to ∞; action level at 100; 95% UCL; alternative actions: walk away from site, or clean up site]
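The four outcomes above form a small truth table. In a real investigation the true mean is unknown, so this kind of scoring is only possible in a simulation or a classroom example. The hypothetical helper below treats a site as truly dirty when its true mean is at or above the action level, and as declared dirty when the UCL is at or above it.

```python
def classify_outcome(true_mean, ucl, action_level=100.0):
    """Score one decision against the (normally unknowable) true condition."""
    truly_dirty = true_mean >= action_level
    decided_dirty = ucl >= action_level  # null hypothesis: the site is dirty
    if truly_dirty and decided_dirty:
        return "clean up a dirty site (correct decision)"
    if truly_dirty and not decided_dirty:
        return "walk away from a dirty site (incorrect decision)"
    if not truly_dirty and not decided_dirty:
        return "walk away from a clean site (correct decision)"
    return "clean up a clean site (incorrect decision)"


# Hypothetical (true mean, UCL) pairs, one for each of the four slides
cases = [(120, 130), (105, 95), (70, 85), (90, 110)]
for true_mean, ucl in cases:
    print(true_mean, ucl, classify_outcome(true_mean, ucl))
```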
The Gray Region Null Hypothesis: The site is dirty. When the true mean is well above the action level... ...then there should be a high probability that the sample mean UCL will also be above the action level... ...and it is highly likely that we will correctly decide to clean up a dirty site. [Figure: probability of deciding that the true mean is greater than or equal to the action level (y-axis, 0.0 to 1.0) versus true mean COPC concentration (x-axis); true state of site: clean below the action level, dirty above it; gray region from its lower bound (75) to the action level (100); legend: true mean, sample mean UCL, deviation; alternative actions: walk away from site, or clean up site]
The Gray Region Null Hypothesis: The site is dirty. If the true mean is well below the lower bound of the gray region... ...then there should be a very low probability that the sample mean UCL will be above the action level... ...and it is highly unlikely that we will incorrectly decide to clean up a clean site. [Figure: probability of deciding that the site is dirty (y-axis, 0.0 to 1.0) versus true mean COPC concentration (x-axis); true state of site: clean below the action level, dirty above it; gray region from its lower bound (75) to the action level (100); legend: true mean, sample mean UCL, deviation; alternative actions: walk away from site, or clean up site]
The Gray Region Null Hypothesis: The site is dirty. When the true mean is IN the gray region... ...then there is an increased probability that the sample mean UCL will be above the action level... ...and we accept a greater chance of incorrectly deciding to clean up a clean site. [Figure: probability of deciding that the site is dirty (y-axis, 0.0 to 1.0) versus true mean COPC concentration (x-axis); true state of site: clean below the action level, dirty above it; gray region from its lower bound (75) to the action level (100); legend: true mean, sample mean UCL, deviation; alternative actions: walk away from site, or clean up site]
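The three cases above can be checked by simulation. The sketch below is a Monte Carlo illustration under hypothetical assumptions: normally distributed concentrations with σ = 15, n = 8 samples per sampling event, a one-sided 95% Student's t UCL (t = 1.895 for 7 degrees of freedom), and the action level (100) and gray-region lower bound (75) from the diagram. It estimates the chance of deciding "dirty" for a true mean well above the action level (130), inside the gray region (88), and well below the lower bound (50).

```python
import random
from math import sqrt
from statistics import fmean, stdev

ACTION_LEVEL = 100.0  # from the diagram
N_SAMPLES = 8         # hypothetical samples per sampling event
T95_DF7 = 1.895       # one-sided 95% Student's t critical value for df = 7
SIGMA = 15.0          # hypothetical spread of individual results


def chance_decide_dirty(true_mean, trials=2000, seed=1):
    """Fraction of simulated sampling events whose 95% UCL is >= the action level."""
    rng = random.Random(seed)
    dirty_calls = 0
    for _ in range(trials):
        data = [rng.gauss(true_mean, SIGMA) for _ in range(N_SAMPLES)]
        ucl = fmean(data) + T95_DF7 * stdev(data) / sqrt(N_SAMPLES)
        if ucl >= ACTION_LEVEL:
            dirty_calls += 1
    return dirty_calls / trials


for label, mu in [("well above action level", 130.0),
                  ("in the gray region (75-100)", 88.0),
                  ("well below lower bound", 50.0)]:
    print(f"true mean {mu:5.1f} ({label}): P(decide dirty) ~ {chance_decide_dirty(mu):.3f}")
```

Expect a chance near 1 at 130 and near 0 at 50. At 88 the site is clean, yet the chance of incorrectly deciding to clean it up is noticeably higher: that is the error the gray region accepts.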
Managing Uncertainty is a Balancing Act [Figure: a balance with cost ($) on each side; items shown: unnecessary disposal and/or cleanup cost, threat to public health and environment, and sampling and analyses cost (on both sides); the sides are labeled PRP focus and Regulatory focus]
Key Points • We will never know the true condition of the site - time and money prevent this • Therefore we must estimate the true condition through sampling • Estimates based on samples are not factual statements about the site. They are educated guesses • Estimates will always contain some error, because they use incomplete information
Key Points (cont.) • Errors are not mistakes - just deviations from the truth • Errors (deviations) introduce uncertainty into the decision-making process • Errors and uncertainty can be managed so that you can still get the job done and prove that you did it
Key Points (cont.) • The DQO Process is designed to help you manage uncertainty and: • Get the job done efficiently • Prove that you did it defensibly
Primary Benefit of the DQO Process: Managing uncertainty through systematic planning. “FAILING TO PLAN….. IS PLANNING TO FAIL”
End of Module 4 Thank you! A summary of Parts 1, 2, and 3 will be given at the end of Module 6. Questions? We will now take a 15-minute break. Please be back in 15 minutes.