
Data Quality Indicators (DQIs) What are they, and how do they affect me? An US-EPA Approach


cicero

Presentation Transcript


  1. Data Quality Indicators (DQIs) What are they, and how do they affect me? An US-EPA Approach P A S R C C

  2. DQIs Defined • DQIs are quantitative and qualitative measures of principal quality attributes: • Precision; • Bias; • Representativeness; • Comparability; • Completeness; and • Sensitivity. • Quantitative DQIs • Precision, bias, and sensitivity. • Qualitative DQIs • Representativeness, comparability, and completeness.

  3. The Hierarchy of Quality Terms DQOs Qualitative and quantitative study objectives Attributes Descriptive qualitative and quantitative aspects of collected data DQIs Indicators of the quality attributes MQOs Acceptance criteria for the quality attributes measured by project DQIs

  4. Precision • Precision is the measure of agreement among repeated measurements of the same property under identical or substantially similar conditions. • A precision DQI is a quantitative indicator of the random errors or fluctuations in the measurement process. • e.g., standard deviation or variance

  5. Bias • Bias is systematic or persistent distortion of a measurement process that causes error in one direction. • A bias DQI is a quantitative indicator of the magnitude of systematic error resulting from: • biased sampling design; • calibration errors; • response factor shifts; • unaccounted-for interferences; and • chronic sample contamination. • e.g., instrument reads XX mg/L too high
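As a rough illustration of a bias DQI, the sketch below estimates bias as the mean signed difference between repeated measurements of a reference material and its known value. The function names, the percent-recovery helper, and the readings are hypothetical and are not taken from the presentation.

```python
import statistics

def estimate_bias(measured, reference_value):
    """Mean signed error: positive means the process reads too high."""
    return statistics.mean(measured) - reference_value

def percent_recovery(measured, reference_value):
    """Mean measured value expressed as a percentage of the known value."""
    return 100.0 * statistics.mean(measured) / reference_value

# Hypothetical repeated readings (mg/L) of a standard with a known value of 5.0 mg/L.
readings = [5.2, 5.3, 5.1, 5.2]
bias = estimate_bias(readings, 5.0)          # about +0.2 mg/L (reads too high)
recovery = percent_recovery(readings, 5.0)   # about 104 %
```

A bias estimate of this kind only reflects systematic error if the readings are numerous enough for random error to average out.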

  6. Accuracy • Accuracy is composed of precision and bias. • Accuracy is a measure of the overall agreement of a measurement to a known value: • when random errors are tightly controlled, bias dominates the overall accuracy; and • when random errors predominate, variance dominates the overall accuracy.
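One common way to make "accuracy is composed of precision and bias" concrete is the mean squared error identity, MSE = bias^2 + variance. The sketch below checks that identity numerically; the function name and data are hypothetical, and the population form of the variance is used so that the identity holds exactly for the sample at hand.

```python
import statistics

def accuracy_components(measured, known_value):
    """Split the mean squared error about a known value into bias and variance."""
    mean = statistics.mean(measured)
    bias = mean - known_value
    # Population variance (divide by n) so that mse == bias**2 + variance.
    variance = statistics.pvariance(measured)
    mse = statistics.mean([(x - known_value) ** 2 for x in measured])
    return {"bias": bias, "variance": variance, "mse": mse}

# Hypothetical data against a known value of 5.0 mg/L.
precise_but_biased = accuracy_components([5.19, 5.21, 5.20, 5.20], 5.0)
imprecise_but_unbiased = accuracy_components([4.0, 6.0, 4.5, 5.5], 5.0)
```

In the first data set nearly all of the error comes from the bias term; in the second the bias is zero and the variance term accounts for the whole error.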

  7. Influence of Bias and Imprecision on Overall Accuracy • [Figure, four panels] • Precise and biased • Imprecise and unbiased • Imprecise and biased • Precise and unbiased

  8. Representativeness • Representativeness is the measure of the degree to which data suitably represent a characteristic of a population, parameter variations at a sampling point, a process condition, or an environmental condition. • Representativeness DQIs are qualitative and quantitative statements regarding the degree to which data reflect the true characteristics of a well-defined population. • e.g., these samples are representative of surface soil to be found in a specific area of XX square meters.

  9. Comparability • Comparability is a qualitative expression of the measure of confidence that two or more data sets may contribute to a common analysis. • a comparability DQI is a qualitative indicator of the similarity of attributes of data sets. • e.g., soil salinity or soil acidity data sets are comparable as they share a common preparation and analytical method operated under similar conditions.

  10. Completeness • Completeness is a measure of the amount of valid data obtained from a measurement system, expressed as a percentage of the number of valid measurements that should have been collected. • the DQI for completeness is often expressed as a percentage. • e.g., the percentage of valid samples for which data for all analytes of interest were reported.
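The percentage described above is simple arithmetic; a minimal sketch follows, with a hypothetical function name and hypothetical sample counts.

```python
def percent_completeness(valid_count, planned_count):
    """Valid measurements as a percentage of those that should have been collected."""
    if planned_count <= 0:
        raise ValueError("planned_count must be positive")
    if not 0 <= valid_count <= planned_count:
        raise ValueError("valid_count must lie between 0 and planned_count")
    return 100.0 * valid_count / planned_count

# Hypothetical project: 50 samples planned, 46 produced valid results for all analytes.
completeness = percent_completeness(46, 50)  # 92.0
```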

  11. Sensitivity • Sensitivity is the capability of a method or instrument to discriminate between measurement responses representing different levels of the variable of interest. • Sensitivity can be regarded as the detection limit • but this term is often used without defining what is intended (minimum detection or quantitation). • A sensitivity DQI describes the capability of measuring a constituent at low levels. • a Practical Quantitation Level (PQL) describes the ability to quantify a constituent with known certainty. • e.g., a PQL of 0.05 mg/L for mercury represents the level where a precision of ±15% can be obtained.
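The mercury example suggests one possible way to screen candidate quantitation levels: analyze replicates at several concentrations and pick the lowest one whose precision meets the target. The sketch below assumes that "a precision of ±15%" can be read as a relative standard deviation of at most 15%, which is an interpretation rather than a definition from the presentation. The function name and the replicate data are hypothetical.

```python
import statistics

def lowest_level_meeting_precision(replicates_by_level, max_rsd_percent=15.0):
    """Return the lowest spiked level whose replicate RSD is within the target.

    replicates_by_level maps a nominal concentration to a list of replicate results.
    Returns None if no level meets the target.
    """
    for level in sorted(replicates_by_level):
        results = replicates_by_level[level]
        rsd = 100.0 * statistics.stdev(results) / statistics.mean(results)
        if rsd <= max_rsd_percent:
            return level
    return None

# Hypothetical replicate mercury results (mg/L) at three nominal levels.
data = {
    0.01: [0.006, 0.014, 0.010],   # RSD about 40 %
    0.05: [0.047, 0.052, 0.051],   # RSD about 5 %
    0.10: [0.098, 0.101, 0.102],
}
candidate_pql = lowest_level_meeting_precision(data)
```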

  12. Verification • Data verification refers to the procedures needed to ensure that a set of data is a faithful reflection of all the processes and procedures used to generate the data. • verification involves the examination of objective evidence that the specified method, procedures, and contractual requirements were fulfilled.

  13. Validation • Data validation is an analyte and sample matrix-specific process to determine the analytical quality of a specific data set. • validation entails the inspection of data handling practices for deviations from consistency, the review of quality control (QC) information for deviations, assessment of deviations, and assignment of data qualification codes. • Validation can entail the examination of the data with respect to the QA Plan.

  14. Integrity • Lack of integrity affects all aspects of data interpretation, especially data used for decision making; and • Lack of integrity includes: • manipulation of QC measurements; • Dry-labbing (complete falsification of data); • manipulation of results during analysis; • failure to conduct required analytical steps; and • post-analysis alteration of results.

  15. After Verification and Validation • The data set is then analyzed by comparing the results to the original objectives. In many cases this is a comparison of the results to the DQOs using data quality assessment; • Data quality assessment, a five-step process: • Review of DQOs and sample design • Preliminary data review • Selection of statistical test • Verification of assumptions • Drawing conclusions from the data • But that is another course altogether!

  16. Representativeness Statistical and Conceptual Model-Based Approaches

  17. Representativeness • Representativeness is the measure of the degree to which data suitably represent a characteristic of a population, parameter variations at a sampling point, a process condition, or an environmental condition: • Representativeness DQIs are qualitative and quantitative statements regarding the degree to which data reflect the true characteristics of a well-defined population.

  18. What Does "Representativeness" Mean? • Very vaguely defined in working English: • "seal of approval" by simple statement of writer. • there is an absence of biasing forces. • it is a miniature or replica of the population. • it is a typical or ideal case. • there is wide coverage of a population. • it enables good estimation. • it is good enough for the purposes of the study. • statistically based sampling method.

  19. Different definitions of "Representativeness" • "...expected to exhibit the average properties of the universe or whole" • "...should be selected on the basis of spatial and temporal representativeness" • "...samples should be representative of daily operations"

  20. Achieving Representativeness Involves a Process • Planning, design, and assessment • careful attention to measurement and analytical process; • consideration of the size (amount of material) and method for sample collection and handling; • determination of adequate type, location, timing, and number of samples to be taken; and • defensible approach for drawing inferences from sample data to the target population. • Sample design and measurement processes should minimize unintentional bias.

  21. The Process Involves Evaluating Both Micro and Macro Scales • Micro scale • how well measurements taken within a sampling unit reflect that unit • (e.g., "parameter variations at a sampling point") • Macro scale • degree to which measurements from a set of sampling units reflect the population of interest • (e.g., "accurately and precisely represent a characteristic of a population")

  22. Micro Scale (Within-Sampling-Unit) Representativeness • An appropriate quality system to ensure quality implementation and sample integrity; • Carefully defined sampling units with correct sampling procedures and equipment; • Adequate sample support (amount of material) to make inferences about the characteristics within the sampling unit; and • Appropriate analytical methods (including sample preparation), designed to achieve MQOs for measurement precision, bias, and sensitivity.

  23. What is a Sampling Unit? • A sampling unit (SU) can be defined as the portion of the environment for which a measurement has meaning for its intended use. • Defining SUs for a project allows us to communicate more clearly about components of total-study precision.

  24. Specifying Sampling Units • SUs can vary depending on the specific problem; they can be: • as small as the physical sample itself; • something encompassing multiple physical samples; or • something much larger. • In classical survey design (e.g., opinion survey) the SU is typically an individual.

  25. Specifying Sampling Units • SUs are less well defined in other types of surveys (e.g., in a survey to determine soil salinity levels) • in this case, a soil sample is much smaller than the area it represents: is the sampling unit the topsoil sample, a pedon, or the farm as a whole? • Consider how data will be used • average over multiple units, the spatial distribution of units, or some combination?

  26. Alternative Sampling Unit Definitions • Default SU Definition • equivalent to the physical sample (soil, water, or plant specimen) taken. • Alternative SU Definitions • units comprised of multiple samples to obtain enough of the medium to perform all desired analyses. • units of a size adequate to collect multiple specimens (such as composite samples). • units defined to include a group of samples when individual samples are not the unit of interest.

  27. Choice of Sampling Unit - What Does a Sample Represent? • [Figure labels] • A small farm • 7.5 cm core • 1-ha area

  28. Sampling Theory: Within-SU Error • To what degree is heterogeneity within a sampling unit inherent? • Gy refers to this as the “constitution heterogeneity.” No amount of mixing or homogenization can reduce this. • Constitution heterogeneity leads to fundamental error • Fundamental errors are negligible for liquids and gases without suspended solids, but are significant in soil and any other solids.

  29. Sampling Theory: Within-SU Error • What is the distribution and variance between small increments of the media? • Gy refers to this as “distribution heterogeneity” which reflects the distribution of groups of some number of neighboring fragments. • Grouping and segregation errors result from distribution heterogeneity • minimize these errors by taking more increments to form a sample of the required weight

  30. Heterogeneity of Pollutants Can Lead to Sampling Errors • h1 = small scale (random fluctuations) • h2 = large scale (trends, nonrandom, bias) • h3 = cyclic phenomena • h = h1 + h2 + h3 • Each of these components of heterogeneity leads to errors • Experiments to characterize these components (using variograms) allow one to optimize a design
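To give a feel for how a variogram separates these components, here is a minimal sketch of the classical empirical semivariogram for equally spaced measurements along a transect: half the mean squared difference between values a given lag apart. The function name and the transect data are hypothetical, and real studies would normally use irregular coordinates and a fitted model.

```python
def empirical_semivariogram(values, max_lag):
    """Semivariance for lags 1..max_lag on an equally spaced 1-D transect."""
    result = {}
    for lag in range(1, max_lag + 1):
        pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
        if not pairs:
            break  # series too short for this lag
        result[lag] = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return result

# Hypothetical transect with a steady trend (large-scale heterogeneity, h2).
trend = empirical_semivariogram([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
# Semivariance keeps rising with lag when a trend is present.
```

A semivariance that levels off with lag points to small-scale random fluctuation, a steady rise points to a trend, and a periodic rise and fall points to cyclic behavior.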

  31. Controlling Sampling Errors • Ensure the field sampling protocol does not distort or bias the sample • should be capable of ensuring all parts of the media (e.g., all particle sizes) have the same probability of being included in the increment obtained to form a sample • Ensure the laboratory subsamples represent all the particle-size fractions • subsamples must be large enough (optimal sample weight) to accommodate the range of particle sizes • samples and subsamples should be comprised of as many correctly obtained increments as possible

  32. Questions raised by Sampling Theory Related to “Within-SU Error” • What is the correct scale at which to sample? • What is the correct protocol for obtaining increments to form samples of the media of interest?

  33. Questions raised by Sampling Theory Related to "Within-SU Error" (cont.) • Pilot studies are needed to determine the nature of the heterogeneity. • If, for example, soil salinity areas are highly clustered on a scale smaller than the scale of real concern, small grabs will reveal varied results • If homogenization and sub-sampling do not remove clustering, representation of a sampling unit from a single sample will not be achievable • sampling protocols should be selected that do not alter the characteristics of the media (e.g., particle-size composition)

  34. Classical Statistical Approach • Define the population of interest • spatial and temporal boundaries; and • sampling units • Develop a statistical sampling plan • probability-based design, in which every sampling unit has a known probability of inclusion. • Evaluate process for drawing inferences from data • how well the sampling units selected represent the population under study; • how data will be used to estimate target population parameters such as the mean and variance; and • how well the sampled population provides information on the subject in question.

  35. Strategies for Improving Within SU Representativeness • Utilize within-sampling-unit replication • can reduce the variability of the average by a factor of n^(-1/2), i.e., 1/√n. • Utilize within-sampling-unit compositing • increasing the number of increments in the sample reduces the variability of the unit average. • Increase the sample support area or volume • expanding the definition of what area or volume the analytical measurement will represent can alleviate small-scale (or short-term) fluctuations.
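The effect of replication on the unit average can be sketched with the usual standard-error formula, which assumes the n replicates are independent and share the same standard deviation. The function name and numbers below are hypothetical.

```python
import math

def std_error_of_unit_average(single_measurement_sd, n_replicates):
    """Standard deviation of the average of n independent replicates."""
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    return single_measurement_sd / math.sqrt(n_replicates)

# Hypothetical within-unit standard deviation of 2.0 (same units as the measurement).
one_replicate = std_error_of_unit_average(2.0, 1)    # 2.0
four_replicates = std_error_of_unit_average(2.0, 4)  # 1.0
```

Halving the variability of the unit average therefore takes four times as many replicates, which is why replication is usually weighed against compositing or a larger sample support.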

  36. Statistical Strategies for Improving Between-Sampling-Unit Representativeness • Statistical sampling schemes • simple random sampling • systematic (grid) sampling • stratified random sampling • ranked set sampling • cluster sampling • between-sampling-unit composite sampling
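Three of the schemes listed above can be sketched in a few lines. The example treats the population as a list of sampling-unit identifiers; the function names, the stratum names, and the seed are hypothetical, and ranked set, cluster, and composite sampling are not shown.

```python
import random

def simple_random_sample(unit_ids, n, rng):
    """Every set of n units has the same chance of selection."""
    return rng.sample(unit_ids, n)

def systematic_sample(unit_ids, n, rng):
    """Random start, then every k-th unit along the ordered list (grid analogue)."""
    step = len(unit_ids) // n
    if step < 1:
        raise ValueError("n exceeds the number of units")
    start = rng.randrange(step)
    return [unit_ids[start + i * step] for i in range(n)]

def stratified_random_sample(strata, n_per_stratum, rng):
    """Independent simple random sample within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

rng = random.Random(2024)            # fixed seed so the draw is reproducible
units = list(range(100))             # hypothetical sampling-unit IDs
srs = simple_random_sample(units, 10, rng)
grid = systematic_sample(units, 10, rng)
strat = stratified_random_sample({"upland": units[:60], "lowland": units[60:]}, 5, rng)
```

In each case every unit has a known, nonzero probability of inclusion, which is what makes the design probability-based.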

  37. Balanced Design to Achieve Representativeness • Understanding the relative contribution of within-sampling-unit and between-sampling-unit variance • focus on components of variance to which the total variability is most sensitive • more samples to lower between-sampling-unit variance • more precise measurements to lower within-sampling-unit variance
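A simple way to see which component the total is most sensitive to is the textbook variance of a study mean under a balanced two-level design: between-unit variance divided by the number of units, plus within-unit variance divided by the total number of measurements. This nested model, the function name, and the variance values are assumptions for illustration only.

```python
def variance_of_study_mean(between_var, within_var, n_units, n_reps_per_unit):
    """Variance of the overall mean for n_units units with equal replication."""
    if n_units < 1 or n_reps_per_unit < 1:
        raise ValueError("counts must be at least 1")
    return between_var / n_units + within_var / (n_units * n_reps_per_unit)

# Hypothetical components: between-unit variance dominates.
baseline = variance_of_study_mean(4.0, 1.0, n_units=10, n_reps_per_unit=1)     # 0.5
more_units = variance_of_study_mean(4.0, 1.0, n_units=20, n_reps_per_unit=1)   # 0.25
more_reps = variance_of_study_mean(4.0, 1.0, n_units=10, n_reps_per_unit=2)    # 0.45
```

With these numbers, doubling the number of sampling units halves the variance, while doubling the replicates per unit barely changes it; the opposite conclusion would follow if within-unit variance dominated.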

  38. Assessing Representativeness • Evaluating existing data • representativeness affects the degree to which a data set can be used for a purpose other than originally intended. • Use of a checklist • promotes a thorough evaluation of the attributes of representativeness. • Use of quality assessment samples such as dups, splits, or other replicates • can assist in answering questions about within-sampling-unit representativeness.

  39. Important Attributes (Micro-level) • Was a rationale provided to support the selection of sampling equipment and handling procedures? • correct choice of equipment and handling procedures directly affects the degree to which the increments and samples reflect the characteristics of the matrix. • Was the rationale to support selection of analytical methods provided? • choice of sample preparation and analytical instrument is critical. • Were samples collected from all selected sampling units? • incomplete sampling, if biased due to the lack of completeness, can lead to incorrect conclusions.

  40. Important Attributes (Macro-level) • Were study objectives adequately defined using the DQO process or equivalent planning process? • intended use of data provides the context for evaluating representativeness. • Was the population of interest clearly defined? • probability-based designs require the population to be defined as a set of sampling units. • Was the statistical basis for the sampling plan explained (number of samples, their allocation)? • representativeness hinges on adequate number of samples. • different sample allocation approaches can maximize effectiveness.

  41. Precision Indicators Reflective of the Data Collection Life Cycle • Planning • Implementation • Assessment

  42. Precision • "Precision is the measure of agreement among repeated measurements of the same property under identical or substantially similar conditions." • properties in soil studies • concentration of a constituent, say nitrogen • physical measurement (e.g., grain size) of soil media • a precision DQI is a quantitative indicator of the random errors or fluctuations in the measurement process

  43. Common Indicators of Precision • Range • difference between largest and smallest values • Variance or standard deviation • a statistical measure of the spread of data calculated from two or more measured values • the standard deviation is the square root of the variance • Relative range • the range divided by the mean of the data set • Relative standard deviation (CV) • the standard deviation calculated from two or more values divided by the mean of those values
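The indicators listed above can all be computed from the same set of replicate values. The sketch below uses Python's standard library and the sample (n - 1) form of the variance; the function name and the replicate results are hypothetical.

```python
import statistics

def precision_indicators(values):
    """Range, variance, standard deviation, relative range, and RSD of replicates."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("relative indicators are undefined when the mean is zero")
    value_range = max(values) - min(values)
    std_dev = statistics.stdev(values)  # square root of the sample variance
    return {
        "range": value_range,
        "variance": statistics.variance(values),
        "std_dev": std_dev,
        "relative_range": value_range / mean,
        "rsd": std_dev / mean,
    }

# Hypothetical replicate nitrogen results (mg/kg) from one soil sample.
indicators = precision_indicators([10.0, 12.0, 11.0, 9.0, 13.0])
# range = 4.0, variance = 2.5, relative_range = 4/11
```

The relative indicators are often easier to compare across analytes because they do not depend on the measurement units.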

  44. Framework for Evaluating Indicators of Precision • A simple model allows us to evaluate the components and indicators of total-study variability • within-sampling-unit variability: • measurement process • small-scale variability • sample acquisition • between-sampling-unit variability: • inherent spatial variability • sampling design error • [Diagram: Total-Study Variability divides into Within-Sampling-Unit Variability and Between-Sampling-Unit Variability]

  45. Simple Total-Study Variability Model • Total-Study Variability • Between-Sampling-Unit Variability: Inherent Spatial Variability (among units); Sampling Design Error • Within-Sampling-Unit Variability: Small-Scale Variability (within unit); Sample Collection and Measurement Process Variability
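When each sampling unit has the same number of replicate measurements, the two main branches of this model can be estimated with the standard one-way analysis-of-variance (method of moments) approach. The sketch below is one such estimate, not a method prescribed by the presentation; the function name and data are hypothetical.

```python
import statistics

def variance_components(units):
    """Estimate (between_var, within_var) from equally replicated sampling units.

    units is a list of lists; each inner list holds the replicates for one unit.
    """
    k = len(units)
    m = len(units[0]) if units else 0
    if k < 2 or m < 2 or any(len(u) != m for u in units):
        raise ValueError("need at least 2 units, each with the same number (>= 2) of replicates")
    unit_means = [statistics.mean(u) for u in units]
    # Within-unit mean square: pooled variance of replicates about their unit mean.
    ms_within = statistics.mean([statistics.variance(u) for u in units])
    # Between-unit mean square: m times the variance of the unit means.
    ms_between = m * statistics.variance(unit_means)
    # E[ms_between] = within_var + m * between_var; truncate negative estimates at zero.
    between_var = max(0.0, (ms_between - ms_within) / m)
    return between_var, ms_within

# Hypothetical data: three sampling units, two replicates each.
between_var, within_var = variance_components([[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]])
# between_var = 99.0, within_var = 2.0
```

Here almost all of the total-study variability sits between units, so additional sampling units would improve the study more than additional replicates.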

  46. Sampling Units • A sampling unit (SU) can be defined as the portion of the natural environment (soil, water, plant) for which a measurement has meaning for its intended use. • Defining SUs for a soil, water or plant sampling project allows us to communicate more clearly about components of total-study precision.

  47. Specifying SUs • SUs can vary depending on the specific problem; they can be: • as small as the physical sample itself; • something encompassing multiple physical samples; or • something much larger. • In classical survey design (e.g., opinion survey) the SU is typically an individual. • SUs are less well defined in other types of surveys (e.g., in a survey to determine soil salinity levels). • in this case, a soil sample is much smaller than the area it represents: is the sampling unit the soil sample, the pedon, the farm, or the project? • Consider how data will be used. • average over multiple units, the spatial distribution of units, or some combination?

  48. Alternative Sampling Unit Definitions • Default SU Definition • equivalent to the physical sample taken • Alternative SU Definitions • units comprised of multiple samples to allow for obtaining enough of the medium to perform all desired analyses • units of a size adequate to collect multiple samples (such as collocated samples) • units uniquely defined to measure properties of interest when a sample is not the unit of interest, nearby samples are highly correlated, or there is an explicit desire to control the precision within the unit

  49. Evaluating Sampling Unit Definitions • Defining SUs larger than the physical sample has some potential benefits. • clarifies whether collocated samples should be treated as additional field samples or replicates; • forces us to consider the scale at which measurements have meaning; and • facilitates a more comprehensive consideration of sources of error affecting our understanding of properties of interest, and sources of variability affecting individual measurements. • Most study designs do not account for within-sampling-unit variability in any explicit way. • tradeoffs between fewer precise measurements versus more imprecise measurements begin to address the issue.

  50. Sampling Theory Raises Important Questions Related to Within-SU Error • What is the correct scale at which to sample? • What is the correct protocol for obtaining samples? • pilot studies are needed to determine the nature of the heterogeneity • if concentration of analytes is highly clustered on a scale smaller than the scale of real concern, small grabs will reveal varied results. • if homogenization and sub-sampling do not remove clustering, representation of a sampling unit from a single sample will not be achievable. • sampling protocols should be selected that do not alter the characteristics of the media (e.g., particle-size composition).
