Case Example: Using a Stratified Sampling Design & Field XRF to Reduce the 95% UCL for Residential Soil Lead. Deana Crumbling, EPA/OSRTI/TIFSD crumbling.deana@epa.gov 703-603-0643 2009 EPA Annual Quality Conference.
What things increase the interval between the sample mean & UCL?
• High variability in data set
• Data set is from a non-normal or non-parametric distribution
• Small number of physical samples in the statistical sample

What creates high data variability?
• True changes in matrix concentrations across space
• Inadequate soil sample homogenization
• Artifact caused by small analytical subsample mass
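As a rough illustration of the first and third bullets, the sketch below uses the ordinary Student's t UCL, where the mean-to-UCL distance is t × SD / √n. This assumes roughly normal data; the presentation does not say which UCL method it used, and the function name and SD values are made up for the example.

```python
import math

# One-sided 95% Student's t critical values keyed by degrees of freedom
# (standard table values, rounded to 3 decimals).
T95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943,
       7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812, 11: 1.796, 12: 1.782,
       13: 1.771, 14: 1.761, 15: 1.753}

def mean_to_ucl(sd, n):
    """Distance from the sample mean to the one-sided 95% t-UCL (hypothetical helper)."""
    return T95[n - 1] * sd / math.sqrt(n)

# Hypothetical data sets: same SD (100 ppm), different numbers of samples.
for n in (4, 8, 16):
    print("n =", n, "mean-to-UCL =", round(mean_to_ucl(100.0, n), 1))

# Hypothetical data sets: same n, different SDs.
for sd in (50.0, 100.0, 200.0):
    print("SD =", sd, "mean-to-UCL =", round(mean_to_ucl(sd, 8), 1))
```

Fewer samples widen the interval twice over: √n shrinks and the t value grows.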
Variability as an artifact of small analytical sample mass
As analytical sample volumes increase, data variability decreases & the distribution goes from lognormal to normal (assumes the whole sample is measured).
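The effect can be sketched with a toy "nugget" model. This is an assumption for illustration, not the presentation's own analysis: Pb is taken to sit in discrete particles scattered at random through the soil, so the number caught in a subsample follows a Poisson count. All parameter values and function names are hypothetical. The model gives skewed results at small mass and roughly symmetric results at large mass; it does not produce a literal lognormal distribution.

```python
import random
import statistics

def nugget_count(mass_g, nuggets_per_g, rng):
    """Poisson-distributed number of nuggets in a subsample,
    generated by summing exponential gaps between nuggets."""
    count = 0
    position = rng.expovariate(nuggets_per_g)
    while position <= mass_g:
        count += 1
        position += rng.expovariate(nuggets_per_g)
    return count

def simulated_results(mass_g, nuggets_per_g=0.5, ug_pb_per_nugget=400.0,
                      trials=2000, seed=1):
    """Simulated Pb results (ug/g = ppm) for subsamples of a given mass.
    The whole subsample is assumed to be measured."""
    rng = random.Random(seed)
    return [nugget_count(mass_g, nuggets_per_g, rng) * ug_pb_per_nugget / mass_g
            for _ in range(trials)]

def relative_sd(values):
    """SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

for mass_g in (1, 10, 100):
    results = simulated_results(mass_g)
    print(mass_g, "g:",
          "mean", round(statistics.mean(results)),
          "median", round(statistics.median(results)),
          "relative SD", round(relative_sd(results), 2))
```

The true concentration is the same for every mass (0.5 × 400 = 200 ppm under these assumed parameters); only the scatter and skew of individual results change.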
Reduce the UCL by addressing:
• Variability artifacts
• Non-normal statistical distributions
• Small number of physical samples in the statistical sample
• High variability due to true variation

By procedures that support:
• Sample homogenization
• Increased sample mass
• True changes in matrix concentrations across space

Physical manipulation of sample, increased volume (MIS), and/or sufficient replicate analyses
Can anything be done about true spatial variations in concentration?
(Statistical) Stratified Sampling Design
• Methods for Evaluating the Attainment of Cleanup Standards, Volume 1: Soils and Solid Media, 1989, section 6.4 http://www.cluin.org/download/stats/vol1soils.pdf
• Guidance on Choosing a Sampling Design for Environmental Data Collection (EPA QA/G-5S), 2002, Chap 6. http://www.epa.gov/quality/qs-docs/g5s-final.pdf
• Data Quality Assessment: Statistical Methods for Practitioners (EPA QA/G-9S), 2006, section 3.2.1.3 http://www.epa.gov/quality/qs-docs/g9s-final.pdf
• Purpose: determine the overall mean & UCL for a decision unit (DU) when different sections of the DU have different means & standard deviations (SDs).
What Makes a Stratified Design Different?
Sample results (ppm): 1100, 1040, 120, 184, 155, 18, 20, 25, 22, 16, 15, 21
To calculate the average over the entire area, routine practice is that data go straight into a database, and then…
Sum(all) = 2736; then 2736 ÷ 12 = 228 ppm
“Dividing by 12” assumes equal weight is given to each sample (1/12th of total area)

But the CSM supports partitioning the site into 3 distinct portions based on similar populations:
• High: 1100, 1040 (5% of area; ave = 1070)
• Mid: 120, 184, 155 (20% of area; ave = 153)
• Low: 18, 20, 25, 22, 16, 15, 21 (75% of area; ave = 20)

20(0.75) + 153(0.20) + 1070(0.05) = 99 ppm

Area      High   Mid   Low   Routine       Stratified
Mean      1070   153   20    228           99
SD        42     32    4     398           80
95% UCL                      434 (Δ=206)   143 (Δ=44)

A spatially weighted mean makes a difference!
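The arithmetic on this slide can be checked with a short script. The data and area fractions are taken from the slide; the routine UCL is computed with the ordinary one-sided Student's t formula, which is an assumption about the method. The stratified SD and UCL are not recomputed here because the slide does not state the formula behind them.

```python
import math
import statistics

# Sample results (ppm Pb) from the slide, grouped by stratum,
# with each stratum's fraction of the total area.
strata = {
    "high": {"area_fraction": 0.05, "ppm": [1100, 1040]},
    "mid":  {"area_fraction": 0.20, "ppm": [120, 184, 155]},
    "low":  {"area_fraction": 0.75, "ppm": [18, 20, 25, 22, 16, 15, 21]},
}

all_results = [x for s in strata.values() for x in s["ppm"]]

# "Routine" calculation: every sample gets equal weight (1/12 each).
routine_mean = statistics.mean(all_results)
routine_sd = statistics.stdev(all_results)

# One-sided 95% t critical value for 11 degrees of freedom (table value).
T95_DF11 = 1.796
routine_ucl = routine_mean + T95_DF11 * routine_sd / math.sqrt(len(all_results))

# Stratified calculation: each stratum mean is weighted by its area fraction.
weighted_mean = sum(s["area_fraction"] * statistics.mean(s["ppm"])
                    for s in strata.values())

print("routine mean:", round(routine_mean))
print("routine SD:", round(routine_sd))
print("routine 95% UCL:", round(routine_ucl))
print("area-weighted mean:", round(weighted_mean))
```

Using unrounded stratum means gives a weighted mean slightly below the slide's hand calculation, but both round to 99 ppm.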
Basic Principles of a Stratified Sampling Design
The CSM is the basis for defining both the DU & its strata
• Decision Unit (DU) = a unit for which a decision is made: a single drum, a batch of drums, risk exposure unit, remediation unit, etc.
• The DU is the volume & dimensions over which an average conc is desired
• Strata are created by different release or transport mechanisms, which cause different contaminant patterns within the DU
• Target properties like conc level & variability differ from stratum to stratum within the DU
Basic Principles (cont’d)
• DU is delineated (stratified) into non-overlapping subsections according to the CSM
• Each stratum’s area/volume is recorded as a fraction of the DU’s area/volume
• Each stratum’s conc mean & SD are determined
• The means & SDs are weighted and mathematically combined to give the overall mean & UCL for the DU
• Can apply stratification to data analysis even if not planned into sampling, but must have spatial info & final CSM available
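The presentation does not spell out how the means & SDs are combined. One common textbook form, given here as an assumption rather than as the author's exact procedure, weights each stratum by its area fraction W_h and approximates the degrees of freedom with a Satterthwaite-type formula. The symbols L (number of strata), n_h (samples in stratum h), x̄_h and s_h (stratum mean and SD) are introduced for this sketch; the guidance documents cited earlier give the official procedures.

```latex
\bar{x}_{st} = \sum_{h=1}^{L} W_h \,\bar{x}_h ,
\qquad
s_{\bar{x}_{st}} = \sqrt{\sum_{h=1}^{L} \frac{W_h^{2}\, s_h^{2}}{n_h}}

df \approx
\frac{\left( \sum_{h=1}^{L} \dfrac{W_h^{2} s_h^{2}}{n_h} \right)^{2}}
     {\sum_{h=1}^{L} \dfrac{1}{n_h - 1}\left( \dfrac{W_h^{2} s_h^{2}}{n_h} \right)^{2}}

\mathrm{UCL}_{95} = \bar{x}_{st} + t_{0.95,\,df}\; s_{\bar{x}_{st}}
```

These formulas are not guaranteed to reproduce the SD and UCL figures shown in the presentation, which may have been calculated differently.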
Benefits of a Stratified Sampling Design
• Small areas of very high or low conc do not bias the overall mean of the DU.
• Reduces variability (SD) in the DU data set
• Reduces statistical uncertainty (as distance between mean & UCL)
• Preserves spatial information to identify source/transport mechanisms & support remedial design.
Case Example: XRF with stratified sampling design
• Properties in old town near Pb battery recycling plant
• XRF Pb data from bagged soil samples (~300 gram)
[Photo: plastic bag of soil]
Data Collection Design
• Property divided into 3 sections (strata)
  • Front yard (likely “same” conc within & own SD)
  • Side yard (ditto)
  • Back yard (ditto)
• Each stratum divided into 5 ~equal subsections (sample units)
• 1 grab (or MIS) sample (300-400 g) per sample unit, placed into a plastic bag
• 5 sample units/stratum or 15 sample units/DU (the EU)

Decision Goals
• Resolve confusion over past conflicting data.
• Determine mean (95% UCL) for exposure unit (entire yard): 500 ppm risk-based A/L; if over, clean up high contamination areas
• Pb source? Suggested by spatial contaminant pattern (does facility have liability?)
Preliminary CSM of Simplified Property
[Figure: house footprint with front yard (5 samples), side yard (5 bagged samples), and back yard (5 samples)]
Action Level (entire yard) = 500 ppm
Area fractions shown: 0.25, 0.15, 0.60
• Potential release: Traffic (facility truck, Pb gasoline); Pb house paint; facility’s atmospheric deposition; combination. Expected Pb conc: Higher.
• Potential release: Pb paint; atmos dep. Pb conc: Uncertain (near road, house?)
• Potential release: Pb paint (near structures); atmos dep. Expected Pb conc: Lower.
XRF Bag Analysis
• Four 30-sec XRF readings on each bag (2 on front & 2 on back)
• Results entered real-time into pre-programmed spreadsheet
• Spreadsheet immediately calculates:
  1. ave & SD for each bag
  2. ave & SD within each stratum (yard section)
  3. ave & UCL for the decision unit (entire property)
  4. the greater of within-bag vs. between-bag variability
• If statistical uncertainty interferes w/ desired decision confidence for DU:
  • Use #4 & a series of decision trees to reduce statistical uncertainty until a confident decision is possible
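A minimal sketch of what such a spreadsheet might compute is shown below. The readings are invented, only two bags per stratum are used to keep it short (the design calls for five), and the assignment of area fractions to yard sections is an assumption. The DU-level UCL is left out because the presentation does not give its formula.

```python
import math
import statistics

def summarize(readings, area_fractions):
    """readings: {stratum: [[XRF readings for one bag], ...]} (hypothetical layout)."""
    bag_stats, stratum_stats = {}, {}
    for stratum, bags in readings.items():
        bag_means = [statistics.mean(b) for b in bags]
        bag_sds = [statistics.stdev(b) for b in bags]
        bag_stats[stratum] = list(zip(bag_means, bag_sds))
        # Pooled within-bag SD (valid as written when all bags have equal readings).
        within_sd = math.sqrt(statistics.mean([sd ** 2 for sd in bag_sds]))
        # Between-bag SD: spread of the bag averages within the stratum.
        between_sd = statistics.stdev(bag_means)
        stratum_stats[stratum] = {
            "mean": statistics.mean(bag_means),
            "sd": between_sd,
            "within_bag_sd": within_sd,
            "larger_source": "within-bag" if within_sd > between_sd else "between-bag",
        }
    # Area-weighted mean for the whole decision unit.
    du_mean = sum(area_fractions[s] * stratum_stats[s]["mean"] for s in readings)
    return bag_stats, stratum_stats, du_mean

# Invented XRF readings (ppm Pb): 4 readings per bag, 2 bags per stratum.
readings = {
    "front": [[620, 580, 640, 600], [710, 690, 730, 670]],
    "side":  [[300, 340, 280, 320], [410, 390, 430, 370]],
    "back":  [[110, 90, 120, 80], [150, 130, 160, 120]],
}
# Assumed pairing of area fractions with yard sections.
area_fractions = {"front": 0.25, "side": 0.15, "back": 0.60}

bag_stats, stratum_stats, du_mean = summarize(readings, area_fractions)
for name, stats in stratum_stats.items():
    print(name, stats)
print("DU area-weighted mean:", du_mean)
```

Knowing which variability source is larger points to the next step: more readings per bag or better mixing if within-bag variability dominates, more bags if between-bag variability dominates.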
Minimizing Variability Improves Statistical Confidence in EPCs
NOTE: The “routine” calculation applies the same weighting to all data points, & the database loses their spatial representativeness
Note: ½ CI width = mean-to-UCL width
Data Used to Mature the CSM
• Preliminary CSM: an informed hypothesis about strata boundaries
• Mature CSM: data confirm or modify the hypothesis about strata boundaries
Progressive Data Uncertainty Management
* Normal z-distribution used for the XRF instrument’s counting statistics; the rest of the rows use the t-distribution
Questions ? Deana M. Crumbling, M.S. U.S. EPA, Office of Superfund Remediation & Technology Innovation 1200 Pennsylvania Ave., NW (5203P) Washington, DC 20460 PH: (703) 603-0643 crumbling.deana@epa.gov www.triadcentral.org