Re-testing data when the part exceeds the whole. Elgin Perry, Ph.D., statistics consultant, and Bill Romano, MD Dept. of Natural Resources. Analytical Methods and Quality Assurance Workgroup. 21 September 2007.
The issue • On occasion, labs report results in which a part exceeds the whole: PO4 exceeds TDP, or NO23 exceeds TDN • Labs typically re-test these results if the difference (e.g., PO4 – TDP) exceeds the sum of the two method detection limits (MDLs), that is, the MDL of PO4 plus the MDL of TDP, or the MDL of NO23 plus the MDL of TDN, which indicates that the difference exceeds measurement error • This is equivalent to saying that the confidence intervals for the two measurements do not overlap.
Alternatives to sum of MDLs • A slightly more powerful test for a difference can be obtained by computing a z-score • The difference is converted to a z-score by dividing it by the standard deviation of the difference: $z = (x_1 - x_2)/\sigma_{\text{diff}}$ • The z-score can be compared to a normal probability table to determine whether the difference exceeds measurement error
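One way to write the z-score out in full is shown below. It assumes the two measurement errors are independent, so that their variances add; the slides do not state how the standard deviation of the difference is obtained, so the second formula is an assumption rather than the authors' stated method.

```latex
\[
z = \frac{x_1 - x_2}{\sigma_{\text{diff}}},
\qquad
\sigma_{\text{diff}} = \sqrt{\sigma_1^2 + \sigma_2^2}
\quad \text{(assuming independent errors)}
\]
```

Here $x_1$ is the part (e.g., PO4 or NO23), $x_2$ is the whole (TDP or TDN), and $\sigma_1$, $\sigma_2$ are the standard deviations of the two measurements.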
Important assumption • For both the sum-of-MDLs test and the z-score test, an important assumption is that measurement precision remains constant across concentrations • If this is true, the variance estimate used to create the MDL can be used for either method
MDL calculation • MDLs are calculated from seven aliquots of a low-level sample that are processed through the entire analytical method • The standard deviation (s) of the aliquots is calculated from the analytical results and then multiplied by the Student's t value t(n-1, 0.99), which is 3.143 for n = 7 • Using the MDL-based standard deviation, in either the sum-of-MDLs test or the z-score test, therefore assumes that low- and high-concentration samples have the same standard deviation
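The MDL calculation described above can be sketched in a few lines of Python. The function name `mdl_from_aliquots` and the aliquot values are hypothetical and chosen only for illustration; the multiplier 3.143 is the one quoted on the slide.

```python
import statistics

# Student's t for n - 1 = 6 degrees of freedom at the 0.99 level,
# as quoted on the slide.
T_6_099 = 3.143


def mdl_from_aliquots(results):
    """Return (s, MDL) for seven replicate results of a low-level sample."""
    if len(results) != 7:
        raise ValueError("the multiplier 3.143 applies only to n = 7 aliquots")
    s = statistics.stdev(results)  # sample standard deviation (n - 1 denominator)
    return s, T_6_099 * s


# Hypothetical aliquot results, for illustration only (not from the slides).
aliquots = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.012]
s, mdl = mdl_from_aliquots(aliquots)
print(f"s = {s:.5f}, MDL = {mdl:.5f}")
```

Dividing a reported MDL by 3.143 recovers the low-level standard deviation, which is the quantity both tests rely on.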
A recent example • NO23 = 2.712 • TDN = 2.697 • NO23 – TDN = 0.015 • Sum of MDLs = 0.003 + 0.006 = 0.009 • The difference exceeds the sum of MDLs • z-score ≈ 7 (a really big number!) • Percent difference = 0.56% (a really small number)
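The numbers on this slide can be reproduced under assumptions that the slides do not state explicitly: each standard deviation is taken as MDL / 3.143, the two errors are treated as independent, and the percent difference is taken relative to TDN. The function names below are hypothetical.

```python
import math

T_6_099 = 3.143  # multiplier that turns a standard deviation into an MDL


def sd_from_mdl(mdl):
    """Recover the low-level standard deviation implied by an MDL."""
    return mdl / T_6_099


def z_score(part, whole, mdl_part, mdl_whole):
    """z-score of (part - whole), assuming independent errors."""
    sigma_diff = math.hypot(sd_from_mdl(mdl_part), sd_from_mdl(mdl_whole))
    return (part - whole) / sigma_diff


# Values from the slide. Which MDL belongs to which analyte is an
# assumption; the result does not depend on the assignment.
no23, tdn = 2.712, 2.697
mdl_no23, mdl_tdn = 0.003, 0.006

diff = no23 - tdn
sum_mdl = mdl_no23 + mdl_tdn
z = z_score(no23, tdn, mdl_no23, mdl_tdn)
pct = 100 * diff / tdn  # assumed denominator: TDN

print(f"difference = {diff:.3f}, sum of MDLs = {sum_mdl:.3f}")
print(f"exceeds sum of MDLs: {diff > sum_mdl}")
print(f"z = {z:.2f}, percent difference = {pct:.2f}")
```

Under these assumptions the sketch gives a z-score of about 7 and a percent difference of about 0.56, matching the slide.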
Re-measurement of same sample • NO23 = 2.779 • TDN = 2.788 • In the replicate analysis of the same sample, TDN is greater than NO23, as it should be • Thus the NO23 result exceeding TDN in the first analysis was probably due to measurement error
Examine constant variance assumption • The preceding example calls into question the assumption of constant variance • To examine this question, use graphical analysis of lab replicate data.
Standard deviation versus the mean of NO23 replicates (plot) • Standard deviation increases as the mean concentration increases
Standard deviation versus the mean of TDN replicates (plot) • Standard deviation is fairly constant as concentration increases
Standard deviation versus the mean of PO4 replicates (plot) • Standard deviation increases as concentration increases
Standard deviation versus the mean of TDP replicates (plot) • Standard deviation increases as concentration increases
Testing the constant variance assumption • Increasing variance was modeled as a step function of the mean concentration • The step point was chosen from the graphs to differentiate between high and low variance groups • The standard deviation of each group was estimated from the root mean square error • An F-ratio was calculated to test for equal variance between groups
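The procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the replicate data and the step point are hypothetical, and the pooled within-sample mean square error is used as the variance estimate for each group.

```python
import math


def pooled_variance(replicate_sets):
    """Within-sample mean square error and its degrees of freedom."""
    ss, df = 0.0, 0
    for reps in replicate_sets:
        mean = sum(reps) / len(reps)
        ss += sum((r - mean) ** 2 for r in reps)
        df += len(reps) - 1
    return ss / df, df


def f_ratio_by_step(replicate_sets, step_point):
    """Split samples at step_point (by replicate mean) and compare variances."""
    low = [r for r in replicate_sets if sum(r) / len(r) < step_point]
    high = [r for r in replicate_sets if sum(r) / len(r) >= step_point]
    var_low, df_low = pooled_variance(low)
    var_high, df_high = pooled_variance(high)
    return {
        "F": var_high / var_low,
        "df_high": df_high,
        "df_low": df_low,
        "sd_low": math.sqrt(var_low),    # root mean square error, low group
        "sd_high": math.sqrt(var_high),  # root mean square error, high group
    }


# Hypothetical duplicate analyses, for illustration only.
data = [(0.10, 0.11), (0.20, 0.19), (0.15, 0.16),
        (2.50, 2.58), (3.10, 3.02), (2.80, 2.90)]
result = f_ratio_by_step(data, step_point=1.0)
print(result)
```

The F value would then be compared with an F distribution on (df_high, df_low) degrees of freedom, for example from a table or with `scipy.stats.f.sf` if SciPy is available. The sketch does not guard against an empty group or a zero variance.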
Testing for constant variance • In all cases the degrees of freedom exceeded 100.
Histogram of NO23 residuals for low variance group • The distribution of the residuals is symmetric about the mean, but heavy-tailed.
Plot of the standard normal probability density function • From: Engineering Statistics Handbook, http://www.itl.nist.gov/div898/handbook/eda/section3/eda3661.htm
Normal probability plot of NO23 residuals for low variance group
Histogram of NO23 residuals for high variance group • The distribution of the residuals is symmetric about the mean, but heavy-tailed.
Normal probability plot of NO23 residuals for high variance group
Comparing the methods • Using a z-score cutoff of 2 would require re-testing the most pairs of data (most conservative, most work) • Using a z-score cutoff of 3 would require re-testing fewer pairs • Using the sum of MDLs would require re-testing still fewer pairs • The stratified variance approach appears to require re-testing the fewest pairs
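A z-score test with stratified variance could look roughly like the sketch below. The class `StratifiedSD`, the function `needs_retest`, the step points, and the standard deviations are all hypothetical; the slides do not give the fitted values. Independent errors and a one-sided test (only the part exceeding the whole is flagged) are also assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class StratifiedSD:
    """Step-function standard deviation: one value below the step, one above."""
    step_point: float
    sd_low: float
    sd_high: float

    def sd_at(self, conc):
        return self.sd_low if conc < self.step_point else self.sd_high


def needs_retest(part, whole, part_sd, whole_sd, z_crit=3.0):
    """Flag a pair for re-testing when the part exceeds the whole by more than z_crit SDs."""
    sigma_diff = math.hypot(part_sd.sd_at(part), whole_sd.sd_at(whole))
    z = (part - whole) / sigma_diff
    return z > z_crit, z


# Hypothetical step points and standard deviations, for illustration only.
no23_sd = StratifiedSD(step_point=1.0, sd_low=0.001, sd_high=0.010)
tdn_sd = StratifiedSD(step_point=1.0, sd_low=0.002, sd_high=0.012)

flag_high, z_high = needs_retest(2.712, 2.697, no23_sd, tdn_sd)
flag_low, z_low = needs_retest(0.515, 0.500, no23_sd, tdn_sd)
print(flag_high, round(z_high, 2))
print(flag_low, round(z_low, 2))
```

With these made-up values, the same difference of 0.015 is not flagged at high concentration but is flagged at low concentration, which is the behavior the stratified approach is meant to produce.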
Recommendations • MDLs underestimate the measurement error at higher concentrations • Heavy-tailed distributions suggest that big differences between replicates should be trimmed before estimating precision • Use a z-score test based on stratified variance estimates, e.g.,