1 / 16

POOLED DATA DISTRIBUTIONS

National Research Conseil national Council Canada de recherches. POOLED DATA DISTRIBUTIONS. GRAPHICAL AND STATISTICAL TOOLS FOR EXAMINING COMPARISON REFERENCE VALUES Alan Steele, Ken Hill, and Rob Douglas National Research Council of Canada E-mail: alan.steele@nrc.ca.

yama
Download Presentation

POOLED DATA DISTRIBUTIONS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. National Research Conseil national Council Canada de recherches POOLED DATA DISTRIBUTIONS GRAPHICAL AND STATISTICAL TOOLS FOR EXAMINING COMPARISON REFERENCE VALUES Alan Steele, Ken Hill, and Rob Douglas National Research Council of Canada E-mail: alan.steele@nrc.ca • Measurement comparison data sets are generally summarized using a simple statistical reference value calculated from the pool of the participants’ results. Consideration of the comparison data sets, particularly with regard to the consequences and implications of such data pooling, can allow informed decisions regarding the appropriateness of choosing a simple statistical reference value. Graphs of the relevant distributions provide insight to this problem.

  2. Introduction • Comparison data collection and analysis continues to grow in importance among the tasks of international metrology • Sample distributions and populations are routinely considered when preparing the summary of the comparison • Reference values (KCRVs) are often calculated from the measurement data supplied by the participants • We believe that graphical techniques are an aid to understanding and communication in this field Steele, Hill, and Douglas: Pooled Data Distributions

  3. The Normal Approach • Generally, initial implicit assumption is to consider that all of the participants’ data, as xi/ui, represent individual samples from a single (normal) population • A coherent picture of the population mean and standard deviation can be built from the comparison data set that is fully consistent with the reported values and uncertainties • Most outlier-test protocols rely on this assumption to identify when and if a given laboratory result should be excluded, since its inclusion would violate this internal consistency Steele, Hill, and Douglas: Pooled Data Distributions

  4. Pooled Data Distributions • Creating pooled data distributions tackles this problem from the opposite direction • The independent distributions reported by each participant (through their value and uncertainty) are summed directly • Result is taken as representative of the underlying population as revealed in the comparison measurements • Monte Carlo methods are useful when calculations involve Student distributions or medians rather than means Steele, Hill, and Douglas: Pooled Data Distributions

  5. Monte Carlo Calculations • High quality linear congruent uniform random number generators are easy to find • Transformation from uniform to any distribution done via cumulative distribution • Example shows Student distribution transform • Our Excel Toolkit includes an external DLL for doing fast Monte Carlo simulations with multiple large arrays Steele, Hill, and Douglas: Pooled Data Distributions

  6. Dealing with Student Distributions • Student Cumulative Distribution Functions for different Degrees of Freedom (n = 2…10) • Note that the line at 97.5% cumulative probability crosses each curve at the coverage factor, k, appropriate for a 95% confidence interval Steele, Hill, and Douglas: Pooled Data Distributions

  7. Example Data From KCDB • Recent results for CCAUV.U-K1 • Low power, 1.9 MHz: 5 Labs • Finite degrees of freedom specified for all participants • Data failed consistency check using weighted mean • Median chosen as KCRV Steele, Hill, and Douglas: Pooled Data Distributions

  8. Statistical Distributions • Results of Monte Carlo simulation: • lab distributions used to resample comparison • pooled data histogram incremented once for each lab per event • mean, weighted mean, and median calculated for each event • Population revealed by measurement is multi-modal and evidently not normal Steele, Hill, and Douglas: Pooled Data Distributions

  9. Statistical Distributions • Results of Monte Carlo simulation: • lab distributions used to resample comparison • pooled data histogram incremented once for each lab per event • mean, weighted mean, and median calculated for each event • Population revealed by measurement is multi-modal and evidently not normal Steele, Hill, and Douglas: Pooled Data Distributions

  10. Advantages of Monte Carlo • Technique is simple to implement • Allows calculation of confidence intervals for statistics • Covariances can be accommodated in straightforward manner • Possible to include outlier rejection schemes • Easy to track quantities of interest, such as probability of a given participant being median laboratory • Can consider other candidate reference values Steele, Hill, and Douglas: Pooled Data Distributions

  11. Example: CCT-K3 Argon Point • Another example from KCDB • CCT-K3 Argon Triple Point • Large variation in reported values • Large variation in stated uncertainties • No KCRV was assigned, based on data pooling analysis Steele, Hill, and Douglas: Pooled Data Distributions

  12. Algorithmic Reference Values • Linear combinations of simple estimators can be used as robust estimators of location • For CCT-K3, proposal to use simple average of mean, weighted mean, and median • Evaluation of any such algorithmic estimator is easy to do with Monte Carlo Steele, Hill, and Douglas: Pooled Data Distributions

  13. Quantifying the Comparison • Calculating a reference value – typically the variance-weighted mean or the median - is a routine part of reporting comparisons • The suitability of these statistics for representing the data set can be checked using chi-squared testing • It is also possible to perform such tests without invoking a reference value by considering the data in pair wise fashion • Advantages of pair-statistics • Always works, even before choosing a reference value • More rigorous, since can handle correlations exactly • Explicit, following metrological chains of inference Steele, Hill, and Douglas: Pooled Data Distributions

  14. Pair-Difference Distributions • Similar to exclusive statistics • Consider difference between one lab and “rest of world” • Sum of per-lab differences is the all-pairs-difference (APD) distribution; this is symmetric • Width of APD is a measure of “global” quality assurance for independent calibration of an artifact by two different labs chosen at random Steele, Hill, and Douglas: Pooled Data Distributions

  15. Reduced Chi-Squared Testing • Normalizing the pair differences by the pair uncertaintiesallows us to build tests of the measurement capability claims • This is still independent of any chosen reference value • This All Pairs Difference reduced 2 has N-1 degrees of freedom • If a data setfails the APD 2 test, it will fail for every possible KCRV APD Steele, Hill, and Douglas: Pooled Data Distributions

  16. Conclusions • Monte Carlo technique is fast and simple to implement • Graphs provide a powerful tool for visual consideration of: • Pooled data (sum distribution) • Simple Estimators (mean, weighted mean, median) • Other Estimators (any algorithm can be used) • All-pairs reduced chi-squared statistic is egalitarian over participants, and independent of choice of KCRV • No single choice of KCRV can adequately represent a comparison that fails the all-pairs-difference chi-squared test Steele, Hill, and Douglas: Pooled Data Distributions

More Related