DATA & STATISTICS 101

DATA & STATISTICS 101 Presented by Stu Nagourney NJDEP, OQA

Precision, Accuracy and Bias • Precision: Degree of agreement between a series of measured values under the same conditions • Accuracy: Degree of agreement between the measured and the true value • Bias: Error caused by some aspect of the measurement system
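The three definitions above can be made concrete with a small calculation. The sketch below is illustrative only: the replicate values, the reference value of 10.0, and the function name `bias_and_precision` are assumptions and do not come from the slides. Bias is taken as the signed difference between the mean result and the true value, and precision as the sample standard deviation of the replicates.

```python
import statistics

def bias_and_precision(measurements, true_value):
    """Return (bias, precision) for replicate measurements of a known reference."""
    mean = statistics.mean(measurements)
    # Bias: signed offset of the mean result from the true value
    bias = mean - true_value
    # Precision: agreement among replicates, here the sample standard deviation
    precision = statistics.stdev(measurements)
    return bias, precision

# Hypothetical replicate results for a reference standard whose true value is 10.0
replicates = [10.4, 10.5, 10.6, 10.5, 10.5]
bias, precision = bias_and_precision(replicates, true_value=10.0)
print(f"bias = {bias:.3f}, precision (s) = {precision:.3f}")
```

In this made-up data set the replicates agree closely with each other (good precision) yet sit about 0.5 units above the reference (poor accuracy), which is the pattern a biased measurement system produces.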


Sources of Error • Systematic Errors: Bias always in the same direction, and constant no matter how many measurements are made • Random Errors: Vary in sign and are unpredictable. Average to 0 if enough measurements are made • Blunders: The occasional mistake that produces erroneous results; can be minimized but never eliminated
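The difference between systematic and random errors can be illustrated with a short simulation. This is a sketch under stated assumptions: the true value (100.0), the systematic offset (2.0), the Gaussian model for random error, and the helper name `simulate_measurements` are all hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_measurements(true_value, systematic_offset, random_sd, n, seed=0):
    """Simulate n measurements with a constant offset plus Gaussian random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_value + systematic_offset + rng.gauss(0.0, random_sd)
            for _ in range(n)]

TRUE_VALUE = 100.0  # assumed true value
OFFSET = 2.0        # assumed systematic error, always in the same direction

few = simulate_measurements(TRUE_VALUE, OFFSET, random_sd=1.0, n=5)
many = simulate_measurements(TRUE_VALUE, OFFSET, random_sd=1.0, n=10000)

# Error of the mean relative to the true value
error_few = statistics.mean(few) - TRUE_VALUE
error_many = statistics.mean(many) - TRUE_VALUE
print(f"error of mean, n=5:     {error_few:.3f}")
print(f"error of mean, n=10000: {error_many:.3f}")
```

With many measurements the random component averages toward zero, so the error of the mean settles near the systematic offset. Taking more measurements does not remove the offset itself.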

Applying Statistics • One cannot sample every entity of an entire system or population. Statistics provides estimates of the behavior of an entire system or population, provided that: • Measurement system is stable • Individual measurements are all independent • Individual measurements are random representatives of the system or population

Distributions • Data generated by a measurement process generally have the following properties: • Results spread symmetrically around a central value • Small deviations from the central value occur more often than large deviations • The frequency distribution of a large amount of data approximates a bell-shaped curve • The mean of even a small set of data represents the overall population better than individual values do
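The last property, that means of small sets represent the population better than individual values, can be checked numerically. The sketch below assumes a normally distributed measurement process with a mean of 50.0 and a standard deviation of 5.0, and groups the simulated results into sets of four. All of these numbers are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Assumed measurement process: normal, mean 50.0, standard deviation 5.0
individuals = [rng.gauss(50.0, 5.0) for _ in range(4000)]

# Group the individual results into small sets and take the mean of each set
SET_SIZE = 4
set_means = [statistics.mean(individuals[i:i + SET_SIZE])
             for i in range(0, len(individuals), SET_SIZE)]

spread_individuals = statistics.stdev(individuals)
spread_means = statistics.stdev(set_means)
print(f"spread of individual values: {spread_individuals:.2f}")
print(f"spread of means of {SET_SIZE}:       {spread_means:.2f}")
```

The means scatter less around the central value than the individual results do. For independent measurements the spread of the means shrinks by roughly the square root of the set size, so sets of four give about half the spread.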

“Normal” Distribution

Other Distributions

Issues with Distributions • For large amounts of data, distributions are easy to define. For smaller data sets, it is harder to define a distribution. • Deviations from “normal” distributions: • Outliers that are not representative of the population • Shifts in operational characteristics that skew the distribution • Large point-to-point variations that cause broadening

Estimation of Standard Deviation • The basic parameters that characterize a population are: • Mean ($\mu$) • Standard Deviation ($\sigma$) • Unless the entire population is examined, $\mu$ and $\sigma$ cannot be known. They can only be estimated from a representative sample by: • Sample Mean ($\bar{X}$) • Estimate of Standard Deviation ($s$)

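Measures of Central Tendency & Variability • Central Tendency: the value about which the individual results tend to “cluster” • Mean: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_n) / n$ • Median: Middle value of an odd number of results when listed in order (for an even number of results, the average of the two middle values) • Standard Deviation: $s = \left[ \sum_i (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}$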
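The formulas above translate directly into code. The following is a minimal sketch with a hypothetical set of five results; the function name `summarize` is an assumption made for illustration. It follows the slide's definitions step by step rather than calling a library routine, although Python's `statistics.stdev` gives the same value of s.

```python
import math

def summarize(results):
    """Return (mean, median, s) using the definitions on the slide."""
    n = len(results)
    mean = sum(results) / n
    ordered = sorted(results)
    if n % 2 == 1:
        median = ordered[n // 2]  # middle value of an odd number of results
    else:
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # Sum of squared deviations from the mean, divided by n - 1
    s = math.sqrt(sum((x - mean) ** 2 for x in results) / (n - 1))
    return mean, median, s

# Hypothetical results, for illustration only
results = [4.1, 4.3, 4.0, 4.6, 4.2]
mean, median, s = summarize(results)
print(f"mean = {mean:.2f}, median = {median:.1f}, s = {s:.3f}")
```

Dividing by n - 1 rather than n reflects that the mean was itself estimated from the same sample, which leaves n - 1 degrees of freedom.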


Statistics • If you make several sets of measurements from a normal distribution, you will get different means and standard deviations • Even the best scientist and/or laboratory will have measurement differences when examining the same sample (system) • What needs to be defined is the confidence in measurement data and the significance of any differences


Does a Measured Value Differ from an Expected Value? • Confidence Interval of the Mean (CI): The range around the sample mean within which the population mean is expected to lie at a stated level of confidence • $CI = \bar{X} \pm t \, s / \sqrt{n}$, where the value of $t$ depends on the level of confidence desired and the number of degrees of freedom ($n - 1$)
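The confidence interval calculation can be sketched as follows. The data set, the expected values, and the function names are hypothetical. The t values are the standard two-sided 95% Student t critical values for 1 to 10 degrees of freedom, hard-coded so the example needs no external library. Choosing the 95% level is an assumption, since the slide does not specify one.

```python
import math
import statistics

# Two-sided 95% Student t critical values, keyed by degrees of freedom (n - 1)
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def confidence_interval_95(values):
    """Return (low, high) of the 95% CI of the mean; supports 2 to 11 values."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)      # estimate of standard deviation, n - 1 divisor
    t = T_95[n - 1]                   # degrees of freedom = n - 1
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width

def differs_from_expected(values, expected):
    """True if the expected value lies outside the 95% CI of the mean."""
    low, high = confidence_interval_95(values)
    return not (low <= expected <= high)

# Hypothetical results, for illustration only
results = [4.1, 4.3, 4.0, 4.6, 4.2]
low, high = confidence_interval_95(results)
print(f"95% CI of the mean: {low:.3f} to {high:.3f}")
print("differs from 4.5?", differs_from_expected(results, 4.5))
print("differs from 5.0?", differs_from_expected(results, 5.0))
```

If the expected value falls inside the interval, the data give no grounds at that confidence level for saying the measured mean differs from it. A value outside the interval suggests a significant difference.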


Criteria for Rejecting an Observation • One can always reject a data point if there is an assignable cause • If not, evaluate using statistical techniques • Common Outlier Tests • Dixon (Q) Test • Grubbs Test • Youden Test • Student t Test

Criteria for Rejecting an Observation: Dixon (Q) Test
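The slide's own worked material for this test is not available here, so what follows is a sketch of the Dixon Q test in its most common textbook form. Q is the gap between the suspect value and its nearest neighbor, divided by the range of the data, and the suspect value is rejected if Q exceeds a tabulated critical value. The critical values below are the commonly tabulated ones for 90% confidence and 3 to 10 observations. The confidence level, the data set, and the function names are assumptions, not taken from the presentation.

```python
# Commonly tabulated Dixon Q critical values at 90% confidence, keyed by n
Q_CRIT_90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560,
             7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412}

def dixon_q(values):
    """Return (suspect_value, Q) for whichever end of the data is more extreme."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = ordered[1] - ordered[0]     # gap between lowest value and its neighbor
    gap_high = ordered[-1] - ordered[-2]  # gap between highest value and its neighbor
    if gap_high >= gap_low:
        return ordered[-1], gap_high / data_range
    return ordered[0], gap_low / data_range

def q_test(values):
    """Return (suspect_value, Q, reject) at 90% confidence; needs 3 to 10 values."""
    suspect, q = dixon_q(values)
    return suspect, q, q > Q_CRIT_90[len(values)]

# Hypothetical data set with one suspiciously low result
data = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]
suspect, q, reject = q_test(data)
print(f"suspect = {suspect}, Q = {q:.3f}, reject = {reject}")
```

The test is normally applied once to a data set, to the single most extreme value. A statistical rejection is not a substitute for looking for an assignable cause first.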

Control Charts
