Statistical Chemistry Yudonob’s lecture notes
The Two Types of Errors are
• Determinate Errors
• Random Errors
· Determinate Errors – systematic errors that make a measurement consistently too high or too low and that can be traced to an identifiable source. Examples include
• use of an uncalibrated or faulty tool or instrument
• use of wrong values, such as a molar mass or conversion factor
A good way to detect determinate errors is to analyze the same material by different methods.
· Random Errors – errors that vary unpredictably in size and sign from one measurement to the next. They remain even when a calibrated instrument is used correctly at the limit of its sensitivity. For example, when you re-weigh the same object on an analytical balance (sensitivity 0.1 mg), the last digit varies. In this chapter we focus on ways to evaluate random errors.
o All measurements have some random error; no measurement is free of experimental error. Statistics allows us to accept conclusions that have a high probability of being correct and to reject conclusions that have a low probability of being correct.
o Statistics applies only to random errors; the analyst should eliminate all determinate errors before making sensitive measurements.
Random errors follow a Gaussian distribution of values about the central measurement.
A Gaussian distribution is characterized by its mean value and its standard deviation.
Mean value or average – a measure of central tendency:
$x_{\text{mean}} = \frac{\sum_i x_i}{n}$
where $x_i$ represents each individual measurement, Σ denotes summation over i, and n is the number of measurements in the data set.
Standard deviation – a measure of the width of the distribution about the central value:
$s = \sqrt{\frac{\sum_i (x_i - x_{\text{mean}})^2}{n - 1}}$
This definition is for a limited (small) set of data. For a large set of data the standard deviation is denoted σ and is defined as
$\sigma = \sqrt{\frac{\sum_i (x_i - x_{\text{mean}})^2}{n}}$
As the size of the data set increases, (n – 1) → n, so s → σ. Analytical chemists ordinarily use s for the standard deviation because we typically deal with small data sets.
The larger the value of s, the broader the Gaussian curve.
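Python's standard `statistics` module makes the (n – 1) versus n distinction concrete: `statistics.stdev` divides by n – 1, like s, while `statistics.pstdev` divides by n, like σ. This minimal sketch reuses the chloride data from the worked problem in these notes. For the same data the ratio s/σ is exactly √(n/(n – 1)), which tends to 1 as n grows.

```python
import math
import statistics

# Chloride results (%) taken from the worked problem in these notes
data = [18.56, 18.65, 18.49, 18.54, 18.70, 18.53]
n = len(data)

s = statistics.stdev(data)       # (n - 1) in the denominator: small-data-set s
sigma = statistics.pstdev(data)  # n in the denominator: large-data-set formula

print(f"s = {s:.4f}, sigma = {sigma:.4f}")
print(f"s/sigma = {s / sigma:.4f}, sqrt(n/(n - 1)) = {math.sqrt(n / (n - 1)):.4f}")

# The ratio sqrt(n/(n - 1)) shrinks toward 1 as n grows, so s approaches sigma
for size in (2, 6, 30, 1000):
    print(f"n = {size:4d}: s/sigma = {math.sqrt(size / (size - 1)):.4f}")
```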
The relative standard deviation is the standard deviation divided by the mean value, s / xmean. It may be expressed in % (parts per hundred) or ppt (parts per thousand):
Relative standard deviation (%) = s × 100 / xmean
Relative standard deviation (ppt) = s × 1000 / xmean
Other important terms
• Median – the middle value in an ordered set (ascending or descending); when n is even, it is the average of the two middle values.
• Range – the difference between the highest and lowest values in the data set. It may be stated either as (High – Low) or as the numerical value of that difference. For example, in a set of data where 25.11 is the highest value and 24.85 the lowest, the range may be given as (25.11 – 24.85) or 0.26.
Problem – Find the mean, median, standard deviation, relative standard deviation, and range of the following set of student data acquired in the analysis of chloride in a sample:
xi: 18.56%, 18.65%, 18.49%, 18.54%, 18.70%, 18.53%
The sum of the individual values is 111.47, and there were 6 measurements. The mean is xmean = 111.47 / 6 = 18.578 ≈ 18.58.
The deviation of each value from the mean, (xi – xmean) with xmean = 18.58, is calculated next:
xi        (xi – xmean)   (xi – xmean)²
18.56%    -0.02          0.0004
18.65%     0.07          0.0049
18.49%    -0.09          0.0081
18.54%    -0.04          0.0016
18.70%     0.12          0.0144
18.53%    -0.05          0.0025
The sum of the squared deviations is Σi (xi – xmean)² = 0.0319.
0.0319 / (6 – 1) = 0.00638
s = √0.00638 = 0.0799 ≈ 0.08 (or 0.080, following the author's method).
Note that s is reported to the same number of decimal places as the data. The relative standard deviation is s × 100 / xmean = 0.0799 × 100 / 18.578 = 0.430%, which could be reported as 0.43% or 4.3 ppt.
To find the median, first arrange the values in order (ascending or descending); here they are in descending order:
Rank: 1      2      3      4      5      6
xi:   18.70  18.65  18.56  18.54  18.53  18.49
Since n = 6, the median lies between ranks 3 and 4, so median = (18.56 + 18.54) / 2 = 18.55.
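The whole worked problem can be checked with Python's standard `statistics` module. This is a sketch, not the method used in the notes: it carries full precision rather than rounding the mean to 18.58 first, so its s may differ from the hand calculation in the last digit. It also computes the range that the problem asks for.

```python
import statistics

# Chloride results (%) from the worked problem
chloride = [18.56, 18.65, 18.49, 18.54, 18.70, 18.53]

mean = statistics.mean(chloride)
s = statistics.stdev(chloride)           # sample standard deviation, (n - 1) in the denominator
rsd_percent = s * 100 / mean             # relative standard deviation, parts per hundred
rsd_ppt = s * 1000 / mean                # relative standard deviation, parts per thousand
median = statistics.median(chloride)     # averages the two middle values when n is even
value_range = max(chloride) - min(chloride)

print(f"mean   = {mean:.3f}%")
print(f"s      = {s:.4f}")
print(f"RSD    = {rsd_percent:.2f}% = {rsd_ppt:.1f} ppt")
print(f"median = {median:.2f}%")
print(f"range  = {value_range:.2f}")
```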
For an ideal Gaussian distribution, 68.3% of the measurements lie within 1σ (one standard deviation) of the mean value, 95.5% within 2σ, and 99.7% within 3σ. Applied to real data from a small data set, this means we can expect only about 4.5% of the values to fall outside the 2s limits and only about 0.3% outside the 3s limits from the mean value.
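The Gaussian percentages can be checked with the error function in Python's `math` module: for an ideal Gaussian, the fraction of values within kσ of the mean is erf(k/√2). This sketch prints about 68.27%, 95.45% and 99.73%, which the notes quote as 68.3%, 95.5% and 99.7%.

```python
import math


def fraction_within(k):
    """Fraction of an ideal Gaussian population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = fraction_within(k)
    print(f"within {k} sigma: {inside * 100:.2f}%, outside: {(1 - inside) * 100:.2f}%")
```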
Student’s t test
The Student’s t test was developed by W. S. Gosset, who published it in 1908 under the pseudonym “Student”. It is used to express confidence intervals for a set of data and to compare the results of different experiments statistically.
The true mean is denoted μ. From a small number of data points it is not possible to determine either μ or σ; instead, we have xmean and s. We would like to be able to state the probability that the true value lies within some quantity of xmean. The confidence interval does this in the form
$\mu = x_{\text{mean}} \pm \frac{t\,s}{\sqrt{n}}$
and may be stated at a chosen probability such as 90%, 95%, or 99%. The values of t for various degrees of freedom and confidence levels are shown in Table 4-2, page 78 of your textbook.
Let's go back to the % chloride data (18.56%, 18.65%, 18.49%, 18.54%, 18.70%, 18.53%; xmean = 18.58, s = 0.080, n = 6) and calculate the 50%, 90%, 95%, and 99% confidence intervals for the result.
At the 50% CI, μ = 18.58 ± (0.727)(0.080) / √6 = 18.58 ± 0.024 = 18.58 ± 0.02. The value of t is read at the intersection of the 50% column and the row for n – 1 = 5 degrees of freedom.
Repeating the calculation with the appropriate values of t:
At the 90% CI, μ = 18.58 ± (2.015)(0.080) / √6 = 18.58 ± 0.066 = 18.58 ± 0.07.
At the 95% CI, μ = 18.58 ± (2.571)(0.080) / √6 = 18.58 ± 0.084 = 18.58 ± 0.08.
At the 99% CI, μ = 18.58 ± (4.032)(0.080) / √6 = 18.58 ± 0.132 = 18.58 ± 0.13.
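The four intervals can be reproduced in a few lines. This is a minimal sketch: the t values are hard-coded from the notes rather than taken from a statistics library, `confidence_interval` is a hypothetical helper name, and s is computed from the raw data at full precision, so intermediate values may differ slightly from the hand calculation while the rounded tolerances agree.

```python
import math
import statistics

chloride = [18.56, 18.65, 18.49, 18.54, 18.70, 18.53]

# t for n - 1 = 5 degrees of freedom at each confidence level, as quoted in these notes
T_5DF = {50: 0.727, 90: 2.015, 95: 2.571, 99: 4.032}


def confidence_interval(values, t):
    """Return (mean, half_width) for mu = mean ± t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return mean, t * s / math.sqrt(n)


for level, t in T_5DF.items():
    mean, half = confidence_interval(chloride, t)
    print(f"{level}% CI: mu = {mean:.2f} ± {half:.2f}")
```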
Note that the tolerance quantity (t s / √n) becomes larger as we increase the probability that the interval should cover. Another way of looking at it: at the 50% CI there is a 50% probability that the true value (μ) lies outside ±0.02, whereas at the 99% CI there is only a 1% probability that μ lies outside ±0.13. Also note that the tolerance quantity is reported to the same number of decimal places as the mean value, although I carried an extra place through the calculation and rounded after the final step.
From the equation μ = xmean ± t s / √n we see that, for a given t, the tolerance quantity (t s / √n) is inversely proportional to √n, and t itself also decreases as the number of degrees of freedom grows. Thus one way to increase the probability that xmean is close to the true value is to increase the number of results, assuming that xmean and s are not affected by the additional runs.
• Problem – For n = 3, xmean and s were found to be 15.78 and 0.30, respectively. Calculate the 95% confidence interval.
• For n = 3, (n – 1) = 2; t95,2 = 4.303
• μ = 15.78 ± (4.303)(0.30) / √3 = 15.78 ± 0.745
• μ = 15.78 ± 0.75
• Relative uncertainty = (0.75 / 15.78) × 100 = 4.75%
• Repeat the previous calculation for n = 7 with the same xmean and s values:
• For n = 7, (n – 1) = 6; t95,6 = 2.447
• μ = 15.78 ± (2.447)(0.30) / √7 = 15.78 ± 0.277
• μ = 15.78 ± 0.28
• Relative uncertainty = (0.28 / 15.78) × 100 = 1.77%
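To see the effect of n directly, this sketch recomputes both cases with the t values quoted above (`half_width` is a hypothetical helper name). Following the notes, the relative uncertainty is computed from the half-width after rounding it to two decimal places.

```python
import math


def half_width(t, s, n):
    """Tolerance quantity t * s / sqrt(n) of the confidence interval."""
    return t * s / math.sqrt(n)


mean, s = 15.78, 0.30
# (n, t at 95% for n - 1 degrees of freedom), values quoted in the notes
cases = [(3, 4.303), (7, 2.447)]

for n, t in cases:
    hw = round(half_width(t, s, n), 2)  # round as the notes do before the next step
    rel = hw / mean * 100
    print(f"n = {n}: mu = {mean:.2f} ± {hw:.2f}, relative uncertainty = {rel:.2f}%")
```

Going from 3 to 7 results shrinks the tolerance both through the larger √n and through the smaller t.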
The t test is also valuable for comparing two different sets of data to determine whether they are ‘the same’ or ‘different’; stated statistically, “Is there a significant difference between the two sets of data?”
Example – As the director of a research laboratory, you are paid to decide whether there is a significant difference between the mean values of two sets of data obtained by two different scientists, a senior scientist and one recently hired.
Data of Senior Scientist: xmean = 24.66%, s = 0.06%, n = 5
Data of the New Kid: xmean = 24.55%, s = 0.10%, n = 7
What we need to do here is compare the difference between the two mean values, (x1 mean – x2 mean), with a tolerance quantity analogous to (t s / √n). Because there are two different standard deviations, we first calculate the pooled standard deviation, spool, defined as
$s_{\text{pool}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
spool = {[(5 – 1)(0.06)² + (7 – 1)(0.10)²] / (5 + 7 – 2)}^1/2
= {[(4)(0.0036) + (6)(0.010)] / 10}^1/2 = {(0.0144 + 0.060) / 10}^1/2 = {0.00744}^1/2
spool = 0.086 ≈ 0.09
Note that spool always falls between the two individual values of s; it is like a weighted average.
Test whether
$|x_{1,\text{mean}} - x_{2,\text{mean}}| > t\, s_{\text{pool}} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$
We use t95 for 5 + 7 – 2 = 10 degrees of freedom; according to Table 4-2, t95,10 = 2.228. Substituting:
Is |24.66 – 24.55| > (2.228)(0.086)√(12/35)?
Is 0.11 > (2.228)(0.086)(0.586)?
Is 0.11 > 0.112?
No. The difference falls just short of the tolerance quantity, so there is no significant difference between the mean values of the two scientists at the 95% confidence level.
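The pooled comparison can be sketched from the summary statistics alone. The helper names `pooled_sd` and `compare_means` are hypothetical, t = 2.228 is the table value quoted in the notes, and the code carries full precision instead of rounding spool at each step. For these data it reports a tolerance slightly larger than the 0.11 difference, so `significant` comes out False.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two data sets."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def compare_means(mean1, s1, n1, mean2, s2, n2, t):
    """Return (difference, tolerance, significant) for the pooled two-mean comparison."""
    s_pool = pooled_sd(s1, n1, s2, n2)
    tolerance = t * s_pool * math.sqrt((n1 + n2) / (n1 * n2))
    difference = abs(mean1 - mean2)
    return difference, tolerance, difference > tolerance


# Senior scientist vs. new hire; t = 2.228 for 10 degrees of freedom at 95%, as quoted in the notes
diff, tol, significant = compare_means(24.66, 0.06, 5, 24.55, 0.10, 7, t=2.228)
print(f"s_pool = {pooled_sd(0.06, 5, 0.10, 7):.3f}")
print(f"difference = {diff:.2f}, tolerance = {tol:.3f}, significant: {significant}")
```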
Testing for a significant difference between the true value (μ) and the mean value (xmean) of a single set of data is very similar. If
$|\mu - x_{\text{mean}}| > \frac{t\,s}{\sqrt{n}}$
with t taken for n – 1 degrees of freedom, there is a significant difference between the true value and the mean.
F test for Differences in Precisions
In addition to comparing a mean value with the true value, or two mean values with each other, it is often valuable to compare the precisions of two different sets of data. Your textbook does not discuss this test, so I will briefly explain it and apply it to a typical problem. The variance v is defined as the square of the standard deviation, v = s². Variances are calculated for both sets of data, and the larger variance is placed in the numerator of a quantity known as Fc:
Fc = vlarger / vsmaller
Fc is then compared with the tabulated value Ft at a specified confidence level, generally 95%. If Fc > Ft, the precisions differ significantly.
Problem – Was there a significant difference between the precisions of the two scientists in the last problem?
Data of Senior Scientist: xmean = 24.66%, s = 0.06%, n = 5
Data of the New Kid: xmean = 24.55%, s = 0.10%, n = 7
For the new kid, v = (0.10)² = 0.010; for the senior scientist, v = (0.06)² = 0.0036.
Fc = 0.010 / 0.0036 = 2.78
From the Ft table at the 95% confidence level (6 degrees of freedom for the numerator, 4 for the denominator), Ft = 6.16. Since Fc < Ft, there is no significant difference between the precisions of the two scientists.
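The F comparison fits in a small function. `f_test` is a hypothetical helper name; it does not look up F tables, so the caller supplies Ft (here the 6.16 used in the notes). Sorting the two variances guarantees the larger one ends up in the numerator.

```python
def f_test(s_a, s_b, f_table):
    """Return (f_calc, significant): larger variance over smaller, compared with a tabulated F."""
    v_larger, v_smaller = sorted((s_a**2, s_b**2), reverse=True)
    f_calc = v_larger / v_smaller
    return f_calc, f_calc > f_table


# New hire (s = 0.10, n = 7) vs. senior scientist (s = 0.06, n = 5).
# 6.16 is the 95% F value for 6 (numerator) and 4 (denominator) degrees of freedom, the Ft used in the notes.
f_calc, significant = f_test(0.10, 0.06, f_table=6.16)
print(f"F_calc = {f_calc:.2f}, significant difference in precision: {significant}")
```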
Conclusions of the Differences between Mean Values and Precisions of the Two Scientists
• The first test (t test) allowed us to test for a significant difference between the mean values obtained by the two scientists. Since the difference between the two mean values was less than the tolerance quantity, there is no significant difference between the mean values of the two scientists at the 95% confidence level.
• The second test (F test) allowed us to test for a difference in the precisions of the two scientists. Since Fcal < Ftable, there is no significant difference between the precisions of the two scientists at the 95% confidence level.
Rejection of Suspect Data
1) The Q-Test
Occasionally in a set of data there is one value that appears not to belong with the rest. If the experimenter is aware of some mistake or malfunction, she/he does not need one of these tests to reject that result. If no known error has occurred (so that the suspect result appears to be random), the analyst must decide whether to retain or reject the suspect value. That decision needs a sound basis, not just ‘eyeballing’. Your textbook describes one such test, the Q-test. After discussing the Q-test, I will discuss two additional tests that are less rigorous but still useful.
Problem – Given the following data for the determination of % acidic substance in a cleansing agent, may the suspect result be rejected, or must it be retained, by the criterion of the Q-test?
% Acid: 10.19%, 10.08%, 10.52%, 10.13%
Calculate the mean both retaining and rejecting the suspect value (the 10.52% result):
xmean (retaining) = 10.23%
xmean (rejecting) = 10.13%
Clearly the suspect value unduly influences the mean. To employ the Q-test we need the range and the difference between the suspect value and the value nearest to it.
Range = (10.52 – 10.08) = 0.44
Difference between the suspect value and its nearest value = (10.52 – 10.19) = 0.33
Qcal = (xsuspect – xnearest) / Range = 0.33 / 0.44 = 0.75
Since Qcal < Qtable (0.75 < 0.76), we must retain the suspect value at the 90% confidence level.
Referring to Table 4-4, textbook page 82, Qtable = 0.76 for n = 4 at the 90% confidence level. Thus we must retain the suspect value by this criterion. (Not in your textbook: Qtable at the 96% confidence level is 0.85 for n = 4ᵃ; by this criterion, the suspect value of 10.52% would also be retained.)
ᵃ Skoog and West, Fundamentals of Analytical Chemistry, 4th ed., CBS College Publishing, 1982, p. 62.
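The Q-test arithmetic can be sketched as a short function. `q_test` is a hypothetical name, and choosing whichever extreme value is farther from its nearest neighbour as the suspect is a design choice of this sketch, not something the notes prescribe; Qtable (0.76 here) must still come from the table.

```python
def q_test(values, q_table):
    """Dixon Q test on the more extreme end of the data; returns (suspect, q_calc, reject)."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]          # the range of the data
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    low_gap = ordered[1] - ordered[0]          # lowest value vs. its nearest neighbour
    high_gap = ordered[-1] - ordered[-2]       # highest value vs. its nearest neighbour
    if high_gap >= low_gap:
        suspect, gap = ordered[-1], high_gap
    else:
        suspect, gap = ordered[0], low_gap
    q_calc = gap / spread
    return suspect, q_calc, q_calc > q_table


acid = [10.19, 10.08, 10.52, 10.13]
# Q_table = 0.76 for n = 4 at the 90% confidence level (Table 4-4, as quoted in the notes)
suspect, q_calc, reject = q_test(acid, q_table=0.76)
print(f"suspect = {suspect}, Q_calc = {q_calc:.2f}, reject: {reject}")
```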
2) The 4d and 2.5d Rules
Although less rigorous, these rules may also be used to decide whether to retain or reject a suspect value. To use them, one needs to calculate the average deviation, defined as
$\text{average deviation} = \frac{\sum_i |x_i - x_{\text{mean}}|}{n}$
Since the deviation of the suspect value from the mean is greater than 4 × avg d, we could reject the suspect value. The 2.5d rule is applied identically except that the multiplier is 2.5 instead of 4; 2.5d equals 0.093, or 0.09, in this problem. Clearly the 2.5d rule allows easier rejection than the 4d rule. The suspect value, with a deviation of 0.39, could be rejected by both criteria.
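The notes do not show the average-deviation arithmetic for the acid data. The 0.093 and 0.39 figures quoted above are consistent with computing the mean and average deviation from the three values left after setting 10.52% aside, so this sketch makes that assumption explicit. `deviation_rule` is a hypothetical name. Because it does not round d first, it reports 2.5d as about 0.094 rather than 0.093; both round to 0.09.

```python
import statistics


def deviation_rule(values, suspect, multiplier):
    """Average-deviation rule (4d or 2.5d); returns (avg_dev, deviation, reject).

    Assumption: the mean and average deviation are computed from the values
    that remain after the suspect value is set aside.
    """
    rest = list(values)
    rest.remove(suspect)
    mean = statistics.mean(rest)
    avg_dev = sum(abs(x - mean) for x in rest) / len(rest)
    deviation = abs(suspect - mean)
    return avg_dev, deviation, deviation > multiplier * avg_dev


acid = [10.19, 10.08, 10.52, 10.13]
for k in (4, 2.5):
    d, dev, reject = deviation_rule(acid, 10.52, k)
    print(f"{k}d = {k * d:.3f}, deviation of suspect = {dev:.2f}, reject: {reject}")
```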
In the analysis of your laboratory results, you may use any of the above tests in an attempt to reject one suspect result; if the result meets the criterion for rejection, reject it and state the basis in your laboratory report.
Corrections to Errors in Earlier Slides
The corrected statement is the last one on the Q-table slide above: at the 96% confidence level, Qtable = 0.85 for n = 4, so by that criterion the suspect value of 10.52% is also retained. I could not find a less restrictive Q table (confidence level below 90%). If such a table exists, say at the 50% confidence level, its Qtable would be smaller than the 0.76 value at the 90% confidence level used in this problem.