Chapter 2

Chapter 2 Statistics of repeated measurements

Mean and standard deviation

The distribution of repeated measurements • Although the standard deviation gives a measure of the spread of a set of results about the mean value, it does not indicate the shape of the distribution. • To illustrate this we need a large number of measurements.

x̄ = 0.500, s = 0.0165

The set of all possible measurements is called the population. If there are no systematic errors, then the mean of this population, denoted by µ, is the true value of the nitrate ion concentration, and the standard deviation of the population is denoted by σ. • The sample standard deviation, s, usually gives an estimate of σ.

Normal or Gaussian distribution The mathematical model that describes a continuous curve: $y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$ The curve is symmetrical about µ, and the greater the value of σ, the greater the spread of the curve.

More detailed analysis shows that, whatever the values of µ and σ, the normal distribution has the following properties. • For a normal distribution with mean µ and standard deviation σ, approximately 68% of the population values lie within ±1σ of the mean, • approximately 95% of the population values lie within ±2σ of the mean, • and approximately 99.7% of the population values lie within ±3σ of the mean.
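The 68% / 95% / 99.7% figures can be checked empirically. The following is a minimal sketch, not part of the original slides: it draws simulated values from a normal distribution and counts how many fall within 1, 2 and 3 standard deviations of the mean. The choice of µ = 0.500 and σ = 0.0165 simply reuses the numbers quoted earlier, and the seed and sample count are arbitrary assumptions.

```python
import random

random.seed(0)                      # arbitrary seed, only for repeatability
mu, sigma = 0.500, 0.0165           # illustrative values reused from the text
values = [random.gauss(mu, sigma) for _ in range(100_000)]

def fraction_within(k):
    """Fraction of simulated values lying within k standard deviations of mu."""
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.3f}")
```

The printed fractions should come out close to 0.68, 0.95 and 0.997, with small differences due to random sampling.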

Standardized normal cumulative distribution function, F(z) • For a normal distribution with known mean, µ, and standard deviation, σ, the exact proportion of values which lie within any interval can be found from tables, provided that the values are first standardized so as to give z-values. • This is done by expressing a value of x in terms of its deviation from the mean in units of standard deviation, σ. That is: $z = \frac{x - \mu}{\sigma}$

Standardized normal cumulative distribution function, F(z) • Table A.1 (Appendix 2) gives the proportion of values, F(z), that lie below a given value of z. • F(z) is called the standard normal cumulative distribution function. • For example the proportion of values below z = 2 is F(2) = 0.9772 • and the proportion of values below z = -2 is F(-2) = 0.0228. • Thus the exact value for the proportion of measurements lying within two standard deviations of the mean is 0.9772 - 0.0228 = 0.9544.
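As a rough cross-check of the tabulated values, F(z) can be computed from the error function using the standard identity F(z) = ½[1 + erf(z/√2)]. This is only a sketch; the function name `F` is chosen here to mirror the notation in the text.

```python
import math

def F(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

below_plus_2 = F(2)      # proportion of values below z = 2
below_minus_2 = F(-2)    # proportion of values below z = -2
within_2_sd = below_plus_2 - below_minus_2

print(f"F(2)  = {below_plus_2:.4f}")
print(f"F(-2) = {below_minus_2:.4f}")
print(f"within two standard deviations: {within_2_sd:.4f}")
```

The unrounded difference is about 0.9545; the value 0.9544 quoted in the text comes from subtracting the two four-figure table entries.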

Standardized normal cumulative distribution function, F(z) • If repeated values of a titration are normally distributed with mean 10.15 ml and standard deviation 0.02 ml, find the proportion of measurements which lie between 10.12 ml and 10.20 ml. • Standardizing the first value gives z = (10.12 - 10.15)/0.02 = -1.5. From Table A.1, F(-1.5) = 0.0668. Standardizing the second value gives z = (10.20 - 10.15)/0.02 = 2.5. From Table A.1, F(2.5) = 0.9938. Thus the proportion of values between x = 10.12 and x = 10.20 (which corresponds to z = -1.5 to 2.5) is 0.9938 - 0.0668 = 0.927. • Values of F(z) can also be found using Excel or Minitab.
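The titration example can also be reproduced in a few lines of code. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for Table A.1; the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 10.15, 0.02          # mean and standard deviation in ml
lower, upper = 10.12, 10.20      # interval of interest in ml

# Standardize both limits: z = (x - mu) / sigma
z_lower = (lower - mu) / sigma
z_upper = (upper - mu) / sigma

std_normal = NormalDist()        # standard normal: mean 0, standard deviation 1
proportion = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(f"z-values: {z_lower:.2f} to {z_upper:.2f}")
print(f"proportion between limits: {proportion:.3f}")
```

The same proportion could be obtained without standardizing, from `NormalDist(mu, sigma).cdf(upper) - NormalDist(mu, sigma).cdf(lower)`; the two-step form is kept here because it follows the table-based calculation.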

Log-normal distribution

Another example of a variable which may follow a log-normal distribution is the particle size of the droplets formed by the nebulizers used in flame spectroscopy. • Particle size distributions in atmospheric aerosols may also take the log-normal form, and the distribution is used to describe equipment failure rates. • Minitab allows this distribution to be simulated and studied. However, by no means all asymmetrical population distributions can be converted to normal ones by the logarithmic transformation. • The distribution of the logarithms of the blood serum concentration shown in Figure 2.5(b) has mean 0.15 and standard deviation 0.20. • This means that approximately 68% of the logged values lie in the interval 0.15 - 0.20 to 0.15 + 0.20, that is -0.05 to 0.35. • Taking antilogarithms we find that 68% of the original measurements lie in the interval 10^-0.05 to 10^0.35, that is 0.89 to 2.24. • The antilogarithm of the mean of the logged values, 10^0.15 = 1.41, gives the geometric mean of the original distribution.
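The antilogarithm step can be sketched as follows. The first part uses the mean and standard deviation of the logged values quoted above; the second part uses a small set of made-up concentrations (hypothetical, not from the source) to illustrate that the antilog of the mean of the logs is the geometric mean.

```python
import math
import statistics

log_mean, log_sd = 0.15, 0.20    # mean and s.d. of the log10 values (from the text)

# Approximately 68% of the logged values lie within one s.d. of the mean;
# taking antilogs maps that interval back to the original scale.
low = 10 ** (log_mean - log_sd)
high = 10 ** (log_mean + log_sd)
geometric_mean = 10 ** log_mean

print(f"68% interval on original scale: {low:.2f} to {high:.2f}")
print(f"geometric mean: {geometric_mean:.2f}")

# Hypothetical data, only to show the identity
# antilog(mean of logs) == geometric mean.
data = [0.9, 1.2, 1.5, 2.0]
antilog_of_mean_log = 10 ** statistics.fmean(math.log10(x) for x in data)
print(math.isclose(antilog_of_mean_log, statistics.geometric_mean(data)))
```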

Definition of a sample • The Commission on Analytical Nomenclature of the Analytical Chemistry Division of the International Union of Pure and Applied Chemistry has pointed out that confusion and ambiguity can arise if the term 'sample' is also used in its colloquial sense of the 'actual material being studied' (Commission on Analytical Nomenclature, 1990). • It recommends that the term sample is confined to its statistical concept. • Other words should be used to describe the material on which measurements are being made, in each case preceded by 'test', for example test solution or test extract. • We can then talk unambiguously of a sample of measurements on a test extract, or a sample of tablets from a batch. • A test portion from a population which varies with time, such as a river or circulating blood, should be described as a specimen. • Unfortunately this practice is by no means usual, so the term 'sample' continues to be used in two related but distinct senses.

The sampling distribution of the mean • In the absence of systematic errors, the mean of a sample of measurements provides us with an estimate of the true value, µ, of the quantity we are trying to measure. • Even in the absence of systematic errors, the individual measurements vary due to random errors and so it is most unlikely that the mean of the sample will be exactly equal to the true value. • For this reason it is more useful to give a range of values which is likely to include the true value. • The width of this range depends on two factors. • The first is the precision of the individual measurements, which in turn depends on the standard deviation of the population. • The second is the number of measurements in the sample. The more measurements we make, the more reliable our estimate of µ, the true value, will be.

Sampling distribution of the mean • Assuming each column is a sample, the means of the samples are: 0.506, 0.504, 0.502, 0.492, 0.506, 0.504, 0.500, 0.486. • The distribution of all possible sample means (in this case an infinite number) is called the sampling distribution of the mean. • Its mean is the same as the mean of the original population. • Its standard deviation is called the standard error of the mean (s.e.m.). • There is an exact mathematical relationship between the latter and the standard deviation, σ, of the distribution of the individual measurements:

For a sample of n measurements, standard error of the mean (s.e.m.) = $\sigma/\sqrt{n}$
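The relationship s.e.m. = σ/√n can be checked by simulation. The sketch below is an illustration under stated assumptions: it draws many samples of size n from a normal population, takes the mean of each, and compares the standard deviation of those means with σ/√n. The population values µ = 0.500 and σ = 0.0165 reuse numbers quoted earlier; the sample size n = 5, the number of samples and the seed are hypothetical choices.

```python
import math
import random
import statistics

random.seed(1)                   # arbitrary seed, only for repeatability
mu, sigma = 0.500, 0.0165        # illustrative population parameters
n = 5                            # hypothetical sample size
n_samples = 20_000               # number of simulated samples

# Mean of each simulated sample of n measurements
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(n_samples)
]

observed_sem = statistics.stdev(sample_means)   # spread of the sample means
predicted_sem = sigma / math.sqrt(n)            # sigma / sqrt(n)

print(f"observed  s.e.m.: {observed_sem:.5f}")
print(f"predicted s.e.m.: {predicted_sem:.5f}")
```

The two values should agree to within about a percent; the mean of `sample_means` should likewise be very close to µ.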

Confidence limits of the mean for large samples • The range which may be assumed to include the true value is known as a confidence interval and the extreme values of the range are called the confidence limits. • The term 'confidence' implies that we can assert with a given degree of confidence, i.e. a certain probability, that the confidence interval does include the true value. • The size of the confidence interval will obviously depend on how certain we want to be that it includes the true value: the greater the certainty, the greater the interval required.

If we assume that this distribution is normal, then 95% of the sample means will lie in the range given by: $\mu - 1.96(\sigma/\sqrt{n}) < \bar{x} < \mu + 1.96(\sigma/\sqrt{n})$

In practice, we usually have one sample, of known mean, and we require a range for µ, the true value: $\bar{x} - 1.96(\sigma/\sqrt{n}) < \mu < \bar{x} + 1.96(\sigma/\sqrt{n})$
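A large-sample confidence interval of this kind might be computed as sketched below. The helper name `confidence_limits` is hypothetical, the sample standard deviation s is used in place of σ (a common approximation for large samples), and the sample size n = 50 is an assumption made only for the example; the mean and standard deviation reuse the values quoted earlier.

```python
import math
from statistics import NormalDist

def confidence_limits(mean, s, n, confidence=0.95):
    """Large-sample confidence limits for the population mean: mean ± z*s/sqrt(n)."""
    # Two-sided z-value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# n = 50 is an assumed sample size for illustration
low, high = confidence_limits(mean=0.500, s=0.0165, n=50)
print(f"95% confidence limits: {low:.4f} to {high:.4f}")

# A higher degree of confidence gives a wider interval
low99, high99 = confidence_limits(mean=0.500, s=0.0165, n=50, confidence=0.99)
print(f"99% confidence limits: {low99:.4f} to {high99:.4f}")
```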

Confidence limits of the mean for small samples

For small samples, the confidence limits of the mean are given by: $\mu = \bar{x} \pm t_{n-1}\, s/\sqrt{n}$ • The subscript (n - 1) indicates that t depends on this quantity, which is known as the number of degrees of freedom, d.f. (usually given the symbol ν). • The term 'degrees of freedom' refers to the number of independent deviations $(x_i - \bar{x})$ used in calculating s. • In this case the number is (n - 1), because when (n - 1) deviations are known the last can be deduced, since $\sum_i (x_i - \bar{x}) = 0$. • The value of t also depends on the degree of confidence required.
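A small-sample calculation using x̄ ± t·s/√n with (n - 1) degrees of freedom might look like the sketch below. The six measurements are illustrative values, not taken from this transcript, and the short lookup of two-sided 95% t-values is copied from standard t tables rather than computed.

```python
import math
import statistics

# Two-sided 95% critical values of t for selected degrees of freedom
# (standard t-table values, hard-coded for this sketch)
T_95 = {1: 12.71, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 10: 2.228}

measurements = [102, 97, 99, 98, 101, 106]   # illustrative data only
n = len(measurements)
mean = statistics.fmean(measurements)
s = statistics.stdev(measurements)           # uses the (n - 1) divisor

# The n deviations from the mean sum to zero, so only (n - 1) are independent
deviations = [x - mean for x in measurements]
print(f"sum of deviations: {sum(deviations):.1f}")

dof = n - 1
half_width = T_95[dof] * s / math.sqrt(n)
print(f"mean = {mean:.1f}, s = {s:.2f}, d.f. = {dof}")
print(f"95% confidence limits: {mean - half_width:.1f} to {mean + half_width:.1f}")
```

A different degree of confidence would require a different column of the t table, which is why the dictionary here is labelled as 95% only.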
