
Physics 114: Lecture 14 Mean of Means

This lecture explores the goal of measurement in physics, which is to obtain the value and spread of a quantity. It discusses the concept of repeated measurements and how they can improve the estimate of the mean. It also introduces the distribution of means and how it can be used to estimate the mean of the parent distribution. The lecture concludes with calculations of error in the mean and the concept of relative uncertainties.


Presentation Transcript


1. Physics 114: Lecture 14, Mean of Means
Dale E. Gary, NJIT Physics Department

2. The Goal of Measurement
• When we make measurements of a quantity, we are mainly after two things: (1) the value of the quantity (the mean), and (2) a sense of how well we know this value (for which we need the spread of the distribution, or standard deviation).
• Remember that the goal of measurement is to obtain these two pieces of information. The mean is of no use without the standard deviation as well.
• We have seen that repeated measurement of a quantity can be used to improve the estimate of the mean. Let's take a closer look at what is going on.
• Say we create a random set of 100 measurements of a quantity whose parent distribution has a mean of 5 and standard deviation of 1: x = randn(1,100)+5;
• Create a histogram of that set of measurements: [y z] = hist(x,0:0.5:10); Here, y is the histogram (the frequency of points in each bin) and z is the bin centers.
• Now plot it: plot(z,y,'.'). If you prefer bars, use stairs(z,y).
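The commands on this slide, gathered into one short script with comments (a minimal sketch that only repackages the code shown above):

  % Draw 100 measurements from a parent distribution with mean 5 and sigma 1
  x = randn(1,100) + 5;
  % Histogram with bin centers at 0, 0.5, ..., 10 (21 bins)
  [y, z] = hist(x, 0:0.5:10);   % y = counts per bin, z = bin centers
  plot(z, y, '.')               % dot plot of the histogram
  % stairs(z, y)                % alternative: stepped (bar-like) outline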

3. Comparing Measurements
• If you make repeated sets of 100 measurements, you will obtain different samples from the parent distribution, whose averages are approximations to the mean of the parent distribution.
• Let's make 100 sets of 100 measurements:
  y = zeros(100,21);
  for i = 1:100; x = randn(1,100)+5; [y(i,:) z] = hist(x,0:0.5:10); end
• Plotting some of the histograms:
  for i = 1:16; subplot(4,4,i); stairs(z-0.25,y(i,:)); axis([2,8,0,25]); end
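The same loops written out with comments (a sketch that only repackages the commands above):

  % 100 sets of 100 measurements; each row of y holds one histogram
  y = zeros(100, 21);                 % 21 bins for centers 0:0.5:10
  for i = 1:100
      x = randn(1,100) + 5;           % one sample of 100 measurements
      [y(i,:), z] = hist(x, 0:0.5:10);
  end
  % Plot the first 16 of the 100 histograms in a 4x4 grid
  for i = 1:16
      subplot(4,4,i)
      stairs(z - 0.25, y(i,:))        % shift left by half a bin so each step spans its bin
      axis([2 8 0 25])
  end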

4. Comparing Measurements
• You can see that the sample means vary from one set of 100 measurements to another.

5. Comparing Measurements
• Now the mean can be determined from the original values (x_i): mean(x)
• or from the histograms themselves: mean(y(100,:).*z)/mean(y(100,:))
• Make sure you understand why these two should be (nearly) the same, and why they might be slightly different.
• Since we have saved the histograms, let's print out the means of these 16 sample distributions:
  for i=1:16; a = mean(y(i,:).*z)/mean(y(i,:)); fprintf('%f\n',a); end
• Here is one realization of those means:
  5.015 4.915 5.000 4.940
  4.995 4.980 4.975 4.960
  4.990 4.920 4.965 5.040
  5.010 4.870 5.135 4.890
• We might surmise that the mean of these means would be a better estimate of the mean of the parent distribution, and we would be right!
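A commented version of the two ways of computing a sample mean (this only repackages the commands above; note that mean(y.*z)/mean(y) is just sum(y.*z)/sum(y), the histogram-weighted mean):

  % Mean from the raw values of the most recent sample
  mean(x)
  % Mean from a stored histogram (row 100 of y): sum(counts.*centers)/sum(counts).
  % It differs slightly from mean(x) because each value is replaced by its bin center.
  mean(y(100,:).*z) / mean(y(100,:))
  % Print the means of the first 16 sample distributions
  for i = 1:16
      a = mean(y(i,:).*z) / mean(y(i,:));
      fprintf('%f\n', a)
  end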

6. Distribution of Means
• Let's now calculate the 100 means:
  a = zeros(1,100); for i=1:100; a(1,i) = mean(y(i,:).*z)/mean(y(i,:)); end
• And plot them:
  subplot(1,1,1)
  plot(a,'.')
• This is the distribution of means.
• The mean of this distribution is 4.998, clearly very close to the mean of the parent distribution (5).
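The same calculation, commented (a sketch of the commands above):

  % Histogram-based mean of each of the 100 samples
  a = zeros(1,100);
  for i = 1:100
      a(1,i) = mean(y(i,:).*z) / mean(y(i,:));
  end
  subplot(1,1,1)        % back to a single plot panel
  plot(a, '.')          % the distribution of the 100 sample means
  mean(a)               % close to the parent mean of 5 (4.998 in the lecture's realization)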

7. Mean of Means
• We can think about this distribution in two different but equivalent ways.
• If we simply sum all of the histograms, we obtain a much better estimate of the parent population:
  mom = sum(y);
  stairs(z-0.25,mom)
  mean(mom.*z)/mean(mom)
  gives: 4.998
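A commented version of the summed-histogram calculation above:

  % Sum the 100 histograms column by column: one combined histogram of 10,000 points
  mom = sum(y);                  % 1 x 21 vector of combined counts
  stairs(z - 0.25, mom)          % plot the combined histogram
  mean(mom.*z) / mean(mom)       % its mean: 4.998 in the lecture's realization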

8. Mean of Means
• Alternatively, we can think of these different estimates of the mean of the original population as being drawn from a NEW parent population, one representing the distribution of means.
• This NEW parent population has a different (smaller) standard deviation than the original parent population: std(a) gives 0.0976.
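A quick comparison of the two spreads, assuming x and a from the earlier slides are still in the workspace:

  std(x)   % spread of one set of 100 raw measurements: roughly 1
  std(a)   % spread of the 100 sample means: roughly 0.1 (0.0976 in the lecture's realization)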

9. Calculation of Error in the Mean
• Recall that in Lectures 11 and 12 we introduced the general formula for propagation of errors of a quantity x(u,v) built from two measurements u and v:
  σ_x² = σ_u² (∂x/∂u)² + σ_v² (∂x/∂v)² + 2σ_uv² (∂x/∂u)(∂x/∂v)
• Generalizing further to our case of N measurements x_i of the mean μ', and ignoring correlations between measurements (i.e. setting the cross-terms to zero), we have
  σ_μ² = Σ σ_i² (∂μ'/∂x_i)²
• We can make the assumption that all of the σ_i are equal to σ (this is just saying that the samples are all of equal size and drawn from the same parent population). Also, since μ' = (1/N) Σ x_i, each derivative is ∂μ'/∂x_i = 1/N, so
  σ_μ² = Σ σ² (1/N)² = σ²/N, i.e. σ_μ = σ/√N
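A small Monte Carlo check of the σ/√N rule (the variable names here are illustrative, not from the lecture):

  % For several sample sizes N, compare the scatter of many sample means
  % with the prediction sigma/sqrt(N), where sigma = 1 for randn.
  for N = [25 100 400]
      m = zeros(1,1000);
      for i = 1:1000
          m(i) = mean(randn(1,N) + 5);   % mean of one sample of N measurements
      end
      fprintf('N = %4d: std of means = %.4f, 1/sqrt(N) = %.4f\n', N, std(m), 1/sqrt(N))
  end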

10. Calculation of Error in the Mean
• This is no surprise, and it says what we already knew: the error in the mean gets smaller according to the square root of the number of means averaged.
• Again, this is the case when all of the errors in the means used are equal. What would we do if, say, some of the means were determined by averaging different numbers of observations instead of 100 each time?
• In this case, we can do what is called weighting the data. If we know the different values of σ_i, then the weighted average of the data points x_i is
  μ' = Σ(x_i/σ_i²) / Σ(1/σ_i²)
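A short sketch of the weighted mean; the numbers here are made up for illustration and are not from the lecture:

  % Four estimates of a mean with individually known uncertainties
  xm = [5.02 4.95 4.99 5.10];      % the estimates x_i
  s  = [0.10 0.10 0.20 0.20];      % their standard deviations sigma_i
  w  = 1 ./ s.^2;                  % weights w_i = 1/sigma_i^2
  mu = sum(w .* xm) / sum(w)       % weighted mean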

11. Error in the Weighted Mean
• In this case, if we want to combine a number of such weighted means to get the error in the weighted mean, we still have to calculate the propagation of errors:
  σ_μ² = Σ σ_i² (∂μ'/∂x_i)²
  but now the σ_i are not all the same, and we also use the weighted mean to get the gradient of the mean:
  ∂μ'/∂x_i = (1/σ_i²) / Σ(1/σ_k²)
• Inserting that into the equation above, we have
  σ_μ² = Σ σ_i² (1/σ_i²)² / [Σ(1/σ_k²)]² = 1 / Σ(1/σ_i²)
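Continuing the illustrative example given after slide 10, the error in the weighted mean follows directly:

  sigma_mu = 1 / sqrt(sum(1 ./ s.^2))   % error in the weighted mean, = 1/sqrt(sum(w))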

12. Relative Uncertainties
• In some cases we do not necessarily know the uncertainties of each measurement, but we do know the relative values of σ_i. That is, we may know that some of our estimates of the mean used 100 measurements, and some used only 25. In that case, we can infer that the latter estimates have errors twice as large (since the standard deviation of a mean is inversely proportional to the square root of the number of measurements).
• So, say σ_i² = σ²/w_i, where only the relative weights w_i are known. Then
  μ' = Σ(x_i/σ_i²) / Σ(1/σ_i²) = Σ(w_i x_i) / Σ w_i
• In other words, because of the nature of the ratio, the unknown proportionality constant σ² cancels and we need only the relative weights.
• To get the overall variance in this case, we must appeal to an average variance of the data:
  σ̄² ≈ [N/(N-1)] Σ w_i (x_i - μ')² / Σ w_i
• So the standard deviation in the mean is found from this as
  σ_μ = σ̄/√N, i.e. σ_μ² = Σ w_i (x_i - μ')² / [(N-1) Σ w_i]
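A sketch of the relative-weights case using made-up numbers (not from the lecture), with the weights taken proportional to the number of measurements behind each estimate:

  % Estimates of the mean based on different numbers of measurements
  Ni = [100 100 25 25];                 % measurements behind each estimate
  xm = [5.02 4.95 4.99 5.10];           % the estimates x_i
  w  = Ni;                              % relative weights: w_i proportional to 1/sigma_i^2
  N  = length(xm);
  mu = sum(w .* xm) / sum(w);           % weighted mean; the unknown scale factor cancels
  % Average variance of the data, estimated from the weighted scatter
  s2 = (N/(N-1)) * sum(w .* (xm - mu).^2) / sum(w);
  sigma_mu = sqrt(s2 / N)               % error in the weighted mean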
