Uncertainty and Sampling Dr. Richard Young Optronic Laboratories, Inc.
Introduction • Uncertainty budgets are a growing requirement of measurements. • Multiple measurements are generally required for estimates of uncertainty. • Multiple measurements can also decrease uncertainties in results. • How many measurement repeats are enough?
Random Data Simulation Here is an example probability distribution function of some hypothetical measurements. We can use a random number generator with this distribution to investigate the effects of sampling.
Random Data Simulation Here is a set of 10,000 data points…
Random Data Simulation Plotting Sample # on a log scale is better to show behaviour at small samples.
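A minimal sketch of the simulation described above, assuming a normal distribution with a standard deviation of 10 (the value used throughout these slides) and an arbitrary mean of 100. It prints the running sample mean and sample standard deviation as more of the 10,000 points are included, which is the behaviour the plots show at small sample counts.

```python
# Sketch: draw 10,000 random "measurements" and watch the running mean and
# sample standard deviation evolve as more samples are included.
# Assumed distribution: normal, mean 100 (arbitrary), standard deviation 10.
import numpy as np

rng = np.random.default_rng(seed=1)
data = rng.normal(loc=100.0, scale=10.0, size=10_000)

for n in (2, 10, 100, 1_000, 10_000):
    subset = data[:n]
    mean = subset.mean()
    std = subset.std(ddof=1)          # sample standard deviation (n-1 in the denominator)
    print(f"n={n:>6}  mean={mean:8.3f}  sample std={std:7.3f}")
```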
Random Data Simulation There is a lot of variation, but how is this affected by the data set?
Sample Mean Here we have results for 200 data sets.
Sample Standard Deviation The most probable value for the sample standard deviation of 2 samples is zero! Many samples are needed before 10 becomes the most probable value.
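A sketch of why the mode sits at zero for two samples: generate many independent data sets from the same assumed normal(100, 10) population and histogram the sample standard deviation for each set size. The slide uses 200 data sets; far more are used here only to make the shape of the distribution obvious.

```python
# Sketch: the distribution of the sample standard deviation peaks near zero
# for n=2 even though the population standard deviation is 10, and only
# approaches 10 as the number of samples per set grows.
import numpy as np

rng = np.random.default_rng(seed=2)
n_sets = 50_000  # many data sets, purely to give a smooth histogram

for n in (2, 5, 30, 100):
    stds = rng.normal(100.0, 10.0, size=(n_sets, n)).std(ddof=1, axis=1)
    counts, edges = np.histogram(stds, bins=50)
    i = np.argmax(counts)
    mode = 0.5 * (edges[i] + edges[i + 1])         # centre of the most populated bin
    print(f"n={n:>3}  most probable sample std ~ {mode:5.2f}  (median {np.median(stds):5.2f})")
```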
Cumulative Distribution Sometimes it is best to look at the CDF. The 50% level is where lower or higher values are equally likely.
Uniform Distribution What if the distribution were uniform instead of normal? The most probable value for >2 samples is 10.
Uniform Distribution Underestimated values are still more probable because the PDF is asymmetric.
Uniform Distribution • Throwing a die is an example of a uniform random distribution. • A uniform distribution is not necessarily random however. • It may be cyclic e.g. temperature variations due to air conditioning. • With computer controlled acquisition, data collection is often at regular intervals. • This can give interactions between the cycle period and acquisition interval.
Cyclic Variations For symmetric cycles, any multiple of two data points per cycle will average to the average of the cycle.
Cyclic Variations Unless synchronized, data collection may begin at any point (phase) within the cycle. Correct averages are obtained when full cycles are sampled, regardless of the phase.
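A sketch of the full-cycle averaging point, assuming a sinusoidal variation with an arbitrary mean of 100 and an amplitude of 10·√2 (so that the population standard deviation is 10), sampled at 8 points per cycle. Whatever the starting phase, averaging a whole number of cycles recovers the mean exactly; averaging part of a cycle does not.

```python
# Sketch: sampling a symmetric cyclic variation at regular intervals.
# Full-cycle averages equal the cycle mean regardless of starting phase.
import numpy as np

mean_level = 100.0                 # assumed underlying mean (arbitrary)
amplitude = 10.0 * np.sqrt(2.0)    # gives a population standard deviation of 10
points_per_cycle = 8

for phase in (0.0, 0.3, 1.7):      # arbitrary starting phases, in radians
    t = np.arange(3 * points_per_cycle)                 # three full cycles of samples
    y = mean_level + amplitude * np.sin(2 * np.pi * t / points_per_cycle + phase)
    full_cycle_avg = y[:points_per_cycle].mean()        # exactly one whole cycle
    half_cycle_avg = y[:points_per_cycle // 2].mean()   # half a cycle
    print(f"phase={phase:3.1f}  full cycle avg={full_cycle_avg:8.3f}  "
          f"half cycle avg={half_cycle_avg:8.3f}")
```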
Cyclic Variations Standard Deviation Again, whole cycles are needed to give good values. The value is not 10 because sample standard deviation has a (n-1)^0.5 term.
Cyclic Variations The population standard deviation is 10 at each complete cycle. Each cycle contains all the data of the population. The standard deviation for full cycle averages = 0.
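A sketch of the (n-1)^0.5 point, using the same assumed sinusoid: over one complete cycle the population standard deviation (divide by n) is exactly 10, while the sample standard deviation (divide by n-1) is larger by a factor of √(n/(n-1)).

```python
# Sketch: sample vs. population standard deviation over one complete cycle.
import numpy as np

points_per_cycle = 8
t = np.arange(points_per_cycle)
y = 100.0 + 10.0 * np.sqrt(2.0) * np.sin(2 * np.pi * t / points_per_cycle)

print("population std (ddof=0):", y.std(ddof=0))   # exactly 10
print("sample std     (ddof=1):", y.std(ddof=1))   # 10 * sqrt(8/7), about 10.69
```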
Smoothing • Smoothing involves combining adjacent data points to create a smoother curve than the original. • A basic assumption is that data contains noise, but the calculation does NOT allow for uncertainty. • Smoothing should be used with caution.
Smoothing What is the difference?
Savitzky-Golay Smoothing Here is a spectrum of a white LED. It is recorded at very short integration time to make it deliberately noisy.
Savitzky-Golay Smoothing A 25 point Savitzky-Golay smooth gives a line through the center of the noise.
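A sketch of the same idea with scipy, using a synthetic noisy spectrum in place of the LED scan (a broad Gaussian with added noise, an assumption purely for illustration). The 25-point window matches the slide; the polynomial order of 2 is an arbitrary choice.

```python
# Sketch: a 25-point Savitzky-Golay smooth drawn through the centre of the noise.
import numpy as np
from scipy.signal import savgol_filter

wavelength = np.linspace(380.0, 780.0, 1024)                     # nm
clean = np.exp(-0.5 * ((wavelength - 560.0) / 60.0) ** 2)        # broad peak
rng = np.random.default_rng(seed=3)
noisy = clean + rng.normal(scale=0.05, size=wavelength.size)     # add noise

smoothed = savgol_filter(noisy, window_length=25, polyorder=2)
print("rms deviation from clean, raw     :", np.sqrt(np.mean((noisy - clean) ** 2)))
print("rms deviation from clean, smoothed:", np.sqrt(np.mean((smoothed - clean) ** 2)))
```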
Savitzky-Golay Smoothing The result of the smooth is very close to the same device measured with the optimum integration time.
Spectral Sampling But how does the number of data points affect results? Here we have 1024 data points.
Spectral Sampling Now we have 512 data points.
Spectral Sampling Now we have 256 data points.
Spectral Sampling Now we have 128 data points.
Spectral Sampling A 25 point smooth follows the broad peak but not the narrower primary peak.
Spectral Sampling To follow the primary peak we need to use a 7 point smooth… but then it doesn’t work so well on the broad peak.
Spectral Sampling Compared to the optimum scan, the intensity of the primary peak is underestimated. This is because some of the higher signal data points have been removed.
Spectral Sampling Beware of under-sampling peaks – you may underestimate or overestimate intensities.
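A sketch of the window-width effect, using a synthetic 128-point spectrum with a narrow primary peak and a broad secondary peak (the shapes are illustrative, not the LED data): a 25-point Savitzky-Golay smooth pulls the narrow peak's height down, while a 7-point smooth largely preserves it.

```python
# Sketch: smoothing window width relative to peak width controls how much a
# narrow peak's intensity is underestimated.
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(380.0, 780.0, 128)
narrow = 1.0 * np.exp(-0.5 * ((x - 450.0) / 8.0) ** 2)    # narrow primary peak
broad = 0.6 * np.exp(-0.5 * ((x - 560.0) / 60.0) ** 2)    # broad secondary peak
spectrum = narrow + broad

i_peak = np.argmax(narrow)                                # index of the primary peak
for window in (7, 25):
    smoothed = savgol_filter(spectrum, window_length=window, polyorder=2)
    print(f"{window:>2}-point smooth: value at primary peak = {smoothed[i_peak]:.3f} "
          f"(unsmoothed {spectrum[i_peak]:.3f})")
```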
Exponential Smoothing Here is the original data again. What about other types of smoothing?
Exponential Smoothing An exponential smooth shifts the peak. Beware of asymmetric algorithms!
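A sketch of the peak shift using a simple one-sided exponential smooth (an exponentially weighted moving average; the smoothing factor is an arbitrary choice). Because it only uses earlier points in the scan, the smoothed peak moves towards longer wavelengths.

```python
# Sketch: a one-sided exponential smooth drags the peak along the scan direction.
import numpy as np

x = np.linspace(380.0, 780.0, 1024)
spectrum = np.exp(-0.5 * ((x - 450.0) / 10.0) ** 2)   # a single narrow peak

alpha = 0.05                                          # arbitrary smoothing factor
smoothed = np.empty_like(spectrum)
smoothed[0] = spectrum[0]
for i in range(1, spectrum.size):
    smoothed[i] = alpha * spectrum[i] + (1.0 - alpha) * smoothed[i - 1]

print("peak position, original:", x[np.argmax(spectrum)])
print("peak position, smoothed:", x[np.argmax(smoothed)])   # shifted to longer wavelength
```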
Sampling Without Noise This is the optimum integration scan but with 128 points like the noisy example. With lower noise, can we describe curves with fewer points?
Sampling Without Noise … 64 points.
Sampling Without Noise … 32 points. Is this enough to describe the peak?
Interpolation • Interpolation is the process of estimating data between given points. • National Laboratories often provide data that requires interpolation to be useful. • Interpolation algorithms generally estimate a smooth curve.
Interpolation • There are many forms of interpolation: • Lagrange, B-spline, Bezier, Hermite, Cardinal spline, cubic, etc. • They all have one thing in common: • They go through each given point and hence ignore uncertainty completely. • Generally, interpolation algorithms are local in nature and commonly use just 4 points.
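A sketch of the "goes through each given point" behaviour, using a cubic spline on synthetic noisy samples of a smooth curve (the data and the choice of scipy's CubicSpline are illustrative, not any particular National Laboratory data):

```python
# Sketch: an interpolant is forced through every given point, so it reproduces
# the noise exactly and can swing outside the data range between points.
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(seed=4)
x = np.linspace(0.0, 10.0, 21)
y_true = np.exp(-0.5 * ((x - 5.0) / 1.5) ** 2)
y_noisy = y_true + rng.normal(scale=0.05, size=x.size)

spline = CubicSpline(x, y_noisy)

# The spline reproduces every noisy point exactly...
print("max deviation at the given points:", np.max(np.abs(spline(x) - y_noisy)))
# ...and between the points it can overshoot the range of the data.
x_fine = np.linspace(0.0, 10.0, 1001)
print("spline range:", spline(x_fine).min(), "to", spline(x_fine).max())
print("data range  :", y_noisy.min(), "to", y_noisy.max())
```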
Interpolation Let’s zoom in on this portion… The interesting thing about interpolating data containing random noise is that you never know what you will get.
Interpolation Uneven sampling can cause overshoots. The Excel curve can even double back.
Combining a Smooth and Interpolation • If a spectrum can be represented by a function, e.g. polynomial, the closest “fit” to the data can provide smoothing and give the values between points. • The “fit” is achieved by changing the coefficients of the function until it is closest to the data. • A least-squares fit.
Combining a Smooth and Interpolation • The squares of the differences between values predicted by the function and those given by the data are added to give a “goodness of fit” measure. • Coefficients are changed until the “goodness of fit” is minimized. • Excel has a regression facility that performs this calculation.
Combining a Smooth and Interpolation • Theoretically, any simple smoothly varying curve can be fitted by a polynomial. • Sometimes it is better to “extract” the data you want to fit by some reversible calculation. • This means you can use, say, 9th order polynomials instead of 123rd order to make the calculations easier.
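A sketch of the combined smooth-and-interpolate idea: a least-squares polynomial fit to synthetic noisy data, evaluated at the wavelengths actually needed. The 9th order echoes the slide above; the data and the use of numpy's Polynomial.fit (which rescales the x-values internally, keeping high-order fits numerically stable) are assumptions for illustration.

```python
# Sketch: a least-squares polynomial fit both smooths the data and supplies
# values between the measured points.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(seed=5)
x = np.linspace(400.0, 800.0, 81)                        # measured wavelengths, nm
y = np.exp(-0.5 * ((x - 600.0) / 80.0) ** 2) + rng.normal(scale=0.02, size=x.size)

fit = Polynomial.fit(x, y, deg=9)        # coefficients chosen to minimise the squared differences

x_new = np.arange(400.0, 801.0, 5.0)     # the intervals our measurement requires
y_new = fit(x_new)                       # smoothed and interpolated values
print("fitted value at 600 nm:", fit(600.0))
```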
Polynomial Fitting NIST provide data at uneven intervals. To use the data, we have to interpolate to intervals required by our measurements.
Method 1 NIST recommend fitting a high-order polynomial to data values multiplied by λ^5/exp(a+b/λ) for interpolation. The result looks good, but…
Method 1 ...on a log scale, the match is very poor at lower values.
Method 1 When converted back to the original scale, lower values bear no relation to the data.
What went wrong? • The “goodness of fit” parameter is a measure of absolute differences, not relative differences. • NIST use a weighting of 1/E^2 to give relative differences, and hence closer matching, but that is not easy in Excel. • Large values tend to dominate smaller ones in the calculation. • A large dynamic range of values should be avoided. • We are trying to match data over 4 decades!
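The slides note that a relative-difference (1/E^2) weighting is not easy in Excel; as a sketch, the same weighting is straightforward with numpy.polyfit, whose w argument multiplies the residuals before squaring, so w = 1/E corresponds to weighting the squared differences by 1/E^2. The curve below is an arbitrary 4-decade stand-in, not the NIST data.

```python
# Sketch: unweighted vs. 1/E^2-weighted least-squares polynomial fits of a
# curve spanning roughly four decades.
import numpy as np

wl = np.linspace(250.0, 2500.0, 50)        # wavelengths, nm
E = 1e-3 * np.exp(wl / 250.0)              # arbitrary stand-in spanning ~4 decades
u = wl / wl.max()                          # rescaled x keeps the fit well conditioned

unweighted = np.polyfit(u, E, deg=9)
weighted = np.polyfit(u, E, deg=9, w=1.0 / E)   # relative-difference fit

for coeffs, label in ((unweighted, "unweighted"), (weighted, "1/E^2 weighted")):
    rel = (np.polyval(coeffs, u) - E) / E
    print(f"{label:>15}: rms relative error = {np.sqrt(np.mean(rel ** 2)):.3g}")
```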
How do NIST deal with it? • Although NIST’s 1/E^2 weighting gives closer matches than this, to get the best results they split the data into 2 regions and calculate separate polynomials for each. • This is a reasonable thing to do but can lead to local data effects and arbitrary splits that do not fit all examples. • Is there an alternative?
Alternative Method 1 A plot of the log of E*λ^5 values vs. λ^-1 is a gentle curve – almost a straight line. We can calculate a polynomial without splitting the data. The fact that we are fitting a log scale means we are effectively using relative differences in the least squares calculation.
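A sketch of this alternative: fit a polynomial to log10(E·λ^5) as a function of 1/λ, then reverse the transformation to interpolate E at the wavelengths needed. Because the fit is done on a log scale, the least-squares residuals are effectively relative differences, and no split of the data is required. The data below are an arbitrary lamp-like placeholder, not the NIST values, and the 9th order echoes the earlier slide.

```python
# Sketch: fit log10(E * lambda^5) against 1/lambda, then transform back.
import numpy as np
from numpy.polynomial import Polynomial

wl = np.linspace(250.0, 2500.0, 50)                 # wavelength, nm
E = 1e-3 * np.exp(wl / 250.0)                       # placeholder irradiance data

x = 1.0 / wl                                        # fit against 1/lambda
y = np.log10(E * wl ** 5)                           # gentle, nearly straight curve

fit = Polynomial.fit(x, y, deg=9)

# Interpolate back onto the wavelengths we actually need:
wl_new = np.arange(250.0, 2501.0, 10.0)
E_new = 10.0 ** fit(1.0 / wl_new) / wl_new ** 5     # reverse the transformation

rel = (10.0 ** fit(x) / wl ** 5 - E) / E            # relative residuals at the given points
print("worst relative error at the given points:", np.max(np.abs(rel)))
```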