380 likes | 416 Views
Learn how statistical tools can identify trends and reduce errors in experimental data. Explore concepts like central tendency and dispersion, and analyze various forms of data distributions. Practice with sample histograms and understand the role of population, sample, and random variables in data analysis.
E N D
MECH 373Instrumentation and Measurements Lecture 12 Statistical Analysis of Experimental Data (Chapter 6) • Introduction • General Concepts and Definitions • Probability
Introduction • Random features are observed in virtually all measurements. • That is, even if the same measuring system is used to measure a fixed measurand repeatedly, the results will not have all the same value. • Sources that contribute to these variations include the following: • • Measurement system (Resolution, Repeatability) • • Measurement procedures and technique (Repeatability) • • Measured variable (Temporal variation, Spatial variation) • The uncontrolled variables also affect the measurand and introduce randomness in the measurand. • In some cases the randomness in the data is so dominant that it is difficult to distinguish the sought-after trend. This is common in experiments in social sciences and also occurs in engineering. • In such cases, statistics may offer tools that can identify trends from what appears to be a set of confused data. • Therefore, statistical tools are often needed to identify and generalize the characteristics of test data or determine bounds on the uncertainty of the data.
Systematic error – reduce by calibration Random error – statistical analysis Range of random error True value measurand Systematic error Average of measured value Experimental Errors
92 90 88 Output, mV 86 84 82 80 0 6 12 18 24 30 36 42 48 54 60 66 Reading Random Nature of Experimental Data Monitor a pressure sensor over a period of time, its output is supposed to be constant if the pressure is constant. The actual readings from the sensor show variations. Question: What is the valve are we reading?
Example 1 Consider 60 temperature measurements of the hot gas flowing through a duct over a period of one hour. • A typical problem associated with data such as these would be to determine whether it is likely that the temperature might exceed certain limits. • For example, is there a significant chance that the temperature will ever exceed 1117°C? • To help visualize the data, it is helpful to plot them in the form of a bar graph as shown below.
Example 1 • A bar graph of this type used for the statistical analysis is called a histogram. • To create this graph, the data first be arranged into groups called bins, as shown below. • Each bin has the same width (the range of temperature values). • The height of each bar represents the number of values (or samples) occur in that bar. This is also termed as frequency of occurrence.
Example 1 Comments on the histogram given in the previous figure (pp. 5): 1) There is a peak in the number of readings near the center of the temperature range 2) The number of readings at temperatures less than or greater than the central value drops off rapidly 3) The “curve” is bell shaped, not parabolic – the number of samples for bins away from the center in not zero. These characteristics of the data are common properties of experimental results, though not necessary. Figures in next slide show some other forms of distributions of that may found in engineering applications. Some general guidelines apply to the construction of histograms.
Analysis of Experimental Data Some shapes of distributions: (a) Symmetric; (b) skewed; (c) J-shaped; (d) bimodal; (e) uniform
Analysis of Experimental Data • To apply statistical analysis to experimental data, the data are usually characterized by determining parameters that specify the central tendency and the dispersion of the data. • The next step is to select a theoretical distribution function that is most suitable for explaining the behavior of the data. • This theoretical function can then be used to make predictions or estimates about various properties of the data.
General Definitions • Population • The entire collection of objects, measurements, observations, etc. For example, wind speed at a certain location over a 24-h period. • Sample A representative subset of a population • Sample Space The set of possible outcomes of an experiment (discrete, continuous) • Random Variable The variable being measured • Distribution Function A graphical or mathematical relationship that is used to represent the values of the random variable
General Definitions • Parameter • A numerical attribute of the entire population. As an example, the average value of a random variable in a population is a parameter for that population • Event The outcome of a random experiment • Statistic A numerical attribute of the sample. For example, the average value of a sample property is a statistic of the sample • Probability The chance of the occurrence of an event in an experiment
Measures of Central Tendency • Mean (sample mean, population mean) • Median • If the measured data are arranged in ascending or descending order, the median is the value at the center of the set. • Odd: middle one • Even: average of the middle two • Mode • The mode is the value that occurs most often. If no number is repeated, then there is no mode for the list. • Range • The range is just the difference between the largest and smallest values.
Measures of Dispersion • Dispersion Spread or variability of the data • Deviation • Mean deviation • Standard deviation (Sample, Population) • Variance
Example 2 Find the mean, median, mode, range, standard deviation and variance of the following dataset.
Example 2 Find the mean, median, mode, range, standard deviation and variance of the following dataset.
Basics of Probability • Probability Probability is a numerical value expressing the likelihood of occurrence of an event relative to all possibilities in a sample space. successful occurrences Probability of event A = total number of possible outcomes • If A is certain to occur, P(A) = 1 • If A is certain not to occur, P(A) = 0 • If B is the complement of A, then P(B) = 1 - P(A) • If A and B are mutually exclusive (the probability of simultaneous occurrence is zero), P(A or B) = P(A) + P(B) • If A and B are independent, P(AB) = P(A)P(B) (occurrence of both A and B) • P(AB) = P(A) + P(B) - P(AB) (occurrence of A or B or both)
r Basics of Probability • Probability Mass Function • Normalization • Mean • Variance N = 3 Hit purple region, score = 3 Hit blue region, score = 2 Hit red region, score = 1 P(p)<P(b)<P(r)
Basics of Probability (continued) • Probability Density Function • Probability of occurrence in an interval xi and xi+dx • Probability of occurrence in an interval [a,b] • Mean of population • Variance of population
-1 0 1 2 3 4 5 Probability Distribution Function Normal Distribution • Symmetric about m • Bell-shaped • Mean m: the peak of the density occurs • Standard deviation s: indicates the spread of the bell curve. m = 2
3s Z [-3,3] 1s Z[-1,1] 2s Z [-2,2] 68% 95% 99.7% Standard Normal Distribution (mean=0, standard deviation=1)
95% 99.7% 68% Normal Distribution Example • The distribution of heights of American women aged 18 to 24 is approximately normally distributed with mean 65.5 inches and standard deviation 2.5 inches. • 68% of these American women have heights between 65.5 – 1(2.5) and 65.5 + 1(2.5) inches, or between 63 and 68 inches, • 95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 60.5 and 70.5 inches. • 99.7% of these American women have heights between 65.5 - 3(2.5) and 65.5 + 3(2.5) inches, or between 58 and 73 inches.
Parameter Estimation • Estimate population mean using sample mean • Estimate population standard deviation using sample standard deviation
Interval Estimation of Population Mean • Estimate of population mean • Confidence interval • Confidence level (degree of confidence) • Level of significance a = 1 – confidence level
From Sample to Population • Choose many samples from population • Find the mean of each sample • Determine the uncertainly range of the means (Sample mean is a variable !!!) This method is very costly How to estimate the statistics of a population from only one single sample?
Central Limit Theorem (a) • Several different samples, each of size n, mean of • If sample size n is sufficiently large (>30), then (standard error of the mean) • If original population is normal, the distribution for the is normal • If original population is not normal and n is large (n > 30) the distribution for the is normal • If original population is not normal and n < 30 the distribution for the is only approximately normal
Confidence Interval Interval Estimation of Population Mean (n>30) With confidence level 1-a
Example 6.11 • n = 36 • Average = 25 • Sample standard deviation S = 0.5 • Determine 90% confidence interval of the mean • Solution: • 1-α=90%, -> α = 0.1 • 0.5- α/2=0.5-0.05=0.45 • Table 6.3 -> Z α/2=1.645
Student’s t-distribution (n<30) • Student’s t, degree of freedom ν = n-1
Confidence Interval Interval Estimation of Population Mean (n<30) With confidence level 1-a
Example 6.12 • n=6 • 1250, 1320, 1542, 1464, 1275, and 1383 h • Estimate population mean and 95% confidence interval on the mean • Solution: • Mean=(1250+1320+1542+1464+1275+1383)/6=1372 h • S=[(1250-1372)2+(1320-1372)2+(1542-1372)2+(1464-1372)2+(1275-1372)2+(1383-1372)2]/(6-1)]1/2 = 114 h • ν=n-1=6-1=5, α=1-95%=0.05, α/2=0.05/2=0.025 • Table 6.6 -> t α/2=2.571
Example 6.13 • Reduce the 95% confidence interval to ±50 h from ±120 h • Determine how many more systems should be tested • Solution: Assume n>30, • α=1-95%=0.05, 0.5-α/2=0.5-0.05/2=0.475 • Table 6.3 -> z α/2=1.96 • n = (1.96x114/50)2=20 <30 => t-distribution • ν=n-1=20-1=19, α/2=0.05/2=0.025 • Table 6.6 -> t α/2=2.093 • n = (2.093x114/50)2=23
ν = 4 Area=α/2 ν = 10 Area=α/2 Chi-squared Distribution Function • Relates the sample variance to the population variance
ν = 4 Area=α/2 ν = 10 Area=α/2 Interval Estimation of Population Variance With confidence level 1-a
Example 6.14 • n = 20, mean = 0.32500 in, S = 0.00010 in • obtain 95% confidence interval for the standard deviation • Solution: • ν=n-1=20-1=19, α=1-95%=0.05, α/2=0.05/2=0.025, 1- α/2=1-0.05/2=0.975 • Table 6.7 -> χ2ν, α/2=32.825, χ2ν, 1-α/2=8.9065