Statistical Methods For UO Lab — Part 1 Calvin H. Bartholomew Chemical Engineering Brigham Young University
Background • Statistics is the science of problem-solving in the presence of variability (Mason 2003). • Statistics enables us to: • Assess the variability of measurements • Avoid bias from unconsidered causes of variation • Determine probability of factors, risks • Build good models • Obtain best estimates of model parameters • Improve chances of making correct decisions • Make most efficient and effective use of resources
Some U.S. Cultural Statistics • 58.4% have called into work sick when we weren't. • 3 out of 4 of us store our dollar bills in rigid order with singles leading up to higher denominations. • 50% admit they regularly sneak food into movie theaters to avoid the high prices of snack foods. • 39% of us peek in our host's bathroom cabinet. • 17% have been caught by the host. • 81.3% would tell an acquaintance to zip his pants. • 29% of us ignore RSVP. • 35% give to charity at least once a month. • 71.6% of us eavesdrop.
Population vs. Sample Statistics • Population statistics • Characterize the entire population, which is generally the unknown information we seek • Mean generally designated μ • Variance and standard deviation generally designated σ² and σ, respectively • Sample statistics • Characterize a random, hopefully representative, sample: typically the data from which we infer population statistics • Mean generally designated x̄ • Variance and standard deviation generally designated s² and s, respectively
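As a minimal sketch of the distinction, the snippet below computes sample statistics with Python's standard `statistics` module. The readings are hypothetical values chosen for illustration, not data from the slides.

```python
import statistics

# Hypothetical temperature readings (deg C); NOT data from the slides.
sample = [98.2, 101.5, 99.8, 100.4, 97.9, 102.1, 100.1]

x_bar = statistics.mean(sample)          # sample mean, an estimate of mu
s = statistics.stdev(sample)             # sample standard deviation (n - 1 divisor)
s_squared = statistics.variance(sample)  # sample variance s^2

# pstdev divides by n instead of n - 1. It is the right formula only when
# the data ARE the entire population, and it is never larger than stdev.
sigma_if_population = statistics.pstdev(sample)

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, s^2 = {s_squared:.2f}")
print(f"standard deviation with n divisor = {sigma_if_population:.2f}")
```

The n − 1 divisor reflects that the mean was itself estimated from the same data, which is why the degrees of freedom used later are n − 1.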
Point vs. Model Estimation • Point estimation • Characterizes a single, usually global, measurement • Generally simple mathematical and statistical analysis • Procedures are unambiguous • Model development • Characterizes a function of dependent variables • Complexity of parameter estimation and statistical analysis depends on model complexity • Parameter estimation and especially statistics are somewhat ambiguous
Overall Approach • Use sample statistics to estimate population statistics • Use statistical theory to indicate the accuracy with which the population statistics have been estimated • Use linear or nonlinear regression methods/statistics to fit data to a model and to determine goodness of fit • Use trends indicated by theory to optimize experimental design
Sample Statistics • Estimate properties of the probability distribution function (PDF), i.e., mean and standard deviation, using Gaussian statistics • Use Student t-test to determine variance and confidence interval • Estimate random errors in the measurement of data • For variables that are geometric functions of several basic variables, use the propagation of errors approach to estimate (a) probable error (PE) and (b) maximum possible error (MPE) • PE and MPE can be estimated by the differential method; MPE can also be estimated by the brute force method • Determine systematic errors (bias) • Compare estimated errors from measurements with calculated errors from statistics; this will reveal whether the methods of measurement or the quantity of data is limiting
Random Error: Single Variable (e.g., T) • Questions • Several measurements are obtained for a single variable (e.g., T). • What is the true value? • How confident are you? • Is the value different on different days? • Some definitions: • x̄ = sample mean • s = sample standard deviation • μ = exact mean • σ = exact standard deviation • As the sampling becomes larger: x̄ → μ, s → σ, t chart → z chart • Not valid if bias exists (e.g., calibration is off)
How do you determine bounds of μ? • Let's assume a "normal" Gaussian distribution • For a small sample: s is known (we'll pursue this approach) • For a large sample (n > 30): σ is assumed (use z tables for this approach)
Properties of a Normal PDF • About 68.27%, 95.45%, and 99.73% of data lie within 1, 2, and 3 standard deviations of the mean, respectively. • When the mean is zero and the standard deviation is 1, it is referred to as a standard normal distribution. • The normal distribution plays a fundamental role in statistical analysis because of the Central Limit Theorem.
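The coverage figures can be checked directly: for a normal distribution, the fraction of data within k standard deviations of the mean is erf(k/√2). The short sketch below uses only the standard library; the function name `coverage_within` is made up for this illustration.

```python
import math

def coverage_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * coverage_within(k):.2f}%")
# prints 68.27%, 95.45%, and 99.73%
```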
Central Limit Theorem • Distribution of means calculated from a large data set is approximately normal • Becomes more accurate with larger number of samples • Sample mean approaches true mean as n → ∞ • Assumes distributions are not peaked close to a boundary and variances are finite
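A small simulation can illustrate the theorem. The sketch below is an illustration under assumptions, not part of the original slides: it draws from a uniform distribution on (0, 1), which is clearly not normal, and shows that the spread of the sample means shrinks as σ/√n, where σ = √(1/12) for this distribution.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, trials=2000):
    """Means of `trials` samples, each made of n draws from Uniform(0, 1)."""
    return [statistics.mean([random.random() for _ in range(n)])
            for _ in range(trials)]

sigma = (1.0 / 12.0) ** 0.5  # exact standard deviation of Uniform(0, 1)

for n in (2, 10, 50):
    means = sample_means(n)
    observed = statistics.stdev(means)
    predicted = sigma / n ** 0.5
    print(f"n = {n:2d}: mean of means = {statistics.mean(means):.3f}, "
          f"spread = {observed:.4f}, sigma/sqrt(n) = {predicted:.4f}")
```

A histogram of `means` for n = 50 would look close to a bell curve even though the individual draws are uniformly distributed.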
Student t-Distribution • Widely used in hypothesis testing and determining confidence intervals • Equivalent to normal distribution for large sample size • Student is a pseudonym, not an adjective; the author's actual name was W. S. Gosset, who published in the early 1900s.
Student t-Distribution • Used to compute confidence intervals according to $\mu = \bar{x} \pm t\,s/\sqrt{n}$ • Assumes mean and variance are estimated by sample values • Value of t decreases with DOF or number of data points n; increases with increasing % confidence
Student t-test (determine error from s) • [Figure: t distribution with 5% of the area in each tail] • α = 1 − probability • r = n − 1 • error = $t\,s/\sqrt{n}$ • e.g., from Example 1: n = 7, s = 3.27
Values of Student t Distribution • Depend on both confidence level desired and amount of data. • Degrees of freedom are n-1, where n = number of data points (assumes mean and variance are estimated from data). • This table assumes two-tailed distribution of area.
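The table itself did not survive in this transcript. As a stand-in, the sketch below hard-codes a few two-tailed critical values from standard published t tables (only a handful of rows; the dictionary and function names are made up for this illustration) and shows both trends: t falls as the degrees of freedom rise and grows with the confidence level.

```python
# Two-tailed critical t values copied from standard published tables.
# Inner keys are degrees of freedom (n - 1); float("inf") is the normal (z) limit.
T_CRIT = {
    0.95: {1: 12.706, 2: 4.303, 4: 2.776, 6: 2.447, 10: 2.228,
           30: 2.042, float("inf"): 1.960},
    0.99: {1: 63.657, 2: 9.925, 4: 4.604, 6: 3.707, 10: 3.169,
           30: 2.750, float("inf"): 2.576},
}

def t_value(confidence, n):
    """Look up t for n data points (dof = n - 1); raises KeyError if not tabulated."""
    return T_CRIT[confidence][n - 1]

for dof in sorted(T_CRIT[0.95]):
    print(f"dof = {dof}: t(95%) = {T_CRIT[0.95][dof]}, t(99%) = {T_CRIT[0.99][dof]}")
```

For degrees of freedom not listed here, a full table or a statistics library would be needed; this sketch deliberately does not interpolate.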
Example 2 • Five data points with sample mean and standard deviation of 713.6 and 107.8, respectively. • The estimated population mean and 95% confidence interval is (from previous table tα = 2.77645): $\mu = 713.6 \pm (2.77645)(107.8)/\sqrt{5} = 713.6 \pm 133.9$
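The arithmetic for this example can be checked with a few lines of Python, using error = t·s/√n and the values quoted on the slide. The function name `confidence_half_width` is made up for this illustration.

```python
import math

def confidence_half_width(t, s, n):
    """Half-width of the confidence interval: error = t * s / sqrt(n)."""
    return t * s / math.sqrt(n)

# Values quoted in Example 2
x_bar, s, n = 713.6, 107.8, 5
t_95 = 2.77645  # two-tailed, 95% confidence, n - 1 = 4 degrees of freedom

half = confidence_half_width(t_95, s, n)
print(f"mu = {x_bar} +/- {half:.1f}")
print(f"95% interval: {x_bar - half:.1f} to {x_bar + half:.1f}")
# prints mu = 713.6 +/- 133.9 and the interval 579.7 to 847.5
```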
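Example 3: Comparing Averages • Measurements are taken on Day 1 (sample x) and Day 2 (sample y) [data not recovered in this transcript] • What is your confidence that μx ≠ μy? • Degrees of freedom: nx + ny − 2 • Result: 99% confident the means are different; 1% confident they are the same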
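The Day 1 and Day 2 data are missing from this transcript, so the sketch below uses hypothetical readings. It assumes the pooled (equal-variance) two-sample t-test, which is consistent with the nx + ny − 2 degrees of freedom on the slide but is not confirmed as the exact procedure used there.

```python
import math
import statistics

def pooled_t(x, y):
    """Pooled two-sample t statistic and its degrees of freedom (nx + ny - 2)."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / dof
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, dof

# Hypothetical readings; NOT the data from the original slide.
day1 = [10.1, 10.4, 9.8, 10.2, 10.0]
day2 = [10.9, 11.2, 10.7, 11.0, 11.1]

t, dof = pooled_t(day1, day2)
t_crit_99 = 3.355  # two-tailed 99% critical value for 8 dof, from standard tables

print(f"t = {t:.2f} with {dof} degrees of freedom")
print("means differ at 99% confidence" if abs(t) > t_crit_99
      else "cannot claim a difference at 99% confidence")
```

If |t| exceeds the tabulated value at the chosen confidence level, the two daily averages are judged different at that level.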
Error Propagation: Multiple Variables • Obtain a value (e.g., from a model) using multiple input variables. • What is the uncertainty of your value? • Each input variable has its own error • Example: How much ice cream do you buy for the AIChE event? Ice cream = f(time of day, tests, …) • Example: You take measurements of ρ, A, v to determine m = ρAv. What is the range of m and its associated uncertainty?
Value and Uncertainty • Values are used by managers to make decisions, so the uncertainty of a value must be specified • Ethics and societal impact of values are important • How do you determine the uncertainty of a value? • Sources of uncertainty: • Estimation: we guess! • Discrimination: device accuracy (single data point) • Calibration: may not be exact (error of curve fit) • Technique: e.g., measure ID rather than OD • Constants and data: not always exact! • Noise: which reading do we take? • Model and equations: e.g., ideal gas law vs. real gas • Humans: transposing, …
Estimates of Error (δ) for Input Variable (Methods or rules) • 1. Measured variable (as we just did): measure multiple times; obtain s; δ ≈ 2.57s (the t chart gives δ > 2.57s for 99% confidence) • e.g., s = 2.3 °C for thermocouple, δ = 5.9 °C • 2. Tabulated variable: δ ≈ 2.57 times the last reported significant digit (e.g., ρ = 1.0 g/ml at 0 °C, δ = 0.257 g/ml)
Estimates of Error (δ) for Variable • 3. Manufacturer specs: use given accuracy data (e.g., pump is ± 1 ml/min, δ = 1 ml/min) • 4. Variable from regression (e.g., calibration curve): δ ≈ standard error (e.g., velocity from equation with std error = 2 m/s) • 5. Judgment for a variable: use judgment for δ (e.g., graph gives pressure to ± 1 psi, δ = 1 psi)
Calculating Maximum or Probable Error • Maximum error can be calculated as shown previously: • Brute force method • Differential method • Probable error is more realistic: positive and negative errors can partially cancel, lowering the overall error. You need standard deviations (σ or s) to calculate probable error (PE) (e.g., see previous example). • PE = δ = 2.57σ • $\Psi = y \pm 1.96\sqrt{\sigma_y^2}$ (95%) • $\Psi = y \pm 2.57\sqrt{\sigma_y^2}$ (99%)
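Calculating Maximum (Worst) Error • 1. Brute force method: substitute upper and lower limits of all x's into the function to get max and min values of y. The range of y (Ψ) is between ymin and ymax. • 2. Differential method: from a given model y = f(a, b, c, …, x1, x2, x3, …), where a, b, c, … are exact constants and x1, x2, x3, … are independent variables with errors δ1, δ2, δ3, …, compute $dy = \sum_i \left|\partial f/\partial x_i\right| \delta_i$. The range of y (Ψ) = y ± dy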
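Both methods can be sketched generically. The code below is a minimal illustration under stated assumptions: the function names, the model y = x1/x2, and its input values are all hypothetical, and the partial derivatives are approximated by central differences rather than derived by hand.

```python
from itertools import product

def brute_force_range(f, values, deltas):
    """Evaluate f at every combination of lower/upper limits; return (ymin, ymax).

    Checking only the corner points finds the true extremes when f is
    monotonic in each variable over its error range.
    """
    corners = product(*[(v - d, v + d) for v, d in zip(values, deltas)])
    ys = [f(*corner) for corner in corners]
    return min(ys), max(ys)

def differential_error(f, values, deltas, rel_step=1e-6):
    """dy = sum of |df/dx_i| * delta_i, using central-difference derivatives."""
    dy = 0.0
    for i, (v, d) in enumerate(zip(values, deltas)):
        h = rel_step * max(abs(v), 1.0)
        up, down = list(values), list(values)
        up[i], down[i] = v + h, v - h
        dy += abs((f(*up) - f(*down)) / (2 * h)) * d
    return dy

# Hypothetical model and inputs, chosen only to exercise the two functions.
def model(x1, x2):
    return x1 / x2

values, deltas = (10.0, 4.0), (0.5, 0.2)
y = model(*values)
y_min, y_max = brute_force_range(model, values, deltas)
dy = differential_error(model, values, deltas)

print(f"brute force:  {y_min:.3f} <= y <= {y_max:.3f}")
print(f"differential: y = {y:.3f} +/- {dy:.3f}")
```

For this nonlinear model the brute force range is not centered on y, while the differential method always reports a symmetric ± dy.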
Example 4: Differential method • m = ρAv, with y = m, x1 = ρ, x2 = A, x3 = v • x1 = ρ = 2.0 g/cm³ (table); δ1 = 0.257 g/cm³ (Rule 2) • x2 = A = 3.4 cm² (measured avg); δ2 = 0.2 cm² (Rule 1) • x3 = v = 2 cm/s (calibration); δ3 = 0.1 cm/s (Rule 4) • y = (2.0)(3.4)(2) = 13.6 g/s • dy = (6.8)(0.257) + (4.0)(0.2) + (6.8)(0.1) = 3.2 g/s • Ψ = 13.6 ± 3.2 g/s • Which product term contributes the most to uncertainty? • This method works only if errors are symmetrical
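The numbers in Example 4 can be reproduced term by term, which also answers the question of which term dominates. The sketch below uses the values from the slide; the brute force comparison at the end is an added check, not part of the original example.

```python
# Values from Example 4
rho, A, v = 2.0, 3.4, 2.0          # g/cm^3, cm^2, cm/s
d_rho, d_A, d_v = 0.257, 0.2, 0.1  # errors from Rules 2, 1, and 4

m = rho * A * v  # g/s

# Each term is |partial derivative| * delta for one input variable
terms = {
    "rho": A * v * d_rho,  # (6.8)(0.257)
    "A": rho * v * d_A,    # (4.0)(0.2)
    "v": rho * A * d_v,    # (6.8)(0.1)
}
dm = sum(terms.values())
largest = max(terms, key=terms.get)

for name, value in terms.items():
    print(f"{name:>3}: {value:.3f} g/s ({100 * value / dm:.0f}% of dm)")
print(f"m = {m:.1f} +/- {dm:.1f} g/s; largest contribution: {largest}")

# Added check: brute force limits for comparison with the symmetric +/- dm
m_min = (rho - d_rho) * (A - d_A) * (v - d_v)
m_max = (rho + d_rho) * (A + d_A) * (v + d_v)
print(f"brute force range: {m_min:.2f} to {m_max:.2f} g/s")
```

The density term contributes a little over half of dm, so a better density value would reduce the uncertainty most. The brute force range (about 10.60 to 17.06 g/s) is not symmetric about 13.6, which illustrates the caveat that the differential method assumes symmetrical errors.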