Data Analysis Examples. Anthony E. Butterfield, CH EN 4903-1.
#1: The Normal PDF • Your coworker tells you that the outlet temperature of a certain coal gasifier averages 1304 K and stays within 12 K of that mean for 95% of her measurements, over months of operation. If we assume the temperature measurements are normally distributed, what is the standard deviation, and what is the probability that a temperature measurement would be above 1310 K? • T = 1304 ± 12 K (95% confidence level)
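Normal Distribution • Probability density function (PDF): $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, where $\mu$ is the mean and $\sigma$ is the standard deviation.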
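One way to work problem #1 numerically is sketched below. It assumes the ± 12 K band is a symmetric two-sided 95% interval of a normal distribution, and it uses only the base MATLAB functions `erfinv` and `erfc`. The variable names are illustrative and do not come from the slides.

```matlab
% Problem #1: T = 1304 +/- 12 K at a 95% confidence level, assumed normal
mu = 1304;          % mean temperature, K
halfWidth = 12;     % half-width of the 95% interval, K
limit = 1310;       % threshold of interest, K

% z such that 95% of a standard normal lies within +/- z (about 1.96)
z95 = sqrt(2)*erfinv(0.95);

% Standard deviation implied by the 95% interval
sigma = halfWidth/z95;

% Upper-tail probability P(T > limit) via the complementary error function
pAbove = 0.5*erfc((limit - mu)/(sigma*sqrt(2)));

fprintf('sigma = %.2f K\n', sigma);
fprintf('P(T > %d K) = %.1f%%\n', limit, 100*pAbove);
```

Under these assumptions the standard deviation comes out near 6.1 K and the chance of a reading above 1310 K near 16%.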
#2: Error Propagation • In a falling bead viscometer, the viscosity may be found from the following equation (Stokes' law): $\mu = \frac{2 r^2 g (\rho_B - \rho_F)}{9V}$ • where $r$ is the bead radius, $g$ is gravitational acceleration, $V$ is the terminal velocity, $\rho_B$ is the bead density, and $\rho_F$ is the fluid density. Suppose we find, at a 95% confidence level, that the bead density is 2 ± 0.1 g/cm³, the radius is 3 ± 0.1 mm, the fluid density is 1.1 ± 0.2 g/cm³, and, after terminal velocity is reached, the bead falls 10 ± 0.2 cm in 12 ± 0.5 s. What is the calculated viscosity, and what is the uncertainty in its value? Which measurement is the greatest source of error?
#2: Error Propagation • A couple options:
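One possible numerical route is sketched below; it is not necessarily either of the options the slide had in mind. It assumes the Stokes' law form viscosity = 2 r² g (ρB − ρF) / (9 V) with V = distance/time, takes g = 981 cm/s², treats the five uncertainties as independent, and combines them in quadrature using finite-difference partial derivatives. All names are illustrative.

```matlab
% Assumed model: Stokes' law in CGS units, result in g/(cm s), i.e. poise
% p = [radius (cm), bead density, fluid density (g/cm^3), distance (cm), time (s)]
g = 981;  % cm/s^2 (assumed value)
visc = @(p) 2*p(1)^2*g*(p(2) - p(3))*p(5)/(9*p(4));

p  = [0.3, 2.0, 1.1, 10, 12];      % measured values
dp = [0.01, 0.1, 0.2, 0.2, 0.5];   % 95% uncertainties, same units

mu0 = visc(p);                     % calculated viscosity

% Squared contribution of each measurement: (d(visc)/dp_k * dp_k)^2
contrib = zeros(size(p));
for k = 1:numel(p)
    h = 1e-6*p(k);                 % small step for a central difference
    pHi = p; pHi(k) = p(k) + h;
    pLo = p; pLo(k) = p(k) - h;
    slope = (visc(pHi) - visc(pLo))/(2*h);
    contrib(k) = (slope*dp(k))^2;
end

dmu = sqrt(sum(contrib));          % combined uncertainty
[~, worst] = max(contrib);         % index of the largest error source

fprintf('viscosity = %.1f +/- %.1f poise\n', mu0, dmu);
fprintf('largest contribution from input %d\n', worst);
```

With these inputs the sketch gives roughly 21.2 ± 5.5 poise, and the fluid density (input 3) dominates the uncertainty because its ± 0.2 g/cm³ is large compared with the 0.9 g/cm³ density difference.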
#3: Log Normal • You find the following particle size distribution from a spray dryer experiment: [Table of data] • If we assume this distribution of particle sizes is log-normal, what would be the mean and standard deviation for the log-normal PDF? • Nonlinear fitting problem, like #6.
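For reference, the usual form of the log-normal PDF is written out below. In this parameterization $\mu$ and $\sigma$ are the mean and standard deviation of $\ln x$, not of the particle size $x$ itself. Whether the problem intends these parameters or the mean and standard deviation of $x$ is an assumption.

```latex
f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right),
       \qquad x > 0
```

Fitting then means choosing $\mu$ and $\sigma$ so that this curve best matches the measured size distribution in the least squares sense.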
#4: Hypothesis Testing • On a certain stage of a distillation column, theory predicts that the ethanol concentration should be 27%. You take the following measurements over several runs: • What is the likelihood that your measurements match theory?
#4: Hypothesis Testing • Student's t-test • Mean = 24.52 • StDev = 4.2163 • Degrees of freedom: ν = n − 1 = 10 − 1 = 9
#4: Hypothesis Testing • t-statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{24.52 - 27}{4.2163/\sqrt{10}} \approx -1.86$
#4: Hypothesis Testing • Use the t-statistic in the t-distribution CDF (ν = 9, both tails) to find the probability. Answer = 9.6%
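The one-sample test can be reproduced from the summary statistics alone. The sketch below assumes a two-tailed test and evaluates the t-distribution tail area with the base MATLAB regularized incomplete beta function `betainc`. With the Statistics Toolbox, `2*tcdf(-abs(t), nu)` should give the same number. Variable names are illustrative.

```matlab
% Problem #4: compare the sample mean with the theoretical 27% ethanol
xbar = 24.52;     % sample mean, %
s    = 4.2163;    % sample standard deviation, %
n    = 10;        % number of measurements
mu0  = 27;        % value predicted by theory, %

nu = n - 1;                        % degrees of freedom
t  = (xbar - mu0)/(s/sqrt(n));     % one-sample t-statistic

% Two-tailed area P(|T| > |t|) for Student's t with nu degrees of freedom
pTwoTail = betainc(nu/(nu + t^2), nu/2, 0.5);

fprintf('t = %.3f, two-tailed P = %.1f%%\n', t, 100*pTwoTail);
```

This returns t near −1.86 and a two-tailed probability near 9.6%, matching the answer on the slide.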
#5: Hypothesis Testing 2 • You are measuring the effectiveness of a new catalyst on a reaction with a great deal of normally distributed variability. You measure the time to 99% conversion of your reactants with both your new and old catalyst for several experimental runs and find the following data: • Given this data, what is the probability that the new catalyst is more effective than the old? What is the probability that they are equally effective?
#5: Hypothesis Testing 2 • Mean A = 10.25, Mean B = 9.50 • StDev A = 1.071, StDev B = 1.066 • Number A = 22, Number B = 20 • Degrees of freedom: ν = n_A + n_B − 2 = 40
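#5: Hypothesis Testing 2 • t-statistic (pooled variance, consistent with ν = n_A + n_B − 2): $t = \frac{\bar{x}_B - \bar{x}_A}{s_p\sqrt{1/n_A + 1/n_B}}$, where $s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}$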
#5: Hypothesis Testing 2 • Simple rule: • Greater-than or less-than tests use one tail (two unequal areas), and you can easily tell which percentage you want by looking at the means. • An equality test uses two equal tails. • For the t CDF with ν = 40 at a t-statistic of −2.295, the one-tail area is P ≈ 1.4% (both tails together ≈ 2.7%). • Whether the new catalyst is more effective is a one-tail test. • More effective (one tail) = 100% − 1.4% ≈ 98.6% • Equal (two tail) = 2 × 1.4% ≈ 2.7%
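The tail areas can be checked with a short script. The sketch below assumes a pooled-variance two-sample t-test and takes the difference as B minus A, a sign convention chosen only so that the statistic is negative like the one on the slide. It uses the base MATLAB function `betainc` for the t-distribution tails. Recomputing from the rounded summary statistics gives a t-statistic slightly different from −2.295, presumably because the slide's value came from the unrounded data.

```matlab
% Problem #5: summary statistics from the slide
meanA = 10.25; sdA = 1.071; nA = 22;
meanB = 9.50;  sdB = 1.066; nB = 20;

nu = nA + nB - 2;                                  % degrees of freedom
sp = sqrt(((nA-1)*sdA^2 + (nB-1)*sdB^2)/nu);       % pooled std. deviation
tRounded = (meanB - meanA)/(sp*sqrt(1/nA + 1/nB)); % from rounded inputs

% Tail areas at the t-statistic quoted on the slide
tSlide   = -2.295;
pTwoTail = betainc(nu/(nu + tSlide^2), nu/2, 0.5); % P(|T| > |t|)
pOneTail = pTwoTail/2;                             % P(T < t) for t < 0

fprintf('t from rounded inputs = %.3f\n', tRounded);
fprintf('one tail = %.2f%%, two tails = %.2f%%\n', 100*pOneTail, 100*pTwoTail);
```

At t = −2.295 with 40 degrees of freedom the one-tail area comes out near 1.4% and the two-tail area near 2.7%.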
#6: Non-Linear Fit • The rate of population growth in a bacterial culture is found to be: • It is thought that these data could be fit to the equation Rate = b1*sin(b2*t), where b1 and b2 are constants to be determined and t is time. Determine the least squares estimates for b1 and b2 and give an appropriate confidence interval for a confidence level of 90%. Also, what would you anticipate the rate to be at 24 hr? What would the confidence interval for a 95% confidence level be at 24 hr?
%Anthony Butterfield 2009
%Example of nonlinear fit with CIs
clear
close all
b(1)=1/3;
b(2)=1;
re=0.1; %random noise strength
model=@(b,x) b(1)*sin(b(2)*x); %fit equation; stands in for the separate nlinfitsin function file, which is not shown
x=linspace(0,6,20)'; %x data for fitting
x2=linspace(0,6,100)'; %x data for plotting
n=length(x);
y=model(b,x)+re*randn(n,1); %y data for fitting, note the random error added in to make it realistic
yt=model(b,x2); %theoretical y data for plotting
[beta,r,J]=nlinfit(x,y,model,[1 1]); %numerically performs a nonlinear fit
bci=nlparci(beta,r,J); %returns the c.i. for the parameters, beta (95% by default)
[ypred,delta]=nlpredci(model,x2,beta,r,J); %returns a predicted y and the c.i. half-width for each y
disp('Fit to equation: y = b1 sin(b2 * x)')
disp(' x data y data')
for i=1:n
    txt=sprintf(' %5.3f %5.3f',x(i),y(i));
    disp(txt)
end
txt=sprintf('b1 was %3.1f, and is estimated to be: %f ± %f (95%% CL)',b(1),beta(1),abs(beta(1)-bci(1,1)));
disp(txt)
txt=sprintf('b2 was %3.1f, and is estimated to be: %f ± %f (95%% CL)',b(2),beta(2),abs(beta(2)-bci(2,1)));
disp(txt)
figure(1)
hold on
grid on
scatter(x,y,10,'r') %noisy data used for the fit
plot(x2,yt,'Color',[1 0.5 0]) %just wanted to give you an example of how to change the line color to something not preset
plot(x2,ypred,'b',x2,ypred+delta,'b:',x2,ypred-delta,'b:') %fit and its confidence band
hold off
#6: Non-Linear Fit • nlparci • In “theory” b1 = 0.3; estimated b1 = 0.35 ± 0.05 (90% CL) • In “theory” b2 = 1.0; estimated b2 = 1.04 ± 0.04 (90% CL) • nlpredci • At 24 hr “theory” predicts: • Rate = -0.3019 • Fit predicts: • Rate = -0.1090 ± 0.3839 (95% CL)
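The "theory" value at 24 hr can be checked directly from the model, and the same few lines hint at why the prediction interval there is so wide: 24 hr lies far outside the fitted range of 0 to 6, so a small change in b2 shifts the phase of the sine considerably. The sketch below assumes the true parameters b1 = 1/3 and b2 = 1 from the script. The b2 values of 0.96 and 1.04 are illustrative only, taken from the rounded ± 0.04 on the slide.

```matlab
% Model from the fit: Rate = b1*sin(b2*t)
model  = @(b,t) b(1)*sin(b(2)*t);
tQuery = 24;                          % hr, well outside the fitted range

% "Theory" prediction with the parameters used to generate the data
rateTheory = model([1/3, 1], tQuery);

% Effect of a small change in b2 alone (illustrative values)
b2Values = [0.96, 1.00, 1.04];
rates = arrayfun(@(b2) model([1/3, b2], tQuery), b2Values);

fprintf('theory rate at %d hr = %.4f\n', tQuery, rateTheory);
for k = 1:numel(b2Values)
    fprintf('b2 = %.2f -> rate = %.4f\n', b2Values(k), rates(k));
end
```

The theory value agrees with the −0.3019 quoted on the slide. The spread across the three b2 values shows how strongly an extrapolated prediction depends on the fitted frequency.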