220 likes | 304 Views
Stats for Engineers Lecture 9. Summary From Last Time. Confidence Intervals for the mean. If is known , confidence interval for is to , where is obtained from Normal tables. If is unknown ( only know sample variance ):.
E N D
Summary From Last Time Confidence Intervals for the mean If is known, confidence interval for is to , where is obtained from Normal tables. Ifis unknown (only know sample variance ): Assuming independent Normal data, the confidence interval for is: to . Student t-distribution Q t-tables
Sample size How many random samples do you need to reach desired level of precision? Suppose we want to estimate to within , where (and the degree of confidence) is given. Want Need: - Estimate of (e.g. previous experiments) - Estimate of . This depends on n, but not very strongly. e.g. take for 95% confidence. Rule of thumb: for 95% confidence, choose
Example A large number of steel plates will be used to build a ship. Ten are tested and found to have sample mean weight and sample variance How many need to be tested to determine the mean weight with 95% confidence to within ? Answer: Assuming plates have independent weights with a Normal distribution Take for 95% confidence. i.e. need to test about 28
Number of samplesIf you need 28 samples for the confidence interval to be approximately how many samples would you need to get a more accurate answer with confidence interval • 88.5 • 280 • 2800 • 28000 so need more. i.e. 2800
Linear regression We measure a response variable at various values of a controlled variable e.g. measure fuel efficiency at various values of an experimentally controlled external temperature Linear regression: fitting a straight line to the mean value of as a function of
Distribution of when Regression curve: fits the mean values of the distributions
From a sample of values at various , we want to fit the regression curve. e.g.
Straight line plotsWhich graph is of the line ? 1. • Plot • Plot2 • Plot3 • Plot4 2. 3. 4.
From a sample of values at various , we want to fit the regression curve. e.g.
Or is it What do we mean by a line being a ‘good fit’?
Equation of straight line is Simple model for data: Straight line Random error Simplest assumption: for all , and 's are independent - Linear regression model
Model is Want to estimate parametersa and b, using the data. e.g. - choose and to minimize the errors Maximum likelihood estimate = least -squares estimate Minimize Data point Straight-line prediction E is defined and can be minimized even when errors not Normal – least-squares is simple general prescription for fitting a straight line (but statistical interpretation in general less clear)
The line has been proposed as a line of best fit for the following four sets of data. For which data set is this line the best fit (minimum )? 1. • Pic1 • Pic2 • Pic correct • pic4 Question from Derek Bruff 2. 3. 4.
How to find and that minimize ? For minimum want and , see notes for derivation Solution is the least-squares estimates and : Sample means Where Equation of the fitted line is
Note that since i.e. is on the line
Example: The data y has been observed for various values of x, as follows: Fit the simple linear regression model using least squares. Answer: Want to fit n = 9 , ,
Answer: Want to fit n = 9 , , Now just need So the fit is approximately
Which of the following data are likely to be most appropriately modelled using a linear regression model? 1. • Correct • Errors change • Not straight 2. 3.
Quantifying the goodness of the fit Estimating : variance of y about the fitted line Estimated error is: , so the ordinary sample variance of the 's is In fact, this is biased since two parameters, a and b have been estimated. The unbiased estimate is: [derivation in notes] Residual sum of squares
Which of the following plots would have the greatest residual sum of squares [variance of about the fitted line]? 1. • Pic1 • Pic2 • Pic correct Question from Derek Bruff 2. 3.