Topic for Today: Statistical Estimation • Estimation • Bias and Error • Efficiency • Point Estimation • Population Mean • MSE • RMSE • Interval Estimation – Confidence Intervals • When x is normally distributed • When x is not normally distributed • When x is normally distributed and the population standard deviation is unknown
Reference • Burt and Barber, Chapter 8, Pages 253-274
Estimation • Definition: the act of guessing the value of a population parameter • An inferential statistical method closely related to hypothesis testing • Two types of estimation • Point Estimation: estimating a specific value • Interval Estimation: determining the range or interval in which the value of the parameter is thought to lie
Estimator Selection • Estimates are statistical guesses. To choose the best estimator of a population parameter, two factors must be considered: bias and efficiency • Bias: the mean error or expected error of an estimator • Note that the sample mean is an unbiased estimator of the population mean, because the mean of its sampling distribution equals the population mean • This doesn’t mean that a specific sample mean is equal to the population mean, but it does mean that the errors are zero on average
Measuring Error • A parameter θ (theta) is being estimated • The Bias or Mean Error of the estimator is its expected error: $\text{Bias}(\hat{\theta}) = E(\hat{\theta} - \theta) = E(\hat{\theta}) - \theta$ • The ^ denotes that the value is an estimate • It is usually best to minimize Mean Error or Bias, but you also have to take efficiency into account
Measuring Error • Efficiency is the likelihood of an estimator delivering an estimate near the true value • If bias is the mean of the estimation error, then efficiency corresponds to its variance: the smaller the variance, the more efficient the estimator • Graphical example on the chalkboard
Evaluating an Estimator • It is unwise to use only mean error, because mean error is an average of signed deviations, so positive and negative errors cancel one another • Mean Squared Error (MSE) averages the squared errors instead, so they cannot cancel: $\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$
A Hypothetical Example of Estimating the Population Mean • μ is unknown • The sample mean is just one value drawn from a normally distributed sampling distribution centered on μ, so it is unbiased (but not without error) • The variance of the sampling distribution depends on the sample size and is σ^2/n • Since MSE = Bias^2 + Variance, the MSE in this case is 0 + σ^2/n • Because σ is the population standard deviation (a constant), the MSE decreases as n increases
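The relationship MSE = Bias^2 + Variance = σ^2/n for the sample mean can be checked with a small simulation. This is only a sketch: the function name, the values μ = 10, σ = 4, n = 25, the number of trials, and the random seed are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_mse_of_sample_mean(mu, sigma, n, trials=10000, seed=0):
    """Estimate the bias, variance, and MSE of the sample mean by simulation."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Draw one sample of size n from a normal population and record its mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(sum(sample) / n)
    bias = statistics.fmean(estimates) - mu          # mean error
    variance = statistics.pvariance(estimates)       # spread of the estimates
    mse = statistics.fmean((e - mu) ** 2 for e in estimates)
    return bias, variance, mse

# Hypothetical population: mu = 10, sigma = 4, samples of size n = 25.
bias, variance, mse = simulate_mse_of_sample_mean(mu=10.0, sigma=4.0, n=25)
print("bias:", bias)
print("variance:", variance)
print("MSE:", mse, "theory sigma^2/n:", 4.0 ** 2 / 25)
```

With these settings the simulated bias should sit close to zero and the simulated MSE close to σ^2/n = 0.64. Rerunning with a larger n should shrink the MSE accordingly.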
MSE and RMSE • The Mean Squared Error has a shortcoming: because it is a square, it reports the error in squared units whenever the data have units • To overcome this problem, it is typical to take the square root of the MSE and measure the error with the Root Mean Square Error (RMSE)
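A short sketch of why mean error alone can mislead, and how MSE and RMSE behave on the same numbers. The four estimates and the true value of 16 minutes are hypothetical, and `error_summary` is an illustrative helper name.

```python
import math

def error_summary(estimates, true_value):
    """Return (mean error, MSE, RMSE) for a list of estimates of true_value."""
    errors = [e - true_value for e in estimates]
    mean_error = sum(errors) / len(errors)               # signed errors can cancel
    mse = sum(err ** 2 for err in errors) / len(errors)  # in squared units
    rmse = math.sqrt(mse)                                # back in the original units
    return mean_error, mse, rmse

# Hypothetical estimates (minutes) of a true value of 16 minutes.
mean_error, mse, rmse = error_summary([14, 18, 13, 19], 16)
print(mean_error, mse, rmse)
```

Here the errors are -2, +2, -3, and +3, so the mean error is 0 even though every estimate is wrong. The MSE is 6.5 minutes squared and the RMSE is about 2.55 minutes.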
Interval Estimation • Interval Estimates are an improvement upon point estimates because they provide a range of values for θ • Point Estimates are very straightforward because you are basically assuming that the sample statistic and the population parameter are the same, with the only variation being the increase in efficiency as the sample size increases
Confidence • Confidence in statistics is a measure of our surety that a key value falls within a specified interval • A Confidence Interval is always accompanied by a probability that defines the risk of being wrong • This probability of error is usually called α (alpha)
Confidence Intervals • α is user specified and depends on the nature of the research, but a common value is α = 0.05, which corresponds to a 95% confidence level (1 − α) • Choosing a 95% confidence level means that if an infinite number of intervals were constructed at this level, 95% of them would contain the population parameter and 5% would not • Because the normal distribution has two tails, we place half of α in each tail (0.025 in the upper tail and 0.025 in the lower tail)
Confidence Intervals • Since we are using both tails of the distribution, the Z-scores for α = 0.05 are ±1.96, which leaves 95% of all possible outcomes in the center of the distribution • The result is a slight change to the equation that defines the relation between the population mean and the sample mean • Let's look at this in detail on the board
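The two-tailed Z-score and the resulting interval can be sketched in a few lines. The sketch assumes the usual form of the interval for a known σ, sample mean ± z·σ/√n. The function name and the numbers in the usage lines (mean 50, σ = 10, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Two-sided interval for mu when the population sigma is known."""
    # Half of alpha goes in each tail, so look up the 1 - alpha/2 quantile.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Z-score for alpha = 0.05 (about 1.96).
z_95 = NormalDist().inv_cdf(1 - 0.05 / 2)
# Hypothetical sample: mean 50, known sigma 10, n = 100.
lower, upper = z_confidence_interval(50.0, 10.0, 100, alpha=0.05)
print(z_95, lower, upper)
```

For these hypothetical values the margin is about 1.96, giving an interval of roughly 48.04 to 51.96.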
Three Cases • Interval Estimation of μ when x is normally distributed and the population Standard Deviation is known • Interval Estimation of μ when x is not normally distributed but the population Standard Deviation is known • Interval Estimation of μ when x is normally distributed and the population Standard Deviation is unknown
Case 1 • Interval Estimation of μ when x is normally distributed and the population Standard Deviation is known • This is the standard situation: you simply use the Z-based equation to estimate the population mean at the desired confidence level
Case 2 • Interval Estimation of μ when x is not normally distributed but the population Standard Deviation is known • If x is not normally distributed, the central limit theorem tells us only that the sampling distribution of the sample mean is approximately normal • When n is greater than 30, this approximation is generally good
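One way to see how good the approximation is in Case 2 is to simulate coverage for a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), chosen only as a convenient skewed example. The function name, trial count, and seed are also assumptions.

```python
import math
import random
from statistics import NormalDist

def coverage_for_exponential(n, alpha=0.05, trials=4000, seed=1):
    """Fraction of Z-based intervals that contain mu for an exponential population."""
    rng = random.Random(seed)
    mu = sigma = 1.0  # an exponential population with rate 1 has mean 1 and sd 1
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)  # sigma is treated as known (Case 2)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

coverage_n40 = coverage_for_exponential(40)
print("coverage with n = 40:", coverage_n40)
```

With n above 30 the observed coverage should land near the nominal 95%. Trying a much smaller n shows how the approximation changes.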
Case 3 • Interval Estimation of μ when x is normally distributed and the population Standard Deviation is unknown • Without σ we are unable to use the standard form of the equation, and we can’t directly substitute s for σ, because the equation would then contain two random variables (the sample mean and the sample standard deviation) and the resulting statistic would no longer follow the standard normal distribution
Case 3 Continued • Since our confidence intervals are based on the normal distribution and our equation requires a known σ, we have to use another approach to estimate the interval for μ • Note that if s is substituted for σ, the uncertainty of the estimate is greater than if σ were used • But if we had a distribution that was flatter and wider than the normal distribution, to account for that extra uncertainty, then an interval could still be constructed
T-Distribution • The Student’s T-distribution is a symmetric, bell-shaped distribution with a higher variance than the standard normal; how much higher depends on the sample size, and it approaches the normal distribution as the sample size increases • Pages 608-612 of the text, Tables A-4 to A-6 • Degrees of Freedom • History behind this distribution
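For Case 3, the interval is usually written with a t critical value in place of the Z-score and s in place of σ. The notation below ($\bar{x}$ for the sample mean, $t_{\alpha/2,\,n-1}$ for the two-tailed critical value with n − 1 degrees of freedom) is the conventional one and is assumed here rather than taken from the slides.

```latex
\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\;\le\; \mu \;\le\;
\bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```

Because the t critical value is larger than the corresponding Z-score for any finite n, the interval is wider than the known-σ interval at the same confidence level.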
William Sealy Gosset • Student (Gosset) worked for the Guinness Brewing Company.
Example Problem – Interval Estimate, σ Unknown • Sample mean = 16 minutes travel for supermarket patrons • Sample standard deviation = 4 minutes • n = 25 shoppers • n − 1 = 24 degrees of freedom • α = 0.05
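The example can be worked through as follows. This is a sketch, not necessarily the solution presented in class: it assumes a two-sided interval and uses the critical value t = 2.064 for 24 degrees of freedom at α = 0.05, as read from a standard t table.

```python
import math

sample_mean = 16.0   # minutes
sample_sd = 4.0      # minutes (s, since sigma is unknown)
n = 25
# Two-tailed critical value for alpha = 0.05 with n - 1 = 24 degrees of freedom,
# taken from a standard t table (an assumption of this sketch).
t_crit = 2.064

standard_error = sample_sd / math.sqrt(n)   # 4 / 5 = 0.8
margin = t_crit * standard_error
lower, upper = sample_mean - margin, sample_mean + margin
print(f"95% interval for mu: {lower:.2f} to {upper:.2f} minutes")
```

Under these assumptions the interval runs from about 14.35 to 17.65 minutes, slightly wider than the 14.43 to 17.57 that the Z-score of 1.96 would give.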
Homework • Given this approximately normally distributed sample with a large sample size (n = 50), estimate the population mean at the 90% and 99% confidence levels • 8 12 3 2 1 10 6 6 10 7 9 5 9 8 25 2 24 4 25 11 10 3 12 9 19 12 11 11 10 13 5 5 19 5 5 6 20 3 16 21 37 17 6 54 39 18 29 0 12 9 • What else is needed?
Homework Continued • The value for the population standard deviation is 11.
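One possible way to set up the homework calculation is sketched below. It assumes the known-σ (Z-based) interval applies, since σ = 11 is given and n = 50 is large. The variable and function names are illustrative.

```python
import math
from statistics import NormalDist

data = [8, 12, 3, 2, 1, 10, 6, 6, 10, 7, 9, 5, 9, 8, 25, 2, 24, 4, 25, 11,
        10, 3, 12, 9, 19, 12, 11, 11, 10, 13, 5, 5, 19, 5, 5, 6, 20, 3, 16, 21,
        37, 17, 6, 54, 39, 18, 29, 0, 12, 9]
sigma = 11.0  # population standard deviation given on the slide

def z_interval(values, sigma, confidence):
    """Two-sided Z-based interval for mu at the given confidence level."""
    n = len(values)
    xbar = sum(values) / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

sample_mean = sum(data) / len(data)
ci_90 = z_interval(data, sigma, 0.90)
ci_99 = z_interval(data, sigma, 0.99)
print("n =", len(data), "sample mean =", sample_mean)
print("90% interval:", ci_90)
print("99% interval:", ci_99)
```

The 99% interval comes out wider than the 90% interval, which reflects the trade-off between confidence and precision.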