Statistical Estimation and Confidence Intervals in Sampling Distribution

Lecture 14, Sections 7.1 – 7.2
Objectives:
• Estimation and Statistical Intervals
• Point Estimation
• Large-sample confidence interval

Statistical Inference
• When we cannot get information from the entire population, we take a sample.
• However, as we have seen before, statistics calculated from samples vary from sample to sample (recall the sampling distribution).
• When we obtain a statistic from a sample, we do not expect it to equal the corresponding parameter exactly.
• A single number (point estimate) or an interval of numbers (interval estimate) can be used to estimate a population parameter.
• It is desirable to have a range of plausible values that takes into account the sampling distribution of the statistic. Such a range captures the value of the parameter of interest with a stated level of confidence. This is known as a confidence interval (CI).
• We can find confidence intervals for any parameter of interest; in this chapter we will be primarily concerned with CIs for a population mean μ, a population proportion π, and a difference of population means μ1 − μ2.

Point Estimation
Definition. A point estimate of an unknown parameter is a single number, calculated from sample data, that can be regarded as an educated guess for the value of the parameter.
Example. x̄ = 86°F: a single number for the average summer temperature in Auburn. μ = the actual average summer temperature in Auburn, which is unknown.
A point estimate is usually obtained by selecting a suitable statistic and calculating its value for the given sample data. The statistic used to calculate an estimate is called an estimator.

Some Point Estimators
• The point estimator for a population mean μ is the sample mean x̄.
• The point estimator for a population variance σ² is the sample variance s².
• The point estimator for a population proportion π is the sample proportion p.
• The point estimator for a population correlation coefficient ρ is the sample correlation coefficient r.

Properties of a good estimator
1) Unbiasedness
Definition. Denote a population parameter generically by the letter θ and denote any estimator of this parameter by θ̂. Then θ̂ is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated, i.e., E(θ̂) = θ. Otherwise, it is said to be biased, and the quantity E(θ̂) − θ is called the bias of θ̂.
Example
• x̄ is an unbiased estimator of μ.
• s² is an unbiased estimator of σ².
• p is an unbiased estimator of π.

Properties of a good estimator
2) Consistency
Definition. If the probability that an estimator θ̂ falls close to a population parameter θ can be made as near to 1 as desired by increasing the sample size n, then θ̂ is said to be a consistent estimator of θ.
Example
• x̄ is a consistent estimator of μ.
• s² is a consistent estimator of σ².
• p is a consistent estimator of π.

Large-sample confidence intervals for a population mean
A point estimate, since it is a single number, provides no information about the precision and reliability of the estimator.
Example. Use the sample mean x̄ to calculate a point estimate of the true average height μ of students at AU, and suppose that x̄ = 170 cm. Because of sampling variability, μ will almost never equal x̄ exactly. The point estimate (170 cm) says nothing about how close it might be to μ. Instead, we calculate and report an entire interval of plausible values: an interval estimate, or CI.

Confidence Interval
• A CI is usually constructed in such a way that we have a certain confidence that the interval contains the unknown parameter.
• A CI is always calculated by first selecting a confidence level, which is a measure of the degree of reliability of the interval.
• The higher the confidence level, the more strongly we believe that the value of the parameter being estimated lies within the interval.
• The precision of an interval estimate is conveyed by the width of the interval. If the confidence level is high and the resulting interval is quite narrow, our knowledge of the value of the parameter is reasonably precise.

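Confidence Interval
By the CLT, for large n, x̄ ~ N(μ, σ²/n) approximately, and so
P(−z_{α/2} < (x̄ − μ)/(σ/√n) < z_{α/2}) ≈ 1 − α,
where 1 − α is the degree of confidence or confidence level and z_{α/2} is the upper α/2 critical value of the standard normal distribution, i.e. the value with area α/2 to its right.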

Confidence Interval for a population mean μ
A 100(1 − α)% CI for μ is given by
x̄ ± z_{α/2} · σ/√n
The most common levels of confidence are 90%, 95% and 99%; the corresponding z critical values are 1.645, 1.96 and 2.576 (often rounded to 2.58).
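
The critical values for the common confidence levels can be reproduced numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the function name `z_critical` is illustrative and not part of the lecture.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    # z_{alpha/2} leaves an area of alpha/2 in the upper tail of N(0, 1),
    # so it is the (1 - alpha/2) quantile of the standard normal.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```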

Things to note
1. We can use this formula when (a) n is sufficiently large (say, n > 30) and (b) σ is known.
2. In practice, it is unrealistic to know σ. The sample standard deviation s can replace σ in the formula if n is sufficiently large.
3. We should not use the above formula when n is small and σ is unknown. In that case we need the t-distribution (Section 7.4) or nonparametric statistics (not covered in this course).

Examples
Suppose a large hospital wants to estimate the average length of time patients remain in the hospital. Since finding the average of all patient stays is difficult and time-consuming, we select an SRS of 100 previous patients' records and find the average of these stays to be 4.53 days with a standard deviation of 3.68 days. Construct a 95% confidence interval for μ. Also, construct a 90% CI and a 99% CI.
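
One way to work this example is sketched below. It assumes the large-sample interval x̄ ± z_{α/2} · s/√n with s in place of σ (n = 100 is large); the helper name `large_sample_ci` is hypothetical, and the critical values come from Python's standard-library `NormalDist` rather than a rounded table.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(xbar, s, n, confidence=0.95):
    """Large-sample z interval for a mean: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hospital stays: n = 100, sample mean 4.53 days, sample sd 3.68 days.
for level in (0.90, 0.95, 0.99):
    lower, upper = large_sample_ci(4.53, 3.68, 100, level)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f}) days")
```

The 95% interval comes out to roughly (3.81, 5.25) days; the 90% interval is narrower and the 99% interval wider, as expected.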

Examples
The alternating current (AC) breakdown voltage of an insulating liquid indicates its dielectric strength. The following data give the breakdown voltage (kV) of a particular circuit under certain conditions:
62 50 53 57 41 53 55 61 59 64 50 53 64 62 50 68
54 55 57 50 55 50 56 55 46 55 53 54 52 47 47 55
57 48 63 57 57 55 53 59 53 52 50 55 60 50 56 58
a. Find a point estimate for the true average breakdown voltage.
b. Construct a 95% confidence interval for the true average breakdown voltage.
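
Parts a and b can be checked with a short script. This is a sketch under the large-sample assumptions of this lecture (n = 48 > 30, s used in place of σ); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Breakdown voltages (kV) from the example, entered row by row.
voltages = [
    62, 50, 53, 57, 41, 53, 55, 61, 59, 64, 50, 53, 64, 62, 50, 68,
    54, 55, 57, 50, 55, 50, 56, 55, 46, 55, 53, 54, 52, 47, 47, 55,
    57, 48, 63, 57, 57, 55, 53, 59, 53, 52, 50, 55, 60, 50, 56, 58,
]

n = len(voltages)
xbar = mean(voltages)        # a. point estimate of the true mean
s = stdev(voltages)          # sample standard deviation (n - 1 divisor)

z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for 95% confidence
half_width = z * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)   # b. 95% CI

print(f"n = {n}, xbar = {xbar:.2f} kV, s = {s:.2f} kV")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}) kV")
```

With these data the point estimate is about 54.71 kV, s is about 5.23 kV, and the 95% interval is roughly (53.23, 56.19) kV.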

True meaning of C.I.
With 95% confidence, we can say that μ should be within roughly 2 standard errors (2·s/√n) of our sample mean x̄.
• In 95% of all possible samples of this size n, μ will indeed fall in our confidence interval.
• In only 5% of samples would x̄ be farther from μ.
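
The repeated-sampling interpretation can be illustrated by simulation. The sketch below draws many samples from a population whose mean is known and counts how often the large-sample interval contains it. The normal population with μ = 50 and σ = 10, the sample size, the number of repetitions and the seed are all hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def coverage(mu=50.0, sigma=10.0, n=50, reps=2000, confidence=0.95, seed=1):
    """Fraction of simulated large-sample CIs that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        half_width = z * stdev(sample) / sqrt(n)
        # The interval "captures" mu when mu lies between its endpoints.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(f"Fraction of intervals containing mu: {coverage():.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains μ or does not; the 95% describes the long-run behaviour of the procedure.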

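Choosing the sample size
The half-width of a CI is called the bound on the error of estimation, i.e.
B = z_{α/2} · σ/√n or B = z_{α/2} · s/√n
To get a desired bound on the error (B) by adjusting the sample size n:
- Determine the desired bound on the error (B).
- Use the following formula, rounding n up to the next whole number:
If σ is known: n = (z_{α/2} · σ / B)²
If σ is unknown: n = (z_{α/2} · s / B)², with s taken from an earlier or pilot sample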

Examples
Suppose we wish to construct a 95% confidence interval that is to be within 0.5 days of the true value of the average stay. It is known that σ = 3.68 from a previous study. What is the sample size required to achieve the desired accuracy?
Revisit the breakdown voltage data example. Find the appropriate sample size for estimating the average breakdown voltage to within 1 kV with confidence level 95%.
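
Both sample-size questions follow from n = (z_{α/2} · σ / B)², rounded up. The sketch below uses a hypothetical helper `required_sample_size`; for the breakdown voltage case it assumes s ≈ 5.23 kV, the sample standard deviation computed from the 48 observations listed earlier, as a stand-in for the unknown σ.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sd, bound, confidence=0.95):
    """Smallest n with z_{alpha/2} * sd / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / bound) ** 2)

# Hospital stays: sigma = 3.68 days, desired bound B = 0.5 days.
print(required_sample_size(3.68, 0.5))   # 209

# Breakdown voltage: s = 5.23 kV (assumed from the sample), B = 1 kV.
print(required_sample_size(5.23, 1.0))   # 106
```

Rounding up rather than to the nearest integer ensures the bound on the error is not exceeded.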

One-sided C.I.s
• Sometimes only one bound is of interest:
• An investigator may wish to calculate only a 95% upper confidence bound for the true average reaction time to a particular stimulus.
• A reliability engineer may wish to find only a lower confidence bound for the true average lifetime of components of a certain type.
• A large-sample upper confidence bound for μ is μ < x̄ + z_α · s/√n.
• A large-sample lower confidence bound for μ is μ > x̄ − z_α · s/√n.
• The three most commonly used confidence levels, 90%, 95% and 99%, use z critical values of 1.28, 1.645 and 2.33 respectively.

Example
A sample of 48 shear strength observations gave a sample mean strength of 17.17 N/mm² and a sample standard deviation of 3.28 N/mm². Find a 95% lower confidence bound for the true average shear strength μ.
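
A sketch of the calculation, assuming the large-sample one-sided bound x̄ − z_α · s/√n. Note that the one-sided critical value z_α puts all of α in a single tail, so for 95% confidence it is about 1.645 rather than 1.96. The function name `lower_confidence_bound` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def lower_confidence_bound(xbar, s, n, confidence=0.95):
    """Large-sample lower confidence bound: xbar - z_alpha * s / sqrt(n)."""
    z = NormalDist().inv_cdf(confidence)   # one-sided critical value z_alpha
    return xbar - z * s / sqrt(n)

# Shear strength: n = 48, sample mean 17.17 N/mm^2, sample sd 3.28 N/mm^2.
bound = lower_confidence_bound(17.17, 3.28, 48)
print(f"95% lower confidence bound: {bound:.2f} N/mm^2")
```

This gives a bound of about 16.39 N/mm², i.e. with 95% confidence μ > 16.39 N/mm².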
