Chapter 7. Inference for Distributions. Confidence Interval Review.
Chapter 7 Inference for Distributions
Confidence Interval Review • By measuring the heights of 62 six-year-old girls selected at random, someone has determined that a 95% confidence interval for the population mean height is (42.2 inches, 46.1 inches). Answer the following questions with “Yes”, “No”, or “Can’t tell”: • Does the population mean lie in the interval (42.2, 46.1)? • Does the sample mean lie in the interval (42.2, 46.1)? • For a future sample of 62 six-year-old girls, will the sample mean lie in the interval (42.2, 46.1)? • Do 95% of the sample data lie in the interval (42.2, 46.1)? • For a greater confidence, say 99%, will the confidence interval calculation from the same data produce an interval narrower than (42.2, 46.1)?
Introduction • We began our study of data analysis by learning graphical and numerical tools for describing the distribution of a single variable and for comparing several distributions • Our study of the practice of statistical inference begins in the same way: with inference about a single distribution and comparison of two distributions
Preface • Two important aspects of any distribution are its center and spread (week 1 again!) • If the distribution is Normal, we describe its center by the mean and its spread by the standard deviation • The previous chapter emphasized the reasoning of tests and confidence intervals • Now we emphasize statistical practice • We no longer assume that population standard deviations are known (σ is no longer known)
Section 7.1 Inference for the Mean of a Population
Introduction • So far in all our inference for the population mean, µ, we have assumed we know σ. • Note that both CI’s and significance tests depend on σ • But usually we don’t know σ. Now what!? • A sensible idea would be to use s, the sample standard deviation, as an estimate of σ, the population standard deviation.
Introduction • We know that s changes from sample to sample. • So we are adding some variability into our equations. • Question: How does this affect the distribution of x-bar?
Introduction • We know that if x is normal then $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ • Where $\sigma/\sqrt{n}$ is the standard deviation of the sampling distribution of x-bar. • When we don’t know σ we will use $s/\sqrt{n}$ instead. This is called the standard error of x-bar • We usually denote it as $SE_{\bar{x}}$
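Introduction • Now we also know that if x is normal then $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ • But if we use the standard error, $s/\sqrt{n}$, does $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim N(0,1)$?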
Introduction • Unfortunately no. But the statistic computed with s does follow a distribution called the t-distribution: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t(n-1)$ • Where n-1 is the degrees of freedom • Notice then that for each sample size there is a different t-distribution.
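The effect of replacing σ by s can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the function names, the seed, the number of repetitions, and the cutoff 1.96 are choices made for this example and do not come from the slides. It draws Normal samples, standardizes each sample mean with the standard error, and records how often the result falls beyond ±1.96. For an N(0,1) statistic that fraction would be about 0.05.

```python
import math
import random

def t_statistic(sample, mu):
    """Standardize the sample mean using the standard error s / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu) / (s / math.sqrt(n))

def tail_fraction(n, reps=10000, cutoff=1.96, seed=1):
    """Fraction of simulated statistics with |t| > cutoff for N(0, 1) samples of size n."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / reps

small_n = tail_fraction(5)   # 4 degrees of freedom: noticeably heavier tails
large_n = tail_fraction(50)  # 49 degrees of freedom: much closer to 0.05
print(small_n, large_n)
```

With a small sample the fraction should come out well above 0.05, which is the extra variability introduced by estimating σ with s.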
History of the t distribution • The t distributions were discovered in 1908 by William S. Gosset, a statistician working for the Guinness brewing company • He published under the pen name “Student” because Guinness didn’t want competitors to know that they were gaining an industrial advantage from employing statisticians. Brilliant!
Properties of the t-distributions • Symmetric • Mean = 0 • Bell shaped • The smaller the df, the larger the tail area. • The smaller the df, the larger the spread. • This larger spread or variability is due to using the sample standard deviation as an estimate for σ.
Example • Notice that the t-distribution approaches the standard normal as the df increase. • Source: http://en.wikipedia.org/wiki/Student's_t-distribution
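The convergence toward the standard normal can also be seen numerically. The following sketch evaluates the t density at the center (x = 0) and in the tail (x = 3) for a few degrees of freedom and compares it with the N(0,1) density. The helper names `t_pdf` and `normal_pdf` and the chosen df values are assumptions made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for df in (1, 2, 5, 30, 100):
    print(df, round(t_pdf(0.0, df), 4), round(t_pdf(3.0, df), 4))
print("N(0,1)", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```

As df grows, the height at 0 rises toward the normal value and the height at 3 falls toward it, which matches the "smaller df, larger tail area" property.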
T-table • Since we have a new distribution, we get to learn to use a new table to find areas under the curve and quantiles. • This is table D (back cover) • Notice that the table is not nearly as comprehensive as the standard normal table • So, finding P-values from a t-table is a little different from finding the values from a z-table.
The One-Sample t Confidence Interval • How does using s affect confidence intervals for the mean µ? • You will see that the one-sample t confidence interval is similar in both reasoning and computational detail to the z confidence interval of Chapter 6
The One-Sample t Confidence Interval • Suppose that a SRS of size n is drawn from a population with an unknown mean µ and unknown standard deviation σ. A level C confidence interval for µ is: $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ • Where t* is the value for the t(n-1) density curve with area C between -t* and t*. • The margin of error is $t^* \frac{s}{\sqrt{n}}$ • This interval is exact when the population distribution is Normal and approximately correct for large n in other cases.
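One-sample t confidence interval • The area between the critical values –t* and t* under the t(n-1) curve is C. • Each tail, below –t* and above +t*, has area P = (1-C)/2. • [Figure: t(n-1) density curve with –t*, 0, and +t* marked on the horizontal axis]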
Case 7.1 • The following data are the amounts of vitamin C, measured in milligrams per 100 grams of blend, for a random sample of size 8 from a production run: 26 31 23 22 11 22 14 31 • We want to find a 95% confidence interval for µ, the mean vitamin C content of the CSB produced during this run
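Answer • n = 8, x-bar = 22.5, s = 7.19 • From Table D we find t*(7) = 2.365 • $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 22.5 \pm 2.365 \times \frac{7.19}{\sqrt{8}} = 22.5 \pm 6.0 = (16.5, 28.5)$ • We are 95% confident that the mean vitamin C content of the CSB for this run is between 16.5 and 28.5 mg/100g.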
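The arithmetic for Case 7.1 can be reproduced in a few lines. This is a minimal sketch: the function name `t_confidence_interval` and the variable names are hypothetical, and the critical value 2.365 is taken from Table D as quoted on the slide rather than computed.

```python
import math
import statistics

def t_confidence_interval(data, t_star):
    """Return (low, high, margin) for the one-sample t interval x-bar +/- t* s/sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Vitamin C data from Case 7.1 (mg per 100 g of blend).
vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]
T_STAR_95_DF7 = 2.365  # t* for 95% confidence and 7 df, from Table D

low, high, margin = t_confidence_interval(vitamin_c, T_STAR_95_DF7)
print(f"{low:.1f} to {high:.1f} (margin of error {margin:.1f})")
# prints: 16.5 to 28.5 (margin of error 6.0)
```

Passing t* in as an argument keeps the sketch free of any table lookup, so the same function works for other confidence levels once the matching Table D value is supplied.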
Answer • In this example we have given the actual interval ( 16.5 , 28.5 ) as our answer • Sometimes, we prefer to report the mean and margin of error: The mean vitamin C content is 22.5 mg/100g with a margin of error of 6.0 mg/100g.
One-sample t test • In tests of significance, as in confidence intervals, we allow for unknown σ by using s and replacing z by t • Remember: let n be the sample size • If σ is known and n is large then $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ • If σ is NOT known then $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
The one-sample t test • Suppose that a SRS of size n is drawn from a population having unknown mean µ • To test the hypothesis H0: µ = µ0 based on a SRS of size n, compute the one-sample t statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
The one-sample t test • In terms of a random variable T having the t(n-1) distribution, the P-value for a test of H0 against • Ha: µ > µ0 is P(T ≥ t) • Ha: µ < µ0 is P(T ≤ t) • Ha: µ ≠ µ0 is 2P(T ≥ |t|) • These P-values are exact if the population distribution is Normal and are approximately correct for large n in other cases
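The three P-value rules can be sketched in code. The example below is an illustration only, not the table-based method of the slides: it obtains t-curve areas by numerically integrating the density with Simpson's rule, and the names `t_pdf`, `t_cdf`, `p_value` and the labels "greater", "less", "two-sided" are assumptions made here.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), from Simpson's rule on [0, |t|] plus symmetry about 0 (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t, df, alternative):
    """P-value of the one-sample t test for the given alternative hypothesis."""
    if alternative == "greater":    # Ha: mu > mu0
        return 1 - t_cdf(t, df)
    if alternative == "less":       # Ha: mu < mu0
        return t_cdf(t, df)
    if alternative == "two-sided":  # Ha: mu != mu0
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# 2.365 is the Table D critical value for 95% confidence with 7 df,
# so the two-sided P-value here should be close to 0.05.
print(p_value(2.365, 7, "two-sided"))
```

Because Table D lists only a few critical values, a table gives a range for the P-value, while numerical integration of this kind gives a single number.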
Case 7.1 continued… • Recall that n = 8, x-bar = 22.50, and s = 7.19 • Suppose we want to test that the mean vitamin C content in the final product is 40 • Hypotheses: H0: µ = 40 Ha: µ ≠ 40
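Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{22.50 - 40}{7.19/\sqrt{8}} \approx -6.88$ • P-value: 2P(T ≥ 6.88) < 0.001, where T has the t(7) distribution • Conclusion: We reject H0 and conclude that the vitamin C content for this run is below specifications.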
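The test statistic for Case 7.1 follows directly from the summary values on the previous slide. The sketch below recomputes it and, as a simple fixed-level check, compares |t| with the 95% critical value t*(7) = 2.365 quoted earlier from Table D. The variable names are hypothetical, and the fixed 5% level is an assumption made for this illustration.

```python
import math

# Summary statistics from Case 7.1.
n, x_bar, s = 8, 22.50, 7.19
mu_0 = 40  # hypothesized mean under H0

standard_error = s / math.sqrt(n)
t = (x_bar - mu_0) / standard_error

# Two-sided test at the 5% level: reject H0 when |t| exceeds t*(7).
T_STAR_95_DF7 = 2.365  # from Table D
reject_at_5_percent = abs(t) > T_STAR_95_DF7

print(round(t, 2), reject_at_5_percent)
# prints: -6.88 True
```

This agrees with the confidence interval found earlier: 40 lies outside (16.5, 28.5), so the two-sided test rejects H0 at the 5% level.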
Section 7.1 Summary • Significance tests and confidence intervals for the mean µ of a Normal population are based on the sample mean x-bar of a SRS. Because of the Central Limit Theorem, the resulting procedures are approximately correct for other population distributions when the sample is large.
Section 7.1 Summary • The standardized sample mean, or one-sample z statistic, has the N(0,1) distribution. If the standard deviation of x-bar is replaced by the standard error, the one-sample t statistic has the t distribution with n-1 degrees of freedom.
Section 7.1 Summary • There is a t distribution for every positive degrees of freedom k. All are symmetric distributions similar in shape to the Normal distributions. The t(k) distribution approaches the N(0,1) distribution as k increases.
Section 7.1 Summary • A level C confidence interval for the mean µ of a Normal population is $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ where t* is the value for the t(n-1) density curve with area C between –t* and t*. The quantity $t^* \frac{s}{\sqrt{n}}$ is the margin of error.
Section 7.1 Summary • Significance tests for H0: µ = µ0 are based on the t statistic. P-values or fixed significance levels are computed from the t(n-1) distribution.