Confidence intervals: The basics. BPS chapter 14. © 2006 W.H. Freeman and Company.
Objectives (BPS chapter 14): Confidence intervals: the basics • Estimating with confidence • Confidence intervals for the mean • How confidence intervals behave • Choosing the sample size
Estimating with confidence
Although the sample mean xBar is a single number for any particular sample, a different sample would probably give a different sample mean. In fact, you could get many different values for the sample mean, and virtually none of them would exactly equal the true population mean, µ.
But the sampling distribution of xBar is narrower than the population distribution: its standard deviation is smaller by a factor of √n. Thus, the estimates gained from our samples are likely to be close to the population parameter µ. [Figure: distribution of individual subjects x in the population, and the narrower distribution of sample means from samples of n subjects, both centered at µ.] How can we quantify this? If the population is normally distributed N(µ, σ), then the sampling distribution of xBar is N(µ, σ/√n).
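The √n claim can be checked by simulation. This is a minimal sketch, assuming a hypothetical normal population with µ = 500 and σ = 100 (values chosen only for illustration): it draws many samples of size n, computes each sample mean, and compares the spread of those means with σ/√n.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA = 500.0, 100.0

def sd_of_sample_means(n, reps=2000, seed=1):
    """Draw `reps` samples of size n from N(MU, SIGMA); return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

for n in (1, 25, 100):
    # The simulated spread should be close to the theoretical sigma / sqrt(n).
    print(f"n = {n:3d}: simulated SD of xBar = {sd_of_sample_means(n):6.2f}, "
          f"sigma/sqrt(n) = {SIGMA / n ** 0.5:6.2f}")
```

The simulated values are random and will differ slightly from σ/√n, but they shrink at the same rate.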
Example: CA SATM scores • Suppose we want to estimate the mean SATM score, µ, for all California high school seniors. • We take an SRS of 500 California seniors and give them the SATM. • Our sample mean SATM score turns out to be xBar = 461. • Suppose we somehow know (from past experience?) that σ = 100 is a reasonable estimate of the standard deviation of SATM scores among all CA seniors. • We'd like to know µ, the true (population) mean SATM score among all CA seniors. • How well does the sample mean xBar = 461 estimate the unknown parameter µ? Start by reminding yourself of the basics.
Example: CA SATM scores • Population: all CA high-school seniors • Sample: the n = 500 CA high-school seniors that we tested • Population variable: X = SATM score • Sample mean: xBar = mean of the 500 sample SATM scores (xBar = 461) • Distribution of sample mean: xBar is approximately N(µ, σ/√n) = N(µ, 100/√500) ≈ N(µ, 4.47)
Example: CA SATM scores Question: how likely is it that xBar is within z* = 2 standard deviations of the unknown mean µ? Answer: by the 68-95-99.7 (empirical) rule, the probability is about 95%! Note: how much do 2 standard deviations amount to in this case? 2 × σ/√n = 2 × 4.47 ≈ 9 points. This is called the margin of error, m. So we are 95% confident that the true (but unknown) mean µ differs from xBar = 461 by no more than 9 points. Lingo: the 95% confidence interval for µ is 461 ± 9 = (452, 470)!
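The arithmetic behind this interval can be sketched in a few lines, using the example's values (xBar = 461, σ = 100, n = 500) and z* = 2 from the empirical rule. Unrounded, the margin is about 8.94, which the slide rounds to 9 to get (452, 470).

```python
import math

xbar = 461      # sample mean SATM score from the example
sigma = 100     # population standard deviation, assumed known
n = 500         # sample size
z_star = 2      # "2 standard deviations" from the 68-95-99.7 rule

sd_xbar = sigma / math.sqrt(n)   # standard deviation of xBar
margin = z_star * sd_xbar        # margin of error m
low, high = xbar - margin, xbar + margin

print(f"SD of xBar = {sd_xbar:.2f}")            # 4.47
print(f"margin of error m = {margin:.2f}")      # 8.94
print(f"95% CI: ({low:.1f}, {high:.1f})")       # (452.1, 469.9)
```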
Example: CA SATM scores We have just computed a 95% confidence interval for µ. The endpoints of the interval (452, 470) depended on three values: • The point estimate xBar for µ. • The level of confidence C, which determined how many standard deviations of xBar to use. • The standard deviation of the point estimate, σ/√n, which also depends on the sample size n. The interval also depended on the fact that the sampling distribution of xBar is normal (or at least approximately normal, since n is large)!
Example: CA SATM scores Suppose we took another sample of 500 CA high-school seniors, gave them the SATM test, and computed the sample average. In all likelihood we'd get a different sample mean; say we found xBar = 468. If we redid the confidence interval calculation, we'd still get the same margin of error m = 9, and the new 95% confidence interval would be (459, 477). What does it mean? Every sample would result in a different confidence interval, but in the long run about 95% of these confidence intervals would contain the true (but unknown) mean µ. We don't know whether our particular interval contains µ, but we're 95% confident that it does!
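The "95% of these intervals" statement is about repeated sampling, so it can be illustrated by simulation. A minimal sketch, assuming a hypothetical true mean µ = 465 (in practice µ is unknown; we fix it here only so we can check which intervals capture it) and a normal population with σ = 100: it builds the interval xBar ± 2σ/√n from many samples of 500 and reports the fraction that contain µ.

```python
import math
import random
import statistics

# Hypothetical "true" mean, fixed only so we can check coverage.
MU_TRUE, SIGMA, N = 465.0, 100.0, 500
Z_STAR = 2.0                               # same z* as the SATM example
MARGIN = Z_STAR * SIGMA / math.sqrt(N)     # about 8.94

def coverage(reps=1000, seed=7):
    """Fraction of `reps` intervals xBar ± MARGIN that contain MU_TRUE."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(MU_TRUE, SIGMA) for _ in range(N))
        if xbar - MARGIN <= MU_TRUE <= xbar + MARGIN:
            hits += 1
    return hits / reps

print(f"margin of error = {MARGIN:.2f}")
print(f"fraction of intervals containing mu: {coverage():.3f}")
```

With z* = 2 the long-run success rate is slightly above 95% (about 95.4%); any single run of the simulation will vary around that value.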
We are 95% confident that the true population mean is contained in the confidence interval. Why? Because the sampling distribution of xBar is normal (exactly if the population is normal, and approximately for large n by the Central Limit Theorem), so there is about a 95% chance that the sample mean will fall within 2 standard deviations (of the sampling distribution!) of the true population mean. Let's try playing with the "Confidence Interval" applet on the stats portal (page 347).
Implications We don't need to take lots of random samples to "rebuild" the sampling distribution and find µ at its center. All we need is one SRS of size n; we then rely on the properties of the sampling distribution of the sample mean to infer the population mean µ.
Confidence interval (page 346) A level C confidence interval for a parameter (here, the population mean µ) has 2 parts: • An interval calculated from the data, usually of the form xBar ± margin of error • A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples, i.e., the success rate for the method.
Changing the Confidence Level Suppose we wanted an 80% confidence interval (C = 0.80). Then the distance between xBar and µ must be within the margin of error m 80% of the time (i.e., for 80% of all samples). Go to the standard normal: the central 80% of the area lies between −z* and z*, leaving 10% in each tail. So z* = invNorm(0.90) = 1.282! [Figure: standard normal density with 80% of the area between −z* and z* and 10% in each tail; on the scale of xBar this corresponds to µ − m to µ + m.]
Changing the Confidence Level Suppose we wanted a level C confidence interval. Then the distance between xBar and µ must be within the margin of error m a fraction C of the time (i.e., for a fraction C of all samples). Go to the standard normal: the central area C lies between −z* and z*, leaving (1 − C)/2 in each tail. So in general z* = invNorm((1 + C)/2)! The book calls z* a critical value. [Figure: standard normal density with central area C between −z* and z* and (1 − C)/2 in each tail.]
From z* to the margin of error m z* is the number of standard deviations of xBar needed to obtain the confidence level C. So the margin of error is z* times the standard deviation of xBar: m = z* × σ/√n, where σ is the standard deviation of the population variable and n is the sample size.
Let's find z* for various C-levels (see page 349). z* is the (1 + C)/2 percentile of the standard normal density curve: the area to its left is (1 − C)/2 + C = (1 + C)/2. TI-8x command: z* = invNorm((C + 1)/2)
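Outside a TI calculator, the same percentile can be computed in Python. This sketch uses the standard library's `statistics.NormalDist().inv_cdf`, which plays the role of the calculator's invNorm here; the helper name `z_star` is ours.

```python
from statistics import NormalDist

def z_star(C):
    """Critical value z* with central area C under N(0, 1): invNorm((1 + C) / 2)."""
    return NormalDist().inv_cdf((1 + C) / 2)

for C in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"C = {C:.2f}: z* = {z_star(C):.3f}")
```

Rounded to three decimals, this gives 1.282, 1.645, 1.960, 2.326, and 2.576, matching the usual table values.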
Example: CA SATM scores (continued) What if we needed a 98% confidence interval for µ, the true mean SATM score for all CA high-school seniors? Confidence level: C = 0.98, so z* = invNorm(1.98/2) = invNorm(0.99) = 2.326 (rounded). Margin of error: m = z* × σ/√n = 2.326 × 100/√500 ≈ 10.4. 98% confidence interval: 461 ± 10.4 = (450.6, 471.4).
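The three steps (find z*, compute m, form xBar ± m) can be wrapped in one small function. A sketch, assuming σ is known as in the example; `z_interval` is a name chosen for illustration, not a library function.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, C):
    """Level-C confidence interval for mu with known sigma: xbar ± z* sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf((1 + C) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin, margin

low, high, m = z_interval(xbar=461, sigma=100, n=500, C=0.98)
print(f"m = {m:.1f}; 98% CI = ({low:.1f}, {high:.1f})")   # m = 10.4; 98% CI = (450.6, 471.4)
```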
Link between confidence level and margin of error The confidence level C determines the value of z* (in Table C), and the margin of error depends on z*. • Higher confidence C implies a larger margin of error m. • Lower confidence C produces a smaller margin of error m. [Figure: standard normal curve with central area C between −z* and z*.]
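To see the trade-off numerically, this sketch computes m = z* × σ/√n for several confidence levels, holding the SATM example's σ = 100 and n = 500 fixed; the helper `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist

SIGMA, N = 100, 500   # SATM example values

def margin_of_error(C, sigma=SIGMA, n=N):
    """m = z* * sigma / sqrt(n) for confidence level C."""
    return NormalDist().inv_cdf((1 + C) / 2) * sigma / math.sqrt(n)

levels = (0.80, 0.90, 0.95, 0.98, 0.99)
for C in levels:
    print(f"C = {C:.2f}: m = {margin_of_error(C):.2f}")
```

Each step up in C increases z*, so the printed margins grow steadily.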
Impact of sample size The spread in the sampling distribution of the mean is a function of the number of individuals per sample. • The larger the sample size, the smaller the standard deviation (spread) of the sample mean distribution. • But the spread decreases only in proportion to 1/√n: to halve it, you need four times as many individuals. [Figure: standard error σ/√n plotted against sample size n.]
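A quick check of the 1/√n rate, using σ = 100 from the SATM example: each fourfold increase in n halves σ/√n.

```python
import math

SIGMA = 100  # population SD from the SATM example

def standard_error(n, sigma=SIGMA):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (100, 400, 1600):
    print(f"n = {n:4d}: sigma/sqrt(n) = {standard_error(n):.1f}")   # 10.0, 5.0, 2.5
```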
Goals for estimating population parameters: high confidence and a low margin of error. How to reduce the margin of error m = z* × σ/√n: • Change the C-level? A lower C-level results in a smaller value of z*, which reduces m (at the cost of lower confidence). • Change the sample size? A larger n will reduce m, since you divide σ by a larger √n. • Change the population standard deviation? A smaller σ would reduce m, but this is usually not possible to do.
Suppose we want a certain margin of error m for our confidence interval for a population mean (page 355). What sample size will be needed to get the desired result? (Assume we know the population standard deviation σ.) Let's solve m = z* × σ/√n for n: √n = z*σ/m, so n = (z*σ/m)², rounded up to the next whole number.
Example: CA SATM scores (continued) How large a sample would we need to reduce the margin of error for the 98% confidence interval to plus or minus 4 SATM points? What we know: m = 4, z* = 2.326 (for C = 0.98), σ = 100. Find n: n = (z*σ/m)² = (2.32635 × 100/4)² ≈ 3382.4 (using the unrounded z*). Rounding up, we need a sample of n = 3383 seniors.
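The sample-size formula translates directly into code. A sketch, assuming a known σ and using the standard library's `NormalDist().inv_cdf` for z*; `sample_size` is an illustrative name. Rounding up with `math.ceil` guarantees the achieved margin is no larger than the target.

```python
import math
from statistics import NormalDist

def sample_size(margin, sigma, C):
    """Smallest n with z* * sigma / sqrt(n) <= margin, i.e. n = ceil((z* sigma / margin)^2)."""
    z_star = NormalDist().inv_cdf((1 + C) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)

print(sample_size(margin=4, sigma=100, C=0.98))   # 3383
```

Plugging in the rounded z* = 2.326 instead gives (58.15)² ≈ 3381.4, so n = 3382; the one-unit difference comes only from rounding z*.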
Sample size and experimental design You may need a certain margin of error (e.g., drug trial, manufacturing specs). In many cases, the population variability (σ) is fixed, but we can choose the number of measurements (n). So plan ahead what sample size to use to achieve that margin of error. Remember, though, that sample size cannot always be increased at will. There are typically costs and constraints associated with large samples. The best approach is to use the smallest sample size that can give you useful results.