300 likes | 399 Views
Chapters 18 - 19. Sampling Distribution Models and Confidence Intervals. Example. We want to find out the proportion of men in U.S. population. We draw a sample and calculate the proportion of men in the sample. Suppose we find that the proportion = 60%.
E N D
Chapters 18 - 19 Sampling Distribution Models and Confidence Intervals
Example • We want to find out the proportion of men in U.S. population. • We draw a sample and calculate the proportion of men in the sample. Suppose we find that the proportion = 60%. • We conclude that 60% of the population are men. How much should we trust this estimate? • Suppose the actual percentage is p = 52%
Margin of Errors • Example: • 60% of U.S. population are men with margin of errors ± 5% • This yields an interval of estimate from 55% to 65% • The extent of the interval on either side of the sample proportion or mean is called the margin of error (ME). • In general, the intervals of estimate have the form estimate± ME.
Sampling Distribution • Now imagine what would happen if we looked at the sample proportions for these samples. • The histogram we’d get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions. • What would the histogram of all the sample proportions look like? • It turns out that the histogram is unimodal, symmetric, and centered at p. It’s a normal distribution.
Sampling Distribution Model • The mean of the sampling distribution is at p. • The standard deviation of the distribution is • So, the distribution of the sample proportions is modeled with normal model
Assumptions • Most models are useful only when specific assumptions are true. • There are two assumptions in the case of the model for the distribution of sample proportions: • The Independence Assumption: The sampled values must be independent of each other. • The Sample Size Assumption: The sample size, n, must be large enough.
Assumptions and Conditions • Assumptions are hard—often impossible—to check. • Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions. • The corresponding conditions to check before using the Normal to model the distribution of sample proportions are the Randomization Condition, 10% Condition and the Success/Failure Condition.
Conditions • Randomization Condition: The sample should be a simple random sample of the population. • 10% Condition: If sampling has not been made with replacement, then the sample size, n, must be no larger than 10% of the population. • Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
A Sampling Distribution Model • A proportion is no longer just a computation for a set of data. • It is now a random quantity that has a distribution. • This distribution is called the sampling distribution model for proportions. • Even though we depend on sampling distribution models, we never actually get to see them. • We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.
The Central Limit Theoremfor a Proportion Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of is modeled by a Normal model with • Mean: • Standard deviation:
Means – The “Average” of One Die • Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:
Means – Averaging More Dice • Looking at the average of two dice after a simulation of 10,000 tosses: • The average of three dice after a simulation of 10,000 tosses looks like:
Means – Averaging Still More Dice • The average of 5 dice after a simulation of 10,000 tosses looks like: • The average of 20 dice after a simulation of 10,000 tosses looks like:
Means – What the Simulations Show • As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean. • So, we see the shape continuing to tighten around 3.5 • And, it probably does not shock you that the sampling distribution of a mean becomes Normal.
The Central Limit Theorem: The Fundamental Theorem of Statistics • The sampling distribution of any mean becomes more nearly Normal as the sample size grows. • All we need is for the observations to be independent and collected with randomization. • We don’t even care about the shape of the population distribution! • The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).
The Central Limit Theorem (CLT) The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.
Assumptions and Conditions The CLT requires essentially the same assumptions we saw for modeling proportions: • Independence Assumption: The sampled values must be independent of each other. • Sample Size Assumption: The sample size must be sufficiently large. We can’t check these directly, but we can think about whether the Independence Assumption is plausible, and check • Randomization Condition • 10% Condition:n is less than 10% of the population. • Large Enough Sample Condition: The CLT doesn’t tell us how large a sample we need. For now, you need to think about your sample size in the context of what you know about the population.
CLT and Standard Error • The CLT says that the sampling distribution of any mean or proportion is approximately Normal. • For proportions, the sampling distribution is centered at the population proportion. • For means, it’s centered at the population mean. • For proportions • For means
Standard Error • But we don’t know p or σ, we’re stuck, right? • Since we don’t know p orσ, we can’t find the true standard deviation of the sampling distribution model, so we need to estimate the S..D. of a sampling distribution. We call this estimate (of the S.D. of a sampling distribution a standard error (SE). • For a sample proportion, • For the sample mean,
The Real World & the Model World Be careful! Now we have two distributions to deal with. • The first is the real world distribution of the sample, which we might display with a histogram. • The second is the math world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem. • Don’t confuse the two!
By the 68-95-99.7% Rule, we know about 68% of all samples will have ’s within 1 SE of p about 95% of all samples will have ’s within 2 SEs of p about 99.7% of all samples will have ’s within 3 SEs of p A Confidence Interval
Consider the 95% level: There’s a 95% chance that p is no more than 2 SEs away from . So, if we reach out 2 SEs, we are 95% sure that p will be in that interval. In other words, if we reach out 2 SEs in either direction of , we can be 95% confident that this interval contains the true proportion. This is called a 95% confidence interval. A Confidence Interval
What Does “95% Confidence” Mean? • Each confidence interval uses a sample statistic to estimate a population parameter. • But, since samples vary, the statistics we use, and thus the confidence intervals we construct, vary as well. • Our confidence is in the process of constructing the interval, not in any one interval itself. • “95% confidence” means there is 95% chance that our interval will contain the true parameter.
M. E: Certainty vs. Precision • We can claim, with 95% confidence, that the interval contains the true population proportion. • The more confident we want to be, the larger our ME needs to be (makes the interval wider).
M.E: Certainty vs. Precision • To be more confident, we wind up being less precise. • Because of this, every confidence interval is a balance between certainty and precision. • The tension between certainty and precision is always there. Fortunately, in most cases we can be both sufficiently certain and sufficiently precise to make useful statements. • The choice of confidence level is somewhat arbitrary, but keep in mind this tension between certainty and precision when selecting your confidence level. • The most commonly chosen confidence levels are 90%, 95%, and 99% (but any percentage can be used).
Critical Values • The ‘2’ in (our 95% confidence interval) came from the 68-95-99.7% Rule. • Using a table or technology, we find that a more exact value for our 95% confidence interval is 1.96 instead of 2. We call 1.96 the critical value and denote it z*. • For any confidence level, we can find the corresponding critical value. • Example: • For a 90% confidence interval, the critical value is 1.645:
One-Proportion z-Interval • When the conditions are met, we are ready to find the confidence interval for the population proportion, p. • The confidence interval is where • The critical value, z*, depends on the particular confidence level, C, that you specify.
What Can Go Wrong? Margin of Error Too Large to Be Useful: • We can’t be exact, but how precise do we need to be? • One way to make the margin of error smaller is to reduce your level of confidence. (That may not be a useful solution.) • You need to think about your margin of error when you design your study. • To get a narrower interval without giving up confidence, you need to have less variability. • You can do this with a larger sample…
Homework Assignment Chapter 18: • Problem # 11, 15, 23, 29, 31, 47. Chapter 19: • Problem # 7, 9, 11, 13, 17, 27, 35, 37.