
Chapters 18 - 19


italia

Presentation Transcript


  1. Chapters 18 - 19 Sampling Distribution Models and Confidence Intervals

  2. Example • We want to find out the proportion of men in the U.S. population. • We draw a sample and calculate the proportion of men in the sample. Suppose we find that the sample proportion is 60%. • We conclude that 60% of the population are men. How much should we trust this estimate? • Suppose the actual percentage is p = 52%.

  3. Margin of Error • Example: • 60% of the U.S. population are men, with a margin of error of ±5%. • This yields an interval estimate from 55% to 65%. • The extent of the interval on either side of the sample proportion or mean is called the margin of error (ME). • In general, interval estimates have the form estimate ± ME.

  4. Sampling Distribution • Now imagine what would happen if we drew many samples and looked at the sample proportions for these samples. • The histogram we’d get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions. • What would the histogram of all the sample proportions look like? • It turns out that the histogram is unimodal, symmetric, and centered at p. It can be modeled by a Normal distribution.

  5. Sampling Distribution Model • The mean of the sampling distribution is at p. • The standard deviation of the distribution is $\sqrt{pq/n}$, where $q = 1 - p$ and $n$ is the sample size. • So, the distribution of the sample proportions is modeled with the Normal model $N\left(p, \sqrt{pq/n}\right)$.
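The model on this slide can be checked by simulation. The sketch below is illustrative only: it assumes the true proportion p = 52% from the earlier example, a hypothetical sample size of n = 100, and the standard model value sqrt(pq/n) with q = 1 - p for the spread. The function name and the seed are inventions for the example.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw `num_samples` samples of size `n` and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.52, 100  # n = 100 is an assumed sample size
props = simulate_sample_proportions(p, n, num_samples=5000)

sim_mean = statistics.fmean(props)       # should land near p
sim_sd = statistics.pstdev(props)        # should land near sqrt(pq/n)
model_sd = math.sqrt(p * (1 - p) / n)

print(f"simulated mean = {sim_mean:.4f}, model mean = {p:.4f}")
print(f"simulated SD   = {sim_sd:.4f}, model SD   = {model_sd:.4f}")
```

A histogram of `props` would show the unimodal, symmetric shape the slides describe; only the center and spread are compared here.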

  6. Assumptions • Most models are useful only when specific assumptions are true. • There are two assumptions in the case of the model for the distribution of sample proportions: • The Independence Assumption: The sampled values must be independent of each other. • The Sample Size Assumption: The sample size, n, must be large enough.

  7. Assumptions and Conditions • Assumptions are hard—often impossible—to check. • Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions. • The corresponding conditions to check before using the Normal to model the distribution of sample proportions are the Randomization Condition, 10% Condition and the Success/Failure Condition.

  8. Conditions • Randomization Condition: The sample should be a simple random sample of the population. • 10% Condition: If the sample is drawn without replacement, then the sample size, n, must be no larger than 10% of the population. • Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
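Two of these conditions are simple arithmetic and can be sketched as a small helper. The function name, its arguments, and the example numbers are hypothetical; the Randomization Condition depends on how the data were collected and cannot be checked by a calculation.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the numeric conditions for the Normal model of a sample proportion.

    Returns a dict of condition name -> bool. The Randomization Condition
    is deliberately absent: it is a question about the sampling design.
    """
    q = 1 - p
    results = {
        # Success/Failure Condition: expect at least 10 successes and 10 failures.
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # 10% Condition: the sample is no larger than 10% of the population.
        results["ten_percent"] = n <= 0.10 * population_size
    return results


# Assumed numbers: a sample of 100 from a population of one million, p = 0.52.
print(check_proportion_conditions(100, 0.52, population_size=1_000_000))
# A sample of 15 is too small: np = 7.8 is below 10.
print(check_proportion_conditions(15, 0.52, population_size=1_000_000))
```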

  9. A Sampling Distribution Model • A proportion is no longer just a computation for a set of data. • It is now a random quantity that has a distribution. • This distribution is called the sampling distribution model for proportions. • Even though we depend on sampling distribution models, we never actually get to see them. • We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.

  10. The Central Limit Theorem for a Proportion Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with • Mean: $\mu(\hat{p}) = p$ • Standard deviation: $SD(\hat{p}) = \sqrt{pq/n}$

  11. Means – The “Average” of One Die • Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:

  12. Means – Averaging More Dice • Looking at the average of two dice after a simulation of 10,000 tosses: • The average of three dice after a simulation of 10,000 tosses looks like:

  13. Means – Averaging Still More Dice • The average of 5 dice after a simulation of 10,000 tosses looks like: • The average of 20 dice after a simulation of 10,000 tosses looks like:

  14. Means – What the Simulations Show • As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean. • So, we see the shape continuing to tighten around 3.5 • And, it probably does not shock you that the sampling distribution of a mean becomes Normal.
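The dice simulations from the preceding slides can be reproduced with a few lines. This is a minimal sketch, assuming fair six-sided dice and the same 10,000 tosses per experiment; the function name and the seed are illustrative choices.

```python
import random
import statistics


def simulate_dice_averages(num_dice, num_tosses=10_000, seed=0):
    """Average `num_dice` fair dice, repeated `num_tosses` times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_tosses)
    ]


centers = {}
spreads = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    centers[num_dice] = statistics.fmean(averages)
    spreads[num_dice] = statistics.pstdev(averages)
    print(f"{num_dice:>2} dice: center = {centers[num_dice]:.3f}, "
          f"spread = {spreads[num_dice]:.3f}")
```

The centers should stay close to 3.5 while the spread shrinks as more dice are averaged, which is the tightening the slide describes.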

  15. The Central Limit Theorem: The Fundamental Theorem of Statistics • The sampling distribution of any mean becomes more nearly Normal as the sample size grows. • All we need is for the observations to be independent and collected with randomization. • We don’t even care about the shape of the population distribution! • The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).

  16. The Central Limit Theorem (CLT) The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.

  17. Assumptions and Conditions The CLT requires essentially the same assumptions we saw for modeling proportions: • Independence Assumption: The sampled values must be independent of each other. • Sample Size Assumption: The sample size must be sufficiently large. We can’t check these directly, but we can think about whether the Independence Assumption is plausible, and check • Randomization Condition • 10% Condition: n is less than 10% of the population. • Large Enough Sample Condition: The CLT doesn’t tell us how large a sample we need. For now, you need to think about your sample size in the context of what you know about the population.

  18. CLT and Standard Error • The CLT says that the sampling distribution of any mean or proportion is approximately Normal. • For proportions, the sampling distribution is centered at the population proportion. • For means, it’s centered at the population mean. • For proportions: $SD(\hat{p}) = \sqrt{pq/n}$ • For means: $SD(\bar{y}) = \sigma/\sqrt{n}$

  19. Standard Error • But if we don’t know p or σ, we’re stuck, right? • Since we don’t know p or σ, we can’t find the true standard deviation of the sampling distribution model, so we need to estimate it. We call this estimate of the S.D. of a sampling distribution a standard error (SE). • For a sample proportion, $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$ • For the sample mean, $SE(\bar{y}) = s/\sqrt{n}$, where $s$ is the sample standard deviation
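As a rough illustration, both standard errors can be computed directly from sample data using the usual formulas: sqrt(p̂q̂/n) for a proportion and s/sqrt(n) for a mean. The function names, the sample size of 100, and the small data set below are made up for the example.

```python
import math
import statistics


def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Sample proportion of 60% from the earlier example, assuming n = 100.
print(f"SE(p-hat) = {se_proportion(0.60, 100):.4f}")

# Hypothetical measurements, used only to show the calculation.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"SE(y-bar) = {se_mean(data):.4f}")
```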

  20. The Real World & the Model World Be careful! Now we have two distributions to deal with. • The first is the real world distribution of the sample, which we might display with a histogram. • The second is the math world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem. • Don’t confuse the two!

  21. A Confidence Interval By the 68-95-99.7% Rule, we know • about 68% of all samples will have $\hat{p}$’s within 1 SE of p • about 95% of all samples will have $\hat{p}$’s within 2 SEs of p • about 99.7% of all samples will have $\hat{p}$’s within 3 SEs of p

  22. A Confidence Interval Consider the 95% level: There’s a 95% chance that p is no more than 2 SEs away from $\hat{p}$. So, if we reach out 2 SEs, we are 95% sure that p will be in that interval. In other words, if we reach out 2 SEs in either direction of $\hat{p}$, we can be 95% confident that this interval contains the true proportion. This is called a 95% confidence interval.

  23. A 95% Confidence Interval

  24. What Does “95% Confidence” Mean? • Each confidence interval uses a sample statistic to estimate a population parameter. • But, since samples vary, the statistics we use, and thus the confidence intervals we construct, vary as well. • Our confidence is in the process of constructing the interval, not in any one interval itself. • “95% confidence” means that about 95% of the intervals constructed this way, over repeated samples, will contain the true parameter.
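The idea that confidence describes the process can be illustrated with a simulation that builds many intervals and counts how often they capture the true proportion. This is a sketch under assumptions: p = 0.52 comes from the earlier example, while the sample size of 200, the 2,000 repetitions, the critical value 1.96, and all names are choices made for the illustration.

```python
import math
import random


def coverage_rate(p, n, num_samples, z_star=1.96, seed=0):
    """Fraction of simulated intervals p_hat ± z_star * SE that contain p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z_star * se <= p <= p_hat + z_star * se:
            hits += 1
    return hits / num_samples


rate = coverage_rate(p=0.52, n=200, num_samples=2000)
print(f"Fraction of intervals containing p: {rate:.3f}")
```

Any single interval either contains p or it does not; the fraction near 0.95 is a property of the whole collection.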

  25. M.E.: Certainty vs. Precision • We can claim, with 95% confidence, that the interval contains the true population proportion. • The more confident we want to be, the larger our ME needs to be (makes the interval wider).

  26. M.E.: Certainty vs. Precision • To be more confident, we wind up being less precise. • Because of this, every confidence interval is a balance between certainty and precision. • The tension between certainty and precision is always there. Fortunately, in most cases we can be both sufficiently certain and sufficiently precise to make useful statements. • The choice of confidence level is somewhat arbitrary, but keep in mind this tension between certainty and precision when selecting your confidence level. • The most commonly chosen confidence levels are 90%, 95%, and 99% (but any percentage can be used).

  27. Critical Values • The ‘2’ in $\hat{p} \pm 2\,SE(\hat{p})$ (our 95% confidence interval) came from the 68-95-99.7% Rule. • Using a table or technology, we find that a more exact value for our 95% confidence interval is 1.96 instead of 2. We call 1.96 the critical value and denote it z*. • For any confidence level, we can find the corresponding critical value. • Example: • For a 90% confidence interval, the critical value is 1.645.
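One way to look up critical values with technology is the inverse Normal CDF in Python's standard library. The helper name `critical_value` is hypothetical; `statistics.NormalDist().inv_cdf` is the standard-library call doing the work.

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Return z* so that the central `confidence_level` of N(0, 1) lies within ±z*."""
    # A central area of C leaves (1 - C) / 2 in each tail, so z* sits at the
    # cumulative probability (1 + C) / 2.
    cumulative_prob = (1 + confidence_level) / 2
    return NormalDist().inv_cdf(cumulative_prob)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```

The 90% and 95% values agree with the 1.645 and 1.96 quoted on the slide.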

  28. One-Proportion z-Interval • When the conditions are met, we are ready to find the confidence interval for the population proportion, p. • The confidence interval is $\hat{p} \pm z^* \times SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$. • The critical value, z*, depends on the particular confidence level, C, that you specify.
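Putting the pieces together, a one-proportion z-interval can be sketched as below, using the usual form p̂ ± z* × SE(p̂). The function name and the sample size of 100 are assumptions for illustration; the 60% sample proportion is the one from the opening example, and the conditions are taken as already checked.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(p_hat, n, confidence_level=0.95):
    """Return (low, high) for p_hat ± z* × SE(p_hat)."""
    z_star = NormalDist().inv_cdf((1 + confidence_level) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


low, high = one_proportion_z_interval(0.60, 100)  # n = 100 is assumed
print(f"95% CI: {low:.3f} to {high:.3f}")

# A higher confidence level widens the interval for the same data.
low99, high99 = one_proportion_z_interval(0.60, 100, confidence_level=0.99)
print(f"99% CI: {low99:.3f} to {high99:.3f}")
```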

  29. What Can Go Wrong? Margin of Error Too Large to Be Useful: • We can’t be exact, but how precise do we need to be? • One way to make the margin of error smaller is to reduce your level of confidence. (That may not be a useful solution.) • You need to think about your margin of error when you design your study. • To get a narrower interval without giving up confidence, you need to have less variability. • You can do this with a larger sample…
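To plan a study around a target margin of error, the relation ME = z* × sqrt(pq/n) can be solved for n, giving n = (z*)² pq / ME². The sketch below applies that rearrangement; the function name is hypothetical, and using p = 0.5 as a cautious guess (it maximizes pq) is an assumption of the example rather than something the slides state.

```python
import math
from statistics import NormalDist


def sample_size_for_margin(margin_of_error, confidence_level=0.95, p_guess=0.5):
    """Smallest n with z* × sqrt(p q / n) <= margin_of_error, for a guessed p."""
    z_star = NormalDist().inv_cdf((1 + confidence_level) / 2)
    n_exact = z_star ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n_exact)  # round up so the margin is not exceeded


n_for_3_points = sample_size_for_margin(0.03)
n_for_1_5_points = sample_size_for_margin(0.015)
print(n_for_3_points, n_for_1_5_points)
```

Halving the margin of error requires roughly four times the sample size, because n grows with 1/ME².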

  30. Homework Assignment Chapter 18: • Problem # 11, 15, 23, 29, 31, 47. Chapter 19: • Problem # 7, 9, 11, 13, 17, 27, 35, 37.
