
Statistical Inference for Two Proportions: Concepts & Procedures

Learn how to compare and draw conclusions about population proportions using statistical inference techniques and probability-based decisions. Understand the steps, tools, and conditions involved in making inferences in statistics.

nadinek


Presentation Transcript


  1. Inference for Two Proportions Concepts in Statistics

  2. Inferences about a Population Proportion • Random samples vary. When we use a sample proportion to make an inference about a population proportion, there is uncertainty. Inference involves probability. • Under certain conditions, we can model the variability in sample proportions with a normal curve. We use the normal curve to make probability-based decisions about population values. • We can estimate a population proportion with a confidence interval. The confidence interval is an actual sample proportion plus or minus a margin of error. We state our confidence in the accuracy of these intervals using probability. • We can test a hypothesis about a population proportion using an actual sample proportion. We base our conclusion on probability using a P-value. The P-value describes the strength of our evidence against a hypothesis about the population.

  3. Steps in a Statistical Investigation Produce Data: Determine what to measure, then collect the data. Collect categorical data from two samples. In an observational study, we begin with two populations and randomly select a sample from each population. In an experiment, we randomly assign individuals to two treatments. Exploratory Data Analysis: Analyze and summarize the data. We are working with categorical data, so from each sample, we compute a sample proportion. To compare the two samples, we subtract the proportions. When we conduct inference in the next step, our goal is to determine whether the actual difference in the sample proportions is significantly different from what we expect in random sampling.

  4. Steps in a Statistical Investigation (continued) Draw a Conclusion: Use data, probability, and statistical inference to draw a conclusion about the populations. • We use simulation to observe the behavior of the differences in sample proportions when we randomly select many, many samples. We create the simulation to reflect a claim about the populations. Then we develop a probability model to describe the shape, center, and spread of the sampling distribution. We are interested in the conditions that allow us to use a normal curve. • We use this model to determine when a given difference is unusual in a formal hypothesis test. • We also construct confidence intervals to estimate the difference between two population proportions. As before, we make a probability statement about our confidence in the accuracy of these intervals.

  5. Distribution of Differences in Sample Proportions In the previous module, we learned to estimate and test hypotheses regarding the value of a single population proportion. In this module, we want to develop tools for comparing two unknown population proportions. The first step is to examine how random samples from the populations compare. In this investigation, we assume we know the population proportions in order to develop a model for the sampling distribution.

  6. Sampling from Populations with Assumed Parameter Values

  7. Distribution of Differences in Sample Proportions We want to create a mathematical model of the sampling distribution, so we need to understand when we can use a normal curve. We also need to understand how the center and spread of the sampling distribution relate to the population proportions. Shape: In each situation we have encountered so far, the distribution of differences between sample proportions appears somewhat normal, but that is not always true. We discuss conditions for use of a normal model later. Center: Regardless of shape, the mean of the distribution of sample differences is the difference between the population proportions, p1 – p2. This is always true if we look at the long-run behavior of the differences in sample proportions. Spread: We have observed that larger samples have less variability. Advanced theory gives us a formula for the standard error in the distribution of differences between sample proportions.
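The shape, center, and spread described above can be explored with a small simulation. The following is a minimal sketch, not part of the original slides: the function name `simulate_differences`, the population proportions (0.6 and 0.4), the sample sizes, and the seed are all illustrative assumptions.

```python
import random
import statistics


def simulate_differences(p1, p2, n1, n2, reps=2000, seed=1):
    """Return simulated differences in sample proportions (sample 1 minus sample 2)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count successes in one random sample from each population.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Assumed population proportions, chosen only for illustration.
diffs = simulate_differences(p1=0.6, p2=0.4, n1=100, n2=100)
print("mean of simulated differences:", round(statistics.mean(diffs), 3))
print("SD of simulated differences:  ", round(statistics.stdev(diffs), 3))
```

With these assumed values, the mean of the simulated differences should land close to p1 – p2 = 0.2, which is the "center" claim on this slide.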

  8. Distribution of Differences in Sample Proportions Spread: We have observed that larger samples have less variability. Advanced theory gives us this formula for the standard error in the distribution of differences between sample proportions: $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$ Notice the following: • The terms under the square root are familiar. These terms are used to compute the standard errors for the individual sampling distributions of $\hat{p}_1$ and $\hat{p}_2$. • The sample size is in the denominator of each term. As we learned earlier, this means that increases in sample size result in a smaller standard error.

  9. Distribution of Differences in Sample Proportions The mean of the differences is the difference of the means. The mean of each sampling distribution of individual proportions is the population proportion, so the mean of the sampling distribution of differences is the difference in population proportions. The standard error of differences relates to the standard errors of the sampling distributions for individual proportions. Since we add these terms, the standard error of differences is always larger than the standard error in the sampling distributions of individual proportions. In other words, there is more variability in the differences.
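As a rough illustration of the two points about spread, the sketch below computes the standard error of a difference, sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2), and compares it with the standard errors of the individual sample proportions. The function names and the numeric inputs are assumptions made for this example only.

```python
import math


def standard_error_single(p, n):
    """Standard error of one sample proportion."""
    return math.sqrt(p * (1 - p) / n)


def standard_error_diff(p1, n1, p2, n2):
    """Standard error of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Illustrative values (assumed, not from the slides).
se_diff = standard_error_diff(0.6, 100, 0.4, 100)
print("SE of difference:", round(se_diff, 4))
print("SE of p-hat 1:   ", round(standard_error_single(0.6, 100), 4))
print("SE of p-hat 2:   ", round(standard_error_single(0.4, 100), 4))

# Quadrupling both sample sizes halves the standard error.
print("SE with n = 400: ", round(standard_error_diff(0.6, 400, 0.4, 400), 4))
```

Because the two terms are added under the square root, the result exceeds either individual standard error whenever both terms are positive.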

  10. When Is a Normal Model a Good Fit for the Sampling Distribution of Differences in Proportions? We use a normal model for the sampling distribution of differences in proportions if the following condition is met: the expected numbers of successes and failures in each sample, $n_1 p_1$, $n_1(1-p_1)$, $n_2 p_2$, and $n_2(1-p_2)$, must all be at least 10. Here we complete the table to compare the individual sampling distributions for sample proportions to the sampling distribution of differences in sample proportions.
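The "at least 10" condition can be checked mechanically. This is a minimal sketch; the helper name `normal_model_ok` and the example inputs are assumptions for illustration.

```python
def normal_model_ok(p1, n1, p2, n2, threshold=10):
    """Check that expected successes and failures in both samples reach the threshold."""
    expected_counts = [
        n1 * p1,        # expected successes, sample 1
        n1 * (1 - p1),  # expected failures, sample 1
        n2 * p2,        # expected successes, sample 2
        n2 * (1 - p2),  # expected failures, sample 2
    ]
    return all(count >= threshold for count in expected_counts)


print(normal_model_ok(0.5, 100, 0.5, 100))   # every expected count is 50
print(normal_model_ok(0.02, 100, 0.5, 100))  # only 2 expected successes in sample 1
```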

  11. Using the Normal Model in Inference When conditions allow the use of a normal model, we use the normal distribution to determine P-values when testing claims and to construct confidence intervals for a difference between two population proportions. We can standardize the difference between sample proportions using a z-score. We calculate a z-score as we have done before. For a difference in sample proportions, the z-score formula is: $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$
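A z-score for a difference in sample proportions subtracts the mean of the sampling distribution (p1 – p2) from the observed difference and divides by the standard error. The sketch below assumes the population proportions are known, as in the earlier slides; the function name and numbers are illustrative.

```python
import math


def z_score_diff(p_hat1, p_hat2, p1, n1, p2, n2):
    """Standardize an observed difference in sample proportions."""
    mean = p1 - p2  # center of the sampling distribution
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p_hat1 - p_hat2) - mean) / se


# Assumed example: populations with p1 = 0.6 and p2 = 0.4, samples of 100 each,
# and an observed difference of 0.65 - 0.35 = 0.30.
z = z_score_diff(p_hat1=0.65, p_hat2=0.35, p1=0.6, n1=100, p2=0.4, n2=100)
print(round(z, 2))
```

Here the observed difference (0.30) sits about 1.44 standard errors above the expected difference (0.20).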

  12. Confidence Interval for a Difference in Two Population Proportions: the Basics Every confidence interval has this form: statistic ± margin of error. To estimate a difference in population proportions (or a treatment effect), the statistic is a difference in sample proportions, so the confidence interval is (difference in sample proportions) ± margin of error. If a normal model is a good fit for the sampling distribution, we use the normal model to describe our confidence that the difference in population proportions lies within a given margin of error of the difference in sample proportions. For example, we can state that we are 95% confident that the difference in population proportions is contained in the following interval: (difference in sample proportions) ± 2(standard error)
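The interval "(difference in sample proportions) ± 2(standard error)" can be sketched as follows. Because the population proportions are unknown in practice, this sketch estimates the standard error from the sample proportions; the function name and the counts (60 of 100 versus 45 of 100) are assumptions for illustration.

```python
import math


def confidence_interval_diff(x1, n1, x2, n2, z_star=2.0):
    """Approximate confidence interval for p1 - p2 from success counts x1 and x2."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    # Estimated standard error, using sample proportions in place of p1 and p2.
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    diff = p_hat1 - p_hat2
    margin = z_star * se
    return diff - margin, diff + margin


low, high = confidence_interval_diff(x1=60, n1=100, x2=45, n2=100)
print(f"approx. 95% CI: ({low:.3f}, {high:.3f})")
```

With these assumed counts, the interval runs from roughly 0.010 to 0.290, so it lies entirely above zero.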

  13. 95% Confident Comes from a Normal Model of the Sampling Distribution The following normal model represents the sampling distribution. In the sampling distribution, we can see that the error in this sample difference is less than the margin of error. We know this because the distance between the sample difference and the population difference is shorter than the length of the margin of error (abbreviated MOE in the figure). 

  14. 95% Confidence Here is another illustration of 95% confidence. If we construct confidence intervals with a margin of error equal to 2 standard errors, then 95% confidence means that in the long run, 95% of these confidence intervals will contain the population difference, and 5% of the time, the interval we calculate will not contain it. We show one of these less common intervals with a red dot at the sample difference.
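The long-run meaning of 95% confidence can be checked by simulation. The sketch below is illustrative only: it assumes population proportions of 0.5 and 0.3, samples of 200 each, and a fixed seed, then counts how often the ±2 standard error interval contains the true difference.

```python
import math
import random


def coverage_rate(p1, p2, n1, n2, reps=1000, seed=7):
    """Fraction of simulated +/- 2 SE intervals that contain the true difference."""
    rng = random.Random(seed)
    true_diff = p1 - p2
    hits = 0
    for _ in range(reps):
        p_hat1 = sum(rng.random() < p1 for _ in range(n1)) / n1
        p_hat2 = sum(rng.random() < p2 for _ in range(n2)) / n2
        se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
        diff = p_hat1 - p_hat2
        if diff - 2 * se <= true_diff <= diff + 2 * se:
            hits += 1
    return hits / reps


print("share of intervals containing the true difference:",
      coverage_rate(0.5, 0.3, 200, 200))
```

The printed share should be near 0.95, with the remaining intervals corresponding to the "less common" ones marked on the slide.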

  15. Other Levels of Confidence We can create confidence intervals for other levels of confidence. Changing the level of confidence changes the critical z-score. The following image shows the three most commonly used confidence levels and their critical z-scores.
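Critical z-scores can be computed from the standard normal distribution. The sketch below uses Python's `statistics.NormalDist` and assumes the three commonly used levels are 90%, 95%, and 99%; the function name `critical_z` is illustrative.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Critical z-score that leaves (1 - confidence_level) / 2 in each tail."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_z(level):.3f}")
```

The 95% value is about 1.96, which the earlier slides round to 2.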

  16. Hypothesis Test for Difference in Two Population Proportions This table has examples of research questions and studies that involve two populations or two treatments with a categorical response variable.

  17. Stating Hypotheses about Two Population Proportions Whenever we test a hypothesis, we begin by stating null and alternative hypotheses. The null hypothesis is a statement of “no effect” or “no difference,” so the null hypothesis for all hypothesis tests about two population proportions is $H_0: p_1 - p_2 = 0$. When we say there is no difference in the population proportions (or no treatment effect), it is equivalent to saying that the population proportions are equal: p1 = p2. The alternative hypothesis is one of the following: $H_a: p_1 - p_2 < 0$, $H_a: p_1 - p_2 > 0$, or $H_a: p_1 - p_2 \neq 0$.

  18. Hypothesis Test If a normal model is a good fit for the sampling distribution, we use it to find the P-value. But let’s look at a simulation of the sampling distribution to remind ourselves what the P-value really means. The P-value is the probability that random samples have results at least as extreme as the data if the null hypothesis is true. In terms of z-scores, the P-value is the probability that the test statistic has a value at least as extreme as that associated with the data if the null hypothesis is true.

  19. Finding P-values In a hypothesis test, the P-value is based on the assumption that the null hypothesis is true. But the P-value is also related to the alternative hypothesis.
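How the alternative hypothesis affects the P-value can be sketched with the standard normal distribution. The function name and the labels "less", "greater", and "two-sided" are assumptions for this example; they correspond to alternatives of the form p1 – p2 < 0, p1 – p2 > 0, and p1 – p2 ≠ 0.

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """P-value for a z test statistic under the given alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "less":       # left-tail area
        return cdf(z)
    if alternative == "greater":    # right-tail area
        return 1 - cdf(z)
    if alternative == "two-sided":  # area in both tails
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


z = 1.8  # illustrative test statistic
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(z, alt), 4))
```

The same test statistic yields three different P-values, which is why the alternative must be stated before looking at the data.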

  20. Thinking Critically about Conclusions from Statistical Studies It is not uncommon to see debate over the conclusions and implications of statistical studies. When we read summaries of statistical studies, it is important to evaluate whether the conclusions are reasonable. Here we discuss two common pitfalls in drawing conclusions from statistical studies. • The conclusion is not appropriate to the study design. • The conclusion confuses statistical significance with practical importance.

  21. Study Design Conclusions

  22. Statistical Significance and Practical Importance Is a statistically significant difference always large enough to be important on a practical level? The answer is no. Recall that when a P-value is less than the level of significance, we say the results are statistically significant. It means that the results are unlikely to be due to chance alone. In the case of a difference in sample proportions, we are saying that the observed difference is larger than we expect to see in random samples from populations with the same population proportions. But this does not necessarily mean the difference is large enough to be important in real life.
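A hypothetical example makes the distinction concrete. The sketch below runs a two-sided z-test on made-up counts from two very large samples. It assumes a pooled proportion for the standard error under the null hypothesis, which is a common choice but is not stated in these slides; the function name and data are illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of p1 = p2 using a pooled standard error (assumed here)."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # combined estimate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = p_hat1 - p_hat2
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Made-up data: 50.2% versus 50.0% success in samples of one million each.
diff, z, p = two_proportion_z_test(502_000, 1_000_000, 500_000, 1_000_000)
print(f"difference = {diff:.3f}, z = {z:.2f}, P-value = {p:.4f}")
```

With samples this large, a difference of only 0.2 percentage points is statistically significant at the 0.05 level, yet whether it matters in practice depends on the context.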

  23. Review of Type I and Type II Errors Inference is based on probability, so there is a chance of making a wrong decision. When we reject a null hypothesis that is true, we commit a type I error. When we fail to reject a null hypothesis that is false, we commit a type II error.

  24. Decreasing the Chance of Type I or Type II Error How can we decrease the chance of a type I or type II error? Decreasing the chance of a type I error increases the chance of a type II error. The probability of committing a type I error is $\alpha$. If the null hypothesis is true, then the probability that we will reject a true null hypothesis is $\alpha$. The smaller $\alpha$ is, the smaller the probability of a type I error. General guidelines for choosing a level of significance: • If the consequences of a type I error are more serious, choose a small level of significance ($\alpha$). • If the consequences of a type II error are more serious, choose a larger level of significance ($\alpha$). • In general, pick the largest level of significance that we can tolerate as the chance of a type I error.
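The link between the level of significance and the type I error rate can be illustrated by simulating many tests in which the null hypothesis is actually true. This is a sketch under assumptions not found in the slides: both populations have proportion 0.5, samples have 100 individuals each, the test is two-sided, and the standard error uses a pooled proportion.

```python
import math
import random
from statistics import NormalDist


def type_one_error_rate(alpha, p=0.5, n=100, reps=2000, seed=3):
    """Share of simulated tests that reject a true null hypothesis p1 = p2."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = sum(rng.random() < p for _ in range(n))
        x2 = sum(rng.random() < p for _ in range(n))
        pooled = (x1 + x2) / (2 * n)
        se = math.sqrt(pooled * (1 - pooled) * (2 / n))
        if se == 0:
            continue  # no variability in the samples, so no evidence to reject
        z = (x1 / n - x2 / n) / se
        p_val = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_val < alpha:
            rejections += 1
    return rejections / reps


print("alpha = 0.05:", type_one_error_rate(0.05))
print("alpha = 0.01:", type_one_error_rate(0.01))
```

The rejection rates should fall near the chosen levels of significance, and the smaller level produces fewer false rejections on the same simulated samples.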

  25. Quick Review • The standard error of differences between sample proportions is related to? • We use a normal model if the counts of expected successes and failures are at least? • What are the critical z-values for commonly used confidence levels? • What is the null hypothesis? • How do we make a type I error? • We estimate the standard error by using sample proportions in what formula? • A type II error occurs when?
