
Warm up

Learn how to use the Chi-Squared Goodness of Fit test to compare observed and expected distributions of categorical variables. Explore examples such as color distribution in M&M's candies and age distribution in the US population.

tberry

Presentation Transcript


  1. Warm up • On slide

  2. Section 11.1 Chi-Square

  3. Inference Summary

  4. The questions then are… • What if we want to compare MORE than 2 proportions? • e.g. Let’s examine the proportion of high school students who go on to four-year colleges. Does that proportion differ by race (White, African American, Asian, Hispanic)? We’d be comparing 4 proportions! • What if we want to check actual results against a predicted model? • e.g. We want to judge the results of mating two red-eyed fruit flies by comparing the actual results to the predicted model. • What if we want to compare two categorical variables to see if there is a relationship? • e.g. Is smoking behavior (current smoker, former smoker, never smoked) associated with socioeconomic status (high, medium, low)?

  5. The answer is… Spelled Chi-Squared. Pronounced like KITE without the “te.”

  6. Then there were three • There are three types of tests • Goodness of fit • Homogeneity of Proportions • Association / Independence • Today our focus will be the Chi-Squared Goodness of Fit test.

  7. Goodness of Fit • The Chi-squared goodness of fit test measures whether an observed sample distribution is significantly different from the hypothesized distribution. • The idea is to compare the observed counts in each category to the expected count for each category based on the hypothesized distribution.

  8. H0: The specified distribution of the categorical variable is correct. • Ha: The specified distribution of the categorical variable is not correct.

  9. Conditions • Use the chi-squared test if • The data come from a simple random sample (SRS). • The variable under study is categorical. • The expected count in each category of the variable is at least 5.
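
The expected-count condition can be checked before running the test. The sketch below is only an illustration: the function names, the three-category model, and the sample sizes are hypothetical and do not come from the slides.

```python
def expected_counts(n, proportions):
    """Expected count per category: sample size times hypothesized proportion."""
    return {category: n * p for category, p in proportions.items()}


def categories_below_minimum(expected, minimum=5):
    """Return the categories whose expected count falls below the minimum."""
    return [category for category, count in expected.items() if count < minimum]


# Hypothetical three-category model, for illustration only.
model = {"A": 0.70, "B": 0.25, "C": 0.05}

small_sample = categories_below_minimum(expected_counts(60, model))
large_sample = categories_below_minimum(expected_counts(200, model))

print(small_sample)  # category C expects about 3 observations, so it fails
print(large_sample)  # every category expects at least 5
```

Note that the condition is about expected counts, not observed counts, so it can be checked from the hypothesized distribution and the sample size alone.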

  10. Mars, Incorporated makes milk chocolate candies. Here’s what the company’s Consumer Affairs Department says about the color distribution of its M&M’S Milk Chocolate Candies: On average, the new mix of colors of M&M’S Milk Chocolate Candies will contain 13 percent of each of browns and reds, 14 percent yellows, 16 percent greens, 20 percent oranges and 24 percent blues.

  11. The one-way table below summarizes the data from a sample bag of M&M’S Milk Chocolate Candies. In general, one-way tables display the distribution of a categorical variable for the individuals in a sample. Since the company claims that 24% of all M&M’S Milk Chocolate Candies are blue, we might believe that something fishy is going on. We could use the one-sample z test for a proportion from Chapter 9 to test the hypotheses H0: p = 0.24 Ha: p ≠ 0.24 where p is the true population proportion of blue M&M’S. We could then perform additional significance tests for each of the remaining colors. Running a separate test for each color, however, would be inefficient and would raise the chance of a false positive somewhere among the six tests.

  12. Hypotheses The null hypothesis in a chi-square goodness-of-fit test should state a claim about the distribution of a single categorical variable in the population of interest. In our example, the appropriate null hypothesis is H0: The company’s stated color distribution for M&M’S Milk Chocolate Candies is correct. Ha: The company’s stated color distribution for M&M’S Milk Chocolate Candies is not correct.

  13. We can also write the hypotheses in symbols as H0: $p_{\text{blue}} = 0.24$, $p_{\text{orange}} = 0.20$, $p_{\text{green}} = 0.16$, $p_{\text{yellow}} = 0.14$, $p_{\text{red}} = 0.13$, $p_{\text{brown}} = 0.13$ Ha: At least one of the $p_i$’s is incorrect, where $p_{\text{color}}$ = the true population proportion of M&M’S Milk Chocolate Candies of that color.
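
The null hypothesis turns directly into expected counts: multiply the sample size by each claimed proportion. In the sketch below the proportions are the company's stated values from slide 10, while the bag size `n = 60` is a hypothetical number chosen only for illustration.

```python
# Hypothesized proportions from the company's claim (slide 10).
claimed = {
    "blue": 0.24, "orange": 0.20, "green": 0.16,
    "yellow": 0.14, "red": 0.13, "brown": 0.13,
}

# A valid null distribution accounts for every candy: proportions sum to 1.
total = sum(claimed.values())
assert abs(total - 1.0) < 1e-9

n = 60  # hypothetical bag size, not taken from the slides
expected = {color: n * p for color, p in claimed.items()}

for color, count in expected.items():
    print(f"{color:>6}: {count:.1f}")
```

Expected counts do not need to be whole numbers; they describe the long-run average bag, not any single bag.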

  14. The formula • $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$ • Remember Σ means sum. So compute this quantity for each category and add them all up!
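
The chi-square statistic adds up (observed − expected)² / expected over all categories. A minimal sketch of that sum follows; the observed counts are illustrative values for a hypothetical bag of 60 candies and are not taken from the slides, and the expected counts assume the company's claimed proportions applied to that same hypothetical bag.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Order: blue, orange, green, yellow, red, brown.
observed = [9, 8, 12, 15, 10, 6]             # illustrative counts (sum to 60)
expected = [14.4, 12.0, 9.6, 8.4, 7.8, 7.8]  # 60 times each claimed proportion

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))
```

Dividing by the expected count keeps a gap of, say, 5 candies from counting the same in a category where 8 are expected as in one where 80 are expected.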

  15. Example • Back in 1980, the US population had the following distribution by age:

  16. 1996… • Suppose I take a sample of 500 US residents in 1996 and find the following distribution: I want to know: does the distribution of my sample in 1996 match the distribution of age from 1980?

  17. Let’s Compare: Help me fill in the last column! • Age groups: 0-24, 25-44, 45-64, 65+

  18. We see that the distributions are different. The question is ARE THEY SIGNIFICANTLY DIFFERENT?

  19. Characteristics of the Chi-Squared Statistic • Chi-Square is ALWAYS (always? Yes, always) skewed RIGHT. • As the degrees of freedom increase, the graph becomes less skewed. It becomes more symmetric and looks more like a normal curve. • The total area under a chi-square curve is 1. WHY?
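
One way to see the skew shrink is to use a standard result that is not stated on the slide: a chi-square distribution with df degrees of freedom has skewness √(8/df). The helper name below is hypothetical.

```python
import math


def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)


degrees = (1, 2, 5, 10, 30, 100)
skews = [chi_square_skewness(df) for df in degrees]

for df, skew in zip(degrees, skews):
    print(f"df = {df:>3}: skewness = {skew:.3f}")
```

The skewness stays positive for every df (always skewed right) but keeps falling toward 0, the skewness of a normal curve.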

  20. In Calc • Put Observed in L1 and Expected in L2 • Stat, Test, χ² GOF-Test • Enter your df (number of categories − 1) • CAUTION! You still need to know how to use the formula and table… Sometimes your calculator will give you an error! This happened in the 2008 Free Response!
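
For readers without the calculator, the P-value step can be sketched with only Python's standard library. The function name is hypothetical, and the statistic of 10.18 with six categories is an illustrative input rather than a result from the slides. The tail area uses the closed forms for the upper regularized gamma function at whole and half-integer arguments.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail area P(X >= statistic) for chi-square with integer df >= 1."""
    t = statistic / 2
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-t)              # Q(1, t)
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(t))   # Q(1/2, t)
    # Recurrence: Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    while a < df / 2:
        tail += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1
    return tail


categories = 6
df = categories - 1          # degrees of freedom for goodness of fit
p_value = chi_square_p_value(10.18, df)
print(f"df = {df}, P-value = {p_value:.4f}")
```

A quick sanity check is to feed in a table critical value: with df = 5, the 0.05 critical value of 11.07 should return a tail area of about 0.05.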

  21. How to recognize χ² Goodness of Fit • You have many percents and you want to know if your sample matches the distribution.

  22. Homework Chapter 11 #9, 10, 13(a-c), 15, 19-22 (explain)
