1 / 24

Chi-square test or c 2 test

Chi-square test or c 2 test. What if we are interested in seeing if my “ crazy ” dice are considered “fair”? What can I do?. Chi-square test. Used to test the counts of categorical data Three types Goodness of fit (univariate) Independence (bivariate)

Download Presentation

Chi-square test or c 2 test

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chi-square testorc2 test

  2. What if we are interested in seeing if my “crazy” dice are considered “fair”? What can I do?

  3. Chi-square test • Used to test the counts of categorical data • Three types • Goodness of fit (univariate) • Independence (bivariate) • Homogeneity (univariate with two samples)

  4. Chi-square distributions

  5. Upper-tail Areas for Chi-square Distributions

  6. c2 distribution • Different df have different curves • Skewed right • Cannot take on negative values • As df increases, curve shifts toward right & becomes more like a normal curve • Each curve has a mode at df-2 and a mean at df

  7. c2 assumptions • SRS – reasonably random sample • Have countsof categorical data & we expect each category to happen at least once • Sample size – to insure that the sample size is large enough we should expect at least five in each category. ***Be sure to list expected counts!! Combine these together: All expected counts are at least 5.

  8. c2 formula

  9. c2 Goodness of fit test Based on df – df = number of categories - 1 • Uses univariate data (one sample, one variable) • Want to see how well the observedcounts “fit” what we expect the counts to be • Use c2cdf function on the calculator to find p-values

  10. Let’s test our dice!

  11. Hypotheses – written in words H0: proportions are equal Ha: at least one proportion is not the same Be sure to write in context!

  12. Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to claim that successful people are more likely to be born under some signs than others? Aries 23 Libra 18 Leo 20 Taurus 20 Scorpio 21 Virgo 19 Gemini 18 Sagittarius 19 Aquarius 24 Cancer 23 Capricorn 22 Pisces 29 How many would you expect in each sign if there were no difference between them? How many degrees of freedom? I would expect CEOs to be equally born under all signs. So 256/12 = 21.333333 Since there are 12 signs – df = 12 – 1 = 11

  13. Assumptions: • Have a random sample of CEO’s • All expected counts are greater than 5. (I expect 21.33 CEO’s to be born in each sign.) • H0: The proportions of CEO’s born under each sign are the same. • Ha: At least one of the proportion of CEO’s born under each sign is different.

  14. 2.) Compute the residuals. (Observed – Expected)

  15. 3.) Square the residuals

  16. 4. Compute the components for each cell

  17. 5. Find the sum of the components (that’s the chi-square statistic) Σ = 5.094

  18. P-value = c2cdf(5.094, 10^99, 11) = .9265 a = .05 Since p-value > a, I fail to reject H0. There is not sufficient evidence to suggest that the CEOs are born under some signs more than under others.

  19. Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short) A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively. What are the expected counts? df? Are the results consistent with the theoretical distribution predicted by the genetic model? (see next page) Since there are 4 categories, df = 4 – 1 = 3 Expected counts: Y & N = 56.25 Y & S = 18.75 E & N = 18.75 E & S = 6.25 We expect 9/16 of the 100 flies to have yellow and normal wings. (Y & N)

  20. Assumptions: • Have a random sample of fruit flies • All expected counts are greater than 5. • Expected counts: • Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25 • H0: The proportions of fruit flies are the same as the theoretical model. • Ha: At least one of the proportions of fruit flies is not the same as the theoretical model. • P-value = c2cdf(5.671, 10^99, 3) = .129 a = .05 • Since p-value > a, I fail to reject H0. There is not sufficient evidence to suggest that the distribution of fruit flies is not the same as the theoretical model.

  21. A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g or hazelnuts, and 446 g of peanuts. You wonder whether your mix is significantly different from what the company advertises? Why is the chi-square goodness-of-fit test NOT appropriate here? What might you do instead of weighing the nuts in order to use chi-square? Because we do NOT have counts of the type of nuts. We could count the number of each type of nut and then perform a c2 test.

  22. Example: Does the color of a car influence the chance that it will be stolen? Of 830 cars reported stolen, 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. It is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors.

  23. Let π1, π2, . . . Π5 denote true proportions of stolen cars that fall into the 5 color categories Ho: π1 = .15, π2 = .15, π3 = .35, π4 = .30, π5 = .05 Ha: Ho is not true. α = .01 Test statistic: Assumptions: The sample was a random sample of stolen cars. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.

  24. Calculations: = 1.93 + 4.82 + 1.45 + 1.45 + 56.68 = 66.33 P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with 4 df. The computed value is larger than 18.46, so P-value < .001. Because P-value < α, Ho is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.

More Related