1 / 70

Section 10.2

Section 10.2. How Can We Test whether Categorical Variables are Independent?. A Significance Test for Categorical Variables. The hypotheses for the test are: H 0 : The two variables are independent H a : The two variables are dependent (associated)

Download Presentation

Section 10.2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Section 10.2 How Can We Test whether Categorical Variables are Independent?

  2. A Significance Test for Categorical Variables • The hypotheses for the test are: H0: The two variables are independent Ha: The two variables aredependent (associated) • The test assumes random sampling and a large sample size

  3. What Do We Expect for Cell Counts if the Variables Are Independent? • The count in any particular cell is a random variable • Different samples have different values for the count • The mean of its distribution is called an expected cell count • This is found under the presumption that H0 is true

  4. How Do We Find the Expected Cell Counts? • Expected Cell Count: • For a particular cell, the expected cell count equals:

  5. Example: Happiness by Family Income

  6. The Chi-Squared Test Statistic • The chi-squared statistic summarizes how far the observed cell counts in a contingency table fall from the expected cell counts for a null hypothesis

  7. Example: Happiness and Family Income

  8. Example: Happiness and Family Income • State the null and alternative hypotheses for this test • H0: Happiness and family income are independent • Ha: Happiness and family income are dependent (associated)

  9. Example: Happiness and Family Income • Report the statistic and explain how it was calculated: • To calculate the statistic, for each cell, calculate: • Sum the values for all the cells • The value is 73.4

  10. Example: Happiness and Family Income • The larger the value, the greater the evidence against the null hypothesis of independence and in support of the alternative hypothesis that happiness and income are associated

  11. The Chi-Squared Distribution • To convert the test statistic to a P-value, we use the sampling distribution of the statistic • For large sample sizes, this sampling distribution is well approximated by the chi-squared probability distribution

  12. The Chi-Squared Distribution

  13. The Chi-Squared Distribution • Main properties of the chi-squared distribution: • It falls on the positive part of the real number line • The precise shape of the distribution depends on the degrees of freedom: df = (r-1)(c-1)

  14. The Chi-Squared Distribution • Main properties of the chi-squared distribution: • The mean of the distribution equals the df value • It is skewed to the right • The larger the value, the greater the evidence against H0: independence

  15. The Chi-Squared Distribution

  16. The Five Steps of the Chi-Squared Test of Independence 1. Assumptions: • Two categorical variables • Randomization • Expected counts ≥ 5 in all cells

  17. The Five Steps of the Chi-Squared Test of Independence 2. Hypotheses: • H0: The two variables are independent • Ha: The two variables are dependent (associated)

  18. The Five Steps of the Chi-Squared Test of Independence 3.Test Statistic:

  19. The Five Steps of the Chi-Squared Test of Independence 4. P-value: Right-tail probability above the observedvalue, for the chi-squared distribution with df = (r-1)(c-1) 5. Conclusion: Report P-value and interpret in context • If a decision is needed, reject H0 when P-value ≤ significance level

  20. Chi-Squared is Also Used as a “Test of Homogeneity” • The chi-squared test does not depend on which is the response variable and which is the explanatory variable • When a response variable is identified and the population conditional distributions are identical, they are said to be homogeneous • The test is then referred to as a test of homogeneity

  21. Example: Aspirin and Heart Attacks Revisited

  22. Example: Aspirin and Heart Attacks Revisited • What are the hypotheses for the chi-squared test for these data? • The null hypothesis is that whether a doctor has a heart attack is independent of whether he takes placebo or aspirin • The alternative hypothesis is that there’s an association

  23. Example: Aspirin and Heart Attacks Revisited • Report the test statistic and P-value for the chi-squared test: • The test statistic is 25.01 with a P-value of 0.000 • This is very strong evidence that the population proportion of heart attacks differed for those taking aspirin and for those taking placebo

  24. Example: Aspirin and Heart Attacks Revisited • The sample proportions indicate that the aspirin group had a lower rate of heart attacks than the placebo group

  25. Limitations of the Chi-Squared Test • If the P-value is very small, strong evidence exists against the null hypothesis of independence But… • The chi-squared statistic and the P-value tell us nothing about the nature of the strength of the association

  26. Limitations of the Chi-Squared Test • We know that there is statistical significance, but the test alone does not indicate whether there is practical significance as well

  27. Section 10.3 How Strong is the Association?

  28. In a study of the two variables (Gender and Happiness), which one is the response variable? • Gender • Happiness

  29. What is the Expected Cell Count for ‘Females’ who are ‘Pretty Happy’? • 898 • 801.5 • 902 • 521

  30. What is the Expected Cell Count for ‘Females’ who are ‘Pretty Happy’? • 898 • 801.5 • 902 = N*(898+705)/N*(163+898+502)/N • 521

  31. Calculate the • 1.75 • 0.27 • 0.98 • 10.34

  32. At a significance level of 0.05, what is the correct decision? • ‘Gender’ and ‘Happiness’ are independent • There is an association between ‘Gender’ and ‘Happiness’

  33. Analyzing Contingency Tables • Is there an association? • The chi-squared test of independence addresses this • When the P-value is small, we infer that the variables are associated

  34. Analyzing Contingency Tables • How do the cell counts differ from what independence predicts? • To answer this question, we compare each observed cell count to the corresponding expected cell count

  35. Analyzing Contingency Tables • How strong is the association? • Analyzing the strength of the association reveals whether the association is an important one, or if it is statistically significant but weak and unimportant in practical terms

  36. Measures of Association • A measure of association is a statistic or a parameter that summarizes the strength of the dependence between two variables

  37. Difference of Proportions • An easily interpretable measure of association is the difference between the proportions making a particular response

  38. Difference of Proportions

  39. Difference of Proportions • Case (a) exhibits the weakest possible association – no association Accept Credit Card • The difference of proportions is 0

  40. Difference of Proportions • Case (b) exhibits the strongest possible association: Accept Credit Card • The difference of proportions is 100%

  41. Difference of Proportions • In practice, we don’t expect data to follow either extreme (0% difference or 100% difference), but the stronger the association, the large the absolute value of the difference of proportions

  42. Example: Do Student Stress and Depression Depend on Gender?

  43. Example: Do Student Stress and Depression Depend on Gender? • Which response variable, stress or depression, has the stronger sample association with gender?

  44. Example: Do Student Stress and Depression Depend on Gender? Example: Do Student Stress and Depression Depend on Gender? Stress: • The difference of proportions between females and males was 0.35 – 0.16 = 0.19

  45. Example: Do Student Stress and Depression Depend on Gender? Depression: • The difference of proportions between females and males was 0.08 – 0.06 = 0.02

  46. Example: Do Student Stress and Depression Depend on Gender? • In the sample, stress (with a difference of proportions = 0.19) has a stronger association with gender than depression has (with a difference of proportions = 0.02)

  47. Example: Relative Risk for Seat Belt Use and Outcome of Auto Accidents

  48. Example: Relative Risk for Seat Belt Use and Outcome of Auto Accidents • Treating the auto accident outcome as the response variable, find and interpret the relative risk

  49. Large Does Not Mean There’s a Strong Association • A large chi-squared value provides strong evidence that the variables are associated • It does not imply that the variables have a strong association • This statistic merely indicates (through its P-value) how certain we can be that the variables are associated, not how strong that association is

  50. Section 10.4 How Can Residuals Reveal the Pattern of Association?

More Related