
Chi-Square and Analysis of Variance (ANOVA)



Presentation Transcript


  1. Chi-Square and Analysis of Variance (ANOVA) Lecture 9

  2. The Chi-Square Distribution and Test for Independence Hypothesis testing between two or more categorical variables

  3. Chi-square Test of Independence • Tests the association between two nominal (categorical) variables. • Null hypothesis: the two variables are independent. • It's really just a comparison between expected frequencies and observed frequencies among the cells in a crosstabulation table.

  4. Example Crosstab: gender x binary question

  5. Degrees of freedom • Chi-square degrees of freedom • df = (r-1) (c-1) • Where r = # of rows, c = # of columns • Thus, in any 2x2 contingency table, the degrees of freedom = 1. • As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
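A minimal sketch of the expected-versus-observed comparison and the df rule from the slides above. The function name `chi_square_independence` and the counts in `table` are hypothetical, made up for illustration; they are not the lecture's data.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count under independence: (row total * column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # df = (r - 1)(c - 1)
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical 2x2 crosstab: rows = gender, columns = yes / no
table = [[30, 20],
         [20, 30]]
chi2, df, expected = chi_square_independence(table)
print(chi2, df)  # 4.0 1
```

With df = 1 the usual .05 critical value is 3.841, so a statistic of 4.0 on these made-up counts would lead to rejecting the null hypothesis of independence.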

  6. Chi-Square Distribution • The chi-square distribution with k degrees of freedom results when k independent variables with standard normal distributions are squared and summed.
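The definition above can be checked with a small simulation. This is only a sketch: the helper names, the number of draws, and the seed are arbitrary choices for illustration. It sums squared standard normal draws and reports the mean and an estimated 95th percentile for a few values of df, which should show the distribution shifting to the right as df grows.

```python
import random


def simulate_chi_square(df, n_draws=20000, seed=0):
    """Draw n_draws values, each the sum of df squared standard normals."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


def summarize(draws):
    """Return the sample mean and an empirical 95th percentile."""
    ordered = sorted(draws)
    mean = sum(ordered) / len(ordered)
    p95 = ordered[int(0.95 * len(ordered))]
    return mean, p95


results = {df: summarize(simulate_chi_square(df)) for df in (1, 3, 5)}
for df, (mean, p95) in results.items():
    print(f"df={df}: mean ~ {mean:.2f}, 95th percentile ~ {p95:.2f}")
```

The simulated mean should land near df, and the 95th percentile (an estimate of the .05 critical value) should grow with df.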

  7. Requirements for Chi-Square test • Must be a random sample from the population • Data must be raw frequencies (counts, not percentages) • Observations must be independent of one another • Categories for each variable must be mutually exclusive and exhaustive

  8. Using the Chi-Square Test • Often used with contingency tables (i.e., crosstabulations) • E.g., gender x race • Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table. • In this case, the null hypothesis is that there is no relationship between row and column frequencies.

  9. Practical Example: • Expected frequencies versus observed frequencies • General Social Survey Example

  10. ANOVA and the F-distribution Hypothesis testing between a 3+ category variable and a metric variable

  11. Analysis of Variance • In its simplest form, it is used to compare means for three or more categories. • Example: • Life Happiness scale and Marital Status (married, never married, divorced) • Relies on the F-distribution • Just like the t-distribution and chi-square distribution, there is a different sampling distribution for each possible value of df (the F-distribution has two df: between groups and within groups).

  12. What is ANOVA? • If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests. • The problem is that the 3 tests would not be independent of each other (each group's data is reused in two of the comparisons), and running several tests raises the chance of at least one false positive. • A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error)

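  13. The F-ratio • F = MS_bg / MS_wg • MS = mean square (a sum of squares divided by its df) • bg = between groups • wg = within groups • Numerator is the “effect” and denominator is the “error” • Numerator df = # of categories – 1 (k – 1) • Denominator df = total sample size – # of categories (N – k)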

  14. Between-Group Sum of Squares (Numerator) • Between-group sum of squares = total variability – residual variability • Total variability is quantified as the sum of the squares of the differences between each value and the grand mean. • Also called the total sum-of-squares • Variability within groups is quantified as the sum of squares of the differences between each value and its group mean. • Also called the residual sum-of-squares
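The sum-of-squares bookkeeping above can be sketched in a few lines. The function name `one_way_anova` and the happiness scores below are hypothetical values invented for illustration, loosely following the marital-status example; they are not the lecture's data.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA F-ratio from a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    # Total SS: squared distance of every value from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    # Within (residual) SS: squared distance of each value from its group mean
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    # Between SS = total variability - residual variability
    ss_between = ss_total - ss_within

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": ms_between / ms_within,
    }


# Hypothetical happiness scores by marital status
married = [8, 9, 7]
never_married = [6, 5, 7]
divorced = [4, 5, 3]
result = one_way_anova([married, never_married, divorced])
print(result["F"])  # 12.0
```

Here the total sum of squares is 30, of which 24 lies between groups and 6 within groups, giving F = (24 / 2) / (6 / 6) = 12.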

  15. Null Hypothesis in ANOVA • If there is no difference between the population means, then the between-group mean square should be about equal to the within-group mean square, so the F-ratio should be close to 1.
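One way to see what the null hypothesis implies is to simulate it. The sketch below is illustrative only: the population mean and standard deviation, group sizes, number of replications, and seed are all arbitrary assumptions. It draws three groups from the same population many times and averages the resulting F-ratios.

```python
import random


def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group MS over within-group MS."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


rng = random.Random(42)
f_values = []
for _ in range(2000):
    # All three groups share one population, so the null hypothesis is true
    groups = [[rng.gauss(50.0, 10.0) for _ in range(20)] for _ in range(3)]
    f_values.append(f_ratio(groups))

mean_f = sum(f_values) / len(f_values)
print(f"average F under the null ~ {mean_f:.2f}")
```

When the null hypothesis holds, the average F should sit close to 1. Because F is a ratio of two non-negative quantities, it can never be negative, and only unusually large values count as evidence against the null.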

  16. F-distribution • F-test is always a one-tailed test. • Why?

  17. Logic of the ANOVA • Conceptual Intro to ANOVA

  18. Bringing it all together: Choosing the appropriate bivariate statistic

  19. Reminder About Causality • Remember from earlier lectures: bivariate statistics do not test causal relationships, they only show that there is a relationship. • Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.

  20. Choosing the Appropriate Statistical Test • General rules for choosing a bivariate test: • Two categorical variables: Chi-Square (crosstabulations) • Two metric variables: Correlation • One 3+ category variable, one metric variable: ANOVA • One binary categorical variable, one metric variable: T-test
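The general rules above amount to a small lookup. The sketch below encodes them as a function; the function name `choose_bivariate_test`, its parameters, and the string labels are hypothetical conventions chosen for illustration.

```python
def choose_bivariate_test(kind_a, kind_b, n_categories=None):
    """Suggest a bivariate test from two variable types.

    kind_a, kind_b: "categorical" or "metric".
    n_categories: number of categories of the categorical variable,
    needed only when it is paired with a metric variable.
    """
    kinds = sorted([kind_a, kind_b])
    if kinds == ["categorical", "categorical"]:
        return "chi-square"
    if kinds == ["metric", "metric"]:
        return "correlation"
    if kinds == ["categorical", "metric"]:
        if n_categories is None or n_categories < 2:
            raise ValueError("n_categories must be 2 or more")
        # Binary grouping variable: t-test; 3+ categories: ANOVA
        return "t-test" if n_categories == 2 else "ANOVA"
    raise ValueError(f"unknown variable types: {kind_a!r}, {kind_b!r}")


print(choose_bivariate_test("categorical", "categorical"))             # chi-square
print(choose_bivariate_test("metric", "categorical", n_categories=3))  # ANOVA
```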

  21. Assignment #2 • Online (course website) • Due next Monday in class (April 10th)
