Analysis of Variance (ANOVA) I231B Quantitative Methods
Syllabus Changes • Thursday April 24th, Regression • April 29: Multivariate Regression • May 1: Regression Diagnostics • May 6th: Logistic Regression • May 8th: Display of some advanced topics; Course Review
Analysis of Variance • In its simplest form, it is used to compare the means of a metric variable across three or more categories. • Example: • Income (metric) and Marital Status (many categories) • Relies on the F-distribution • Just like the t-distribution and the chi-square distribution, the F-distribution is a family of sampling distributions: there is a different one for each combination of df (one df for the numerator, one for the denominator).
What is ANOVA? • If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run a t-test for each pair of categories (3 t-tests for 3 categories). • The problem is that the 3 tests would not be independent of each other: they reuse the same group data, so once two of the mean differences are known, the third is fully determined (B − C = (A − C) − (A − B)). • A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error).
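Comparing "between" and "within" variability rests on splitting the total variability into those two parts. A minimal sketch, assuming three small groups of made-up incomes (hypothetical data, in $1000s) rather than the class dataset:

```python
def group_mean(values):
    return sum(values) / len(values)

def sum_of_squares(groups):
    """Split total variability into between-group and within-group parts."""
    all_values = [x for g in groups for x in g]
    grand_mean = group_mean(all_values)
    means = [group_mean(g) for g in groups]
    # Total: each observation's squared deviation from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between groups (treatment + error): group means vs. grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups (error only): each observation vs. its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ss_total, ss_between, ss_within

# Hypothetical incomes for three marital-status categories
groups = [[30, 35, 40], [45, 50, 55], [35, 41, 47]]
total, between, within = sum_of_squares(groups)
print(total, between, within)  # 514.0 342.0 172.0
```

The two parts always add back up to the total (342 + 172 = 514), which is why ANOVA can treat them as competing explanations of the same variability.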
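The F-ratio • $F = \frac{MS_{bg}}{MS_{wg}}$ • MS = mean square (a sum of squares divided by its df) • bg = between groups, with $df_{bg} = k - 1$ (# of categories – 1) • wg = within groups, with $df_{wg} = N - k$ (total # of observations – # of categories)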
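The formula can be followed step by step in code. This is a sketch on hypothetical data (not the course's anova.do), computing each sum of squares, dividing by its df to get the mean squares, and taking their ratio:

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    k = len(groups)          # number of categories
    n = len(all_values)      # total number of observations
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_bg, df_wg = k - 1, n - k
    ms_bg = ss_bg / df_bg    # mean square between groups
    ms_wg = ss_wg / df_wg    # mean square within groups
    return ms_bg / ms_wg, df_bg, df_wg

# Hypothetical incomes (in $1000s) for three categories
groups = [[30, 35, 40], [45, 50, 55], [35, 41, 47]]
f, df_bg, df_wg = one_way_anova_f(groups)
print(f"F({df_bg}, {df_wg}) = {f:.3f}")  # F(2, 6) = 5.965
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` should report the same statistic along with a p-value; the hand-rolled version is only meant to make each term of the ratio visible.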
Interpreting the F-ratio • Generally, the F-ratio measures how different the group means are relative to the variability within each group. • Larger values of F indicate a greater likelihood that the differences between the means are not due to chance alone.
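Null Hypothesis in ANOVA • H0: all group means are equal ($\mu_1 = \mu_2 = \dots = \mu_k$). • If there is no difference between the means, then the between-group mean square should be approximately equal to the within-group mean square, so F should be close to 1.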
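One way to see why F hovers near 1 under H0 is to simulate it. The sketch below assumes (hypothetically) three groups of 10 drawn from the same normal population, so any difference in sample means is pure chance. Theory says the average F should be $df_{wg} / (df_{wg} - 2) = 27/25 = 1.08$, slightly above 1:

```python
import random

def f_ratio(groups):
    """One-way ANOVA F-ratio: MS_between / MS_within."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    k, n = len(groups), len(all_values)
    ms_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)) / (k - 1)
    ms_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - k)
    return ms_bg / ms_wg

rng = random.Random(1)
k, n_per_group, reps = 3, 10, 2000
# Under H0 every group comes from the same population (hypothetical: mean 50, sd 10)
f_values = [
    f_ratio([[rng.gauss(50, 10) for _ in range(n_per_group)] for _ in range(k)])
    for _ in range(reps)
]
average_f = sum(f_values) / reps
print(f"average F under H0: {average_f:.2f}")  # theory: 27 / 25 = 1.08
```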
Visual ANOVA and F-ratio • http://www.psych.utah.edu/stat/introstats/anovaflash.html
F-distribution • A right-skewed distribution • It is the ratio of two independent chi-square variables, each divided by its degrees of freedom
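The "ratio of two chi-squares" definition can be turned directly into a random-number generator. A minimal sketch, assuming Python's standard `random` module and the fact that a chi-square variable with d df is a Gamma(shape d/2, scale 2) variable; df = (2, 27) is an arbitrary illustrative choice:

```python
import random
from statistics import median

def draw_f(rng, d1, d2):
    """One draw from F(d1, d2) built from two independent chi-square variables."""
    # A chi-square variable with d df is a Gamma(shape=d/2, scale=2) variable
    chi2_num = rng.gammavariate(d1 / 2, 2)
    chi2_den = rng.gammavariate(d2 / 2, 2)
    return (chi2_num / d1) / (chi2_den / d2)

rng = random.Random(2)
draws = [draw_f(rng, 2, 27) for _ in range(50_000)]
mean_f = sum(draws) / len(draws)
median_f = median(draws)
# Right skew: a long upper tail pulls the mean above the median
print(f"mean = {mean_f:.3f}, median = {median_f:.3f}")
```

With these df the mean should land near 1.08 and the median near 0.71; changing either df changes the shape, which is why each df pair has its own F-distribution.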
F-distribution • In ANOVA, the F-test is always a one-tailed (upper-tail) test. • Why?
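Only an F that is larger than expected (between-group variability exceeding within-group variability) counts as evidence against H0; an F near 0 just means the group means are unusually close together. So the p-value is the area in the upper tail alone. The sketch below estimates that area by Monte Carlo for a hypothetical observed F of 5.965 on (2, 6) df, and checks it against the closed form $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, which holds only when $d_1 = 2$ (i.e., k = 3 groups). It illustrates the idea and is not how Stata computes p-values:

```python
import random

def f_upper_tail_p(f_observed, d1, d2, reps=100_000, seed=3):
    """Monte Carlo estimate of P(F >= f_observed) for F(d1, d2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # F draw = (chi-square_d1 / d1) / (chi-square_d2 / d2)
        f = (rng.gammavariate(d1 / 2, 2) / d1) / (rng.gammavariate(d2 / 2, 2) / d2)
        if f >= f_observed:  # only the upper tail counts as evidence against H0
            hits += 1
    return hits / reps

def f_upper_tail_exact_d1_2(f_observed, d2):
    """Exact upper-tail probability, valid only when d1 = 2."""
    return (1 + 2 * f_observed / d2) ** (-d2 / 2)

p_mc = f_upper_tail_p(5.965, 2, 6)
p_exact = f_upper_tail_exact_d1_2(5.965, 6)
print(f"Monte Carlo p = {p_mc:.4f}, exact p = {p_exact:.4f}")
```

For general df, `scipy.stats.f.sf(f_observed, d1, d2)` gives the exact upper-tail probability if SciPy is installed.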
Relationship to t-test • Why not just run t-tests between all possible pairs of categories? • As the number of comparisons grows, some "significant" differences become likely by chance alone, and they do not necessarily indicate an overall difference. • Still, t-tests become important after an ANOVA so that we can find out which pairs are significantly different. • Certain ‘corrections’ can be applied to such post-hoc t-tests to account for multiple comparisons (e.g., the Bonferroni correction, which divides the significance level α by the number of comparisons being made, or equivalently multiplies each p-value by that number)
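The arithmetic behind the multiple-comparisons problem and the Bonferroni fix is short. This sketch assumes the tests are independent when computing the chance of at least one false positive, $1 - (1 - \alpha)^m$; pairwise tests are not independent, but Bonferroni still keeps that chance at or below α. The three post-hoc p-values are hypothetical:

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and reject/keep decisions."""
    m = len(p_values)
    # Multiplying p by m (capped at 1) is equivalent to testing p against alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj <= alpha for p_adj in adjusted]

k = 5
m = k * (k - 1) // 2  # number of pairwise comparisons among k groups
print(m, round(familywise_error_rate(0.05, m), 3))  # 10 0.401

# Hypothetical post-hoc p-values for three pairwise t-tests
adjusted, reject = bonferroni([0.010, 0.030, 0.200])
print(reject)  # [True, False, False]
```

With 5 categories there are already 10 pairwise tests, and about a 40% chance of at least one spurious "significant" result if each is run at α = 0.05.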
Logic of the ANOVA • Conceptual Intro to ANOVA • Class Example: anova.do and sm96_compressed.dta