Chi-square Basics
The Chi-square distribution • Positively skewed but becomes symmetrical with increasing degrees of freedom • Mean = k where k = degrees of freedom • Variance = 2k • Assume a normally distributed population and sample a single squared z-score at a time: $\chi^2_{(1)} = z^2$ • If more than one independent z-score is sampled: $\chi^2_{(N)} = \sum_{i=1}^{N} z_i^2$
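A small simulation can make the mean and variance claims concrete. The sketch below is only an illustration, not part of the original slides: it builds chi-square values as sums of k squared standard normal draws and compares the simulated mean and variance with k and 2k. The seed, k = 4, the sample size, and the helper name `sample_chi_square` are arbitrary assumptions.

```python
import random
import statistics


def sample_chi_square(k, rng):
    """Draw one chi-square(k) value as the sum of k squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)   # fixed seed so the run is repeatable (arbitrary choice)
k = 4                    # degrees of freedom (arbitrary choice)
draws = [sample_chi_square(k, rng) for _ in range(20000)]

sim_mean = statistics.mean(draws)
sim_var = statistics.variance(draws)

# The simulated values should land close to the theoretical mean k and variance 2k.
print(f"mean: simulated {sim_mean:.2f}, theoretical {k}")
print(f"variance: simulated {sim_var:.2f}, theoretical {2 * k}")
```

Rerunning with a larger k should also show the histogram of `draws` becoming less skewed, in line with the first bullet.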
Why used? • Chi-square analysis is primarily used to deal with categorical (frequency) data • We measure the “goodness of fit” between our observed outcome and the expected outcome for some variable • With two variables, we test in particular whether they are independent of one another using the same basic approach.
One-dimensional • Suppose we want to know how people in a particular area will vote in general and go around asking them. • How will we go about seeing what’s really going on?
Hypothesis: Dems should win district • Solution: chi-square analysis to determine whether our outcome differs from what would be expected if there were no preference
Reject H0 • The district will probably vote Democratic • However…
Conclusion • Note that all we can really conclude is that our data differ from the outcome expected under the null hypothesis (no preference) • Although it would appear that the district will vote Democratic, we can only conclude that respondents were not responding by chance • Whichever party held the larger frequency, we would have obtained the same result • In other words, it is a non-directional test regardless of the prediction
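The one-dimensional test and its non-directional nature can be sketched in a few lines. The poll counts below (60 vs. 40) are hypothetical, since the original slide's table is not available, and the function name `chi_square_gof` is only an illustrative choice. The p-value shortcut used here holds for df = 1 only.

```python
import math


def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [60, 40]                 # hypothetical counts: Democratic, Republican
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # no preference: equal split

stat = chi_square_gof(observed, expected)
df = len(observed) - 1

# For df = 1 only, chi-square is a squared z, so P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

# Swapping the two counts leaves the statistic unchanged: the test is non-directional.
flipped = chi_square_gof(observed[::-1], expected)

print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
print(f"statistic with counts swapped = {flipped:.2f}")
```

With these made-up counts the statistic exceeds the df = 1 critical value of 3.84 at the .05 level, yet it would be identical had the Republican count been the larger one.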
More complex • What do stats kids do with their free time?
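Is there a relationship between gender and what the stats kids do with their free time? • Expected frequency for each cell: $E_{ij} = (R_i \times C_j)/N$, where $R_i$ is the row total, $C_j$ the column total, and N the overall total • Example for males watching TV: (100 × 50)/200 = 25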
df = (R-1)(C-1) • R = number of rows • C = number of columns
Interpretation • Reject H0, there is some relationship between gender and how stats students spend their free time
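The two-variable procedure (expected counts from row and column totals, then df = (R-1)(C-1)) can be sketched as follows. The observed table is hypothetical: it was chosen only so that the male row total is 100, the TV column total is 50, and N is 200, matching the worked example; the activity labels other than TV are invented. The exponential p-value shortcut is valid for df = 2 only.

```python
import math

# Hypothetical observed counts; columns: TV, study, other (labels partly invented).
observed = [
    [30, 40, 30],   # males   (row total 100)
    [20, 30, 50],   # females (row total 100)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 only, the upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)

print(f"expected count, males/TV = {expected[0][0]}")
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

For these invented counts the statistic is above the df = 2 critical value of 5.99 at the .05 level, so H0 of independence would be rejected.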
Other • Because the test is non-directional, the chi-square test by itself cannot speak to specific hypotheses about the direction in which the results would come out • For the same reason it is not well suited to ordinal data, since it ignores the ordering of the categories
Assumptions • Normality • Rule of thumb: each expected frequency should be at least 5 • Inclusion of non-occurrences • Must include all responses, not just the positive ones • Independence • Not that the variables are independent or related (that’s what the test can be used for), but rather, as with our t-tests, that the observations (data points) don’t have any bearing on one another • To help with the last two, make sure that your N equals the total number of people who responded
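Two of these checks are mechanical and can be sketched in code: whether every expected frequency reaches the rule-of-thumb minimum of 5, and whether the table total equals the number of respondents. The helper names, the small table, and the respondent count below are all hypothetical; independence of observations is a design question that code cannot verify.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_assumptions(table, n_respondents, min_expected=5):
    """Report the smallest expected count and whether N matches the respondents."""
    expected = expected_counts(table)
    smallest = min(min(row) for row in expected)
    total = sum(sum(row) for row in table)
    return {
        "smallest_expected": smallest,
        "expected_ok": smallest >= min_expected,
        "n_matches": total == n_respondents,
    }


# Hypothetical 2x2 table from 20 respondents; deliberately too small.
observed = [[3, 7], [2, 8]]
report = check_assumptions(observed, n_respondents=20)
print(report)
```

Here the table total matches the respondent count, but the smallest expected frequency falls below 5, so the rule of thumb is not met.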
Measures of Association • Contingency coefficient • Phi • Cramér’s phi (also known as Cramér’s V) • Odds ratios • Kappa • These were discussed in 5700
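Three of these measures are simple functions of the chi-square statistic, and a rough sketch using their standard textbook definitions is given below. The statistic, N, and table dimensions are hypothetical placeholder values, and the function names are illustrative. Odds ratios and kappa need the cell counts themselves and are not covered here.

```python
import math


def phi_coefficient(chi2, n):
    """Phi = sqrt(chi2 / N); usually reported for 2x2 tables."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (N * min(R - 1, C - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


def contingency_coefficient(chi2, n):
    """Contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical inputs: a chi-square of 8.43 from a 2x3 table with N = 200.
chi2, n, n_rows, n_cols = 8.43, 200, 2, 3

v = cramers_v(chi2, n, n_rows, n_cols)
c = contingency_coefficient(chi2, n)
phi = phi_coefficient(chi2, n)
print(f"phi = {phi:.3f}, Cramer's V = {v:.3f}, C = {c:.3f}")
```

When the smaller table dimension is 2, Cramér's V reduces to phi, which is why the first two values coincide in this example.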