Explore Chi-square test of independence with bivariate categorical data using examples to examine if variables are independent. Understand contingency tables, joint/marginal distributions, independence criteria, chi-square deviation, testing significance, and interpretation.
Set 7: Bivariate distribution and chi-square test of independence
Bivariate Categorical & Discrete Data • Example 1 • X = Smoking habit • Y = On-the-job-accident • Are smoking and accident occurrence independent? • Example 2 • X = Number of rooms in a house • Y = Number of bathrooms in a house • Are the two variables independent? • Example 3 • X = Number of children in a household • Y = Attitude toward a local proposition • Are the two variables independent?
Contingency Table • Cross-tabulation of individuals according to two characteristics • Example: Smoking and On-the-Job accident study • Data: Table of observed frequencies (observed counts), with rows for Smoking and columns for Accident
Table of proportions • Joint distribution f(x,y): observed probabilities • Marginal distribution of smoking f(x) • Marginal distribution of accident f(y) • Are smoking and accident occurrence independent?
Conditional distributions of smoking • Distribution of smoking among those with an accident (or with no accident) • Column percentages
Conditional distributions of accident • Distribution of accident in each level of smoking • Row percentages
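The quantities on the last three slides can be sketched in a few lines of Python. The counts below are hypothetical (they are not the data from the accident study) and the variable names are illustrative only. Rows are smoking habit and columns are accident occurrence.

```python
# Hypothetical observed counts, NOT the data from the slides.
# Rows: smoking habit (x); columns: on-the-job accident (y).
rows = ["smoker", "non-smoker"]
cols = ["accident", "no accident"]
counts = [[20, 30],
          [30, 120]]

n = sum(sum(row) for row in counts)            # total number of individuals
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Joint distribution f(x, y): each cell count divided by n
joint = [[c / n for c in row] for row in counts]

# Marginal distributions f(x) and f(y)
f_x = [r / n for r in row_totals]
f_y = [c / n for c in col_totals]

# Row percentages: conditional distribution of accident given smoking, f(y | x)
row_pct = [[c / row_totals[i] for c in row] for i, row in enumerate(counts)]

# Column percentages: conditional distribution of smoking given accident, f(x | y)
col_pct = [[c / col_totals[j] for j, c in enumerate(row)] for row in counts]

for name, table in [("joint", joint), ("row %", row_pct), ("column %", col_pct)]:
    print(name, table)
print("f(x) =", f_x, " f(y) =", f_y)
```

In this made-up table the row percentages differ between smokers and non-smokers, which is the kind of pattern that suggests the two variables may not be independent.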
Are the Two Variables Independent? • Example 1 • X = Smoking habit • Y = On-the-job-accident • Are smoking and accident occurrence independent? • X and Y independent when: • f(y|x) = f(y) for all values of x and y • f(x|y) = f(x) for all values of x and y • f(x,y) = f(x) f(y) for all values of x and y
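As a rough illustration of the third criterion, the sketch below checks whether a joint distribution equals the product of its marginals in every cell. The function name `is_independent` and both example tables are assumptions made for this example. The check applies to an exact (model) distribution; sample proportions almost never satisfy it exactly, which is why a significance test is needed.

```python
def is_independent(joint, tol=1e-9):
    """Return True if f(x, y) = f(x) f(y) holds in every cell (within tol)."""
    f_x = [sum(row) for row in joint]          # marginal of the row variable
    f_y = [sum(col) for col in zip(*joint)]    # marginal of the column variable
    return all(
        abs(joint[i][j] - f_x[i] * f_y[j]) <= tol
        for i in range(len(f_x))
        for j in range(len(f_y))
    )

# Hypothetical joint distribution built from the product rule
independent_joint = [[0.25 * 0.25, 0.25 * 0.75],
                     [0.75 * 0.25, 0.75 * 0.75]]

# Hypothetical joint distribution with the same marginals but dependent cells
dependent_joint = [[0.10, 0.15],
                   [0.15, 0.60]]

print(is_independent(independent_joint))
print(is_independent(dependent_joint))
```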
Independent model • Values of one variable do not give any information about the probability of the other variable • Conditional and marginal distributions are all equal • Example: Column percentages
Independent model • Conditional and marginal distributions are all equal • Example: Row percentages
Independent model • Product rule: f(x,y) = f(x) f(y), for all pairs (x,y) • Example: 0.24 × 0.52 = 0.1248 • Note that for all pairs (x,y): f(y|x) = f(y) and f(x|y) = f(x)
Independent model: Expected counts • Counts given by the independent model • Expected count (x,y) = n f(x) f(y), for all (x,y) • Example: 66 × 0.24 × 0.52 = 66 × 0.1248 ≈ 8.24
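The expected-count rule can be sketched as follows. Since f(x) = row total / n and f(y) = column total / n, the expected count n f(x) f(y) equals (row total × column total) / n. The helper name `expected_counts` and the 2×2 table are hypothetical; only the last line uses numbers from the slide.

```python
def expected_counts(counts):
    """Expected counts under independence: n * f(x) * f(y) for each cell."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    # n * (row_total / n) * (col_total / n) simplifies to row_total * col_total / n
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical observed counts, NOT the data from the slides
observed = [[20, 30],
            [30, 120]]
expected = expected_counts(observed)
print(expected)

# The single cell worked on the slide: n = 66, f(x) = 0.24, f(y) = 0.52
slide_cell = 66 * 0.24 * 0.52
print(round(slide_cell, 2))
```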
Deviation of data from independent model • For each pair (x,y), compute the deviation between the observed count and the expected count given by the independent model • Chi-square deviation = (Observed − Expected)² / Expected
Chi-square statistic • The chi-square statistic for testing independence is the sum of the chi-square deviations over all cells: X² = Σ (Observed − Expected)² / Expected • Is the discrepancy statistically significant? • When the counts are large, the distribution of the X² statistic is approximately a χ² distribution • df = (r−1)(c−1) • r = number of rows • c = number of columns • Approximation works well when the expected cell frequencies exceed 5
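A minimal sketch of the statistic and its degrees of freedom, assuming the usual form of the chi-square deviation, (observed − expected)² / expected. The function name `chi_square_statistic` and the counts are hypothetical.

```python
def chi_square_statistic(counts):
    """Return (X2, df) for a table of observed counts."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]

    x2 = 0.0
    for i, row in enumerate(counts):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # expected count under independence
            x2 += (obs - exp) ** 2 / exp              # this cell's chi-square deviation

    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return x2, df

# Hypothetical observed counts, NOT the data from the slides
x2, df = chi_square_statistic([[20, 30],
                               [30, 120]])
print(x2, df)
```

For this table the four cell contributions are 4.5, 1.5, 1.5 and 0.5, so the statistic is 8.0 with 1 degree of freedom.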
Test of independence • Is the discrepancy statistically significant? • Is X² above a threshold? • Select an upper-tail probability threshold α • α = .10, .05, .01 • α is called the significance level • Find the chi-square threshold from the table • Reject the independent model at level α when X² > the threshold
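The decision rule can be sketched with a small lookup table. The thresholds below are standard upper-tail chi-square critical values for df = 1 to 4; the dictionary `CRITICAL` and the function `reject_independence` are names invented for this example, and a full table or statistical software would be needed for other df or α.

```python
# Upper-tail chi-square critical values: CRITICAL[df][alpha]
CRITICAL = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
}

def reject_independence(x2, df, alpha=0.05):
    """Reject the independent model at level alpha when X2 exceeds the threshold."""
    threshold = CRITICAL[df][alpha]
    return x2 > threshold

# Hypothetical statistic X2 = 8.0 with df = 1
print(reject_independence(8.0, 1, 0.05))   # 8.0 > 3.841
print(reject_independence(8.0, 1, 0.01))   # 8.0 > 6.635
print(reject_independence(3.0, 1, 0.05))   # 3.0 < 3.841
```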
Example of a chi-square density • Upper 5th percentile of χ² • P(χ² > c) = 0.05 • The threshold c is read from the table using the df and the upper-tail probability 0.05
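For small tables the upper-tail probability has a closed form, which gives a quick way to confirm table values. The sketch below covers only df = 1 and df = 2, using the identities P(χ² > c) = erfc(√(c/2)) for df = 1 and P(χ² > c) = exp(−c/2) for df = 2; the function name `upper_tail` is an assumption for this example.

```python
import math

def upper_tail(c, df):
    """P(chi-square > c) for df = 1 or df = 2 only (closed-form cases)."""
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal variable
        return math.erfc(math.sqrt(c / 2))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2
        return math.exp(-c / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Table thresholds for the upper 5 percent tail
p1 = upper_tail(3.841, 1)
p2 = upper_tail(5.991, 2)
print(round(p1, 3), round(p2, 3))
```

Both probabilities come out very close to 0.05, matching the table entries for the upper 5th percentile.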
MINITAB computation • Frequency table in worksheet • Stat >> Tables >> Chi-square Test • Individual data in worksheet (coded) • Stat >> Tables >> Cross tabulation and Chi-square • Row • Column • Chi-square • Chi-square analysis • Expected cell counts • Each cell’s contribution to the Chi-square statistic