200 likes | 323 Views
Stats 330: Lecture 31. Two contingency table examples. Example 1. A (German) company publishing a women’s magazine surveys its readers aged 18-49, receiving 941 responses. Questions asked were Are you a regular reader (Yes/No) Your Age (18-29, 30-39, 40-49)
E N D
Stats 330: Lecture 31 Two contingency table examples
Example 1 • A (German) company publishing a women’s magazine surveys its readers aged 18-49, receiving 941 responses. Questions asked were • Are you a regular reader (Yes/No) • Your Age (18-29, 30-39, 40-49) • Your education level (L1, L2, L3, L4)
data RegularReader Age Education Freqs 1 No 18-29 L1 38 2 Yes 18-29 L1 4 3 No 30-39 L1 60 4 Yes 30-39 L1 10 5 No 40-49 L1 87 6 Yes 40-49 L1 12 7 No 18-29 L2 90 8 Yes 18-29 L2 44 9 No 30-39 L2 125 10 Yes 30-39 L2 43 11 No 40-49 L2 103 12 Yes 40-49 L2 26 13 No 18-29 L3 54 14 Yes 18-29 L3 39 15 No 30-39 L3 69 . . . . + other lines, 24 in all …………………………………….
Cross-tabs > my.table=xtabs(Freqs~RegularReader+Age+Education, data=reader.df) > my.table , , Education = L1 Age RegularReader 18-29 30-39 40-49 Yes 4 10 12 No 38 60 87 , , Education = L2 Age RegularReader 18-29 30-39 40-49 Yes 44 43 26 No 90 125 103 ,, Education = L3 Age RegularReader 18-29 30-39 40-49 Yes 39 25 13 No 54 69 26 , , Education = L4 Age RegularReader 18-29 30-39 40-49 Yes 22 9 2 No 16 14 10
Fitting models > model1 = glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df) > anova(model1, test="Chi") Analysis of Deviance Table Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 23 639.39 Age 2 9.346 21 630.05 0.009346 ** Education 3 287.017 18 343.03 < 2.2e-16 *** RegularReader 1 217.035 17 125.99 < 2.2e-16 *** Age:Education 6 64.264 11 61.73 6.097e-12 *** Age:RegularReader 2 21.402 9 40.33 2.253e-05 *** Education:RegularReader 3 32.788 6 7.54 3.570e-07 *** Age:Education:RegularReader 6 7.541 0 0.00 0.273671 Suggests homogeneous association model
AIC > AIC(glm(Freqs~Age+Education+RegularReader, family=poisson, data=reader.df)) [1] 262.1496 > AIC(glm(Freqs~Age+Education*RegularReader, family=poisson, data=reader.df)) [1] 224.6304 > AIC(glm(Freqs~Age*Education+RegularReader, family=poisson, data=reader.df)) [1] 209.8855 > AIC(glm(Freqs~Age*RegularReader+Education, family=poisson, data=reader.df)) [1] 244.7481 > AIC(glm(Freqs~Age*Education+Age*RegularReader, family=poisson,data=reader.df)) [1] 192.4839 >AIC(glm(Freqs~Age*Education+Education*RegularReader,family=poisson,data=reader.df)) [1] 172.3663 > AIC(glm(Freqs~Age*RegularReader+Education*RegularReader,family=poisson, data=reader.df)) [1] 207.2289 > AIC(glm(Freqs~(Age+Education+RegularReader)^2, family=poisson, data=reader.df)) [1] 165.696 > AIC(glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df)) [1] 170.1547
Step Call: glm(formula = Freqs ~ Age + Education + RegularReader + Age:Education + Age:RegularReader + Education:RegularReader, family = poisson, data = reader.df) Coefficients: (Intercept) Age30-39 Age40-49 1.9919 0.1332 0.3399 EducationL2 EducationL3 EducationL4 1.8048 1.6382 0.9733 RegularReaderNo Age30-39:EducationL2 Age40-49:EducationL2 1.5540 -0.2226 -0.8150 Age30-39:EducationL3 Age40-49:EducationL3 Age30-39:EducationL4 -0.4080 -1.6065 -0.8759 Age40-49:EducationL4 Age30-39:RegularReadNo Age40-49:RegularReadNo -1.8305 0.4420 0.5995 EducationL2:RegularReaderNo EducationL3:RegularReadNo EducationL4:RegularReadNo -0.8570 -1.1716 -1.5961 Degrees of Freedom: 23 Total (i.e. Null); 6 Residual Null Deviance: 639.4 Residual Deviance: 7.541 AIC: 165.7 All methods agree: homogeneous association model!
Odds ratios: saturated model , , Education = L1 Age RegularReader 18-29 30-39 40-49 Yes 4 10 12 No 38 60 87 L1 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 4*87/(38*12) = 0.7631579 L2 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 1.936752 L3 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 1.444444 L4 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 6.87500
Odds ratios: saturated model Estimate Std Error Age30-39:RegularReaderNo -0.4595 0.6269 Age40-49:RegularReaderNo -0.2703 0.6092 EducationL2:RegularReaderNo -1.5357 0.5569 EducationL3:RegularReaderNo -1.9259 0.5661 EducationL4:RegularReaderNo -2.5697 0.6199 Age30-39:EducationL2:RegularReadNo 0.811 0.6768 Age40-49:EducationL2:RegularReadNo 0.9313 0.6732 Age30-39:EducationL3:RegularReadNo 1.1493 0.7012 Age40-49:EducationL3:RegularReadNo 0.6380 0.7285 Age30-39:EducationL4:RegularReadNo 1.2198 0.8267 Age40-49:EducationL4:RegularReadNo 2.1982 1.0388 > exp(-0.2703) > exp(-0.2703+0.6380) [1] 0.7631505 [1] 1.444409 > exp(-0.2703+0.9313) > exp(-0.2703+ 2.1982) [1] 1.936728 [1] 6.875057
Odds ratios: Homogeneous association From fitting the homogeneous association model: Estimate Std Error Age30-39:RegularReaderNo 0.4420 0.1752 Age40-49:RegularReaderNo 0.5995 0.2013 common estimate (Odds RR Yes for 18-29)/(odds RR Yes for 40-49) > exp(0.5995) [1] 1.821208 odds of being a regular reader 1.8 times higher for 18-29 age group than for 40-49 CI is exp(0.5995 + c(-1,1)*1.96*0.2013) (1.227466, 2.702151)
Example 2: Hair-eye colour 593 students at the University of Delaware classified by sex, eye colour and hair colour. Factors and levels: Sex: male, female Eye colour: Brown, Blue, Hazel, Green Hair Colour: black, brown, red, blond.
data In the form of an array HairEyeColor: , , Sex = Male Eye Hair Brown Blue Hazel Green Black 32 11 10 3 Brown 53 50 25 15 Red 10 10 7 7 Blond 3 30 5 8 , , Sex = Female Eye Hair Brown Blue Hazel Green Black 36 9 5 2 Brown 66 34 29 14 Red 16 7 7 7 Blond 4 64 5 8
Convert to data frame • > HEC.df = as.data.frame(HairEyeColor) • > HEC.df • Hair Eye Sex Freq • 1 Black Brown Male 32 • 2 Brown Brown Male 53 • 3 Red Brown Male 10 • 4 Blond Brown Male 3 • 5 Black Blue Male 11 • 6 Brown Blue Male 50 • 7 Red Blue Male 10 • 8 Blond Blue Male 30 • Black Hazel Male 10 • . . . More lines (32 in all)
Anova > model1 = glm(Freq~Hair*Eye*Sex, family=poisson, data=HEC.df) > anova(model1, test="Chi") Analysis of Deviance Table Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 31 475.12 Hair 3 165.592 28 309.53 < 2e-16 *** Eye 3 141.272 25 168.25 < 2e-16 *** Sex 1 1.954 24 166.30 0.16218 Hair:Eye 9 146.444 15 19.86 < 2e-16 *** Hair:Sex 3 8.093 12 11.76 0.04413 * Eye:Sex 3 5.002 9 6.76 0.17162 Hair:Eye:Sex 9 6.761 0 0.00 0.66196
Step • > step(model1, formula(model1), direction = "back") • Call: glm(formula = Freq ~ Hair + Eye + Sex + Hair:Eye + Hair:Sex, • family = poisson, data = HEC.df) • Coefficients: • (Intercept) HairBrown HairRed HairBlond • 3.56273 0.52325 -1.04095 -2.63236 • EyeBlue EyeHazel EyeGreen SexFemale • -1.22378 -1.51146 -2.61007 -0.07411 • HairBrown:EyeBlue HairRed:EyeBlue HairBlond:EyeBlue HairBrown:EyeHazel • 0.87547 0.79889 3.82116 0.72132 • HairRed:EyeHazel HairBlond:EyeHazel HairBrown:EyeGreen HairRed:EyeGreen • 0.89242 1.86813 1.19824 1.99103 • HairBlond:EyeGreen HairBrown:SexFemale HairRed:SexFemale HairBlond:SexFemale • 3.43675 0.07411 0.15867 0.63992 • Degrees of Freedom: 31 Total (i.e. Null); 12 Residual • Null Deviance: 475.1 • Residual Deviance: 11.76 AIC: 190.6 Suggests eye and sex independent given hair
AIC – fancy code formula.list = vector(length=9, mode="list") formula.list[[1]] = Freq ~ Hair + Eye + Sex formula.list[[2]] = Freq ~ Hair + Eye * Sex formula.list[[3]] = Freq ~ Hair * Eye + Sex formula.list[[4]] = Freq ~ Hair * Sex + Eye formula.list[[5]] = Freq ~ Hair*Eye + Hair*Sex formula.list[[6]] = Freq ~ Hair*Sex + Eye*Sex formula.list[[7]] = Freq ~ Hair*Eye + Sex*Eye formula.list[[8]] = Freq ~ (Hair + Eye + Sex)^2 formula.list[[9]] = Freq ~ Hair*Eye*Sex AIC.vec = numeric(9) for(i in 1:9){ model = glm(formula.list[[i]], family=poisson,data=HEC.df) AIC.vec[i] = AIC(model) }
AIC’s for different models > data.frame(as.character(formula.list), AIC.vec) as.character.formula.list. AIC.vec 1 Freq ~ Hair + Eye + Sex 321.1789 2 Freq ~ Hair + Eye * Sex 325.6495 3 Freq ~ Hair * Eye + Sex 192.7353 4 Freq ~ Hair * Sex + Eye 319.0860 5 Freq ~ Hair * Eye + Hair * Sex 190.6425 6 Freq ~ Hair * Sex + Eye * Sex 323.5566 7 Freq ~ Hair * Eye + Sex * Eye 197.2059 8 Freq ~ (Hair + Eye + Sex)^2 191.6400 9 Freq ~ Hair * Eye * Sex 202.8787 Confirms eye colour and sex independent, given hair colour
Marginal independence? Are eye colour and sex independent, ignoring hair colour (in all hair colours combined)? > model4 = glm(formula = Freq ~ Eye*Sex, family = poisson, data = HEC.df) > anova(model4, test="Chi") Analysis of Deviance Table Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 31 475.12 Eye 3 141.272 28 333.85 <2e-16 *** Sex 1 1.954 27 331.89 0.1622 Eye:Sex 3 1.529 24 330.36 0.6755 No evidence of interaction, hence eye colour and sex unconditionally independent as well
Marginal odds ratios > xtabs(Freq~Eye+Sex, data=HEC.df) Sex Eye Male Female Brown 98 122 Blue 101 114 Hazel 47 46 Green 33 31 Unconditional OR's (Brown/blue)male / ((Brown/blue)female is 98*114/(101*122) = 0.906671 (Brown/hazel)male / ((Brown/hazel)female is 98*46/(47*122) = 0.7861877 (Brown/Green)male / ((Brown/Green)female is 98*31/(33*122) = 0.7545951
Same calculation from summary summary(glm(Freq ~ Eye*Sex, family=poisson, data=HEC.df)) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.19867 0.10102 31.665 < 2e-16 *** EyeBlue 0.03015 0.14179 0.213 0.832 EyeHazel -0.73482 0.17743 -4.142 3.45e-05 *** EyeGreen -1.08846 0.20126 -5.408 6.37e-08 *** SexFemale 0.21905 0.13565 1.615 0.106 EyeBlue:SexFemale -0.097980.19255 -0.509 0.611 EyeHazel:SexFemale -0.24056 0.24782 -0.971 0.332 EyeGreen:SexFemale -0.28157 0.28454 -0.990 0.322 Conf interval for OR Brown/blue)male / ((Brown/blue)female: > exp(-0.09798) [1] 0.906667 > exp(-0.09798+c(-1,1)*1.96*0.19255) [1] 0.62165 1.32236