1 / 20

Stats 330: Lecture 31

Stats 330: Lecture 31. Two contingency table examples. Example 1. A (German) company publishing a women’s magazine surveys its readers aged 18-49, receiving 941 responses. Questions asked were Are you a regular reader (Yes/No) Your Age (18-29, 30-39, 40-49)

Download Presentation

Stats 330: Lecture 31

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Stats 330: Lecture 31 Two contingency table examples

  2. Example 1 • A (German) company publishing a women’s magazine surveys its readers aged 18-49, receiving 941 responses. Questions asked were • Are you a regular reader (Yes/No) • Your Age (18-29, 30-39, 40-49) • Your education level (L1, L2, L3, L4)

  3. data RegularReader Age Education Freqs 1 No 18-29 L1 38 2 Yes 18-29 L1 4 3 No 30-39 L1 60 4 Yes 30-39 L1 10 5 No 40-49 L1 87 6 Yes 40-49 L1 12 7 No 18-29 L2 90 8 Yes 18-29 L2 44 9 No 30-39 L2 125 10 Yes 30-39 L2 43 11 No 40-49 L2 103 12 Yes 40-49 L2 26 13 No 18-29 L3 54 14 Yes 18-29 L3 39 15 No 30-39 L3 69 . . . . + other lines, 24 in all …………………………………….

  4. Cross-tabs > my.table=xtabs(Freqs~RegularReader+Age+Education, data=reader.df) > my.table , , Education = L1 Age RegularReader 18-29 30-39 40-49 Yes 4 10 12 No 38 60 87 , , Education = L2 Age RegularReader 18-29 30-39 40-49 Yes 44 43 26 No 90 125 103 ,, Education = L3 Age RegularReader 18-29 30-39 40-49 Yes 39 25 13 No 54 69 26 , , Education = L4 Age RegularReader 18-29 30-39 40-49 Yes 22 9 2 No 16 14 10

  5. Fitting models > model1 = glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df) > anova(model1, test="Chi") Analysis of Deviance Table Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 23 639.39 Age 2 9.346 21 630.05 0.009346 ** Education 3 287.017 18 343.03 < 2.2e-16 *** RegularReader 1 217.035 17 125.99 < 2.2e-16 *** Age:Education 6 64.264 11 61.73 6.097e-12 *** Age:RegularReader 2 21.402 9 40.33 2.253e-05 *** Education:RegularReader 3 32.788 6 7.54 3.570e-07 *** Age:Education:RegularReader 6 7.541 0 0.00 0.273671 Suggests homogeneous association model

  6. AIC > AIC(glm(Freqs~Age+Education+RegularReader, family=poisson, data=reader.df)) [1] 262.1496 > AIC(glm(Freqs~Age+Education*RegularReader, family=poisson, data=reader.df)) [1] 224.6304 > AIC(glm(Freqs~Age*Education+RegularReader, family=poisson, data=reader.df)) [1] 209.8855 > AIC(glm(Freqs~Age*RegularReader+Education, family=poisson, data=reader.df)) [1] 244.7481 > AIC(glm(Freqs~Age*Education+Age*RegularReader, family=poisson,data=reader.df)) [1] 192.4839 >AIC(glm(Freqs~Age*Education+Education*RegularReader,family=poisson,data=reader.df)) [1] 172.3663 > AIC(glm(Freqs~Age*RegularReader+Education*RegularReader,family=poisson, data=reader.df)) [1] 207.2289 > AIC(glm(Freqs~(Age+Education+RegularReader)^2, family=poisson, data=reader.df)) [1] 165.696 > AIC(glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df)) [1] 170.1547

  7. Step Call: glm(formula = Freqs ~ Age + Education + RegularReader + Age:Education + Age:RegularReader + Education:RegularReader, family = poisson, data = reader.df) Coefficients: (Intercept) Age30-39 Age40-49 1.9919 0.1332 0.3399 EducationL2 EducationL3 EducationL4 1.8048 1.6382 0.9733 RegularReaderNo Age30-39:EducationL2 Age40-49:EducationL2 1.5540 -0.2226 -0.8150 Age30-39:EducationL3 Age40-49:EducationL3 Age30-39:EducationL4 -0.4080 -1.6065 -0.8759 Age40-49:EducationL4 Age30-39:RegularReadNo Age40-49:RegularReadNo -1.8305 0.4420 0.5995 EducationL2:RegularReaderNo EducationL3:RegularReadNo EducationL4:RegularReadNo -0.8570 -1.1716 -1.5961 Degrees of Freedom: 23 Total (i.e. Null); 6 Residual Null Deviance: 639.4 Residual Deviance: 7.541 AIC: 165.7 All methods agree: homogeneous association model!

  8. Odds ratios: saturated model , , Education = L1 Age RegularReader 18-29 30-39 40-49 Yes 4 10 12 No 38 60 87 L1 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 4*87/(38*12) = 0.7631579 L2 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 1.936752 L3 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 1.444444 L4 table: OR (Odds Yes for 18-29)/(odds Yes for 40-49) = 6.87500

  9. Odds ratios: saturated model Estimate Std Error Age30-39:RegularReaderNo -0.4595 0.6269 Age40-49:RegularReaderNo -0.2703 0.6092 EducationL2:RegularReaderNo -1.5357 0.5569 EducationL3:RegularReaderNo -1.9259 0.5661 EducationL4:RegularReaderNo -2.5697 0.6199 Age30-39:EducationL2:RegularReadNo 0.811 0.6768 Age40-49:EducationL2:RegularReadNo 0.9313 0.6732 Age30-39:EducationL3:RegularReadNo 1.1493 0.7012 Age40-49:EducationL3:RegularReadNo 0.6380 0.7285 Age30-39:EducationL4:RegularReadNo 1.2198 0.8267 Age40-49:EducationL4:RegularReadNo 2.1982 1.0388 > exp(-0.2703) > exp(-0.2703+0.6380) [1] 0.7631505 [1] 1.444409 > exp(-0.2703+0.9313) > exp(-0.2703+ 2.1982) [1] 1.936728 [1] 6.875057

  10. Odds ratios: Homogeneous association From fitting the homogeneous association model: Estimate Std Error Age30-39:RegularReaderNo 0.4420 0.1752 Age40-49:RegularReaderNo 0.5995 0.2013 common estimate (Odds RR Yes for 18-29)/(odds RR Yes for 40-49) > exp(0.5995) [1] 1.821208 odds of being a regular reader 1.8 times higher for 18-29 age group than for 40-49 CI is exp(0.5995 + c(-1,1)*1.96*0.2013) (1.227466, 2.702151)

  11. Example 2: Hair-eye colour 593 students at the University of Delaware classified by sex, eye colour and hair colour. Factors and levels: Sex: male, female Eye colour: Brown, Blue, Hazel, Green Hair Colour: black, brown, red, blond.

  12. data In the form of an array HairEyeColor: , , Sex = Male Eye Hair Brown Blue Hazel Green Black 32 11 10 3 Brown 53 50 25 15 Red 10 10 7 7 Blond 3 30 5 8 , , Sex = Female Eye Hair Brown Blue Hazel Green Black 36 9 5 2 Brown 66 34 29 14 Red 16 7 7 7 Blond 4 64 5 8

  13. Convert to data frame • > HEC.df = as.data.frame(HairEyeColor) • > HEC.df • Hair Eye Sex Freq • 1 Black Brown Male 32 • 2 Brown Brown Male 53 • 3 Red Brown Male 10 • 4 Blond Brown Male 3 • 5 Black Blue Male 11 • 6 Brown Blue Male 50 • 7 Red Blue Male 10 • 8 Blond Blue Male 30 • Black Hazel Male 10 • . . . More lines (32 in all)

  14. Anova > model1 = glm(Freq~Hair*Eye*Sex, family=poisson, data=HEC.df) > anova(model1, test="Chi") Analysis of Deviance Table Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 31 475.12 Hair 3 165.592 28 309.53 < 2e-16 *** Eye 3 141.272 25 168.25 < 2e-16 *** Sex 1 1.954 24 166.30 0.16218 Hair:Eye 9 146.444 15 19.86 < 2e-16 *** Hair:Sex 3 8.093 12 11.76 0.04413 * Eye:Sex 3 5.002 9 6.76 0.17162 Hair:Eye:Sex 9 6.761 0 0.00 0.66196

  15. Step • > step(model1, formula(model1), direction = "back") • Call: glm(formula = Freq ~ Hair + Eye + Sex + Hair:Eye + Hair:Sex, • family = poisson, data = HEC.df) • Coefficients: • (Intercept) HairBrown HairRed HairBlond • 3.56273 0.52325 -1.04095 -2.63236 • EyeBlue EyeHazel EyeGreen SexFemale • -1.22378 -1.51146 -2.61007 -0.07411 • HairBrown:EyeBlue HairRed:EyeBlue HairBlond:EyeBlue HairBrown:EyeHazel • 0.87547 0.79889 3.82116 0.72132 • HairRed:EyeHazel HairBlond:EyeHazel HairBrown:EyeGreen HairRed:EyeGreen • 0.89242 1.86813 1.19824 1.99103 • HairBlond:EyeGreen HairBrown:SexFemale HairRed:SexFemale HairBlond:SexFemale • 3.43675 0.07411 0.15867 0.63992 • Degrees of Freedom: 31 Total (i.e. Null); 12 Residual • Null Deviance: 475.1 • Residual Deviance: 11.76 AIC: 190.6 Suggests eye and sex independent given hair

  16. AIC – fancy code formula.list = vector(length=9, mode="list") formula.list[[1]] = Freq ~ Hair + Eye + Sex formula.list[[2]] = Freq ~ Hair + Eye * Sex formula.list[[3]] = Freq ~ Hair * Eye + Sex formula.list[[4]] = Freq ~ Hair * Sex + Eye formula.list[[5]] = Freq ~ Hair*Eye + Hair*Sex formula.list[[6]] = Freq ~ Hair*Sex + Eye*Sex formula.list[[7]] = Freq ~ Hair*Eye + Sex*Eye formula.list[[8]] = Freq ~ (Hair + Eye + Sex)^2 formula.list[[9]] = Freq ~ Hair*Eye*Sex AIC.vec = numeric(9) for(i in 1:9){ model = glm(formula.list[[i]], family=poisson,data=HEC.df) AIC.vec[i] = AIC(model) }

  17. AIC’s for different models > data.frame(as.character(formula.list), AIC.vec) as.character.formula.list. AIC.vec 1 Freq ~ Hair + Eye + Sex 321.1789 2 Freq ~ Hair + Eye * Sex 325.6495 3 Freq ~ Hair * Eye + Sex 192.7353 4 Freq ~ Hair * Sex + Eye 319.0860 5 Freq ~ Hair * Eye + Hair * Sex 190.6425 6 Freq ~ Hair * Sex + Eye * Sex 323.5566 7 Freq ~ Hair * Eye + Sex * Eye 197.2059 8 Freq ~ (Hair + Eye + Sex)^2 191.6400 9 Freq ~ Hair * Eye * Sex 202.8787 Confirms eye colour and sex independent, given hair colour

  18. Marginal independence? Are eye colour and sex independent, ignoring hair colour (in all hair colours combined)? > model4 = glm(formula = Freq ~ Eye*Sex, family = poisson, data = HEC.df) > anova(model4, test="Chi") Analysis of Deviance Table Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 31 475.12 Eye 3 141.272 28 333.85 <2e-16 *** Sex 1 1.954 27 331.89 0.1622 Eye:Sex 3 1.529 24 330.36 0.6755 No evidence of interaction, hence eye colour and sex unconditionally independent as well

  19. Marginal odds ratios > xtabs(Freq~Eye+Sex, data=HEC.df) Sex Eye Male Female Brown 98 122 Blue 101 114 Hazel 47 46 Green 33 31 Unconditional OR's (Brown/blue)male / ((Brown/blue)female is 98*114/(101*122) = 0.906671 (Brown/hazel)male / ((Brown/hazel)female is 98*46/(47*122) = 0.7861877 (Brown/Green)male / ((Brown/Green)female is 98*31/(33*122) = 0.7545951

  20. Same calculation from summary summary(glm(Freq ~ Eye*Sex, family=poisson, data=HEC.df)) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.19867 0.10102 31.665 < 2e-16 *** EyeBlue 0.03015 0.14179 0.213 0.832 EyeHazel -0.73482 0.17743 -4.142 3.45e-05 *** EyeGreen -1.08846 0.20126 -5.408 6.37e-08 *** SexFemale 0.21905 0.13565 1.615 0.106 EyeBlue:SexFemale -0.097980.19255 -0.509 0.611 EyeHazel:SexFemale -0.24056 0.24782 -0.971 0.332 EyeGreen:SexFemale -0.28157 0.28454 -0.990 0.322 Conf interval for OR Brown/blue)male / ((Brown/blue)female: > exp(-0.09798) [1] 0.906667 > exp(-0.09798+c(-1,1)*1.96*0.19255) [1] 0.62165 1.32236

More Related