Model Selections and Comparisons
(Categorical Data Analysis, Ch 9.2)
Yumi Kubo, Alvin Hsieh
Survey Data
• Conducted in 1992 by Wright State University School of Medicine and United Health Services in Dayton, Ohio
• 2276 students in the last year of high school (nonurban area)
• We add more dimensions to the example in 8.2.4
• Variables: Alcohol (A), Cigarette (C), Marijuana (M)
• Added variables: Gender (G), Race (R)
Association Graphs (Definitions)
• association graph - a set of vertices, one vertex for each variable
• edge - joins two vertices and represents a conditional association between those 2 variables
• path - a sequence of edges leading from one variable to another
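The three definitions above can be sketched in base R. This is only an illustration: the edge set is an assumption, taken from the two-factor terms AC, AM, CM, AG, AR, GM, GR that appear in one of the fitted models in this deck, and `has_path` is a hypothetical helper written for this example, not part of any package.

```r
# Vertices: the five survey variables.
vertices <- c("A", "C", "M", "G", "R")

# Edges (assumed for illustration): one row per two-factor term.
edges <- rbind(c("A", "C"), c("A", "M"), c("C", "M"), c("A", "G"),
               c("A", "R"), c("G", "M"), c("G", "R"))

# Symmetric adjacency matrix: adj[i, j] is TRUE when an edge joins i and j.
adj <- matrix(FALSE, nrow = 5, ncol = 5, dimnames = list(vertices, vertices))
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE

# Hypothetical helper: is there a path from `from` to `to` that avoids
# every vertex listed in `removed`? Uses a breadth-first search.
has_path <- function(adj, from, to, removed = character(0)) {
  keep <- setdiff(rownames(adj), removed)
  if (!(from %in% keep) || !(to %in% keep)) return(FALSE)
  visited <- from
  frontier <- from
  while (length(frontier) > 0) {
    neighbours <- keep[colSums(adj[frontier, keep, drop = FALSE]) > 0]
    frontier <- setdiff(neighbours, visited)
    visited <- c(visited, frontier)
  }
  to %in% visited
}

adj["C", "R"]                                      # is there a direct edge C-R?
has_path(adj, "C", "R")                            # is there any path from C to R?
has_path(adj, "C", "R", removed = c("A", "M"))     # any path avoiding A and M?
```

With this assumed edge set, C and R share no edge but are still joined by a path through A; every such path passes through A or M.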
Association Graphs (Saturated)
[Figure: association graph on the vertices M, G, R, C, A, with callouts labelling a variable, a path, and a conditional association]
Association Graphs (Reduced)
[Figure: association graph on the vertices M, G, R, C, A]
Data Set
                                    Marijuana Use
                   ================================================
                         Race = White            Race = Other
                   ========================================== =====
                      Female        Male        Female        Male
Alcohol  Cigarette    yes    no   yes    no   yes    no   yes    no
yes      yes          405   268   453   228    23    23    30    19
yes      no            13   218    28   201     2    19     1    18
no       yes            1    17     1    17     0     1     1     8
no       no             1   117     1   133     0    12     0    17
SAS Program
Too large to place here: Go to survey.sas
R Program
Original code (modified below): http://math.cl.uh.edu/~thompsonla/RCode.txt

survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"),
              alcohol   = c("Yes", "No"),
              marijuana = c("Yes", "No"),
              gender    = c("female", "male"),
              race      = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

library(MASS)  # provides stepAIC()

# mutual independence + GR
fit.GR <- glm(count ~ . + gender*race, data = survey, family = poisson)
# homogeneous association
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)
# all three factor terms
fit.3fact <- glm(count ~ .^3, data = survey, family = poisson)

# backward selection from the homogeneous association model;
# the main effects and GR are always kept
res <- stepAIC(fit.homog.assoc,
               scope = list(lower = ~ cigarette + alcohol + marijuana + gender*race),
               direction = "backward")
summary(res)

fit.AC.AM.CM.AG.AR.GM.GR.MR <- res
fit.AC.AM.CM.AG.AR.GM.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR.MR, ~ . - marijuana:race)
fit.AC.AM.CM.AG.AR.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR, ~ . - marijuana:gender)
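The G2 value and degrees of freedom of each fit can be read off with `deviance()` and `df.residual()`. The following is a minimal sketch that rebuilds the data frame and the three basic fits so it runs on its own; the names `fits` and `g2_table` are introduced here for illustration only.

```r
# Rebuild the survey data (same cell order as in the R program slide).
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"),
              alcohol   = c("Yes", "No"),
              marijuana = c("Yes", "No"),
              gender    = c("female", "male"),
              race      = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

# The three basic loglinear models, fitted as Poisson GLMs.
fits <- list(
  "mutual independence + GR" = glm(count ~ . + gender*race, data = survey, family = poisson),
  "homogeneous association"  = glm(count ~ .^2, data = survey, family = poisson),
  "all three-factor terms"   = glm(count ~ .^3, data = survey, family = poisson))

# For a Poisson loglinear model the residual deviance is the G2 statistic.
g2_table <- data.frame(
  model = names(fits),
  G2    = sapply(fits, deviance),
  df    = sapply(fits, df.residual),
  row.names = NULL)
g2_table$p_value <- pchisq(g2_table$G2, g2_table$df, lower.tail = FALSE)

print(g2_table)
```

Here the p-value is the goodness-of-fit test of each model against the saturated model, so a small value indicates a poor fit.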
R Program (P-values)
# each line tests the change in G2 between two successive models on 1 df
1 - pchisq(15.8 - 15.3, 1)
1 - pchisq(16.7 - 15.8, 1)
1 - pchisq(19.9 - 16.7, 1)
1 - pchisq(28.8 - 19.9, 1)
1 - pchisq(40.3 - 28.8, 1)
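The same five tests can be computed in one vectorized call. This is a sketch using the G2 values quoted on this slide; `lower.tail = FALSE` is the built-in way to get the upper tail and is equivalent to `1 - pchisq(...)`. The variable names and the 0.05 cutoff column are illustrative choices.

```r
# G2 values of the successive models, as quoted on the slide.
g2 <- c(15.3, 15.8, 16.7, 19.9, 28.8, 40.3)

# Change in G2 from each model to the next; each step removes one term (1 df).
delta_g2 <- diff(g2)

# Upper-tail chi-squared probabilities.
p_values <- pchisq(delta_g2, df = 1, lower.tail = FALSE)

result <- data.frame(
  delta_G2    = delta_g2,
  p_value     = signif(p_values, 3),
  above_alpha = p_values > 0.05)
print(result)
```

The first three changes give p-values above 0.05 and the last two fall below it.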
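Model Selection
• Select an alpha level (0.05 by default)
• Look at the P-value for each change between nested models
• Use (in R): 1-pchisq(G2, df), where G2 is the change in G2 between the two models and df is the change in degrees of freedom
• Stop removing terms once a P-value falls below the alpha level chosen in the first step
• Model 1: G+R+A+C+M+GR
• Model 2: G+R+A+C+M+GR+(all pairs)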
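The steps above can be sketched as a loop around `drop1()`, which reports the likelihood-ratio test for removing each term. Note that this is a P-value-based backward elimination written for illustration, not the AIC-based `stepAIC()` call used in the R program; the names `alpha`, `protected`, and `removed_terms` are assumptions of this sketch, and GR is protected to mirror the lower scope used there.

```r
# Rebuild the survey data so the sketch runs on its own.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"),
              alcohol   = c("Yes", "No"),
              marijuana = c("Yes", "No"),
              gender    = c("female", "male"),
              race      = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

alpha <- 0.05                  # step 1: choose the alpha level
protected <- "gender:race"     # assumed: GR is never removed
removed_terms <- character(0)

# Start from Model 2 (all pairs).
fit <- glm(count ~ .^2, data = survey, family = poisson)

repeat {
  # Likelihood-ratio test for dropping each removable term (1 df each here).
  tests <- drop1(fit, test = "Chisq")
  tests <- tests[!(rownames(tests) %in% c("<none>", protected)), , drop = FALSE]
  if (nrow(tests) == 0) break

  # The term with the largest P-value has the smallest change in G2.
  weakest <- which.max(tests[["Pr(>Chi)"]])
  if (tests[["Pr(>Chi)"]][weakest] < alpha) break   # stop at the alpha level

  term <- rownames(tests)[weakest]
  removed_terms <- c(removed_terms, term)
  fit <- update(fit, as.formula(paste(". ~ . -", term)))
}

removed_terms                          # terms taken out, in order
attr(terms(fit), "term.labels")        # terms left in the selected model
```

Because the stopping rule is the alpha level rather than AIC, the selected model may differ from the one `stepAIC()` returns.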
Model Selection (Continued)
Model 3: G+R+A+C+M+GR+(all pairs)+(all three-factor terms)
Model 4g: take out CR (lowest change in G2)
Model 5: take out CG (lowest change in G2)
Model 6: take out MR (lowest change in G2)
Model 7: take out GM (lowest change in G2)
Consider: A+C+M+AC+AM+CM
Thank You! Any Questions???