Model Selections and Comparisons

Model Selections and Comparisons (Categorical Data Analysis, Ch 9.2)
Yumi Kubo, Alvin Hsieh

Survey Data
• Conducted in 1992 by the Wright State University School of Medicine and United Health Services in Dayton, Ohio
• 2276 students in their last year of high school (nonurban area)
• We add more dimensions to the data in 8.2.4
• Variables: Alcohol (A), Cigarette (C), Marijuana (M)
• Added variables: Gender (G), Race (R)

Association Graphs (Definitions)
• association graph - a set of vertices, where each vertex is a variable
• edge - a conditional association between 2 variables
• path - a sequence of edges leading from one variable to another
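The definitions above can be sketched in base R as an adjacency matrix plus a path search. This is only an illustration: the edge list is an assumption taken from the model name fit.AC.AM.CM.AG.AR.GM.GR in the R program, and has_path() is a hypothetical helper, not part of any package.

```r
# Edges of an assumed reduced model: AC, AM, CM, AG, AR, GM, GR
edges <- c("AC", "AM", "CM", "AG", "AR", "GM", "GR")
vars  <- c("A", "C", "M", "G", "R")   # vertices = variables

# Adjacency matrix: adj[i, j] is TRUE when an edge joins variables i and j
adj <- matrix(FALSE, length(vars), length(vars), dimnames = list(vars, vars))
for (e in edges) {
  v <- strsplit(e, "")[[1]]
  adj[v[1], v[2]] <- TRUE
  adj[v[2], v[1]] <- TRUE
}

# Hypothetical helper: breadth-first search for a path between two vertices
has_path <- function(adj, from, to) {
  visited <- from
  queue <- from
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    nbrs <- setdiff(names(which(adj[current, ])), visited)
    if (to %in% nbrs) return(TRUE)
    visited <- c(visited, nbrs)
    queue <- c(queue, nbrs)
  }
  FALSE
}

adj["C", "G"]            # is there an edge between C and G?
has_path(adj, "C", "G")  # is there a path from C to G?
```

In this assumed graph C and G share no edge, yet a path such as C, A, G still connects them.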

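Association Graphs (Saturated)
[Figure: association graph of the saturated model on the vertices M, G, R, C, A, with labels pointing out a variable, a path, and a conditional association]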

Association Graphs (Reduced)
[Figure: association graph of a reduced model on the vertices M, G, R, C, A]

Data Set

                                  Marijuana Use
                        Race = White            Race = Other
                     Female       Male       Female       Male
Alcohol Cigarette    yes    no   yes    no   yes    no   yes    no
yes     yes          405   268   453   228    23    23    30    19
yes     no            13   218    28   201     2    19     1    18
no      yes            1    17     1    17     0     1     1     8
no      no             1   117     1   133     0    12     0    17

SAS Program
Too large to place here; see survey.sas

R Program
Original code (modified below): http://math.cl.uh.edu/~thompsonla/RCode.txt

    # Cell counts; cigarette varies fastest, then alcohol, marijuana, gender, race
    survey <- data.frame(
      expand.grid(cigarette = c("Yes", "No"),
                  alcohol   = c("Yes", "No"),
                  marijuana = c("Yes", "No"),
                  gender    = c("female", "male"),
                  race      = c("white", "other")),
      count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
                23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

    library(MASS)  # provides stepAIC()

    # mutual independence + GR
    fit.GR <- glm(count ~ . + gender*race, data = survey, family = poisson)
    # homogeneous association
    fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)
    # all three-factor terms
    fit.3fact <- glm(count ~ .^3, data = survey, family = poisson)

    # backward elimination from the homogeneous association model,
    # always keeping the main effects and GR
    summary(res <- stepAIC(fit.homog.assoc,
                           scope = list(lower = ~ cigarette + alcohol + marijuana + gender*race),
                           direction = "backward"))

    fit.AC.AM.CM.AG.AR.GM.GR.MR <- res
    fit.AC.AM.CM.AG.AR.GM.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR.MR, ~ . - marijuana:race)
    fit.AC.AM.CM.AG.AR.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR, ~ . - marijuana:gender)
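As a rough sketch of how the first three fits could be compared, the block below rebuilds the data and collects the residual deviance (G²) and residual degrees of freedom of each model. The names fits and gof are illustrative, not from the original program.

```r
# Rebuild the data frame from the slide (cigarette varies fastest)
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

fit.GR          <- glm(count ~ . + gender*race, data = survey, family = poisson)
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)
fit.3fact       <- glm(count ~ .^3, data = survey, family = poisson)

# Residual deviance (G^2) and residual df of each model
fits <- list(GR = fit.GR, homog.assoc = fit.homog.assoc, three.factor = fit.3fact)
gof <- data.frame(G2 = sapply(fits, deviance),
                  df = sapply(fits, df.residual))

# Goodness-of-fit P-value: upper tail of the chi-squared distribution
gof$p.value <- pchisq(gof$G2, gof$df, lower.tail = FALSE)
print(gof)

# Likelihood-ratio tests between the successive nested models
print(anova(fit.GR, fit.homog.assoc, fit.3fact, test = "Chisq"))
```

deviance() and df.residual() return the same quantities that summary() prints as the residual deviance and its degrees of freedom.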

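R Program (P-values)

    # Each line tests the change in G2 between two successive models on 1 df
    1-pchisq((15.8-15.3),1)   # Model 2 vs. Model 4g (CR removed)
    1-pchisq((16.7-15.8),1)   # Model 4g vs. Model 5 (CG removed)
    1-pchisq((19.9-16.7),1)   # Model 5 vs. Model 6 (MR removed)
    1-pchisq((28.8-19.9),1)   # Model 6 vs. Model 7 (GM removed)
    1-pchisq((40.3-28.8),1)   # Model 7 vs. the next reduced model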
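The same P-values can be computed in one vectorized call. This sketch copies the G² differences from the slide and assumes each step removes a single two-factor term between binary variables, hence 1 degree of freedom per comparison. The names delta_G2, delta_df, and alpha are illustrative.

```r
# Changes in G^2 between successive nested models, copied from the slide
delta_G2 <- c(15.8 - 15.3, 16.7 - 15.8, 19.9 - 16.7, 28.8 - 19.9, 40.3 - 28.8)
delta_df <- rep(1, length(delta_G2))  # one term removed per step (assumption)

# Upper-tail probability; equivalent to 1 - pchisq(delta_G2, delta_df)
p_values <- pchisq(delta_G2, df = delta_df, lower.tail = FALSE)

alpha <- 0.05
result <- data.frame(delta_G2, delta_df,
                     p_value = signif(p_values, 3),
                     significant = p_values < alpha)
print(result)
```

Using lower.tail = FALSE avoids the subtraction from 1, which matters numerically only for very small P-values.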

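Model Selection
1. Select a significance level α (0.05 by default)
2. Look at the P-value of each model comparison
3. Compute it in R with 1-pchisq(G2, df), where G2 is the G² statistic (or the change in G² between two nested models) and df is the matching degrees of freedom
4. Stop removing terms once the P-value falls below the α chosen in step 1
• Model 1: G+R+A+C+M+GR
• Model 2: G+R+A+C+M+GR+(all pairs)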

Model Selection (Continued)
• Model 3: G+R+A+C+M+GR+(all pairs)+(all three-factor terms)
• Model 4g: lowest change in G², taking out CR
• Model 5: lowest change in G², taking out CG
• Model 6: lowest change in G², taking out MR
• Model 7: lowest change in G², taking out GM
• Consider: A+C+M+AC+AM+CM
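One way to find the term with the lowest change in G² at a given step is drop1(), which refits the model once per removable term. The sketch below does this for the homogeneous association model only. Excluding gender:race from the candidates is an assumption that mirrors the lower scope passed to stepAIC in the R program, and the names drops, candidates, and weakest are illustrative.

```r
# Rebuild the data frame from the slide (cigarette varies fastest)
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

# Model 2: all main effects and all two-factor terms
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)

# One likelihood-ratio test per two-factor term; the LRT column holds the
# change in G^2 caused by removing that term
drops <- drop1(fit.homog.assoc, test = "Chisq")
print(drops)

# Keep GR in the model: drop the "<none>" row and gender:race from the candidates
candidates <- drops[!(rownames(drops) %in% c("<none>", "gender:race")), ]

# Term whose removal changes G^2 the least
weakest <- rownames(candidates)[which.min(candidates$LRT)]
print(weakest)
```

Repeating the same call on each reduced fit, and stopping once the smallest change becomes significant at the chosen α, would reproduce the sequence of models listed above by hand.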

Goodness-of-Fit Tests (Table 9.2)

Thank You! Any Questions???
