Discrete Multivariate Analysis: Analysis of Multivariate Categorical Data
References
• Fienberg, S. (1980), The Analysis of Cross-Classified Categorical Data, MIT Press, Cambridge, Mass.
• Fingleton, B. (1984), Models of Category Counts, Cambridge University Press.
• Agresti, A. (1990), Categorical Data Analysis, Wiley, New York.
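Two-way table
$\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$
where $m_{ij}$ is the expected frequency in cell $(i,j)$ and $\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0$.
Note: X and Y are independent if $u_{12(i,j)} = 0$ for all $i, j$. In this case the log-linear model becomes
$\ln m_{ij} = u + u_{1(i)} + u_{2(j)}$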
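Log-linear model for three-way tables
Let $m_{ijk}$ denote the expected frequency in cell $(i,j,k)$ of the table; then in general
$\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$
where each u-term sums to zero over each of its indices, e.g. $\sum_i u_{1(i)} = 0$ and $\sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0$.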
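Hierarchical log-linear models for categorical data
For three-way tables, the hierarchical principle: if an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction. For example, a model that contains the interaction between variables 1 and 2 must also contain the main effects of variable 1 and of variable 2.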
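The hierarchical principle can be read as a closure rule: every subset of a term in the model is also in the model. The sketch below illustrates this under an assumed convention that is not from the slides: each generating term is written as a string of single-letter variable names (for instance "BH"), and the function name `hierarchical_terms` is hypothetical.

```python
from itertools import combinations

def hierarchical_terms(generators):
    """Return every term implied by the generators under the hierarchical principle.

    Each generator is a string of single-letter variable names, e.g. "BH".
    The empty tuple () stands for the constant term.
    """
    terms = set()
    for gen in generators:
        letters = sorted(gen)
        # Every subset of a generator (including the empty set) must be in the model.
        for size in range(len(letters) + 1):
            terms.update(combinations(letters, size))
    # Order by interaction order, then alphabetically, for readable output.
    return sorted(terms, key=lambda t: (len(t), t))

model_terms = hierarchical_terms(["BH", "CH"])
print(model_terms)
```

For the generators "BH" and "CH" this lists the constant, the three main effects, and the two two-factor interactions, with no B-C interaction.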
Maximum Likelihood Estimation for Log-Linear Models
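For any model it is possible to determine the maximum likelihood estimators of the parameters.
Example: two-way table ($r$ rows, $c$ columns), independence, multinomial model
$f(x_{11}, \dots, x_{rc}) = \frac{N!}{\prod_{i,j} x_{ij}!} \prod_{i,j} \pi_{ij}^{x_{ij}}$
or, writing $m_{ij} = N\pi_{ij}$,
$f(x_{11}, \dots, x_{rc}) = \frac{N!}{\prod_{i,j} x_{ij}!} \prod_{i,j} \left(\frac{m_{ij}}{N}\right)^{x_{ij}}$
Log-likelihood
$l = K + \sum_{i,j} x_{ij} \ln \pi_{ij}$ where $K = \ln N! - \sum_{i,j} \ln x_{ij}!$
With the model of independence, $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$, so
$l = K + \sum_i x_{i\cdot} \ln \pi_{i\cdot} + \sum_j x_{\cdot j} \ln \pi_{\cdot j}$
and with $\sum_i \pi_{i\cdot} = 1$, also $\sum_j \pi_{\cdot j} = 1$.
Let $g = l + \lambda_1 \left(1 - \sum_i \pi_{i\cdot}\right) + \lambda_2 \left(1 - \sum_j \pi_{\cdot j}\right)$
Now $\frac{\partial g}{\partial \pi_{i\cdot}} = \frac{x_{i\cdot}}{\pi_{i\cdot}} - \lambda_1 = 0$
Now $x_{i\cdot} = \lambda_1 \pi_{i\cdot}$, or, summing over $i$, $N = \lambda_1$.
Hence $\hat\pi_{i\cdot} = \frac{x_{i\cdot}}{N}$ and similarly $\hat\pi_{\cdot j} = \frac{x_{\cdot j}}{N}$. Finally $\hat\pi_{ij} = \hat\pi_{i\cdot}\,\hat\pi_{\cdot j} = \frac{x_{i\cdot}\,x_{\cdot j}}{N^2}$
Hence $\hat m_{ij} = N\hat\pi_{ij} = \frac{x_{i\cdot}\,x_{\cdot j}}{N}$. Now $\ln \hat m_{ij} = \ln x_{i\cdot} + \ln x_{\cdot j} - \ln N$ and this has the form $u + u_{1(i)} + u_{2(j)}$.
Hence $\hat u_{1(i)} = \ln x_{i\cdot} - \frac{1}{r}\sum_{i'} \ln x_{i'\cdot}$, $\hat u_{2(j)} = \ln x_{\cdot j} - \frac{1}{c}\sum_{j'} \ln x_{\cdot j'}$, and $\hat u = \frac{1}{r}\sum_{i} \ln x_{i\cdot} + \frac{1}{c}\sum_{j} \ln x_{\cdot j} - \ln N$.
Note: $\sum_i \hat u_{1(i)} = \sum_j \hat u_{2(j)} = 0$.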
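Under independence in a two-way table, the maximum likelihood fitted count for a cell is the row total times the column total divided by the grand total. A minimal sketch of that calculation follows; the function name `fit_independence` and the counts are made up for illustration and are not from the slides.

```python
def fit_independence(table):
    """Fitted counts under independence: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table of observed counts (illustrative only).
observed = [[10, 20, 30],
            [20, 20, 20]]
fitted = fit_independence(observed)
for row in fitted:
    print(row)
```

The fitted table reproduces the observed row and column totals, which is the defining property of the maximum likelihood fit for this model.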
Comments
• Maximum likelihood estimates can be computed for any hierarchical log-linear model (including models with more than 2 variables).
• In certain situations the equations need to be solved numerically.
• For the saturated model (all interactions and main effects), the fitted values equal the observed frequencies: $\hat m_{ijk} = x_{ijk}$.
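One standard numerical method for such cases is iterative proportional fitting (IPF), which repeatedly rescales the fitted table to match each observed margin named in the model. The slides do not say which numerical method they have in mind, so the sketch below is only an illustration: it fits the three-way model with all two-factor interactions and no three-factor interaction, and the function name and counts are hypothetical.

```python
def ipf_no_three_factor(x, iterations=50):
    """Fit the model [12][13][23] to a three-way table x[i][j][k] by IPF."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]  # start from all ones
    for _ in range(iterations):
        # Rescale so the fitted (i, j) margin matches the observed one.
        for i in range(I):
            for j in range(J):
                scale = sum(x[i][j]) / sum(m[i][j])
                for k in range(K):
                    m[i][j][k] *= scale
        # Rescale so the fitted (i, k) margin matches the observed one.
        for i in range(I):
            for k in range(K):
                obs = sum(x[i][b][k] for b in range(J))
                fit = sum(m[i][b][k] for b in range(J))
                for j in range(J):
                    m[i][j][k] *= obs / fit
        # Rescale so the fitted (j, k) margin matches the observed one.
        for j in range(J):
            for k in range(K):
                obs = sum(x[a][j][k] for a in range(I))
                fit = sum(m[a][j][k] for a in range(I))
                for i in range(I):
                    m[i][j][k] *= obs / fit
    return m

# Hypothetical 2 x 2 x 2 table with all counts positive (illustrative only).
table = [[[10, 20], [30, 40]],
         [[15, 25], [35, 45]]]
fitted = ipf_no_three_factor(table)
print(fitted)
```

The sketch assumes every observed margin is positive; a table with empty margins would need extra handling. A fixed iteration count keeps the example short, whereas a practical routine would stop once the margins agree to a chosen tolerance.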
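Goodness of Fit Statistics
These statistics can be used to check whether a log-linear model fits the observed frequency table. Writing $x$ for an observed cell frequency and $\hat m$ for the corresponding fitted frequency, with sums taken over all cells:
The chi-squared statistic: $\chi^2 = \sum \frac{(x - \hat m)^2}{\hat m}$
The likelihood ratio statistic: $G^2 = 2 \sum x \ln\left(\frac{x}{\hat m}\right)$
d.f. = # cells - # parameters fitted
We reject the model if $\chi^2$ or $G^2$ is greater than $\chi^2_{\alpha}$, the upper-$\alpha$ critical value of the chi-squared distribution with these d.f.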
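Both statistics compare observed cell counts with the counts fitted under the model: the Pearson statistic sums (observed - fitted)^2 / fitted over the cells, and the likelihood ratio statistic is twice the sum of observed * ln(observed / fitted). A small sketch follows, assuming the cells are supplied as two flat lists in the same order; the counts are made up and the function names are hypothetical.

```python
import math

def pearson_chi2(observed, fitted):
    """Pearson statistic: sum over cells of (x - m)^2 / m."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, fitted))

def likelihood_ratio_g2(observed, fitted):
    """Likelihood ratio statistic: 2 * sum over cells of x * ln(x / m)."""
    # Cells with x = 0 contribute nothing to the sum.
    return 2.0 * sum(x * math.log(x / m) for x, m in zip(observed, fitted) if x > 0)

# Hypothetical 2 x 2 table, flattened; fitted counts are those under independence.
observed = [20, 30, 30, 20]
fitted = [25.0, 25.0, 25.0, 25.0]

chi2 = pearson_chi2(observed, fitted)
g2 = likelihood_ratio_g2(observed, fitted)
print(chi2, g2)
```

For a 2 x 2 table under independence there are 4 cells and 3 fitted parameters (constant, one row effect, one column effect), so both statistics are referred to a chi-squared distribution with 1 degree of freedom.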
Example: Variables
• Systolic Blood Pressure (B)
• Serum Cholesterol (C)
• Coronary Heart Disease (H)
Goodness of fit testing of models

MODEL       DF   LIKELIHOOD-RATIO CHISQ   PROB.    PEARSON CHISQ   PROB.
B,C,H.      24   83.15                    0.0000   102.00          0.0000
B,CH.       21   51.23                    0.0002    56.89          0.0000
C,BH.       21   59.59                    0.0000    60.43          0.0000
H,BC.       15   58.73                    0.0000    64.78          0.0000
BC,BH.      12   35.16                    0.0004    33.76          0.0007
BH,CH.      18   27.67                    0.0673    26.58          0.0872   n.s.
CH,BC.      12   26.80                    0.0082    33.18          0.0009
BC,BH,CH.    9    8.08                    0.5265     6.56          0.6824   n.s.

Possible models:
1. [BH][CH]: B and C independent given H.
2. [BC][BH][CH]: all two-factor interaction model.
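The DF and PROB. columns can be checked by hand. The sketch below counts parameters for a hierarchical model and evaluates the chi-squared upper tail with a finite series that is valid for integer degrees of freedom. The numbers of categories (4 for B, 4 for C, 2 for H) are an assumption: the slides do not state them, but these values reproduce the DF column. The function names are hypothetical.

```python
import math
from itertools import combinations

LEVELS = {"B": 4, "C": 4, "H": 2}  # assumed category counts, not given in the slides

def degrees_of_freedom(generators, levels):
    """d.f. = number of cells - number of parameters in the hierarchical model."""
    terms = set()
    for gen in generators:
        for size in range(len(gen) + 1):
            terms.update(combinations(sorted(gen), size))
    # A term involving variables v1, v2, ... has prod(levels - 1) free parameters;
    # the empty term (the constant) counts as 1.
    n_params = sum(math.prod(levels[v] - 1 for v in term) for term in terms)
    return math.prod(levels.values()) - n_params

def chi2_upper_tail(x, df):
    """P(chi-squared with integer df exceeds x), via the finite series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(x)
    term = root * math.sqrt(2 / math.pi) * math.exp(-x / 2)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(root / math.sqrt(2)) + total

for gens, g2 in [(["BH", "CH"], 27.67), (["BC", "BH", "CH"], 8.08)]:
    df = degrees_of_freedom(gens, LEVELS)
    print(gens, df, round(chi2_upper_tail(g2, df), 4))
```

Small differences from the printed probabilities are expected, because the statistics in the table are rounded to two decimals.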
Model 1: [BH][CH] Log-linear parameters: Heart disease-Blood pressure interaction
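Multiplicative effect: exponentiating a log-linear parameter gives its multiplicative effect on the expected frequency.
Log-linear model (with B = 1, C = 2, H = 3): $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)} + u_{23(j,k)}$
Multiplicative form: $m_{ijk} = e^{u}\, e^{u_{1(i)}}\, e^{u_{2(j)}}\, e^{u_{3(k)}}\, e^{u_{13(i,k)}}\, e^{u_{23(j,k)}}$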
Model 2: [BC][BH][CH] Log-linear parameters Blood pressure-Cholesterol interaction:
Another Example
In this study the following were determined for N = 4353 males:
• Occupation category
• Educational level
• Academic aptitude
Occupation categories
• Self-employed Business
• Teacher/Education
• Self-employed Professional
• Salaried Employed
Education levels
• Low
• Low/Med
• Med
• High/Med
• High
Academic Aptitude
• Low
• Low/Med
• High/Med
• High
It is common to handle a multiway table by testing for independence in all two-way tables.
• This is similar to looking at all the bivariate correlations.
• In this example we learn that:
  • Education is related to Aptitude
  • Education is related to Occupational category
  • Occupational category is related to Aptitude
Can we do better than this?
Fitting various log-linear models
Simplest model that fits is: [Apt,Ed][Occ,Ed]
This model implies conditional independence between Aptitude and Occupation given Education.
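For a conditional independence model of this kind the fitted counts have a closed form: within each level k of the conditioning variable, the fitted count is (total for row i in layer k) * (total for column j in layer k) / (total for layer k). The sketch below illustrates this with a made-up 2 x 2 x 2 table indexed as x[i][j][k], where k plays the role of Education; neither the counts nor the function name come from the slides.

```python
def fit_conditional_independence(x):
    """Fitted counts when i and j are independent given k, for a table x[i][j][k]."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    fitted = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for k in range(K):
        layer_total = sum(x[a][b][k] for a in range(I) for b in range(J))
        row_totals = [sum(x[a][b][k] for b in range(J)) for a in range(I)]
        col_totals = [sum(x[a][b][k] for a in range(I)) for b in range(J)]
        for i in range(I):
            for j in range(J):
                # Independence fit applied separately within layer k.
                fitted[i][j][k] = row_totals[i] * col_totals[j] / layer_total
    return fitted

# Hypothetical counts (illustrative only).
table = [[[10, 20], [30, 40]],
         [[15, 25], [35, 45]]]
fitted = fit_conditional_independence(table)
print(fitted)
```

Within each layer the fitted odds ratio between i and j equals 1, which is what conditional independence given k means for a 2 x 2 layer.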
Log-linear Parameters Aptitude – Education Interaction