Loglinear Models for Contingency Tables
Consider an $I \times J$ contingency table that cross-classifies a multinomial sample of $n$ subjects on two categorical responses.
• The cell probabilities are $\{\pi_{ij}\}$ and the expected frequencies are $\{\mu_{ij} = n\pi_{ij}\}$.
• Loglinear model formulas use $\{\mu_{ij} = n\pi_{ij}\}$ rather than $\{\pi_{ij}\}$, so they also apply with Poisson sampling for $N = IJ$ independent cell counts $\{Y_{ij}\}$ having $\{\mu_{ij} = E(Y_{ij})\}$.
• In either case we denote the observed cell counts by $\{n_{ij}\}$.
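Independence Model
Under statistical independence, $\pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$, where $\pi_{i+}$ and $\pi_{+j}$ are the row and column marginal probabilities.
For multinomial sampling, $\mu_{ij} = n\pi_{i+}\pi_{+j}$.
Denote the row variable by $X$ and the column variable by $Y$. The formula expressing independence is multiplicative, so taking logs makes it additive.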
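Thus
$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$$
for a row effect $\lambda_i^X$ and a column effect $\lambda_j^Y$. This is the loglinear model of independence. As usual, identifiability requires constraints such as $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0$.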
The tests using $X^2$ and $G^2$ are also goodness-of-fit tests of this loglinear model.
• Loglinear models for contingency tables are GLMs that treat the $N$ cell counts as independent observations of a Poisson random component.
• Loglinear GLMs identify the data as the $N$ cell counts rather than the individual classifications of the $n$ subjects.
• The expected cell counts link to the explanatory terms using the log link.
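As a minimal sketch of the goodness-of-fit check described above, the following uses the standard closed-form maximum likelihood fit under independence, $\hat{\mu}_{ij} = n_{i+}n_{+j}/n$, and compares it with the observed counts. The function name `independence_fit` and the 2x3 table of counts are hypothetical and chosen only for illustration.

```python
import math

def independence_fit(table):
    """Fitted values, X^2, G^2 and df for the independence loglinear model."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # ML fitted values under independence: mu_hat_ij = n_i+ * n_+j / n
    fitted = [[r * c / n for c in col_totals] for r in row_totals]
    pairs = [(obs, fit)
             for obs_row, fit_row in zip(table, fitted)
             for obs, fit in zip(obs_row, fit_row)]
    # Pearson statistic
    x2 = sum((obs - fit) ** 2 / fit for obs, fit in pairs)
    # Likelihood-ratio statistic (cells with a zero count contribute 0)
    g2 = 2 * sum(obs * math.log(obs / fit) for obs, fit in pairs if obs > 0)
    # df = cells minus parameters = IJ - [1 + (I-1) + (J-1)] = (I-1)(J-1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return fitted, x2, g2, df

# Hypothetical counts, not taken from the slides
counts = [[30, 20, 10],
          [20, 30, 40]]
fitted, x2, g2, df = independence_fit(counts)
print(fitted, round(x2, 3), round(g2, 3), df)
```

Large values of $X^2$ or $G^2$ relative to a chi-squared distribution with `df` degrees of freedom suggest that the independence model fits poorly.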
The model does not distinguish between response and explanatory variables.
• It treats both jointly as responses, modeling $\mu_{ij}$ for combinations of their levels.
• To interpret parameters, however, it is helpful to treat the variables asymmetrically.
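We illustrate with the independence model for $I \times 2$ tables.
• In row $i$, the logit equals
$$\operatorname{logit}[P(Y=1 \mid X=i)] = \log\frac{\mu_{i1}}{\mu_{i2}} = (\lambda + \lambda_i^X + \lambda_1^Y) - (\lambda + \lambda_i^X + \lambda_2^Y) = \lambda_1^Y - \lambda_2^Y.$$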
The final term does not depend on $i$;
• that is, $\operatorname{logit}[P(Y=1 \mid X=i)]$ is identical at each level of $X$.
• Thus, independence implies a model of the form $\operatorname{logit}[P(Y=1 \mid X=i)] = \alpha$.
• In each row, the odds of response in column 1 equal $\exp(\alpha) = \exp(\lambda_1^Y - \lambda_2^Y)$.
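The constant-logit property can be checked numerically. The sketch below builds expected frequencies for a 3x2 table from the independence model and confirms that every row has the same logit, $\lambda_1^Y - \lambda_2^Y$. All parameter values and the helper name `row_logits` are hypothetical, picked only to satisfy sum-to-zero constraints.

```python
import math

def row_logits(mu):
    """log(mu_i1 / mu_i2) for each row i of an I x 2 table of expected counts."""
    return [math.log(row[0] / row[1]) for row in mu]

# Hypothetical independence-model parameters (each set sums to zero)
lam = 3.0
lam_x = [0.4, -0.1, -0.3]   # row effects
lam_y = [0.25, -0.25]       # column effects

# Expected frequencies: log mu_ij = lam + lam_x[i] + lam_y[j]
mu = [[math.exp(lam + lx + ly) for ly in lam_y] for lx in lam_x]

logits = row_logits(mu)
alpha = lam_y[0] - lam_y[1]   # common logit in every row
odds = math.exp(alpha)        # common odds of a column-1 response
print(logits, alpha, odds)
```

The row effects cancel in each logit, which is why changing `lam_x` leaves `logits` unchanged.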
An analogous property holds when $J > 2$.
• Differences between two parameters for a given variable relate to the log odds of making one response, relative to the other, on that variable.
Saturated Model
Statistically dependent variables satisfy a more complex loglinear model,
$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}.$$
The $\{\lambda_{ij}^{XY}\}$ are association terms that reflect deviations from independence. They represent interactions between $X$ and $Y$, whereby the effect of one variable on $\mu_{ij}$ depends on the level of the other.
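Parameter Estimation
Let $\{\mu_{ij}\}$ denote expected frequencies. Suppose all $\mu_{ij} > 0$ and let $\eta_{ij} = \log \mu_{ij}$. A dot in a subscript denotes the average with respect to that index; for instance, $\eta_{i\cdot} = \sum_j \eta_{ij} / J$. We set
$$\lambda = \eta_{\cdot\cdot}, \qquad \lambda_i^X = \eta_{i\cdot} - \eta_{\cdot\cdot}, \qquad \lambda_j^Y = \eta_{\cdot j} - \eta_{\cdot\cdot}, \qquad \lambda_{ij}^{XY} = \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot}.$$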
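The averaging definitions above can be sketched directly in code. The example assumes the sum-to-zero parameterization implied by those averages (main effects sum to zero, and the association terms sum to zero over each index); the function name `saturated_parameters` and the 2x2 table of expected frequencies are hypothetical.

```python
import math

def saturated_parameters(mu):
    """Loglinear parameters of the saturated model from expected frequencies."""
    I, J = len(mu), len(mu[0])
    eta = [[math.log(m) for m in row] for row in mu]        # eta_ij = log mu_ij
    row_avg = [sum(row) / J for row in eta]                 # eta_i.
    col_avg = [sum(eta[i][j] for i in range(I)) / I
               for j in range(J)]                           # eta_.j
    grand = sum(row_avg) / I                                # eta_..
    lam = grand
    lam_x = [r - grand for r in row_avg]
    lam_y = [c - grand for c in col_avg]
    lam_xy = [[eta[i][j] - row_avg[i] - col_avg[j] + grand
               for j in range(J)] for i in range(I)]
    return lam, lam_x, lam_y, lam_xy

# Hypothetical expected frequencies, all positive
mu = [[40.0, 10.0],
      [20.0, 30.0]]
lam, lam_x, lam_y, lam_xy = saturated_parameters(mu)

# The four terms add back to log mu_ij, so the model reproduces mu exactly
rebuilt = [[math.exp(lam + lam_x[i] + lam_y[j] + lam_xy[i][j])
            for j in range(2)] for i in range(2)]

# In a 2x2 table the log odds ratio depends only on the association terms
log_odds_ratio = lam_xy[0][0] + lam_xy[1][1] - lam_xy[0][1] - lam_xy[1][0]
print(lam, lam_x, lam_y, lam_xy, log_odds_ratio)
```

When every association term is zero, the log odds ratio is zero and the model reduces to the independence model.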
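INFERENCE FOR LOGLINEAR MODELS
Chi-Squared Goodness-of-Fit Tests
• As usual, $X^2$ and $G^2$ test whether a model holds by comparing cell fitted values to observed counts:
$$X^2 = \sum \frac{(n_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}}, \qquad G^2 = 2 \sum n_{ijk} \log\frac{n_{ijk}}{\hat{\mu}_{ijk}},$$
• where $n_{ijk}$ = observed frequency and $\hat{\mu}_{ijk}$ = expected frequency. Here df equals the number of cell counts minus the number of model parameters.
[Worked example: the expressions did not survive extraction; the remaining values are 2, then 204.38, 132.95, 257.24, 167.34.]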