Analysis of count data
Introduction to log-linear models
Log-linear analysis
• Contingency-table analysis
• Categorical data analysis
• Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975)
• Analysis of cross-classified data
• Multivariate analysis of qualitative data (Goodman, 1978)
• Count data analysis
Log-linear model: fit a model to a table of counts / frequencies
Two data sets:
• Survey: political attitudes of British electors
• Survey: leaving parental home in the Netherlands
Survey: political attitudes of British electors
Source: Payne, C. (1977) The log-linear model for contingency. In: C. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106]. (From Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974.)
Counts are generated by a Poisson process and therefore follow a Poisson distribution
The Poisson probability model
Let N be a random variable representing the number of events during a unit interval and let n be a realisation of N (COUNT). N is a Poisson r.v. following a Poisson distribution with parameter $\lambda$:
$$\Pr\{N = n\} = \frac{\lambda^{n} e^{-\lambda}}{n!}, \qquad n = 0, 1, 2, \ldots$$
The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$
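Likelihood function
Probability mass function: $\Pr\{N = n\} = \lambda^{n} e^{-\lambda} / n!$
Log-likelihood function: $l(\lambda) = n \ln \lambda - \lambda - \ln n!$
Likelihood equations to determine the ‘best’ value of $\lambda$
Likelihood equations
$$\frac{\partial l}{\partial \lambda} = \frac{n}{\lambda} - 1 = 0$$
Hence: $\hat{\lambda} = n$
Hence, for the Poisson distribution: $\mathrm{Var}(N) = E[N] = \lambda$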
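The likelihood reasoning can be checked numerically. The sketch below is an illustration only: the counts are hypothetical (not taken from either survey), and the function names `poisson_pmf` and `poisson_loglik` are made up for this example. It uses the standard result that, for several independent counts sharing one rate, the likelihood equation is solved by the sample mean.

```python
import math

def poisson_pmf(n, lam):
    """Probability of observing n events when the expected count is lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def poisson_loglik(lam, counts):
    """Log-likelihood of a common rate lam for independent Poisson counts."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n in counts)

# Hypothetical counts, used only for illustration.
counts = [3, 5, 4, 6, 2]

# Setting the derivative sum(n_i)/lam - m to zero gives the sample mean.
lam_hat = sum(counts) / len(counts)

# The log-likelihood is highest at lam_hat compared with nearby values.
for lam in (lam_hat - 0.5, lam_hat, lam_hat + 0.5):
    print(f"lambda = {lam:.2f}  log-likelihood = {poisson_loglik(lam, counts):.4f}")
```

With a single observed count the same equation reduces to the estimate being the count itself.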
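Let i represent an individual with characteristics $x_i$. The probability of observing $n_i$ events during a unit interval is:
$$\Pr\{N_i = n_i\} = \frac{\lambda_i^{n_i} e^{-\lambda_i}}{n_i!}$$
with $\lambda_i = \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots)$
or $\ln \lambda_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots$
Log-linear model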
The log-linear model The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).
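Log-linear models for two-way tables
Saturated log-linear model:
$$\ln \lambda_{ij} = u + u^{A}_{i} + u^{B}_{j} + u^{AB}_{ij}$$
• $u$: overall effect (level)
• $u^{A}_{i}$, $u^{B}_{j}$: main effects (marginal freq.)
• $u^{AB}_{ij}$: interaction effect
In case of a 2 x 2 table: 4 observations but 9 parameters (1 overall, 2 + 2 main effects, 4 interaction effects), hence the need for normalisation constraints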
Leaving home
Descriptive statistics
• Counts
• Percentages
• Odds of leaving home early rather than late
Reference category
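The three descriptive statistics can be computed directly from the 2 x 2 table. This is a minimal sketch: the counts are the observed frequencies listed in the expected-frequencies table of these slides, "early" and "late" stand for leaving before and after age 20, and the variable names are illustrative.

```python
# Observed counts from the slides' expected-frequencies table.
# Keys: (sex, timing of leaving home).
observed = {
    ("female", "early"): 135, ("male", "early"): 74,
    ("female", "late"): 143, ("male", "late"): 178,
}
total = sum(observed.values())

# Percentages of the grand total.
percentages = {cell: 100 * n / total for cell, n in observed.items()}

# Odds of leaving home early rather than late, by sex.
odds = {
    sex: observed[(sex, "early")] / observed[(sex, "late")]
    for sex in ("female", "male")
}

# Odds ratio, females relative to males.
odds_ratio = odds["female"] / odds["male"]

for sex, value in odds.items():
    print(f"odds early vs late, {sex}: {value:.3f}")
print(f"odds ratio (female / male): {odds_ratio:.3f}")
```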
Leaving home
Log-linear models for two-way tables: 4 models
Model 1: Null model or overall effect model
All categories are equiprobable (an observation is equally likely to fall into any cell):
$\ln \lambda_{ij} = u$ for all i and j
$u$ = 4.887 (s.e. 0.0434); exp(4.887) = 132.5 = 530/4
$\lambda_{ij}$ is the expected count (frequency) in cell (ij): category i of variable A (row) and category j of variable B (column)
Leaving home
where $\lambda_{ij}$ is a cell frequency generated by a Poisson process and $\mathrm{Var}[aX] = a^{2}\,\mathrm{Var}[X]$, where a is a constant (e.g. Fingleton, 1984, p. 29)
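The null-model figures (4.887, s.e. 0.0434) can be reproduced with a few lines. This sketch assumes the standard error comes from the usual large-sample (delta-method) approximation, in which the variance of the log of a Poisson total T is about 1/T; that reading is an assumption, since the slide's own derivation did not survive. The counts are those in the slides' expected-frequencies table.

```python
import math

# Observed counts: female early, male early, female late, male late.
cells = [135, 74, 143, 178]
total = sum(cells)

# Under the null model every cell has the same expected count.
expected = total / len(cells)

# Overall effect: log of the common expected count.
overall_effect = math.log(expected)

# Assumed delta-method approximation: Var[ln T] ~ Var[T] / T^2 = 1 / T
# for a Poisson total T; dividing T by the constant 4 does not change it.
se_overall = math.sqrt(1 / total)

print(f"expected count per cell: {expected:.1f}")
print(f"overall effect: {overall_effect:.3f}  s.e.: {se_overall:.4f}")
```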
Leaving home
Log-linear models for two-way tables
Model 2: B null model
Categories of variable B (sex) are equiprobable within levels of variable A (time):
$\ln \lambda_{ij} = u + u^{A}_{i}$ for all j (overall effect plus time effect)
GLIM
Estimate   s.e.      Parameter        Exp(parameter)
4.649      0.06914   Overall effect   104.5
0.0000               TIME(1)
0.4291     0.08886   TIME(2)          1.536
Leaving home
Log-linear models for two-way tables
Model 2: B null model
Categories of variable B (sex) are equiprobable within levels of variable A (time):
$\ln \lambda_{ij} = u + u^{A}_{i}$ for all j (overall effect plus time effect)
SPSS
Estimate   s.e.     Parameter        Exp(parameter)
5.773      0.0558   Overall effect   321.5
-0.4283    0.0888   TIME(1)          0.6516
0.0000              TIME(2)
Leaving home
Log-linear models for two-way tables
Model 3: independence model (unsaturated model)
Categories of variable B (sex) are not equiprobable, but the probability is independent of levels of variable A (time):
$\ln \lambda_{ij} = u + u^{A}_{i} + u^{B}_{j}$ (overall effect plus time and sex effects, no interaction)
GLIM
Estimate   s.e.     Parameter        Exp(parameter)
4.697      0.0806   Overall effect   109.62
0.4291     0.0889   TIME(2)          1.536
-0.09819   0.0870   SEX(2)           0.906
Leaving home
LOG-LINEAR MODEL: predictions
• Females leaving home early: 109.62
• Females leaving home late: 109.62 * 1.536 = 168.37
• Males leaving home early: 109.62 * 0.906 = 99.37
• Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63
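The independence-model predictions can also be obtained from the table margins (row total times column total, divided by the grand total). The sketch below is one way to do this, with illustrative variable names and the observed counts from the slides' expected-frequencies table; it then rebuilds the same cells from the reference cell and the two multiplicative effects.

```python
import math

# counts[time][sex]; time: 0 = early, 1 = late; sex: 0 = female, 1 = male.
counts = [[135, 74], [143, 178]]

row_totals = [sum(row) for row in counts]        # by time
col_totals = [sum(col) for col in zip(*counts)]  # by sex
n = sum(row_totals)

# Expected counts under independence of time and sex.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Parameters with (early, female) as the reference cell.
overall = math.log(expected[0][0])
time2 = math.log(row_totals[1] / row_totals[0])
sex2 = math.log(col_totals[1] / col_totals[0])

# Rebuild every cell from the reference cell and the multiplicative effects.
rebuilt = [
    [math.exp(overall + t * time2 + s * sex2) for s in (0, 1)]
    for t in (0, 1)
]

for t, label in enumerate(("early", "late")):
    print(label, [round(x, 2) for x in expected[t]])
```

Working from unrounded effects avoids the small discrepancies that appear when the rounded multipliers 1.536 and 0.906 are used.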
Leaving home
SPSS
No.   Estimate   SE      Parameter
1     5.0280     .0721   Overall effect
2     -.4291     .0889   Time(1)
3     .0000      .       Time(2)
4     .0982      .0870   Sex(1)
5     .0000      .       Sex(2)
Leaving home
Log-linear models for two-way tables
Model 4: saturated model
The values of categories of variable B (sex) depend on levels of variable A (time)
GLIM
Estimate   s.e.      Parameter
4.905      0.08607   Overall effect
0.05757    0.1200    TIME(2)
-0.6012    0.1446    SEX(2)
0.8201     0.1831    TIME(2).SEX(2)
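For the saturated model with a reference cell, the estimates are simple functions of the observed counts. The following sketch reproduces the GLIM column; it assumes the usual large-sample standard errors (square root of the sum of reciprocal counts of the cells involved), which is a standard result rather than something stated on the slides. Variable names are illustrative.

```python
import math

# n[time][sex]; time: 0 = early, 1 = late; sex: 0 = female, 1 = male.
n = [[135, 74], [143, 178]]

# Reference cell: (early, female).
overall = math.log(n[0][0])
time2 = math.log(n[1][0] / n[0][0])
sex2 = math.log(n[0][1] / n[0][0])
# Interaction: log odds ratio of the 2 x 2 table.
interaction = math.log(n[1][1] * n[0][0] / (n[1][0] * n[0][1]))

# Assumed large-sample standard errors: sqrt of summed reciprocal counts.
se_overall = math.sqrt(1 / n[0][0])
se_time2 = math.sqrt(1 / n[0][0] + 1 / n[1][0])
se_sex2 = math.sqrt(1 / n[0][0] + 1 / n[0][1])
se_interaction = math.sqrt(sum(1 / c for row in n for c in row))

for name, est, se in [
    ("Overall effect", overall, se_overall),
    ("TIME(2)", time2, se_time2),
    ("SEX(2)", sex2, se_sex2),
    ("TIME(2).SEX(2)", interaction, se_interaction),
]:
    print(f"{name:15s} {est:8.4f} {se:8.4f}")
```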
Leaving home
SPSS
No.   Estimate   SE      Parameter
1     5.1846     .0748   Overall effect
2     -.8738     .1379   Time(1)
3     .0000      .       Time(2)
4     -.2183     .1121   Sex(1)
5     .0000      .       Sex(2)
6     .8164      .1827   Time(1) * Sex(1)
7     .0000      .       Time(1) * Sex(2)
8     .0000      .       Time(2) * Sex(1)
9     .0000      .       Time(2) * Sex(2)
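Leaving home
LOG-LINEAR MODEL: predictions
Expected frequencies
Cell      Label   Observed   Model 1   Model 2   Model 3   Model 4   Model 5
Fem_<20   F11     135        132.50    104.50    139.00    109.63    135.00
Mal_<20   F12     74         132.50    104.50    126.00    99.37     74.00
Fem_>20   F21     143        132.50    160.50    139.00    168.37    143.00
Mal_>20   F22     178        132.50    160.50    126.00    152.63    178.00
Note: this table numbers five models. Its Model 3 contains a sex effect only (278/2 = 139 and 252/2 = 126), so Model 4 here is the independence model and Model 5 the saturated model.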
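Relation between the log-linear model and the Poisson regression model
$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1} + \beta_2 x_{2} + \beta_3 x_{1} x_{2}$$
where $x_1$ and $x_2$ are dummy variables (0 if i or j is equal to 1, and 1 if i or j is equal to 2) and the interaction variable is $x_1 x_2$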
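To illustrate the link with Poisson regression, the sketch below fits the independence model (no interaction column) to the leaving-home counts by Newton-Raphson, using dummy variables coded as described above. The fitting routine is a plain textbook implementation written for this example, not the algorithm of GLIM or SPSS, and the names `solve` and `poisson_regression` are hypothetical. The counts are the observed frequencies from the slides' expected-frequencies table.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def poisson_regression(X, y, iterations=25):
    """Newton-Raphson for ln(lambda_i) = sum_k beta_k * x_ik.

    Assumes the first column of X is the constant 1.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iterations):
        lam = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector: X'(y - lambda)
        score = [sum(row[k] * (yi - li) for row, yi, li in zip(X, y, lam))
                 for k in range(p)]
        # Information matrix: X' diag(lambda) X
        info = [[sum(row[j] * row[k] * li for row, li in zip(X, lam))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Columns: constant, x1 (time: 0 = early, 1 = late), x2 (sex: 0 = female, 1 = male).
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [135, 74, 143, 178]

beta = poisson_regression(X, y)
print([round(b, 4) for b in beta])
```

Adding a fourth column equal to the product of the two dummies would turn this into the saturated model.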
Log-linear model: fit a model to a table of frequencies
Data: survey of political attitudes of British electors
Source: Payne, C. (1977) The log-linear model for contingency. In: C. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106]. (From Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974.)
The classical approach
• Geometric means (Birch, 1963)
• Effect coding (mean is ref. cat.)
Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233
Political attitudes
The basic model
Overall effect: 22.98/4 = 5.7456
Effect of party (the two party sums of logs, 11.4948 and 11.4874, both round to 11.49):
• Conservative: 11.49/2 - 5.7456 = 0.0018
• Labour: 11.49/2 - 5.7456 = -0.0018
Effect of gender:
• Male: 11.44/2 - 5.7456 = -0.0229
• Female: 11.54/2 - 5.7456 = 0.0229
Interaction effects (gender-party interaction effect):
• Male conservative: 5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
• Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
• Male labour: 5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
• Female labour: 5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933
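The effect-coded parameters above are averages of log counts, so they can be recomputed from the four cell frequencies (279, 352, 335 and 291, the values whose logs appear in the interaction lines). A minimal sketch with illustrative variable names:

```python
import math

# counts[party][gender]; party: 0 = conservative, 1 = labour;
# gender: 0 = male, 1 = female.
counts = [[279, 352], [335, 291]]
logs = [[math.log(c) for c in row] for row in counts]

# Overall effect: mean of the log counts (log of the geometric mean).
overall = sum(sum(row) for row in logs) / 4

# Main effects: mean log count of a category minus the overall effect.
party = [sum(row) / 2 - overall for row in logs]
gender = [sum(col) / 2 - overall for col in zip(*logs)]

# Interaction effects: what remains after overall and main effects.
interaction = [
    [logs[i][j] - overall - party[i] - gender[j] for j in range(2)]
    for i in range(2)
]

print(f"overall effect: {overall:.4f}  (geometric mean {math.exp(overall):.2f})")
print("party effects:", [round(x, 4) for x in party])
print("gender effects:", [round(x, 4) for x in gender])
print("interaction effects:", [[round(x, 4) for x in row] for row in interaction])
```

By construction each set of effects sums to zero over its categories, which is the normalisation that effect coding imposes.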
Political attitudes
The basic model
Coding: effect coding
Parameters are subject to constraints (normalisation constraints). With $u^{A}_{i}$ and $u^{B}_{j}$ the main effects and $u^{AB}_{ij}$ the interaction effects:
$$\sum_i u^{A}_{i} = \sum_j u^{B}_{j} = 0, \qquad \sum_i u^{AB}_{ij} = \sum_j u^{AB}_{ij} = 0$$
Only first-order contrasts can be estimated:
Political attitudes The basic model (GLIM) Estimate S.E.
Political attitudes The basic model (SPSS)
Political attitudes
The basic model (1)
$\ln \lambda_{11}$ = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312
$\ln \lambda_{12}$ = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636
$\ln \lambda_{21}$ = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142
$\ln \lambda_{22}$ = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734
Design matrix: unsaturated log-linear model
• Number of parameters exceeds number of equations
• Need for additional equations
• X’X is singular, so (X’X)^-1 does not exist
• Identify linear dependencies
Design matrix: unsaturated log-linear model (additional eq.)
Coding!
3 unknowns, 3 equations
where $\hat{\lambda}_{ij}$ is the frequency predicted by the model
Political attitudes
314.17 * 1.0040 * 0.9772 = 308.23
314.17 * [1/1.0040] * 0.9772 = 305.78
Political attitudes
exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335
Design matrix: other restrictions on parameters
Saturated log-linear model (SPSS)
Political attitudes
Prediction of counts or frequencies:
A. Effect coding
• 279 = 312.80 * 0.97736 * 1.00185 * 0.91092
• 352 = 312.80 * 1.02316 * 1.00185 * 1.09779
• 335 = 312.80 * 0.97736 * 0.99815 * 1.09779
• 291 = 312.80 * 1.02316 * 0.99815 * 0.91092
B. Contrast coding: GLIM
• 291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting labour)
• 279 = 279 * 1 * 1 * 1 (males voting conservative = ref. cat.)
• 352 = 279 * 1.2616 * 1 * 1 (females voting conservative)
• 335 = 279 * 1 * 1.2007 * 1 (males voting labour)
C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
• 279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894
• 352.5 = 291.5 * 1 * 1.20925 * 1
• 291.5 = 291.5 * 1 * 1 * 1 (females voting labour = ref. cat.)
• 335.5 = 291.5 * 1.15096 * 1 * 1
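The contrast-coded multipliers in B and C follow from ratios of cell counts to the reference cell. The sketch below recomputes them for both choices of reference category; `contrast_multipliers` is a hypothetical helper written for this illustration, and the 0.5 added for the SPSS variant follows the remark in C.

```python
# Observed counts, keyed by (gender, party).
counts = {
    ("male", "conservative"): 279, ("female", "conservative"): 352,
    ("male", "labour"): 335, ("female", "labour"): 291,
}

def contrast_multipliers(cells, ref_gender, ref_party):
    """Multiplicative effects relative to the cell (ref_gender, ref_party)."""
    genders = sorted({g for g, _ in cells})
    parties = sorted({p for _, p in cells})
    other_gender = next(g for g in genders if g != ref_gender)
    other_party = next(p for p in parties if p != ref_party)
    base = cells[(ref_gender, ref_party)]
    gender_effect = cells[(other_gender, ref_party)] / base
    party_effect = cells[(ref_gender, other_party)] / base
    # Interaction: factor needed to reproduce the remaining cell exactly.
    interaction = cells[(other_gender, other_party)] / (
        base * gender_effect * party_effect
    )
    return base, gender_effect, party_effect, interaction

# GLIM-style coding: males voting conservative as reference category.
glim = contrast_multipliers(counts, "male", "conservative")

# SPSS-style coding: 0.5 added to each count, females voting labour as reference.
spss = contrast_multipliers(
    {cell: n + 0.5 for cell, n in counts.items()}, "female", "labour"
)

print("GLIM:", [round(x, 4) for x in glim])
print("SPSS:", [round(x, 5) for x in spss])
```

The small differences in the last decimal of the SPSS multipliers relative to the slide are consistent with rounding in the reported estimates.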
Political attitudes The Poisson probability model with