
Analysis of count data


truda




Presentation Transcript


  1. Analysis of count data: introduction to log-linear models

  2. Log-linear analysis • Contingency-table analysis • Categorical data analysis • Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975) • Analysis of cross-classified data • Multivariate analysis of qualitative data (Goodman, 1978) • Count data analysis

  3. Log-linear model: fit a model to a table of counts (frequencies). Two data sets: • Survey: political attitudes of British electors • Survey: leaving parental home in the Netherlands

  4. Survey: political attitudes of British electors. Source: Payne, C. (1977) The log-linear model for contingency tables. In: C.O. Muircheartaigh and C. Payne (eds.), The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, 'Political change in Britain', Macmillan, 2nd edition, 1974).

  5. Survey: leaving parental home in the Netherlands

  6. Counts are generated by a Poisson process → Poisson distribution

  7. The Poisson probability model. Let N be a random variable representing the number of events during a unit interval and let n be a realisation of N (the count). N is a Poisson r.v. following a Poisson distribution with parameter $\lambda$: $\Pr\{N = n\} = \frac{\lambda^{n} e^{-\lambda}}{n!}$, $n = 0, 1, 2, \ldots$ The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$.
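A minimal sketch of the Poisson probability model, using only the Python standard library. The function name `poisson_pmf`, the value λ = 3 and the truncation of the support at 100 are illustrative assumptions. The code evaluates the probability mass function and checks numerically that the distribution's mean (and variance) equal λ.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """Pr{N = n} = lam**n * exp(-lam) / n! for a Poisson r.v. with parameter lam > 0."""
    # Work on the log scale so that lam**n and n! cannot overflow for large n.
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

lam = 3.0                      # illustrative parameter value
support = range(0, 100)        # truncation point: the remaining tail mass is negligible
probs = [poisson_pmf(n, lam) for n in support]

mean = sum(n * p for n, p in zip(support, probs))
var = sum((n - mean) ** 2 * p for n, p in zip(support, probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))  # 1.0 3.0 3.0
```

The equality of mean and variance printed here is the property derived on the likelihood slides.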

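  8. Likelihood function. Probability mass function: $p(n; \lambda) = \frac{\lambda^{n} e^{-\lambda}}{n!}$. Log-likelihood function: $\ln L(\lambda) = n \ln\lambda - \lambda - \ln n!$. The likelihood equations determine the 'best' value of $\lambda$.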

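  9. Likelihood equations: $\frac{\partial \ln L}{\partial \lambda} = \frac{n}{\lambda} - 1 = 0$. Hence: $\hat{\lambda} = n$. Second derivative: $\frac{\partial^{2} \ln L}{\partial \lambda^{2}} = -\frac{n}{\lambda^{2}}$, so $\mathrm{Var}(\hat{\lambda}) = \left(-E\left[\frac{\partial^{2} \ln L}{\partial \lambda^{2}}\right]\right)^{-1} = \left(\frac{E[N]}{\lambda^{2}}\right)^{-1} = \lambda$. Hence, since $\hat{\lambda} = N$: $\mathrm{Var}(N) = \lambda$.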
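The likelihood equation can also be checked numerically. The sketch below uses a hypothetical helper `poisson_loglik` and an arbitrary observed count n = 7. It runs a crude grid search, which is not how GLIM or SPSS maximise the likelihood. The search confirms that the log-likelihood of a single count peaks at λ = n.

```python
import math

def poisson_loglik(lam: float, n: int) -> float:
    """ln L(lam) = n ln(lam) - lam - ln(n!) for one observed count n."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

n = 7  # illustrative observed count
# Grid search over lam in (0, 20] with step 0.001; a real routine would solve
# the likelihood equation d lnL / d lam = n / lam - 1 = 0 directly.
grid = [k / 1000 for k in range(1, 20001)]
lam_hat = max(grid, key=lambda lam: poisson_loglik(lam, n))
print(lam_hat)  # 7.0
```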

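  10. Let i represent an individual with characteristics $x_i$. The probability of observing $n_i$ events during a unit interval is $\Pr\{N_i = n_i\} = \frac{\lambda_i^{n_i} e^{-\lambda_i}}{n_i!}$, with $\lambda_i = \exp(x_i'\beta)$ or, equivalently, $\ln\lambda_i = x_i'\beta$: the log-linear model.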

  11. The log-linear model The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).

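  12. Log-linear models for two-way tables. Saturated log-linear model: $\ln\mu_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}$, with overall effect $\lambda$ (level), main effects $\lambda_i^{A}$ and $\lambda_j^{B}$ (marginal frequencies) and interaction effect $\lambda_{ij}^{AB}$. A 2 x 2 table has 4 observations but 9 parameters (1 + 2 + 2 + 4), so normalisation constraints are needed.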

  13. Survey: leaving parental home in the Netherlands

  14. Leaving home Descriptive statistics • Counts • Percentages • Odds of leaving home early rather than late Reference category
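The descriptive statistics can be computed from the observed counts in the expected-frequency table (slide 24). Those counts are: females 135 before age 20 and 143 after; males 74 before age 20 and 178 after. The dictionary layout is an illustrative assumption. Choosing males as the reference category for the odds ratio is also an assumption, since the slide does not say which category is the reference.

```python
# Observed counts (sex, time of leaving home), from the table of expected frequencies.
counts = {
    ("female", "<20"): 135,
    ("male", "<20"): 74,
    ("female", ">20"): 143,
    ("male", ">20"): 178,
}

total = sum(counts.values())
percentages = {cell: 100 * n / total for cell, n in counts.items()}

# Odds of leaving home early (<20) rather than late (>20), per sex.
odds = {sex: counts[(sex, "<20")] / counts[(sex, ">20")] for sex in ("female", "male")}
# Odds ratio, females relative to males (assumed reference category).
odds_ratio = odds["female"] / odds["male"]

print(total)                                      # 530
print({k: round(v, 3) for k, v in odds.items()})  # {'female': 0.944, 'male': 0.416}
print(round(odds_ratio, 3))                       # 2.271
```

The log of this odds ratio, ln 2.271 ≈ 0.820, matches the interaction parameter of the saturated model (Model 4).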

  15. Leaving home. Log-linear models for two-way tables: 4 models. Model 1: null model, or overall-effect model. All categories are equiprobable (an observation is equally likely to fall into any cell): $\ln\mu_{ij} = \lambda$ for all i and j. $\hat{\lambda} = 4.887$ (s.e. 0.0434); exp(4.887) = 132.5 = 530/4. Here $\mu_{ij}$ is the expected count (frequency) in cell (i, j): category i of variable A (row) and category j of variable B (column).

  16. Leaving home Where ij is a cell frequency generated by a Poisson process and Var[aX] = a2 Var[X] where a is a constant (e.g. Fingleton, 1984, p. 29) 

  17. Leaving home. Log-linear models for two-way tables. Model 2: B null model. Categories of variable B (sex) are equiprobable within levels of variable A (time): $\ln\mu_{ij} = \lambda + \lambda_i^{A}$ for all j. GLIM output:
     Parameter        Estimate   s.e.      Exp(parameter)
     Overall effect   4.649      0.06914   104.5
     TIME(1)          0.0000     -         -
     TIME(2)          0.4291     0.08886   1.536
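Model 2 also has closed-form estimates: within each level of TIME, the two sex cells share the row total equally. TIME(1) is the reference level (corner-point coding, as in the GLIM output), so the overall effect is ln(209/2) and the TIME(2) effect is ln(321/209). The standard errors in the sketch are an assumption. They use the large-sample variance of a log Poisson total (1/total), summed for a ratio of independent totals. They agree with the GLIM values to within about 0.0001.

```python
import math

# Observed counts: rows = TIME (<20, >20), columns = SEX (female, male).
table = [[135, 74],
         [143, 178]]
row_totals = [sum(row) for row in table]   # [209, 321]

# B null model: ln(mu_ij) = lambda + lambda_i^A; each fitted count is row_total / 2.
overall = math.log(row_totals[0] / 2)             # reference level TIME(1)
time2 = math.log(row_totals[1] / row_totals[0])   # TIME(2) relative to TIME(1)

# Assumed large-sample standard errors for logs of independent Poisson totals.
se_overall = math.sqrt(1 / row_totals[0])
se_time2 = math.sqrt(1 / row_totals[0] + 1 / row_totals[1])

print(round(overall, 3), round(math.exp(overall), 1))   # 4.649 104.5
print(round(time2, 4), round(math.exp(time2), 3))      # 0.4291 1.536
print(round(se_overall, 4), round(se_time2, 4))        # 0.0692 0.0889
```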

  18. Leaving home. Log-linear models for two-way tables. Model 2: B null model. Categories of variable B (sex) are equiprobable within levels of variable A (time): $\ln\mu_{ij} = \lambda + \lambda_i^{A}$ for all j. SPSS output:
     Parameter        Estimate   s.e.     Exp(parameter)
     Overall effect   5.773      0.0558   321.5
     TIME(1)          -0.4283    0.0888   0.6516
     TIME(2)          0.0000     -        -

  19. Leaving home. Log-linear models for two-way tables. Model 3: independence model (unsaturated model). Categories of variable B (sex) are not equiprobable, but the probability is independent of the levels of variable A (time). GLIM output:
     Parameter        Estimate   s.e.     Exp(parameter)
     Overall effect   4.697      0.0806   109.62
     TIME(2)          0.4291     0.0889   1.536
     SEX(2)           -0.09819   0.0870   0.906

  20. Leaving home. Log-linear model: predictions
     Females leaving home early: 109.62
     Females leaving home late: 109.62 * 1.536 = 168.37
     Males leaving home early: 109.62 * 0.906 = 99.37
     Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63
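The independence-model predictions can be reproduced from the margins alone, because the fitted count in cell (i, j) is (row total × column total) / N. The sketch then recovers corner-point parameters, taking females leaving early as the reference cell. It rebuilds the predictions multiplicatively, as on the slide. The variable names are illustrative.

```python
import math

# Observed counts: rows = TIME (<20, >20), columns = SEX (female, male).
table = [[135, 74],
         [143, 178]]
n = sum(map(sum, table))                              # 530
rows = [sum(r) for r in table]                        # [209, 321]
cols = [sum(r[j] for r in table) for j in range(2)]   # [278, 252]

# Independence model: fitted count = row total * column total / N.
fitted = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

# Corner-point (GLIM-style) parameters, reference cell = (TIME(1), SEX(1)).
overall = math.log(fitted[0][0])
time2 = math.log(rows[1] / rows[0])
sex2 = math.log(cols[1] / cols[0])

# Multiplicative reconstruction: fitted_ij = exp(overall) * exp(time effect) * exp(sex effect).
rebuilt = [[math.exp(overall + (time2 if i else 0.0) + (sex2 if j else 0.0))
            for j in range(2)] for i in range(2)]

print([[round(x, 2) for x in r] for r in fitted])
# [[109.63, 99.37], [168.37, 152.63]]
print(round(overall, 3), round(time2, 4), round(sex2, 5))  # 4.697 0.4291 -0.09819
```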

  21. Leaving home (SPSS)
     No.  Parameter        Estimate   SE
     1    Overall effect   5.0280     .0721
     2    Time(1)          -.4291     .0889
     3    Time(2)          .0000      .
     4    Sex(1)           .0982      .0870
     5    Sex(2)           .0000      .

  22. Leaving home. Log-linear models for two-way tables. Model 4: saturated model. The values of the categories of variable B (sex) depend on the levels of variable A (time). GLIM output:
     Parameter        Estimate   s.e.
     Overall effect   4.905      0.08607
     TIME(2)          0.05757    0.1200
     SEX(2)           -0.6012    0.1446
     TIME(2).SEX(2)   0.8201     0.1831

  23. Leaving home (SPSS)
     No.  Parameter          Estimate   SE
     1    Overall effect     5.1846     .0748
     2    Time(1)            -.8738     .1379
     3    Time(2)            .0000      .
     4    Sex(1)             -.2183     .1121
     5    Sex(2)             .0000      .
     6    Time(1) * Sex(1)   .8164      .1827
     7    Time(1) * Sex(2)   .0000      .
     8    Time(2) * Sex(1)   .0000      .
     9    Time(2) * Sex(2)   .0000      .
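The saturated model reproduces the data exactly, so its parameters are simple functions of the log cell counts, and the GLIM and SPSS outputs differ only in coding. The sketch assumes two codings. For GLIM: corner-point coding with the first categories as reference and no adjustment. For SPSS: the last categories as reference, with 0.5 added to every cell. The 0.5 adjustment is stated for SPSS on the political-attitudes prediction slide. The function `saturated_params` is a hypothetical helper.

```python
import math

def saturated_params(table, ref, delta=0.0):
    """Corner-point parameters of ln(mu_ij) = l + l_i^A + l_j^B + l_ij^AB for a 2 x 2 table.

    ref: index (0 or 1) of the reference category for both variables.
    delta: constant added to every cell before taking logs.
    """
    other = 1 - ref
    log = [[math.log(x + delta) for x in row] for row in table]
    overall = log[ref][ref]
    a = log[other][ref] - overall              # non-reference TIME level
    b = log[ref][other] - overall              # non-reference SEX level
    ab = log[other][other] - overall - a - b   # interaction = log odds ratio
    return overall, a, b, ab

# Observed counts: rows = TIME (<20, >20), columns = SEX (female, male).
table = [[135, 74],
         [143, 178]]

glim = saturated_params(table, ref=0)              # TIME(1), SEX(1) as reference
spss = saturated_params(table, ref=1, delta=0.5)   # TIME(2), SEX(2) as reference, +0.5

print([round(x, 4) for x in glim])   # [4.9053, 0.0576, -0.6012, 0.8201]
print([round(x, 4) for x in spss])   # [5.1846, -0.8738, -0.2183, 0.8164]
```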

  24. Leaving home. Log-linear model: predictions (expected frequencies)
     Cell             Observed   Model 1   Model 2   Model 3   Model 4   Model 5
     Fem_<20 (F11)    135        132.50    104.50    139.00    109.63    135.00
     Mal_<20 (F12)    74         132.50    104.50    126.00    99.37     74.00
     Fem_>20 (F21)    143        132.50    160.50    139.00    168.37    143.00
     Mal_>20 (F22)    178        132.50    160.50    126.00    152.63    178.00
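Every column of this table can be reproduced from the observed margins, because all five models have closed-form maximum-likelihood fits. The sketch assumes that the column labelled Model 3 is a SEX-only model, with time categories equiprobable within sex. Its values, 139 and 126, are half the female and male totals, which supports that reading.

```python
# Observed counts: rows = TIME (<20, >20), columns = SEX (female, male).
observed = [[135, 74],
            [143, 178]]
n = sum(map(sum, observed))
rows = [sum(r) for r in observed]
cols = [sum(r[j] for r in observed) for j in range(2)]

# Closed-form fitted counts for each column of the table (2 x 2 case).
models = {
    "Model 1": lambda i, j: n / 4,                  # all cells equiprobable
    "Model 2": lambda i, j: rows[i] / 2,            # TIME effect only (B null model)
    "Model 3": lambda i, j: cols[j] / 2,            # SEX effect only (assumed reading)
    "Model 4": lambda i, j: rows[i] * cols[j] / n,  # independence
    "Model 5": lambda i, j: observed[i][j],         # saturated: fits the data exactly
}

# Same row order as the table: F11, F12, F21, F22.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
for name, fit in models.items():
    print(name, [round(fit(i, j), 2) for i, j in cells])
```

Every model's fitted counts sum to the observed total of 530, a property of maximum-likelihood fits of log-linear models that include an overall effect.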

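  25. Relation between the log-linear model and the Poisson regression model: $\ln\mu_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, where $x_1$ (for i) and $x_2$ (for j) are dummy variables (0 if i or j is equal to 1, and 1 if i or j is equal to 2) and the interaction variable is $x_1 x_2$.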
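The correspondence can be made concrete by fitting the leaving-home table as a Poisson regression on two 0/1 dummies and their product. The sketch uses iteratively reweighted least squares (IRLS), the usual GLM fitting algorithm, here with a log link. The helper names `solve` and `poisson_irls` and the hand-written linear solver are illustrative assumptions; in practice a GLM routine would be used. The saturated fit should match the GLIM output for Model 4. Dropping the product column gives the independence model (Model 3).

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

def poisson_irls(X, y, iters=25):
    """Poisson regression with log link, fitted by iteratively reweighted least squares."""
    mu = [float(v) for v in y]                 # start from the observed counts
    eta = [math.log(v) for v in mu]
    p = len(X[0])
    for _ in range(iters):
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]   # working response
        w = mu                                                  # Poisson weights
        xtwx = [[sum(w[n] * X[n][r] * X[n][c] for n in range(len(y))) for c in range(p)]
                for r in range(p)]
        xtwz = [sum(w[n] * X[n][r] * z[n] for n in range(len(y))) for r in range(p)]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
    return beta

# Cells F11, F12, F21, F22; x1 = 1 for TIME(2), x2 = 1 for SEX(2), x1 * x2 = interaction.
y = [135, 74, 143, 178]
X = [[1, x1, x2, x1 * x2] for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]

beta = poisson_irls(X, y)                                # saturated model
beta_ind = poisson_irls([row[:3] for row in X], y)       # no interaction: independence
print([round(b, 4) for b in beta])       # [4.9053, 0.0576, -0.6012, 0.8201]
print([round(b, 4) for b in beta_ind])   # [4.6971, 0.4291, -0.0982]
```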

  26. Log-linear model: fit a model to a table of frequencies. Data: survey of political attitudes of British electors. Source: Payne, C. (1977) The log-linear model for contingency tables. In: C.O. Muircheartaigh and C. Payne (eds.), The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, 'Political change in Britain', Macmillan, 2nd edition, 1974).

  27. The classical approach: geometric means (Birch, 1963); effect coding (the mean is the reference category). Birch, M.W. (1963) 'Maximum likelihood in three-way contingency tables', J. Royal Stat. Soc. (B), 25: 220-233

  28. Political attitudes. The basic model
     Overall effect: 22.98/4 = 5.7456
     Effect of party: Conservative: 11.49/2 - 5.7456 = 0.0018; Labour: 11.49/2 - 5.7456 = -0.0018
     Effect of gender: Male: 11.44/2 - 5.7456 = -0.0229; Female: 11.54/2 - 5.7456 = 0.0229
     Gender-party interaction effects:
       Male conservative: 5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
       Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
       Male labour: 5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
       Female labour: 5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933
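The classical computations are averages of log counts. The overall effect is the mean of the four log counts, and a main effect is a marginal mean minus the overall effect. The interaction is whatever remains in each cell. The sketch repeats the arithmetic without intermediate rounding, which is why the slide's shorthand sums (11.49, 11.44, 11.54) still give effects correct to four decimals. The cell counts are 279, 352, 335 and 291, as used on the later prediction slides. The dictionary layout is an assumption.

```python
import math

# Observed counts: party x gender (political attitudes of British electors).
counts = {
    ("conservative", "male"): 279,
    ("conservative", "female"): 352,
    ("labour", "male"): 335,
    ("labour", "female"): 291,
}
parties = ("conservative", "labour")
genders = ("male", "female")
log = {cell: math.log(n) for cell, n in counts.items()}

# Effect coding: each set of effects sums to zero over its categories.
overall = sum(log.values()) / 4
party = {p: sum(log[(p, g)] for g in genders) / 2 - overall for p in parties}
gender = {g: sum(log[(p, g)] for p in parties) / 2 - overall for g in genders}
interaction = {(p, g): log[(p, g)] - overall - party[p] - gender[g]
               for p in parties for g in genders}

print(round(overall, 4))                              # 5.7456
print({p: round(v, 4) for p, v in party.items()})
print({g: round(v, 4) for g, v in gender.items()})
print({c: round(v, 4) for c, v in interaction.items()})
```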

  29. Political attitudes. The basic model. Birch, M.W. (1963) 'Maximum likelihood in three-way contingency tables', J. Royal Stat. Soc. (B), 25: 220-233. Coding: effect coding. Parameters are subject to normalisation constraints: $\sum_i \lambda_i^{A} = \sum_j \lambda_j^{B} = 0$ and $\sum_i \lambda_{ij}^{AB} = \sum_j \lambda_{ij}^{AB} = 0$. Only first-order contrasts can be estimated:

  30. Political attitudes The basic model (GLIM) Estimate S.E.

  31. Political attitudes The basic model (SPSS)

  32. Political attitudes The basic model (1) ln 11 = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312 ln 12 = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636 ln 21 = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142 ln 22 = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734

  33. The design-matrix approach

  34. Design matrix: unsaturated log-linear model. The number of parameters exceeds the number of equations → need for additional equations. $X'X$ is singular, so $(X'X)^{-1}$ does not exist → identify the linear dependencies.
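The linear dependency is easy to exhibit. Suppose the independence model for a 2 × 2 table is coded with one dummy per category. The design matrix then has five columns (constant, two dummies for A, two for B) but rank 3, because each pair of dummies sums to the constant column. The sketch computes ranks with exact rational arithmetic. The column layout is an assumed illustration of this full-dummy coding, and the `rank` helper is hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column: it is dependent
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Columns: constant, A1, A2, B1, B2 (one dummy per category); rows: cells 11, 12, 21, 22.
X = [[1, 1, 0, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 0, 1, 1, 0],
     [1, 0, 1, 0, 1]]

# X'X is 5 x 5 but has the same rank as X, so it is singular and has no inverse.
XtX = [[sum(X[n][i] * X[n][j] for n in range(4)) for j in range(5)] for i in range(5)]
print(len(X[0]), rank(X), rank(XtX))   # 5 3 3
```

Two additional equations (constraints) are therefore needed before the remaining three parameters can be solved for.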

  35. Design matrix: unsaturated log-linear model (additional equations). Coding!

  36. 3 unknowns  3 equations where is the frequency predicted by the model

  37. Political attitudes

  38. Political attitudes  314.17*1.0040*0.9772 = 308.23 314.17*[1/1.0040]*0.9772 = 305.78

  39. Design matrix: saturated log-linear model

  40. Political attitudes
     exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
     exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335

  41. Political attitudes

  42. Design matrix: other restrictions on parameters; saturated log-linear model (SPSS)

  43. Political attitudes

  44. Political attitudes

  45. Political attitudes

  46. Political attitudes

  47. Political attitudes. Prediction of counts or frequencies:
     A. Effect coding
       279 = 312.80 * 0.97736 * 1.00185 * 0.91092
       352 = 312.80 * 1.02316 * 1.00185 * 1.09779
       335 = 312.80 * 0.97736 * 0.99815 * 1.09779
       291 = 312.80 * 1.02316 * 0.99815 * 0.91092
     B. Contrast coding: GLIM
       • 291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting labour)
       • 279 = 279 * 1 * 1 * 1 (males voting conservative = ref. cat.)
       • 352 = 279 * 1.2616 * 1 * 1 (females voting conservative)
       • 335 = 279 * 1 * 1.2007 * 1 (males voting labour)
     C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
       279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894
       352.5 = 291.5 * 1 * 1.20925 * 1
       291.5 = 291.5 * 1 * 1 * 1 (females voting labour = ref. cat.)
       335.5 = 291.5 * 1.15096 * 1 * 1
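All three sets of multiplicative parameters can be derived directly from the four counts. The sketch covers three codings:

- Effect coding, using geometric means.
- GLIM contrast coding, with males voting conservative as reference.
- SPSS contrast coding, with females voting labour as reference and 0.5 added to each cell, as the slide states.

The helpers `effect_factors` and `contrast_factors` are hypothetical. The computed SPSS factors agree with the slide to about four significant figures.

```python
import math

# Observed counts: rows = party (conservative, labour), columns = gender (male, female).
table = [[279, 352],
         [335, 291]]

def contrast_factors(t, ref_row, ref_col, delta=0.0):
    """Multiplicative parameters under contrast (reference-cell) coding for a 2 x 2 table."""
    c = [[x + delta for x in row] for row in t]
    r2, c2 = 1 - ref_row, 1 - ref_col
    overall = c[ref_row][ref_col]
    party = c[r2][ref_col] / overall      # factor for the non-reference party
    gender = c[ref_row][c2] / overall     # factor for the non-reference gender
    inter = c[r2][c2] * overall / (c[r2][ref_col] * c[ref_row][c2])  # odds ratio
    return overall, party, gender, inter

def effect_factors(t):
    """Multiplicative parameters under effect coding (geometric means of the counts)."""
    lg = [[math.log(x) for x in row] for row in t]
    mean = sum(map(sum, lg)) / 4
    party = (lg[0][0] + lg[0][1]) / 2 - mean     # conservative (labour: minus this)
    gender = (lg[0][0] + lg[1][0]) / 2 - mean    # male (female: minus this)
    inter = lg[0][0] - mean - party - gender     # conservative x male
    return [math.exp(v) for v in (mean, party, gender, inter)]

tau, t_party, t_gender, t_inter = effect_factors(table)
# Males voting conservative: the product of the four factors gives back 279.
print(round(tau * t_party * t_gender * t_inter, 6))    # 279.0

glim = contrast_factors(table, ref_row=0, ref_col=0)             # ref: males voting conservative
spss = contrast_factors(table, ref_row=1, ref_col=1, delta=0.5)  # ref: females voting labour
print([round(x, 4) for x in glim])    # [279.0, 1.2007, 1.2616, 0.6885]
print([round(x, 5) for x in spss])
```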

  48. The Poisson regression model

  49. Political attitudes The Poisson probability model with
