
Discrete Multivariate Analysis


hanna-wolf




Presentation Transcript


  1. Discrete Multivariate Analysis: Analysis of Multivariate Categorical Data

  2. References • Fienberg, S. (1980), The Analysis of Cross-Classified Categorical Data, MIT Press, Cambridge, Mass. • Fingleton, B. (1984), Models of Category Counts, Cambridge University Press. • Agresti, A. (1990), Categorical Data Analysis, Wiley, New York.

  3. Log-Linear Model

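  4. Two-way table: $\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}$, where $m_{ij}$ is the expected frequency in cell $(i,j)$ and $\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0$. Note: X and Y are independent if $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for all $i, j$. In this case $u_{12(ij)} = 0$ and the log-linear model becomes $\ln m_{ij} = u + u_{1(i)} + u_{2(j)}$.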
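A minimal Python sketch of the two-way model, assuming Fienberg's u-term notation $\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}$ with sum-to-zero constraints on every u-term. Under those constraints each term is an average of $\ln m_{ij}$: $u$ is the grand mean, $u_{1(i)}$ and $u_{2(j)}$ are row and column mean deviations, and $u_{12(ij)}$ is what remains. The function name `two_way_u_terms` is hypothetical. For a table that factors as $N p_i q_j$ (independence), every interaction term comes out zero up to rounding.

```python
import math

def two_way_u_terms(m):
    """Decompose ln m[i][j] = u + u1[i] + u2[j] + u12[i][j].

    Assumes all m[i][j] > 0 and sum-to-zero constraints on the u-terms, so
    u is the grand mean of ln m, u1/u2 are row/column mean deviations from u,
    and u12 is the remaining interaction.
    """
    I, J = len(m), len(m[0])
    lm = [[math.log(v) for v in row] for row in m]
    u = sum(map(sum, lm)) / (I * J)
    u1 = [sum(lm[i]) / J - u for i in range(I)]
    u2 = [sum(lm[i][j] for i in range(I)) / I - u for j in range(J)]
    u12 = [[lm[i][j] - u - u1[i] - u2[j] for j in range(J)] for i in range(I)]
    return u, u1, u2, u12

# Independence: m_ij = N * p_i * q_j with N = 100, p = (0.3, 0.7), q = (0.2, 0.5, 0.3)
indep = [[6, 15, 9], [14, 35, 21]]
_, _, _, u12 = two_way_u_terms(indep)
print(max(abs(v) for row in u12 for v in row))  # prints a value within rounding error of 0
```

For a 2 × 2 table the interaction reduces to a quarter of the log cross-product ratio, $u_{12(11)} = \tfrac14 \ln\frac{m_{11}m_{22}}{m_{12}m_{21}}$, which is zero exactly when X and Y are independent.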

  5. Three-way Frequency Tables

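  6. Log-Linear model for three-way tables. Let $m_{ijk}$ denote the expected frequency in cell $(i,j,k)$ of the table; then in general $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}$, where each u-term sums to zero over each of its subscripts (e.g. $\sum_i u_{1(i)} = 0$ and $\sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0$).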

  7. Hierarchical Log-linear models for categorical data. For three-way tables, the hierarchical principle: if an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.
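The hierarchical principle can be checked mechanically. In this sketch (helper names hypothetical), a model is written in the bracket notation used later in this deck, e.g. [BH][CH] becomes `["BH", "CH"]`. `expand` lists every u-term the brackets imply, and `is_hierarchical` tests whether each lower-order term of every included term is also present.

```python
from itertools import combinations

def expand(generators):
    """Expand bracket notation, e.g. ["BH", "CH"] for [BH][CH], into the set of
    all u-terms it implies (each term is a frozenset of variable letters)."""
    terms = set()
    for g in generators:
        for r in range(1, len(g) + 1):
            terms.update(frozenset(s) for s in combinations(sorted(g), r))
    return terms

def is_hierarchical(terms):
    """True if every lower-order term contained in a model term is also present."""
    terms = {frozenset(t) for t in terms}
    for t in terms:
        for r in range(1, len(t)):
            for sub in combinations(sorted(t), r):
                if frozenset(sub) not in terms:
                    return False
    return True

print(sorted("".join(sorted(t)) for t in expand(["BH", "CH"])))
# ['B', 'BH', 'C', 'CH', 'H']
print(is_hierarchical([{"B"}, {"B", "C"}]))  # False: main effect C is missing
```

Because bracket notation lists only the highest-order terms, any model written this way is hierarchical by construction; the check matters when terms are listed one by one.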

  8. Hierarchical Log-linear models for three-way tables

  9. Maximum Likelihood Estimation for the Log-Linear Model

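  10. For any model it is possible to determine the maximum likelihood estimators of the parameters. Example: two-way table, independence, multinomial model $P(\{x_{ij}\}) = \frac{N!}{\prod_{i,j} x_{ij}!}\prod_{i,j}\pi_{ij}^{x_{ij}}$, where $x_{ij}$ are the observed counts and $\pi_{ij} = m_{ij}/N$.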

  11. Log-likelihood where With the model of independence

  12. and with also

  13. Let Now

  14. Since

  15. Now or

  16. Hence and Similarly Finally

  17. Hence Now and

  18. Hence Note or
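For the independence model the derivation leads to the well-known closed form $\hat m_{ij} = x_{i\cdot}\,x_{\cdot j}/N$ (row total times column total over the grand total), where $x_{ij}$ denotes the observed count in cell $(i,j)$. A minimal Python sketch, with a hypothetical function name:

```python
def fit_independence(x):
    """ML fitted frequencies for independence in a two-way table of counts x:
    m_hat[i][j] = (row total i) * (column total j) / N."""
    N = sum(map(sum, x))
    row = [sum(r) for r in x]
    col = [sum(c) for c in zip(*x)]
    return [[ri * cj / N for cj in col] for ri in row]

x = [[10, 20], [30, 40]]
print(fit_independence(x))  # [[12.0, 18.0], [28.0, 42.0]]
```

The fitted row and column totals reproduce the observed ones, which is the defining property of the ML fit for this model.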

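  19. Comments • Maximum likelihood estimates can be computed for any hierarchical log-linear model, including models for tables with more than two variables. • In certain situations the likelihood equations have no closed-form solution and must be solved numerically. • For the saturated model (all interactions and main effects), the fitted frequencies equal the observed frequencies.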
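The slide does not name a numerical method. One standard choice is iterative proportional fitting (IPF), sketched here for the no-three-factor-interaction model [12][13][23] (the analogue of [BC][BH][CH]), which has no closed-form estimates. Starting from a flat table, the fitted values are rescaled in turn so that each fitted two-way margin matches the observed one, and the cycle repeats until the margins stop changing. This is a minimal illustration under assumptions (counts stored as a nested list `x[i][j][k]`, hypothetical function names, a fixed tolerance), not the program used to produce the output in this deck.

```python
def _scale(target, current):
    # Rescaling factor for one margin cell; a zero fitted margin stays at zero.
    return target / current if current > 0 else 0.0

def ipf_no_three_factor(x, tol=1e-8, max_iter=1000):
    """Fit [12][13][23] to a three-way table x[i][j][k] by iterative
    proportional fitting: cyclically rescale the fitted table so that its
    12, 13 and 23 margins match the observed margins."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    rI, rJ, rK = range(I), range(J), range(K)
    x12 = [[sum(x[i][j][k] for k in rK) for j in rJ] for i in rI]
    x13 = [[sum(x[i][j][k] for j in rJ) for k in rK] for i in rI]
    x23 = [[sum(x[i][j][k] for i in rI) for k in rK] for j in rJ]
    m = [[[1.0] * K for _ in rJ] for _ in rI]  # flat starting table
    for _ in range(max_iter):
        for i in rI:  # match the 12 margin
            for j in rJ:
                f = _scale(x12[i][j], sum(m[i][j][k] for k in rK))
                for k in rK:
                    m[i][j][k] *= f
        for i in rI:  # match the 13 margin
            for k in rK:
                f = _scale(x13[i][k], sum(m[i][j][k] for j in rJ))
                for j in rJ:
                    m[i][j][k] *= f
        for j in rJ:  # match the 23 margin
            for k in rK:
                f = _scale(x23[j][k], sum(m[i][j][k] for i in rI))
                for i in rI:
                    m[i][j][k] *= f
        # The 23 margin now fits exactly; stop once the 12 and 13 margins do too.
        err = max(
            max(abs(sum(m[i][j][k] for k in rK) - x12[i][j]) for i in rI for j in rJ),
            max(abs(sum(m[i][j][k] for j in rJ) - x13[i][k]) for i in rI for k in rK),
        )
        if err < tol:
            break
    return m
```

Each rescaling multiplies by a factor that depends on only two indices, so a flat start keeps the three-factor term $u_{123}$ at zero throughout, which is exactly the constraint this model imposes.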

  20. Goodness of Fit Statistics These statistics can be used to check whether a log-linear model fits the observed frequency table.

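  21. Goodness of Fit Statistics The chi-squared (Pearson) statistic: $X^2 = \sum \frac{(x - \hat m)^2}{\hat m}$. The likelihood ratio statistic: $G^2 = 2\sum x \ln\frac{x}{\hat m}$, where the sums run over all cells and $x$ and $\hat m$ are the observed and fitted frequencies in a cell. d.f. = # cells - # parameters fitted. We reject the model if $X^2$ or $G^2$ is greater than $\chi^2_{\alpha}$, the upper $\alpha$ critical value of the chi-squared distribution with that many d.f.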
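A minimal Python sketch of both statistics and of the chi-squared upper-tail probability used to judge them. It assumes cells are passed as flat lists of observed and fitted frequencies (function names hypothetical) and uses the classical closed forms of the chi-squared survival function for integer d.f., so no statistics library is needed. As a rough check, `chi2_sf(8.08, 9)` lands close to the 0.5265 reported for $G^2 = 8.08$ on 9 d.f. in the example that follows; the small gap comes from $G^2$ being printed to two decimals.

```python
import math

def gof_statistics(observed, fitted):
    """Pearson X^2 and likelihood-ratio G^2 for flat lists of cell frequencies."""
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, fitted))
    # A cell with o = 0 contributes 0 to G^2 (the limit of o*ln(o/e) as o -> 0).
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, fitted) if o > 0)
    return x2, g2

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x) for integer df >= 1,
    using the closed forms for even and odd df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k  # (x/2)^k / k!
            total += term
        return math.exp(-x / 2) * total
    term, total = math.sqrt(x), 0.0  # r = 1 term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)  # next term: x^(r+1/2) / (1*3*...*(2r+1))
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(x / 2)) + 2 * normal_pdf * total

# Two-way independence fit: 4 cells - 3 parameters = 1 d.f.
x2, g2 = gof_statistics([10, 20, 30, 40], [12, 18, 28, 42])
print(x2, g2, chi2_sf(g2, 1))
```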

  22. Example: Variables • Systolic Blood Pressure (B) • Serum Cholesterol (C) • Coronary Heart Disease (H)

  23. Goodness-of-fit testing of models
      MODEL        DF  LIKELIHOOD-RATIO CHISQ  PROB.   PEARSON CHISQ  PROB.
      B,C,H.       24   83.15                  0.0000  102.00         0.0000
      B,CH.        21   51.23                  0.0002   56.89         0.0000
      C,BH.        21   59.59                  0.0000   60.43         0.0000
      H,BC.        15   58.73                  0.0000   64.78         0.0000
      BC,BH.       12   35.16                  0.0004   33.76         0.0007
      BH,CH.       18   27.67                  0.0673   26.58         0.0872  n.s.
      CH,BC.       12   26.80                  0.0082   33.18         0.0009
      BC,BH,CH.     9    8.08                  0.5265    6.56         0.6824  n.s.
      Possible models:
      1. [BH][CH] – B and C independent given H.
      2. [BC][BH][CH] – all two-factor interaction model.
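The DF column follows d.f. = # cells - # parameters fitted. The numbers of levels are not stated on the slide, but they can be inferred from the DF column: adding BC to B,C,H costs 9 d.f. (24 to 15) and adding BH or CH costs 3 each, so $(b-1)(c-1) = 9$, $(b-1)(h-1) = 3$ and $(c-1)(h-1) = 3$, giving 4, 4 and 2 levels and 32 cells. The sketch below (names hypothetical, levels inferred as described) counts one constant plus $\prod(\text{levels} - 1)$ free parameters per u-term and reproduces the column.

```python
from itertools import combinations
from math import prod

# Numbers of levels inferred from the DF column (an assumption, not stated on the slide).
LEVELS = {"B": 4, "C": 4, "H": 2}

def model_df(generators, levels):
    """Residual d.f. = # cells - # parameters for a hierarchical model given by
    its generators, e.g. ["BH", "CH"] for [BH][CH]."""
    terms = set()
    for g in generators:
        for r in range(1, len(g) + 1):
            terms.update(frozenset(s) for s in combinations(g, r))
    # One constant plus prod(levels - 1) free parameters for each u-term.
    n_params = 1 + sum(prod(levels[v] - 1 for v in t) for t in terms)
    return prod(levels.values()) - n_params

for gens in (["B", "C", "H"], ["BH", "CH"], ["BC", "BH", "CH"]):
    print("".join(f"[{g}]" for g in gens), model_df(gens, LEVELS))
# [B][C][H] 24
# [BH][CH] 18
# [BC][BH][CH] 9
```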

  24. Model 1: [BH][CH] Log-linear parameters Heart Disease - Blood Pressure Interaction
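Unlike [BC][BH][CH], the model [BH][CH] has closed-form ML estimates: within each level of H, the fitted B × C table is the independence fit, $\hat m_{bch} = x_{b+h}\,x_{+ch}/x_{++h}$, where $x_{bch}$ are the observed counts and "+" marks summation over an index. A minimal sketch, assuming counts stored as `x[b][c][h]` (hypothetical layout and function name) and a positive total at every level of H:

```python
def fit_bh_ch(x):
    """ML fitted frequencies for [BH][CH] (B and C independent given H) in a
    table x[b][c][h]:  m_hat[b][c][h] = x[b+h] * x[+ch] / x[++h]."""
    B, C, H = len(x), len(x[0]), len(x[0][0])
    x_bh = [[sum(x[b][c][h] for c in range(C)) for h in range(H)] for b in range(B)]
    x_ch = [[sum(x[b][c][h] for b in range(B)) for h in range(H)] for c in range(C)]
    x_h = [sum(x_bh[b][h] for b in range(B)) for h in range(H)]
    return [[[x_bh[b][h] * x_ch[c][h] / x_h[h] for h in range(H)]
             for c in range(C)] for b in range(B)]
```

The same formula covers [Apt,Ed][Occ,Ed] in the second example, with Education playing the role of H.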

  25. Multiplicative effect Log-Linear Model

  26. Heart Disease - Cholesterol Interaction

  27. Multiplicative effect

  28. Model 2: [BC][BH][CH] Log-linear parameters Blood pressure-Cholesterol interaction:

  29. Multiplicative effect

  30. Heart Disease - Blood Pressure Interaction

  31. Multiplicative effect

  32. Heart Disease - Cholesterol Interaction

  33. Multiplicative effect

  34. Another Example In this study, the following were recorded for N = 4353 males: • Occupation category • Educational Level • Academic Aptitude

  35. • Occupation categories • Self-employed Business • Teacher/Education • Self-employed Professional • Salaried Employed • Education levels • Low • Low/Med • Med • High/Med • High

  36. Academic Aptitude • Low • Low/Med • High/Med • High

  37. It is common to handle a multiway table by testing for independence in all two-way tables. • This is similar to looking at all the bivariate correlations. • In this example we learn that: • Education is related to Aptitude • Education is related to Occupational category • Occupational category is related to Aptitude Can we do better than this?
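The pairwise approach can be sketched as follows: collapse the three-way table onto each pair of variables and compute $G^2$ for independence in each two-way margin, with $(r-1)(c-1)$ d.f. The `x[i][j][k]` layout, the function names and the counts in the usage lines are hypothetical. Each marginal test ignores the third variable entirely, which is the limitation the log-linear models on the following slides address.

```python
import math

def two_way_margin(x, keep):
    """Collapse a three-way table x[i][j][k] onto the two axes in `keep`, e.g. (0, 2)."""
    dims = (len(x), len(x[0]), len(x[0][0]))
    a, b = keep
    out = [[0.0] * dims[b] for _ in range(dims[a])]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                out[idx[a]][idx[b]] += x[i][j][k]
    return out

def g2_independence(t):
    """G^2 and d.f. for the independence model in a two-way table t."""
    N = sum(map(sum, t))
    row = [sum(r) for r in t]
    col = [sum(c) for c in zip(*t)]
    g2 = 0.0
    for i, r in enumerate(t):
        for j, o in enumerate(r):
            if o > 0:  # empty cells contribute 0
                g2 += 2 * o * math.log(o / (row[i] * col[j] / N))
    return g2, (len(row) - 1) * (len(col) - 1)

x = [[[10, 20], [30, 40]], [[15, 25], [35, 5]]]  # made-up counts
for pair in [(0, 1), (0, 2), (1, 2)]:
    print(pair, g2_independence(two_way_margin(x, pair)))
```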

  38. Fitting various log-linear models The simplest model that fits is: [Apt,Ed][Occ,Ed] This model implies conditional independence between Aptitude and Occupation given Education.

  39. Log-linear Parameters Aptitude – Education Interaction

  40. Aptitude – Education Interaction (Multiplicative)

  41. Occupation – Education Interaction

  42. Occupation – Education Interaction (Multiplicative)
