The Multigraph for Loglinear Models
Harry Khamis
Statistical Consulting Center
Wright State University
Dayton, Ohio, USA

OUTLINE
1. LOGLINEAR MODEL (LLM)
   - two-way table
   - three-way table
   - examples
2. MULTIGRAPH
   - construction
   - maximum spanning tree
   - conditional independencies
   - collapsibility
3. EXAMPLES

Loglinear Model
Goal: Identify the structure of associations among a set of categorical variables.

LLM: two variables

                          Y
          1     2     3     …    J     Total
---------------------------------------------
     1    n11   n12   n13   …    n1J   n1+
     2    n21   n22   n23   …    n2J   n2+
X    .    .     .     .          .     .
     .    .     .     .          .     .
     I    nI1   nI2   nI3   …    nIJ   nI+
---------------------------------------------
Total     n+1   n+2   n+3   …    n+J   n

LLM: two variables
Example: Survey of High School Seniors in Dayton, Ohio
Collaboration: WSU Boonshoft School of Medicine and United Health Services of Dayton

                         Marijuana Use?
                         Yes      No     Total
----------------------------------------------
Cigarette Use?   Yes     914     581      1495
                 No       46     735       781
----------------------------------------------
                 Total   960    1316      2276

LLM: two variables
Two discrete variables, X and Y
Model of independence: generating class is [X][Y]
LLM: two variables LLM of independence:
LLM: two variables Saturated LLM: generating class is [XY]:
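The two models named on the preceding slides can be written as follows. This is a sketch in one common notation; the symbol m_ij for the expected cell count and the λ terms are assumptions, since the slides' own notation is not shown here.

```latex
\begin{align*}
\text{[X][Y]:}\quad & \log m_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} \\
\text{[XY]:}\quad   & \log m_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}
\end{align*}
```

The saturated model adds one interaction term per cell, so it reproduces the observed table exactly.
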
LLM: two variables

                        Generating   Probabilistic
Interpretation          Class        Model
----------------------------------------------------
X and Y independent     [X][Y]       pij = pi+ p+j
X and Y dependent       [XY]         pij

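As a minimal sketch of the independence model pij = pi+ p+j, the code below computes the expected counts it implies for the cigarette-by-marijuana table and the likelihood-ratio statistic G2. The function name `independence_fit` is hypothetical, chosen for illustration only.

```python
import math

# Observed counts from the cigarette-by-marijuana table
# (rows: cigarette Yes/No, columns: marijuana Yes/No).
observed = [[914, 581], [46, 735]]

def independence_fit(table):
    """Return expected counts n * p_i+ * p_+j and the G^2 statistic."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    # G^2 = 2 * sum of observed * log(observed / expected)
    g2 = 2 * sum(o * math.log(o / e)
                 for o_row, e_row in zip(table, expected)
                 for o, e in zip(o_row, e_row) if o > 0)
    return expected, g2

expected, g2 = independence_fit(observed)
```

A large G2 relative to its single degree of freedom indicates that the independence model [X][Y] fits these data poorly.
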
LLM: three variables
Example: Dayton High School Data

Alcohol   Cigarette     Marijuana Use
Use       Use           Yes       No
--------------------------------------
Yes       Yes           911      538
          No             44      456
No        Yes             3       43
          No              2      279

LLM: three variables
Saturated LLM, [XYZ]:
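For reference, the saturated three-variable model can be written as below. This is a sketch in one common notation; m_ijk (expected cell count) and the λ terms are assumed symbols, not taken from the slide.

```latex
\begin{equation*}
\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z}
  + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}
  + \lambda_{ijk}^{XYZ}
\end{equation*}
```

Each simpler generating class corresponds to setting some of these interaction terms to zero.
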
LLM: three variables

                             Generating       Probabilistic
Interpretation               Class            Model
------------------------------------------------------------------
mutual independence          [X][Y][Z]        pijk = pi++ p+j+ p++k
joint independence           [XZ][Y]          pijk = pi+k p+j+
conditional independence     [XY][XZ]         pijk = pij+ pi+k / pi++
homogeneous association*     [XY][XZ][YZ]     *
saturated model              [XYZ]            pijk

*nondecomposable model

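The closed-form expression for the conditional independence model can be applied directly to counts. The sketch below does this for the Dayton three-way table, taking X = alcohol, Y = cigarette, Z = marijuana. That assignment and the function name `fit_conditional_independence` are illustrative assumptions, not a claim that this model is the best fit.

```python
# Dayton counts indexed as n[x][y][z] with x = alcohol, y = cigarette,
# z = marijuana; index 0 = Yes, 1 = No.
n = [[[911, 538], [44, 456]],
     [[3, 43], [2, 279]]]

def fit_conditional_independence(n):
    """Fitted counts for [XY][XZ]: m_ijk = n_ij+ * n_i+k / n_i++."""
    fitted = []
    for layer in n:                                  # one level i of X
        n_ij = [sum(row) for row in layer]           # n_ij+
        n_ik = [sum(col) for col in zip(*layer)]     # n_i+k
        n_i = sum(n_ij)                              # n_i++
        fitted.append([[a * b / n_i for b in n_ik] for a in n_ij])
    return fitted

m = fit_conditional_independence(n)
```

The fitted counts preserve the XY and XZ margins of the observed table, which is what the generating class [XY][XZ] requires.
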
Decomposable LLMs
• closed-form expression for MLEs
• closed-form expression for asymptotic variances (Lee, 1977)
• conditional G2 statistic simplifies
• allow for causal interpretations
• easier to interpret the LLM

3 Categorical Variables: X, Y, and Z
If [X⊗Y] and [Y⊗Z], then [X⊗Z]: FALSE!
LLM: three variables

                             Generating       Probabilistic
Interpretation               Class            Model
------------------------------------------------------------------
mutual independence          [X][Y][Z]        pijk = pi++ p+j+ p++k
joint independence           [XZ][Y]          pijk = pi+k p+j+
conditional independence     [XY][XZ]         pijk = pij+ pi+k / pi++
homogeneous association      [XY][XZ][YZ]     pijk = ψij φik ωjk
saturated model              [XYZ]            pijk

3 Categorical Variables: X, Y, and Z
If [Y⊗Z] for all X = 1, 2, …, then [Y⊗Z]: FALSE!
(Conditional independence at every level of X does not imply marginal independence.)
LLM: three variables

                             Generating       Probabilistic
Interpretation               Class            Model
------------------------------------------------------------------
mutual independence          [X][Y][Z]        pijk = pi++ p+j+ p++k
joint independence           [XZ][Y]          pijk = pi+k p+j+
conditional independence     [XY][XZ]         pijk = pij+ pi+k / pi++
homogeneous association      [XY][XZ][YZ]     pijk = ψij φik ωjk
saturated model              [XYZ]            pijk

3 Categorical Variables: X, Y, and Z
If [Y⊗Z], then [Y⊗Z] for all X = 1, 2, 3, …: FALSE!
(Marginal independence does not imply conditional independence at each level of X.)
Which Treatment is Better?

                     TRIAL 1: CURED?                TRIAL 2: CURED?
                     Yes          No     Total      Yes          No     Total
------------------------------------------------------------------------------
TREATMENT   A         40 (.20)   160      200        85 (.85)    15      100
            B         30 (.15)   170      200       300 (.75)   100      400

Combine TRIALS 1 and 2:

                     CURED?
                     Yes          No     Total
-----------------------------------------------
TREATMENT   A        125 (.42)   175      300
            B        330 (.55)   270      600

“Ask Marilyn”, PARADE section, DDN, pages 6-7, April 28, 1996

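The reversal in the tables above can be checked with a few lines of arithmetic. This sketch recomputes the cure rates per trial and for the combined data; the variable names are illustrative.

```python
# (cured, total) for each treatment in each trial, from the tables above.
trials = {
    "Trial 1": {"A": (40, 200), "B": (30, 200)},
    "Trial 2": {"A": (85, 100), "B": (300, 400)},
}

# Cure rate within each trial.
per_trial = {
    trial: {trt: cured / total for trt, (cured, total) in arms.items()}
    for trial, arms in trials.items()
}

# Cure rate after pooling the two trials.
combined = {}
for trt in ("A", "B"):
    cured = sum(trials[t][trt][0] for t in trials)
    total = sum(trials[t][trt][1] for t in trials)
    combined[trt] = cured / total
```

Treatment A has the higher cure rate within each trial, yet B has the higher rate in the pooled table, because the two treatments were allocated very unevenly across trials with different baseline cure rates.
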
Florida Homicide Convictions Resulting in Death Penalty
ML Radelet and GL Pierce, Florida Law Review 43: 1-34, 1991

                              Death Penalty
                              Yes          No
----------------------------------------------
Defendant’s Race   White      53 (0.11)   430
                   Black      15 (0.08)   176

                              White Victim            Black Victim
                              Death Penalty           Death Penalty
                              Yes          No         Yes          No
------------------------------------------------------------------------
Defendant’s Race   White      53 (0.11)   414          0 (0.00)    16
                   Black      11 (0.23)    37          4 (0.03)   139

Multigraph Representation of LLMs
Vertices = generators of the LLM
Multiedges = edges joining two vertices, equal in number to the number of indices shared by those two vertices

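Multigraph: three variables
[XY][XZ]
Vertices: XY, XZ (one edge, for the shared index X)

Examples of Multigraphs
[AS][ACR][MCS][MAC]
Vertices: AS, ACR, MAC, MCS (figure: multigraph on these vertices)

Examples of Multigraphs
[ABCD][ACE][BCG][CDF]
Vertices: CDF, ABCD, ACE, BCG (figure: multigraph on these vertices)
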
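The construction rule (one vertex per generator, as many edges as shared indices) can be sketched in a few lines. The function name `multigraph` is hypothetical, and the code assumes each factor is a single letter, as in the examples above.

```python
from itertools import combinations

def multigraph(generators):
    """Map each pair of generators to the number of indices they share.

    Pairs with no shared index are omitted (no edge joins them).
    Assumes every factor is written as a single character.
    """
    edges = {}
    for g, h in combinations(generators, 2):
        shared = set(g) & set(h)
        if shared:
            edges[(g, h)] = len(shared)
    return edges

edges = multigraph(["AS", "ACR", "MCS", "MAC"])
```

For this generating class, `edges[("ACR", "MAC")]` is 2 because ACR and MAC share the indices A and C, while `edges[("AS", "ACR")]` is 1 because they share only A.
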
Maximum Spanning Tree
The maximum spanning tree of a multigraph M:
• is a tree (a connected graph with no circuits)
• includes each vertex
• has the maximum possible sum of edges (counting multiedges) over its branches

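Examples of maximum spanning trees
[XY][XZ]
Vertices: XY, XZ (figure: maximum spanning tree)

Examples of maximum spanning trees
[AS][ACR][MCS][MAC]
Vertices: AS, ACR, MAC, MCS (figure: maximum spanning tree)

Examples of maximum spanning trees
[ABCD][ACE][BCG][CDF]
Vertices: CDF, ABCD, ACE, BCG (figure: maximum spanning tree)
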
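One way to find a maximum spanning tree is Kruskal's algorithm run on the edge multiplicities, taking the heaviest edges first. The sketch below uses that algorithm; the slides do not say which algorithm is intended, and the function name `max_spanning_tree` and the single-letter factor convention are assumptions.

```python
from itertools import combinations

def max_spanning_tree(generators):
    """Kruskal's algorithm on edge multiplicities, largest first.

    Returns a list of branches (g, h, multiplicity). If the multigraph
    is disconnected, the result is a spanning forest.
    """
    edges = [(len(set(g) & set(h)), g, h)
             for g, h in combinations(generators, 2)
             if set(g) & set(h)]
    edges.sort(key=lambda e: -e[0])
    parent = {g: g for g in generators}

    def find(v):
        # Follow parent links to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, g, h in edges:
        rg, rh = find(g), find(h)
        if rg != rh:              # adding this branch creates no circuit
            parent[rg] = rh
            tree.append((g, h, w))
    return tree

tree = max_spanning_tree(["AS", "ACR", "MCS", "MAC"])
total = sum(w for _, _, w in tree)
```

A maximum spanning tree need not be unique. In this example AS shares exactly one index with each other vertex, so it can be attached to any of them without changing the total.
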
Fundamental Conditional Independencies for a Decomposable LLM
1. Let S be the set of indices in a branch of the maximum spanning tree.
2. Remove each factor of S from the multigraph, M; the resulting multigraph is M/S.
3. An FCI is determined as [C1 ⊗ C2 ⊗ … ⊗ Ck | S], where C1, C2, …, Ck are the sets of factors in the components of M/S.

FCIs
[XY][XZ]
M: vertices XY, XZ
S = {X}
M/S: vertices Y, Z
FCI: [Y⊗Z|X]

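The removal step can be sketched as follows: delete the factors in S from every generator, then group the remaining vertices into connected components. The function name `components_after_removal` is hypothetical, and factors are assumed to be single letters.

```python
def components_after_removal(generators, s):
    """Remove the factors in s from every generator, then return the
    factor sets of the connected components of the reduced multigraph."""
    reduced = [set(g) - set(s) for g in generators]
    reduced = [g for g in reduced if g]        # drop emptied vertices
    components = []
    for g in reduced:
        merged = set(g)
        rest = []
        for c in components:
            if c & merged:                     # shares a factor: same component
                merged |= c
            else:
                rest.append(c)
        components = rest + [merged]
    return components

parts = components_after_removal(["XY", "XZ"], "X")
```

For [XY][XZ] with S = {X}, the two components are {Y} and {Z}, which gives the FCI [Y⊗Z|X] shown on the slide.
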
Collapsibility Conditions
Consider a conditional independence relationship of the form [C1 ⊗ C2|S]. If the levels of all factors in C1 are collapsed, then all relationships among the remaining factors are undistorted EXCEPT for relationships among factors in S.

FCIs
[XY][XZ]
M: vertices XY, XZ
S = {X}
M/S: vertices Y, Z
FCI: [Y⊗Z|X]

Example: Ob-Gyn Study (Darrocca, et al., 1996)
n = 201 pregnant mothers
Variables:
E: EGA (Early, Late)
B: Bishop score (High, Low)
T: Treatment (Prostin, Placebo)

Example: Ob-Gyn Study

                       BISHOP SCORE (B)
                  High                Low
                  EGA (E)             EGA (E)
TREATMENT (T)     Early    Late       Early    Late
-----------------------------------------------------
Prostin             34      24          27      21
Placebo             22      16          35      22

Best-fitting model: [E][TB]

Example: Ob-Gyn Study
Generating Class: [E][TB]
Multigraph: vertices E, TB (no shared index, so no edge)
FCI: [E⊗T,B]

Example: Ob-Gyn Study
Collapsed Table (collapse over EGA):

                          BISHOP SCORE (B)
                          High          Low    Total
-----------------------------------------------------
TREATMENT (T)   Prostin   58 (0.55)     48      106
                Placebo   38 (0.40)     57       95

P = 0.037

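The collapsed table and its test can be recomputed from the three-way counts. The sketch below sums over EGA and then applies a Pearson chi-square test without continuity correction. The choice of test is an assumption, since the slide reports only the P-value.

```python
import math

# counts[treatment][bishop] = (Early, Late) from the three-way table.
counts = {
    "Prostin": {"High": (34, 24), "Low": (27, 21)},
    "Placebo": {"High": (22, 16), "Low": (35, 22)},
}

# Collapse over EGA by summing the Early and Late counts.
collapsed = [[sum(counts[t][b]) for b in ("High", "Low")]
             for t in ("Prostin", "Placebo")]

(a, b), (c, d) = collapsed
n = a + b + c + d

# Pearson chi-square for a 2x2 table (no continuity correction, assumed).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
```

Under these assumptions the P-value rounds to the 0.037 shown on the slide. Collapsing over E is justified here by the FCI [E⊗T,B], which leaves the T-B relationship undistorted.
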
Example: WSU-United Way Study
M: Marijuana (No, Yes)
A: Alcohol (No, Yes)
C: Cigarettes (No, Yes)
R: Race (Other, White)
S: Sex (Female, Male)
Observed cell frequencies (n = 2,276):
12 0 19 2 1 0 23 23 117 1 218 13 17 1 268 405 17 0 18 1 8 1 19 30 133 1 201 28 17 1 228 453

Example: WSU-United Way Study
Generating class: [ACE][MAC][MCG]
Multigraph, M: vertices ACE, MCG, MAC (figure: multigraph on these vertices)

Example: WSU-United Way Study
M: vertices ACE, MCG, MAC
S = {A,C}
M/S: vertices E, MG, M
FCI: [E⊗M,G|A,C]
A = Alcohol
C = Cigarette
E = Ethnic
G = Gender
M = Marijuana

Example: WSU PASS Program “Preparing for Academic Success” GPA below 2.0 at the end of first quarter
Example: WSU PASS Program
Variables (n = 972):

FACTOR                LABEL    LEVELS
----------------------------------------------------------------------
Retention             R        1=No, 2=Yes
Cohort                C        1, 2, 3, 4
PASS Participation    P        1=No, 2=Yes
Ethnic Group          E        1=Caucasian, 2=African-American, 3=Other
Gender                G        1=Male, 2=Female

Example: WSU PASS Program
The best-fitting LLM has generating class [EG][CP][RC][PG]
Multigraph, M (each edge labeled with the shared factor):
EG --G-- PG --P-- CP --C-- RC

Example: WSU PASS Program
M: vertices EG, PG, RC, CP
S = {C}
M/S: vertices EG, PG, R, P
FCI: [E,G,P⊗R|C]
C = Cohort
E = Ethnic
G = Gender
P = PASS Participation
R = Retention

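The same removal step can be repeated for every branch of the maximum spanning tree to list one FCI per branch. This is a sketch under stated assumptions: the branch factor sets G, P and C are read off the generating class [EG][CP][RC][PG], and the helper `fci` is hypothetical.

```python
def fci(generators, s):
    """Return the FCI obtained by removing the factors in s, as a string,
    or None if the reduced multigraph has fewer than two components."""
    comps = []
    for g in generators:
        new = set(g) - set(s)
        if not new:
            continue                  # vertex vanished after removal
        keep = []
        for c in comps:
            if c & new:               # shares a factor: merge components
                new |= c
            else:
                keep.append(c)
        comps = keep + [new]
    if len(comps) < 2:
        return None
    sides = sorted(",".join(sorted(c)) for c in comps)
    return "[" + " ⊗ ".join(sides) + " | " + ",".join(sorted(s)) + "]"

generators = ["EG", "PG", "CP", "RC"]
# Branches of the maximum spanning tree, given by their shared factors.
branches = ["G", "P", "C"]
fcis = [fci(generators, s) for s in branches]
```

The entry for S = {C} agrees with the FCI on the slide; the entries for S = {G} and S = {P} follow from the same rule and are not shown in the slides.
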
Example: Affinal Relations in Bosnia-Herzegovina
Data courtesy of Dr. Keith Doubt, Department of Sociology, Wittenberg University, Springfield, Ohio
N = 861 couples from Bosnia-Herzegovina were surveyed concerning affinal relations.
M: Marriage Type (traditional, elopement)
L: Location of Man and Wife (same, different)
E: Ethnicity (Bosniak, Serb, Croat)
S: Settlement (rural, urban)
Best-fitting model: [MLES]
Consider structural associations among M, L, and S for each ethnic group (E) separately.

Example: Affinal Relations in Bosnia-Herzegovina
Bosniaks: [ML][LS]
Serbs: [MS][SL]
Croats: [M][L][S]
M: Marriage Type
L: Location of Man and Wife
S: Settlement

Conclusions
• The generator multigraph uses mathematical graph theory to analyze and interpret LLMs in a facile manner
• Properties of the multigraph allow one to:
  • Find all conditional independencies
  • Determine all collapsibility conditions

REFERENCE
Khamis, H.J. (2011). The Association Graph and the Multigraph for Loglinear Models, SAGE series Quantitative Applications in the Social Sciences, No. 167.