Challenges posed by Structural Equation Models. Thomas Richardson Department of Statistics University of Washington. Joint work with Mathias Drton, UC Berkeley Peter Spirtes, CMU. Overview. Challenges for Likelihood Inference Problems in Model Selection and Interpretation
Challenges posed by Structural Equation Models • Thomas Richardson, Department of Statistics, University of Washington • Joint work with Mathias Drton (UC Berkeley) and Peter Spirtes (CMU)
Overview • Challenges for Likelihood Inference • Problems in Model Selection and Interpretation • Partial Solution • sub-class of path diagrams: ancestral graphs
Problems for Likelihood Inference • Likelihood may be multimodal • e.g. the bivariate Gaussian Seemingly Unrelated Regression (SUR) model may have up to 3 local maxima. • A consistent starting value does not guarantee that iterative procedures will find the MLE. [Figure: path diagram of the SUR model on X1, Y1, X2, Y2]
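For reference, the bivariate Gaussian SUR model is usually written along the following lines. This is a sketch in assumed notation: the symbols β1, β2, ε1, ε2 and Σ do not appear on the slide, and the correspondence to the slide's path diagram on X1, Y1, X2, Y2 is presumed rather than confirmed.

```latex
\begin{aligned}
Y_1 &= \beta_1 X_1 + \varepsilon_1, \\
Y_2 &= \beta_2 X_2 + \varepsilon_2, \\
(\varepsilon_1, \varepsilon_2)^{\top} &\sim \mathcal{N}(0, \Sigma),
\qquad \Sigma \ \text{unrestricted, positive definite.}
\end{aligned}
```

The two regressions are linked only through the correlated errors, which is why the likelihood is not simply the product of two ordinary least-squares problems.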
Problems for Likelihood Inference • Discrete latent variable models are not curved exponential families • Example: latent class model with a ternary latent class variable C and binary observed variables X1, X2, X3, X4 • 15 parameters in saturated model • 14 model parameters • BUT model has 2 d.f. (Goodman) • Usual asymptotics may not apply
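The parameter counts on this slide can be checked with a few lines of arithmetic. The sketch below assumes the standard latent class parameterization (class probabilities plus class-conditional response probabilities); the function names are hypothetical and chosen only for illustration.

```python
from math import prod

def saturated_params(levels):
    """Free cell probabilities in the saturated model for the observed table."""
    return prod(levels) - 1

def latent_class_params(n_classes, levels):
    """Free parameters of a latent class model (assumed parameterization):
    class probabilities plus one conditional distribution per class and item."""
    return (n_classes - 1) + n_classes * sum(k - 1 for k in levels)

levels = [2, 2, 2, 2]  # four binary observed variables X1..X4
n_saturated = saturated_params(levels)
n_model = latent_class_params(3, levels)  # ternary latent class variable C

print(n_saturated)            # 15
print(n_model)                # 14
print(n_saturated - n_model)  # 1, the naive degrees of freedom
```

The naive difference is 1, whereas the slide reports 2 d.f. following Goodman, which is the point: simple parameter counting can give the wrong degrees of freedom for these models.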
Problems for Likelihood Inference • Likelihood may be highly multimodal in the asymptotic limit (after accounting for label switching/aliasing) • d.f. may vary as a function of model parameters • Why report one mode? [Figure: latent class model with latent variable C and observed variables X1, X2, X3, X4]
Problems for Model Selection • SEM models with latent variables are not curved exponential families • Standard χ² asymptotics do not necessarily apply, e.g. for LRTs • Model selection criteria such as BIC are not asymptotically consistent • The effective degrees of freedom may vary depending on the values of the model parameters
Problems for Model Selection • Many models may be equivalent: [Figure: several equivalent path diagrams on X1, Y1, X2, Y2]
Problems for Model Selection • Models with different numbers of latents may be equivalent: • e.g. unrestricted error covariance within blocks • Wegelin & Richardson (2001) [Figure: two path diagrams relating X1, ..., Xp to Y1, ..., Yq (node labels w, x in the first and y in the second)]
Two scenarios • A single SEM model is proposed and fitted. The results are reported. • The researcher fits a sequence of models, making modifications to an original specification. • Model equivalence implies: • Final model depends on initial model chosen • Sequence of changes is often ad hoc • Equivalent models may lead to very different substantive conclusions • Often many equivalence classes of models give reasonable fit. Why report just one?
Partial Solution • Embed each latent variable model in a ‘larger’ model without latent variables characterized by conditional independence restrictions. • We ignore non-independence constraints and inequality constraints. [Figure: sets of distributions, showing the latent variable model inside the model imposing only independence constraints on the observed variables]
The Generating graph • Begin with a graph, and associated set of independences • Toy Example: graph G on t, a, b, c, d [Figure: graph G with a list of its independence statements among a, b, c, d and t, + others]
Marginalization • Suppose now that some variables are unobserved • Find the independence relations involving only the observed variables • Toy Example: graph G on t, a, b, c, d with t hidden [Figure: the independence list for G, with the ‘unobserved’ independencies (those involving t) in red, + others]
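The marginalization step can be sketched in code: list every d-separation statement of a DAG that mentions only observed variables. The graph below (a -> b <- t -> c <- d, with t hidden) is an assumed reconstruction of the toy example, since the original figure did not survive; the function names are hypothetical. The d-separation test uses the standard moralized-ancestral-graph criterion.

```python
from itertools import combinations

# Assumed toy DAG (hypothetical reconstruction): a -> b <- t -> c <- d
PARENTS = {"a": set(), "d": set(), "t": set(),
           "b": {"a", "t"}, "c": {"t", "d"}}

def ancestors(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen

def d_separated(x, y, z, parents):
    """Test whether x and y are d-separated given the set z."""
    keep = ancestors({x, y} | set(z), parents)
    # Moralize the ancestral subgraph: connect each node to its parents
    # and connect every pair of parents sharing a child.
    adj = {v: set() for v in keep}
    for v in keep:
        for p in parents[v]:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(parents[v], 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff no path joins them once z is removed.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - set(z) - seen:
            seen.add(w)
            stack.append(w)
    return True

def observed_independences(observed, parents):
    """All triples (x, y, z) with x independent of y given z, z observed."""
    found = []
    for x, y in combinations(sorted(observed), 2):
        rest = [v for v in sorted(observed) if v not in (x, y)]
        for k in range(len(rest) + 1):
            for z in combinations(rest, k):
                if d_separated(x, y, z, parents):
                    found.append((x, y, z))
    return found

for x, y, z in observed_independences({"a", "b", "c", "d"}, PARENTS):
    print(f"{x} _||_ {y} | {', '.join(z) if z else '(nothing)'}")
```

Under this assumed graph, statements such as b independent of c given t hold in G but drop out of the list because they mention the hidden variable.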
‘Graphical Marginalization’ • Now construct a graph that represents the conditional independence relations among the observed variables. • Bi-directed edges are required. • G* represents all and only the distributions in which these independencies hold [Figure: toy example, graph G on t, a, b, c, d and the marginal graph G* on a, b, c, d]
Equivalence re-visited • Restrict model class to path diagrams including only observed variables characterized by conditional independence • Ancestral Graph Markov models • For such models we can: • Determine the entire class of equivalent models • Identify which features they have in common • Models are curved exponential: usual asymptotics do apply
Ancestral Graph [Figure: graph over A, B, C, D with latent variable T, its list of independence statements, and the corresponding ancestral graph on A, B, C, D]
Equivalent ancestral graphs [Figure: graphs over A, B, C, D with latent variables (T; U, V) and, after ⇒, ancestral graphs on A, B, C, D]
Equivalent ancestral graphs • Markov Equiv. Class of Graphs with Latent Variables [Figure: graphs over A, B, C, D with latent variables (T; U, V; P, Q, R) and, after ⇒, equivalent ancestral graphs on A, B, C, D]
Equivalence Classes • Equivalent ancestral graphs • Markov Equiv. Class of Graphs with Latent Variables (+ infinitely many others) [Figure: graphs over A, B, C, D with various latent variables (T; U, V; P, Q, R; M, N, R; L) and, after ⇒, equivalent ancestral graphs on A, B, C, D]
Equivalence class of Ancestral Graphs • Markov Equiv. Class of Graphs with Latent Variables (+ infinitely many others) • Partial Ancestral Graph [Figure: the graphs with latent variables over A, B, C, D, the equivalence class of ancestral graphs and, after ⇓, a single partial ancestral graph on A, B, C, D]
Measurement models • If we have pure measurement models with several indicators per latent: • May apply similar search methods among the latent variables (Spirtes et al. 2001; Silva et al. 2003)
Other Related Work • Iterative ML estimation methods exist • Guaranteed convergence • Multimodality is still possible • Implemented in R package ggm (Drton & Marchetti, 2003) • Current work: • Extension to discrete data • Parameterization and ML fitting for binary bi-directed graphs already exist • Implementing search procedures in R
References • Richardson, T., Spirtes, P. (2002) Ancestral graph Markov models. Ann. Stat., 30, 962-1030. • Richardson, T. (2003) Markov properties for acyclic directed mixed graphs. Scand. J. Statist., 30(1), 145-157. • Drton, M., Richardson, T. (2003) A new algorithm for maximum likelihood estimation in Gaussian graphical models for marginal independence. UAI 03, 184-191. • Drton, M., Richardson, T. (2004) Iterative conditional fitting in Gaussian ancestral graph models. UAI 04, 130-137. • Drton, M., Richardson, T. (2004) Multimodality of the likelihood in the bivariate seemingly unrelated regressions model. Biometrika, 91(2), 383-392. • Marchetti, G., Drton, M. (2003) ggm package. Available from http://cran.r-project.org