Topic Outline • Motivation • Representing/Modeling Causal Systems • Estimation and Updating • Model Search • Linear Latent Variable Models • Case Study: fMRI
Discovering Pure Measurement Models Richard Scheines, Carnegie Mellon University; Ricardo Silva*, University College London; Clark Glymour and Peter Spirtes, Carnegie Mellon University
Outline • Measurement Models & Causal Inference • Strategies for Finding a Pure Measurement Model • Purify • MIMbuild • Build Pure Clusters • Examples • Religious Coping • Test Anxiety
Goals: • What Latents are out there? • Causal Relationships Among Latent Constructs [Figure: two alternative causal structures relating Relationship Satisfaction and Depression, or some other structure?]
Needed: Ability to detect conditional independence among latent variables
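Lead and IQ [Figure: Parental Resources (PR) → Lead Exposure and PR → IQ, with error terms e2 and e3.] Lead _||_ IQ | PR • $\mathrm{PR} \sim N(\mu = 10,\ \sigma = 3)$ • $\mathrm{Lead} = 15 - 0.5\,\mathrm{PR} + e_2$, $e_2 \sim N(\mu = 0,\ \sigma = 1.635)$ • $\mathrm{IQ} = 90 + 1\cdot\mathrm{PR} + e_3$, $e_3 \sim N(\mu = 0,\ \sigma = 15)$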
Pseudorandom sample: N = 2,000 [Figure: Parental Resources, Lead Exposure, IQ.] Regression of IQ on Lead and PR
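A minimal pure-Python sketch of the simulation above (the seed and helper names are our own choices, not from the slides). Instead of printing a regression table it reports the partial correlation of Lead and IQ given PR, which is zero exactly when Lead's coefficient in the regression of IQ on Lead and PR is zero.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, y, z):
    """Correlation of x and y with a single variable z partialled out."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(n=2000, seed=0):
    """Draw n cases from the Lead/IQ model (PR is the common cause)."""
    rng = random.Random(seed)
    pr, lead, iq = [], [], []
    for _ in range(n):
        p = rng.gauss(10, 3)                               # PR ~ N(10, 3)
        pr.append(p)
        lead.append(15 - 0.5 * p + rng.gauss(0, 1.635))    # Lead = 15 - .5*PR + e2
        iq.append(90 + 1.0 * p + rng.gauss(0, 15))         # IQ = 90 + 1*PR + e3
    return pr, lead, iq

pr, lead, iq = simulate()
r_marginal = pearson(lead, iq)
r_given_pr = partial_corr(lead, iq, pr)
print(f"corr(Lead, IQ)      = {r_marginal:+.3f}")
print(f"corr(Lead, IQ | PR) = {r_given_pr:+.3f}")
```

With the slide's parameters the population value of corr(Lead, IQ) is about -0.13, all of it transmitted through PR, so the partial correlation given PR should come out close to zero.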
Measuring the Confounder [Figure: Parental Resources measured by X1, X2, X3 (errors e1, e2, e3); Parental Resources → Lead Exposure and Parental Resources → IQ.] • $X_1 = \gamma_1\,\mathrm{PR} + e_1$ • $X_2 = \gamma_2\,\mathrm{PR} + e_2$ • $X_3 = \gamma_3\,\mathrm{PR} + e_3$ • PR_Scale = (X1 + X2 + X3) / 3
Scales Don't Preserve Conditional Independence [Figure: X1, X2, X3 measure Parental Resources, which causes Lead Exposure and IQ.] PR_Scale = (X1 + X2 + X3) / 3
Indicators Don’t Preserve Conditional Independence [Figure: X1, X2, X3 measure Parental Resources, which causes Lead Exposure and IQ.] Regress IQ on: Lead, X1, X2, X3
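To see why neither the scale nor the raw indicators screen Lead off from IQ, the sketch below works with model-implied (population) covariances rather than a sample, so no sampling noise is involved. PR, e2 and e3 follow the Lead/IQ slide; the unit loadings (γ1 = γ2 = γ3 = 1) and indicator error variances of 9 are hypothetical values chosen only for illustration.

```python
import math

# Independent sources and their variances. PR, e2, e3 follow the Lead/IQ slide;
# the indicator errors u1..u3 (variance 9) are hypothetical.
SOURCE_VAR = {"PR": 9.0, "e2": 1.635 ** 2, "e3": 15.0 ** 2,
              "u1": 9.0, "u2": 9.0, "u3": 9.0}

# Each variable as a linear combination of independent sources
# (intercepts do not affect covariances, so they are dropped).
VARS = {
    "PR": {"PR": 1.0},
    "Lead": {"PR": -0.5, "e2": 1.0},
    "IQ": {"PR": 1.0, "e3": 1.0},
    "X1": {"PR": 1.0, "u1": 1.0},   # hypothetical loading gamma1 = 1
    "X2": {"PR": 1.0, "u2": 1.0},   # hypothetical loading gamma2 = 1
    "X3": {"PR": 1.0, "u3": 1.0},   # hypothetical loading gamma3 = 1
}
VARS["PR_Scale"] = {s: sum(VARS[x].get(s, 0.0) for x in ("X1", "X2", "X3")) / 3
                    for s in SOURCE_VAR}

def cov(a, b):
    """Model-implied covariance of two variables."""
    return sum(VARS[a].get(s, 0.0) * VARS[b].get(s, 0.0) * v
               for s, v in SOURCE_VAR.items())

def pcorr(x, y, given):
    """Partial correlation of x and y given a list of variables (recursive formula)."""
    if not given:
        return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))
    z, rest = given[-1], given[:-1]
    rxy, rxz, ryz = pcorr(x, y, rest), pcorr(x, z, rest), pcorr(y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

print(f"Lead, IQ | PR        : {pcorr('Lead', 'IQ', ['PR']):+.4f}")
print(f"Lead, IQ | PR_Scale  : {pcorr('Lead', 'IQ', ['PR_Scale']):+.4f}")
print(f"Lead, IQ | X1, X2, X3: {pcorr('Lead', 'IQ', ['X1', 'X2', 'X3']):+.4f}")
```

Conditioning on PR itself gives zero, while conditioning on PR_Scale, or on X1, X2, X3 jointly, leaves a nonzero partial correlation (so Lead keeps a nonzero coefficient in the regression of IQ on Lead, X1, X2, X3). The residual dependence shrinks toward zero only as the indicator error variances go to zero.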
Structural Equation Models Work [Figure: Parental Resources measured by X1, X2, X3, together with Lead Exposure, IQ, and a coefficient β.] • Structural Equation Model • (p-value = .499) • Lead and IQ “screened off” by PR
Local Independence / Pure Measurement Models • For every measured item xi: • xi _||_ xj | latent parent of xi
Strategies • Find a Locally Independent Measurement Model • Correctly specify the MM, including deviations from Local Independence
Correctly Specifying Deviations from Local Independence is Often Very Hard
Finding Pure Measurement Models - Much Easier
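Tetrad Constraints • Fact: given a graph with this structure [Figure: latent L with indicators W, X, Y, Z; loadings λ1, ..., λ4; errors ε1, ..., ε4] • $W = \lambda_1 L + \varepsilon_1$, $X = \lambda_2 L + \varepsilon_2$, $Y = \lambda_3 L + \varepsilon_3$, $Z = \lambda_4 L + \varepsilon_4$ • it follows that $\mathrm{Cov}_{WX}\,\mathrm{Cov}_{YZ} = (\lambda_1\lambda_2\sigma^2_L)(\lambda_3\lambda_4\sigma^2_L) = (\lambda_1\lambda_3\sigma^2_L)(\lambda_2\lambda_4\sigma^2_L) = \mathrm{Cov}_{WY}\,\mathrm{Cov}_{XZ}$ • Likewise: $\rho_{WX}\,\rho_{YZ} = \rho_{WY}\,\rho_{XZ} = \rho_{WZ}\,\rho_{XY}$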
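A quick numerical check of the fact above: build the covariance matrix implied by W = λ1 L + ε1, ..., Z = λ4 L + ε4 for arbitrary (hypothetical) loadings and latent variance, then confirm that all three tetrad differences vanish up to floating-point rounding.

```python
def one_factor_cov(loadings, var_latent=1.0, error_vars=None):
    """Implied covariance matrix of x_i = loading_i * L + e_i (independent errors)."""
    k = len(loadings)
    error_vars = error_vars or [1.0] * k
    return [[loadings[i] * loadings[j] * var_latent + (error_vars[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]

def tetrad_differences(S, w, x, y, z):
    """The three tetrad differences for the quartet (w, x, y, z)."""
    return (S[w][x] * S[y][z] - S[w][y] * S[x][z],
            S[w][x] * S[y][z] - S[w][z] * S[x][y],
            S[w][y] * S[x][z] - S[w][z] * S[x][y])

# Hypothetical loadings for W, X, Y, Z and latent variance 2.
S = one_factor_cov([0.9, 0.7, 1.2, 0.5], var_latent=2.0)
print(tetrad_differences(S, 0, 1, 2, 3))
```

The error variances only touch the diagonal, which no tetrad uses; that is why the constraint depends on the graph's structure and not on how noisy each item is.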
Early Progenitors: Charles Spearman (1904) • Statistical Constraints / Measurement Model Structure [Figure: g measured by m1, m2, r1, r2.] $\rho_{m_1 m_2}\,\rho_{r_1 r_2} = \rho_{m_1 r_2}\,\rho_{m_2 r_1}$
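Impurities/Deviations from Local Independence Defeat Tetrad Constraints Selectively [Figure: a pure and an impure measurement model over x1, x2, x3, x4.] • Pure model, all three hold: $\rho_{x_1x_2}\rho_{x_3x_4} = \rho_{x_1x_3}\rho_{x_2x_4}$, $\rho_{x_1x_2}\rho_{x_3x_4} = \rho_{x_1x_4}\rho_{x_2x_3}$, $\rho_{x_1x_3}\rho_{x_2x_4} = \rho_{x_1x_4}\rho_{x_2x_3}$ • Impure model: the same three constraints, of which the impurity defeats some but not others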
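A small sketch of the selective failure, assuming one particular (hypothetical) impurity: the errors of x1 and x2 are correlated. It uses covariances instead of correlations. The two are equivalent for this purpose, since every product in a tetrad shares the same denominator, the product of the four standard deviations.

```python
def impure_cov(loadings, c12, var_latent=1.0):
    """Covariances of x_i = loading_i * L + e_i, where the errors of x1 and x2
    have covariance c12; all other errors are independent with unit variance."""
    k = len(loadings)
    S = [[loadings[i] * loadings[j] * var_latent + (1.0 if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    S[0][1] += c12
    S[1][0] += c12
    return S

def tetrads(S):
    """Tetrad differences for items x1..x4 (s_ij = covariance of x_i and x_j)."""
    return {
        "s12*s34 - s13*s24": S[0][1] * S[2][3] - S[0][2] * S[1][3],
        "s12*s34 - s14*s23": S[0][1] * S[2][3] - S[0][3] * S[1][2],
        "s13*s24 - s14*s23": S[0][2] * S[1][3] - S[0][3] * S[1][2],
    }

d = tetrads(impure_cov([0.9, 0.7, 1.2, 0.5], c12=0.4))
for name, diff in d.items():
    verdict = "holds" if abs(diff) < 1e-12 else "violated"
    print(f"{name}: {diff:+.4f} ({verdict})")
```

The two constraints that involve s12 pick up the extra term c12·λ3·λ4 and fail; the third never uses s12 and still holds. The pattern of surviving constraints therefore carries information about where the impurity is.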
Purify [Figure: true model vs. initially specified measurement model.]
Purify • Iteratively remove the item whose removal most improves measurement model fit (tetrads or χ²); stop when confirmatory fit is acceptable [Figure: Remove x4; Remove z2.]
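The loop described above can be sketched as follows. This is a simplified, hypothetical version, not the published procedure. It works on a known (population) covariance matrix for a single cluster, and it scores fit by the sum of squared tetrad differences rather than by statistical tetrad tests or a χ² fit. The impurity is also invented: item x6 has correlated errors with x1 and x2.

```python
from itertools import combinations

def tetrad_score(S, items):
    """Sum of squared tetrad differences over every quartet drawn from `items`."""
    total = 0.0
    for w, x, y, z in combinations(items, 4):
        for d in (S[w][x] * S[y][z] - S[w][y] * S[x][z],
                  S[w][x] * S[y][z] - S[w][z] * S[x][y],
                  S[w][y] * S[x][z] - S[w][z] * S[x][y]):
            total += d * d
    return total

def purify(S, items, tol=1e-10, min_items=4):
    """Greedily drop the item whose removal lowers the tetrad score most,
    stopping once the score is (numerically) zero or too few items remain."""
    items = list(items)
    while tetrad_score(S, items) > tol and len(items) > min_items:
        best = min(items, key=lambda i: tetrad_score(S, [j for j in items if j != i]))
        items.remove(best)
    return items

# Hypothetical one-factor cluster of six items with unit error variances.
lam = [0.9, 0.8, 0.7, 1.1, 0.6, 1.0]
S = [[lam[i] * lam[j] + (1.0 if i == j else 0.0) for j in range(6)] for i in range(6)]
for k in (0, 1):  # hypothetical impurity: x6 shares error covariance with x1 and x2
    S[5][k] += 0.5
    S[k][5] += 0.5

print([f"x{i + 1}" for i in purify(S, range(6))])
```

Dropping x1 or x2 would leave the other half of x6's impurity in place, so the greedy step singles out x6. With sample data, the zero threshold would be replaced by a significance test.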
Purify [Figure: detectably pure subset of items; detectably pure measurement model.]
How a Pure Measurement Model Is Useful 1. Consistently estimate covariances/correlations among latents, then test conditional independence with the estimated latent correlations 2. Test for conditional independence among latents directly
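Why purity helps with point 1: in a pure model, each latent's loadings, and hence the correlations between latents, are identified from the item covariances alone. A minimal sketch under these assumptions: two pure clusters of three items each, latent variances fixed at 1, positive loadings, and a made-up population covariance matrix. With sample covariances in place of S the same formulas give consistent estimates. In practice one would fit the whole measurement model (for example by maximum likelihood) rather than rely on a single triad of items.

```python
import math

def loading(S, i, j, k):
    """Loading of item i on its latent (latent variance fixed at 1), identified from
    two other pure indicators j, k of the same latent; assumes positive loadings."""
    return math.sqrt(S[i][j] * S[i][k] / S[j][k])

def latent_corr(S, cluster_a, cluster_b):
    """Correlation between the latents of two pure clusters (3+ items each),
    computed from the first item of each cluster."""
    a, b = cluster_a[0], cluster_b[0]
    la = loading(S, a, cluster_a[1], cluster_a[2])
    lb = loading(S, b, cluster_b[1], cluster_b[2])
    return S[a][b] / (la * lb)

# Hypothetical population: corr(L1, L2) = 0.6; items 0-2 measure L1, items 3-5 measure L2.
phi = 0.6
lam = [0.9, 0.8, 0.7, 1.1, 0.6, 1.0]
latent_of = [0, 0, 0, 1, 1, 1]
S = [[lam[i] * lam[j] * (1.0 if latent_of[i] == latent_of[j] else phi)
      + (0.5 if i == j else 0.0) for j in range(6)] for i in range(6)]

print(round(latent_corr(S, [0, 1, 2], [3, 4, 5]), 6))
```

An impurity, such as a correlated error between two items, would bias the triad ratio, which is why the purification step comes first.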
2. Test conditional independence relations among latents directly • Question: L1 _||_ L2 | {Q1, Q2, ..., Qn}? [Figure: structural model in which b21 is the coefficient of L1 in the equation for L2.] • b21 = 0 ⇔ L1 _||_ L2 | {Q1, Q2, ..., Qn}
MIMbuild • Input: purified measurement model; covariance matrix over the set of pure items • MIMbuild: PC algorithm with independence tests performed directly on latent variables • Output: equivalence class of structural models over the latent variables
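At its core, MIMbuild runs a PC search over the latents. The sketch below shows only PC's adjacency (skeleton) phase, in an idealized setting that we assume for illustration. The latent covariance matrix is taken as known exactly, so a zero partial correlation serves as the independence oracle; the real procedure instead tests each latent independence statistically from the pure items. The three-latent chain L1 → L2 → L3 is hypothetical, and edge orientation is not shown.

```python
import math
from itertools import combinations

def pcorr(S, x, y, given):
    """Partial correlation from a covariance matrix S (recursive formula)."""
    if not given:
        return S[x][y] / math.sqrt(S[x][x] * S[y][y])
    z, rest = given[-1], given[:-1]
    rxy, rxz, ryz = pcorr(S, x, y, rest), pcorr(S, x, z, rest), pcorr(S, y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def pc_skeleton(S, names, tol=1e-9):
    """Adjacency phase of the PC algorithm, using 'partial correlation == 0'
    (within tol) on a known latent covariance matrix as the independence oracle."""
    n = len(names)
    adj = {i: set(range(n)) - {i} for i in range(n)}
    depth = 0
    while any(len(adj[i]) - 1 >= depth for i in range(n)):
        for x in range(n):
            for y in list(adj[x]):
                # Try conditioning sets of size `depth` drawn from x's other neighbours.
                for cond in combinations(sorted(adj[x] - {y}), depth):
                    if abs(pcorr(S, x, y, list(cond))) < tol:
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        depth += 1
    return sorted((names[i], names[j]) for i in range(n) for j in adj[i] if i < j)

# Hypothetical standardized chain L1 -> L2 -> L3 (coefficients 0.7 and 0.5).
S = [[1.0, 0.7, 0.35],
     [0.7, 1.0, 0.5],
     [0.35, 0.5, 1.0]]
print(pc_skeleton(S, ["L1", "L2", "L3"]))
```

L1 and L3 are correlated marginally but independent given L2, so their edge is removed at depth 1 and only L1 – L2 and L2 – L3 remain.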
Goal 2: What Latents are out there? • How should they be measured?
Latents and the clustering of items they measure imply tetrad constraints differentially
Build Pure Clusters (BPC) • Input: covariance matrix over the set of original items • BPC: 1) Cluster (complicated Boolean combinations of tetrads) 2) Purify • Output: equivalence class of measurement models over a pure subset of the original items
Build Pure Clusters • Qualitative Assumptions • Two types of nodes: measured (M) and latent (L) • M ↛ L (measured variables don’t cause latents) • Each m ∈ M measures (is a direct effect of) at least one l ∈ L • No cycles involving M • Quantitative Assumptions: • Each m ∈ M is a linear function of its parents plus noise • P(L) has second moments, positive variances, and no deterministic relations
Build Pure Clusters • Output, provably reliable (pointwise consistent): equivalence class of measurement models over a pure subset of M • For example: [Figure: true model and BPC output.]
Build Pure Clusters • Measurement models in the equivalence class are at most refinements, never coarsenings or permuted clusterings. [Figure: output.]
Build Pure Clusters • Algorithm Sketch: 1. Use particular rank (tetrad) constraints on the measured correlations to find pairs of items mj, mk that do NOT share a single latent parent 2. Add a latent for each subset S of M such that no pair in S was found NOT to share a latent parent in step 1 3. Purify 4. Remove latents with no children
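Step 2 amounts to a clique search: items are compatible unless step 1 separated them, and each candidate latent corresponds to a set of mutually compatible items. The sketch below makes three assumptions. Step 1's output (the list of separated pairs) is given as hypothetical input. "Each subset S" is read as each maximal such subset. The tetrad tests of step 1, Purify, and the removal of childless latents are not shown.

```python
def candidate_clusters(items, separated):
    """Step 2 of the sketch: given the pairs found in step 1 NOT to share a latent
    parent, return the maximal item sets containing no such pair (maximal cliques of
    the 'compatible' graph, found with a basic Bron-Kerbosch recursion)."""
    sep = {frozenset(p) for p in separated}
    nbrs = {i: {j for j in items if j != i and frozenset((i, j)) not in sep}
            for i in items}
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(items), set())
    return sorted(cliques)

items = ["a1", "a2", "a3", "b1", "b2", "b3"]
separated = [(a, b) for a in items[:3] for b in items[3:]]  # hypothetical step-1 output
print(candidate_clusters(items, separated))
# [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

If step 1 fails to separate some item from either group, that item lands in both candidate clusters. Items like that are what the subsequent Purify step is meant to remove.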
Case Studies • Stress, Depression, and Religion (Lee, 2004) • Test Anxiety (Bartholomew, 2002)
Case Study: Stress, Depression, and Religion • Master's students (N = 127); 61-item survey (Likert scale) • Stress: St1 to St21 • Depression: D1 to D20 • Religious Coping: C1 to C20 [Figure: specified model, p = 0.00]
Case Study: Stress, Depression, and Religion Build Pure Clusters
Case Study: Stress, Depression, and Religion • Assume Stress temporally prior: • MIMbuild to find Latent Structure: p = 0.28
Case Study: Test Anxiety • Bartholomew and Knott (1999), Latent Variable Models and Factor Analysis • 12th-grade males in British Columbia (N = 335) • 20-item survey (Likert scale items): X1 to X20 • Exploratory Factor Analysis: [Figure: exploratory factor analysis model.]
Case Study: Test Anxiety • Build Pure Clusters: [Figure: BPC output.]
Case Study: Test Anxiety [Figure: Build Pure Clusters model and Exploratory Factor Analysis model, with p-values 0.00 and 0.47.]
Case Study: Test Anxiety • MIMbuild: p = .43 • Scales: no independencies or conditional independencies; uninformative
Limitations • In simulation studies, requires large sample sizes (~400-500) to be really reliable • 2 pure indicators must exist for a latent to be discovered and included • Moderately computationally intensive ($O(n^6)$) • No error probabilities
Open Questions/Projects • IRT models? • Bi-factor model extensions? • Appropriate incorporation of background knowledge