Part 2 Automatically Identifying and Measuring Latent Variables for Causal Theorizing

Assumptions Throughout • Causal Bayes Nets • Causal Markov Condition • Faithfulness

Latent Variables Reduce Dimensionality

Latent Variables Cluster of Causes

Latent Variables Model concepts that might be “real” but which cannot be directly measured, e.g., air pollution, depression

The Causal Theory Formation Problem for Latent Variable Models: Given observations on a number of variables, identify the latent variables that underlie these variables and the causal relations among these latent concepts. Example: Spectral measurements of solar radiation intensities. Variables are intensities at each measured frequency. Example: Quality of a Child’s Home Environment, Cumulative Exposure to Lead, Cognitive Functioning

The Most Common Automatic Solution: Exploratory Factor Analysis • Chooses “factors” to account linearly for as much of the variance/covariance of the measured variables as possible. • Great for dimensionality reduction • Factor rotations are arbitrary • Gives no information about the statistical and thus the causal dependencies among any real underlying factors. • No general theory of the reliability of the procedure
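The point that factor rotations are arbitrary can be checked numerically. The sketch below assumes an orthogonal two-factor model with unit-variance, uncorrelated factors; the loading and uniqueness values are made up for illustration and do not come from the slides. Rotating the loadings by any orthogonal matrix leaves the implied covariance matrix unchanged, so the data cannot distinguish the two solutions.

```python
import numpy as np

# Hypothetical loadings of four measured variables on two factors
# (illustrative values only, not taken from the slides).
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])
# Hypothetical unique (error) variances on the diagonal.
uniquenesses = np.diag([0.3, 0.4, 0.2, 0.5])


def implied_cov(lam, psi):
    """Covariance implied by an orthogonal factor model: lam lam^T + psi."""
    return lam @ lam.T + psi


# An arbitrary orthogonal rotation of the two-dimensional factor space.
angle = 0.7
rotation = np.array([
    [np.cos(angle), -np.sin(angle)],
    [np.sin(angle), np.cos(angle)],
])
rotated_loadings = loadings @ rotation

# The loadings change, but the implied covariance (and hence the fit) does not.
same_fit = np.allclose(
    implied_cov(loadings, uniquenesses),
    implied_cov(rotated_loadings, uniquenesses),
)
print("loadings changed:", not np.allclose(loadings, rotated_loadings))
print("implied covariance unchanged:", same_fit)
```

Because every rotation fits equally well, the choice among them has to come from a convention or from background theory rather than from the data.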

Other Solutions • Independent Components, etc • Background Theory • Scales

Key Causal Question Other Solutions: Background Theory Specified Model Thus, key statistical question: Lead _||_ Cog | Home ?
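For linear Gaussian models, a question such as Lead _||_ Cog | Home is commonly checked through a vanishing partial correlation. The sketch below uses the standard first-order partial correlation formula; the correlation values are hypothetical and only illustrate the case where Home is a common cause of Lead and Cog with no direct Lead to Cog connection.

```python
import numpy as np


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given Z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical population correlations (not from the slides).
r_lead_home = -0.5
r_cog_home = 0.7
# If Home is the only connection between Lead and Cog in a linear model,
# their correlation is the product of the two correlations with Home.
r_lead_cog = r_lead_home * r_cog_home

pc_common_cause = partial_corr(r_lead_cog, r_lead_home, r_cog_home)

# With an additional dependence between Lead and Cog, the correlation
# departs from the product and the partial correlation is no longer zero.
pc_extra_dependence = partial_corr(r_lead_cog - 0.2, r_lead_home, r_cog_home)

print(pc_common_cause, pc_extra_dependence)
```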

“Impurities” Other Solutions: Background Theory True Model Lead _||_ Cog | Home ? Yes, but statistical inference will say otherwise.

Other Solutions: Background Theory True Model “Impure” Measures: C1, C2, T2, T20 A measure is “pure” if it is d-separated from all other measures by its latent parent.

Purify Specified Model

Purify True Model

Purify Purified Model

Other Solutions: Scales Scale = sum(measures of a latent)

Other Solutions: Scales True Model Pseudo-Random Sample: N = 2,000

Scales vs. Latent variable Models (Lead: insignificant)
True Model
Regression: Cognition on Home, Lead

Predictor   Coef       SE Coef   T       P
Constant    -0.02291   0.02224   -1.03   0.303
Home         1.22565   0.02895   42.33   0.000
Lead        -0.00575   0.02230   -0.26   0.797

S = 0.9940   R-Sq = 61.1%   R-Sq(adj) = 61.0%

Scales vs. Latent variable Models True Model Scales homescale = (x1 + x2 + x3)/3 leadscale = (x4 + x5 + x6)/3 cogscale = (x7 + x8 + x9)/3

Scales vs. Latent variable Models (Lead: significant)
True Model
Regression: Cognition on homescale, Lead
Cognition = - 0.0295 + 0.714 homescale - 0.178 Lead

Predictor   Coef       SE Coef   T       P
Constant    -0.02945   0.02516   -1.17   0.242
homescale    0.71399   0.02299   31.05   0.000
Lead        -0.17811   0.02386   -7.46   0.000
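The contrast between the two regressions above can be reproduced in a small simulation. This is a minimal sketch under assumed parameters: Home is a common cause of Lead and Cognition, Lead has no effect on Cognition, and homescale averages three noisy indicators of Home. The structural coefficients (-0.5, 1.2) and the unit noise variances are assumptions chosen for illustration; they are not the coefficients of the true model on the slides. Only the sample size N = 2,000 follows the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # sample size used on the slides

# Assumed generating model: Home -> Lead, Home -> Cognition, no Lead -> Cognition.
home = rng.normal(size=n)
lead = -0.5 * home + rng.normal(size=n)
cog = 1.2 * home + rng.normal(size=n)

# Three noisy indicators of Home and their average, as in
# homescale = (x1 + x2 + x3)/3.
x = home[:, None] + rng.normal(size=(n, 3))
homescale = x.mean(axis=1)


def ols(y, *predictors):
    """Least-squares coefficients: intercept first, then one per predictor."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


b_latent = ols(cog, home, lead)      # conditioning on the true Home
b_scale = ols(cog, homescale, lead)  # conditioning on the scale instead

print("Lead coefficient given Home:     ", round(b_latent[2], 3))
print("Lead coefficient given homescale:", round(b_scale[2], 3))
```

The scale is Home plus measurement error, so conditioning on it does not fully screen Lead off from Cognition, and the Lead coefficient moves away from zero even though Lead has no effect in the generating model.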

Scales vs. Latent variable Models True Model Modeling Latents Specified Model

Scales vs. Latent variable Models True Model Estimated Model (χ² = 29.6, df = 24, p = .19) B5 = .0075, which at t = .23 is correctly insignificant

Scales vs. Latent variable Models True Model Mixing Latents and Scales (χ² = 14.57, df = 12, p = .26) B5 = -.137, which at t = 5.2 is incorrectly highly significant (P < .001)

Algorithms: Washdown (Scheines and Glymour, 2000?), Build Pure Clusters (Silva, Scheines, Glymour, 2003, 2004)

Build Pure Clusters • Qualitative Assumptions (Causal Grammar - Tenenbaum): • Two types of nodes: measured (M) and latent (L) • M ↛ L (measured don’t cause latents) • Each m ∈ M measures (is a direct effect of) at least one l ∈ L • No cycles involving M • Quantitative Assumptions: • Each m ∈ M is a linear function of its parents plus noise • P(L) has second moments, positive variances, and no deterministic relations

Build Pure Clusters Output - provably reliable (pointwise consistent): Equivalence class of measurement models over a pure subset of M For example: True Model Output

Build Pure Clusters Measurement models in the equivalence class are at most refinements, but never coarsenings or permuted clusterings. Output

Build Pure Clusters • Algorithm Sketch: • Use particular rank (tetrad) constraints on the measured correlations to find pairs mj, mk that do NOT share a latent parent • Add a latent for each subset S of M such that no pair in S was found NOT to share a latent parent in step 1. • Purify • Remove latents with no children
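The tetrad constraints mentioned in the sketch can be illustrated on population correlation matrices. The code below is not the Build Pure Clusters algorithm; it only shows the underlying fact that four pure indicators of a single latent satisfy all three vanishing tetrad differences, while an impurity breaks some of them. The loadings and the correlated-error value are hypothetical, and the latent variance is fixed at 1 for convenience.

```python
import numpy as np


def implied_corr(loadings, error_cov):
    """Correlation matrix implied by a one-factor model (latent variance 1)."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    sigma = lam @ lam.T + np.asarray(error_cov, dtype=float)
    sd = np.sqrt(np.diag(sigma))
    return sigma / np.outer(sd, sd)


def tetrad_differences(r, i, j, k, l):
    """The three tetrad differences among variables i, j, k, l."""
    return (
        r[i, j] * r[k, l] - r[i, k] * r[j, l],
        r[i, k] * r[j, l] - r[i, l] * r[j, k],
        r[i, j] * r[k, l] - r[i, l] * r[j, k],
    )


loadings = [0.9, 0.8, 0.7, 0.6]  # hypothetical loadings

# Pure case: independent errors, every measure depends only on the latent.
pure = implied_corr(loadings, np.eye(4))
pure_tetrads = tetrad_differences(pure, 0, 1, 2, 3)

# Impure case: the errors of the first two measures are correlated.
errors = np.eye(4)
errors[0, 1] = errors[1, 0] = 0.3
impure = implied_corr(loadings, errors)
impure_tetrads = tetrad_differences(impure, 0, 1, 2, 3)

print("pure:  ", np.round(pure_tetrads, 6))
print("impure:", np.round(impure_tetrads, 6))
```

In the impure case only the tetrad differences that involve the correlation between the first two measures are nonzero, which is the kind of pattern that lets rank constraints localize impurities. In practice the constraints are tested statistically on sample correlations rather than checked exactly.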

Limitations • Requires large sample sizes to be really reliable (~ 500). • Pure indicators must exist for a latent to be discovered and included • Moderately computationally intensive (O(n^6)). • No error probabilities.

Case Studies Stress, Depression, and Religion (Lee, 2004) Test Anxiety (Bartholomew, 2002)

Specified Model Stress, Depression, and Religion • MSW Students (N = 127) 61 - item survey (Likert Scale) • Stress: St1 - St21 • Depression: D1 - D20 • Religious Coping: C1 - C20 P = 0.00

Stress, Depression, and Religion • Build Pure Clusters

Stress, Depression, and Religion • Assume Stress temporally prior: • MIMbuild to find Latent Structure: P = 0.28

Test Anxiety 12th Grade Males in British Columbia (N = 335) 20 - item survey (Likert Scale items): X1 - X20 Exploratory Factor Analysis:

Test Anxiety Build Pure Clusters:

Test Anxiety Exploratory Factor Analysis: Build Pure Clusters: P-value = 0.47 P-value = 0.00

MIMbuild Scales: No Independencies or Conditional Independencies p = .43 Uninformative Test Anxiety

Future Directions • Handle discrete items • Incorporate background knowledge • Apply to ETS data
