Causal discovery, Bayesian networks, and structural equation models

Aapo Hyvärinen, with Patrik Hoyer and Shohei Shimizu
Dept of Computer Science, University of Helsinki
[Presentation at LTL/BRU, Apr 2008]

The “causal discovery” problem
Example: is smoking a cause of lung cancer? Distinguish between three cases: X causes Y; Y causes X; X and Y are both caused by Z. Discovery: find interesting connections between many variables.
[Figure: a physiological quantity compared between smoking and non-smoking groups]

How to “best” infer causality?
Randomized experiments! Condition 1: with smoking. Condition 2: without smoking. Unfortunately, in many cases these can be costly, impractical, or unethical. What then?

Causality & statistical inference
Emphasis in statistics courses: “Correlation does not imply causality”, due to widespread misinterpretation of correlation as causality in the past. This has led to (exaggerated) pessimism regarding all causal inference. Correlations do not imply causality, but causality usually implies correlations.

Model-based causal discovery from “non-experimental” data
Make a model with assumptions on the process that generated the data. Deduce what different causal connections and directions would imply for the data. If the assumptions hold, we can choose which alternative best fits the data. Thus, we can find the true causal connections (if the assumptions hold!) (see, e.g., Spirtes et al., 1993; Pearl, 2000).

Data-driven vs. physical models
Data-driven models (topic of this talk): few assumptions, general functional forms. Physically detailed models (e.g. Friston's DCM): stronger assumptions, specific functional forms. Which one is better? Who knows...

Basic form of data-driven models
Typically, each data variable is expressed as a function of other data variables, often a linear function. If x_i is a function of x_j, we think there is a causal effect from x_j to x_i. This is different from factor/component models (PCA, ICA), where x is a function of some other variables.
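As an illustration of the linear case, the contrast with component models can be written out as follows. The coefficients b_ij and disturbances e_i follow the notation used later in the talk; the latent components s_j and loadings λ_ij are notation assumed here for the sketch, not taken from the slides.

```latex
% Data-driven causal model: each observed variable is a linear
% function of the other observed variables plus a disturbance.
x_i = \sum_{j \neq i} b_{ij}\, x_j + e_i

% Factor/component model (PCA, ICA): each observed variable is a
% linear function of latent components s_j, which are not observed.
x_i = \sum_{j} \lambda_{ij}\, s_j
```

A non-zero b_ij in the first form is read as a causal effect of x_j on x_i; the second form makes no such statement about the observed variables.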

Main approaches (1): Autoregressive models
Present data is “caused” by the past (+ noise). This needs good time resolution in the measurements (measurements faster than the effects). Non-zero a_ij are related to Granger causality. Estimation is “easy”: simple linear regression. Problems occur because there can be many different time lags, which means many parameters to estimate and summarize.
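The “simple linear regression” step can be sketched as follows for a single time lag. This is a minimal illustration, not the authors' code: the function name `fit_ar1`, the first-order model x(t) = A x(t-1) + e(t), and the simulated coefficients are all assumptions made for the example.

```python
import numpy as np


def fit_ar1(x):
    """Least-squares fit of x[t] ~ A @ x[t-1] for data x of shape (T, n)."""
    past, present = x[:-1], x[1:]
    # lstsq solves past @ C = present, so A is the transpose of C.
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)
    return coef.T


# Simulated example (hypothetical coefficients): variable 0 influences
# variable 1 with a one-step delay, but not the other way round.
rng = np.random.default_rng(0)
a_true = np.array([[0.5, 0.0],
                   [0.4, 0.3]])
x = np.zeros((5000, 2))
for t in range(1, len(x)):
    x[t] = a_true @ x[t - 1] + rng.standard_normal(2)

a_hat = fit_ar1(x)
print(np.round(a_hat, 2))
```

In this toy setting the estimated entry a_hat[1, 0] is clearly non-zero while a_hat[0, 1] is close to zero, which is the pattern one would read as a lagged influence from variable 0 to variable 1. With more lags, the regressor matrix would stack several past time points, which is where the number of parameters grows.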

Main approaches (1): Autoregressive models
[Figure legend] Red: reference region. Green: sources of influence TO the reference region. Blue: sources of influence FROM the reference region. (Roebroeck, Formisano, Goebel, NeuroImage, 2005)

Main approaches (2): Structural equation models
Also called (linear) Bayesian networks or simultaneous equation models. All effects occur at the same time. Estimation is difficult: it is not simple regression. If the data is Gaussian, many different models are indistinguishable => despair?

Linear Non-Gaussian Acyclic Model (LiNGAM)
Non-Gaussianity allows estimation of the model, cf. ICA vs. factor analysis. An important assumption is acyclicity: it is equivalent to the existence of an ordering of the variables such that there are only effects “forward”. Otherwise, there are problems due to variables causing each other ad infinitum. (Shimizu, Hoyer, Hyvärinen, Kerminen, Journal of Machine Learning Research, 2006)
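Why non-Gaussianity helps can be seen in a two-variable toy example. The sketch below is not the LiNGAM algorithm itself; it only illustrates the underlying asymmetry. It regresses each variable on the other and measures dependence between residual and regressor by the correlation of their squares. That dependence measure, the function names, and the simulated model y = x + e are assumptions chosen for the example.

```python
import numpy as np


def residual_dependence(regressor, target):
    """Regress target on regressor (least squares) and return the
    correlation between the squared residual and the squared regressor."""
    regressor = regressor - regressor.mean()
    target = target - target.mean()
    slope = np.cov(regressor, target)[0, 1] / np.var(regressor, ddof=1)
    resid = target - slope * regressor
    return np.corrcoef(resid**2, regressor**2)[0, 1]


def simulate(draw, n):
    """True model: x causes y, with y = x + e and x, e drawn by `draw`."""
    x = draw(n)
    y = x + draw(n)
    return x, y


rng = np.random.default_rng(0)
n = 50_000
half_width = np.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance

x_u, y_u = simulate(lambda m: rng.uniform(-half_width, half_width, m), n)
x_g, y_g = simulate(lambda m: rng.standard_normal(m), n)

uniform_forward = residual_dependence(x_u, y_u)   # correct direction
uniform_backward = residual_dependence(y_u, x_u)  # wrong direction
gauss_forward = residual_dependence(x_g, y_g)
gauss_backward = residual_dependence(y_g, x_g)

print(uniform_forward, uniform_backward, gauss_forward, gauss_backward)
```

With Gaussian data the residual looks independent of the regressor in both directions, so the two models cannot be told apart. With uniform (non-Gaussian) data the residual stays uncorrelated with the regressor in both directions, but only in the correct direction is it also independent; the wrong direction shows a clearly non-zero dependence.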

Examples of acyclic graphs

Estimation of LiNGAM
Transform the model into an ICA model. Estimate ICA: the result is determined only up to a permutation and a normalization. Acyclicity allows determination of the right permutation. (The normalization is obvious.) Optionally, set almost half of the parameters to zero based on acyclicity.
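The permutation and normalization steps can be sketched under the following assumed notation: the model is x = Bx + e, so (I - B)x = e and an ICA unmixing matrix equals W = I - B up to row permutation and row scaling. Because B is acyclic, only one row order puts non-zero entries on the whole diagonal of W. The sketch below skips the ICA step and starts from an exactly permuted and scaled W; the function name `recover_b`, the brute-force search, and the criterion (minimize the sum of 1/|diagonal|) are illustrative choices, workable only for a handful of variables.

```python
import itertools

import numpy as np


def recover_b(w_ica):
    """Undo row permutation and scaling of W = I - B (brute force, small n)."""
    n = w_ica.shape[0]

    def cost(order):
        # Penalize zeros (or near-zeros) on the diagonal of the reordered matrix.
        return sum(1.0 / (abs(w_ica[order[i], i]) + 1e-12) for i in range(n))

    best = min(itertools.permutations(range(n)), key=cost)
    w = w_ica[list(best), :]
    w = w / np.diag(w)[:, None]  # normalization: scale each row to unit diagonal
    return np.eye(n) - w


# Hypothetical acyclic model with variable order 0 -> 1 -> 2.
b_true = np.array([[0.0, 0.0, 0.0],
                   [0.8, 0.0, 0.0],
                   [0.3, -0.5, 0.0]])
w_true = np.eye(3) - b_true

# Mimic the ICA indeterminacy: rows shuffled and rescaled arbitrarily.
row_order = [2, 0, 1]
row_scale = np.array([2.0, -0.5, 3.0])
w_ica = row_scale[:, None] * w_true[row_order, :]

b_hat = recover_b(w_ica)
print(b_hat)
```

With a real ICA estimate the entries are noisy rather than exactly zero, so the same criterion would pick the permutation whose diagonal is farthest from zero; for many variables an assignment solver would replace the exhaustive search. The strictly lower triangular shape of b_hat (in the causal order) is what allows almost half of the parameters to be set to zero.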

Combination of autoregressive and structural equation models
It is easy to combine both in the same equation. Note that the lag index k starts from 0. Acyclicity must be assumed for k=0. The lagged b_ij change when k=0 is included. The model can be estimated by combining autoregressive estimation with LiNGAM. (Hyvärinen, Shimizu, Hoyer, ICML 2008, in press)
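The remark that the lagged coefficients change once k=0 is included can be made concrete with a short derivation. The equation on the original slide is not preserved in this transcript, so the form below is a reconstruction under assumed notation: B_k is the matrix of coefficients b_ij at lag k, K is the maximum lag, and M_k and n(t) denote the coefficients and residuals of an ordinary autoregressive fit.

```latex
% Combined model: instantaneous (k = 0) and lagged (k >= 1) effects.
x(t) = \sum_{k=0}^{K} B_k\, x(t-k) + e(t)

% Move the instantaneous term to the left-hand side and invert:
(I - B_0)\, x(t) = \sum_{k=1}^{K} B_k\, x(t-k) + e(t)
\quad\Longrightarrow\quad
x(t) = \sum_{k=1}^{K} M_k\, x(t-k) + n(t),

% with
M_k = (I - B_0)^{-1} B_k, \qquad n(t) = (I - B_0)^{-1} e(t).

% The residuals of the ordinary autoregressive fit therefore obey an
% instantaneous model of the LiNGAM form, and the lagged effects follow:
n(t) = B_0\, n(t) + e(t), \qquad B_k = (I - B_0)\, M_k .
```

Read this way, the ordinary autoregressive coefficients M_k coincide with the lagged effects B_k only when B_0 = 0, and one plausible estimation route is to fit the autoregressive part first and then analyse its residuals for the instantaneous effects.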

Deep issues in modelling
Hidden variables: perhaps x does not cause y and y does not cause x, but both are caused by an unobserved variable z (Hoyer et al., in press). Lack of acyclicity: x has an effect on y, but then (afterwards?) y has an effect on x (Lacerda et al., submitted). Non-linearity, dependence of the disturbances e_i, etc.

Philosophical basis: Causal vs. probabilistic models
A formalization which has recently gained acceptance. Find the data-generating mechanism, not just the statistical regularities. A probabilistic model of the data allows you to predict one quantity from observation of the other. A causal model would allow you to predict the effect on one variable of intervening on the other.
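The distinction between predicting from observation and predicting under intervention can be illustrated with a small simulation. This is a sketch under assumed settings: the true mechanism is taken to be x causing y with y = 2x + noise, and the intervention sets y by an external random choice while leaving the mechanism of x untouched. All names and numbers are chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000


def slope(predictor, target):
    """Least-squares slope for predicting target from predictor."""
    return np.cov(predictor, target)[0, 1] / np.var(predictor, ddof=1)


# Observational regime: x causes y.
x_obs = rng.standard_normal(n)
y_obs = 2.0 * x_obs + rng.standard_normal(n)

# Interventional regime: the experimenter sets y at random (same spread
# as before); x is still generated by its own mechanism.
x_do = rng.standard_normal(n)
y_do = rng.standard_normal(n) * y_obs.std()

slope_obs = slope(y_obs, x_obs)  # how well y predicts x when only observing
slope_do = slope(y_do, x_do)     # effect on x of setting y by intervention

print(slope_obs, slope_do)
```

Observationally, y is a useful predictor of x (the population slope in this setup is 2/5), so a purely probabilistic model is adequate for prediction. Under intervention on y the slope is close to zero, which a causal model with the arrow from x to y would predict and a purely probabilistic model would not.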

Code
We distribute full Matlab/Octave code for LiNGAM. Please see: http://www.cs.helsinki.fi/group/neuroinf/lingam/

Summary
Causal discovery is possible by making general assumptions on the causal structure. The simplest approach is autoregressive models, which need good time resolution of the measurements. In simultaneous (or structural) models, non-Gaussianity is needed, cf. ICA: this is our LiNGAM method. Autoregressive and structural models can be combined. These models are an alternative to factor/component-based exploratory analysis. Matlab code is available online.
