
Naoki Tanaka , Shohei Shimizu, Takashi Washio

Estimation of Causal Direction in the Presence of Latent Confounders Using a Bayesian LiNGAM Mixture Model. Naoki Tanaka, Shohei Shimizu, Takashi Washio. The Institute of Scientific and Industrial Research, Osaka University.


Presentation Transcript


  1. Estimation of Causal Direction in the Presence of Latent Confounders Using a Bayesian LiNGAM Mixture Model Naoki Tanaka, Shohei Shimizu, Takashi Washio The Institute of Scientific and Industrial Research, Osaka University

  2. Outline • Motivation • Background • Our Approach • Our Model: Bayesian LiNGAM Mixture • Simulation Experiments

  3. Motivation • Estimation of causal structure has recently attracted much attention in machine learning, with applications in fields such as epidemiology and genetics. • [Figure: two observed variables, sleep problems and depressed mood, whose causal direction is in question, and a latent confounder affecting both.] • The estimation results can be biased if there are latent confounders, that is, unobserved variables that have more than one observed child variable. • We propose a new estimation approach that can solve this problem.

  4. Outline • Motivation • Background • Our Approach • Our Model: Bayesian LiNGAM Mixture • Simulation Experiments

  5. LiNGAM (Linear Non-Gaussian Acyclic Model) [Shimizu et al., 2006] • The relations between variables are linear. • Observed variables are generated from a DAG (directed acyclic graph). [Figure: example DAG with connection strengths 1.4, 0.5 and -0.8.] • External influences $e_i$ are non-Gaussian. • No latent confounders. → The $e_i$ are mutually independent. • LiNGAM is an identifiable causal model.
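
The LiNGAM assumptions listed on this slide can be illustrated with a minimal data-generation sketch. The function name `sample_lingam`, the uniform external influences, and the placement of the three connection strengths in the matrix `B` are assumptions made for illustration; the slide only shows the values 1.4, 0.5 and -0.8.

```python
import numpy as np

def sample_lingam(B, n_samples, rng):
    """Draw samples from x = B x + e with independent non-Gaussian e."""
    d = B.shape[0]
    # Independent, zero-mean, non-Gaussian external influences (uniform is an assumption).
    e = rng.uniform(-1.0, 1.0, size=(n_samples, d))
    # x = B x + e  is equivalent to  x = (I - B)^{-1} e.
    mixing = np.linalg.inv(np.eye(d) - B)
    return e @ mixing.T

# Hypothetical 3-variable DAG in causal order x1, x2, x3 (strictly lower triangular B).
B = np.array([[0.0,  0.0, 0.0],
              [1.4,  0.0, 0.0],
              [0.5, -0.8, 0.0]])
rng = np.random.default_rng(0)
X = sample_lingam(B, 1000, rng)
```

Because `B` is strictly lower triangular, the graph is acyclic, and multiplying the samples by `(I - B)` recovers the external influences exactly.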

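  6. A Problem of LiNGAM • Latent confounders make the external influences $e_i$ dependent. → The estimation results can be biased. • [Figure: example with Medicine A and survival rate, drawn once for patients in serious condition and once for patients in mild condition; the patients' condition makes the external influences dependent.]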

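  7. LiNGAM with Latent Confounders [Hoyer et al., 2008] • LvLiNGAM (Latent variable LiNGAM): $x_i = \sum_{j \ne i} b_{ij} x_j + \sum_{q} \lambda_{iq} f_q + e_i$ • $f_q$: latent variables (independent, non-Gaussian). • $\lambda_{iq}$: represent the effects of $f_q$ on $x_i$.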
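
A small simulation may help show why a latent confounder biases estimates that ignore it. This is only a sketch: the two-variable setup, the coefficient values `b21`, `lam1`, `lam2`, and the choice of Laplace and uniform distributions are all hypothetical, and ordinary least squares stands in for any estimator that assumes no confounding.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

f = rng.laplace(size=n)             # latent confounder (non-Gaussian, unobserved)
e1 = rng.uniform(-1, 1, size=n)     # external influence of x1
e2 = rng.uniform(-1, 1, size=n)     # external influence of x2

b21, lam1, lam2 = 0.8, 0.7, 0.9     # hypothetical connection strengths
x1 = lam1 * f + e1
x2 = b21 * x1 + lam2 * f + e2

# Regressing x2 on x1 while ignoring f: the slope absorbs the confounder's effect.
slope = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(f"true b21 = {b21}, estimated slope = {slope:.3f}")
```

The estimated slope is clearly larger than the true connection strength, because `f` contributes to both `x1` and `x2`.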

  8. A Problem in Estimation of LiNGAM with Latent Confounders • Existing methods: • Estimation using overcomplete ICA [Hoyer et al., 2008]. → Suffers from local optima and requires large sample sizes. • Estimation of unconfounded causal relations [Entner and Hoyer, 2011; Tashiro et al., 2012]. → Cannot estimate the causal direction between two observed variables that are affected by latent confounders. • We propose an alternative. • Computationally simpler. • Capable of finding a causal direction in the presence of latent confounders.

  9. Outline • Motivation • Background • Our Approach • Our Model: Bayesian LiNGAM Mixture • Simulation Experiments

  10. Basic Idea of Our Approach • Assumption • Continuous latent confounders can be approximated by discrete variables. → LiNGAM with latent confounders reduces to a LiNGAM mixture model [Shimizu et al., 2008]. • Estimation • Existing estimation of the LiNGAM mixture [Mollah et al., 2006] also suffers from local optima. • We propose to use a Bayesian approach, building on the Bayesian approach for basic LiNGAM [Hoyer et al., 2009].

  11. LiNGAM Mixture Model [Shimizu et al., 2008] • A data generating model of observed variable $x_i$ within class $c$ is $x_i = \mu_i^{(c)} + \sum_{j \ne i} b_{ij}^{(c)} x_j + e_i^{(c)}$. • Matrix form: $x = \mu^{(c)} + B^{(c)} x + e^{(c)}$. • [Figure: data points from Class 1 and Class 2, each class with its own mean; both classes are labeled 0.8.] • Existing estimation methods of the LiNGAM mixture model also suffer from local optima [Mollah et al., 2006].

  12. Relation of Latent Variable LiNGAM and LiNGAM Mixture (1) • We assume that continuous latent confounders can be approximated with good precision by discrete variables having several values. • The combination of the discrete values determines which “class” an observation belongs to. → Within the same class, the external influences $e_i$ are mutually independent. → This is simpler than incorporating latent confounders in LiNGAM directly.

  13. Relation of Latent Variable LiNGAM and LiNGAM Mixture (2) • A simple example • If latent confounders $f_1$ and $f_2$ can each be approximated by 0 and 1, their combinations define Class 1 to Class 4, and the latent variable LiNGAM reduces to a LiNGAM mixture. • [Figure: a latent variable LiNGAM graph and the LiNGAM mixture it reduces to, with each observation labeled Class 1 to Class 4.]
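
The reduction in this example can be checked numerically with a rough sketch. It assumes two binary confounders, hypothetical effect sizes in `lam`, a hypothetical connection strength `b21`, and uniform external influences. Correlation is used here only as a simple proxy for dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40000

# Two latent confounders, each approximated by the values 0 and 1.
f1 = rng.integers(0, 2, size=n)
f2 = rng.integers(0, 2, size=n)
classes = 2 * f1 + f2               # 0..3 correspond to Class 1..4

e1 = rng.uniform(-1, 1, size=n)
e2 = rng.uniform(-1, 1, size=n)
b21 = 0.8                           # hypothetical connection strength x1 -> x2
lam = np.array([[0.7, 0.9],         # hypothetical effects of f1, f2 on x1
                [0.6, 0.3]])        # hypothetical effects of f1, f2 on x2

x1 = lam[0, 0] * f1 + lam[0, 1] * f2 + e1
x2 = b21 * x1 + lam[1, 0] * f1 + lam[1, 1] * f2 + e2

def disturbance_corr(mask):
    """Correlation between the disturbance terms of x1 and x2 on a subset."""
    u1 = x1[mask]
    u2 = x2[mask] - b21 * x1[mask]
    return np.corrcoef(u1, u2)[0, 1]

pooled = disturbance_corr(np.ones(n, dtype=bool))
within = [disturbance_corr(classes == c) for c in range(4)]
```

Pooled over all observations the disturbances are clearly correlated, while within each class the confounders are constant and only a class-specific mean shift remains, so the correlation is close to zero.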

  14. Outline • Motivation • Background • Our Approach • Our Model: Bayesian LiNGAM Mixture • Simulation Experiments

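  15. Bayesian LiNGAM Mixture Model (1) • The data within class $c$ are assumed to be generated by the LiNGAM model. → $b_{ij}$ and $p_i$, the densities of $e_i$, have no relation to the latent confounders, so they do not differ between classes. [Figure: although the class changes, the connection strengths and the densities of the external influences do not change.] • $b_{ij}$ and $e_i$ are the same between classes, so we replace the class-specific $b_{ij}^{(c)}$ and $e_i^{(c)}$ of the LiNGAM mixture model by $b_{ij}$ and $e_i$: $x_i = \mu_i^{(c)} + \sum_{j \ne i} b_{ij} x_j + e_i$ • Then their probability density is $p(x \mid c) = \prod_i p_i\big(x_i - \mu_i^{(c)} - \sum_{j \ne i} b_{ij} x_j\big)$.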

  16. Bayesian LiNGAM Mixture Model (2) • The probability densities of the data within each class are mixed according to some weights. • The class of each observation follows a multinomial distribution with those weights as parameters. • The parameters of the multinomial distribution follow a Dirichlet distribution. • A typical prior for the parameters of the multinomial distribution. • Conjugate prior for the multinomial distribution.
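
The generative process described on the last two slides can be sketched as follows: mixing weights from a Dirichlet prior, a class label per observation, a class-specific mean, and connection strengths and external influences shared across classes. The function name, the Laplace external influences, and all numeric values are assumptions for illustration, not the authors' settings.

```python
import numpy as np

def sample_bayes_lingam_mixture(B, class_means, alpha, n_samples, rng):
    """Sample from x = mu^(c) + B x + e with class c drawn from Dirichlet-distributed weights."""
    n_classes, d = class_means.shape
    weights = rng.dirichlet(alpha)                               # mixing weights ~ Dirichlet(alpha)
    classes = rng.choice(n_classes, size=n_samples, p=weights)   # class label of each observation
    e = rng.laplace(size=(n_samples, d))                         # shared non-Gaussian external influences
    mixing = np.linalg.inv(np.eye(d) - B)                        # x = (I - B)^{-1} (mu^(c) + e)
    X = (class_means[classes] + e) @ mixing.T
    return X, classes, weights

# Hypothetical two-variable model x1 -> x2 with three classes.
B = np.array([[0.0, 0.0],
              [0.8, 0.0]])
class_means = np.array([[0.0, 0.0],
                        [1.0, -1.0],
                        [2.0, 0.5]])
rng = np.random.default_rng(0)
X, classes, weights = sample_bayes_lingam_mixture(B, class_means, np.ones(3), 500, rng)
```

Only the means differ between classes here, which mirrors the assumption that the connection strengths and the densities of the external influences are shared.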

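  17. Compare Three LiNGAM Mixture Models • Select the model with the largest log-marginal likelihood. • Because of the acyclicity assumption, there are only three models between two observed variables: $x_1 \rightarrow x_2$, $x_1 \leftarrow x_2$, and no causal relation between $x_1$ and $x_2$. • [Figure: the three candidate graphs, each including a class variable.]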

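  18. Log-marginal Likelihood of Our Model • Bayes’ theorem: $p(M \mid D) \propto p(D \mid M)\, p(M)$ • Log-marginal likelihood is calculated as follows: $\log p(D \mid M) = \log \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta$, where $p(D \mid \theta, M)$ is the LiNGAM mixture likelihood and $p(\theta \mid M)$ is the prior distribution. • Under the assumption of i.i.d. data, $p(D \mid \theta, M) = \prod_n p(x^{(n)} \mid \theta, M)$. • We use Monte Carlo integration to compute the integral.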
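
Monte Carlo integration of a marginal likelihood can be sketched by drawing parameters from the prior and averaging the likelihood. The example below is not the LiNGAM mixture itself: it uses a toy Gaussian model (unit-variance observations with a standard normal prior on the mean) as a stand-in, chosen because its exact log-marginal likelihood is available for comparison. The names `log_marginal_mc`, `toy_log_lik`, and `toy_prior` are hypothetical.

```python
import numpy as np

def log_marginal_mc(log_lik, prior_sampler, n_draws, rng):
    """Estimate log p(D) = log E_prior[p(D | theta)] by simple Monte Carlo."""
    thetas = prior_sampler(n_draws, rng)
    ll = log_lik(thetas)                  # log p(D | theta_s) for each prior draw
    m = ll.max()                          # log-sum-exp trick for numerical stability
    return m + np.log(np.mean(np.exp(ll - m)))

# Toy stand-in model: x_n ~ N(theta, 1) i.i.d., theta ~ N(0, 1).
data = np.array([0.3, -0.1, 0.8, 0.5, 0.2])

def toy_log_lik(thetas):
    # Sum over i.i.d. observations of log N(x_n | theta, 1), vectorized over draws.
    resid = data[None, :] - thetas[:, None]
    return -0.5 * np.sum(resid**2, axis=1) - 0.5 * data.size * np.log(2 * np.pi)

def toy_prior(n_draws, rng):
    return rng.standard_normal(n_draws)

rng = np.random.default_rng(0)
estimate = log_marginal_mc(toy_log_lik, toy_prior, 100_000, rng)

# Exact value for this toy model, for comparison.
n = data.size
exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n)
         - 0.5 * (np.sum(data**2) - np.sum(data)**2 / (1 + n)))
```

In a model comparison, one such estimate would be computed per candidate model and the model with the largest value selected.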

  19. Distribution of $e_i$ • $e_i$ follows a generalized Gaussian distribution with zero mean. → Includes Gaussian, Laplace, continuous uniform and many other non-Gaussian distributions. • In its density, $\Gamma$ denotes the Gamma function.
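
For reference, here is a sketch of a zero-mean generalized Gaussian density in one common parametrization, $p(e) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\big(-(|e|/\alpha)^{\beta}\big)$, with scale `alpha` and shape `beta`. The slide's own parametrization is not preserved in the transcript, so this particular form is an assumption.

```python
import math
import numpy as np

def generalized_gaussian_pdf(e, alpha, beta):
    """Zero-mean generalized Gaussian density with scale alpha and shape beta."""
    coef = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coef * np.exp(-(np.abs(e) / alpha) ** beta)

# beta = 2 with alpha = sqrt(2) gives the standard Gaussian density.
gauss_at_zero = generalized_gaussian_pdf(0.0, math.sqrt(2.0), 2.0)
# beta = 1 with alpha = 1 gives the Laplace density with unit scale.
laplace_at_zero = generalized_gaussian_pdf(0.0, 1.0, 1.0)
# A large beta approaches the continuous uniform density on [-alpha, alpha].
near_uniform = generalized_gaussian_pdf(0.5, 1.0, 200.0)
```

Varying `beta` moves the family between heavy-tailed (small `beta`) and flat-topped (large `beta`) shapes, which is how it covers the special cases listed on the slide.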

  20. Prior Distributions and the Number of Classes • Prior distributions are placed on the model parameters, including an Inv-Gamma(3,3) prior. • How to select the number of classes: note that a ‘true’ number of classes does not exist. • [Table: for each model, the best number of classes is painted in orange, and the best model is lettered in red.] • Cf. the Dirichlet process mixture model [Antoniak, 1974].

  21. Outline • Motivation • Background • Our Approach • Our Model: Bayesian LiNGAM Mixture • Simulation Experiments

  22. Simulation Settings (1) • Generated data using a LiNGAM with latent confounders [Hoyer et al., 2008]. • 100 trials. • [Figure: one of the data-generating graphs, with connection strengths -1, 0.7, 0.9, 0.6, 0.8, 0.3 and 0.8.] • The distributions of the four latent variables are randomly selected from the following three non-Gaussian distributions: a mixture of two Gaussian distributions (asymmetric), a mixture of two Gaussian distributions (symmetric), and the Laplace distribution.

  23. Simulation Settings (2) • Two methods for comparison: • Pairwise likelihood ratios for estimation of non-Gaussian SEMs [Hyvärinen et al., 2013]. → Assumes no latent confounders. • PairwiseLvLiNGAM [Entner and Hoyer, 2011]. → Finds variable pairs that are not affected by latent confounders and then estimates the causal ordering of one relative to the other.

  24. Simulation Results • [Figure: three panels, one for each true model (no causal relation, →, ←), plotting the number of correct answers against sample size.] • Our method is the most robust against latent confounders. • “(Number of outputs)” is the number of estimates output by PairwiseLvLiNGAM; its details are reported as correct answers / number of outputs.

  25. Conclusions and Future Work • A challenging problem: estimation of causal direction in the presence of latent confounders. • Latent confounders violate the assumption of LiNGAM and can bias the estimation results. • Proposed a Bayesian LiNGAM mixture approach. • Capable of finding causal direction in the presence of latent confounders. • Computationally simpler: no iterative estimation in the parameter space. • In this simulation, our method performed better than the two existing methods. • Future work • Test our method on a wide variety of real datasets.

  26. Histograms of

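  27. Density of a Transformation [Hyvärinen et al., 2001] • e.g.) If $y = Ax$ with invertible $A$, then $p_x(x) = p_y(Ax)\,\lvert\det A\rvert$, where $p_x$ is the density of $x$ and $p_y$ is the density of $y$. • The data are i.i.d., so $p(D) = \prod_n p(x^{(n)})$. Similarly, the $e_i$ are mutually independent, so $p_e(e) = \prod_i p_i(e_i)$. • We can rewrite LiNGAM in a matrix form: $x = Bx + e$, that is, $e = (I - B)x$. • $B$ could be permuted by simultaneous equal row and column permutations to be strictly lower triangular due to the acyclicity assumption [Bollen, 1989]. → $I - B$ is then lower triangular with all diagonal elements equal to 1. • The determinant of a lower triangular matrix equals the product of its diagonal elements. → $\det(I - B) = 1$, so $p_x(x) = p_e\big((I - B)x\big)$.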
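
The determinant argument on this slide can be checked numerically. This is a minimal sketch with a randomly generated, hypothetical strictly lower triangular `B`. It also checks that reordering the variables (a simultaneous row and column permutation) leaves the determinant unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical connection-strength matrix in causal order: strictly lower triangular.
B = np.tril(rng.normal(size=(d, d)), k=-1)

# The same model with the variables listed in a different order.
perm = rng.permutation(d)
B_perm = B[np.ix_(perm, perm)]

det_ordered = np.linalg.det(np.eye(d) - B)        # product of the unit diagonal
det_permuted = np.linalg.det(np.eye(d) - B_perm)  # unchanged by the simultaneous permutation
```

Both determinants equal 1 up to floating-point error, so the Jacobian factor drops out of the density of the observed variables.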

  28. Gaussian vs. Non-Gaussian • [Figure: plots comparing the Gaussian case and the non-Gaussian (uniform) case for the two causal directions (→ and ←).]
