
Causal Faithfulness and Simplicity



  1. Causal Faithfulness and Simplicity Peter Spirtes, Jiji Zhang

  2. Goals • Faithfulness comes in several flavors and is a kind of principle that selects simpler (in a certain sense) models over more complicated ones. • We show how to weaken the assumption of standard faithfulness so that it needs to be applied in fewer circumstances. • We show how to weaken the assumption of strong (ε-faithfulness) so that it does not prohibit the existence of weak edges. • We show how to modify the causal search algorithms so that they make fewer mind changes as the sample size grows.

  3. Example of SGS Algorithm • True graph: X → Z ← Y, Z → W • W = aZ + εW • Z = bX + cY + εZ • X = εX • Y = εY • Independencies found: IP(W,X|Z) = 0, IP(W,Y|Z) = 0, IP(X,Y|∅) = 0 • [Figure: the skeleton over X, Y, Z, W is pruned step by step from the complete undirected graph.]
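A minimal simulation of this example; the coefficient values a, b, c, the sample size, and the regression-based partial-correlation check are our illustrative choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a, b, c = 0.8, 0.5, -0.7          # illustrative coefficients

X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = b * X + c * Y + rng.normal(size=n)
W = a * Z + rng.normal(size=n)

def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond,
    computed by regressing out the conditioning set."""
    if cond:
        C = np.column_stack([np.ones(len(data))] + [data[:, k] for k in cond])
        ri = data[:, i] - C @ np.linalg.lstsq(C, data[:, i], rcond=None)[0]
        rj = data[:, j] - C @ np.linalg.lstsq(C, data[:, j], rcond=None)[0]
    else:
        ri, rj = data[:, i], data[:, j]
    return np.corrcoef(ri, rj)[0, 1]

D = np.column_stack([X, Y, Z, W])  # columns 0..3 = X, Y, Z, W
print(partial_corr(D, 3, 0, [2]))  # rho(W, X | Z) ~ 0
print(partial_corr(D, 3, 1, [2]))  # rho(W, Y | Z) ~ 0
print(partial_corr(D, 0, 1, []))   # rho(X, Y)     ~ 0
```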

  4. SGS algorithm • S1. Form the complete undirected graph H on the given set of variables V. • S2. For each pair of variables X and Y in V, search for a subset S of V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H iff such a set is found. • S3. Let K be the graph resulting from S2. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent), if X and Z are independent conditional on some subset of V\{X, Z} that does not contain Y, then orient the triple as a collider: X → Y ← Z. • S4. Execute the entailed orientation rules.
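A sketch of steps S1–S3 in Python, assuming an oracle function `indep(X, Y, S)` for the conditional-independence judgments (the oracle is our assumption; S4, Meek's orientation rules, is omitted):

```python
from itertools import chain, combinations

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def sgs(V, indep):
    # S1: complete undirected graph
    adj = {frozenset(p) for p in combinations(V, 2)}
    # S2: remove X - Y iff some S in V\{X,Y} renders them independent
    for X, Y in combinations(V, 2):
        rest = [v for v in V if v not in (X, Y)]
        if any(indep(X, Y, set(S)) for S in powerset(rest)):
            adj.discard(frozenset((X, Y)))
    # S3: orient an unshielded triple as a collider when some subset of
    # V\{X,Z} that does not contain Y separates the endpoints X and Z
    colliders = []
    for X, Z in combinations(V, 2):
        if frozenset((X, Z)) in adj:
            continue
        rest = [v for v in V if v not in (X, Z)]
        for Y in rest:
            if frozenset((X, Y)) in adj and frozenset((Y, Z)) in adj:
                if any(indep(X, Z, set(S))
                       for S in powerset([v for v in rest if v != Y])):
                    colliders.append((X, Y, Z))   # X -> Y <- Z
    return adj, colliders
```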

  5. Causal Faithfulness Assumption • Causal Markov Assumption: For a set of variables for which there are no unmeasured common causes, each variable is independent of its non-effects conditional on its direct causes. • Non-obvious equivalent formulation: if IG(X,Y|Z) in causal DAG G with no unmeasured common causes, then IP(X,Y|Z) = 0. • Causal Faithfulness Assumption (the converse of the Causal Markov Assumption): if IP(X,Y|Z) = 0, then IG(X,Y|Z) in causal DAG G. • If IP(X,Y|Z) is a rational function of the parameters, then violations of faithfulness have Lebesgue measure 0 in the parameter space.
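A measure-zero violation can be produced by exact path cancellation; in this sketch the coefficients (our illustrative choices) are tuned so that the directed path X → Y → Z exactly cancels the edge X → Z, making X and Z look independent despite the direct cause:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
b, d, e = -0.6, 0.3, 0.5     # chosen so d + e*b == 0: the two paths cancel

X = rng.normal(size=n)
Y = b * X + rng.normal(size=n)          # X -> Y
Z = d * X + e * Y + rng.normal(size=n)  # X -> Z and Y -> Z

print(np.corrcoef(X, Z)[0, 1])   # ~ 0, although X is a direct cause of Z
```

Perturbing any one of b, d, e breaks the cancellation, which is the sense in which such violations are Lebesgue measure 0.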

  6. Three Faces of Faithfulness • Reduction of Underdetermination • If I(A,B|∅) then prefer A → C ← B to A → C → B • Computational Efficiency • If A – C – B and I(A,B|∅) then we don't need to check I(A,B|C). • Statistical Efficiency • The Markov equivalence class can be found without testing independence conditional on any set larger than the maximum degree of any variable in the true causal graph.

  7. Faithfulness Assumptions and Pointwise Consistency • If we assume causal sufficiency and the Causal Markov and Causal Faithfulness Assumptions, then there exist pointwise consistent estimators of the Markov equivalence class: • SGS • PC • GES (Gaussian, multinomial) • If we assume only the Causal Markov Assumption and causal sufficiency, there are no pointwise consistent estimators of the Markov equivalence class: • Gaussian • Multinomial • Unrestricted

  8. Faithfulness Assumptions and Uniform Consistency • Even assuming causal sufficiency and the Causal Markov and Causal Faithfulness Assumptions, there is no uniformly consistent estimator of the Markov equivalence class: • Gaussian • Multinomial • Unrestricted

  9. Kalisch and Bühlmann Assumptions • (A4: ε-faithfulness) The partial correlations between X(i) and X(j) given {X(r); r ∈ k} for some set k ⊆ {1,…,pn}\{i,j} are denoted by ρn;i,j|k. Their absolute values are bounded from below and above: inf{|ρn;i,j|k| : ρn;i,j|k ≠ 0} ≥ cn, and sup{|ρn;i,j|k|} ≤ M < 1.

  10. Kalisch and Bühlmann Theorem • Under their assumptions (including (A4)), the PC algorithm with a suitable sequence of significance levels is a uniformly consistent estimator of the Markov equivalence class, even when the number of variables pn grows with the sample size n. [The formal statement appears as a formula in the original slide.]

  11. Kalisch and Bühlmann Assumptions • Uhler et al.: (A4) tends to be violated fairly often if the parameter values are assigned randomly and ε is not very small. • There are two ways to get very small partial correlations: almost-cancellations and very weak edges. • (A4) forbids both – it entails that there are no very weak edges.

  12. Discontinuities in Limiting Output • [Figure: four graphs over X, Y, Z, W illustrating how the limiting output changes discontinuously as the parameters vary.]

  13. Behavior as Sample Size Grows • [Figure: four panels over X, Y, Z, W showing the output at small, medium−, medium+, and large sample sizes, each listing the independence tests in play: IP(W,{X,Y}|Z), IP(W,{X,Y}|∅), IP(X,Y|∅), IP(W,Z|∅).]

  14. Desired Behavior as Sample Size Grows • [Figure: the desired outputs at small, medium−, medium+, and large sample sizes over X, Y, Z, W, each listing the independence tests in play: IP(W,{X,Y}|Z), IP(W,{X,Y}|∅), IP(X,Y|∅), IP(W,Z|∅).]

  15. Behavior as Sample Size Grows • True graph: X → Y → Z → W • Small sample output: X – Y – Z – W • Large sample output: X – Y – Z → W • [Each panel lists the independence tests in play: IP(X,Z|Y), IP(Y,W|{X,Z}), IP(X,W|∅).]

  16. Desired Behavior as Sample Size Grows • True graph: X → Y → Z → W • Small sample output: X – Y – Z – W • Large sample output: X – Y – Z → W • [Each panel lists the independence tests in play: IP(X,Z|Y), IP(Y,W|{X,Z}), IP(X,W|∅).]

  17. True Graph • [Figure: the true graph over X, Y, Z, W.]

  18. Behavior as Sample Size Grows • [Figure: as on slide 13, four panels over X, Y, Z, W showing the output at small, medium−, medium+, and large sample sizes, each listing the independence tests in play: IP(W,{X,Y}|Z), IP(W,{X,Y}|∅), IP(X,Y|∅), IP(W,Z|∅).]

  19. CSGS • S3*. Let K be the undirected graph resulting from S2. For each unshielded triple <X, Y, Z>, • If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then orient the triple as a collider: X → Y ← Z. • If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider. • Otherwise, mark the triple as ambiguous (or unfaithful).
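A sketch of the S3* classification, reusing the `powerset` helper and the `indep` oracle assumed in the SGS sketch above:

```python
def classify_triple(X, Y, Z, V, indep):
    """Classify the unshielded triple <X, Y, Z> as in step S3* of CSGS."""
    rest = [v for v in V if v not in (X, Z)]
    seps = [set(S) for S in powerset(rest) if indep(X, Z, set(S))]
    if seps and all(Y not in S for S in seps):
        return "collider"       # X and Z are dependent given every set containing Y
    if seps and all(Y in S for S in seps):
        return "non-collider"   # X and Z are dependent given every set omitting Y
    return "ambiguous"          # mixed evidence (or unfaithful)
```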

  20. Assumptions about which independencies • Adjacency-Faithfulness – If X – Y in the causal DAG, then IP(X,Y|Z) ≠ 0 for any Z ⊆ V\{X, Y}.

  21. Assumptions about which independencies • Triangle-Faithfulness – For any three variables that form a triangle in causal DAG G: • If Z is a non-collider on the path <X, Z, Y>, then X and Y are not independent conditional on any subset of V\{X, Y} that does not contain Z; • If Z is a collider on the path <X, Z, Y>, then X and Y are not independent conditional on any subset of V\{X, Y} that contains Z. • Suppose X → Y ← Z and IP(X,Z|Y) = 0. The resulting distribution is faithful to X → Y → Z, so the violation cannot be detected; for such cases faithfulness must be assumed.

  22. Triangle-Faithfulness • [Figure: two examples. A triangle over X, Y, Z, where Triangle-Faithfulness requires ¬I(X,Z|∅), ¬I(X,Y|Z), ¬I(Y,Z|∅). A four-vertex graph over X, Y, Z, W, where it requires ¬I(X,Z|∅), ¬I(X,Z|W), ¬I(X,Z|{Y,W}), ¬I(Y,Z|∅), ¬I(Y,Z|W), ¬I(Y,Z|{X,W}), ¬I(X,Y|Z), ¬I(X,Y|W), ¬I(X,Y|{Z,W}), ¬I(X,W|∅), ¬I(X,W|Z), ¬I(X,W|Y), ¬I(Y,W|∅), ¬I(Y,W|X), ¬I(Y,W|Z), ¬I(Z,W|∅), ¬I(Z,W|X), ¬I(Z,W|Y).]

  23. Causal Minimality • The population distribution is not Markov to any proper subDAG of the true causal DAG. • Causal Minimality is entailed by the manipulation definition of causation if the distribution is positive. • There is a weaker kind of causal minimality – P-minimality: the population distribution is not Markov to any DAG that entails a proper superset of the conditional independence relations. • Is this sufficient for the correctness of VCSGS?

  24. Example of VCSGS • True graph: X → Y → Z → W • Small sample output: X – Y – Z – W • Large sample output: X – Y – Z – W • [Each panel lists the independence tests in play: IP(X,Z|Y), IP(Y,W|{X,Z}), IP(X,W|∅).]

  25. Example of VCSGS • True graph: X → Y → Z → W • Small sample output: X – Y – Z – W • Large sample output: X – Y – Z → W • [Each panel lists the independence tests in play: IP(X,Z|Y), IP(Y,W|{X,Z}), IP(X,W|∅).]

  26. Example of VCSGS • True graph: X → Y → Z → W • Small sample output: X – Y – Z – W • Large sample output: X – Y – Z → W • [Each panel lists the independence tests in play: IP(X,Z|Y), IP(Y,W|{X,Z}), IP(X,W|∅).]

  27. VCSGS algorithm • V1. Form the complete undirected graph H on the given set of variables V. • V2. For each pair of variables X and Y in V, search for a subset S of V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H and mark the pair <X, Y> as 'apparently non-adjacent' if and only if such a set is found. • V3. Let K be the graph resulting from V2. For each apparently unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are apparently non-adjacent), • If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then orient the triple as a collider: X → Y ← Z. • If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider. • Otherwise, mark the triple as ambiguous (or unfaithful), and mark the pair <X, Z> as 'definitely non-adjacent'.

  28. VCSGS algorithm • V4. Execute the same orientation rules as in S4, until none of them applies. • V5. Let M be the graph resulting from V4. For each consistent disambiguation of the ambiguous triples in M (i.e., each disambiguation that leads to a pattern), test whether each vertex V in the resulting pattern satisfies the Markov condition. If V and W satisfy the Markov condition in every pattern, then mark the 'apparently non-adjacent' <V, W> pair as 'definitely non-adjacent'.
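A high-level sketch of V5; `disambiguations` (enumerating the consistent resolutions of the ambiguous triples into patterns) and `markov_ok` (testing whether a vertex is independent of its non-descendants given its parents) are hypothetical helpers we assume, not functions named in the slides:

```python
def v5_confirm(M, apparent_pairs, indep):
    """Mark an apparently non-adjacent pair as definitely non-adjacent only
    when the Markov condition holds for both vertices in EVERY pattern."""
    patterns = list(disambiguations(M))          # hypothetical helper
    return [(V, W) for (V, W) in apparent_pairs
            if all(markov_ok(V, p, indep) and markov_ok(W, p, indep)
                   for p in patterns)]           # markov_ok: hypothetical helper
```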

  29. Inclusion Relations for given P • [Figure: a diagram of the inclusion relations among Faithfulness, Adjacency-Faithfulness, Triangle-Faithfulness, and P-Minimality.]

  30. Faithfulness Assumptions and Pointwise Consistency • If the Triangle-Faithfulness, Causal Minimality, and Causal Markov Assumptions hold, then VCSGS is a consistent estimator of the extended Markov equivalence class. • Is it complete?

  31. Conjecture • V5*. Let M be the graph resulting from V4. For each consistent disambiguation of the ambiguous triples in M (i.e., each disambiguation that leads to a pattern), test whether each vertex V in the resulting pattern satisfies the Markov condition. If V and W satisfy the Markov condition in some pattern, then mark the 'apparently non-adjacent' <V, W> pair as 'definitely non-adjacent'.

  32. Our Assumptions • Assumption NVV(J): [stated as a formula on the original slide; roughly, no vanishing conditional variances – they are bounded below in terms of J] • Assumption UBC(C): [stated as a formula on the original slide; roughly, an upper bound C < 1 on the absolute values of the partial correlations]

  33. k-Triangle-Faithfulness Assumption • Given a set of variables V, suppose the true causal model over V is M = <P, G>, where P is a Gaussian distribution over V, and G is a DAG with vertices V. For any three variables X, Y, Z that form a triangle in G (i.e., each pair of vertices is adjacent): • If Y is a non-collider on the path <X, Y, Z>, then |ρ(X, Z|W)| ≥ k·|eM(X – Z)| for all W ⊆ V\{X, Z} that do not contain Y; and • If Y is a collider on the path <X, Y, Z>, then |ρ(X, Z|W)| ≥ k·|eM(X – Z)| for all W ⊆ V\{X, Z} that do contain Y.
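A sketch of checking the inequality for a single triangle, reusing the `powerset` helper from above; `pcorr` (population partial correlation) and `edge_coef` (the coefficient eM(X – Z)) are assumed helpers, not functions from the slides:

```python
def k_triangle_faithful(X, Y, Z, V, k, pcorr, edge_coef, y_is_collider):
    """Check the k-Triangle-Faithfulness inequality for the triangle X, Y, Z."""
    threshold = k * abs(edge_coef(X, Z))      # k * |e_M(X - Z)|
    for S in powerset([v for v in V if v not in (X, Z)]):
        S = set(S)
        # collider case: only conditioning sets containing Y are constrained;
        # non-collider case: only sets omitting Y are constrained
        relevant = (Y in S) if y_is_collider else (Y not in S)
        if relevant and abs(pcorr(X, Z, S)) < threshold:
            return False   # a partial correlation is too small relative to the edge
    return True
```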

  34. VCSGS (Sample version) • S3* (sample version). Let K be the undirected graph resulting from the adjacency phase. For each unshielded triple <X, Y, Z>, • If there is a set W not containing Y such that the test of ρ(X, Z|W) = 0 returns 0 (i.e., accepts the hypothesis), and for every set U that contains Y, the test of |ρ(X,Z|U)| = 0 returns 1 (i.e., rejects the hypothesis) and the test of |ρ(X,Z|U) – ρ(X,Z|W)| ≥ L returns 0 (i.e., accepts the hypothesis), then orient the triple as a collider: X → Y ← Z. • If there is a set W containing Y such that the test of ρ(X, Z|W) = 0 returns 0 (i.e., accepts the hypothesis), and for every set U that does not contain Y, the test of |ρ(X,Z|U)| = 0 returns 1 (i.e., rejects the hypothesis) and the test of |ρ(X,Z|U) – ρ(X,Z|W)| ≥ L returns 0 (i.e., accepts the hypothesis), then mark the triple as a non-collider. • Otherwise, mark the triple as ambiguous.
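The slides leave the individual tests unspecified; a standard choice for Gaussian data is the Fisher z test of a zero partial correlation, sketched here (the α level is our illustrative parameter, and the 0/1 return convention matches the slide's "accept"/"reject" convention):

```python
import math

def fisher_z_test(r, n, num_cond, alpha=0.05):
    """Return 0 (accept rho = 0) or 1 (reject) for a sample partial
    correlation r from n samples, conditioning on num_cond variables."""
    z = 0.5 * math.log((1 + r) / (1 - r))        # Fisher transform
    stat = abs(z) * math.sqrt(n - num_cond - 3)  # approx. N(0, 1) under the null
    p = 2 * (1 - 0.5 * (1 + math.erf(stat / math.sqrt(2))))
    return 0 if p > alpha else 1
```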

  35. Uniform Consistency • Say that CSGS(L, n, M) errs if it contains (i) an adjacency not in GM; or (ii) a marked non-collider not in GM; or (iii) an orientation not in GM. • Theorem: Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the CSGS algorithm is uniformly consistent in the sense that limn→∞ supM PM(CSGS(L, n, M) errs) = 0.

  36. Estimation Algorithm • For each vertex Z: • If some vertex not adjacent to Z is not confirmed to be non-adjacent to Z, return 'Unknown' for every edge containing Z; • else: • For every non-adjacent pair <Y, Z> in EP(G), let the estimate be 0. • For each vertex Z such that all of the edges containing Z are oriented in EP(G), if Y is a parent of Z in EP(G), let the estimate be the sample regression coefficient of Y in the regression of Z on its parents in EP(G).
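A sketch of the regression step for a vertex Z whose edges are all oriented, using ordinary least squares via numpy; the data layout (an (n, p) array plus a name-to-column map) is our assumption:

```python
import numpy as np

def estimate_coefficients(data, cols, Z, parents):
    """Estimate edge coefficients into Z by regressing Z on its parents.
    data: (n, p) sample array; cols: map from variable name to column index."""
    X = np.column_stack([np.ones(data.shape[0])]
                        + [data[:, cols[p]] for p in parents])
    beta = np.linalg.lstsq(X, data[:, cols[Z]], rcond=None)[0]
    return dict(zip(parents, beta[1:]))   # drop the intercept term
```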

  37. Structural Coefficient Distance • Let M1 be an output of the Estimation Algorithm, and M2 be a causal model. We define the structural coefficient distance between M1 and M2 to be d[M1, M2] = maxX,Y |eM1(X – Y) – eM2(X – Y)|, where by convention the term for a pair is 0 if eM1(X – Y) = 'Unknown'.
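A sketch under our reading of the truncated convention on the slide, namely that a pair whose M1 coefficient is 'Unknown' contributes 0 to the distance (that reading is an assumption on our part):

```python
def sc_distance(e1, e2):
    """Structural coefficient distance between e1 (coefficient map from an
    Estimation Algorithm output, values may be 'Unknown') and e2 (a model)."""
    return max(
        0.0 if e1[pair] == "Unknown" else abs(e1[pair] - e2[pair])
        for pair in e2   # assumed convention: 'Unknown' contributes 0
    )
```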

  38. Edge Estimation Algorithm I • E1. Run the CSGS algorithm on an i.i.d. sample of size n from PM. • E2. Let the output from E1 be CSGS(L, n, M). Apply step V5 in the VCSGS algorithm (from section 3), using tests of zero partial correlations, and record which non-adjacencies are confirmed. • E3. Apply the Estimation Algorithm to CSGS(L, n, M), the confirmed non-adjacencies, and the sample of size n.

  39. Uniform Consistency • Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the Edge Estimation Algorithm I is uniformly consistent in the sense that for every λ > 0, limn→∞ supM PM(d[M̂(L, n, M), M] ≥ λ) = 0, where M̂(L, n, M) is the algorithm's output. • For a large enough and dense enough graph, this still allows for the possibility of large manipulation errors (due to many small edge errors).

  40. Breaking the Markov Equivalence Class • Correlation matrix over X1, X2, X3: • corr(X1, X2) = 0.01 • corr(X1, X3) = 0.7877781 • corr(X2, X3) = 0.612157

  41. Breaking the Markov Equivalence Class • If k > 0.014, then the k-Triangle-Faithfulness Assumption is violated for models M2 and M3, but not for M1. • If 0.008 < k < 0.014, then the k-Triangle-Faithfulness Assumption is violated for model M3, but not for M1 or M2.
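From the correlation matrix on slide 40 one can compute the partial correlation ρ(X1, X2 | X3) directly, showing that the tiny marginal correlation 0.01 coexists with a strong conditional dependence, the almost-cancellation pattern that k-Triangle-Faithfulness uses to distinguish models within one Markov equivalence class:

```python
import numpy as np

# Correlation matrix from slide 40 (variable order X1, X2, X3)
R = np.array([[1.0,       0.01,     0.7877781],
              [0.01,      1.0,      0.612157 ],
              [0.7877781, 0.612157, 1.0      ]])

P = np.linalg.inv(R)                           # precision matrix
r12_3 = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])  # rho(X1, X2 | X3)
print(round(r12_3, 3))                         # about -0.97
```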

  42. Edge Estimation Algorithm II • E1. Run Edge Estimation Algorithm I. • E2. Set ForbiddenOrientations = {}. • E3. For each maximal clique in CSGS(L, n, M) such that if a vertex in the clique is not adjacent to some vertex not in the clique, it is definitely non-adjacent: • (i) for each possible orientation O of all of the unoriented edges in the maximal clique: • Apply the orientation O to each of the unoriented edges. • Apply Meek's orientation rules. • If application of the rules produces a cycle or a new unshielded collider, add O to ForbiddenOrientations. • Add O to ForbiddenOrientations if there are Y and W such that Y is a non-collider on the path <X, Y, Z> under O, W ⊆ V contains Y, and the sample version of the k-Triangle-Faithfulness inequality for ρ(X, Z|W) fails.

  43. Edge Estimation Algorithm II • E4. For each unoriented edge X – Y in CSGS(L, n, M), if there is only one orientation X → Y that does not occur in ForbiddenOrientations, and every vertex that Y is not adjacent to, Y is definitely not adjacent to, orient the edge as X → Y. • E5. For each vertex V such that some edge containing V in CSGS(L, n, M) is not oriented, if there is only one orientation of all of the edges containing V that is not in ForbiddenOrientations, and every vertex that V is not adjacent to, V is definitely not adjacent to, let the estimate of each edge be the corresponding sample regression coefficient in the regression of V on its parents in the non-forbidden orientation.

  44. Uniform Consistency • Theorem: Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the Edge Estimation Algorithm II is uniformly consistent in the sense that for every λ > 0, limn→∞ supM PM(d[M̂(L, n, M), M] ≥ λ) = 0, where O(L, n, M) is the graphical output of the Edge Estimation Algorithm II and M̂(L, n, M) is its estimated model.

  45. Conclusion • We weakened the assumption of faithfulness so that fewer inferences from conditional independence to d-separation need to be made. • We strengthened the assumption so that it allows one to make inferences from "almost independence" in a probability distribution to d-separation in a causal graph, allowing for the existence of uniformly consistent estimation algorithms.

  46. Conclusion • We changed the concept of correctness to allow for missing weak edges and for saying "don't know" about some features of Markov equivalence classes. • The new simplicity assumption breaks up the Markov equivalence class, in the sense that it counts some models in a Markov equivalence class as simpler than other models in the same class. • This allowed for uniformly consistent estimates of linear coefficients in a causal model, as well as of causal structure.

  47. Open Questions • Can we get similar results for: • PC • FCI • non-linear models • increasing numbers of variables and vertex degree and decreasing k (analogous to Kalisch and Bühlmann)? • If parameter values are randomly assigned, how often is k-Triangle-Faithfulness violated, as a function of • sample size • clique size • parameter distribution • k

  48. References • Kalisch, M., and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research 8, 613–636. • Spirtes, P., and Zhang, J. (forthcoming). A uniformly consistent estimator of causal effects under the k-Triangle-Faithfulness assumption. Statistical Science. • Spirtes, P., and Zhang, J. (submitted). Three faces of faithfulness. Synthese.
