210 likes | 346 Views
Inferring Causal Graphs. Computing 882 Simon Fraser University Spring 2002. Applications of Bayes Nets (I). Windows Office “Paper Clip.” Bill Gates: “The competitive advantage of Microsoft lies in our expertise in Bayes Nets.” UBC Intelligent Tutoring System (ASI X-change).
E N D
Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002
Applications of Bayes Nets (I) • Windows Office “Paper Clip.” • Bill Gates: “The competitive advantage of Microsoft lies in our expertise in Bayes Nets.” • UBC Intelligent Tutoring System (ASI X-change).
Applications of Bayes Nets (II) • University Drop-outs: Search program Tetrad II says that higher SAT score would lead to lower drop-out rate. Carnegie Mellon uses this to reduce its drop-out rates. • Tetrad II recalibrates Mass Spectrometer on earth satellite. • Tetrad II predicts relation between corn exports and exchange rates.
Bayes Nets: Basic Definitions • Defn: A and B are independent iff P(A and B) = P(A) x P(B). • Exercise: Prove that A and B are independent iff P(A|B) = P(A). • Thus independence implies irrelevance.
Independence Among Variables • Let X,Y,Z be random variables. • X is independent of Y iff P(X=x| Y=y) = P(X=x) for all x,y s.t. P(Y=y) > 0. • X is independent of Y given Z iff P(X=x|Y=y,Z=z) = P(Z=z) for all y,z s.t. P(Y=y and Z=z) >0. • Notation: (X Y|Z). • Intuitively: given information Z, Y is irrelevant to X.
Axioms for Informational Relevance • Pearl (2000), p.11. It’s possible to read the symbol as “irrelevant”. Then we can consider a number of axioms for as axiomatizations of relevance, for example: • Symmetry: if (X Y|Z) then (Y X|Z). • Decomposition: if (X YW|Z) then (X Y|Z).
Markovian Parents • In constructing a Bayes net, we look for “direct causes” – variables that “immediately determine” the value of another value. Such direct causes “screen off” other variables. • Formally: Let an ordering of variables X1, …, Xn be given. Consider Xj. Let PA be any subset of X1,…,Xj-1. Suppose that P(Xj|PA) = P(Xj|X1,..,Xj) and that no subset of PA has this property. Then PA forms the Markovian parents of Xj.
Markovian Parents and Bayes Nets • Given an ordering of variables, we can construct a causal graphs by drawing arrows between Markovian parents and children. Note that graphs are suitable for drawing the distinction between “direct” and “intermediate” causes. • Exercise: For the variables in figure 1.2, construct a Bayes net in the given ordering. • Exercise: Construct a Bayes net along the ordering (X5, X1, X3, X2, X4).
Independence in Bayes Nets • Note how useful irrelevance information is – think of a Prolog-style logical database. • A typical problem: Given some information Z, and a query about X, is Y relevant to X? • For Bayes nets, the d-separation criterion is a powerful answer.
d-separation • In principle, information can flow along any path between two variables X and Y. • Provisos: A path is blocked by any collider. • Conditioning on a node reverses its status. • Conditioning on non-collider makes it block. • Conditioning on collider or its descendant makes it unblocked.
d-separation characterizes independence • If X,Y d-separated by Z in a DAG G, then (X Y|Z) in all probability distributions compatible with G. • If X,Y not d-separated by Z in a DAG G, then not [(X Y|Z) in all probability distributions compatible with G.].
Observational Equivalence • Suppose we can observe the probabilities of various occurrences (rain vs. umbrellas, smoking vs. lung cancer etc.). • How does prob constrain graph? • Two causal graphs G1,G2 are compatible with the same probs iff. • G1 has the same adjacencies as G2 and the same v-structures (basically, colliders).
Observational Equivalence: Examples (I) • In sprinkler network, cannot tell whether X1 -> X2 or vice versa. But can tell that X2 -> X4 and X4 -> X5. • General note: You cannot always tell in machine learning what the correct hypothesis is even if you have all possible data -> need more assumptions or other kinds of data.
Observational Equivalence: Examples (II) • Vancouver sun, March 29, 2002. “Adolescents …. Are more likely to turn to violence in their early twenties if they watch more than an hour of television a day… The team tracked more than 700 children and took into account the “chicken and egg” question: Does watching television cause aggression or do people prone to aggression watch more television?” [Science, Dr. Johnson, Columbia U.]
Two Models of Aggressive behaviour Disposition to aggression TV watching Violent behavour Disposition to aggression TV watching Violent behavour Are these two graphs observationally distinguishable?
Minimal Graphs • A graph G is minimal for a probability distribution P iff • G is compatible with P, and • no subgraph of G is compatible with P. • Example: not minimal if A {B,C,D} A C D B
Note on minimality • Intuitively, minimality requires that you add an edge between A and B only if there is some dependence between A and B. • In statistical tests, dependence is observable but independence is not. • So minimality amounts to “assume independence until dependence is observed”. • That is exactly the strategy for minimizing mind changes! (“assume reaction is impossible until observed”).
Stable Distributions • A distribution P is stable iff there is a graph G such that (X Y |Z) in P iff X and Y are d-separated given Z in G. • Intuitively, stability rules out “exact counterbalance”: two forces both having a causal effect but cancelling out each other exactly in every circumstance.
Inferring Causal Structure: The IC Algorithm • Assume a stable probability distribution P. • Find a minimal graph for P with as many edges directed as possible. • General idea: First find variables that are “directly causally related”. Connect those. Add arrows as far as possible.
Inferring Causal Structure: The IC Algorithm • For each pair of variables X and Y, look for a “screen off” set S(X,Y) s.t. X Y| S(X,Y) holds. If there is no such set, add an undirected edge between X and Y. • For each pair X,Y with a common neighbour Z, check if Z is part of a “screening off” set S(X,Y). If not, make Z a common consequence of X,Y. • Orient edges without creating cycles or v-structures.
Rules for Orientation • Given a b, b – c add b c if a,c are not linked (no new collider). • Given a c b, a – b add a b (no cycle). • Given a – c d and c d b and a – b add a b if c,d are not linked (no cycle + no new collider). • Given a – c b and a – d b and a – b add a b if c,d are not linked (no cycle + no new collider).