From Variable Elimination to Junction Trees
Yaniv Hamo and Mark Silberstein

Variable Elimination – what is it and why we need it
[Figure: network over R (Reference), S (Submit HW), P (Pass course)]
• Variable elimination is needed for answering questions such as “so, do I pass this course or not?”
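So, do I pass this course or not?
• We want to compute $P(p)$
• By definition: $P(p) = \sum_{r} \sum_{s} P(r, s, p)$
• In our case (chain): $P(p) = \sum_{r} P(r) \sum_{s} P(s \mid r)\, P(p \mid s)$
  P(p) = 0.1*(0.8*0.9+0.2*0.5)+0.9*(0.4*0.9+0.6*0.5) = 0.676
• We essentially eliminated nodes R and S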
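The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the numbers are read off the slide's expression, while the dictionary names and the reading of the chain as R → S → P are assumptions made for illustration.

```python
# Values taken from the slide's arithmetic; the names and the chain
# structure R -> S -> P are assumptions made for this sketch.
p_r = {True: 0.1, False: 0.9}          # P(r)
p_s_given_r = {True: 0.8, False: 0.4}  # P(s=True | r)
p_p_given_s = {True: 0.9, False: 0.5}  # P(p=True | s)

def prob_pass():
    """P(p=True) = sum_r P(r) * sum_s P(s|r) * P(p=True|s)."""
    total = 0.0
    for r in (True, False):
        inner = 0.0  # this inner sum eliminates S for a fixed r
        for s in (True, False):
            p_s = p_s_given_r[r] if s else 1.0 - p_s_given_r[r]
            inner += p_s * p_p_given_s[s]
        total += p_r[r] * inner  # the outer sum eliminates R
    return total

print(round(prob_pass(), 3))  # 0.676
```

Pushing the sum over S inside the sum over R is exactly the reordering that variable elimination exploits.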
The General Case – Inference
• The network describes a unique probability distribution P
• We use inference as a name for the process of computing answers to queries about P
• There are many types of queries we might ask.
  • Most of these involve evidence
  • An evidence e is an assignment of values to a set E of variables in the domain
  • Without loss of generality $E = \{ X_{k+1}, \ldots, X_n \}$
• Simplest query: compute the probability of the evidence, $P(e) = \sum_{x_1} \cdots \sum_{x_k} P(x_1, \ldots, x_k, e)$
  • This is often referred to as computing the likelihood of the evidence
Another example of Variable Elimination
• The “Asia” network
[Figure: network over Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D)]
We are interested in $P(d)$
[Figure: the Asia network over V, S, T, L, A, B, X, D]
• Need to eliminate: v, s, x, t, l, a, b
• Initial factors: $P(v)\,P(s)\,P(t \mid v)\,P(l \mid s)\,P(b \mid s)\,P(a \mid t,l)\,P(x \mid a)\,P(d \mid a,b)$
• Brute force: $P(d) = \sum_v \sum_s \sum_x \sum_t \sum_l \sum_a \sum_b P(v)\,P(s)\,P(t \mid v)\,P(l \mid s)\,P(b \mid s)\,P(a \mid t,l)\,P(x \mid a)\,P(d \mid a,b)$

Eliminate variables in order: v, s, x, t, l, a, b
• Eliminate v: $f_v(t) = \sum_v P(v)\,P(t \mid v)$
  [ Note: $f_v(t) = P(t)$. In general, the result of elimination is not necessarily a probability term ]
• Eliminate s: $f_s(b,l) = \sum_s P(s)\,P(b \mid s)\,P(l \mid s)$
  [ Note: the result of elimination may be a function of several variables ]
• Eliminate x: $f_x(a) = \sum_x P(x \mid a)$
  [ Note: $f_x(a) = 1$ for all values of a ]
• Eliminate t: $f_t(a,l) = \sum_t f_v(t)\,P(a \mid t,l)$
• Eliminate l: $f_l(a,b) = \sum_l f_s(b,l)\,f_t(a,l)$
• Eliminate a: $f_a(b,d) = \sum_a f_x(a)\,f_l(a,b)\,P(d \mid a,b)$
• Eliminate b: $f_b(d) = \sum_b f_a(b,d)$, which is $P(d)$
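The elimination of v, s, x, t, l, a, b can be sketched as a generic sum-product routine over tabulated factors. The code below is an illustrative sketch, not the lecture's implementation: the edge structure is the standard Asia network, all variables are assumed binary, and every CPT number is hypothetical, chosen only so the example runs.

```python
from itertools import product

def make_factor(scope, fn):
    """Tabulate fn over all 0/1 assignments of the variables in scope."""
    return scope, {vals: fn(*vals) for vals in product((0, 1), repeat=len(scope))}

def cpt(child, parents, p_true):
    """Factor for P(child | parents); p_true(*parents) gives P(child=1 | parents)."""
    def fn(c, *pa):
        p = p_true(*pa)
        return p if c == 1 else 1.0 - p
    return make_factor((child,) + parents, fn)

def multiply(factors):
    """Pointwise product of factors over the union of their scopes."""
    scope = tuple(sorted({v for s, _ in factors for v in s}))
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, vals))
        p = 1.0
        for s, t in factors:
            p *= t[tuple(env[v] for v in s)]
        table[vals] = p
    return scope, table

def sum_out(var, factor):
    """Sum a single variable out of a factor."""
    scope, table = factor
    i = scope.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], out

def eliminate(factors, order):
    """Eliminate variables one by one; only factors mentioning the variable are touched."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        factors = rest + [sum_out(var, multiply(touching))]
    return multiply(factors)

# Hypothetical CPT values (not from the slides).
asia = [
    cpt("v", (), lambda: 0.01),
    cpt("s", (), lambda: 0.5),
    cpt("t", ("v",), lambda v: 0.05 if v else 0.01),
    cpt("l", ("s",), lambda s: 0.1 if s else 0.01),
    cpt("b", ("s",), lambda s: 0.6 if s else 0.3),
    cpt("a", ("t", "l"), lambda t, l: 1.0 if (t or l) else 0.0),
    cpt("x", ("a",), lambda a: 0.98 if a else 0.05),
    cpt("d", ("a", "b"), lambda a, b: (0.9 if b else 0.7) if a else (0.8 if b else 0.1)),
]

scope, p_d = eliminate(asia, "vsxtlab")
print(scope, p_d)  # a factor over d alone
```

Any other elimination order passed to `eliminate` gives the same final table; only the sizes of the intermediate factors change.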
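Intermediate factors
[Figure: the Asia network over V, S, T, L, A, B, X, D]
• In our previous example: $f_v(t)$, $f_s(b,l)$, $f_x(a)$, $f_t(a,l)$, $f_l(a,b)$, $f_a(b,d)$, $f_b(d)$
• With a different ordering the intermediate factors can be much larger: eliminating a first, for example, produces a factor over t, l, x, b, d
• Complexity is exponential in the size of these factors!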
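The effect of the ordering on factor size can be checked without any numbers, by tracking only the scopes of the factors. The sketch below assumes the standard Asia network structure; the function name and the second ordering are illustrative choices, not taken from the slides.

```python
def intermediate_scopes(scopes, order):
    """For each eliminated variable, return the scope of the factor it produces."""
    scopes = [frozenset(s) for s in scopes]
    produced = []
    for var in order:
        touching = [s for s in scopes if var in s]
        new_scope = frozenset().union(*touching) - {var}
        scopes = [s for s in scopes if var not in s] + [new_scope]
        produced.append((var, sorted(new_scope)))
    return produced

# One string per initial factor: P(v), P(s), P(t|v), P(l|s), P(b|s),
# P(a|t,l), P(x|a), P(d|a,b); each character is one variable.
asia_scopes = ["v", "s", "tv", "ls", "bs", "atl", "xa", "dab"]

good = intermediate_scopes(asia_scopes, "vsxtlab")
bad = intermediate_scopes(asia_scopes, "abxtvsl")  # an arbitrary alternative ordering

print(max(len(s) for _, s in good))  # 2
print(max(len(s) for _, s in bad))   # 5
```

With binary variables a factor over 5 variables has 32 entries instead of 4, and the gap grows exponentially with the scope size.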
Notes about variable elimination
• Actual computation is done in the elimination steps
• Computation depends on the order of elimination
• For each query we need to compute everything again!
  • Many redundant calculations
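The idea
• Compute the joint over partitions of U
  • small subsets of U (typically made of a variable and its parents): clusters
  • not necessarily disjoint
• Calculate the joint $P(C_i)$ of each cluster $C_i$
• To compute $P(X)$ far fewer operations are needed: $P(X) = \sum_{C_i \setminus X} P(C_i)$ for any cluster $C_i$ containing $X$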
Junction Trees
• The junction tree algorithms generalize Variable Elimination to the efficient, simultaneous execution of a large class of queries.
• Theoretical background was shown in the previous lecture
Constructing Junction Trees
• Moralize the graph (if directed)
• Choose a node ordering and find the cliques generated by variable elimination. This gives a triangulation of the graph
• Build a junction graph from the eliminated cliques
• Find an appropriate spanning tree
Step 1: Moralization
[Figure: directed graph $G = (V, E)$ over nodes a to h and its moral graph $G^M$]
1. For all $w \in V$:
   • For all $u, v \in pa(w)$ add an edge $e = u - v$.
2. Undirect all edges.
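The two moralization steps translate directly into code. This is a small sketch under stated assumptions: the graph is given as a hypothetical `parents` dictionary, and the example uses the standard Asia network structure rather than the a to h graph in the figure, whose directed edges are not recoverable here.

```python
from itertools import combinations

def moralize(parents):
    """parents maps each node to the list of its parents in the DAG.
    Returns the undirected edge set of the moral graph."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                      # step 2: keep every edge, undirected
            edges.add(frozenset((p, child)))
        for u, v in combinations(pa, 2):  # step 1: connect co-parents
            edges.add(frozenset((u, v)))
    return edges

asia_parents = {"v": [], "s": [], "t": ["v"], "l": ["s"], "b": ["s"],
                "a": ["t", "l"], "x": ["a"], "d": ["a", "b"]}
moral = moralize(asia_parents)
print(sorted("-".join(sorted(e)) for e in moral))
```

For this network the only added edges are l-t (parents of a) and a-b (parents of d).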
Step 2: Triangulation
[Figure: moral graph $G^M$ and triangulated graph $G^T$ over nodes a to h, with examples labeled NO and YES]
• Add edges to $G^M$ such that there is no cycle of length $\geq 4$ that does not contain a chord.
Step 2: Triangulation (cont.)
• Each elimination ordering triangulates the graph, not necessarily in the same way:
[Figure: a graph over nodes A to H triangulated under different elimination orderings]
Step 2: Triangulation (cont.)
• Intuitively, triangulations with as few fill-ins as possible are preferred
  • Leaves us with small cliques (small probability tables)
• A common heuristic: repeat until no nodes remain:
  • Find the node whose elimination would require the fewest fill-ins (possibly zero).
  • Eliminate that node, and note the need for a fill-in edge between any two non-adjacent neighbors.
• Add the fill-in edges to the original graph.
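The heuristic can be sketched as a greedy loop. The code below is one possible reading of it, with assumptions labeled: ties are broken alphabetically (the slides do not say how), and the example edge list is the moral graph over a to h as inferred from the elimination example in this lecture, not copied from a figure.

```python
from itertools import combinations

def fill_ins(adj, v):
    """Edges that must be added so that the neighbours of v form a clique."""
    return [(a, b) for a, b in combinations(sorted(adj[v]), 2) if b not in adj[a]]

def min_fill_order(edges):
    """Greedy min-fill: returns (elimination order, list of fill-in edges)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    order, added = [], []
    while adj:
        # node needing the fewest fill-ins; ties broken alphabetically (assumption)
        v = min(sorted(adj), key=lambda n: len(fill_ins(adj, n)))
        for a, b in fill_ins(adj, v):
            adj[a].add(b)
            adj[b].add(a)
            added.append((a, b))
        for n in adj.pop(v):  # remove v from the remaining graph
            adj[n].discard(v)
        order.append(v)
    return order, added

# Inferred moral graph; each two-letter string is one undirected edge.
moral_edges = "eh gh eg cg ce df ef de ac ab bd".split()
order, added = min_fill_order(moral_edges)
print(order, added)
```

Different tie-breaking rules give different orders and different fill-in edges, which is why two runs of the heuristic need not produce the same triangulation.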
Step 2: Triangulation (example)
[Figure: moral graph $G^M$ over nodes a to h, the graphs after each elimination step, and the resulting triangulated graph $G^T$]
• Eliminate the vertex that requires the least number of edges to be added.

step  vertex removed  induced clique  added edges
1     h               egh             -
2     g               ceg             -
3     f               def             -
4     c               ace             a-e
5     b               abd             a-d
6     d               ade             -
7     e               ae              -
8     a               a               -
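The table can be regenerated by simulating the elimination for a fixed order. In this sketch the order h, g, f, c, b, d, e, a comes from the table, while the moral graph edge list is inferred from the table's cliques and added edges, so it should be read as an assumption.

```python
def elimination_cliques(edges, order):
    """Eliminate vertices in the given order.
    Returns rows of (vertex, induced clique, added fill-in edges)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    rows = []
    for v in order:
        nbrs = sorted(adj.pop(v))
        added = []
        for i, a in enumerate(nbrs):
            adj[a].discard(v)
            for b in nbrs[i + 1:]:
                if b not in adj[a]:  # non-adjacent neighbours need a fill-in
                    adj[a].add(b)
                    adj[b].add(a)
                    added.append(a + "-" + b)
        rows.append((v, "".join(sorted([v] + nbrs)), added))
    return rows

# Inferred moral graph; each two-letter string is one undirected edge.
moral_edges = "eh gh eg cg ce df ef de ac ab bd".split()
for row in elimination_cliques(moral_edges, "hgfcbdea"):
    print(row)
```

The maximal cliques among the rows (egh, ceg, def, ace, abd, ade) are the ones that become nodes of the junction graph.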
Step 3: Junction Graph
• A junction graph for an undirected graph G is an undirected, labeled graph.
• The nodes are the cliques in G.
• If two cliques intersect, they are joined in the junction graph by an edge labeled with their intersection.
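The definition can be sketched in a few lines. The clique list below is the one from this lecture's a to h example; representing a clique as a string of single-letter variable names is an assumption made to keep the code short.

```python
from itertools import combinations

def junction_graph(cliques):
    """Return {(Ci, Cj): separator} for every pair of intersecting cliques."""
    graph = {}
    for ci, cj in combinations(cliques, 2):
        sep = "".join(sorted(set(ci) & set(cj)))
        if sep:  # only intersecting cliques are joined
            graph[(ci, cj)] = sep
    return graph

cliques = ["abd", "ace", "ade", "ceg", "egh", "def"]
gj = junction_graph(cliques)
print(gj[("ceg", "egh")])  # eg
```

Cliques with an empty intersection, such as abd and ceg, get no edge.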
Step 3: Junction Graph (example)
[Figure: Bayesian network $G = (V, E)$, moral graph $G^M$, triangulated graph $G^T$, and junction graph $G^J$ (not complete)]
• Cliques: abd, ace, ade, ceg, egh, def
• Separators label the edges, e.g. $ceg \cap egh = eg$
Step 4: Junction Tree
• A junction tree is a sub-graph of the junction graph that
  • Is a tree
  • Contains all the cliques (spanning tree)
  • Satisfies the running intersection property: for each pair of nodes U, V, all nodes on the path between U and V contain $U \cap V$ (as seen in the previous part of the lecture)
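The running intersection property can be checked mechanically on a candidate tree. The sketch below assumes the input really is a tree over the given cliques (connected, no cycles); the two edge lists are built from this lecture's a to h cliques, and the "bad" one is a deliberately broken example made up for illustration.

```python
def has_running_intersection(tree_edges, cliques):
    """Check that every clique on the path between U and V contains U ∩ V.
    Assumes tree_edges forms a tree (connected, acyclic) over cliques."""
    adj = {c: [] for c in cliques}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)

    def path(u, v, seen):
        if u == v:
            return [u]
        for n in adj[u]:
            if n not in seen:
                rest = path(n, v, seen | {n})
                if rest:
                    return [u] + rest
        return None

    for u in cliques:
        for v in cliques:
            common = set(u) & set(v)
            if any(not common <= set(c) for c in path(u, v, {u})):
                return False
    return True

cliques = ["abd", "ace", "ade", "ceg", "egh", "def"]
good = [("abd", "ade"), ("ace", "ade"), ("ace", "ceg"), ("ade", "def"), ("ceg", "egh")]
bad = [("abd", "ade"), ("ace", "ade"), ("ace", "ceg"), ("egh", "def"), ("ceg", "egh")]
print(has_running_intersection(good, cliques))  # True
print(has_running_intersection(bad, cliques))   # False
```

In the broken tree, def and ade share d, but the path between them passes through egh, which does not contain d.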
Step 4: Junction Tree (cont.)
• Theorem: An undirected graph is triangulated if and only if its junction graph has a junction tree
• Definition: The weight of a link in a junction graph is the number of variables in its label. The weight of a junction tree is the sum of the weights of its labels.
• Theorem: A sub-tree of the junction graph of a triangulated graph is a junction tree if and only if it is a spanning tree of maximal weight
Step 4: Junction Tree (example)
[Figure: junction graph $G^J$ (not complete) and junction tree $G^{JT}$ over the cliques abd, ace, ade, ceg, egh, def; the tree uses the separators ad, ae, ce, de, eg]
• There are several methods to find a maximum-weight spanning tree.
• Kruskal’s algorithm: successively choose a link of maximal weight unless it creates a cycle.
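Kruskal's rule for the maximum-weight case can be sketched with a simple union-find. The clique list is the one from this lecture's a to h example; the function name and the string representation of cliques are assumptions for illustration.

```python
from itertools import combinations

def max_weight_spanning_tree(cliques):
    """Kruskal on the junction graph: take the heaviest link unless it closes a cycle."""
    links = [(len(set(a) & set(b)), a, b)
             for a, b in combinations(cliques, 2) if set(a) & set(b)]
    links.sort(key=lambda link: -link[0])  # heaviest separators first (stable sort)

    root = {c: c for c in cliques}  # union-find forest, one component per clique

    def find(c):
        while root[c] != c:
            c = root[c]
        return c

    tree = []
    for w, a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:  # different components, so no cycle is created
            root[ra] = rb
            tree.append((a, b, w))
    return tree

cliques = ["abd", "ace", "ade", "ceg", "egh", "def"]
tree = max_weight_spanning_tree(cliques)
print(tree)
```

For these cliques the five links of weight 2 already connect all six nodes, so the tree has total weight 10 and uses the separators ad, ae, ce, de, eg.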
Another example
• Compute the elimination cliques (the order here is f, d, e, c, b, a).
• Form the complete junction graph over the maximal elimination cliques and find a maximum-weight spanning tree.
Junction Trees and Elimination Order
• We can use different orderings in variable elimination, which affects efficiency.
• Each ordering corresponds to a junction tree.
• Just as some elimination orderings are more efficient than others, some junction trees are better than others. (Recall our mention of heuristics for triangulation.)
OK, I have this tree, now what?
[Figure: the Asia network and its cluster tree with clusters {T,V}, {A,L,T}, {B,L,S}, {A,L,B}, {X,A}, {A,B,D} and separators T, {A,L}, {B,L}, A, {A,B}]
• A separator S divides the remaining variables into two groups
• Variables in each group appear on one side in the cluster tree
• Examples:
  • {A,B}: {L, S, T, V} & {D, X}
  • {A,L}: {T, V} & {B, D, S, X}
  • {B,L}: {S} & {A, D, T, V, X}
  • {A}: {X} & {B, D, L, S, T, V}
  • {T}: {V} & {A, B, D, L, S, X}
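The listed partitions can be recomputed by cutting one edge of the cluster tree and collecting the variables on each side. This is a sketch under an assumption: the tree edges below (in particular, attaching {X,A} to {A,B,D}) are inferred from the examples on this slide, since the figure itself is not available.

```python
def split_by_separator(tree_edges, edge):
    """Cut one tree edge; return (separator, variables on one side, variables on the other)."""
    a, b = edge
    adj = {}
    for u, v in tree_edges:
        if {u, v} != {a, b}:  # drop the edge being cut
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    def side(start):
        seen, stack = {start}, [start]
        while stack:
            for n in adj.get(stack.pop(), ()):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return set("".join(seen))  # every variable in the reachable clusters

    sep = set(a) & set(b)
    return sep, side(a) - sep, side(b) - sep

# Clusters written as strings of variable letters; edges are inferred (assumption).
tree = [("TV", "ALT"), ("ALT", "ABL"), ("ABL", "BLS"), ("ABL", "ABD"), ("ABD", "AX")]
for e in tree:
    print(split_by_separator(tree, e))
```

Each printed triple corresponds to one of the example lines, for instance separator {A,B} with {L, S, T, V} on one side and {D, X} on the other.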
Elimination in Junction Trees
[Figure: separator S between the variable groups x and y, with factors $f_X(S)$ and $f_Y(S)$]
• Let X and Y be the partition induced by S
• Observation: eliminating all variables in X results in a factor $f_X(S)$
  • Proof: Since S is a separator, only variables in S are adjacent to variables in X
  • Note: The same factor would result, regardless of the elimination order
Recursive Elimination in Junction Trees
[Figure: cluster C with separators $S_1$ and $S_2$ toward $x_1$ and $x_2$, and separator S toward y]
• How do we compute $f_X(S)$?
• By recursive decomposition along the cluster tree
  • Let $X_1$ and $X_2$ be the disjoint partitioning of $X \setminus C$ implied by the separators $S_1$ and $S_2$
  • Eliminate $X_1$ to get $f_{X_1}(S_1)$
  • Eliminate $X_2$ to get $f_{X_2}(S_2)$
  • Eliminate variables in $C \setminus S$ to get $f_X(S)$
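The recursion can be sketched as a function that first collects the factors from a cluster's other neighbours and then sums out $C \setminus S$. Everything concrete below is an assumption for illustration: the cluster tree for the Asia network (inferred from the separator examples in this lecture), the assignment of CPTs to clusters, binary variables, and all CPT numbers.

```python
from itertools import product

def cpt(child, parents, p_true):
    """P(child | parents) as a function of a full assignment env (dict)."""
    def f(env):
        p = p_true(*(env[x] for x in parents))
        return p if env[child] else 1.0 - p
    return f

def message(c, parent, nbrs, local, keep=""):
    """Return (S, table) for f_X(S): eliminate everything on c's side of S.
    S is c ∩ parent, or the variables in keep when c is the root."""
    incoming = [message(n, c, nbrs, local) for n in nbrs[c] if n != parent]
    sep = sorted(keep) if parent is None else sorted(set(c) & set(parent))
    cvars, table = sorted(c), {}
    for vals in product((0, 1), repeat=len(cvars)):
        env = dict(zip(cvars, vals))
        p = 1.0
        for f in local[c]:        # factors stored in this cluster
            p *= f(env)
        for scope, t in incoming:  # factors f_{X_i}(S_i) from the subtrees
            p *= t[tuple(env[v] for v in scope)]
        key = tuple(env[v] for v in sep)
        table[key] = table.get(key, 0.0) + p  # sums out C \ S
    return sep, table

nbrs = {"tv": ["alt"], "alt": ["tv", "abl"], "abl": ["alt", "bls", "abd"],
        "bls": ["abl"], "abd": ["abl", "ax"], "ax": ["abd"]}
local = {  # hypothetical CPT values, each CPT placed in one cluster containing its scope
    "tv": [cpt("v", (), lambda: 0.01), cpt("t", ("v",), lambda v: 0.05 if v else 0.01)],
    "alt": [cpt("a", ("t", "l"), lambda t, l: 1.0 if (t or l) else 0.0)],
    "bls": [cpt("s", (), lambda: 0.5), cpt("l", ("s",), lambda s: 0.1 if s else 0.01),
            cpt("b", ("s",), lambda s: 0.6 if s else 0.3)],
    "abl": [],
    "abd": [cpt("d", ("a", "b"), lambda a, b: (0.9 if b else 0.7) if a else (0.8 if b else 0.1))],
    "ax": [cpt("x", ("a",), lambda a: 0.98 if a else 0.05)],
}

print(message("abd", None, nbrs, local, keep="d"))  # a table over d, i.e. P(d)
```

Choosing a different root cluster and `keep` set reuses the same recursion for other queries, which is the point of moving from one-off elimination to a tree of clusters.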