Mini-Clustering (MC) is an approximate belief updating algorithm for Bayesian networks whose complexity is controlled by a user-adjustable parameter. Cluster Tree Elimination (CTE) is an exact algorithm based on tree decomposition; MC reduces its complexity by partitioning clusters into mini-clusters, computing upper bounds (or approximations) of the joint probabilities needed for belief updating. Empirical evaluation shows MC is effective.
Two Approximate Algorithms for Belief Updating
• Mini-Clustering (MC): Robert Mateescu, Rina Dechter, Kalev Kask. "Tree Approximation for Belief Updating", AAAI-2002
• Iterative Join-Graph Propagation (IJGP): Rina Dechter, Kalev Kask and Robert Mateescu. "Iterative Join-Graph Propagation", UAI-2002
What is Mini-Clustering?
• Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks
• MC is an anytime version of join-tree clustering
• MC applies message passing along a cluster tree
• The complexity of MC is controlled by a user-adjustable parameter, the i-bound
• Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs sampling)
Belief networks
[Figure: belief network over variables A, B, C, D, E, F, G]
• The belief updating problem is the task of computing the posterior probability P(Y|e) of query nodes Y ⊆ X given evidence e. We focus on the basic case where Y is a single variable X_i.
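As a concrete illustration, here is a minimal brute-force sketch of belief updating on a hypothetical three-variable chain A → B → C (all CPT numbers are made up): the posterior is the joint summed over the non-query, non-evidence variables, normalized by P(e).

```python
# Brute-force belief updating P(A | C = 1) on a toy chain A -> B -> C
# (hypothetical CPTs, binary domains {0, 1}).
p_a = {0: 0.6, 1: 0.4}                                          # p(a)
p_b_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}    # p(b|a), key (b, a)
p_c_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.25, (1, 1): 0.75}  # p(c|b), key (c, b)

def joint(a, b, c):
    return p_a[a] * p_b_a[(b, a)] * p_c_b[(c, b)]

c_obs = 1                                                        # evidence e: C = 1
p_a_and_e = {a: sum(joint(a, b, c_obs) for b in (0, 1)) for a in (0, 1)}
p_e = sum(p_a_and_e.values())                                    # P(e)
posterior = {a: v / p_e for a, v in p_a_and_e.items()}           # P(A | e)
print(posterior)
```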
Tree decompositions
[Figure: belief network over A–G and its tree decomposition]
• Cluster 1 = {A,B,C}, functions p(a), p(b|a), p(c|a,b)
• separator {B,C}
• Cluster 2 = {B,C,D,F}, functions p(d|b), p(f|c,d)
• separator {B,F}
• Cluster 3 = {B,E,F}, function p(e|b,f)
• separator {E,F}
• Cluster 4 = {E,F,G}, function p(g|e,f)
Cluster Tree Elimination
• Cluster Tree Elimination (CTE) is an exact algorithm
• It works by passing messages along a tree decomposition
• Basic idea:
• Each node sends only one message to each of its neighbors
• Node u sends a message to its neighbor v only when u has received messages from all its other neighbors (a sketch of this schedule follows)
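A minimal sketch of this message schedule, structure only (the tree representation and helper names are illustrative assumptions, not from the paper; the actual message computation is sketched later):

```python
# CTE message scheduling: u sends to v once it has heard from all
# of its other neighbors. One message per directed tree edge.
from collections import defaultdict, deque

def cte_schedule(neighbors):
    """neighbors: dict node -> set of adjacent nodes in the tree decomposition."""
    received = defaultdict(set)   # received[u] = neighbors that have messaged u
    sent = set()                  # directed edges (u, v) already sent
    # A leaf has no other neighbors to wait for, so it can send right away
    queue = deque((u, v) for u in neighbors for v in neighbors[u]
                  if len(neighbors[u]) == 1)
    order = []
    while queue:
        u, v = queue.popleft()
        if (u, v) in sent:
            continue
        sent.add((u, v)); received[v].add(u); order.append((u, v))
        # v may now have all the messages needed to send to some w != u
        for w in neighbors[v] - {u}:
            if (neighbors[v] - {w}) <= received[v] and (v, w) not in sent:
                queue.append((v, w))
    return order   # 2*(N-1) messages in total

# The chain 1-2-3-4 from the running example
print(cte_schedule({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}))
```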
Cluster Tree Elimination
• Previous work on tree clustering:
• Lauritzen, Spiegelhalter - '88 (probabilities)
• Jensen, Lauritzen, Olesen - '90 (probabilities)
• Shenoy, Shafer - '90, Shenoy - '97 (general)
• Dechter, Pearl - '89 (constraints)
• Gottlob, Leone, Scarcello - '00 (constraints)
Belief Propagation
[Figure: node u with neighbors x1, …, xn; u sends message h(u,v) to neighbor v]
Cluster Tree Elimination - example
[Figure: the belief network over A–G and its cluster tree: ABC(1) —BC— BCDF(2) —BF— BEF(3) —EF— EFG(4)]
Cluster Tree Elimination - the messages
• Cluster 1 = {A,B,C} holds p(a), p(b|a), p(c|a,b) and sends over separator {B,C}:
  h_(1,2)(b,c) = Σ_a p(a)·p(b|a)·p(c|a,b)
• Cluster 2 = {B,C,D,F} holds p(d|b), p(f|c,d) and receives h_(1,2)(b,c); with sep(2,3)={B,F} and elim(2,3)={C,D} it sends:
  h_(2,3)(b,f) = Σ_{c,d} p(d|b)·p(f|c,d)·h_(1,2)(b,c)
• Cluster 3 = {B,E,F} holds p(e|b,f) and receives h_(2,3)(b,f)
• Cluster 4 = {E,F,G} holds p(g|e,f)
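Concretely, the downward messages above can be computed with einsum. The CPT values here are randomly generated placeholders; only the message equations come from the slides.

```python
# Downward CTE messages for the running example, with hypothetical
# binary CPTs; einsum sums out the eliminated variables.
import numpy as np
rng = np.random.default_rng(0)

def cpt(*shape):              # random CPT, normalized over its first axis
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)

p_a    = cpt(2)               # p(a)
p_b_a  = cpt(2, 2)            # p(b|a), axes (b, a)
p_c_ab = cpt(2, 2, 2)         # p(c|a,b), axes (c, a, b)
p_d_b  = cpt(2, 2)            # p(d|b), axes (d, b)
p_f_cd = cpt(2, 2, 2)         # p(f|c,d), axes (f, c, d)
p_e_bf = cpt(2, 2, 2)         # p(e|b,f), axes (e, b, f)

# h_(1,2)(b,c) = sum_a p(a) p(b|a) p(c|a,b)
h12 = np.einsum('a,ba,cab->bc', p_a, p_b_a, p_c_ab)
# h_(2,3)(b,f) = sum_{c,d} p(d|b) p(f|c,d) h_(1,2)(b,c)
h23 = np.einsum('db,fcd,bc->bf', p_d_b, p_f_cd, h12)
# h_(3,4)(e,f) = sum_b p(e|b,f) h_(2,3)(b,f)
h34 = np.einsum('ebf,bf->ef', p_e_bf, h23)
print(h34)
```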
Cluster Tree Elimination - properties
• Correctness and completeness: Algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.
• Time complexity: O(deg · (n + N) · d^(w*+1))
• Space complexity: O(N · d^sep)
where:
  deg = the maximum degree of a node
  n = number of variables (= number of CPTs)
  N = number of nodes in the tree decomposition
  d = the maximum domain size of a variable
  w* = the induced width
  sep = the maximum separator size
Mini-Clustering - motivation
• The time and space complexity of Cluster Tree Elimination depends on the induced width w* of the problem
• When the induced width w* is large, the CTE algorithm becomes infeasible
Mini-Clustering - the basic idea
• Try to reduce the size of the cluster (the exponent); partition each cluster into mini-clusters with fewer variables
• Accuracy parameter i = maximum number of variables in a mini-cluster
• The idea was first explored for variable elimination (Mini-Buckets)
Mini-Clustering
• Suppose cluster(u) is partitioned into p mini-clusters mc(1), …, mc(p), each containing at most i variables
• CTE computes the exact message: h_(u,v) = Σ_{elim(u,v)} ∏_{k=1}^{p} ∏_{f∈mc(k)} f
• We want to process each ∏_{f∈mc(k)} f separately
Mini-Clustering
• Approximate each ∏_{f∈mc(k)} f, k=2,…,p, and take it outside the summation
• How to process the mini-clusters to obtain approximations or bounds (see the sketch below):
• Process all mini-clusters by summation - this gives an upper bound on the joint probability
• A tighter upper bound: process one mini-cluster by summation and the others by maximization
• Can also use the mean operator (average) - this gives an approximation of the joint probability
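A minimal sketch of this processing step, under illustrative conventions (factors are numpy arrays paired with single-letter einsum axis labels; none of the helper names come from the paper):

```python
import numpy as np

def combine(factors):
    """Multiply a mini-cluster's factors, given as (array, axis-labels) pairs."""
    out, out_vars = factors[0]
    for arr, vs in factors[1:]:
        all_vars = ''.join(dict.fromkeys(out_vars + vs))
        out = np.einsum(f'{out_vars},{vs}->{all_vars}', out, arr)
        out_vars = all_vars
    return out, out_vars

def reduce_vars(arr, vs, elim, op):
    """Eliminate the variables in `elim` using op in {'sum', 'max', 'mean'}."""
    for v in [v for v in vs if v in elim]:
        arr = getattr(np, op)(arr, axis=vs.index(v))
        vs = vs.replace(v, '')
    return arr, vs

def mc_message(mini_clusters, elim):
    """First mini-cluster by summation, the rest by maximization, so the
    product of the returned functions upper-bounds the exact message
    (swap 'max' for 'mean' to get an approximation instead)."""
    return [reduce_vars(*combine(mc), elim, 'sum' if k == 0 else 'max')
            for k, mc in enumerate(mini_clusters)]

# e.g. for cluster 2 of the running example (arrays as in the CTE sketch):
# mc_message([[(p_d_b, 'db'), (h12, 'bc')], [(p_f_cd, 'fcd')]], elim='cd')
```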
Idea of Mini-Clustering
• Split a cluster into mini-clusters => bound complexity
Mini-Clustering - example
[Figure: the same cluster tree ABC(1) —BC— BCDF(2) —BF— BEF(3) —EF— EFG(4), with cluster 2 split into mini-clusters]
Mini-Clustering - the messages, i=3
• Cluster 1 = {A,B,C} holds p(a), p(b|a), p(c|a,b) and sends h_(1,2)(b,c) over separator {B,C}
• Cluster 2 is partitioned into mini-clusters mc(1) = {B,C,D} with p(d|b), h_(1,2)(b,c) and mc(2) = {C,D,F} with p(f|c,d); with sep(2,3)={B,F} and elim(2,3)={C,D} it sends two messages, h1_(2,3)(b) and h2_(2,3)(f)
• Cluster 3 = {B,E,F} holds p(e|b,f) and receives h1_(2,3)(b), h2_(2,3)(f)
• Cluster 4 = {E,F,G} holds p(g|e,f)
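Reusing the hypothetical CPTs and messages (p_d_b, p_f_cd, h12, h23) from the CTE sketch above, the two mini-cluster messages can be computed directly, and we can check that their product indeed upper-bounds the exact message h_(2,3)(b,f):

```python
import numpy as np

# mc(1) = {p(d|b), h_(1,2)(b,c)} processed by summation over elim={C,D}:
h1 = np.einsum('db,bc->b', p_d_b, h12)    # h1_(2,3)(b)
# mc(2) = {p(f|c,d)} processed by maximization over elim={C,D}:
h2 = p_f_cd.max(axis=(1, 2))              # h2_(2,3)(f)
# Upper-bound property: h1(b) * h2(f) >= exact h_(2,3)(b,f) everywhere
assert np.all(np.outer(h1, h2) >= h23 - 1e-12)
```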
Cluster Tree Elimination vs. Mini-Clustering
[Figure: the CTE cluster tree and the MC cluster tree side by side, both over the chain ABC(1) —BC— BCDF(2) —BF— BEF(3) —EF— EFG(4)]
Mini-Clustering
• Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(X_i, e) of each variable and each of its values.
• Time & space complexity: O(n · hw* · d^i), where hw* = max_u |{f | scope(f) ∩ cluster(u) ≠ ∅}|
Normalization
• Algorithms for the belief updating problem compute, in general, the joint probability P(X_i, e)
• Computing the conditional probability P(X_i | e) = P(X_i, e) / P(e):
• is easy to do if exact algorithms can be applied
• becomes an important issue for approximate algorithms
Normalization
• MC can compute an (upper) bound on the joint P(X_i, e)
• Deriving a bound on the conditional P(X_i | e) is not easy when the exact P(e) is not available
• If a lower bound LB(e) ≤ P(e) were available, we could use the bound on P(X_i, e) divided by LB(e) as an upper bound on the posterior
• In our experiments we normalized the results and regarded them as approximations of the posterior P(X_i | e) (a sketch follows)
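A minimal sketch of that normalization step (the joint values below are made-up placeholders for the bounds MC produces):

```python
# Normalize approximate joint values P(X_i = x, e) into an approximate
# posterior over X_i.
def normalize(approx_joint):
    z = sum(approx_joint.values())
    return {x: p / z for x, p in approx_joint.items()}

print(normalize({0: 0.012, 1: 0.030}))   # -> {0: 0.2857..., 1: 0.7142...}
```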
Experimental results
• Algorithms:
• Exact
• IBP
• Gibbs sampling (GS)
• MC with normalization (approximate); we tested MC with the max and mean operators
• Networks (all variables are binary):
• Coding networks
• CPCS 54, 360, 422
• Grid networks (MxM)
• Random noisy-OR networks
• Random networks
• Measures:
• Normalized Hamming Distance (NHD)
• BER (Bit Error Rate)
• Absolute error
• Relative error
• Time
Random networks - Absolute error
[Figure: two panels, evidence=0 and evidence=10]
Coding networks - Bit Error Rate
[Figure: two panels, sigma=0.22 and sigma=0.51]
Noisy-OR networks - Absolute error
[Figure: two panels, evidence=10 and evidence=20]
CPCS422 - Absolute error
[Figure: two panels, evidence=0 and evidence=10]
Conclusion
• MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating
• Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms