1.04k likes | 1.06k Views
Learn about constraint optimization problems in graphical models, from satisfaction to counting and enumeration. Explore various optimization tasks, inference methods, and search algorithms.
E N D
Chapter 13 Constraint Optimization And counting, and enumeration 275 class
From satisfaction to counting and enumeration and to general graphical models ICS-275Spring 2007
Outline • Introduction • Optimization tasks for graphical models • Solving optimization problems with inference and search • Inference • Bucket elimination, dynamic programming • Mini-bucket elimination • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme
E A B red green red yellow green red green yellow yellow green yellow red A D B F G C Constraint Satisfaction Example: map coloring Variables - countries (A,B,C,etc.) Values - colors (e.g., red, green, yellow) Constraints: Task: consistency? Find a solution, all solutions, counting
Propositional Satisfiability = {(¬C),(A v B v C),(¬A v B v E), (¬B v C v D)}.
Constraint Optimization Problemsfor Graphical Models f(A,B,D) has scope {A,B,D}
A B C D F Constraint Optimization Problemsfor Graphical Models f(A,B,D) has scope {A,B,D} • Primal graph = • Variables --> nodes • Functions, Constraints - arcs • f1(A,B,D) • f2(D,F,G) • f3(B,C,F) F(a,b,c,d,f,g)= f1(a,b,d)+f2(d,f,g)+f3(b,c,f) G
Constrained Optimization Example: power plant scheduling
P(S) Smoking P(C|S) P(B|S) Bronchitis Cancer Dyspnoea P(D|C,B) P(X|C,S) X-Ray Probabilistic Networks P(D|C,B) P(S,C,B,X,D) = P(S)· P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B)
A F B C E D Graphical models • A graphical model (X,D,C): • X = {X1,…Xn} variables • D = {D1, … Dn} domains • C = {F1,…,Ft} functions(constraints, CPTS, cnfs) • Primal graph
A F B C E D Graphical models • A graphical model (X,D,C): • X = {X1,…Xn} variables • D = {D1, … Dn} domains • C = {F1,…,Ft} functions(constraints, CPTS, cnfs) • Primal graph A COP problem defined by operators sum, product and min,max over consistent solutions • MPE: maxX j Pj • CSP: X j Cj • Max-CSP: minX j Fj • Optimization: minX j Fj
Outline • Introduction • Optimization tasks for graphical models • Solving by inference and search • Inference • Bucket elimination, dynamic programming, tree-clustering, bucket-elimination • Mini-bucket elimination, belief propagation • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme
A B C D E “Moral” graph P(a)P(b|a)P(c|a)P(d|b,a)P(e|b,c)= P(c|a) P(b|a)P(d|b,a)P(e|b,c) Variable Elimination Computing MPE MPE= B C D E P(a)
bucket B: P(b|a) P(d|b,a) P(e|b,c) B bucket C: P(c|a) C bucket D: D bucket E: e=0 E P(a) bucket A: A MPE Finding Algorithm elim-mpe (Dechter 1996)Non-serial Dynamic Programming (Bertele and Briochi, 1973) Elimination operator
B: P(b|a) P(d|b,a) P(e|b,c) C: P(c|a) D: E: e=0 A: P(a) Generating the MPE-tuple
bucket B: P(b|a) P(d|b,a) P(e|b,c) B bucket C: P(c|a) C bucket D: D bucket E: e=0 E P(a) bucket A: A MPE exp(W*=4) ”induced width” (max clique size) Complexity Algorithm elim-mpe (Dechter 1996)Non-serial Dynamic Programming (Bertele and Briochi, 1973) Elimination operator
A B C D E constraint graph B E C D D C E B A A Complexity of bucket elimination Bucket-elimination is time and space r = number of functions The effect of the ordering: Finding smallest induced-width is hard
E E E E D C D D D C C C B B B B A Directional i-consistency Adaptive d-path d-arc
Mini-bucket approximation: MPE task Split a bucket into mini-buckets =>bound complexity
maxB∏ maxB∏ Bucket B P(E|B,C) P(B|A) P(D|A,B) A B C Bucket C P(C|A) hB (C,E) E hB (A,D) Bucket D D Bucket E E = 0 hC (A,E) Bucket A P(A) hE (A) hD (A) MPE* is an upper bound on MPE --U Generating a solution yields a lower bound--L Mini-Bucket Elimination P(A) P(B|A) P(C|A) P(E|B,C) P(D|A,B)
MBE-MPE(i) Algorithm Approx-MPE (Dechter&Rish 1997) • Input: i – max number of variables allowed in a mini-bucket • Output: [lower bound (P of a sub-optimal solution), upper bound] Example: approx-mpe(3) versus elim-mpe
Properties of MBE(i) • Complexity: O(r exp(i)) time and O(exp(i)) space. • Yields an upper-bound and a lower-bound. • Accuracy: determined by upper/lower (U/L) bound. • As i increases, both accuracy and complexity increase. • Possible use of mini-bucket approximations: • As anytime algorithms • As heuristics in search • Other tasks: similar mini-bucket approximations for: belief updating, MAP and MEU (Dechter and Rish, 1997)
Outline • Introduction • Optimization tasks for graphical models • Solving by inference and search • Inference • Bucket elimination, dynamic programming • Mini-bucket elimination • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme
C A F D B E A B C D E F The Search Space Objective function: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
C A F D B E A B C D E F The Search Space 0 0 0 1 2 0 0 4 0 1 0 1 3 1 5 4 0 2 2 5 0 1 0 1 0 1 0 1 5 6 4 2 2 4 1 0 5 6 4 2 2 4 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 3 5 3 5 3 5 3 5 1 3 1 3 1 3 1 3 5 2 5 2 5 2 5 2 3 0 3 0 3 0 3 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 3 0 2 2 3 0 2 2 3 0 2 2 1 2 0 4 1 2 0 4 1 2 0 4 3 0 2 2 3 0 2 2 3 0 2 2 3 0 2 2 3 0 2 2 1 2 0 4 1 2 0 4 1 2 0 4 1 2 0 4 1 2 0 4 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 Arc-cost is calculated based on cost components.
C A F D B E A B C D E F The Value Function 5 0 0 5 7 0 1 2 0 0 4 6 5 7 4 0 1 0 1 3 1 5 4 0 2 2 5 8 5 3 1 7 4 2 0 0 1 0 1 0 1 0 1 5 6 4 2 2 4 1 0 5 6 4 2 2 4 1 0 3 3 3 3 1 1 1 1 2 2 2 2 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 3 5 3 5 3 5 3 5 1 3 1 3 1 3 1 3 5 2 5 2 5 2 5 2 3 0 3 0 3 0 3 0 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 3 0 2 2 3 0 2 2 3 0 2 2 1 2 0 4 1 2 0 4 1 2 0 4 3 0 2 2 3 0 2 2 3 0 2 2 3 0 2 2 3 0 2 2 1 2 0 4 1 2 0 4 1 2 0 4 1 2 0 4 1 2 0 4 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 Value of node = minimal cost solution below it
C A F D B E A B C D E F An Optimal Solution 5 0 0 5 7 0 1 2 0 0 4 6 5 7 4 0 1 0 1 3 1 5 4 0 2 2 5 8 5 3 1 7 4 2 0 0 1 0 1 0 1 0 1 5 6 4 2 2 4 1 0 5 6 4 2 2 4 1 0 3 3 3 3 1 1 1 1 2 2 2 2 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 3 5 3 5 3 5 3 5 1 3 1 3 1 3 1 3 5 2 5 2 5 2 5 2 3 0 3 0 3 0 3 0 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 3 0 2 2 3 0 2 2 3 0 2 2 1 2 0 4 1 2 0 4 1 2 0 4 3 0 2 2 3 0 2 2 3 0 2 2 3 0 2 2 3 0 2 2 1 2 0 4 1 2 0 4 1 2 0 4 1 2 0 4 1 2 0 4 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 Value of node = minimal cost solution below it
2.Best-First Search Always expand the node with the highest heuristic value f(xp). Needs lots of memory 1.Branch and Bound Use heuristic function f(xp) to prune the depth-first search tree. Linear space f L L Basic Heuristic Search Schemes Heuristic function f(x) computes a lower bound on the best extension of x and can be used to guide a heuristic search algorithm. We focus on
Classic Branch-and-Bound Upper Bound UB Lower Bound LB g(n) LB(n) = g(n) + h(n) n Prune if LB(n) ≥ UB h(n) OR Search Tree
How to Generate Heuristics • The principle of relaxed models • Linear optimization for integer programs • Mini-bucket elimination • Bounded directional consistency ideas
0 D 0 B E 0 D 1 A B 1 D E 1 Generating Heuristic for graphical models(Kask and Dechter, 1999) Given a cost function C(a,b,c,d,e) = f(a) • f(b,a) • f(c,a) • f(e,b,c) • P(d,b,a) Define an evaluation function over a partial assignment as the probability of it’s best extension D f*(a,e,d) = minb,c f(a,b,c,d,e) = = f(a) • minb,c f(b,a) • P(c,a) • P(e,b,c) • P(d,a,b) = g(a,e,d) • H*(a,e,d)
Generating Heuristics (cont.) H*(a,e,d) = minb,c f(b,a) • f(c,a) • f(e,b,c) • P(d,a,b) = minc [f(c,a)• minb [f(e,b,c) • f(b,a) • f(d,a,b)]] <=minc [f(c,a)• minb f(e,b,c) • minb [f(b,a) • f(d,a,b)]] = minb [f(b,a) • f(d,a,b)] • minc [f(c,a)• minb f(e,b,c)] = hB(d,a) • hC(e,a) = H(a,e,d) f(a,e,d) = g(a,e,d) • H(a,e,d)<=f*(a,e,d) The heuristic function H is what is compiled during the preprocessing stage of the Mini-Bucket algorithm.
Generating Heuristics (cont.) H*(a,e,d) = minb,c f(b,a) • f(c,a) • f(e,b,c) • P(d,a,b) = minc [f(c,a)• minb [f(e,b,c) • f(b,a) • f(d,a,b)]] >=minc [f(c,a)• minb f(e,b,c) • minb [f(b,a) • f(d,a,b)]] = minb [f(b,a) • f(d,a,b)] • minc [f(c,a)• minb f(e,b,c)] = hB(d,a) • hC(e,a) = H(a,e,d) f(a,e,d) = g(a,e,d) • H(a,e,d)<=f*(a,e,d) The heuristic function H is what is compiled during the preprocessing stage of the Mini-Bucket algorithm.
A B C E B: P(E|B,C) P(D|A,B) P(B|A) D C: P(C|A) hB(E,C) D: hB(D,A) 0 D B 0 E 0 E: hC(E,A) D 1 A B 1 A: P(A) hE(A) hD(A) D E 1 Belief Network f(a,e,D) = P(a) · hB(D,a) · hC(e,a) g h – is admissible Static MBE Heuristics • Given a partial assignment xp, estimate the cost of the best extension to a full solution • The evaluation function f(x^p) can be computed using function recorded by the Mini-Bucket scheme f(a,e,D))=g(a,e) · H(a,e,D )
Heuristics Properties • MB Heuristic is monotone, admissible • Retrieved in linear time • IMPORTANT: • Heuristic strength can vary by MB(i). • Higher i-bound more pre-processing stronger heuristic less search. • Allows controlled trade-off between preprocessing and search
Experimental Methodology Algorithms • BBMB(i) – Branch and Bound with MB(i) • BBFB(i) - Best-First with MB(i) • MBE(i) Test networks: • Random Coding (Bayesian) • CPCS (Bayesian) • Random (CSP) Measures of performance • Compare accuracy given a fixed amount of time - how close is the cost found to the optimal solution • Compare trade-off performance as a function of time
Random Coding, K=100, noise=0.28 Random Coding, K=100, noise=0.32 Empirical Evaluation of mini-bucket heuristics,Bayesian networks, coding
Dynamic MB Heuristics • Rather than pre-compiling, the mini-bucket heuristics can be generated during search • Dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node in the search space (a partial assignment, along the given variable ordering)
Dynamic MB and MBTE Heuristics(Kask, Marinescu and Dechter, 2003) • Rather than precompile compute the heuristics during search • Dynamic MB: Dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node during search • Dynamic MBTE: We can compute heuristics simultaneously for all un-instantiated variables using mini-bucket-tree elimination . • MBTE is an approximation scheme defined over cluster-trees. It outputs multiple bounds for each variable and value extension at once.
A B E C D F G Cluster Tree Elimination - example ABC 1 BC BCDF 2 BF BEF 3 EF EFG 4
Mini-Clustering • Motivation: • Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem • When the induced width w* is big, CTE algorithm becomes infeasible • The basic idea: • Try to reduce the size of the cluster (the exponent); partition each cluster into mini-clusters with less variables • Accuracy parameter i = maximum number of variables in a mini-cluster • The idea was explored for variable elimination (Mini-Bucket)
Idea of Mini-Clustering Split a cluster into mini-clusters => bound complexity
Mini-Clustering - example ABC 1 BC BCDF 2 BF BEF 3 EF EFG 4
Mini Bucket Tree Elimination ABC ABC 1 1 BC BC BCDF BCDF 2 2 BF BF BEF BEF 3 3 EF EF EFG 4 EFG 4
Mini-Clustering • Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) for each variable and each of its values. • MBTE: when the clusters are buckets in BTE.
Branch and Bound w/ Mini-Buckets • BB with static Mini-Bucket Heuristics (s-BBMB) • Heuristic information is pre-compiled before search. Static variable ordering, prunes current variable • BB with dynamic Mini-Bucket Heuristics (d-BBMB) • Heuristic information is assembled during search. Static variable ordering, prunes current variable • BB with dynamic Mini-Bucket-Tree Heuristics (BBBT) • Heuristic information is assembled during search. Dynamic variable ordering, prunes all future variables
Empirical Evaluation • Measures: • Time • Accuracy (% exact) • #Backtracks • Bit Error Rate (coding) • Algorithms: • Complete • BBBT • BBMB • Incomplete • DLM • GLS • SLS • IJGP • IBP (coding) • Benchmarks: • Coding networks • Bayesian Network Repository • Grid networks (N-by-N) • Random noisy-OR networks • Random networks
Real World Benchmarks Average Accuracy and Time. 30 samples, 10 observations, 30 seconds
Empirical Results: Max-CSP • Random Binary Problems: <N, K, C, T> • N: number of variables • K: domain size • C: number of constraints • T: Tightness • Task: Max-CSP