Chapter 13: Constraint Optimization, Counting, and Enumeration (275 class)
Outline • Introduction • Optimization tasks for graphical models • Solving optimization problems with inference and search • Inference • Bucket elimination, dynamic programming • Mini-bucket elimination • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme
Constraint Satisfaction Example: map coloring
• Variables: countries (A, B, C, etc.)
• Values: colors (e.g., red, green, yellow)
• Constraints: adjacent countries must be assigned different colors
• Tasks: consistency? Find a solution, all solutions, counting
(Figure: a map over countries A–G and its constraint graph.)
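The map-coloring CSP above can be sketched with a few lines of backtracking search. The adjacency map below is an illustrative assumption, not the slide's exact map; the point is the binary "not-equal" constraints.

```python
# A minimal backtracking sketch of the map-coloring CSP: variables are
# countries, values are colors, constraints force neighbors to differ.
# The adjacency structure here is a made-up example.

def backtrack(assignment, variables, domains, neighbors):
    if len(assignment) == len(variables):
        return dict(assignment)  # every country consistently colored
    var = next(v for v in variables if v not in assignment)
    for color in domains[var]:
        # binary constraint: differ from every already-assigned neighbor
        if all(assignment.get(n) != color for n in neighbors[var]):
            assignment[var] = color
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]  # dead end: undo and try the next color
    return None

variables = ["A", "B", "C", "D"]
domains = {v: ["red", "green", "yellow"] for v in variables}
neighbors = {"A": ["B", "C"], "B": ["A", "C", "D"],
             "C": ["A", "B", "D"], "D": ["B", "C"]}
solution = backtrack({}, variables, domains, neighbors)
print(solution)
```

Counting all solutions, as the slide mentions, amounts to continuing the same search past the first consistent leaf instead of returning.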
Propositional Satisfiability: φ = {(¬C), (A ∨ B ∨ C), (¬A ∨ B ∨ E), (¬B ∨ C ∨ D)}
Constraint Optimization Problems for Graphical Models
f(A,B,D) has scope {A,B,D}.
• Primal graph: variables → nodes; functions/constraints → arcs between variables that share a scope.
• Functions: f1(A,B,D), f2(D,F,G), f3(B,C,F)
• Global objective: F(a,b,c,d,f,g) = f1(a,b,d) + f2(d,f,g) + f3(b,c,f)
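The primal graph of this model can be derived mechanically from the function scopes, as a small sketch shows:

```python
# Sketch: building the primal graph from function scopes — nodes are
# variables, and every pair of variables appearing together in some
# scope is connected by an arc. Scopes are taken from the slide above.
from itertools import combinations

scopes = [("A", "B", "D"), ("D", "F", "G"), ("B", "C", "F")]  # f1, f2, f3

nodes = sorted({v for scope in scopes for v in scope})
edges = {frozenset(pair) for scope in scopes
         for pair in combinations(scope, 2)}

print(nodes)   # ['A', 'B', 'C', 'D', 'F', 'G']
print(sorted(sorted(e) for e in edges))
```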
Constrained Optimization Example: power plant scheduling
Probabilistic Networks
P(S,C,B,X,D) = P(S)·P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B)
(Figure: a Bayesian network with nodes Smoking, Cancer, Bronchitis, X-Ray, Dyspnoea, each annotated with its CPT.)
Outline • Introduction • Optimization tasks for graphical models • Solving by inference and search • Inference • Bucket elimination, dynamic programming, tree-clustering • Mini-bucket elimination, belief propagation • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme
Variable Elimination: Computing MPE
"Moral" graph over A, B, C, D, E.
MPE = max_{a,b,c,d,e} P(a)·P(c|a)·P(b|a)·P(d|b,a)·P(e|b,c)
    = max_{a,c,d,e} P(a)·P(c|a) · max_b [P(b|a)·P(d|b,a)·P(e|b,c)]
(pulling the maximization over B inward, past the factors that do not mention B)
MPE Finding — Algorithm elim-mpe (Dechter 1996); Non-serial Dynamic Programming (Bertele and Brioschi, 1973). Elimination operator: max_B ∏ over the bucket's functions.
bucket B: P(b|a), P(d|b,a), P(e|b,c)  →  h^B(a,c,d,e)
bucket C: P(c|a), h^B(a,c,d,e)  →  h^C(a,d,e)
bucket D: h^C(a,d,e)  →  h^D(a,e)
bucket E: e = 0, h^D(a,e)  →  h^E(a)
bucket A: P(a), h^E(a)  →  MPE
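The bucket-processing scheme can be sketched in a few lines of Python. The CPT numbers below are arbitrary positive stand-ins; the point is that eliminating variables one bucket at a time (max-product) reproduces the brute-force maximum.

```python
# Compact sketch of bucket elimination for MPE on the network
# P(a)·P(b|a)·P(c|a)·P(d|b,a)·P(e|b,c) over binary variables.
# Factors are (scope, table) pairs; table entries are made up.
from itertools import product

def table(scope):
    # arbitrary positive entries standing in for a real CPT
    return {a: 0.1 + 0.05 * sum((i + 1) * v for i, v in enumerate(a))
            for a in product([0, 1], repeat=len(scope))}

def evaluate(factors, assignment):
    p = 1.0
    for scope, t in factors:
        p *= t[tuple(assignment[v] for v in scope)]
    return p

def eliminate(factors, var):
    """Process var's bucket: multiply its factors, then max var out."""
    bucket = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = sorted({v for s, _ in bucket for v in s} - {var})
    msg = {}
    for a in product([0, 1], repeat=len(scope)):
        ctx = dict(zip(scope, a))
        msg[a] = max(evaluate(bucket, {**ctx, var: x}) for x in (0, 1))
    return rest + [(tuple(scope), msg)]

factors = [(s, table(s)) for s in
           [("A",), ("B", "A"), ("C", "A"), ("D", "B", "A"), ("E", "B", "C")]]

fs = factors
for v in ["B", "C", "D", "E", "A"]:   # bucket processing order
    fs = eliminate(fs, v)
mpe = evaluate(fs, {})                # all remaining scopes are empty

brute = max(evaluate(factors, dict(zip("ABCDE", a)))
            for a in product([0, 1], repeat=5))
print(mpe, brute)
```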
Generating the MPE tuple: after all buckets are processed, go back in reverse order (A, E, D, C, B) and assign each variable the value that maximizes the product of the functions in its bucket, given the previously assigned values.
Complexity of elim-mpe: time and space exp(w* = 4), where w* is the induced width (max clique size) of the ordered moral graph along this ordering.
Complexity of bucket elimination
Bucket elimination is time and space exponential in the induced width w*(d) of the constraint graph along the elimination ordering d, with time O(r·exp(w*(d))) for r = number of functions.
The effect of the ordering: different orderings induce different widths, and finding an ordering of smallest induced width is NP-hard.
Directional i-consistency
(Figure: the induced graphs produced by adaptive consistency, directional path-consistency (d-path), and directional arc-consistency (d-arc) along a fixed ordering of A–E.)
Mini-bucket approximation (MPE task): split a bucket into mini-buckets ⇒ bound complexity
Mini-Bucket Elimination (example)
Bucket B is split into two mini-buckets, {P(E|B,C), P(B|A)} and {P(D|A,B)}, each eliminated separately with max_B ∏:
bucket B: {P(E|B,C), P(B|A)} → h^B(C,E);  {P(D|A,B)} → h^B(A,D)
bucket C: P(C|A), h^B(C,E) → h^C(A,E)
bucket D: h^B(A,D) → h^D(A)
bucket E: E = 0, h^C(A,E) → h^E(A)
bucket A: P(A), h^E(A), h^D(A) → MPE*
MPE* is an upper bound on MPE (U); generating a solution from the mini-bucket functions yields a lower bound (L).
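Why splitting the bucket gives an upper bound can be checked numerically: maximizing each mini-bucket separately can only increase the result. The table values below are made up.

```python
# Tiny numeric sketch of the mini-bucket relaxation: for non-negative
# tables, max_b [f1(b)·f2(b)] <= (max_b f1(b)) · (max_b f2(b)),
# so eliminating B from each mini-bucket separately over-estimates.
f1 = {0: 0.9, 1: 0.2}   # stands in for P(E|B,C)·P(B|A) at fixed E, C, A
f2 = {0: 0.3, 1: 0.8}   # stands in for P(D|A,B) at fixed D, A
exact = max(f1[b] * f2[b] for b in (0, 1))      # full bucket
upper = max(f1.values()) * max(f2.values())     # split mini-buckets
print(exact, upper)
```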
MBE-MPE(i) — Algorithm approx-mpe (Dechter & Rish 1997)
• Input: i — the maximum number of variables allowed in a mini-bucket
• Output: [lower bound (cost of a suboptimal solution), upper bound]
Example: approx-mpe(3) versus elim-mpe.
Properties of MBE(i) • Complexity: O(r exp(i)) time and O(exp(i)) space. • Yields an upper bound and a lower bound on the MPE. • Accuracy: measured by the gap between the upper and lower bounds (U/L). • As i increases, both accuracy and complexity increase. • Possible uses of mini-bucket approximations: • As anytime algorithms • As heuristics in search • Other tasks: similar mini-bucket approximations exist for belief updating, MAP and MEU (Dechter and Rish, 1997)
Outline • Introduction • Optimization tasks for graphical models • Solving by inference and search • Inference • Bucket elimination, dynamic programming • Mini-bucket elimination • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme
The Search Space
Objective function: the sum of the cost components over variables A, B, C, D, E, F with domains {0, 1}.
(Figure: the full OR search tree enumerating all 2^6 assignments along the ordering.)
The Search Space (with arc costs)
Arc-cost is calculated based on the cost components.
(Figure: the same search tree with a cost attached to each arc.)
The Value Function
Value of a node = minimal-cost solution below it, computed bottom-up from the arc costs.
(Figure: the search tree annotated with node values.)
An Optimal Solution
Value of a node = minimal-cost solution below it; an optimal solution follows minimal-value children from the root to a leaf.
(Figure: the optimal path highlighted in the annotated search tree.)
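The bottom-up value computation can be sketched directly. The per-variable arc costs below are made-up stand-ins for the slide's cost components (here they do not depend on earlier assignments, which keeps the sketch short).

```python
# Sketch of the value function: the value of a node in the OR search
# tree is the minimal total arc cost of any full assignment below it.
# Costs are illustrative assumptions.

costs = {"A": [3, 1], "B": [2, 4], "C": [0, 5]}   # cost of value 0 / 1
ordering = ["A", "B", "C"]

def value(depth):
    """Minimal-cost completion of the variables from `depth` on."""
    if depth == len(ordering):
        return 0                       # leaf: nothing left to pay
    var = ordering[depth]
    return min(costs[var][val] + value(depth + 1) for val in (0, 1))

print(value(0))   # → 3: the root's value, i.e. the optimal cost
```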
Basic Heuristic Search Schemes
Heuristic function f(x^p) computes a bound on the best extension of the partial assignment x^p and can be used to guide a heuristic search algorithm. We focus on:
1. Branch and Bound: uses the heuristic function f(x^p) to prune the depth-first search tree; linear space.
2. Best-First Search: always expands the node with the best heuristic value f(x^p); needs lots of memory.
Classic Branch-and-Bound (OR search tree)
Maintain an upper bound UB (cost of the best solution found so far) and, for each node n, a lower bound LB(n) = g(n) + h(n), where g(n) is the cost of the path to n and h(n) underestimates the best completion. Prune n if LB(n) ≥ UB.
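The pruning rule above fits in a short depth-first sketch. The unary costs and the simple bound h are illustrative assumptions; h is admissible because every future variable must pay at least its cheapest value.

```python
# Minimal depth-first Branch-and-Bound sketch (minimization): g(n) is
# the cost of the partial assignment, h(n) an admissible lower bound
# on completing it; a node is pruned when g(n) + h(n) >= UB.
import math

costs = {"A": [3, 1], "B": [2, 4], "C": [0, 5]}   # cost of value 0 / 1
ordering = ["A", "B", "C"]

def h(depth):
    # lower bound: each future variable pays at least its cheapest value
    return sum(min(costs[v]) for v in ordering[depth:])

def bnb(depth=0, g=0, ub=math.inf):
    if depth == len(ordering):
        return min(ub, g)              # full assignment: maybe a new incumbent
    if g + h(depth) >= ub:
        return ub                      # prune: cannot improve on UB
    for val in (0, 1):
        ub = bnb(depth + 1, g + costs[ordering[depth]][val], ub)
    return ub

print(bnb())   # → 3, the optimal cost
```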
How to Generate Heuristics • The principle of relaxed models • Linear optimization for integer programs • Mini-bucket elimination • Bounded directional consistency ideas
Generating Heuristics for Graphical Models (Kask and Dechter, 1999)
Given a cost function C(a,b,c,d,e) = f(a)·f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b), define an evaluation function over a partial assignment as the cost of its best extension:
f*(a,e,d) = min_{b,c} C(a,b,c,d,e)
          = f(a) · min_{b,c} [f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b)]
          = g(a,e,d) · H*(a,e,d)
Generating Heuristics (cont.)
H*(a,e,d) = min_{b,c} f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b)
          = min_c [ f(c,a) · min_b [ f(e,b,c)·f(b,a)·f(d,a,b) ] ]
          ≥ min_b [ f(b,a)·f(d,a,b) ] · min_c [ f(c,a) · min_b f(e,b,c) ]
          = h^B(d,a) · h^C(e,a) = H(a,e,d)
Hence f(a,e,d) = g(a,e,d)·H(a,e,d) ≤ f*(a,e,d).
The heuristic function H is exactly what is compiled during the pre-processing stage of the Mini-Bucket algorithm.
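The relaxation step, splitting one joint minimization over b into separate minimizations, can be sanity-checked numerically; the table values are made up.

```python
# Numeric check of the bound behind the mini-bucket heuristic: for
# non-negative tables, min_b [x(b)·y(b)] >= (min_b x(b))·(min_b y(b)),
# which is why H lower-bounds H* and f = g·H lower-bounds f*.
x = {0: 0.4, 1: 0.9}   # stands in for f(b,a)·f(d,a,b) at fixed a, d
y = {0: 0.7, 1: 0.2}   # stands in for f(e,b,c) at fixed e, c
exact = min(x[b] * y[b] for b in (0, 1))      # joint minimization (H*-style)
bound = min(x.values()) * min(y.values())     # split minimization (H-style)
print(bound, exact)
```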
Static MBE Heuristics
• Given a partial assignment x^p, estimate the cost of the best extension to a full solution.
• The evaluation function f(x^p) can be computed using the functions recorded by the Mini-Bucket scheme: f(a,e,D) = g(a,e) · H(a,e,D) = P(a) · h^B(D,a) · h^C(e,a).
Buckets (belief network):
  B: P(E|B,C), P(D|A,B), P(B|A) → h^B(E,C), h^B(D,A)
  C: P(C|A), h^B(E,C) → h^C(E,A)
  D: h^B(D,A) → h^D(A)
  E: h^C(E,A) → h^E(A)
  A: P(A), h^E(A), h^D(A)
h is admissible.
Heuristics Properties • MB heuristic is monotone and admissible • Retrieved in linear time • IMPORTANT: • Heuristic strength can vary with MB(i) • Higher i-bound ⇒ more pre-processing ⇒ stronger heuristic ⇒ less search • Allows a controlled trade-off between preprocessing and search
Experimental Methodology Algorithms • BBMB(i) – Branch and Bound with MB(i) • BBFB(i) - Best-First with MB(i) • MBE(i) Test networks: • Random Coding (Bayesian) • CPCS (Bayesian) • Random (CSP) Measures of performance • Compare accuracy given a fixed amount of time - how close is the cost found to the optimal solution • Compare trade-off performance as a function of time
Empirical evaluation of mini-bucket heuristics on Bayesian coding networks.
(Figure panels: Random Coding, K=100, noise=0.28 and noise=0.32.)
Dynamic MB Heuristics • Rather than pre-compiling, the mini-bucket heuristics can be generated during search • Dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node in the search space (a partial assignment, along the given variable ordering)
Dynamic MB and MBTE Heuristics (Kask, Marinescu and Dechter, 2003) • Rather than precompiling, compute the heuristics during search • Dynamic MB: use the Mini-Bucket algorithm to produce a bound for any node during search • Dynamic MBTE: compute heuristics simultaneously for all uninstantiated variables using mini-bucket-tree elimination • MBTE is an approximation scheme defined over cluster trees; it outputs multiple bounds, one for each variable and value extension, at once
Cluster Tree Elimination — example
Cluster tree for the graph over A–G: cluster 1 = {A,B,C}, cluster 2 = {B,C,D,F}, cluster 3 = {B,E,F}, cluster 4 = {E,F,G}, with separators BC (1–2), BF (2–3), EF (3–4).
Mini-Clustering • Motivation: • Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem • When the induced width w* is big, the CTE algorithm becomes infeasible • The basic idea: • Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables • Accuracy parameter i = maximum number of variables in a mini-cluster • The idea was explored for variable elimination (Mini-Bucket)
Idea of Mini-Clustering: split a cluster into mini-clusters ⇒ bound complexity
Mini-Clustering — example
(Figure: the cluster tree 1:ABC —BC— 2:BCDF —BF— 3:BEF —EF— 4:EFG, with the large clusters partitioned into mini-clusters before their messages are computed.)
Mini Bucket Tree Elimination
(Figure: the same cluster tree shown side by side — exact CTE messages versus their mini-bucket-tree approximations.)
Mini-Clustering • Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) for each variable and each of its values. • MBTE is the special case in which the clusters are the buckets of bucket-tree elimination (BTE).
Branch and Bound w/ Mini-Buckets • BB with static Mini-Bucket Heuristics (s-BBMB) • Heuristic information is pre-compiled before search. Static variable ordering, prunes current variable • BB with dynamic Mini-Bucket Heuristics (d-BBMB) • Heuristic information is assembled during search. Static variable ordering, prunes current variable • BB with dynamic Mini-Bucket-Tree Heuristics (BBBT) • Heuristic information is assembled during search. Dynamic variable ordering, prunes all future variables
Empirical Evaluation • Measures: • Time • Accuracy (% exact) • #Backtracks • Bit Error Rate (coding) • Algorithms: • Complete • BBBT • BBMB • Incomplete • DLM • GLS • SLS • IJGP • IBP (coding) • Benchmarks: • Coding networks • Bayesian Network Repository • Grid networks (N-by-N) • Random noisy-OR networks • Random networks
Real World Benchmarks Average Accuracy and Time. 30 samples, 10 observations, 30 seconds
Empirical Results: Max-CSP • Random Binary Problems: <N, K, C, T> • N: number of variables • K: domain size • C: number of constraints • T: Tightness • Task: Max-CSP
BBBT(i) vs. BBMB(i), N=100.
(Figure panels for i = 2 through 7.)
Searching the Graph; caching goods
Contexts along the ordering: context(A) = [A], context(B) = [AB], context(C) = [ABC], context(D) = [ABD], context(E) = [AE], context(F) = [F].
(Figure: the search graph in which nodes with identical contexts are merged, with arc costs shown.)
Searching the Graph; caching goods (cont.)
Same contexts as above: context(A) = [A], context(B) = [AB], context(C) = [ABC], context(D) = [ABD], context(E) = [AE], context(F) = [F].
(Figure: the same context-merged search graph with the node values filled in.)
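The caching mechanism can be sketched directly: two nodes whose context (the subset of the current assignment their subproblem depends on) coincides root identical subtrees, so the subtree value is computed once and reused. The contexts and unary costs below are toy assumptions, not the slide's exact problem.

```python
# Sketch of context-based caching during search: subtree values are
# memoized by (variable, context value), merging equivalent nodes.
# Toy contexts and unary costs; domains are {0, 1}.

ordering = ["A", "B", "C", "D"]
context = {"A": (), "B": ("A",), "C": ("B",), "D": ("A",)}
costs = {"A": [1, 0], "B": [0, 2], "C": [3, 1], "D": [2, 0]}

cache = {}
expansions = 0

def value(depth, assignment):
    """Min-cost completion below this node, cached by (variable, context)."""
    global expansions
    if depth == len(ordering):
        return 0
    var = ordering[depth]
    key = (var, tuple(assignment[v] for v in context[var]))
    if key in cache:
        return cache[key]              # merged node: reuse the cached value
    expansions += 1
    cache[key] = min(
        costs[var][val] + value(depth + 1, {**assignment, var: val})
        for val in (0, 1)
    )
    return cache[key]

best = value(0, {})
print(best, expansions)   # far fewer expansions than the 15 tree nodes
```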