

  1. Chapter 13: Constraint Optimization, Counting, and Enumeration (275 class)

  2. Outline • Introduction • Optimization tasks for graphical models • Solving optimization problems with inference and search • Inference • Bucket elimination, dynamic programming • Mini-bucket elimination • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme

  3. Constraint Satisfaction. Example: map coloring. Variables: countries (A, B, C, etc.). Values: colors (e.g., red, green, yellow). Constraints: adjacent countries must be assigned different colors. Tasks: decide consistency; find a solution, find all solutions, count solutions. [Figure: constraint graph over variables A–G, with a table of allowed color pairs for adjacent variables.]
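A minimal brute-force sketch of this map-coloring CSP; the adjacency list below is an illustrative assumption, since the slide's exact constraint graph lives in its figure. It answers all three tasks: consistency, one solution, and counting.

```python
from itertools import product

# Hypothetical adjacency; the slide's actual constraint graph may differ.
variables = ["A", "B", "C", "D", "E", "F", "G"]
colors = ["red", "green", "yellow"]
neighbors = [("A", "B"), ("A", "D"), ("B", "D"), ("B", "E"),
             ("D", "E"), ("D", "F"), ("E", "F"), ("F", "G")]

solutions = []
for vals in product(colors, repeat=len(variables)):
    asg = dict(zip(variables, vals))
    if all(asg[x] != asg[y] for x, y in neighbors):   # not-equal constraints
        solutions.append(asg)

print("consistent?", bool(solutions))                 # consistency task
print("one solution:", solutions[0] if solutions else None)
print("solution count:", len(solutions))              # counting task
```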

  4. Propositional Satisfiability. φ = {(¬C), (A ∨ B ∨ C), (¬A ∨ B ∨ E), (¬B ∨ C ∨ D)}.
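A brute-force model count for this exact clause set φ (a sketch only; practical SAT solvers use DPLL/CDCL search rather than enumeration):

```python
from itertools import product

# phi over variables A..E; a literal is (variable, required truth value).
clauses = [[("C", False)],
           [("A", True), ("B", True), ("C", True)],
           [("A", False), ("B", True), ("E", True)],
           [("B", False), ("C", True), ("D", True)]]
names = ["A", "B", "C", "D", "E"]

def satisfied(model):
    """Every clause has at least one literal matching the model."""
    return all(any(model[v] == val for v, val in clause) for clause in clauses)

models = []
for bits in product([False, True], repeat=len(names)):
    model = dict(zip(names, bits))
    if satisfied(model):
        models.append(model)

print("satisfiable:", bool(models), "| number of models:", len(models))
```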

  5. Constraint Optimization Problems for Graphical Models. f(A,B,D) has scope {A,B,D}.

  6. Constraint Optimization Problems for Graphical Models. f(A,B,D) has scope {A,B,D}. Primal graph: variables become nodes, and every function or constraint connects the nodes in its scope by arcs. Local functions: f1(A,B,D), f2(D,F,G), f3(B,C,F). Global objective: F(a,b,c,d,f,g) = f1(a,b,d) + f2(d,f,g) + f3(b,c,f). [Figure: primal graph over A, B, C, D, F, G.]
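The decomposition can be rendered directly in code. The table values below are made-up placeholders; only the scopes f1(A,B,D), f2(D,F,G), f3(B,C,F) come from the slide:

```python
# Local cost functions on their slide-given scopes; entries are illustrative.
# Domains are {0, 1}.
f1 = {(a, b, d): a + b * d for a in (0, 1) for b in (0, 1) for d in (0, 1)}
f2 = {(d, f, g): d * f + g for d in (0, 1) for f in (0, 1) for g in (0, 1)}
f3 = {(b, c, f): b + c + f for b in (0, 1) for c in (0, 1) for f in (0, 1)}

def F(a, b, c, d, f, g):
    """Global objective: the sum of the local functions on their scopes."""
    return f1[(a, b, d)] + f2[(d, f, g)] + f3[(b, c, f)]

print(F(0, 1, 1, 0, 1, 0))  # evaluate one full assignment
```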

  7. Constrained Optimization Example: power plant scheduling

  8. Probabilistic Networks. [Figure: Bayesian network with nodes Smoking, Cancer, Bronchitis, X-Ray, Dyspnoea and CPTs P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B).] Joint distribution: P(S,C,B,X,D) = P(S)·P(C|S)·P(B|S)·P(X|C,S)·P(D|C,B).

  9. Outline • Introduction • Optimization tasks for graphical models • Solving by inference and search • Inference • Bucket elimination, dynamic programming, tree-clustering • Mini-bucket elimination, belief propagation • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme

  10. Variable Elimination: Computing the MPE. [Figure: "moral" graph over A, B, C, D, E.] MPE = max_{a,b,c,d,e} P(a)·P(b|a)·P(c|a)·P(d|b,a)·P(e|b,c); each maximization is pushed past the factors that do not mention the variable being eliminated.

  11. Algorithm elim-mpe (Dechter, 1996); non-serial dynamic programming (Bertelè and Brioschi, 1973). Elimination operator: max. Buckets along the ordering: bucket B: P(b|a), P(d|b,a), P(e|b,c); bucket C: P(c|a); bucket D: (initially empty); bucket E: e=0; bucket A: P(a). Processing the buckets from B down to A yields the MPE.

  12. Generating the MPE tuple. After all buckets are processed (B: P(b|a), P(d|b,a), P(e|b,c); C: P(c|a); D; E: e=0; A: P(a), plus the recorded messages), assign the variables in the reverse order A, E, D, C, B: each variable takes the value that maximizes the product of the functions in its bucket, given the values already assigned.
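A compact sketch of elim-mpe on this five-variable network, combining the bucket-elimination pass of slide 11 with the tuple-generation pass of slide 12. The CPT entries are random placeholders and the evidence e=0 is omitted for brevity; only the structure follows the slides:

```python
import itertools, random
from math import prod

random.seed(0)
vals = (0, 1)

def rand_table(scope):
    """Placeholder CPT over `scope`; a real network would use actual CPTs."""
    return {t: random.random() for t in itertools.product(vals, repeat=len(scope))}

# P(a)P(b|a)P(c|a)P(d|a,b)P(e|b,c) stored as (scope, table) pairs.
factors = [(("A",), rand_table("A")), (("B", "A"), rand_table("BA")),
           (("C", "A"), rand_table("CA")), (("D", "A", "B"), rand_table("DAB")),
           (("E", "B", "C"), rand_table("EBC"))]

def eliminate(var, bucket):
    """Max out `var` from the product of the bucket's functions; keep argmax."""
    scope = tuple(sorted({v for s, _ in bucket for v in s} - {var}))
    msg, arg = {}, {}
    for t in itertools.product(vals, repeat=len(scope)):
        best = (-1.0, None)
        for x in vals:
            asg = dict(zip(scope, t)); asg[var] = x
            p = prod(f[tuple(asg[v] for v in s)] for s, f in bucket)
            if p > best[0]:
                best = (p, x)
        msg[t], arg[t] = best
    return (scope, msg), (scope, arg)

order = ["A", "E", "D", "C", "B"]        # ordering d; buckets processed B..A
args = {}
for var in reversed(order):
    bucket = [f for f in factors if var in f[0]]
    factors = [f for f in factors if var not in f[0]]
    msg, args[var] = eliminate(var, bucket)
    factors.append(msg)

mpe = prod(f[()] for _, f in factors)    # only constants remain after bucket A

# Backward pass (slide 12): assign along the ordering using stored argmaxes.
assignment = {}
for var in order:
    scope, arg = args[var]
    assignment[var] = arg[tuple(assignment[v] for v in scope)]
print("MPE value:", mpe, "MPE tuple:", assignment)
```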

  13. Complexity. Same bucket data structure as above: elim-mpe is exponential in the induced width w*, the maximal clique size of the induced graph; here exp(w* = 4).

  14. Complexity of bucket elimination. Bucket elimination is time and space exponential in the induced width w*(d) of the ordering d (r = number of functions). The effect of the ordering: different orderings of the same constraint graph induce different widths, and finding an ordering of smallest induced width is NP-hard. [Figure: constraint graph over A–E and two orderings with their induced graphs.]

  15. Directional i-consistency. [Figure: induced graphs along the ordering for adaptive consistency, directional path consistency (d-path), and directional arc consistency (d-arc).]

  16. Mini-bucket approximation (MPE task). Split a bucket into mini-buckets => bounded complexity: maximizing each mini-bucket separately yields an upper bound on maximizing the whole bucket jointly.
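The direction of the bound in one line: maximizing each group separately can only overshoot maximizing the product jointly. A tiny numeric check with made-up tables:

```python
import random
random.seed(1)

# Two groups of values of b, standing in for two mini-buckets' products.
g1 = [random.random() for _ in range(4)]   # product of mini-bucket 1 at each b
g2 = [random.random() for _ in range(4)]   # product of mini-bucket 2 at each b

exact = max(g1[b] * g2[b] for b in range(4))   # max_b of the joint product
bound = max(g1) * max(g2)                      # mini-bucket upper bound
assert exact <= bound
print(f"exact {exact:.3f} <= bound {bound:.3f}")
```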

  17. Mini-Bucket Elimination (example). Bucket B is split into two mini-buckets, {P(E|B,C)} and {P(B|A), P(D|A,B)}, each eliminated by max_B ∏. Resulting buckets: bucket B: P(E|B,C) || P(B|A), P(D|A,B); bucket C: P(C|A), h^B(C,E); bucket D: h^B(A,D); bucket E: E = 0, h^C(A,E); bucket A: P(A), h^E(A), h^D(A). The result MPE* is an upper bound on MPE (U); generating a solution from the recorded functions yields a lower bound (L).

  18. Algorithm MBE-MPE(i) (approx-mpe, Dechter & Rish, 1997). • Input: i, the maximum number of variables allowed in a mini-bucket • Output: [lower bound (cost of a sub-optimal solution), upper bound]. Example: approx-mpe(3) versus elim-mpe.

  19. Properties of MBE(i) • Complexity: O(r exp(i)) time and O(exp(i)) space. • Yields an upper-bound and a lower-bound. • Accuracy: determined by upper/lower (U/L) bound. • As i increases, both accuracy and complexity increase. • Possible use of mini-bucket approximations: • As anytime algorithms • As heuristics in search • Other tasks: similar mini-bucket approximations for: belief updating, MAP and MEU (Dechter and Rish, 1997)
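Since the i-bound caps mini-bucket scope size, the core of MBE(i) is the partitioning step. Below is a sketch assuming greedy scope-based packing; the papers leave room for different partition heuristics, so treat this as one plausible reading, not the exact rule:

```python
def partition_bucket(bucket_scopes, i):
    """Greedily pack function scopes into mini-buckets of at most i variables.
    `bucket_scopes` is a list of frozensets of variable names."""
    minibuckets = []               # each entry: (set of variables, list of scopes)
    for scope in sorted(bucket_scopes, key=len, reverse=True):
        for vars_, members in minibuckets:
            if len(vars_ | scope) <= i:      # function fits within the i-bound
                vars_ |= scope
                members.append(scope)
                break
        else:                                # no mini-bucket fits: open a new one
            minibuckets.append((set(scope), [scope]))
    return minibuckets

# Bucket B from slide 17: scopes {E,B,C}, {B,A}, {D,A,B}; with i = 3 this
# reproduces the split {P(E|B,C)} and {P(B|A), P(D|A,B)}.
bucket = [frozenset("EBC"), frozenset("BA"), frozenset("DAB")]
for vars_, members in partition_bucket(bucket, i=3):
    print(sorted(vars_), [sorted(s) for s in members])
```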

  20. Outline • Introduction • Optimization tasks for graphical models • Solving by inference and search • Inference • Bucket elimination, dynamic programming • Mini-bucket elimination • Search • Branch and bound and best-first • Lower-bounding heuristics • AND/OR search spaces • Hybrids of search and inference • Cutset decomposition • Super-bucket scheme

  21. The Search Space. Objective function: the sum of the local cost components. [Figure: full OR search tree over the ordering A, B, C, D, E, F; every node branches on values 0 and 1.]

  22. The Search Space (with arc costs). Arc costs are calculated from the cost components. [Figure: the same OR search tree with a cost attached to every arc.]

  23. The Value Function. Value of a node = minimal-cost solution below it. [Figure: the search tree annotated with node values, computed bottom-up from the arc costs.]

  24. An Optimal Solution. Value of a node = minimal-cost solution below it; following minimizing arcs from the root traces an optimal solution. [Figure: the annotated search tree with the optimal path highlighted.]

  25. Basic Heuristic Search Schemes. A heuristic function f(x^p) computes a lower bound on the best extension of the partial assignment x^p and can guide search. We focus on: 1. Branch and Bound: uses f(x^p) to prune the depth-first search tree; linear space. 2. Best-First Search: always expands the node with the best heuristic value f(x^p); needs lots of memory.

  26. Classic Branch-and-Bound (OR search tree). At a node n, LB(n) = g(n) + h(n), where g(n) is the cost of the path from the root to n and h(n) is a heuristic lower bound on the cost below n. Prune n if LB(n) ≥ UB, the cost of the best solution found so far.
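A skeletal depth-first branch and bound implementing the LB(n) = g(n) + h(n) pruning rule; the toy cost table and the cheapest-remaining-arcs heuristic are illustrative assumptions:

```python
import math

# Illustrative 0/1 problem: minimize a sum of per-variable arc costs.
VARS = ["A", "B", "C", "D"]
COST = {("A", 0): 2, ("A", 1): 1, ("B", 0): 3, ("B", 1): 0,
        ("C", 0): 1, ("C", 1): 2, ("D", 0): 0, ("D", 1): 4}

def h(depth):
    """Admissible heuristic: sum of the cheapest remaining arc costs."""
    return sum(min(COST[(v, 0)], COST[(v, 1)]) for v in VARS[depth:])

def bnb():
    best = [math.inf, None]                    # UB and incumbent solution
    def dfs(depth, g, partial):
        if depth == len(VARS):
            if g < best[0]:
                best[0], best[1] = g, dict(partial)
            return
        var = VARS[depth]
        for x in (0, 1):
            lb = g + COST[(var, x)] + h(depth + 1)
            if lb >= best[0]:                  # prune: LB(n) >= UB
                continue
            partial[var] = x
            dfs(depth + 1, g + COST[(var, x)], partial)
            del partial[var]
    dfs(0, 0, {})
    return best

print(bnb())   # optimal cost and assignment
```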

  27. How to Generate Heuristics • The principle of relaxed models • Linear optimization for integer programs • Mini-bucket elimination • Bounded directional consistency ideas

  28. Generating Heuristics for Graphical Models (Kask and Dechter, 1999). Given a cost function C(a,b,c,d,e) = f(a)·f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b), define an evaluation function over a partial assignment as the cost of its best extension: f*(a,e,d) = min_{b,c} C(a,b,c,d,e) = f(a) · min_{b,c} [f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b)] = g(a,e,d) · H*(a,e,d). [Figure: search-tree fragment for the partial assignment (a, e, d).]

  29.–30. Generating Heuristics (cont.) H*(a,e,d) = min_{b,c} f(b,a)·f(c,a)·f(e,b,c)·f(d,a,b) = min_c [ f(c,a) · min_b [ f(e,b,c)·f(b,a)·f(d,a,b) ] ] ≥ min_c [ f(c,a) · min_b f(e,b,c) ] · min_b [ f(b,a)·f(d,a,b) ] = h^B(d,a) · h^C(e,a) = H(a,e,d). Hence f(a,e,d) = g(a,e,d)·H(a,e,d) ≤ f*(a,e,d). The heuristic function H is exactly what the Mini-Bucket algorithm compiles during its preprocessing stage.
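The splitting step can be sanity-checked numerically: for nonnegative functions, the min of a product is at least the product of the mins, so H is a lower bound on H* and f = g·H ≤ f*. A toy check with random values:

```python
import random
random.seed(2)

u = [random.random() for _ in range(4)]   # stands in for f(e,b,c) as b varies
v = [random.random() for _ in range(4)]   # stands in for f(b,a)*f(d,a,b) as b varies

exact = min(u[b] * v[b] for b in range(4))   # min_b of the joint product
bound = min(u) * min(v)                      # the split (mini-bucket) bound
assert bound <= exact                        # hence f = g*H <= f*
print(f"bound {bound:.3f} <= exact {exact:.3f}")
```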

  31. Static MBE Heuristics. Given a partial assignment x^p, estimate the cost of the best extension to a full solution. The evaluation function f(x^p) = g(x^p)·H(x^p) is computed from the functions recorded by the Mini-Bucket scheme; h is admissible. Example (belief network): f(a,e,D) = g(a,e)·H(a,e,D) = P(a)·h^B(D,a)·h^C(e,a). [Figure: bucket data structure. B: P(E|B,C), P(D|A,B), P(B|A); C: P(C|A), h^B(E,C); D: h^B(D,A); E: h^C(E,A); A: P(A), h^E(A), h^D(A).]

  32. Heuristics Properties • The MB heuristic is monotone and admissible • Retrieved in linear time • IMPORTANT: heuristic strength can vary with MB(i): higher i-bound → more pre-processing → stronger heuristic → less search • Allows a controlled trade-off between preprocessing and search

  33. Experimental Methodology Algorithms • BBMB(i) – Branch and Bound with MB(i) • BBFB(i) - Best-First with MB(i) • MBE(i) Test networks: • Random Coding (Bayesian) • CPCS (Bayesian) • Random (CSP) Measures of performance • Compare accuracy given a fixed amount of time - how close is the cost found to the optimal solution • Compare trade-off performance as a function of time

  34. Empirical evaluation of mini-bucket heuristics on Bayesian (coding) networks. [Plots: Random Coding, K=100, noise = 0.28 and noise = 0.32.]

  35. Max-CSP experiments(Kask and Dechter, 2000)

  36. Dynamic MB Heuristics • Rather than pre-compiling, the mini-bucket heuristics can be generated during search • Dynamic mini-bucket heuristics use the Mini-Bucket algorithm to produce a bound for any node in the search space (a partial assignment, along the given variable ordering)

  37. Dynamic MB and MBTE Heuristics (Kask, Marinescu and Dechter, 2003) • Rather than precompiling, compute the heuristics during search • Dynamic MB: use the Mini-Bucket algorithm to produce a bound for any node during search • Dynamic MBTE: compute heuristics simultaneously for all uninstantiated variables using mini-bucket-tree elimination • MBTE is an approximation scheme defined over cluster trees; it outputs multiple bounds, one for each variable and value extension, at once.

  38. Cluster Tree Elimination: example. [Figure: primal graph over A–G and its cluster tree: cluster 1 = ABC, separator BC; cluster 2 = BCDF, separator BF; cluster 3 = BEF, separator EF; cluster 4 = EFG.]

  39. Mini-Clustering • Motivation: the time and space complexity of Cluster Tree Elimination depends on the induced width w* of the problem; when w* is big, CTE becomes infeasible • The basic idea: reduce the size of a cluster (the exponent) by partitioning it into mini-clusters with fewer variables • Accuracy parameter i = maximum number of variables in a mini-cluster • The same idea was explored for variable elimination (Mini-Buckets)

  40. Idea of Mini-Clustering Split a cluster into mini-clusters => bound complexity

  41. Mini-Clustering: example. [Figure: the cluster tree of slide 38 with large clusters split into mini-clusters.]

  42. Mini-Bucket Tree Elimination. [Figure: the exact cluster tree of slide 38 beside its mini-bucket-tree counterpart; same tree structure, approximate messages.]

  43. Mini-Clustering • Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) for each variable and each of its values. • MBTE: when the clusters are buckets in BTE.

  44. Branch and Bound w/ Mini-Buckets • BB with static Mini-Bucket Heuristics (s-BBMB) • Heuristic information is pre-compiled before search. Static variable ordering, prunes current variable • BB with dynamic Mini-Bucket Heuristics (d-BBMB) • Heuristic information is assembled during search. Static variable ordering, prunes current variable • BB with dynamic Mini-Bucket-Tree Heuristics (BBBT) • Heuristic information is assembled during search. Dynamic variable ordering, prunes all future variables

  45. Empirical Evaluation • Measures: • Time • Accuracy (% exact) • #Backtracks • Bit Error Rate (coding) • Algorithms: • Complete • BBBT • BBMB • Incomplete • DLM • GLS • SLS • IJGP • IBP (coding) • Benchmarks: • Coding networks • Bayesian Network Repository • Grid networks (N-by-N) • Random noisy-OR networks • Random networks

  46. Real-World Benchmarks. [Table: average accuracy and time; 30 samples, 10 observations, 30 seconds.]

  47. Empirical Results: Max-CSP • Random binary problems: <N, K, C, T> • N: number of variables • K: domain size • C: number of constraints • T: tightness (number of disallowed tuples per constraint) • Task: Max-CSP
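A generator for this <N, K, C, T> model under one common reading (C distinct variable pairs, each forbidding T of the K² value pairs); treat the details as assumptions:

```python
import itertools, random

def random_binary_maxcsp(N, K, C, T, seed=0):
    """Random binary Max-CSP instance in the <N,K,C,T> model:
    C distinct variable pairs, each forbidding T of the K*K tuples."""
    rng = random.Random(seed)
    pairs = rng.sample(list(itertools.combinations(range(N), 2)), C)
    nogoods = {p: set(rng.sample(list(itertools.product(range(K), repeat=2)), T))
               for p in pairs}
    return nogoods

def violations(nogoods, assignment):
    """Max-CSP cost: the number of violated constraints."""
    return sum((assignment[i], assignment[j]) in bad
               for (i, j), bad in nogoods.items())

inst = random_binary_maxcsp(N=8, K=3, C=12, T=4)
print("violated constraints:", violations(inst, [0] * 8))
```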

  48. BBBT(i) vs. BBMB(i), N = 100. [Plots comparing BBBT(i) and BBMB(i) for i = 2 through 7.]

  49. Searching the Graph: caching goods. Identical subproblems are identified by contexts and their values cached: context(A) = [A], context(B) = [AB], context(C) = [ABC], context(D) = [ABD], context(E) = [AE], context(F) = [F]. [Figure: the search graph obtained by merging nodes with equal contexts, with arc costs.]
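A sketch of caching by context: the value below a variable is memoized on the assignment of its context only, so identical subproblems are solved once. The contexts follow the slide; the cost function is an illustrative stand-in that, as caching requires, depends only on each variable's context:

```python
# Variable order and contexts from the slide: the ancestors that
# determine the subproblem below each variable.
ORDER = ["A", "B", "C", "D", "E", "F"]
CONTEXT = {"A": [], "B": ["A"], "C": ["A", "B"],
           "D": ["A", "B"], "E": ["A"], "F": []}

def arc_cost(var, value, asg):
    """Illustrative cost; reads only var's context, as caching requires."""
    return (value + sum(asg[v] for v in CONTEXT[var])) % 3

def value_below(depth, asg, cache):
    """Min-cost completion from `depth` on, memoized on (var, context value)."""
    if depth == len(ORDER):
        return 0
    var = ORDER[depth]
    key = (var, tuple(asg[v] for v in CONTEXT[var]))
    if key not in cache:
        cache[key] = min(arc_cost(var, x, asg) +
                         value_below(depth + 1, {**asg, var: x}, cache)
                         for x in (0, 1))
    return cache[key]

cache = {}
print("optimal cost:", value_below(0, {}, cache), "| cached contexts:", len(cache))
```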

  50. Searching the Graph: caching goods (cont.) [Figure: the same context-merged search graph with node values computed; contexts as on slide 49.]
