Combinatorial Optimization for Graphical Models
Rina Dechter, Donald Bren School of Computer Science, University of California, Irvine, USA
Radu Marinescu, Cork Constraint Computation Centre, University College Cork, Ireland
Simon de Givry & Thomas Schiex, Dept. de Mathématique et Informatique Appliquées, INRA, Toulouse, France
Outline • Introduction • Graphical models • Optimization tasks for graphical models • Fundamental operations on cost functions • Inference • Bucket Elimination, Bucket-Tree and Cluster-Tree Elimination • Polynomial classes based on structure • Search (OR) • Branch-and-Bound and Best-First Search • Local search and Partial search • Lower-bounds and relaxations • Bounded inference, local consistency and continuous relaxations • Exploiting problem structure in search • AND/OR search spaces (trees, graphs) • Other hybrids of search and inference • Cutset decomposition, boosting search with VE, super-bucket scheme • Software & Applications
Outline • Introduction • Graphical models • Optimization tasks for graphical models • Solving optimization problems by inference and search • Fundamental operations on cost functions • Inference • Search (OR) • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications
Constraint Optimization Problems for Graphical Models • Primal graph: variables → nodes; functions/constraints → arcs • f1(A,B,D) has scope {A,B,D} • F(a,b,c,d,f,g) = f1(a,b,d) + f2(d,f,g) + f3(b,c,f) [Figure: primal graph over nodes A, B, C, D, F, G]
Constraint Networks • Example: map coloring • Variables: countries (A, B, C, …) • Values: colors (red, green, blue) • Constraints: adjacent countries must take different colors (e.g., A ≠ E) [Figure: constraint graph over A–G, with the allowed color pairs for each arc]
Constrained Optimization • Example: power plant scheduling
Probabilistic Networks • BN = (X, D, G, P) • P(S,C,B,X,D) = P(S) · P(C|S) · P(B|S) · P(X|C,S) · P(D|C,B) • MPE: find a maximum-probability assignment, given evidence • MPE = max P(S) · P(C|S) · P(B|S) · P(X|C,S) · P(D|C,B) [Figure: Bayesian network with nodes Smoking (S), Cancer (C), Bronchitis (B), X-Ray (X), Dyspnoea (D) and their CPTs]
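The MPE query on this small network can be checked by brute force. The CPT numbers below are made up for illustration (the slide does not give them); all variables are binary with 1 = true:

```python
from itertools import product

# Hypothetical CPTs for the Smoking network (illustrative numbers, not from the slides).
P_S = [0.7, 0.3]                     # P(S)
P_C_S = [[0.95, 0.05], [0.8, 0.2]]   # P(C|S), rows indexed by S
P_B_S = [[0.9, 0.1], [0.6, 0.4]]     # P(B|S), rows indexed by S
P_X1_CS = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.9}  # P(X=1|C,S)
P_D1_CB = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.9}  # P(D=1|C,B)

def joint(s, c, b, x, d):
    """P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)."""
    px1, pd1 = P_X1_CS[(c, s)], P_D1_CB[(c, b)]
    return (P_S[s] * P_C_S[s][c] * P_B_S[s][b]
            * (px1 if x else 1 - px1) * (pd1 if d else 1 - pd1))

def mpe(evidence=None):
    """Brute-force MPE: most probable full assignment consistent with evidence."""
    evidence = evidence or {}
    best_p, best_a = -1.0, None
    for s, c, b, x, d in product([0, 1], repeat=5):
        a = {'S': s, 'C': c, 'B': b, 'X': x, 'D': d}
        if any(a[v] != val for v, val in evidence.items()):
            continue
        p = joint(s, c, b, x, d)
        if p > best_p:
            best_p, best_a = p, a
    return best_p, best_a

p, a = mpe(evidence={'D': 1})  # MPE given the patient has dyspnoea
```

Brute force is exponential in the number of variables, which is exactly what the inference and search schemes in the rest of the tutorial avoid.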
Monitoring Intensive-Care Patients • The “alarm” network: 37 variables, 509 parameters (instead of 2^37) [Figure: alarm Bayesian network with nodes MINVOLSET, KINKEDTUBE, PULMEMBOLUS, INTUBATION, VENTMACH, …, HRBP, BP]
Influence Diagrams • Influence diagram ID = (X, D, P, R): • Chance variables over domains (e.g., seismic structure, oil underground, test result, oil produced, market information) • Decision variables (e.g., test, drill, oil sale policy) • CPTs for chance variables • Reward components (test cost, drill cost, sales cost, oil sales) • Utility function: sum of the reward components • Task: find an optimal policy
Graphical Models • A graphical model (X,D,F): • X = {X1, …, Xn} variables • D = {D1, …, Dn} domains • F = {f1, …, fr} functions (constraints, CPTs, CNFs, …) • Operators: combination, elimination (projection) • Tasks: • Belief updating: Σ_{X−y} Π_i P_i • MPE: max_X Π_j P_j • CSP: ⋈_j C_j • Max-CSP: min_X Σ_j F_j • All these tasks are NP-hard: exploit problem structure, identify special cases, approximate [Figure: primal (interaction) graph over A, B, C, D, E, F; a Conditional Probability Table (CPT); a relation]
Sample Domains for GM • Web Pages and Link Analysis • Communication Networks (Cell phone Fraud Detection) • Natural Language Processing (e.g. Information Extraction and Semantic Parsing) • Battle-space Awareness • Epidemiological Studies • Citation Networks • Intelligence Analysis (Terrorist Networks) • Financial Transactions (Money Laundering) • Computational Biology • Object Recognition and Scene Analysis …
Types of Constraint Optimization • Valued CSPs, Weighted CSPs, Max-CSPs, Max-SAT • Most Probable Explanation (MPE) • Linear Integer Programs • Examples: • Problems translated from planning • Unit maintenance scheduling • Combinatorial auctions • Maximum-likelihood haplotypes in linkage analysis
Graphical Models Reasoning • Search (conditioning): time exp(n), space linear • Complete: depth-first search, Branch-and-Bound, A* search • Incomplete: simulated annealing, gradient descent • Inference (elimination): time exp(w*), space exp(w*) • Complete: adaptive consistency, tree clustering, dynamic programming, resolution • Incomplete: local consistency, unit resolution, mini-bucket(i) • Hybrids of search and inference
Bucket Elimination (Variable Elimination) • Example with two-valued domains:
Bucket E: E ≠ D, E ≠ C  →  derives D = C
Bucket D: D ≠ A, D = C  →  derives A ≠ C
Bucket C: C ≠ B, A ≠ C  →  derives B = A
Bucket B: B ≠ A, B = A  →  contradiction
Bucket A:
Conditioning vs. Elimination • Conditioning (search): branch on A = 1, …, A = k, yielding k “sparser” subproblems • Elimination (inference): eliminate A, yielding 1 “denser” problem [Figure: the constraint graph over A–G after conditioning on A vs. after eliminating A]
Fundamental Operations on Cost Functions • To be completed …
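The section stub above concerns the two operations used throughout the tutorial: combining cost functions by summation and eliminating a variable by minimization (projection). A minimal sketch with a table-based representation (class and function names are ours, not from the tutorial):

```python
from itertools import product

class CostFn:
    """A cost function as a table over a tuple of named, finite-domain variables."""
    def __init__(self, scope, domains, table):
        self.scope = tuple(scope)        # e.g. ('A', 'B')
        self.domains = dict(domains)     # e.g. {'A': [0, 1], 'B': [0, 1]}
        self.table = dict(table)         # value tuple (in scope order) -> cost

    def value(self, assignment):
        return self.table[tuple(assignment[v] for v in self.scope)]

def combine(f, g):
    """Combination: (f + g), defined over the union of the two scopes."""
    scope = list(dict.fromkeys(f.scope + g.scope))
    domains = {**f.domains, **g.domains}
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        a = dict(zip(scope, vals))
        table[vals] = f.value(a) + g.value(a)
    return CostFn(scope, domains, table)

def eliminate(f, var):
    """Elimination (projection): minimize f over one variable of its scope."""
    keep = [v for v in f.scope if v != var]
    table = {}
    for vals in product(*(f.domains[v] for v in keep)):
        a = dict(zip(keep, vals))
        table[vals] = min(f.value({**a, var: x}) for x in f.domains[var])
    return CostFn(keep, {v: f.domains[v] for v in keep}, table)

f = CostFn(('A', 'B'), {'A': [0, 1], 'B': [0, 1]},
           {(0, 0): 1, (0, 1): 4, (1, 0): 2, (1, 1): 0})
g = CostFn(('B', 'C'), {'B': [0, 1], 'C': [0, 1]},
           {(0, 0): 3, (0, 1): 0, (1, 0): 1, (1, 1): 2})
h = eliminate(combine(f, g), 'B')   # min_B (f + g), scope ('A', 'C')
```

For probabilistic tasks the same skeleton applies with product as combination and sum or max as elimination.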
Outline • Introduction • Inference • Bucket Elimination • Bucket-Tree Elimination and Cluster-Tree Elimination • Polynomial classes based on structure • Search (OR) • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications
Computing the Optimal Cost Solution • Variable Elimination:
OPT = min_{a,b,c,d,e} [ F(a,b) + F(a,c) + F(a,d) + F(b,c) + F(b,d) + F(b,e) + F(c,e) ]
    = min_{a,c,d,e} [ F(a,c) + F(c,e) + F(a,d) + min_b ( F(a,b) + F(b,c) + F(b,d) + F(b,e) ) ]
[Figure: constraint graph over A, B, C, D, E before and after eliminating B]
Finding OPT • Algorithm elim-opt (Dechter, 1996); non-serial dynamic programming (Bertelè & Brioschi, 1972) • Elimination operator: minimization
bucket B: F(a,b), F(b,c), F(b,d), F(b,e)  →  h^B(a,c,d,e)
bucket C: F(c,a), F(c,e), h^B(a,c,d,e)    →  h^C(a,d,e)
bucket D: F(a,d), h^C(a,d,e)              →  h^D(a,e)
bucket E: e = 0, h^D(a,e)                 →  h^E(a)
bucket A: h^E(a)                          →  OPT
Generating the Optimal Assignment • Assign variables in reverse order (A, E, D, C, B), each time choosing the value that minimizes the sum of the functions in the variable’s bucket:
B: F(a,b), F(b,c), F(b,d), F(b,e)
C: F(c,a), F(c,e)
D: F(a,d)
E: e = 0
A:
Complexity of elim-opt • Algorithm elim-opt (Dechter, 1996); non-serial dynamic programming (Bertelè & Brioschi, 1972)
bucket B: F(a,b), F(b,c), F(b,d), F(b,e)
bucket C: F(c,a), F(c,e)
bucket D: F(a,d)
bucket E: e = 0
bucket A:
The largest bucket dominates the cost: exp(w* = 4), where w* is the induced width (max clique size of the induced graph).
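The bucket-elimination scheme above can be sketched in a few lines of min-sum code (the representation and names are illustrative, and the pairwise cost table F is made up):

```python
from itertools import product

def elim_opt(domains, functions, order):
    """Min-sum bucket elimination (elim-opt sketch).
    domains: {var: values}; functions: list of (scope, table) where table is a
    dict keyed by value tuples in scope order; order: elimination ordering."""
    rank = {v: i for i, v in enumerate(order)}
    buckets = {v: [] for v in order}
    for scope, table in functions:
        # Place each function in the bucket of its earliest variable in `order`.
        buckets[min(scope, key=rank.get)].append((scope, table))
    opt = 0
    for v in order:
        pool = buckets[v]
        if not pool:
            continue
        out = tuple(sorted({u for s, _ in pool for u in s if u != v},
                           key=rank.get))
        msg = {}
        for vals in product(*(domains[u] for u in out)):
            a = dict(zip(out, vals))
            msg[vals] = min(
                sum(t[tuple({**a, v: x}[u] for u in s)] for s, t in pool)
                for x in domains[v])
        if out:  # send the message h^v to the bucket of its earliest variable
            buckets[min(out, key=rank.get)].append((out, msg))
        else:    # last bucket of a component: msg[()] is its optimal cost
            opt += msg[()]
    return opt

# Running example: one pairwise cost table F on every edge of the slide's graph.
F = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 0}
domains = {v: [0, 1] for v in 'ABCDE'}
functions = [(('A', 'B'), F), (('A', 'C'), F), (('A', 'D'), F),
             (('B', 'C'), F), (('B', 'D'), F), (('B', 'E'), F), (('C', 'E'), F)]
opt = elim_opt(domains, functions, order=['B', 'C', 'D', 'E', 'A'])
```

With these costs, the all-ones assignment makes every edge cost 0, so OPT is 0 regardless of the elimination ordering; the ordering only changes the size of the intermediate message tables.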
Induced Width • Width along ordering d, w(d): the maximum number of earlier neighbors (“parents”) over all nodes • Induced width along ordering d, w*(d): the width of the ordered induced graph, obtained by connecting the “parents” of each node, recursively from top to bottom [Figure: ordered graph over E, D, C, B, A]
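The induced-width definition above translates directly into code (a sketch; the edge list below is the example graph from the elimination slides):

```python
def induced_width(edges, order):
    """Compute the induced width w*(d) along ordering `order` (first to last):
    process nodes from last to first, take each node's earlier neighbors
    ('parents'), connect them pairwise, and record the largest parent set."""
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    rank = {v: i for i, v in enumerate(order)}
    width = 0
    for v in reversed(order):
        parents = {u for u in adj[v] if rank[u] < rank[v]}
        width = max(width, len(parents))
        for u in parents:            # connect the parents pairwise
            adj[u] |= parents - {u}
    return width

# Example graph: A-B, A-C, A-D, B-C, B-D, B-E, C-E.
edges = [('A', 'B'), ('A', 'C'), ('A', 'D'),
         ('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'E')]
```

On this graph the ordering matters: eliminating B first (order A, E, D, C, B read first-to-last, processed last-to-first) gives w* = 4, while the ordering A, B, C, D, E gives w* = 2.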
Complexity of Bucket Elimination • Bucket elimination is time O(r · exp(w*(d))) and space O(exp(w*(d))), where r is the number of functions and w*(d) the induced width along ordering d • The effect of the ordering: different orderings can give very different induced widths • Finding the smallest induced width is NP-hard! [Figure: the constraint graph over A, B, C, D, E ordered two different ways]
Bucket-Tree Elimination • Describe BTE
Cluster-Tree Elimination • Describe CTE
Outline • Introduction • Inference • Search (OR) • Branch-and-Bound and Best-First search • Branching schemes, variable and value orderings • Local search and Partial search • Lower-bounds and relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications
The Search Space • Objective function: sum of the cost components over A, B, C, D, E, F [Figure: full OR search tree for the ordering A, B, C, D, E, F over binary domains]
The Search Space • Arc costs are calculated from the cost components. [Figure: the same OR search tree with arc costs attached]
The Value Function • Value of a node = cost of the minimal-cost solution below it, computed bottom-up: an internal node takes the minimum over its children of (arc cost + child value). [Figure: the search tree annotated with node values]
An Optimal Solution • Value of a node = cost of the minimal-cost solution below it; following minimizing arcs down from the root traces an optimal solution. [Figure: the search tree with an optimal solution path highlighted]
Basic Heuristic Search Schemes • A heuristic function f(x^p) computes a lower bound on the best extension of the partial assignment x^p, and can be used to guide search. We focus on: • 1. Branch-and-Bound: uses f(x^p) to prune the depth-first search tree; linear space • 2. Best-First Search: always expands the node with the best heuristic value f(x^p); needs lots of memory
Best-First vs. Depth-first Branch-and-Bound • Best-First (A*): optimal; expands the fewest nodes for a given h; requires storing the entire search tree • Depth-first Branch-and-Bound: uses only linear space; if it finds an optimal solution early, it expands the same space as Best-First (when the search space is a tree); can improve its heuristic function dynamically
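A minimal sketch of depth-first Branch-and-Bound for min-sum problems, assuming an admissible (nonnegative) heuristic h; names and the cost table are illustrative:

```python
def branch_and_bound(variables, domains, cost_of, h=lambda assignment, rest: 0):
    """Depth-first B&B: prune a partial assignment when g + h >= incumbent.
    cost_of(assignment, var): cost added by the newest assignment to `var`
    (its 'arc cost'); h(assignment, remaining): admissible lower bound on
    the cost of the best extension (h = 0 prunes with the incumbent only)."""
    best_cost, best_sol = [float('inf')], [None]

    def dfs(i, assignment, g):
        if i == len(variables):
            if g < best_cost[0]:
                best_cost[0], best_sol[0] = g, dict(assignment)
            return
        var = variables[i]
        for val in domains[var]:
            assignment[var] = val
            g2 = g + cost_of(assignment, var)
            if g2 + h(assignment, variables[i + 1:]) < best_cost[0]:  # prune otherwise
                dfs(i + 1, assignment, g2)
            del assignment[var]

    dfs(0, {}, 0)
    return best_cost[0], best_sol[0]

# Pairwise costs on the edges of an example network (illustrative numbers).
F = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 0}
edges = [('A', 'B'), ('A', 'F'), ('B', 'C'), ('B', 'D'), ('C', 'E')]
def cost_of(a, var):
    # Sum the cost of every edge completed by assigning `var`.
    return sum(F[(a[u], a[v])] for u, v in edges
               if var in (u, v) and u in a and v in a)

cost, sol = branch_and_bound(list('ABCDEF'), {v: [0, 1] for v in 'ABCDEF'}, cost_of)
```

Plugging in a tighter h (e.g., the mini-bucket heuristics of the next section) prunes more of the tree while leaving the code unchanged.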
How to Generate Heuristics • The principle of relaxed models • Mini-Bucket Elimination • Bounded directional consistency ideas • Linear relaxation for integer programs
Outline • Introduction • Inference • Search (OR) • Lower-bounds and relaxations • Bounded inference • Mini-Bucket Elimination • Static and dynamic mini-bucket heuristics • Local consistency • Continuous relaxations • Exploiting problem structure in search • Other hybrids of search and inference • Software & Applications
Mini-Bucket Approximation • Split a bucket into mini-buckets ⇒ bound complexity: bucket(X) = { h1, …, hr, hr+1, …, hn } is split into { h1, …, hr } and { hr+1, …, hn }, and min_X Σ_{i=1..n} h_i ≥ min_X Σ_{i=1..r} h_i + min_X Σ_{i=r+1..n} h_i, so processing mini-buckets separately yields a lower bound.
Mini-Bucket Elimination (min_B Σ) • Mini-buckets:
bucket B: [ F(b,e) ] [ F(a,b), F(b,d) ]  →  h^B(e), h^B(a,d)
bucket C: F(c,e), F(a,c)                 →  h^C(e,a)
bucket D: F(a,d), h^B(a,d)               →  h^D(a)
bucket E: h^B(e), h^C(e,a)               →  h^E(a)
bucket A: h^D(a), h^E(a)                 →  L = lower bound
Semantics of Mini-Bucket: Splitting a Node • Variables in different mini-buckets are renamed and duplicated (Kask et al., 2001), (Geffner et al., 2007), (Choi, Chavira & Darwiche, 2007) [Figure: network N before splitting vs. network N' after splitting node U into duplicates]
MBE-MPE(i): Algorithm approx-mpe (Dechter & Rish, 1997) • Input: i, the maximum number of variables allowed in a mini-bucket • Output: [lower bound (probability of a suboptimal solution), upper bound] • Example: approx-mpe(3) versus elim-mpe
Properties of MBE(i) • Complexity: O(r · exp(i)) time and O(exp(i)) space • Yields an upper bound and a lower bound • Accuracy: determined by the upper/lower (U/L) bound ratio • As i increases, both accuracy and complexity increase • Possible uses of mini-bucket approximations: • As anytime algorithms • As heuristics in search • Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter & Rish, 1997)
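The i-bound partitioning step of MBE can be sketched as follows (a greedy first-fit partitioning; real implementations may use other partition heuristics, and all names here are ours):

```python
from itertools import product

def partition_minibuckets(bucket, i_bound):
    """Greedily partition a bucket's functions into mini-buckets whose joint
    scope has at most i_bound variables. bucket: list of (scope, table)."""
    minibuckets = []   # each entry: [joint scope set, list of functions]
    for scope, table in bucket:
        for mb in minibuckets:
            if len(mb[0] | set(scope)) <= i_bound:
                mb[0] |= set(scope)
                mb[1].append((scope, table))
                break
        else:
            minibuckets.append([set(scope), [(scope, table)]])
    return minibuckets

def minibucket_message(mb_scope, funcs, var, domains):
    """min over `var` of the sum of one mini-bucket's functions."""
    out = tuple(sorted(mb_scope - {var}))
    msg = {}
    for vals in product(*(domains[u] for u in out)):
        a = dict(zip(out, vals))
        msg[vals] = min(
            sum(t[tuple({**a, var: x}[u] for u in s)] for s, t in funcs)
            for x in domains[var])
    return out, msg

# Bucket B from the MBE slide, with an illustrative pairwise cost table.
F = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 0}
domains = {v: [0, 1] for v in 'ABDE'}
bucket_B = [(('B', 'E'), F), (('A', 'B'), F), (('B', 'D'), F)]
mbs = partition_minibuckets(bucket_B, i_bound=2)
msgs = [minibucket_message(s, fns, 'B', domains) for s, fns in mbs]
```

Summing the mini-bucket messages can only undercount the exact bucket minimum, which is why MBE returns a lower bound for min-sum problems.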
Empirical Evaluation(Rish & Dechter, 1999) • Benchmarks • Randomly generated networks • CPCS networks • Probabilistic decoding • Task • Comparing approx-mpe and anytime-mpe versus bucket-elimination (elim-mpe)
CPCS networks – medical diagnosis (noisy-OR model) • Test case: no evidence • Time (sec):
Algorithm        cpcs360   cpcs422
elim-mpe          115.8    1697.6
anytime-mpe( )     70.3     505.2
anytime-mpe( )     70.3     110.5
Generating Heuristics for Graphical Models (Kask & Dechter, AIJ’01) • Given a cost function C(a,b,c,d,e) = F(a) + F(b,a) + F(c,a) + F(e,b,c) + F(d,a,b), define an evaluation function over a partial assignment as the cost of its best extension:
f*(a,e,d) = min_{b,c} C(a,b,c,d,e)
         = F(a) + min_{b,c} [ F(b,a) + F(c,a) + F(e,b,c) + F(d,a,b) ]
         = g(a,e,d) + H*(a,e,d)
[Figure: partial search tree over A, B, D, E]
Generating Heuristics (cont.)
H*(a,e,d) = min_{b,c} [ F(b,a) + F(c,a) + F(e,b,c) + F(d,a,b) ]
         = min_c [ F(c,a) + min_b [ F(e,b,c) + F(b,a) + F(d,a,b) ] ]
         ≥ min_c [ F(c,a) + min_b F(e,b,c) + min_b [ F(b,a) + F(d,a,b) ] ]
         = min_b [ F(b,a) + F(d,a,b) ] + min_c [ F(c,a) + min_b F(e,b,c) ]
         = h^B(d,a) + h^C(e,a) = H(a,e,d)
f(a,e,d) = g(a,e,d) + H(a,e,d) ≤ f*(a,e,d)
The heuristic function H is what is compiled during the preprocessing stage of the Mini-Bucket algorithm.
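The single inequality behind this derivation, min_b (f + g) ≥ min_b f + min_b g, is easy to check numerically (the cost tables below are random and purely illustrative):

```python
import random

random.seed(0)
# Two cost functions sharing variable b (binary domains); illustrative random costs.
F_ba = {(b, a): random.randint(0, 9) for b in (0, 1) for a in (0, 1)}
F_eb = {(e, b): random.randint(0, 9) for e in (0, 1) for b in (0, 1)}

for a in (0, 1):
    for e in (0, 1):
        exact = min(F_ba[(b, a)] + F_eb[(e, b)] for b in (0, 1))
        bound = (min(F_ba[(b, a)] for b in (0, 1))
                 + min(F_eb[(e, b)] for b in (0, 1)))
        assert bound <= exact   # min_b (f + g) >= min_b f + min_b g
```

Each separate minimization may pick a different value of b, which is exactly why the sum of the two minima can only undercount the joint minimum.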
Static MBE Heuristics • Given a partial assignment x^p, estimate the cost of the best extension to a full solution • The evaluation function f(x^p) can be computed using the functions recorded by the Mini-Bucket scheme:
B: [ F(E,B,C) ] [ F(D,A,B), F(B,A) ]  →  h^B(E,C), h^B(D,A)
C: F(C,A), h^B(E,C)                   →  h^C(E,A)
D: h^B(D,A)                           →  h^D(A)
E: h^C(E,A)                           →  h^E(A)
A: F(A), h^E(A), h^D(A)
f(a,e,D) = g(a,e) + H(a,e,D) = F(a) + h^B(D,a) + h^C(e,a); h is admissible.
[Figure: cost network over A, B, C, D, E and the corresponding partial search tree]