Performing Bayesian Inference by Weighted Model Counting • Tian Sang, Paul Beame, and Henry Kautz • Department of Computer Science & Engineering, University of Washington, Seattle, WA
Goal • Extend success of “compilation to SAT” work for NP-complete problems to “compilation to #SAT” for #P-complete problems • Leverage rapid advances in SAT technology • Example: Computing permanent of a 0/1 matrix • Inference in Bayesian networks (Roth 1996, Dechter 1999) • Provide practical reasoning tool • Demonstrate relationship between #SAT and conditioning algorithms • In particular: compilation to DNNF (Darwiche 2002, 2004)
Contributions • Simple encoding of Bayesian networks into weighted model counting • Techniques for extending state-of-the-art SAT algorithms for efficient weighted model counting • Evaluation on computationally challenging domains • Outperforms join-tree methods on problems with high tree-width • Competitive with best conditioning methods on problems with high degree of determinism
Outline • Model counting • Encoding Bayesian networks • Related Bayesian inference algorithms • Experiments • Grid networks • Plan recognition • Conclusion
SAT and #SAT • Given a CNF formula, • SAT: find a satisfying assignment • #SAT: count satisfying assignments • Example: (x ∨ y) ∧ (y ∨ ¬z) • 5 models: (0,1,0), (0,1,1), (1,1,0), (1,1,1), (1,0,0) • Equivalently: satisfying probability = 5/2^3 • Probability that formula is satisfied by a random truth assignment • Can modify Davis-Putnam-Logemann-Loveland to calculate this value
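The example can be checked by brute force. The sketch below reads the formula as (x ∨ y) ∧ (y ∨ ¬z), the reading consistent with the five listed models, and encodes clauses as signed integers (an assumed DIMACS-style convention: 1 = x, 2 = y, 3 = z, negative = negated). The function names are illustrative only.

```python
from itertools import product

# (x ∨ y) ∧ (y ∨ ¬z) with 1 = x, 2 = y, 3 = z; a negative number is a negation.
CLAUSES = [[1, 2], [2, -3]]

def satisfies(assignment, clauses):
    """assignment is a tuple of 0/1 values; variable v is at index v - 1."""
    return all(
        any(assignment[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
        for clause in clauses
    )

def enumerate_models(clauses, num_vars):
    """Return every 0/1 assignment that satisfies all clauses."""
    return [a for a in product((0, 1), repeat=num_vars) if satisfies(a, clauses)]

models = enumerate_models(CLAUSES, 3)
satisfying_probability = len(models) / 2 ** 3
print(len(models), satisfying_probability)
```

Enumeration is exponential in the number of variables, which is why the deck turns to a DPLL-style recursion next.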
DPLL for SAT
DPLL(F)
  if F is empty, return 1
  if F contains an empty clause, return 0
  else choose a variable x to branch
    return (DPLL(F|x=1) ∨ DPLL(F|x=0))

#DPLL for #SAT
#DPLL(F) // computes satisfying probability of F
  if F is empty, return 1
  if F contains an empty clause, return 0
  else choose a variable x to branch
    return 0.5 * #DPLL(F|x=1) + 0.5 * #DPLL(F|x=0)
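The #DPLL recursion can be written in a few lines. This is a minimal sketch, not Cachet's implementation: it has no unit propagation, learning, or caching, and it simply branches on the first variable of the first clause. Clauses are lists of signed integers (an assumed encoding), and `condition` and `sharp_dpll` are hypothetical names.

```python
def condition(clauses, lit):
    """Return F|lit: drop clauses satisfied by lit, delete the falsified literal."""
    return [[l for l in clause if l != -lit] for clause in clauses if lit not in clause]

def sharp_dpll(clauses):
    """Satisfying probability of a CNF under a uniformly random assignment."""
    if not clauses:                          # F is empty
        return 1.0
    if any(len(c) == 0 for c in clauses):    # F contains an empty clause
        return 0.0
    x = abs(clauses[0][0])                   # naive branching choice
    return 0.5 * sharp_dpll(condition(clauses, x)) + 0.5 * sharp_dpll(condition(clauses, -x))

# (x ∨ y) ∧ (y ∨ ¬z) with 1 = x, 2 = y, 3 = z
probability = sharp_dpll([[1, 2], [2, -3]])
print(probability, probability * 2 ** 3)
```

Multiplying the returned probability by 2^n recovers the model count for a formula over n variables.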
Weighted Model Counting • Each literal has a weight • Weight of a model = product of the weights of its literals • Weight of a formula = sum of the weights of its models
WMC(F)
  if F is empty, return 1
  if F contains an empty clause, return 0
  else choose a variable x to branch
    return weight(x) * WMC(F|x=1) + weight(¬x) * WMC(F|x=0)
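A small sketch of the weighted recursion follows. It assumes the x=0 branch is multiplied by the weight of the negative literal, and that weight(x) + weight(¬x) = 1 for every variable, so that returning 1 for an empty formula is correct even when some variables were never branched on. The weights and names are made up for illustration.

```python
def condition(clauses, lit):
    """Return F|lit: drop clauses satisfied by lit, delete the falsified literal."""
    return [[l for l in clause if l != -lit] for clause in clauses if lit not in clause]

def wmc(clauses, weight):
    """Weighted model count; weight maps each signed literal to its weight.

    Assumes weight[v] + weight[-v] == 1 for every variable v.
    """
    if not clauses:
        return 1.0
    if any(len(c) == 0 for c in clauses):
        return 0.0
    x = abs(clauses[0][0])
    return (weight[x] * wmc(condition(clauses, x), weight)
            + weight[-x] * wmc(condition(clauses, -x), weight))

# Hypothetical weights for (x ∨ y) ∧ (y ∨ ¬z), with 1 = x, 2 = y, 3 = z.
weights = {1: 0.3, -1: 0.7, 2: 0.5, -2: 0.5, 3: 0.2, -3: 0.8}
result = wmc([[1, 2], [2, -3]], weights)
print(result)
```

With every weight set to 0.5 the function reduces to the unweighted satisfying probability.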
Cachet • State of the art model counting program (Sang, Bacchus, Beame, Kautz, & Pitassi 2004) • Key innovation: sound integration of component caching and clause learning • Component analysis (Bayardo & Pehoushek 2000): if formulas C1 and C2 share no variables, WMC(C1 ∧ C2) = WMC(C1) * WMC(C2) • Caching (Majercik & Littman 1998; Darwiche 2002; Bacchus, Dalmao, & Pitassi 2003; Beame, Impagliazzo, Pitassi, & Segerlind 2003): save and reuse values of internal nodes of search tree • Clause learning (Marques-Silva 1996; Bayardo & Schrag 1997; Zhang, Madigan, Moskewicz, & Malik 2001): analyze reason for backtracking, store as a new clause
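The product rule for variable-disjoint formulas can be illustrated directly. The sketch below is not how Cachet detects components; it groups clauses by shared variables with a simple fixed-point loop and checks that the count of the whole formula equals the product of the counts of its parts. All names and the example formula are assumptions.

```python
def condition(clauses, lit):
    """Return F|lit: drop clauses satisfied by lit, delete the falsified literal."""
    return [[l for l in clause if l != -lit] for clause in clauses if lit not in clause]

def wmc(clauses, weight):
    """Weighted model count, assuming weight[v] + weight[-v] == 1 for all v."""
    if not clauses:
        return 1.0
    if any(len(c) == 0 for c in clauses):
        return 0.0
    x = abs(clauses[0][0])
    return (weight[x] * wmc(condition(clauses, x), weight)
            + weight[-x] * wmc(condition(clauses, -x), weight))

def components(clauses):
    """Split clauses into groups such that different groups share no variables."""
    remaining = list(clauses)
    groups = []
    while remaining:
        group = [remaining.pop()]
        seen = {abs(l) for l in group[0]}
        changed = True
        while changed:                       # grow the group until it is closed
            changed = False
            for clause in remaining[:]:
                clause_vars = {abs(l) for l in clause}
                if seen & clause_vars:
                    group.append(clause)
                    seen |= clause_vars
                    remaining.remove(clause)
                    changed = True
        groups.append(group)
    return groups

# (x1 ∨ x2) ∧ (x3 ∨ x4) ∧ (x1 ∨ ¬x2): two components, {x1, x2} and {x3, x4}.
formula = [[1, 2], [3, 4], [1, -2]]
uniform = {l: 0.5 for v in range(1, 5) for l in (v, -v)}

parts = components(formula)
product_of_parts = 1.0
for part in parts:
    product_of_parts *= wmc(part, uniform)
whole = wmc(formula, uniform)
print(len(parts), whole, product_of_parts)
```

Solving each component separately matters because the work on one component is then independent of the assignments explored in the other.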
Cachet • Naïve combination of all three techniques is unsound • Can resolve by careful cache management (Sang, Bacchus, Beame, Kautz, & Pitassi 2004) • New branching strategy (VSADS) optimized for counting (Sang, Beame, & Kautz SAT-2005)
Computing All Marginals • Task: In one counting pass, • Compute number of models in which each literal is true • Equivalently: compute marginal satisfying probabilities • Approach • Each recursion computes a vector of marginals • At branch point: compute left and right vectors, combine with vector sum • Cache vectors, not just counts • Reasonable overhead: 10% - 40% slower than counting
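The vector-of-marginals idea can be sketched on top of the unweighted recursion. This version assumes uniform weights, tracks the set of still-unassigned variables explicitly so that free variables get their share, and omits caching entirely; it is an illustration of the combination step, not Cachet's code.

```python
def condition(clauses, lit):
    """Return F|lit: drop clauses satisfied by lit, delete the falsified literal."""
    return [[l for l in clause if l != -lit] for clause in clauses if lit not in clause]

def marginals(clauses, variables):
    """Return (p, m) for a CNF over `variables` under uniform random assignment.

    p    = probability that the formula is satisfied
    m[v] = probability that the formula is satisfied AND v is true
    """
    if any(len(c) == 0 for c in clauses):
        return 0.0, {v: 0.0 for v in variables}
    if not clauses:
        # Every remaining variable is free: true in half of the assignments.
        return 1.0, {v: 0.5 for v in variables}
    x = abs(clauses[0][0])
    rest = [v for v in variables if v != x]
    p1, m1 = marginals(condition(clauses, x), rest)    # left branch, x = 1
    p0, m0 = marginals(condition(clauses, -x), rest)   # right branch, x = 0
    combined = {v: 0.5 * m1[v] + 0.5 * m0[v] for v in rest}  # weighted vector sum
    combined[x] = 0.5 * p1                             # x is true only on the left
    return 0.5 * p1 + 0.5 * p0, combined

# (x ∨ y) ∧ (y ∨ ¬z) with 1 = x, 2 = y, 3 = z
p, m = marginals([[1, 2], [2, -3]], [1, 2, 3])
counts = {v: m[v] * 2 ** 3 for v in m}   # number of models in which v is true
print(p, counts)
```

Dividing each entry of `m` by `p` gives the probability that the variable is true conditioned on the formula being satisfied.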
Encoding Bayesian Networks to Weighted Model Counting • [Figure: two-node network A → B; P(A) = 0.1; CPT for B: P(B | A) = 0.2, P(¬B | A) = 0.8, P(B | ¬A) = 0.6, P(¬B | ¬A) = 0.4] • Chance variable P added with weight(P) = 0.2 and weight(¬P) = 0.8 • Chance variable Q added with weight(Q) = 0.6 and weight(¬Q) = 0.4
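To make the encoding concrete, the sketch below rebuilds the two-node example and evaluates it by enumeration. Several details are assumptions that the slide text does not spell out: that 0.1 is P(A), so weight(¬A) = 0.9; that both literals of B carry weight 1; and that the chance variables are tied to B by the constraints A ⇒ (B ⇔ P) and ¬A ⇒ (B ⇔ Q).

```python
from itertools import product

VARS = ["A", "B", "P", "Q"]   # A, B: network nodes; P, Q: added chance variables

WEIGHT = {
    ("A", True): 0.1, ("A", False): 0.9,   # assumed prior on A
    ("P", True): 0.2, ("P", False): 0.8,   # row of the CPT where A is true
    ("Q", True): 0.6, ("Q", False): 0.4,   # row of the CPT where A is false
    ("B", True): 1.0, ("B", False): 1.0,   # assumption: B itself is unweighted
}

def satisfies_encoding(model):
    """Assumed constraints: A -> (B <-> P) and (not A) -> (B <-> Q)."""
    return model["B"] == (model["P"] if model["A"] else model["Q"])

def wmc(extra=lambda model: True):
    """Sum of model weights over models of the encoding that also satisfy `extra`."""
    total = 0.0
    for values in product([True, False], repeat=len(VARS)):
        model = dict(zip(VARS, values))
        if satisfies_encoding(model) and extra(model):
            w = 1.0
            for v in VARS:
                w *= WEIGHT[(v, model[v])]
            total += w
    return total

total_weight = wmc()
prob_b = wmc(lambda model: model["B"])
prob_a_given_b = wmc(lambda model: model["A"] and model["B"]) / prob_b
print(total_weight, prob_b, prob_a_given_b)
```

Under these assumptions the weighted count with B asserted equals 0.1 · 0.2 + 0.9 · 0.6, the value obtained from the network directly, and the unconstrained count is 1.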
Main Theorem • Let: • F = a weighted CNF encoding of a Bayes net • E = an arbitrary CNF formula, the evidence • Q = an arbitrary CNF formula, the query • Then:
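The formula that followed "Then:" did not survive on this slide. The identity one would expect from the definitions of F, E, and Q, given here as an assumption rather than a quotation of the slide, expresses the conditional probability of the query as a ratio of two weighted model counts:

```latex
\[
  \Pr(Q \mid E) \;=\; \frac{\mathrm{WMC}(F \wedge Q \wedge E)}{\mathrm{WMC}(F \wedge E)}
\]
```

Both counts are taken over the same weighted encoding F, so a query reduces to two calls to a weighted model counter.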
Exact Bayesian Inference Algorithms • Junction tree algorithm (Shenoy & Shafer 1990) • Most widely used approach • Data structure grows exponentially large in tree-width of underlying graph • To handle high tree-width, researchers developed conditioning algorithms, e.g.: • Recursive conditioning (Darwiche 2001) • Value elimination (Bacchus, Dalmao, & Pitassi 2003) • Compilation to d-DNNF (Darwiche 2002; Chavira, Darwiche, & Jaeger 2004; Darwiche 2004) • These algorithms become similar to DPLL...
Experiments • Our benchmarks: Grid, Plan Recognition • Junction tree: Netica • Recursive conditioning: SamIam • Value elimination: Valelim • Weighted model counting: Cachet • ISCAS-85 and SATLIB benchmarks • Compilation to d-DNNF: timings from (Darwiche 2004) • Weighted model counting: Cachet
Experiments: Grid Networks • [Figure: grid network with nodes S and T marked] • CPTs are set randomly • A fraction of the nodes are deterministic; that fraction is given by the parameter ratio • T is the query node
Results for ratio = 0.5 • 10 problems of each size • X = memory out or time out
Plan Recognition • Task: • Given a planning domain described by STRIPS operators, initial and goal states, and time horizon • Infer the marginal probabilities of each action • Abstraction of strategic plan recognition: we know the enemy’s capabilities and goals; what will it do? • Modified Blackbox planning system (Kautz & Selman 1999) to create instances
Summary • Bayesian inference by translation to model counting is competitive with best known algorithms for problems with • High tree-width • High degree of determinism • Recent conditioning algorithms already make use of important SAT techniques • Most striking: compilation to d-DNNF • Translation approach makes it possible to quickly exploit future SAT algorithms and implementations