Solution Counting Methods for Combinatorial Problems
Ashish Sabharwal [Cornell University]
Based on joint work with: Carla Gomes, Willem-Jan van Hoeve, Lukas Kroc, Bart Selman
INFORMS, Oct 2008, Washington, D.C.
Context
• Constraint Satisfaction Problems (CSPs)
• In particular, Boolean Satisfiability or SAT:
  • Given a Boolean formula F in conjunctive normal form, e.g. F = (a or b) and (a or c or d) and (b or c), determine whether F is satisfiable
  • NP-complete
  • Widely used in practice, e.g. in hardware & software verification, design automation, AI planning, …
• How many satisfying assignments does F have?
  • #F, the “model count” of F, i.e., its solution count
  • #SAT is #P-complete
INFORMS-08
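To make #F concrete, here is a minimal brute-force sketch that counts the models of the example formula by enumerating all $2^N$ assignments. It assumes the numbering a=1, b=2, c=3, d=4 and DIMACS-style signed-integer literals (our choice, not from the slides). It is only feasible for tiny N, which is exactly why the methods below exist.

```python
from itertools import product


def brute_force_count(clauses, n_vars):
    """Count assignments of variables 1..n_vars that satisfy every clause.

    Clauses are tuples of signed ints (DIMACS style): 3 means x3, -3 means not x3.
    Runs in O(2^n_vars * formula size), so it is only usable for tiny formulas.
    """
    count = 0
    for bits in product((False, True), repeat=n_vars):
        # bits[v - 1] is the truth value of variable v
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            count += 1
    return count


# F = (a or b) and (a or c or d) and (b or c), with a=1, b=2, c=3, d=4
F = [(1, 2), (1, 3, 4), (2, 3)]
print(brute_force_count(F, 4))  # 9
```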
Model Counting for SAT
• Inspired by the success of SAT solvers, there has been a lot of activity in the last few years on attacking the solution counting problem
  • Aside: “success of SAT” = scalability, industrial applications, and a black-box nature and standardized input that make it ‘easy’ for users
• Many different approaches, many different counting goals
  • A “zoo” of techniques!
• This talk: a brief overview of these techniques, many of which were contributed by our group at Cornell
• Further reading and references: the Model Counting chapter, with Carla Gomes and Bart Selman, in the upcoming Handbook of Satisfiability (draft available on my webpage)
What shall we count?
E.g., F has N = 1000 variables and $10^{150} \approx 2^{500}$ solutions, somewhere on the scale $0 \le \#F \le 2^N$. Possible goals:
• Exact count
• Estimate with a strict “(ε, δ)” guarantee
• Estimate with no guarantees
• Lower bound
• Upper bound (appears hard!)
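As a hedged illustration, one common way to formalize a strict (ε, δ) guarantee (the kind an FPRAS provides) for an estimate $\widehat{\#F}$ is shown below; the exact multiplicative window varies between papers (some use $[\#F/(1+\varepsilon),\,(1+\varepsilon)\#F]$ instead).

$$
\Pr\Big[(1-\varepsilon)\,\#F \;\le\; \widehat{\#F} \;\le\; (1+\varepsilon)\,\#F\Big] \;\ge\; 1-\delta
$$

Here ε > 0 controls accuracy and δ ∈ (0, 1) the failure probability. A lower bound $L$ asks for much less: only $\Pr[L \le \#F] \ge 1-\delta$.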
Problem Space: why are upper bounds hard?
(Running example: F has N = 1000 variables and $10^{150} \approx 2^{500}$ solutions, so $0 \le \#F \le 2^N$.)
• The number of solutions is often a minuscule fraction of the search space size
  • This limits our ability to reason about upper bounds
  • E.g., after having searched half the space, there could still be $2^{999}$ potential solutions remaining in the worst case, off by a factor of $2^{499}$ from the true count!
• Probabilistic methods work better for lower bounds
  • E.g., if the expected value = true count, Markov’s inequality says we can’t get high numbers too often, because 0’s can’t compensate enough
  • A reverse Markov inequality doesn’t help: we can get low numbers too often, because a single $2^N$ can compensate for a lot of low numbers!
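The Markov argument can be sketched concretely. If $X \ge 0$ is an estimator with $\mathbb{E}[X] = \#F$, Markov gives $\Pr[X \ge c\,\#F] \le 1/c$ for any $c > 1$, so for t independent runs $\Pr[\min_i X_i \ge c\,\#F] \le c^{-t}$. Hence $\min_i X_i / c$ is a lower bound on #F with probability at least $1 - c^{-t}$. The function name and the (bound, confidence) return value are our own illustration, not taken from any particular counter.

```python
def markov_lower_bound(samples, c):
    """Turn t independent runs of an unbiased, nonnegative count estimator
    into a probabilistic lower bound on the true count.

    By Markov's inequality, Pr[X >= c * #F] <= 1/c for each run, so all t runs
    exceed c * #F with probability at most c**(-t). Therefore min(samples) / c
    is <= #F with probability at least 1 - c**(-t).
    """
    if c <= 1:
        raise ValueError("c must be > 1 for a nontrivial guarantee")
    if not samples:
        raise ValueError("need at least one sample")
    bound = min(samples) / c
    confidence = 1 - c ** (-len(samples))
    return bound, confidence


# Three hypothetical estimator outputs, slack factor c = 2
print(markov_lower_bound([8, 16, 4], 2))  # (2.0, 0.875)
```

There is no analogous trick using $\max_i X_i$ for an upper bound: an estimator can return tiny values almost always and stay unbiased through one rare, huge value.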
The “Zoo” of Counting Methods
Solution counting
• Exact methods
  • “Only” the count: DPLL-style backtrack search; FPT (branch-width, tree-width, …)
  • Count + many by-products: knowledge compilation
• Approximate methods
  • FPRAS: MCMC sampling
  • Estimation without any guarantee
  • Practical bounds with a guarantee
  • Techniques shown: backtrack search + randomization + statistics; using backtrack-free space; sampling + randomization; sampling + multipliers; XOR streamlining (randomized); belief propagation + randomization
Note: not an exhaustive listing
I. Exact Methods
• “Only” the count
  • DPLL-style backtrack search: [“CDP”, Birnbaum-Lozinskii-99], [“relsat”, Bayardo-Pehoushek-00], [“cachet”, Sang et al-04], [“sharpSAT”, Thurley-06]
  • FPT (branch-width, tree-width, …): [tree-width: Gottlob-Scarcello-Sideri-02], [branch-width: Bacchus-Dalmao-Pitassi-03], [cluster-width: Fischer-Makowsky-Ravve-08]
• Count + many by-products
  • Knowledge compilation
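A minimal sketch of DPLL-style exact counting in the spirit of the counters cited above (not their actual code): branch on a variable, add the counts of the two sub-problems, credit $2^k$ for the k variables left unconstrained, and, following the component idea used by relsat and cachet, multiply the counts of variable-disjoint components. Real counters add unit propagation, component caching, and clause learning; all function names here are hypothetical.

```python
def simplify(clauses, lit):
    """Set literal `lit` to True: drop satisfied clauses and remove the now-false
    literal -lit from the rest. Returns None if some clause becomes empty."""
    reduced = []
    for clause in clauses:
        if lit in clause:
            continue  # clause satisfied
        shorter = tuple(l for l in clause if l != -lit)
        if not shorter:
            return None  # conflict: clause falsified
        reduced.append(shorter)
    return reduced


def components(clauses):
    """Group (non-empty) clauses into variable-disjoint components via union-find."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for clause in clauses:
        for lit in clause:
            parent.setdefault(abs(lit), abs(lit))
        root = find(abs(clause[0]))
        for lit in clause[1:]:
            parent[find(abs(lit))] = root
    groups = {}
    for clause in clauses:
        groups.setdefault(find(abs(clause[0])), []).append(clause)
    return list(groups.values())


def count_models(clauses, variables):
    """Exact model count of `clauses` over the variable set `variables`.
    Every variable occurring in `clauses` must belong to `variables`."""
    if not clauses:
        return 2 ** len(variables)  # all remaining variables are free
    comps = components(clauses)
    if len(comps) > 1:
        total, used = 1, set()
        for comp in comps:
            comp_vars = {abs(l) for clause in comp for l in clause}
            used |= comp_vars
            total *= count_models(comp, comp_vars)  # independent parts multiply
        return total * 2 ** len(set(variables) - used)
    var = abs(clauses[0][0])  # naive branching heuristic
    total = 0
    for lit in (var, -var):  # the two branches have disjoint solutions: add
        reduced = simplify(clauses, lit)
        if reduced is not None:
            total += count_models(reduced, set(variables) - {var})
    return total


F = [(1, 2), (1, 3, 4), (2, 3)]  # a=1, b=2, c=3, d=4
print(count_models(F, {1, 2, 3, 4}))  # 9
```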
Knowledge Compilation for Counting
• Main idea: convert F into a different “form” from which one can easily read off the solution count (and many other quantities of interest)
• d-DNNF: Deterministic, Decomposable Negation Normal Form [DNNF, “c2d”, Darwiche et al. 2001-05]
  • Think of the formula as a directed acyclic graph (DAG)
  • Negations allowed only at the leaves (NNF)
  • Children of an AND node don’t share any variables (different “components”): can multiply the counts
  • Children of an OR node don’t share any solutions: can add the counts
• Once converted to d-DNNF, can answer many queries in linear time
  • Satisfiability, tautology, logical equivalence, solution counts, …
  • Any query that a BDD could answer
• Our recent result: can count the number of “clusters” of solutions, i.e., how many different kinds/families of solutions there are [To appear in NIPS-08]
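The “multiply at AND, add at OR” rule can be sketched as one memoized pass over the DAG, so each shared node is evaluated once. The node classes below are hypothetical (not the c2d output format). One detail the slide leaves implicit: if the children of an OR node mention different variables, each child’s count must be scaled by $2^k$ for the k variables it does not mention. Here that is done with explicit variable sets for clarity; a real implementation would instead “smooth” the d-DNNF beforehand to keep the pass linear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    lit: int  # +v is variable v, -v is its negation


@dataclass(frozen=True)
class And:
    children: tuple  # decomposable: children share no variables; And(()) is True


@dataclass(frozen=True)
class Or:
    children: tuple  # deterministic: children share no solutions; Or(()) is False


def count_ddnnf(root, n_vars):
    """Model count of a d-DNNF over variables 1..n_vars."""
    memo = {}  # id(node) -> (count over vars(node), vars(node))

    def visit(node):
        key = id(node)
        if key in memo:
            return memo[key]  # shared sub-DAGs are evaluated once
        if isinstance(node, Lit):
            result = (1, frozenset({abs(node.lit)}))
        elif isinstance(node, And):
            count, scope = 1, frozenset()
            for child in node.children:
                c, s = visit(child)
                count *= c  # disjoint variables: counts multiply
                scope |= s
            result = (count, scope)
        else:
            parts = [visit(child) for child in node.children]
            scope = frozenset().union(*(s for _, s in parts))
            # disjoint solution sets: counts add, after padding each child's count
            # with 2^(number of variables it does not mention)
            count = sum(c * 2 ** len(scope - s) for c, s in parts)
            result = (count, scope)
        memo[key] = result
        return result

    count, scope = visit(root)
    return count * 2 ** (n_vars - len(scope))  # variables absent from the DAG are free


# A hand-built d-DNNF for F = (a or b) and (a or c or d) and (b or c)
a, b, c, d = 1, 2, 3, 4
b_or_c = Or((Lit(b), And((Lit(-b), Lit(c)))))  # (b or c), made deterministic
c_or_d = Or((Lit(c), And((Lit(-c), Lit(d)))))  # (c or d), made deterministic
F = Or((And((Lit(a), b_or_c)),                 # a = T: need (b or c)
        And((Lit(-a), Lit(b), c_or_d))))       # a = F: need b and (c or d)
print(count_ddnnf(F, 4))  # 9
```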
II. Approximate Methods
• FPRAS: MCMC sampling [Karp-Luby-85] [Karp-Luby-Madras-89]
• Estimation without any guarantee
• Practical bounds with a guarantee
Techniques shown: backtrack search + randomization + statistics [“MiniCount”, CPAIOR-08]; using backtrack-free space [“SampleMinisat”, Gogate-Dechter-07]; sampling + randomization; sampling + multipliers; XOR streamlining (randomized); belief propagation + randomization
XOR Streamlining for Bounds on #F [“Mbound”, AAAI-06]
• Main idea: rather than modifying the algorithm for solving, modify the problem, run the solver, and deduce the count
  • CNF formula + random XOR constraints → streamlined formula → off-the-shelf SAT solver → model count
  • Ideal when systematic search works well!
• Randomized algorithm: expected value = true count
• Can be converted into bounds with correctness guarantees
  • Lower bounds are easier in practice (XORs of any “length” work)
  • Upper bounds are possible but not as easy
• Empirical evidence: can get by with “very short” XORs [SAT-07]
• Can be extended to general CSPs [AAAI-07; see Willem’s talk]
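A toy sketch of the lower-bound side of XOR streamlining, in the spirit of MBound but not its implementation. Each random XOR includes every variable with probability 1/2 plus a random parity bit, so a fixed solution survives s such XORs with probability $2^{-s}$. Repeat t times; if every streamlined formula is satisfiable, then $\#F \ge 2^{s-\alpha}$ with probability at least $1 - 2^{-\alpha t}$ for a chosen slack α > 0. (If $\#F < 2^{s-\alpha}$, the expected number of surviving solutions is below $2^{-\alpha}$, so by Markov each run is satisfiable with probability below $2^{-\alpha}$.) The brute-force `is_satisfiable` stands in for an off-the-shelf SAT solver, and all function names are hypothetical.

```python
import itertools
import random


def satisfies(bits, clauses, xors):
    """bits[v - 1] is the value of variable v; clauses use signed-int literals."""
    if not all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
        return False
    # each XOR is (variables, parity): the sum of their values mod 2 must equal parity
    return all(sum(bits[v - 1] for v in xvars) % 2 == parity for xvars, parity in xors)


def is_satisfiable(clauses, xors, n_vars):
    # Brute-force stand-in for an off-the-shelf SAT solver (exponential in n_vars)
    return any(satisfies(bits, clauses, xors)
               for bits in itertools.product((False, True), repeat=n_vars))


def random_xor(n_vars, rng):
    # include each variable with probability 1/2; random parity bit
    xvars = [v for v in range(1, n_vars + 1) if rng.random() < 0.5]
    return xvars, rng.randrange(2)


def xor_lower_bound(clauses, n_vars, s, t, alpha, rng):
    """Return (k, confidence) meaning #F >= 2**k with probability >= confidence,
    or None if some streamlined formula was unsatisfiable (no bound claimed)."""
    for _ in range(t):
        xors = [random_xor(n_vars, rng) for _ in range(s)]
        if not is_satisfiable(clauses, xors, n_vars):
            return None
    return s - alpha, 1 - 2 ** (-alpha * t)


rng = random.Random(0)
print(xor_lower_bound([(1, 2), (1, 3, 4), (2, 3)], 4, 2, 5, 1, rng))
```

If the call returns None, a typical next step is to retry with fewer XORs (smaller s).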
Sampling for Estimates + Lower Bound
• Main idea: “find” a balanced variable, i.e., one that appears roughly equally often as True and as False in solutions; fix it to one value, count that sub-problem, and re-scale with an appropriate multiplier
  • E.g., if x = T in 60% of solutions and x = F in 40%, count #F|x=T and scale up by a factor of 100/60
• Finding balanced variables is not so easy
  • Use solution sampling: ideal when local search works well! [“ApproxCount”, Wei-Selman-05]
  • Use Belief Propagation for “marginal” probability estimates: ideal when message passing works well! [“BPCount”, CPAIOR-08]
• Randomize the process: expected value = true count, as before! [“SampleCount”, IJCAI-07]
• Great lower bounds, but variance too high for good upper bounds
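A toy sketch of the randomized-multiplier idea, in the spirit of SampleCount but not its implementation: repeatedly pick the most balanced variable according to a sample of solutions, fix it to a uniformly random value, and multiply by 2. With a fair coin, $\mathbb{E}[2\cdot\#F|_{x=v}] = \#F|_{x=T} + \#F|_{x=F} = \#F$, so the estimate is unbiased whichever variable is chosen; choosing balanced variables only lowers the variance. Exhaustive enumeration stands in for both the solution sampler (e.g., local search) and the exact counter for the residual formula; all names are hypothetical.

```python
import itertools
import random


def solutions(clauses, n_vars, fixed):
    """All satisfying assignments consistent with `fixed` ({var: bool}).
    Brute force: a stand-in for a real sampler/counter on tiny formulas only."""
    result = []
    for bits in itertools.product((False, True), repeat=n_vars):
        if any(bits[v - 1] != val for v, val in fixed.items()):
            continue
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            result.append(bits)
    return result


def randomized_estimate(clauses, n_vars, k, n_samples, rng):
    """Fix up to k variables at random, doubling the multiplier each time.
    The expected value of the returned estimate is #F."""
    fixed, multiplier = {}, 1
    for _ in range(k):
        sols = solutions(clauses, n_vars, fixed)
        free = [v for v in range(1, n_vars + 1) if v not in fixed]
        if not sols or not free:
            break
        sample = [rng.choice(sols) for _ in range(n_samples)]  # stand-in for a sampler
        # most balanced variable: sampled fraction of True closest to 1/2
        var = min(free, key=lambda v: abs(sum(s[v - 1] for s in sample) / n_samples - 0.5))
        fixed[var] = rng.random() < 0.5  # fair coin, not the majority value
        multiplier *= 2
    return multiplier * len(solutions(clauses, n_vars, fixed))


F = [(1, 2), (1, 3, 4), (2, 3)]
rng = random.Random(0)
runs = [randomized_estimate(F, 4, 2, 20, rng) for _ in range(1000)]
print(sum(runs) / len(runs))  # the mean of many runs approaches #F = 9
```

Feeding several such runs into a Markov-style “minimum divided by a slack factor” turns these unbiased estimates into a lower bound with a stated confidence.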