Backdoor Sets in SAT Instances

Ryan Williams, Carnegie Mellon University
Joint work (IJCAI-03) with Carla Gomes and Bart Selman, Cornell University

Significant progress in complete search methods! (Complete = always returns SAT or UNSAT.)
Software and hardware verification: complete methods are critical, e.g., for verifying the correctness of chip designs using SAT encodings. Current methods can automatically verify the correctness of a large fraction of a Pentium IV.

A “real world” example (Thanks to: Oliver Kullmann)

Bounded Model Checking instance, in CNF: ((x1) or x7) and ((x1) or x6) and …

10 pages later: … (x177 or x169 or x161 or x153 … or x17 or x9 or x1 or (x185)) The clauses (constraints) are getting more interesting…

4,000 pages later: … a clause with 59 literals! …

Finally, 15,000 pages later: … The MiniSat solver (Eén & Sörensson) solves this instance in 2 seconds.

Gap between Theory and Practice
• The good scaling behavior of state-of-the-art SAT solvers seems to defy our complexity-theoretic intuition, given that SAT is NP-complete!
• How can we explain this gap between theory and practice? What makes this possible?
• Our answer: hidden tractable substructure in real-world problems.
• Can we make this more precise?
• Proposal: we consider structures we call backdoor sets.
• The idea came out of a study of heavy-tailed phenomena in the runtime distributions of some SAT solvers.

Backdoor Sets: Initial Motivation
Heavy-tailed distributions and randomization: certain problems, when solved by randomized backtracking, yield a heavy-tailed runtime distribution. This
• explains why restarting a solver is often an effective strategy;
• implies a wide range of possible solution times, often including short runs.
$\Pr[T > t] \sim C \cdot t^{-c}$ with $0 < c < 2$, where T is the runtime of a run and C is a constant.
How can we explain the short runs?
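To see why restarts help when the tail is heavy, here is a small sketch that assumes a Pareto runtime with Pr[T > t] = t^(-c) for t ≥ 1 (a hypothetical model chosen for illustration, not data from this talk). With a fixed cutoff L, each run costs E[min(T, L)] and succeeds with probability Pr[T ≤ L], so the number of runs is geometric and the expected total time is E[min(T, L)] / Pr[T ≤ L]. The function names are our own.

```python
import math


def pareto_tail(t: float, c: float) -> float:
    """Pr[T > t] for an assumed Pareto runtime with minimum value 1."""
    return 1.0 if t < 1.0 else t ** (-c)


def expected_truncated_runtime(cutoff: float, c: float) -> float:
    """E[min(T, cutoff)] = integral of Pr[T > t] from 0 to cutoff (cutoff > 1)."""
    if math.isclose(c, 1.0):
        return 1.0 + math.log(cutoff)
    return 1.0 + (cutoff ** (1.0 - c) - 1.0) / (1.0 - c)


def expected_time_with_restarts(cutoff: float, c: float) -> float:
    """Fixed-cutoff restarts: runs are independent, each costs E[min(T, cutoff)]
    and succeeds with probability Pr[T <= cutoff]."""
    p_success = 1.0 - pareto_tail(cutoff, c)
    return expected_truncated_runtime(cutoff, c) / p_success


# With c = 0.5 the mean of T itself is infinite, yet restarting every
# 4 time units gives a finite expected total time.
print(round(expected_time_with_restarts(4.0, 0.5), 6))  # 6.0
```

For c ≤ 1 this Pareto model has an infinite mean, so without restarts the expected time is unbounded; the cutoff trades occasional wasted runs for protection against the tail.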

Explaining Short Runs: Backdoors to Tractability
• Informally: a backdoor set of a given problem instance is a subset of its variables such that, once those variables are assigned values, the remaining instance simplifies to a tractable class.
• Formally, we define:
  • the notion of a “sub-solver” (which handles the tractable substructure of a problem instance);
  • backdoor sets and strong backdoor sets.

Defining a Sub-solver
The definition is general enough to encompass many polynomial-time propagation methods, including ones for which we do not know a clean characterization of the tractable subclass. It is also valid for encoding languages other than SAT, e.g., Mixed Integer Programming and Constraint Satisfaction Problems.
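As an illustration, unit propagation is a standard example of a polynomial-time sub-solver. In the backdoor literature a sub-solver is usually required to (1) either reject the input or decide it correctly, (2) run in polynomial time, (3) recognize trivially satisfied and trivially unsatisfiable formulas, and (4) keep deciding a formula after any one more variable is assigned (self-reducibility). The sketch below assumes a DIMACS-style clause encoding (x3 is 3, its negation is -3); the function name and the "SAT" / "UNSAT" / "REJECT" labels are our own.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]           # DIMACS-style literals: 3 is x3, -3 is "not x3"
Assignment = Dict[int, bool]


def unit_propagate(clauses: List[Clause],
                   assignment: Assignment) -> Tuple[str, Optional[Assignment]]:
    """Sub-solver sketch: simplify `clauses` under `assignment` by unit propagation.

    Returns ("SAT", model) when every clause is satisfied, ("UNSAT", None) when
    some clause has all of its literals false, and ("REJECT", None) otherwise.
    """
    assign = dict(assignment)
    while True:
        unit = None
        all_satisfied = True
        for clause in clauses:
            if any(assign.get(abs(lit)) == (lit > 0) for lit in clause):
                continue  # clause already satisfied
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assign]
            if not free:
                return "UNSAT", None  # every literal is false: conflict
            if len(free) == 1:
                unit = free[0]  # unit clause forces this literal
                break
        if all_satisfied:
            return "SAT", assign  # variables missing from `assign` may take any value
        if unit is None:
            return "REJECT", None  # nothing forced and nothing decided: give up
        assign[abs(unit)] = unit > 0


print(unit_propagate([[1], [-1, 2]], {}))      # ('SAT', {1: True, 2: True})
print(unit_propagate([[1, 2], [-1, -2]], {}))  # ('REJECT', None)
```

Each pass either assigns a new variable or stops, so there are at most n + 1 passes over the clauses, which keeps the sub-solver polynomial.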

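Defining Backdoors
Let A be a sub-solver, F a formula, and F[a_S] the formula simplified by an assignment a_S to the variables in a set S.
• Backdoor set (for satisfiable instances): a subset S of the variables such that for some assignment a_S, A returns a satisfying assignment of F[a_S].
• Strong backdoor set (applies to satisfiable or inconsistent instances): a subset S of the variables such that for every assignment a_S, A returns a satisfying assignment of F[a_S] or concludes that F[a_S] is unsatisfiable.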
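Both definitions can be checked by brute force over the 2^|S| assignments to a candidate set S. The sketch below assumes Boolean variables, a DIMACS-style encoding, and unit propagation as the sub-solver A; instead of building F[a_S] explicitly it hands a_S to the sub-solver as a starting partial assignment, which is equivalent for unit propagation. All function names are hypothetical.

```python
from itertools import product


def unit_propagate(clauses, assignment):
    """Sub-solver A: ("SAT", model), ("UNSAT", None) or ("REJECT", None)."""
    assign = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assign.get(abs(lit)) == (lit > 0) for lit in clause):
                continue  # clause satisfied
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assign]
            if not free:
                return "UNSAT", None
            if len(free) == 1:
                unit = free[0]
                break
        if all_satisfied:
            return "SAT", assign
        if unit is None:
            return "REJECT", None
        assign[abs(unit)] = unit > 0


def assignments(subset):
    """All 2^|subset| assignments a_S to the variables in `subset`."""
    for values in product([False, True], repeat=len(subset)):
        yield dict(zip(subset, values))


def is_backdoor(clauses, subset, sub_solver=unit_propagate):
    """Some a_S makes the sub-solver return a satisfying assignment of F[a_S]."""
    return any(sub_solver(clauses, a)[0] == "SAT" for a in assignments(subset))


def is_strong_backdoor(clauses, subset, sub_solver=unit_propagate):
    """Every a_S lets the sub-solver decide F[a_S] (SAT or UNSAT, never REJECT)."""
    return all(sub_solver(clauses, a)[0] != "REJECT" for a in assignments(subset))


F = [[1, 2, 3], [1, -2, -3], [-1, 2]]
print(is_backdoor(F, []), is_backdoor(F, [1]))                    # False True
print(is_strong_backdoor(F, [1]), is_strong_backdoor(F, [1, 2]))  # False True
```

In this toy formula, setting x1 = True lets unit propagation finish, so {x1} is a backdoor; it is not strong because x1 = False leaves a formula with no unit clauses, whereas every assignment to {x1, x2} is decided.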

Backdoors Can Be Surprisingly Small
• Backdoors help explain how a solver can get “lucky” on certain runs: on those runs the backdoor variables are identified (branched on) early in the backtracking search.
• Most recent: other combinatorial domains, e.g., Graphplan planning, with near-constant-size backdoors (2 or 3 variables) in certain domains (Hoffman, Gomes, Selman ’03).
• Backdoors capture critical problem resources (bottlenecks).

Constraint Satisfaction Problem
• The Constraint Satisfaction Problem (CSP):
• We are given a finite set of n variables, each associated with a non-empty finite domain.
• A constraint on k variables X1, …, Xk is a relation $R(X_1,\ldots,X_k) \subseteq D_1 \times \cdots \times D_k$, where $D_i$ is the domain of $X_i$.
• A solution to a CSP is an assignment of values to all the variables that satisfies all the constraints (a constraint is satisfied when its relation holds).
(Dechter 86, Freuder 82, Mackworth 77, Tsang 93, van Beek and Dechter 97)
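The definition translates directly into code when each relation R is stored explicitly as a set of allowed tuples, i.e., a subset of D1 × … × Dk. The sketch below uses hypothetical names and a brute-force search meant only to illustrate the definition, not as a practical CSP solver; it encodes 2-coloring of a three-vertex path.

```python
from itertools import product
from typing import Dict, Hashable, List, Optional, Set, Tuple

# A constraint is (scope, relation): the relation lists the allowed value tuples.
Constraint = Tuple[Tuple[str, ...], Set[Tuple[Hashable, ...]]]


def satisfies(assignment: Dict[str, Hashable], constraints: List[Constraint]) -> bool:
    """A constraint holds when the values of its scope form a tuple in R."""
    return all(tuple(assignment[x] for x in scope) in relation
               for scope, relation in constraints)


def brute_force_solve(domains: Dict[str, List[Hashable]],
                      constraints: List[Constraint]) -> Optional[Dict[str, Hashable]]:
    """Try every assignment of domain values to all variables; return the first solution."""
    variables = list(domains)
    for values in product(*(domains[x] for x in variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment, constraints):
            return assignment
    return None


different = {(0, 1), (1, 0)}  # R(X, Y) for "X != Y" over D = {0, 1}
domains = {"X": [0, 1], "Y": [0, 1], "Z": [0, 1]}
print(brute_force_solve(domains, [(("X", "Y"), different), (("Y", "Z"), different)]))
# {'X': 0, 'Y': 1, 'Z': 0}
```

Adding the constraint X != Z turns the path into a triangle, which has no 2-coloring, so the search returns None.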

Explicit Algorithms for Finding/Exploiting Backdoor Sets
We cover three kinds of strategies for dealing with instances that have small backdoor sets:
• A deterministic algorithm.
• A randomized algorithm, with provably better worst-case performance than the deterministic one.
• A heuristic randomized algorithm, which assumes a good heuristic for choosing the variables to branch on. We believe this is close to what happens in practice.

Deterministic Generalized Iterative Deepening
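One way to write generalized iterative deepening, which we assume here, is to try candidate sets in order of increasing size, run the sub-solver under every assignment to each set, and stop when some assignment yields a satisfying assignment (the set is a backdoor) or when every assignment to one set is refuted (the set is a strong backdoor of an unsatisfiable formula). The sketch uses unit propagation as the sub-solver and a DIMACS-style encoding; the function names are hypothetical.

```python
from itertools import combinations, product


def unit_propagate(clauses, assignment):
    """Sub-solver A: ("SAT", model), ("UNSAT", None) or ("REJECT", None)."""
    assign = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assign.get(abs(lit)) == (lit > 0) for lit in clause):
                continue
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assign]
            if not free:
                return "UNSAT", None
            if len(free) == 1:
                unit = free[0]
                break
        if all_satisfied:
            return "SAT", assign
        if unit is None:
            return "REJECT", None
        assign[abs(unit)] = unit > 0


def generalized_iterative_deepening(clauses, sub_solver=unit_propagate):
    """Search candidate backdoor sets of size 0, 1, 2, ... in turn."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            all_refuted = True
            for values in product([False, True], repeat=size):
                status, model = sub_solver(clauses, dict(zip(subset, values)))
                if status == "SAT":
                    return "SAT", model  # `subset` is a backdoor
                if status != "UNSAT":
                    all_refuted = False
            if all_refuted:
                return "UNSAT", None  # `subset` is a strong backdoor; F is unsatisfiable
    return "UNSAT", None  # not reached: with all n variables set, A always decides


print(generalized_iterative_deepening([[1, 2, 3], [1, -2, -3], [-1, 2]]))
# ('SAT', {1: True, 2: True})
print(generalized_iterative_deepening([[1, 2], [1, -2], [-1, 2], [-1, -2]]))
# ('UNSAT', None)
```

If the smallest backdoor has size b, the search never looks at sets larger than b, which is where the runtime bound in terms of B(n) comes from.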

Randomized Generalized Iterative Deepening
• Assumption: there exists a backdoor whose size is bounded by a function of n; call it B(n).
• Idea: repeatedly choose random subsets of variables that are slightly larger than B(n), and search each subset for the backdoor.
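The idea above can be sketched as follows, assuming the caller supplies the subset size (slightly larger than B(n)) and a number of trials; how these parameters should be set to get the provable bound is not reproduced here. As before, the sub-solver is unit propagation over DIMACS-style clauses, and `randomized_gid`, `subset_size` and `trials` are hypothetical names. Unlike the deterministic version, this search can return None on a satisfiable formula if no sampled set contains a backdoor.

```python
import random
from itertools import product


def unit_propagate(clauses, assignment):
    """Sub-solver A: ("SAT", model), ("UNSAT", None) or ("REJECT", None)."""
    assign = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assign.get(abs(lit)) == (lit > 0) for lit in clause):
                continue
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assign]
            if not free:
                return "UNSAT", None
            if len(free) == 1:
                unit = free[0]
                break
        if all_satisfied:
            return "SAT", assign
        if unit is None:
            return "REJECT", None
        assign[abs(unit)] = unit > 0


def randomized_gid(clauses, subset_size, trials, rng, sub_solver=unit_propagate):
    """Sample random candidate sets of `subset_size` variables and try every
    assignment to each; return a model, or None if no trial succeeds."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    size = min(subset_size, len(variables))
    for _ in range(trials):
        subset = rng.sample(variables, size)
        for values in product([False, True], repeat=size):
            status, model = sub_solver(clauses, dict(zip(subset, values)))
            if status == "SAT":
                return model
    return None  # no backdoor found in the sampled sets


F = [[1, 2, 3], [1, -2, -3], [-1, 2]]
print(randomized_gid(F, subset_size=2, trials=10, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance keeps runs reproducible; sampling sets larger than B(n) raises the chance that a sample contains a whole backdoor, at the cost of more assignments per sample.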


Deterministic Versus Randomized
Suppose variables have 2 possible values (e.g., SAT). For B(n) = n/k, the runtime of each algorithm is c^n, where the base c depends on k.
(Table: c as a function of k, for the deterministic algorithm and the randomized algorithm.)
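For the deterministic side, the base c can be estimated by counting sub-solver calls, assuming the deterministic algorithm enumerates every set of i ≤ b variables with all 2^i Boolean assignments, as in generalized iterative deepening. This is our own back-of-the-envelope calculation, not the table's values, and it does not cover the randomized algorithm. By the binomial theorem the count reaches 3^n when b = n.

```python
from math import comb, exp, log


def deterministic_calls(n: int, b: int) -> int:
    """Worst-case number of sub-solver calls when the smallest backdoor has size b:
    every set of i <= b variables, each with all 2^i Boolean assignments."""
    return sum(comb(n, i) * 2 ** i for i in range(b + 1))


def effective_base(n: int, k: int) -> float:
    """The base c with deterministic_calls(n, n // k) == c ** n, computed in
    log space because the count quickly exceeds the float range."""
    return exp(log(deterministic_calls(n, n // k)) / n)


for k in (2, 3, 4, 10):
    print(k, round(effective_base(1000, k), 3))
```

Each call costs polynomial time, so the total is about c^n times a polynomial; larger k (a smaller backdoor fraction) gives a smaller base.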

Complete Randomized Depth First Search with Heuristic
Assume we have the following:
• DFS, a generic randomized depth-first backtrack search solver, with
  • a (polytime) sub-solver A;
  • a heuristic H that (randomly) chooses variables to branch on, in polynomial time, and has probability 1/h of choosing a backdoor variable (h is a fixed constant).
Call this ensemble (DFS, H, A).
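A single run of the ensemble can be sketched as a recursive search: call A at every node, and branch only when A rejects. In this sketch, H is a uniform random choice among unassigned variables (a hypothetical stand-in that need not meet the 1/h assumption), A is unit propagation over DIMACS-style clauses, value order is randomized, and a node limit is added so that the run can later be used inside a restart strategy. `dfs_run`, `random_heuristic` and `node_limit` are our own names.

```python
import random


def unit_propagate(clauses, assignment):
    """Sub-solver A: ("SAT", model), ("UNSAT", None) or ("REJECT", None)."""
    assign = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assign.get(abs(lit)) == (lit > 0) for lit in clause):
                continue
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assign]
            if not free:
                return "UNSAT", None
            if len(free) == 1:
                unit = free[0]
                break
        if all_satisfied:
            return "SAT", assign
        if unit is None:
            return "REJECT", None
        assign[abs(unit)] = unit > 0


def random_heuristic(clauses, assignment, rng):
    """Hypothetical H: choose an unassigned variable uniformly at random."""
    free = sorted({abs(lit) for clause in clauses for lit in clause} - set(assignment))
    return rng.choice(free)


def dfs_run(clauses, heuristic, sub_solver, node_limit, rng):
    """One run of (DFS, H, A), cut off after `node_limit` search nodes.

    Returns ("SAT", model), ("UNSAT", None) or ("CUTOFF", None).
    """
    nodes = 0

    def search(assignment):
        nonlocal nodes
        nodes += 1
        if nodes > node_limit:
            return "CUTOFF", None
        status, model = sub_solver(clauses, assignment)
        if status != "REJECT":
            return status, model  # A decided this subproblem
        var = heuristic(clauses, assignment, rng)
        for value in rng.sample([False, True], 2):  # random value order
            status, model = search({**assignment, var: value})
            if status != "UNSAT":
                return status, model  # solution found, or cutoff reached
        return "UNSAT", None  # both branches refuted

    return search({})


F = [[1, 2, 3], [1, -2, -3], [-1, 2]]
print(dfs_run(F, random_heuristic, unit_propagate, node_limit=100, rng=random.Random(0)))
```

With a large enough node limit the run is complete: it explores the whole tree and returns UNSAT only if every branch is refuted by A.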

Polytime Restart Strategy for (DFS, H, A)
Essentially: if there is a small backdoor, then (DFS, H, A) has a restart strategy that runs in polytime.
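A back-of-the-envelope version of this claim, under simplifying assumptions of our own rather than the paper's proof: Boolean variables, a satisfiable instance with a backdoor S of size B(n), H picking a not-yet-assigned backdoor variable with probability at least 1/h at every node, and the first value tried chosen uniformly at random. A run that keeps choosing backdoor variables and the values of a satisfying a_S reaches F[a_S], which A solves, after B(n) decisions, so with a cutoff of B(n) + 1 nodes per run:

$$\Pr[\text{a run succeeds}] \ge \left(\frac{1}{h}\cdot\frac{1}{2}\right)^{B(n)} = (2h)^{-B(n)}, \qquad \mathbb{E}[\text{number of runs}] \le (2h)^{B(n)}.$$

$$B(n) = \log_2 n \;\Longrightarrow\; (2h)^{\log_2 n} = n^{\log_2 (2h)}.$$

Since h is a constant, the expected number of runs is polynomial in n, and each run costs polynomial time, so the expected total time is polynomial.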

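Runtime Table for Algorithms
(Table: runtimes of the deterministic algorithm, the randomized algorithm, and (DFS, H, A) for different bounds B(n).)
B(n) = upper bound on the size of a backdoor, given n variables.
When the backdoor is a constant fraction of n, the randomized algorithm gives an exponential improvement over the deterministic algorithm.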

Summary
1) Introduced the notion of a “backdoor set” of variables, which more closely captures the combinatorics of a problem instance as dealt with in practice.
2) Provides insight into restart strategies.
3) Backdoors can be surprisingly small in practice.
4) Search heuristics + randomization can be used to find them, provably efficiently.
