[Published at AAAI-2013] Resolution and Parallelizability: Barriers to the Efficient Parallelization of SAT Solvers
George Katsirelos (MIAT, INRA, Toulouse, France), Ashish Sabharwal (IBM Watson, USA), Horst Samulowitz (IBM Watson, USA), Laurent Simon (Univ. Paris-Sud, LRI/CNRS, Orsay, France)
Trend Towards Parallelization
• Focus Shifting From Single-Thread Performance to Multi-Processor Performance
• 100s and even 1000s of compute cores easily accessible
• Classical Algorithm Parallelization, e.g., parallel sort, shortest path, PRAM model, AC circuits
• Significant Advances in Data Parallelism, e.g., MapReduce, Hadoop, SystemML, R statistics
• Challenge: Search and Optimization on 1000s of Processors
• Tremendous advances in the Sequential case of Combinatorial Search, e.g., SAT solvers can tackle instances with ~2M variables, 10M constraints!
• Exponential search appears to be an “obvious” candidate to parallelize!
• In fact, many SAT/CSP/MIP solvers already support multi-core and multi-machine runs
Banff Workshop on SAT, 2014 | Katsirelos, Samulowitz, Sabharwal, Simon
Parallelization of Combinatorial Search
• Fact: State-of-the-Art Search Engines Do NOT Parallelize Well
• Brute Force exponential search is, of course, trivial to parallelize
• But sophisticated search engines that adapt (through e.g. clause learning, variable impact aggregation, etc.) have inherent sequential aspects
• Modern SAT/MIP/“adapting”-CP solvers do not parallelize well
• Supporting data: next slide
• AAAI 2012 Challenge Paper on the topic [Hamadi & Wintersteiger 2012]
• P-completeness of Unit Propagation a key barrier (solvers spend ~80% of the time Unit Propagating and we don’t know how to parallelize P well)
• Our result: barriers exist even if Unit Propagation came for free!
Parallelization of Combinatorial Search: SAT
• Rather Disappointing Performance at SAT Competitions, e.g., in 2011:
• Average speedup on 8 cores only ~1.8x, on 32 cores only ~3x
• Top performing parallel solvers were based on little to no communication (CryptoMinisat-MT [Soos 2012], Plingeling [Biere 2012])
• Winners were “simple” Portfolio solvers (ppfolio [Roussel], pfolioUZK [Wotzlaw et al])
• Plingeling-ats-587 [Dec 2013]
• Single machine with 128 cores and 128 GB memory
• Benchmark set used in this work, restricted to the 142 instances solved by 1 core in [10, 5000) seconds
What makes parallelization of SAT solvers hard? Can we obtain insights into their behavior beyond eventual wall-clock performance?
Contributions of the Work
• A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
• Focus on understanding rather than on engineering
• Are there inherent bottlenecks that may hinder parallelization, irrespective of which heuristics are used to share information?
• A Practical Study: Interesting properties of Actual Proofs
• Proofs generated by state-of-the-art SAT solvers contain narrow bottlenecks
• Proof-Based Measures that capture Best-Case Parallelizability
• Coarse measure: “Depth” of the proof graph
• Refined measure: Makespan of a resource constrained scheduling problem
• Empirical Findings: Correlations and Parallelization Limits
• Typical sequential proofs are not very parallelizable even in the best case!
• “Schedule speedup” / makespan correlates with observed speedup
Approach: Proof Complexity (applied here to Typically Generated Proofs)
• Proof Complexity [Cook & Reckhow, 1979]: Study of the nature (e.g., size, width, space, depth, “shape”, etc.) of Proofs of Unsatisfiability
• Resolution Graph of Conflict-Directed-Clause-Learning (CDCL) SAT Solvers: $\mathrm{Runtime}(\text{any SAT solver}, F) \ge \min_{\text{proofs}} \mathrm{Size}(\text{Resolution proof of } F)$
• Note: Insights applicable also to Satisfiable instances!
• Solvers prove a lot of sub-formulas to be unsatisfiable before hitting the first solution
• Formal characterization [Achlioptas et al, 2001 & 2004]
• Study of Proofs has provided strong insights into CDCL SAT solvers (worst case / best case results)
• What does “clause learning” bring?
• What do “restarts” add? [Beame et al, 2004; Buss et al, 2008, 2012; Hertel et al, 2008; Pipatsrisawat et al, 2011]
Underlying Inference Principle: Resolution
• CDCL SAT solvers produce Resolution Derivations
• Proof Graph and Depth:
• Each initial and derived constraint is a node, annotated with its proof depth
• $\mathrm{proofdepth}(\text{initial clause } C) = 0$
• $\mathrm{proofdepth}(\text{derived clause } C) = 1 + \max_{\text{parents}} \mathrm{proofdepth}(\mathrm{parent}(C))$
[Figure: proof graph of F, each node labeled with constraint ID and depth: C1 to C6 at depth 0; C7, C9 at depth 1; C8, C11 at depth 2; C10, C12 at depth 3; C13 at depth 4]
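The depth recurrence can be sketched in a few lines of Python. This is only an illustration: the slide gives the depth of each node but not the edges, so the parent lists in `PARENTS` are hypothetical, chosen so that the computed depths agree with the figure's labels. The function name `proof_depths` is also an assumption, not something from the paper.

```python
# Hypothetical proof graph: derived clause -> clauses it was resolved from.
# The edges are invented for illustration; only the resulting depths
# (C7, C9 at 1; C8, C11 at 2; C10, C12 at 3; C13 at 4) come from the slide.
PARENTS = {
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C8": ["C7", "C3"],
    "C11": ["C9", "C5"],
    "C10": ["C8", "C9"],
    "C12": ["C11", "C6"],
    "C13": ["C10", "C12"],
}


def proof_depths(parents):
    """Return {clause: proofdepth}; clauses without parents are initial (depth 0)."""
    depth = {}
    for root in parents:
        stack = [root]  # explicit stack, so deep proofs do not hit the recursion limit
        while stack:
            clause = stack[-1]
            if clause in depth:
                stack.pop()
                continue
            ps = parents.get(clause, [])
            pending = [p for p in ps if p not in depth]
            if pending:
                stack.extend(pending)  # resolve the parents first
            else:
                depth[clause] = (1 + max(depth[p] for p in ps)) if ps else 0
                stack.pop()
    return depth


DEPTHS = proof_depths(PARENTS)
```

The depth of the empty clause (here the stand-in `C13`) is the depth of the whole refutation.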
How Parallelizable are Resolution Refutations?
• Refutation(F) = Resolution Proof that derives the empty (“false”) clause
• Depth of the proof clearly limits the amount of potential parallelization
• Chain of dependencies
• Theorem: All Resolution Proof Graphs of certain “pebbling” style instances have large depth; also holds for all Conflict Resolution Graphs (XOR substitution trick)
• However, the proofdepth bound on parallelization is very crude
• Does not explain poor performance with small k (e.g., 8, 32, … processors)
What does a typical sequential SAT solver proof look like?
• Setup for Experiments:
• Sequential Glucose 2.1 extended with proof output
• GluSatX10: using SatX10 to run a k-processor version of Sequential Glucose
• Working Assumption: Proofs produced by GluSatX10 on k cores look “similar” to proofs produced by Sequential Glucose
Note: statements are simplified; see the paper for more formal notions
http://x10-lang.org/satx10 [IBM Teams: X10 and SAT/CSP]
Proof Graph Example: Very Complex Structure [Easy sequential case, solved in ~30 seconds]
Bottlenecks in Typical SAT Proofs
• Proofs Generated by SAT Solvers Exhibit Surprisingly Narrow “Bottlenecks”, i.e., Depths with Very Few (~1) Clauses!
• Nothing deeper can be derived before the bottleneck clauses, which forces sequentiality
[Figure: number of clauses derived at each depth (log scale) vs. depth in the proof]
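One way to look for such bottlenecks, given a depth for every clause, is to count the derived clauses at each depth and report the depths where the count is tiny. The sketch below is an assumption about how one might do this, not the authors' tooling; the data in `DEPTH_OF`, the function names, and the `max_width` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical clause -> depth map (depth 0 = initial clauses).
DEPTH_OF = {
    "C1": 0, "C2": 0, "C3": 0,
    "C7": 1, "C8": 1,
    "C9": 2,
    "C10": 3, "C11": 3,
    "C12": 4,
}


def clauses_per_depth(depth_of):
    """Number of derived clauses at each depth (initial clauses are skipped)."""
    return Counter(d for d in depth_of.values() if d > 0)


def bottleneck_depths(depth_of, max_width=1):
    """Depths holding at most max_width derived clauses.

    The deepest level is excluded: it only holds the final clause,
    so it is narrow by construction.
    """
    counts = clauses_per_depth(depth_of)
    deepest = max(counts, default=0)
    return sorted(d for d, n in counts.items() if n <= max_width and d < deepest)


BOTTLENECKS = bottleneck_depths(DEPTH_OF)
```

In this toy input, depth 2 holds a single clause, so everything at depths 3 and 4 has to wait for it.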
Best-Case Parallelization with k Processors
• Given Proof P and k Processors, Best-Case Parallelization of P = Resource Constrained Scheduling Problem with Precedences
• Let $M_k(P)$ = makespan of the optimal schedule of P on k processors
• Even approximating $M_k(P)$ within 4/3 is NP-hard, but a (2 – 1/k) approximation is easy
• Best-Case k-processor speedup on P: $S_k(P) = M_1(P) / M_k(P)$
• Example: $M_1(P) = 8$, $M_2(P) = 5$, $M_3(P) = 4$, $M_4(P) = 4$, …; depth = 4
[Figure: proof graph with nodes labeled by constraint ID and depth (C1 to C6 at depth 0; C7, C9, C’9 at depth 1; C8, C11 at depth 2; C10, C12 at depth 3; C13 at depth 4), with schedule annotations on the derived clauses]
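A greedy list schedule is the usual kind of heuristic behind a (2 – 1/k) guarantee, and it is easy to sketch. The code below assumes each derived clause costs one time unit and that initial clauses are free. The edges in `PARENTS` are hypothetical (the slide does not list them) and `C9p` stands in for C’9. The result is an upper bound on the optimal makespan, not necessarily the optimum.

```python
# Hypothetical edges: derived clause -> parents. C1..C6 are initial clauses.
PARENTS = {
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C9p": ["C5", "C6"],
    "C8": ["C7", "C9"],
    "C11": ["C9", "C9p"],
    "C10": ["C8", "C11"],
    "C12": ["C11", "C9p"],
    "C13": ["C10", "C12"],
}


def greedy_makespan(parents, k):
    """Steps used by unit-time list scheduling on k processors.

    Each step runs up to k derived clauses whose parents are all available,
    taken in input order. This is a heuristic upper bound on the optimum.
    """
    if k < 1:
        raise ValueError("need at least one processor")
    done = set()
    remaining = list(parents)
    steps = 0
    while remaining:
        # A parent is available if it is initial or was derived in an earlier step.
        ready = [c for c in remaining
                 if all(p in done or p not in parents for p in parents[c])]
        if not ready:
            raise ValueError("dependency cycle in proof graph")
        done.update(ready[:k])
        remaining = [c for c in remaining if c not in done]
        steps += 1
    return steps


def schedule_speedup(parents, k):
    """Greedy estimate of M_1(P) / M_k(P)."""
    return greedy_makespan(parents, 1) / greedy_makespan(parents, k)
```

On this particular graph the greedy schedule takes 8, 5, 4 and 4 steps for k = 1, 2, 3, 4, the same values as the slide's example; with different edges the numbers could differ.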
Makespan vs. Proof Depth
• Schedule Makespan yields a finer grained lower bound, $S_k(P)$, on best-case parallelization than proof depth
• proofdepth(P): limit of parallelization of P with “infinite” processors
• $M_k(P) \ge \mathrm{proofdepth}(P)$
• $M_k(P) \to \mathrm{proofdepth}(P)$ as $k \to \infty$
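The relation between the two measures can be written out under a simplifying assumption that is not stated on the slide: every derived clause costs one time unit, and $n$ denotes the number of derived clauses in P (a symbol introduced here only for illustration). Then the makespan is squeezed between the depth and the sequential length, and the best-case speedup is capped both by the processor count and by the ratio of proof size to depth:

```latex
\[
  \max\bigl(\mathrm{proofdepth}(P),\ \lceil n/k \rceil\bigr)
  \;\le\; M_k(P) \;\le\; M_1(P) = n ,
\]
\[
  S_k(P) \;=\; \frac{M_1(P)}{M_k(P)}
  \;\le\; \min\!\Bigl(k,\ \frac{n}{\mathrm{proofdepth}(P)}\Bigr).
\]
```

For a proof with 8 derived clauses and depth 4, this caps the speedup at 2 no matter how many processors are added.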
Empirical Findings
Even Best-Case Parallelization Efficiency is Low Beyond 100 Processors
• Best-Case Efficiency of parallelizing P with k processors = $100 \cdot S_k(P) / k$
• E.g., 100% = full utilization of k processors, i.e., speedup = k
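As a quick arithmetic check of the efficiency formula, the sketch below plugs in makespan values of the kind used in the deck's small scheduling example (8 steps on one processor; 5, 4 and 4 steps on 2, 3 and 4 processors). The function name `best_case_efficiency` is an assumption made for this illustration.

```python
def best_case_efficiency(m1, mk, k):
    """Efficiency in percent: 100 * S_k / k, with S_k = M_1 / M_k."""
    if k < 1 or mk <= 0:
        raise ValueError("k must be >= 1 and the makespan must be positive")
    speedup = m1 / mk
    return 100.0 * speedup / k


# Makespans for k = 2, 3, 4 processors, with M_1 = 8.
MAKESPANS = {2: 5, 3: 4, 4: 4}
EFFICIENCY = {k: best_case_efficiency(8, mk, k) for k, mk in MAKESPANS.items()}

for k, eff in EFFICIENCY.items():
    print(f"k={k}: best-case efficiency {eff:.1f}%")
```

Efficiency falls as soon as extra processors stop shortening the schedule: going from 3 to 4 processors leaves the makespan at 4 steps, so the added processor is pure idle capacity.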
Proofs of Some Instances Exhibit Very Low Best-Case Schedule Speedup
A) Even with 1024 processors, best-case speedup ~50-100
B) 128 processors insufficient to achieve a speedup of ~90
Best-Case Schedule Speedup Correlates With Actual Observed Runtime Speedup
(Makes the study of the best-case schedule speedup relevant)
[Figure: average over a sliding window]
Summary
• A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
• Focus on understanding rather than on engineering
• Main Findings:
• Typical Sequential Refutations Contain Surprisingly Narrow Bottlenecks
• Typical Sequential Refutations are Not Parallelizable Beyond a Few Processors, even in the best case of offline “schedule speedup” produced in hindsight
• Observed Runtime Speedup with k processors weakly correlates with Best-Case Schedule Speedup of a Sequential Proof produced in hindsight
• Open Question: Can we design SAT solvers that generate Proofs that are inherently More Parallelizable?
Caveat: assumption that proofs generated by GluSatX10 on k cores look “similar” to proofs generated by Sequential Glucose