Trend Towards Parallelization

Resolution and Parallelizability: Barriers to the Efficient Parallelization of SAT Solvers. George Katsirelos, MIAT, INRA, Toulouse, France; Ashish Sabharwal, IBM Watson Research Center, USA; Horst Samulowitz; Laurent Simon, Univ. Paris-Sud, LRI/CNRS, Orsay, France.

Presentation Transcript


  1. Resolution and Parallelizability: Barriers to the Efficient Parallelization of SAT Solvers
     George Katsirelos, MIAT, INRA, Toulouse, France
     Ashish Sabharwal, IBM Watson Research Center, USA
     Horst Samulowitz
     Laurent Simon, Univ. Paris-Sud, LRI/CNRS, Orsay, France

  2. Trend Towards Parallelization
     • Focus Shifting From Single-Thread Performance to Multi-Processor Performance
     • 100s and even 1000s of compute cores easily accessible
     • Classical Algorithm Parallelization, e.g., parallel sort, PRAM model
     • Significant Advances in Data Parallelism, e.g., MapReduce, Hadoop, SystemML, R statistics
     • Challenge: Search and Optimization on 1000s of Processors
     • Tremendous advances in the Sequential case of Combinatorial Search. E.g., SAT solvers can tackle instances with ~2M variables, 10M constraints!
     • Exponential search appears to be an “obvious” candidate to parallelize!
     • In fact, many SAT/CSP/MIP solvers already support multi-core runs
     AAAI 2013 | Katsirelos, Samulowitz, Sabharwal, Simon

  3. Parallelization of Combinatorial Search
     • Fact: State-of-the-Art Search Engines Do NOT Parallelize Well
     • Brute Force exponential search is, of course, trivial to parallelize
     • But sophisticated search engines that adapt (through e.g. clause learning, impact aggregation, etc.) have inherent sequential aspects
     • AAAI 2012 Challenge Paper on the topic [Hamadi & Wintersteiger 2012]
     • Rather Disappointing Performance at SAT Competitions. E.g., in 2011:
     • 8-core track: average speedup of best parallel solvers only ~1.8x
     • 32-core track: only ~3x
     • Top performing solvers based on little to no communication (CryptoMinisat-MT [Soos 2012], Plingeling [Biere 2012])
     • Parallel track winners were “simple” Portfolio solvers (ppfolio [Roussel 2012], pfolioUZK [Wotzlaw et al, 2012])

  4. What makes parallelization of SAT solvers hard? Can we obtain insights into their behavior beyond eventual wall-clock performance?

  5. Contributions of the Paper
     • A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
     • Focus on understanding rather than on engineering
     • Are there inherent bottlenecks that may hinder parallelization, irrespective of which heuristics are used to share information?
     • A Practical Study: Interesting properties of Actual Proofs
     • Proofs generated by state-of-the-art SAT solvers contain narrow bottlenecks
     • Proof-Based Measures that capture Best-Case Parallelizability
     • Coarse measure: “Depth” of the proof graph
     • Refined measure: Makespan of a resource constrained scheduling problem
     • Empirical Findings: Correlations and Parallelization Limits
     • Typical sequential proofs are not very parallelizable even in the best case!
     • “Schedule speedup” / makespan correlates with observed speedup

  6. Approach: Proof Complexity (applied here to Typically Generated Proofs)
     • Proof Complexity [Cook & Reckhow, 1979]: Study the nature (e.g., size, depth, width, “shape”, etc.) of Proofs of Unsatisfiability
     • Resolution Graph of Conflict-Directed-Clause-Learning (CDCL) SAT Solvers:
       Runtime(any SAT solver, F) ≥ min over proofs of Size(Resolution proof of F)
     • Note: Insights applicable also to Satisfiable instances!
     • Solvers prove a lot of sub-formulas to be unsatisfiable before hitting the first solution
     • Formal characterization [Achlioptas et al, 2001 & 2004]
     • Study of Proofs has provided strong insights into CDCL SAT solvers (worst case / best case results)
     • What does “clause learning” bring?
     • What do “restarts” add?
     [Beame et al, 2004; Buss et al, 2008, 2012; Hertel et al, 2008; Pipatsrisawat et al, 2011]

  7. Underlying Inference Principle: Resolution
     • CDCL SAT solvers produce Resolution Derivations
     • Proof Graph and Depth:
     • Each initial and derived constraint is a node, annotated with its proof depth
     • proofdepth(initial clause C) = 0
     • proofdepth(derived clause C) = 1 + max over parents of proofdepth(parent(C))
     [Figure: example proof graph for F, each node labeled with constraint ID and depth. C1 to C6: depth 0; C7, C9: depth 1; C8, C11: depth 2; C10, C12: depth 3; C13: depth 4]
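
The depth recurrence on this slide can be sketched in a few lines of Python. This is only an illustration: the parent lists below are hypothetical, chosen so that the computed depths match the ones shown in the slide's figure (the actual edges are not recoverable from the transcript), and the function name `proof_depths` is an assumption.

```python
# Hypothetical parent lists; only the resulting depths come from the slide.
# Clauses are listed in derivation order (parents before children).
PARENTS = {
    "C1": [], "C2": [], "C3": [], "C4": [], "C5": [], "C6": [],
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C8": ["C7", "C5"],
    "C11": ["C9", "C6"],
    "C10": ["C8", "C9"],
    "C12": ["C11", "C7"],
    "C13": ["C10", "C12"],
}


def proof_depths(parents):
    """Return {clause: proofdepth}.

    Assumes `parents` is in derivation order, so every parent has
    already been assigned a depth when its child is visited.
    """
    depth = {}
    for clause, ps in parents.items():
        # Initial clauses have no parents and depth 0; a derived clause
        # sits one level below its deepest parent.
        depth[clause] = 0 if not ps else 1 + max(depth[p] for p in ps)
    return depth


if __name__ == "__main__":
    print(proof_depths(PARENTS))
```

Because a solver emits learned clauses in the order it derives them, a single pass in that order is enough and no recursion is needed.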

  8. How Parallelizable are Resolution Refutations?
     • Refutation(F) = Resolution Proof that derives the empty (“false”) clause
     • Depth of the proof clearly limits the amount of potential parallelization
     • Chain of dependencies
     • Theorem: Certain “pebbling” style instances have large depth **
     • However, the proofdepth bound on parallelization is very crude
     • Does not explain poor performance with small k (e.g., 8, 32, … processors)
     What does a typical sequential SAT solver proof look like?
     • Setup for Experiments:
     • Sequential Glucose 2.1 extended with proof output
     • GluSatX10: using SatX10 to run a k-processor version of Sequential Glucose (http://x10-lang.org/satx10 [IBM Teams: X10 and SAT/CSP])
     • Working Assumption: Proofs produced by GluSatX10 on k cores look “similar” to proofs produced by Sequential Glucose
     ** simplified statements; see paper for more formal notions

  9. Proof Graph Example: Very Complex Structure [Easy sequential case, solved in ~30 seconds]

  10. Bottlenecks in Typical SAT Proofs
     • Proofs Generated by SAT Solvers Exhibit Surprisingly Narrow “Bottlenecks”, i.e., Depths with Very Few (~1) Clauses!
     • Nothing deeper can be derived before bottleneck clauses ⇒ Sequentiality
     [Figure: number of clauses derived at each depth (log scale) vs. depth in the proof]
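
A minimal sketch of how such bottlenecks could be located, given the proof depth of every derived clause. The depth list is made-up illustration data, not taken from any solver run, and the names `width_by_depth` and `bottlenecks` are assumptions.

```python
from collections import Counter


def width_by_depth(derived_depths):
    """Count how many derived clauses sit at each proof depth."""
    return Counter(derived_depths)


def bottlenecks(derived_depths, max_width=1):
    """Return the depths at which at most `max_width` clauses are derived."""
    widths = width_by_depth(derived_depths)
    return sorted(d for d, w in widths.items() if w <= max_width)


if __name__ == "__main__":
    # Hypothetical depths of ten derived clauses: depth 2 holds a single
    # clause, so everything at depth 3 and beyond has to wait for it.
    depths = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
    print(sorted(width_by_depth(depths).items()))
    print(bottlenecks(depths))
```

Plotting the counts from `width_by_depth` on a log scale against depth gives the kind of profile the slide's figure describes.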

  11. Best-Case Parallelization with k Processors
     • Given Proof P and k Processors, Best-Case Parallelization of P = Resource Constrained Scheduling Problem with Precedences
     • Let Mk(P) = makespan of the optimal schedule of P on k processors
     • Even approximating Mk(P) within 4/3 is NP-hard, but a (2 - 1/k) approximation is easy
     • Best-Case k-processor speedup on P: Sk(P) = M1(P) / Mk(P)
     Example: M1(P) = 8, M2(P) = 5, M3(P) = 4, M4(P) = 4, …; depth = 4
     [Figure: proof graph with initial clauses C1 to C6 (depth 0) and derived clauses C7, C9, C’9 (depth 1), C8, C11 (depth 2), C10, C12 (depth 3), C13 (depth 4), annotated with the time step at which each derived clause is scheduled]
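
The "easy" approximation can be illustrated with greedy list scheduling, which is known to stay within a factor (2 - 1/k) of the optimal makespan (Graham's bound). The sketch below assumes each derivation takes one time unit and that initial clauses are available at time 0, so only derived clauses are scheduled. The edges are hypothetical, chosen so that the greedy schedule reproduces the slide's example values; `C9p` stands for C’9, and `greedy_makespan` is an illustrative name.

```python
# Hypothetical dependencies among the derived clauses of the slide's example
# (initial clauses C1..C6 are omitted because they are ready at time 0).
DERIVED_PARENTS = {
    "C7": [], "C9": [], "C9p": [],
    "C8": ["C7", "C9p"],
    "C11": ["C9"],
    "C10": ["C8"],
    "C12": ["C11"],
    "C13": ["C10", "C12"],
}


def greedy_makespan(parents, k):
    """Number of unit time steps used by greedy list scheduling on k processors.

    In every step, up to k clauses whose parents were all derived in
    earlier steps are derived in parallel. The result is an upper bound
    on the optimal makespan Mk(P).
    """
    done = set()
    remaining = list(parents)
    steps = 0
    while remaining:
        ready = [c for c in remaining if all(p in done for p in parents[c])]
        if not ready:
            raise ValueError("dependency cycle or missing parent")
        done.update(ready[:k])
        remaining = [c for c in remaining if c not in done]
        steps += 1
    return steps


if __name__ == "__main__":
    m1 = greedy_makespan(DERIVED_PARENTS, 1)
    for k in (1, 2, 3, 4):
        mk = greedy_makespan(DERIVED_PARENTS, k)
        print(f"k={k}: makespan={mk}, speedup={m1 / mk:.2f}")
```

On this small graph the greedy schedule happens to be optimal (8, 5, 4, 4 steps for k = 1, 2, 3, 4); in general it only bounds Mk(P) from above.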

  12. Makespan vs. Proof Depth
     • Schedule Makespan yields a finer grained lower bound, Sk(P), on best-case parallelization than proof depth
     • proofdepth(P): limit of parallelization of P with “infinite” processors
     • Mk(P) ≥ proofdepth(P)
     • Mk(P) → proofdepth(P) as k → ∞
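
As a short worked bound, assuming (as in the slide 11 example) that every derivation takes one time unit, so that M_1(P) equals the number of derived clauses: the makespan can never beat either the proof depth or a perfectly even split of the work across k processors, which caps the best-case speedup.

```latex
\[
  M_k(P) \;\ge\; \max\!\left(\mathrm{proofdepth}(P),\ \left\lceil \frac{M_1(P)}{k} \right\rceil\right)
  \quad\Longrightarrow\quad
  S_k(P) \;=\; \frac{M_1(P)}{M_k(P)} \;\le\; \min\!\left(k,\ \frac{M_1(P)}{\mathrm{proofdepth}(P)}\right)
\]
% Slide 11 example: M_1(P) = 8 and proofdepth(P) = 4
\[
  S_k(P) \;\le\; \frac{8}{4} \;=\; 2 \quad \text{for every } k
\]
```

This agrees with the example values M3(P) = M4(P) = 4: adding a fourth processor cannot push the speedup past 2.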

  13. Empirical Findings

  14. Even Best-Case Parallelization Efficiency is Low Beyond 100 Processors
     Best-Case Efficiency of parallelizing P with k processors = 100 * (Sk(P) / k)
     E.g., 100% = full utilization of k processors ⇒ speedup = k
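
The efficiency formula is easy to evaluate by hand; the sketch below applies it to the small example from slide 11 (M1(P) = 8, M2(P) = 5, M4(P) = 4). The function name `best_case_efficiency` is an assumption, not something from the paper.

```python
def best_case_efficiency(m1, mk, k):
    """Best-case efficiency in percent: 100 * (Sk / k), with Sk = M1 / Mk."""
    return 100.0 * m1 / (mk * k)


if __name__ == "__main__":
    # Makespans taken from the slide 11 example.
    print(best_case_efficiency(8, 5, 2))  # 80.0
    print(best_case_efficiency(8, 4, 4))  # 50.0
```

Even on this toy proof, doubling the processors from 2 to 4 drops the best-case efficiency from 80% to 50%, because the makespan is already pinned near the proof depth.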

  15. Proofs of Some Instances Exhibit Very Low Best-Case Schedule Speedup
     A) Even with 1024 processors, best-case speedup ~ 50-100
     B) 128 processors insufficient to achieve a speedup of ~ 90

  16. Best-Case Schedule Speedup Correlates With Actual Observed Runtime Speedup
     (Makes the study of the best-case schedule speedup relevant)
     [Figure: average over a sliding window]

  17. Summary
     • A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
     • Focus on understanding rather than on engineering
     • Main Findings:
     • Typical Sequential Refutations Contain Surprisingly Narrow Bottlenecks
     • Typical Sequential Refutations are Not Parallelizable Beyond a Few Processors, even in the best case of offline ‘schedule speedup’ produced in hindsight
     • Observed Runtime Speedup with k processors weakly correlates with Best-Case Schedule Speedup of a Sequential Proof produced in hindsight
     • Open Question: Can we design SAT solvers that generate Proofs that are inherently More Parallelizable?
     Caveat: assumption that proofs generated by GluSatX10 on k cores look “similar” to proofs generated by Sequential Glucose
