
One Flip per Clock Cycle

Martin Henz, Edgar Tan, Roland Yap


Presentation Transcript


  1. One Flip per Clock Cycle
     Martin Henz, Edgar Tan, Roland Yap

  2. SAT Problems
     Find an assignment of n variables that satisfies all m clauses (each clause is a disjunction of literals over the variables).
     Notation:
     • V: array of boolean values; V[3] is the value of the third variable in assignment V
     • EVAL_i(V): evaluation function of clause i; returns the boolean value that results from evaluating clause i under assignment V
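As a rough illustration of this notation, the following Python sketch evaluates clauses under an assignment. The signed-integer clause encoding (DIMACS-style) and the names `eval_clause` and `satisfies` are assumptions made for the example; they do not come from the slides.

```python
from typing import List

# Assumed encoding: a clause is a list of non-zero integers.
# +k stands for variable k, -k for the negation of variable k.
Clause = List[int]


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    """Plays the role of EVAL_i(V): true if any literal of the clause holds under V."""
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    """True if V satisfies all clauses of the formula."""
    return all(eval_clause(clause, V) for clause in cnf)


# V[0] is unused so that V[3] is the value of the third variable.
example_cnf = [[1, -2], [2, 3], [-1, -3]]
example_V = [False, True, True, False]
```

With these definitions, `satisfies(example_cnf, example_V)` evaluates to `True`.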

  3. GenSAT
     procedure GenSAT(cnf, maxtries, maxflips)
       for i = 1 to maxtries do
         INITASSIGN(V);
         for j = 1 to maxflips do
           if V satisfies cnf then
             return V
           else
             f := CHOOSEFLIP();
             V := V with variable f flipped
           end
         end
       end
     end
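The GenSAT skeleton can be sketched in software as follows. This is a minimal sketch, not the authors' implementation: the clause encoding (signed integers), the uniformly random INITASSIGN, the `None` result on failure, and the placeholder strategy `random_flip` are all assumptions made for illustration.

```python
import random


def eval_clause(clause, V):
    # Assumed encoding: +k is variable k, -k its negation; V[0] is unused.
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def satisfies(cnf, V):
    return all(eval_clause(clause, V) for clause in cnf)


def gensat(cnf, n, maxtries, maxflips, choose_flip, rng):
    """Generic local search: returns a satisfying assignment or None."""
    for _ in range(maxtries):
        # INITASSIGN: here a uniformly random assignment (an assumption).
        V = [False] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if satisfies(cnf, V):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]  # V := V with variable f flipped
    return None


def random_flip(cnf, V, rng):
    """Placeholder CHOOSEFLIP: picks any variable uniformly at random."""
    return rng.randrange(1, len(V))
```

Passing the flip strategy as a parameter mirrors the way the slides treat GSAT, WSAT and the other variants as instances of one procedure.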

  4. Instances of GenSAT
     • GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score
     • WSAT: CHOOSEFLIP randomly chooses a violated clause, and randomly chooses among the variables of that clause a flip that produces maximal score
     • GWSAT: choose randomly whether to do a GSAT flip or a WSAT flip
     • GSAT/Tabu: prevent quick flipping back
     • HSAT: use history for tie breaking: choose the least recently flipped variable
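The GSAT and WSAT rules from this list might look as follows in software. This is a sketch under assumptions: the score of a flip is taken to be the number of clauses satisfied after the flip, clauses are lists of signed integers, and the helper names (`score_after_flip`, `best_of`, `gsat_flip`, `wsat_flip`) are hypothetical.

```python
import random


def eval_clause(clause, V):
    # Assumed encoding: +k is variable k, -k its negation; V[0] is unused.
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def score_after_flip(cnf, V, i):
    """Number of satisfied clauses if variable i were flipped (V is restored)."""
    V[i] = not V[i]
    score = sum(eval_clause(clause, V) for clause in cnf)
    V[i] = not V[i]
    return score


def best_of(cnf, V, candidates, rng):
    """Random choice among the candidate variables with maximal score."""
    candidates = list(candidates)
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    top = max(scores.values())
    return rng.choice([i for i in candidates if scores[i] == top])


def gsat_flip(cnf, V, rng):
    """GSAT: consider every variable."""
    return best_of(cnf, V, range(1, len(V)), rng)


def wsat_flip(cnf, V, rng):
    """WSAT: consider only the variables of one random violated clause.

    Assumes at least one clause is violated, as is the case inside GenSAT.
    """
    violated = [clause for clause in cnf if not eval_clause(clause, V)]
    clause = rng.choice(violated)
    return best_of(cnf, V, [abs(lit) for lit in clause], rng)
```

For the formula `[[1, 2], [-1, 2], [3]]` and the assignment `[False, False, False, True]`, both functions select variable 2, since flipping it satisfies all three clauses.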

  5. FPGAs
     • ASICs: application-specific integrated circuits
       • customer describes logic behavior in a hardware description language such as VHDL
       • vendor designs and produces an integrated circuit with this behavior
     • Masked gate arrays
       • ASIC with transistors arranged in a grid-like manner
       • initially unconnected; mass produced
       • final conductor layers are added to connect the components
     • FPGAs: field-programmable gate arrays

  6. Current Line of FPGAs: Example
     • Xilinx XCV1000
       • 4 MBytes on-board RAM
       • max clock rate 300 MHz
       • max clock rate using on-board RAM 33 MHz
       • 6144 CLBs (configurable logic blocks)
       • roughly 1M system gates
       • 1 Mbit of distributed RAM
       • each CLB is divided into 2 slices
       • thus 12,288 slices available

  7. Programming FPGAs
     • Massively parallel computer with random access memory
     • Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…
     • In practice, hardware description languages like VHDL are used to program FPGAs
     • Newer development: Handel C

  8. NESL-like Syntax for Parallelism

     P                  gates for P               depth of P
     x := y + z         g(P) = O(1)               d(P) = O(1)
     Q; R               g(P) = g(Q) + g(R)        d(P) = d(Q) + d(R)
     {e(i) : i ∈ S}     g(P) = Σ_i g(e(i))        d(P) = max_i d(e(i))

  9. Example
     Let S be an array of statically known size n, where n is a power of 2.
     macro SUM(S, n):
       if n = 1 then S[0]
       else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)
     g(SUM(S, n)) = O(n)
     d(SUM(S, n)) = O(log n)
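The gate and depth bounds for SUM can be checked with a small software model. The sketch below is an illustration only: it counts one "gate" per addition and one level of depth per round of pairwise additions, which is an assumed reading of the cost model, and the function name `tree_sum` is hypothetical.

```python
def tree_sum(S):
    """Pairwise tree reduction; returns (total, adders_used, depth)."""
    n = len(S)
    # The macro assumes a statically known size that is a power of 2.
    assert n >= 1 and (n & (n - 1)) == 0, "size must be a power of 2"
    gates = 0
    depth = 0
    while len(S) > 1:
        # All additions of one round are independent, so in hardware they
        # can be placed side by side and evaluated at the same time.
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        gates += len(S)
        depth += 1
    return S[0], gates, depth
```

For eight inputs, `tree_sum([1, 2, 3, 4, 5, 6, 7, 8])` returns `(36, 7, 3)`: n - 1 = 7 adders arranged in log2(8) = 3 levels, in line with g = O(n) and d = O(log n).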

  10. Previous GSAT/FPGA Work
      • Hamadi/Merceron: first non-software design of a local search algorithm; CP 97
      • Yung/Seung/Lee/Leong: runtime reconfigurable version of the Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999

  11. Naïve Parallel GSAT (Hamadi/Merceron)
      Here V[x/i] denotes V with entry i replaced by x, so V[¬V[i]/i] is V with variable i flipped.
      macro CHOOSEFLIP(f):
        max := -1; f := -1;
        for i = 1 to n do
          score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
          if score > max ∨ (score = max ∧ RANDOMBIT()) then
            max := score; f := i
          end
        end
      g(CHOOSEFLIP(f)) = O(n m)
      d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)

  12. Step 1: Naïve Random GSAT
      macro CHOOSEFLIP(f):
        max := -1; f := -1;
        MaxV := { 0 : k ∈ [1…n] };
        for i = 1 to n do
          score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
          if score > max then
            max := score;
            MaxV := { 0 : k ∈ [1…n] }[1/i]
          else if score = max then
            MaxV := MaxV[1/i]
          end
        end
        f := CHOOSE_ONE(MaxV)
      g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
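The bit-vector bookkeeping in this step can be sketched in software as below. The function names `max_mask` and `choose_one` are hypothetical, indices are 0-based here (the slides count from 1), and `choose_one` simply picks uniformly among the marked positions, which is an assumption about what CHOOSE_ONE does rather than a description of the hardware circuit.

```python
import random


def max_mask(scores):
    """Marks with 1 every position that holds the maximal score."""
    best = -1
    mask = [0] * len(scores)
    for i, score in enumerate(scores):
        if score > best:
            # New maximum: reset the mask, then mark position i.
            best = score
            mask = [0] * len(scores)
            mask[i] = 1
        elif score == best:
            # Tie with the current maximum: mark position i as well.
            mask[i] = 1
    return mask


def choose_one(mask, rng):
    """Uniform random choice among the marked positions."""
    return rng.choice([i for i, bit in enumerate(mask) if bit])
```

For example, `max_mask([2, 5, 1, 5])` gives `[0, 1, 0, 1]`, and `choose_one` then returns 1 or 3 with equal probability. Collecting all maximal candidates first avoids the bias of a per-tie coin flip, where a later tied candidate replaces an earlier one with probability 1/2 each time.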

  13. Step 2: Parallel Variable Scoring
      macro CHOOSEFLIP(f):
        Scores := { SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] }) : i ∈ [1…n] };
        f := CHOOSE_MAX(Scores);
      d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)
      g(CHOOSEFLIP(f)) = O(m n²)
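One way to see where the O(log n) term in the depth comes from is to model CHOOSE_MAX as a tournament over (score, index) pairs. The sketch below is an assumption about its structure, not the circuit from the slides: it requires a power-of-2 input size and resolves ties toward the lower index, whereas the slides' CHOOSE_MAX presumably breaks ties randomly.

```python
def choose_max(scores):
    """Tournament reduction; returns (index_of_a_maximum, depth)."""
    n = len(scores)
    assert n >= 1 and (n & (n - 1)) == 0, "size must be a power of 2"
    entries = [(score, i) for i, score in enumerate(scores)]
    depth = 0
    while len(entries) > 1:
        # Each round compares disjoint pairs, so all comparisons of a
        # round are independent and could run in parallel in hardware.
        entries = [
            max(entries[2 * k], entries[2 * k + 1], key=lambda e: e[0])
            for k in range(len(entries) // 2)
        ]
        depth += 1
    return entries[0][1], depth
```

With `choose_max([3, 7, 2, 7])` the result is `(1, 2)`: index 1 wins the tie against index 3, and four candidates need two rounds of comparisons.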

  14. Step 3: Relative Scoring
      • Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.
      • First thorough analysis of relative scoring in Hoos’ Diplomarbeit
      • Idea: after every flip, update the score of those variables that are affected by the flip.
      • Since clauses are small, the number of affected variables is much smaller than the overall number of variables.

  15. Some Notation
      • NCl[i] is the number of clauses that contain the variable i
      • MaxClauses = max_i NCl[i]; usually MaxClauses << m
      • MaxVariables = max_j (number of variables in clause j)
      • EVAL_j^C(i) evaluates the j-th clause from the set of clauses that contain the variable i

  16. Relative Scoring
      macro CHOOSE_FLIP(f):
        NewS := { SUM({ EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
        OldS := { SUM({ EVAL_j^C(i)(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
        Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
        f := CHOOSE_MAX(Diff)
      g(CHOOSE_FLIP(f)) = O(MaxVariables MaxClauses n)
      d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
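The NewS/OldS/Diff computation can be mimicked in software with per-variable occurrence lists. This is a sketch under assumptions: clauses are lists of signed integers, `occurrence_lists` and `diff_scores` are hypothetical names, and the loop over variables stands in for what the hardware would evaluate for all i at once.

```python
def eval_clause(clause, V):
    # Assumed encoding: +k is variable k, -k its negation; V[0] is unused.
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def occurrence_lists(cnf, n):
    """occ[i] lists the clauses that contain variable i; NCl[i] = len(occ[i])."""
    occ = [[] for _ in range(n + 1)]
    for clause in cnf:
        for var in {abs(lit) for lit in clause}:
            occ[var].append(clause)
    return occ


def diff_scores(occ, V):
    """Diff[i] = NewS[i] - OldS[i], using only the clauses that contain i."""
    diff = [0] * len(V)
    for i in range(1, len(V)):
        old = sum(eval_clause(clause, V) for clause in occ[i])
        V[i] = not V[i]
        new = sum(eval_clause(clause, V) for clause in occ[i])
        V[i] = not V[i]  # restore the assignment
        diff[i] = new - old
    return diff
```

For the formula `[[1, 2], [-1, 2], [3]]` and the assignment `[False, False, False, True]`, `diff_scores` returns `[0, 0, 1, -1]`. Clauses that do not contain variable i evaluate the same before and after flipping i, so the variable with maximal Diff is also the variable with maximal full score.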

  17. Step 4: Pipelining
      procedure GenSAT(cnf, maxtries, maxflips)
        for i = 1 to maxtries do
          INITASSIGN(V);
          for j = 1 to maxflips do
            if V satisfies cnf then
              return V
            else
              f := CHOOSEFLIP();
              V := V with variable f flipped
            end
          end
        end
      end

  18. Pipelining Outer Loop
      STAGE I | STAGE II | STAGE III | STAGE IV
      macro CHOOSE_FLIP(f):
        NewS := { SUM({ EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
        OldS := { SUM({ EVAL_j^C(i)(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
        Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
        f := CHOOSE_MAX(Diff)

      Try 1   S I   S II  S III S IV  S I   S II  S III S IV  S I   S II  …
      Try 2         S I   S II  S III S IV  S I   S II  S III S IV  S I   …
      Try 3               S I   S II  S III S IV  S I   S II  S III S IV  …
      Try 4                     S I   S II  S III S IV  S I   S II  S III …
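The staggered schedule can be reproduced with a short simulation. The sketch below treats the four stages as opaque steps and only models timing; the names `schedule` and `flips_completed`, and the convention that a flip completes when a try leaves stage IV, are assumptions made for illustration.

```python
STAGES = ["I", "II", "III", "IV"]


def schedule(num_tries, num_cycles):
    """rows[k][t] is the stage that try k occupies in cycle t (None before it starts)."""
    rows = []
    for k in range(num_tries):
        # Try k enters the pipeline k cycles after try 0 and then keeps
        # cycling through the four stages, one flip per pass.
        rows.append([
            STAGES[(t - k) % len(STAGES)] if t >= k else None
            for t in range(num_cycles)
        ])
    return rows


def flips_completed(rows):
    """Number of tries finishing stage IV in each cycle."""
    return [sum(row[t] == "IV" for row in rows) for t in range(len(rows[0]))]
```

With four tries, `flips_completed(schedule(4, 10))` is `[0, 0, 0, 1, 1, 1, 1, 1, 1, 1]`: once the pipeline is full, one flip completes in every clock cycle. With a single try the result is `[0, 0, 0, 1, 0, 0, 0, 1, 0, 0]`, because each flip of a try depends on the previous flip of that same try, so its stages cannot overlap. Interleaving independent tries is what keeps every stage busy.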

  19. Preliminary Experiments
      • Conducted on the hill-climbing variant of GSAT
      • Comparing the software implementation by Selman/Kautz with Hamadi/Merceron and Step 4
      • Software: running on a Pentium II at 400 MHz
      • FPGA: running on a Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica

  20. Flips per Second

  21. Flips per Slice Second

  22. Conclusions
      • Fastest known one-chip implementation of GSAT
        • using parallel relative scoring plus pipelining
      • Current size and speed make it feasible to use FPGAs as platforms for parallel algorithms
      • FPGAs are one-chip parallel machines with serious limitations in programmability
        • higher-level languages needed
        • stack support needed: towards compiling parallel languages to hardware
