One Flip per Clock Cycle
Martin Henz, Edgar Tan, Roland Yap
SAT Problems
Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables)
Notation:
• V: array of boolean values; V[3] is the value of the third variable in assignment V
• EVAL_i(V): evaluation function of clause i; returns the boolean value resulting from evaluating clause i under assignment V
GenSAT
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then
        return V
      else
        f := CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
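The GenSAT loop can be sketched in software as follows. This is an illustrative Python version, not the hardware design: the signed-integer literal encoding, the random initial assignment, the `seed` parameter, and the greedy `choose_flip_gsat` default are all assumptions.

```python
import random

def num_satisfied(cnf, V):
    """Score of V: number of clauses with at least one true literal."""
    return sum(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)

def choose_flip_gsat(cnf, V, rng):
    """Pick, at random among ties, a variable whose flip maximises the score."""
    scores = {}
    for i in range(1, len(V)):
        V[i] = not V[i]                 # tentatively flip variable i
        scores[i] = num_satisfied(cnf, V)
        V[i] = not V[i]                 # undo the flip
    best = max(scores.values())
    return rng.choice([i for i, s in scores.items() if s == best])

def gensat(cnf, n, maxtries, maxflips, choose_flip=choose_flip_gsat, seed=0):
    rng = random.Random(seed)
    for _ in range(maxtries):
        # INITASSIGN: assumed here to be a uniformly random assignment
        V = [None] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if num_satisfied(cnf, V) == len(cnf):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]
    return None                          # no satisfying assignment found
```

As in the pseudocode, satisfaction is checked before each flip, so an assignment produced by the very last flip of a try is not tested.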
Instances of GenSAT
• GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score
• WSAT: CHOOSEFLIP randomly chooses a violated clause, and randomly chooses among the variables of that clause a flip that produces maximal score
• GWSAT: choose randomly whether to do GSAT flip or WSAT flip
• GSAT/Tabu: prevent quick flipping back
• HSAT: use history for tie breaking: choose least recently flipped variable
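The first three variants differ only in how CHOOSEFLIP picks its candidates, which the following Python sketch tries to make explicit. The function names, the signed-integer literal encoding, and the mixing probability `p_walk` are assumptions; the slides only say that GWSAT chooses randomly between the two kinds of flip. GSAT/Tabu and HSAT need extra flip-history state and are not shown.

```python
import random

def num_satisfied(cnf, V):
    return sum(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)

def score_after_flip(cnf, V, i):
    """Score of V with variable i flipped; V is restored before returning."""
    V[i] = not V[i]
    s = num_satisfied(cnf, V)
    V[i] = not V[i]
    return s

def best_of(cnf, V, candidates, rng):
    """Random choice among the candidates whose flip gives maximal score."""
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    best = max(scores.values())
    return rng.choice([i for i in scores if scores[i] == best])

def choose_flip_gsat(cnf, V, rng):
    return best_of(cnf, V, range(1, len(V)), rng)

def choose_flip_wsat(cnf, V, rng):
    # Assumes V violates at least one clause, as is the case inside GenSAT.
    violated = [c for c in cnf if not any(V[abs(l)] == (l > 0) for l in c)]
    clause = rng.choice(violated)
    return best_of(cnf, V, sorted({abs(l) for l in clause}), rng)

def choose_flip_gwsat(cnf, V, rng, p_walk=0.5):
    # p_walk is a hypothetical parameter, not taken from the slides.
    if rng.random() < p_walk:
        return choose_flip_wsat(cnf, V, rng)
    return choose_flip_gsat(cnf, V, rng)
```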
FPGAs
• ASICs: application-specific integrated circuits
  • customer describes logic behavior in a hardware description language such as VHDL
  • vendor designs and produces integrated circuit with this behavior
• Masked gate arrays
  • ASIC with transistors arranged in a grid-like manner
  • initially unconnected; mass produced
  • add final conductor layers for connecting components
• FPGAs: field-programmable gate arrays
Current Line of FPGAs: Example
• Xilinx XCV1000
  • 4 MBytes on-board RAM
  • max clock rate 300 MHz
  • max clock rate using on-board RAM 33 MHz
  • 6144 CLBs (configurable logic blocks)
  • roughly 1M system gates
  • 1 Mbit of distributed RAM
  • each CLB is divided into 2 slices
  • thus 12,288 slices available
Programming FPGAs
• Massively parallel computer with random access memory
• Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…
• In practice, hardware description languages like VHDL are used to program FPGAs
• Newer development: Handel-C
NESL-like Syntax for Parallelism
P                  gates for P              depth of P
x := y + z         g(P) = O(1)              d(P) = O(1)
Q; R               g(P) = g(Q) + g(R)       d(P) = d(Q) + d(R)
{e(i) : i ∈ S}     g(P) = Σ_i g(e(i))       d(P) = max_i d(e(i))
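These composition rules can be written as a tiny cost calculator. The sketch below is only an illustration: the `Cost` class, `UNIT`, `seq`, and `par` are hypothetical names, and a single operator is counted as exactly one gate and one level in place of O(1).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cost:
    gates: int   # g(P)
    depth: int   # d(P)

# Stands in for O(1): one primitive operator such as x := y + z.
UNIT = Cost(1, 1)

def seq(q, r):
    """Q; R: both gate counts and depths add up."""
    return Cost(q.gates + r.gates, q.depth + r.depth)

def par(costs):
    """{e(i) : i in S}: gate counts add up, depth is the maximum."""
    costs = list(costs)
    return Cost(sum(c.gates for c in costs), max(c.depth for c in costs))
```

For example, `par([UNIT] * 8)` costs 8 gates at depth 1, while chaining two such blocks with `seq` doubles both figures.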
Example
Let S be an array of statically known size n, where n is a power of 2.
macro SUM(S, n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)
g(SUM(S, n)) = O(n)
d(SUM(S, n)) = O(log n)
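To see where the O(n) and O(log n) bounds come from, the SUM macro can be mirrored in Python while counting adders and levels. The function name `tree_sum` and the returned triple are assumptions made for this sketch.

```python
def tree_sum(S):
    """Pairwise summation of S, whose length must be a power of 2.

    Returns (total, adders, depth), where adders counts the '+' operators
    instantiated and depth counts the levels of the adder tree.
    """
    n = len(S)
    assert n >= 1 and n & (n - 1) == 0, "length must be a power of 2"
    if n == 1:
        return S[0], 0, 0
    # One level of the tree: n/2 additions that can all run in parallel.
    level = [S[2 * i] + S[2 * i + 1] for i in range(n // 2)]
    total, adders, depth = tree_sum(level)
    return total, adders + n // 2, depth + 1
```

For n = 8 this uses 4 + 2 + 1 = 7 adders arranged in 3 levels, that is, n - 1 gates at depth log2 n.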
Previous GSAT/FPGA Work
• Hamadi/Merceron: first non-software design of a local search algorithm; CP 97
• Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999
Naïve Parallel GSAT (Hamadi/Merceron)
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
    end
  end
Here V[x/i] denotes V with the value at position i replaced by x.
g(CHOOSEFLIP(f)) = O(n m)
d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
Step 1: Naïve Random GSAT
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := {0 : k ∈ [1…n]};
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max then
      max := score;
      MaxV := {0 : k ∈ [1…n]}[1/i]
    else if score = max then
      MaxV := MaxV[1/i]
    end
  end
  f := CHOOSE_ONE(MaxV)
g and d are unchanged; for CHOOSE_ONE, d = O(log n) and g = O(n)
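A software rendering of this step may help in reading the MaxV bit vector. The Python sketch below is an assumption-laden illustration: literals are signed integers, V[0] is unused, and CHOOSE_ONE is modelled as a uniform random pick among the set bits, which the slides do not spell out.

```python
import random

def num_satisfied(cnf, V):
    return sum(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)

def choose_flip_maxv(cnf, V, rng):
    """Return (f, max_v): the chosen variable and the bit vector of maximal flips."""
    n = len(V) - 1
    best = -1
    max_v = [0] * (n + 1)               # MaxV, index 0 unused
    for i in range(1, n + 1):
        V[i] = not V[i]                 # V with variable i flipped
        score = num_satisfied(cnf, V)
        V[i] = not V[i]
        if score > best:
            best = score
            max_v = [0] * (n + 1)       # reset, then mark only i
            max_v[i] = 1
        elif score == best:
            max_v[i] = 1                # i ties with the current maximum
    # CHOOSE_ONE, assumed uniform over the marked positions
    f = rng.choice([i for i in range(1, n + 1) if max_v[i]])
    return f, max_v
```

Collecting all maximal candidates first and choosing once at the end gives every tied variable the same chance, whereas a per-iteration random bit favors the variables visited last.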
Step 2: Parallel Variable Scoring
macro CHOOSEFLIP(f):
  Scores := { SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]}) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);
d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)
g(CHOOSEFLIP(f)) = O(m n²)
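The log n term in the depth comes from selecting the maximum. The slides do not define CHOOSE_MAX, so the following Python sketch is only one plausible reading: a pairwise tournament over a power-of-two number of scores. It breaks ties toward the lower index, which is a simplification of the random choice GSAT calls for, and it returns 0-based indices.

```python
def choose_max(scores):
    """Tournament argmax; len(scores) must be a power of 2.

    Returns (index, depth): the 0-based position of a maximal score and
    the number of comparator levels used.
    """
    level = list(enumerate(scores))     # (index, score) pairs
    depth = 0
    while len(level) > 1:
        # All comparisons within one level are independent of each other.
        level = [a if a[1] >= b[1] else b
                 for a, b in zip(level[0::2], level[1::2])]
        depth += 1
    return level[0][0], depth
```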
Step 3: Relative Scoring
• Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.
• First thorough analysis of relative scoring in Hoos' Diplomarbeit (diploma thesis)
• Idea: After every flip, update the score of those variables that are affected by the flip.
• Since clauses are small, the number of affected variables is much smaller than the overall number of variables
Some Notation
• NCl[i] is the number of clauses that contain the variable i
• MaxClauses = max_i NCl[i]; usually MaxClauses << m
• MaxVariables = max_j (number of variables in clause j)
• EVAL_j^{C(i)} evaluates the j-th clause from the set of clauses that contain the variable i
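These quantities can be computed directly from a clause list. In the Python sketch below, the function name `occurrence_stats`, the signed-integer literal encoding, and the occurrence lists `occ` (playing the role of the clause sets C(i)) are assumptions for illustration.

```python
def occurrence_stats(cnf, n):
    """Return (occ, ncl, max_clauses, max_variables) for a formula over n variables.

    occ[i] lists the indices of the clauses that contain variable i,
    ncl[i] is NCl[i]; index 0 of both lists is unused.
    """
    occ = [[] for _ in range(n + 1)]
    for j, clause in enumerate(cnf):
        for v in {abs(l) for l in clause}:   # each variable once per clause
            occ[v].append(j)
    ncl = [len(o) for o in occ]
    max_clauses = max(ncl[1:])
    max_variables = max(len({abs(l) for l in c}) for c in cnf)
    return occ, ncl, max_clauses, max_variables
```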
Relative Scoring
macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
g(CHOOSE_FLIP(f)) = O(MaxVariables MaxClauses n)
d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
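A sequential Python sketch of the same computation shows why only the clauses containing variable i matter: every other clause contributes equally to the old and new score and cancels in the difference. The names `relative_scores` and `eval_clause` and the signed-integer literal encoding are assumptions; the hardware evaluates all i in parallel, whereas this loop does not.

```python
def eval_clause(clause, V):
    return any(V[abs(l)] == (l > 0) for l in clause)

def relative_scores(cnf, V):
    """Diff[i] = change in the number of satisfied clauses if variable i is flipped."""
    n = len(V) - 1
    occ = [[] for _ in range(n + 1)]     # occ[i]: clauses containing variable i
    for j, clause in enumerate(cnf):
        for v in {abs(l) for l in clause}:
            occ[v].append(j)
    diff = [0] * (n + 1)                 # index 0 unused
    for i in range(1, n + 1):
        old_s = sum(eval_clause(cnf[j], V) for j in occ[i])   # OldS[i]
        V[i] = not V[i]
        new_s = sum(eval_clause(cnf[j], V) for j in occ[i])   # NewS[i]
        V[i] = not V[i]
        diff[i] = new_s - old_s
    return diff
```

Choosing the variable with maximal `diff[i]` selects the same flip as choosing the maximal absolute score, since the two differ by the same constant for every i.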
Step 4: Pipelining
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then
        return V
      else
        f := CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
Pipelining Outer Loop
Pipeline stages: STAGE I, STAGE II, STAGE III, STAGE IV
macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
Try 1  S I   S II  S III S IV  S I   S II  S III S IV  S I   S II  …
Try 2        S I   S II  S III S IV  S I   S II  S III S IV  S I   …
Try 3              S I   S II  S III S IV  S I   S II  S III S IV  …
Try 4                    S I   S II  S III S IV  S I   S II  S III …
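The interleaving of tries can be reproduced with a small schedule generator. This Python sketch assumes that every try advances exactly one stage per clock cycle and that each try enters the pipeline one cycle after the previous one, which is how the staggered rows read; the names `schedule` and `flips_completed` are hypothetical.

```python
STAGES = ["S I", "S II", "S III", "S IV"]

def schedule(num_tries, num_cycles):
    """Row t lists the stage occupied by try t in each cycle (None before it starts)."""
    rows = []
    for t in range(num_tries):
        rows.append([STAGES[(c - t) % len(STAGES)] if c >= t else None
                     for c in range(num_cycles)])
    return rows

def flips_completed(num_tries, num_cycles):
    """Number of times some try leaves the last stage within num_cycles."""
    return sum(row.count("S IV") for row in schedule(num_tries, num_cycles))
```

With four tries and four stages, every stage is busy in every cycle once the pipeline has filled, so one try completes a flip in each clock cycle from then on.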
Preliminary Experiments
• Conducted on hill-climbing variant of GSAT
• Comparing software implementation by Selman/Kautz with Hamadi/Merceron and Step 4
• Software: running on Pentium II at 400 MHz
• FPGA: running on Xilinx XCV1000 at 20 MHz; programmed using Handel-C by Celoxica
Conclusions
• Fastest known one-chip implementation of GSAT
  • using parallel relative scoring plus pipelining
• Current size and speed make it feasible to use FPGAs as platforms for parallel algorithms
• FPGAs are one-chip parallel machines with serious limitations of programmability
  • higher-level languages needed
  • stack support needed: towards compiling parallel languages to hardware