
Instance-Specific Accelerators for Minimum Covering



Presentation Transcript


  1. Instance-Specific Accelerators for Minimum Covering By: Desmond Correia, Mathew Sonke University of Guelph: School Of Engineering

  2. Outline • Background Information • What is Instance Specific Hardware • The Problem • Solving the problem • Hardware Approach • Accelerator Architectures • State Machines • Adapting to other problems • Experimentation • Simulation • Implementation • Discussion • Conclusion

  3. Background Information

  4. What is Instance-Specific Hardware • Hardware generated on the fly • Optimized for the algorithm • Optimized for the input data • Formally • Generates circuits on the fly that depend on the problem instance rather than the problem class • Useful when there is • A need for fine-grained operators • Lots of parallelism • A long software run time

  5. Instance Specific Hardware • Shaded blocks denote steps that are part of the accelerator’s runtime • Dynamically Compiles • Dynamically Configures • New problem = New hardware

  6. What’s The Problem? • Boolean Satisfiability Problem (SAT) • Given a Boolean formula, find a variable assignment that evaluates it to 1 • F = (a + b)(a’ + b’ + c) = 1. One solution: a = 1, c = 1 • The formula must be in Conjunctive Normal Form (CNF) • Minimum-Cost Covering Problem • Given a universal set: U = {1,2,3,4,5} • Given a set of subsets: S = { {1,2,3}, {2,4}, {3,4}, {4,5} } • Find the smallest collection of subsets whose union contains all elements of U • T = { {1,2,3}, {4,5} }
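A minimal brute-force sketch of the covering example above; the function name and structure are illustrative, not from the paper:

```python
from itertools import combinations

def min_cover(universe, subsets):
    # Try collections of increasing size; the first hit is a minimum cover.
    for k in range(1, len(subsets) + 1):
        for combo in combinations(subsets, k):
            if set().union(*combo) >= universe:
                return list(combo)
    return None  # no cover exists

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
cover = min_cover(U, S)  # the slide's T = { {1,2,3}, {4,5} }
```

Brute force is exponential, which is exactly why the paper turns to hardware acceleration for larger instances.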

  7. Minimum-Cost Covering Problem • Try to cover a,b,c,d,e • Best Cover • S2, S4, S5 • Cost = 0.2+0.1+0.2 = 0.5

  8. Why is the Problem Important? • Traveling Salesman problem • Shortest route to visit all the cities

  9. Additional Applications • Scheduling of airline crews • 2-level logic synthesis • Given a set of minterms F(A,B,C,E) = ∑ m(0,2,7,10,11) • What is the optimal circuit for this? • Becomes a covering problem in order to generate an optimal Boolean function • Placement and routing in FPGAs • Decide the location of each block while trying to minimize the total interconnect length

  10. Matching Problem To Hardware • The SAT problem is a combinatorial problem • NP-complete (nondeterministic polynomial time) • No known polynomial-time algorithm • Combinatorial problems exhibit • Lots of parallelism • Often very long runtimes • A need for fine-grained operators (XORing, ANDing, etc.) • Instance-specific accelerators are a perfect fit

  11. The Goal • Paper targets discrete optimization problems • Concentrates on exact solvers for the minimum-cost covering problem • Global optimum solution • The minimum-cost covering problem is regarded as a minimum-cost SAT problem • Find a satisfying solution for a CNF that minimizes a linear cost function over the variables • Paper published in 2003 in The Journal of Supercomputing

  12. Solving The Problem • A = covering matrix • V = current variable • B = current lowest-cost solution • Iteratively reduce the matrix • Select essential columns (and remove the rows they cover) • Remove dominating rows • Remove dominated columns

  13. Solving The Problem • When no more reductions are possible • Need to compute the cost bound • cost(v) + cost(minimum number of rows required to cover the remaining columns) • Branch if cost(v) ≤ b AND uncovered rows exist • Select a variable • Assign it 1 and 0 • In both cases the matrix is modified and the algorithm is called recursively
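The branching scheme above can be sketched as a recursive branch-and-bound with unit costs; the matrix reductions are omitted for brevity, and the lower bound and variable-selection heuristic here are illustrative choices, not the paper's:

```python
import math

def bnb(sets, uncovered, cost, best_cost, best, chosen):
    """Unit-cost covering: `sets` maps a variable name to the elements it covers."""
    if not uncovered:
        return cost, list(chosen)          # new best (worse paths were pruned below)
    if not sets:
        return best_cost, best             # nothing left to branch on
    biggest = max(len(s & uncovered) for s in sets.values())
    # cost bound: current cost + a lower bound on the sets still required
    if biggest == 0 or cost + math.ceil(len(uncovered) / biggest) >= best_cost:
        return best_cost, best             # bound exceeded: prune this branch
    v = max(sets, key=lambda k: len(sets[k] & uncovered))
    rest = {k: s for k, s in sets.items() if k != v}
    chosen.append(v)                       # branch v = 1: take the set
    best_cost, best = bnb(rest, uncovered - sets[v], cost + 1, best_cost, best, chosen)
    chosen.pop()                           # branch v = 0: skip the set
    return bnb(rest, uncovered, cost, best_cost, best, chosen)

S = {'S1': {1, 2, 3}, 'S2': {2, 4}, 'S3': {3, 4}, 'S4': {4, 5}}
cost, cover = bnb(S, {1, 2, 3, 4, 5}, 0, math.inf, None, [])
```

Each recursion level corresponds to one of the state machines in the hardware architecture described next.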

  14. Hardware Approach

  15. Accelerator Architectures • State machines: SM1 … SMn • Control variable values • Each implements one search level of the branch-and-bound algorithm • Each SM is connected to its immediate neighbours • Branching activates the next SM • Backtracking activates the previous SM • Output of the SMs: current variable values

  16. Accelerator Architectures • Checkers • Deduce information from the partial variable assignment • Help determine whether to backtrack or continue • CNF checker • Don’t-care checker • Essential checker • Dominated-column checker • All run in parallel

  17. Accelerator Architectures • Cost counter • Computes the cost of the current partial assignment • Controller • Initializes the search procedure • Stops the search procedure • Computes the cost bound

  18. Backtracking with 3-Valued Logic • Model to help with branching and backtracking • Three values: {0, 1, X} • X denotes an unassigned variable • Allows analysis of partial assignments • Uses 3-valued logic to model • The clauses (a + b) and (a’ + b’ + c) • The variables a, b, c • The CNF F = (a + b)(a’ + b’ + c)

  19. Backtracking with 3-Valued Logic: How It Works • All variables are initially X • After a value assignment, the CNF checker inspects the result • CNF is 0: backtrack, i.e., the partial assignment is not satisfiable • CNF is 1: a valid cover is found • If the cover has the least cost so far, save the variable assignment • In both cases (CNF = 0 or CNF = 1) • Backtracking occurs to continue the search on another path • Exploration of the solution space

  20. Backtracking with 3-Valued Logic: How It Works • CNF is X: continue searching on the current path • Depending on the checker and cost-bound results • Continue the search with a different value • The state machine changes its assignment • Continue the search by branching • Trigger the next state machine • Backtrack • Trigger the previous state machine

  21. CNF (Conjunctive Normal Form) Checker • Input vector: Current variable assignment • Clauses evaluated individually • (a + b) (a’ + b’ + c) • Results are ANDed together • Output: Single 3-valued logic signal • {1, 0, X }
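The clause-then-AND structure above can be sketched with a 3-valued evaluator; modeling X as Python's None and clauses as (name, negated) pairs are illustrative representation choices:

```python
X = None  # the unassigned value

def lit(value, negated):
    # Evaluate a literal: X stays X; otherwise apply the optional negation.
    if value is X:
        return X
    return 1 - value if negated else value

def eval_clause(literals, assign):
    # 3-valued OR: 1 dominates, X beats 0.
    vals = [lit(assign[v], neg) for v, neg in literals]
    if 1 in vals:
        return 1
    return X if X in vals else 0

def eval_cnf(clauses, assign):
    # 3-valued AND: 0 dominates, X beats 1.
    vals = [eval_clause(c, assign) for c in clauses]
    if 0 in vals:
        return 0
    return X if X in vals else 1

# F = (a + b)(a' + b' + c) from the slides
F = [[('a', False), ('b', False)],
     [('a', True), ('b', True), ('c', False)]]
```

In the hardware each clause evaluator and the final AND are pure combinational logic, so all clauses are checked in a single cycle rather than sequentially as here.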

  22. Reduction Techniques • Reduction checkers • Don’t cares • Essential columns • Dominated columns • Outputs: 2-valued Boolean logic • Implemented in purely combinational logic • Derived from the CNF at compile time • A function of the current variable assignment

  23. Don’t Care • Shares hardware with the CNF checker • The CNF checker computes 3-valued logic • The don’t-care checker only uses the logic for {1, 0} • A variable set to ‘0’ indicates a don’t care • Don’t cares are derived from the clauses and the covering matrix

  24. Essential Columns Checker • Generates the essential condition for each variable • Example: to make V4 essential • Set V3 = 0 • Reason: V4 is then the only way to cover e4
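The condition above can be sketched as a set-level check; the instance below is hypothetical, chosen only to reproduce the V3/V4/e4 situation from the slide:

```python
def essential(sets, excluded, universe):
    # A variable is essential when it is the only non-excluded set
    # covering some element of the universe.
    ess = set()
    for e in universe:
        covering = [v for v, s in sets.items() if e in s and v not in excluded]
        if len(covering) == 1:
            ess.add(covering[0])
    return ess

# Hypothetical instance: once V3 is forced to 0, only V4 can cover e4.
sets = {'V3': {'e4'}, 'V4': {'e4', 'e5'}, 'V5': {'e5'}}
```

The hardware checker precomputes this condition per variable as combinational logic at compile time instead of scanning sets at runtime.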

  25. Dominated Column Checker • A variable corresponding to a dominated column is set to ‘0’ • The module implements logic for each variable • Indicating the dominated condition • Evaluated when the state machine for that variable is activated • Only considers the columns covered by that variable • NOTE: “column” here refers to a row in the matrix presented earlier

  26. Cost Counter • Approach • The algorithm uses unit costs: every variable takes the value 0 or 1, and the cost is the number of variables set to 1 • A new cost bound must be computed after every single variable assignment • Implementation • n-bit parallel counter • An adder tree that sums up n single-bit inputs • Leverages the FPGA’s fast carry-chain routing • n input bits result in l = log2(n) levels • Time delay: Tctr = (l(l+1)/2) · Tadder
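The timing estimate above can be checked numerically; Tadder is treated as a placeholder unit delay, not a figure from the paper:

```python
import math

def counter_delay(n, t_adder=1.0):
    # An n-input parallel counter built as an adder tree has
    # l = ceil(log2(n)) levels; the slide's delay estimate is
    # T_ctr = (l * (l + 1) / 2) * T_adder.
    l = math.ceil(math.log2(n))
    return (l * (l + 1) / 2) * t_adder
```

For example, 64 variables give l = 6 levels and a delay of 21 adder delays, so the counter depth grows only logarithmically with the number of variables.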

  27. Cost Bound • Very simple implementation • Cost bound = current_best_cost – 1 • No estimation of the cost contributed by variables not yet assigned

  28. State Machines • Linear array of identical State Machines • Connections • From Top and To Top • From Below and To Below • Set 0 (Don’t care or Dominated Column) • Set 1 (Essential) • CNF Flag (1 or X) • Cost Exceeded Flag

  29. State Machines (state diagram, flattened):
  Assign X
    If FT and not ST0:
      Assign 1
      If CNF = X and not CEX: TB; if FB:
        Assign 0
        If CNF = X and not CEX: TB; if FB: Assign X, TT (backtrack)
        Else: backtrack
      Else: backtrack
    Else if FT and ST0:
      Assign 0
      TB; if FB: backtrack

  30. Adapting to Other Problems • Reduction • Encapsulated into checker modules • Cost Bound • Encapsulated into controller module • Cost Counter • Unit cost can be replaced with integer cost by replacing Cost Counter with Cost Adder module

  31. Experimentation

  32. Testbench • Problems contained in DIMACS CNF file format • Code Generation • Perl program generates VHDL for each problem • VHDL code templates used for generic parts • Augmented with generated code for instance specific parts • Tools • Synopsys FPGA Compiler II • Xilinx 4.1i backend

  33. Testbench • Problems • 16 small and 5 medium-sized problems from ESPRESSO-EXACT distribution • Problems have between 4 and 62 variables, and 4 to 70 clauses • Benchmarking • ESPRESSO-EXACT configured to output Cyclic Cores • Gives us the covering matrices just before first branch and after first round of reductions

  34. Simulation • Performed using the ModelSim VHDL simulator • Benchmark specifics • Compares the number of clauses and the cost of the optimal solution against the number of cycles • Raw speedup: Sraw = tsw / thw • Software run on a Sun Ultra 10 440 MHz workstation with 512 MB RAM • Hardware assumes a clock rate of 25 MHz

  35. Simulation

  36. Implementation • Platform: PC with PCI carrier board SMT320 • Accelerator: FPGA TIM SMT358 • Xilinx Virtex XCV1000-BG560-4 device with 12288 slices • Achieved clock rate of 30-50MHz • Generation Time • On the order of minutes • No optimizations or constraints specified

  37. Generation Time: AMD example • Code Generation: 4 s • Circuit Synthesis: 160 s • Place and Route: 360 s • Results Readback: Negligible • Area: 1072 slices • 8% of total FPGA area

  38. Checker Performance • Each reduction achieves a speedup of one order of magnitude • CEDCESDCOL is 3600 times faster than CE with an 80% increase in resources

  39. Discussion • Long synthesis times render hardware acceleration useless on small test problems • Meant for application to larger problems • Despite its rudimentary nature compared to software algorithms, CEDCESDCOL offers high raw speedups

  40. Discussion • Maximum problem size is difficult to predict • Breakdown for n variables • Constant modules = 210 slices • State machines = n × 13 slices • Cost counter = n × 0.5 slices • Controller = n × 1.5 slices • Checker slice counts depend strongly on the problem instance • Assuming the checkers scale at a constant rate with problem size, the device could accommodate 600 variables
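The breakdown above can be sanity-checked; the per-variable figures are from the slide, and the instance-dependent checker area is excluded:

```python
def fixed_slices(n):
    # Constant modules + state machines + cost counter + controller,
    # per the slide's estimates; checkers are not included.
    return 210 + n * 13 + n * 0.5 + n * 1.5

# For n = 600 variables this gives 9210 slices, leaving roughly
# 3000 of the XCV1000's 12288 slices for the instance-specific checkers.
XCV1000_SLICES = 12288
```

This is consistent with the 600-variable estimate: the fixed modules fit with room to spare, and the checkers determine whether a given instance actually fits.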

  41. Discussion • Large Problem Implementations • 313 variables, 302 clauses • 58% resource utilization • Clock speed 14 MHz • No optimizations • Successfully implemented • 550 variables • Failed due to space constraints

  42. Conclusion

  43. Conclusion • Successes: • Practical for problems that take software solvers on the order of minutes • Raw speedups up to 5 orders of magnitude for small covering instances • Improvements: • Improved architecture required to compete with software performance • Reduced hardware compile times

  44. Feedback • Not clear which extra reduction techniques ESPRESSO-EXACT uses beyond the hardware reduction techniques • How is X implemented in logic? • What really happens with X logic when the don’t-care block wants to reuse hardware? • No algorithm is presented for the dominated-column checker or the don’t-care checker

  45. References • Platzner, M., & De Micheli, G. (1998). Acceleration of satisfiability algorithms by reconfigurable hardware. In Field-Programmable Logic and Applications: From FPGAs to Computing Paradigm (pp. 69–78). Springer Berlin Heidelberg. • Plessl, C., & Platzner, M. (2003). Instance-specific accelerators for minimum covering. The Journal of Supercomputing, 26(2), 109–129. • Platzner, M. (2000). Reconfigurable accelerators for combinatorial problems. Computer, 33(4), 58–60.
