Improving Synthesis of Compressor Trees on FPGAs via Integer Linear Programming. Hadi Parandeh-Afshar (1, 2), Philip Brisk (2), Paolo Ienne (2). 1: University of Tehran, ECE Department. 2: EPFL, School of Computer and Communication Sciences.

Presentation Transcript


  1. Improving Synthesis of Compressor Trees on FPGAs via Integer Linear Programming • Hadi Parandeh-Afshar (1, 2), Philip Brisk (2), Paolo Ienne (2) • 1: University of Tehran, ECE Department • 2: EPFL, School of Computer and Communication Sciences

  2. Outline • Motivation • Generalized Parallel Counters • ILP Formulation • Experimental Results • Conclusion

  4. Motivation: Why is multi-input addition important? • Partial product reduction in parallel multiplication • Wallace and Dadda in the 1960s • Multi-input addition occurs in many multimedia and signal processing applications • H.264/AVC Variable Block Size Motion Estimation • FIR Filters • 3G Wireless Base Station Channel Cards • Flow graph transformations expose opportunities to use compressor trees in high-level synthesis [Verma and Ienne, ICCAD 2004]

  5. Multi-Input Addition Implementation • ASIC • Compressor Trees + Final Adder • Counters are the basic blocks • Wallace/Dadda/3-Greedy • FPGA • Adder Trees • Full Adder Implemented in CLB Structure • Fast Carry-Chain (Xilinx and Altera) • Reduces Routing Delay • Compressor Trees have poor performance • Fast carry chains cannot be used • Counters are inflexible • GOAL: Better implementation of compressor trees on FPGAs
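
The "compressor tree + final adder" structure on this slide can be illustrated with a minimal Python sketch. It is a bit-level model only, not the authors' tool: the function names `reduce_with_full_adders` and `compressor_tree_sum` are hypothetical, and the greedy grouping of three bits per column is an assumption chosen for simplicity rather than the Wallace or Dadda schedule.

```python
def reduce_with_full_adders(columns):
    """Apply one level of (3; 2) counters. columns[r] holds the bits of rank r."""
    out = [[] for _ in range(len(columns) + 1)]
    for rank, bits in enumerate(columns):
        bits = list(bits)
        while len(bits) >= 3:
            a, b, c = bits.pop(), bits.pop(), bits.pop()
            total = a + b + c
            out[rank].append(total & 1)       # sum bit keeps the same rank
            out[rank + 1].append(total >> 1)  # carry bit moves up one rank
        out[rank].extend(bits)                # leftover bits pass through
    return out


def compressor_tree_sum(operands, width):
    """Sum `operands` (each < 2**width); return (sum, number of counter levels)."""
    columns = [[(x >> r) & 1 for x in operands] for r in range(width)]
    levels = 0
    while max(len(c) for c in columns) > 2:
        columns = reduce_with_full_adders(columns)
        levels += 1
    # Final carry-propagate adder on the two remaining rows.
    row0 = sum(c[0] << r for r, c in enumerate(columns) if len(c) > 0)
    row1 = sum(c[1] << r for r, c in enumerate(columns) if len(c) > 1)
    return row0 + row1, levels
```

Each level only rewires bits locally, so no carry propagates across columns until the final adder; the number of levels is what a better choice of building blocks tries to reduce.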

  7. Generalized Parallel Counters (GPCs) • Parallel Counter: Sums bits with the same rank • Generalized Parallel Counter: Sums bits having different ranks • Example: a (3; 2) counter and a (3, 3; 4) GPC • GPCs are more flexible and reduce the number of logic levels • GPCs are more complex, but the additional complexity is absorbed in LUTs! • GPCs are perfect building blocks to create better compressors out of FPGA LUTs
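
To make the counter/GPC distinction concrete, here is a small behavioral sketch in Python. The function name `gpc` and the list-of-lists input layout (rank 0 first) are assumptions made for illustration; the slide itself only gives the (3; 2) and (3, 3; 4) shapes.

```python
def gpc(inputs_by_rank, n_outputs):
    """Behavioral model of a GPC.

    inputs_by_rank[r] is the list of input bits with relative rank r
    (each bit is weighted by 2**r). Returns the output bits, rank 0 first.
    """
    value = sum(sum(bits) << r for r, bits in enumerate(inputs_by_rank))
    assert value < (1 << n_outputs), "not enough output bits for this sum"
    return [(value >> k) & 1 for k in range(n_outputs)]


# (3; 2) counter: three bits of one rank, two output bits.
counter_out = gpc([[1, 1, 1]], 2)             # 3 = 0b11

# (3, 3; 4) GPC: three bits of rank 1 and three bits of rank 0.
gpc_out = gpc([[1, 1, 0], [1, 1, 1]], 4)      # 2 + 2*3 = 8 = 0b1000
```

A (3, 3; 4) GPC takes six bits and emits four, so it removes more bits per block than a (3; 2) counter, which is the flexibility the slide refers to.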

  8. GPC Implementation • [Figure: a GPC with K inputs and N outputs, mapped onto K-LUTs]
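
One plausible reading of this figure is that each of the N output bits is a Boolean function of the K inputs and therefore fits in one K-input LUT. The sketch below generates those truth tables under that assumption; `gpc_lut_tables` and the `shape` encoding (number of inputs per rank, rank 0 first) are hypothetical names, not taken from the slides.

```python
def gpc_lut_tables(shape, n_outputs):
    """Build one truth table per GPC output bit.

    shape[r] is the number of GPC inputs of rank r (rank 0 first).
    Returns n_outputs tables, each with 2**K entries, K = sum(shape).
    """
    # Weight of each LUT address bit: 2**rank of the input wired to it.
    weights = [1 << r for r, count in enumerate(shape) for _ in range(count)]
    k = len(weights)
    tables = [[0] * (1 << k) for _ in range(n_outputs)]
    for address in range(1 << k):
        value = sum(w for bit, w in enumerate(weights) if (address >> bit) & 1)
        for out in range(n_outputs):
            tables[out][address] = (value >> out) & 1
    return tables


# A (3, 3; 4) GPC: six inputs, so four 6-input LUTs with 64 entries each.
tables = gpc_lut_tables([3, 3], 4)
```

The extra logic of a GPC compared with a plain counter only changes the contents of the tables, not the number of LUTs, as long as the total input count stays within K.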

  9-13. Goal • How to best select GPC types and connect them to build a compressor tree [Figure: columns labeled with ranks 0 to 3]

  15. ILP Formulation • Objective Function • Minimizing Levels of GPCs • GPC Representation in ILP [Figure: a GPC with ports labeled ki = 0, ki = 1 and kj = 0, kj = 1, kj = 2]

  16. ILP Formulation • Variables • p_{m,i,ki} ∈ {0, 1}: true if there is a connection between the m-th input bit and an input of rank ki of GPC i. [Figure: example network with input bits m0 to m3, GPC0 to GPC2, and output bits n0 to n3, annotated with p_{0,0,0}, p_{1,0,1}, p_{2,1,0}, e_{0,2,1,0}, e_{1,2,0,1}, q_{0,0,0}, q_{2,1,1}, q_{1,2,2}, and D_{3,3}]

  17. ILP Formulation • Variables • q_{i,ki,m} ∈ {0, 1}: true if there is a connection between the ki-th output of GPC i and an output bit of rank m. [Figure: example network with input bits m0 to m3, GPC0 to GPC2, and output bits n0 to n3, annotated with p_{0,0,0}, p_{1,0,1}, p_{2,1,0}, e_{0,2,1,0}, e_{1,2,0,1}, q_{0,0,0}, q_{2,1,1}, q_{1,2,2}, and D_{3,3}]

  18. ILP Formulation • Variables • e_{i,j,ki,kj} ∈ {0, 1}: true if there is a connection from the ki-th output of GPC i to an input of rank kj of GPC j. [Figure: example network with input bits m0 to m3, GPC0 to GPC2, and output bits n0 to n3, annotated with p_{0,0,0}, p_{1,0,1}, p_{2,1,0}, e_{0,2,1,0}, e_{1,2,0,1}, q_{0,0,0}, q_{2,1,1}, q_{1,2,2}, and D_{3,3}]

  19. ILP Formulation • Variables • D_{i,j} ∈ {0, 1}: true if there is a direct connection from the i-th input bit to an output bit of rank j. [Figure: example network with input bits m0 to m3, GPC0 to GPC2, and output bits n0 to n3, annotated with p_{0,0,0}, p_{1,0,1}, p_{2,1,0}, e_{0,2,1,0}, e_{1,2,0,1}, q_{0,0,0}, q_{2,1,1}, q_{1,2,2}, and D_{3,3}]

  20. ILP Formulation • Connection rules • Circuit I/Os • Each circuit input should be connected to either a GPC or the final adder • Each output rank should be derived K times (K = 3, since the final adder is a ternary adder) • GPC I/Os • The number of connections must respect the allowable I/Os of each GPC, considering input ranks • Wires • The ranks of the source and destination of each wire must be consistent
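
The two circuit I/O rules could be written as linear constraints over the variables p, q, and D defined on the earlier slides. The following is a hedged reconstruction, not the authors' published formulation: it assumes that every circuit input drives exactly one destination, and it reads the output rule as "at most K bits feed each rank of the final adder".

```latex
% Each circuit input bit m goes to exactly one GPC input or directly to the final adder.
\sum_{i} \sum_{k_i} p_{m,i,k_i} \;+\; \sum_{j} D_{m,j} \;=\; 1
\qquad \text{for every input bit } m

% Each output rank m receives at most K bits (K = 3 for a ternary final adder).
\sum_{i} \sum_{k_i} q_{i,k_i,m} \;+\; \sum_{i} D_{i,m} \;\le\; K
\qquad \text{for every output rank } m
```

The GPC I/O and wire-rank rules would need additional constraints tying each GPC to an absolute rank; those are not spelled out on the slides and are omitted here.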

  21. ILP Formulation • ILP Improvement • Using the heuristic of [Parandeh-Afshar et al., ASP-DAC 2008] to estimate the maximum number of GPCs at each level • A GPC on level L can only connect to inputs of GPCs on levels L+1 and L+2
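
The level restriction shrinks the ILP because connection variables between GPCs are only needed for level differences of 1 or 2. A minimal sketch of that pruning follows; the function name `allowed_edges` and the dictionary `level_of` (GPC id to level) are hypothetical, and the level assignment is assumed to be given in advance.

```python
def allowed_edges(level_of):
    """Return the (source, destination) GPC pairs that may be connected.

    level_of maps a GPC id to its level. A GPC on level L may only feed
    GPCs on levels L+1 and L+2.
    """
    return [
        (src, dst)
        for src, src_level in level_of.items()
        for dst, dst_level in level_of.items()
        if dst_level - src_level in (1, 2)
    ]


# Five candidate GPCs spread over four levels.
level_of = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}
edges = allowed_edges(level_of)
all_pairs = len(level_of) * (len(level_of) - 1)  # 20 ordered pairs without pruning
```

In this toy instance only 7 of the 20 ordered pairs survive, so far fewer e variables have to be created for the solver.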

  23. Experimental Methodology • CPLEX ILP Solver • Altera Stratix-II • 90nm CMOS Technology • Implementations of multi-input addition • Adder Tree – Ternary adder tree • State of the art for FPGAs • Heuristic – Mapping heuristic described in [13] • ILP – ILP formulation described here

  24. Experimental Results (Delay) • ILP on average is: • 32% faster than Adder Tree • 5% faster than the Heuristic

  25. Experimental Results (Area) • ILP on average consumes: • 3% less resources than Adder Tree • 13% less resources than the Heuristic

  27. Conclusion • Conventional wisdom has held that adder trees outperform compressor trees on FPGAs • Ternary adder trees were a major selling point of the Altera Stratix II architecture • Conventional wisdom is wrong! • GPCs map nicely onto LUTs • Compressor trees on FPGAs are faster than adder trees when built from GPCs
