1 / 28

Bitwidth-Aware High-Level Synthesis for Designing Low-Power DSP Applications

Bitwidth-Aware High-Level Synthesis for Designing Low-Power DSP Applications. G. Lhairech-Lebreton , P. Coussy, D. Heller, E. Martin University Bretagne-Sud / Lab-STICC ghizlane.lhairech@univ-ubs.fr December 14, 2010. Outline. Introduction The proposed approach Experimental results

zariel
Download Presentation

Bitwidth-Aware High-Level Synthesis for Designing Low-Power DSP Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bitwidth-Aware High-Level Synthesis for Designing Low-Power DSP Applications G. Lhairech-Lebreton, P. Coussy, D. Heller, E. Martin University Bretagne-Sud / Lab-STICC ghizlane.lhairech@univ-ubs.fr December 14, 2010

  2. Outline • Introduction • The proposed approach • Experimental results • Conclusion & perspectives

  3. Context Golden specification (floating point) Signal to Noise Ratio MATLAB, C, … Refinement methodologies Refinement Finite precision specification C, SystemC, Algorithmic C, … Automatic design Manual design High-Level Synthesis VHDL, VERILOG, … Uniform or bit-accurate datapath

  4. Allocation Scheduling Operator binding Register binding High-Level Synthesis Finite precision specification • Translates the specification into an internal format Compilation Internal format • Defines the number of operators Constraints • Defines the execution start time for each operation • Chooses the operator to execute the operation operator library • Defines the number of registers to store the data Uniform RTL Architecture

  5. Bitwidth-Aware High-Level Synthesis Finite precision specification • Keep the bitwidth information in the internal format Compilation Bit-accurate Internal format Internal format • Group the operation into clusters (define new types) Clustering Allocation Constraints Scheduling Operator binding operator library Register binding • Resize the datapath Resizing Bit-accurate Architecture

  6. Bitwidth-Aware High-Level Synthesis Finite precision specification Compilation • Related works propose HLS flows that either • consider bitwidth only in particular synthesis steps to design area optimized architectures or • consider uniform bit-width specification to design low-power architecture Bit-accurate Internal format Internal format Clustering Allocation Constraints Scheduling Operator binding operator library Register binding Resizing Bit-accurate Architecture

  7. The proposed approach • A high-level synthesis flow that inputs bit-accurate specification and generates a bit-accurate architecture. • Our approach optimizes both area and power consumption of hardware architectures • By taking into account the bitwidth during all the HLS steps • By reducing the number of multiplexers

  8. The proposed design flow C/C++ Specification Compilation Compilation Bit accurate DFG DFG Bit-width and range propagation Library characterization Library characterization Constraints • Clock • Interval Iteration Bit-accurate operator library operator library Allocation Scheduling Operator binding Register binding VHDL RTL Architecture Bitwidth-aware

  9. The proposed design flow C/C++ Specification Compilation Compilation Bit accurate DFG DFG Bit-width and range propagation Library characterization Constraints • Clock • Interval Iteration Clustering Bit-accurate operator library operator library Allocation Scheduling Scheduling Operator binding Operator binding Register binding Register binding Resizing VHDL RTL Architecture Bitwidth-aware

  10. Clustering Allocation Scheduling Operator binding op1 op0 op3 op4 op7 op6 op9 Register binding 5 2 3 2 2 5 3 6 18 16 3 5 5 16 * * * + * + * Resizing 2 5 2 3 5 3 6 18 ADD_1 op9 op3 * op0 op6 * + * MUL_2 2 5 5 16 16 3 MUL_3 16 op4 op1 op7 * * + ADD_2 16 16 16 Bitwidth-aware clustering • A cluster is a group of operations implemented by the same type of operator • The operations are grouped according to their propagation time Propagation time (N° cycles)

  11. Operations are scheduled using a modified list scheduling Priority is the operation mobility combined with the Bitwidth Clustering Allocation Scheduling Operator binding Register binding Resizing Bitwidth-aware scheduling

  12. Operations are scheduled using a modified list scheduling Priority is the operation mobility only Clustering Allocation Scheduling Mobility Operator binding * op3 op6 op9 op1 op4 Register binding + op0 op7 Resizing Bitwidth-aware scheduling Total number of bits for the multipliers 52 2 5 2 5 3 6 18 3 16 5 3 3 5 2 6 18 5 3 2 16 + * * * op9 op0 op3 op6 * + * * op0 op3 op6 Step 1 16 2 5 16 5 16 3 Op7 op1 op9 op4 Step 2 op1 op4 op7 * * + 16 16 16

  13. Operations are scheduled using a modified list scheduling Priority is the operation mobility combined with bitwidth Clustering Allocation Scheduling Mobility Operator binding * op3 op6 op9 op1 op4 Register binding + op0 op7 Resizing Bitwidth-aware scheduling Gain of 14 bits Total number of bits for the multipliers 38 2 5 2 5 3 6 18 3 16 3 3 5 2 5 6 18 3 2 + * * * op9 op0 op3 op6 * + * * op0 op3 op6 Step 1 16 2 5 16 5 16 3 Op7 op4 op1 op9 Step 2 op1 op4 op7 * * + 16 16 16

  14. Operations are bound on available resources by using a Maximum Weighted Bipartite Matching Weights combines bitwidth and number of multiplexers Clustering Allocation Scheduling Operator binding MUL1 + op2 op9 Register binding * Step 0 Resizing Step 1 op0 op3 op6 + * * MUL2 op1 * Step 2 op9 op1 op4 op7 * * * + op4 * Step 3 op8 + Bitwidth-aware operator binding ADD1 MUL3

  15. Operations are bound on available resources by using a Maximum Weighted Bipartite Matching Weights combines bitwidth and number of multiplexers Clustering Allocation Scheduling Operator binding MUL1 + op2 op9 Register binding * Step 0 Resizing Step 1 op0 op3 op6 + * * MUL2 op1 * 2 Step 2 op9 op1 op4 op7 * * * + op4 * Step 3 op8 + Bitwidth-aware operator binding ADD1 * MUL3

  16. Operations are bound on available resources by using a Maximum Weighted Bipartite Matching Weights combines bitwidth and number of multiplexers Clustering Allocation Scheduling Operator binding MUL1 + op2 op9 Register binding * Step 0 Resizing Step 1 op0 op3 op6 + * * MUL2 op1 * 2 Step 2 op9 op1 op4 op7 * * * + op4 * 1,5 Step 3 op8 + Bitwidth-aware operator binding ADD1 * MUL3 +

  17. Operations are bound on available resources by using a Maximum Weighted Bipartite Matching Weights combines bitwidth and number of multiplexers Clustering Allocation Scheduling Operator binding MUL1 + op2 op9 Register binding * Step 0 Resizing Step 1 op0 op3 op6 + * * MUL2 op1 * 2 Step 2 op9 op1 op4 op7 * * * + op4 * 1,5 Step 3 op8 + Bitwidth-aware operator binding ADD1 0 0 0 2 0 0 0 MUL3 0

  18. Operations are bound on available resources by using a Maximum Weighted Bipartite Matching Weights combines bitwidth and number of multiplexers Clustering Allocation Scheduling Operator binding + op2 op9 Register binding * Step 0 Resizing Step 1 op0 op3 op6 + * * Step 2 op9 op1 op4 op7 * * * + op4 * 1,5 Step 3 op8 + Bitwidth-aware operator binding ADD1 0 0 MUL2 MUL3 0

  19. Operations are bound on available resources by using a Maximum Weighted Bipartite Matching Weights combines bitwidth and number of multiplexers Clustering Allocation Scheduling Operator binding + op2 op9 Register binding * Step 0 Resizing Step 1 op0 op3 op6 + * * Step 2 op9 op1 op4 op7 * * * + Step 3 op8 + Bitwidth-aware operator binding ADD1 0 MUL3

  20. Variables are bound on allocated register by using a Maximum Weighted Bipartite Matching Weights combines bitwidth and number of multiplexers Clustering Allocation Scheduling Operator binding Register binding Resizing Bitwidth-aware register binding

  21. This step computes the number of bits needed for each operator input The maximum number of bits for each operand Clustering Allocation Scheduling Operator binding Register binding 8 4 4 8 * * Resizing op1 op1 9 9 3 3 * * op2 op2 8 4 9 9 e1 e1 e2 e2 MUL MUL Operator size determination By using commutability MAX BEST Gain : 4 bits

  22. Experiments (3) Resizing + MBWM ops+reg (1) Only Resizing (2) Resizing + LEA Our approach Full bit-aware Clustering Allocation Scheduling Operator binding Register binding Resizing

  23. Experimental setup • Environment • High Level Synthesis : Our tool (GAUT) • Logic synthesis : ISE 10.1 • Target device : FPGA Xilinx Virtex5 xv5110T • Applications • Low-pass video line filter • 1st order, non-linear, Volterra • 16-tap Finite Impulse Response Filter • 16-tap Least Mean Square • 32-points Fast Fourier Transform • 32*32 Discrete Cosine Transform For each application several bitwidths (from 8 to 32) have been considered for input data

  24. Experimental results : Power

  25. Experimental results : Area

  26. Experimental results : trade-off

  27. Conclusion & perspectives • Our approach is based on bit-width aware clustering, scheduling, binding and sizing steps • Both bitwidth and multiplexers are optimized • Both area and power consumption are reduced (resp. 30% and 25% in average • Up to 64% power reduction for a same bit-width • Our approach generates optimized accelerators which will be used in a complete low-power framework

  28. Bitwidth-Aware High-Level Synthesis for Designing Low-Power DSP Applications G. Lhairech-Lebreton, P. Coussy, D. Heller, E. Martin University Bretagne-Sud / Lab-STICC ghizlane.lhairech@univ-ubs.fr December 14, 2010

More Related