1 / 35

An Interconnect-Centric Approach to Cyclic Shifter Design

This research paper explores an interconnect-centric approach to cyclic shifter design, optimizing for delay, power, and reliability. It discusses different shifter topologies and presents a methodology for optimal permutation scheme using ILP formulations.

bryannelson
Download Presentation

An Interconnect-Centric Approach to Cyclic Shifter Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Interconnect-Centric Approach to Cyclic Shifter Design Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College. David M. Harris Harvey Mudd College.

  2. Outline • Motivation • Previous Work • Approaches • Fanout-Splitting • Cell order optimization by ILP • Conclusions

  3. Motivation • Interconnect dominates gate in present process technology • Delay, power, reliability, process variation, etc. • Conventional datapath design focuses on logic depth minimization Source: ITRS roadmap 2005

  4. Technology Trends • Device (ITRS roadmap 2005, Table 40a) • Updated Berkeley Predictive Interconnect Model

  5. Shifter Taxonomy • Functionality • Logical Shift: MSBs stuffed with 0’s • Arithmetic Shift: Extend original MSB • Cyclic Shift (rotation) • Bidirectional Shift • Circuit Topology • Barrel Shifter • Logarithmic Shifter

  6. Barrel Shifter Schematic layout • Pros • Every data signal pass only one transmission gate • Cons • Input capacitance is • # transistors = • Requires additional decoder for control signals

  7. Logarithmic Shifter layout Schematic • Pros • # transistors = • Cons • Long inter-stage wires, especially for cyclic shifters Target of Optimization

  8. Cyclic Shifter -- Applications • Finite Field Arithmetic • In normal basis, squaring is done by cyclic shifting. • Encryption • ShiftRows operation in Rijndael algorithm. • DCT processing unit • Address generator • Bidirectional shifting • Can be implemented as a cyclic shifter with additional masking logic • CORDIC algorithm • etc …

  9. Previous Work • Bit interleave • Two dimensional folding strategy • Gate duplicating • Ternary shifting • Comparison between barrel shifter and log shifter

  10. Timing • Shifting & non-shifting paths are intertwined together. • Wire load on the critical path is Power • When configured to pass through, the non-shifting paths have to be switched as well. Cyclic Shifter – Traditional Design • MUX-based

  11. Timing • Shifting & non-shifting paths are now decoupled. • Wire load on the longest path is Power • When shifting, non-shifting paths are at rest. Fanout Splitting Shifter • Use DEMUXes instead of MUXes

  12. Example Right rotate 5 bits Red lines are signal lines Green lines are quiet lines

  13. Dynamic Power Consumption • Dynamic Power • Switching Probability MUX based design DEMUX based design SP= 3/16 SP= 1/4 SP = Switching Probability

  14. Gate Complexity • Re-factoring design • No extra complexity at gate level, both are DEMUX-based MUX-based

  15. Duality NAND gates network NOR gates network • Duality provides flexibility for low level implementation • NAND gates are good for static CMOS. • NOR gates are good for dynamic circuits.

  16. fixed fixed Cell Permutation • Datapath usually assumes bit-slice structure • The cell order of the input/output stages must be fixed • However, the cells in the intermediate stages are free to permute. free

  17. Problem Statement • Given • A N-bit rotator • Fixed linear order of the input/output stages • Find • An optimal permutation scheme of the intermediate stages such that the longest path is minimized (or, the total wire length s.t. delay constraint).

  18. ILP Formulation • Introduce a set of binary decision variables • if and only if logic cell is at physical location on level • The solution space is fully defined by constraints

  19. ILP Formulation (cont’) • Minimum delay formulation • Minimum power formulation Which can be expanded into objective

  20. ILP Formulation (cont’) • Represent the length of a single wire segment • Formulating absolute operation Psuedo-linear constraints discarded because we’re trying to minimize

  21. Complexity • Minimum total wire length formulation • The case of one level of free cells is a minimum weight bipartite matching problem • For the case of two or more levels of free cells, optimal polynomial algorithm is unknown • Hardness of minimum delay formulation un-established. Logic index Physical location

  22. ILP Complexity • The ILP formulation does not scale well • Both #integer variables and #constraints are • CPLEX uses on branch & bound – exponential growth • Sliding window scheme • Only cells in the window are allowed to permute • Consists of multiple passes; terminate when there is no improvement between passes WW = 8 columnsWH = 3 rowsHS = 4 columnsVS = 1 column

  23. Power & Delay Evaluation • Overall Flow

  24. Optimal solution for 8-bit case A global optimal solution

  25. 16-bit and 32-bit cases 16-bit, global optimal solution 32-bit, suboptimal solution by sliding window method

  26. Analysis Results

  27. Power-Delay Tradeoff 8-bit 16-bit • Given Tmax constraint, optimize Ttotal 8-bit & 16-bit are global optimum by cplex 32-bit 64-bit 32-bit & 64-bit are suboptimal result by sliding window scheme

  28. Implementation Results • Implementation methodology • Standard cell based design using relative placement (by synopsys Physical Compiler) • Routing and timing analysis in synopsys Astro • Power estimation by synopsys PrimePower • Control signals manually buffered following FO4 rule • TSMC 90nm technology • Three types of designs investigated • Mux shifter using NAND2X2 gates • Mux shifter using MX2X2 gates • Demux shifter using NAND2X2 gates

  29. Implementation Results – Cont’ • 64-bit results • Most improvement comes from the interconnect * For mux shifters, this is d[0]->z[0] while for demux shifter it is d[0]->z[1]

  30. Outline • Motivation • Previous Work • Approaches • Fanout-Splitting • Cell order optimization by ILP • Conclusions and future work

  31. 1 27 29 5 25 15 23 7 21 31 19 9 17 23 11 13 3 21 25 11 13 9 7 5 3 15 1 31 19 29 27 17 20 10 18 16 24 14 8 12 4 2 0 30 28 22 6 24 10 28 2 20 6 18 16 26 14 22 4 12 0 26 30 8 Conclusions & Future Work • We have proposed • Fanout-splitting design • ILP based layout optimization • Future directions • Extend the fanout splitting idea and ILP formulation to ternary shifter • Try alternative hierarchical approach to tackle the ILP complexity issue

  32. The End Thank you!

  33. Derivation of switching probability Truth table Gate level implementation ofMUX and DEMUX For MUX For DEMUX

  34. Evaluating interconnect effect • Based on logical effort model • Technology independent • Easy to incorporate interconnect effect Gate delay Logical effort, only depends on gate type Electrical effort, depends on load cap Parasitic delay, only depends on gate type Wire load is integrated into h Electrical effort contributed by wire per column spanned Wire length normalized to cell width

  35. Deciding • Evaluate for a set of technology nodes • It is safe to assume hw=1

More Related