
Design of Power Efficient VLSI Arithmetic: Speed and Power Trade-offs

Explore algorithm efficiency, speed, and power criteria to optimize VLSI arithmetic designs. Learn about logical effort theory and energy-efficient adder architectures.


Presentation Transcript


  1. Design of Power Efficient VLSI Arithmetic: Speed and Power Trade-offs • Vojin G. Oklobdzija, Ram Krishnamurthy • Intel AMR / ACSEL Laboratory • Intel Corp. / University of California, Davis • www.ece.ucdavis.edu/acsel • Tutorial Presentation, 16th International Symposium on Computer Arithmetic, Santiago de Compostela, Spain, June 18, 2003

  2. Issues to be addressed • How do we compare different topologies for their efficiency? • How do we estimate the speed and efficiency of our algorithm? • What criteria should we use when developing a new algorithm? • How does power enter into this equation? 16th International Symposium on Computer Arithmetic, Santiago de Compostela, SPAIN

  3. Additional Issues • Determine which topology is the best for a given power or delay budget • Determine which topology can stretch the furthest in terms of speed or power

  4. Metric

  5. Previously used estimates • Counting the number of gates (logic levels): not accurate

  6. Critical path in Motorola's 64-bit CLA

  7. Motorola's 64-bit CLA: Modified PG Block • Intermediate propagate signals Pi:0 are generated to speed up C3

  8. Fan-In and Fan-Out Dependency (Oklobdzija, Barnes: IBM 1985)

  9. Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Delay Complexity

  10. Design Objective • Design takes time: finding results only after implementation is of little value • There is a disconnect between the measures used in computer arithmetic when developing an algorithm and what is obtained after implementation: we want estimates that are as close as possible to the measured results • A simple tool that can evaluate different design trade-offs for a given technology is needed • The power trade-off is the most important: speed and power are tradable

  11. Logical Effort Theory • “Back of the Envelope” complexity: good for estimating speed • Gate delay = linear function of load • Slope: logical effort → gate driving characteristics • Intercept: parasitic → gate internal load • The accuracy of “Logical Effort” is not sufficient • We needed to extend and refine the method • However, that becomes more than “Back of the Envelope” • Logical Effort does not account for possible power-delay trade-offs

  12. Logical Effort Theory • Excel: a platform of choice (ARITH-16) • Simple enough • Can provide computation quickly • Easy to enter a given design • Technology characterization is needed: • This needs to be done only once: available for every design afterwards • Domino gate = 2 stages of dynamic and static • Different driving characteristics of these stages • Multi-output gate (carry-look-ahead, Ling/conditional sum) • Energy model needs to be included

  13. Energy Motivation • [Figure: processor thermal map, temperature in °C, with the cache, execution core, and AGU labeled; AGU hotspot at 120 °C; courtesy of Intel Corp.] • AGUs: performance and peak-current limiters • High activity → thermal hotspot • Goal: high-performance energy-efficient design

  14. Kogge-Stone Adder • [Figure: 32-bit Kogge-Stone tree over bits 31 to 0, with a PG stage, carry-merge gates, and an XOR stage] • Critical path = PG + 5 + XOR = 7 gate stages • Generate, Propagate fanout of 2, 3 • Maximum interconnect spans 16b • Energy inefficient
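
To make the stage count on the Kogge-Stone slide concrete, here is a minimal behavioral sketch in Python of a 32-bit Kogge-Stone addition. It models only the logic (one PG stage, carry-merge levels with spans 1, 2, 4, 8, 16, and a final XOR), not the circuits, delay, or energy discussed in the tutorial. The function name `kogge_stone_add` and its return convention are assumptions made for illustration.

```python
def kogge_stone_add(a: int, b: int, width: int = 32):
    """Behavioral model of a Kogge-Stone adder (illustrative sketch only).

    Returns (sum modulo 2**width, number of carry-merge levels used).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # PG stage: per-bit generate and propagate signals.
    g = a & b
    p = a ^ b
    half_sum = p  # saved for the final XOR stage

    # Carry-merge levels: each level doubles the span that a (G, P) pair covers.
    span = 1
    levels = 0
    while span < width:
        g = (g | (p & (g << span))) & mask  # uses the old p on purpose
        p = (p & (p << span)) & mask
        span <<= 1
        levels += 1

    # Bit i of g is now the carry out of bits i..0; shift to get carries in.
    carries_in = (g << 1) & mask

    # XOR stage: sum bit = half sum XOR carry in.
    return (half_sum ^ carries_in) & mask, levels


if __name__ == "__main__":
    total, levels = kogge_stone_add(0x0000FFFF, 0x00000001)
    print(hex(total), levels)
```

For `width=32` the loop runs five times and the last level reaches 16 bit positions back, which corresponds to the "PG + 5 + XOR" critical path and the 16-bit interconnect span listed on the slide.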

  15. Sparse-tree Adder Architecture • Generate every 4th carry in parallel • Side-path: 4-bit conditional sum generator • 73% fewer carry-merge gates → energy-efficient

  16. Kogge-Stone adder (8-stage) • $D = 8\,(GBH)^{1/8} \times 2.2 + 3.8\,P$
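
The 8-stage delay expression can be evaluated directly. The sketch below assumes that 2.2 and 3.8 are the inverter slope and parasitic delay in picoseconds (the tutorial gives p = 3.8 ps for the inverter elsewhere), and that G, B, H, and P are the path logical effort, branching effort, electrical effort, and total normalized parasitic. The function name and the sample values are hypothetical and are not taken from the slides.

```python
def path_delay_ps(G: float, B: float, H: float, P: float,
                  n_stages: int = 8,
                  slope_ps: float = 2.2,
                  parasitic_ps: float = 3.8) -> float:
    """Evaluate D = N * (G*B*H)**(1/N) * slope + parasitic * P.

    Assumption: slope_ps and parasitic_ps are the characterized inverter
    parameters in picoseconds, so the result is in picoseconds.
    """
    path_effort = G * B * H
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages * stage_effort * slope_ps + parasitic_ps * P


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formula.
    print(path_delay_ps(G=4.0, B=8.0, H=8.0, P=10.0))
```

Because the path effort enters only through its eighth root, large changes in G, B, or H move the effort term slowly, while the parasitic term grows linearly with P.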

  17. MXA2 – Architecture & Result • Multiplexer-based • Generate carries using radix-2 (P,G) • 4-bit conditional sum selected by carries • 4-b cell width = 17 µm • 9-stage critical path • Per-stage effort = 3.7 • Total effort delay = 33.3 • Total parasitic = 22.5 • Total delay = 55.8
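
The MXA2 figures follow from the logical effort delay model, where the total delay is the number of stages times the per-stage effort plus the total parasitic. A small Python check of that arithmetic is sketched below; the helper name `total_delay` is an assumption, and the units are whatever normalized units the slide uses.

```python
def total_delay(n_stages: int, stage_effort: float, total_parasitic: float):
    """Return (effort delay, total delay) for a path with equal stage efforts."""
    effort_delay = n_stages * stage_effort
    return effort_delay, effort_delay + total_parasitic


if __name__ == "__main__":
    # Values quoted on the MXA2 slide: 9 stages, effort 3.7, parasitic 22.5.
    effort, total = total_delay(9, 3.7, 22.5)
    print(round(effort, 1), round(total, 1))
```

With the slide's inputs this reproduces the quoted effort delay of 33.3 and total delay of 55.8.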

  18. HC2 – Architecture • Generate even carries using radix-2 (P,G) • Generate odd carries from even carries • CMOS adder for sum • 1-b cell width ≈ 4 µm • 10-stage critical path

  19. HC2 – Circuits & Results

  20. KS2 – Architecture & Results • Generate carries using radix-2 (P,G) • CMOS adder for sum • Similar circuits as HC2 • 1-b cell width ≈ 4 µm • 9-stage critical path

  21. KS4 – Architecture • Generate carries using redundant radix-4 (P,G) • Dynamic circuit • 1-b cell width ≈ 4 µm • 6-stage critical path

  22. KS4 – Circuits & Result

  23. CLA4 – Architecture • [Figure labels: P-path, G-path, (P,G,C) network] • Generate carries using radix-4 (P,G,C) • 1-b cell width ≈ 4 µm • 15-stage critical path

  24. CLA4 – Circuits & Result

  25. LNG4 – Architecture • Generate carries using Ling pseudo-carries • Conditional sums selected by local & long carries • 1-b cell width ≈ 5.1 µm • 9-stage critical path

  26. LNG4 – Circuits & Result

  27. Results from Simulation • Fairly consistent with logical effort analysis • Per-stage delay • 1.4 FO4 (static) • 0.8 FO4 (dynamic)

  28. Delay of Representative 64-b Adders

  29. What happens when power is considered?

  30. Energy-Delay Space • [Plot: energy (vertical axis) versus delay (horizontal axis) for different adders; labels: speed barrier, Dmin, power limit, Emin]
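
One way to read the energy-delay space is as a selection problem: given a delay budget, pick the design with the lowest energy, and discard designs that are worse in both dimensions. The Python sketch below illustrates that reading. All design names and (delay, energy) points are hypothetical placeholders in arbitrary units, not results from the tutorial, and the helper names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL (delay, energy) points in arbitrary units; not tutorial data.
DESIGNS: Dict[str, Tuple[float, float]] = {
    "adder_A": (10.0, 9.0),
    "adder_B": (12.0, 6.0),
    "adder_C": (13.0, 7.0),
    "adder_D": (16.0, 4.0),
}


def best_under_delay_budget(points: Dict[str, Tuple[float, float]],
                            max_delay: float) -> Optional[str]:
    """Return the lowest-energy design whose delay meets the budget, if any."""
    feasible = {name: de for name, de in points.items() if de[0] <= max_delay}
    if not feasible:
        return None
    return min(feasible, key=lambda name: feasible[name][1])


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return designs not dominated in both delay and energy, fastest first."""
    front = []
    for name, (d, e) in points.items():
        dominated = any(
            d2 <= d and e2 <= e and (d2 < d or e2 < e)
            for other, (d2, e2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front, key=lambda name: points[name][0])


if __name__ == "__main__":
    print(best_under_delay_budget(DESIGNS, max_delay=14.0))
    print(pareto_front(DESIGNS))
```

The same two helpers work with an energy budget by swapping the roles of the two coordinates.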

  31. Logical Effort

  32. Delay in a Logic Gate • Delay of a logic gate has two components: $d = f + p$, where $f$ is the effort delay (stage effort) and $p$ is the parasitic delay • $f = gh$, where $g$ is the logical effort and $h$ is the electrical effort • Logical effort describes the relative ability of a gate topology to deliver current (defined to be 1 for an inverter) • Electrical effort is the ratio of output to input capacitance, $h = C_{out}/C_{in}$; it is also called “fanout” • *from Mathew Sanu / D. Harris

  33. Logical Effort Parameters: Inverter • [Plot: inverter delay versus fanout $h = C_{out}/C_{in}$] • $g = 2.2$ (logical effort), $p = 3.8$ ps (parasitic delay) • $d = gh + p$ • Delay increases linearly with fanout • More complex gates have greater g and p • *from Mathew Sanu / D. Harris
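
The linear model on this slide is easy to tabulate. The Python sketch below evaluates $d = gh + p$ for the inverter, taking $h$ as the ratio of output to input capacitance and assuming that the slope 2.2 is expressed in picoseconds per unit of fanout so that it is commensurate with the 3.8 ps parasitic. The function name is an assumption.

```python
def inverter_delay_ps(h: float, g: float = 2.2, p: float = 3.8) -> float:
    """Delay of the characterized inverter at fanout h, using d = g*h + p.

    Assumption: g is a slope in ps per unit fanout and p is in ps.
    """
    if h < 0:
        raise ValueError("fanout must be non-negative")
    return g * h + p


if __name__ == "__main__":
    for fanout in range(1, 6):
        print(fanout, round(inverter_delay_ps(fanout), 1))
```

Under these assumptions the unloaded delay is the parasitic term alone, and each additional unit of fanout adds one slope increment.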

  34. Normalized Logical Effort: Inverter • [Plot: normalized delay d versus fanout $h = C_{out}/C_{in}$ for an inverter with $g = 1$ and $p = 1$, so that $d = gh + p = h + 1$; the plot separates the effort delay from the parasitic delay] • Define delay of unloaded inverter = 1 • Define logical effort ‘g’ of inverter = 1 • Delay of complex gates can be defined w.r.t. d = 1 • *from Mathew Sanu / D. Harris

  35. Computing Logical Effort • DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current • Measured from delay vs. fanout plots of simulated gates • Or estimated, counting capacitance in units of transistor W • *from Mathew Sanu / D. Harris

  36. L.E. for Adder Gates • Logical effort parameters obtained from simulation for std cells • Define logical effort ‘g’ of inverter = 1 • Delay of complex gates can be defined w.r.t. d = 1 • *from Mathew Sanu / D. Harris

  37. Normalized L.E. • Logical effort & parasitic delay normalized to that of inverter • *from Mathew Sanu

  38. Delay of a string of gates • Delay of a path: $D = \sum_i d_i = \sum_i g_i h_i + \sum_i p_i$ • $g_i$ and $p_i$ are constants • To minimize path delay, the optimal values of $h_i$ are to be determined • $D$ is minimized when each stage bears the same effort, i.e. $g_i h_i = g_{i+1} h_{i+1}$ • *from Mathew Sanu / D. Harris

  39. Minimizing path delay • Logical effort of a string of gates: $G = \prod_i g_i$ • Path electrical effort: $H = C_{out(path)} / C_{in(path)}$ (equal to $\prod_i h_i$ when the path has no branching) • Branching effort: $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path}$ • Path branching effort: $B = \prod_i b_i$ • Path effort: $F = GBH$ • Delay is minimized when each stage bears the same effort: $f = g_i h_i = F^{1/N}$ • The minimum delay of an N-stage path is $N F^{1/N} + P$ • *from Mathew Sanu / D. Harris
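
The path-effort recipe on this slide can be written out step by step. The Python sketch below computes G, B, H, and F for a path, then the optimal stage effort $F^{1/N}$ and the minimum normalized delay $N F^{1/N} + P$. The function name, argument layout, and example path are assumptions chosen for illustration; all quantities are normalized to the inverter.

```python
from math import prod
from typing import List, Sequence, Tuple


def min_path_delay(g: Sequence[float], b: Sequence[float], p: Sequence[float],
                   c_in: float, c_out: float
                   ) -> Tuple[float, float, float, List[float]]:
    """Return (F, optimal stage effort, minimum delay, per-stage h_i).

    g, b, p hold the logical effort, branching effort, and parasitic delay
    of each stage; c_in and c_out are the path input and output capacitance.
    """
    if not (len(g) == len(b) == len(p)) or len(g) == 0:
        raise ValueError("g, b and p must be non-empty and equally long")
    n = len(g)
    G = prod(g)                 # path logical effort
    B = prod(b)                 # path branching effort
    H = c_out / c_in            # path electrical effort
    F = G * B * H               # path effort
    f_opt = F ** (1.0 / n)      # equal effort borne by every stage
    d_min = n * f_opt + sum(p)  # N * F^(1/N) + P
    # Stage electrical effort (total load over input cap) that yields f_opt.
    h = [f_opt / gi for gi in g]
    return F, f_opt, d_min, h


if __name__ == "__main__":
    # Hypothetical 3-stage inverter chain driving 64x its input capacitance.
    print(min_path_delay(g=[1, 1, 1], b=[1, 1, 1], p=[1, 1, 1],
                         c_in=1.0, c_out=64.0))
```

For the inverter-chain example the path effort is 64, so each of the three stages bears an effort of about 4 and the minimum normalized delay is about 15.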

  40. Inclusion of Wire Delay into Logical Effort

  41. Wiring Load • Wiring in hand analysis • Only lumped capacitance included • Wiring in HSPICE • Short wire: 1-segment π-model RC network • Long wire: 4-segment π-model RC network • Using worst-case wire capacitance • Wire length • Estimated from most critical 1-bit pitch

  42. Modeling interconnect cap. • Include interconnect cap in branching factor • [Figure: PG and CM0 cells across one adder bitpitch, showing Con-path and Coff-path, drawn without and with the interconnect capacitance Cint] • Without interconnect: $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path} = 2$ • With interconnect: $b = (C_{on\text{-}path} + C_{off\text{-}path} + C_{int}) / C_{on\text{-}path} = 2 + C_{int}/C_{on\text{-}path} = 2 + I$ • $I$: interconnect cap as a fraction (%) of gate cap in 1 adder bitpitch
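
The effect of folding wire capacitance into the branching factor can be seen with a few numbers. The Python sketch below computes the branching effort from the on-path, off-path, and interconnect capacitances; when the on-path and off-path loads are equal it reduces to 2 + I, with I the ratio of interconnect to on-path gate capacitance. The function name and the sample capacitances are hypothetical.

```python
def branching_effort(c_on: float, c_off: float, c_int: float = 0.0) -> float:
    """Branching effort b = (C_on + C_off + C_int) / C_on.

    c_int is the lumped interconnect capacitance charged alongside the gates;
    with c_int = 0 this is the usual logical effort branching effort.
    """
    if c_on <= 0:
        raise ValueError("on-path capacitance must be positive")
    return (c_on + c_off + c_int) / c_on


if __name__ == "__main__":
    # Hypothetical values: equal on/off-path loads, wire cap = 50% of gate cap.
    print(branching_effort(c_on=1.0, c_off=1.0))             # no wire
    print(branching_effort(c_on=1.0, c_off=1.0, c_int=0.5))  # 2 + I, I = 0.5
```

Since the path effort is F = GBH, any increase in b from wiring raises F and therefore the optimal stage effort on that path.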

  43. Branching • [Figure: circuit with gates g0, g1, g2, g3] • Logical Effort assumes the “branching” factor of this circuit to be 2. This is incorrect and can create inaccuracies

  44. Correction on Branching • [Figure: circuit with gates g0, g1, g2, g3] • f0 = f1, f2 = f3 • Td1 = (f0 + f1 + parasitics) τ • Td2 = (f2 + f3 + parasitics) τ • Minimum delay occurs when Td1 = Td2

  45. “Real” Branching Calculation Branching only equals 2 when: This explains why we had to resort to Excel!

  46. Technology Characterization

  47. Characterization Setup • Logical Effort requirements: • Equalize input and output transitions • Logical Effort is characterized by varying the h (Cout/Cin) of a gate. By using a variable load of inverters, each gate can be characterized over the same range of loads • The Logical Effort of each gate is characterized for each input • Energy is characterized for each output transition of the gate caused by each input transition, e.g. for an inverter, energy is measured for tLH and tHL
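
Characterizing a gate by sweeping h amounts to fitting a straight line to delay versus fanout: the slope gives the effort term and the intercept gives the parasitic term. The Python sketch below shows such a least-squares fit. The data points are synthetic, generated from the inverter model d = 2.2h + 3.8 quoted in the tutorial rather than taken from any simulation, and the function name is an assumption.

```python
from typing import Sequence, Tuple


def fit_delay_vs_fanout(h_values: Sequence[float],
                        delays: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line fit; returns (slope, intercept) of delay vs. h."""
    n = len(h_values)
    if n < 2 or n != len(delays):
        raise ValueError("need at least two matching (h, delay) samples")
    mean_h = sum(h_values) / n
    mean_d = sum(delays) / n
    sxx = sum((h - mean_h) ** 2 for h in h_values)
    if sxx == 0:
        raise ValueError("fanout values must not all be identical")
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in zip(h_values, delays))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_h
    return slope, intercept


if __name__ == "__main__":
    # SYNTHETIC samples from d = 2.2*h + 3.8; stand-ins for simulated delays.
    fanouts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    measured = [2.2 * h + 3.8 for h in fanouts]
    slope, intercept = fit_delay_vs_fanout(fanouts, measured)
    print(round(slope, 3), round(intercept, 3))
```

Dividing a gate's fitted slope by the inverter's fitted slope gives a logical effort normalized to the inverter, which is the form used in the normalized tables of the tutorial.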

  48. LE Characterization Setup for Static Gates • [Figure: input In driving a chain of gates that ends in a variable load] • Measured: tLH, tHL, average, energy

  49. LE Characterization Setup for Dynamic Gates • [Figure: input In driving gates that end in a variable load] • Measured: tHL, energy

  50. LE Table (Static CMOS) • Technology: P/N Ratio = 2  INV = 3.67, pINV = 4.29 • Measured on worst-case single-input switching
