
ECE260B – CSE241A Winter 2005 Design Styles Multi-Vdd/Vth Designs



Presentation Transcript


  1. ECE260B – CSE241A Winter 2005 Design Styles Multi-Vdd/Vth Designs Website: http://vlsicad.ucsd.edu/courses/ece260b-w05

  2. The Design Problem Source: sematech97 A growing gap between design complexity and design productivity

  3. Design Methodology • Design process traverses iteratively between three abstractions: behavior, structure, and geometry • More and more automation for each of these steps

  4. Behavioral Description of Accumulator Design described as set of input-output relations, regardless of chosen implementation Data described at higher abstraction level (“integer”)

  5. Structural Description of Accumulator Design defined as composition of register and full-adder cells (“netlist”) Data represented as {0,1,Z} Time discretized and progresses with unit steps Description language: VHDL Other options: schematics, Verilog
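
The contrast between the behavioral and structural views of the accumulator (slides 4 and 5) can be sketched in a few lines of Python. This is only an illustration, not the VHDL from the slides: the function names, the 4-bit width, and the LSB-first bit lists are assumptions, and the sketch models only the values 0 and 1 (no Z).

```python
def accumulate_behavioral(acc, x, width=4):
    """Behavioral view: an integer input-output relation, wrapped to the register width."""
    return (acc + x) % (1 << width)


def full_adder(a, b, cin):
    """One full-adder cell on single bits: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def accumulate_structural(acc_bits, x_bits):
    """Structural view: a ripple chain of full-adder cells feeding a register.

    Bit lists are LSB first. The returned list is the register content
    after one unit time step; the final carry is dropped.
    """
    carry = 0
    next_bits = []
    for a, b in zip(acc_bits, x_bits):
        s, carry = full_adder(a, b, carry)
        next_bits.append(s)
    return next_bits


def to_bits(value, width=4):
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

Both views should agree on every input, for example `from_bits(accumulate_structural(to_bits(9), to_bits(5)))` and `accumulate_behavioral(9, 5)` describe the same accumulation at two abstraction levels.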

  6. Implementation Methodologies

  7. Full Custom • Hand drawn geometry • All layers customized • Digital and analog • Simulation at transistor level • High density • High performance • Long design time Magic Layout Editor (UC Berkeley)

  8. Symbolic Layout • Dimensionless layout entities • Only topology is important • Final layout generated by “compaction” program Stick diagram of inverter

  9. Standard Cells • Organized in rows • Cells made as full custom by vendor (not user) • All layers customized • Digital with possible special analog cells • Simulation at gate level (digital) • Medium-high density • Medium-high performance • Reasonable design time Routing channel requirements are reduced by presence of more interconnect layers

  10. Standard Cell — Example [Brodersen92]

  11. Standard Cell - Example 3-input NAND cell (from Mississippi State Library) characterized for fanout of 4 and for three different technologies

  12. Automatic Cell Generation Random-logic layout generated by CLEO cell compiler (Digital)

  13. Module Generators — Compiled Datapath

  14. Macrocell-Based Design • Predefined macro blocks (uP, RAM, etc.) • Macro blocks made as full custom by vendor (IP blocks) • All layers customized • Digital and some analog • Simulation at behavior or gate level • High density • High performance • Short design time • Use standard on-chip busses • “System on a chip” (SOC) Macrocell Interconnect Bus Routing Channel

  15. Macrocell Design Methodology • Floorplan: defines overall topology of design, relative placement of modules, and global routes of busses, supplies, and clocks [Figure: video-encoder chip [Brodersen92]; labels: SRAM, SRAM, data paths, routing channel, standard cells]

  16. Gate Array • Predefined transistors connected via metal • Two types: channel based, sea of gates • Only metal layers customized • Fixed array sizes • Digital cells in library • Simulation at gate level (digital) • Medium density • Medium performance • Reasonable design time

  17. Gate Array - Primitive Cells Committed Cell (4-input NOR) Uncommitted Cell

  18. Sea-of-gates Random Logic Memory Subsystem LSI Logic LEA300K (0.6 µm CMOS)

  19. Prewired Arrays • Programmable logic blocks • Programmable connections between logic blocks • No layers customized (standard devices) • Digital only • Low-medium performance • Low-medium density • Programmable: SRAM, EPROM, Flash, Anti-fuse, etc. • Easy and quick design changes • Cheap design tools • Low development cost • High device cost • NOT a real ASIC Courtesy Altera Corp.

  20. Programmable Logic Devices PAL PLA PROM

  21. Field-Programmable Gate Arrays - Fuse-based Standard-cell like floorplan

  22. Interconnect Programming interconnect using anti-fuses

  23. Field-Programmable Gate Arrays - RAM-based

  24. RAM-based FPGA - Basic Cell (CLB) Courtesy of Xilinx

  25. RAM-based FPGA Xilinx XC4025

  26. High Performance Devices • Mixture of full custom, standard cells and macro’s • Full custom for special blocks: Adder (data path), etc. • Macro’s for standard blocks: RAM, ROM, etc. • Standard cells for non critical digital blocks

  27. Global Signaling and Layout • Global signaling and layout optimization • Multi-Vdd • Static power analysis • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001

  28. Global Signaling • Current global signaling paradigm → insert large static CMOS repeaters to reduce wire RC delay • Impending problems: • Too many repeaters • 180nm processors: 22K repeaters (Itanium), 70K (Power4) • Project 1-1.5M repeaters at 45-65nm technologies • Too much power • Many large repeaters = significant static and dynamic power • Too much noise • Repeater clustering complicates power distribution • Inductive coupling across wide bus structures D. Sylvester, DAC-2001

  29. Cell Layout Optimization • Advanced layout techniques must allow • Continuous individual device sizing • Variable p/n ratios • Tapered FET stacking sizes • Arbitrary Vth assignments within gates • First cut: Cadabra → 15-22% power reduction using first two approaches under fixed footprint constraint • Optimize specific instances of standard gates [Figure labels: GDSII import, compact, fixed width] Ref: Hurat, Cadabra D. Sylvester, DAC-2001

  30. Multi-Vdd • Global signaling and layout optimization • Multi-Vdd • Static power analysis • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001

  31. Multi-Vdd Status • Idea: Incorporate two Vdd’s to reduce dynamic power • Limited to a few recent Japanese multimedia processors • Example – 0.3 µm, 75MHz, 3.3V media processor (Toshiba) • Total power savings of 47% in logic, 69% in clock • Dynamic voltage scaling of mobile processors • Transmeta Crusoe, Intel Speedstep, etc. • Not considered in this talk • Very powerful technique currently applied only in low-performance designs • Mentality: today’s high performance parts aren’t “limited” by power D. Sylvester, DAC-2001

  32. Lower Power Via Rich Replacement • Media processors and other low speed designs have many non-critical paths • 60-70% of paths have delay less than half the clock period • After replacement, most paths become near critical • What about high-speed microprocessors? [Figure axes: % of total paths vs. path delay (normalized to clock period)] D. Sylvester, DAC-2001

  33. Similar Story For High-Performance • IBM 480 MHz PowerPC shows over 50% of paths have delay less than half the clock period • Implies that high-performance designs can benefit from multi-Vdd Ref: Akrout, JSSC98 D. Sylvester, DAC-2001
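
The statistic used on slides 32 and 33 (the share of paths whose delay is under half the clock period) is simple to compute from a timing report. The sketch below is a minimal illustration; the function name and the sample delays are hypothetical and do not come from the Toshiba or IBM designs.

```python
def fraction_below(path_delays, clock_period, threshold=0.5):
    """Fraction of paths whose delay, normalized to the clock period, is below threshold."""
    normalized = [d / clock_period for d in path_delays]
    return sum(1 for d in normalized if d < threshold) / len(normalized)


# Hypothetical path delays in ns for a 1.0 ns clock (illustrative only).
sample_delays_ns = [0.3, 0.4, 0.45, 0.6, 0.9, 1.0, 0.2, 0.35, 0.8, 0.48]
share = fraction_below(sample_delays_ns, clock_period=1.0)
print(f"{share:.0%} of paths have delay under half the clock period")
```

A large share means many gates have slack that could be traded for a lower supply voltage.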

  34. Resizing Is Not The Right Answer • Post-synthesis optimizations resize gates to recover power on non-critical paths • Looks similar to pre- and post-replacement figures in media processor… This is the wrong approach for nanometer design! [Figure panels: before post-synthesis resizing, after post-synthesis resizing] Ref: Sirichotiyakul, DAC99 D. Sylvester, DAC-2001

  35. Multi-Vdd Instead of Sizing • Power ~ C·Vdd²·f, where f is fixed • Key: Reducing gate width impacts power sub-linearly • Interconnect capacitance is not affected • Reducing supply voltage cuts power quadratically • All capacitive loads have lower voltage swing • How can we minimize delay penalty at low Vdd? D. Sylvester, DAC-2001
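
A small numeric sketch of the argument on slide 35, using Power ~ C·Vdd²·f. The split of the load into equal gate and wire capacitance and the 30% reductions are assumptions chosen only to make the comparison concrete.

```python
def dynamic_power(c_gate, c_wire, vdd, freq):
    """Dynamic power ~ C * Vdd^2 * f, with the load split into gate and wire capacitance."""
    return (c_gate + c_wire) * vdd ** 2 * freq


# Normalized, illustrative values: equal gate and wire capacitance (assumption).
base = dynamic_power(c_gate=1.0, c_wire=1.0, vdd=1.0, freq=1.0)

# Shrinking gate width by 30% only reduces the gate part of the load.
downsized = dynamic_power(c_gate=0.7, c_wire=1.0, vdd=1.0, freq=1.0)

# Lowering Vdd by 30% reduces the swing on every capacitive load.
low_vdd = dynamic_power(c_gate=1.0, c_wire=1.0, vdd=0.7, freq=1.0)

print(f"downsizing: {downsized / base:.2f} of baseline power")
print(f"low Vdd:    {low_vdd / base:.2f} of baseline power")
```

Under these assumptions, downsizing saves less than the width reduction would suggest because the wire capacitance is untouched, while the supply reduction acts quadratically on the whole load.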

  36. Challenges For Multi-Vdd • Area overhead • Toshiba reported 7% rise in area due to placement restrictions, level converters, additional power grid routing • EDA tool support for the above issues (placement, dual power routing) • Noise analysis • Additional shielding required between Vdd,low and Vdd,high signals? • Including clock network D. Sylvester, DAC-2001

  37. Static Power • Global signaling and layout optimization • Multi-Vdd • Static power • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001

  38. Static Power • Why do we care about static power in non-portable devices? • Standby power is “wasted” -- leaves fewer Watts for computation • Worsens reliability by raising die temperatures • Leakage current is a function of Vth and subthreshold swing (Ss) (x10 at operating vs. room temp!) • Ss expected to remain at 80-85 mV/dec (room temp) • Device technology may cut this by ~20% • Vth reductions are mandated by scaling Vdd • Vth has been around Vdd/5 D. Sylvester, DAC-2001
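
The dependence of leakage on Vth and subthreshold swing on slide 38 can be made concrete with the usual first-order model, in which off-current changes by one decade for every Ss of threshold shift. The model form is the standard textbook approximation rather than something stated on the slide, and the function name is hypothetical.

```python
def leakage_ratio(delta_vth_mv, ss_mv_per_decade):
    """Factor by which subthreshold leakage changes when Vth shifts by delta_vth_mv.

    First-order model (assumption): Ioff is proportional to 10 ** (-Vth / Ss).
    A negative delta (lower Vth) gives a ratio above 1, i.e. more leakage.
    """
    return 10 ** (-delta_vth_mv / ss_mv_per_decade)


# Lowering Vth by one full swing (85 mV at Ss = 85 mV/dec) costs one decade of leakage.
print(leakage_ratio(-85, 85))

# Raising Vth by the same amount recovers that decade.
print(leakage_ratio(85, 85))
```

This is why scaling Vth down along with Vdd makes static power a first-order concern.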

  39. Leakage Suppression Approaches • Dual-Vth (most common) • Low-Vth on critical paths, high-Vth off critical paths • Only cost is additional masks • MTCMOS • Series inserted high-Vth device cuts leakage current when off (sleep mode) • Delay and area penalties, control device sizing is critical • Other techniques • Substrate biasing to control Vth • Dual-Vth domino • Use low-Vth devices only in evaluate paths [Figure labels: Vdd, pull up, Vout, pull down, Vcontrol, parasitic node, high-Vth device] D. Sylvester, DAC-2001

  40. Can Gate-length biasing help leakage reduction? • Reduce leakage? Variation of leakage and delay (each normalized to 1) for an NMOS device in an industrial 130nm technology • Reduce leakage variability? Biasing

  41. Gate-length Biasing • First proposed by Sirisantana et al. • Comparative study of effect of doping, tox and gate-length • Large bias used, significant slow down • Small bias • Little reduction in leakage beyond 10% bias while delay degrades linearly • Preserves pin compatibility → technique applicable as post-RET step • Salient features • Does not interfere with the design cycle • Zero cost (no additional masks)

  42. Granularity • Technology-level: all devices in all cells have one biased gate-length • Cell-level: all devices in a cell have one biased gate-length • Device-level: all devices have independent biased gate-length • Simplification: in each cell, NMOS devices have one gate-length and PMOS devices have another

  43. Device-Level Leakage Reduction

  44. Circuit level • Bias gate-length for non-critical cells • Library extended with each cell having a biased version • Benefits analyzed in conjunction with Multi-VT assignment and in isolation • SVT-SGL • DVT-SGL • SVT-DGL • DVT-DGL

  45. Results: Leakage Reduction With less than 2.5% delay penalty • Design Compiler used for VT assignment and gate-length biasing • Better results expected with Duet (academic sizer from Michigan)

  46. Multi-Vth + Vdd + Sizing • Global signaling and layout optimization • Multi-Vdd • Static power analysis • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001

  47. Multi-Everything • Need an approach that selects between speed, static power, and dynamic power • Should be scalable to nanometer design • Rules out dual-Vth domino or other dynamic logic families (low supplies kill performance advantages) • Techniques mentioned so far • Flexible, optimized cell layouts • Multi-Vdd • Dual-Vth • Put them all together D. Sylvester, DAC-2001

  48. Multi-Vdd Can Leverage Vth’s • Existing designs using multi-Vdd do not alter Vth in low-Vdd cells • Highly sub-optimal, delay is fully penalized • Limits cell replacement → limits power savings • Much better solution: reduce Vth in low-Vdd cells to carefully balance delay, static power, and dynamic power • Enforce technology scaling within a chip – whenever we reduce Vdd, we also reduce Vth to maintain speed D. Sylvester, DAC-2001

  49. Multi-Vdd + Vth Negates Delay Penalty • Delay ~ C·Vdd/Ion • Scenarios • Constant Vth (current paradigm) • Scale Vth to maintain constant static power • Scale Vth to reduce static power linearly with Vdd • Delay penalty is substantially offset • Ion is very sensitive to Vth at Vdd < 1V • Pstatic reduces with Vdd due to linear term and smaller Ioff (Ion and DIBL) D. Sylvester, DAC-2001
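
To see why lowering Vth together with Vdd offsets the delay penalty, the relation Delay ~ C·Vdd/Ion can be evaluated with a simple drive-current model. The alpha-power form Ion ~ (Vdd - Vth)^alpha, the exponent 1.3, and all voltages below are assumptions made for illustration; none of them come from the slides.

```python
def relative_delay(vdd, vth, alpha=1.3, c=1.0):
    """Delay ~ C * Vdd / Ion, with Ion modeled as (Vdd - Vth) ** alpha (assumed model)."""
    ion = (vdd - vth) ** alpha
    return c * vdd / ion


# Reference operating point (hypothetical): Vdd,high = 1.2 V, Vth = 0.3 V.
reference = relative_delay(vdd=1.2, vth=0.3)

# Scenario 1: drop to Vdd,low = 0.8 V and keep Vth constant.
constant_vth = relative_delay(vdd=0.8, vth=0.3) / reference

# Scenario 2: drop to Vdd,low = 0.8 V and scale Vth in proportion to Vdd.
scaled_vth = relative_delay(vdd=0.8, vth=0.2) / reference

print(f"delay penalty, constant Vth: {constant_vth:.2f}x")
print(f"delay penalty, scaled Vth:   {scaled_vth:.2f}x")
```

In this toy model the scaled-Vth case recovers most of the slowdown, at the cost of higher leakage per device, which is the trade-off the scenarios on the slide explore.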

  50. Now Add Sizing • Multi-Vdd + multi-Vth + sizing/cell layout optimization attacks power from many angles (multi-dimensional) • Depending on criticality and switching activities, non-critical gates can be: • Assigned Vdd,low • Assigned Vdd,low + lower Vth • Assigned Vth,high • Downsized (at the individual transistor level if advantageous) • Assigned Vdd,low and upsized • For gates that cannot tolerate Vdd,low delay, this can be power efficient • And others D. Sylvester, DAC-2001
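
The per-gate choice described on slide 50 can be pictured as picking, for each non-critical gate, the lowest-power option that still fits its timing slack. The sketch below is a deliberately naive illustration and not the method from the talk: the option table, its delay and power factors, and the function name are all hypothetical placeholders, and it treats each gate's slack independently even though slack is shared along a path.

```python
# Option name -> (delay_factor, power_factor), relative to the baseline gate.
# All numbers are illustrative placeholders, not measured data.
OPTIONS = {
    "baseline": (1.00, 1.00),
    "vth_high": (1.15, 0.90),
    "downsized": (1.20, 0.85),
    "vdd_low_lower_vth": (1.10, 0.60),
    "vdd_low": (1.40, 0.50),
}


def pick_option(gate_delay, slack):
    """Return the lowest-power option whose added delay fits within the gate's slack."""
    feasible = [
        (power, name)
        for name, (delay, power) in OPTIONS.items()
        if gate_delay * (delay - 1.0) <= slack
    ]
    # "baseline" adds no delay, so the list is never empty for slack >= 0.
    return min(feasible)[1]


print(pick_option(gate_delay=1.0, slack=0.0))
print(pick_option(gate_delay=1.0, slack=0.12))
print(pick_option(gate_delay=1.0, slack=0.5))
```

A real flow would also weigh switching activity, level-converter overhead, and placement restrictions, which this sketch leaves out.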
