ECE260B – CSE241A Winter 2005: Design Styles, Multi-Vdd/Vth Designs • Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
The Design Problem • A growing gap between design complexity and design productivity (Source: sematech97)
Design Methodology • Design process traverses iteratively between three abstractions: behavior, structure, and geometry • More and more automation for each of these steps
Behavioral Description of Accumulator • Design described as a set of input-output relations, regardless of the chosen implementation • Data described at a higher abstraction level (“integer”)
Structural Description of Accumulator • Design defined as a composition of register and full-adder cells (“netlist”) • Data represented as {0,1,Z} • Time discretized, progressing in unit steps • Description language: VHDL • Other options: schematics, Verilog
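The difference between the two views can be shown in a small sketch. The behavioral model is a bare integer relation. The structural model is a register fed back through a ripple chain of full-adder cells and is stepped once per unit time step. The 8-bit width, the function names, and the input stream are hypothetical. The structural model uses only the values {0,1} and leaves out the high-impedance value Z.

```python
from typing import List, Tuple

WIDTH = 8  # hypothetical word width for this sketch

def behavioral_step(acc: int, x: int) -> int:
    """Behavioral view: next value as an integer relation, independent of implementation."""
    return (acc + x) % (1 << WIDTH)

def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Full-adder cell on bits in {0, 1}: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def to_bits(v: int) -> List[int]:
    return [(v >> i) & 1 for i in range(WIDTH)]  # LSB first

def from_bits(bits: List[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))

def structural_step(reg: List[int], x: List[int]) -> List[int]:
    """Structural view: register bits plus input bits through a ripple chain of full adders."""
    carry = 0
    nxt = []
    for a, b in zip(reg, x):
        s, carry = full_adder(a, b, carry)
        nxt.append(s)
    return nxt  # carry out of the MSB is dropped, i.e. arithmetic modulo 2**WIDTH

inputs = [3, 250, 17, 128, 1]
acc, reg = 0, to_bits(0)
for x in inputs:                  # one unit time step (clock edge) per input sample
    acc = behavioral_step(acc, x)
    reg = structural_step(reg, to_bits(x))
    assert from_bits(reg) == acc  # both views agree at every step
print(acc)
```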
Full Custom • Hand drawn geometry • All layers customized • Digital and analog • Simulation at transistor level • High density • High performance • Long design time Magic Layout Editor (UC Berkeley)
Symbolic Layout • Dimensionless layout entities • Only topology is important • Final layout generated by “compaction” program Stick diagram of inverter
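A compaction program turns dimensionless topology into geometry. One common formulation treats each layout element as a position x and each design rule as a minimum-spacing constraint x[b] - x[a] >= d. The tightest legal placement is then a longest-path computation over the constraint graph. The sketch below makes several simplifying assumptions: one dimension only, acyclic constraints, and element names and spacings (in lambda) that are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

def compact_1d(nodes: List[str], constraints: List[Tuple[str, str, int]]) -> Dict[str, int]:
    """Leftmost legal x for each element. A constraint (a, b, d) means x[b] - x[a] >= d.
    Solved as a longest path over the (assumed acyclic) constraint graph."""
    succ: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for a, b, d in constraints:
        succ[a].append((b, d))
        indeg[b] += 1
    x = {n: 0 for n in nodes}                   # unconstrained elements sit at x = 0
    ready = deque(n for n in nodes if indeg[n] == 0)
    done = 0
    while ready:                                # Kahn's topological order
        a = ready.popleft()
        done += 1
        for b, d in succ[a]:
            x[b] = max(x[b], x[a] + d)          # push b right just far enough for the rule
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    if done != len(nodes):
        raise ValueError("cyclic constraints: use a Bellman-Ford style solver instead")
    return x

# Hypothetical stick-diagram elements of an inverter, left to right (spacings in lambda).
NODES = ["cell_edge", "diff_contact", "poly_gate", "out_contact", "right_edge"]
RULES = [
    ("cell_edge", "diff_contact", 2),
    ("diff_contact", "poly_gate", 3),
    ("cell_edge", "poly_gate", 4),
    ("poly_gate", "out_contact", 3),
    ("out_contact", "right_edge", 2),
]
print(compact_1d(NODES, RULES))
```

The poly gate lands at x = 5 rather than 4 because the chain through the diffusion contact is the longer path. The longest path to the right edge sets the compacted cell width.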
Standard Cells • Organized in rows • Cells made as full custom by vendor (not user) • All layers customized • Digital with possible special analog cells • Simulation at gate level (digital) • Medium-high density • Medium-high performance • Reasonable design time • Routing channel requirements are reduced by the presence of more interconnect layers
Standard Cell — Example [Brodersen92]
Standard Cell - Example • 3-input NAND cell (from Mississippi State Library), characterized for a fanout of 4 and for three different technologies
Automatic Cell Generation Random-logic layout generated by CLEO cell compiler (Digital)
Macrocell-Based Design • Predefined macro blocks (uP, RAM, etc.) • Macro blocks made as full custom by vendor (IP blocks) • All layers customized • Digital and some analog • Simulation at behavior or gate level • High density • High performance • Short design time • Use standard on-chip busses • “System on a chip” (SOC) (Figure: macrocells, interconnect bus, routing channel)
Macrocell Design Methodology • Floorplan: defines overall topology of design, relative placement of modules, and global routes of busses, supplies, and clocks (Figure: video-encoder chip [Brodersen92] with SRAM blocks, data paths, a routing channel, and standard cells)
Gate Array • Predefined transistors connected via metal • Two types: channel based, sea of gates • Only metal layers customized • Fixed array sizes • Digital cells in library • Simulation at gate level (digital) • Medium density • Medium performance • Reasonable design time
Gate Array — Primitive Cells • Committed cell (4-input NOR) • Uncommitted cell
Sea-of-Gates • Example: LSI Logic LEA300K (0.6 µm CMOS) (Figure: random-logic and memory-subsystem regions)
Prewired Arrays • Programmable logic blocks • Programmable connections between logic blocks • No layers customized (standard devices) • Digital only • Low-medium performance • Low-medium density • Programmable: SRAM, EPROM, Flash, Anti-fuse, etc. • Easy and quick design changes • Cheap design tools • Low development cost • High device cost • NOT a real ASIC Courtesy Altera Corp.
Programmable Logic Devices • PAL • PLA • PROM
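The three devices share the same two-level AND-OR structure and differ only in which plane is programmable. A PAL has a programmable AND plane and a fixed OR plane. A PLA makes both planes programmable. A PROM has a fixed AND plane (a full address decoder) and a programmable OR plane. The sketch below models a PLA with both planes held as data. The product-term encoding and the XOR/AND programming are assumptions made for illustration.

```python
from itertools import product
from typing import List, Optional, Sequence

# One product term: per input, 1 = true literal, 0 = complemented literal, None = not connected.
Term = Sequence[Optional[int]]

def and_plane(inputs: Sequence[int], terms: List[Term]) -> List[int]:
    """Evaluate every product term (programmable AND plane)."""
    return [
        int(all(lit is None or inputs[i] == lit for i, lit in enumerate(term)))
        for term in terms
    ]

def or_plane(products: Sequence[int], connections: List[List[int]]) -> List[int]:
    """connections[k] lists the product terms ORed into output k (programmable OR plane)."""
    return [int(any(products[p] for p in conn)) for conn in connections]

def pla(inputs: Sequence[int], terms: List[Term], connections: List[List[int]]) -> List[int]:
    return or_plane(and_plane(inputs, terms), connections)

# Hypothetical programming: output 0 = a XOR b, output 1 = a AND b.
TERMS: List[Term] = [(1, 0), (0, 1), (1, 1)]   # a.b', a'.b, a.b
CONNECTIONS = [[0, 1], [2]]

for a, b in product((0, 1), repeat=2):
    print(a, b, pla((a, b), TERMS, CONNECTIONS))
```

Fixing CONNECTIONS and varying only TERMS would give PAL-like behavior. Fixing TERMS to all 2^n minterms and varying only CONNECTIONS would give PROM-like behavior.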
Field-Programmable Gate Arrays - Fuse-based • Standard-cell-like floorplan
Interconnect • Programming interconnect using anti-fuses
RAM-based FPGA - Basic Cell (CLB) Courtesy of Xilinx
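The core of a RAM-based logic block can be viewed as a lookup table (LUT). A k-input function is stored as 2^k configuration bits in SRAM, and the inputs act as the read address. The sketch below assumes a plain k-input LUT and ignores the flip-flops and other logic a real CLB also contains. `program_lut` and `lut_eval` are hypothetical names, not a Xilinx API.

```python
from typing import Callable, List, Sequence

def program_lut(k: int, func: Callable[..., int]) -> List[int]:
    """Compute the 2**k configuration bits (the SRAM contents) for a k-input function."""
    bits = []
    for addr in range(1 << k):
        args = [(addr >> i) & 1 for i in range(k)]  # input i drives address bit i
        bits.append(func(*args) & 1)
    return bits

def lut_eval(bits: Sequence[int], inputs: Sequence[int]) -> int:
    """The inputs form an address into the configuration memory (a 2**k-to-1 multiplexer)."""
    addr = sum(b << i for i, b in enumerate(inputs))
    return bits[addr]

# Hypothetical example: a 3-input majority function.
majority = program_lut(3, lambda a, b, c: (a & b) | (b & c) | (a & c))
print(majority)
print(lut_eval(majority, [1, 0, 1]))
```

Changing the function only means rewriting these bits, which is consistent with the “easy and quick design changes” listed for prewired arrays.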
RAM-based FPGA Xilinx XC4025
High Performance Devices • Mixture of full custom, standard cells, and macros • Full custom for special blocks: adder (data path), etc. • Macros for standard blocks: RAM, ROM, etc. • Standard cells for non-critical digital blocks
Global Signaling and Layout • Global signaling and layout optimization • Multi-Vdd • Static power analysis • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001
Global Signaling • The current global signaling paradigm inserts large static CMOS repeaters to reduce wire RC delay • Impending problems: • Too many repeaters • 180 nm processors: 22K repeaters (Itanium), 70K (Power4) • Projected 1-1.5M repeaters at 45-65 nm technologies • Too much power • Many large repeaters = significant static and dynamic power • Too much noise • Repeater clustering complicates power distribution • Inductive coupling across wide bus structures D. Sylvester, DAC-2001
Cell Layout Optimization • Advanced layout techniques must allow • Continuous individual device sizing • Variable p/n ratios • Tapered FET stacking sizes • Arbitrary Vth assignments within gates • First cut: Cadabra, 15-22% power reduction using the first two approaches under a fixed-footprint constraint • Optimize specific instances of standard gates (Figure: GDSII import, compact, fixed width) Ref: Hurat, Cadabra D. Sylvester, DAC-2001
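A first-order Elmore-delay model shows why repeaters help and why their count grows so fast. Take a wire with total resistance Rw and capacitance Cw and cut it into k segments. Each segment is driven by a repeater h times minimum size, with output resistance R0/h and input capacitance h·C0. Summing the segment delays gives R0·Cw/h + k·R0·C0 + Rw·h·C0 + Rw·Cw/(2k). This sum is minimized at k = sqrt(Rw·Cw/(2·R0·C0)) and h = sqrt(R0·Cw/(Rw·C0)). The parameter values below are hypothetical. The model also leaves out the usual 0.69/0.38 delay factors, slew effects, and inductance.

```python
import math

def repeated_wire_delay(k: int, h: float, R0: float, C0: float, Rw: float, Cw: float) -> float:
    """Elmore delay (s) of a wire split into k equal segments, each driven by a repeater of
    size h; every segment is loaded by the next repeater's input (h * C0)."""
    seg_r, seg_c = Rw / k, Cw / k
    per_segment = (R0 / h) * (seg_c + h * C0) + seg_r * (seg_c / 2 + h * C0)
    return k * per_segment

def optimal_repeaters(R0: float, C0: float, Rw: float, Cw: float):
    """Continuous optimum (k, h) of the delay expression above."""
    k = math.sqrt(Rw * Cw / (2 * R0 * C0))
    h = math.sqrt(R0 * Cw / (Rw * C0))
    return k, h

# Hypothetical numbers: minimum repeater R0 = 2 kOhm, C0 = 2 fF; a 10 mm global wire
# with Rw = 800 Ohm and Cw = 2 pF in total.
R0, C0, RW, CW = 2e3, 2e-15, 800.0, 2e-12
k_opt, h_opt = optimal_repeaters(R0, C0, RW, CW)
k = round(k_opt)
print(f"k ~ {k_opt:.1f} -> {k} repeaters of size {h_opt:.0f}")
print(f"one sized driver: {repeated_wire_delay(1, h_opt, R0, C0, RW, CW) * 1e12:.0f} ps")
print(f"{k} repeaters:     {repeated_wire_delay(k, h_opt, R0, C0, RW, CW) * 1e12:.0f} ps")
```

With these made-up numbers, the model picks about 14 repeaters at 50× minimum size. The delay drops from roughly 964 ps to roughly 273 ps, because the quadratic Rw·Cw/2 term is cut by a factor of k. The cost is about k·h = 700 minimum-size inverters of repeater width on a single wire, which is the power and count problem listed above.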
Multi-Vdd • Global signaling and layout optimization • Multi-Vdd • Static power analysis • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001
Multi-Vdd Status • Idea: incorporate two Vdd’s to reduce dynamic power • Limited to a few recent Japanese multimedia processors • Example: 0.3 µm, 75 MHz, 3.3 V media processor (Toshiba) • Total power savings of 47% in logic, 69% in clock • Dynamic voltage scaling of mobile processors • Transmeta Crusoe, Intel SpeedStep, etc. • Not considered in this talk • Very powerful technique currently applied only in low-performance designs • Mentality: today’s high-performance parts aren’t “limited” by power D. Sylvester, DAC-2001
Lower Power Via Rich Replacement • Media processors and other low-speed designs have many non-critical paths • 60-70% of paths have delay less than half the clock period • After replacement, most paths become near-critical • What about high-speed microprocessors? (Figure: % of total paths vs. path delay normalized to clock period) D. Sylvester, DAC-2001
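Replacement can be sketched as a greedy pass over a gate-level netlist. Walk back from the outputs, tentatively move each gate to Vdd,low, and keep the move only if the longest path still fits in the clock period. To avoid level converters inside the logic, a gate is moved only if every gate it drives is already at Vdd,low. This is loosely in the spirit of clustered voltage scaling. The 1.5× low-Vdd delay penalty, the unit gate delays, and the tiny netlist are hypothetical. A real flow would also model level converters at flip-flops and weigh power gate by gate.

```python
from typing import Dict, List, Set

def circuit_delay(order: List[str], fanin: Dict[str, List[str]], delay: Dict[str, float],
                  low: Set[str], penalty: float) -> float:
    """Longest-path delay; `order` must be topological. Low-Vdd gates are `penalty` times slower."""
    arrival: Dict[str, float] = {}
    for g in order:
        d = delay[g] * (penalty if g in low else 1.0)
        arrival[g] = d + max((arrival[p] for p in fanin[g]), default=0.0)
    return max(arrival.values())

def assign_low_vdd(order: List[str], fanin: Dict[str, List[str]], delay: Dict[str, float],
                   clock: float, penalty: float) -> Set[str]:
    fanout: Dict[str, List[str]] = {g: [] for g in order}
    for g in order:
        for p in fanin[g]:
            fanout[p].append(g)
    low: Set[str] = set()
    for g in reversed(order):  # outputs first
        if any(f not in low for f in fanout[g]):
            continue  # would drive a Vdd,high gate (needs a level converter): keep at Vdd,high
        if circuit_delay(order, fanin, delay, low | {g}, penalty) <= clock:
            low.add(g)
    return low

# Hypothetical netlist: critical chain g1 -> g2 -> g3, plus short paths g1 -> g6 and g4 -> g5.
ORDER = ["g1", "g2", "g3", "g4", "g5", "g6"]
FANIN = {"g1": [], "g2": ["g1"], "g3": ["g2"], "g4": [], "g5": ["g4"], "g6": ["g1"]}
DELAY = {g: 1.0 for g in ORDER}
low = assign_low_vdd(ORDER, FANIN, DELAY, clock=3.0, penalty=1.5)
print(sorted(low))
```

Only the critical chain stays at Vdd,high. As the slide notes, the replaced paths end up near-critical: g4 -> g5 now takes exactly the full clock period.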
Similar Story For High-Performance • IBM 480 MHz PowerPC shows over 50% of paths have delay less than half the clock period • Implies that high-performance designs can benefit from multi-Vdd Ref: Akrout, JSSC98 D. Sylvester, DAC-2001
Resizing Is Not The Right Answer • Post-synthesis optimizations resize gates to recover power on non-critical paths • The resulting path-delay distribution looks similar to the pre- and post-replacement distributions of the media processor… This is the wrong approach for nanometer design! (Figure: path-delay distributions before and after post-synthesis resizing) Ref: Sirichotiyakul, DAC99 D. Sylvester, DAC-2001
Multi-Vdd Instead of Sizing • $P \propto C V_{dd}^2 f$, where $f$ is fixed • Key: reducing gate width impacts power sub-linearly • Interconnect capacitance is not affected • Reducing supply voltage cuts power quadratically • All capacitive loads see a lower voltage swing • How can we minimize the delay penalty at low Vdd? D. Sylvester, DAC-2001
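The argument can be checked numerically with P ∝ C·Vdd²·f and f held fixed. Halving the gate width removes only part of the gate capacitance and leaves the wire capacitance untouched, while a lower supply scales every term. The load split (4 fF of gate capacitance, 6 fF of wire capacitance), the 1 GHz clock, and the 1.2 V and 0.9 V supplies are hypothetical.

```python
def dynamic_power(c_gate: float, c_wire: float, vdd: float, f: float) -> float:
    """Switching power C * Vdd**2 * f (W), with C the total switched load (F)."""
    return (c_gate + c_wire) * vdd ** 2 * f

F = 1e9                                           # fixed clock frequency (Hz)
base = dynamic_power(4e-15, 6e-15, 1.2, F)
downsized = dynamic_power(2e-15, 6e-15, 1.2, F)   # gate capacitance halved, wire untouched
low_vdd = dynamic_power(4e-15, 6e-15, 0.9, F)     # same load at a lower supply

print(f"downsizing: {downsized / base:.3f} x power")
print(f"Vdd 1.2 -> 0.9 V: {low_vdd / base:.3f} x power")
```

With these numbers, halving the width saves about 20% of the power, and lowering Vdd by 25% saves about 44%.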
Challenges For Multi-Vdd • Area overhead • Toshiba reported 7% rise in area due to placement restrictions, level converters, additional power grid routing • EDA tool support for the above issues (placement, dual power routing) • Noise analysis • Additional shielding required between Vdd,low and Vdd,high signals? • Including clock network D. Sylvester, DAC-2001
Static Power • Global signaling and layout optimization • Multi-Vdd • Static power • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001
Static Power • Why do we care about static power in non-portable devices? • Standby power is “wasted”: it leaves fewer Watts for computation • Worsens reliability by raising die temperatures • Leakage current is an exponential function of Vth and subthreshold swing (Ss), and is ~10× higher at operating temperature than at room temperature • Ss expected to remain at 80-85 mV/dec (room temp) • Device technology may cut this by ~20% • Vth reductions are mandated by scaling Vdd • Vth has been around Vdd/5 D. Sylvester, DAC-2001
Leakage Suppression Approaches • Dual-Vth (most common) • Low-Vth on critical paths, high-Vth elsewhere • Only cost is additional masks • MTCMOS • Series-inserted high-Vth device cuts leakage current when off (sleep mode) • Delay and area penalties; control device sizing is critical • Other techniques • Substrate biasing to control Vth • Dual-Vth domino • Use low-Vth devices only in evaluate paths (Figure: MTCMOS gate; labels: Vdd, pull-up, Vout, pull-down, parasitic node, high-Vth device, Vcontrol) D. Sylvester, DAC-2001
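The exponential sensitivity can be made concrete with the usual subthreshold approximation I_off ∝ 10^(-Vth/Ss). Under this model, every Ss millivolts of Vth reduction multiplies leakage by ten. The sketch uses that model alone, without DIBL or temperature dependence. The example supplies (1.2 V scaled to 0.9 V, with Vth kept at Vdd/5) are hypothetical.

```python
def relative_leakage(vth_mv: float, ss_mv_per_dec: float) -> float:
    """Off-current relative to a device with Vth = 0, assuming I_off ∝ 10 ** (-Vth / Ss)."""
    return 10 ** (-vth_mv / ss_mv_per_dec)

# Vth kept near Vdd/5: scaling Vdd from 1.2 V to 0.9 V moves Vth from 240 mV to 180 mV.
growth = relative_leakage(180, 85) / relative_leakage(240, 85)
# A ~20% smaller swing (68 instead of 85 mV/dec) at the same 180 mV Vth.
swing_gain = relative_leakage(180, 85) / relative_leakage(180, 68)
print(f"Vdd 1.2 -> 0.9 V raises leakage about {growth:.1f}x")
print(f"a 20% better swing lowers it about {swing_gain:.1f}x")
```

Under these assumptions, a 60 mV drop in Vth raises leakage roughly 5×. A 20% improvement in Ss would win back roughly 3.4× at the same Vth.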
Can Gate-Length Biasing Help Leakage Reduction? • Reduce leakage? • Reduce leakage variability? (Figure: variation of leakage and delay, each normalized to 1, with gate-length biasing for an NMOS device in an industrial 130 nm technology)
Gate-length Biasing • First proposed by Sirisantana et al. • Comparative study of the effect of doping, tox, and gate length • Large bias used, significant slowdown • Small bias • Little reduction in leakage beyond 10% bias, while delay degrades linearly • Preserves pin compatibility • Technique applicable as a post-RET step • Salient features • Design cycle not interfered with • Zero cost (no additional masks)
Granularity • Technology-level: all devices in all cells have one biased gate length • Cell-level: all devices in a cell have one biased gate length • Device-level: all devices have independent biased gate lengths • Simplification: in each cell, NMOS devices have one gate length and PMOS devices have another
Circuit level • Bias gate length for non-critical cells • Library extended with each cell having a biased version • Benefits analyzed both in conjunction with multi-VT assignment and in isolation • SVT-SGL • DVT-SGL • SVT-DGL • DVT-DGL (SVT/DVT: single/dual threshold voltage; SGL/DGL: single/dual gate length)
Results: Leakage Reduction With less than 2.5% delay penalty • Design Compiler used for VT assignment and gate-length biasing • Better results expected with Duet (academic sizer from Michigan)
Multi-Vth + Vdd + Sizing • Global signaling and layout optimization • Multi-Vdd • Static power analysis • Multi-Vth + Vdd + sizing D. Sylvester, DAC-2001
Multi-Everything • Need an approach that selects between speed, static power, and dynamic power • Should be scalable to nanometer design • Rules out dual-Vth domino or other dynamic logic families (low supplies kill performance advantages) • Techniques mentioned so far • Flexible, optimized cell layouts • Multi-Vdd • Dual-Vth • Put them all together D. Sylvester, DAC-2001
Multi-Vdd Can Leverage Vth’s • Existing designs using multi-Vdd do not alter Vth in low-Vdd cells • Highly sub-optimal: delay is fully penalized • This limits cell replacement, which in turn limits power savings • Much better solution: reduce Vth in low-Vdd cells to carefully balance delay, static power, and dynamic power • Enforce technology scaling within a chip: whenever we reduce Vdd, we also reduce Vth to maintain speed D. Sylvester, DAC-2001
Multi-Vdd + Vth Negates Delay Penalty • Delay $\propto C V_{dd} / I_{on}$ • Scenarios • Constant Vth (current paradigm) • Scale Vth to maintain constant static power • Scale Vth to reduce static power linearly with Vdd • Delay penalty is substantially offset • Ion is very sensitive to Vth at Vdd < 1 V • Pstatic decreases with Vdd due to the linear term and smaller Ioff (Ion and DIBL) D. Sylvester, DAC-2001
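The scenarios can be explored with the alpha-power-law approximation Ion ∝ (Vdd - Vth)^α (Sakurai and Newton). Here it stands in for whatever device model lies behind the slide, giving delay ∝ Vdd/(Vdd - Vth)^α with C fixed. The sketch first computes the delay penalty at constant Vth. It then bisects for the low-Vdd Vth that restores nominal delay. All values are hypothetical: α = 1.3, a 0.3 V nominal Vth, the 1.2 V and 0.9 V supplies, and the 85 mV/dec swing used for the leakage cost. DIBL is ignored.

```python
ALPHA = 1.3    # hypothetical velocity-saturation index
SS_MV = 85.0   # hypothetical subthreshold swing (mV/dec) for the leakage estimate

def gate_delay(vdd: float, vth: float, alpha: float = ALPHA) -> float:
    """Normalized delay ~ C*Vdd/Ion with Ion ~ (Vdd - Vth)**alpha and C fixed."""
    return vdd / (vdd - vth) ** alpha

def vth_for_delay(vdd: float, target: float, lo: float, hi: float, iters: int = 60) -> float:
    """Bisection for the Vth at which gate_delay(vdd, Vth) == target.
    Assumes gate_delay(vdd, lo) <= target <= gate_delay(vdd, hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gate_delay(vdd, mid) > target:
            hi = mid   # too slow: the answer is a lower Vth
        else:
            lo = mid
    return 0.5 * (lo + hi)

VDD_HI, VDD_LO, VTH_NOM = 1.2, 0.9, 0.3
nominal = gate_delay(VDD_HI, VTH_NOM)
penalty = gate_delay(VDD_LO, VTH_NOM) / nominal            # scenario: constant Vth
vth_match = vth_for_delay(VDD_LO, nominal, 0.0, VTH_NOM)   # Vth that restores nominal delay
leak_cost = 10 ** ((VTH_NOM - vth_match) * 1000 / SS_MV)   # Ioff increase at that Vth
print(f"constant Vth: {penalty:.2f}x slower at {VDD_LO} V")
print(f"Vth = {vth_match:.3f} V restores nominal delay, at ~{leak_cost:.0f}x the Ioff")
```

Under these assumptions, constant Vth costs about 27% in delay. Fully recovering it needs Vth near 0.18 V, which raises Ioff by roughly 27× (partly offset in Pstatic = Vdd·Ioff by the lower Vdd). The intermediate scenarios on the slide sit between these two extremes.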
Now Add Sizing • Multi-Vdd + multi-Vth + sizing/cell layout optimization attacks power from many angles (multi-dimensional) • Depending on criticality and switching activities, non-critical gates can be: • Assigned Vdd,low • Assigned Vdd,low + lower Vth • Assigned Vth,high • Downsized (at the individual transistor level if advantageous) • Assigned Vdd,low and upsized • For gates that cannot tolerate Vdd,low delay, this can be power efficient • And others D. Sylvester, DAC-2001
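The per-gate choice can be sketched as picking, for each non-critical gate, the lowest-power configuration whose delay fits within the gate's slack. Every number in the option table is hypothetical: delay factors and dynamic and static power normalized to the nominal cell, plus the leakage weight. Each gate is also treated independently here, whereas a real optimizer must account for slack shared along paths.

```python
from typing import Dict, Tuple

# name: (delay factor, dynamic power, static power), all normalized to the nominal cell.
# These values are made up for illustration only.
OPTIONS: Dict[str, Tuple[float, float, float]] = {
    "nominal":         (1.00, 1.00, 1.0),
    "vdd_low":         (1.30, 0.56, 0.6),
    "vdd_low+low_vth": (1.05, 0.56, 3.0),
    "high_vth":        (1.15, 1.00, 0.1),
    "downsized":       (1.20, 0.85, 0.5),
    "vdd_low+upsized": (1.10, 0.75, 1.2),
}
LEAK_WEIGHT = 0.1  # hypothetical: nominal static power relative to dynamic power at activity 1

def choose(slack: float, activity: float) -> str:
    """Lowest-power option whose delay fits within nominal delay plus slack (fraction of nominal)."""
    feasible = {
        name: activity * dyn + LEAK_WEIGHT * leak
        for name, (dly, dyn, leak) in OPTIONS.items()
        if dly <= 1.0 + slack + 1e-9
    }
    return min(feasible, key=feasible.get)

for slack, act in [(0.0, 0.5), (0.1, 0.5), (0.5, 0.01), (0.5, 0.5)]:
    print(slack, act, choose(slack, act))
```

With these made-up numbers, different gates land on different options. A busy gate with only 10% slack picks the “Vdd,low and upsized” option the slide mentions. A rarely switching gate prefers high Vth, and a busy gate with ample slack takes plain Vdd,low.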