A Deep Sub-Micron VLSI Design Flow using Layout Fabrics
Sunil P. Khatri, University of Colorado, Boulder
Amit Mehrotra, University of Illinois, Urbana-Champaign
Robert K. Brayton, Alberto L. Sangiovanni-Vincentelli, University of California, Berkeley
Our VLSI Design Flow • Logic netlist → Logic Optimization → Optimized logic netlist → Technology Mapping → Placement → Routing → Layout
Motivation • Modern IC processes • Feature size well below 1 micron • Certain electrical effects increasingly important • Cross-talk • Electromigration • Self-heating • Statistical variations • Logic abstraction eroded • Existing design paradigms need to be rethought
Research Focus • The cross-talk issue • Tackled in an ad-hoc manner • Increases turn-around time • Verified cross-talk trends • Accurate 3-D capacitance extraction • Delay variation 2.47:1 (200 μm wires, 10X drivers, 0.1 μm technology)
Outline • Previous Approaches • New idea: The Fabric Approach • Fabric1 (in DAC-1999) • Standard-cell based design • Fabric3 (in ICCAD-2000) • Network of PLA based design • Further Tasks • Summary
Previous Approaches • [ALPHA 97]: • Metal layers 3 and 6 dedicated to power • Not viable in future processes • [Rubio 94]: • Functional analysis based on layout • Post-layout methods don’t scale • [Kirkpatrick 94, 96]: • Concept of digital sensitivity • Requires don’t-care and image computations
Solution: Layout Fabrics • [Figure: dense wiring fabric with signal (S) tracks interleaved with VDD (V) and GND (G) tracks] • We handle cross-talk by design • A new layout and design paradigm • Repeating dense wiring fabric (DWF) pattern at minimum pitch
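To make the shielding idea concrete, the sketch below tiles a repeating track pattern across a routing channel and checks that every signal (S) track has a VDD (V) or GND (G) track as an immediate neighbor. The pattern string "VSSGSS" is an assumption chosen for illustration and is not necessarily the exact DWF pattern of [KMBSO99]; the function names are hypothetical.

```python
def tile_tracks(pattern: str, n_tracks: int) -> str:
    """Repeat a track pattern (e.g. 'VSSGSS') across n_tracks minimum-pitch tracks."""
    reps = -(-n_tracks // len(pattern))  # ceiling division
    return (pattern * reps)[:n_tracks]


def every_signal_shielded(tracks: str) -> bool:
    """True if each 'S' track has at least one adjacent 'V' or 'G' track."""
    for i, t in enumerate(tracks):
        if t != "S":
            continue
        neighbors = tracks[max(i - 1, 0):i] + tracks[i + 1:i + 2]
        if not any(n in "VG" for n in neighbors):
            return False
    return True


def signal_fraction(tracks: str) -> float:
    """Fraction of routing tracks left for signals (the rest carry VDD/GND)."""
    return tracks.count("S") / len(tracks)


# Hypothetical channel of 25 tracks; the extra track closes the channel with a V.
tracks = tile_tracks("VSSGSS", 25)
print(tracks)                          # VSSGSSVSSGSSVSSGSSVSSGSSV
print(every_signal_shielded(tracks))   # True
print(signal_fraction(tracks))         # 0.64
print(every_signal_shielded("VSSSG"))  # False: the middle S has only S neighbors
```

Under this assumed pattern each signal wire couples to one signal neighbor and one supply wire, which is the intuition behind the low, uniform cross-coupling; the cost is the share of tracks given to VDD/GND, consistent with the area penalty noted below.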
Research Contribution • Verify cross-talk trends • Fabric1 [KMBSO99] (in DAC) • Incorporated into traditional design flow • Fabric3 [KBS00] (in ICCAD-00) • Network of PLAs • Detailed electrical characterization • Synthesis, wire removal algorithms • Both utilize DWF pattern • 1.02:1 cross-talk delay variation
Layout Fabrics • Advantages • Pre-characterized parasitics • Uniform, low cross-coupling capacitance • 40X lower, 2% delay variation • Uniform, low signal inductance • Automatic power and ground routing • Uniform, low power and ground resistance • Can effectively implement regular structures • Disadvantages • 5% increase in total capacitance • Area penalty • Power increase
Capacitance in DWF • Experimental setup • “Strawman” process model, copper wires, low-K dielectric • Capacitances from 3-D field solver (space3d) • Simulated three wires in spice • 0.1 micron process, Metal2 wires • Length 200 microns, 10x minimum drivers • Non-DWF • Delay variation 2.47:1 • Signal integrity problems for fast slew rates • With DWF • 40X reduction in cross-coupling capacitance • Delay variation 1.02:1, no signal integrity problem
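A common first-order way to see why coupling capacitance causes delay variation is the Miller-factor model: a victim wire with ground capacitance C_g and coupling capacitance C_c to each of its two neighbors sees an effective load C_g + C_c(k_left + k_right), where k is 0 when a neighbor switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction. Assuming delay is proportional to that load, the worst/best ratio is (C_g + 4C_c)/C_g. This is a simplification: the ratios on this slide come from 3-D extraction and SPICE, not from this model, and the capacitance values below are hypothetical.

```python
def effective_load(c_ground: float, c_couple: float, k_left: int, k_right: int) -> float:
    """Effective switched capacitance of a victim wire under the Miller-factor model.

    k = 0: neighbor switches in the same direction as the victim
    k = 1: neighbor is quiet
    k = 2: neighbor switches in the opposite direction
    """
    return c_ground + c_couple * (k_left + k_right)


def delay_variation(c_ground: float, c_couple: float) -> float:
    """Worst-case / best-case delay ratio, assuming delay proportional to load."""
    loads = [effective_load(c_ground, c_couple, kl, kr)
             for kl in (0, 1, 2) for kr in (0, 1, 2)]
    return max(loads) / min(loads)


# Hypothetical capacitances in arbitrary units, not extracted values.
print(delay_variation(1.0, 0.4))       # ≈ 2.6
print(delay_variation(1.0, 0.4 / 40))  # 40X less coupling: ≈ 1.04
```

The model shows the mechanism the slide measures: cutting C_c relative to C_g collapses the spread between the best and worst switching patterns.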
Inductance in the DWF • Low and uniform in DWF • Current return path is at minimum spacing • In regular layout style, varies greatly • Problems reported for clock signals • Compared inductance of Metal8 trace • Verified using ASITIC • [Chart: inductance (nH/micron)]
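Why a nearby return path keeps inductance low can be seen from the textbook loop inductance of two parallel round conductors, L' = (μ0/π)·arccosh(d/2r) per unit length, with d the center-to-center spacing and r the radius. Treating on-chip wires as round conductors with an equivalent radius is a rough assumption that ignores skin effect and rectangular cross-sections; the geometry values and the function name are hypothetical, and the slide's own figures were obtained with ASITIC.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space, H/m


def loop_inductance_nh_per_um(spacing_um: float, radius_um: float) -> float:
    """External loop inductance per unit length of a signal wire and its return wire,
    modeled as two parallel round conductors (spacing is center to center)."""
    if spacing_um <= 2 * radius_um:
        raise ValueError("conductors overlap")
    henry_per_m = (MU0 / math.pi) * math.acosh(spacing_um / (2 * radius_um))
    return henry_per_m * 1e9 / 1e6  # H/m -> nH/um


# Hypothetical return-path distances in microns, equivalent radius 0.1 um.
for d in (1.0, 5.0, 20.0):
    print(d, loop_inductance_nh_per_um(d, radius_um=0.1))
```

Inductance grows roughly logarithmically with return-path distance, so a return wire at minimum spacing everywhere keeps it both low and predictable, whereas a distant, layout-dependent return path makes it vary.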
VDD/GND Resistance in DWF • Check resistance at various points in DWF • Compare with standard cell case • Varies greatly • Measured at end of row • L/W = 1000/8 • [Chart: VDD/GND resistance (ohms)]
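For a rail of uniform sheet resistance R_s, the resistance is R = R_s·L/W, so an L/W = 1000/8 rail is 125 squares. The sketch below shows how the resistance seen by a cell grows with its distance from the supply point along a standard-cell row fed from one end. The sheet-resistance value and the one-end feed are hypothetical assumptions, not the measurement setup behind the chart.

```python
def rail_resistance(sheet_ohm_per_sq: float, length_um: float, width_um: float) -> float:
    """Resistance of a uniform rectangular rail: R = R_s * (L / W)."""
    return sheet_ohm_per_sq * length_um / width_um


R_S = 0.02  # hypothetical sheet resistance, ohms per square
WIDTH_UM = 8.0

print(1000 / 8)  # 125.0 squares for the L/W = 1000/8 case
# Hypothetical tap points along a row fed from one end (microns from the feed).
for x in (100.0, 500.0, 1000.0):
    print(x, rail_resistance(R_S, x, WIDTH_UM))
```

Because the resistance scales with distance along the rail, cells at the end of a row see the worst case; in the DWF, VDD and GND wires run throughout the routing area, so the distance to a supply wire stays short and roughly uniform.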
Buffer Insertion in DWF • Easily performed • VDD and GND available all over routing area
Fabric1 - Introduction • DWF pattern utilized chip-wide • Library cells implemented in this pattern • [Figure: Std Cell vs. Fabric Cell] • Synthesis, placement and routing use standard cell methodology
Fabric3 • Network of Programmable Logic Arrays • Combine many logic nodes into a PLA • Routing area utilizes DWF pattern • PLA implements a multi-output function • Example: f = a b + c ; g = a b + c • [Figure: PLA with inputs a, b, c, an AND plane and an OR plane, and outputs f, g]
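The AND/OR structure of a multi-output PLA can be sketched in a few lines: each AND-plane row (wordline) is a cube over the inputs in espresso-style positional notation ('1' = true literal, '0' = complemented literal, '-' = input absent), and each OR-plane column lists the rows that drive an output. The two-output function here is a hypothetical example chosen so that the product term ab is shared by f and g; the sketch models only the logic, not the pre-charged circuit.

```python
from itertools import product

INPUTS = ("a", "b", "c")
# AND plane: one cube per wordline (positional notation over INPUTS).
AND_PLANE = ["11-",  # a b
             "--1",  # c
             "-01"]  # b' c
# OR plane: which wordlines drive each output (hypothetical example).
OR_PLANE = {"f": [0, 1],  # f = a b + c
            "g": [0, 2]}  # g = a b + b' c


def cube_covers(cube: str, bits: tuple) -> bool:
    """True if the input vector satisfies every literal in the cube."""
    return all(ch == "-" or int(ch) == bit for ch, bit in zip(cube, bits))


def eval_pla(bits: tuple) -> dict:
    """Evaluate all PLA outputs for one input vector."""
    wordlines = [cube_covers(cube, bits) for cube in AND_PLANE]
    return {out: int(any(wordlines[r] for r in rows)) for out, rows in OR_PLANE.items()}


for bits in product((0, 1), repeat=len(INPUTS)):
    print(dict(zip(INPUTS, bits)), eval_pla(bits))
```

Sharing a wordline between outputs is what lets one PLA absorb several logic nodes without duplicating product terms.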
Fabric3 PLA Core Layout • [Figure: PLA core layout with inputs a, b, clock clk, and outputs f, g]
PLAs vs. Standard Cells • [Figure: PLA and standard-cell layouts] • PLAs are dense and fast
PLA Characteristics • Why are PLA area and delay so low? • Wiring localized within PLA • PLA core transistor sizes are minimum • No p-transistor to n-transistor diffusion spacing • “Gigahertz” chip utilized pre-charged PLAs • High performance • Quick implementation • Didn’t use a network of PLAs
Network of PLAs • [Figure: network of PLAs with signals a through g] • PLAs are pre-charged • Inputs to all PLAs must settle before evaluation begins
Network of PLAs • For correct operation: • PLA dependency graph must be acyclic • Evaluation of PLAi after completion of slowest PLAj in its “fanin” • Self-timed design style • Each PLA generates a completion signal • Overhead of one wordline, one output • Delay formula to find slowest PLAj
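The ordering rule above can be sketched as a static schedule: a PLA may start evaluating only when its slowest fanin PLA has finished. A minimal sketch, assuming each PLA has a known evaluation delay (hypothetical values) and primary inputs are ready at time 0, propagates start and finish times in topological order; Python's `graphlib` raises `CycleError` if the dependency graph is not acyclic. This illustrates the rule only, not the self-timed completion circuitry or the delay formula used in the work.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_schedule(fanin_plas: dict, delay: dict) -> dict:
    """Return {pla: (start, finish)}.

    fanin_plas maps each PLA to the PLAs that drive its inputs; a PLA starts
    only after its slowest fanin PLA has finished.
    """
    schedule = {}
    for pla in TopologicalSorter(fanin_plas).static_order():
        start = max((schedule[p][1] for p in fanin_plas.get(pla, ())), default=0.0)
        schedule[pla] = (start, start + delay[pla])
    return schedule


# Hypothetical PLA network and delays (ps).
fanins = {"P1": [], "P2": [], "P3": ["P1", "P2"], "P4": ["P3", "P1"]}
delays = {"P1": 120.0, "P2": 90.0, "P3": 150.0, "P4": 110.0}
for pla, (start, finish) in evaluation_schedule(fanins, delays).items():
    print(pla, start, finish)

try:
    evaluation_schedule({"A": ["B"], "B": ["A"]}, {"A": 1.0, "B": 1.0})
except CycleError:
    print("PLA dependency graph has a cycle")
```

In this example P3 waits for P1 (the slower of its two fanins), so it starts at 120 ps, and P4 starts when P3 finishes at 270 ps.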
Decomposition • Algorithm collapses wiring into PLAs • Input: multi-level combinational network, W bound, H bound • Output: Correct network of PLAs • Our algorithm greedily grows a PLA until either bound is violated • Attempt to reduce wires by selecting fanouts for inclusion in the PLA being grown
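The greedy growth step can be sketched as follows, under assumptions that are illustrative rather than taken from the paper: W bounds the PLA's inputs plus outputs, H bounds its product terms (estimated as the sum of per-node cube counts, with no espresso minimization or folding), and nodes are absorbed only once all their fanins are placed, which keeps the PLA graph acyclic. Among candidate fanouts, the one giving the fewest PLA inputs plus outputs is chosen as a stand-in for the wire-reduction heuristic. All names are hypothetical.

```python
from graphlib import TopologicalSorter


def pla_io(cluster, fanins, fanouts, primary_outputs):
    """Count distinct external inputs and outputs of a candidate PLA."""
    inputs = {f for n in cluster for f in fanins[n] if f not in cluster}
    outputs = {n for n in cluster
               if n in primary_outputs or any(g not in cluster for g in fanouts[n])}
    return len(inputs), len(outputs)


def decompose(fanins, cubes, primary_outputs, w_bound, h_bound):
    """Greedily partition a combinational network into PLAs.

    fanins: node -> list of fanin signals (primary inputs are not keys)
    cubes:  node -> estimated number of product terms for that node
    """
    fanouts = {n: [] for n in fanins}
    for n, fis in fanins.items():
        for f in fis:
            if f in fanouts:
                fanouts[f].append(n)
    order = [n for n in TopologicalSorter(fanins).static_order() if n in fanins]

    def ready(n, placed):
        # A node may join a PLA once every internal fanin has been placed.
        return all(f not in fanins or f in placed for f in fanins[n])

    placed, plas = set(), []
    for seed in order:
        if seed in placed:
            continue
        cluster = {seed}  # a single oversized node still becomes its own PLA
        placed.add(seed)
        while True:
            best = None
            for g in sorted({g for n in cluster for g in fanouts[n]} - placed):
                trial = cluster | {g}
                ins, outs = pla_io(trial, fanins, fanouts, primary_outputs)
                height = sum(cubes[n] for n in trial)
                if ready(g, placed) and ins + outs <= w_bound and height <= h_bound:
                    if best is None or ins + outs < best[0]:
                        best = (ins + outs, g)
            if best is None:
                break  # no fanout fits within both bounds
            cluster.add(best[1])
            placed.add(best[1])
        plas.append(cluster)
    return plas


# Hypothetical network: primary inputs a, b, c, d; primary output w.
net = {"x": ["a", "b"], "y": ["x", "c"], "z": ["x", "d"], "w": ["y", "z"]}
ones = {n: 1 for n in net}
print([sorted(p) for p in decompose(net, ones, {"w"}, w_bound=10, h_bound=10)])
print([sorted(p) for p in decompose(net, ones, {"w"}, w_bound=10, h_bound=2)])
```

With loose bounds the whole network fits in one PLA; tightening H splits it into {x, y} and {z, w}, and every inter-PLA wire runs from an earlier PLA to a later one.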
Choice of W, H • Choice of W • Driven by synthesis constraints • Large W means larger runtimes • espresso and folding done in inner loop • Used W between 25 and 50 • Choice of H • Driven by power considerations • Large H also affects synthesis runtimes • Used H between 15 and 40
Fabric3 - Decomposition • [Figure: step-by-step decomposition of a network with signals a through g]
Place/Route Flow • PLA generation using Perl script • Layout generated on the fly • 2 Layer experiments: • Placement using vpr • FPGA placement tool • All PLAs have approximately the same size • Routing using wolfe • interface to TimberWolfSC and yacr • 3-6 Layer experiments: • Placement using CADENCE qplace • Routing using CADENCE router
Fabric3 - Results • Timing results essentially unchanged • For C3540, delay variation due to cross-talk is 3.45:1 (Stdcell) versus 1.07:1 (Fabric3)
Future Tasks • Better algorithms: • Better ways of decomposing original netlist • Refining the fabric: • Alternative denser fabrics • Encoding PLA inputs [Schmookler80] • Connecting gates to PLA outputs • Alternative implementation of logic blocks: • Different PLA styles • Alternative circuits
Summary • Layout fabrics to eliminate cross-talk in DSM VLSI design • New layout and design paradigm • Fix cross-talk by design • Highly regular and predictable • Network of PLA based design flow • PLA decomposition algorithms • Minimal area penalty • 15% timing improvement