This paper explores a new layout and design paradigm called Layout Fabrics, which handles cross-talk and improves signal integrity in modern IC processes. The advantages and disadvantages of Layout Fabrics are discussed, along with experimental results and comparisons with traditional design approaches. The paper also presents the use of Fabric1 and Fabric3 in the design flow, along with their results and contributions.
A Deep Sub-Micron VLSI Design Flow using Layout Fabrics • Sunil P. Khatri, University of Colorado, Boulder • Amit Mehrotra, University of Illinois, Urbana-Champaign • Robert K. Brayton and Alberto L. Sangiovanni-Vincentelli, University of California, Berkeley
Our VLSI Design Flow • Logic netlist → Logic Optimization → Optimized logic netlist → Technology Mapping → Placement → Routing → Layout
Motivation • Modern IC processes • Feature size well below 1 micron • Certain electrical effects increasingly important • Cross-talk • Electromigration • Self-heating • Statistical variations • Logic abstraction eroded • Existing design paradigms need to be rethought
Research Focus • The cross-talk issue • Tackled in an ad-hoc manner • Increases turn-around time • Verified cross-talk trends • Accurate 3-D capacitance extraction • Delay variation 2.47:1 (200 µm wires, 10X minimum-size drivers, 0.1 µm technology) • [Figure: aggressor (a) and victim (v) wires with capacitances C1 and C2]
Outline • Previous Approaches • New idea: The Fabric Approach • Fabric1 (in DAC-1999) • Standard-cell based design • Fabric3 (in ICCAD-2000) • Design based on a network of PLAs • Future Tasks • Summary
Previous Approaches • [ALPHA 97]: • Metal layers 3 and 6 dedicated to power • Not viable in future processes • [Rubio 94]: • Functional analysis based on layout • Post-layout methods don’t scale • [Kirkpatrick 94, 96]: • Concept of digital sensitivity • Requires don’t-care and image computations
Solution: Layout Fabrics • We handle cross-talk by design • A new layout and design paradigm • Repeating dense wiring fabric (DWF) pattern at minimum pitch • [Figure: DWF pattern of signal (S), VDD (V), and GND (G) tracks]
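The defining property of the DWF is that signal tracks are interleaved with supply tracks so that no two signal wires are routed side by side. The sketch below illustrates that property under a loud assumption: it uses a hypothetical period-4 pattern V S G S (VDD, signal, GND, signal), since the exact pattern in the original figure did not survive extraction. The names `DWF_PERIOD`, `dwf_tracks`, and `every_signal_shielded` are illustrative only.

```python
# Hypothetical period-4 DWF track pattern: VDD, signal, GND, signal.
# This is an assumption for illustration; the slides' actual pattern may differ.
DWF_PERIOD = ("V", "S", "G", "S")


def dwf_tracks(n_tracks: int) -> list[str]:
    """Return the track type (V = VDD, G = GND, S = signal) for each track."""
    return [DWF_PERIOD[i % len(DWF_PERIOD)] for i in range(n_tracks)]


def every_signal_shielded(tracks: list[str]) -> bool:
    """True if no signal track has another signal track as an in-layer neighbour.

    Tracks at the edge of the array only have their single neighbour checked.
    """
    for i, kind in enumerate(tracks):
        if kind != "S":
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(tracks) and tracks[j] == "S":
                return False
    return True


tracks = dwf_tracks(8)
print("".join(tracks))                                # VSGSVSGS
print(every_signal_shielded(tracks))                  # True
print(every_signal_shielded(["V", "S", "S", "G"]))    # False: adjacent signals
```

Because every signal sits next to a supply wire at minimum pitch, its coupling is dominated by quiet supply lines rather than switching neighbours, which is the mechanism the following slides quantify.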
Research Contribution • Verify cross-talk trends • Fabric1 [KMBSO99] (in DAC) • Incorporated into traditional design flow • Fabric3 [KBS00] (in ICCAD-00) • Network of PLAs • Detailed electrical characterization • Synthesis, wire removal algorithms • Both utilize DWF pattern • 1.02:1 cross-talk delay variation
Layout Fabrics • Advantages • Pre-characterized parasitics • Uniform, low cross-coupling capacitance • 40X lower, 2% delay variation • Uniform, low signal inductance • Automatic power and ground routing • Uniform, low power and ground resistance • Can effectively implement regular structures • Disadvantages • 5% increase in total capacitance • Area penalty • Power increase
Capacitance in DWF • Experimental setup • “Strawman” process model, copper wires, low-K dielectric • Capacitances from 3-D field solver (space3d) • Simulated three wires in spice • 0.1 micron process, Metal2 wires • Length 200 microns, 10x minimum drivers • Non-DWF • Delay variation 2.47:1 • Signal integrity problems for fast slew rates • With DWF • 40X reduction in cross-coupling capacitance • Delay variation 1.02:1, no signal integrity problem
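Why a 40X cut in coupling capacitance collapses the delay spread can be seen with a first-order Miller-factor model. This is a simplified sketch, not the 3-D extraction and SPICE methodology of the slide: it assumes delay is proportional to the effective switched load C_g + k·C_c, where C_g is capacitance to ground, C_c is total coupling to both neighbours, and the Miller factor k ranges from 0 (neighbours switch with the victim) to 2 (neighbours switch against it). The function names are hypothetical, and the model is not expected to reproduce the reported 2.47:1 and 1.02:1 figures exactly.

```python
def effective_load(c_ground: float, c_couple: float, miller: float) -> float:
    """Effective switched capacitance seen by a victim wire (first-order model).

    miller = 0: both neighbours switch in the same direction as the victim
    miller = 1: neighbours are quiet
    miller = 2: both neighbours switch in the opposite direction
    """
    if not 0.0 <= miller <= 2.0:
        raise ValueError("Miller factor must lie in [0, 2]")
    return c_ground + miller * c_couple


def delay_variation(c_ground: float, c_couple: float) -> float:
    """Worst-case / best-case delay ratio, assuming delay is proportional to load."""
    worst = effective_load(c_ground, c_couple, 2.0)
    best = effective_load(c_ground, c_couple, 0.0)
    return worst / best


# Illustrative ratios only, not extracted values.
print(delay_variation(1.0, 0.75))        # 2.5
print(delay_variation(1.0, 0.75 / 40))   # close to 1 after a 40X coupling cut
```

The ratio reduces to 1 + 2·C_c/C_g, so shrinking C_c relative to C_g is what drives the variation toward 1:1.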
Inductance in the DWF • Low and uniform in DWF • Current return path is at minimum spacing • In regular layout style, varies greatly • Problems reported for clock signals • Compared inductance of Metal8 trace • Verified using ASITIC • [Figure: inductance (nH/micron)]
VDD/GND Resistance in DWF • Check resistance at various points in DWF • Compare with standard cell case • Varies greatly • Measured at end of row • L/W = 1000/8 • [Figure: VDD/GND resistance (ohms)]
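As a worked first-order estimate of what L/W = 1000/8 means for a standard-cell supply rail, the usual sheet-resistance relation can be applied, with $R_{\square}$ denoting the (unspecified) sheet resistance of the rail metal. This relation is standard background rather than something stated on the slide:

```latex
R_{\text{rail}} = R_{\square}\,\frac{L}{W} = R_{\square}\cdot\frac{1000}{8} = 125\,R_{\square}
```

So the end-of-row point sits 125 squares of metal away from the supply, and the resistance grows linearly along the row, which is why it varies greatly in the standard-cell case. In the DWF, supply wires are distributed throughout the routing area, which the slides credit with uniform, low power and ground resistance.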
Buffer Insertion in DWF • Easily performed • VDD and GND available all over routing area
Fabric1 - Introduction • DWF pattern utilized chip-wide • Library cells implemented in this pattern • [Figure: standard cell vs. fabric cell] • Synthesis, placement and routing use standard cell methodology
Fabric3 • Network of Programmable Logic Arrays • Combine many logic nodes into a PLA • Routing area utilizes DWF pattern • PLA implements a multi-output function • Example: f = ab + c; g = ab + c • [Figure: PLA with AND plane and OR plane, inputs a, b, c and outputs f, g]
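Functionally, a PLA computes a multi-output sum of products: each AND-plane row (word line) forms one product term, and each OR-plane column ORs the rows it connects to. The sketch below models only that logic, not the pre-charged circuit. It uses f = ab + c from the slide plus a hypothetical second output h = ab, added purely to show a product term shared between outputs; `and_plane`, `or_plane`, and `Cube` are illustrative names.

```python
from typing import Dict, List

# A product term (one AND-plane row) maps each input it uses to the required
# value: 1 for the true literal, 0 for the complemented literal.
Cube = Dict[str, int]


def and_plane(cubes: List[Cube], inputs: Dict[str, int]) -> List[int]:
    """Evaluate every product term (word line) of the AND plane."""
    return [int(all(inputs[var] == val for var, val in cube.items())) for cube in cubes]


def or_plane(rows: List[int], connections: Dict[str, List[int]]) -> Dict[str, int]:
    """Each output ORs together the product-term rows it is connected to."""
    return {out: int(any(rows[i] for i in idx)) for out, idx in connections.items()}


# f = ab + c (from the slide); h = ab is a hypothetical output reusing row 0.
cubes = [{"a": 1, "b": 1}, {"c": 1}]
connections = {"f": [0, 1], "h": [0]}

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            rows = and_plane(cubes, {"a": a, "b": b, "c": c})
            print(a, b, c, or_plane(rows, connections))
```

Sharing a row between outputs costs no extra AND-plane area, which is one reason collapsing several logic nodes into one PLA can be compact.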
Fabric3 PLA Core Layout • [Figure: PLA core layout with inputs a, b, outputs f, g, and clock clk]
PLAs vs. Standard Cells • PLAs are dense and fast • [Figure: PLA vs. standard cell comparison]
PLA Characteristics • Why are PLA area and delay so low? • Wiring localized within PLA • PLA core transistor sizes are minimum • No p-transistor to n-transistor diffusion spacing • “Gigahertz” chip used pre-charged PLAs • High performance • Quick implementation • Did not use a network of PLAs
Network of PLAs • PLAs are pre-charged • Inputs to all PLAs must settle before evaluation begins • [Figure: network of PLAs with signals a through g]
Network of PLAs • For correct operation: • PLA dependency graph must be acyclic • Evaluation of PLAi after completion of slowest PLAj in its “fanin” • Self-timed design style • Each PLA generates a completion signal • Overhead of one wordline, one output • Delay formula to find slowest PLAj
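The evaluation rule above amounts to a longest-path computation over the acyclic PLA dependency graph: a PLA starts once the slowest PLA in its fanin has raised its completion signal. The sketch below assumes per-PLA delays are already known (the slides' own delay formula is not reproduced here) and that primary inputs are ready at t = 0; `evaluation_schedule` and the PLA names are hypothetical. It uses Python's `graphlib.TopologicalSorter`, which raises `CycleError` on a cyclic graph, matching the acyclicity requirement.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def evaluation_schedule(fanins, delay):
    """Earliest evaluation start and completion time of each PLA.

    fanins: PLA name -> PLA names that feed it (primary inputs omitted)
    delay:  PLA name -> evaluation delay; every PLA must appear here

    A PLA may start only after the slowest PLA in its fanin completes;
    primary inputs are assumed ready at t = 0.
    Raises graphlib.CycleError if the PLA dependency graph has a cycle.
    """
    start, done = {}, {}
    for pla in TopologicalSorter(fanins).static_order():
        start[pla] = max((done[p] for p in fanins.get(pla, ())), default=0.0)
        done[pla] = start[pla] + delay[pla]
    return start, done


start, done = evaluation_schedule(
    {"P3": ["P1", "P2"], "P2": ["P1"]},
    {"P1": 1.0, "P2": 2.0, "P3": 0.5},
)
print(done)  # {'P1': 1.0, 'P2': 3.0, 'P3': 3.5}
```

In the self-timed scheme, the completion signal replaces this static calculation at run time; the sketch only shows which fanin PLA ends up gating each evaluation.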
Decomposition • Algorithm collapses wiring into PLAs • Input: multi-level combinational network, W bound, H bound • Output: Correct network of PLAs • Our algorithm greedily grows a PLA until either bound is violated • Attempts to reduce wires by selecting fanouts for inclusion in the PLA being grown
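A greedy PLA-growing loop in the spirit of this slide can be sketched as follows. This is not the authors' algorithm; several details are assumptions. Width is approximated as PLA inputs plus outputs, height as the sum of per-node cube counts (no espresso minimization or folding), and a node may join the current PLA only once all its fanins are already placed, which keeps the PLA dependency graph acyclic. Among eligible fanouts, the one absorbing the most wires already inside the PLA is chosen. All function names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def pla_width(nodes, network, fanouts, outputs):
    """Inputs plus outputs of a candidate PLA (an assumed proxy for W)."""
    ins = {f for n in nodes for f in network[n][0] if f not in nodes}
    outs = {n for n in nodes if n in outputs or any(m not in nodes for m in fanouts[n])}
    return len(ins) + len(outs)


def pla_height(nodes, network):
    """Sum of per-node cube counts (a crude proxy for H; no term sharing)."""
    return sum(network[n][1] for n in nodes)


def decompose(network, outputs, w_bound, h_bound):
    """Greedily grow PLAs until adding a node would violate the W or H bound.

    network: node -> (list of fanin names, cube count); names absent from
             `network` are primary inputs. Returns a list of node sets.
    """
    fanouts = {n: [] for n in network}
    for n, (fins, _) in network.items():
        for f in fins:
            if f in fanouts:
                fanouts[f].append(n)
    deps = {n: [f for f in fins if f in network] for n, (fins, _) in network.items()}
    remaining = set(network)
    plas = []
    for seed in TopologicalSorter(deps).static_order():
        if seed not in remaining:
            continue  # already absorbed into an earlier PLA
        pla = {seed}
        remaining.discard(seed)
        while True:
            # Fanouts of the PLA whose fanins are all placed (here or earlier).
            cands = {m for n in pla for m in fanouts[n]
                     if m in remaining and all(f not in remaining for f in network[m][0])}
            if not cands:
                break
            # Prefer the fanout that absorbs the most wires already in the PLA.
            best = max(sorted(cands), key=lambda m: sum(f in pla for f in network[m][0]))
            trial = pla | {best}
            if (pla_width(trial, network, fanouts, outputs) > w_bound
                    or pla_height(trial, network) > h_bound):
                break
            pla = trial
            remaining.discard(best)
        plas.append(pla)
    return plas


net = {"n1": (["a", "b"], 1), "n2": (["n1", "c"], 2)}
print([sorted(p) for p in decompose(net, {"n2"}, 10, 10)])  # [['n1', 'n2']]
print([sorted(p) for p in decompose(net, {"n2"}, 10, 1)])   # [['n1'], ['n2']]
```

When n1 and n2 share a PLA, the wire between them becomes internal to the PLA rather than a routed net, which is the wiring reduction the slide describes.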
Choice of W, H • Choice of W • Driven by synthesis constraints • Large W means larger runtimes • espresso and folding done in inner loop • Used W between 25 and 50 • Choice of H • Driven by power considerations • Large H also affects synthesis runtimes • Used H between 15 and 40
Fabric3 - Decomposition • [Figure: decomposition of a network with nodes a through g into PLAs numbered 1 to 4]
Place/Route Flow • PLA generation using perl script • Layout generated on the fly • 2 Layer experiments: • Placement using vpr • FPGA placement tool • All PLAs have approximately same size • Routing using wolfe • interface to TimberWolfSC and yacr • 3-6 Layer experiments: • Placement using CADENCE qplace • Routing using CADENCE router
Fabric3 - Results • Timing results essentially unchanged • For C3540, delay variation due to cross-talk is 3.45:1 (Stdcell) versus 1.07:1 (Fabric3)
Future Tasks • Better algorithms: • Better ways of decomposing original netlist • Refining the fabric: • Alternative denser fabrics • Encoding PLA inputs [Schmookler80] • Connecting gates to PLA outputs • Alternative implementation of logic blocks: • Different PLA styles • Alternative circuits
Summary • Layout fabrics to eliminate cross-talk in DSM VLSI design • New layout and design paradigm • Fix cross-talk by design • Highly regular and predictable • Design flow based on a network of PLAs • PLA decomposition algorithms • Minimal area penalty • 15% timing improvement