
CSE241 VLSI Digital Circuits Winter 2003 Lecture 07: Timing II


Presentation Transcript


  1. CSE241 VLSI Digital Circuits, Winter 2003, Lecture 07: Timing II

  2. Delay Calculation
  • Library table lookup, indexed by input transition time and output load: input slew 0.1 ns (fall) / 0.12 ns (rise), output load 1.0 pF
  • Fall delay = 0.178 ns
  • Rise delay = 0.261 ns
  • Fall transition = 0.147 ns
  • Rise transition = …

  3. PVT (Process, Voltage, Temperature) Derating
  • Actual cell delay = Original delay × KPVT

  4. PVT Derating: Example + Min/Typ/Max Triples
  • Proc_var (0.5 : 1.0 : 1.3), Voltage (5.5 : 5.0 : 4.5), Temperature (0 : 20 : 50)
  • KP = 0.80 : 1.00 : 1.30
  • KV = 0.93 : 1.00 : 1.08
  • KT = 0.80 : 1.07 : 1.35
  • KPVT = KP × KV × KT = 0.60 : 1.07 : 1.90
  • Cell delay = 0.261 ns → derated delay = 0.157 : 0.279 : 0.496 {min : typical : max}
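The derating arithmetic on this slide can be verified with a short script (a minimal sketch; the triples are taken directly from the slide, and small differences from the slide's printed values are rounding):

```python
# PVT derating: actual delay = original delay * KPVT, where
# KPVT = KP * KV * KT, each factor given as a (min, typ, max) triple.
KP = (0.80, 1.00, 1.30)  # process
KV = (0.93, 1.00, 1.08)  # voltage
KT = (0.80, 1.07, 1.35)  # temperature

KPVT = tuple(kp * kv * kt for kp, kv, kt in zip(KP, KV, KT))
cell_delay = 0.261  # ns (rise delay from the table-lookup example)
derated = tuple(cell_delay * k for k in KPVT)

print("KPVT    =", [round(k, 2) for k in KPVT])     # ~ [0.60, 1.07, 1.90]
print("derated =", [round(d, 3) for d in derated])  # ~ slide's 0.157 : 0.279 : 0.496 ns
```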

  5. Conservatism of Gate Delay Modeling
  • True gate delay depends on input arrival time patterns
  • STA will assume that only 1 input is switching
  • Will use worst slope among several inputs
  [Figure: output waveform and tpd when both inputs A, B switch vs. when only input A switches]

  6. This Class + Logistics
  • Reading
  • Smith, Chapters 15, 16
  • http://vlsicad.ucsd.edu/Presentations/ICCAD00TUTORIAL/
  • Possibly: Sarrafzadeh/Wong Chapters 2 (placement), 3 (routing), (4: performance modeling)
  • Schedule
  • MT will be take-home (and easy), BUT you lose 5% if you don’t show up on Thursday (attendance will be taken by Ben)
  • Thursday: surprise guest lecturer on floorplan / placement
  • HW #12: Suppose that you want to work on timing edges that are most critical according to some F(slack of the edge, #paths through the edge). How would you modify the STA calculation (longest path in a DAG) so that it also calculates the number of paths through each edge?
  Slide courtesy of S. P. Levitan, U. Pittsburgh
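For HW #12, one standard way to count paths per edge in a DAG (sketched here as a starting point, not as the official solution) is to multiply, for each edge (u, v), the number of source-to-u paths by the number of v-to-sink paths; both counts propagate in topological order, just like arrival and required times in STA:

```python
from collections import defaultdict, deque

def paths_through_edges(n, edges, src, snk):
    """For a DAG with nodes 0..n-1, count the src->snk paths through each
    edge (u, v) as fwd[u] * bwd[v]."""
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    # Topological order (Kahn's algorithm)
    order, q = [], deque(i for i in range(n) if indeg[i] == 0)
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    fwd = [0] * n          # number of paths from src to each node
    fwd[src] = 1
    for u in order:
        for v in adj[u]:
            fwd[v] += fwd[u]
    bwd = [0] * n          # number of paths from each node to snk
    bwd[snk] = 1
    for u in reversed(order):
        for v in adj[u]:
            bwd[u] += bwd[v]
    return {(u, v): fwd[u] * bwd[v] for u, v in edges}

# Tiny diamond example: two src->snk paths, one through each branch
counts = paths_through_edges(4, [(0, 1), (0, 2), (1, 3), (2, 3)], 0, 3)
print(counts)
```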

  7. Buffer Clustering
  • Hierarchical clustering connecting clock source (= root) to clock sinks (= leaves) of clustering tree
  • Fanout at each level between 5 and 200 (depends on buffer library)
  • Often specify a clock topology in the tool as, e.g., (1)-6-8-5: root has 6 children, each of which has 8 children, each of which has 5 (leaf) children → 240 clock sinks
  • Big question: how to perform the hierarchical buffer clustering? What makes a “good” cluster?
  Sylvester / Shepard, 2001

  8. Buffer Clustering by Space Partitioning
  • Example: Cadence CT-Gen
  • Pick fanout (e.g., 6-4)
  • Pick “long axis” of bounding box of sinks
  • Place buffers at medians (essentially) of chunks of sinks identified by space-partitioning
  • Why is this good? Uses (or assumes) min wire; easily routed (Steiner routing); robust to ECOs; …
  • Why is it bad? Oversizes drivers; commits to skew which could be avoided
  Sylvester / Shepard, 2001
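The space-partitioning recipe above can be sketched in a few lines (a toy illustration, not CT-Gen itself; the sink coordinates, the fanout list, and the median-sink buffer placement are simplifying assumptions):

```python
def cluster(sinks, fanouts):
    """Recursively partition sink locations (x, y) along the long axis of
    their bounding box into fanouts[0] chunks; place one buffer near the
    median of each chunk, then recurse with the remaining fanout levels."""
    if not fanouts or len(sinks) == 0:
        return []
    k = fanouts[0]
    xs = [p[0] for p in sinks]
    ys = [p[1] for p in sinks]
    # "Long axis" of the sinks' bounding box
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(sinks, key=lambda p: p[axis])
    size = -(-len(ordered) // k)  # ceil division: sinks per chunk
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    buffers = []
    for c in chunks:
        buffers.append(c[len(c) // 2])        # ~median sink as buffer site
        buffers.extend(cluster(c, fanouts[1:]))  # recurse on next level
    return buffers

# 24 sinks on a line, 6-4 topology: 6 level-1 buffers + 24 leaf buffers
bufs = cluster([(i, 0) for i in range(24)], [6, 4])
print(len(bufs))
```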

  9. Buffer Clustering by Traditional Clustering
  • Example: SPC, old Cell3 CTS
  • Pick fanout (e.g., 6)
  • Find clusters of size 6
  • Place buffers at centers or centroids or … of clusters
  • Recurse
  • Why is this good? Can get near-zero skew trees?
  • Why is this bad? ECOs; hard to route; more wire(?); difficult algorithms!
  • HW #13: Propose a hierarchical clustering strategy for buffered clock trees, and explain its pros and cons
  Sylvester / Shepard, 2001

  10. Outline
  • Clocking
  • Storage elements
  • Clocking metrics and methodology
  • Clock distribution
  • Package and useful-skew degrees of freedom
  • Clock power issues
  • Gate timing models

  11. Skew Reduction Using Package
  • Most clock network latency occurs at global level (largest distances spanned)
  • Latency → skew
  • With reverse scaling, routing low-RC signals at global level becomes more difficult & area-consuming
  Sylvester / Shepard, 2001

  12. Skew Reduction Using Package
  • Incorporate global clock distribution into the package
  • Flip-chip packaging allows for high density, low parasitic access from substrate to IC
  • RC of package-level wiring up to 4 orders of magnitude smaller than on-chip wiring
  • Global skew reduced
  • Lower capacitance → lower power
  • Opens up global routing tracks
  • Results not yet conclusive
  [Figure: µP/ASIC connected by solder bumps to a substrate carrying the system clock]
  Sylvester / Shepard, 2001

  13. Useful Skew (= cycle-stealing)
  • Zero skew: global skew constraint; all skew is bad
  • Useful skew: local skew constraints; shift slack to critical paths
  [Figure: FF chains with fast/slow paths, comparing setup/hold timing slacks under zero skew vs. useful skew]
  W. Dai, UC Santa Cruz

  14. Skew = Local Constraint
  • D : longest path delay, d : shortest path delay between a launching and a capturing FF
  • -d + thold < Skew < Tperiod - D - tsetup
  • Below the permissible range: race condition; inside: safe; above: cycle time violation
  • Timing is correct as long as the signal arrives in the permissible skew range
  W. Dai, UC Santa Cruz
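The local constraint can be written as a small checker (the path delays, setup, and hold values below are invented for illustration):

```python
def permissible_skew_range(d, D, T, t_setup, t_hold):
    """Skew (launch clock arrival minus capture clock arrival) must satisfy
    -d + t_hold < skew < T - D - t_setup."""
    return (-d + t_hold, T - D - t_setup)

def classify(skew, lo, hi):
    if skew <= lo:
        return "race condition"        # hold-side failure
    if skew >= hi:
        return "cycle time violation"  # setup-side failure
    return "safe"

# Hypothetical numbers: shortest path 1 ns, longest 5 ns, 6 ns period
lo, hi = permissible_skew_range(d=1.0, D=5.0, T=6.0, t_setup=0.2, t_hold=0.1)
print(lo, hi)                  # permissible range is (-0.9, 0.8)
print(classify(0.0, lo, hi))
```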

  15. Skew Scheduling for Design Robustness
  • Design will be more robust if clock signal arrival time is in the middle of the permissible skew range, rather than on an edge
  • Can solve a linear program to maximize robustness = determine prescribed sink skews
  • Example: three FFs in a chain with path delays of 6 ns and 2 ns, T = 6 ns; schedule “0 0 0” is at the verge of violation, schedule “2 0 2” gives more safety margin
  [Figure: FF chain annotated with per-stage slacks]
  W. Dai, UC Santa Cruz
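The robustness claim can be checked numerically; below is a sketch assuming zero setup/hold times and one ordering of the two stages (the 2 ns stage feeding the 6 ns stage) that is consistent with the slide's “2 0 2” schedule:

```python
def stage_margins(t, stages, T, t_setup=0.0, t_hold=0.0):
    """t: clock arrival time at each FF; stages: (min, max) path delay
    between consecutive FFs. Returns (setup_margin, hold_margin) per stage;
    both must be >= 0 for correct timing."""
    out = []
    for i, (d, D) in enumerate(stages):
        setup = (t[i + 1] + T) - (t[i] + D + t_setup)  # capture edge slack
        hold = (t[i] + d) - (t[i + 1] + t_hold)        # hold slack
        out.append((setup, hold))
    return out

stages = [(2.0, 2.0), (6.0, 6.0)]  # assumed stage order (illustrative)
T = 6.0
worst = lambda t: min(m for pair in stage_margins(t, stages, T) for m in pair)
print(worst([0, 0, 0]))  # worst margin 0: at the verge of violation
print(worst([2, 0, 2]))  # worst margin 2: safety margin on every check
```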

  16. Potential Advantages of Useful Skew
  • Reduce peak current consumption by distributing the FF switch points in the range of permissible skew
  • Affords extra margin to increase clock frequency or reduce sizing (= power)
  [Figure: CLK waveforms comparing FF switching under 0-skew vs. U-skew]
  W. Dai, UC Santa Cruz

  17. Conventional Zero-Skew Flow
  • Synthesis → Placement → 0-Skew Clock Synthesis → Clock Routing → Signal Routing → Extraction & Delay Calculation → Static Timing Analysis
  W. Dai, UC Santa Cruz

  18. Useful-Skew Flow
  • Existing Placement → U-Skew Clock Synthesis → Clock Routing → Signal Routing → Extraction & Delay Calculation → Static Timing Analysis
  • U-Skew Clock Synthesis: permissible range generation → initial skew scheduling → clock tree topology synthesis → clock net routing → clock timing verification
  W. Dai, UC Santa Cruz

  19. Outline
  • Clocking
  • Storage elements
  • Clocking metrics and methodology
  • Clock distribution
  • Package and useful-skew degrees of freedom
  • Clock power issues
  • Gate timing models

  20. Clock Power
  • Power consumption in clocks due to:
  • Clock drivers
  • Long interconnections
  • Large clock loads: all clocked elements (latches, FF’s) are driven
  • Different components dominate depending on type of clock network used
  • Ex. grid: huge pre-drivers & wire cap. drown out load cap.
  Sylvester / Shepard, 2001

  21. Clock Power Is LARGE
  • P = a · C · Vdd² · f
  • Not only is the clock capacitance large, it switches every cycle!
  Sylvester / Shepard, 2001
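A back-of-the-envelope evaluation of P = a·C·Vdd²·f (all numbers below are invented for illustration, not from the slide):

```python
# Dynamic power: P = a * C * Vdd^2 * f
# For the clock net the activity factor a is taken as 1: it toggles every cycle.
a = 1.0      # activity factor (clock switches each cycle)
C = 1.0e-9   # total clock capacitance in farads (assumed 1 nF)
Vdd = 1.8    # supply voltage in volts (assumed)
f = 1.0e9    # clock frequency in Hz (assumed 1 GHz)

P = a * C * Vdd**2 * f
print(f"clock power = {P:.2f} W")  # 3.24 W for these assumed numbers
```

Even this modest example yields watts of dissipation, which is why the slide stresses that clock power dominates.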

  22. Low-Power Clocking
  • Gated clocks
  • Prevent switching in areas of chip not being used
  • Easier in static designs
  • Edge-triggered flops in ARM rather than transparent latches in Alpha
  • Reduced load on clock for each latch/flop
  • Eliminated spurious power-consuming transitions during latch flow-through (transparency)
  Sylvester / Shepard, 2001

  23. Clock Area
  • Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area
  • Routing area is most vital
  • Top-level metals are used to reduce RC delays
  • These levels are precious resources (unscaled): power routing, clock routing, key global signals
  • Reducing area also reduces wiring capacitance and power
  • Typical #’s: Intel Itanium uses 4% of M4/M5 in clock routing
  Sylvester / Shepard, 2001

  24. Clock Slew Rates
  • To maintain signal integrity and latch performance, minimum slew rates are required
  • Too slow: clock is more susceptible to noise, latches are slowed down, setup times eat into timing budget [Tsetup = 200 + 0.33 × Tslew (ps)], more short-circuit power for large clock drivers
  • Too fast: burns too much power, overdesigned network, enhanced ground bounce
  • Rule of thumb: Trise and Tfall of clock are each 10-20% of clock period (10% = aggressive target)
  • Ex.: 1 GHz clock → Trise = Tfall = 100-200 ps
  Sylvester / Shepard, 2001
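The rule of thumb and the slide's slew-dependent setup model combine in a few lines (the setup formula and the 10-20% targets are from the slide; treating the formula as exact for a 1 GHz clock is the only assumption):

```python
def setup_time_ps(t_slew_ps):
    # Slide's latch model: Tsetup = 200 + 0.33 * Tslew (all in ps)
    return 200 + 0.33 * t_slew_ps

f_clk = 1e9                  # 1 GHz clock
period_ps = 1e12 / f_clk     # 1000 ps period
slew_lo = 0.10 * period_ps   # aggressive 10% target
slew_hi = 0.20 * period_ps   # relaxed 20% target

print(round(slew_lo), round(slew_hi))          # the slide's 100-200 ps range
print(round(setup_time_ps(slew_lo), 1))        # setup time at 10% slew
print(round(setup_time_ps(slew_hi), 1))        # setup time at 20% slew
```

Note how a slower edge (200 ps vs. 100 ps) costs an extra ~33 ps of setup time straight out of the cycle budget.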

  25. Example: Alpha 21264
  • Grid + H-tree approach
  • 4 major clock quadrants, each with a large driver connected to local grid structures
  • Power = 32% of total
  • Wire usage = 3% of metals 3 & 4
  Sylvester / Shepard, 2001

  26. Alpha 21264 Skew Map
  Ref: Compaq, ASP-DAC00
  Sylvester / Shepard, 2001

  27. Power vs. Skew
  • Fundamental design decision
  • Meeting skew requirements is easy with unlimited power budget
  • Wide wires reduce RC product but increase total C
  • Driver upsizing reduces latency (→ reduces skew as well) but increases buffer cap
  • SOC context: plastic package → power limit is 2-3 W
  Sylvester / Shepard, 2001

  28. Clock Distribution Trends
  • Timing
  • Clock period dropping fast; skew must follow
  • Slew rates must also scale with cycle time
  • Jitter: PLLs get better with CMOS scaling but other sources of noise increase (power supply noise more important; switching-dependent temperature gradients)
  • Materials
  • Cu reduces RC slew degradation, potential skew
  • Low-k decreases power, improves latency, skew, slews
  • Power
  • Complexity, dynamic logic, pipelining → more clock sinks
  • Larger chips → bigger clock networks
  Sylvester / Shepard, 2001
