1 / 33

CMOS VLSI DESIGN

CMOS VLSI DESIGN. Kasin Vichienchom kvkasin@kmitl.ac.th Lecture#6. Timing Issues. Clock Non-Ideality Clock Skew Jitter Clock Distribution Networks Metrics Types Examples. Clock Non-Ideality. Clock skew ( phase shift between clocks)

forbes
Download Presentation

CMOS VLSI DESIGN

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CMOS VLSI DESIGN Kasin Vichienchom kvkasin@kmitl.ac.th Lecture#6

  2. Timing Issues • Clock Non-Ideality • Clock Skew • Jitter • Clock Distribution Networks • Metrics • Types • Examples

  3. Clock Non-Ideality • Clock skew ( phase shift between clocks) • Spatial variation in temporally equivalent clock edges; deterministic + random, tSK • Due to RC delay in clock distribution which is depending on the difference indistance, clock load, clock driver, process variation • Clock jitter (clock period variation) • Temporal variations in consecutive edges of the clock signal; modulation + random noise • Cycle-to-cycle (short-term) tJS • Long term tJL • Variation of the pulse width • Important for level sensitive clocking Source : Jan Rabaey

  4. Clk tSK Clk tJS Clock Non-Ideality Skew and Jitter Source : Jan Rabaey

  5. Clock Skew-Negative Source : Jan Rabaey

  6. Clock Skew-Positive Source : Jan Rabaey

  7. Effect of Clock Skew • No Skew Constrain: TCLK≥ tctq + tmax + tsetup

  8. Effect of Clock Skew • Negative Skew If tmax = t1: setup time violation If tmax = t2: zero clocking (load old data) Constrain: TCLK≥ tctq + tmax + tsetup +| δ|

  9. Effect of Clock Skew • Positive Skew If tmin = t1: double clocking (take data of the next state) If tmin = t2: hold time violation Constrain: tctq + tmin≥ thold +δ

  10. Clock Distribution • Clock Distribution Metrics • Power • Area • Skew • Types of Clock Distribution • Tree • Grid • Hybrid • Case Studies: DEC Alpha µProcessor Source : Jan Rabaey

  11. Metrics-Power Power dissipation of clock network: P = αCLVDD2f α = 1 Very large Source : Jan Rabaey

  12. Metrics-Area • Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area • Routing area is most vital • Top-level metals are used to reduce RC delays • These levels are precious resources (unscaled) • Power routing, clock routing, key global signals • By minimizing area used, we also reduce wiring capacitance and power • Typical number: Intel Itanium – 4% of M4/5 used in clock routing Source : Jan Rabaey

  13. Clock: Slew Rate • To maintain signal integrity and latch performance, minimum slew rates are required • Too slow – clock is more susceptible to noise, latches are slowed down, eats into timing budget • Too fast – burning too much power, overdesigned network, enhanced ground bounce • Rule-of-thumb: tr and tf of clock are each between 10-20% of clock period • Example:1 GHz clock; tr = tf =100-200 ps Source : Jan Rabaey

  14. Clock: Technology Trend • Heavily pipelined designs • more latches • more capacitive load for clock • Larger chips • more wirelength needed to cover the entire die • Complexity • more functionality and devices • more clocked elements (loads) • Dynamic logic • more clocked elements (loads) Source : Jan Rabaey

  15. GCLK Driver r D GCLK GCLK e r v i i v r e D r Driver GCLK Grid System • No RC delay matching • Large Power(huge drivers) Source : Jan Rabaey

  16. Grid System • Gridded clock distribution was common on earlier DEC Alpha microprocessors • Advantages: • Skew determined by grid density and not overly sensitive to load position • Clock signals are available everywhere • Tolerant to process variations • Usually yields extremely low skew values Source : Jan Rabaey

  17. Grid System Disadvantages: • Huge amounts of wiring & power • Wire cap large • Strong drivers needed – pre-driver cap large • Routing area large • To minimize all these penalties, make grid pitch coarser • Skew gets worse • Losing the main advantage • Don’t overdesign – let the skew be as large as tolerable • Grids aren’t feasible for most designs due to power Source : Jan Rabaey

  18. H-Tree System • Original H-tree (Bakoglu) • One large central driver • Recursive H-style structure to match wirelengths • Halve wire width at branching points to reduce reflections Source : Jan Rabaey

  19. H-Tree System • Drawback of original tree concept • slew degradation along long RC paths • unrealistically large central driver • Clock drivers can create large temperature gradients (ex. Alpha 210640 ~30° C) • non-uniform load distribution • Inherently non-scalable (wire resistance skyrockets) • Solution to some problems • Introduce intermediate buffers along the way • Specifically at branching points Source : Jan Rabaey

  20. Buffered H-Tree • Advantages – Ideally zero-skew – Can be low power (depending on skew requirements) – Low area (silicon and wiring) – CAD tool friendly (regular) • Disadvantages – Sensitive to process variations – Local clocking loads are inherently non-uniform Source : Jan Rabaey

  21. Realistic H-Tree a balanced load segment (tile) RC matched for an 400 MHz IBM’s 1998 Microprocessor Source : Jan Rabaey

  22. H-Tree Balancing H-Tree Some techniques: a) Introduce dummy loads b) Snaking of wirelength to match delays Source : Jan Rabaey

  23. Hybrid-System • Globally – Tree • • Power requirements are reduced compared to global grid • – Smaller routing requirements, frees up global tracks • • Trees are easily balanced at the global level • – Keeps global skew low (with minimal process variation) Source : Jan Rabaey

  24. Hybrid-System Locally – Grid • Smaller grid distribution area allows for coarser grid pitch – Lower power in interconnect – Lower power in predrivers – Routing area reduced • Local skew is kept very small • Easy access to clock by simply tapping off grid Source : Jan Rabaey

  25. final drivers pre-driver Case Studies: DEC 21164 tcycle= 3.3ns • 2 phase single wire clock, distributed globally • 2 distributed driver channels • Reduced RC delay/skew • Improved thermal distribution • 3.75nF clock load • 58 cm final driver width • Local inverters for latching • Conditional clocks in caches to reduce power • More complex race checking • Device variation tskew = 150ps trise = 0.35ns Clock waveform Location of clock driver on die Source : Jan Rabaey

  26. Case Studies: DEC 21164 Source : Jan Rabaey

  27. tcycle= 1.67ns trise = 0.35ns tskew = 50ps Case Studies: DEC 21264 EV6 (Alpha 21264) Clocking 600 MHz – 0.35 micron CMOS Global clock waveform • 2 Phase, with multiple conditional buffered clocks • 2.8 nF clock load • 40 cm final driver width • Local clocks can be gated “off” to save power • Reduced load/skew • Reduced thermal issues • Multiple clocks complicate race checking Source : Jan Rabaey

  28. Case Studies: DEC 21264 EV6 (Alpha 21264) Clocking 600 MHz – 0.35 micron CMOS Source : Jan Rabaey

  29. ps 5 10 15 20 25 30 35 40 45 50 ps 300 305 310 315 320 325 330 335 340 345 Case Studies: DEC 21264 GCLK Skew (at Vdd/2 Crossings) GCLK Rise Times (20% to 80% Extrapolated to 0% to 100%) Source : Jan Rabaey

  30. Case Studies: DEC EV7 Active Skew Management and Multiple Clock Domains • widely dispersed drivers • DLLs compensate static and low-frequency variation • divides design and verification effort • DLL design and verification is added work • tailored clocks Source : Jan Rabaey

  31. DEC Alpha Clocking • EV421064 0.75 µm, 200 MHz 1992 • Single global clock driver, 5 levels of buffering • 35 cm driver, 3.25 nF, 40% power • EV5211640.5 µm, 300 MHz 1995 • One central, two side clock drivers • 58 cm driver, 3.75 nF, 40% power • EV6212640.35µm, 600 MHz 1998 • Clock grid, 4 window panes, hierarchical, gated clock domains • 40 cm driver, 2.8 nF • EV70.18µm, 1.2 GHz 2002 • Multiple clock domains, DLLs Source : Jan Rabaey

  32. Itanium 2 H-Tree • Four levels of buffering: • Primary driver • Repeater • Second-level clock buffer • Gater • Route around obstructions Source : Harris

  33. Conclusion • Getting the clock everywhere on a die at the exact same time is difficult • Requires a lot of power to reduce skew (big drivers, wide wires, etc.) • Balanced H-trees are in common use • Design automation tools exist to synthesize these trees • Clocks must • be robust to variations/noise, • have relatively sharp slew rates

More Related