360 likes | 730 Views
CMOS VLSI DESIGN. Kasin Vichienchom kvkasin@kmitl.ac.th Lecture#6. Timing Issues. Clock Non-Ideality Clock Skew Jitter Clock Distribution Networks Metrics Types Examples. Clock Non-Ideality. Clock skew ( phase shift between clocks)
E N D
CMOS VLSI DESIGN Kasin Vichienchom kvkasin@kmitl.ac.th Lecture#6
Timing Issues • Clock Non-Ideality • Clock Skew • Jitter • Clock Distribution Networks • Metrics • Types • Examples
Clock Non-Ideality • Clock skew ( phase shift between clocks) • Spatial variation in temporally equivalent clock edges; deterministic + random, tSK • Due to RC delay in clock distribution which is depending on the difference indistance, clock load, clock driver, process variation • Clock jitter (clock period variation) • Temporal variations in consecutive edges of the clock signal; modulation + random noise • Cycle-to-cycle (short-term) tJS • Long term tJL • Variation of the pulse width • Important for level sensitive clocking Source : Jan Rabaey
Clk tSK Clk tJS Clock Non-Ideality Skew and Jitter Source : Jan Rabaey
Clock Skew-Negative Source : Jan Rabaey
Clock Skew-Positive Source : Jan Rabaey
Effect of Clock Skew • No Skew Constrain: TCLK≥ tctq + tmax + tsetup
Effect of Clock Skew • Negative Skew If tmax = t1: setup time violation If tmax = t2: zero clocking (load old data) Constrain: TCLK≥ tctq + tmax + tsetup +| δ|
Effect of Clock Skew • Positive Skew If tmin = t1: double clocking (take data of the next state) If tmin = t2: hold time violation Constrain: tctq + tmin≥ thold +δ
Clock Distribution • Clock Distribution Metrics • Power • Area • Skew • Types of Clock Distribution • Tree • Grid • Hybrid • Case Studies: DEC Alpha µProcessor Source : Jan Rabaey
Metrics-Power Power dissipation of clock network: P = αCLVDD2f α = 1 Very large Source : Jan Rabaey
Metrics-Area • Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area • Routing area is most vital • Top-level metals are used to reduce RC delays • These levels are precious resources (unscaled) • Power routing, clock routing, key global signals • By minimizing area used, we also reduce wiring capacitance and power • Typical number: Intel Itanium – 4% of M4/5 used in clock routing Source : Jan Rabaey
Clock: Slew Rate • To maintain signal integrity and latch performance, minimum slew rates are required • Too slow – clock is more susceptible to noise, latches are slowed down, eats into timing budget • Too fast – burning too much power, overdesigned network, enhanced ground bounce • Rule-of-thumb: tr and tf of clock are each between 10-20% of clock period • Example:1 GHz clock; tr = tf =100-200 ps Source : Jan Rabaey
Clock: Technology Trend • Heavily pipelined designs • more latches • more capacitive load for clock • Larger chips • more wirelength needed to cover the entire die • Complexity • more functionality and devices • more clocked elements (loads) • Dynamic logic • more clocked elements (loads) Source : Jan Rabaey
GCLK Driver r D GCLK GCLK e r v i i v r e D r Driver GCLK Grid System • No RC delay matching • Large Power(huge drivers) Source : Jan Rabaey
Grid System • Gridded clock distribution was common on earlier DEC Alpha microprocessors • Advantages: • Skew determined by grid density and not overly sensitive to load position • Clock signals are available everywhere • Tolerant to process variations • Usually yields extremely low skew values Source : Jan Rabaey
Grid System Disadvantages: • Huge amounts of wiring & power • Wire cap large • Strong drivers needed – pre-driver cap large • Routing area large • To minimize all these penalties, make grid pitch coarser • Skew gets worse • Losing the main advantage • Don’t overdesign – let the skew be as large as tolerable • Grids aren’t feasible for most designs due to power Source : Jan Rabaey
H-Tree System • Original H-tree (Bakoglu) • One large central driver • Recursive H-style structure to match wirelengths • Halve wire width at branching points to reduce reflections Source : Jan Rabaey
H-Tree System • Drawback of original tree concept • slew degradation along long RC paths • unrealistically large central driver • Clock drivers can create large temperature gradients (ex. Alpha 210640 ~30° C) • non-uniform load distribution • Inherently non-scalable (wire resistance skyrockets) • Solution to some problems • Introduce intermediate buffers along the way • Specifically at branching points Source : Jan Rabaey
Buffered H-Tree • Advantages – Ideally zero-skew – Can be low power (depending on skew requirements) – Low area (silicon and wiring) – CAD tool friendly (regular) • Disadvantages – Sensitive to process variations – Local clocking loads are inherently non-uniform Source : Jan Rabaey
Realistic H-Tree a balanced load segment (tile) RC matched for an 400 MHz IBM’s 1998 Microprocessor Source : Jan Rabaey
H-Tree Balancing H-Tree Some techniques: a) Introduce dummy loads b) Snaking of wirelength to match delays Source : Jan Rabaey
Hybrid-System • Globally – Tree • • Power requirements are reduced compared to global grid • – Smaller routing requirements, frees up global tracks • • Trees are easily balanced at the global level • – Keeps global skew low (with minimal process variation) Source : Jan Rabaey
Hybrid-System Locally – Grid • Smaller grid distribution area allows for coarser grid pitch – Lower power in interconnect – Lower power in predrivers – Routing area reduced • Local skew is kept very small • Easy access to clock by simply tapping off grid Source : Jan Rabaey
final drivers pre-driver Case Studies: DEC 21164 tcycle= 3.3ns • 2 phase single wire clock, distributed globally • 2 distributed driver channels • Reduced RC delay/skew • Improved thermal distribution • 3.75nF clock load • 58 cm final driver width • Local inverters for latching • Conditional clocks in caches to reduce power • More complex race checking • Device variation tskew = 150ps trise = 0.35ns Clock waveform Location of clock driver on die Source : Jan Rabaey
Case Studies: DEC 21164 Source : Jan Rabaey
tcycle= 1.67ns trise = 0.35ns tskew = 50ps Case Studies: DEC 21264 EV6 (Alpha 21264) Clocking 600 MHz – 0.35 micron CMOS Global clock waveform • 2 Phase, with multiple conditional buffered clocks • 2.8 nF clock load • 40 cm final driver width • Local clocks can be gated “off” to save power • Reduced load/skew • Reduced thermal issues • Multiple clocks complicate race checking Source : Jan Rabaey
Case Studies: DEC 21264 EV6 (Alpha 21264) Clocking 600 MHz – 0.35 micron CMOS Source : Jan Rabaey
ps 5 10 15 20 25 30 35 40 45 50 ps 300 305 310 315 320 325 330 335 340 345 Case Studies: DEC 21264 GCLK Skew (at Vdd/2 Crossings) GCLK Rise Times (20% to 80% Extrapolated to 0% to 100%) Source : Jan Rabaey
Case Studies: DEC EV7 Active Skew Management and Multiple Clock Domains • widely dispersed drivers • DLLs compensate static and low-frequency variation • divides design and verification effort • DLL design and verification is added work • tailored clocks Source : Jan Rabaey
DEC Alpha Clocking • EV421064 0.75 µm, 200 MHz 1992 • Single global clock driver, 5 levels of buffering • 35 cm driver, 3.25 nF, 40% power • EV5211640.5 µm, 300 MHz 1995 • One central, two side clock drivers • 58 cm driver, 3.75 nF, 40% power • EV6212640.35µm, 600 MHz 1998 • Clock grid, 4 window panes, hierarchical, gated clock domains • 40 cm driver, 2.8 nF • EV70.18µm, 1.2 GHz 2002 • Multiple clock domains, DLLs Source : Jan Rabaey
Itanium 2 H-Tree • Four levels of buffering: • Primary driver • Repeater • Second-level clock buffer • Gater • Route around obstructions Source : Harris
Conclusion • Getting the clock everywhere on a die at the exact same time is difficult • Requires a lot of power to reduce skew (big drivers, wide wires, etc.) • Balanced H-trees are in common use • Design automation tools exist to synthesize these trees • Clocks must • be robust to variations/noise, • have relatively sharp slew rates