
High-Performance Timing Analysis and Delay Reduction Techniques

Learn about timing diagrams, delay analysis, clock frequency, glitch reduction, MOSFET layout, and strategies for driving large loads in static CMOS circuits. Understand the impact of capacitances and gate delays on performance optimization. Explore practical examples.

dunlap


Presentation Transcript


  1. Lecture #23: Performance
     OUTLINE
     • Timing diagrams (from Lecture 22)
     • Delay analysis (from Lecture 22)
     • Maximum clock frequency: three figures of merit
     • Continuously-switched inverters
     • Ring oscillators
     Reading (Rabaey et al.): Parts of Ch. 5: Pages 179-184; 193-203; 212-217; 220-227; 230-232; Perspective and Summary

  2. Propagation Delay in Timing Diagrams
     • To simplify the drawing of timing diagrams, we can approximate the signal transitions as abrupt (though in reality they are exponential).
     • To further simplify timing analysis, we can define the propagation delay as $t_p = (t_{pHL} + t_{pLH})/2$.
     [Figure: timing diagram of input A and output F (logic levels 0 and 1 versus t), with tpHL and tpLH marked.]

  3. Glitching Transitions
     The propagation delay from one logic gate to the next can cause spurious transitions, called glitches, to occur. (A node can exhibit multiple transitions before settling to the correct logic level.)
     [Figure: circuit with inputs A, B, C, internal nodes A+B and B•C, and output F; timing diagram of A,B,C, A+B, B•C, and F (levels 0 and 1 versus t), with the time axis marked at tp, 2tp, 3tp.]

  4. Glitch Reduction • Spurious transitions can be minimized by balancing signal paths Example: F = A•B•C•D
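
The effect of path balancing on F = A•B•C•D can be illustrated with a small arrival-time model. This is only a sketch under stated assumptions: every 2-input AND gate is given the same unit delay, all four inputs switch at t = 0, and the two structures compared (a serial chain and a balanced tree) are illustrative choices rather than circuits recovered from the slide. The helper names `gate`, `chain`, and `tree` are hypothetical.

```python
# Arrival-time model: each 2-input AND gate adds one unit delay (tp = 1).
# Both gate structures are illustrative assumptions, not taken from the slide.

def gate(a, b, tp=1):
    """Return (output arrival time, skew between the two input arrivals)."""
    return max(a, b) + tp, abs(a - b)

def chain(arrivals):
    """((A.B).C).D : inputs are combined one after another."""
    t, skews = arrivals[0], []
    for nxt in arrivals[1:]:
        t, s = gate(t, nxt)
        skews.append(s)
    return t, skews

def tree(arrivals):
    """(A.B).(C.D) : inputs are combined pairwise in a balanced tree."""
    level, skews = list(arrivals), []
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            t, s = gate(level[i], level[i + 1])
            nxt.append(t)
            skews.append(s)
        if len(level) % 2:           # odd input left over: pass it up a level
            nxt.append(level[-1])
        level = nxt
    return level[0], skews

inputs = [0, 0, 0, 0]                # A, B, C, D all switch at t = 0
chain_delay, chain_skews = chain(inputs)
tree_delay, tree_skews = tree(inputs)
print(chain_delay, chain_skews)      # 3 [0, 1, 2]
print(tree_delay, tree_skews)        # 2 [0, 0, 0]
```

In the chain, later gates see inputs that arrive one and two gate delays apart, which is the condition under which a glitch can appear. In the balanced tree every gate sees both inputs at the same time.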

  5. MOSFET Layout and Cross-Section [Figures: top view and cross section.]

  6. Source and Drain Junction Capacitance $C_{source} = C_j \cdot \mathrm{AREA} + C_{jsw} \cdot \mathrm{PERIMETER} = C_j L_S W + C_{jsw}(2 L_S + W)$

  7. Computing the Output Capacitance: Example 5.4 (pp. 197-203) [Figure: layout of a CMOS inverter with 2λ = 0.25 μm; PMOS W/L = 9λ/2λ, NMOS W/L = 3λ/2λ; labels VDD, GND, In, Out, Poly-Si, Metal1.]

  8. Capacitances for 0.25 μm technology:
     • Gate capacitances:
       • Cox(NMOS) = Cox(PMOS) = 6 fF/μm²
     • Overlap capacitances:
       • CGDO(NMOS) = Con = 0.31 fF/μm
       • CGDO(PMOS) = Cop = 0.27 fF/μm
     • Bottom junction capacitances:
       • CJ(NMOS) = Keqbpn·Cj = 2 fF/μm²
       • CJ(PMOS) = Keqbpp·Cj = 1.9 fF/μm²
     • Sidewall junction capacitances:
       • CJSW(NMOS) = Keqswn·Cjsw = 0.28 fF/μm
       • CJSW(PMOS) = Keqswp·Cjsw = 0.22 fF/μm
     [Figure: inverter layout with 2λ = 0.25 μm; PMOS W/L = 9λ/2λ, NMOS W/L = 3λ/2λ; labels VDD, GND, In, Out.]
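
As a rough illustration of how these values combine with the junction-capacitance expression from slide 6 (bottom capacitance times area plus sidewall capacitance times perimeter), the sketch below evaluates the NMOS diffusion capacitance. The geometry is a hypothetical assumption, since the layout dimensions are not in the transcript: W = 3λ and L_S = 5λ. The function name `junction_cap_fF` is also made up for this example.

```python
# Linearized NMOS junction capacitances from the list above.
CJ_N = 2.0       # bottom junction, fF/um^2
CJSW_N = 0.28    # sidewall junction, fF/um

def junction_cap_fF(w_um, ls_um, cj, cjsw):
    """C = CJ * (L_S * W) + CJSW * (2 * L_S + W); result in fF."""
    area = ls_um * w_um
    perimeter = 2 * ls_um + w_um   # the edge under the gate is not counted
    return cj * area + cjsw * perimeter

# Hypothetical geometry (not from the slides): W = 3*lambda, L_S = 5*lambda.
LAMBDA_UM = 0.125                  # 2*lambda = 0.25 um
w, ls = 3 * LAMBDA_UM, 5 * LAMBDA_UM
c_drain = junction_cap_fF(w, ls, CJ_N, CJSW_N)
print(f"{c_drain:.3f} fF")
```

With these assumed dimensions the bottom and sidewall terms turn out to be of similar size, so neither can be neglected for a small diffusion region.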

  9. Typical MOSFET Parameter Values
     • For a given MOSFET fabrication process technology, the following parameters are known:
       • VT (~0.5 V)
       • Cox and k (<0.001 A/V²)
       • VDSAT (≤ 1 V)
       • λ (≤ 0.1 V⁻¹)
     Example Req values for 0.25 μm technology (W = L): [table not recovered]

  10. Compute propagation delays

  11. Examples of Propagation Delay
      • Typical clock periods:
        • high-performance μP: ~15 FO4 delays
        • PlayStation 2: 60 FO4 delays

  12. STATIC CMOS DRIVING LARGE LOADS
      [Figure: CMOS inverter (MP1/MN1) between VDD and ground, input vin, output vout loaded by CL.]
      The load, CL, may be the capacitance of a long line on the chip (e.g. up to 1 pF), or may be the load on one of the chip output pins (e.g. up to 50 pF). We have seen that the typical driving resistance R for a minimum-sized inverter is in the range of 10 kΩ. Even a 1 kΩ resistor driving a 50 pF load would have a stage delay (0.69RC) of 35 ns, huge in comparison to normal stage delays. Thus we need to use larger devices to drive large capacitive loads, that is, greatly increase W/L. However, increasing the W/L of a stage will increase the load it presents to the stage driving it, and we just move the delay problem back one stage.
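
The stage-delay figure quoted here follows from the 0.69RC rule used later in the deck. A minimal sketch, assuming that rule applies directly; the function name `stage_delay_s` is hypothetical, and the 10 kΩ case is added only for comparison with the minimum-sized inverter mentioned on the slide:

```python
def stage_delay_s(r_ohm, c_farad):
    """50% propagation delay of a single RC stage: 0.69 * R * C."""
    return 0.69 * r_ohm * c_farad

C_PIN = 50e-12                        # 50 pF output-pin load from the slide
d_1k = stage_delay_s(1e3, C_PIN)      # 1 kOhm driver
d_10k = stage_delay_s(10e3, C_PIN)    # 10 kOhm minimum-sized driver
print(f"{d_1k * 1e9:.1f} ns, {d_10k * 1e9:.1f} ns")
```

The 1 kΩ case gives roughly the 35 ns quoted on the slide. A 10 kΩ minimum-sized driver would be ten times slower.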

  13. STATIC CMOS DRIVING LARGE LOADS
      PROBLEM: A minimum-sized inverter drives a large load, CL, leading to excessive delay, even with a buffer stage.
      [Figure: inverter 1 (MP1/MN1) followed by one buffer stage (MPB/MNB) driving CL.]
      PROPOSED SOLUTION: Insert several simple inverter stages with increasing W/L between Inverter 1 and the load CL. The total delay through the multiple stages will be less than the delay of one single stage driving CL.
      [Figure: inverter 1 (MP1/MN1) followed by three buffer stages (MPB1/MNB1, MPB2/MNB2, MPB3/MNB3) driving CL.]

  14. STATIC CMOS DRIVING LARGE LOADS
      Example: The 2.5 V, 0.25 μm CMOS inverter driving a 50 pF load.
      [Figure: inverter (MP1/MN1, W/L = 4) followed by a buffer (MPB/MNB, W/L = 9615) driving 50 pF.]
      Properties: (W/L)N = 1/0.25, (W/L)P = 2/0.25, VDD = 2.5 V, VT = 0.5 V.
      Rn = 13 kΩ/4 = 3.25 kΩ; Rp = 31 kΩ/8 = 3.875 kΩ.
      5 nm oxide thickness, Cox = 6.9 fF/μm².
      NMOS: CGn = W × L × Cox = 1.7 fF. PMOS: CGp = W × L × Cox = 3.4 fF. Thus CIN ≈ 5.2 fF.
      Basic gate delay (0.69RC) is about 10 ps.
      If we size one inverter to drive the load with this time constant, it requires a W/L increase by a factor of 50 pF/5.2 fF = 9615. So CIN = 50,000 fF = 50 pF for the buffer gate! Thus the gate delay for the first stage is (50,000/5.2) × 10 ps = 96.1 ns. Total delay = 96.1 + 0.01 = 96.11 ns. TOO LONG and NO IMPROVEMENT!
      Note: We are ignoring drain capacitance in these examples.
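
The arithmetic in this example can be retraced in a few lines. This is a sketch that follows the slide's own simplifications (gate capacitance only, a rounded 10 ps unit delay, CIN rounded to 5.2 fF); the variable names are chosen here for illustration.

```python
COX = 6.9                 # fF/um^2 for the 5 nm oxide
L_UM = 0.25               # channel length, um
c_gn = 1.0 * L_UM * COX   # NMOS gate capacitance, W = 1 um
c_gp = 2.0 * L_UM * COX   # PMOS gate capacitance, W = 2 um
c_in = c_gn + c_gp        # input capacitance of the minimum inverter, fF

C_LOAD = 50_000.0         # 50 pF expressed in fF
T_UNIT_PS = 10.0          # the slide's rounded basic gate delay

# The slide rounds c_in to 5.2 fF before forming the ratio.
scale = C_LOAD / round(c_in, 1)

first_stage_ps = scale * T_UNIT_PS   # minimum inverter driving the huge buffer
buffer_ps = T_UNIT_PS                # buffer sized to drive 50 pF in one unit delay
total_ns = (first_stage_ps + buffer_ps) / 1000.0
print(f"c_in = {c_in:.3f} fF, scale = {scale:.0f}, total = {total_ns:.2f} ns")
```

The single oversized buffer is fast, but the minimum inverter in front of it now sees the full scaled-up input capacitance, so the total stays near 96 ns.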

  15. STATIC CMOS DRIVING LARGE LOADS
      Same example with tapered device sizes (geometric series).
      [Figure: inverter 1 (MP1/MN1) followed by three buffer stages (MPB1/MNB1, MPB2/MNB2, MPB3/MNB3) driving CL.]
      Case 1: Same example, but with buffer devices scaled by a factor of 98 (98² ≈ 9615).
        Stage 1 load = 98 × 5.2 fF (R = 3.5 kΩ)
        Stage 2 load = 50 pF (R = 3.5 kΩ/98)
        Delay = 98 × 10 ps + 96 ns/98 = 0.98 ns + 0.98 ns ≈ 2 ns
      Case 2: Now taper through 3 buffer stages with W/L ratios of 9.9 (9.9⁴ ≈ 9615).
        4 equal gate delays of 9.9 × 10 ps = 99 ps
        Total = 4 × 0.099 ns ≈ 0.4 ns
      Gate delay through 4 gates is much less than through 2!
      Note: We are ignoring drain capacitance in these examples.
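
Both cases are instances of one formula: with n gates in a geometric taper, each gate drives a load f = (CL/CIN)^(1/n) times its own input capacitance, so each gate delay is f times the unit delay. A minimal sketch under the slide's assumptions (10 ps unit delay, drain capacitance ignored); the function name `tapered_delay_ps` is hypothetical.

```python
def tapered_delay_ps(total_ratio, n_gates, t_unit_ps=10.0):
    """Total delay and per-stage fanout of n_gates inverters in geometric taper.

    Each gate drives f times its own input capacitance, where
    f = total_ratio ** (1 / n_gates), so every gate delay is f * t_unit_ps.
    """
    f = total_ratio ** (1.0 / n_gates)
    return n_gates * f * t_unit_ps, f

RATIO = 50_000 / 5.2          # load capacitance / minimum-inverter input capacitance

d2, f2 = tapered_delay_ps(RATIO, 2)   # Case 1: inverter 1 plus one buffer
d4, f4 = tapered_delay_ps(RATIO, 4)   # Case 2: inverter 1 plus three buffers
print(f"2 gates: f = {f2:.1f}, delay = {d2 / 1000:.2f} ns")
print(f"4 gates: f = {f4:.1f}, delay = {d4 / 1000:.2f} ns")
```

The results land close to the slide's rounded figures of about 2 ns and 0.4 ns.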

  16. STATIC CMOS DRIVING LARGE LOADS: Comments
      [Figure: inverter 1 (MP1/MN1) followed by three buffer stages (MPB1/MNB1, MPB2/MNB2, MPB3/MNB3) driving CL.]
      In our example we got better results with 3 buffer stages than with 1. 7 buffer stages would do even better. How many buffer stages are optimum? Well, under these simple assumptions (like ignoring drain and wiring capacitance, and operating asynchronously) you can show that the number of buffer stages, N, obeys N + 1 = ln(R), where R is the ratio of the load capacitance to the capacitance of a minimum-sized stage. This formula is not important, but you should remember the concept that buffering with multiple stages usually leads to lower net delay if the load is large.
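
The N + 1 = ln(R) rule can be checked numerically by sweeping the number of gates and picking the minimum. The sketch below reuses the numbers of the 50 pF example (R = 50 pF/5.2 fF, 10 ps unit delay) and the same idealized delay model; `chain_delay_ps` is a hypothetical helper, and the count includes the original inverter, so it corresponds to N + 1.

```python
import math

def chain_delay_ps(ratio, n_gates, t_unit_ps=10.0):
    """Delay of n_gates geometrically tapered inverters (drain capacitance ignored)."""
    return n_gates * ratio ** (1.0 / n_gates) * t_unit_ps

RATIO = 50_000 / 5.2

# Sweep the total number of gates (original inverter + buffers).
best = min(range(1, 21), key=lambda n: chain_delay_ps(RATIO, n))
print(f"ln(R) = {math.log(RATIO):.2f}")
print(f"best gate count = {best}, delay = {chain_delay_ps(RATIO, best):.0f} ps")
print(f"8 gates (7 buffers): {chain_delay_ps(RATIO, 8):.0f} ps")
```

The sweep picks 9 gates (8 buffer stages), which agrees with ln(R) ≈ 9.17 rounded to an integer. The minimum is shallow: 7 buffer stages give almost the same delay.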

  17. How to measure inverter performance?
      [Figure: two cascaded inverters (MP3/MN1 and MP4/MN2) with input vin1 and vout1 = vin2.]
      1) We have defined the unit delay tp as the time until Vout1 reaches VDD/2, starting at either 0 V (rising) or VDD (falling). Vin1 is a step function.
      There are two other measures of performance which we can also consider:
      2) The stage delay when the input is a continuous square-wave clock input.
      3) The delay of a pulse through a multi-stage "ring oscillator".

  18. Unit gate delay performance measurement
      [Figure: two cascaded inverters with vout1 = vin2; plot of Vout1 versus t falling from VDD through 0.5 VDD at time tp.]
      Suppose Vin1 goes from low to high. Vout1 goes from VDD to ground. We defined the inverter delay tpHL as the time until Vout1 reaches VDD/2, because when it reaches this value, the following stage will sense that its input has switched from high to low. Similarly, tpLH is the time for the output to rise from zero to VDD/2 when the input is falling. The maximum frequency is just 1/(tpHL + tpLH). A properly designed stage will have a similar delay time for a rising input as for a falling input. (Design the proper ratio of Wp to Wn.)

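  19. Driving Inverters (or gates) with Square-Wave Clock
      Node X is loaded by CX. Inverter 1 has output resistance Rp or Rn.
      [Figure: VIN (square wave of period 1/f between 0 and VDD) and VX versus t, with switching instants t1 through t5 and levels vh and vl.]
      Let's follow VX for VIN starting at t = 0. The output slowly converges to a sawtooth waveform. Let's find the relationship between the max and min values vh and vl after many, many cycles:
      (1) Pull down: $v_l = v_h e^{-\Delta t/R_n C}$
      (2) Pull up: $v_h = V_{DD} + (v_l - V_{DD}) e^{-\Delta t/R_p C}$
      where Δt = 1/(2f) is the half-period of the clock. These can be solved simultaneously given Δt/RC.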
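
The steady-state sawtooth limits can be found in closed form. The sketch below assumes first-order RC behavior: during pull-down the node decays from vh toward 0 for a half-period Δt, and during pull-up it charges from vl toward VDD for the same Δt. Under that assumption, writing a = exp(-Δt/RnC) and b = exp(-Δt/RpC), the two conditions vl = a·vh and vh = VDD + (vl - VDD)·b give vh = VDD(1 - b)/(1 - ab). The function name and the numeric values (VDD = 2.5 V, RC = 15 ps) are illustrative choices.

```python
import math

def steady_state_levels(vdd, dt, rn_c, rp_c):
    """Solve  vl = vh * a  and  vh = vdd + (vl - vdd) * b  for (vh, vl),
    with a = exp(-dt / RnC) and b = exp(-dt / RpC)."""
    a = math.exp(-dt / rn_c)
    b = math.exp(-dt / rp_c)
    vh = vdd * (1.0 - b) / (1.0 - a * b)
    vl = a * vh
    return vh, vl

# Illustrative values (assumed): VDD = 2.5 V, RnC = RpC = 15 ps, dt in ps.
for dt_ps in (60.0, 15.0, 6.1):
    vh, vl = steady_state_levels(2.5, dt_ps, 15.0, 15.0)
    print(f"dt = {dt_ps:5.1f} ps: vh = {vh:.2f} V, vl = {vl:.2f} V")
```

As the half-period shrinks relative to RC, the swing collapses toward VDD/2 from both sides, which is what eventually stops the next stage from toggling.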

  20. Square-Wave Drive
      [Figure: input square wave (period 1/f, switching instants t1 through t5) and VX sawtooth between 0 and VDD, with levels Vih and Vil marked.]
      Inverter 2 will operate correctly so long as VX passes through Vil and Vih. We approximate the response of the devices in inverter 2 as instantaneous (remember the steep transfer curve). Let's look at VX after a long time. When VX crosses down through Vil, inverter 2 switches, and when it crosses up through Vih, it switches back.

  21. MAXIMUM CLOCK FREQUENCY fmax: Increase f until inverter 2 fails to toggle because its input does not pass through its threshold(s). In general, Rp ≠ Rn, so either the rise or the fall is slower. As the frequency increases, when will the inverter fail? When VX no longer passes through Vil or Vih, because the frequency is too high.

  22. Example: Take R = 3 kΩ, C = 5 fF, tpHL = tpLH = 0.69RC = 10 ps, so fmax1 = 50 GHz.
      Now consider the square-wave drive case: Take VDD = 2.5 V, Vih = 1.5 V, Vil = 1 V, so in this symmetric case:
      $V_{il} = V_{ih} e^{-\Delta t/R_n C}$ and $V_{ih} = V_{DD} + (V_{il} - V_{DD}) e^{-\Delta t/R_p C}$
      Solving either equation with RC = 15 ps gives Δt = 6.1 ps; fmax2 = 10¹²/12.2 Hz = 82 GHz (obviously this result depends on our somewhat arbitrary choice for Vih and Vil).
      [Figure: VX sawtooth between Vil and Vih.]
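
The two figures of merit in this example can be recomputed directly from R, C, and the thresholds. A small sketch, assuming Rn = Rp = R as in the slide's symmetric case; the fall time comes from Vil = Vih·exp(-Δt/RC) and the rise time from Vih = VDD + (Vil - VDD)·exp(-Δt/RC). Variable names are chosen here for illustration.

```python
import math

R, C = 3e3, 5e-15                 # ohms, farads -> RC = 15 ps
VDD, VIH, VIL = 2.5, 1.5, 1.0     # volts
rc = R * C

# Figure of merit 1: unit gate delay, fmax1 = 1 / (tpHL + tpLH)
tp = 0.69 * rc
fmax1 = 1.0 / (2.0 * tp)

# Figure of merit 2: square-wave drive, node X swings only between VIL and VIH
dt_fall = rc * math.log(VIH / VIL)                    # VIH down to VIL
dt_rise = rc * math.log((VDD - VIL) / (VDD - VIH))    # VIL up to VIH
fmax2 = 1.0 / (dt_fall + dt_rise)

print(f"tp = {tp * 1e12:.2f} ps, fmax1 = {fmax1 / 1e9:.1f} GHz")
print(f"dt = {dt_fall * 1e12:.2f} ps, fmax2 = {fmax2 / 1e9:.1f} GHz")
```

Without rounding tp to 10 ps, fmax1 comes out slightly under the 50 GHz quoted on the slide, while fmax2 matches the 82 GHz figure.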

  23. Ring Oscillator
      [Figure: ring of inverters 1, 2, 3, 4, ..., n; odd number of stages.]
      As soon as inverter 1 drives inverter 2's input past Vil (falling) or Vih (rising), inverter 2 switches and starts driving the input node of inverter 3 toward its switch point, etc. Note: V starts at 0 V (rising) or VDD (falling). WHY?
      Result: The signal propagates along the chain at another kind of maximum clock frequency, fmax* (really a maximum propagation frequency). NOTE: fmax* < fmax2. WHY?
      Let the average delay per stage be tMIN; then the time around the loop is N × tMIN. One period is twice around the loop, so $1/f_{RO} = 2 N t_{MIN}$, something very easy to measure. [If tMIN is 20 ps but N is 1001, the period 1/fRO is 40 ns.]
      Now we define fmax* by $f_{max}^* = N f_{RO} = 1/(2 t_{MIN})$; N could be 1001, so fRO is easy to measure (low frequency).

  24. Ring Oscillator
      [Figure: ring with an odd number of stages, nodes alternating between 1 (= VDD) and 0 (= 0 V), closed by a switch.]
      As soon as the switch closes, inverter 5 drives inverter 1's input up (starting at 0 V). When it reaches Vih, inverter 1 switches and starts driving the input node of inverter 2 down, starting at VDD. We note that the transient always starts at 0 or VDD and ends at Vih or Vil, respectively. This clearly takes longer than the transient in the clock-driven chain of inverters.
      We need to solve the same exponential equations as in square-wave drive, but with different limits:
      Up: Start at 0, end at Vih: $V_{ih} = V_{DD}[1 - e^{-\Delta t_{LH}/R_p C}]$
      Down: Start at VDD, end at Vil: $V_{il} = V_{DD} e^{-\Delta t_{HL}/R_n C}$
      Solve for ΔtLH and ΔtHL and average to get tMIN: $t_{MIN} = (\Delta t_{LH} + \Delta t_{HL})/2$

  25. Ring Oscillator Example
      [Figure: ring with an odd number of stages, nodes alternating between 1 (= VDD) and 0 (= 0 V), closed by a switch.]
      101 stages, same parameters (RC = 15 ps).
      From $V_{ih} = V_{DD}[1 - e^{-\Delta t_{LH}/R_p C}]$ we find ΔtLH = 13.7 ps. Similarly, from $V_{il} = V_{DD} e^{-\Delta t_{HL}/R_n C}$ we find ΔtHL = 13.7 ps.
      Thus the delay through 101 stages, twice, is 202 × 13.7 ps ≈ 2.78 ns. The ring oscillator frequency is 1/(2.78 ns) = 360 MHz. Finally, fmax* = 360 MHz × 101 = 36 GHz. This is of course less than either the 50 GHz estimated from the unit gate delay or the 82 GHz estimated from the square-wave-driven maximum toggle frequency.
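
The numbers in this example can be reproduced by inverting the two exponential expressions for the stage transitions. A short sketch using the slide's parameters (RC = 15 ps, VDD = 2.5 V, Vih = 1.5 V, Vil = 1 V, 101 stages) and assuming RnC = RpC; the variable names are chosen here for illustration.

```python
import math

RC_PS = 15.0                      # RnC = RpC = 15 ps (assumed equal)
VDD, VIH, VIL = 2.5, 1.5, 1.0     # volts
N = 101                           # number of stages

dt_lh = -RC_PS * math.log(1.0 - VIH / VDD)   # from VIH = VDD * (1 - exp(-dt/RC))
dt_hl = -RC_PS * math.log(VIL / VDD)         # from VIL = VDD * exp(-dt/RC)
t_min = (dt_lh + dt_hl) / 2.0                # average stage delay, ps

period_ns = 2 * N * t_min / 1000.0           # one period is twice around the loop
f_ro_mhz = 1000.0 / period_ns
f_max_star_ghz = N * f_ro_mhz / 1000.0       # equals 1 / (2 * t_min)

print(f"t_min = {t_min:.2f} ps, period = {period_ns:.2f} ns")
print(f"f_RO = {f_ro_mhz:.0f} MHz, fmax* = {f_max_star_ghz:.1f} GHz")
```

The rise and fall delays come out equal here only because Vih and VDD - Vil happen to be the same 1.5 V; other threshold choices would make them differ.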
