

Clocked Storage Elements for High-Performance and Low-Power Systems. The book under the same title is published by J. Wiley Pub. Co. Vojin G. Oklobdzija*, June 23rd, 2003. Presentation given at: EPFL Lausanne, Switzerland. *Advanced Computer System Engineering Laboratory


Presentation Transcript


  1. Clocked Storage Elements for High-Performance and Low-Power Systems. The book under the same title is published by J. Wiley Pub. Co. Vojin G. Oklobdzija*, June 23rd, 2003. Presentation given at: EPFL Lausanne, Switzerland. *Advanced Computer System Engineering Laboratory, University of California Davis. Presentations available at: http://www.ece.ucdavis.edu/acsel

  2. Outline • Why work on Clocked Storage Elements? • M-S Latch is not a Flip-Flop! • How do we compare them? • What are the relevant parameters? • What is an appropriate setup? • What do we use in high-performance microprocessors? • How do they compare? • What should we do for low power? • How do they compare? • What next? Ideas, Suggestions, Insights Prof. V.G. Oklobdzija, University of California

  3. Importance

  4. ISSCC-2002 Clock trends in high-performance systems

  5. Courtesy: Doug Carmean, Intel Corp, Hot-Chips-13 presentation

  6. Why work on Clocked Storage Elements? Example: in a 2.0 GHz processor, T = 500 ps • Typically, the clocked storage element D-Q delay is on the order of 100-150 ps • If one can design a faster CSE, e.g. 80-100 ps D-Q, this represents a 10-15% performance improvement • If in addition one can absorb 20 ps of clock uncertainties and embed one level of logic, this can yield up to 20% performance improvement • Try to achieve 10-20% performance improvement by introducing new features in the architecture! • This is sufficient to turn an architect into a circuit designer!

  7. Basic Definitions

  8. Clock Generation and Distribution Non-idealities • Jitter • Jitter is a temporal variation of the clock signal, manifested as uncertainty of consecutive edges of a periodic clock signal • It is caused by temporal noise events • Manifested as: - cycle-to-cycle or short-term jitter, tJS - long-term jitter, tJL • Characteristic of the clock generation system • Skew • Skew is a time difference between temporally equivalent or concurrent edges of two periodic signals • Manifests as SE-to-SE fluctuation of clock arrival at the same time instant • Characteristic of the clock distribution system • Caused by spatial variations in signal propagation
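The jitter and skew definitions above can be illustrated with a small numeric sketch. The edge timestamps (in ps) and the function names are made up for illustration; cycle-to-cycle jitter is taken here as the largest change between consecutive clock periods, and skew as the arrival-time difference of the same nominal edge at two storage elements.

```python
def periods(edges):
    """Clock periods between consecutive rising edges."""
    return [b - a for a, b in zip(edges, edges[1:])]


def cycle_to_cycle_jitter(edges):
    """Largest change between two consecutive periods (short-term jitter)."""
    p = periods(edges)
    return max(abs(b - a) for a, b in zip(p, p[1:]))


def skew(edges_a, edges_b):
    """Per-edge arrival difference of the same nominal edge at two SEs."""
    return [b - a for a, b in zip(edges_a, edges_b)]


# Hypothetical rising-edge arrival times in ps at two storage elements.
clk_at_se1 = [0, 500, 1005, 1500, 2002]
clk_at_se2 = [12, 512, 1017, 1512, 2014]

print(cycle_to_cycle_jitter(clk_at_se1))  # jitter seen at SE1
print(skew(clk_at_se1, clk_at_se2))       # constant offset between SE1 and SE2
```

In this made-up data the jitter is a property of the edge stream itself (it appears identically at both storage elements), while the skew is a fixed spatial offset between them.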

  9. Clock Uncertainties

  10. Difference between Latch and Flip-Flop

  11. Difference between Latch and Flip-Flop • After the transition of the clock, data cannot change • Latch is “transparent”
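The distinction on this slide can be sketched with a toy discrete-time model. This is only a behavioral illustration under simplifying assumptions (ideal signals, no setup or hold times, hypothetical function names): the latch follows D whenever the clock is high, while the flip-flop samples D only on the rising clock transition.

```python
def transparent_latch(clk, d, q0=0):
    """Level-sensitive latch: Q follows D while clk is high, holds otherwise."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c == 1:
            q = x
        out.append(q)
    return out


def edge_triggered_ff(clk, d, q0=0):
    """Flip-flop: Q takes the value of D only on a 0 -> 1 clock transition."""
    q, prev, out = q0, 0, []
    for c, x in zip(clk, d):
        if prev == 0 and c == 1:
            q = x
        prev = c
        out.append(q)
    return out


clk = [0, 1, 1, 1, 0, 0]
d = [1, 1, 0, 1, 0, 0]

print(transparent_latch(clk, d))  # data changes pass through while clk is high
print(edge_triggered_ff(clk, d))  # data changes after the edge are ignored
```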

  12. Two-Phase Clocking with Two-Phase Double Latch

  13. Two-Phase Clocking with One-Phase Double Latch • Some people refer to this latch arrangement as a “negative edge Flip-Flop”!

  14. Flip-Flop and M-S Latch Arrangement • How can one recognize the difference without knowing what is inside the “black box”?

  15. F-F and M-S Latch: Difference • Experiment: Failed!

  16. F-F and M-S Latch: Difference • Structural Difference [Figure labels: Flip-Flop; M-S Latch; No Clock; Pulse; Capturing Latch; S; R]

  17. PG Theory of Operation: Sn+1

  18. Flip-Flop: Example-2 • SAFF, DEC Alpha 21264 (Madden & Bowhill, 1990, Matsui 1994) [Figure labels: S; R; pulse; D=0; D=1]

  19. F-F Derivation using Delayed Clock Equivalent to:

  20. Systematically Derived ET FF N. Nedovic, V. G. Oklobdzija, “Dynamic Flip-Flop with Improved Power”, ICCD 2000, Sept. 2000

  21. Flip-Flop: Example (HLFF, H. Partovi)

  22. Flip-Flop: Example (HLFF, H. Partovi)

  23. Timing and Power metrics

  24. Delay • The sum of setup time U and Clk-Q delay is the only true measure of performance with respect to system speed • $T = T_{Clk-Q} + T_{Logic} + T_{Setup} + T_{Skew}$ • $T_{D-Q} = T_{Clk-Q} + T_{Setup}$ [Figure: timing diagram of one cycle $T$ showing $T_{Clk-Q}$, $T_{Logic}$ and $T_{Setup}$]
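The cycle-time relation on this slide can be checked with a short worked example. The individual delay values below are hypothetical, chosen only so that they sum to the 500 ps cycle of the 2.0 GHz example; the function names are likewise illustrative.

```python
def d_to_q(t_clk_q, t_setup):
    """T_D-Q = T_Clk-Q + T_Setup: the storage element's share of the cycle."""
    return t_clk_q + t_setup


def cycle_time(t_clk_q, t_logic, t_setup, t_skew):
    """T = T_Clk-Q + T_Logic + T_Setup + T_Skew."""
    return t_clk_q + t_logic + t_setup + t_skew


# Hypothetical delays in ps.
t_clk_q, t_logic, t_setup, t_skew = 100, 340, 30, 30

t = cycle_time(t_clk_q, t_logic, t_setup, t_skew)
overhead = d_to_q(t_clk_q, t_setup)

print(t)             # total cycle time in ps
print(overhead / t)  # fraction of the cycle spent in the storage element
```

With these assumed numbers, the storage element consumes about a quarter of the cycle, which is why D-Q delay rather than Clk-Q alone is the quantity to minimize.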

  25. Delay vs. Setup/Hold Times Sampling Window

  26. Timing Characteristics

  27. Absorbing Clock Uncertainties

  28. Hybrid Latch Flip-Flop Skew absorption Partovi et al, ISSCC’96

  29. Power Consumption • All power related to the SE can be divided into: • Input power • Data power (PD) • Clock power (PCLK) • Internal power (PINT) • Load power (PLOAD) • PLOAD can be merged into PINT • Internal power is a function of: • data activity ratio (α): the number of captured data transitions with respect to the number of clock transitions (max = 100%) • no activity (0000… and 1111…) • maximum activity (0101010…) • average activity (random sequence) • Glitching activity • Delay is the minimum D-Q: Clk-Q + setup time
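The data activity ratio described above can be sketched as follows. This is a minimal illustration, assuming one captured data value per capturing clock edge, so that the ratio is the number of changes between consecutive captured values divided by the number of capture intervals; the function name is hypothetical.

```python
def activity_ratio(bits):
    """Fraction of capture intervals in which the captured data changes."""
    if len(bits) < 2:
        raise ValueError("need at least two captured values")
    transitions = sum(a != b for a, b in zip(bits, bits[1:]))
    return transitions / (len(bits) - 1)


print(activity_ratio("00000000"))  # no activity
print(activity_ratio("01010101"))  # maximum activity
print(activity_ratio("00110011"))  # intermediate activity
```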

  30. State Element Performance Metrics • It is always possible to trade power for speed • Common metrics: • Power-Delay Product (PDP) • Misleading measure • Good only if measured at constant frequency = EDP • Energy-Delay Product (EDP) • More accurate measure • Energy-Delay² Product (ED²P) • A new measure, being justified by new results (Hofstee, Nowka, IBM)
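The three metrics can be compared with a short calculation. This is a sketch under stated assumptions: energy is taken as average power divided by clock frequency (energy per clock cycle), PDP as power times delay, EDP as energy times delay, and ED²P as energy times delay squared. The power and delay values are made up; only the 500 MHz clock comes from the simulation conditions in this presentation.

```python
def metrics(power_w, delay_s, freq_hz):
    """Return (PDP, EDP, ED2P) for one storage element."""
    energy_j = power_w / freq_hz       # energy per clock cycle
    pdp = power_w * delay_s            # power-delay product
    edp = energy_j * delay_s           # energy-delay product
    ed2p = energy_j * delay_s ** 2     # energy-delay^2 product
    return pdp, edp, ed2p


# Hypothetical storage element: 200 uW, 200 ps D-Q delay, 500 MHz clock.
pdp, edp, ed2p = metrics(200e-6, 200e-12, 500e6)
print(pdp, edp, ed2p)

# At a fixed frequency PDP and EDP differ only by the constant factor f,
# so they rank designs identically.
print(pdp / edp)
```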

  31. PDP, EDP Comparison [Figure labels: High Voltage; Low Voltage; Slow Corner]

  32. Design & optimization tradeoffs • Opposite goals • Minimal total power consumption • Minimal delay • Power-delay tradeoff • Minimize power-delay product (PDPtot) @ f = const. [Figure: plots with the optimum points marked “Opt.”]

  33. Clocked Storage Elements: Examples

  34. Simulation Conditions: • Power supply voltage: VDD = 1.8 V nominal • Temperature: T = 27 °C nominal • Technology: 0.18 µm Fujitsu • Fan-out-of-4 delay = 75 ps • Transistor widths • Minimal: 0.36 µm • Maximal: 10 µm • Load: 14 minimal inverters in the technology used • Clock frequency: 500 MHz (250 MHz for Dual-Edge) • Data/clock slopes of ideal signal: 100 ps

  35. Transmission Gate MS Latch (PowerPC 603; Gerosa, JSSC 12/94) • Two staticized transmission gate transparent latches • Direct path D-Q consists of two transmission gates and two regenerative inverters • Two-phase clock • Advantage: symmetric high-to-low and low-to-high transitions are achievable • Disadvantage: large cost associated with two-phase clock distribution • Comments: • Very low internal power • Large total power due to clock and data load

  36. C²MOS MS Latch • Forward path consists of two clocked inverters, which are parts of the C²MOS latches • Degradation in speed due to pMOS stacks • Degradation in speed due to non-ideal 2-phase clock • Large clock power (if not buffered locally) • Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973

  37. SAFF: StrongARM 110 • Staticized Sense Amplifier Flip-Flop • Weak nMOS keeps set/reset signals low • Second stage: non-clocked SR latch • Additional nMOS transistor causes slightly increased power consumption and delay degradation • Bad timing characteristics due to the latching stage: the signal propagates through three stages • Unbalanced rise and fall times of the output signals (speed degraded by 40%)

  38. Modified SAFF • The first stage is an unchanged sense amplifier • Second stage is sized to provide maximum switching speed • Driver transistors are large • Keeper transistors are small and disengaged during transitions • Nikolic, Oklobdzija, Stojanovic, ISSCC ’99; V. Stojanovic, US Patent No. 6,232,810

  39. Systematically Derived SAFF: Example-2 • New pulse-generating stage • Inverters decoupling gates from MN3, MN4 • MN5, MN6 provide leakage current paths • Second stage is unchanged • Nikolic, Oklobdzija, ESSCIRC’99; V. Stojanovic, US Patent No. 6,232,810

  40. Sense Amplifier-based Flip-Flop (SAbFF) • Emerged as a workaround for SAFF drawbacks • floating nodes (keeping the Sb, Rb nodes low with additional transistors parallel to data-controlled transistors) • symmetric second stage (push-pull realization) • Internal signals still experience transition on every clock cycle • V. Stojanovic, US Patent No. 6,232,810

  41. Comparison with other SAFFs (Nikolic, Oklobdzija, ESSCIRC’99) [Plot: Clk-Output Delay [ps], 0 to 800, vs. Load [fF], 0 to 250. Conditions: CMOS, nominal corner, Leff = 0.18 µm, VDD = 1.8 V, T = 25 °C, load on both outputs. Curve labels: SAFF w/NOR, SAFF w/NAND, SAFF, and this work, with rising-edge and falling-edge delays.]

  42. Conditional Capture Flip-Flop (CCFF) • Conditions: 0.18 µm Fujitsu; f = 500 MHz; VDD = 1.8 V; data activity 50% • Principle of operation • Suppress any transition in the flip-flop if the input to be captured is equal to the previous output value • Double-ended realization • FF functionality achieved by producing a clock pulse • Static operation by use of keepers • Second stage is a pass-transistor latch • Comments • Contention with keepers causes a larger first stage • Large power consumption despite conditional signaling • B. S. Kong, et al., ISSCC 2000

  43. Partovi’s HLFF (AMD K-6; Partovi, ISSCC’96) • Hybrid Latch-Flip-Flop combination • Negative setup time of -80 ps • Robustness to clock skew and fast clocking • Our simulations show: • Gains • speed (negative setup time) • robustness to clock skew • Drawbacks • sensitivity to clock slope • relatively high internal power (due to precharge)

  44. Semi-Dynamic Flip-Flop (F. Klass, VLSI Circuits’98) • Hybrid combination used in UltraSPARC-III • Very fast circuit (173 ps Clk-Q delay; 0.18 µm technology, 1.8 V, 27 °C) • Problem: D=Q=1 • Our simulations show: • Negative setup time • Small penalty for embedded logic • Relatively high internal power consumption and clock load
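Delays quoted in picoseconds are easier to compare across technologies when normalized to the fan-out-of-4 inverter delay. The sketch below assumes the 75 ps FO4 delay listed in the simulation conditions also applies to the 173 ps Clk-Q figure quoted here (both are stated for 0.18 µm, 1.8 V, 27 °C); the function name is illustrative.

```python
FO4_PS = 75.0  # fan-out-of-4 delay from the stated simulation conditions


def to_fo4(delay_ps, fo4_ps=FO4_PS):
    """Express a delay in units of FO4 inverter delays."""
    return delay_ps / fo4_ps


print(round(to_fo4(173.0), 2))  # SDFF Clk-Q delay in FO4 units
```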

  45. Transmission Gate Flip-Flop (TGFF) • Two transmission gates define transparency window • Time window with non precharge-evaluate structure • Low input activity => low output activity • Comments: • Two transmission gates increase delay • Noticeable data power

  46. Comparison

  47. Overall Results [Plot annotations: 4 FO4; 2 FO4]

  48. Overall Results

  49. Overall Results

  50. Conventional Clk-Q vs. minimum D-Q • Hidden positive setup time • Degradation of Clk-Q • Older 0.22 µm comparison results
