Issues in System on the Chip Clocking
SoC Design Conference, Seoul, Korea, November 6th, 2003
Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California, Davis
Presentation available at: http://www.ece.ucdavis.edu/acsel
Directions in SoC Clocking
• Synchronous / asynchronous paradigm
• Synchronous solutions:
  • Clock uncertainty absorption
  • Time borrowing
  • Skew-tolerant domino
  • Using both edges of the clock
• Conclusion
Prof. V.G. Oklobdzija, University of California
Clock Frequency Trends (ISSCC 2002)
Processor Frequency Trends (courtesy of S. Borkar, Intel)
• Frequency doubles each generation
• The number of gates per clock cycle is reduced by 25%
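The two trends together fix how much faster each gate must become. A back-of-the-envelope sketch, assuming the clock period equals the number of gates per cycle times one gate delay (CSE and skew overhead ignored):

```python
def gate_delay_scale(freq_scale: float, gates_per_clock_scale: float) -> float:
    """Required per-gate delay scaling from one generation to the next.

    Assumes period = gates_per_clock * gate_delay (no CSE or skew overhead).
    """
    period_scale = 1.0 / freq_scale
    return period_scale / gates_per_clock_scale


# Frequency doubles, gates per clock drop by 25 %
print(round(gate_delay_scale(2.0, 0.75), 3))  # 0.667
```

Under this model the 2x frequency gain splits into about 1.5x from faster gates and about 1.33x from fewer gates per cycle.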
Multi-GHz Clocking Problems
• Less logic between pipeline stages:
  • Of the 7-10 FO4 delays allocated per stage, the FF can take 2-4 FO4
  • Clock uncertainty can take another FO4
  • Together these can consume ½ of the time allowed for computation
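The FO4 budget above can be turned into an overhead fraction per stage. A minimal sketch, assuming clock uncertainty costs exactly 1 FO4 ("another FO4"):

```python
def overhead_fraction(stage_fo4: float, ff_fo4: float, uncertainty_fo4: float = 1.0) -> float:
    """Fraction of a pipeline stage consumed by the flip-flop and clock uncertainty."""
    return (ff_fo4 + uncertainty_fo4) / stage_fo4


for stage in (7, 10):
    for ff in (2, 4):
        print(f"{stage} FO4 stage, {ff} FO4 FF: {overhead_fraction(stage, ff):.0%}")
```

Across the corners on the slide the overhead spans roughly 30% to 70% of the stage, bracketing the ½ estimate.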
Clock Uncertainties
Motivation for Improving Clocked Storage Elements
Example: in a 2.0 GHz processor, T = 500 ps
• A typical clocked storage element (CSE) D-Q delay is on the order of 100-150 ps
• A faster CSE, e.g. 80-100 ps D-Q, represents a 10-15% performance improvement
• If, in addition, the CSE can absorb 20 ps of clock uncertainty and embed one level of logic, this can yield up to a 20% performance improvement
• Try achieving a 10-20% performance improvement by introducing new features in the architecture!
• This is sufficient to turn an architect into a circuit designer!
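One way to see where such percentages come from: assume the logic delay stays fixed and every picosecond saved in the CSE shortens the cycle. This is an illustrative model, not necessarily how the slide's figures were derived:

```python
def speedup(period_ps: float, dq_old_ps: float, dq_new_ps: float, extra_saving_ps: float = 0.0) -> float:
    """Frequency gain when CSE savings shorten the cycle; logic delay held fixed (assumption)."""
    new_period = period_ps - (dq_old_ps - dq_new_ps) - extra_saving_ps
    return period_ps / new_period - 1.0


print(f"{speedup(500, 150, 100):.1%}")  # 50 ps of D-Q saved
print(f"{speedup(500, 150, 80):.1%}")   # 70 ps of D-Q saved
```

With T = 500 ps, saving 50-70 ps of D-Q delay gives roughly 11-16% higher frequency under this model.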
Consequences of Multi-GHz Clocks
• Pipeline boundaries start to blur
• Clocked Storage Elements must include logic
• Wave pipelining, domino style, signals used as clocks, …
• Synchronous design only within limited domains
• Asynchronous communication between synchronous domains
Synchronous / Asynchronous Design on the Chip
• 1 billion transistors on a chip by 2005-06
• A 64-bit, 4-way-issue logic core requires ~2 million transistors
Table 1: Transistor count in typical RISC processors
Synchronous / Asynchronous Design on the Chip
[Figure: a 10-million-transistor block within a 1-billion-transistor chip]
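The transistor counts above put the scale in numbers. Treating each 10-million-transistor block from the figure as one synchronous domain is an illustrative assumption:

```python
CHIP_TRANSISTORS = 1_000_000_000
CORE_TRANSISTORS = 2_000_000     # 64-bit, 4-way-issue logic core
BLOCK_TRANSISTORS = 10_000_000   # block size shown in the figure (assumed = one synchronous domain)

print(CHIP_TRANSISTORS // CORE_TRANSISTORS)   # 500
print(CHIP_TRANSISTORS // BLOCK_TRANSISTORS)  # 100
```

A chip that could hold hundreds of such cores, or on the order of a hundred such blocks, is hard to run as one globally synchronous domain.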
Two views of the world:
• Asynchronous
• Synchronous
Asynchronous Paradigm
• A logic stage can take as much time as it needs
• Maximum speed limited by handshake overhead
• Increased logic complexity (de-glitching)
Synchronous Paradigm
• Maximum speed determined by the slowest logic block
• Latch / FF timing overhead
• Fixed clock frequency (set by the longest path)
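The two paradigms can be contrasted with a first-order latency model for one data item passing through the pipeline. This simplified sketch ignores throughput, completion detection and de-glitching, and the stage delays and overheads are hypothetical:

```python
def synchronous_latency(stage_delays, cse_overhead):
    """Every stage gets one clock period, set by the slowest stage plus CSE overhead."""
    period = max(stage_delays) + cse_overhead
    return period * len(stage_delays)


def asynchronous_latency(stage_delays, handshake_overhead):
    """Each stage takes only the time it needs, plus a handshake overhead."""
    return sum(d + handshake_overhead for d in stage_delays)


stages = [200, 120, 300, 150]  # hypothetical stage delays in ps
print(synchronous_latency(stages, cse_overhead=100))        # 1600
print(asynchronous_latency(stages, handshake_overhead=80))  # 1090
```

With equal stage delays the comparison reduces to CSE overhead versus handshake overhead, matching the "maximum speed" bullets of the two slides.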
Synchronous Paradigm
• Clocked Storage Elements (Flip-Flops and Latches) should be viewed as synchronization elements, not merely as storage elements!
• Their main purpose is to synchronize fast and slow paths:
  • Prevent the fast path from corrupting the state
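Keeping the fast path from corrupting the state corresponds to the standard min-delay (hold) constraint between two flip-flops triggered by the same clock edge. This textbook form is not taken from the slides, and the numbers are hypothetical:

```python
def hold_ok(d_cq_min: float, d_logic_min: float, hold: float, skew: float) -> bool:
    """Min-delay (race) check between two flip-flops clocked by the same edge.

    The fastest path must not reach the next CSE before its hold window closes;
    otherwise the new data overwrites the value that CSE is still capturing.
    """
    return d_cq_min + d_logic_min >= hold + skew


print(hold_ok(d_cq_min=60, d_logic_min=0, hold=20, skew=30))  # True
print(hold_ok(d_cq_min=30, d_logic_min=0, hold=20, skew=30))  # False
```

A direct flip-flop-to-flip-flop connection (no logic) is the worst case, which is why both calls use `d_logic_min=0`.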
Synchronous World: Tricks and Solutions
• Clocked Storage Elements with clock uncertainty absorption features
• Time Borrowing
• Incorporation of synchronization features into the logic
• Skew-Tolerant Domino
• Utilizing both edges of the clock
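Clocked Storage Element Overhead
[Figure: pipeline stage CSE (D-Q), logic, CSE (D-Q), with a timing diagram of T, $T_{Clk\text{-}Q}$, $T_{Logic}$, U and $T_{skew}$]
• The CSE takes U (setup time) plus its Clk-Q delay out of the cycle. Thus the D-Q delay, not the Clk-Q delay, is what matters: $T = T_{Clk\text{-}Q} + T_{Logic} + U + T_{skew}$, where $T_{D\text{-}Q} = T_{Clk\text{-}Q} + U$, so $T = T_{D\text{-}Q} + T_{Logic} + T_{skew}$.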
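Solving the cycle equation for $T_{Logic}$ gives the time left for computation. A minimal sketch; the CSE and skew values are hypothetical, with T = 500 ps taken from the 2 GHz example:

```python
def logic_budget(period: float, t_clk_q: float, setup: float, skew: float) -> float:
    """Time left for logic: T - (T_Clk-Q + U) - T_skew = T - T_D-Q - T_skew."""
    t_dq = t_clk_q + setup
    return period - t_dq - skew


print(logic_budget(period=500, t_clk_q=100, setup=30, skew=20))  # 350
print(logic_budget(period=500, t_clk_q=70, setup=60, skew=20))   # 350
```

The second call shows why Clk-Q alone is misleading: trading a shorter Clk-Q delay for a longer setup time leaves the logic budget unchanged, because only their sum, the D-Q delay, enters the equation.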
[Figure: Delay vs. Setup/Hold Times. Clk-Output delay [ps] versus Data-Clk time [ps], marking the minimum Data-Output delay, the setup and hold times, and the sampling window]
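The minimum Data-Output point in such a plot can be located from characterization data by minimizing D-Q = (Data-Clk) + (Clk-Q). In this framework the Data-Clk time at that point is taken as the optimal setup time. The sample points below are invented for illustration:

```python
# Hypothetical characterization points: (Data-Clk time [ps], Clk-Output delay [ps]).
# Clk-Q grows as data arrives closer to the clock edge.
samples = [(100, 100), (60, 102), (40, 108), (20, 125), (10, 145), (0, 180)]


def optimal_setup(points):
    """Return (Data-Clk, D-Q) at the point of minimum Data-Output delay.

    D-Q = Data-Clk + Clk-Q: data arriving earlier wastes time waiting for the
    clock, while data arriving later is slowed down inside the CSE.
    """
    return min(((x, x + clk_q) for x, clk_q in points), key=lambda p: p[1])


print(optimal_setup(samples))  # (20, 145)
```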
Clock Uncertainty Absorption
Single-Ended Skew-Tolerant Flip-Flop (Nedovic, Oklobdzija, Walker, ISSCC 2003)
Clock Uncertainty Absorption
[Figure: timing diagram with early, nominal and late clock arrivals spread over the clock uncertainty $t_{CU}$ ($T = 0$ at the nominal clock), the corresponding D-Clk times, and the resulting minimum ($D_{DQm}$) and worst-case ($D_{DQM}$) D-Q delays]
Clock Uncertainty Absorption
[Figure: D-Q delay of the CSE under two clock uncertainty cases]
• (a) $t_{CU}$ = 30 ps: worst-case D-Q delay $D_{DQM}$ = 220 ps, 3 ps above nominal; absorption 90%
• (b) $t_{CU}$ = 100 ps: $D_{DQM}$ = 261 ps, 44 ps above nominal; absorption 56%
• Optimal setup times shown: $U_{Opt}$ = -5 ps and $U_{Opt}$ = 30 ps
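The 90% and 56% figures are consistent with defining absorption as one minus the worst-case D-Q increase divided by $t_{CU}$ (3 ps of 30 ps, 44 ps of 100 ps). Both cases also imply the same nominal D-Q delay, 220 - 3 = 261 - 44 = 217 ps. The slide does not state this definition, so treat it as an assumption:

```python
def absorption(t_cu_ps: float, dq_increase_ps: float) -> float:
    """Fraction of clock uncertainty absorbed by the CSE.

    Assumed definition: 1 - (worst-case D-Q increase) / (clock uncertainty).
    """
    return 1.0 - dq_increase_ps / t_cu_ps


print(f"{absorption(30, 3):.0%}")    # 90%
print(f"{absorption(100, 44):.0%}")  # 56%
```

An ideal absorbing element would show no D-Q increase (absorption 100%), while an element whose D-Q delay tracks the clock arrival one-for-one would absorb nothing.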
Time Borrowing
Critical Path with Time Borrowing
Latches as Synchronizers
• The purpose of a CSE is to synchronize data flow.
• We need to insert CSEs to prevent "fast paths" from reaching the next logic stage too early.
• If the signal arrives late, it is allowed to borrow time from the next stage.
• However, borrowing cannot go on forever…
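Time borrowing, and its limit, can be illustrated with a simplified chain of latches: latch k opens at k·period and stays transparent for `window` ps. Latch D-Q delay is ignored, and all delays are hypothetical:

```python
def borrow_chain(logic_delays, period, window, setup=0.0):
    """Simulate data passing through latches that open at k*period for `window` ps.

    Returns the time each stage borrowed from the next one. Raises ValueError
    when data misses a latch's closing edge (borrowing has run out).
    Latch D-Q delay is ignored for simplicity.
    """
    borrowed = []
    depart = 0.0  # data leaves latch 0 when it opens at t = 0
    for k, delay in enumerate(logic_delays, start=1):
        arrival = depart + delay
        opens, closes = k * period, k * period + window
        if arrival > closes - setup:
            raise ValueError(f"latch {k}: data arrives at {arrival} ps, latch closes at {closes} ps")
        borrowed.append(max(0.0, arrival - opens))
        # A transparent latch passes late data straight through; early data waits.
        depart = max(arrival, opens)
    return borrowed


print(borrow_chain([130, 60, 100], period=100, window=50))  # [30.0, 0.0, 0.0]
try:
    borrow_chain([130, 130, 130], period=100, window=50)
except ValueError as err:
    print(err)
```

In the first chain, the 130 ps stage borrows 30 ps, and the short stage after it pays the debt back. In the second, every stage is long, so the borrowed time accumulates until the data misses the second latch's window.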
Using a Single Pulsed Latch
Single Pulsed Latch (courtesy of D. Markovic and Intel MRL)
Optimal Single-Latch Clocking
Single-latch system (Unger & Tan '83):
• Minimal clock period: $P_m = P \ge D_{LM} + D_{DQM}$
• Shortest path: $D_{Lm} > D_{LmB} \ge W + T_T + T_L + H - D_{CQm}$
• Minimal clock width: $W_{opt} = T_L + T_T + U + D_{CQM} - D_{DQM}$
Example, 0.10 µm technology: FO4 = 25-40 ps, FF = 80 ps, $T_{unc}$ = 25-35 ps, $f_{max}$ = 2.5-4 GHz, T = 250-400 ps
• $W_{opt} \approx 2T_{unc} \approx$ 50-70 ps
• $D_{Lm} \approx 4T_{unc} + H - D_{CQm} \approx$ 100-140 ps (close to ½ of a cycle)
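The three single-latch constraints can be evaluated directly. This sketch reproduces the approximations $W_{opt} \approx 2T_{unc}$ and $D_{Lm} \approx 4T_{unc} + H - D_{CQm}$ under stated assumptions: $T_L = T_T = T_{unc}$, $D_{DQM} = D_{CQM} + U$, and, for the numeric check, $H = D_{CQm}$. The individual delay values are hypothetical:

```python
def min_period(d_logic_max, d_dq_max):
    """P_m: longest logic path plus worst-case latch D-Q delay."""
    return d_logic_max + d_dq_max


def optimal_width(t_l, t_t, setup, d_cq_max, d_dq_max):
    """W_opt = T_L + T_T + U + D_CQM - D_DQM."""
    return t_l + t_t + setup + d_cq_max - d_dq_max


def short_path_bound(width, t_t, t_l, hold, d_cq_min):
    """Lower bound on the shortest logic path: W + T_T + T_L + H - D_CQm."""
    return width + t_t + t_l + hold - d_cq_min


# Hypothetical values: T_unc = 25 ps, U = 10 ps, D_CQM = 70 ps, D_DQM = 80 ps, H = D_CQm = 20 ps
w = optimal_width(t_l=25, t_t=25, setup=10, d_cq_max=70, d_dq_max=80)
print(w)                                                          # 50
print(short_path_bound(w, t_t=25, t_l=25, hold=20, d_cq_min=20))  # 100
print(min_period(d_logic_max=220, d_dq_max=80))                   # 300
```

The 100 ps short-path bound is the cost of a single-latch scheme: every path, not just the critical one, must be padded to nearly half a cycle.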
Skew-Tolerant Domino (a.k.a. Opportunistic Time Borrowing), Intel Patent No. 5,517,136, May 14, 1996
CMOS Domino as Memory Element
• After the input changes, the output remembers it
• Precharge destroys the information
• Proper clock phasing allows the information to pass from stage to stage
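The memory behaviour follows from the dynamic node being discharged in one direction only. A behavioral sketch of a single footed domino gate (dynamic node plus output inverter), ignoring charge sharing, keepers and leakage:

```python
class DominoStage:
    """Behavioral model of one footed domino gate with a single input `a`."""

    def __init__(self) -> None:
        self.dynamic_node = True  # precharged high

    def step(self, clk: bool, a: bool) -> bool:
        if not clk:
            self.dynamic_node = True   # precharge: the stored result is destroyed
        elif a:
            self.dynamic_node = False  # evaluate: a high input discharges the node
        # While clk is high nothing can recharge the node, so the result is held.
        return not self.dynamic_node   # output inverter


g = DominoStage()
trace = [g.step(clk, a) for clk, a in [(False, False), (True, True), (True, False), (False, False)]]
print(trace)  # [False, True, True, False]
```

The third step shows the output remembering the input after it returns low; the fourth shows precharge erasing it.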
Skew-Tolerant Domino
Dual-Edge Triggered CSE
• A DET-CSE samples the input data on both edges of the clock
• Reduced power consumption:
  • Half of the original clock frequency for the same data throughput
  • Half of the clock generation, distribution and storage-element clock power is saved
• However, it may introduce an overhead
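The power claim follows from dynamic clock power $P = C V_{DD}^2 f$. A minimal sketch modeling only clock-related power; capacitance, supply and throughput values are hypothetical, as is the 20% extra clock load used to illustrate a DET overhead:

```python
def clock_power(c_clock_f: float, vdd_v: float, f_clock_hz: float) -> float:
    """Dynamic clock power C * Vdd^2 * f (clock nodes switch every cycle)."""
    return c_clock_f * vdd_v ** 2 * f_clock_hz


throughput = 2e9        # data items per second (hypothetical)
c, vdd = 100e-12, 1.2   # clock load [F] and supply [V] (hypothetical)

p_set = clock_power(c, vdd, throughput)            # single-edge: f_clk = throughput
p_det = clock_power(c, vdd, throughput / 2)        # dual-edge: f_clk = throughput / 2
p_det_heavy = clock_power(1.2 * c, vdd, throughput / 2)  # DET with 20 % more clock load

print(p_det / p_set)                  # 0.5
print(f"{p_det_heavy / p_set:.2f}")   # 0.60
```

Any extra clock load in the DET elements eats directly into the 50% saving.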
Dual-Edge Triggered Storage Element Topologies
• Structurally, there are two different designs:
  • Latch-Mux (LM)
  • Flip-Flop (FF)
[Figure: DET-Latch and DET-Flip-Flop topologies; in the latch-mux, non-transparency is achieved by the MUX]
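The latch-mux idea can be captured in a behavioral sketch: two level-sensitive latches transparent on opposite clock phases, with the MUX always selecting the latch that is currently holding. This is a discrete-step model, not a circuit description:

```python
class LatchMuxDET:
    """Behavioral latch-mux (LM) dual-edge-triggered storage element.

    The output MUX always selects the opaque latch, so the element as a whole
    is never transparent and Q updates on both clock edges.
    """

    def __init__(self) -> None:
        self.high_latch = False  # transparent while clk is high, holds while clk is low
        self.low_latch = False   # transparent while clk is low, holds while clk is high

    def step(self, clk: bool, d: bool) -> bool:
        if clk:
            self.high_latch = d
            return self.low_latch   # value captured at the rising edge
        self.low_latch = d
        return self.high_latch      # value captured at the falling edge


det = LatchMuxDET()
outs = [det.step(clk, d) for clk, d in [(False, True), (True, False), (True, True), (False, False), (True, False)]]
print(outs)  # [False, True, True, True, False]
```

Q changes only at clock transitions (steps 2, 4 and 5), each time to the D value present just before the edge, so data is sampled on both edges.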
Comparison with Single-Edge SEs
Comparison with Single-Edge CSEs
Single- and Double-Edge-Triggered SEs: Power Consumption (a = 50%)
FO4 = 2.9
Symmetric Pulse Generator Flip-Flop (SPG-FF) (Nedovic, Oklobdzija, Walker, ESSCIRC 2002)
Conclusion
• Clocking is the next challenge. Current clocking techniques may hold up to 10 GHz; beyond that, pipeline boundaries start to vanish and more exotic clocking techniques will come into use. Synchronous design will be possible only in limited domains on the chip, and a mix of synchronous and asynchronous design may emerge even in digital logic.
• Synchronous design:
  • Has not exhausted all its tricks
• Asynchronous design:
  • Has not solved all its problems
• Successful SoC design needs solutions from both