Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]
Tiberiu Chelcea, Steven M. Nowick
Department of Computer Science, Columbia University
{tibi,nowick}@cs.columbia.edu
Introduction
Key trend in VLSI systems: systems-on-a-chip (SoC)
Two fundamental challenges:
• mixed-timing domains
• long interconnect delays
Our goal: design of efficient interface circuits
Desirable features:
• arbitrarily robust
• low-latency, high-throughput
• modularity, scalability
Few satisfactory solutions to date.
Timing Issues in SoC Design
[Figure: (a) single-clock: Domain #1 and Domain #2 connected by a long interconnect; (b) mixed-timing domains: Domain #1 and Domain #2, each sync or async, connected by a long interconnect]
Timing Issues in SoC Design (cont.)
Solution: provide interface circuits
[Figure: (a) single-clock and (b) mixed-timing domains, each with Domain #1 and Domain #2 joined by a long interconnect, annotated with: Carloni et al., “relay stations”; NEW: “mixed-timing FIFO’s”; NEW: “mixed-timing relay stations”]
Contributions
Complete set of mixed-timing interface circuits:
• sync-sync, async-sync, sync-async, async-async
Features:
• Arbitrary robustness: with respect to synchronization failures
• High throughput: in steady-state operation, no synchronization overhead
• Low latency (“fast restart”): in an empty FIFO, only synchronization overhead
• Reusability: each interface partitioned into reusable sub-components
Two contributions:
• Mixed-Timing FIFO’s
• Mixed-Timing Relay Stations
Contribution #1: Mixed-Timing FIFO’s
Addresses the issue of interfacing mixed-timing domains
Features: token ring architecture
• circular array of identical cells
• shared buses: data + control
• data: “immobile” once enqueued
• distributed control: allows concurrent put/get operations
2 circulating tokens: define tail & head of queue
Potential benefits:
• low latency
• low power
• scalability
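The token-ring organization can be illustrated with a purely behavioral Python sketch. It ignores clocks and synchronization entirely; the class `TokenRingFifo` and its methods are hypothetical names, and each cell’s status is modeled as a single `full` flag (so e_i is simply its complement).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    data: Optional[int] = None  # once enqueued, data stays in this cell ("immobile")
    full: bool = False          # status bit f_i; e_i is its complement


class TokenRingFifo:
    """Behavioral sketch: circular array of identical cells, with a put token
    marking the tail and a get token marking the head (no clocks modeled)."""

    def __init__(self, capacity: int) -> None:
        self.cells: List[Cell] = [Cell() for _ in range(capacity)]
        self.put_token = 0  # index of the cell holding the put token (tail)
        self.get_token = 0  # index of the cell holding the get token (head)

    def put(self, item: int) -> bool:
        cell = self.cells[self.put_token]
        if cell.full:
            return False  # tail cell still occupied: FIFO is full
        cell.data, cell.full = item, True
        self.put_token = (self.put_token + 1) % len(self.cells)  # pass the put token
        return True

    def get(self) -> Optional[int]:
        cell = self.cells[self.get_token]
        if not cell.full:
            return None  # head cell empty: FIFO is empty
        item, cell.data, cell.full = cell.data, None, False
        self.get_token = (self.get_token + 1) % len(self.cells)  # pass the get token
        return item
```

Data never moves between cells; only the two token positions advance. Puts and gets therefore normally touch different cells, which is what lets the hardware perform them concurrently.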
Contribution #2: Mixed-Timing Relay Stations
Addresses the issue of long interconnect delays
“Latency-Insensitive Protocols”: safely tolerate long interconnect delays between systems
Prior contribution: introduced “relay stations”
• single-clock domains (Carloni et al., ICCAD-99)
Our contribution: introduce “mixed-timing relay stations”
• mixed-clock (sync-sync)
• async-sync
First proposed solutions to date.
Related Work
Single-clock domains: handling clock discrepancies
• clock skew and jitter (Kol98, Greenstreet95)
• long interconnect delays (Carloni99)
Mixed-timing domains: 3 common approaches
• Use “wrapper logic”:
  • add a logic layer to synchronize data/control (Seitz80, Seizovic94)
  • drawback: long latencies in communication
• Modify the receiver’s clock:
  • stretchable and pausible clocks (Chapiro84, Yun96, Bormann97, Sjogren/Myers97)
  • drawback: penalties in restarting the clock
Related Work: Closer Approaches
Mixed-timing domains (cont.):
• Interface circuits: mixed-clock FIFO’s (Intel, Jex et al. 1997)
  • drawback: significant area overhead = a synchronizer for each cell
Our approach: mixed-clock FIFO’s
• … only 2 synchronizers for the entire FIFO
Outline
• Mixed-Clock Interfaces
  • FIFO
  • Relay Station
• Async-Sync Interfaces
  • FIFO
  • Relay Station
• Results
• Conclusions
Mixed-Clock FIFO: Block Level
Synchronous put interface:
• CLK_put: controls put operations
• req_put: initiates put operations
• data_put: bus for data items
• full: indicates when the FIFO is full
Synchronous get interface:
• CLK_get: controls get operations
• req_get: initiates get operations
• data_get: bus for data items
• valid_get: indicates data-item validity (the validity bit of every enqueued item is always 1 in this design)
• empty: indicates when the FIFO is empty
Mixed-Clock FIFO: Steady-State Simulation
[Animation over the FIFO architecture: ring of cells with TAIL and HEAD; Full Detector and Put Controller on the put side (CLK_put, req_put, data_put, full); Empty Detector and Get Controller on the get side (CLK_get, req_get, data_get, valid_get, empty)]
Steady state: FIFO neither full nor empty
• Put operation: the sender starts a put operation; the FIFO is not full, so the Put Controller enables it; at the end of the clock cycle, the TAIL cell enqueues the data and passes the put token to the next cell
• Get operation: performed at the HEAD cell, under CLK_get
• As puts continue, the TAIL advances around the ring
Steady-state operation: puts and gets “reasonably spaced”
• zero probability of synchronization failure
• zero synchronization overhead
Mixed-Clock FIFO: Full Scenario
[Animation over the FIFO architecture: TAIL, HEAD, Full/Empty Detectors, Put/Get Controllers]
• FIFO FULL: the put interface is stalled
• After a get operation frees a cell, the FIFO is NOT FULL and the put interface can resume
Mixed-Clock FIFO: Cell Implementation
Synchronous put part (reusable), clocked by CLK_put:
• en_put: enables a put operation
• data_put: data item in
• req_put: validity bit in
• ptok_in / ptok_out: put token in / out
Storage and status:
• REG: data and validity registers
• Data Validity Controller (SR latch): status bits f_i (cell FULL) and e_i (cell EMPTY)
Synchronous get part (reusable), clocked by CLK_get:
• en_get: enables a get operation
• data_get: data item out
• valid: validity bit out
• gtok_in / gtok_out: get token in / out
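Mixed-Clock FIFO: Architecture
[Figure: ring of FIFO cells on shared buses. Put side (CLK_put): Put Controller and Full Detector, with req_put, data_put, full (callout: “FIFO not full”). Get side (CLK_get): Get Controller and Empty Detector, with req_get, data_get, valid_get, empty.]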
Synchronization Issues
Challenge: interfaces are highly concurrent
• Global “FIFO state”: controlled by 2 different clocks
Problem #1: Metastability
• Each FIFO interface needs clean state signals
Solution: synchronize the “full” & “empty” signals
• “full” with CLK_put
• “empty” with CLK_get
• add 2 (or more) synchronizing latches to each signal
Observable “full”/“empty” safely approximate the true FIFO state
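The cost of this solution is delay. In the hypothetical model below (ideal latches; metastability itself is not modeled), the observable output after a given edge of the receiving clock is the raw value sampled `stages - 1` edges earlier, so each interface sees a slightly stale view of the FIFO state. The class name `Synchronizer` is illustrative only.

```python
from collections import deque
from typing import Deque


class Synchronizer:
    """Cycle-level sketch of an N-stage synchronizer clocked by the receiving clock.
    Only the added latency is modeled, not metastable behavior."""

    def __init__(self, stages: int = 2, initial: bool = True) -> None:
        # regs[0] is the first stage (samples the raw signal); regs[-1] drives the output.
        self.regs: Deque[bool] = deque([initial] * stages, maxlen=stages)

    def clock(self, raw: bool) -> bool:
        """Apply one receiving-clock edge; return the observable (synchronized) value."""
        self.regs.appendleft(raw)  # every stage shifts by one; the oldest value drops off
        return self.regs[-1]
```

For example, feeding the raw sequence False, False, False, True into a 2-stage synchronizer initialized to True yields True, False, False, False: the output first shows a stale True, and the final change to True has not yet propagated.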
Synchronization Issues (cont.)
Problem #2: FIFO now may underflow/overflow!
• synchronizing latches add extra latency
Solution: modify the definitions of “full” and “empty”
• New FULL: 0 or 1 empty cells left
• New EMPTY: 0 or 1 full cells left
New Full Detector:
• two consecutive empty cells = FIFO not full (pairs e_0·e_1, e_1·e_2, e_2·e_3, e_3·e_0 for a 4-cell FIFO)
• NO two consecutive empty cells = FIFO full
• the result passes through synchronizing latches clocked by CLK_put to produce “full”
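Because the free cells of a token-ring FIFO always form one contiguous run between the tail and the head, “two consecutive empty cells” is equivalent to “at least two empty cells”. A minimal Python sketch of the combinational part of the new full detector follows; the synchronizing latches are omitted and the function name `new_full` is hypothetical.

```python
from typing import Sequence


def new_full(e: Sequence[bool]) -> bool:
    """Modified ("new") full detector for a ring of cells.

    e[i] is status bit e_i (True = cell i is empty). The FIFO is reported
    NOT full only if some pair of cyclically adjacent cells (e_i, e_{i+1}),
    including the wrap-around pair (e_{n-1}, e_0), are both empty.
    """
    n = len(e)
    two_consecutive_empty = any(e[i] and e[(i + 1) % n] for i in range(n))
    return not two_consecutive_empty
```

With one empty cell left the detector already reports full, which leaves one cell of slack for a put issued before the synchronized “full” reaches the put interface.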
Synchronization Issues (cont.)
Problem #3: potential for deadlock
Scenario: suppose only 1 data item in a quiescent FIFO
• FIFO still considered “empty” (new definition)
• get interface: cannot dequeue the data item!
Solution: bi-modal “empty detector”, combining:
• “new empty” detector (0 or 1 data items)
• “true empty” detector (0 data items)
The two results are folded into a single global “empty” signal
Synchronization Issues: Avoiding Deadlock
Bi-modal empty detection: select either ne or oe
• ne: detects “new empty” (0 or 1 full cells), from pairs of consecutive full cells f_0·f_1, f_1·f_2, f_2·f_3, f_3·f_0
• oe: detects “true empty” (0 full cells), from f_0, f_1, f_2, f_3
• both detectors are synchronized with CLK_get
• reconfigure whenever the get interface is active (signals en_get, req_get)
• when reconfigured, use ne: FIFO active, avoids underflow
• when NOT reconfigured, use oe: FIFO quiescent, avoids deadlock
• ne and oe combine into the global “empty” signal
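The bi-modal selection can be sketched in Python as follows. `get_active` is an assumed boolean standing in for the slide’s “reconfigure” condition, the synchronizing latches on ne and oe are omitted, and all function names are hypothetical. As with empty cells, the full cells form one contiguous run, so “no two consecutive full cells” means “0 or 1 full cells”.

```python
from typing import Sequence


def _adjacent_full_pair(f: Sequence[bool]) -> bool:
    """True if some pair of cyclically adjacent cells (f_i, f_{i+1}) are both full."""
    n = len(f)
    return any(f[i] and f[(i + 1) % n] for i in range(n))


def new_empty(f: Sequence[bool]) -> bool:
    """ne: 'new empty', i.e. 0 or 1 full cells."""
    return not _adjacent_full_pair(f)


def true_empty(f: Sequence[bool]) -> bool:
    """oe: 'true empty', i.e. no full cell at all."""
    return not any(f)


def global_empty(f: Sequence[bool], get_active: bool) -> bool:
    """Use ne while the get interface is active (avoids underflow),
    oe while the FIFO is quiescent (avoids deadlock)."""
    return new_empty(f) if get_active else true_empty(f)
```

With a single item in a quiescent FIFO, `global_empty` falls back to oe and reports “not empty”, so the get interface can dequeue the item instead of deadlocking.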
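Put/Get Controllers
• Put Controller: enables a put operation (en_put); disabled when the FIFO is full
• Get Controller: enables a get operation (en_get); indicates when data is valid (valid_get); disabled when the FIFO is empty
[Figure signals: Put Controller: req_put, full → en_put; Get Controller: req_get, empty, valid → en_get, valid_get]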
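As combinational logic, the two controllers reduce to a few gates. A hedged Python sketch; the function names are hypothetical, registering of the outputs on CLK_put/CLK_get is omitted, and the exact rule for valid_get is an assumption.

```python
from typing import Tuple


def put_controller(req_put: bool, full: bool) -> bool:
    """en_put: a requested put is enabled only while the synchronized full is low."""
    return req_put and not full


def get_controller(req_get: bool, empty: bool, valid: bool) -> Tuple[bool, bool]:
    """Returns (en_get, valid_get).

    Assumption: valid_get is raised only when a get is actually enabled and
    the dequeued item's validity bit is set.
    """
    en_get = req_get and not empty
    return en_get, en_get and valid
```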
Relay Stations: Overview
Proposed by Carloni et al. (ICCAD’99)
• Delay = 1 cycle: System 1 sends “data items” to System 2
• Delay > 1 cycle: relay stations (RS) are inserted along the wire, and System 1 now sends “data packets” to System 2
Data packet = data item + validity bit
Steady state: pass data on every cycle (either valid or invalid)
“Stop” control = stopIn + stopOut
• applies counter-pressure
• result: stalls communication
Problem: works only for single-clock systems (one shared CLK)!
Relay Stations: Implementation
[Figure: main register (MR), auxiliary register (AR), mux, switch, and Control; ports packetIn, packetOut, stopIn, stopOut]
• In normal operation:
  • packetIn copied to MR and forwarded on packetOut
• When stopped (stopIn=1):
  • stopOut raised on the next clock edge
  • extra packet copied to AR
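A cycle-level Python sketch of the MR/AR behavior just described, under our own assumptions: `None` stands for an invalid (void) packet, the upstream sender holds its packet whenever stopOut is high, and the names `RelayStation` and `transfer` are hypothetical. The single in-flight packet that arrives before stopOut takes effect is parked in AR, so nothing is lost or reordered.

```python
from typing import List, Optional

Packet = Optional[int]  # None models an invalid (void) packet


class RelayStation:
    """Single-clock relay station with a main register (MR) and auxiliary register (AR)."""

    def __init__(self) -> None:
        self.mr: Packet = None  # drives packetOut
        self.ar: Packet = None  # holds the one extra packet that arrives after a stop
        self.stop_out = False

    def clock(self, packet_in: Packet, stop_in: bool) -> None:
        accepted = not self.stop_out  # packet_in is new only if stopOut was low this cycle
        if not stop_in:
            # Downstream consumed MR this cycle: refill it, draining AR first.
            if self.ar is not None:
                self.mr, self.ar = self.ar, None
            else:
                self.mr = packet_in if accepted else None
            self.stop_out = False
        else:
            # Downstream stalled: hold MR, park the in-flight packet in AR,
            # and raise stopOut on this edge.
            if accepted:
                self.ar = packet_in
            self.stop_out = True


def transfer(items: List[int], stops: List[bool]) -> List[int]:
    """Drive one relay station with a simple sender and receiver; return what arrives."""
    rs, pending, received = RelayStation(), list(items), []
    for stop in stops:
        if not stop and rs.mr is not None:
            received.append(rs.mr)  # receiver takes packetOut this cycle
        send = pending.pop(0) if (not rs.stop_out and pending) else None
        rs.clock(send, stop)
    return received
```

For instance, stalling the receiver for two cycles in the middle of a five-item transfer still delivers all five items, in order.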
Relay Station vs. Mixed-Clock FIFO
Relay Station (ports: dataIn, validIn, stopOut / dataOut, validOut, stopIn):
• Steady state: always pass data
• Data items: both valid & invalid
• Stopping mechanism: stopIn & stopOut
Mixed-Clock FIFO (ports: dataIn, req_put, full / dataOut, req_get, empty):
• Steady state: only pass data when requested
• Data items: only valid data
• Stopping mechanism: none (only full/empty)
Mixed-Clock Relay Stations (MCRS)
[Figure: System 1 (CLK1) and System 2 (CLK2) connected by a chain of relay stations (RS), with a NEW MCRS at the clock-domain boundary]
Mixed-Clock Relay Station derived from the Mixed-Clock FIFO: change ONLY the Put and Get Controllers
• put side (CLK_put = CLK1): packetIn = data_put + valid_put (in place of req_put); stopOut (in place of full)
• get side (CLK_get = CLK2): packetOut = data_get + valid_get; stopIn (in place of req_get)
Mixed-Clock Relay Station: Implementation
Mixed-Clock Relay Station vs. Mixed-Clock FIFO:
• Identical: FIFO cells; Full/Empty detectors (...or can simplify)
• Only modify: Put & Get Controllers
• Put Controller: always enqueue data (unless full); inputs full, validIn; outputs en_put and the validity bit to the cells
• Get Controller: inputs stopIn, empty, valid; outputs en_get, validOut
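To make the controller changes concrete, here is a hedged Python sketch of the MCRS controllers. It follows the slide’s “always enqueue data (unless full)” literally; the exact equations, in particular stopOut mirroring the synchronized full and validOut carrying the dequeued validity bit, are assumptions, and the function names are hypothetical.

```python
from typing import Tuple


def mcrs_put_controller(full: bool, valid_in: bool) -> Tuple[bool, bool, bool]:
    """Returns (en_put, valid_to_cells, stop_out).

    No request is needed: every incoming packet, valid or invalid, is enqueued
    unless the FIFO is full. Assumption: stopOut mirrors the synchronized full.
    """
    en_put = not full
    return en_put, valid_in, full


def mcrs_get_controller(stop_in: bool, empty: bool, valid: bool) -> Tuple[bool, bool]:
    """Returns (en_get, valid_out).

    Assumption: a packet is dequeued whenever downstream is not stopping and the
    FIFO is not empty; validOut carries the dequeued item's validity bit.
    """
    en_get = not stop_in and not empty
    return en_get, en_get and valid
```

Compared with the FIFO controllers, the request inputs disappear: the put side is driven by the packet stream itself and the get side by the downstream stop signal.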
Async-Sync FIFO: Block Level
Asynchronous put interface: uses handshaking communication
• put_req: request operation
• put_ack: acknowledge completion
• put_data: data item
• no “full” signal
Synchronous get interface: no change (CLK_get, req_get, data_get, valid_get, empty)
[Figure: Mixed-Clock FIFO (put side: CLK_put, req_put, data_put, full) beside the Async-Sync FIFO (put side in the async domain: put_req, put_ack, put_data); the get side, in the sync domain, is the same in both]
Async-Sync FIFO: Architecture
Asynchronous put interface: no Full Detector or Put Controller
• when the FIFO is full, the acknowledgement (put_ack) is withheld until it is safe to perform the put operation
Get interface: exactly as in the Mixed-Clock FIFO (Get Controller, Empty Detector; CLK_get, req_get, data_get, valid_get, empty)
[Figure: ring of cells; put_req, put_ack, put_data on the asynchronous side]
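The “withhold the acknowledgement” idea can be shown with a behavioral Python sketch. It is an illustration under stated assumptions, not the circuit: one outstanding put_req at a time, clocks and synchronization ignored, `None` marking an empty cell (so `None` itself cannot be enqueued), and hypothetical names throughout.

```python
from typing import List, Optional, Tuple


class AsyncPutSketch:
    """Async put side with no Full Detector: put_ack stays low while the tail cell is occupied."""

    def __init__(self, capacity: int) -> None:
        self.cells: List[Optional[int]] = [None] * capacity  # None = empty cell
        self.tail = 0
        self.head = 0
        self.pending: Optional[int] = None  # put_data of a request still waiting for put_ack

    def put_req(self, put_data: int) -> bool:
        """Sender raises put_req; returns True if put_ack is raised immediately."""
        if self.pending is not None:
            raise RuntimeError("previous put handshake has not completed")
        self.pending = put_data
        return self._try_ack()

    def _try_ack(self) -> bool:
        if self.pending is None or self.cells[self.tail] is not None:
            return False  # nothing waiting, or tail cell still full: keep put_ack low
        self.cells[self.tail] = self.pending
        self.pending = None
        self.tail = (self.tail + 1) % len(self.cells)
        return True  # put_ack raised

    def get(self) -> Tuple[Optional[int], bool]:
        """Dequeue from the head; returns (item, whether a withheld put_ack was released)."""
        item = self.cells[self.head]
        if item is None:
            return None, False
        self.cells[self.head] = None
        self.head = (self.head + 1) % len(self.cells)
        return item, self._try_ack()
```

In a 2-cell instance, a third put_req is not acknowledged until the first get frees a cell, at which point the withheld put_ack is released and the data is enqueued.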
Async-Sync FIFO: Cell Implementation
• Asynchronous put part: reusable, from an async FIFO (Async00); built from a C-element and an OPT block; signals put_req, put_ack, put_data, we, we1
• Data Validity Controller (DV): new; status bits e_i, f_i
• REG: data register
• Synchronous get part: reusable (from the mixed-clock FIFO); signals gtok_in, gtok_out, CLK_get, en_get, get_data
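Async-Sync Relay Stations (ASRS)
[Figure: System 1 (async) and System 2 (sync, CLK2) linked by asynchronous relay stations (ARS) forming a micropipeline, an ASRS, and a synchronous relay station (RS); some stages marked “optional”]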
Results
Each circuit implemented:
• using both academic and industry tools
  • MINIMALIST: burst-mode controllers [Nowick et al. ’99]
  • PETRIFY: Petri-net controllers [Cortadella et al. ’97]
Pre-layout simulations: 0.6 µm HP CMOS technology
Experiments:
• various FIFO capacities (4/8/16 cells)
• various data widths (8/16 bits)
Results: Latency
Experimental setup:
• 8-bit data items
• various FIFO capacities (4, 8, 16)
Latency = time from enqueuing a data item into an empty FIFO until it is dequeued
For each design, latency is not uniquely defined: it depends on when the put occurs relative to CLK_get, so Min/Max values are reported
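A toy model shows why a min/max pair is needed. Suppose an item put into an empty FIFO can be dequeued only after its arrival has been sampled by k edges of CLK_get, with period T (k and this timing model are illustrative assumptions, not the paper’s analysis). The waiting time then depends on the phase of the put and lies in the half-open interval ((k − 1)·T, k·T]. The function name is hypothetical.

```python
import math


def wait_to_kth_edge(t_event: float, period: float, k: int) -> float:
    """Time from an event at t_event until the k-th receiving-clock edge strictly
    after it, assuming ideal, jitter-free edges at 0, period, 2*period, ..."""
    first_edge = math.floor(t_event / period) + 1  # index of the first edge after the event
    return (first_edge + k - 1) * period - t_event
```

With k = 2 and T = 10, a put exactly on an edge waits 20 time units, while a put just before the next edge (at 9.5) waits only 10.5: the same circuit yields different latencies depending only on phase.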
Results: Maximum Operating Rate
• Synchronous interfaces: measured in MegaHertz
• Asynchronous interfaces: measured in MegaOps/sec
Put vs. get rates:
• sync put faster than sync get
• async put slower than sync get
Conclusions
Introduced several new low-latency interface circuits
Address 2 major issues in SoC design:
• Mixed-timing domains
  • mixed-clock FIFO
  • async-sync FIFO
• Long interconnect delays
  • mixed-clock relay station
  • async-sync relay station
Other designs implemented and simulated:
• Sync-Async FIFO + Relay Station
• Async-Async FIFO + Relay Station
Reusable components: mix & match to build circuits
Provides a useful set of interface circuits for SoC design