Buffered Crossbars With Performance Guarantees. EE384Y Thursday, April 29, 2004. Shang-Tse (Da) Chuang Department of Electrical Engineering, Stanford University, http://yuba.stanford.edu/~stchuang.
Motivation • Network operators want performance guarantees • Throughput guarantee • Delay guarantee • High-performance routers use crossbars • It is hard to build crossbar-based routers with guarantees • My talk: • How a crossbar with a small amount of internal buffering can give guarantees
Contents • Throughput Guarantees • Buffered Crossbar - 100% Throughput • Buffered Crossbar - Work Conservation • Delay Guarantees • Traditional Crossbar – Emulating an OQ Switch • Buffered Crossbar – Emulating an OQ Switch
Generic Crossbar-Based Architecture • [Figure: VOQs at the inputs, a crossbar running at a speedup of S, and a scheduler]
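Admissible Traffic • Traffic matrix $\Lambda = [\lambda_{ij}]$, where $\lambda_{ij}$ is the arrival rate of cells from input i to output j • Traffic is admissible if $\sum_{i} \lambda_{ij} \le 1$ for every output j and $\sum_{j} \lambda_{ij} \le 1$ for every input i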
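The admissibility condition can be checked directly from the traffic matrix. This is a minimal sketch, assuming the rates are given as a list of rows of floats in cells per time slot; `is_admissible` and the small tolerance for floating-point rounding are illustrative choices, not part of the talk.

```python
def is_admissible(rates, tol=1e-9):
    """Return True if the traffic matrix rates[i][j] = lambda_ij is admissible.

    rates[i][j] is the long-run arrival rate (cells per time slot) from input i
    to output j. Admissible means no input and no output is oversubscribed:
    every row sum and every column sum is at most 1.
    """
    if not rates:
        return True
    n_inputs, n_outputs = len(rates), len(rates[0])
    if any(x < 0 for row in rates for x in row):
        return False
    inputs_ok = all(sum(row) <= 1 + tol for row in rates)
    outputs_ok = all(
        sum(rates[i][j] for i in range(n_inputs)) <= 1 + tol
        for j in range(n_outputs)
    )
    return inputs_ok and outputs_ok


uniform = [[0.9 / 4] * 4 for _ in range(4)]    # 90% load, spread evenly
hot_output = [[0.6, 0.1], [0.6, 0.1]]           # output 0 receives 1.2 cells/slot
print(is_admissible(uniform), is_admissible(hot_output))  # True False
```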
Throughput Guarantee • 100% Throughput • An algorithm delivers 100% throughput if, for any admissible traffic, the average backlog is finite
Previous Work • Heuristics • Wave Front Arbiter [Tamir] • Parallel Iterative Matching [Anderson et al.] • iSLIP [McKeown] • Theoretically Proven • Maximal Matching with S=2 [Dai, Prabhakar] • Longest Port First [Mekkittikul et al.] • Maximum Weight Matching [McKeown et al.] • [Figure: timeline of these algorithms, 1985 to 2005]
Maximal Matching Has Become Hard • TTX Switch Fabric • Uses maximal matching • Speedup less than 2 • Consumes up to 8 kW • Limited to ~2.5 Tb/s • No 100% throughput guarantee
Traditional Crossbar • Crossbar Requirements (in each time slot) • An input can send at most one cell • An output can receive at most one cell • Scheduling Problem • The scheduler must satisfy both constraints simultaneously • New Crossbar • Relieve contention • Remove the dependency between inputs and outputs
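To make the two crossbar constraints concrete, here is a minimal greedy maximal matching of the kind the maximal-matching schedulers above rely on. The request-matrix representation and the name `greedy_maximal_matching` are assumptions for illustration, not a description of any product's arbiter.

```python
def greedy_maximal_matching(requests):
    """Greedy maximal matching for one crossbar time slot.

    requests[i][j] is True when input i has a cell for output j. The result maps
    each matched input to one output, so every input sends at most one cell and
    every output receives at most one cell. It is maximal (no unmatched input
    still requests an unmatched output) but not necessarily maximum.
    """
    n_outputs = len(requests[0]) if requests else 0
    output_taken = [False] * n_outputs
    match = {}
    for i, row in enumerate(requests):
        for j in range(n_outputs):
            if row[j] and not output_taken[j]:
                match[i] = j            # input i -> output j
                output_taken[j] = True  # output j now receives its one cell
                break
    return match


requests = [
    [True, True, False],   # input 0 wants outputs 0 and 1
    [True, False, False],  # input 1 wants output 0 only
    [False, False, True],  # input 2 wants output 2
]
print(greedy_maximal_matching(requests))  # {0: 0, 2: 2}
```

Here a maximum matching would serve all three inputs (0→1, 1→0, 2→2), while the greedy maximal one leaves input 1 idle; speedup is what compensates for such losses.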
Buffered Crossbar • Arrival Phase • Scheduling Phases – two per time slot (speedup of 2) • Departure Phase
Scheduling Phase • Input Schedule • Each input, in parallel, selects a cell for an empty crosspoint • Output Schedule • Each output, in parallel, selects a cell from a full crosspoint
Example of Input/Output Scheduling • Round-robin Policy • Each input schedules in a round-robin order • Each output schedules in a round-robin order
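The time-slot structure and round-robin schedules above can be sketched as a small simulator. Assumptions (not from the talk): uniform Bernoulli i.i.d. arrivals, unbounded VOQs and output queues, and the hypothetical name `simulate_buffered_crossbar`; each crosspoint holds one cell. Within a scheduling phase the inputs touch disjoint rows and the outputs touch disjoint columns, so the sequential loops give the same result as parallel arbiters.

```python
import random


def simulate_buffered_crossbar(n, load, slots, seed=0):
    """Simulate an n x n buffered crossbar with speedup 2 and round-robin schedules.

    Assumptions (illustrative): uniform Bernoulli i.i.d. arrivals at rate `load`
    per input, one-cell crosspoint buffers, unbounded VOQs and output queues.
    Returns (arrivals, departures, final_backlog, mean_backlog).
    """
    rng = random.Random(seed)
    voq = [[0] * n for _ in range(n)]     # Q_ij: cells at input i for output j
    xp = [[0] * n for _ in range(n)]      # B_ij: 1 if crosspoint (i, j) is full
    outq = [0] * n                        # cells at output j awaiting departure
    in_ptr = [0] * n                      # round-robin pointer of each input
    out_ptr = [0] * n                     # round-robin pointer of each output
    arrivals = departures = backlog_sum = 0

    def backlog():
        return sum(map(sum, voq)) + sum(map(sum, xp)) + sum(outq)

    for _ in range(slots):
        # Arrival phase: each input gets a cell with probability `load`.
        for i in range(n):
            if rng.random() < load:
                voq[i][rng.randrange(n)] += 1
                arrivals += 1

        for _phase in range(2):  # two scheduling phases = speedup of 2
            # Input schedule: each input moves a cell into an empty crosspoint.
            for i in range(n):
                for k in range(n):
                    j = (in_ptr[i] + k) % n
                    if voq[i][j] and not xp[i][j]:
                        voq[i][j] -= 1
                        xp[i][j] = 1
                        in_ptr[i] = (j + 1) % n
                        break
            # Output schedule: each output takes a cell from a full crosspoint.
            for j in range(n):
                for k in range(n):
                    i = (out_ptr[j] + k) % n
                    if xp[i][j]:
                        xp[i][j] = 0
                        outq[j] += 1
                        out_ptr[j] = (i + 1) % n
                        break

        # Departure phase: each output sends at most one cell.
        for j in range(n):
            if outq[j]:
                outq[j] -= 1
                departures += 1

        backlog_sum += backlog()

    return arrivals, departures, backlog(), backlog_sum / slots


arrived, departed, left, avg = simulate_buffered_crossbar(n=8, load=0.9, slots=2000)
print(arrived, departed, left, round(avg, 1))
```

Round-robin is only one work-conserving choice; swapping either inner loop for another policy that always picks an eligible cell keeps the setting of Theorem 1.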
Previous Work • Buffered Crossbar Simulations [Rojas-Cessa et al. 2001] • 32x32 switch, Uniform Bernoulli Traffic, Round-Robin, S=1
100% Throughput • Theorem 1 • A buffered crossbar with speedup of 2 delivers 100% throughput for any admissible Bernoulli iid traffic using any work-conserving input/output schedules.
Intuition of Proof • [Figure: example per-port arrival rates of 1-ε and ε (each port below 1-ε), adding up to 2-ε] • When a flow is backed up, the service for this backlog exceeds the arrivals
Intuition of Proof • $Q_{ij}$ = length of the queue at input i for output j • $B_{ij}$ = 0 if crosspoint buffer (i, j) is empty, 1 if it is full
Intuition of Proof • Recall • If $Q_{ij} > 0$, then for $X_{ij}$: • Expected increase is less than 2 • Expected decrease: if $B_{ij} = 1$, output j is busy in the output schedule, so one of the $B_{*j}$ decreases; if $B_{ij} = 0$, input i is busy in the input schedule, so one of the $Q_{i*}$ decreases • Thus the expected decrease is 2 (one per scheduling phase)
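The drift argument behind Theorem 1 can be summarized in one inequality. This is a sketch under assumptions: $X_{ij}$ is the per-flow quantity used in the proof (its exact definition did not survive in these slides), and the traffic is taken to be strictly admissible, so the expected increase is at most $2 - \delta$ for some $\delta > 0$ (the figure's example total is $2 - \varepsilon$).

```latex
\mathbb{E}\!\left[\, X_{ij}(n+1) - X_{ij}(n) \;\middle|\; Q_{ij}(n) > 0 \,\right]
  \;\le\; \underbrace{(2 - \delta)}_{\text{expected increase}}
  \;-\; \underbrace{2}_{\text{one per scheduling phase}}
  \;=\; -\delta \;<\; 0
```

A strictly negative drift whenever the flow is backlogged is the usual route to showing that the average backlog stays finite.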
Work Conservation • Work-conserving Property • If there is a cell for a given output in the system, that output is busy • [Figure: Output Queued (OQ) switch]
Emulating an OQ Switch • Under identical inputs, the departure time of every cell from both switches is identical
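The OQ departure times that the emulation is measured against can be computed with a short recurrence. This sketch assumes a cell may leave in the slot it arrives, each output sends one cell per slot, and same-slot arrivals are listed in a fixed input order; `fifo_oq_departure_times` is a hypothetical name.

```python
from collections import defaultdict


def fifo_oq_departure_times(arrivals):
    """Departure slot of every cell in a FIFO output-queued (OQ) switch.

    arrivals: list of (arrival_slot, output) pairs, in the order the cells
    enter the switch (by slot, then by input). Assumes a cell may depart in
    the same slot it arrives and each output sends one cell per slot.
    """
    next_free = defaultdict(int)  # earliest slot in which each output is idle
    departures = []
    for slot, output in arrivals:
        depart = max(slot, next_free[output])
        departures.append(depart)
        next_free[output] = depart + 1
    return departures


cells = [(0, 1), (0, 1), (0, 2), (1, 1), (5, 1)]
print(fifo_oq_departure_times(cells))  # [0, 1, 0, 2, 5]
```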
Input Priority List • [Figure: cells at each input labeled with their departure times] • Label each cell with its corresponding departure time • Arrange the input cells into an input priority list • Each output selects the crosspoint whose cell has the earliest departure time
Input Priority List • [Figure: the same example, marking one cell's "good guy" and "bad guys"]
Definitions • [Figure: the example cell has 2 good guys and 2 bad guys] • Output Margin – number of cells at its output with an earlier departure time • Input Margin – number of cells ahead of it in the input priority list destined to different outputs • Total Margin – Output Margin minus Input Margin
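The three margins for one cell can be computed as follows. This is a sketch assuming each cell is an (output, departure time) record, the input priority list is ordered head first, and `output_queue` holds only cells already at that cell's output. Reading the figure's "good guys" as the cells counted by the output margin and the "bad guys" as those counted by the input margin is our interpretation; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    output: int     # destination output port
    departure: int  # departure time in the reference OQ switch


def output_margin(cell, output_queue):
    """Cells already at cell's output with an earlier departure time.

    (Interpreted here as the slides' "good guys".)
    """
    return sum(1 for c in output_queue if c.departure < cell.departure)


def input_margin(cell, priority_list):
    """Cells ahead of cell in its input priority list bound for other outputs.

    (Interpreted here as the slides' "bad guys".)
    """
    ahead = priority_list[: priority_list.index(cell)]
    return sum(1 for c in ahead if c.output != cell.output)


def total_margin(cell, priority_list, output_queue):
    return output_margin(cell, output_queue) - input_margin(cell, priority_list)


at_input = [Cell(2, 3), Cell(1, 4), Cell(3, 6), Cell(1, 7)]  # head first
at_output_1 = [Cell(1, 5), Cell(1, 6)]
me = Cell(1, 7)
print(output_margin(me, at_output_1), input_margin(me, at_input),
      total_margin(me, at_input, at_output_1))  # 2 2 0
```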
Emulation of FIFO OQ Switch • Scheduling Phase (two per time slot) • Crosspoint is full – the output selects a cell with an earlier departure time, so Output Margin increases by one • Crosspoint is empty – Input Margin decreases by one • Total Margin increases by at least two per time slot
Emulation of FIFO OQ Switch • Arrival Phase • Input Margin might increase by one • Departure Phase • Output Margin decreases by at most one • Total Margin decreases by at most two
Emulation of FIFO OQ Switch • Lemma 1 • For every time slot, the total margin does not decrease
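The per-slot bookkeeping behind Lemma 1, for a cell still waiting at its input, can be written as one inequality (TM, OM and IM abbreviate total, output and input margin; the abbreviations are ours):

```latex
\Delta \mathrm{TM} \;=\; \Delta \mathrm{OM} - \Delta \mathrm{IM}
  \;\ge\; \underbrace{(+1) + (+1)}_{\text{two scheduling phases}}
  \;\underbrace{-\,1}_{\text{arrival: IM may grow}}
  \;\underbrace{-\,1}_{\text{departure: OM shrinks}}
  \;=\; 0
```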
FIFO Insertion Policy • Arrival Phase • A cell for a non-empty VOQ is inserted directly behind the cells for the same output • A cell for an empty VOQ is inserted at the head of the input priority list
FIFO Insertion Policy • Lemma 2 • An arriving cell has a non-negative total margin
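The FIFO insertion policy amounts to one list insertion. In this sketch cells are (output, departure time) pairs, the list holds only cells still queued at this input, and `fifo_insert` is a hypothetical helper. One way to see Lemma 2 in this form: placed directly behind the last cell of its VOQ, a new cell has the same input margin as that cell and, since it departs later, at least its output margin; placed at the head, its input margin is zero.

```python
def fifo_insert(priority_list, cell):
    """FIFO insertion policy for an input priority list (head first).

    cell is an (output, departure_time) pair. If the input already holds cells
    for the same output (non-empty VOQ), the new cell goes directly behind the
    last of them; otherwise (empty VOQ) it goes to the head of the list.
    """
    output = cell[0]
    last_same = None
    for idx, (out, _departure) in enumerate(priority_list):
        if out == output:
            last_same = idx
    position = 0 if last_same is None else last_same + 1
    priority_list.insert(position, cell)
    return priority_list


plist = [(2, 3), (1, 4), (3, 6)]
fifo_insert(plist, (1, 8))   # non-empty VOQ for output 1
fifo_insert(plist, (4, 9))   # empty VOQ for output 4
print(plist)  # [(4, 9), (2, 3), (1, 4), (1, 8), (3, 6)]
```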
Emulation of FIFO OQ Switch • Theorem 2 • A buffered crossbar with a speedup of 2 can exactly emulate a FIFO OQ switch. • This result was shown independently in: • B. Magill, C. Rohrs, R. Stevenson, “Output-Queued Switch Emulation by Fabrics With Limited Memory,” IEEE Journal on Selected Areas in Communications, pp. 606-615, May 2003. • Theorem 3 • A buffered crossbar with a speedup of 2 can be work-conserving with a distributed algorithm.
Delay Guarantees • Push In First Out (PIFO) • [Figure: one output with a single PIFO queue into which constrained traffic is pushed, compared with one output with m logical FIFO queues (1 … m), where weighted fair queueing sorts packets] • PIFO models • Weighted Fair Queueing • Weighted Round Robin • Strict priority, etc.
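A PIFO can be modeled as a sorted list in which each push lands at the position given by its rank and each pop takes the head. This is a minimal sketch using Python's `bisect.insort`; the class name and the FIFO tie-breaking among equal ranks are assumptions.

```python
import bisect
import itertools


class PIFO:
    """Push-In First-Out queue: push anywhere by rank, always pop from the head.

    Lower rank departs first; equal ranks leave in arrival (FIFO) order thanks
    to a monotonically increasing sequence number used as a tie-breaker.
    """

    def __init__(self):
        self._items = []                 # kept sorted by (rank, seq)
        self._seq = itertools.count()

    def push(self, rank, packet):
        bisect.insort(self._items, (rank, next(self._seq), packet))

    def pop(self):
        _rank, _seq, packet = self._items.pop(0)
        return packet

    def __len__(self):
        return len(self._items)


q = PIFO()
for rank, pkt in [(5, "a"), (2, "b"), (5, "c"), (1, "d")]:
    q.push(rank, pkt)
print([q.pop() for _ in range(len(q))])  # ['d', 'b', 'a', 'c']
```

As illustrative rank choices, a strict-priority scheduler might use the priority class as the rank and a WFQ-style scheduler a per-packet finish time.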
Achieving Delay Guarantees in Crossbars • Theorem 4 • A crossbar switch with a speedup of 2 can exactly emulate an OQ switch which provides delay guarantees. • Theorem 5 • A crossbar switch with a speedup of 2-1/N is necessary and sufficient to exactly emulate an NxN FIFO OQ switch.
Emulation of PIFO OQ Switch • [Figure: example input priority lists and crosspoints labeled with departure times] • Crosspoint Blocking • A cell waiting in the crosspoint can have a larger departure time than a newly arrived cell for the same output • Swap Phase • If an arriving cell has a smaller departure time than the cell in its crosspoint, swap the two cells
PIFO Insertion Policy • Arrival Phase • Insert the cell directly behind the cell with the next-earlier departure time • If the cell has the earliest departure time, insert it at the head of the input priority list
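The swap phase and the PIFO insertion policy can be sketched together. Cells are (output, departure time) pairs and `pifo_insert` / `swap_phase` are hypothetical helpers. Two details are our assumptions, not the talk's: a cell displaced from a crosspoint re-enters its input list through the same PIFO insertion, and cells with equal departure times (possible for different outputs) are treated as "earlier", so a newcomer lands behind them.

```python
def pifo_insert(priority_list, cell):
    """PIFO insertion policy for an input priority list (head first).

    cell is an (output, departure_time) pair. It is placed directly behind the
    cell with the largest departure time not later than its own; if no such
    cell exists, it goes to the head of the list.
    """
    best = None
    for idx, (_out, departure) in enumerate(priority_list):
        # Assumption: ties count as "earlier", so the newcomer goes behind them.
        if departure <= cell[1] and (best is None or departure >= priority_list[best][1]):
            best = idx
    position = 0 if best is None else best + 1
    priority_list.insert(position, cell)
    return priority_list


def swap_phase(crosspoint_cell, arriving_cell):
    """Swap phase for crosspoint (i, j) and a new arrival from input i to output j.

    Returns (cell now in the crosspoint, cell that must wait at the input).
    crosspoint_cell is None when the crosspoint is empty.
    """
    if crosspoint_cell is not None and arriving_cell[1] < crosspoint_cell[1]:
        return arriving_cell, crosspoint_cell   # earlier cell takes the crosspoint
    return crosspoint_cell, arriving_cell


plist = [(1, 2), (2, 5), (3, 9)]
pifo_insert(plist, (2, 7))            # behind the departure-time-5 cell
pifo_insert(plist, (3, 1))            # earliest departure time: head of list
in_xp, displaced = swap_phase((2, 8), (2, 4))
pifo_insert(plist, displaced)         # assumption: displaced cell re-enters via PIFO insertion
print(in_xp, plist)
# (2, 4) [(3, 1), (1, 2), (2, 5), (2, 7), (2, 8), (3, 9)]
```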
PIFO Emulation • Theorem 6 • A buffered crossbar with speedup of 3 can exactly emulate an OQ switch with delay guarantees.
Header Scheduling Architecture • [Figure: input linecard, header scheduler, buffered crossbar and output linecard; headers go to the header scheduler and grants come back to the input linecard]
Header Scheduling • [Figure: example of header scheduling with departure-time labels] • Schedule headers instead of cells • Headers are converted into grants in the output schedule • Grants are sent back to the input
Grant Stream • [Figure: the header-scheduling architecture with a GrantFIFO at the input linecard] • An input can receive up to N grants in one scheduling phase • Over p consecutive phases, the number of grants is bounded by p+N-1
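The grant-stream bound can be checked on a trace, and it also limits how full the GrantFIFO gets. This sketch assumes the input serves one grant per scheduling phase; the function names and the sample trace are hypothetical. Under that assumption the FIFO backlog follows $q_t = \max(0, q_{t-1} + a_t - 1)$, so if every window of p phases carries at most $p+N-1$ grants, the backlog after each phase never exceeds $N-1$.

```python
def max_window_grants(grants_per_phase, p):
    """Largest number of grants received in any p consecutive scheduling phases."""
    return max(sum(grants_per_phase[t:t + p])
               for t in range(len(grants_per_phase) - p + 1))


def grant_fifo_occupancy(grants_per_phase):
    """Peak GrantFIFO backlog if the input serves one grant per phase (assumption)."""
    backlog = peak = 0
    for grants in grants_per_phase:
        backlog = max(0, backlog + grants - 1)  # enqueue new grants, serve one
        peak = max(peak, backlog)
    return peak


N = 3
stream = [3, 0, 1, 1, 0, 2, 1]   # hypothetical grants per phase at one input
assert all(max_window_grants(stream, p) <= p + N - 1
           for p in range(1, len(stream) + 1))
print(grant_fifo_occupancy(stream))  # 2, i.e. N - 1
```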
Counter Example • [Figure: six-phase trace (p = 1 to 6) for outputs 1, 2 and 3, showing per phase the cells to the output queue, the crosspoint contents, the GrantFIFO contents and the grants]
Modified Buffered Crossbar • N cells per crosspoint – requires $N^3$ cell buffers • N cells per output – requires $N^2$ cell buffers • Theorem 7 • A modified buffered crossbar with a speedup of 2 can emulate an OQ switch with delay guarantees, with a fixed delay of N scheduling phases.
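The buffer cost of the modification is simple arithmetic: $N^2$ crosspoints with N cells each, plus N outputs with N cells each. A tiny helper (hypothetical name) makes the growth concrete for a few switch sizes.

```python
def modified_crossbar_buffers(n):
    """Cell buffers in the modified N x N buffered crossbar."""
    crosspoint_cells = (n * n) * n   # N^2 crosspoints, N cells each -> N^3
    output_cells = n * n             # N outputs, N cells each -> N^2
    return crosspoint_cells, output_cells


for n in (8, 32, 64):
    print(n, modified_crossbar_buffers(n))  # e.g. 32 -> (32768, 1024)
```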
Summary • Buffered crossbars • Use crosspoint buffers to relieve contention • Inputs and outputs schedule independently and in parallel • Performance guarantees • Throughput – any work-conserving input/output schedule • Work Conservation – simple insertion policy • Delay – header scheduling
Relevant Papers • Crossbars • Shang-Tse Chuang, Ashish Goel, Nick McKeown, Balaji Prabhakar, “Matching Output Queuing with a Combined Input Output Queued Switch,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1030-1039, Dec. 1999. • Buffered Crossbars • Shang-Tse Chuang, Sundar Iyer, Nick McKeown, “Practical Algorithms for Performance Guarantees in Buffered Crossbars,” Stanford HPNG Technical Report TR03-HPNG-061501.