Achieving 100% throughput: where we are in the course… • Switch model • Uniform traffic • Technique: Uniform schedule (easy) • Non-uniform traffic, but known traffic matrix • Technique: Non-uniform schedule (Birkhoff-von Neumann) • Unknown traffic matrix • Technique: Lyapunov functions (MWM) • Faster scheduling algorithms • Technique: Speedup (maximal matchings) • Technique: Memory and randomization (Tassiulas) • Technique: Twist architecture (buffered crossbar) • Accelerate scheduling algorithm • Technique: Pipelining • Technique: Envelopes • Technique: Slicing • No scheduling algorithm • Technique: Load-balanced router
Buffered Crossbars With Performance Guarantees Taken from the 2004 Ph.D. defense of: Shang-Tse (Da) Chuang Department of Electrical Engineering, Stanford University, http://yuba.stanford.edu/~stchuang
Motivation • Network operators want performance guarantees • Throughput guarantee • Delay guarantee • High performance routers use crossbars • Hard to build crossbar-based routers with guarantees • My talk: • How a crossbar with a small amount of internal buffering can give guarantees
Contents • Throughput Guarantees • Buffered Crossbar - 100% Throughput • Buffered Crossbar - Work Conservation
Generic Crossbar-Based Architecture (Figure: VOQs at the inputs, a crossbar with a speedup of S, and a scheduler)
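Admissible Traffic • Traffic Matrix: $\Lambda = [\lambda_{ij}]$, where $\lambda_{ij}$ is the arrival rate from input $i$ to output $j$ • Traffic is admissible if no input or output is oversubscribed: $\sum_{j} \lambda_{ij} < 1$ for every input $i$ and $\sum_{i} \lambda_{ij} < 1$ for every output $j$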
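The admissibility test can be sketched in a few lines of Python. This assumes the usual definition (every row sum and every column sum of the rate matrix is strictly below 1); the function name `is_admissible` and the example matrices are illustrative and not from the talk.

```python
def is_admissible(rates):
    """Return True if no input (row) or output (column) is oversubscribed.

    rates[i][j] is the assumed arrival rate from input i to output j,
    in cells per time slot.
    """
    n = len(rates)
    rows_ok = all(sum(rates[i][j] for j in range(n)) < 1 for i in range(n))
    cols_ok = all(sum(rates[i][j] for i in range(n)) < 1 for j in range(n))
    return rows_ok and cols_ok


# Uniform load of 0.9 per port: admissible.
uniform = [[0.3, 0.3, 0.3] for _ in range(3)]

# Both inputs are below 1, but output 0 receives 0.6 + 0.6 = 1.2: not admissible.
overloaded = [[0.6, 0.2], [0.6, 0.1]]

print(is_admissible(uniform), is_admissible(overloaded))
```

The second matrix shows why both constraints matter: each input is lightly loaded, yet one output is asked to carry more than one cell per time slot.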
Throughput Guarantee (Figure: crossbar with a speedup of S and a scheduler) • 100% Throughput • An algorithm delivers 100% throughput if, for any admissible traffic, the average backlog is finite
Previous Work (timeline, 1985 to 2005) • Heuristics • Wave Front Arbiter [Tamir] • Parallel Iterative Matching [Anderson et al.] • iSLIP [McKeown] • Theoretically Proven • Maximum Weight Matching [McKeown et al.] • Longest Port First [Mekkittikul et al.] • Maximal Matching, S=2 [Dai, Prabhakar]
Maximal Matching Has Become Hard • TTX Switch Fabric • Uses maximal matching • Speedup less than 2 • Consumes up to 8kW • Limited to ~2.5Tb/s • No 100% throughput guarantee
Traditional Crossbar • Crossbar Requirements • An input can send at most one cell • An output can receive at most one cell • Scheduling Problem • Must overcome two constraints simultaneously • New Crossbar • Relieve contention • Remove dependency between inputs and outputs
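The two constraints of a traditional crossbar mean that each schedule must be a matching between inputs and outputs. A minimal sketch of that check follows; the helper name `is_valid_crossbar_schedule` and the tuple representation are assumptions made for illustration.

```python
def is_valid_crossbar_schedule(pairs):
    """Check the two traditional crossbar constraints for one scheduling phase.

    pairs is an iterable of (input, output) connections. The schedule is valid
    only if no input appears twice and no output appears twice.
    """
    pairs = list(pairs)
    inputs = [i for i, _ in pairs]
    outputs = [j for _, j in pairs]
    return len(set(inputs)) == len(inputs) and len(set(outputs)) == len(outputs)


ok = is_valid_crossbar_schedule([(0, 1), (1, 0)])        # a matching
conflict = is_valid_crossbar_schedule([(0, 1), (2, 1)])  # output 1 used twice
print(ok, conflict)
```

A buffered crossbar removes the need to satisfy both conditions in one joint decision, because a crosspoint buffer can hold a cell until the output is ready for it.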
Contents • Throughput Guarantees • Buffered Crossbar - 100% Throughput • Buffered Crossbar - Work Conservation • Delay Guarantees • Traditional Crossbar – Emulating an OQ Switch • Buffered Crossbar – Emulating an OQ Switch
Buffered Crossbar • Arrival Phase • Scheduling Phases – Speedup of 2 • Departure Phase
Scheduling Phase • Input Schedule • Each input selects in parallel a cell for an empty crosspoint • Output Schedule • Each output selects in parallel a cell from a full crosspoint
Example of Input/Output Scheduling • Round-robin Policy • Each input schedules in a round-robin order • Each output schedules in a round-robin order
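The input and output schedules with a round-robin policy can be sketched as below. This is an illustration under stated assumptions, not the scheduler from the talk: each crosspoint is assumed to hold one cell, VOQs are modelled as counters, and the names `input_schedule`, `output_schedule`, `in_ptr` and `out_ptr` are hypothetical.

```python
def input_schedule(voq, xp, in_ptr):
    """Each input moves one cell to an empty crosspoint, in round-robin order."""
    n = len(voq)
    for i in range(n):
        for k in range(n):
            j = (in_ptr[i] + k) % n
            if voq[i][j] > 0 and not xp[i][j]:
                voq[i][j] -= 1
                xp[i][j] = True
                in_ptr[i] = (j + 1) % n  # resume after the served output
                break


def output_schedule(xp, out_ptr):
    """Each output takes one cell from a full crosspoint, in round-robin order."""
    n = len(xp)
    delivered = []
    for j in range(n):
        for k in range(n):
            i = (out_ptr[j] + k) % n
            if xp[i][j]:
                xp[i][j] = False
                out_ptr[j] = (i + 1) % n  # resume after the served input
                delivered.append((i, j))
                break
    return delivered


# 2x2 example: voq[i][j] counts cells waiting at input i for output j.
voq = [[1, 1], [1, 0]]
xp = [[False, False], [False, False]]
in_ptr, out_ptr = [0, 0], [0, 0]

input_schedule(voq, xp, in_ptr)
first = output_schedule(xp, out_ptr)   # first scheduling phase
input_schedule(voq, xp, in_ptr)
second = output_schedule(xp, out_ptr)  # second scheduling phase
print(first, second)
```

Inputs never consult other inputs and outputs never consult other outputs, so each loop body could run in parallel in hardware; the only shared state is the crosspoint occupancy.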
Previous Work • Buffered Crossbar Simulations [Rojas-Cessa et al. 2001] • 32x32 switch, Uniform Bernoulli Traffic, Round-Robin, S=1
100% Throughput • Theorem 1 • A buffered crossbar with a speedup of 2 delivers 100% throughput for any admissible Bernoulli i.i.d. traffic, using any work-conserving input/output schedules.
Intuition of Proof (Figure: rate diagram with labels <1-ε, ε, 1, 2 and 2-ε) • When a flow is backed up, the service available to its backlog exceeds the arrivals
Work Conservation • Work-conserving Property • If there is a cell for a given output in the system, that output is busy. (Figure: Output Queued (OQ) Switch)
Emulating an OQ Switch • A switch emulates an OQ switch if, under identical inputs, the departure time of every cell from both switches is identical
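The reference departure times that an emulating switch must match can be computed from the arrival sequence alone. The sketch below assumes a FIFO OQ switch that sends one cell per output per time slot and lets a cell depart in the slot it arrives; `oq_departure_times` is a hypothetical helper.

```python
def oq_departure_times(arrivals, n_outputs):
    """Departure slot of each cell in an ideal FIFO output-queued switch.

    arrivals is a list of (time_slot, output) tuples sorted by time_slot.
    Each output sends one cell per slot, in arrival order.
    """
    next_free = [0] * n_outputs  # earliest slot in which each output is idle
    departures = []
    for t, out in arrivals:
        d = max(t, next_free[out])
        departures.append(d)
        next_free[out] = d + 1
    return departures


arrivals = [(0, 0), (0, 0), (0, 1), (1, 0)]
departures = oq_departure_times(arrivals, n_outputs=2)
print(departures)
```

These are the labels a buffered crossbar would attach to each arriving cell when it tries to emulate the OQ switch.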
Input Priority List (Figure: cells at the inputs and crosspoints, labeled with departure times) • Label each cell with its departure time • Arrange input cells into an input priority list • Output selects the crosspoint with the earliest departure time
Input Priority List (Figure: the same cells, annotated with "bad guys" and a "good guy") • Label each cell with its departure time • Arrange input cells into an input priority list • Output selects the crosspoint with the earliest departure time
Definitions (Figure: example cell with 2 bad guys and 2 good guys) • Output Margin – number of cells at its output with an earlier departure time • Input Margin – number of cells ahead of it in the input priority list destined to different outputs • Total Margin – Output Margin minus Input Margin
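The three margins can be computed directly from these definitions. In the sketch below the data layout is an assumption: the input priority list is a Python list of (output, departure_time) tuples with the head at index 0, and `output_queues` maps each output to the departure times of the cells already waiting there.

```python
def total_margin(priority_list, index, output_queues):
    """Total margin (output margin minus input margin) of one input cell."""
    out, dep = priority_list[index]
    # Input margin: cells ahead in the priority list for other outputs.
    input_margin = sum(1 for o, _ in priority_list[:index] if o != out)
    # Output margin: cells already at this cell's output that depart earlier.
    output_margin = sum(1 for d in output_queues[out] if d < dep)
    return output_margin - input_margin


priority_list = [(0, 4), (1, 3), (0, 6)]
output_queues = {0: [2, 3], 1: [1, 2]}
margins = [total_margin(priority_list, k, output_queues) for k in range(3)]
print(margins)
```

For the last cell, two cells at output 0 depart earlier (output margin 2) and one cell for output 1 sits ahead of it (input margin 1), so its total margin is 1.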
Emulation of FIFO OQ Switch • Scheduling Phase • Crosspoint is full – Output Margin will increase by one • Crosspoint is empty – Input Margin will decrease by one • Total Margin increases by two over the two scheduling phases of a time slot
Emulation of FIFO OQ Switch • Arrival Phase • Input Margin might increase by one • Departure Phase • Output Margin will decrease by one • Total Margin decreases by at most two
Emulation of FIFO OQ Switch • Lemma 1 • For every time slot, total margin does not decrease
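The bookkeeping behind Lemma 1 can be written as a short worked inequality. It reads the scheduling-phase slide as a gain of at least one per scheduling phase, with two phases per time slot at a speedup of 2; that reading, and the symbols $TM$, $OM$ and $IM$ for total, output and input margin, are assumptions made here for illustration.

```latex
\begin{align*}
TM &= OM - IM \\
\Delta TM_{\text{scheduling}} &\ge (+1) + (+1) = +2
  && \text{(two scheduling phases, speedup of 2)} \\
\Delta TM_{\text{arrival}} &\ge -1
  && \text{(input margin might increase by one)} \\
\Delta TM_{\text{departure}} &\ge -1
  && \text{(output margin decreases by one)} \\
\Delta TM_{\text{time slot}} &\ge 2 - 1 - 1 = 0
\end{align*}
```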
FIFO Insertion Policy (Figure: an arriving cell being placed in the input priority list) • Arrival Phase • Cell for non-empty VOQ, insert behind cells for same output • Cell for empty VOQ, insert at head of input priority list
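The insertion rule can be sketched as a small list operation. The representation is an assumption: the input priority list is a Python list of (cell_id, output) tuples with the head at index 0, and "behind cells for same output" is read as immediately behind the last such cell. The name `insert_cell` is hypothetical.

```python
def insert_cell(priority_list, cell_id, output):
    """Insert an arriving cell into an input priority list (head at index 0)."""
    last_same_output = None
    for idx, (_, out) in enumerate(priority_list):
        if out == output:
            last_same_output = idx
    if last_same_output is None:
        # Empty VOQ for this output: the cell goes to the head of the list.
        priority_list.insert(0, (cell_id, output))
    else:
        # Non-empty VOQ: the cell goes directly behind the last cell
        # destined to the same output.
        priority_list.insert(last_same_output + 1, (cell_id, output))


plist = [("a", 0), ("b", 1), ("c", 0), ("d", 2)]
insert_cell(plist, "e", 0)  # VOQ for output 0 is non-empty
insert_cell(plist, "f", 3)  # VOQ for output 3 is empty
print(plist)
```

Placing a cell for an empty VOQ at the head gives it an input margin of zero, which fits the claim that an arriving cell starts with a non-negative total margin.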
FIFO Insertion Policy • Lemma 2 • An arriving cell will have a non-negative total margin
Emulation of FIFO OQ Switch • Theorem 2 • A buffered crossbar with a speedup of 2 can exactly emulate a FIFO OQ switch. • This result was also shown independently • B. Magill, C. Rohrs, R. Stevenson, “Output-Queued Switch Emulation by Fabrics With Limited Memory”, IEEE Journal on Selected Areas in Communications, pp. 606-615, May 2003. • Theorem 3 • A buffered crossbar with a speedup of 2 can be work-conserving with a distributed algorithm.
Summary • Buffered crossbars • Use crosspoint buffers to relieve contention • Inputs and outputs schedule independently and in parallel • Performance guarantees • Throughput – any work-conserving input/output schedule • Work Conservation – simple insertion policy
Relevant Papers • Crossbars • Shang-Tse Chuang, Ashish Goel, Nick McKeown, Balaji Prabhakar, “Matching Output Queuing with a Combined Input Output Queued Switch,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1030-1039, Dec. 1999. • Buffered Crossbars • Shang-Tse Chuang, Sundar Iyer, Nick McKeown, “Practical Algorithms for Performance Guarantees in Buffered Crossbars,” in preparation for IEEE/ACM Transactions on Networking.