Analysis of a Packet Switch with Memories Running Slower than the Line Rate
Sundar Iyer, Amr Awadallah, Nick McKeown
(sundaes,aaa,nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University
http://klamath.stanford.edu/pps
Problem Statement
Motivation: to design an extremely high-speed packet switch with memories running slower than the line rate.
Architecture of a PPS
[Figure: N=4 ports at line rate R. Each input demultiplexor spreads cells over k=3 OQ switches on links of rate R/k, and each output multiplexor collects cells from the same k=3 OQ switches on links of rate R/k.]
Parallel Packet Switch: Questions
• Can it behave like a single big output queued switch?
• Can it provide delay guarantees, strict priorities, WFQ, …?
Parallel Packet Switch: Results
• If S > 2k/(k+2) ≈ 2, then a PPS can precisely emulate a FIFO output queued switch for all traffic patterns.
• If S > 3k/(k+3) ≈ 3, then a PPS can precisely emulate an OQ switch with WFQ or strict priorities for all traffic patterns.
Parallel Packet Switch: Results (continued)
• If S > 2√N, then a PPS can precisely emulate a multicast FIFO OQ switch.
• If S > 2√(2N), then a PPS can precisely emulate a multicast OQ switch with WFQ or strict priorities for all traffic patterns.
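The speedup thresholds quoted on the results slides are simple closed forms, so they can be tabulated for a given number of layers k or ports N. The sketch below only evaluates those formulas; the function names are illustrative and not from the slides.

```python
import math

def fifo_bound(k: int) -> float:
    """Threshold 2k/(k+2) for emulating a FIFO OQ switch (slides: S must exceed it)."""
    return 2 * k / (k + 2)

def wfq_bound(k: int) -> float:
    """Threshold 3k/(k+3) for emulating an OQ switch with WFQ or strict priorities."""
    return 3 * k / (k + 3)

def multicast_fifo_bound(n: int) -> float:
    """Threshold 2*sqrt(N) for emulating a multicast FIFO OQ switch."""
    return 2 * math.sqrt(n)

def multicast_wfq_bound(n: int) -> float:
    """Threshold 2*sqrt(2N) for multicast with WFQ or strict priorities."""
    return 2 * math.sqrt(2 * n)

if __name__ == "__main__":
    for k in (2, 10, 100):
        print(k, round(fifo_bound(k), 3), round(wfq_bound(k), 3))
```

Since 2k/(k+2) = 2 - 4/(k+2) and 3k/(k+3) = 3 - 9/(k+3), the unicast thresholds stay below 2 and 3 and approach them as k grows, which is what the "≈ 2" and "≈ 3" on the slides summarize.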
Questions
• Can we have a completely distributed algorithm?
• Can we reduce the speedup further?
  • "Two is too much"
• Can we smooth the load on all the middle stage switches?
Completely Distributed Algorithm
• Local Available Output Link Set (LAOL)
• Definition:
  • The LAOL consists of the (k/S - 1) "oldest" layers used by an input for that output.
  • We can prevent a layer from appearing in the LAOL until another k - k/S + 1 cells have been sent to other layers for that output.
• Result:
  • For any given output, a layer is used only after k - k/S + 1 cells to that output have been sent.
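One way to picture the LAOL bookkeeping is a per-output list of layers kept in least-recently-used order, from which the input may only pick among the oldest entries. The sketch below is an assumption-laden illustration, not the authors' implementation: the class name `LaolTracker`, its methods, and the choice to pass the LAOL size in as a parameter (rather than derive it from k and S) are all hypothetical.

```python
from collections import deque

class LaolTracker:
    """Tracks, for one input, which layers are currently in the LAOL of each output.

    Hypothetical sketch: layers are kept per output in least-recently-used order,
    and the LAOL is taken to be the `laol_size` oldest entries.
    """

    def __init__(self, k: int, laol_size: int):
        if not 1 <= laol_size <= k:
            raise ValueError("laol_size must be between 1 and k")
        self.k = k
        self.laol_size = laol_size
        self._order = {}  # output -> deque of layers, least recently used first

    def laol(self, output):
        """Return the layers this input may currently use for `output`."""
        order = self._order.setdefault(output, deque(range(self.k)))
        return list(order)[: self.laol_size]

    def send(self, output, layer):
        """Record that a cell for `output` was sent through `layer`."""
        if layer not in self.laol(output):
            raise ValueError("layer is not in the LAOL for this output")
        order = self._order[output]
        order.remove(layer)   # the layer just used becomes the most recent one
        order.append(layer)
```

In this sketch a layer that has just been used re-enters the LAOL only after k - laol_size cells for the same output have gone to other layers; with `laol_size = 1` the selection degenerates to plain round robin.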
Conflict Free Ordering
[Figure: Parallel Packet Switch with N=4. Each demultiplexor receives cells at rate R and forwards numbered cells to the layers over links labeled sR/k, ordered so that cells do not conflict.]
Re-Sequencing
• A cell might be delayed by as much as N/S time slots.
• Cells might leave in the wrong order.
• A buffer of size Nk/S will be needed to re-sequence cells and prevent out-of-order transmission.
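A re-sequencing buffer of this kind can be sketched with per-flow sequence numbers: cells that arrive early are held until every earlier cell has been released. The code below is a minimal illustration under that assumption; the class `Resequencer`, its `capacity` argument (standing in for the Nk/S bound) and the use of consecutive integer sequence numbers starting at 0 are hypothetical choices, not details from the slides.

```python
class Resequencer:
    """Holds out-of-order cells and releases them in sequence-number order."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # assumed bound on cells held at once
        self.next_seq = 0          # sequence number of the next cell to release
        self.pending = {}          # seq -> cell, for cells that arrived early

    def push(self, seq: int, cell):
        """Accept one cell and return the list of cells that can now leave in order."""
        if seq < self.next_seq or seq in self.pending:
            raise ValueError("duplicate or already released sequence number")
        self.pending[seq] = cell
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if len(self.pending) > self.capacity:
            raise OverflowError("re-sequencing buffer exceeded its capacity")
        return released
```

For example, pushing sequence number 1 before 0 returns nothing for the first call and both cells, in order, for the second.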
A Practical Distributed Algorithm
• If S > 2k/(k+2) ≈ 2, then a PPS with a completely distributed algorithm can precisely emulate a FIFO output queued switch for all traffic patterns.
• The PPS will have a fixed latency of Nk/S time slots.
• A re-sequencing buffer of size Nk/S is needed.
PPS with no Speedup
• Speedup = 1
• LAOL is round robin
• |LAOL| = 1
• D(i,l): number of cells sent by demultiplexor i to layer l
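With |LAOL| = 1, each demultiplexor simply rotates through the layers separately for every output, and D(i,l) is a per-layer counter. The sketch below illustrates this for a single demultiplexor i; the class name `RoundRobinDemux` and the assumption that every output's rotation starts at layer 0 are hypothetical.

```python
class RoundRobinDemux:
    """One demultiplexor that spreads cells over k layers, round robin per output."""

    def __init__(self, k: int):
        self.k = k
        self.next_layer = {}    # output -> layer the next cell for that output uses
        self.sent = [0] * k     # sent[l] plays the role of D(i, l) for this demultiplexor

    def dispatch(self, output) -> int:
        """Pick the layer for a cell destined to `output` and update the counters."""
        layer = self.next_layer.get(output, 0)   # assumed: rotation starts at layer 0
        self.next_layer[output] = (layer + 1) % self.k
        self.sent[layer] += 1
        return layer
```

Four cells to a single output with k = 3 go to layers 0, 1, 2, 0. Cells alternating between two outputs go to layers 0, 0, 1, 1, so consecutive cells can target the same layer even though each output is served round robin.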
Buffer Degree
• Degree of Buffer ( )
[Figure: a demultiplexor with input rate R holding cells a, b, c, d and e in buffers ahead of its links, labeled sR/k, to the layers.]
Buffered AIL Set (BAIL)
• Buffered Available Input Link Set (BAIL)
• "Set of layers which have fewer than [buffer degree] cells in the buffer (including transmission) for layer l"
• It is the set of layers which can start sending the arriving cell between time n and n + k.
• Until now we have only considered a PPS with buffer degree = 0.
Claim
• BAIL is never empty
• The buffer never overflows for some buffer degree
• LAOL is always satisfied
Buffer Occupancy Sequence
[Figure: timeline t1, t2, …, ti-1, ti, …, t-k+1, t, with buffer occupancy growing 0, 1, 2, …, i-1, i.]
• The last of the i cells left at least by time t - k + 1.
• I >= (t - k + 1 - ti)/k >= (t - ti)/k - 1
• D(i,l) = I + c
Buffer Occupancy Sequence (continued)
[Figure: the same timeline t1, t2, …, ti-1, ti, …, t-k+1, t, with buffer occupancy 0, 1, 2, …, i-1, i and the additional c cells.]
• Buffer degree = N gives a contradiction.
Observations
• Each cell reaches the middle stage switch with a variable input delay, Di = 1..N.
• If all cells are delayed at the input of the middle stage switches by "N - Di", then they all reach the outputs of the middle stage in order.
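The delay-equalization idea can be checked with a few lines of arithmetic: padding each cell by N - Di makes every cell's total input delay exactly N, so the original sending order is preserved. The sketch below assumes each cell is described by a hypothetical pair (send_time, input_delay) measured in time slots; the function names are illustrative.

```python
def raw_arrivals(cells):
    """Arrival times when each cell only experiences its own input delay Di."""
    return [send_time + delay for send_time, delay in cells]

def equalized_arrivals(cells, n: int):
    """Arrival times when each cell is additionally held for n - Di time slots."""
    arrivals = []
    for send_time, delay in cells:
        if not 1 <= delay <= n:
            raise ValueError("input delay must lie in 1..n")
        padding = n - delay                 # extra hold time at the middle stage input
        arrivals.append(send_time + delay + padding)
    return arrivals
```

With N = 4 and cells sent at times 0, 1, 2 with delays 4, 1, 2, the raw arrival times are 4, 2, 4 (out of order), while the equalized arrival times are 4, 5, 6.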
Symmetry Argument
• Demultiplexors
  • Cells arrive at rate R
  • Each cell has a property: output
  • Cells to the same output are written in a round robin manner
  • Cells leave at link rate R
  • The buffer is used to prevent temporary load on the same middle stage switch
  • Max Delay = N
Symmetry Argument (continued)
• Multiplexors
  • Cells need to be read in at rate R
  • Each cell has a property: input
  • Cells from the same input are read in a round robin manner
  • Cells leave at a rate k(R/k) = R
  • The buffer is used to re-order cells and send them in the correct order
  • Max Delay = N
Buffered PPS: Results
• A PPS with a completely distributed algorithm, no speedup, and a buffer degree of N can precisely emulate a FIFO output queued switch for all traffic patterns within a delay bound of 2N time slots.
Conclusions
• Implementation
  • Timestamps
  • Sequence Numbers
• Open questions
  • Making QoS practical.
  • Making multicasting practical.