Packet-Mode Emulation of Output-Queued Switches. David Hay, CS, Technion. Joint work with Hagit Attiya (CS) and Isaac Keslassy (EE). Outline: Cell-Mode Scheduling vs. Packet-Mode Scheduling • Impossibility of an Exact Emulation • Speedup-RQD Tradeoff • Emulation with S≈4 • Emulation with S≈2
Outline • Cell-Mode Scheduling vs. Packet-Mode Scheduling • Impossibility of an Exact Emulation • Speedup-RQD Tradeoff • Emulation with S≈4 • Emulation with S≈2 • Emulation of OQ switch w/ bounded buffer • Simulation Results
Trend towards Packet-Mode • Cell-mode scheduling is getting too hard • Fragmentation and reassembly must work very fast, at the external rate • Extra header for each cell ⇒ loss of bandwidth • For optical switches such fragmentation and reassembly are prohibitive • Cell-mode schedulers are packet-oblivious • Degradation of the overall performance
Packet-Mode Scheduling [Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006] • No need for fragmentation and reassembly • Must ensure contiguous packet delivery over the fabric • While input i delivers a packet to output j, neither input i nor output j can handle other packets • Can packet-mode schedulers provide performance guarantees similar to those of cell-mode schedulers?
Output Queuing Emulation • OQ switches are considered optimal with respect to queuing delay and throughput • But too hard to implement in practice… • Emulation: Same input traffic ⇒ same output traffic • How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?
Cell-Mode Emulation is Possible • Easy with speedup S=N • N scheduling decisions every time-slot: • In the 1st decision forward the cell of input 1 • In the 2nd decision forward the cell of input 2 • In the Nth decision forward the cell of input N
Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • 1st Key Concept: Slackness of a cell C (on the input side): L(C) = OC(C) - IT(C) • Output Cushion OC(C) (“good guys”): how many cells are queued in the output buffer of C’s destination and should leave the OQ switch before C • Input Thread IT(C) (“bad guys”): how many cells precede C in its input-port buffer • Without scheduling, slackness may decrease by at most 2 in every time-slot: • A cell leaves the destination of C ⇒ OC-- • A cell arrives at the input and is queued before C ⇒ IT++ • Initial slackness can be made non-negative: when C arrives, insert it in the OC(C)th place of its input buffer • Plan: Ensure that scheduling increases slackness by at least 2 in every time-slot ⇒ slackness is never negative ⇒ all cells are delivered on time
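The slackness bookkeeping can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the `Cell` record, the helper names, and the idea that each cell carries its departure time from the shadow OQ switch are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    dest: int        # destination output port
    departure: int   # departure time in the shadow OQ switch (assumed known)


def input_thread(cell, input_buffer):
    """IT(C): number of cells queued ahead of C in its input buffer."""
    return input_buffer.index(cell)


def output_cushion(cell, output_buffer):
    """OC(C): cells in C's destination buffer that leave the OQ switch before C."""
    return sum(1 for c in output_buffer if c.departure < cell.departure)


def slackness(cell, input_buffer, output_buffer):
    """L(C) = OC(C) - IT(C)."""
    return output_cushion(cell, output_buffer) - input_thread(cell, input_buffer)


def push_in(cell, input_buffer, output_buffer):
    """Insert C at position OC(C) (or at the tail if the buffer is shorter),
    so that its initial slackness is non-negative."""
    pos = min(output_cushion(cell, output_buffer), len(input_buffer))
    input_buffer.insert(pos, cell)
```

With an output buffer holding cells that depart at times 1, 2 and 5, a new cell departing at time 4 has an output cushion of 2, so `push_in` places it behind at most two cells of its input buffer and its slackness starts at 0 or more.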
Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • Stable Marriage (stable matching): Given two equal-size sets M, W and preference lists for every m∈M, w∈W, find a matching in which there are no two pairs (m,w), (m’,w’) s.t. • m prefers w’ over w • w’ prefers m over m’ • Classical problem in CS • A stable marriage always exists • Many algorithms exist
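As one concrete instance of the "many algorithms", here is a minimal sketch of the classical Gale-Shapley proposal algorithm, together with a checker for the stability condition stated above. The slides do not say which algorithm is used; the function names and the dictionary-based input format are assumptions made for illustration.

```python
def stable_marriage(men_prefs, women_prefs):
    """Gale-Shapley with men proposing.
    men_prefs / women_prefs map each person to a list of the other side,
    most preferred first. Returns a dict man -> woman."""
    rank = {w: {m: r for r, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # next woman each man proposes to
    engaged = {}                              # woman -> man
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])           # w trades up, old partner is free
            engaged[w] = m
        else:
            free.append(m)                    # w rejects m
    return {m: w for w, m in engaged.items()}


def is_stable(matching, men_prefs, women_prefs):
    """True if no pair (m, w') prefers each other over their partners."""
    partner = {w: m for m, w in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs[:prefs.index(matching[m])]:   # women m likes better
            wp = women_prefs[w]
            if wp.index(m) < wp.index(partner[w]):
                return False
    return True
```

For example, with `men = {"A": ["X", "Y"], "B": ["X", "Y"]}` and `women = {"X": ["B", "A"], "Y": ["A", "B"]}`, both men want X, X prefers B, and the result matches B with X and A with Y.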
Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • Critical Cell First (CCF) algorithm performs stable marriage at each decision: • M is the set of inputs, W is the set of outputs • i prefers o1 over o2 if there is a cell for o1 that is queued before all cells for o2 • o prefers i1 over i2 if there is a cell from i1 that should leave before all cells from i2
Cell-Mode Emulation w/ S=2 [Chuang et al.,1999] • For each cell C from input port i to output port j, and each scheduling decision, one of the following holds: • C is forwarded (and we don’t care about it) • C’ was forwarded from i, and i preferred to forward it ⇒ IT-- • C’ was forwarded to j, and j preferred to receive it ⇒ OC++ • Two scheduling decisions every time-slot ⇒ slackness increases by at least 2 in every time-slot
Cell-Mode Emulation • Easy with speedup S=N • Possible with speedup S=2 (w/ CCF) • Lower bound: S≥2-1/N is required [Chuang et al.,1999] What is the speedup required for packet-mode emulation?
Packet-Mode Emulation is Impossible • Regardless of speedup • Even with speedup S=N
Emulation w/ Relative Queuing Delay • The CIOQ switch is allowed a bounded lag behind the shadow OQ switch • Exact same behavior as the optimal OQ switch, but with some extra delay • Called relative queuing delay Can we provide packet-mode OQ emulation with bounded RQD and small speedup?
Our Results: Speedup-RQD tradeoff • Lmax = maximum packet size (known value) • [Figure: speedup (axis ticks at 2, 4 and 2Lmax) plotted against RQD, with the following annotations] • Generalization of cell-mode scheduling with S=2: taking each packet of size ≤ Lmax as one huge cell • Lower bound on RQD (even with infinite speedup) • Lower bound on the speedup (from cell-mode scheduling)
Intuition for Emulation Algorithms • [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; Packet Mode OQ]
PIFO Cell-Mode OQ Switch • FIFO = First-In First-Out • PIFO = Push-In First-Out • A FIFO packet-mode OQ switch is a PIFO cell-mode OQ switch
Underlying CCF Algorithm • Cell-Mode CIOQ w/ CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al.,1999] • But, CCF does not maintain contiguous packet forwarding over the fabric! • [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; PIFO Cell-Mode OQ = Packet Mode OQ]
Intuition for Emulation Algorithms • Two sub-steps: • Framing • Contiguous Decomposition • [Diagram labels: Packet Mode CIOQ; Cell Mode CIOQ w/ S=2; Packet Mode OQ]
Frame-Based Schedulers • Work in a pipelined, frame-based manner • Within each frame: • Build a demand matrix for this frame • Schedule the demand matrix of the previous frame • [Figure: consecutive frames along a time axis]
Building the Demand Matrix • In each frame of size T, CCF forwards at most 2T cells from each input and to each output. • [Figure: demand matrix whose entry (1,1) is the number of cells CCF sent from input 1 to output 1 in the last frame; every row sum and every column sum is ≤ 2T] • Problem: A packet may span several frames.
Building the Demand Matrix • Count only packets whose last cell is forwarded by the CCF in the frame • The sum of each row/column in the matrix is bounded by 2T+N(Lmax-1) • For each input-output pair, only the cells of one additional packet (at most Lmax-1 cells, forwarded in earlier frames) can be added • Translates into RQD of 2T+(Lmax-2).
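The counting rule can be sketched as follows. The log format is an assumption made for illustration: each record says in which time-slot the underlying CCF forwarded a cell, between which ports, how long the cell's packet is, and whether the cell is the last one of its packet. None of these names come from the slides.

```python
def build_demand_matrix(forward_log, n, frame_start, frame_len):
    """Demand (in cells) for one frame of an n x n switch.

    forward_log: iterable of (slot, inp, out, packet_len, is_last_cell)
    tuples describing what the shadow CCF scheduler forwarded (hypothetical
    format). A packet is charged, with all of its cells, to the frame in
    which CCF forwards its LAST cell, even if earlier cells of the packet
    were forwarded in a previous frame."""
    demand = [[0] * n for _ in range(n)]
    frame_end = frame_start + frame_len
    for slot, inp, out, packet_len, is_last_cell in forward_log:
        if is_last_cell and frame_start <= slot < frame_end:
            demand[inp][out] += packet_len
    return demand
```

A 3-cell packet whose cells are forwarded in slots 2, 3 and 4 spans two frames of size T=4; under this rule it contributes nothing to the first frame and all 3 cells to the second.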
Decomposing the Demand Matrix • Challenge: Decompose the matrix into permutations while maintaining contiguous packet delivery. • Each permutation dictates a scheduling decision. • First try: an optimal Birkhoff-von Neumann decomposition results in 2T+N(Lmax-1) permutations.
Contiguous Greedy Decomposition • To maintain contiguous packet delivery: • If (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule, keep the match for iteration t. • Find a greedy matching for the rest of the matrix. • Speedup: S = 4+(N(Lmax-1))/T • RQD: 2T+Lmax-1
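The two rules above can be sketched as a short loop. This is an illustrative reading of the slide, not the authors' code: the function name is hypothetical, and "greedy matching" is taken here to mean scanning inputs and outputs in index order and taking the first free pair with remaining demand.

```python
def contiguous_greedy_decomposition(demand):
    """Decompose a demand matrix (cells per input-output pair) into a list
    of matchings, one per scheduling decision. Each matching is a dict
    input -> output."""
    n = len(demand)
    remaining = [row[:] for row in demand]
    schedule = []
    prev = {}
    while any(any(row) for row in remaining):
        match, used_outputs = {}, set()
        # Rule 1: keep every pair of the previous iteration that still has cells.
        for i, j in prev.items():
            if remaining[i][j] > 0:
                match[i] = j
                used_outputs.add(j)
        # Rule 2: greedily match the still-unmatched inputs and outputs.
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if j not in used_outputs and remaining[i][j] > 0:
                    match[i] = j
                    used_outputs.add(j)
                    break
        for i, j in match.items():
            remaining[i][j] -= 1
        schedule.append(match)
        prev = match
    return schedule
```

Because a matched pair is held until its demand is exhausted, all cells of a given input-output pair occupy one uninterrupted run of iterations, which is what contiguous packet delivery requires. For the demand `[[2, 1], [0, 2]]` the sketch needs three iterations.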
Our Results: Speedup-RQD tradeoff • [Figure: speedup (axis ticks at 2, 4 and 2Lmax) plotted against RQD] • Point reached so far: S = 4+(N(Lmax-1))/T, RQD = 2T+Lmax-1 • Next: S≈2
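To see how the frame size T moves this operating point, the two formulas can simply be evaluated. The helper below is a hypothetical convenience and uses exact fractions to avoid rounding.

```python
from fractions import Fraction


def greedy_operating_point(n, lmax, t):
    """Speedup and RQD stated on the slide for frame size t:
    S = 4 + N(Lmax-1)/T and RQD = 2T + Lmax - 1."""
    speedup = 4 + Fraction(n * (lmax - 1), t)
    rqd = 2 * t + lmax - 1
    return speedup, rqd
```

For N=4 and Lmax=3, a frame of T=8 gives S=5 and RQD=18, while T=80 gives S=4.1 and RQD=162: a larger frame pushes the speedup towards 4 at the cost of a larger relative queuing delay.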
Emulation w/ S≈2 - Framing • Keep a separate demand matrix for every possible packet size • Example: Possible packet sizes are 3, 4, 6 • [Figure: three demand matrices, counting the # of size 3 packets, the # of size 4 packets and the # of size 6 packets]
Emulation w/ S≈2 - Framing • Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax) • Leftover matrix for each size m • [Figure: the example with packet sizes 3, 4, 6, where k = LCM(3,4,6) = 12: one mega-packets matrix (mega-packets of size 12) and leftover matrices for sizes 3, 4 and 6]
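The framing step can be sketched as integer division per packet size. The input format (one packet-count matrix per size) and the function name are assumptions for illustration, and, as in the slide's example, k is taken as the LCM of the sizes that actually occur. `math.lcm` requires Python 3.9 or later.

```python
import math


def frame_into_mega_packets(counts):
    """counts: dict mapping packet size m to an n x n matrix holding the
    number of size-m packets per input-output pair (hypothetical format).

    Returns (k, mega, leftovers): k is the mega-packet size, mega[i][j] the
    number of mega-packets for pair (i, j), and leftovers[m] the matrix of
    size-m packets that did not fill a whole mega-packet."""
    k = math.lcm(*counts)                     # LCM of the packet sizes
    n = len(next(iter(counts.values())))
    mega = [[0] * n for _ in range(n)]
    leftovers = {}
    for m, matrix in counts.items():
        per_mega = k // m                     # size-m packets per mega-packet
        leftovers[m] = [[c % per_mega for c in row] for row in matrix]
        for i in range(n):
            for j in range(n):
                mega[i][j] += matrix[i][j] // per_mega
    return k, mega, leftovers
```

Every leftover entry for size m is strictly smaller than k/m, since k/m packets of size m would have formed one more mega-packet.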
Emulation w/ S≈2 - Framing • Sum of each row/column is bounded • For the mega-packets matrix: ≤ (2T+N(Lmax-1))/k • For each leftover matrix of size m: ≤ N(k-1)/m • [Figure: the mega-packets matrix (mega-packets of size 12) and the leftover matrices, whose entries are < 12/3 for size 3, < 12/4 for size 4 and < 12/6 for size 6]
Emulation w/ S≈2 - Decomposition • Optimally decompose (w/ Birkhoff-von Neumann) the mega-packets matrix and then the leftover matrices • Hold each permutation k times for contiguous (mega-)packet delivery • [Figure: bound on the mega-packets matrix]
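A minimal sketch of such a decomposition for integer matrices follows. It is one simple variant, not necessarily the one used by the authors: the matrix is padded with dummy demand until all row and column sums equal d (the largest one), and then one perfect matching is peeled off at a time with a standard augmenting-path search, giving exactly d matchings. All function names are hypothetical.

```python
def decompose_into_matchings(demand):
    """Split a non-negative integer matrix into d matchings (dicts
    input -> output), where d is the largest row or column sum."""
    n = len(demand)
    real = [row[:] for row in demand]
    rows = [sum(r) for r in real]
    cols = [sum(c) for c in zip(*real)]
    d = max(rows + cols)
    # Pad with dummy demand so that every row and column sums to exactly d.
    pad = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x = min(d - rows[i], d - cols[j])
            pad[i][j] = x
            rows[i] += x
            cols[j] += x

    def perfect_matching():
        owner = [-1] * n  # owner[j] = input currently matched to output j

        def augment(i, seen):
            for j in range(n):
                if real[i][j] + pad[i][j] > 0 and not seen[j]:
                    seen[j] = True
                    if owner[j] == -1 or augment(owner[j], seen):
                        owner[j] = i
                        return True
            return False

        for i in range(n):
            augment(i, [False] * n)
        return [(owner[j], j) for j in range(n)]

    schedule = []
    for _ in range(d):
        step = {}
        for i, j in perfect_matching():
            if real[i][j] > 0:      # serve real demand first
                real[i][j] -= 1
                step[i] = j
            else:                   # dummy edge: nothing is sent
                pad[i][j] -= 1
        schedule.append(step)
    return schedule


def hold_each(schedule, k):
    """Repeat every matching k times in a row (one mega-packet per pair)."""
    return [step for step in schedule for _ in range(k)]
```

Applied to a mega-packets matrix, `hold_each(decompose_into_matchings(mega), k)` keeps every matched pair connected for k consecutive scheduling decisions, which is long enough to deliver one mega-packet without interruption.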