Switches with Input Buffers (Cisco)
Packet Switches with Input Buffers • Switching fabric • Electronic chips (Mindspeed, AMCC, Vitesse) • Space-wavelength selector (NEC, Alcatel) • Fast tunable lasers (Lucent) • Waveguide arrays (Chiaro) • Scheduler • Packets compete not only with the packets destined for the same output but also with the packets sourced by the same input. Scheduling might become a bottleneck in a switch with hundreds of ports and gigabit line bit-rates.
Optical Packet Cross-bar (NEC,Alcatel) • A 2.56 Tb/s multiwavelength and scalable switch-fabric for fast packet-switching network, PTL 1998,1999, NEC
Optical Packet Cross-bar (Lucent) • A fast 100 channel wavelength tunable transmitter for optical packet switching, PTL 2001, Bell Labs
Scheduling Algorithms for Packet Switches with Input Buffers • Each input sends a request for its head-of-line (HOL) packet to the corresponding output. Each output grants one input, and this input-output pair is connected in the next time slot. • Output utilization when inputs are fully loaded is: $U = 1 - (1 - 1/N)^{N-1}$
Scheduling Algorithms for Packet Switches with Input Buffers [figure]
Scheduling Algorithms for Packet Switches with Input Buffers • In parallel iterative matching (PIM), SLIP, or dual round-robin (DRR), inputs send requests to outputs, outputs grant inputs, and inputs then accept one of the received grants, all in one iteration. It was proven that PIM finds a maximal matching after $\log_2 N + 4/3$ steps on average. • The maximum weighted matching algorithm maximizes the total weight of the connected pairs, and the maximum matching algorithm maximizes their number. Both achieve 100% throughput for i.i.d. traffic, but have complexities $O(N^3 \log_2 N)$ and $O(N^{2.5})$, respectively. • Sequential greedy scheduling is a maximal matching algorithm that is simple to implement. A maximal matching algorithm does not leave an input-output pair unmatched if the input has a packet for that output and both ports are still free.
Bandwidth Reservations in Packet Switches with Input Buffers • Anderson et al.: Time is divided into frames of F time slots. The schedule is calculated in each frame by a statistical matching algorithm. • Stiliadis and Varma: Counters are loaded per frame. Queues with positive counters are served with priority according to parallel iterative matching (PIM), and their counters are then decremented by 1. DRR, proposed by Chao et al., could be used as well. • Kam et al.: A counter is incremented according to the negotiated bandwidth and decremented by 1 when the queue is served. A maximal weighted matching algorithm is applied. • Smiljanić: Counters are loaded per frame. Queues with positive counters are served with priority according to a maximal matching algorithm, preferably the sequential greedy scheduling algorithm (SGS), where inputs sequentially choose outputs to transmit packets to.
Maximum and Maximal Matching Algorithm • It was shown that when packet arrivals are i.i.d. and the traffic distribution is admissible, a throughput of 100% can be achieved through the cross-bar if the maximum or the maximum weighted matching algorithm is applied. • It was shown that when packet arrivals obey a strong law of large numbers and the traffic distribution is admissible, a throughput of 50% can be achieved through the cross-bar if a maximal matching algorithm is applied.
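The difference between a maximal and a maximum matching can be illustrated on a tiny request pattern. The sketch below is only an illustration, not the schedulers discussed on the slides: `greedy_maximal` and `maximum_matching` are hypothetical helper names, and the maximum matching uses a plain augmenting-path search rather than the $O(N^{2.5})$ algorithm mentioned earlier.

```python
def greedy_maximal(adj):
    """Greedy maximal matching: each input takes the first free output it requests.

    adj[i] is the list of outputs for which input i has packets.
    Returns a dict input -> output.
    """
    used_outputs = set()
    match = {}
    for i, outs in enumerate(adj):
        for j in outs:
            if j not in used_outputs:
                used_outputs.add(j)
                match[i] = j
                break
    return match


def maximum_matching(adj):
    """Maximum-size matching via simple augmenting paths (illustrative only)."""
    owner = {}  # output -> input currently matched to it

    def try_input(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take output j if it is free, or if its owner can move elsewhere.
            if j not in owner or try_input(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(adj)):
        try_input(i, set())
    return {i: j for j, i in owner.items()}


# Input 0 has packets for outputs 0 and 1; input 1 only for output 0.
adj = [[0, 1], [0]]
maximal = greedy_maximal(adj)    # input 0 grabs output 0, input 1 is blocked
maximum = maximum_matching(adj)  # input 0 -> output 1, input 1 -> output 0
```

In this example the maximal matching connects one pair while the maximum matching connects two, which is the worst-case factor of two between them.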
PIM, SLIP and DRR • In PIM and SLIP, each input sends requests to all outputs for which it has packets; in DRR, it sends a request only to one chosen output. SLIP and DRR use round-robin choices. • Theorem: PIM finds a maximal matching after $\log_2 N + 4/3$ steps on average. • Proof: Let n inputs request output Q, and let k of these inputs receive no grants from other outputs. With probability k/n, Q grants one of these k inputs and all requests to Q are resolved; with probability 1-k/n, at most k requests remain unresolved. The average number of unresolved requests is therefore at most $(1 - k/n) \cdot k \le n/4$. So if there are $N^2$ requests at the beginning, the expected number of unresolved requests after i iterations is at most $N^2/4^i$.
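The request-grant-accept rounds of PIM can be sketched as follows. This is a minimal simulation under stated assumptions, not a hardware-accurate model: the function names `pim`, `pim_iteration` and `is_maximal` are hypothetical, requests are given as an N×N boolean matrix, and random choices are made with Python's `random` module.

```python
import random


def pim_iteration(requests, matched_in, matched_out, rng):
    """One PIM round among still-unmatched ports. Returns the number of new pairs."""
    n = len(requests)
    grants = {}  # input -> outputs that granted it in this round
    # Grant phase: every free output grants one requesting free input at random.
    for j in range(n):
        if j in matched_out:
            continue
        candidates = [i for i in range(n)
                      if i not in matched_in and requests[i][j]]
        if candidates:
            grants.setdefault(rng.choice(candidates), []).append(j)
    # Accept phase: every input that received grants accepts one at random.
    for i, outs in grants.items():
        j = rng.choice(outs)
        matched_in[i] = j
        matched_out[j] = i
    return len(grants)


def pim(requests, seed=0):
    """Iterate until no new pair can be added, i.e. the matching is maximal."""
    rng = random.Random(seed)
    matched_in, matched_out = {}, {}
    iterations = 0
    while pim_iteration(requests, matched_in, matched_out, rng) > 0:
        iterations += 1
    return matched_in, iterations


def is_maximal(requests, match):
    """True if every requesting pair has its input or its output already matched."""
    outs = set(match.values())
    n = len(requests)
    return all(not requests[i][j] or i in match or j in outs
               for i in range(n) for j in range(n))


# Fully loaded 8x8 switch: every input has packets for every output.
full = [[True] * 8 for _ in range(8)]
match, iterations = pim(full, seed=1)
```

The loop stops only when no free output sees a free requesting input, which is exactly the maximality condition, so the returned matching is maximal regardless of the random choices.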
PIM, SLIP and DRR • Proof (cont.): Let C be the step at which the last request is resolved. Then:
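The equation that followed on the slide did not survive extraction. A sketch of how the bound can be completed from the $N^2/4^i$ estimate is given below; it assumes that $\log_2 N$ is an integer and uses Markov's inequality to bound the probability that any request is still unresolved after i iterations.

```latex
\begin{align*}
\Pr\{C > i\} &\le \min\!\left(1,\ \frac{N^2}{4^i}\right), \\
E[C] = \sum_{i \ge 0} \Pr\{C > i\}
  &\le \sum_{i=0}^{\log_2 N - 1} 1 \;+\; \sum_{i \ge \log_2 N} \frac{N^2}{4^i} \\
  &= \log_2 N + \frac{N^2}{4^{\log_2 N}} \cdot \frac{1}{1 - 1/4}
   = \log_2 N + \frac{4}{3}.
\end{align*}
```

The last step uses $4^{\log_2 N} = N^2$, so the geometric tail contributes exactly 4/3.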
SGS Implementation • All inputs choose outputs one after another. SGS is a maximal matching algorithm.
SGS Uses Pipelining • I_i -> T_k: input i chooses an output for time slot k.
Weighted Sequential Greedy Scheduling • i = 1; • Input i chooses an output j from O_k for which it has a packet to send; remove i from I_k and j from O_k; • If i < N, set i = i+1 and go to the previous step.
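The steps above, without counters, can be sketched for a single time slot as follows. This is a minimal sketch under assumptions: `sgs_slot` is a hypothetical name, and each input picks the lowest-numbered free output, since the slides do not specify the choice rule.

```python
def sgs_slot(has_packet):
    """Sequential greedy scheduling for one time slot.

    has_packet[i][j] is truthy if input i has a packet for output j.
    Returns a dict input -> chosen output.
    """
    n = len(has_packet)
    free_outputs = set(range(n))  # O_k: outputs not yet taken in this slot
    match = {}
    for i in range(n):  # inputs choose one after another
        for j in sorted(free_outputs):  # assumed rule: lowest free output first
            if has_packet[i][j]:
                match[i] = j
                free_outputs.remove(j)  # remove j from O_k
                break
    return match


# Input 1 only has packets for output 0, which input 0 has already taken.
schedule = sgs_slot([[1, 1, 0],
                     [1, 0, 0],
                     [0, 1, 1]])
```

Here the result matches two pairs and is maximal, although a maximum matching would connect all three inputs.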
Weighted Sequential Greedy Scheduling • If k = 1 mod F, then c_ij = a_ij; • I_k = {1,...,N}; O_k = {1,...,N}; i = 1; • Input i chooses an output j from O_k for which it has a packet to send such that c_ij > 0; remove i from I_k and j from O_k; c_ij = c_ij - 1; • If i < N, set i = i+1 and go to the previous step.
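The counter-based steps can be sketched for one frame as follows. This is an illustrative sketch only: `wsgs_frame` is a hypothetical name, queue occupancies are modeled as integer packet counts, the lowest-numbered eligible output is chosen (the choice rule is an assumption), and pipelining is not modeled.

```python
def wsgs_frame(a, queues, F):
    """Run weighted sequential greedy scheduling for one frame of F time slots.

    a[i][j]      reserved time slots per frame for pair (i, j)
    queues[i][j] packets waiting at input i for output j (modified in place)
    Returns served[i][j], the number of slots given to each pair.
    """
    n = len(a)
    c = [row[:] for row in a]  # counters c_ij loaded with a_ij at frame start
    served = [[0] * n for _ in range(n)]
    for _ in range(F):
        free_outputs = set(range(n))  # O_k is reset in every time slot
        for i in range(n):  # inputs choose sequentially
            for j in sorted(free_outputs):
                if queues[i][j] > 0 and c[i][j] > 0:
                    free_outputs.remove(j)
                    c[i][j] -= 1
                    queues[i][j] -= 1
                    served[i][j] += 1
                    break
    return served


# Two ports, one slot reserved per pair, frame of three slots, all queues backlogged.
reservations = [[1, 1], [1, 1]]
backlog = [[5, 5], [5, 5]]
served = wsgs_frame(reservations, backlog, F=3)
```

In this example every pair receives its reserved slot within the frame, and the third slot stays idle because all counters have reached zero.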
Non-blocking Nature of WSGS • A maximal matching algorithm does not leave an input and an output unmatched if there is a packet to be transmitted from that input to that output. • It can be proven that all the traffic passes through a cross-bar with a speedup of two that is scheduled by a maximal matching algorithm, as long as the outputs are not overloaded.
Performance of Maximal Matching Algorithm • Theorem: The maximal matching protocol (and so WSGS) ensures a_ij time slots per frame to input-output pair (i,j) if [condition missing in the source], where T_i is the number of slots reserved for input i, and R_j is the number of slots reserved for output j. • Proof: Note that [equation missing in the source]
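Because the condition and the body of the proof are missing, the following is only a sketch of a standard counting argument for a maximal matching with per-frame counters. It assumes that pair (i,j) stays backlogged, that $T_i = \sum_m a_{im}$ and $R_j = \sum_m a_{mj}$, and that $a_{ij} \ge 1$. It yields one sufficient condition and is not necessarily the exact form shown on the slide.

```latex
% B_{ij}: slots of a frame in which pair (i,j) is eligible but not served.
% In each such slot, input i or output j is serving some other reserved pair.
\begin{align*}
B_{ij} &\le (T_i - a_{ij}) + (R_j - a_{ij}), \\
\text{slots served for } (i,j) &\ge \min\{\, a_{ij},\ F - B_{ij} \,\}, \\
F - (T_i - a_{ij}) - (R_j - a_{ij}) \ge a_{ij}
  &\iff T_i + R_j \le F + a_{ij}, \\
T_i + R_j \le F + 1 &\implies T_i + R_j \le F + a_{ij}.
\end{align*}
```

Under these assumptions, $T_i + R_j \le F + 1$ is sufficient for pair (i,j) to receive its $a_{ij}$ slots in every frame.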
Admission Control for Maximal Matching Algorithm • The maximal matching protocol (and so WSGS) ensures a_ij time slots per frame to input-output pair (i,j) under the conditions labeled I, II and III on the slide (equations missing in the source). • F: frame length; T_i: the number of slots reserved for input i; R_j: the number of slots reserved for output j; t_i, r_j: normalized T_i, R_j.
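An admission check of this kind could look as follows. The exact conditions I to III were lost in extraction, so this sketch assumes the sufficient condition T_i + R_j <= F + 1 for every pair with a non-zero reservation, which follows from a counting argument for maximal matching. The function name `admissible` and the matrix layout are hypothetical.

```python
def admissible(a, F):
    """Check the ASSUMED condition T_i + R_j <= F + 1 for all reserved pairs.

    a[i][j] is the number of slots per frame requested for pair (i, j);
    F is the frame length in time slots.
    """
    n = len(a)
    T = [sum(a[i]) for i in range(n)]  # slots reserved per input
    R = [sum(a[i][j] for i in range(n)) for j in range(n)]  # per output
    return all(T[i] + R[j] <= F + 1
               for i in range(n) for j in range(n) if a[i][j] > 0)


# One slot per pair on a 2x2 switch: T_i = R_j = 2, so F must be at least 3.
ok = admissible([[1, 1], [1, 1]], F=3)
too_short = admissible([[1, 1], [1, 1]], F=2)
```

A new reservation would be accepted only if the check still passes with the updated matrix; otherwise it would be rejected.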
Analogy with Circuit Switches • Inputs ~ Switches in the first stage • Time slots in a frame ~ Switches in the middle stage • Outputs ~ Switches in the last stage • Non-blocking condition: [equation missing in the source] • Strictly non-blocking condition: [equation missing in the source]
Rate and Delay Guaranteed by Maximal Matching Algorithm (and WSGS) • Assume a coarse synchronization on a frame-by-frame basis, where a frame is the policing interval comprising F cell time slots of duration Tc. • Then a delay of D = 2·F·Tc is provided for a utilization of 50%. Alternatively, this delay and a utilization of 100% are provided for a fabric with a speedup of 2.
Port Congestion Due to Multicasting • Notation (symbols missing in the source): bit-rate reserved for multicast session k of input i; multicast group k sourced by input i. • Solution: Packets should be forwarded through the switch by multicast destination ports.
Admission Control for Modified WSGS • [Equation missing in the source], where E_i is the number of forwarded packets per frame.
Admission Control for Modified WSGS • The modified WSGS protocol ensures negotiated bandwidths to input-output pairs under the conditions labeled I and II on the slide (equations missing in the source). • F: frame length; P: forwarding fan-out; T_i: the number of slots reserved for input i; R_i: the number of slots reserved for output i; t_i, r_i: normalized T_i, R_i.
Rate and Delay Guaranteed by Modified WSGS • Assume again a coarse synchronization on a frame-by-frame basis. • Then a delay of D = F·Tc is provided for a utilization of 1/(P+2), where P is the forwarding fan-out. Alternatively, this delay and a utilization of 100% are provided for a fabric speedup of P+2.
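The delay and utilization figures quoted for WSGS and modified WSGS are simple arithmetic on F, Tc and P. The sketch below only restates those formulas; the function names and the sample values (F = 1000 slots, Tc = 50 ns, P = 2) are hypothetical and not taken from the slides.

```python
def frame_delay(frame_slots, slot_duration_s, frames=1):
    """Delay bound of `frames` frame durations: D = frames * F * Tc (seconds)."""
    return frames * frame_slots * slot_duration_s


def modified_wsgs_utilization(fanout):
    """Utilization guaranteed by modified WSGS without speedup: 1 / (P + 2)."""
    return 1.0 / (fanout + 2)


def modified_wsgs_speedup(fanout):
    """Fabric speedup quoted for 100% utilization with modified WSGS: P + 2."""
    return fanout + 2


F, Tc, P = 1000, 50e-9, 2                   # hypothetical sample values
d_wsgs = frame_delay(F, Tc, frames=2)       # D = 2*F*Tc, maximal matching / WSGS
d_modified = frame_delay(F, Tc)             # D = F*Tc, as stated for modified WSGS
u_modified = modified_wsgs_utilization(P)   # 1/(P+2)
```

With these sample values the frame lasts 50 microseconds, and a forwarding fan-out of 2 corresponds to a utilization of one quarter or a required speedup of 4.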
References • T. E. Anderson, S. S. Owicki, J. B. Saxe, and C. P. Thacker, “High-speed switch scheduling for local-area networks,” ACM Transactions on Computer Systems, vol. 11, no. 4, November 1993, pp. 319-352. • N. McKeown et al., “The Tiny Tera: A packet switch core,” IEEE Micro, vol. 17, no. 1, Jan.-Feb. 1997, pp. 26-33. • A. Smiljanić, “Flexible bandwidth allocation in high-capacity packet switches,” IEEE/ACM Transactions on Networking, April 2002, pp. 287-293.
• A. Smiljanić, “Scheduling of multicast traffic in high-capacity packet switches,” IEEE Communications Magazine, November 2002, pp. 72-77.