EE384Y: Packet Switch Architectures
Part II: Load-balanced Switches
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
http://www.stanford.edu/~nickm
The Arbitration Problem
• A packet switch fabric is reconfigured for every packet transfer.
• For example, at 160Gb/s, a new IP packet can arrive every 2ns.
• The configuration is picked to maximize throughput and not waste capacity.
• Known algorithms are probably too slow.
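
The 2ns figure can be checked with a line of arithmetic. The sketch below assumes a 40-byte minimum-size IP packet; that packet size is an assumption made here, not something the slide states, and the function name is illustrative.

```python
def packet_time_ns(packet_bytes, line_rate_gbps):
    """Time to receive one packet, in nanoseconds.

    bits / (Gb/s) comes out directly in ns, since 1 Gb/s is 1 bit per ns.
    """
    return packet_bytes * 8 / line_rate_gbps


# Assumption: 40-byte minimum-size IP packets arriving back to back.
print(packet_time_ns(40, 160))
```

Under that assumption the scheduler has one packet time, a couple of nanoseconds, to pick each new fabric configuration.
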
Approach
• We know that a crossbar with VOQs (virtual output queues) and uniform Bernoulli i.i.d. arrivals gives 100% throughput under each of the following scheduling algorithms:
 - Pick a permutation uniformly at random (uar) from all permutations.
 - Pick a permutation uar from a set of N permutations in which each input-output pair (i,j) is connected exactly once.
 - From the same set as above, repeatedly cycle through a fixed sequence of the N different permutations.
• Can we make non-uniform, bursty traffic uniform “enough” for the above to hold?
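
One concrete set of N permutations with the "each pair connected exactly once" property is the set of cyclic shifts. The sketch below builds that set and counts how often each input-output pair is connected. The choice of cyclic shifts and the function names are illustrative assumptions; any set with the same property would do.

```python
def cyclic_permutations(n):
    """Return n permutations; permutation k connects input i to output (i + k) mod n."""
    return [[(i + k) % n for i in range(n)] for k in range(n)]


def connection_counts(perms):
    """counts[i][j] = number of permutations in which input i is connected to output j."""
    n = len(perms[0])
    counts = [[0] * n for _ in range(n)]
    for perm in perms:
        for i, j in enumerate(perm):
            counts[i][j] += 1
    return counts


perms = cyclic_permutations(4)
for row in connection_counts(perms):
    print(row)  # every entry is 1: each (i, j) pair is connected exactly once
```

Cycling through `perms` in a fixed order gives the third scheduler in the list above and needs no arbitration at all.
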
Design Example
Stanford “Optics in Routers” project http://yuba.stanford.edu/or/
• Some challenging numbers:
 - 100Tb/s
 - 160Gb/s linecards
 - 640 linecards
Goals
• Scale to High Linecard Speeds (160Gb/s)
 - No Centralized Scheduler
 - Optical Switch Fabric
 - Low Packet-Processing Complexity
• Scale to High Number of Linecards (640)
• Provide Performance Guarantees
 - 100% Throughput Guarantee
 - No Packet Reordering
Outline
• Basic idea of load-balancing
• Packet mis-sequencing
• An optical switch fabric
• Scaling number of linecards
100% Throughput in a Mesh Fabric
[Figure: inputs (In) and outputs (Out) at line rate R joined by a full mesh; the rate needed on each mesh link is marked "?".]
• Switch capacity = N²R
• Router capacity = NR
If Traffic Is Uniform
[Figure: the same mesh, with inputs and outputs at rate R and every mesh link at rate R/N.]
Real Traffic is Not Uniform
[Figure: the same mesh; some links are labeled R/N, others R or "?".]
Load-Balanced Switch
[Figure: two meshes in series, a load-balancing stage followed by a forwarding stage; inputs and outputs at rate R, every mesh link at rate R/N.]
100% throughput for weakly mixing traffic (Valiant, C.-S. Chang)
Load-Balanced Switch
[Figure: the same two-stage switch, with packets 1, 2, 3 waiting at one input.]
Load-Balanced Switch
[Figure: the same two-stage switch, with packets 1, 2, 3 now shown at separate positions inside the fabric.]
Intuition: 100% Throughput
[Figure: two-stage mesh; inputs and outputs at rate R, every mesh link at rate R/N.]
• Arrivals to second mesh:
• Capacity of second mesh:
• Second mesh: arrival rate < service rate [C.-S. Chang]
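
The formulas on this slide did not survive extraction. One plausible way to write the argument out is sketched below. It assumes that \lambda_{ij} denotes the arrival rate from input i to output j (notation introduced here, not taken from the slides), that the first mesh spreads each input's traffic evenly over the N middle ports, and that traffic is admissible, so no output is offered more than rate R.

```latex
\begin{align*}
\text{Arrival rate on second-mesh link } (k \to j):\quad
  & \sum_{i=1}^{N} \frac{\lambda_{ij}}{N}
    = \frac{1}{N}\sum_{i=1}^{N}\lambda_{ij} \\
\text{Capacity of that link:}\quad
  & \frac{R}{N} \\
\text{Admissibility } \Big(\sum_{i=1}^{N}\lambda_{ij} < R\Big)
  \;\Rightarrow\quad
  & \frac{1}{N}\sum_{i=1}^{N}\lambda_{ij} < \frac{R}{N}
\end{align*}
```

Under these assumptions every link of the second mesh sees an arrival rate below its service rate, which is the condition the last bullet states.
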
Another way of thinking about it
[Figure: external inputs 1 to N, a load-balancing cyclic shift, internal inputs 1 to N, a switching cyclic shift, external outputs 1 to N.]
• First stage load-balances incoming packets
• Second stage is a cyclic shift
Load-Balanced Switch
[Figure: the same two cyclic-shift stages, with packets 1 and 2 shown in transit.]
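
The two cyclic-shift stages can be sketched as a small time-slotted simulation. This is a minimal model under stated assumptions, not the authors' implementation: at slot t the first stage connects input i to middle port (i + t) mod N, the second stage connects middle port k to output (k + t) mod N, and at most one packet arrives per input per slot, so no input queue is needed. The function and parameter names are illustrative.

```python
from collections import deque


def simulate_load_balanced_switch(n, slots, dest_of):
    """Simulate a two-stage load-balanced switch for a number of time slots.

    dest_of(i, t) returns the output of the packet arriving at input i in
    slot t, or None if nothing arrives. Returns packets delivered per output.
    """
    # voq[k][j]: packets waiting at middle port k for output j.
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    delivered = [0] * n
    for t in range(slots):
        # Stage 1: load-balancing cyclic shift, independent of destination.
        for i in range(n):
            dest = dest_of(i, t)
            if dest is not None:
                middle = (i + t) % n
                voq[middle][dest].append((i, t))
        # Stage 2: switching cyclic shift, one packet per middle port per slot.
        for k in range(n):
            out = (k + t) % n
            if voq[k][out]:
                voq[k][out].popleft()
                delivered[out] += 1
    return delivered


# Highly non-uniform traffic: input i sends only to output i, at full rate.
n, slots = 4, 100
delivered = simulate_load_balanced_switch(n, slots, lambda i, t: i)
print(sum(delivered), "of", n * slots, "packets delivered")
```

With this traffic pattern a single cyclic-shift stage would serve each input-output pair only once every N slots. In the two-stage model, the backlog stays bounded while the offered load is 100%.
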
Outline
• Basic idea of load-balancing
• Packet mis-sequencing
• An optical switch fabric
• Scaling number of linecards
Packet Reordering
[Figure: two-stage mesh with every link at rate R/N; packets 1 and 2 shown at one output.]
Bounding Delay Difference Between Middle Ports
[Figure: two-stage mesh with every link at rate R/N; packets 1 and 2 shown at one output.]
UFS (Uniform Frame Spreading)
[Figure: two-stage mesh with every link at rate R/N; packets 1, 2, 3 of one frame and packets 1, 2 of another shown near the outputs.]
FOFF (Full Ordered Frames First)
[Figure: two-stage mesh with every link at rate R/N; packets 1 and 2 shown at the outputs.]
FOFF (Full Ordered Frames First)
[Figure: N FIFO queues at an input, one holding packets 1 to 4 and another holding packets 1 to 2.]
• Input Algorithm
• N FIFO queues corresponding to the N output flows
• Spread each flow uniformly: if last packet was sent to middle port k, send next to k+1.
• Every N time-slots, pick a flow:
 - If full frame exists, pick it and spread like UFS
 - Else if all frames are partial, pick one in round-robin order and send it
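
The input algorithm above can be sketched as follows. This is a minimal illustration under several assumptions that the slide does not spell out: a "full frame" means at least N packets queued for one output, k+1 wraps around modulo N, full and partial frames each have their own round-robin pointer, and a partial frame is sent in its entirety. The class and method names are hypothetical.

```python
from collections import deque


class FoffInput:
    """Sketch of one FOFF input linecard with n output flows."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_port = [0] * n  # per flow: middle port for its next packet
        self.rr_full = 0          # round-robin pointer among full frames
        self.rr_partial = 0       # round-robin pointer among partial frames

    def enqueue(self, output, packet):
        self.queues[output].append(packet)

    def _pick(self, start, eligible):
        """First eligible flow at or after `start`, scanning cyclically."""
        for offset in range(self.n):
            flow = (start + offset) % self.n
            if eligible(flow):
                return flow
        return None

    def serve_frame(self):
        """Called once every n time slots; returns (middle_port, flow, packet) tuples."""
        n = self.n
        flow = self._pick(self.rr_full, lambda f: len(self.queues[f]) >= n)
        if flow is not None:
            self.rr_full = (flow + 1) % n
        else:
            # No full frame anywhere: fall back to a partial frame.
            flow = self._pick(self.rr_partial, lambda f: len(self.queues[f]) > 0)
            if flow is None:
                return []
            self.rr_partial = (flow + 1) % n
        sent = []
        for _ in range(min(n, len(self.queues[flow]))):
            port = self.next_port[flow]
            sent.append((port, flow, self.queues[flow].popleft()))
            self.next_port[flow] = (port + 1) % n  # continue where the flow left off
        return sent
```

For example, with n = 3, four packets queued for output 0 and one for output 1, the first call sends a full frame of flow 0 to middle ports 0, 1, 2; the next two calls send the leftover partial frames in round-robin order.
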
Bounding Reordering
[Figure: two-stage mesh with every link at rate R/N; packets 1, 2, 3 shown at one output.]
FOFF Output
[Figure: N FIFO queues at an output, holding packets numbered 1 to 4.]
• Output properties
• N FIFO queues corresponding to the N middle ports
• Buffer size less than N² packets
• If there are N² packets, one of the head-of-line packets is in order
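
The output side can be sketched as a resequencer that only ever looks at head-of-line packets. The sketch assumes each packet carries a flow identifier and a per-flow sequence number starting at 0; that is an illustrative way to define "in order", not something the slide specifies, and the names are hypothetical.

```python
from collections import deque


class ResequencingOutput:
    """Sketch of one output linecard with one FIFO per middle port."""

    def __init__(self, n):
        self.fifos = [deque() for _ in range(n)]  # indexed by middle port
        self.expected = {}  # flow -> next sequence number to deliver

    def receive(self, middle_port, flow, seq):
        self.fifos[middle_port].append((flow, seq))

    def deliver_one(self):
        """Deliver one in-order head-of-line packet, or return None if none is in order."""
        for fifo in self.fifos:
            if fifo:
                flow, seq = fifo[0]
                if seq == self.expected.get(flow, 0):
                    fifo.popleft()
                    self.expected[flow] = seq + 1
                    return (flow, seq)
        return None


out = ResequencingOutput(3)
out.receive(1, "A", 1)    # arrives early via middle port 1
print(out.deliver_one())  # None: packet 0 of flow A has not arrived yet
out.receive(0, "A", 0)
print(out.deliver_one())  # ('A', 0)
print(out.deliver_one())  # ('A', 1)
```

Because only N head-of-line packets are inspected per decision, the work per time slot does not grow with the number of buffered packets.
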
FOFF Properties
• Property 1: FOFF maintains packet order.
• Property 2: FOFF has O(1) complexity.
• Property 3: Congestion buffers operate independently.
• Property 4: FOFF keeps the average packet delay within a constant of that of an ideal output-queued router.
• Corollary: FOFF has 100% throughput for any adversarial traffic.
Output-Queued Router
[Figure: inputs (In) and outputs (Out) at line rate R joined by a mesh whose link rates are marked "?".]
Outline
• Basic idea of load-balancing
• Packet mis-sequencing
• An optical switch fabric
• Scaling number of linecards
From Two Meshes to One Mesh
[Figure: two meshes in series with every link at rate R/N; the In and Out ports that belong to one linecard are marked "One linecard".]
From Two Meshes to One Mesh
[Figure: four linecards, each with In and Out ports at rate R, attached to a first mesh and a second mesh; one linecard is highlighted.]
From Two Meshes to One Mesh
[Figure: the same four linecards attached at rate 2R to a single combined mesh.]
Many Fabric Options
[Figure: linecards, each sending N channels C1, C2, …, CN, each at rate 2R/N, into "any spreading device"; the far side is labeled C1, C2, C3, …, CN. One linecard is highlighted.]
Options
• Space: Full uniform mesh
• Time: Round-robin crossbar
• Wavelength: Static WDM
AWGR (Arrayed Waveguide Grating Router)
A Passive Optical Component
• Wavelength i on input port j goes to output port (i+j-1) mod N
• Can shuffle information from different inputs
[Figure: an N×N AWGR between Linecard 1, Linecard 2, …, Linecard N on each side; each input fiber carries wavelengths λ1, λ2, …, λN.]
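
The routing rule can be written as a one-line function. The sketch assumes ports and wavelengths are numbered 1 to N and that a result of 0 from the modulo stands for port N; the slide leaves that convention implicit, and the function name is illustrative.

```python
def awgr_output(wavelength, in_port, n):
    """Output port (1..n) reached by `wavelength` (1..n) entering on `in_port` (1..n)."""
    out = (wavelength + in_port - 1) % n
    return out if out != 0 else n  # assumption: residue 0 denotes port n


n = 4
for in_port in range(1, n + 1):
    # Each input reaches every output exactly once, using a different wavelength.
    print(in_port, [awgr_output(w, in_port, n) for w in range(1, n + 1)])
```

Each row printed is a permutation of the outputs, so an input can address any output just by choosing which wavelength to transmit on, with no active switching in the AWGR itself.
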
Static WDM Switching: Packaging
[Figure: four linecards (In/Out) connected through an AWGR by N WDM channels, each at rate 2R/N. Fibers on one side are labeled A, A, A, A; B, B, B, B; C, C, C, C; D, D, D, D, and fibers on the other side are each labeled A, B, C, D. The AWGR is passive and uses almost zero power.]
Outline
• Basic idea of load-balancing
• Packet mis-sequencing
• An optical switch fabric
• Scaling number of linecards
Scaling Problem
• For N < 64, an AWGR is a good solution.
• We want N = 640.
• Need to decompose.
A Different Representation of the Mesh
[Figure: linecards (In/Out) at rate R, each attached at rate 2R to a box labeled "Mesh".]
A Different Representation of the Mesh
[Figure: the same linecards with the mesh drawn as individual links, each at rate 2R/N.]
Example: N=8
[Figure: eight linecards numbered 1 to 8, connected by links at rate 2R/8.]
When N is Too Large: Decompose into groups (or racks)
[Figure: linecards 1 to 8 arranged in groups; rate labels 2R, 4R and 4R/4.]
When N is Too Large: Decompose into groups (or racks)
[Figure: Group/Rack 1 through Group/Rack G on each side, each containing linecards 1, 2, …, L attached at rate 2R; each group aggregates 2RL, and links between groups run at 2RL/G.]
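
The rate labels in the grouped design (2R per linecard, 2RL per group, 2RL/G between groups) can be evaluated for a concrete split. The values L = 16 and G = 40 below are hypothetical, chosen only because they multiply to the target of 640 linecards; the slides do not state which split is used, and the function name is illustrative.

```python
def group_rates(r_gbps, linecards_per_group, groups):
    """Return (linecard link, per-group aggregate, group-to-group link) rates in Gb/s."""
    linecard_link = 2 * r_gbps                     # 2R
    group_total = linecard_link * linecards_per_group  # 2RL
    group_to_group = group_total / groups          # 2RL/G
    return linecard_link, group_total, group_to_group


# Hypothetical split of 640 linecards: L = 16 per group, G = 40 groups.
print(group_rates(160, 16, 40))
```

The point of the decomposition is visible in the counts: the fabric between groups needs to connect G groups, not N = L × G linecards.
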
Outline
• Basic idea of load-balancing
• Packet mis-sequencing
• An optical switch fabric
• Scaling number of linecards