IP routers with memory that runs slower than the line rate

Nick McKeown, Assistant Professor of Electrical Engineering and Computer Science, Stanford University. nickm@stanford.edu, http://www.stanford.edu/~nickm


Presentation Transcript


  1. IP routers with memory that runs slower than the line rate
     Nick McKeown, Assistant Professor of Electrical Engineering and Computer Science, Stanford University
     nickm@stanford.edu
     http://www.stanford.edu/~nickm

  2. Outline
     • Trends in packet switch design
     • Additional problem: “Data rates may soon exceed memory bandwidth”
     • The Fork-Join Router & Parallel Packet Switches

  3. First Packet Switches: Shared Memory
     [Figure: Inputs 1..N and Outputs 1..N sharing one buffer memory.]
     Large, single, dynamically allocated memory buffer: N writes per “cell” time and N reads per “cell” time. Limited by memory bandwidth.
     Numerous works have proven and made possible:
     • Fairness
     • Delay Guarantees
     • Delay Variation Control
     • Loss Guarantees
     • Statistical Guarantees
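The "limited by memory bandwidth" point on slide 3 can be made concrete with a rough sketch. With N writes and N reads per cell time, the shared buffer has to sustain about 2·N·R, where R is the per-port line rate. The function name and the port count and line rate used below are illustrative assumptions, not figures from the talk.

```python
def shared_memory_bandwidth_gbps(num_ports: int, line_rate_gbps: float) -> float:
    """Bandwidth a single shared buffer must sustain.

    Each cell time the memory absorbs one write per input and
    one read per output, i.e. 2 * N accesses at line rate R.
    """
    return 2 * num_ports * line_rate_gbps


# Hypothetical example: 16 ports at 2.5 Gb/s each.
print(shared_memory_bandwidth_gbps(16, 2.5))
```

The requirement grows linearly in both the port count and the line rate, which is why this design stops scaling first.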

  4. Later Packet Switches: Single-stage crossbar with CIOQ and VOQs
     [Figure: linecards, each with Lookup & Drop Policy, Virtual Output Queues, and Output Scheduling, connected through a bufferless switch core (Switch Fabric and Switch Arbitration).]
     • 1 write per “cell” time
     • 1 read per “cell” time
     • Rate of writes/reads determined by switch fabric speedup

  5. Myths about CIOQ-based crossbar switches
     1. “Input-queued crossbars have low throughput”
        • An input-queued crossbar can have as high throughput as any switch.
     2. “Crossbars don’t support multicast traffic well”
        • A crossbar inherently supports multicast efficiently.
     3. “Crossbars don’t scale well”
        • Today, it is the number of chip I/Os, not the number of crosspoints, that limits the size of a switch fabric. Expect 5Tb/s crossbar switches.

  6. Myths about CIOQ-based crossbar switches (2)
     4. “Crossbar switches can’t support delay/QoS guarantees”
        • With an internal speedup of 2, a CIOQ switch can (in theory) precisely emulate a shared memory switch for all traffic.

  7. What makes sense today?

  8. Summary of trend
     [Figure: a shared-memory switch (Inputs 1..N, Outputs 1..N) next to a switch fabric with switch arbitration.]
     • Shared memory: limited by memory bandwidth, ~50Gb/s
     • Switch fabric with arbitration: limited by per-cell arbitration and power, ~5Tb/s
     Higher capacity:
     1. Multistage: Clos, Banyan, Toroidal…
     2. Less frequent arbitration

  9. How Fast Can I Make a Packet Buffer?
     [Figure: External Line (e.g. OC768c), 64-byte wide bus, Buffer Memory (10ns on-chip DRAM), 64-byte wide bus, Switch Fabric.]
     Rough estimate:
     • 10ns per memory operation.
     • Two memory operations per packet.
     • Therefore, maximum ~26Gb/s.
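The rough estimate on slide 9 can be reproduced as a short calculation. This is a sketch under the slide's own assumptions (64-byte wide bus, 10ns per memory operation, one write plus one read per packet); the function and parameter names are made up for illustration.

```python
def max_buffer_rate_gbps(bus_width_bytes: int = 64,
                         access_time_ns: float = 10.0,
                         ops_per_packet: int = 2) -> float:
    """Upper bound on the line rate a single packet buffer can sustain.

    Every 64-byte chunk is written once and read once, so it
    occupies the memory for ops_per_packet * access_time_ns.
    Bits per nanosecond is numerically the same as Gb/s.
    """
    bits_per_access = bus_width_bytes * 8
    return bits_per_access / (access_time_ns * ops_per_packet)


print(max_buffer_rate_gbps())  # 512 bits / 20 ns
```

The result, 25.6 Gb/s, is the slide's "~26Gb/s" and is below the roughly 40 Gb/s of an OC768c line, which is the motivation for the question on the next slide.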

  10. How can we make routers with 40Gb/s, 160Gb/s,… interfaces?

  11. Higher capacity and higher linerates
     [Figure: a shared-memory switch (Inputs 1..N, Outputs 1..N) next to a switch fabric with switch arbitration.]
     • Shared memory: limited by memory bandwidth, ~50Gb/s
     • Switch fabric with arbitration: limited by per-cell arbitration and power, ~5Tb/s
     Higher capacity:
     1. Multistage Switch Fabric
     2. Less frequent arbitration
     Higher linerates:
     3. More parallelism: Fork-Join Router

  12. Fork-Join Router
     How can we:
     • Increase capacity.
     • Reduce power per subsystem.
     While at the same time…
     • Keep the system simple.
     • Support line rates faster than memory bandwidth.
     • Provide delay guarantees.
     Increase parallelism. Multiple racks. Single-stage buffering. Pkt-by-pkt load balancing. Hmmm….?

  13. The Fork-Join Router
     [Figure: ports 1..N at rate R on each side, with k parallel routers (1..k) between them; label: Bufferless.]

  14. The Fork-Join Router
     Advantages:
     • Single stage of buffering
     • k ↑ ⇒ power per subsystem ↓
     • k ↑ ⇒ memory bandwidth ↓
     • k ↑ ⇒ forwarding table lookup rate ↓

  15. The Fork-Join Router
     Questions:
     • Switching: What is the performance?
     • Forwarding Lookups: How do they work?

  16. A Parallel Packet Switch
     [Figure: ports 1..N at rate R on each side, with k parallel Output Queued Switches (1..k) between them. Arriving packet tagged with egress port.]

  17. Performance Questions
     • Can it be work-conserving?
     • Can it emulate a single big output queued switch?
     • Can it support delay guarantees, strict-priorities, WFQ, …?

  18. Work Conservation
     [Figure: one input and one output at rate R, connected to k Output Queued Switches (1..k) by internal links at rate R/k. Labels: Input Link Constraint, Output Link Constraint.]

  19. Work Conservation
     [Figure: example with cells numbered 1 to 5 spread across layers 1..k over internal links at rate R/k. Label: Output Link Constraint.]

  20. Work Conservation
     [Figure: ports 1..N at rate R on each side, with k Output Queued Switches (1..k) between them; internal links run at rate S(R/k).]
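The input link constraint from slides 18 to 20 can be sketched as a small model. It assumes slotted time with at most one cell arriving per slot, and that an internal link at rate S(R/k) is busy for ceil(k/S) slots per cell; the class and method names are hypothetical and this is not the dispatch algorithm from the talk, only an illustration of which layers an input may choose from.

```python
import math


class Demultiplexer:
    """One external input of a parallel packet switch (illustrative model)."""

    def __init__(self, k: int, speedup: float = 1.0):
        self.k = k
        # A link at rate S*R/k needs ceil(k/S) cell times to carry one cell.
        self.hold = math.ceil(k / speedup)
        # First time slot at which each internal link is idle again.
        self.free_at = [0] * k

    def available_layers(self, t: int) -> list[int]:
        """Layers whose internal link is idle at time slot t."""
        return [layer for layer in range(self.k) if self.free_at[layer] <= t]

    def send(self, t: int, layer: int) -> None:
        """Send the cell arriving at slot t to the given layer."""
        if self.free_at[layer] > t:
            raise ValueError("input link constraint violated")
        self.free_at[layer] = t + self.hold


# With k = 3 and S = 1, a busy input quickly has exactly one choice per slot.
demux = Demultiplexer(k=3, speedup=1.0)
for t in range(4):
    choices = demux.available_layers(t)
    print(t, choices)
    demux.send(t, choices[0])
```

With S = 1 the input has no freedom once all links are in use, which is one way to see why the theorems that follow ask for speedup or extra buffering.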

  21. Precise Emulation of an Output Queued Switch
     [Figure: an N×N Parallel Packet Switch =? an N×N Output Queued Switch.]

  22. Parallel Packet Switch Theorems
     1. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can be work-conserving for all traffic.
     2. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.

  23. Parallel Packet Switch Theorems
     3. If S > 3k/(k+3) ≈ 3 then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
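The speedup thresholds in Theorems 1 to 3 depend on the number of layers k and approach 2 and 3 only as k grows. A short sketch evaluating the two expressions from the slides (the function names and the sample values of k are illustrative):

```python
def fcfs_speedup_bound(k: int) -> float:
    """Threshold 2k/(k+2) from Theorems 1 and 2 (S must exceed it)."""
    return 2 * k / (k + 2)


def qos_speedup_bound(k: int) -> float:
    """Threshold 3k/(k+3) from Theorem 3 (S must exceed it)."""
    return 3 * k / (k + 3)


for k in (2, 10, 100):
    print(k, round(fcfs_speedup_bound(k), 3), round(qos_speedup_bound(k), 3))
```

For small k the thresholds are noticeably lower than the limiting values, for example 2k/(k+2) = 1 at k = 2.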

  24. Parallel Packet Switch Theorems
     4. If S ≥ 1 then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a FCFS switch for all traffic.

  25. Co-ordination buffers
     [Figure: parallel packet switch with k Output Queued Switches (1..k), external ports at rate R and internal links at rate R/k; two co-ordination buffers, each labeled Size Nk.]

  26. Parallel Packet Switch Theorems
     5. If S > 2 then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.

  27. The Fork-Join Router
     Questions:
     • Switching: What is the performance?
     • Forwarding Lookups: How do they work?

  28. The Fork-Join Router: Lookahead Forwarding Table Lookups
     • Packet tagged with egress port at next router
     • Lookup performed in parallel at rate R/k

  29. The Fork-Join Router
     [Figure: ports 1..N at rate R on each side, with k parallel routers (1..k) between them.]
     • Possibly >100Tb/s aggregate capacity
     • Linerates in excess of 100Gb/s
