Architectural Results in the Optical Router Project

Architectural Results in the Optical Router Project. Da Chuang, Isaac Keslassy, Nick McKeown, High Performance Networking Group, http://klamath.stanford.edu. Internet traffic grows x2 per year, while router capacity grows x2.2 every 18 months (the chart marks a 5x gap). Fast (large) routers: big POPs need big routers.

Presentation Transcript


  1. Architectural Results in the Optical Router Project. Da Chuang, Isaac Keslassy, Nick McKeown, High Performance Networking Group, http://klamath.stanford.edu

  2. [Figure: Internet traffic (x2/yr) vs. router capacity (x2.2/18 months), with a 5x gap marked between them]
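
The slide gives the two growth rates on different time bases. A small sketch that converts them to a common annual basis; the slide does not say over what span the 5x gap is measured, so `years_to_reach_gap` only shows what these two rates alone imply (the function names are illustrative):

```python
import math

# Growth rates as stated on the slide.
TRAFFIC_GROWTH_PER_YEAR = 2.0          # Internet traffic: x2 per year
CAPACITY_GROWTH_PER_18_MONTHS = 2.2    # router capacity: x2.2 every 18 months


def annual_capacity_growth() -> float:
    """Convert x2.2 per 18 months into an equivalent per-year factor."""
    return CAPACITY_GROWTH_PER_18_MONTHS ** (12 / 18)


def gap_after(years: float) -> float:
    """Ratio of traffic growth to capacity growth after `years` years."""
    return (TRAFFIC_GROWTH_PER_YEAR / annual_capacity_growth()) ** years


def years_to_reach_gap(gap: float) -> float:
    """Years until traffic outgrows capacity by a factor of `gap` (rates above only)."""
    per_year = TRAFFIC_GROWTH_PER_YEAR / annual_capacity_growth()
    return math.log(gap) / math.log(per_year)


if __name__ == "__main__":
    print(f"capacity growth per year: {annual_capacity_growth():.2f}x")
    print(f"years to a 5x gap: {years_to_reach_gap(5):.1f}")
```

Under these rates, capacity grows about 1.69x per year, so traffic pulls ahead by roughly 18% a year and a 5x gap takes a little under ten years to open.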

  3. Fast (large) routers • Big POPs need big routers [Figure: a POP built with large routers vs. a POP built with smaller routers] • Interfaces: price >$200k, power >400W • About 50-60% of interfaces are used for interconnection within the POP. • The industry trend is towards a single large router per POP.

  4. 100Tb/s optical router • Objective • To determine the best way to incorporate optics into routers. • Push technology hard to expose new issues. • Photonics, Electronics, System design • Motivating example: The design of a 100 Tb/s Internet router • Challenging but not impossible (~100x current commercial systems) • It identifies some interesting research problems

  5. 100Tb/s optical router [Figure: an optical switch connecting Electronic Linecard #1 through Electronic Linecard #625, with links labeled 40Gb/s, 160Gb/s, and 160-320Gb/s, and an arbitration path carrying requests and grants] • Each linecard performs line termination, IP packet processing, and packet buffering • 100Tb/s = 625 × 160Gb/s

  6. Research Problems • Linecard • Memory bottleneck: Address lookup and packet buffering. • Architecture • Arbitration: Computation complexity. • Switch Fabric • Optics: Fabric scalability and speed, • Electronics: Switch control and link electronics, • Packaging: Three surface problem.

  7. Packet Buffering Problem: packet buffers for a 40Gb/s router linecard [Figure: a buffer manager in front of a 10Gbit buffer memory; write rate R and read rate R, each one 40B packet every 8ns]
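
The 8ns figure follows from the packet size and line rate: 40B × 8 bits ÷ 40Gb/s = 8ns. The slide does not say how the 10Gbit buffer size was chosen; the sketch below assumes the common RTT × R rule of thumb with a hypothetical 250ms round-trip time, which gives 10Gbit at 40Gb/s and 40Gbit at 160Gb/s (consistent with the 40Gbit figure on the 160Gb/s linecard slide):

```python
# Arithmetic sketch for the packet-buffering slide.
# Assumption (not on the slide): buffer size = RTT x line rate, with RTT = 250 ms.

PACKET_BYTES = 40        # minimum-size packet, as on the slide
RTT_SECONDS = 0.25       # hypothetical round-trip time used only for sizing


def packet_time_ns(line_rate_bps: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Time for one packet to arrive (or depart) at the given line rate, in ns."""
    return packet_bytes * 8 / line_rate_bps * 1e9


def buffer_bits(line_rate_bps: float, rtt_s: float = RTT_SECONDS) -> float:
    """Buffer size from the RTT x rate rule of thumb, in bits."""
    return line_rate_bps * rtt_s


for rate_gbps in (40, 160):
    rate = rate_gbps * 1e9
    print(f"{rate_gbps} Gb/s: one 40B packet every {packet_time_ns(rate):g} ns, "
          f"buffer ~{buffer_bits(rate) / 1e9:g} Gbit")
```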

  8. Memory Technology • Use SRAM? + Fast enough random access time, but - too low density to store 10Gbits of data. • Use DRAM? + High density means we can store the data, but - can’t meet the random access time.

  9. Can’t we just use lots of DRAMs in parallel? [Figure: the buffer manager reads and writes 320B blocks striped across eight buffer memories (bytes 0-39, 40-79, …, 280-319); write rate R and read rate R, each one 40B packet every 8ns] • Read/write 320B every 32ns
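
The slide states "320B every 32ns" without the reasoning. One consistent reading, assumed here: eight DRAMs each supply 40B per access, so one parallel access moves a 320B block; at 40Gb/s a 320B block arrives every 64ns and another must depart in the same 64ns, so writes and reads together need one access every 32ns. A sketch of that arithmetic and of the byte-to-DRAM striping (names are illustrative):

```python
# Sketch: striping a 320B block over eight DRAMs and the resulting access period.
# Assumes one shared set of DRAMs serves both writes and reads, as in the figure.

LINE_RATE_BPS = 40e9
NUM_DRAMS = 8
BYTES_PER_DRAM = 40
BLOCK_BYTES = NUM_DRAMS * BYTES_PER_DRAM   # 320B per parallel access


def dram_for_byte(offset: int) -> int:
    """Which DRAM holds byte `offset` of a 320B block (0-39 -> 0, ..., 280-319 -> 7)."""
    if not 0 <= offset < BLOCK_BYTES:
        raise ValueError("offset outside the block")
    return offset // BYTES_PER_DRAM


def access_period_ns(line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Time available per 320B access when writes and reads share the DRAMs."""
    block_time_ns = BLOCK_BYTES * 8 / line_rate_bps * 1e9   # 64 ns at 40 Gb/s
    accesses_per_block_time = 2                               # one write + one read
    return block_time_ns / accesses_per_block_time


print(dram_for_byte(0), dram_for_byte(45), dram_for_byte(319))  # 0 1 7
print(f"one 320B access every {access_period_ns():g} ns")
```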

  10. Works fine if there is only one FIFO [Figure: the buffer manager gathers eight 40B packets into each 320B block, writes each block across the buffer memories (bytes 0-39, 40-79, …, 280-319), and reads 320B blocks back out as 40B packets; write rate R and read rate R, each one 40B packet every 8ns]
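
A sketch of the single-FIFO case, assuming fixed 40B packets and modelling the DRAM as a queue of 320B blocks (timing is ignored). The buffer manager coalesces eight arrivals into one wide write and splits each wide read back into eight packets; `SingleFifoBuffer` is a hypothetical name:

```python
from collections import deque

PACKET_BYTES = 40
PACKETS_PER_BLOCK = 8                      # 8 x 40B = one 320B block
BLOCK_BYTES = PACKET_BYTES * PACKETS_PER_BLOCK


class SingleFifoBuffer:
    """One FIFO: coalesce 40B packets into 320B blocks for wide DRAM accesses."""

    def __init__(self) -> None:
        self.tail = bytearray()            # partial block still being assembled
        self.dram = deque()                # full 320B blocks, oldest first
        self.head = deque()                # packets unpacked from the last block read

    def write(self, packet: bytes) -> None:
        if len(packet) != PACKET_BYTES:
            raise ValueError("this sketch assumes fixed 40B packets")
        self.tail += packet
        if len(self.tail) == BLOCK_BYTES:  # a full block: one wide DRAM write
            self.dram.append(bytes(self.tail))
            self.tail.clear()

    def read(self) -> bytes:
        if not self.head:
            if self.dram:                  # one wide DRAM read yields 8 packets
                block = self.dram.popleft()
            elif self.tail:                # DRAM empty: oldest data is the partial block
                block = bytes(self.tail)
                self.tail.clear()
            else:
                raise IndexError("buffer is empty")
            for i in range(0, len(block), PACKET_BYTES):
                self.head.append(block[i:i + PACKET_BYTES])
        return self.head.popleft()


buf = SingleFifoBuffer()
packets = [bytes([i]) * PACKET_BYTES for i in range(20)]
for p in packets:
    buf.write(p)
print(len(buf.dram))                       # 2 full blocks; 4 packets wait in the tail
print([buf.read() for _ in range(20)] == packets)   # True: FIFO order preserved
```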

  11. In practice, the buffer holds many FIFOs [Figure: FIFOs 1, 2, …, Q, each a chain of 320B blocks stored across the buffer memories (bytes 0-39, 40-79, …, 280-319); write rate R and read rate R, each one 40B packet every 8ns] • e.g., in an IP router, Q might be 200; in an ATM switch, Q might be 10^6. • How can we write multiple packets into different queues?
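
With many FIFOs, packing the next eight arrivals into one block no longer works, because consecutive arrivals usually belong to different queues. A small illustration with a hypothetical round-robin arrival pattern over four queues:

```python
# Sketch: why the single-FIFO trick breaks with many queues.
# Packing whichever eight 40B packets arrive next into one 320B block mixes
# queues, so a read for one queue fetches bytes that belong to others.

PACKET_BYTES = 40
PACKETS_PER_BLOCK = 8


def pack_in_arrival_order(queue_ids):
    """Group arrivals (given as queue IDs) into blocks of eight, in arrival order."""
    return [queue_ids[i:i + PACKETS_PER_BLOCK]
            for i in range(0, len(queue_ids), PACKETS_PER_BLOCK)]


def useful_bytes(block, queue_id):
    """Bytes in a 320B block that actually belong to `queue_id`."""
    return sum(PACKET_BYTES for q in block if q == queue_id)


# Hypothetical round-robin arrivals over Q = 4 queues: 0, 1, 2, 3, 0, 1, ...
arrivals = [t % 4 for t in range(16)]
blocks = pack_in_arrival_order(arrivals)
print(blocks[0])                       # [0, 1, 2, 3, 0, 1, 2, 3]
print(useful_bytes(blocks[0], 0))      # 80 (of 320 bytes) belong to queue 0
```

Reading queue 0's next packet then fetches a full 320B block of which only 80B are useful; with Q in the hundreds or millions and arbitrary traffic, a block can hold as little as one useful 40B packet.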

  12. Hybrid Memory Hierarchy [Figure: a large DRAM holds the body of each of the Q FIFOs; a small tail SRAM caches the FIFO tails and a small head SRAM caches the FIFO heads. Packets arrive into the tail SRAM at rate R, blocks of b bytes are written to and read from the DRAM, and packets depart from the head SRAM at rate R as requested by an arbiter or scheduler.]
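
The data path of the hybrid hierarchy can be sketched as below, working in fixed-size cells with a hypothetical block size `B = 8`. The slides credit the real design with algorithms that minimize the SRAM size; this sketch does not implement them: it refills a head cache on demand and ignores DRAM latency and access-rate limits, so it only shows how cells flow tail SRAM to DRAM to head SRAM while each FIFO keeps its order:

```python
from collections import deque

B = 8  # hypothetical block size in cells: one DRAM access moves B cells of one queue


class HybridFifoBuffer:
    """Q FIFOs: tail SRAM -> DRAM (B-cell blocks) -> head SRAM.

    Replenishment is on demand; DRAM timing and SRAM sizing are not modelled.
    """

    def __init__(self, num_queues: int) -> None:
        self.tail = [deque() for _ in range(num_queues)]   # small tail SRAM caches
        self.dram = [deque() for _ in range(num_queues)]   # FIFO bodies, as B-cell blocks
        self.head = [deque() for _ in range(num_queues)]   # small head SRAM caches

    def write(self, q: int, cell) -> None:
        self.tail[q].append(cell)
        if len(self.tail[q]) == B:                 # one full block for queue q
            self.dram[q].append(list(self.tail[q]))  # single wide DRAM write
            self.tail[q].clear()

    def read(self, q: int):
        if not self.head[q]:
            if self.dram[q]:                       # single wide DRAM read
                self.head[q].extend(self.dram[q].popleft())
            elif self.tail[q]:                     # short queue: bypass the DRAM
                self.head[q].extend(self.tail[q])
                self.tail[q].clear()
            else:
                raise IndexError(f"queue {q} is empty")
        return self.head[q].popleft()


buf = HybridFifoBuffer(num_queues=3)
for t in range(30):
    buf.write(t % 3, t)                    # cell t goes to queue t % 3
print([buf.read(0) for _ in range(10)])    # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```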

  13. 160Gb/s Linecard: Packet Buffering [Figure: a queue manager with on-chip SRAM, 160Gb/s in and 160Gb/s out, backed by several off-chip DRAMs] • Solution • A hybrid solution uses on-chip SRAM and off-chip DRAM. • Identified optimal algorithms that minimize the size of the SRAM (12 Mbits). • Precisely emulates the behavior of a 40 Gbit SRAM. klamath.stanford.edu/~nickm/papers/ieeehpsr2001.pdf

  14. Research Problems (repeats slide 6)

  15. 100Tb/s optical router (repeats slide 5)

  16. The Arbitration Problem • A packet switch fabric is reconfigured for every packet transfer. • At 160Gb/s, a new IP packet can arrive every 2ns. • The configuration is picked to maximize throughput and not waste capacity. • Known algorithms are too slow.
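
The 2ns figure corresponds to a 40B packet at 160Gb/s (40 × 8 bits ÷ 160Gb/s). The back-of-the-envelope sketch below assumes 40B minimum-size packets and a centralized arbiter that sees one virtual output queue (VOQ) per input-output pair; neither assumption is stated on the slide, and the point is only to show the scale of the decision the arbiter would face:

```python
# Back-of-the-envelope arbitration budget for the 100 Tb/s design.
# Assumptions (not from the slide): 40B minimum packets, one VOQ per (input, output).

N = 625                 # linecards, from the architecture slide
LINE_RATE_BPS = 160e9   # per linecard
MIN_PACKET_BYTES = 40

slot_ns = MIN_PACKET_BYTES * 8 / LINE_RATE_BPS * 1e9     # time per configuration
configs_per_second = 1e9 / slot_ns
voqs = N * N                                              # candidate (input, output) pairs

print(f"new configuration every {slot_ns:g} ns ({configs_per_second:.1e} per second)")
print(f"{voqs:,} VOQs to consider per configuration")
```

Under these assumptions, the arbiter would have to pick a new matching among 390,625 VOQs every 2ns, which illustrates why the slide calls known algorithms too slow.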

  17. Cyclic Shift? [Figure: N inputs connected to N outputs by a cyclic shift] • Uniform Bernoulli i.i.d. traffic: 100% throughput • Problem: real traffic is non-uniform
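
The slide names the cyclic shift without defining it. Assuming the usual definition (at slot t, input i connects to output (i + t) mod N) and one virtual output queue per input-output pair, the sketch below compares uniform Bernoulli i.i.d. arrivals with a hypothetical hot-spot pattern; `run`, `uniform`, and `hotspot` are illustrative names:

```python
import random


def cyclic_shift(i: int, t: int, n: int) -> int:
    """Output connected to input i in time slot t (assumed definition)."""
    return (i + t) % n


def run(n, slots, arrivals):
    """Return (offered, delivered) for a single-stage cyclic-shift switch.

    arrivals(t) returns a list of (input, output) cells arriving in slot t.
    """
    voq = [[0] * n for _ in range(n)]      # voq[i][j]: cells at input i for output j
    offered = delivered = 0
    for t in range(slots):
        for i, j in arrivals(t):
            voq[i][j] += 1
            offered += 1
        for i in range(n):                 # each input sends at most one cell per slot
            j = cyclic_shift(i, t, n)
            if voq[i][j]:
                voq[i][j] -= 1
                delivered += 1
    return offered, delivered


N, T, LOAD = 4, 2000, 0.8
rng = random.Random(1)                     # fixed seed so the run is repeatable


def uniform(t):
    """Bernoulli i.i.d. arrivals at each input, destinations uniform over outputs."""
    return [(i, rng.randrange(N)) for i in range(N) if rng.random() < LOAD]


def hotspot(t):
    """Non-uniform: input 0 sends a cell to output 0 in every slot."""
    return [(0, 0)]


off_u, del_u = run(N, T, uniform)
off_h, del_h = run(N, T, hotspot)
print(f"uniform: {del_u / off_u:.3f} of offered cells delivered")
print(f"hotspot: {del_h / off_h:.3f} of offered cells delivered")   # 0.250, i.e. 1/N
```

The hot-spot input is connected to its only output once every N slots, so it gets exactly 1/N of a line's capacity however long the run.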

  18. Two-Stage Switch [Figure: N external inputs, a load-balancing cyclic shift, N internal inputs, a switching cyclic shift, and N external outputs] • 100% throughput for a broad range of traffic types (C.S. Chang et al., 2001)
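
A sketch of the two-stage switch, assuming both stages are cyclic shifts (at slot t, port i connects to (i + t) mod N) and each internal input keeps one FIFO per external output. With a hypothetical hot-spot pattern (input 0 sends to output 0 in every slot), which a single cyclic shift would serve only once every N slots, the first stage spreads the flow over all N internal inputs and nearly every cell gets through:

```python
# Sketch of a two-stage load-balanced switch: both stages are cyclic shifts,
# and each internal input holds one FIFO per external output. Fixed-size cells.
from collections import deque


def run_two_stage(n, slots, arrivals):
    """Return the list of (slot, output, cell) departures.

    arrivals(t) gives a list of (external_input, output, cell) for slot t;
    each external input is assumed to receive at most one cell per slot.
    """
    voq = [[deque() for _ in range(n)] for _ in range(n)]   # voq[internal][output]
    departures = []
    for t in range(slots):
        # Stage 1 (load balancing): external input i -> internal input (i + t) mod n.
        for i, out, cell in arrivals(t):
            voq[(i + t) % n][out].append(cell)
        # Stage 2 (switching): internal input j -> external output (j + t) mod n.
        for j in range(n):
            out = (j + t) % n
            if voq[j][out]:
                departures.append((t, out, voq[j][out].popleft()))
    return departures


N, T = 4, 400
hotspot = lambda t: [(0, 0, t)]          # input 0 -> output 0 every slot (cell id = t)
done = run_two_stage(N, T, hotspot)
print(len(done), "of", T, "cells delivered")
```

Each internal input receives one of the hot-spot cells every N slots and is connected to output 0 once every N slots, so the flow gets close to the full line rate instead of 1/N; only the last cell or so is still in flight when the run ends.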

  19. Problem: mis-sequencing [Figure: two-stage switch (external inputs, cyclic shift, internal inputs, cyclic shift, external outputs) with packets 1 and 2 of one flow taking different paths through the internal inputs]
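
Mis-sequencing appears even in an otherwise empty switch, because consecutive cells of one flow land on different internal inputs and each waits a different time for its internal input to reach the output. A sketch assuming both stages are cyclic shifts (port i connects to (i + t) mod N at slot t) and no competing traffic; `departure_slot` is a hypothetical helper:

```python
# Sketch: why a two-stage cyclic-shift switch can reorder a flow.
# Assumes an otherwise empty switch, so a cell waits only for its internal
# input's next connection to the right output (no queueing behind others).


def departure_slot(i: int, k: int, t: int, n: int) -> int:
    """Slot at which a cell arriving at external input i in slot t leaves output k."""
    j = (i + t) % n              # stage 1: spread to internal input j
    wait = (k - j - t) % n       # stage 2: j reaches output k when (j + s) % n == k
    return t + wait


N = 4
flow = [departure_slot(0, 0, t, N) for t in range(4)]   # cells 0..3 of one flow
order = sorted(range(4), key=lambda c: flow[c])         # departure order of the cells
print(flow)                                             # [0, 3, 2, 5]
print(order)                                            # [0, 2, 1, 3]: cell 2 overtakes cell 1
```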

  20. Preventing Mis-sequencing [Figure: two-stage switch with two cyclic shifts, large congestion buffers, and small coordination buffers running the ‘FFF’ algorithm] • The Full Frames First algorithm: • keeps packets ordered, and • guarantees a delay bound within the optimum. Infocom’02: klamath.stanford.edu/~nickm/papers/infocom02_two_stage.pdf

  21. Conclusions • Packet Buffering • Emulation of SRAM speed with DRAM density • Packet buffer for a 160 Gb/s linecard is feasible • Arbitration • Developed Full Frames First Algorithm • 100% throughput without scheduling

  22. Two-Stage Switch [Figure: external inputs, internal inputs, and external outputs (1 to N), with labels a(t), b(t), and q(t)] • First cyclic shift • Traffic rate • Long-term service opportunities exceed arrivals
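
The bullets on this slide outline the standard throughput argument for the two-stage switch. One way to write it, assuming λ_{ik} is the long-term rate (cells per slot) from external input i to output k, that both stages are cyclic shifts, and that arrivals do not depend on the slot index modulo N (for example, i.i.d. arrivals); the symbols λ^{int}_{jk} and μ_{jk} are introduced here for illustration:

```latex
\begin{aligned}
\text{First cyclic shift:}\quad & \lambda^{\mathrm{int}}_{jk} = \frac{1}{N}\sum_{i=1}^{N} \lambda_{ik}
  \quad \text{for every internal input } j,\\
\text{Service at } (j,k):\quad & \mu_{jk} = \frac{1}{N}
  \quad \text{(internal input } j \text{ meets output } k \text{ once every } N \text{ slots)},\\
\text{Admissible traffic:}\quad & \sum_{i=1}^{N} \lambda_{ik} < 1
  \;\Longrightarrow\; \lambda^{\mathrm{int}}_{jk} < \frac{1}{N} = \mu_{jk}.
\end{aligned}
```

Because every internal queue is offered more service than it receives in arrivals, the queues stay stable without any scheduling decision, which is the sense of "100% throughput without scheduling" in the conclusions.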
