
Gigabit Routing on a Software-exposed Tiled-Microprocessor

James W Anderson, Anthony Degangi, Anant Agarwal, Umar Saif. MIT Computer Science and AI Laboratory.


  1. Gigabit Routing on a Software-exposed Tiled-Microprocessor. James W Anderson, Anthony Degangi, Anant Agarwal, Umar Saif. MIT Computer Science and AI Laboratory.

  2. Network Routers. [Diagram: a range from x Kb/sec to x Gb/sec and from ~5 ports to ~10^2 ports; labels: Network “Switch”, Network “Processor”.]

  3. Three Challenges • Performance: 5 to 10 Gb/sec (OC-192) • Architectural scalability: throughput x2.2/year; port count 10 to 100 for edge routers • Programmability: network services (NAT, firewalls, VPN, “Layer 7” switches); monitoring (loss rate, link utilization, traffic patterns)

  4. Network Processors • Conventional wisdom • Tiled “all-purpose” architectures

  5. MIT RAW Microprocessor • Tiled architecture • Low-latency mesh networks • Software-exposed pins • 8 32-bit channels • 2 DOR (dimension-ordered routing) dynamic networks: Memory Dynamic (MDN), General Dynamic (GDN) • 2 static networks: streaming, tile multicast • Registered at input --> longest wire = length of tile. [Tile diagram: compute pipeline with an 8-stage 32b MIPS-style single-issue in-order compute processor, 32 KB IMem, 32 KB DCache, and a 4-stage 32b pipelined FPU; routers and wires for three on-chip mesh networks.]

  6. RAW Microprocessor. [Diagram: RAW features (software-exposed tiled architecture, software-exposed pins, software-exposed point-to-point networks) set against network routing needs (parallel processing, flexible buffering, efficient and scalable switching).]

  7. However ..

  8. IPv4 Router: RFC 1812 • Look-up: DIR-24-8-BASIC [Gupta98] • Header verification • TTL update, header re-compute: incremental checksum [RFC 1141] • Switch to destination
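The header verification and TTL update steps listed on this slide can be sketched as follows. This is a minimal illustration, not the RAW router's code: the function names are hypothetical, the checks are a small subset of what RFC 1812 asks for, and the checksum patch is the add-0x0100-with-end-around-carry shortcut that applies when only the TTL byte changes (RFC 1624 later documented a corner case in which this style of update yields 0xFFFF where a full recompute yields 0x0000; both values still verify).

```python
import struct


def checksum16(data: bytes) -> int:
    """One's-complement Internet checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def verify_header(hdr: bytes) -> bool:
    """Basic sanity checks: version, header length, header checksum."""
    if len(hdr) < 20:
        return False
    version, ihl = hdr[0] >> 4, hdr[0] & 0x0F
    if version != 4 or ihl < 5 or len(hdr) < ihl * 4:
        return False
    # A valid header (checksum field included) sums to all ones.
    return checksum16(hdr[: ihl * 4]) == 0


def decrement_ttl(hdr: bytes) -> bytes:
    """Decrement the TTL (byte 8) and patch the checksum incrementally."""
    if hdr[8] <= 1:
        raise ValueError("TTL expired: packet should be dropped, not forwarded")
    out = bytearray(hdr)
    out[8] -= 1
    # TTL is the high byte of its 16-bit word, so the word drops by 0x0100
    # and the stored one's-complement checksum rises by 0x0100.
    new = struct.unpack("!H", hdr[10:12])[0] + 0x0100
    new = (new & 0xFFFF) + (new >> 16)  # end-around carry
    out[10:12] = struct.pack("!H", new)
    return bytes(out)
```

The point of the incremental form is that the forwarding path touches one 16-bit word instead of re-summing the whole header for every packet.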

  9. Evaluation Methodology • Maximum Loss-Free Forwarding Rate (MLFFR) • Minimum-sized 64-byte packets: millions of packets per second (mpps) • Maximum-sized 1500-byte packets: Gigabit/sec • Captured Internet trace: ~128 bytes • Packet latency • RAW clocked at 425 MHz • Comparison with IXP1200 as a reference point
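The two units used in the methodology above are related by simple arithmetic, which a short sketch makes explicit. The helper names are hypothetical, and the conversion deliberately ignores link-level framing overhead (preamble, inter-frame gap), which the slides do not specify.

```python
def gbps_from_mpps(mpps: float, packet_bytes: int) -> float:
    """Payload bit rate in Gb/s for a given packet rate and packet size."""
    return mpps * 1e6 * packet_bytes * 8 / 1e9


def mpps_from_gbps(gbps: float, packet_bytes: int) -> float:
    """Packet rate in mpps needed to sustain a given bit rate."""
    return gbps * 1e9 / (packet_bytes * 8) / 1e6
```

Under these assumptions, sustaining 10 Gb/s takes about 19.5 mpps with 64-byte packets but only about 0.83 mpps with 1500-byte packets, which is why small packets stress per-packet processing and large packets stress raw bandwidth.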

  10. RAW Router, Take 1: Parallelism. [Diagram: eight line cards; two SRAM/DRAM banks, each holding a packet buffer and lookup tables; tile roles: Header Verify, Lookup (2-stage lookup), Header recompute, Drain FIFO, Interrupt, Drain-tile.]
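The 2-stage lookup in this pipeline corresponds to the DIR-24-8-BASIC scheme cited earlier: a first table indexed by the top 24 address bits, and a second table of 256-entry blocks consulted only for prefixes longer than /24. A minimal sketch follows, with loud assumptions: the class and constant names are hypothetical, Python dicts and lists stand in for the flat memory-resident arrays, next hop 0 means "no route", and the table is built in one batch rather than updated incrementally.

```python
import ipaddress

LONG_FLAG = 0x8000  # top bit set: low 15 bits select a 256-entry second-stage block


class Dir24_8:
    def __init__(self):
        self.tbl24 = {}    # top 24 address bits -> 16-bit entry (absent = no route)
        self.tbllong = []  # second stage, 256 entries per block

    def build(self, routes):
        """routes: iterable of (CIDR string, next hop in 1..0x7FFF)."""
        parsed = [(ipaddress.ip_network(cidr), hop) for cidr, hop in routes]
        # Insert shorter prefixes first so longer ones overwrite them.
        for net, hop in sorted(parsed, key=lambda r: r[0].prefixlen):
            base = int(net.network_address)
            if net.prefixlen <= 24:
                first = base >> 8
                for i in range(first, first + (1 << (24 - net.prefixlen))):
                    self.tbl24[i] = hop
            else:
                idx24 = base >> 8
                entry = self.tbl24.get(idx24, 0)
                if entry & LONG_FLAG:
                    block = entry & 0x7FFF
                else:
                    # New block inherits the covering shorter-prefix hop.
                    block = len(self.tbllong) // 256
                    self.tbllong.extend([entry] * 256)
                    self.tbl24[idx24] = LONG_FLAG | block
                start = block * 256 + (base & 0xFF)
                for i in range(start, start + (1 << (32 - net.prefixlen))):
                    self.tbllong[i] = hop

    def lookup(self, addr: str) -> int:
        a = int(ipaddress.ip_address(addr))
        entry = self.tbl24.get(a >> 8, 0)          # stage 1
        if entry & LONG_FLAG:                      # stage 2 only for long prefixes
            return self.tbllong[(entry & 0x7FFF) * 256 + (a & 0xFF)]
        return entry
```

Every lookup costs one table access, or two for addresses covered by a prefix longer than /24. The price is memory: short prefixes expand into many first-stage entries (a /8 fills 65,536 of them), so this sketch is not suitable for a default route as written.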

  11. Flow of Packets. Legend: L = Lookup, V = Verify, U = Update, D = Drain. [Diagram: eight line cards, two lookup DRAMs, and the L, V, U, D tile pipeline.]

  12. RAW Router, Take 1 • Static network for streaming packets: feed the pipeline; stream the payload to DRAM • General Dynamic Network: header forwarding 3 -> 4 • Memory Dynamic Network: from memory to line card. [Diagram: eight line cards and two SRAM/DRAM banks; legend: Static, MDN, GDN.]

  13. Version I Performance: 1.8 Gb/sec --> 6.17 Gb/sec; 2.9 mpps --> 6.23 mpps

  14. RAW Router Version 1 • Shared buffering • Bus contention • Memory Dynamic Network DOR: x --> y. [Diagram: eight line cards and two SRAM/DRAM banks.]
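The "DOR: x --> y" note refers to dimension-ordered routing: a message travels along X until its column matches the destination, then along Y. Because the route is fixed by the endpoints, traffic converging on one memory port is forced onto the same links. The sketch below illustrates this; the (x, y) tile coordinates and function names are assumptions for illustration, not RAW's actual tile numbering.

```python
def dor_path(src, dst):
    """Tiles visited when routing from src to dst, X dimension first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def shared_links(path_a, path_b):
    """Directed tile-to-tile links used by both paths."""
    def links(path):
        return set(zip(path, path[1:]))
    return links(path_a) & links(path_b)
```

For example, `dor_path((0, 0), (3, 0))` and `dor_path((1, 0), (3, 2))` both cross the links from (1, 0) to (2, 0) and from (2, 0) to (3, 0), so the two flows contend even though their destinations differ.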

  15. RAW Router, Take 2: Buffering and Switching. [Diagram: eight line cards and four SDRAMs; tile roles: Header Verify, Lookup (2-stage lookup), Header recompute, Drain FIFO, Interrupt, Drain-tile.]

  16. RAW Router, Take 2 • Respects DOR • No “bus contention” for DMAs (bottleneck is shared SDRAMs) • 2x memory BW • No need to look at packet length • Dynamic networks for “out-of-band” communication. [Diagram: eight line cards, four SDRAMs, and two lookup tiles; legend: Static, MDN, GDN.]

  17. Optimized buffering and switching: 6.17 Gb/sec --> 8.68 Gb/sec; 6.17 mpps --> 6.77 mpps

  18. RAW Router, Take 3: Reducing Memory Transactions • Streaming DDR • No fragmentation of frames • Pipelined memory requests. [Diagram: eight line cards and SDRAM banks.]

  19. Streaming packet buffers + 64-byte minimum buffering: 8.68 Gb/sec --> 9.57 Gb/sec; 6.77 mpps --> 9.79 mpps

  20. Buffering on line cards: 9.57 Gb/sec --> 15.03 Gb/sec; 9.79 mpps --> 9.79 mpps

  21. All dynamic networks: 9.57 Gb/sec --> 8.50 Gb/sec; 9.57 mpps --> 6.94 mpps
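The before/after pairs on these performance slides are easier to compare as relative changes. A small helper (the name is hypothetical) shows the arithmetic, using the figures exactly as printed on the "All dynamic networks" slide:

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means degradation)."""
    return (after - before) / before * 100.0


# Figures as printed on the "All dynamic networks" slide.
gbps_change = relative_change(9.57, 8.50)   # large-packet throughput
mpps_change = relative_change(9.57, 6.94)   # small-packet forwarding rate
```

With these inputs the throughput drops by roughly 11% and the packet rate by roughly 27%, in line with the 10-30% degradation the conclusions attribute to running without static networks.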

  22. Evaluation with captured Trace

  23. Packet Latency

  24. Conclusions • Tiled architectures = NPU performance + enhanced programmability • RAW’s low-level software control was vital for deriving performance • Layout of routing functions: 30% improvement by altering layout • Role and behavior of the on-chip networks: 15% improvement by using GDN and static networks in place of MDN

  25. Conclusions • Network-oblivious: 30-35% degradation • No static networks: 10-30% degradation • Buffering on line cards: 35% improvement

  26. Thank you! Questions: umar@mit.edu
