Analysis of a Memory Architecture for Fast Packet Buffers
Sundar Iyer, Ramana Rao Kompella & Nick McKeown
(sundaes, ramana, nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University
http://klamath.stanford.edu
Packet Buffer Architecture

Goal: determine and analyze techniques for building high-speed (>40 Gb/s) electronic packet buffers.

Example: OC768c = 40 Gb/s; RTT × BW = 10 Gb; 64-byte packets.
[Figure: buffer memory with write rate R and read rate R, i.e. one packet every 12.8 ns in each direction.]
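The 12.8 ns figure follows directly from the cell size and the line rate; a quick check, using only the numbers on this slide:

```python
# Time budget per packet at the OC768c line rate.
cell_bits = 64 * 8          # 64-byte packets
line_rate = 40e9            # 40 Gb/s
slot_ns = cell_bits / line_rate * 1e9
print(slot_ns)              # 12.8 ns per packet, for both reads and writes
```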
This is not Possible Today

Use SRAM?
+ fast enough random-access time, but
- too expensive, and
- too low density to store 10 Gb of data.

Use DRAM?
+ high density means we can store the data, but
- too slow (typically 50 ns random-access time).
Characteristics of Packet Buffer Architectures
• The total throughput needed is at least 2 × (ingress rate)
• The buffer size is at least R × RTT
• The buffers consist of one or more FIFOs
• The sequence in which the FIFOs are accessed is determined by an arbiter and is unknown a priori
Memory Hierarchy
[Figure: a large DRAM holds the middle of each of the Q FIFOs. A small ingress SRAM caches the FIFO tails (arriving packets at rate R), and a small egress SRAM caches the FIFO heads (departing packets at rate R). Cells move between SRAM and DRAM in batches: b cells are written to, or read from, DRAM per queue. An arbiter (scheduler) issues the read requests.]
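The hierarchy on this slide can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the class name and structure are our own, and it models a single logical FIFO whose tail lives in ingress SRAM, whose middle lives in DRAM, and whose head lives in egress SRAM, with DRAM touched only in batches of b cells:

```python
from collections import deque

class HierarchicalFifo:
    """Sketch of one FIFO split across the three-level memory hierarchy.

    DRAM is only ever accessed b cells at a time, which is what hides
    its slow random-access time behind a wide, infrequent transfer.
    """
    def __init__(self, b):
        self.b = b
        self.tail = deque()    # ingress SRAM: most recently written cells
        self.middle = deque()  # DRAM: the bulk of the FIFO
        self.head = deque()    # egress SRAM: next cells to be read

    def write(self, cell):
        self.tail.append(cell)
        if len(self.tail) >= self.b:      # a full batch has accumulated:
            for _ in range(self.b):       # one wide DRAM write moves
                self.middle.append(self.tail.popleft())  # b cells at once

    def replenish(self):
        # One wide DRAM read refills the head cache with up to b cells.
        for _ in range(min(self.b, len(self.middle))):
            self.head.append(self.middle.popleft())

    def read(self):
        if not self.head:
            self.replenish()
        if not self.head and self.tail:   # short FIFO: bypass DRAM entirely
            self.head.append(self.tail.popleft())
        return self.head.popleft() if self.head else None

# FIFO order is preserved across the three levels:
q = HierarchicalFifo(b=5)
for i in range(12):
    q.write(i)        # cells 0-9 move to DRAM in two batches; 10, 11 stay in the tail
assert [q.read() for _ in range(12)] == list(range(12))
```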
Some Thoughts
• The buffer architecture itself is well known
• Many buffers are designed to give only statistical guarantees
• We would like to be able to give deterministic guarantees
System Design Parameters

Main parameters:
• SRAM size
• Latency faced by a cell

System parameters:
• I/O bandwidth
• Number of addresses:
  • use a single address on every DRAM, or
  • use different addresses on every DRAM
• Use (or non-use) of DRAM burst mode
• Existence (or non-existence) of bank conflicts
The Six Commandments…
Thou shalt…

Assumptions on system parameters:
• Have no speedup on DRAM
• Have no speedup on I/O (= 2R)
• Use a single address bus for every DRAM
• Read/write cells from the same queue
• Split packets into cells of size "C"
• Use no DRAM burst mode
• Design with no bank conflicts
Symmetry Argument
• The analysis and operation of the ingress and egress buffer architectures are similar
• We shall therefore analyze only the egress buffer architecture
Questions
How large does the SRAM cache need to be:
• to guarantee that a packet is immediately available in SRAM when requested, or
• to guarantee that a packet is available within a maximum bounded time?
What memory management algorithm (MMA) should we use?
A Bad Case for the Queues (Q = 5, w = 9+, b = 5)
[Figure: an animated example over cell slots t = 0 through t = 17, showing an adversarial request pattern that drains a queue's cached cells faster than they can be replenished.]
Result: Single Address Bus
• Impatient arbiter: MDQF-MMA (maximum deficit queue first), with an SRAM buffer of size Qb(2 + ln Q) cells, guarantees zero latency.
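The MDQF-MMA bound is easy to evaluate for concrete parameters; a small check (the parameter values Q = 32 and b = 8 are taken from the implementation-numbers slide later in this deck, and the cell count is rounded here, not a claim about the exact SRAM figures quoted there):

```python
import math

def mdqf_sram_cells(Q, b):
    """SRAM size in cells from the slide's MDQF-MMA bound: Q*b*(2 + ln Q)."""
    return Q * b * (2 + math.log(Q))   # math.log is the natural log

# e.g. Q = 32 queues, batches of b = 8 cells:
print(round(mdqf_sram_cells(32, 8)))  # roughly 1.4k cells of SRAM
```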
Questions
How large does the SRAM cache need to be:
• to guarantee that a packet is immediately available in SRAM when requested, or
• to guarantee that a packet is available within a maximum bounded time?
What memory management algorithm (MMA) should we use? This talk…
Earliest Critical Queue First (ECQF-MMA)
• In SRAM: a dynamic buffer of size Q(b − 1) cells
• Lookahead: the arbiter's requests for the next Q(b − 1) + 1 slots are known
• Compute: find the queue that will run into "trouble" earliest
• Replenish: fetch b cells for that "troubled" queue
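The "compute" step can be sketched as a simulation over the lookahead window. This is a hedged sketch under assumed data structures (the function name, the occupancy dictionary, and the request encoding are ours): walk the known requests in order and report the first queue whose cached cells would run out.

```python
def earliest_critical_queue(occupancy, lookahead):
    """Return (queue, slot) for the first queue that exhausts its SRAM cells.

    occupancy: cells currently cached in egress SRAM, per queue.
    lookahead: the next Q(b-1)+1 arbiter requests, in order.
    """
    remaining = dict(occupancy)
    for t, q in enumerate(lookahead):
        remaining[q] -= 1
        if remaining[q] < 0:   # queue q runs dry at slot t:
            return q, t        # it is the earliest critical queue
    return None, None          # no queue goes critical within the lookahead

# Each decision, the critical queue is the one replenished with b cells:
occupancy = {"red": 2, "green": 0, "blue": 1}
lookahead = ["green", "red", "red", "blue", "red", "green"]
print(earliest_critical_queue(occupancy, lookahead))  # ('green', 0)
```

Since green holds no cached cells and is requested first, it is critical immediately; red, despite being requested most often, has enough cached cells to survive longer.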
Example of ECQF-MMA: Q = 4, b = 4
[Figure: a worked example over cell slots t = 0 through t = 8, showing the lookahead buffer of requests and the SRAM cells at each slot. The green queue is critical at t = 0, the blue queue at t = 4, and the red queue at t = 8; each critical queue is replenished with b cells.]
Result: Single Address Bus
• Patient arbiter: ECQF-MMA (earliest critical queue first) minimizes the SRAM buffer size to Q(b − 1) cells, and guarantees that requested cells are dispatched within Q(b − 1) + 1 cell slots.
Implementation Numbers (64-byte cells, b = 8, DRAM T = 50 ns)

VOQ switch, 32 ports:
• Brute force: egress SRAM = 10 Gb, no DRAM
• Patient arbiter: egress SRAM = 115 kb, latency = 2.9 µs, DRAM = 10 Gb
• Impatient arbiter: egress SRAM = 787 kb, DRAM = 10 Gb
• Patient arbiter (MA): no SRAM, latency = 3.2 µs, DRAM = 10 Gb

VOQ switch, 32 ports, 16 Diffserv classes:
• Brute force: egress SRAM = 10 Gb, no DRAM
• Patient arbiter: egress SRAM = 1.85 Mb, latency = 45.9 µs, DRAM = 10 Gb
• Impatient arbiter: egress SRAM = 18.9 Mb, DRAM = 10 Gb
• Patient arbiter (MA): no SRAM, latency = 51.2 µs, DRAM = 10 Gb
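The patient-arbiter entries follow from the Q(b − 1) bound and the 12.8 ns cell slot. The check below assumes Q = 32 queues for the 32-port case and Q = 32 × 16 = 512 for the Diffserv case; that mapping from ports and classes to queue count is our assumption, not stated on the slide:

```python
CELL_BITS = 64 * 8   # 64-byte cells
SLOT_NS = 12.8       # one cell slot every 12.8 ns at 40 Gb/s
b = 8

def patient_arbiter(Q):
    cells = Q * (b - 1)                        # ECQF-MMA SRAM bound: Q(b-1) cells
    sram_kb = cells * CELL_BITS / 1e3          # egress SRAM, in kilobits
    latency_us = (cells + 1) * SLOT_NS / 1e3   # Q(b-1)+1 cell slots
    return sram_kb, latency_us

print(patient_arbiter(32))    # about 115 kb and 2.9 us: matches the first case
print(patient_arbiter(512))   # about 1.84 Mb and 45.9 us: close to the Diffserv row
```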