Analysis of a Memory Architecture for Fast Packet Buffers
Sundar Iyer, Ramana Rao Kompella & Nick McKeown
(sundaes, ramana, nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University
http://klamath.stanford.edu
Packet Buffer Architecture

Goal: determine and analyze techniques for building high-speed (>40 Gb/s) electronic packet buffers.

Example: OC768c = 40 Gb/s; RTT × BW = 10 Gb; 64-byte packets.
[Figure: buffer memory with write rate R and read rate R, i.e. one packet every 12.8 ns in each direction.]
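The 12.8 ns figure follows directly from the cell size and the line rate; a quick check, using only the numbers on this slide:

```python
# Time budget per packet at the OC768c line rate.
cell_bits = 64 * 8          # 64-byte packets
line_rate = 40e9            # 40 Gb/s
slot_ns = cell_bits / line_rate * 1e9
print(slot_ns)              # 12.8 ns per packet, for both reads and writes
```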
This is not Possible Today

Use SRAM?
+ fast enough random-access time, but
- too expensive, and
- too low density to store 10 Gb of data.

Use DRAM?
+ high density means we can store the data, but
- too slow (typically 50 ns random-access time).
Characteristics of Packet Buffer Architectures
• The total throughput needed is at least 2 × (ingress rate)
• The buffer size is at least R × RTT
• The buffers consist of one or more FIFOs
• The sequence in which the FIFOs are accessed is determined by an arbiter and is unknown a priori
Memory Hierarchy
[Figure: a large DRAM holds the middle of each of the Q FIFOs. A small ingress SRAM caches the FIFO tails (arriving packets at rate R), and a small egress SRAM caches the FIFO heads (departing packets at rate R). Cells move between SRAM and DRAM in batches: b cells are written to, or read from, DRAM per queue. An arbiter (scheduler) issues the read requests.]
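The hierarchy on this slide can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the class name and structure are our own, and it models a single logical FIFO whose tail lives in ingress SRAM, whose middle lives in DRAM, and whose head lives in egress SRAM, with DRAM touched only in batches of b cells:

```python
from collections import deque

class HierarchicalFifo:
    """Sketch of one FIFO split across the three-level memory hierarchy.

    DRAM is only ever accessed b cells at a time, which is what hides
    its slow random-access time behind a wide, infrequent transfer.
    """
    def __init__(self, b):
        self.b = b
        self.tail = deque()    # ingress SRAM: most recently written cells
        self.middle = deque()  # DRAM: the bulk of the FIFO
        self.head = deque()    # egress SRAM: next cells to be read

    def write(self, cell):
        self.tail.append(cell)
        if len(self.tail) >= self.b:      # a full batch has accumulated:
            for _ in range(self.b):       # one wide DRAM write moves
                self.middle.append(self.tail.popleft())  # b cells at once

    def replenish(self):
        # One wide DRAM read refills the head cache with up to b cells.
        for _ in range(min(self.b, len(self.middle))):
            self.head.append(self.middle.popleft())

    def read(self):
        if not self.head:
            self.replenish()
        if not self.head and self.tail:   # short FIFO: bypass DRAM entirely
            self.head.append(self.tail.popleft())
        return self.head.popleft() if self.head else None

# FIFO order is preserved across the three levels:
q = HierarchicalFifo(b=5)
for i in range(12):
    q.write(i)        # cells 0-9 move to DRAM in two batches; 10, 11 stay in the tail
assert [q.read() for _ in range(12)] == list(range(12))
```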
Some Thoughts
• The buffer architecture itself is well known
• Many buffers are designed to give only statistical guarantees
• We would like to be able to give deterministic guarantees
System Design Parameters

Main parameters:
• SRAM size
• Latency faced by a cell

System parameters:
• I/O bandwidth
• Number of addresses:
  • use a single address on every DRAM, or
  • use different addresses on every DRAM
• Use (or non-use) of DRAM burst mode
• Existence (or non-existence) of bank conflicts
The Six Commandments…
Thou shalt…

Assumptions on system parameters:
• Have no speedup on DRAM
• Have no speedup on I/O (= 2R)
• Use a single address bus for every DRAM
• Read/write cells from the same queue
• Split packets into cells of size "C"
• Use no DRAM burst mode
• Design with no bank conflicts
Symmetry Argument
• The analysis and operation of the ingress and egress buffer architectures are similar
• We shall therefore analyze only the egress buffer architecture
Questions
How large does the SRAM cache need to be:
• to guarantee that a packet is immediately available in SRAM when requested, or
• to guarantee that a packet is available within a maximum bounded time?
What memory management algorithm (MMA) should we use?
A Bad Case for the Queues (Q = 5, w = 9+, b = 5)
[Figure: an animated example over cell slots t = 0 through t = 17, showing an adversarial request pattern that drains a queue's cached cells faster than they can be replenished.]
Result: Single Address Bus
• Impatient arbiter: MDQF-MMA (maximum deficit queue first), with an SRAM buffer of size Qb(2 + ln Q) cells, guarantees zero latency.
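The MDQF-MMA bound is easy to evaluate for concrete parameters; a small check (the parameter values Q = 32 and b = 8 are taken from the implementation-numbers slide later in this deck, and the cell count is rounded here, not a claim about the exact SRAM figures quoted there):

```python
import math

def mdqf_sram_cells(Q, b):
    """SRAM size in cells from the slide's MDQF-MMA bound: Q*b*(2 + ln Q)."""
    return Q * b * (2 + math.log(Q))   # math.log is the natural log

# e.g. Q = 32 queues, batches of b = 8 cells:
print(round(mdqf_sram_cells(32, 8)))  # roughly 1.4k cells of SRAM
```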
Questions
How large does the SRAM cache need to be:
• to guarantee that a packet is immediately available in SRAM when requested, or
• to guarantee that a packet is available within a maximum bounded time?
What memory management algorithm (MMA) should we use? This talk…
Earliest Critical Queue First (ECQF-MMA)
• In SRAM: a dynamic buffer of size Q(b − 1) cells
• Lookahead: the arbiter's requests for the next Q(b − 1) + 1 slots are known
• Compute: find the queue that will run into "trouble" earliest
• Replenish: fetch b cells for that "troubled" queue
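The "compute" step can be sketched as a simulation over the lookahead window. This is a hedged sketch under assumed data structures (the function name, the occupancy dictionary, and the request encoding are ours): walk the known requests in order and report the first queue whose cached cells would run out.

```python
def earliest_critical_queue(occupancy, lookahead):
    """Return (queue, slot) for the first queue that exhausts its SRAM cells.

    occupancy: cells currently cached in egress SRAM, per queue.
    lookahead: the next Q(b-1)+1 arbiter requests, in order.
    """
    remaining = dict(occupancy)
    for t, q in enumerate(lookahead):
        remaining[q] -= 1
        if remaining[q] < 0:   # queue q runs dry at slot t:
            return q, t        # it is the earliest critical queue
    return None, None          # no queue goes critical within the lookahead

# Each decision, the critical queue is the one replenished with b cells:
occupancy = {"red": 2, "green": 0, "blue": 1}
lookahead = ["green", "red", "red", "blue", "red", "green"]
print(earliest_critical_queue(occupancy, lookahead))  # ('green', 0)
```

Since green holds no cached cells and is requested first, it is critical immediately; red, despite being requested most often, has enough cached cells to survive longer.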
Example of ECQF-MMA: Q = 4, b = 4
[Figure: a worked example over cell slots t = 0 through t = 8, showing the lookahead buffer of requests and the SRAM cells at each slot. The green queue is critical at t = 0, the blue queue at t = 4, and the red queue at t = 8; each critical queue is replenished with b cells.]
Result: Single Address Bus
• Patient arbiter: ECQF-MMA (earliest critical queue first) minimizes the SRAM buffer size to Q(b − 1) cells, and guarantees that requested cells are dispatched within Q(b − 1) + 1 cell slots.
Implementation Numbers (64-byte cells, b = 8, DRAM T = 50 ns)

VOQ switch, 32 ports:
• Brute force: egress SRAM = 10 Gb, no DRAM
• Patient arbiter: egress SRAM = 115 kb, latency = 2.9 µs, DRAM = 10 Gb
• Impatient arbiter: egress SRAM = 787 kb, DRAM = 10 Gb
• Patient arbiter (MA): no SRAM, latency = 3.2 µs, DRAM = 10 Gb

VOQ switch, 32 ports, 16 Diffserv classes:
• Brute force: egress SRAM = 10 Gb, no DRAM
• Patient arbiter: egress SRAM = 1.85 Mb, latency = 45.9 µs, DRAM = 10 Gb
• Impatient arbiter: egress SRAM = 18.9 Mb, DRAM = 10 Gb
• Patient arbiter (MA): no SRAM, latency = 51.2 µs, DRAM = 10 Gb
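The patient-arbiter entries follow from the Q(b − 1) bound and the 12.8 ns cell slot. The check below assumes Q = 32 queues for the 32-port case and Q = 32 × 16 = 512 for the Diffserv case; that mapping from ports and classes to queue count is our assumption, not stated on the slide:

```python
CELL_BITS = 64 * 8   # 64-byte cells
SLOT_NS = 12.8       # one cell slot every 12.8 ns at 40 Gb/s
b = 8

def patient_arbiter(Q):
    cells = Q * (b - 1)                        # ECQF-MMA SRAM bound: Q(b-1) cells
    sram_kb = cells * CELL_BITS / 1e3          # egress SRAM, in kilobits
    latency_us = (cells + 1) * SLOT_NS / 1e3   # Q(b-1)+1 cell slots
    return sram_kb, latency_us

print(patient_arbiter(32))    # about 115 kb and 2.9 us: matches the first case
print(patient_arbiter(512))   # about 1.84 Mb and 45.9 us: close to the Diffserv row
```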