Fast Buffer Memory with Deterministic Packet Departures
Mayank Kabra, Siddhartha Saha, Bill Lin
University of California, San Diego
Packet Buffer in Routers
• Incoming linecards have 40 bytes at 40 Gb/s = 8 ns to read and write a packet.
• Routers need to store packets to deal with congestion.
• Bandwidth × RTT = 40 Gb/s × 250 ms = 10 Gb of buffer.
• This is too big to store in SRAM, hence the need for DRAM.
• Problem: DRAM access time is ~40 ns. With one write and one read per 8 ns slot, that is roughly a 10x speed difference.
[Figure: linecards (In/Out) around a router core containing the scheduler and packet buffers.]
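The arithmetic on this slide can be checked with a minimal sketch. The function names are hypothetical, chosen for illustration only; the inputs (40-byte packets, 40 Gb/s, 250 ms RTT) come from the slide.

```python
def slot_time_ns(packet_bytes: int, line_rate_gbps: int) -> float:
    """Time to transfer one packet at the line rate, in nanoseconds."""
    # bits / (Gbit/s) gives nanoseconds directly
    return packet_bytes * 8 / line_rate_gbps


def buffer_bits(line_rate_gbps: int, rtt_ms: int) -> int:
    """Bandwidth x RTT rule of thumb, in bits."""
    # (Gbit/s) * ms = 1e9 * 1e-3 = 1e6 bits per unit
    return line_rate_gbps * rtt_ms * 10**6


if __name__ == "__main__":
    print(slot_time_ns(40, 40))                # slot time in ns
    print(buffer_bits(40, 250))                # buffer size in bits
    print(buffer_bits(40, 250) // (40 * 8))    # buffer size in 40-byte packets
```

With these inputs the slot time is 8 ns and the buffer is 10^10 bits, i.e., 10 Gb.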
Parallel and Interleaved DRAM Banks
• Assume the speed difference is 3x.
[Figure: packets (P) moving between an SRAM and several parallel DRAM banks.]
Problems with Parallelism
• The access pattern can create problems.
• If we access packets 3, 6, 9 and 11 one after another, we can issue interleaved read requests and read those packets out at line speed.
[Figure: packets 1 to 14 stored across the DRAM banks.]
Problems with Parallelism
• But accessing packets 2 & 3 or 10 & 11 in succession is problematic.
• This is an example of a bank conflict.
[Figure: packets 1 to 14 stored across the DRAM banks.]
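A bank conflict can be illustrated with a small sketch. The bank layout below is hypothetical (the figure's actual layout is not recoverable here); it only assumes that packets 2 and 3 share a bank, that 10 and 11 share a bank, and that 3, 6, 9 and 11 sit in different banks. A bank is taken to be busy for b = 3 slots per access, matching the 3x speed difference assumed earlier.

```python
def find_bank_conflicts(access_order, bank_of, b):
    """Return (earlier, later) packet pairs that hit the same bank
    less than b time slots apart, one access issued per slot."""
    last_access = {}   # bank -> (slot, packet) of the most recent access
    conflicts = []
    for slot, pkt in enumerate(access_order):
        bank = bank_of[pkt]
        if bank in last_access and slot - last_access[bank][0] < b:
            conflicts.append((last_access[bank][1], pkt))
        last_access[bank] = (slot, pkt)
    return conflicts


# Hypothetical layout: packet id -> bank index
bank_of = {2: 0, 3: 0, 6: 1, 9: 2, 10: 3, 11: 3}

if __name__ == "__main__":
    print(find_bank_conflicts([3, 6, 9, 11], bank_of, b=3))  # interleaved, no conflict
    print(find_bank_conflicts([2, 3], bank_of, b=3))         # same bank back to back
    print(find_bank_conflicts([10, 11], bank_of, b=3))       # same bank back to back
```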
Use the Packet Departure Time
• In a wide class of routers (e.g., crossbar routers), packet departures are determined by the scheduler on the fly.
• Packet buffers that cater to these routers exist, but they are complex.
• In other high-performance routers, such as Switch-Memory-Switch and Load-Balanced Routers, the packet departure time can be calculated when the packet is inserted in the buffer.
Solution idea: use the known departure times of the packets to schedule them to different DRAM banks such that there are no conflicts.
Packet Buffer Abstraction
• Fixed-size packets; time is slotted (example: 40 Gb/s, 40-byte packets => 8 ns per slot).
• The buffer may contain an arbitrarily large number of logical queues, but with deterministic access.
• Single-write, single-read, time-deterministic packet buffer model.
Packet Buffer Architecture
• Interleaved memory architecture with K slower DRAM banks.
• A single memory read or write operation takes b time slots.
• b consecutive time slots form a frame.
• A time slot t belongs to frame ⌈t/b⌉.
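The slot-to-frame mapping can be sketched as follows, assuming time slots are numbered from 1 (as in the later slides, where slots 1 to 3 form frame 1 and slots 58 to 60 form frame 20 for b = 3). The function name is hypothetical.

```python
def frame_of(t: int, b: int) -> int:
    """Frame index of 1-based time slot t when a frame spans b slots.

    Computes ceil(t / b) using integer arithmetic only.
    """
    return -(-t // b)


if __name__ == "__main__":
    for t in (1, 3, 4, 58, 60):
        print(t, frame_of(t, 3))
```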
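Packet Buffer Operation
[Figure: arriving packets are aggregated and written to DRAM banks 1, 2, ..., K-1, K; packets are read back b at a time, de-aggregated, and sent out as departing packets. An SRAM bypass buffer is shown alongside the DRAM banks.]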
Packet Arrival [Frame 1]
• Assume b = 3.
• Packets P1, P2 and P3 arrive in time slots 1, 2 and 3 respectively.
• They are aggregated before being written to the DRAM.
[Figure: packets P1, P2 and P3 and DRAM banks 1 to 5.]
Packet Arrival [Frame 2]
• Packets P1, P2 and P3 are written to DRAM banks 1, 2 and 3 during frame 2.
• New packets P4, P5 and P6 arrive and are stored in the buffer.
[Figure: P1, P2 and P3 in DRAM banks 1, 2 and 3; P4, P5 and P6 waiting in the buffer.]
Packet Departure [Frame 19]
• Packets P58, P59 and P60 are scheduled to depart at time slots 58, 59 and 60 respectively (frame 20).
• They are read from the DRAM banks one frame before their departure frame, i.e., during frame 19.
[Figure: P58, P59 and P60 being read from the DRAM banks.]
Packet Departure [Frame 20]
• Packets P58, P59 and P60 are read from the buffer and output from the switch at time slots 58, 59 and 60 respectively.
[Figure: P58, P59 and P60 leaving the buffer.]
SRAM Bypass Buffer
• The operational model dictates that the minimum round-trip latency to write a packet to, and read it back from, one of the DRAM banks is 4 frames.
• Thus, a packet with a departure time less than 4b-1 time slots away cannot be stored in DRAM.
• A small amount of SRAM (size 4b) is used as a bypass buffer for such packets.
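The bypass rule can be sketched as a one-line test. This is an illustration only: the function name is hypothetical, and the threshold of 4b-1 slots is taken directly from the slide rather than derived independently.

```python
def uses_bypass(arrival_slot: int, departure_slot: int, b: int) -> bool:
    """True if the packet departs too soon for a DRAM round trip
    and should go through the SRAM bypass buffer instead."""
    # Threshold from the slide: less than 4b - 1 slots until departure
    return departure_slot - arrival_slot < 4 * b - 1


if __name__ == "__main__":
    print(uses_bypass(1, 5, b=3))    # departs 4 slots after arrival
    print(uses_bypass(1, 58, b=3))   # departs 57 slots after arrival
```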
Number of DRAM Banks
• Arrival write conflicts: at any current frame f, at most b packets (including the current packet) will be written to the DRAM banks.
• Hence, for each packet, there are at most b-1 "arrival write conflicts".
[Figure: arriving packets (P) being written to the DRAM banks.]
Number of DRAM Banks
• Arrival read conflicts: at any current frame f, at most b packets will be read from the DRAM banks. Those b banks are busy in the current frame and are unavailable.
• Hence, for each packet, there are at most b "arrival read conflicts".
[Figure: packets (P) being read from the DRAM banks.]
Number of DRAM Banks
• Departure read conflicts: any packet written in the current frame f will eventually need to be read in a future frame d for departure. At that future frame d, there are b-1 other departing packets.
• Hence, for each packet, there are at most b-1 "departure read conflicts".
[Figure: packets (P) departing from the DRAM banks in frame d.]
How Many DRAM Banks?
• Total conflicts:
  • Arrival write: (b-1)
  • Arrival read: b
  • Departure read: (b-1)
• Hence, at most (3b-2) conflicts in total.
• If the number of banks is more than (3b-2), i.e., at least (3b-1), there is always a free bank for every packet.
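The bank count follows from adding the three conflict bounds and leaving one bank free. A minimal sketch, with a hypothetical function name:

```python
def min_banks(b: int) -> int:
    """Smallest number of DRAM banks that always leaves one bank free."""
    arrival_write = b - 1    # other packets written in the current frame
    arrival_read = b         # banks busy with reads in the current frame
    departure_read = b - 1   # other packets read in the departure frame
    return arrival_write + arrival_read + departure_read + 1


if __name__ == "__main__":
    for b in (3, 5, 10):
        print(b, min_banks(b))
```

For b = 3 this gives 8 banks, i.e., 3b-1.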
DRAM Bank Selection
• To find a compatible memory bank, maintain a two-dimensional read-transaction bitmap R.
• Each row corresponds to a frame slot.
• Each column corresponds to a DRAM bank (hence 3b-1 columns).
• R(f, m) denotes whether the m-th DRAM bank holds an already stored packet that must be read at the f-th frame slot.
DRAM Bank Selection
• Maintain a write-reservation bitmap W of size (3b-1).
• W(m) denotes that, in the current frame, the m-th memory bank has been assigned an arriving packet.
DRAM Bank Selection
• Approach: a greedy solution that avoids the three types of conflicts.
• To check whether memory bank m is compatible with a packet p arriving in frame f and departing in frame d:
  • Check NOT(W(m) | R(f, m) | R(d, m)).
• Instead of checking one memory bank at a time, we can check all of them at once:
  • V = NOT(W | R(f) | R(d)), where R(f) and R(d) are row vectors of R.
• From V, get the index of the first compatible memory bank.
• If n is the bank selected for p, then set W(n) = 1 and R(d, n) = 1.
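The greedy selection above can be sketched with integer bitmasks. This is an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, each row of R is stored as an integer in a dictionary keyed by frame (rather than a full two-dimensional array), and W is cleared explicitly at the start of each frame.

```python
class BankSelector:
    """Greedy DRAM bank selection using W and R bitmaps (sketch)."""

    def __init__(self, b: int):
        self.num_banks = 3 * b - 1
        self.full_mask = (1 << self.num_banks) - 1
        self.W = 0    # write-reservation bitmap for the current frame
        self.R = {}   # frame -> read-transaction row, as an int bitmask

    def start_frame(self) -> None:
        """Clear write reservations when a new frame begins."""
        self.W = 0

    def select(self, f: int, d: int) -> int:
        """Pick a bank for a packet arriving in frame f, departing in frame d."""
        # V = NOT(W | R(f) | R(d)), restricted to the existing banks
        V = ~(self.W | self.R.get(f, 0) | self.R.get(d, 0)) & self.full_mask
        if V == 0:
            raise RuntimeError("no compatible bank")
        n = (V & -V).bit_length() - 1   # index of the lowest set bit of V
        self.W |= 1 << n                # W(n) = 1
        self.R[d] = self.R.get(d, 0) | (1 << n)   # R(d, n) = 1
        return n


if __name__ == "__main__":
    sel = BankSelector(b=3)
    print([sel.select(f=2, d=20) for _ in range(3)])  # three packets in frame 2
    sel.start_frame()
    print(sel.select(f=20, d=30))  # arrival frame collides with reads in frame 20
```

Isolating the lowest set bit with `V & -V` stands in for "get the index of the first compatible memory"; a hardware design would more likely use a priority encoder.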
Size of the Bitmap
• The size of the packet buffer is T packets, i.e., T is the farthest departure time slot relative to the current time slot.
• Farthest departure frame: ⌈T/b⌉.
• Each row in the bitmap is (3b-1) bits, so the size of the bitmap is (3b-1) × ⌈T/b⌉ bits.
• Assuming an RTT of 250 ms and a line rate of 40 Gb/s, the packet buffer holds about T = 3 × 10^7 packets, which makes the bitmap size close to 11 MB.
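The bitmap size can be estimated with a short sketch. It assumes one row per frame up to the farthest departure frame ⌈T/b⌉ and (3b-1) bits per row. The slide does not state the value of b behind the 11 MB figure, so b = 10 below is an assumption for illustration only.

```python
def bitmap_bits(T: int, b: int) -> int:
    """Read-transaction bitmap size in bits: one (3b-1)-bit row per frame."""
    rows = -(-T // b)          # ceil(T / b) frames
    return (3 * b - 1) * rows


if __name__ == "__main__":
    T = 30_000_000             # packets, from the slide
    b = 10                     # assumed, not given on the slide
    bits = bitmap_bits(T, b)
    print(bits, "bits")
    print(bits / 8 / 1e6, "MB")
```

Because the size is roughly 3T bits for any reasonable b, the result stays near 11 MB regardless of the exact value chosen.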
Additional Details
• Location of a packet in the DRAM:
  • Once a bank has been selected, we need a way to assign the actual memory location to write the packet to and, later, read it from.
  • Determine the memory location from the departure frame, using circular indexing to map a frame to a packet location in the memory.
• How to reorder/de-aggregate the packets?
  • Store the timestamp in the DRAM with the packet.
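One way circular indexing could work is sketched below. The details are assumptions, not taken from the slides: each bank is treated as an array with one packet location per frame, `num_frames` is the number of frames the buffer spans, and the location is the departure frame modulo that number. This relies on a bank holding at most one packet per departure frame, which the conflict-free assignment is meant to guarantee.

```python
def packet_location(departure_frame: int, num_frames: int) -> int:
    """Location inside the selected bank for a packet departing in the
    given frame (hypothetical circular mapping)."""
    return departure_frame % num_frames


if __name__ == "__main__":
    num_frames = 8   # assumed buffer depth in frames, for illustration
    for d in (5, 8, 13, 21):
        print(d, packet_location(d, num_frames))
```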
Conclusion
• Developed a simple packet buffer architecture for routers in which packet departure times are known, e.g., Switch-Memory-Switch and Load-Balanced Routers.
• It can support an arbitrarily large number of logical queues.
• The number of DRAM banks and the size of the SRAM bypass buffer depend only on the physical parameters.