Statistical Analysis of Packet Buffer Architectures
Gireesh Shrimali, Isaac Keslassy, Nick McKeown
E-mail: gireesh@stanford.edu
Packet Buffering
[Figure: input and output line cards, each with a memory and scheduler running at line rate R, alongside a shared memory buffer]
• Big: for TCP to work well, the buffers need to hold one RTT (about 0.25 s) of data.
• Fast: the buffer needs to store (retrieve) packets as fast as they arrive (depart).
An Example: Packet Buffers for a 40 Gb/s Line Card
[Figure: a buffer manager sits between the write path and the read path of a 10 Gbit buffer memory]
• Write rate R: one 40 B packet every 8 ns.
• Read rate R: one 40 B packet every 8 ns; scheduler requests cause random accesses.
• The problem is solved if a memory can be randomly accessed every 4 ns and can store 10 Gbits of data.
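The arithmetic behind these numbers can be checked in a few lines of Python (a sketch; the constants are the slides' own 40 Gb/s line rate, 0.25 s RTT, and 40 B minimum packet):

```python
# Back-of-the-envelope check of the 40 Gb/s line-card numbers.
line_rate_bps = 40e9      # 40 Gb/s line rate
rtt_s = 0.25              # one RTT of buffering for TCP
packet_bits = 40 * 8      # minimum-size 40 B packet

buffer_bits = line_rate_bps * rtt_s                 # 10 Gbits of storage
packet_time_ns = packet_bits / line_rate_bps * 1e9  # 8 ns per packet
access_time_ns = packet_time_ns / 2                 # one write + one read
                                                    # per slot -> 4 ns access
print(buffer_bits, packet_time_ns, access_time_ns)
```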
Key Question
How can we design high-speed packet buffers from commodity memories?
Available Memory Technology • Use SRAM? +Fast enough random access time, but • Too low density to store 10Gbits of data. • Use DRAM? +High density means we can store data, but • Can’t meet random access time.
Can't we just use lots of DRAMs in parallel?
[Figure: the buffer manager stripes each 40 B packet across eight DRAM buffer memories as a 320 B wide word; write rate R and read rate R, one 40 B packet every 8 ns; reads are driven by scheduler requests]
Works fine if there is only one FIFO queue
[Figure: the buffer manager (on-chip SRAM) collects arriving 40 B packets into 320 B batches and transfers them to the DRAMs as single wide accesses; write rate R and read rate R, one 40 B packet every 8 ns; reads are driven by scheduler requests]
• Aggregate 320 B for the queue in fast SRAM, then read and write all DRAMs in parallel.
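A minimal sketch of this single-queue scheme, assuming B = 8 banks of 40 B cells (the names `tail_sram` and `dram_banks` are illustrative, not from the slides):

```python
# Sketch of the single-FIFO case: aggregate B cells in on-chip SRAM,
# then issue one wide write that puts one cell into each DRAM bank.
B = 8             # 8 banks x 40 B cells = one 320 B wide access
CELL_BYTES = 40

tail_sram = []                        # on-chip aggregation buffer
dram_banks = [[] for _ in range(B)]   # one cell per bank per wide write

def write_cell(cell):
    """Buffer an arriving cell; flush a full batch of B cells in parallel."""
    tail_sram.append(cell)
    if len(tail_sram) == B:
        for bank, c in zip(dram_banks, tail_sram):
            bank.append(c)            # one access per bank, all in parallel
        tail_sram.clear()

for i in range(16):                   # 16 cells -> two wide writes
    write_cell(f"cell{i}")

print([len(b) for b in dram_banks])   # each bank holds 2 cells
```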
In practice, the buffer holds many FIFOs
• e.g. in an IP router, Q might be 200; in an ATM switch, Q might be 10^6.
• We don't know which head-of-line packet the scheduler will request next.
[Figure: Q FIFOs of 320 B batches behind one buffer manager (on-chip SRAM); write rate R and read rate R, one 40 B packet every 8 ns; reads are driven by scheduler requests]
Parallel Packet Buffer: Hybrid Memory Hierarchy
[Figure: arriving packets enter at rate R through the buffer manager (an ASIC with on-chip SRAM); a small tail SRAM caches the FIFO tails, a large DRAM holds the bodies of the Q FIFOs, and a small head SRAM caches the FIFO heads, feeding departing packets at rate R under scheduler requests. B cells are written to and read from DRAM at a time, where B = degree of parallelism.]
Objective
• We would like to minimize the size of the SRAM while providing reasonable guarantees.
• So we ask the following question: if the designer is willing to tolerate a certain drop probability, how small can the SRAM get?
Memory Management Algorithm
• Algorithm: at every service opportunity, serve a FIFO from the set of FIFOs with occupancy greater than or equal to B.
• This is B-work-conserving, and thus minimizes the SRAM size.
• Round-robin performs as well as largest-FIFO-first.
• Some definitions:
• FIFO occupancy counter: L(i,t)
• Sum of occupancies: L(t)
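The algorithm above can be sketched as follows. This is a hedged illustration, not the authors' implementation: a round-robin scan serves the first FIFO whose occupancy L(i,t) has reached B, moving one batch of B cells to DRAM.

```python
from collections import deque

B = 4                                 # batch size = DRAM parallelism
Q = 3                                 # number of FIFOs (illustrative)
sram = [deque() for _ in range(Q)]    # tail caches; L(i,t) = len(sram[i])
dram = [[] for _ in range(Q)]         # batches already moved to DRAM
_rr = 0                               # round-robin pointer

def service_opportunity():
    """Serve one FIFO whose occupancy has reached B, scanning in
    round-robin order (B-work-conserving); return its index or None."""
    global _rr
    for k in range(Q):
        i = (_rr + k) % Q
        if len(sram[i]) >= B:
            batch = [sram[i].popleft() for _ in range(B)]
            dram[i].append(batch)     # one wide write of B cells
            _rr = (i + 1) % Q
            return i
    return None                       # no eligible FIFO: stay idle

sram[0].extend("abcde")               # FIFO 0 holds 5 cells (>= B)
sram[1].extend("xyz")                 # FIFO 1 holds 3 cells (< B)
print(service_opportunity())          # serves FIFO 0
print(service_opportunity())          # nothing eligible -> None
```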
Model
[Figure: Q arrival streams A(1,t), …, A(Q,t) superposed into A(t), feeding a queue of occupancy L(t) with departures D(t)]
• Model the SRAM as a queue:
• Arrival process A(t) is the superposition of Q sources A(i,t) with given rates
• Deterministic service at rate 1
• The queue is stable, i.e., the total arrival rate is less than 1
• Approach: assume the A(i,t) are independent of each other
• Step 1: analyze for IID sources
• Step 2: show that the IID case is the worst case
• Tools used: analysis in the continuous-time domain
Fixed Batch Decomposition
[Figure: each arrival stream A(i,t) splits into a quotient workload B·MA(i,t), made of full batches of B cells, and a remainder workload R(i,t) of fewer than B cells; summing over the Q FIFOs decomposes the total into a quotient workload B·ML(t) and a remainder workload R(t), with batch departures B·MD(t)]
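The decomposition is just integer division per FIFO: each occupancy splits into a batch-aligned quotient part and a remainder of fewer than B cells. A small illustration (the occupancy values are made up):

```python
B = 4
occupancies = [7, 12, 3, 9]   # example L(i,t) for Q = 4 FIFOs (made up)

quotient = [l // B * B for l in occupancies]   # batch-aligned part
remainder = [l % B for l in occupancies]       # leftover 0..B-1 cells

print(quotient, remainder)    # [4, 12, 0, 8] and [3, 0, 3, 1]
```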
Assumptions A(i,t) are • independent of each other • stationary and ergodic • simple point processes
PDF of SRAM Occupancy
• Theorem: the quotient workload and the remainder workload are independent of each other.
• Thus, the distribution of SRAM occupancy is the convolution of the distributions of the quotient and remainder workloads.
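A minimal numeric illustration of the convolution step, with two toy PMFs standing in for the quotient and remainder distributions (the values are hypothetical):

```python
def convolve(p, q):
    """Discrete convolution of two PMFs given as lists indexed by occupancy."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

quotient_pmf = [0.7, 0.2, 0.1]              # toy quotient-workload PMF
remainder_pmf = [0.25, 0.25, 0.25, 0.25]    # toy remainder PMF (uniform)

occupancy_pmf = convolve(quotient_pmf, remainder_pmf)
print(occupancy_pmf)   # total SRAM-occupancy distribution, sums to 1
```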
PDF of Remainder Workload
• Theorem: for large Q, the PDF of the remainder workload approaches a Gaussian distribution with mean Q(B−1)/2 and variance Q(B²−1)/12.
• Intuition: an application of the central limit theorem, since each FIFO contributes a remainder between 0 and B−1 cells.
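A quick Monte Carlo sanity check of these moments, under the model that each FIFO's remainder is uniform on {0, …, B−1} (an assumption consistent with the decomposition above, not a statement taken from the slides):

```python
import random
import statistics

random.seed(1)
Q, B = 1024, 4
TRIALS = 2000

# Sum of Q independent uniform remainders; CLT says this is near-Gaussian.
samples = [sum(random.randrange(B) for _ in range(Q)) for _ in range(TRIALS)]

mean_theory = Q * (B - 1) / 2        # 1536
var_theory = Q * (B**2 - 1) / 12     # 1280

print(statistics.mean(samples), mean_theory)
print(statistics.variance(samples), var_theory)
```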
PDF of Quotient Workload
• Theorem [Cao, Ramanan INFOCOM 2002]: for large Q, the behavior of the quotient FIFO approaches the behavior of an M/D/1 queue with the same load.
• Numerical solution through recurrence relations:
• depends only on the load
• independent of Q and B
• close to an impulse at low loads
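The M/D/1 stationary distribution can indeed be computed by recurrence. Below is the textbook M/G/1 embedded-chain recursion specialized to deterministic service (a sketch, not necessarily the exact recurrence used by Cao and Ramanan); note the result depends only on the load ρ.

```python
import math

def md1_pmf(rho, n_max):
    """Stationary queue-length PMF of an M/D/1 queue via the standard
    M/G/1 embedded-chain recursion; a[k] = P(k Poisson arrivals during
    one deterministic service time) = e^-rho * rho^k / k!."""
    a = [math.exp(-rho) * rho**k / math.factorial(k) for k in range(n_max + 1)]
    p = [1.0 - rho]                   # p_0 = 1 - load
    for n in range(n_max):
        s = p[n] - p[0] * a[n] - sum(p[k] * a[n + 1 - k] for k in range(1, n + 1))
        p.append(s / a[0])
    return p

pmf = md1_pmf(rho=0.5, n_max=30)
mean_len = sum(n * pn for n, pn in enumerate(pmf))
print(sum(pmf), mean_len)   # ~1, and ~rho + rho^2/(2(1-rho)) = 0.75
```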
PDF of Buffer Occupancy
[Figure: PDF of buffer occupancy for Q = 1024, B = 4; mean Q(B−1)/2 = 1536]
Simulations (load = 0.9)
[Figure: complementary CDF of buffer occupancy for Q = 1024, B = 4, load = 0.9]
• Theory upper-bounds the simulations.
Conclusions
• Established exact bounds relating the drop probability to the SRAM size.
• The model may be applicable to many queueing systems with batch service.
• Compared to deterministic guarantees ([Iyer, McKeown HPSR 2001]), an improvement of at most a factor of two.
• O(QB) is a hard lower bound for this architecture.