Statistical Analysis of Packet Buffer Architectures
Gireesh Shrimali, Isaac Keslassy, Nick McKeown
E-mail: gireesh@stanford.edu
Packet Buffering
[Figure: input and output line cards, each with a memory and scheduler running at line rate R, alongside a shared memory buffer]
• Big: for TCP to work well, the buffers need to hold one RTT (about 0.25 s) of data.
• Fast: the buffer needs to store (retrieve) packets as fast as they arrive (depart).
An Example: Packet Buffers for a 40 Gb/s Line Card
[Figure: a buffer manager sits between the write path and the read path of a 10 Gbit buffer memory]
• Write rate R: one 40 B packet every 8 ns.
• Read rate R: one 40 B packet every 8 ns; scheduler requests cause random accesses.
• The problem is solved if a memory can be randomly accessed every 4 ns and can store 10 Gbits of data.
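The arithmetic behind these numbers can be checked in a few lines of Python (a sketch; the constants are the slides' own 40 Gb/s line rate, 0.25 s RTT, and 40 B minimum packet):

```python
# Back-of-the-envelope check of the 40 Gb/s line-card numbers.
line_rate_bps = 40e9      # 40 Gb/s line rate
rtt_s = 0.25              # one RTT of buffering for TCP
packet_bits = 40 * 8      # minimum-size 40 B packet

buffer_bits = line_rate_bps * rtt_s                 # 10 Gbits of storage
packet_time_ns = packet_bits / line_rate_bps * 1e9  # 8 ns per packet
access_time_ns = packet_time_ns / 2                 # one write + one read
                                                    # per slot -> 4 ns access
print(buffer_bits, packet_time_ns, access_time_ns)
```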
Key Question
How can we design high-speed packet buffers from commodity memories?
Available Memory Technology • Use SRAM? +Fast enough random access time, but • Too low density to store 10Gbits of data. • Use DRAM? +High density means we can store data, but • Can’t meet random access time.
Can't we just use lots of DRAMs in parallel?
[Figure: the buffer manager stripes each 40 B packet across eight DRAM buffer memories as a 320 B wide word; write rate R and read rate R, one 40 B packet every 8 ns; reads are driven by scheduler requests]
Works fine if there is only one FIFO queue
[Figure: the buffer manager (on-chip SRAM) collects arriving 40 B packets into 320 B batches and transfers them to the DRAMs as single wide accesses; write rate R and read rate R, one 40 B packet every 8 ns; reads are driven by scheduler requests]
• Aggregate 320 B for the queue in fast SRAM, then read and write all DRAMs in parallel.
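A minimal sketch of this single-queue scheme, assuming B = 8 banks of 40 B cells (the names `tail_sram` and `dram_banks` are illustrative, not from the slides):

```python
# Sketch of the single-FIFO case: aggregate B cells in on-chip SRAM,
# then issue one wide write that puts one cell into each DRAM bank.
B = 8             # 8 banks x 40 B cells = one 320 B wide access
CELL_BYTES = 40

tail_sram = []                        # on-chip aggregation buffer
dram_banks = [[] for _ in range(B)]   # one cell per bank per wide write

def write_cell(cell):
    """Buffer an arriving cell; flush a full batch of B cells in parallel."""
    tail_sram.append(cell)
    if len(tail_sram) == B:
        for bank, c in zip(dram_banks, tail_sram):
            bank.append(c)            # one access per bank, all in parallel
        tail_sram.clear()

for i in range(16):                   # 16 cells -> two wide writes
    write_cell(f"cell{i}")

print([len(b) for b in dram_banks])   # each bank holds 2 cells
```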
In practice, the buffer holds many FIFOs
• e.g. in an IP router, Q might be 200; in an ATM switch, Q might be 10^6.
• We don't know which head-of-line packet the scheduler will request next.
[Figure: Q FIFOs of 320 B batches behind one buffer manager (on-chip SRAM); write rate R and read rate R, one 40 B packet every 8 ns; reads are driven by scheduler requests]
Parallel Packet Buffer: Hybrid Memory Hierarchy
[Figure: arriving packets enter at rate R through the buffer manager (an ASIC with on-chip SRAM); a small tail SRAM caches the FIFO tails, a large DRAM holds the bodies of the Q FIFOs, and a small head SRAM caches the FIFO heads, feeding departing packets at rate R under scheduler requests. B cells are written to and read from DRAM at a time, where B = degree of parallelism.]
Objective
• We would like to minimize the size of the SRAM while providing reasonable guarantees.
• So we ask the following question: if the designer is willing to tolerate a certain drop probability, how small can the SRAM get?
Memory Management Algorithm
• Algorithm: at every service opportunity, serve a FIFO from the set of FIFOs with occupancy greater than or equal to B.
• This is B-work-conserving, and thus minimizes the SRAM size.
• Round-robin performs as well as largest-FIFO-first.
• Some definitions:
• FIFO occupancy counter: L(i,t)
• Sum of occupancies: L(t)
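The algorithm above can be sketched as follows. This is a hedged illustration, not the authors' implementation: a round-robin scan serves the first FIFO whose occupancy L(i,t) has reached B, moving one batch of B cells to DRAM.

```python
from collections import deque

B = 4                                 # batch size = DRAM parallelism
Q = 3                                 # number of FIFOs (illustrative)
sram = [deque() for _ in range(Q)]    # tail caches; L(i,t) = len(sram[i])
dram = [[] for _ in range(Q)]         # batches already moved to DRAM
_rr = 0                               # round-robin pointer

def service_opportunity():
    """Serve one FIFO whose occupancy has reached B, scanning in
    round-robin order (B-work-conserving); return its index or None."""
    global _rr
    for k in range(Q):
        i = (_rr + k) % Q
        if len(sram[i]) >= B:
            batch = [sram[i].popleft() for _ in range(B)]
            dram[i].append(batch)     # one wide write of B cells
            _rr = (i + 1) % Q
            return i
    return None                       # no eligible FIFO: stay idle

sram[0].extend("abcde")               # FIFO 0 holds 5 cells (>= B)
sram[1].extend("xyz")                 # FIFO 1 holds 3 cells (< B)
print(service_opportunity())          # serves FIFO 0
print(service_opportunity())          # nothing eligible -> None
```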
Model
[Figure: Q arrival streams A(1,t), …, A(Q,t) superposed into A(t), feeding a queue of occupancy L(t) with departures D(t)]
• Model the SRAM as a queue:
• Arrival process A(t) is the superposition of Q sources A(i,t) with given rates
• Deterministic service at rate 1
• The queue is stable, i.e., the total arrival rate is less than 1
• Approach: assume the A(i,t) are independent of each other
• Step 1: analyze for IID sources
• Step 2: show that the IID case is the worst case
• Tools used: analysis in the continuous-time domain
Fixed Batch Decomposition
[Figure: each arrival stream A(i,t) splits into a quotient workload B·MA(i,t), made of full batches of B cells, and a remainder workload R(i,t) of fewer than B cells; summing over the Q FIFOs decomposes the total into a quotient workload B·ML(t) and a remainder workload R(t), with batch departures B·MD(t)]
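The decomposition is just integer division per FIFO: each occupancy splits into a batch-aligned quotient part and a remainder of fewer than B cells. A small illustration (the occupancy values are made up):

```python
B = 4
occupancies = [7, 12, 3, 9]   # example L(i,t) for Q = 4 FIFOs (made up)

quotient = [l // B * B for l in occupancies]   # batch-aligned part
remainder = [l % B for l in occupancies]       # leftover 0..B-1 cells

print(quotient, remainder)    # [4, 12, 0, 8] and [3, 0, 3, 1]
```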
Assumptions A(i,t) are • independent of each other • stationary and ergodic • simple point processes
PDF of SRAM Occupancy
• Theorem: the quotient workload and the remainder workload are independent of each other.
• Thus, the distribution of SRAM occupancy is the convolution of the distributions of the quotient and remainder workloads.
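A minimal numeric illustration of the convolution step, with two toy PMFs standing in for the quotient and remainder distributions (the values are hypothetical):

```python
def convolve(p, q):
    """Discrete convolution of two PMFs given as lists indexed by occupancy."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

quotient_pmf = [0.7, 0.2, 0.1]              # toy quotient-workload PMF
remainder_pmf = [0.25, 0.25, 0.25, 0.25]    # toy remainder PMF (uniform)

occupancy_pmf = convolve(quotient_pmf, remainder_pmf)
print(occupancy_pmf)   # total SRAM-occupancy distribution, sums to 1
```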
PDF of Remainder Workload
• Theorem: for large Q, the PDF of the remainder workload approaches a Gaussian distribution with mean Q(B−1)/2 and variance Q(B²−1)/12.
• Intuition: an application of the central limit theorem, since each FIFO contributes a remainder between 0 and B−1 cells.
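A quick Monte Carlo sanity check of these moments, under the model that each FIFO's remainder is uniform on {0, …, B−1} (an assumption consistent with the decomposition above, not a statement taken from the slides):

```python
import random
import statistics

random.seed(1)
Q, B = 1024, 4
TRIALS = 2000

# Sum of Q independent uniform remainders; CLT says this is near-Gaussian.
samples = [sum(random.randrange(B) for _ in range(Q)) for _ in range(TRIALS)]

mean_theory = Q * (B - 1) / 2        # 1536
var_theory = Q * (B**2 - 1) / 12     # 1280

print(statistics.mean(samples), mean_theory)
print(statistics.variance(samples), var_theory)
```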
PDF of Quotient Workload
• Theorem [Cao, Ramanan INFOCOM 2002]: for large Q, the behavior of the quotient FIFO approaches the behavior of an M/D/1 queue with the same load.
• Numerical solution through recurrence relations:
• depends only on the load
• independent of Q and B
• close to an impulse at low loads
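The M/D/1 stationary distribution can indeed be computed by recurrence. Below is the textbook M/G/1 embedded-chain recursion specialized to deterministic service (a sketch, not necessarily the exact recurrence used by Cao and Ramanan); note the result depends only on the load ρ.

```python
import math

def md1_pmf(rho, n_max):
    """Stationary queue-length PMF of an M/D/1 queue via the standard
    M/G/1 embedded-chain recursion; a[k] = P(k Poisson arrivals during
    one deterministic service time) = e^-rho * rho^k / k!."""
    a = [math.exp(-rho) * rho**k / math.factorial(k) for k in range(n_max + 1)]
    p = [1.0 - rho]                   # p_0 = 1 - load
    for n in range(n_max):
        s = p[n] - p[0] * a[n] - sum(p[k] * a[n + 1 - k] for k in range(1, n + 1))
        p.append(s / a[0])
    return p

pmf = md1_pmf(rho=0.5, n_max=30)
mean_len = sum(n * pn for n, pn in enumerate(pmf))
print(sum(pmf), mean_len)   # ~1, and ~rho + rho^2/(2(1-rho)) = 0.75
```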
PDF of Buffer Occupancy
[Figure: PDF of buffer occupancy for Q = 1024, B = 4; mean Q(B−1)/2 = 1536]
Simulations (load = 0.9)
[Figure: complementary CDF of buffer occupancy for Q = 1024, B = 4, load = 0.9]
• Theory upper-bounds the simulations.
Conclusions
• Established exact bounds relating the drop probability to the SRAM size.
• The model may be applicable to many queueing systems with batch service.
• Compared to deterministic guarantees ([Iyer, McKeown HPSR 2001]), an improvement of at most a factor of two.
• O(QB) is a hard lower bound for this architecture.