Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks
Authors: Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally (Stanford University)
Presenter: Han Liu (University of California, San Diego)
Background
• NoCs become huge
• Hundreds of cores on a single die
• Currently using: input-queued routers
• Input buffer resources become significant
• Input buffer sharing is attractive in NoCs
• Pros: improves area and power efficiency
• Cons: facilitates spread of congestion
Overview
• Adaptive Backpressure mitigates performance degradation by avoiding unproductive use of buffer space in the presence of congestion
• Avoids the downsides of buffer sharing while retaining its benefits in the benign case
Motivation
• Assumption: buffers are good
• More flexible routing
• Let traffic wait closer to the destination
• Is this always true?
• Energy, area efficiency
• Implementation difficulty
Train Example
[Figure: train route with San Diego (source), Denver (buffer), Boston (destination)]
• Buffers are good
Motivation
• Static vs. dynamic buffer management
[Figure: buffer shared by VC1 and VC2. Static: part of the buffer is wasted. Dynamic: no wasted buffer]
Dynamic Buffer Management
• Buffer space is an expensive resource in NoCs
• 30-35% of network power (MIT RAW, UT TRIPS)
• Dynamic management increases utilization by sharing buffer space among multiple VCs
• Optimizes use of expensive buffer resources
• Decreases incremental cost of VCs
• Improves area and power efficiency
• 25% more throughput or 34% less power [Nicopoulos’06]
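
The difference between static and dynamic management can be sketched as a toy model in Python. This is an illustration only, not the authors' implementation: the function names and the per-VC demand list are hypothetical, the first-come-first-served sharing policy is an assumption, and the 16-slot, 4-VC sizes are taken from the evaluation setup later in the deck.

```python
def static_allocation(total_slots, demand):
    """Each VC owns a fixed, equal partition; unused slots cannot be lent out."""
    per_vc = total_slots // len(demand)
    return [min(d, per_vc) for d in demand]


def dynamic_allocation(total_slots, demand):
    """All VCs draw from one shared pool, first come first served (assumed policy)."""
    free = total_slots
    granted = []
    for d in demand:
        g = min(d, free)   # a VC takes what it wants, up to what is left
        granted.append(g)
        free -= g
    return granted


# Hypothetical demand in flits for VC0..VC3: one busy VC, one light, two idle.
demand = [10, 2, 0, 0]
static = static_allocation(16, demand)    # [4, 2, 0, 0]
dynamic = dynamic_allocation(16, demand)  # [10, 2, 0, 0]
```

With the same 16 slots, the static partition buffers only 6 of the 12 demanded flits, while the shared pool buffers all 12. The same model also hints at the downside: a demand of `[20, 4, 0, 0]` yields `[16, 0, 0, 0]` under sharing, so one VC can take every slot.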
Sharing
• Pros: economical, efficient
• Cons: inconvenient, can cause trouble
Border Example
[Figure: US, HWY 5, HWY 805, Mexico]
Buffer Monopolization
• Blocked flits from congested VC accumulate in buffer
• Effective buffer size reduced for other VCs
• Performance degradation (latency / throughput)
• Congestion spreads across VCs (flows / apps / VMs / …)
[Figure: shared buffer occupancy of VC 0 and VC 1]
Adaptive Backpressure
Goal:
• Avoid unproductive use of buffer space in dynamic buffer management
• But allow sharing when beneficial
Approach:
• Match arrival and departure rate for each VC by regulating credit availability (backpressure)
• Derive quota from credit round trip times
Buffer Monopolization
• Need a way to limit the otherwise unlimited credit supply to the congested VC 1
• Give VC 0 more credits and buffer space
[Figure: shared buffer occupancy of VC 0 and VC 1]
Quota Motivation (1)
[Figure: timing diagrams of flit and credit exchange between Router 0 and Router 1 over time, marking the credit round trip time Tcrt,0, credit stalls, and idle cycles]
• Without congestion, full throughput requires Tcrt,0 credits
• Insufficient credit supply causes idle cycles downstream
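
The first claim follows from a common idealized model of credit-based flow control: each credit can be reused at most once per credit round trip, so a sender holding fewer credits than the round trip time in cycles must idle part of the time. A minimal sketch under that assumption (the function name and the example numbers are hypothetical, not from the slides):

```python
def credit_limited_throughput(credits, t_crt):
    """Upper bound on link throughput in flits per cycle.

    Idealized model: every credit is reusable once per credit round trip
    of t_crt cycles, and the link carries at most one flit per cycle.
    """
    if credits < 0 or t_crt <= 0:
        raise ValueError("credits must be >= 0 and t_crt must be > 0")
    return min(1.0, credits / t_crt)


# Assumed round trip of 8 cycles:
full = credit_limited_throughput(8, 8)     # 1.0, no idle cycles
starved = credit_limited_throughput(4, 8)  # 0.5, idle half of the time
```

Credits beyond the round trip time add nothing in this model: 16 credits on the same link still give a throughput of 1.0.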
Quota Motivation (2)
[Figure: timing diagrams between Router 0 and Router 1 over time under congestion, marking the round trip time Tcrt,0+Tstall, congestion stalls, queuing stalls, credit stalls, and excess flits being drained]
• Congestion stall causes unproductive buffer occupancy
• Matching stalls avoids unproductive buffer occupancy
Quota Algorithm
• VC’s quota value = Throughput * RTTmin
• Throughput of upstream router is hard to measure
• -> Compute quota values based on observed RTT for individual credits
Quota Heuristic
• Track credit RTT for each output VC
• RTT = RTTmin ⇒ set quota to RTTmin
• No downstream congestion
• Allow one flit in each cycle of RTT interval
• RTT > RTTmin ⇒ subtract difference from RTTmin
• Each congestion and queuing stall adds to RTT
• Allow one credit stall per downstream stall
Quota Equation
• $Q = \max\left(T_{crt,base} - (T_{crt,obs} - T_{crt,base}),\ 1\right) = \max\left(2\,T_{crt,base} - T_{crt,obs},\ 1\right)$
• When $T_{crt,obs}$ is large, $Q$ is small
• $Q_{min} = 1$ in order to guarantee that quota values can continue to be updated
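
The quota equation translates directly into code. The sketch below is illustrative only: the function name is hypothetical and the base round trip time of 8 cycles is an assumed value, not a figure from the slides.

```python
def compute_quota(t_crt_base, t_crt_obs):
    """Q = max(2 * Tcrt,base - Tcrt,obs, 1), all values in cycles."""
    return max(2 * t_crt_base - t_crt_obs, 1)


# Assumed base round trip of 8 cycles, increasingly delayed credits:
observed = (8, 10, 12, 15, 20)
quotas = [compute_quota(8, t) for t in observed]  # [8, 6, 4, 1, 1]
```

Each extra cycle of observed round trip time removes one flit from the quota, and the floor of 1 keeps a single flit (and hence a single credit to time) in flight even under heavy congestion.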
Implementation
• Network design determines RTTmin for each link
• Track RTT for single in-flight credit per VC
• Update quota value upon return
• Switch allocator masks all VCs that exceed quota
• Simple extension to existing flow control logic
• No additional signaling required
• < 5% overhead for 16x64b buffer with 4 VCs
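
The bookkeeping listed above can be modeled in software roughly as follows. This is a behavioral sketch, not the authors' hardware design: the class and method names are hypothetical, and it assumes that credits for a given VC return in the order the flits were sent, so the timed credit can be identified by counting returns.

```python
class VcQuotaTracker:
    """Per-output-VC state: outstanding flits, current quota, one timed credit."""

    def __init__(self, rtt_min):
        self.rtt_min = rtt_min
        self.quota = rtt_min       # start as if there were no congestion
        self.outstanding = 0       # flits sent whose credits have not returned
        self._start = None         # send cycle of the timed flit, if any
        self._credits_to_wait = 0  # returns left until the timed credit arrives

    def may_send(self):
        """Switch allocator mask: False once the VC has used up its quota."""
        return self.outstanding < self.quota

    def on_flit_sent(self, now):
        self.outstanding += 1
        if self._start is None:
            # No measurement in progress: time this flit's credit.
            self._start = now
            self._credits_to_wait = self.outstanding

    def on_credit_returned(self, now):
        self.outstanding -= 1
        if self._start is None:
            return
        self._credits_to_wait -= 1
        if self._credits_to_wait == 0:
            # The timed credit is back: update the quota from its RTT.
            rtt = now - self._start
            self.quota = max(2 * self.rtt_min - rtt, 1)
            self._start = None


vc = VcQuotaTracker(rtt_min=8)
vc.on_flit_sent(now=0)
vc.on_credit_returned(now=8)    # RTT 8 = RTTmin, quota stays at 8
vc.on_flit_sent(now=10)
vc.on_credit_returned(now=24)   # RTT 14, quota drops to 2
```

Only one timestamp and one small counter are kept per VC, which is in the spirit of the "single in-flight credit" bullet; a real router would hold this state in registers next to the existing credit counters.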
Evaluation Methodology
• BookSim 2.0
• 8x8 2D mesh, 64-bit channels, dimension-order routing (DOR)
• 16-slot input buffers, 4 VCs
• Combined VC and switch allocation
• Synthetic traffic and application benchmarks
• Compare ABP to unrestricted sharing
Network Stability (1)
• For adversarial traffic, throughput in the mesh is unstable at high load
• Traffic merging causes starvation
• Tree saturation causes widespread congestion
• ABP improves stability
• Throttles sources that inject at very high rate
• Efficient buffer use reduces tree saturation
• Faster recovery from transient congestion
Network Stability (2) [tornado traffic] 6.3x
Network Stability (3) [foreground traffic at 50% injection rate] 3.3x -13% saturation rate
Performance Isolation (1)
• Inject two classes of traffic into network
• Shared buffer space, separate VCs
• Sharing causes interference between classes (which increases latency)
• ABP reduces interference
• Contains effects of congestion within a class
• Better isolation between workloads, VMs, …
Performance Isolation (2) [uniform random foreground traffic] -33% -38% [uniform random background traffic] [hotspot background traffic]
Performance Isolation (3) [50% uniform random background traffic] -31% w/o background
Application Performance -31% w/o background [12.5% injection rate for streaming traffic]
Conclusions
• Sharing improves buffer utilization, but can lead to undesired interference effects
• Adaptive Backpressure regulates credit flow to avoid unproductive use of shared buffer space
• Mitigates performance degradation in presence of adversarial traffic
• But maintains key benefits of buffer sharing under benign conditions
Questions?
Thank you for your attention!