LIBRA: Multi-mode On-Chip Network Arbitration for Locality-Oblivious Task Placement - Master’s degree defense - Gwangsun Kim Computer Science Department Korea Advanced Institute of Science and Technology 2011. 12. 20.
Table of Contents • Motivation • LIBRA • Introduction to Probabilistic Distance-based Arbitration • Virtual Contention-based Arbitration • Hybrid Arbitration • Evaluation • Conclusions
Motivation • The on-chip network (OCN) is an important shared resource in a CMP. • Fair allocation of this shared resource is needed. [Data collected by C. Batten, Y. Pan]
Motivation • Experiment: 16-core CMP. Run a SPEC benchmark and 15 copies of a memory-intensive microbenchmark to create a hotspot. The location of the SPEC benchmark is varied. • A round-robin (RR) arbiter results in significant unfairness. • Why does fairness in the OCN matter? • Performance is hard to predict (SLA). • It complicates OS design. • Parallel applications slow down. • This work proposes LIBRA, OCN support for locality-oblivious task placement. [Figure: hotspot at the MC; up to 12x!]
Overview of LIBRA • Locality-Oblivious Bandwidth Regulatory Arbiter • Libra: the zodiac constellation that symbolizes balance. • Leverages probabilistic distance-based arbitration (MICRO’10) • Consists of two mechanisms: • 1. Virtual contention-based arbitration (VCA): addresses unfairness • 2. Hybrid arbitration: addresses the high latency problem • Combination of 1 and 2: multi-mode arbitration
Probabilistic Distance-based Arbitration (PDBA) • Proposed to provide fairness in on-chip networks. • Probabilistic arbitration • Weight is multiplied by contention degree [Figure: source queues feeding Router 0, Router 1, and Router 2, annotated with packet weights (1, 2, 4) and multipliers (x1, x2)]
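The two PDBA bullets above can be illustrated with a minimal Python sketch. It is not the MICRO’10 implementation: the function names `pdba_pick` and `scale_weight` are hypothetical, and treating "contention degree" as the number of requests competing at a router is an assumption made for illustration.

```python
import random


def pdba_pick(weights, rng=random):
    """Return the index of the winning request.

    Each request wins with probability weight / sum(weights).
    """
    total = sum(weights)
    threshold = rng.random() * total  # uniform in [0, total)
    running = 0
    for index, weight in enumerate(weights):
        running += weight
        if threshold < running:
            return index
    return len(weights) - 1  # guard against floating-point edge cases


def scale_weight(weight, contention_degree):
    """Multiply a packet's weight by the contention degree.

    Assumption: contention_degree is the number of requests that
    competed at this router.
    """
    return weight * contention_degree
```

With weights 1 and 3, the second request should win roughly three times out of four over many arbitrations, which is the sense in which a heavier packet is favored without starving the lighter one.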
Limitation of Real Contention-based Arbitration • Real contention: two or more requests contend. • Real contention-based arbitration (RCA): • Non-contention is not accounted for. • In many cases, there is no real contention → unfairness [Figure: example with packet weights 1, 4, and 2; unfair bandwidth allocation!]
Virtual Contention-based Arbitration (VCA) • Considers historical non-contention in future arbitration. • Two modes: real contention mode and virtual contention mode • Virtual contention mode example: increase the priority counter (increment shown in figure) [Figure: per-port last weight and priority counter during virtual contention]
Virtual Contention-based Arbitration Cont’d • Real contention mode example: the port whose priority counter is 4 beats the port whose counter is 0 (4 > 0, so it wins), and its priority counter is then decremented (4 → 3). • If the priorities of all ports are the same, then do PDBA. [Figure: per-port last weight and priority counter during real contention]
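The two VCA modes can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the thesis design: the class `VcaPort` and both function names are hypothetical, the size of the virtual-contention increment is not recoverable from the slides and is therefore left as a caller-supplied parameter, and breaking partial ties by lowest port index is also an assumption.

```python
import random


class VcaPort:
    """Per-input-port state kept by the arbiter (field names are hypothetical)."""

    def __init__(self):
        self.last_weight = 1  # weight of the last packet seen on this port
        self.priority = 0     # priority counter


def virtual_contention(port, increment):
    """Virtual contention mode: credit a port's priority counter.

    Assumption: which port is credited and by how much is decided by the
    caller; the slides only state that the counter is increased.
    """
    port.priority += increment


def real_contention(ports, requests, rng=random):
    """Real contention mode.

    ports:    list of VcaPort
    requests: dict mapping port index -> weight of the requesting packet
    Returns the index of the winning port.
    """
    contenders = sorted(requests)
    top = max(ports[i].priority for i in contenders)
    if all(ports[i].priority == top for i in contenders):
        # All priorities equal: fall back to probabilistic (PDBA) arbitration.
        weights = [requests[i] for i in contenders]
        winner = rng.choices(contenders, weights=weights, k=1)[0]
    else:
        # Highest priority counter wins (assumed tie-break: lowest index),
        # and the winner's counter is decremented.
        winner = next(i for i in contenders if ports[i].priority == top)
        ports[winner].priority -= 1
    ports[winner].last_weight = requests[winner]
    return winner
```

Replaying the slide example: a port credited up to a priority counter of 4 beats a port at 0 when both request, and its counter drops to 3.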
Hybrid Arbiter • VCA lengthens the router critical path → lower clock frequency. • Observation: fairness matters only at high load. • At low load, there is little contention → RR is fine. • At high load, there is much contention and the impact is huge → VCA is needed, but packets are queued up in the buffer → more time for processing (pre-calculation). [Figure: low load: RR has little impact on fairness; high load: do pre-calculation, then VCA provides fairness]
Hybrid Arbiter Cont’d • If there was no chance for pre-calculation, use RR. • Use VCA whenever possible.
LIBRA: Multi-mode Arbitration • Operate in one of multiple modes depending on contention type and load. • Contention type: # of requests for the output port • Load: whether pre-calculation is done or not
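The mode choice described on the hybrid and multi-mode slides can be sketched in Python. The mapping below is an assumption pieced together from those bullets (the slide's own mode table is not in this transcript), and `Mode` and `select_mode` are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    VIRTUAL_CONTENTION = "single request: grant it and update VCA state"
    REAL_CONTENTION_VCA = "multiple requests, pre-calculation done: use VCA"
    ROUND_ROBIN = "multiple requests, no pre-calculation: use RR"


def select_mode(num_requests, precalc_done):
    """Pick an arbitration mode for one output port in one cycle.

    num_requests: number of requests for the output port (contention type)
    precalc_done: whether pre-calculation finished (used as the load indicator)
    """
    if num_requests == 0:
        return None  # nothing to arbitrate
    if num_requests == 1:
        # Assumption: a lone request is handled the same way whether or
        # not pre-calculation was done.
        return Mode.VIRTUAL_CONTENTION
    if precalc_done:
        return Mode.REAL_CONTENTION_VCA
    return Mode.ROUND_ROBIN
```

Using pre-calculation status as the load signal follows the slides' observation that packets queue up at high load, which leaves time to pre-calculate before arbitration.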
Methodology • Area and timing evaluation: Synopsys Design Compiler and IC Compiler. • Synthetic simulation using the cycle-accurate BookSim simulator. • SPEC CPU 2006 application and microbenchmark simulation using cycle-accurate GEMS + BookSim simulation. [Tables: synthetic traffic simulation parameters; GEMS simulation parameters]
Timing and Area • Baseline (RR): 1.4 GHz and 0.07 mm² • LIBRA reduces latency significantly, while introducing low area overhead. [MICRO’10]
Synthetic Traffic Evaluation • Network stability and throughput [Figure panels: uniform random, tornado, bitcomp]
Support for Locality-oblivious Task Placement • Configuration • 14 copies of memory-intensive microbenchmark. • SPEC bench. placement: closest or farthest to the hotspot. • LIBRA reduces max. slowdown by 2.7x and 1.8x compared to RR and AGE, respectively.
Analysis on Unfairness of AGE • AGE can be unfair in closed-loop evaluation. • Notation (symbols not recoverable from the slide): buffer depth; # of in-flight packets from a node • Assumptions: • All nodes send packets to the MC • Ideal age-based arbitration • Steady state [Equations shown on slide]
Cost Comparison of QoS Mechanisms • Area overhead comparison: additional area overhead per node (µm²) • LIBRA achieves 38% lower area overhead (compared to PVC)! [Chart labels: MICRO’10, MICRO’09, MICRO’10, ISCA’08]
Conclusions • Impact of task placement on performance: up to 30x with RR. • This work proposes LIBRA, a multi-mode arbitration scheme. • VCA provides global fairness. • Hybrid arbitration reduces latency overhead. • LIBRA can support locality-oblivious task placement. • Analysis of the unfairness of age-based arbitration. • LIBRA has 38% lower area overhead compared to PVC.
Q&A Thank you!
Hybrid Arbiter Cont’d • If there was no chance for pre-calculation, use RR. • Use VCA whenever possible. [Figure: pre-calculation stage (PC) and arbitration stage (SAc)]