CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees
Zefu Dai, Mark Jarvin and Jianwen Zhu
University of Toronto
Background • Consumer electronics are part of everyday life! • Figure labels: SoC, memory controller, DRAM
Background • A portable media player SoC example • Figure labels: 1.2, 6.4, 9.6, 164.8, 0.09, 31.0, 156.7, 94 MB/s; annotation: 1000x • Speech bubbles in the figure: "Give me 10 KB in 1 us, please." / "I want the data NOW!!!" / "I can only supply a maximum of 6.4 GB every second."
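One way to read the speech bubbles is as a burst demand that exceeds what the DRAM can deliver. The arithmetic below is a minimal sketch of that reading; it assumes 1 KB = 1000 bytes and treats 6.4 GB/s as the peak DRAM bandwidth, neither of which the slide states explicitly, and all variable names are illustrative.

```python
# Assumption: decimal units (1 KB = 1000 bytes, 1 GB = 10**9 bytes).
request_bytes = 10 * 1000          # "10 KB"
window_us = 1                      # "in 1 us"
peak_supply = 6_400_000_000        # "6.4 GB every second", in bytes/s

# Bandwidth needed to deliver the burst within the window, in bytes/s.
demand = request_bytes * 1_000_000 // window_us

ratio = demand / peak_supply       # how far the burst exceeds the peak
print(demand, ratio)               # 10000000000 1.5625
```

Under these assumptions the burst asks for 10 GB/s, about 1.56 times the stated peak, so the request cannot be met on bandwidth alone.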
Challenges • Simultaneously satisfy: • Bandwidth requirements • Latency requirements
Previous Work • QoS-aware • Bandwidth or latency is heuristically improved • QoS-guaranteed • Guaranteed minimum bandwidth and/or latency
Main Ideas • Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm • Bandwidth guarantee • Improve it using the Credit Borrow and Repay (CBR) mechanism • Minimum latency guarantee
Bandwidth Guaranteed Prioritized Queuing • Combines the benefits of Priority Queuing and Weighted Fair Queuing • Credit-based Weighted Fair Queuing • Prioritized service for residual bandwidth allocation • Residual bandwidth: • The bandwidth assigned to one user that is unused at a specific point in time
BGPQ Algorithm • Case 1: all queues are busy • No residual bandwidth • Acts as WFQ
Setup: a BGPQ scheduler multiplexes Q0 (50%), Q1 (20%) and Q2 (30%) onto the shared resource. Credits are listed as (Q0, Q1, Q2).
• Initial state: every queue has a credit of zero: (0.0, 0.0, 0.0)
• Step 1: calculate the dynamic credit for each queue: (0.5, 0.2, 0.3)
• Step 2: turn on the switch box and transfer data from the granted queue
• Step 3: subtract 1 from the credit of the granted queue: (-0.5, 0.2, 0.3)
• One scheduling cycle is done; the sum of credits is 0.
BGPQ Algorithm • Case 2: some queues are empty • Has residual bandwidth • Prioritized service on residual bandwidth
Setup: same scheduler, Q0 (50%), Q1 (20%), Q2 (30%), with priority Q0 > Q1 > Q2. Credits are listed as (Q0, Q1, Q2).
• Before the new scheduling cycle: Q1 is empty; credits are (-0.5, 0.2, 0.3)
• Step 1: calculate the dynamic credit for each queue; the credit of an empty queue remains unchanged: (0.0, 0.2, 0.6)
• Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority: (0.2, 0.2, 0.6)
• Step 3: transfer data from the granted queue
• Step 4: subtract 1 from the credit of the granted queue: (0.2, 0.2, -0.4)
• One scheduling cycle is done; the sum of credits is 0.
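The two BGPQ cases can be sketched as a single scheduling-cycle function. This is a minimal illustration rather than the authors' implementation: the function name `bgpq_cycle`, the use of exact fractions, the tie-break toward higher priority, and returning `None` when every queue is empty are all assumptions made here.

```python
from fractions import Fraction as F

def bgpq_cycle(credits, shares, nonempty, priority):
    """Run one BGPQ cycle; return (granted queue or None, new credits)."""
    credits = list(credits)
    residual = F(0)
    for q, share in enumerate(shares):
        if nonempty[q]:
            credits[q] += share      # busy queue earns its bandwidth share
        else:
            residual += share        # empty queue: credit unchanged, share is residual
    busy = [q for q in priority if nonempty[q]]
    if not busy:
        return None, credits         # assumption: nothing to schedule
    # Residual bandwidth goes to the highest-priority non-empty queue.
    credits[busy[0]] += residual
    # Grant the highest dynamic credit (assumed tie-break: higher priority).
    granted = max(busy, key=lambda q: (credits[q], -priority.index(q)))
    credits[granted] -= 1            # charge the granted queue one slot
    return granted, credits

shares = [F(1, 2), F(1, 5), F(3, 10)]    # Q0 50%, Q1 20%, Q2 30%
priority = [0, 1, 2]                     # Q0 > Q1 > Q2

# Case 1: all queues busy -> Q0 granted, credits (-1/2, 1/5, 3/10)
g1, c1 = bgpq_cycle([F(0)] * 3, shares, [True, True, True], priority)
# Case 2: Q1 empty -> Q2 granted, credits (1/5, 1/5, -2/5)
g2, c2 = bgpq_cycle(c1, shares, [True, False, True], priority)
```

Exact fractions are used so that the "sum of credits = 0" invariant can be checked without floating-point rounding.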
BGPQ Advantages • BGPQ = WFQ + PQ • Bandwidth guarantee • Prioritized access to residual bandwidth • Low implementation cost: • 3 adders for credit calculation • 1 comparator tree to find the highest dynamic credit
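A comparator tree selects the maximum by pairwise comparison in roughly log2(n) levels. The sketch below models that reduction in software; the function name and the tie-break toward the lower queue index are assumptions, and the slide does not describe the actual circuit in this detail.

```python
def comparator_tree(credits):
    """Return the index of the highest credit via pairwise (tournament) comparison.

    Assumes at least one credit; ties go to the lower index (an assumption).
    """
    level = list(enumerate(credits))          # (queue index, credit) pairs
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            nxt.append(a if a[1] >= b[1] else b)   # one comparator per pair
        if len(level) % 2:
            nxt.append(level[-1])             # odd element passes through
        level = nxt
    return level[0][0]

winner = comparator_tree([0.5, 0.2, 0.3])     # winner is queue 0
```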
BGPQ Disadvantage • For the class that needs low latency but little bandwidth: • No minimum latency guarantee • Minimum latency means: • A request never has to wait for any request that has lower priority
Latency Problem of BGPQ • Example: • Optimal scheduling: (timing diagrams not preserved in this transcript)
Credit Borrow and Repay Mechanism • Borrow • Allow the low-latency class to borrow a scheduling opportunity from other classes • Repay • Return the credit later, when convenient
CBR Mechanism • Case 3: Credit Borrow • Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed queue IDs
Setup: a CBR scheduler multiplexes Q0 (10%), Q1 (20%) and Q2 (70%) onto the shared resource, with priority Q0 > Q1 > Q2. Credits are listed as (Q0, Q1, Q2).
• Step 1: calculate the dynamic credits and allocate the residual bandwidth: (0.3, 0.0, 0.7)
• Step 2: re-assign the scheduling opportunity to Q0 and record the borrowed ID in the DebtQ
• Step 3: transfer data
• Step 4: subtract 1 from the credit of the originally scheduled queue: (0.3, 0.0, -0.3)
• One scheduling cycle is done; the sum of credits is 0.
CBR Mechanism • Case 4: Credit Repay • It is time to repay the credit
Setup: same scheduler, Q0 (10%), Q1 (20%), Q2 (70%), with priority Q0 > Q1 > Q2. Credits are listed as (Q0, Q1, Q2).
• Initial state: Q0 is empty but has debt, so it "appears" to be non-empty; credits are (0.3, 0.0, -0.3)
• Step 1: calculate the dynamic credits and allocate the residual bandwidth: (0.6, 0.0, 0.4)
• Step 2: return the scheduling opportunity and clear the DebtQ
• Step 3: transfer data
• Step 4: subtract 1 from the credit of the scheduled queue: (-0.4, 0.0, 0.4)
• One scheduling cycle is done; the sum of credits is 0.
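The borrow and repay cases can be sketched by adding a debt FIFO to the credit scheduler. This is a hedged illustration, not the authors' design: the class name `CBRScheduler`, its parameters, the tie-break rule, and repaying one DebtQ entry per cycle are assumptions. It also does not handle a lender that is itself empty when repaid.

```python
from collections import deque
from fractions import Fraction as F

class CBRScheduler:
    def __init__(self, shares, priority, ll_queue=0, debt_depth=16):
        self.shares, self.priority = shares, priority
        self.ll = ll_queue                    # low-latency queue (Q0 on the slides)
        self.debt_depth = debt_depth
        self.credits = [F(0)] * len(shares)
        self.debt = deque()                   # DebtQ: FIFO of borrowed queue IDs

    def cycle(self, nonempty):
        """Return the queue that transfers data this cycle, or None."""
        ll = self.ll
        active = list(nonempty)
        if self.debt:
            active[ll] = True                 # a queue in debt "appears" non-empty
        residual = F(0)
        for q, share in enumerate(self.shares):
            if active[q]:
                self.credits[q] += share
            else:
                residual += share
        busy = [q for q in self.priority if active[q]]
        if not busy:
            return None
        self.credits[busy[0]] += residual     # residual to highest priority
        winner = max(busy, key=lambda q: (self.credits[q], -self.priority.index(q)))
        self.credits[winner] -= 1             # the scheduled queue is charged
        full = len(self.debt) >= self.debt_depth
        if winner != ll and nonempty[ll] and not full:
            self.debt.append(winner)          # borrow: Q0 takes the winner's slot
            return ll
        if winner == ll and self.debt and (full or not nonempty[ll]):
            return self.debt.popleft()        # repay: the lender gets Q0's slot
        return winner

s = CBRScheduler([F(1, 10), F(1, 5), F(7, 10)], [0, 1, 2])
served3 = s.cycle([True, False, True])    # Case 3: Q2 is scheduled, Q0 is served
credits3 = list(s.credits)                # (3/10, 0, -3/10)
served4 = s.cycle([False, False, True])   # Case 4: Q0 is scheduled, Q2 is served
credits4 = list(s.credits)                # (-2/5, 0, 2/5)
```

When the DebtQ is full and Q0 still has requests, this sketch falls back to plain BGPQ behaviour, so Q0 may have to wait.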
CBR Mechanism • Minimum latency guarantee using CBR • No need to wait for requests in other queues • Worst case: Q0 is not empty while the DebtQ is full • No minimum latency guarantee in that case
Implementation in FPGA • CBR MPMC top-level diagram • Port count configurable at instantiation time • Priority and bandwidth programmable at run time
Implementation in FPGA • Credit calculation circuit • Sorting network and CBR
Implementation Cost • 8-port CBR-MPMC with a 16-deep DebtQ • Xilinx Virtex-5 XC5VLX50T • Speedy DDR backend memory controller
Evaluation • Simulation framework • Cycle-accurate C model of the MPMC • Simple close-page DDR memory model • Trace capturing and converting method
Evaluation • CPU workload trace file (from B. Jacob) • Cache simulation on the standard SPEC2000 integer benchmarks • Irregular access pattern, low bandwidth requirement: 0.4 memory transactions per 1k instructions
Evaluation • Accelerator workload • ALPBench suite of parallel multimedia applications • Periodically repeated access pattern, high bandwidth requirement: 18.3 memory transactions per 1k instructions
Results • BGPQ scheduler • Latency: number of clock cycles • Bandwidth: number of memory transactions per 1k clock cycles
Results • CBR scheduler with a 16-deep DebtQ
Impact of DebtQ Size • Repay conditions: • DebtQ is full • Q0 is empty • When the DebtQ is full, the remaining requests in Q0 are not served with the minimum latency guarantee
Impact of DebtQ Size • How big is big enough for the DebtQ? • Determined by the instantaneous bandwidth requirement • An irregular access pattern means: • A wide range of DebtQ size requirements • Tradeoff • Resource efficiency vs. performance
Results • Impact of debt queue size
Conclusions • The CBR scheduler can provide minimum bandwidth and latency guarantees • Low implementation cost and power consumption • We expect its successful use in a wide range of multimedia applications
Questions?