280 likes | 315 Views
Dynamic Scheduling and Dynamic Percolation. Elkin Garcia CAPSL – UD Based on a presentation made by Rishi Khan ET International. Motivation. Static Scheduling is not eable to achieve the maximum performance on a many-core Archtecture (C64)
E N D
Dynamic Scheduling and Dynamic Percolation Elkin Garcia CAPSL – UD Based on a presentation made by Rishi Khan ET International
Motivation • Static Scheduling is not eable to achieve the maximum performance on a many-core Archtecture (C64) • EVEN FOR REGULAR APPLICATIONS LIKE MATRIX MULTIPLICATION
Issues of Static Scheduling • Blocks are not necessarily multiples of the Optimal Tile Size (OTS) • Extra overhead for processing non-optimal sized tiles. • It is worst when many processors share a small fixed amount of on-chip memory
Issues of Static Scheduling (2) • SS does not consider stalls due to arbitration of shared resources. • SS assumes that TUs doing the same amount of work will complete at the same time. • Many-core architectures have plenty of shared resources: • FPUs, crossbar, memory, and I-Caches that can produce unexpected stalls.
Dynamic Scheduling (DS) Balances optimally in presence of shared resources with higher efficiency. • Partition of matrix C only in tiles. • Use of atomic operations for low overhead. X = A B C
Dynamic Percolation (DP) • Assign tasks (codelets) to TU at runtime with low overhead using a lock-free queue: • Computation tasks: Compute optimum tiles of 6x6. • Data movement tasks: Move inputs and outputs between SRAM and DRAM using double buffering.
Codelets • A nonpreemptive set of code that can run to completion once it’s dependencies/Events are met. • Dependencies/Events: • Data Dependencies • Resource Constrains (Threads/BW) • Desired Behavior (Power)
Matrix Multiply in SRAM using Petri nets Size: 192x192 TOKENS PLACE T INIT Clean 1024 Comp1 Low TRANSITION X = A B C
Matrix Multiply in DRAM T T T INIT Clean INIT Clean Done 1024 8 Comp1 Copy1 F Low
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 Copy2 F Low Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 1 Copy1 F Low 1 T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 Copy2 F Low High Init Set 1 ΔT Low Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 1024 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 1 Copy2 F Low High Init Set 2 Low Comp1(1024) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 1020 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 1024 Copy2 F Low High Low Comp1(1020) Comp2(1024) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 1024 Copy2 F Low High Clean Low Comp2(1024) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 Copy1 1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 1020 Copy2 F Low High Init Copy Set Low Comp2(1020) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 8 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 1000 Copy2 F Low High Copy (8) Low Comp2(1000) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 500 Copy2 F Low High Clean Low Comp2(500) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 Copy1 1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 495 Copy2 F Low High Done Low Comp2(495) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 1 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 490 Copy2 F Low High Init Set 1 Low Comp2(490) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 1024 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 480 Copy2 F Low High Low Comp1(1024) Comp2(480) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 1024 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 Copy2 F Low High Clean Low Comp1(1024) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
Double Buffer Computation Example Start T T T INIT Clean INIT Clean Done 1024 8 Comp1 1020 Copy1 F Low T T T INIT Clean INIT Clean Done 8 1024 ΔT Comp2 Copy2 1 F Low High Init Copy Set 2 Low Comp1(1020) Rules: Always take highest priority task first If two tasks have the same priority, take the task that was enabled first Otherwise, choose arbitrarily
The Cool Demo on MxM • Due to Rishi Khan (ETI) UHPC-Portland-Meeting-06-2009
Summary • Static Optimizations increase performance substantially. • Dynamic Scheduling and Dynamic Percolation mitigates the unpredictable effects of resource sharing. • Optimizations implemented are also power efficient.
Acknowledgements • Professor Guang Gao • ETI and CAPSL people that have help on this project (Rishi Khan, Daniel Orozco, Kelly Livingston, Ioannis Venetis) • Members of CAPSL Tutorial Project Part 2