Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine

Yury Markovskiy, Eylon Caspi, Randy Huang, Joseph Yeh, Michael Chu, John Wawrzynek (UC Berkeley BRASS Group)
André DeHon (California Institute of Technology)

Outline
• Hardware Virtualization
• SCORE model
• Run-time scheduler
  • Fully Dynamic
  • Quasi-Static
• Results
  • 7x reduction in scheduling overhead
  • Application performance improved by a factor of 2-7
• Conclusion
FPGA 2002

Hardware Virtualization
• Traditional mapping tools
  • Expose resource constraints to the designer
• HW virtualization enables:
  • App compatibility/longevity across a device family
  • Automatic performance scaling on larger devices

Stream Computation Organized for Reconfigurable Execution (SCORE) (1)
• Data-flow based framework
  • Programming Model
  • Execution Environment
  • Hardware Platform
• Programming Model
  • Streaming dataflow graph of operators (FSM + datapath)
  • Dynamic data-dependent behavior
  • Arbitrary size operators
• Run-time representation
  • Graph of fixed size compute pages
  • Akin to virtual memory pages
• Run-time scheduling is required to handle dynamic page behavior

Stream Computation Organized for Reconfigurable Execution (SCORE) (2)
• Hardware Platform
  • Microprocessor (µP)/reconfigurable array hybrid
  • Array: compute pages (CP) and configurable memory blocks (CMB)
  • Stream interface between resources
  • Global Controller manages reconfiguration
• Scheduler Operation
  • Temporal Partitioning
    • Buffer intermediate results
  • Resource Allocation/Mapping
    • Compute pages
    • Memory segments
    • Communication channels
  • Array Reconfiguration
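To make the temporal partitioning step concrete, here is a minimal sketch. It is not SCORE's actual algorithm: it assumes the pages are already in topological order, packs them greedily into groups no larger than a hypothetical `num_physical_cps`, and reports which streams cross a group boundary and would therefore need buffering in a CMB. All function and parameter names are illustrative.

```python
def temporal_partition(pages_in_topo_order, num_physical_cps):
    """Greedily split a topologically ordered page list into timeslice groups."""
    if num_physical_cps < 1:
        raise ValueError("need at least one physical compute page")
    return [
        pages_in_topo_order[i:i + num_physical_cps]
        for i in range(0, len(pages_in_topo_order), num_physical_cps)
    ]


def buffered_streams(edges, slices):
    """Return the streams whose producer and consumer land in different groups."""
    slot = {page: i for i, group in enumerate(slices) for page in group}
    return [(src, dst) for src, dst in edges if slot[src] != slot[dst]]


# Hypothetical five-page pipeline mapped onto an array with two physical CPs.
pages = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
slices = temporal_partition(pages, num_physical_cps=2)
crossing = buffered_streams(edges, slices)
print(slices)    # [['A', 'B'], ['C', 'D'], ['E']]
print(crossing)  # [('B', 'C'), ('D', 'E')]
```

A real scheduler would also have to allocate memory segments and communication channels for each group; the sketch only shows why intermediate results need buffering.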

Run-time Scheduler
• Run-time scheduling (late binding of resources)
  • Benefit: automatic performance scaling
  • Extra burden: the scheduler
• Complex optimization with multiple simultaneous constraints (CPs, CMBs, and network): an NP-hard problem
• Space of scheduling solutions
  • Range in quality and complexity
  • Tradeoffs: timeslice vs. asynchronous, dynamic vs. static
• What is the right timeslice size?
  • Depends on an application's run-time behavior
  • Affected by the scheduler overhead (lower bound)

Problem Statement
• SCORE micro-architecture
  • Parallel reconfiguration of independent CPs/CMBs
  • Reconfiguration time is thousands of cycles
• Problem
  • Investigate scheduling cost
  • Reduce it to a minimum (comparable to reconfiguration time)
  • Understand its effect on application run-times

Initial Scheduling Solution
• Fully Dynamic Scheduler
  • Performs a scheduling operation each timeslice
• Version of priority-list scheduling
  • Availability of input tokens and output space determines the priority
  • Candidates are chosen by BFS
• Fixed timeslice size
• Large critical loop
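The selection step described above can be sketched as a toy example. This is an illustration under stated assumptions, not the SCORE scheduler: the priority function `min(tokens_in, space_out)`, the `capacity` limit, and every name below are hypothetical. Candidates are discovered breadth-first, pages without input tokens or output space are skipped, and the rest are ranked by priority.

```python
from collections import deque


def pick_pages(graph, tokens_in, space_out, roots, capacity):
    """Discover pages by BFS from `roots`, then keep the best runnable ones."""
    seen, order = set(roots), []
    queue = deque(roots)
    while queue:
        page = queue.popleft()
        order.append(page)
        for succ in graph.get(page, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    # A page is runnable only if it has input tokens and output space.
    runnable = [p for p in order if tokens_in[p] > 0 and space_out[p] > 0]
    # Assumed priority: the smaller of available tokens and free output space.
    # sort() is stable, so ties keep their BFS discovery order.
    runnable.sort(key=lambda p: min(tokens_in[p], space_out[p]), reverse=True)
    return runnable[:capacity]


graph = {"src": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
tokens_in = {"src": 8, "a": 4, "b": 0, "c": 4}
space_out = {"src": 8, "a": 2, "b": 8, "c": 6}
chosen = pick_pages(graph, tokens_in, space_out, roots=["src"], capacity=2)
print(chosen)  # ['src', 'c']
```

Running a loop like this on every timeslice is what makes the critical loop large: the whole graph is re-examined each time, even when the outcome barely changes.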

Fully Dynamic Scheduler (1)
• Two types of overhead:
  • Scheduler (avg. 124 Kcycles)
  • Reconfiguration [array global controller] (avg. 3.5 Kcycles)
• Average overhead per timeslice > 127 Kcycles

Fully Dynamic Scheduler (2)
• Total execution time
  • Scheduler overhead is on average 36% of execution time
  • Timeslice size = 250 Kcycles
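A back-of-envelope model shows why per-timeslice overhead sets a lower bound on useful timeslice sizes. The sketch below assumes the overhead is paid once per timeslice on top of the useful cycles, which the slides do not state; it uses the average overhead figures quoted above and is not expected to reproduce the measured 36% average, which is taken across applications. The function name and the larger timeslice values are illustrative.

```python
def overhead_fraction(timeslice_cycles, scheduler_cycles, reconfig_cycles):
    """Fraction of all cycles spent on overhead.

    Assumption: overhead is paid once per timeslice, in addition to the
    timeslice's useful cycles.
    """
    overhead = scheduler_cycles + reconfig_cycles
    return overhead / (timeslice_cycles + overhead)


# Average overheads from the slides: 124 Kcycles scheduler, 3.5 Kcycles reconfiguration.
for timeslice in (250_000, 500_000, 1_000_000):
    frac = overhead_fraction(timeslice, 124_000, 3_500)
    print(f"{timeslice:>9} cycles/timeslice -> {frac:.0%} overhead")
```

Under this model the overhead share only becomes small when the timeslice is much longer than the overhead itself, so an expensive scheduler forces coarse timeslices.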

Quasi-Static Scheduler
• Static: pre-compute the schedule from
  • Graph topology
  • Back annotations (I/O rates)
• Static: generate a script of configuration commands
• Quasi: small run-time critical loop
  • Query array
  • Issue script commands
• Timeslice size
  • Dynamically controlled by array hardware stall detect
  • Hardware continuously (or at small intervals) monitors array activity
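The small run-time loop can be sketched as follows. This is a minimal illustration, not the SCORE runtime: `FakeArray`, its methods, and the command strings are all hypothetical stand-ins for the array, its stall-detect hardware, and the pre-computed script.

```python
class FakeArray:
    """Hypothetical stand-in for the array and its stall-detect hardware."""

    def __init__(self, polls_until_stall):
        self._pending = list(polls_until_stall)
        self._countdown = 0
        self.log = []

    def issue(self, command):
        self.log.append(command)

    def start_timeslice(self):
        self._countdown = self._pending.pop(0)

    def stalled(self):
        # Each query stands for one monitoring interval.
        self._countdown -= 1
        return self._countdown <= 0


def run_quasi_static(script, array):
    """Replay a pre-computed script; each timeslice ends when the array stalls."""
    polls_per_slice = []
    for commands in script:
        for command in commands:        # issue script commands
            array.issue(command)
        array.start_timeslice()
        polls = 1
        while not array.stalled():      # query array
            polls += 1
        polls_per_slice.append(polls)
    return polls_per_slice


script = [["load A", "load B"], ["evict A", "load C"]]
array = FakeArray(polls_until_stall=[3, 1])
polls_per_slice = run_quasi_static(script, array)
print(polls_per_slice)  # [3, 1]
```

The loop makes no scheduling decisions at run time; it only replays commands and waits for a stall, so the timeslice length follows the application's behavior rather than a fixed constant.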

Results (1)
• A low overhead scheduling solution
  • Scheduler overhead (avg. 14 Kcycles)
  • Reconfiguration (avg. 4 Kcycles)
• 7x average reduction in overhead
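The 7x figure is consistent with the average per-timeslice overheads quoted for the two schedulers. The check below simply divides the fully dynamic total (124 + 3.5 Kcycles) by the quasi-static total (14 + 4 Kcycles); the variable names are illustrative, and the slides' own figure is an average over applications, so this is a sanity check rather than a derivation.

```python
# Average per-timeslice overheads quoted in the slides, in cycles.
dynamic_overhead = 124_000 + 3_500   # fully dynamic: scheduler + reconfiguration
quasi_overhead = 14_000 + 4_000      # quasi-static: scheduler + reconfiguration

ratio = dynamic_overhead / quasi_overhead
print(f"{ratio:.1f}x")  # 7.1x
```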

Results (2)
• 4.5x average application speedup
  • Reduction in overhead AND
  • Improvement in scheduling quality

Results Summary
• Tested applications:
  • Image de/compression: consists of both dynamic and static rate operators
  • All demonstrate similar speedups under the Quasi-Static scheduler
• Performance improvements can be attributed to:
  • Reduced scheduler overhead
  • Improved scheduling quality:
    • Global view, rather than the local (BFS) view of the dynamic scheduler
• Reduction of the lower bound on timeslice size
  • Expands the space of apps well suited for execution under virtualized hardware
• Retained powerful semantics of dynamic data-dependent dataflow

Conclusion
• Run-time scheduler
  • Required for automatic scaling under hardware virtualization
• Run-time overhead sets a lower bound on the size of the scheduling step (response time):
  • Restricts the applicability of virtualized hardware
  • Makes this model impractical for some apps
• Low overhead run-time scheduling is achievable:
  • Without semantic restrictions
  • With higher (or comparable) scheduling quality
• 7x reduction in overhead and a simultaneous performance improvement of 2-7x
• OS is a viable alternative to manual scheduling

Thanks to: DARPA, Xilinx and STMicro
For more information: http://brass.cs.berkeley.edu/SCORE
Thank You
