Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine


Presentation Transcript


  1. Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine • Yury Markovskiy, Eylon Caspi, Randy Huang, Joseph Yeh, Michael Chu, John Wawrzynek (UC Berkeley BRASS Group) • André DeHon (California Institute of Technology)

  2. Outline • Hardware Virtualization • SCORE model • Run-time scheduler • Fully Dynamic • Quasi-Static • Results • 7x reduction in scheduling overhead • App performance improved by a factor of 2-7. • Conclusion FPGA 2002

  3. Hardware Virtualization • Traditional Mapping Tools • Expose resource constraints to designer • HW virtualization enables: • App compatibility/longevity across a device family • Automatic performance scaling on larger devices

  4. Stream Computation Organized for Reconfigurable Execution (SCORE) (1) • Data-flow based framework • Programming Model • Execution Environment • Hardware Platform • Programming Model • Streaming dataflow graph of operators (FSM + datapath) • Dynamic data-dependent behavior • Arbitrary size operators • Run-time representation • Graph of fixed size compute pages • Akin to virtual memory pages • Run-time scheduling is required to handle dynamic page behavior

  5. Stream Computation Organized for Reconfigurable Execution (SCORE) (2) • Hardware Platform • uP/Reconfigurable array hybrid • Array: compute pages (CP) and configurable memory blocks (CMB) • Stream interface between resources • Global Controller manages reconfiguration • Scheduler Operation • Temporal Partitioning • Buffer intermediate results • Resource Allocation/Mapping • Compute pages • Memory segments • Communication channels • Array Reconfiguration

  6. Run-time Scheduler • Run-time scheduling (late binding of resources) • Benefit: automatic performance scaling • Extra burden: scheduler • Complex optimization with multiple simultaneous constraints (CPs, CMBs, and network): an NP-hard problem • Space of scheduling solutions • Range in quality and complexity • Tradeoffs: timeslice vs. asynchronous, dynamic vs. static • What is the right timeslice size? • Depends on an application's run-time behavior • Affected by the scheduler overhead (lower bound)

  7. Problem Statement • SCORE Micro-architecture • Parallel reconfiguration of independent CPs/CMBs • Reconfiguration time is thousands of cycles • Problem • Investigate scheduling cost • Reduce it to a minimum (comparable to reconfiguration time) • Understand its effect on application run-times

  8. Initial Scheduling Solution • Fully Dynamic Scheduler • Perform scheduling operation each timeslice • Version of priority-list scheduling • Availability of input tokens and output space determines the priority • Candidates are chosen by BFS • Fixed timeslice size • Large critical loop
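
The fully dynamic scheduler on this slide can be illustrated with a minimal Python sketch. The slides do not give the exact priority function, so the one used here (the smaller of a page's available input tokens and free output space) is an assumption, as are all names (`bfs_order`, `schedule_timeslice`, `num_cps`) and the dictionary-based graph representation.

```python
from collections import deque


def bfs_order(graph, sources):
    """Return pages in breadth-first order starting from the source pages."""
    seen, order, queue = set(sources), [], deque(sources)
    while queue:
        page = queue.popleft()
        order.append(page)
        for succ in graph.get(page, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def schedule_timeslice(graph, sources, input_tokens, output_space, num_cps):
    """Pick up to num_cps pages to load for the next timeslice.

    Assumed priority: min(input tokens, output space). A page that cannot
    consume or cannot produce gets priority 0 and is skipped.
    """
    def priority(page):
        return min(input_tokens[page], output_space[page])

    candidates = [p for p in bfs_order(graph, sources) if priority(p) > 0]
    # sorted() is stable, so pages with equal priority keep their BFS order.
    ranked = sorted(candidates, key=priority, reverse=True)
    return ranked[:num_cps]


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    tokens = {"a": 8, "b": 4, "c": 0, "d": 2}
    space = {"a": 8, "b": 8, "c": 8, "d": 1}
    print(schedule_timeslice(graph, ["a"], tokens, space, num_cps=2))
```

Because this selection is repeated every timeslice over the whole graph, it sits in the run-time critical loop, which is the cost the later slides set out to reduce.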

  9. Fully Dynamic Scheduler (1) • Two types of overhead: • Scheduler (avg. 124 Kcycles) • Reconfiguration [array global controller] (avg. 3.5 Kcycles) • Average overhead per timeslice > 127 Kcycles

  10. Fully Dynamic Scheduler (2) • Total Execution Time • Scheduler overhead is on average 36% of execution time • Timeslice size = 250 Kcycles

  11. Quasi-Static Scheduler • Static: pre-compute schedule from • Graph topology • Back annotations (I/O rates) • Generate script of configuration commands • Quasi: small run-time critical loop • Query array • Issue script commands • Timeslice size • Dynamically controlled by array hardware stall detection • Hardware continuously (or at small intervals) monitors array activity
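
The split between the static and run-time parts of the quasi-static scheduler can be sketched as follows. This is an illustration under stated assumptions, not the SCORE implementation: `FakeArray`, its methods, and the command strings are hypothetical, and the partitioning of the graph into timeslices (from topology and I/O-rate annotations) is taken as given rather than computed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeArray:
    """Hypothetical stand-in for the reconfigurable array."""
    stall_after: list                      # cycles each timeslice runs before stalling
    log: list = field(default_factory=list)

    def issue(self, command):
        """Record a configuration command sent to the array."""
        self.log.append(command)

    def run_until_stall(self, step):
        """Model hardware stall detection ending the timeslice."""
        return self.stall_after[step]


def precompute_script(timeslices):
    """Static part: turn a precomputed schedule into configuration commands."""
    return [[f"load {page}" for page in pages] + ["run"] for pages in timeslices]


def run_quasi_static(array, script):
    """Run-time part: a small loop that issues commands and queries the array."""
    total_cycles = 0
    for step, commands in enumerate(script):
        for command in commands:
            array.issue(command)
        # The timeslice length is not fixed; it ends when the array stalls.
        total_cycles += array.run_until_stall(step)
    return total_cycles


if __name__ == "__main__":
    script = precompute_script([["a", "b"], ["c"]])
    array = FakeArray(stall_after=[100, 40])
    print(run_quasi_static(array, script), array.log)
```

The point of the design is that the expensive decisions happen once, before execution, so the loop that runs every timeslice only replays commands and waits for the stall signal.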

  12. Results (1) • A low overhead scheduling solution • Scheduler overhead (avg. 14 Kcycles) • Reconfiguration (avg. 4 Kcycles) • 7x average reduction in overhead
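
As a quick check, the 7x figure is consistent with the per-timeslice averages quoted on the slides for the two schedulers. The variable names below are illustrative only, and the slides' own 7x is an average over applications, so this is a consistency check and not a derivation.

```python
# Average per-timeslice overheads in Kcycles, as quoted on the slides.
dynamic_overhead = 124 + 3.5   # fully dynamic: scheduler + reconfiguration
quasi_overhead = 14 + 4        # quasi-static: scheduler + reconfiguration

reduction = dynamic_overhead / quasi_overhead
print(f"overhead reduction: {reduction:.1f}x")
```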

  13. Results (2) • 4.5x average application speedup • Reduction in overhead AND • Improvement in scheduling quality

  14. Results Summary • Tested applications: • Image de/compression: consists of both dynamic and static rate operators • All demonstrate similar speedups under the Quasi-Static scheduler • Performance improvements can be attributed to: • Reduced scheduler overhead • Improved scheduling quality: • Global view, rather than the local (BFS) view of the dynamic scheduler • Reduction of the lower bound on timeslice size • Expands the space of apps well suited for execution under virtualized hardware • Retains the powerful semantics of dynamic data-dependent dataflow

  15. Conclusion • Run-time scheduler • Required for automatic scaling under hardware virtualization • Run-time overhead sets a lower bound on the size of the scheduling step (response time): • Restricts applicability of virtualized hardware • Makes this model impractical for some apps • Low overhead run-time scheduling is achievable: • Without semantic restrictions • With higher (or comparable) scheduling quality • 7x reduction in overhead with a simultaneous performance improvement of 2-7x • OS is a viable alternative to manual scheduling

  16. Thank You • Thanks to: DARPA, Xilinx, and STMicro • For more information: http://brass.cs.berkeley.edu/SCORE
