Asymmetry Aware Scheduling Algorithms for Asymmetric Processors • Nagesh Lakshminarayana, Sushma Rao, Hyesoon Kim • Computer Science, Georgia Institute of Technology
Outline • Background and Problem • Application characteristics on AMP/SMP • LJFPF Policy • CJFPF Policy • Conclusion
Heterogeneous Architectures • A particularly interesting class of parallel machines is the heterogeneous architecture: multiple types of Processing Elements (PEs) are available on the same machine [Figure: one PE A and four PE Bs connected by an interconnect]
Heterogeneous Architectures • Heterogeneous architectures are becoming very common • Examples: special accelerator, multicore CPU + GPU, IBM Cell processor • Focus of this talk: asymmetric processors [Figure: two fast cores and four slow cores]
Scheduling Problem: Multiple applications [Figure: non-scalable applications and scalable applications to be placed on fast cores and slow cores]
Scheduling Problem: Multi-threaded application [Figure: threads of one application to be placed on two fast cores and four slow cores]
Problem • How to schedule multi-threaded applications on Asymmetric Multiprocessors (AMPs)?
Outline • Background and Problem • Application characteristics on AMP/SMP • LJFPF Policy • CJFPF Policy • Conclusion
Experimental Methodology • Use a 1.87 GHz two-socket quad-core machine to measure performance • Use SpeedStep technology to emulate an AMP by running some cores at a lower clock frequency
Slow-Limited Applications [Figure: threads on two fast cores and four slow cores synchronizing at a barrier]
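One reading of "slow-limited" is that the barrier makes every thread wait for the thread running on the slowest core. The sketch below illustrates that reading under stated assumptions: each thread gets the same amount of work, and the core speeds (2.0 for fast, 1.0 for slow), the work units, and the helper name `barrier_time` are all hypothetical rather than taken from the talk.

```python
def barrier_time(work, speeds):
    """Time until every thread reaches the barrier.

    Thread i runs `work[i]` units on a core of speed `speeds[i]`;
    the barrier opens only when the slowest thread arrives.
    """
    return max(w / s for w, s in zip(work, speeds))


# Hypothetical machines (speeds are illustrative, not measured values).
amp_speeds = [2.0, 2.0, 1.0, 1.0, 1.0, 1.0]  # 2 fast cores + 4 slow cores
all_slow_speeds = [1.0] * 6                   # 6 slow cores

equal_work = [12.0] * 6  # same work per thread, arbitrary units

amp_time = barrier_time(equal_work, amp_speeds)
all_slow_time = barrier_time(equal_work, all_slow_speeds)

# With equal work per thread, the fast cores finish early and then wait,
# so the AMP is no faster than the all-slow machine in this model.
print(amp_time, all_slow_time)
```

In this simplified model the two fast cores bring no benefit, which is the situation an asymmetry-aware scheduler tries to avoid.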
Middle-perf Benchmarks • Similar to a slow-limited benchmark, but the sequential section is much longer [Figure: threads synchronizing at a barrier]
Unstable Benchmarks • Asymmetric workloads • Lots of barriers [Figure: threads meeting at repeated barriers]
Outline • Background and Problem • Applications on AMP/SMP • LJFPF Policy • CJFPF Policy • Conclusion
LJFPF Policy • Longest Job to a Fast Processor First [Figure: jobs placed on slow and fast cores ahead of a barrier]
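The slide gives only the policy's name, so the following is a minimal sketch of one way "longest job to a fast processor first" could be realized, not the authors' implementation. It assumes one job per core and that job lengths are known in advance; the function names, speeds, and job lengths are hypothetical.

```python
def ljfpf_assign(job_lengths, core_speeds):
    """Pair the longest jobs with the fastest cores.

    Returns a dict mapping core index -> job index.
    Assumes exactly one job per core (a simplification for this sketch).
    """
    if len(job_lengths) != len(core_speeds):
        raise ValueError("this sketch assumes one job per core")
    jobs = sorted(range(len(job_lengths)),
                  key=lambda j: job_lengths[j], reverse=True)
    cores = sorted(range(len(core_speeds)),
                   key=lambda c: core_speeds[c], reverse=True)
    return dict(zip(cores, jobs))


def time_to_barrier(job_lengths, core_speeds, assignment):
    """Finish time of the last job, i.e. when a closing barrier opens."""
    return max(job_lengths[j] / core_speeds[c] for c, j in assignment.items())


# Hypothetical example: cores 1 and 3 are fast, cores 0 and 2 are slow.
speeds = [1.0, 2.0, 1.0, 2.0]
lengths = [8.0, 4.0, 6.0, 3.0]  # asymmetric workload, arbitrary units

ljfpf = ljfpf_assign(lengths, speeds)
naive = {c: c for c in range(len(speeds))}  # job i simply runs on core i

ljfpf_time = time_to_barrier(lengths, speeds, ljfpf)
naive_time = time_to_barrier(lengths, speeds, naive)
print(ljfpf, ljfpf_time, naive_time)
```

In this example the two longest jobs land on the two fast cores, and the time to reach the barrier drops from 8.0 to 4.0 units compared with the asymmetry-unaware placement.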
How Does the Scheduler Know the Length of Work? • Current mechanism: the application sends the information • Ongoing work: a prediction mechanism
Evaluation • Matrix multiplication • Sequential version • Parallel version, symmetric workload • Parallel version, asymmetric workload
Real Application • ITK (medical image processing toolkit) • Open source, but a real application
Evaluation: MultiRegistration • Kernel loop has 50 iterations; 50 % 8 ≠ 0, so the iterations cannot be divided evenly among 8 cores • Divide 50 iterations into 7, 7, 7, 7, 6, 6, 5, 5
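The arithmetic behind this slide can be checked with a short sketch. The helper `even_split` is hypothetical and shows the most balanced split that is possible; it is included only for comparison with the split listed on the slide, which is somewhat more uneven.

```python
def even_split(iterations, parts):
    """Most balanced split: the first `extra` parts get one more iteration."""
    base, extra = divmod(iterations, parts)
    return [base + 1] * extra + [base] * (parts - extra)


slide_split = [7, 7, 7, 7, 6, 6, 5, 5]  # split given on the slide
balanced = even_split(50, 8)            # most balanced alternative

remainder = 50 % 8  # non-zero, so no split gives every part the same size

# Both splits cover all 50 iterations; they differ in how uneven they are.
slide_spread = max(slide_split) - min(slide_split)
balanced_spread = max(balanced) - min(balanced)
print(remainder, balanced, slide_spread, balanced_spread)
```

Either way some threads receive more iterations than others, so the workload is asymmetric and the longer chunks are natural candidates for the fast cores.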
Outline • Background and Problem • Application characteristics on AMP/SMP • LJFPF Policy • CJFPF Policy • Conclusion
Critical Section [Figure: lock acquisitions around a critical section]
Critical Section Limited Workloads [Figure: case (a) and case (b); legend: critical section, useful work, waiting]
Critical Section Effects • The half-half configuration (half fast cores, half slow cores) performs similarly to the all-fast configuration
CJFPF Policy • Critical Job to a Fast Processor First [Figure: one fast core and three slow cores]
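As with LJFPF, the slide names the policy without giving details, so this is only a sketch of one possible reading: the thread with the most critical-section work is placed on the fastest core. It assumes all critical sections are guarded by the same lock, so they run one at a time and their total time is a lower bound on execution time. The `Job` type, function names, speeds, and work amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    critical_work: float  # work done while holding the lock, arbitrary units


def cjfpf_assign(jobs, core_speeds):
    """Pair jobs with the most critical-section work with the fastest cores.

    Returns a dict mapping core index -> Job. Assumes one job per core.
    """
    ranked_jobs = sorted(jobs, key=lambda j: j.critical_work, reverse=True)
    ranked_cores = sorted(range(len(core_speeds)),
                          key=lambda c: core_speeds[c], reverse=True)
    return dict(zip(ranked_cores, ranked_jobs))


def serialized_critical_time(assignment, core_speeds):
    """Total time spent in critical sections when they cannot overlap."""
    return sum(job.critical_work / core_speeds[core]
               for core, job in assignment.items())


# Hypothetical machine: core 0 is fast, cores 1-3 are slow.
speeds = [2.0, 1.0, 1.0, 1.0]
jobs = [Job("t0", 1.0), Job("t1", 5.0), Job("t2", 2.0), Job("t3", 0.0)]

cjfpf = cjfpf_assign(jobs, speeds)
naive = dict(enumerate(jobs))  # job i simply runs on core i

cjfpf_time = serialized_critical_time(cjfpf, speeds)
naive_time = serialized_critical_time(naive, speeds)
print(cjfpf[0].name, cjfpf_time, naive_time)
```

In this model, moving the most lock-heavy thread to the fast core shortens the serialized portion from 7.5 to 5.5 units. Real lock contention is more complicated than this lower bound suggests.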
CJFPF Results • With longer critical sections, the benefit of the CJFPF policy decreases
Conclusion • We evaluated the characteristics of multi-threaded applications on AMPs. • Barriers and critical sections are important factors. • We propose two new scheduling policies: longest job to a fast processor first (LJFPF) and critical job to a fast processor first (CJFPF). • The scheduling policies improve performance for asymmetric workloads. • Future work • Develop a prediction mechanism • Evaluate symmetric workloads on AMPs • Other kinds of heterogeneous architectures