
Age-Based Scheduling for Asymmetric Multiprocessors

Evaluating the potential of age-based scheduling in improving performance on asymmetric multiprocessors and overcoming challenges in thread scheduling.





Presentation Transcript


  1. Age Based Scheduling for Asymmetric Multiprocessors. Nagesh B Lakshminarayana, Jaekyu Lee & Hyesoon Kim

  2. Outline • Background and Motivation • Age Based Scheduling • Evaluation • Conclusion

  3. Asymmetric (Chip) Multiprocessors • Heterogeneous architectures where all cores have the same ISA but different performance (Figure: heterogeneous architecture with one PEA and four PEB processing elements)

  4. Asymmetric (Chip) Multiprocessors • Potential for better performance than SMPs occupying the same area and consuming the same power (Figure: a Symmetric Chip Multiprocessor (SMP/CMP) with Core0 to Core3 beside an Asymmetric Chip Multiprocessor (AMP/ACMP) with Core0 to Core3)

  5. AMPs present new challenges • Thread scheduling is one of them

  6. Scheduling in Multiprocessor OSes • Thread Assignment • assign to least loaded core • Load Balancing • make load on all cores uniform • Idle Balancing • move threads from busy cores to idle core
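
The three mechanisms on slide 6 can be illustrated with a minimal Python sketch. This is not the Linux scheduler's code: the function names, the use of run-queue length as the load metric, and the tie-breaking rule are all assumptions made for illustration. Note that nothing here distinguishes a fast core from a slow one, which is the limitation the next slides discuss.

```python
def assign_to_least_loaded(loads):
    """Thread assignment: pick the core with the smallest load.

    `loads` is a list of per-core loads (assumed here to be run-queue
    lengths). Ties go to the lowest core index (an arbitrary choice).
    """
    return min(range(len(loads)), key=lambda core: loads[core])


def idle_balance(run_queues):
    """Idle balancing: move one thread from the busiest core to each idle core.

    `run_queues` is a list of lists of thread names, one list per core.
    A core only gives up a thread if it has more than one.
    """
    idle_cores = [c for c, queue in enumerate(run_queues) if not queue]
    for idle in idle_cores:
        busiest = max(range(len(run_queues)), key=lambda c: len(run_queues[c]))
        if len(run_queues[busiest]) > 1:
            run_queues[idle].append(run_queues[busiest].pop())
    return run_queues


if __name__ == "__main__":
    print(assign_to_least_loaded([2, 0, 1]))   # core 1 has the smallest load
    print(idle_balance([["a", "b", "c"], []]))
```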

  7. Scheduling in Multiprocessor OSes • Assume that all cores are identical • Results in bad performance and application instability (Figure: Parsec benchmarks on a (real) AMP using the Linux Scheduler)

  8. Problem with current Scheduling • Not taking advantage of the fast core

  9. Outline • Background and Motivation • Age Based Scheduling (ABS) • Evaluation • Conclusion

  10. Motivation for Age Based Scheduling • Many compute-intensive multithreaded applications follow the fork-join model • Milestones (barriers) in thread execution (Figure: application model in which the main thread forks threads that pass through a series of barriers before the join)

  11. Symmetry of Applications • Threads created together are symmetric • Based on instruction count • Degree of Symmetry = Std Dev / Average (Figure: Degree of Symmetry of Parsec Benchmarks; symmetric benchmarks are benchmarks with degree of symmetry <= 0.1)
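
The degree-of-symmetry metric on slide 11 can be computed as in the sketch below. The slide does not say whether the standard deviation is the population or the sample form; the population form (`statistics.pstdev`) is assumed here, and the function names and sample instruction counts are hypothetical.

```python
from statistics import mean, pstdev


def degree_of_symmetry(instruction_counts):
    """Std Dev / Average of the per-thread instruction counts.

    Assumption: population standard deviation; the slide does not specify.
    """
    average = mean(instruction_counts)
    if average == 0:
        raise ValueError("average instruction count must be non-zero")
    return pstdev(instruction_counts) / average


def is_symmetric(instruction_counts, threshold=0.1):
    """Slide 11 calls a benchmark symmetric when the degree is <= 0.1."""
    return degree_of_symmetry(instruction_counts) <= threshold


if __name__ == "__main__":
    balanced = [100, 110, 90, 100]   # hypothetical per-thread counts
    skewed = [100, 300]              # hypothetical per-thread counts
    print(is_symmetric(balanced), is_symmetric(skewed))
```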

  12. Insight • Difficult to predict absolute execution duration, so predict relative execution duration (Figure: threads T1 to T4 between two barriers; the absolute execution duration is unknown, but exe_dur(T1) = exe_dur(T2) = exe_dur(T3) = exe_dur(T4))

  13. Putting together • Applications follow the fork-join model with milestones in between • Many applications are symmetric • Easy to predict relative execution duration to the next milestone • Together these motivate Age Based Scheduling

  14. What is Age? • Age is the progress made by a thread towards its next milestone

  15. Age Calculation • Threads created together have the same age • As a thread executes, it ages • Reset age when a milestone is crossed (Figure: execution timeline from creation through a milestone (barrier) to a milestone (termination), annotated with the ages tA = 0, 30 and X and tB = 0 and 50; tA is the age of thread A, tB is the age of thread B, and X is unknown and assumed to be a large value)
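
The three age rules on slide 15 can be sketched as a small class. This is an illustration only: the class and method names are hypothetical, and the unit of progress is left abstract (slide 23 says progress is tracked with performance counters).

```python
class ThreadAge:
    """Tracks a thread's progress towards its next milestone."""

    def __init__(self, name):
        self.name = name
        self.age = 0  # threads created together all start at age 0

    def execute(self, progress):
        """As a thread executes, it ages by the progress it made."""
        self.age += progress

    def cross_milestone(self):
        """Crossing a milestone (barrier) resets the age."""
        self.age = 0


def fork(names):
    """Threads created together have the same age (0 at creation)."""
    return [ThreadAge(name) for name in names]


if __name__ == "__main__":
    thread_a, thread_b = fork(["A", "B"])
    thread_a.execute(30)   # tA = 30, as in the slide's figure
    thread_b.execute(50)   # tB = 50
    print(thread_a.age, thread_b.age)
    thread_a.cross_milestone()
    print(thread_a.age)    # age is back to 0 after the barrier
```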

  16. Age Based Scheduling Algorithm • To make a scheduling decision: • Calculate remaining execution duration to next milestone based on age • Assign threads with longer remaining execution durations to fast core: Longest Job to Fast Core First (LJFCF)
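
A minimal sketch of the LJFCF rule follows, using the remaining execution durations that appear on slide 21 (A = 50, B = 70, C = 90, D = 30) and one fast plus three slow cores. The function name, the core labels, and the order in which slow cores are filled are assumptions for illustration; the slides only state that threads with longer remaining durations go to the fast core first.

```python
def ljfcf_assign(rem_exe, fast_cores, slow_cores):
    """Longest Job to Fast Core First.

    rem_exe: dict mapping thread name -> predicted remaining execution
             duration to the next milestone.
    Returns a dict mapping core label -> thread name. Threads are sorted by
    decreasing remaining duration and handed to fast cores before slow ones.
    Ties are broken by thread name (an arbitrary, assumed rule). If there
    are more threads than cores, the surplus threads are left unassigned.
    """
    ordered = sorted(rem_exe, key=lambda thread: (-rem_exe[thread], thread))
    cores = list(fast_cores) + list(slow_cores)
    return dict(zip(cores, ordered))


if __name__ == "__main__":
    rem_exe = {"A": 50, "B": 70, "C": 90, "D": 30}
    assignment = ljfcf_assign(rem_exe, ["fast0"], ["slow0", "slow1", "slow2"])
    print(assignment)
    # {'fast0': 'C', 'slow0': 'B', 'slow1': 'A', 'slow2': 'D'}
```

Under this sketch thread C, which has the longest remaining duration, lands on the fast core. Re-running the function when a thread is created, a core becomes idle, or a reassignment timer expires would give different threads a turn on the fast core as their ages change.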

  17. Application of LJFCF • Apply whenever • Thread is created • A core becomes idle • Reassignment timer expires (for load balancing)

  18. Working of the Algorithm (Figure: execution timeline of thread T1 from creation through a milestone (barrier) to a milestone (termination); with tA = 30 and age at barrier = X, rem_exe = X - 30)

  19. Remaining Execution Duration (I) • Track progress of threads • Using Prediction [AGE] • Predict all threads have the same inter-milestone distance (Figure: execution timeline from creation through a milestone (barrier) to a milestone (termination); both threads age from 0 to X between milestones, where tA is the age of thread A and tB is the age of thread B)

  20. Remaining Execution Duration (II) • Using Profiling [AGE(PROF)] • Threads have different inter-milestone distances, calculated based on a metric obtained by profiling • r is from the profiler • Only one r value for each thread (Figure: execution timeline from creation through a milestone (barrier) to a milestone (termination); thread A ages from 0 to X between milestones while thread B ages from 0 to rX, where tA is the age of thread A and tB is the age of thread B)
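
The two ways of estimating remaining execution duration (slides 18 to 20) can be sketched with one function. Slide 18 gives rem_exe = X - age; slide 20 scales a thread's inter-milestone distance to rX using a per-thread value r from the profiler. The function name, the default r = 1.0 for the prediction case, and the clamp at zero are assumptions for illustration, and X = 100 is a made-up value since the slides treat X as unknown.

```python
def remaining_execution(age, inter_milestone_distance, r=1.0):
    """Predicted remaining execution duration to the next milestone.

    AGE (prediction): every thread uses the same distance X, i.e. r = 1.0.
    AGE(PROF) (profiling): a thread's distance is r * X, with one r value
    per thread supplied by the profiler.
    The result is clamped at 0 (an assumption) in case a thread has already
    run past its predicted distance.
    """
    return max(r * inter_milestone_distance - age, 0.0)


if __name__ == "__main__":
    X = 100  # hypothetical inter-milestone distance
    print(remaining_execution(30, X))          # prediction: X - 30
    print(remaining_execution(30, X, r=1.5))   # profiling: 1.5 * X - 30
```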

  21. Working of the Algorithm (Figure: threads A, B, C and D scheduled on one fast core and three slow cores, with rem_exeA = 50, rem_exeB = 70, rem_exeC = 90 and rem_exeD = 30)

  22. Benefit of Age Based Scheduling • Asymmetry aware • Utilizes all cores • Gives all threads opportunities to run on fast cores

  23. Implementation • OS • Track progress using performance counters • Disable counter on interrupts • Compiler (AGE(PROF)) • Passing profiled information • One value for each thread

  24. Outline • Background and Motivation • Age Based Scheduling • Evaluation • Conclusion

  25. Evaluation • Simulation based experiments • Trace + execution hybrid simulator • Locks and barriers are modeled • Context switch and migration overhead simulated • 10 ms time slice for each thread • Machine configuration • 1 fast, 7 slow, 8:1 speed ratio (others are in the paper) • Benchmarks • Symmetric • Parsec (simmedium input) • Asymmetric • Splash-2 • OMPSCR • SuperLU

  26. Comparisons with Other Policies

  27. LJFCF vs Other Policies (I) • Parsec • Baseline: SCALEDLD (* The default Linux policy, which performs considerably worse than the other policies, is not shown)

  28. LJFCF vs Other Policies (II) • Asymmetric Benchmarks • Baseline: SCALEDLD

  29. Idle Cycles • Linux Scheduler – Most of the idle cycles contributed by fast core • SCALEDLD – keeps same thread(s) on fast core • AGE – assigns different threads to fast core

  30. Different AMP Configurations • X/1: ratio of speeds of fast and slow cores is X:1 • Need for asymmetry aware scheduling increases as cores become more asymmetric • AGE based policies show more improvement over SCALEDLD as asymmetry increases

  31. Outline • Background and Motivation • Age Based Scheduling • Evaluation • Conclusion

  32. Conclusion • Age based scheduling (ABS) for Asymmetric Multiprocessors • ABS assumes threads created at the same time are symmetric • ABS assigns threads to cores based on their predicted remaining execution durations • Predictions are made based on Age of threads • Improvement of 10.4% (Pred) and 13.2% (Prof) for Parsec and 7.6% (Pred) and 9.4% (Prof) for Asymmetric benchmarks over Li’s mechanism

  33. THANK YOU
