
Cache Utilization-Aware Scheduling for Multicore Processors




Presentation Transcript


  1. Cache Utilization-Aware Scheduling for Multicore Processors Presenter: Chi-Wei Fang, YunTech University, Taiwan Authors: Edward T.-H. Chu, Wen-wei Lu 2012 IEEE Asia Pacific Conference on Circuits and Systems

  2. Outline Introduction Contribution CUAS Experiment Conclusion

  3. Introduction Due to the limitations of the semiconductor process, processor speed is not expected to rise significantly. To further improve processor capability, chip multiprocessors (CMPs) have become widespread in today’s computer systems.

  4. Introduction Intel® Core™2 Quad Processor Q8400 architecture. In most multicore processors, the last-level cache is shared among cores to reduce possible resource underutilization. As the figure shows, in the Intel® Core™2 Quad Processor, L2 caches are shared among cores.

  5. Introduction • When tasks running on different cores read and write the shared cache intensively, excessive cache misses may occur and result in performance degradation • Reducing shared cache contention in multicore systems has therefore become an important design issue • J. Mars designed CiPE [1] to classify tasks according to their anti-interference ability • Anti-interference ability is defined as the performance loss a task suffers when it competes for the shared cache with other tasks

  6. Introduction If tasks have similar anti-interference abilities, it becomes difficult for these methods to generate a proper task assignment. In addition, how much a task interferes with co-scheduled tasks depends on how aggressively the task accesses the cache. A task with little anti-interference ability may or may not seriously interfere with co-scheduled applications.

  7. Motivation • The optimal algorithm exhaustively searches all possible task assignments and selects the one with the smallest total execution time • Because of the gap between existing methods and the optimal algorithm, there is an apparent need to design a task scheduling policy that reduces shared cache contention and improves system performance
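As a minimal sketch of what the optimal baseline does, the search below enumerates every task-to-core mapping and keeps the cheapest one. The cost model `exec_time` is a hypothetical placeholder I supply for illustration, not the paper's model; in practice it would have to capture contention effects.

```python
from itertools import product

def optimal_assignment(tasks, n_cores, exec_time):
    """Exhaustively try every task-to-core mapping and keep the one
    with the smallest total execution time, as the optimal baseline
    on slide 7 does.  `exec_time(assignment)` is a caller-supplied
    cost model (hypothetical here)."""
    best, best_time = None, float("inf")
    # One candidate per element of {0..n_cores-1}^len(tasks)
    for choice in product(range(n_cores), repeat=len(tasks)):
        assignment = {t: core for t, core in zip(tasks, choice)}
        t = exec_time(assignment)
        if t < best_time:
            best, best_time = assignment, t
    return best, best_time
```

The search space is n_cores^n_tasks, which is why the paper needs a heuristic instead.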

  8. Outline Introduction Contribution CUAS Experiment Conclusion

  9. Contribution Cache utilization-aware scheduling (CUAS) • Goal: maximize the difference in unhealthy level between cores that share the same cache while balancing the workload among cores • We define the unhealthy level of a core as the sum of the unhealthy scores of the tasks running on it • CUAS includes two parts • Application classifier • Task scheduler • CUAS can reduce cache contention

  10. Outline Introduction Contribution CUAS Experiment Conclusion

  11. CUAS classification We designed two micro-benchmarks to measure the anti-interference and interference ability of a task • Attack (ATT) • Strong interference ability • Randomly and intensively pollutes all cache lines • Defend (DEF) • Strong anti-interference ability • Sequentially reads and writes each cache line
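The two access patterns can be sketched as follows. This is only an illustration of the patterns: real micro-benchmarks like ATT and DEF would be written in C against the actual hardware, and the 2MB cache size and 64-byte line size are assumptions taken from the Q8400's L2, not from the paper's code.

```python
import random

CACHE_BYTES = 2 * 1024 * 1024      # 2 MB shared L2 (Q8400, assumed)
LINE_BYTES = 64                    # typical cache-line size (assumed)
N_LINES = CACHE_BYTES // LINE_BYTES

buf = bytearray(CACHE_BYTES)       # buffer the size of the shared cache

def attack(iters=N_LINES):
    """ATT pattern: touch cache lines in random order, intensively,
    so co-runners' working sets are evicted (strong interference)."""
    for _ in range(iters):
        i = random.randrange(N_LINES) * LINE_BYTES
        buf[i] = (buf[i] + 1) & 0xFF

def defend():
    """DEF pattern: read and write each cache line sequentially,
    keeping its own working set resident (strong anti-interference)."""
    for i in range(0, CACHE_BYTES, LINE_BYTES):
        buf[i] = (buf[i] + 1) & 0xFF
```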

  12. CUAS classification • Based on the results of co-scheduling with ATT and DEF, we grade each task’s anti-interference and interference ability [Figure: a task co-scheduled with ATT through the shared L2 cache measures its anti-interference ability; a task co-scheduled with DEF measures its interference ability]

  13. CUAS classification There are three formulas to calculate the unhealthy score of a task: (1) I = A’d / Ad, where A’d is the execution time of DEF when it is co-scheduled with the task and Ad is the execution time of DEF when it executes solely (2) AI = A’i / Ai, where A’i is the execution time of the task when it is co-scheduled with ATT and Ai is the execution time of the task when it executes solely (3) The unhealthy score of a task is the sum of its I and AI. A task with a higher unhealthy score has a more negative impact on system performance
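The three formulas above can be sketched directly. The slowdown-ratio form of I and AI is my reading reconstructed from the slide's definitions, so treat it as an assumption rather than the paper's exact equations.

```python
def unhealthy_score(a_d_co, a_d_solo, a_task_co, a_task_solo):
    """Unhealthy score of a task, under the assumed ratio reading:
      I  = A'd / Ad : slowdown DEF suffers when co-scheduled with
                      the task (the task's interference ability)
      AI = A'i / Ai : slowdown the task suffers when co-scheduled
                      with ATT (its lack of anti-interference)
    The score is their sum, per formula (3)."""
    interference = a_d_co / a_d_solo            # formula (1)
    anti_interference = a_task_co / a_task_solo  # formula (2)
    return interference + anti_interference      # formula (3)
```

For example, a task that slows DEF from 10s to 12s and itself runs 15s instead of 10s next to ATT would score 1.2 + 1.5 = 2.7.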

  14. The goal of the CUAS scheduler Maximize the gap in unhealthy scores between the cores that share the same cache Balance the workload among cores

  15. CUAS steps Calculate the number of tasks, a, for each core • We first assign the a tasks with the largest unhealthy scores to core 0 of the first cache • To prevent the unhealthiest tasks from affecting each other, we assign the next a tasks with the largest unhealthy scores to core 0 of another cache • In the next turn, we assign tasks from cache n back to cache 1
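The steps above can be sketched as a serpentine hand-out: tasks sorted by unhealthy score are dealt in chunks of a per core, sweeping cache 1..n on one pass and n..1 on the next, so the unhealthiest chunks land on different caches. The function and variable names are mine, and the exact sweep order is an assumption based on the slide.

```python
def cuas_schedule(scores, n_caches, cores_per_cache=2):
    """Sketch of the CUAS assignment order (slide 15).
    `scores` maps task name -> unhealthy score."""
    tasks = sorted(scores, key=scores.get, reverse=True)
    # a = number of tasks per core, rounded up for balance
    a = -(-len(tasks) // (n_caches * cores_per_cache))
    # Visit core 0 of each cache first, then core 1, reversing the
    # cache order on alternate turns (cache n back to cache 1).
    cores = []
    for k in range(cores_per_cache):
        sweep = range(n_caches) if k % 2 == 0 else reversed(range(n_caches))
        cores += [(cache, k) for cache in sweep]
    placement = {}
    for idx, core in enumerate(cores):
        placement[core] = tasks[idx * a:(idx + 1) * a]
    return placement
```

With eight tasks on two caches of two cores each (a = 2), the two unhealthiest chunks end up on different caches, matching the scheduler's goal of separating unhealthy tasks.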

  16. CUAS scheduling [Figure: classification result for tasks 1–6 and the schedule produced from it, across Cache 1 (Core 0, Core 1) and Cache 2 (Core 0, Core 1), each pair of cores sharing an L2 cache]

  17. Outline • Introduction • Contribution • CUAS • Experiment • Conclusion

  18. Experiment • We adopted an Intel Core2 Quad Q8400 CPU for our experiment • The four cores are arranged into two groups of two cores, and each group shares a 2MB L2 cache • We adopted the SPEC CPU2006 benchmark for evaluation

  19. Experiment The classification result of CUAS The total execution time is reduced by up to 46%

  20. Outline • Introduction • Contribution • CUAS • Experiment • Conclusion

  21. Conclusion • In this work, we designed a novel task scheduling policy, called CUAS, to reduce shared cache contention based on two indexes, intra-core cache contention and task interference ability, that primarily determine the utilization of the shared cache • CUAS first classifies tasks according to their anti-interference ability and interference ability • CUAS then distributes tasks to cores based on the effect of inter-core and intra-core cache contention

  22. Conclusion • Our experimental results show that CUAS can significantly reduce shared cache contention and reduce total execution time by up to 46% compared to existing methods

  23. Thanks for attention Embedded Operating System Lab at Yuntech University http://eos.yuntech.edu.tw/eoslab/ Supported by NSC 100-2219-E-224-001
