Process Scheduling in Multiprocessor and Multithreaded Systems

Matt Davis, CS535, 4/7/2003

  1. Process Scheduling in Multiprocessor and Multithreaded Systems Matt Davis CS535 4/7/2003

  2. Outline
     • Multiprocessor Systems
        • Issues in MP Scheduling
        • How to Allocate Processors
        • Cache Affinity
        • Linux MP Scheduling
     • Simultaneous Multithreaded Systems
        • Issues in SMT Scheduling
        • Symbiotic Jobscheduling
        • SMT and Priorities
        • Linux SMT Scheduling
     • Conclusions

  3. Multiprocessor Systems
     [Figure: Shared Memory Multiprocessors, several CPUs attached to one shared memory]
     • Symmetric Multiprocessing (SMP):
        • One copy of the OS in memory; any CPU can run it
        • The OS must ensure that multiple processors cannot access shared data structures at the same time
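The requirement that processors never touch a shared kernel structure at the same time is usually met with locks. Below is a minimal Python sketch, assuming threads stand in for CPUs and a `threading.Lock` stands in for a kernel spinlock; `simulate_dispatch` and its shared run queue are hypothetical names for illustration, not kernel code.

```python
import threading
from collections import deque

def simulate_dispatch(num_cpus, num_tasks):
    """Hypothetical model: several 'CPU' threads pull tasks from one shared
    run queue. The lock makes the emptiness check and the pop one atomic step,
    so no task is dispatched twice or lost."""
    run_queue = deque(range(num_tasks))   # shared data structure
    queue_lock = threading.Lock()         # stands in for a kernel spinlock
    dispatched = []                       # (cpu_id, task) records, appended under the lock

    def cpu_loop(cpu_id):
        while True:
            with queue_lock:              # only one CPU manipulates the queue at a time
                if not run_queue:
                    return
                task = run_queue.popleft()
                dispatched.append((cpu_id, task))

    cpus = [threading.Thread(target=cpu_loop, args=(i,)) for i in range(num_cpus)]
    for t in cpus:
        t.start()
    for t in cpus:
        t.join()
    return dispatched

if __name__ == "__main__":
    records = simulate_dispatch(num_cpus=4, num_tasks=1000)
    print(len(records), "tasks dispatched")  # prints: 1000 tasks dispatched
```

Without the lock, two threads could both see a non-empty queue and race on the last element; holding the lock across the check and the pop removes that window.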

  4. Issues in MP Scheduling
     • Starvation
        • Fewer active parallel threads than allocated processors, so some processors sit idle
     • Overhead
        • CPU time spent transferring and starting the various portions of the application
     • Contention
        • Multiple threads try to use the same shared resource
     • Latency
        • Communication delay between processors and I/O devices

  5. How to Allocate Processors
     • Allocate processors in proportion to each job's average parallelism
     • Other factors:
        • System load
        • Variable parallelism
        • Minimum/maximum parallelism
     • Acquire/relinquish processors based on the program's current needs
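One way to turn "allocate in proportion to average parallelism" into whole processors is sketched below. It assumes each job reports an average parallelism and a maximum parallelism; the greedy, D'Hondt-style rounding (give the next processor to the job with the largest `avg / (allocated + 1)`) is our choice for illustration, not necessarily Sevcik's exact policy, and `allocate_processors` is a hypothetical name.

```python
import heapq

def allocate_processors(total, avg_par, max_par):
    """Hypothetical helper: split `total` processors among jobs roughly in
    proportion to their average parallelism, never exceeding a job's maximum."""
    alloc = [0] * len(avg_par)
    # Max-heap (via negated keys) of each job's claim on its next processor.
    heap = [(-avg, i) for i, avg in enumerate(avg_par) if max_par[i] > 0]
    heapq.heapify(heap)
    for _ in range(total):
        if not heap:
            break                      # every job is at its maximum; the rest stay idle
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        if alloc[i] < max_par[i]:      # job can still use more processors
            heapq.heappush(heap, (-avg_par[i] / (alloc[i] + 1), i))
    return alloc

print(allocate_processors(8, [4, 2, 2], [8, 8, 8]))  # [4, 2, 2]
print(allocate_processors(8, [4, 2, 2], [2, 8, 8]))  # [2, 3, 3]
```

In the second call the first job is capped at 2, and its unused proportional share flows to the others. Re-running the function whenever a job's parallelism changes gives the acquire/relinquish behavior in the last bullet.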

  6. Cache Affinity
     • While a program runs, the data it needs is loaded into the local cache
     • When the job is rescheduled, it will likely access some of the same data
     • Scheduling jobs on processors where they have “affinity” (a warm cache) improves performance by reducing cache-reload penalties

  7. Cache Affinity (cont.)
     • Trade-off between the benefit of reallocating processors and the cost of reallocation (lost cache state)
        • Utilization versus cache behavior
     • Scheduling policies:
        • Equipartition: processors are divided evenly among all jobs and reallocated only when jobs arrive or depart. Low overhead.
        • Dynamic: processors are constantly reallocated among jobs to maximize utilization. High utilization.
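A minimal sketch of equipartition, assuming each job states how many processors it can currently use (`demands`, a hypothetical input). Processors are dealt out one per job per round, so a job whose demand is below an even share returns the surplus to the others.

```python
def equipartition(total, demands):
    """Deal `total` processors round-robin, one per job per pass, so every job
    gets an equal share; a job never receives more than its demand."""
    alloc = [0] * len(demands)
    remaining = total
    while remaining > 0:
        hungry = [i for i, d in enumerate(demands) if alloc[i] < d]
        if not hungry:
            break                       # all demands met; leftover processors stay idle
        for i in hungry:
            if remaining == 0:
                break
            alloc[i] += 1
            remaining -= 1
    return alloc

print(equipartition(8, [1, 6, 6]))     # [1, 4, 3]
print(equipartition(6, [10, 10, 10]))  # [2, 2, 2]
```

Under equipartition this would be recomputed only on job arrival or departure. A dynamic policy would recompute whenever demands change, buying utilization at the price of more migrations and more lost cache state.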

  8. Cache Affinity (cont.)
     • Vaswani and Zahorjan, 1991
        • When a processor becomes available, allocate it to the runnable process that last ran on it, unless a higher-priority job is waiting
        • If a job requests additional processors, place its critical tasks on the processors for which they have the highest affinity
        • If an allocated processor becomes idle, hold it briefly in case a task with affinity for it becomes runnable
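The first rule of this policy can be sketched in Python as follows. The `Task` fields (a numeric priority where larger means more important, and the id of the CPU the task last ran on) are assumptions for illustration; the hold-an-idle-processor rule is left to the caller, signalled by returning `None`.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Task:
    name: str
    priority: int                   # assumption: larger number = higher priority
    last_cpu: Optional[int] = None  # CPU this task last ran on, if any

def pick_task(cpu: int, runnable: List[Task]) -> Optional[Task]:
    """Choose a task for a processor that just became available."""
    if not runnable:
        return None                 # nothing to run; caller may hold the CPU briefly
    top = max(t.priority for t in runnable)
    # Prefer a task with affinity for this CPU, but only among the
    # highest-priority tasks: affinity never overrides priority.
    for t in runnable:
        if t.priority == top and t.last_cpu == cpu:
            return t
    return max(runnable, key=lambda t: t.priority)

a = Task("a", priority=1, last_cpu=0)
b = Task("b", priority=1, last_cpu=1)
c = Task("c", priority=2, last_cpu=1)
print(pick_task(0, [b, a]).name)  # a  (affinity breaks the tie)
print(pick_task(0, [a, c]).name)  # c  (higher priority wins over affinity)
```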

  9. Vaswani and Zahorjan, 1991
     • Results showed that utilization, not cache affinity, was the dominant effect on performance
     • However, their affinity-based algorithm did not degrade performance
     • They predicted that the significance of cache affinity would grow as processor speeds increase
     • Later studies validated their predictions

  10. Linux 2.5 MP Scheduling
     • Each processor is responsible for scheduling its own tasks
        • schedule()
     • After a process switch, check whether the new process should be transferred to another CPU that is running a lower-priority task
        • reschedule_idle()
     • Cache affinity
        • Affinity mask stored in /proc/pid/affinity
        • sched_setaffinity(), sched_getaffinity()
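The `sched_setaffinity()`/`sched_getaffinity()` system calls are exposed in Python's standard library as `os.sched_setaffinity` and `os.sched_getaffinity` (Linux only). A minimal sketch that pins the calling process to one CPU it is already allowed to use, then restores the original mask; `pin_to_one_cpu` is a hypothetical helper name.

```python
import os

def pin_to_one_cpu(pid=0):
    """Hypothetical helper: restrict `pid` (0 = the caller) to the lowest-numbered
    CPU it is already allowed to run on, and return the resulting mask."""
    allowed = os.sched_getaffinity(pid)         # set of CPU numbers, via sched_getaffinity(2)
    os.sched_setaffinity(pid, {min(allowed)})   # via sched_setaffinity(2)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    print("allowed CPUs:", sorted(original))
    print("pinned to:   ", sorted(pin_to_one_cpu()))
    os.sched_setaffinity(0, original)           # restore the original mask
```

On current kernels the same mask is also reported as `Cpus_allowed_list` in /proc/<pid>/status.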

  11. What is SMT?
     [Figure: ALU, FPU, BP, and Mem units over time, used by Thread 1 and Thread 2]
     • Simultaneous Multithreading
        • Intel's implementation is marketed as Hyper-Threading®
     • Issues instructions from multiple threads in the same cycle on a superscalar processor
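A toy Python model of issuing from several threads at once, under simplifying assumptions of our own: each of the four units (ALU, FPU, BP, Mem) accepts one instruction per cycle, every instruction takes one cycle, and each thread issues in order, stopping at its first instruction whose unit is already taken that cycle. `issue_cycles` is a hypothetical name.

```python
UNITS = ("ALU", "FPU", "BP", "Mem")

def issue_cycles(threads):
    """Toy model: cycles needed to drain all instruction streams when every
    thread may issue in the same cycle. Each unit takes one instruction per
    cycle; threads issue in order, and thread 0 gets first pick each cycle."""
    assert all(op in UNITS for t in threads for op in t), "unknown unit"
    pcs = [0] * len(threads)
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        free = set(UNITS)
        for k, t in enumerate(threads):
            # Issue this thread's next instructions until one needs a busy unit.
            while pcs[k] < len(t) and t[pcs[k]] in free:
                free.remove(t[pcs[k]])
                pcs[k] += 1
        cycles += 1
    return cycles

t1 = ["ALU", "ALU", "Mem", "ALU"]
t2 = ["FPU", "FPU", "BP", "FPU"]
print(issue_cycles([t1]) + issue_cycles([t2]))  # 6: one thread after the other
print(issue_cycles([t1, t2]))                   # 3: both threads issuing together
```

Two ALU-only threads gain nothing: `issue_cycles([["ALU"] * 3, ["ALU"] * 3])` is 6, the same as running them back to back. That resource conflict is what symbiotic jobscheduling tries to avoid. The fixed thread order is a simplification; real fetch/issue policies rotate or prioritize threads.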

  12. Why SMT?
     [Figure: Thread 1 and Thread 2 appearing to the Operating System as Processor 1 and Processor 2]
     • A technique to exploit parallelism within and between programs with minimal additional chip resources
     • The operating system treats an SMT processor as two separate processors*

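  13. Issues With SMT Scheduling
     • *Not really separate processors:
        • The logical processors share the same caches
     • MP scheduling attempts to avoid idle processors, so an SMT-unaware scheduler may place two tasks on the logical processors of one physical CPU while another physical CPU sits idle
     • An SMT-aware scheduler must differentiate between physical and logical processors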
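The difference between an SMT-unaware and an SMT-aware choice of idle processor can be sketched in Python. The topology map (logical CPU to physical package) is a hypothetical input, here two physical CPUs with two logical CPUs each, and `pick_idle_cpu` is an illustrative name.

```python
def pick_idle_cpu(package_of, busy, smt_aware=True):
    """Choose a logical CPU for a newly runnable task.
    package_of: logical CPU id -> physical package id (assumed topology).
    busy: set of logical CPUs currently running something."""
    idle = [c for c in sorted(package_of) if c not in busy]
    if not idle:
        return None
    if smt_aware:
        busy_packages = {package_of[c] for c in busy}
        for c in idle:
            if package_of[c] not in busy_packages:
                return c   # whole physical CPU idle: no sharing of caches or units
    return idle[0]         # SMT-unaware, or no fully idle package: lowest idle CPU

topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(pick_idle_cpu(topology, busy={0}, smt_aware=False))  # 1, sibling of busy CPU 0
print(pick_idle_cpu(topology, busy={0}))                   # 2, on the idle package
```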

  14. Symbiotic Jobscheduling
     • Recent studies from U of Washington
        • Origin of early research into SMT
     • The OS coschedules jobs to run on the hardware threads
        • Number of coscheduled jobs ≤ SMT level (number of hardware threads)
        • Occasionally swap out the running set to ensure fairness

  15. Symbiotic Jobscheduling (cont.)
     • Shared system resources:
        • Functional units, caches, TLBs, etc.
     • Coscheduled jobs may interact well…
        • Few resource conflicts, high utilization
     • …or they may interact poorly
        • Many resource conflicts, lower utilization
     • The choice of coscheduled jobs can have a large impact on system performance

  16. Symbiotic Jobscheduling (cont.)
     • Improve symbiosis by coscheduling jobs that get along well
     • The SOS (Sample, Optimize, Symbios) jobscheduler alternates between two phases:
        • Sample: gather data on current performance
        • Symbios: run using the computed scheduling configuration

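  17. Symbiotic Jobscheduling (cont.)
     • Sample phase:
        • Periodically alter the coscheduled job mix
        • Record system utilization from the hardware performance counter registers
     • Symbios phase:
        • Run the job mix that had the highest utilization
     • Trade-off in how often to sample: frequent sampling adapts quickly to changes in job behavior but spends more time in possibly poor job mixes; infrequent sampling does the reverse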
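A Python sketch of the sample/optimize/symbios loop, under assumptions of our own: SMT level 2, an even number of jobs, and every pairing sampled (exhaustive sampling only scales to a handful of jobs). `measure_ipc` is a hypothetical stand-in for reading the hardware performance counters while a pair runs, and the IPC values in the usage example are made up for illustration, not measurements.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Pair = Tuple[str, str]

def pairings(jobs: List[str]) -> Iterator[List[Pair]]:
    """Yield every way to split an even-length job list into coscheduled pairs."""
    if not jobs:
        yield []
        return
    first, rest = jobs[0], jobs[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

def sos(jobs: List[str], measure_ipc: Callable[[Pair], float]):
    """Sample: run each candidate schedule and record its mean IPC.
    Optimize: pick the best. Symbios: the caller runs it until the next sample."""
    samples: Dict[Tuple[Pair, ...], float] = {}
    for schedule in pairings(jobs):
        samples[tuple(schedule)] = sum(measure_ipc(p) for p in schedule) / len(schedule)
    best = max(samples, key=samples.get)
    return list(best), samples

# Made-up combined IPC for each pair, for illustration only.
PAIR_IPC = {frozenset("AB"): 2.0, frozenset("AC"): 3.0, frozenset("AD"): 2.5,
            frozenset("BC"): 2.2, frozenset("BD"): 3.1, frozenset("CD"): 1.8}

best, samples = sos(["A", "B", "C", "D"], lambda pair: PAIR_IPC[frozenset(pair)])
print(best)  # [('A', 'C'), ('B', 'D')]
```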

  18. How to Measure Utilization?
     • IPC (instructions per cycle) is not necessarily the best predictor:
        • A process's IPC can vary widely over its execution
        • High-IPC threads may unfairly take system resources from low-IPC threads
     • Other predictors: few conflicts, high cache hit rate, diverse instruction mix
     • Balance: the schedule with the lowest deviation in IPC between its coschedules is considered best
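The "balance" predictor can be sketched as picking the schedule whose coschedules have the smallest spread in IPC, here measured with the population standard deviation (our choice of deviation measure). The sample values are illustrative only; both schedules average 2.0 IPC, so plain mean IPC could not tell them apart.

```python
from statistics import pstdev

def most_balanced(samples):
    """samples: schedule name -> IPC measured for each of its coschedules.
    Return the schedule whose coschedules differ least in IPC."""
    return min(samples, key=lambda name: pstdev(samples[name]))

samples = {"S1": [3.0, 1.0], "S2": [2.1, 1.9]}   # illustrative numbers only
print(most_balanced(samples))  # S2
```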

  19. What About Priorities?
     • The scheduler estimates the “natural” IPC of each job (its IPC when running alone)
     • If a high-priority job is not meeting its desired IPC, it is scheduled alone on the CPU
     • This provides a truer implementation of priority:
        • Conventional schedulers only guarantee proportional sharing of processor time and assume no interaction between jobs
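A back-of-envelope sketch of how often a high-priority job would need to run alone, under our own simplifying assumption that it runs at its natural IPC when alone and at a fixed shared IPC when coscheduled. Over a window of N timeslices with k solo slices, requiring the average to reach a fraction f of the natural IPC gives (k·IPC_nat + (N − k)·IPC_shared) / N ≥ f·IPC_nat, so k ≥ N·(f·IPC_nat − IPC_shared) / (IPC_nat − IPC_shared). `solo_slices_needed` is a hypothetical helper, not the mechanism from the cited work.

```python
import math

def solo_slices_needed(window, natural_ipc, shared_ipc, target_fraction):
    """Hypothetical helper: how many of `window` timeslices a high-priority job
    must run alone so its average IPC reaches target_fraction * natural_ipc,
    assuming it runs at natural_ipc alone and at shared_ipc when coscheduled."""
    assert 0 < target_fraction <= 1 and natural_ipc > 0
    goal = target_fraction * natural_ipc
    if shared_ipc >= goal:
        return 0                                  # coscheduling already meets the goal
    k = window * (goal - shared_ipc) / (natural_ipc - shared_ipc)
    return min(window, math.ceil(k - 1e-9))       # small tolerance for float rounding

print(solo_slices_needed(10, natural_ipc=2.0, shared_ipc=1.0, target_fraction=0.75))  # 5
```

Check: 5 solo slices at 2.0 and 5 shared slices at 1.0 average 1.5, which is exactly 0.75 of the natural IPC.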

  20. Another Priority Algorithm
     • Each cycle, SMT hardware chooses which threads to fetch instructions from, which determines what fills the issue queues
     • The scheduler can bias this fetch policy to give preference to high-priority threads
     • The hardware already exists; only minimal modifications are needed
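A sketch of a priority-biased fetch choice, assuming an ICOUNT-style base policy (fetch from the thread with the fewest instructions waiting in the front-end queues, a policy from the SMT literature) and a hypothetical `bias` knob that discounts a high-priority thread's count. The hardware mechanism evaluated in the cited work may differ in its details.

```python
def choose_fetch_thread(icount, high_priority, bias):
    """Return the thread to fetch from this cycle.
    icount[t]: instructions thread t has waiting in the front-end queues.
    high_priority[t]: True for high-priority threads (assumption).
    bias: queued instructions 'forgiven' to a high-priority thread (hypothetical knob)."""
    return min(range(len(icount)),
               key=lambda t: icount[t] - (bias if high_priority[t] else 0))

print(choose_fetch_thread([10, 6], [True, False], bias=0))  # 1: plain ICOUNT
print(choose_fetch_thread([10, 6], [True, False], bias=8))  # 0: bias favors high priority
```

With `bias=0` this reduces to plain ICOUNT; raising it shifts fetch bandwidth toward the high-priority thread without any change to how instructions are issued.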

  21. Symbiosis Performance Results
     • Without priorities:
        • Up to 17% improvement
     • Software-enforced priorities:
        • Up to 20%, average 8%
     • Hardware-based priorities:
        • Up to 30%, average 15%

  22. Linux 2.5 SMT Scheduling
     • An immediate reschedule is forced when an HT CPU is executing two idle processes
     • HT-aware affinity: processes prefer to stay on the same physical CPU
     • HT-aware load balancing: distinguish logical from physical CPUs when allocating resources
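HT-aware load balancing can be sketched by summing run-queue lengths per physical package before deciding where to pull work from. The topology and run-queue lengths below are hypothetical inputs, and `busiest_package` is an illustrative name; this is not the 2.5 kernel code.

```python
from collections import defaultdict

def busiest_package(package_of, nr_running):
    """Sum run-queue lengths over the logical CPUs of each physical package
    and return the most loaded package."""
    load = defaultdict(int)
    for cpu, n in nr_running.items():
        load[package_of[cpu]] += n
    return max(sorted(load), key=lambda pkg: load[pkg])  # ties go to the lowest id

topology = {0: 0, 1: 0, 2: 1, 3: 1}
queues = {0: 2, 1: 2, 2: 3, 3: 0}
print(max(queues, key=queues.get))        # 2: busiest logical CPU (on package 1)
print(busiest_package(topology, queues))  # 0: busiest physical CPU
```

The two answers differ: looking only at logical CPUs would pull work from package 1, while package 0 actually carries more load on its shared execution resources.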

  23. Conclusions
     • Intelligent allocation of resources can improve performance in parallel systems
     • Dynamic scheduling of processors in MP systems produces better utilization as processor speeds increase
     • Cache affinity can help improve throughput
     • Symbiotic coscheduling of tasks in SMT systems can improve average response time

  24. Resources
     • Kenneth Sevcik, “Characterizations of Parallelism in Applications and Their Use in Scheduling”
     • Raj Vaswani and John Zahorjan, “The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors”
     • Allan Snavely et al., “Symbiotic Jobscheduling with Priorities for a Simultaneous Multithreading Processor”
     • Linux MP cache affinity, http://www.tech9.net/rml/linux
     • Linux Hyperthreading Scheduler, http://www.kernel.org/pub/linux/kernel/people/rusty/Hyperthread_Scheduler_Modifications.html
     • Daniel Bovet and Marco Cesati, Understanding the Linux Kernel
