Process Scheduling in Multiprocessor and Multithreaded Systems Matt Davis CS535 4/7/2003
Outline • Multiprocessor Systems • Issues in MP Scheduling • How to Allocate Processors • Cache Affinity • Linux MP Scheduling • Simultaneous Multithreaded Systems • Issues in SMT Scheduling • Symbiotic Jobscheduling • SMT and Priorities • Linux SMT Scheduling • Conclusions
Multiprocessor Systems • Symmetric Multiprocessing (SMP): • One copy of the OS in memory; any CPU can run it • OS must ensure that multiple processors cannot access shared data structures at the same time • [Figure: shared-memory multiprocessor, several CPUs connected to one shared memory]
Issues in MP Scheduling • Starvation • Number of active parallel threads < number of allocated processors • Overhead • CPU time used to transfer and start various portions of the application • Contention • Multiple threads attempt to use same shared resource • Latency • Delay in communication between processors and I/O devices
How to allocate processors • Allocate proportional to average parallelism • Other factors: • System load • Variable parallelism • Min/Max parallelism • Acquire/relinquish processors based on current program needs
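The proportional rule above can be sketched in a few lines. This is an illustrative model, not a real scheduler; the job data and the cap at a job's own parallelism are assumptions for the example.

```python
def allocate_proportional(avg_parallelism, total_procs):
    """Split total_procs among jobs in proportion to each job's
    average parallelism, capping a job at its own parallelism."""
    total = sum(avg_parallelism.values())
    alloc = {}
    for job, par in avg_parallelism.items():
        share = round(total_procs * par / total)
        alloc[job] = min(max(share, 1), par)  # at least 1, at most its parallelism
    return alloc

jobs = {"A": 8, "B": 4, "C": 4}   # average parallelism per job (hypothetical)
print(allocate_proportional(jobs, 8))  # {'A': 4, 'B': 2, 'C': 2}
```

Acquiring and relinquishing processors dynamically would re-run an allocation like this whenever a job's current parallelism changes.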
Cache Affinity • While a program runs, data needed is placed in local cache • When job is rescheduled, it will likely access some of the same data • Scheduling jobs where they have “affinity” improves performance by reducing cache penalties
Cache Affinity (cont) • Tradeoff between processor reallocation and cost of reallocation • Utilization versus cache behavior • Scheduling policies: • Equipartition: constant number of processors allocated evenly to all jobs. Low overhead. • Dynamic: constantly reallocates jobs to maximize utilization. High utilization.
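A minimal sketch of the equipartition policy, assuming the job set is known up front (the leftover-processor rule is an assumption for the example):

```python
def equipartition(jobs, total_procs):
    """Give each job an equal share of processors (leftovers go to the
    earliest jobs); the allocation stays constant while the job set does."""
    n = len(jobs)
    base, extra = divmod(total_procs, n)
    return {job: base + (1 if i < extra else 0) for i, job in enumerate(jobs)}

print(equipartition(["A", "B", "C"], 8))  # {'A': 3, 'B': 3, 'C': 2}
```

A dynamic policy would instead recompute this split whenever a job blocks or becomes runnable, trading reallocation overhead for utilization.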
Cache Affinity (cont) • Vaswani and Zahorjan, 1991 • When a processor becomes available, allocate it to the runnable process that last ran on that processor, or to a higher-priority job • If a job requests additional processors, allocate critical tasks on the processor with the highest affinity • If an allocated processor becomes idle, hold it for a small amount of time in case a task with affinity comes along
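The first rule above can be sketched as a selection function. This is a toy model of the policy, not the paper's implementation; the `last_cpu` and `priority` fields are hypothetical.

```python
def pick_next(proc_id, runnable):
    """When proc_id becomes free, prefer a runnable job whose last run
    was on this processor (cache affinity); otherwise fall back to the
    highest-priority job."""
    with_affinity = [j for j in runnable if j["last_cpu"] == proc_id]
    pool = with_affinity or runnable
    return max(pool, key=lambda j: j["priority"]) if pool else None

runnable = [
    {"name": "x", "last_cpu": 1, "priority": 5},
    {"name": "y", "last_cpu": 0, "priority": 9},
]
print(pick_next(1, runnable)["name"])  # "x": affinity beats raw priority here
```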
Vaswani and Zahorjan, 1991 • Results showed that utilization, not cache affinity, was the dominant effect on performance • But their affinity algorithm did not degrade performance • Predicted that as processor speeds increase, the significance of cache affinity will also increase • Later studies validated their predictions
Linux 2.5 MP Scheduling • Each processor responsible for scheduling own tasks • schedule() • After process switch, check if new process should be transferred to other CPU running lower priority task • reschedule_idle() • Cache affinity • Affinity mask stored in /proc/pid/affinity • sched_setaffinity(), sched_getaffinity()
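On Linux, the affinity syscalls named above are exposed to Python as `os.sched_getaffinity` and `os.sched_setaffinity`; a small sketch of reading the mask and pinning the current process (guarded, since the calls are Linux-only):

```python
import os

# os.sched_getaffinity / os.sched_setaffinity wrap the Linux affinity
# syscalls from the slide; availability is checked at runtime.
if hasattr(os, "sched_getaffinity"):
    mask = os.sched_getaffinity(0)        # CPUs this process may run on
    print(f"allowed CPUs: {sorted(mask)}")
    os.sched_setaffinity(0, {min(mask)})  # pin to the lowest-numbered CPU
    print(f"pinned to: {sorted(os.sched_getaffinity(0))}")
    os.sched_setaffinity(0, mask)         # restore the original mask
else:
    print("sched_*affinity not available on this platform")
```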
What is SMT? • Simultaneous Multithreading • aka HyperThreading® • Issue instructions from multiple threads simultaneously on a superscalar processor • [Figure: two threads' instructions filling ALU, FPU, branch, and memory issue slots over time]
Why SMT? • Technique to exploit parallelism within and between programs with minimal additions in chip resources • Operating system treats an SMT processor as two separate processors* • [Figure: the OS sees one SMT core as two logical processors, one per hardware thread]
Issues With SMT Scheduling • *Not really separate processors: • Share same caches • MP scheduling attempts to avoid idle processors • SMT-aware scheduler must differentiate between physical and logical processors
Symbiotic Jobscheduling • Studies from the University of Washington, where much of the early SMT research originated • OS coschedules jobs to run on hardware threads • # of coscheduled jobs <= SMT level • Occasionally swap out the running set to ensure fairness
Symbiotic Jobscheduling (cont) • Shared system resources: • Functional units, caches, TLB’s, etc… • Coscheduled jobs may interact well… • Few resource conflicts, high utilization • Or they may interact poorly • Many resource conflicts, lower utilization • Choice of coscheduled jobs can have large impact on system performance
Symbiotic Jobscheduling (cont) • Improve symbiosis by coscheduling jobs that get along well • Two phases of SOS (Sample, Optimize, Symbios) jobscheduler: • Sample – Gather data on current performance • Symbios – Use computed scheduling configuration
Symbiotic Jobscheduling (cont) • Sample phase: • Periodically alter coscheduled job mix • Record system utilization from hardware performance counter registers • Symbios phase: • Pick job mix that had the highest utilization • Trade-off between sampling often or infrequently
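The sample/symbios loop above can be sketched as follows. The `measure` callback stands in for reading hardware performance counters, and the synthetic utilization numbers are assumptions for the example.

```python
import random

def sos_schedule(jobs, smt_level, sample_rounds, measure):
    """Sketch of the SOS loop: try random coschedules for a sampling
    period, record the utilization each achieves, then keep the best
    mix for the symbios phase."""
    best_mix, best_util = None, -1.0
    for _ in range(sample_rounds):              # sample phase
        mix = tuple(random.sample(jobs, smt_level))
        util = measure(mix)
        if util > best_util:
            best_mix, best_util = mix, util
    return best_mix                             # run this mix in symbios phase

def measure(mix):
    # Synthetic utilization: mixes of int- and fp-heavy jobs conflict less.
    kinds = {name[:-1] for name in mix}
    return 0.9 if len(kinds) == 2 else 0.5

random.seed(0)
best = sos_schedule(["int1", "int2", "fp1", "fp2"], 2, 20, measure)
print(best)  # a mixed int/fp pair wins
```

Sampling more often tracks phase changes in the workload better but spends more time in (possibly bad) trial coschedules; that is the trade-off on the slide.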
How to Measure Utilization? • IPC not necessarily best predictor: • IPC can have high variations throughout process • High-IPC threads may unfairly take system resources from low-IPC threads • Other predictors: low # conflicts, high cache hit rate, diverse instruction mix • Balance: schedule with lowest deviation in IPC between coschedules is considered best
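The balance heuristic above, picking the coschedule with the smallest spread in per-thread IPC, can be sketched directly (the IPC figures below are hypothetical):

```python
def best_balanced(coschedules, ipc):
    """Return the coschedule whose threads have the smallest spread in
    measured IPC; `ipc` maps (schedule, thread) to a measured value."""
    def spread(sched):
        vals = [ipc[(sched, t)] for t in sched]
        return max(vals) - min(vals)
    return min(coschedules, key=spread)

ipc = {
    (("a", "b"), "a"): 1.2, (("a", "b"), "b"): 1.1,   # balanced pair
    (("a", "c"), "a"): 1.8, (("a", "c"), "c"): 0.4,   # "a" starves "c"
}
print(best_balanced([("a", "b"), ("a", "c")], ipc))  # ('a', 'b')
```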
What About Priorities? • Scheduler estimates the “natural” IPC of each job • If a high-priority job is not meeting the desired IPC, it is scheduled exclusively on the CPU • Provides a truer implementation of priority: • Normal schedulers only guarantee proportional resource sharing, assuming no interaction between jobs
Another Priority Algorithm: • SMT hardware fetches instructions to issue from queue • Scheduler can bias fetching algorithm to give preference to high-priority threads • Hardware already exists, minimal modifications
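A software model of that fetch bias is weighted round-robin over the hardware threads, where a thread's share of fetch slots tracks its priority weight. This only models the hardware behavior; the weights and thread names are assumptions for the example.

```python
from itertools import islice

def biased_fetch(weights):
    """Infinite generator of thread ids in which each thread's share of
    fetch slots is proportional to its weight (weighted round-robin)."""
    counters = {t: 0.0 for t in weights}
    while True:
        for t in counters:
            counters[t] += weights[t]
        pick = max(counters, key=counters.get)    # thread furthest ahead
        counters[pick] -= sum(weights.values())
        yield pick

# A 3:1 bias toward the high-priority thread over 8 fetch slots.
slots = list(islice(biased_fetch({"hi": 3, "lo": 1}), 8))
print(slots)  # 6 "hi" slots, 2 "lo" slots
```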
Symbiosis Performance Results • Without priorities: • Up to 17% improvement • Software-enforced priorities: • Up to 20%, average 8% • Hardware-based priorities: • Up to 30%, average 15%
Linux 2.5 SMT Scheduling • Immediate reschedule forced when HT CPU is executing two idle processes • HT-aware affinity: processes prefer same physical CPU • HT-aware load-balancing: distinguish logical and physical CPU in resource allocation
Conclusions • Intelligent allocation of resources can improve performance in parallel systems • Dynamic scheduling of processors in MP systems produces better utilization as processor speeds increase • Cache affinity can help improve throughput • Symbiotic coscheduling of tasks in SMT systems can improve average response time
Resources • Kenneth Sevcik, “Characterizations of Parallelism in Applications and Their Use in Scheduling” • Raj Vaswani and John Zahorjan, “The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors” • Allan Snavely et al., “Symbiotic Jobscheduling with Priorities for a Simultaneous Multithreading Processor” • Linux MP cache affinity, http://www.tech9.net/rml/linux • Linux Hyperthreading Scheduler, http://www.kernel.org/pub/linux/kernel/people/rusty/Hyperthread_Scheduler_Modifications.html • Daniel Bovet and Marco Cesati, Understanding the Linux Kernel