This article explores the difficulties and solutions related to multiprocessor scheduling in the context of modern computer systems. Topics covered include cache affinity, single-queue scheduling, multi-queue scheduling, and Linux multiprocessor schedulers.
Multiprocessor Scheduling (Advanced) Minwoo Kwak (mwkwak@davinci.snu.ac.kr) School of Computer Science and Engineering Seoul National University
Introduction
• After years of existence only at the high end of the computing spectrum, multiprocessor systems are increasingly commonplace.
• Ex) desktops, laptops, mobile devices
• Many difficulties arise with the arrival of more than a single CPU.
• Just adding more CPUs does not make a single application run faster.
• The multiprocessor scheduling issue must also be addressed.
• CRUX: How to schedule jobs on multiple CPUs
• How should the OS schedule jobs on multiple CPUs?
• What new problems arise?
• Do the same old techniques work, or are new ideas required?
Background: Uniprocessor Architecture
• To understand the new issues in multiprocessor scheduling, the difference between single-CPU and multi-CPU hardware should be examined.
• The difference centers on the use of hardware caches and exactly how data is shared.
• Single CPU with cache: a small & fast cache sits between the CPU and the large & slow memory.
• Ex) A program that issues an explicit load instruction to fetch a value:
1. Fetch the data from memory (this takes a long time).
2. Put a copy of the loaded data into the cache for the next use.
3. On later accesses, check the cache first and fetch the data from the cache (this takes a short time).
Background: Multiprocessor Architecture
• Multiple CPUs with caches sharing one memory: CPU0 and CPU1 each have a small & fast cache; the large & slow memory holds a value D at address A.
• Ex) A program that issues load and store instructions:
1. The program running on CPU0 reads a data item (with value D) at address A.
2. The program then modifies the value at address A to D' (updating only its cache).
3. The OS decides to stop running the program and move it to CPU1.
4. CPU1 re-reads the value at A and gets the old value D instead of the correct value D'.
• General problem: cache coherence
• A basic solution is provided by the hardware (i.e., bus snooping):
• Write-Invalidate
• Write-Broadcast (Update)
Don’t Forget Synchronization
• When accessing shared data items across CPUs, mutual exclusion primitives (i.e., locks) should be used to guarantee correctness.
• Simple example: removing an element from a shared linked list:
int List_Pop() {
    Node_t *tmp = head;
    int value = head->value;
    head = head->next;
    free(tmp);
    return value;
}
• If thread1 and thread2 both enter this routine at the same time, both can read the same head node, and free(tmp) then runs twice on that node: a double free.
• Solution: locking
• Allocate a simple mutex, then add a lock at the beginning of the routine and an unlock at the end.
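The locking solution above can be sketched as follows. This is a minimal illustration, not the lecture's exact code: the Node_t definition, the List_Push helper, and the global names are assumptions added so the sketch is self-contained.

```c
#include <pthread.h>
#include <stdlib.h>

/* Illustrative list node; the real slide only shows List_Pop. */
typedef struct Node_t { int value; struct Node_t *next; } Node_t;

static Node_t *head = NULL;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper so the sketch can be exercised. */
void List_Push(int value) {
    Node_t *n = malloc(sizeof(Node_t));
    n->value = value;
    pthread_mutex_lock(&list_lock);   /* lock at the beginning... */
    n->next = head;
    head = n;
    pthread_mutex_unlock(&list_lock); /* ...unlock at the end */
}

int List_Pop(void) {
    pthread_mutex_lock(&list_lock);   /* only one thread manipulates head */
    Node_t *tmp = head;
    int value = head->value;
    head = head->next;
    pthread_mutex_unlock(&list_lock);
    free(tmp);  /* tmp is now private to this thread: no double free */
    return value;
}
```

With the mutex held across the head manipulation, two threads can no longer grab the same node, which eliminates the double-free race shown above.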
Cache Affinity
• One final issue arises in building a multiprocessor scheduler, known as cache affinity.
• “If possible, run me on the same CPU as before.”
• A process, when run on a particular CPU, builds up a fair bit of state in the caches (and TLBs) of that CPU.
• The next time the process runs, it is often advantageous to run it on the same CPU, since some of its state is already present in that CPU's caches.
• If, instead, a process runs on a different CPU each time, its performance will be worse, as it must reload its state every time.
• A multiprocessor scheduler should therefore consider cache affinity when making scheduling decisions, preferring to keep a process on the same CPU if at all possible.
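On Linux, a process can also request affinity explicitly via sched_setaffinity. A minimal, Linux-specific sketch (the pin_to_cpu wrapper is an illustrative name, not part of the lecture):

```c
#define _GNU_SOURCE     /* needed for CPU_ZERO/CPU_SET and sched_setaffinity */
#include <sched.h>

/* Pin the calling process to one CPU so its cache/TLB state stays warm.
 * Returns 0 on success, -1 on failure (as sched_setaffinity does). */
int pin_to_cpu(int cpu) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask); /* pid 0 = self */
}
```

This hands the affinity decision to the application; a good scheduler tries to achieve the same effect automatically.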
Single-Queue Scheduling
• Single-Queue Multiprocessor Scheduling (SQMS): all jobs sit in one shared queue.
• Ex) Five jobs (A–E) on four CPUs, each CPU grabbing the next job from the queue:
Queue: A B C D E
CPU0: A E D C B … (repeat) …
CPU1: B A E D C … (repeat) …
CPU2: C B A E D … (repeat) …
CPU3: D C B A E … (repeat) …
• Advantage: simplicity
• Shortcomings
• Lack of scalability: the lock protecting the single queue can greatly reduce performance as the # of CPUs grows.
• Cache affinity issue: each job bounces from CPU to CPU instead of staying put.
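The SQMS idea can be sketched in a few lines. This is an assumed illustration (queue contents, names, and the round-robin rotation are ours): every CPU calls the same pick function, and the single lock it takes is exactly the scalability bottleneck noted above.

```c
#include <pthread.h>

/* One shared queue of jobs, protected by one lock. */
enum { NJOBS = 5 };
static char job_queue[NJOBS] = { 'A', 'B', 'C', 'D', 'E' };
static int next_job = 0;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called by any CPU that needs work: hand out jobs round-robin.
 * Every CPU contends on queue_lock, which hurts as CPUs are added. */
char sqms_pick(void) {
    pthread_mutex_lock(&queue_lock);
    char job = job_queue[next_job];
    next_job = (next_job + 1) % NJOBS;
    pthread_mutex_unlock(&queue_lock);
    return job;
}
```

Note also that nothing here ties a job to a CPU, which is why SQMS jobs bounce around and lose cache affinity.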
Multi-Queue Scheduling
• Multi-Queue Multiprocessor Scheduling (MQMS): one scheduling queue per CPU.
• Ex) A and C on Queue0 (CPU0), B and D on Queue1 (CPU1):
CPU0: A A C C A A C C A A C C …
CPU1: B B D D B B D D B B D D …
• Each scheduling queue follows a particular discipline (e.g., round robin).
• When a job enters the system, it is placed on exactly one scheduling queue, according to some heuristic (e.g., random, or picking the queue with fewer jobs than the others).
• Each queue is scheduled essentially independently, thus avoiding the problems of information sharing and synchronization found in the single-queue approach.
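The "pick the queue with fewer jobs" placement heuristic mentioned above can be sketched as follows (queue count, names, and the tie-breaking rule are illustrative assumptions):

```c
/* Track only the length of each per-CPU queue for this sketch. */
enum { NQUEUES = 2 };
static int queue_len[NQUEUES] = { 0, 0 };

/* Place an arriving job on the shortest queue (ties go to the
 * lowest-numbered queue) and return the chosen queue's index. */
int mqms_place(void) {
    int best = 0;
    for (int q = 1; q < NQUEUES; q++)
        if (queue_len[q] < queue_len[best])
            best = q;
    queue_len[best]++;
    return best;
}
```

Because placement is the only moment queues interact, each queue can afterwards be scheduled independently, with no shared lock on the hot path.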
Load Balancing on MQMS
• Multi-Queue Multiprocessor Scheduling (MQMS)
• Advantages: scalability, cache affinity
• Shortcoming: load imbalance
• Example: after one of the jobs (say C) finishes, Queue0 holds only A while Queue1 still holds B and D:
CPU0: A A A A A A …
CPU1: B D B D B D …
A gets twice as much CPU as B or D. After both A and C finish, Queue0 is empty and CPU0 sits idle:
CPU0: (idle) …
CPU1: B D B D B D …
• Solution: migrate jobs across queues to rebalance the load, e.g., move B over to Queue0:
CPU0: B B B B …
CPU1: D D D D …
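One common way to implement this migration is work stealing: a queue that runs out of jobs takes some from a fuller peer. A minimal sketch under assumed names (two queues, tracked only by length):

```c
/* Per-queue job counts for this two-queue sketch. */
enum { NQUEUES = 2 };
static int queue_len[NQUEUES];

/* If queue q is empty and the other queue has more than one job,
 * steal half of the other queue's jobs to fix the imbalance. */
void steal_if_idle(int q) {
    int other = 1 - q;  /* only valid for the two-queue case */
    if (queue_len[q] == 0 && queue_len[other] > 1) {
        int moved = queue_len[other] / 2;
        queue_len[other] -= moved;
        queue_len[q]     += moved;
    }
}
```

Stealing too often reintroduces the synchronization costs MQMS avoids, so real schedulers check peer queues only occasionally.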
Linux Multiprocessor Schedulers
• In the Linux community, no common solution emerged for building a multiprocessor scheduler.
• Over time, three schedulers arose:
• The O(1) scheduler
• A priority-based scheduler that uses multiple queues
• It changes a process's priority over time, then schedules those with the highest priority, in order to meet various scheduling objectives.
• The Completely Fair Scheduler (CFS)
• A deterministic proportional-share approach that uses multiple queues
• The BrainF*ck scheduler (BFS)
• A proportional-share approach that uses only a single queue
Summary • Multiprocessor systems are increasingly commonplace. • How to schedule jobs on multiple CPUs is an important issue. • There are various approaches to multiprocessor scheduling. • Building a general purpose scheduler remains a daunting task, as small code changes can lead to large behavioral differences.