Multi-processor Scheduling • Two implementation choices • Single, global ready queue • Per-processor run queue • Which is better?
Queue-per-processor • Advantages of queue per processor • Promotes processor affinity (better cache locality) • Removes a centralized bottleneck • Which runs in global memory • Supported by default in Linux 2.6 • Java 1.6 support: a double-ended queue (java.util.Deque) • Use a bounded buffer per consumer • If nothing in a consumer’s queue, steal work from somebody else • If too much in the queue, push work somewhere else
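The deque-based stealing policy above can be sketched as follows. This is a hypothetical single-threaded illustration (class and method names are mine, not from the slides): each worker owns a `Deque`, pushes and pops its own tasks at the head, and an idle worker steals the oldest task from the tail of a victim's deque.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class WorkStealingDemo {
    // One deque per worker: the owner works at the head (LIFO, good cache
    // locality); idle workers steal the oldest task from a victim's tail.
    static List<Deque<Runnable>> queues =
            List.of(new ArrayDeque<>(), new ArrayDeque<>());

    static Runnable nextTask(int worker) {
        Runnable r = queues.get(worker).pollFirst();   // try own deque first
        if (r != null) return r;
        for (int victim = 0; victim < queues.size(); victim++) {
            if (victim == worker) continue;
            r = queues.get(victim).pollLast();         // steal from the tail
            if (r != null) return r;
        }
        return null;                                   // nothing to do
    }

    public static void main(String[] args) {
        // Worker 0 has all the work; worker 1 starts empty.
        for (int i = 0; i < 4; i++) {
            final int task = i;
            queues.get(0).addFirst(() -> System.out.println("task " + task));
        }
        // Worker 1 finds its own deque empty and steals worker 0's oldest task.
        nextTask(1).run();                             // prints "task 0"
        System.out.println("remaining in queue 0: " + queues.get(0).size());
    }
}
```

Stealing from the tail (rather than the head) is the standard design choice: it takes the task least likely to be in the owner's cache and minimizes contention with the owner's end of the deque.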
Thread Implementation Issues Andrew Whitaker
Where do Threads Come From? • A few choices: • The operating system • A user-mode library • Some combination of the two…
Option #1: Kernel Threads • Threads implemented inside the OS • Thread operations (creation, deletion, yield) are system calls • Scheduling handled by the OS scheduler • Described as “one-to-one” • One user thread mapped to one kernel thread • Every invocation of Thread.start() creates a kernel thread [Diagram: a process whose user threads each map to their own OS thread]
Option #2: User Threads • Implemented as a library inside a process • All operations (creation, destruction, yield) are normal procedure calls • Described as “many-to-one” • Many user-perceived threads map to a single OS process/thread [Diagram: a process whose user threads all map to one OS thread]
Process Address Space Review • Every process has a user stack and a program counter • In addition, each process has a kernel stack and program counter (not shown here) [Diagram: process address space with code (text segment) and PC, static data (data segment), heap (dynamically allocated memory), and stack with SP]
Threaded Address Space • Every thread always has its own user stack and program counter • This holds for both user and kernel threads • For user threads, there is only a single kernel stack, program counter, PCB, etc. [Diagram: user address space (same for user and kernel threads) with a separate stack and SP for threads 1–3, a shared heap and static data, and a PC per thread into the shared code segment]
User Threads vs. Kernel Threads • User threads are faster • Operations do not pass through the OS • But, user threads suffer from: • Lack of physical parallelism • Only run on a single processor! • Poor performance with I/O • A single blocking operation stalls the entire application • For these reasons, most (all?) major OS’s provide some form of kernel threads
When Would User Threads Be Useful? • The calculator? • The web server? • The Fibonacci GUI?
Option #3: Two-level Model • OS supports native multi-threading • And, a user library maps multiple user threads to a single kernel thread • “Many-to-many” • Potentially captures the best of both worlds • Cheap thread operations • Parallelism [Diagram: a process whose many user threads are multiplexed onto a smaller number of OS threads]
Problems with Many-to-Many Threads • Lack of coordination between user and kernel schedulers • “Left hand not talking to the right” • Specific problems • Poor performance • e.g., the OS preempts a thread holding a crucial lock • Deadlock • Given K kernel threads, at most K user threads can block • Other runnable threads are starved out!
Scheduler Activations, UW 1991 • Add a layer of communication between kernel and user schedulers • Examples: • Kernel tells user-mode that a task has blocked • User scheduler can re-use this execution context • Kernel tells user-mode that a task is ready to resume • Allows the user scheduler to alter the user-thread/kernel-thread mapping • Supported by newest release of NetBSD
Implementation Spilling Over into the Interface • In practice, programmers have learned to live with expensive kernel threads • For example, thread pools • Re-use a static set of threads throughout the lifetime of the program
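The thread-pool idea can be sketched with Java's standard `ExecutorService` (a minimal illustration; the slides do not name a specific API): a fixed set of kernel threads is created once and re-used for every submitted task, so the per-task cost of thread creation is paid only at startup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Two kernel threads are created once and re-used for all ten tasks.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < 10; i++) {
            pool.submit(completed::incrementAndGet);  // cheap: no new thread
        }

        pool.shutdown();                              // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);  // wait for the queue to drain
        System.out.println("completed: " + completed.get());
    }
}
```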
Locks • Used for implementing critical sections • Modern languages (Java, C#) implicitly acquire and release locks

```java
interface Lock {
    // only one thread allowed between an acquire and a release
    public void acquire();
    public void release();
}
```
Two Varieties of Locks • Spin locks • Threads busy wait until the lock is freed • Thread stays in the ready/running state • Blocking locks • Threads yield the processor until the lock is freed • Thread transitions to the blocked state
Why Use Spin Locks? • Spin locks can be faster • No context switching required • Sometimes, blocking is not an option • For example, in the kernel scheduler implementation • Spin locks are never used on a uniprocessor • Busy waiting only delays the lock holder from running and releasing the lock
Bogus Spin Lock Implementation • Multiple threads can acquire this lock!

```java
class SpinLock implements Lock {
    private volatile boolean isLocked = false;

    public void acquire() {
        while (isLocked) { ; }  // busy wait
        isLocked = true;        // not atomic with the test above!
    }

    public void release() {
        isLocked = false;
    }
}
```
Hardware Support for Locking • Problem: Lack of atomicity in testing and setting the isLocked flag • Solution: Hardware-supported atomic instructions • e.g., atomic test-and-set • Java conveniently abstracts these primitives (AtomicInteger, and friends)
Corrected Spin Lock

```java
class SpinLock implements Lock {
    private final AtomicBoolean isLocked = new AtomicBoolean(false);

    public void acquire() {
        // atomically get the old value and set a new value
        while (isLocked.getAndSet(true)) { ; }  // busy wait
    }

    public void release() {
        assert isLocked.get();
        isLocked.set(false);
    }
}
```
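The corrected lock can be exercised as follows (a standalone sketch; the demo class and the iteration count are mine): two threads increment a deliberately non-atomic counter, and because every increment happens between `acquire()` and `release()`, no updates are lost.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    static final AtomicBoolean isLocked = new AtomicBoolean(false);
    static int counter = 0;  // plain int, protected only by the spin lock

    static void acquire() { while (isLocked.getAndSet(true)) { ; } }
    static void release() { isLocked.set(false); }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                acquire();
                counter++;       // critical section
                release();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("counter = " + counter);  // no lost updates
    }
}
```

Without the lock, the interleaved `counter++` operations would lose updates, exactly the race the bogus implementation fails to prevent.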
Blocking Locks: Acquire Implementation • Atomically test-and-set locked status • If lock is already held: • Set thread state to blocked • Add PCB (task_struct) to a wait queue • Invoke the scheduler • Problem: must ensure thread-safe access to the wait queue!
Disabling Interrupts • Prevents the processor from being interrupted • Serves as a coarse-grained lock • Must be used with extreme care • No I/O or timers can be processed
Thread-safe Blocking Locks • Atomically test-and-set locked status • If lock is already held: • Set thread state to blocked • Disable interrupts • Add PCB (task_struct) to a wait queue • Invoke the scheduler • Next task re-enables interrupts
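A user-level analogue of these steps can be sketched in Java (my own illustration: user code cannot disable interrupts, so a thread-safe `ConcurrentLinkedQueue` stands in for the interrupt-protected wait queue, and `LockSupport.park()` stands in for "set state to blocked and invoke the scheduler"):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class BlockingLockDemo {
    static final AtomicBoolean locked = new AtomicBoolean(false);
    // Thread-safe wait queue: plays the role of the kernel's PCB wait list.
    static final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();
    static int counter = 0;

    static void acquire() {
        Thread me = Thread.currentThread();
        waiters.add(me);
        // Block (not spin) until we are at the head of the queue AND the
        // test-and-set succeeds; park() deschedules the calling thread.
        while (waiters.peek() != me || !locked.compareAndSet(false, true)) {
            LockSupport.park();
        }
        waiters.remove();  // we hold the lock now
    }

    static void release() {
        locked.set(false);
        Thread next = waiters.peek();
        if (next != null) LockSupport.unpark(next);  // wake the next waiter
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 50_000; i++) {
                acquire();
                counter++;       // critical section
                release();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("counter = " + counter);
    }
}
```

The `while` loop matters: `park()` may return spuriously or via a stale `unpark`, so the thread must re-check both conditions before entering the critical section, just as the kernel version must re-test the lock after being woken.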
Disabling Interrupts on a Multiprocessor • Disabling interrupts can be done locally or globally (for all processors) • Global disabling is extremely heavyweight • Linux: spin_lock_irq • Disable interrupts on the local processor • Grab a spin lock to lock out other processors
Preview For Next Week

```java
public class Example extends Thread {
    private static int x = 1;
    private static int y = 1;
    private static boolean ready = false;

    public static void main(String[] args) {
        Thread t = new Example();
        t.start();
        x = 2;
        y = 2;
        ready = true;
    }

    public void run() {
        while (!ready)
            Thread.yield();  // give up the processor
        System.out.println("x= " + x + " y= " + y);
    }
}
```
What Does This Program Print? • Answer: it’s a race condition. Many different outputs are possible • x=2, y=2 • x=1, y=2 • x=2, y=1 • x=1, y=1 • Or, the program may print nothing! • The ready loop runs forever
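Previewing the memory-model material, one fix (an assumption on my part; the slides defer the answer to next week) is to declare `ready` as `volatile`. In the Java memory model, the writer's volatile store to `ready` happens-before the reader's volatile load that observes it, which makes the earlier writes to `x` and `y` visible as well, so exactly one output is possible:

```java
public class Example extends Thread {
    private static int x = 1;
    private static int y = 1;
    // volatile: the write of true happens-before the read that sees it,
    // carrying the writes to x and y along with it
    private static volatile boolean ready = false;

    public static void main(String[] args) {
        Thread t = new Example();
        t.start();
        x = 2;
        y = 2;
        ready = true;   // published last, after x and y
    }

    public void run() {
        while (!ready)
            Thread.yield();                      // give up the processor
        System.out.println("x=" + x + " y=" + y); // always x=2 y=2
    }
}
```

The loop is also now guaranteed to terminate: without `volatile`, the reader was permitted to cache `ready` and spin forever.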