
User-level Techniques on uniprocessors and SMPs Focusing on Thread Scheduling

User-level Techniques on uniprocessors and SMPs Focusing on Thread Scheduling. CSM211 Kurt Debattista. Literature. Any good operating system book – dinosaur book Vahalia U. Unix Internals. Prentice Hall, 1996


Presentation Transcript


  1. User-level Techniques on uniprocessors and SMPs: Focusing on Thread Scheduling • CSM211 • Kurt Debattista

  2. Literature • Any good operating system book – dinosaur book • Vahalia U. Unix Internals. Prentice Hall, 1996 • Moores J. CCSP - A portable CSP-based run-time system supporting C and occam. Volume 57, Concurrent Systems Engineering Series, pages 147-168, IOS Press, April 1999 • System’s software research group http://www.cs.um.edu.mt/~ssrg • Literature on thread scheduling on uniprocessors and SMPs, and on avoiding blocking system calls (Vella, Borg, Cordina, Debattista)

  3. Overview • Processes • Threads • Kernel threads • User threads • User-level Memory Management • User-level Thread Scheduling • Multiprocessor hardware • SMP synchronisation • SMP thread schedulers

  4. Time

  5. Processes • UNIX-like operating systems provide concurrency by time-slicing long-lived, infrequently communicating processes • Concurrency is usually an illusion (uniprocessor) • Parallelism (SMPs, distributed systems) • Within a single application?

  6. Processes – Vertical switch • Every time we enter the kernel (system call) we incur a vertical switch • Save current context • Change to kernel stack • Change back

  7. Processes – Horizontal switch • Context switch from one process to another – horizontal switch • Enter kernel (vertical switch) • Dispatch next process • Change memory protection boundaries • Restore new process context

  8. Processes - Creation • Creating a new process • Enter kernel (vertical switch) • Allocate memory for all process structures • Initialise new tables • Update file tables • Copy parent process context • All operations on processes take in the order of hundreds of microseconds, and sometimes milliseconds

  9. Multithreading • Programming style used to represent concurrency within a single application • A thread is an independent instance of execution of a program represented by a PC, a register set (context) and a stack

  10. Multithreading (2) • In multithreaded applications threads co-exist in the same address space • A traditional process could be considered an application composed of a single thread • Example – web server • Share the same data between threads • Spawn threads • Communicate through shared memory

  11. Multithreading (3) • Two types of threads • Kernel-level threads • User-level threads

  12. Kernel threads • Kernel threads are threads that the kernel is aware of • The kernel is responsible for the creation of the threads and schedules them just like any other process

  13. Kernel threads (2) • Advantages • Concurrency within a single application • No memory boundaries • IPC through shared memory avoiding kernel access • Kernel interaction • Disadvantages • Thread creation (vertical switch) • Context switch (horizontal switch) • Kernel interaction

  14. Kernel threads (3) • Thread management in the order of tens of microseconds (sometimes hundreds) • Creation • Context switch • Kernel based IPC • Fast shared memory IPC in the order of hundreds (even tens) of nanoseconds

  15. User threads • Kernel is unaware of threads • All scheduling takes place at the user level • All scheduler data structures exist in the user-level address space

  16. User-level Thread Scheduling

  17. User-level thread scheduling • Library • Pre-emption and cooperative multithreading • Pre-emption (like UNIX) • Cooperative (think Windows 3.1) • Performance in the order of tens of nanoseconds

  18. User-level thread library • Application is linked with a thread library that manages threads at run-time • Library launches the main() function and is responsible for all thread management

  19. Fast multithreaded C library • Scheduler initialisation and shutdown • Structures for scheduler • Run queue(s) • Thread descriptor • Communication constructs • Provide functions for thread creation, execution, IPC, yield, destroy

  20. Scheduler initialisation • The traditional main() function is part of the library; the application programmer uses cthread_main() as the main function • Initialise all scheduler data structures • cthread_main() is itself a thread

  21. Scheduler structures • Thread run queue • Fast FIFO queue • Priority based scheduling • Multiple queues • Priority queue • Thread descriptor • PC and set of registers (jmp_buf) • Stack (pointer to chunk of memory)
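
The two structures named on this slide can be sketched in a few lines of C. This is only an illustration under assumptions: the type and function names (`thread_t`, `run_queue_t`, `rq_enqueue`, `rq_dequeue`) are hypothetical and are not taken from the library described in the slides. It shows a thread descriptor holding a `jmp_buf` and a stack pointer, and a singly linked FIFO run queue with constant-time enqueue and dequeue.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical thread descriptor: saved context plus a private stack. */
typedef struct thread {
    jmp_buf        context;     /* saved PC and register set */
    void          *stack;       /* pointer to this thread's chunk of stack memory */
    size_t         stack_size;
    struct thread *next;        /* link used by the run queue */
} thread_t;

/* Singly linked FIFO run queue: O(1) enqueue at the tail, O(1) dequeue at the head. */
typedef struct {
    thread_t *head;
    thread_t *tail;
} run_queue_t;

static void rq_init(run_queue_t *q)
{
    q->head = q->tail = NULL;
}

static void rq_enqueue(run_queue_t *q, thread_t *t)
{
    assert(q != NULL && t != NULL);
    t->next = NULL;
    if (q->tail != NULL)
        q->tail->next = t;      /* append behind the current tail */
    else
        q->head = t;            /* queue was empty */
    q->tail = t;
}

/* Returns the oldest runnable thread, or NULL when the queue is empty. */
static thread_t *rq_dequeue(run_queue_t *q)
{
    thread_t *t = q->head;
    if (t != NULL) {
        q->head = t->next;
        if (q->head == NULL)
            q->tail = NULL;     /* removed the last element */
        t->next = NULL;
    }
    return t;
}
```

Priority-based scheduling could, for instance, keep one such queue per priority level and always dequeue from the highest non-empty one.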

  22. Communication structures • Structures for communications • Semaphores • Channels • Barrier • others

  23. Thread API • Provide functions for threads • Initialisation - cthread_init() • Execution – cthread_run() • Yield – cthread_yield() • Barrier – cthread_join() • Termination - automatic
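
To make the create / run / yield cycle concrete, here is a minimal cooperative scheduler sketch. It is not the cthread library from the slides: the `ult_*` names, the fixed thread table and the two sample workers are all hypothetical, and it deliberately uses the POSIX `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`) instead of `setjmp()`/`longjmp()`, because they give each thread its own stack without machine-specific code. It assumes a system that still ships `<ucontext.h>`, such as Linux with glibc.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define MAX_THREADS 4
#define STACK_SIZE  (64 * 1024)

static ucontext_t sched_ctx;                 /* the scheduler's own context */
static ucontext_t thread_ctx[MAX_THREADS];   /* one saved context per thread */
static void      *thread_stack[MAX_THREADS];
static int        finished[MAX_THREADS];
static int        nthreads;
static int        current = -1;              /* id of the running thread */
static int        yielded;                   /* set when a thread yields rather than returns */

static char trace[32];                       /* execution order, for inspection */
static int  trace_len;

static void record(char c)
{
    if (trace_len < (int)sizeof trace - 1)
        trace[trace_len++] = c;
}

/* Create a thread that will run fn() on its own stack; returns its id. */
static int ult_create(void (*fn)(void))
{
    assert(nthreads < MAX_THREADS);
    int id = nthreads++;
    thread_stack[id] = malloc(STACK_SIZE);
    if (thread_stack[id] == NULL || getcontext(&thread_ctx[id]) != 0)
        abort();
    thread_ctx[id].uc_stack.ss_sp   = thread_stack[id];
    thread_ctx[id].uc_stack.ss_size = STACK_SIZE;
    thread_ctx[id].uc_link          = &sched_ctx;   /* resume scheduler when fn returns */
    makecontext(&thread_ctx[id], fn, 0);
    return id;
}

/* Voluntarily hand the CPU back to the scheduler. */
static void ult_yield(void)
{
    assert(current >= 0);
    yielded = 1;
    swapcontext(&thread_ctx[current], &sched_ctx);
}

/* Round-robin over all threads until every one has returned. */
static void ult_run(void)
{
    int remaining = nthreads;
    while (remaining > 0) {
        for (int i = 0; i < nthreads; i++) {
            if (finished[i])
                continue;
            current = i;
            yielded = 0;
            swapcontext(&sched_ctx, &thread_ctx[i]);
            if (!yielded) {                 /* came back through uc_link: thread ended */
                finished[i] = 1;
                free(thread_stack[i]);
                remaining--;
            }
        }
    }
    current = -1;
}

/* Two sample workers that each yield once. */
static void worker_a(void) { record('a'); ult_yield(); record('A'); }
static void worker_b(void) { record('b'); ult_yield(); record('B'); }
```

Creating `worker_a` and `worker_b` and then calling `ult_run()` interleaves them at their yield points. No system call is needed to decide which thread runs next, although `swapcontext()` itself may enter the kernel to save the signal mask, which is one reason hand-written switches can be faster.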

  24. Context switch • Use functions setjmp() and longjmp() to save and restore context (jmp_buf) • setjmp() saves the current context • longjmp() restores the context
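
The save/restore behaviour of the two calls can be seen in isolation with a small sketch. The function and variable names here are illustrative only. Note that standard C only allows `longjmp()` back into a function that is still active, so a real thread library must also give each thread its own stack; that part is platform-specific and is not shown.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf saved;       /* context captured by setjmp() */
static int     steps_run;   /* how many statements of worker() executed */

static void worker(void)
{
    steps_run++;
    longjmp(saved, 1);      /* restore the saved context; setjmp() now yields 1 */
    steps_run++;            /* never executed */
}

/* Returns 1 if control came back through longjmp(), 0 otherwise. */
static int save_and_restore(void)
{
    if (setjmp(saved) == 0) {       /* direct call: context saved, returns 0 */
        worker();
        assert(0 && "not reached: worker() never returns normally");
        return 0;
    }
    return 1;                       /* second return, triggered by longjmp() */
}
```

`setjmp()` appears to return twice: once when the context is saved, and again when `longjmp()` restores it.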

  25. User-level Thread Scheduling • Thread scheduling is abstracted from the kernel • Thread management occurs at the user level • No expensive system calls • Mini-kernel on top of the OS kernel • Ideal for fine-grained multithreading

  26. User-level Thread Schedulers • Successful user-level thread schedulers exist in the form of • CCSP • KRoC • MESH • Mach Cthreads • smash • Sun OS threads

  27. Thread Management Issues • Interaction with operating system • Blocking kernel threads • Multiplexing kernel threads • Scheduler activations (Anderson) • System call wrappers • See Borg’s thesis for more info. (SSRG homepage) • Thread co-operation • Automatic yield insertion (Barnes) • Active context switching (Moores)

  28. Blocking Kernel Threads • In single kernel-threaded schedulers, when the kernel thread blocks on a blocking system call the entire scheduler blocks • Multiplexing kernel threads • Reduces the problem (though the problem is still there) • Increases the amount of horizontal switching

  29. Blocking Kernel Threads (2) • Partial solutions • Allocate kernel thread for particular functions (e.g. Keyboard I/O) • Horizontal switching • System Call Wrappers • Wrap all (or required) blocking system calls with wrappers that launch a separate kernel thread • Require wrapper for each call • Ideal when wrappers already exist (e.g. occam) • Incurs horizontal switching overhead • Blocking system call might NOT block

  30. Scheduler Activations • Scheduler activations offer interaction between user-level space and kernel space • Scheduler activations are the executing context on which threads run (like kernel threads) • When an activation blocks, the kernel informs the user space that the activation is blocked (an upcall) • A new activation is created to continue execution • One of the most effective solutions, but it must be implemented at the kernel level • Also useful for removing the extended spinning problem on multiprogrammed multiprocessor systems

  31. Scheduler Activations (2)

  32. Web Server Example

  33. User-level Memory Management • Memory management can also benefit from user-level techniques • Replace malloc()/free() with faster user-level versions • Remove sbrk() system calls • Remove page faults

  34. User-level Memory Management (2) • Ideal for allocating/de-allocating memory for similar data structures • No fragmentation • Pre-allocate a large chunk using malloc() whenever required • User-level data structures handle allocation/de-allocation • Results • malloc()/free() 323 ns • ul_malloc()/ul_free() 16 ns
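
One common way to get this behaviour is a fixed-size object pool threaded onto a free list. The sketch below is an assumption about how such an allocator might look, not the `ul_malloc()`/`ul_free()` implementation measured on the slide; the `pool_*` names are hypothetical. One `malloc()` call obtains the chunk, after which allocation and release are a couple of pointer operations each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A free object is reused to hold the link to the next free object. */
typedef struct pool_node { struct pool_node *next; } pool_node;

typedef struct {
    void      *chunk;       /* the single large block obtained from malloc() */
    pool_node *free_list;   /* LIFO list of free, equally sized objects */
} pool_t;

/* Pre-allocate room for `count` objects of `obj_size` bytes. Returns 0 on success. */
static int pool_init(pool_t *p, size_t obj_size, size_t count)
{
    assert(p != NULL && count > 0);
    size_t align = _Alignof(max_align_t);
    if (obj_size < sizeof(pool_node))
        obj_size = sizeof(pool_node);
    obj_size = (obj_size + align - 1) / align * align;  /* keep every slot aligned */

    p->chunk = malloc(obj_size * count);
    if (p->chunk == NULL)
        return -1;

    p->free_list = NULL;
    for (size_t i = count; i-- > 0; ) {                 /* slot 0 ends up at the head */
        pool_node *n = (pool_node *)((char *)p->chunk + i * obj_size);
        n->next = p->free_list;
        p->free_list = n;
    }
    return 0;
}

/* Pop one object off the free list; NULL when the pool is exhausted. */
static void *pool_alloc(pool_t *p)
{
    pool_node *n = p->free_list;
    if (n != NULL)
        p->free_list = n->next;
    return n;
}

/* Push an object back; it must have come from this pool. */
static void pool_free(pool_t *p, void *obj)
{
    pool_node *n = obj;
    n->next = p->free_list;
    p->free_list = n;
}

static void pool_destroy(pool_t *p)
{
    free(p->chunk);
    p->chunk = NULL;
    p->free_list = NULL;
}
```

Because every slot has the same size, a freed slot always fits the next request, which is why this scheme does not fragment. The sketch is not thread-safe; an SMP version would need a lock or per-CPU pools.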

  35. User-level Memory Management (3) • More complex allocation/de-allocation possible through building a complete user-level memory manager • Avoid page faults (physical memory allocation) • Enable direct SMP support • Results • malloc()/free() 323 ns • ul_malloc()/ul_free() ca. 100 ns

  36. Multiprocessor Hardware (Flynn) • Single Instruction Single Data (SISD) • Uniprocessor machines • Single Instruction Multiple Data (SIMD) • Array computers • Multiple Instruction Single Data (MISD) • Pipelined vector processors (?) • Multiple Instruction Multiple Data (MIMD) • General purpose parallel computers

  37. Memory Models • Uniform Memory Access (UMA) • Each CPU has equal access to memory and I/O devices • Non-Uniform Memory Access (NUMA) • Each CPU has local memory and is capable of accessing memory local to other CPUs • No Remote Memory Access (NORMA) • CPUs with local memory connected over a high-speed network

  38. UMA

  39. NUMA

  40. Hybrid NUMA

  41. NORMA

  42. Symmetric Multiprocessors • The UMA memory model is probably the most common • The tightly-coupled, shared memory, symmetric multiprocessor (SMP) (Schimmel) • CPUs, I/O and memory are interconnected over a high speed bus • All units are located at a close physical distance from each other

  43. Symmetric Multiprocessors (2) • Main memory consists of one single global memory module • Each CPU usually has access to local memory in terms of a local cache • Memory access is symmetric • Fair access is ensured • Cache

  44. Caching • Data and instruction buffer • Hides the relatively slow speed of memory compared with the processor • Cache consistency on multiprocessors • Write through protocol • Snoopy caches • False sharing
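
False sharing occurs when two CPUs write to different variables that happen to lie in the same cache line, so each write invalidates the other CPU's copy. A usual remedy is to align each CPU's data to its own line. The sketch below assumes a 64-byte line, which is common but not universal; the struct names are illustrative.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes; check the target CPU */

/* Two per-CPU counters packed together: both are likely to share one cache line. */
struct counters_packed {
    long cpu0;
    long cpu1;
};

/* Each counter aligned to its own line, so a write by one CPU
   does not invalidate the line holding the other CPU's counter. */
struct counters_padded {
    alignas(CACHE_LINE) long cpu0;
    alignas(CACHE_LINE) long cpu1;
};

/* Compile-time check that the two counters are at least one line apart. */
static_assert(offsetof(struct counters_padded, cpu1) >= CACHE_LINE,
              "counters must not share a cache line");
```

The padded layout trades memory for fewer coherence misses, so it is normally applied only to data that different CPUs write frequently.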

  45. Synchronisation • Synchronisation primitives serve to • provide access control to shared resources • event ordering • Valois describes the relationship of synchronisation methods • [Diagram labels: Mutual exclusion, Lock-free, Non-blocking, Busy-waiting, Blocking, Wait-free]

  46. Synchronisation Support • Synchronisation on multiprocessors relies on hardware support • Atomic read and write instructions • “Read-modify-write” instructions • swap • test and set • compare and swap • load linked / store conditional • double compare and swap • Herlihy’s hierarchy

  47. Mutual Exclusion • Blocking or busy-waiting • Rule of thumb: busy-wait if the expected waiting time is less than the time required to block and resume a process • In fine-grain multithreaded environments critical sections are small, so busy-waiting is usually preferred

  48. Spin locks • Spin locks are most likely the simplest locking primitives • A spin lock is a variable that is in one of two states (usually 1 or 0) • Two operations act on a spin lock • Spin / acquire lock • Release lock

  49. Spin Lock Implementation • Acquire spin lock
        spin: lock              ; lock bus for btsl
              btsl lock, 0      ; bit test and set
              jnc cont          ; continue if carry is 0
              jmp spin          ; else go to spin
        cont:
      • Release spin lock
              lock              ; lock bus for btrl
              btrl lock, 0      ; release lock

  50. Test and Test and Set Lock • Segall and Rudolph present a spin lock that does not monopolise the bus • Acquire spin lock – Segall and Rudolph
        spin: lock              ; lock bus for btsl
              btsl lock, 0      ; bit test and set
              jnc cont          ; continue if carry is 0
        loop: btl lock, 0       ; test only
              jc loop           ; loop if carry is set
              jmp spin          ; else go to spin
        cont:
