Chapter 4: Threads • Overview • Multicore Programming • Multithreading Models • Thread Libraries • Implicit Threading • Threading Issues • Operating System Examples
Objectives • Identify the basic components of a thread, and contrast threads and processes • Describe the benefits and challenges of designing multithreaded applications • Illustrate different approaches to implicit threading, including thread pools, fork-join, and Grand Central Dispatch • Describe how the Windows and Linux operating systems represent threads • Design multithreaded applications using the Pthreads, Java, and Windows threading APIs
Motivation • Most modern applications are multithreaded • Threads run within the application • Multiple tasks within the application can be implemented by separate threads • A thread to update the display • A thread to fetch data • A thread to check spelling • A thread to answer a network request • Process creation is heavy-weight while thread creation is light-weight • Can simplify code, increase efficiency • Kernels are generally multithreaded
Benefits of Threads • Responsiveness – may allow continued execution if part of a process is blocked, especially important for user interfaces • Resource Sharing – threads share the resources of their process, which is easier than shared memory or message passing • Economy – cheaper than process creation; thread switching has lower overhead than a full process context switch • Scalability – a process can take advantage of multiprocessor architectures
Multicore Programming • Multicore or multiprocessor systems are putting pressure on programmers; the challenges include: • Dividing activities • Balancing load • Data splitting • Data dependency • Testing and debugging • Parallelism implies a system can perform more than one task simultaneously • Concurrency supports more than one task making progress • Single processor / core, with the scheduler providing concurrency • Pseudo-parallelism • Multiple tasks are interleaved, making execution appear parallel
Concurrency vs. Parallelism • Concurrent execution on single-core system: • Parallelism on a multi-core system:
Multicore Programming (Cont.) • Types of parallelism • Data parallelism – distributes subsets of the same data across multiple cores, same operation on each • Task parallelism – distributing threads across cores, each thread performing a unique operation • As the number of threads grows, so does architectural support for threading • CPUs have cores as well as hardware threads • IBM Power Systems S924 — 4U, 2× POWER9 SMT8, 8-12 cores per processor, up to 4 TB DDR4 RAM, PowerVM running AIX/IBM i/Linux • Intel 8th generation (Kaby Lake & Coffee Lake) — 2-6 cores per processor, 1-2 SMT hardware threads per core (Hyper-Threading on the higher end), native support for DDR4
Amdahl’s Law • Identifies performance gains from adding additional cores to an application that has both serial and parallel components • S is the serial portion; N is the number of processing cores • speedup ≤ 1 / (S + (1 − S)/N) • That is, if an application is 75% parallel / 25% serial, moving from 1 to 2 cores results in a speedup of 1.6 times • As N approaches infinity, speedup approaches 1 / S • The serial portion of an application has a disproportionate effect on the performance gained by adding additional cores • But does the law take into account contemporary multicore systems?
Amdahl’s Law • S = .25 and N = 2 • 1/(.25 + (1 − .25)/2) • 1/(.25 + .75/2) • 1/(.25 + .375) • 1/.625 • 1.6 speedup
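A minimal sketch (not from the original slides) that turns the formula into code; the class name Amdahl and the example values are illustrative:

```java
// Amdahl's Law: speedup <= 1 / (S + (1 - S) / N)
public class Amdahl {
    static double speedup(double serialFraction, int cores) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    public static void main(String[] args) {
        System.out.println(speedup(0.25, 2));    // 1.6, matching the worked example above
        System.out.println(speedup(0.25, 1024)); // ~3.99, approaching 1/S = 4 as N grows
    }
}
```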
User Threads and Kernel Threads • User threads - management done by user-level threads library • Three primary thread libraries: • POSIX Pthreads • Windows threads • Java threads • Kernel threads - Supported by the Kernel • Examples – virtually all general purpose operating systems, including: • Windows • Linux • Mac OS X • iOS • Android
Multithreading Models • Multithreading Models refer to mapping user threads to kernel threads. • There are three common Multithreading Models for Kernel threads • Many-to-One • One-to-One • Many-to-Many
Many-to-One • Many user-level threads mapped to a single kernel thread • One thread blocking causes all to block • Multiple threads may not run in parallel on a multicore system because only one may be in the kernel at a time • Few systems currently use this model • Examples: • Solaris Green Threads • GNU Portable Threads
One-to-One • Each user-level thread maps to kernel thread • Creating a user-level thread creates a kernel thread • More concurrency than many-to-one • Number of threads per process sometimes restricted due to overhead • Examples • Windows • Linux
Many-to-Many Model • Allows many user level threads to be mapped to many kernel threads • Allows the operating system to create a sufficient number of kernel threads • Like thread pooling, associates user thread to kernel thread during scheduling • Windows with the ThreadFiber package, otherwise not common.
Two-level Model • Similar to M:M, except that it allows a user thread to be bound to kernel thread • Examples • IRIX • HP-UX • Tru64 UNIX • Solaris 8 and earlier
Thread Libraries • Thread library provides the programmer with an API for creating and managing threads • Two primary ways of implementing • Library entirely in user space • Kernel-level library supported by the OS
Pthreads • May be provided as either user-level or kernel-level threads • A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization • Specification, not implementation • API specifies the behavior of the thread library; implementation is up to the developers of the library • Common in UNIX operating systems (Solaris, Linux, Mac OS X) • Let’s see an example on the next slide • A triangular number is the sum of the integers from 1 to n • A program using multiple threads creates a thread that is passed the value n and then waits for the thread to return
Pthreads API • pthread_t // thread data, including the thread id • pthread_attr_t // opaque struct; use functions to get/set attributes • pthread_attr_init(pthread_attr_t* attrs); • Initializes attrs with the default thread attributes • pthread_create(pthread_t* thread, const pthread_attr_t* attrs, void *(*start_routine) (void *), void *arg); • pthread_join(pthread_t thread, void** retval); • Blocks until the given thread terminates. If retval is non-NULL, the value the thread passed to pthread_exit is stored in *retval • pthread_exit(void* retval); • Exits the calling thread; if the main thread calls pthread_exit rather than exit, the other threads can continue executing
Java Threads • Java threads are managed by the JVM • Typically implemented using the threads model provided by underlying OS • Java threads may be created by: • Extending Thread class • Implementing the Runnable interface
Java Threads • Implementing the Runnable interface: • Creating a thread: • Waiting on a thread:
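A minimal sketch of these three steps, assuming a trivial task that just prints a message (the class name and message are illustrative, not the slides' original code):

```java
public class RunnableDemo {
    public static void main(String[] args) throws InterruptedException {
        // Implementing the Runnable interface (here as a lambda)
        Runnable task = () -> System.out.println("Hello from " + Thread.currentThread().getName());

        // Creating a thread and starting it
        Thread worker = new Thread(task);
        worker.start();

        // Waiting on the thread to finish
        worker.join();
    }
}
```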
Java Executor Framework • Rather than explicitly creating threads, Java also allows thread creation around the Executor interface: • Executor declares a single method, execute(Runnable command); for tasks that return a value, the ExecutorService subinterface provides submit(Callable) • The Executor is used as follows:
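A hedged sketch of using the Executor interface; the run-each-task-in-a-new-thread implementation below is illustrative only (in practice one of the Executors factory methods from the thread-pool slides would supply the implementation):

```java
import java.util.concurrent.Executor;

public class ExecutorDemo {
    public static void main(String[] args) {
        // A trivial Executor implementation: run each submitted task in a new thread
        Executor executor = command -> new Thread(command).start();

        // Submit a Runnable through the Executor interface
        executor.execute(() -> System.out.println("Task executed via Executor"));
    }
}
```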
Threads • What is shared between threads? • How does a thread context switch differ from a process context switch?
Threads • What is shared between threads? • Text – code is shared • Heap – is shared • Global data – is shared • Stack – is not shared • Thread-local storage – behaves like static/global data but is not shared between threads • How does a thread context switch differ from a process context switch? • If the kernel schedules at the thread level, then all context switches are thread context switches; in both cases the stack pointer, program counter, and registers must be saved • Threads share text, heap, and global data, so a thread context switch can leave the caches and virtual-memory mappings alone • A process context switch changes the address space, so cached/TLB state must be flushed and virtual-memory state reloaded • A new thread/process must be selected to resume/run
Thread Models • What are the four thread models?
Thread Models • What are the four thread models? • Many to One • Every process has exactly one kernel thread; threads are managed at the user level by the thread library. Also known as co-routines. • One to One • Every user thread is represented by a kernel thread. The Process Control Block is actually a thread control block (or task control block). • Many to Many • Every user thread runs on a kernel thread, but kernel threads are assigned when scheduled. • Two-level Model • Most threads follow the Many-to-Many model, but high-priority threads can be bound One-to-One.
Implicit Threading • Growing in popularity as numbers of threads increase, program correctness more difficult with explicit threads • Creation and management of threads done by compilers and run-time libraries rather than programmers • Five methods explored • Thread Pools • Fork-Join • OpenMP • Grand Central Dispatch • Intel Threading Building Blocks
Thread Pools • Create a number of threads in a pool where they await work • Advantages: • Usually slightly faster to service a request with an existing thread than to create a new thread • Allows the number of threads in the application(s) to be bound to the size of the pool • Separating the task to be performed from the mechanics of creating the task allows different strategies for running the task • e.g., tasks could be scheduled to run periodically • The Windows API supports thread pools:
Java Thread Pools • Three factory methods for creating thread pools in Executors class:
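The three factory methods are newSingleThreadExecutor(), newFixedThreadPool(int), and newCachedThreadPool(); a minimal sketch (task body and pool size are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    public static void main(String[] args) {
        ExecutorService single = Executors.newSingleThreadExecutor(); // one worker thread
        ExecutorService fixed  = Executors.newFixedThreadPool(4);     // fixed pool of 4 threads
        ExecutorService cached = Executors.newCachedThreadPool();     // creates/reuses threads on demand

        fixed.execute(() -> System.out.println("Ran in " + Thread.currentThread().getName()));

        single.shutdown();
        fixed.shutdown();
        cached.shutdown();
    }
}
```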
Fork-Join Parallelism • Multiple threads (tasks) are forked, and then joined.
Fork-Join Parallelism • General algorithm for the fork-join strategy: if the task is small enough, solve it directly; otherwise divide it into subtasks, fork a task for each piece, then join the forked tasks and combine their results
Fork-Join Parallelism in Java • The ForkJoinTask is an abstract base class • RecursiveTask and RecursiveAction classes extend ForkJoinTask • RecursiveTask returns a result (via the return value from the compute() method) • RecursiveAction does not return a result
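A minimal sketch (an assumed example, not the slides' original code) that follows the general fork-join algorithm above: summing an array with a RecursiveTask, computing directly below a threshold and forking/joining otherwise.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;  // small enough to solve directly
    private final int[] data;
    private final int lo, hi;

    SumTask(int[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {              // base case: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                             // run the left half asynchronously
        long rightSum = right.compute();         // compute the right half in this thread
        return left.join() + rightSum;           // combine the partial results
    }

    public static void main(String[] args) {
        int[] values = new int[10_000];
        Arrays.fill(values, 1);
        long total = new ForkJoinPool().invoke(new SumTask(values, 0, values.length));
        System.out.println(total);               // prints 10000
    }
}
```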
OpenMP • Set of compiler directives and an API for C, C++, and Fortran • Provides support for parallel programming in shared-memory environments • Identifies parallel regions – blocks of code that can run in parallel • #pragma omp parallel — create as many threads as there are cores • #pragma omp parallel for — run the following for loop in parallel: for (i = 0; i < N; i++) { c[i] = a[i] + b[i]; }