Concurrency
Motivations • To capture the logical structure of a problem • Servers, graphical applications • To exploit extra processors, for speed • Ubiquitous multi-core processors • To cope with separate physical devices • Internet applications
HTC vs HPC • High throughput computing • Environments that can deliver large amounts of processing capacity over long periods of time • High performance computing • Uses supercomputers and computer clusters to solve advanced computation problems • Examples of concurrent applications and environments: DACTAL, Condor
Concurrency • Any system in which two or more tasks may be underway at the same time (at an unpredictable point in their execution) • Parallel: more than one task physically active • Requires multiple processors • Distributed: processors are associated with people or devices that are physically separated from one another in the real world
Levels of Concurrency • Instruction level • Two or more machine instructions • Statement level • Two or more source language statements • Unit level • Two or more subprogram units • Program level • Two or more programs
Fundamental Concepts • A task or process is a program unit that can be in concurrent execution with other program units • Tasks differ from ordinary subprograms in that: • A task may be implicitly started • When a program unit starts the execution of a task, it is not necessarily suspended • When a task’s execution is completed, control may not return to the caller • Tasks usually work together
Task Categories • Heavyweight tasks • Execute in their own address space and have their own run-time stacks • Lightweight tasks • All run in the same address space, though each has its own run-time stack • A task is disjoint if it does not communicate with or affect the execution of any other task in the program in any way
Synchronization • A mechanism that controls the order in which tasks execute • Cooperation: Task A must wait for task B to complete some specific activity before task A can continue its execution • e.g. the producer-consumer problem • Competition: Two or more tasks must use some resource that cannot be used simultaneously • e.g. a shared counter, dining philosophers • Competition synchronization is usually provided by mutually exclusive access
The Producer-Consumer Problem • There are a number of "classic" synchronization problems, one of which is the producer-consumer problem: M producers put items into a shared buffer, and N consumers remove items from that buffer. • The problem is to devise a solution that synchronizes the producers' and consumers' accesses to the buffer. Accesses to the buffer must be synchronized, because if multiple producers and/or consumers access it simultaneously, values may get lost, retrieved twice, etc. • In the bounded-buffer version, the buffer has some fixed capacity.
Dining Philosophers • Five philosophers sit at a table, alternating between eating noodles and thinking. In order to eat, a philosopher must have two chopsticks. However, there is a single chopstick between each pair of plates, so if one philosopher is eating, neither neighbor can eat. A philosopher puts down both chopsticks when thinking. • Devise a solution that ensures: • no philosopher starves; and • a hungry philosopher is only prevented from eating by his neighbor(s). • A naive solution:

philosopher := [
  [true] whileTrue: [
    self get: left.
    self get: right.
    self eat.
    self release: left.
    self release: right.
    self think. ] ]

Deadlock! If every philosopher picks up his left chopstick at the same moment, each waits forever for the right one.
How about this instead? Each philosopher takes the left chopstick, takes the right one only if it is free, and otherwise puts the left one back down:

philosopher := [
  [true] whileTrue: [
    [self have: left and: right] whileFalse: [
      self get: left.
      [right notInUse]
        ifTrue: [self get: right]
        ifFalse: [self release: left] ].
    self eat.
    self release: left.
    self release: right.
    self think. ] ]

Livelock! All the philosophers can pick up and put back their left chopsticks in lockstep forever, so none of them ever eats.
Liveness and Deadlock • Liveness is a characteristic that a program unit may or may not have • In sequential code, it means the unit will eventually complete its execution • In a concurrent environment, a task can easily lose its liveness • If all tasks in a concurrent environment are blocked waiting for events that can never occur, it is called deadlock; if they remain active but can never make progress, it is called livelock
Race conditions • A race condition occurs when two different threads of a program write to a shared variable and the resulting value depends on which thread writes to it first • Causes transient errors that are hard to debug • e.g. c = c + 1, which compiles into three separate instructions (load c, add 1, store c), so two threads can interleave them and lose an update • Solution: acquire access to the shared resource before execution can continue • Issues: lockout, starvation
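To see this concretely, here is a minimal Java sketch (the class name RaceDemo and the iteration count are illustrative, not from the slides): two threads each perform c = c + 1 a million times with no synchronization, and the interleaved load/add/store sequences usually leave c well short of two million.

public class RaceDemo {
    static int c = 0;                          // shared, unsynchronized counter

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                c = c + 1;                     // really: load c, add 1, store c
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("c = " + c);        // expected 2,000,000; usually less
    }
}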
Task Execution States • Assuming some mechanism for synchronization (e.g. a scheduler), tasks can be in a variety of states: • New - created but not yet started • Runnable or ready - ready to run but not currently running (no available processor) • Running • Blocked - has been running, but cannot now continue (usually waiting for some event to occur) • Dead - no longer active in any sense
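Java exposes a close analogue of these states through Thread.getState(); the following hedged sketch (class name and sleep time are illustrative) prints a few of them for one short-lived thread.

public class StateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException e) { }
        });
        System.out.println(t.getState());     // NEW: created but not yet started
        t.start();
        System.out.println(t.getState());     // RUNNABLE, or TIMED_WAITING if already blocked in sleep
        t.join();
        System.out.println(t.getState());     // TERMINATED: dead, no longer active in any sense
    }
}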
Design Issues • Competition and cooperation synchronization • Controlling task scheduling • How and when tasks start and end execution • Alternatives: • Semaphores • Monitors • Message Passing
Semaphores • Simple mechanism that can be used to provide synchronization of tasks • Devised by Edsger Dijkstra in 1965 for competition synchronization, but can also be used for cooperation synchronization • A data structure consisting of an integer and a queue that stores task descriptors • A task descriptor is a data structure that stores all the relevant information about the execution state of a task
Semaphore operations • Two atomic operations, P and V • Consider a semaphore s: • P (from the Dutch "passeren") • P(s) – if s > 0 then assign s = s – 1; otherwise block (enqueue) the thread that calls P • Often referred to as "wait" • V (from the Dutch "vrijgeven") • V(s) – if a thread T is blocked on s, then wake up T; otherwise assign s = s + 1 • Often referred to as "signal"
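Java packages these operations as java.util.concurrent.Semaphore, whose acquire() and release() correspond to P/wait and V/signal. A hedged sketch (class name and counts are illustrative) uses a binary semaphore to make the earlier counter race safe:

import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    static int c = 0;
    static final Semaphore s = new Semaphore(1);     // binary semaphore, initially 1

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            try {
                for (int i = 0; i < 1_000_000; i++) {
                    s.acquire();                     // P(s): decrement s, or block if s = 0
                    try {
                        c = c + 1;                   // critical section
                    } finally {
                        s.release();                 // V(s): wake a blocked thread, or increment s
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("c = " + c);              // reliably 2,000,000
    }
}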
Dining Philosophers • Using a semaphore to guard the acquisition of both chopsticks:

wantBothSticks := Semaphore new.
philosopher := [
  [true] whileTrue: [
    [self haveBothSticks] whileFalse: [
      wantBothSticks wait.
      (left available and: [right available])
        ifTrue: [
          self get: left.
          self get: right. ].
      wantBothSticks signal. ].
    self eat.
    self release: left.
    self release: right.
    self think. ] ]
The trouble with semaphores • No way to statically check for the correctness of their use • Leaving out a single wait or signal event can create many different issues • Getting them just right can be tricky • Per Brinch Hansen (1973): • “The semaphore is an elegant synchronization tool for an ideal programmer who never makes mistakes.”
Locks and Condition Variables • A semaphore may be used for either of two purposes: • Mutual exclusion: guarding access to a critical section • Synchronization: making processes suspend/resume • This dual use can lead to confusion: it may be unclear which role a semaphore is playing in a given computation… • For this reason, newer languages may provide distinct constructs for each role: • Locks: guarding access to a critical section • Condition Variables: making processes suspend/resume • Locks provide for mutually-exclusive access to shared memory; condition variables provide for thread/process synchronization.
Locks • Like a Semaphore, a lock has two associated operations: • acquire() – try to lock the lock; if it is already locked, suspend execution • release() – unlock the lock; awaken a waiting thread (if any) • These can be used to 'guard' a critical section; each thread wraps its accesses to the shared object like this:

Lock sharedLock;
Object sharedObj;
...
sharedLock.acquire();
// access sharedObj
sharedLock.release();

• Every Java object has a hidden lock accessible via the synchronized keyword
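In Java the explicit form of this acquire/release pattern is java.util.concurrent.locks.ReentrantLock; a hedged sketch (the shared list and counts are illustrative, not from the slides):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockDemo {
    static final ReentrantLock sharedLock = new ReentrantLock();
    static final List<Integer> sharedObj = new ArrayList<>();   // not thread-safe on its own

    static void addValue(int v) {
        sharedLock.lock();               // acquire(): suspend if another thread holds the lock
        try {
            sharedObj.add(v);            // critical section: access sharedObj
        } finally {
            sharedLock.unlock();         // release(): awaken a waiting thread, if any
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> { for (int i = 0; i < 1000; i++) addValue(i); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(sharedObj.size());    // reliably 2000
    }
}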
Condition Variables • A Condition is a predefined type available in some languages that can be used to declare variables for synchronization. • When a thread needs to suspend execution inside a critical section until some condition is met, a Condition can be used. • There are three operations for a Condition: • wait() • suspend immediately; enter a queue of waiting threads • signal(), aka notify() in Java • awaken a waiting thread (usually the first in the queue), if any • broadcast(), aka notifyAll() in Java • awaken all waiting threads, if any • Java's built-in monitors have no separate Condition class (java.util.concurrent.locks added one later), but every Java object has an anonymous condition variable that can be manipulated via wait, notify & notifyAll
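Putting locks and condition variables together, here is a hedged sketch of a bounded buffer (the class name, ArrayDeque representation, and two-condition design are illustrative choices, not taken from the slides): producers wait on notFull, consumers wait on notEmpty.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;
    private final Lock lock = new ReentrantLock();
    private final Condition notFull  = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    public BoundedBuffer(int capacity) { this.capacity = capacity; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await();              // wait(): suspend inside the critical section
            }
            items.addLast(item);
            notEmpty.signal();                // signal(): awaken one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    public T get() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();             // suspend until a producer signals
            }
            T item = items.removeFirst();
            notFull.signal();                 // awaken one waiting producer
            return item;
        } finally {
            lock.unlock();
        }
    }
}

A producer thread calls put() and a consumer calls get(); each blocks automatically when the buffer is full or empty, which is exactly the cooperation synchronization the producer-consumer problem requires.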
Monitor motivation • Every Java object has a hidden lock accessible via the synchronized keyword • Deadlocks/livelocks/non-mutual-exclusion are easy to produce • Just as control structures were "higher level" than the goto, language designers began looking for higher level ways to synchronize processes • In the early 1970s, Brinch Hansen and Hoare proposed the monitor, a class whose methods are automatically accessed in a mutually-exclusive manner. • A monitor prevents simultaneous access by multiple threads
Monitors • The idea: encapsulate the shared data and its operations to restrict access • A monitor is an abstract data type for shared data • Shared data is resident in the monitor (rather than in the client units) • All access to the shared data is through operations resident in the monitor • The monitor implementation guarantees synchronized access by allowing only one access at a time • Calls to monitor procedures are implicitly queued if the monitor is busy at the time of the call
Monitor Visualization • A monitor object (e.g., a buffer buf) has a public interface: put(obj), get(obj), ... Hidden inside are its private data (myValues, myHead, myTail, mySize, the capacity N), a hidden lock with its entry queue, and condition variables such as notEmpty and notFull. • The compiler 'wraps' calls to put() and get() as follows:

buf.lock.acquire();
... call to put or get ...
buf.lock.release();

• If the lock is locked, the calling thread enters the entry queue • Each condition variable has its own internal queue, in which waiting threads wait to be signaled...
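Since every Java object already carries the hidden lock and an anonymous condition variable described above, a monitor-style buffer can be written with synchronized methods plus wait/notifyAll; a hedged sketch (the class name and circular-array representation are illustrative):

public class MonitorBuffer<T> {
    private final Object[] myValues;                 // hidden (private) buffer state
    private int myHead = 0, mySize = 0;

    public MonitorBuffer(int capacity) { myValues = new Object[capacity]; }

    // 'synchronized' acquires the object's hidden lock on entry and releases it on exit
    public synchronized void put(T item) throws InterruptedException {
        while (mySize == myValues.length) {
            wait();                                  // full: wait on the anonymous condition
        }
        myValues[(myHead + mySize) % myValues.length] = item;
        mySize++;
        notifyAll();                                 // wake any waiting consumers
    }

    @SuppressWarnings("unchecked")
    public synchronized T get() throws InterruptedException {
        while (mySize == 0) {
            wait();                                  // empty: wait on the anonymous condition
        }
        T item = (T) myValues[myHead];
        myValues[myHead] = null;
        myHead = (myHead + 1) % myValues.length;
        mySize--;
        notifyAll();                                 // wake any waiting producers
        return item;
    }
}

This class behaves like the monitor pictured above: myValues, myHead, and mySize are hidden, and the hidden lock plus wait/notifyAll supply the entry queue and condition queues.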
Evaluation of Monitors • A better way to provide competition synchronization than semaphores • Equally powerful as semaphores: • Semaphores can be used to implement monitors • Monitors can be used to implement semaphores • Support for cooperation synchronization is very similar to that of semaphores, so it has the same reliability issues
Distributed Synchronization • Semaphores, locks, condition variables, and monitors are shared-memory constructs, and so are only useful on a tightly-coupled multiprocessor • They are of no use on a distributed multiprocessor • On a distributed multiprocessor, processes can communicate via message passing, using send() and receive() primitives • If the message-passing system has no storage, then the send and receive operations must be synchronized: the sender waits until the receiver is ready, and only then is the message transmitted • If the message-passing system has storage to buffer the message, then send() can proceed asynchronously even if the receiver is not ready: the message is buffered, and the receiver retrieves it when it is ready...
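As a hedged single-JVM analogy (real message passing would cross machines): Java's SynchronousQueue behaves like a storage-free channel whose send must rendezvous with a receive, while a LinkedBlockingQueue buffers messages so the send can proceed asynchronously.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class MessagePassingDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> unbuffered = new SynchronousQueue<>();     // no storage
        BlockingQueue<String> buffered   = new LinkedBlockingQueue<>();  // has storage

        Thread receiver = new Thread(() -> {
            try {
                System.out.println("received: " + unbuffered.take());   // rendezvous with the send
                System.out.println("received: " + buffered.take());     // retrieved when ready
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        unbuffered.put("synchronous hello");   // blocks until the receiver is ready
        buffered.put("asynchronous hello");    // returns immediately; the message is buffered
        receiver.join();
    }
}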
Tasks • In 1980, Ada introduced the task, with 3 characteristics: • its own thread of control; • its own execution state; and • mutually exclusive subprograms (entry procedures) • Entry procedures are self-synchronizing subprograms that another task can invoke for task-to-task communication. If task t has an entry procedure p, then another task t2 can execute t.p( argument-list ); • In order for p to execute, t must execute: accept p ( parameter-list ); • If t executes accept p and t2 has not called p, t will automatically wait • If t2 calls p and t has not accepted p, t2 will automatically wait
Rendezvous • When t and t2 are both ready (t2 has called t.p(args) and t has reached accept p(params)), p executes: • t2's argument-list is evaluated and passed to p's parameters; t2 suspends • t executes the body of p (begin ... end p), using its parameter values • return values (out or in out parameters) are passed back to t2 • t continues execution; t2 resumes execution • This interaction is called a rendezvous between t and t2. It does not depend on shared memory, so t and t2 can be on a uniprocessor, a tightly-coupled multiprocessor, or a distributed multiprocessor.
Example Problem • How can we rewrite what's below to complete more quickly?

procedure sumArray is
  N: constant integer := 1000000;
  type RealArray is array(1..N) of float;
  anArray: RealArray;

  function sum(a: RealArray; first, last: integer) return float is
    result: float := 0.0;
  begin
    for i in first..last loop
      result := result + a(i);
    end loop;
    return result;
  end sum;

begin
  -- code to fill anArray with values omitted
  put( sum(anArray, 1, N) );
end sumArray;
Divide-And-Conquer via Tasks

procedure parallelSumArray is
  -- declarations of N, RealArray, anArray, Sum() as before ...

  task type ArraySliceAdder is
    entry SumSlice(Start: in Integer; Stop: in Integer);
    entry GetSum(Result: out float);
  end ArraySliceAdder;

  task body ArraySliceAdder is
    i, j: Integer;
    Answer: Float;
  begin
    accept SumSlice(Start: in Integer; Stop: in Integer) do
      i := Start; j := Stop;              -- get ready
    end SumSlice;
    Answer := Sum(anArray, i, j);         -- do the work
    accept GetSum(Result: out float) do
      Result := Answer;                   -- report outcome
    end GetSum;
  end ArraySliceAdder;
  -- continued on next slide ...
Divide-And-Conquer via Tasks (ii)

  -- continued from previous slide ...
  firstHalfSum, secondHalfSum: Float;
  T1, T2 : ArraySliceAdder;              -- T1, T2 start & wait on accept
begin
  -- code to fill anArray with values omitted
  T1.SumSlice(1, N/2);                   -- start T1 on 1st half
  T2.SumSlice(N/2 + 1, N);               -- start T2 on 2nd half
  T1.GetSum( firstHalfSum );             -- get 1st half sum from T1
  T2.GetSum( secondHalfSum );            -- get 2nd half sum from T2
  put( firstHalfSum + secondHalfSum );   -- we're done!
end parallelSumArray;

Using two tasks T1 and T2, this parallelSumArray version requires roughly 1/2 the time required by sumArray (on a multiprocessor). Using three tasks, the time will be roughly 1/3 the time of sumArray, and so on.
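For comparison, a hedged Java sketch of the same divide-and-conquer idea using two threads (array contents and names are illustrative); each thread sums one half-open slice, and join() plays the role of the GetSum rendezvous:

public class ParallelSumArray {
    public static void main(String[] args) throws InterruptedException {
        final int N = 1_000_000;
        final double[] anArray = new double[N];
        for (int i = 0; i < N; i++) anArray[i] = 1.0;         // fill with values

        final double[] partial = new double[2];               // one result slot per task
        Thread t1 = new Thread(() -> partial[0] = sum(anArray, 0, N / 2));
        Thread t2 = new Thread(() -> partial[1] = sum(anArray, N / 2, N));
        t1.start(); t2.start();                               // start both slices
        t1.join();  t2.join();                                // wait for both results
        System.out.println(partial[0] + partial[1]);
    }

    static double sum(double[] a, int first, int last) {      // half-open range [first, last)
        double result = 0.0;
        for (int i = first; i < last; i++) result += a[i];
        return result;
    }
}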
Producer-Consumer in Ada • To give the producer and consumer separate threads, we can define the behavior of one in the 'main' procedure:

procedure ProducerConsumer is
  buf: Buffer;
  it: Item;
begin                        -- producer task
  loop
    -- produce an Item in it
    buf.put(it);
  end loop;
end ProducerConsumer;

and the behavior of the other in a separate task:

task consumer;
task body consumer is
  it: Item;
begin
  loop
    buf.get(it);
    -- consume Item it
  end loop;
end consumer;

We can then build a Buffer task with put() and get() as (auto-synchronizing) entry procedures...
Capacity-1 Buffer • A single-value buffer is easy to build using an Ada task type:

task type Buffer is
  entry get(it: out Item);
  entry put(it: in Item);
end Buffer;

task body Buffer is
  B: Item;
begin
  loop
    accept put(it: in Item) do
      B := it;
    end put;
    accept get(it: out Item) do
      it := B;
    end get;
  end loop;
end Buffer;

As a task type, variables of this type (e.g., buf) will automatically have their own thread of execution. The body of the task is a loop that accepts calls to put() and get() in strict alternation, so the buffer alternates between being empty and nonempty.
Capacity-N Buffer • An N-value buffer is a bit more work. Ada provides the select-when statement to guard an accept, and perform it if and only if a given condition is true: we can accept any call to get() so long as we are not empty, and any call to put() so long as we are not full.

-- task declaration is as before ...
task body Buffer is
  N: constant integer := 1024;
  package B is new Queue(N, Items);
begin
  loop
    select
      when not B.isFull =>
        accept put(it: in Item) do
          B.append(it);
        end put;
    or
      when not B.isEmpty =>
        accept get(it: out Item) do
          it := B.first;
          B.delete;
        end get;
    end select;
  end loop;
end Buffer;
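Java's library offers the same behavior ready-made as ArrayBlockingQueue: put() blocks while the buffer is full and take() blocks while it is empty. A brief hedged sketch (capacity and counts are illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> buf = new ArrayBlockingQueue<>(1024);          // capacity-N buffer

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) buf.put(i);                      // blocks when full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) System.out.println(buf.take());  // blocks when empty
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start(); consumer.start();
        producer.join();  consumer.join();
    }
}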
The Importance of Clusters • Scientific computation is increasingly performed on clusters • Cost-effective: Created from commodity parts • Scientists want more computational power • Cluster computational power is easy to increase by adding processors Cluster size keeps increasing!
Clusters Are Not Perfect • Failure rates are increasing • The number of moving parts is growing (processors, network connections, disks, etc.) • Mean Time Between Failures (MTBF) is shrinking • Anecdotal: every 20 minutes for Google’s cluster • How can we deal with these failures?
Options for Fault-Tolerance • Redundancy in space • Each participating process has a backup process • Expensive! • Redundancy in time • Processes save state and then rollback for recovery • Lighter-weight fault tolerance
Today’s Answer: Redundancy in Time • Programmers place checkpoints • Small checkpoint size • Synchronous • Every process checkpoints in the same place in the code • Global synchronization before and after checkpoints
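A hedged, shared-memory sketch of this synchronous pattern using a Java CyclicBarrier (the in-memory checkpoint array stands in for writing state to stable storage; the process count and workload are illustrative): every worker reaches the barrier before the checkpoint and again after it.

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class SynchronousCheckpointDemo {
    public static void main(String[] args) throws InterruptedException {
        final int processes = 4;
        final long[] checkpoint = new long[processes];        // stand-in for stable storage
        final CyclicBarrier barrier = new CyclicBarrier(processes);

        Thread[] workers = new Thread[processes];
        for (int p = 0; p < processes; p++) {
            final int id = p;
            workers[p] = new Thread(() -> {
                try {
                    long state = 0;
                    for (int step = 0; step < 3; step++) {
                        state += id + step;                   // do some work
                        barrier.await();                      // global synchronization before...
                        checkpoint[id] = state;               // ...every process checkpoints at the same place
                        barrier.await();                      // ...and global synchronization after
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[p].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("checkpointed state of process 0: " + checkpoint[0]);
    }
}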
What’s the Problem? • Future systems will be larger • Checkpointing will hurt program performance • Many processes checkpointing synchronously will result in network and file system contention • Checkpointing to local disk not viable • Application programmers are only willing to pay 1% overhead for fault-tolerance • The solution: • Avoid synchronous checkpoints
Understanding Staggered Checkpointing • Today: more processes, more data, and synchronous checkpoints, so every process checkpoints at once: contention! Tomorrow: 64K processes and more, so the contention only gets worse. • "That's easy! We'll stagger the checkpoints...." Not so fast: there is communication between processes, and a staggered set of checkpoints forms a recovery line [Randell 75] across the processes-versus-time diagram. • If a message's receive is saved by a checkpoint but its send is not, the saved state is inconsistent (it could not have existed) and the recovery line is invalid. • If a message's send is saved but its receive is not, the saved state is consistent (it could have existed) and the recovery line is VALID.
Identify All Possible Valid Recovery Lines • (The original figure labels each process's events with vector timestamps such as [1,0,0], [2,3,2], and [2,4,3] on a processes-versus-time diagram; combinations of checkpoints whose timestamps are mutually consistent form valid recovery lines.) • There are so many!
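The [i, j, k] labels in the figure are vector timestamps; here is a hedged sketch of the standard bookkeeping a process might use to maintain one (the class and method names are illustrative, not from the slides). Comparing such timestamps is what lets a checkpointing tool decide whether a set of checkpoints could have coexisted, i.e. whether they form a valid recovery line.

import java.util.Arrays;

public class VectorClock {
    private final int[] clock;       // one entry per process
    private final int self;          // index of the owning process

    public VectorClock(int numProcesses, int self) {
        this.clock = new int[numProcesses];
        this.self = self;
    }

    public void localEvent() {
        clock[self]++;                                   // internal event: tick our own entry
    }

    public int[] timestampForSend() {
        localEvent();                                    // a send counts as an event
        return clock.clone();                            // attach a copy to the outgoing message
    }

    public void onReceive(int[] sendersTimestamp) {
        for (int i = 0; i < clock.length; i++) {
            clock[i] = Math.max(clock[i], sendersTimestamp[i]);   // merge element-wise
        }
        clock[self]++;                                   // the receive is itself an event
    }

    @Override public String toString() { return Arrays.toString(clock); }
}

A process would call localEvent() for internal events, attach timestampForSend() to each outgoing message, and call onReceive() with the timestamp carried by each incoming message.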
Coroutine • A coroutine is two or more procedures that share a single thread of execution, each exercising mutual control over the other:

procedure A;
begin
  -- do something
  resume B;
  -- do something
  resume B;
  -- do something
  -- ...
end A;

procedure B;
begin
  -- do something
  resume A;
  -- do something
  resume A;
  -- ...
end B;
Summary • Concurrent computations consist of multiple entities. • Processes in Smalltalk • Tasks in Ada • Threads in Java • OS-dependent in C++ • On a shared-memory multiprocessor: • The Semaphore was the first synchronization primitive • Locks and condition variables separated a semaphore’s mutual-exclusion usage from its synchronization usage • Monitors are higher-level, self-synchronizing objects • Java classes have an associated (simplified) monitor • On a distributed system: • Ada tasks provide self-synchronizing entry procedures