
Concurrency


Presentation Transcript


  1. Concurrency

  2. Motivations • To capture the logical structure of a problem • Servers, graphical applications • To exploit extra processors, for speed • Ubiquitous multi-core processors • To cope with separate physical devices • Internet applications

  3. HTC vs HPC • High throughput computing • Environments that can deliver large amounts of processing capacity over long periods of time • High performance computing • Uses supercomputers and computer clusters to solve advanced computation problems • DACTAL • Condor • Concurrent application

  4. Concurrency • Any system in which two or more tasks may be underway at the same time (at an unpredictable point in their execution) • Parallel: more than one task physically active • Requires multiple processors • Distributed: processors are associated with people or devices that are physically separated from one another in the real world

  5. Levels of Concurrency • Instruction level • Two or more machine instructions • Statement level • Two or more source language statements • Unit level • Two or more subprogram units • Program level • Two or more programs

  6. Fundamental Concepts • A task or process is a program unit that can be in concurrent execution with other program units • Tasks differ from ordinary subprograms in that: • A task may be implicitly started • When a program unit starts the execution of a task, it is not necessarily suspended • When a task’s execution is completed, control may not return to the caller • Tasks usually work together

  7. Task Categories • Heavyweight tasks • Execute in their own address space and have their own run-time stacks • Lightweight tasks • All run in the same address space, each with its own run-time stack • A task is disjoint if it does not communicate with or affect the execution of any other task in the program in any way

  8. Synchronization • A mechanism that controls the order in which tasks execute • Cooperation: Task A must wait for task B to complete some specific activity before task A can continue its execution • e.g. the producer-consumer problem • Competition: Two or more tasks must use some resource that cannot be used simultaneously • e.g. a shared counter, dining philosophers • Competition synchronization is usually provided by mutually exclusive access to the shared resource

  9. The Producer-Consumer Problem • There are a number of “classic” synchronization problems, one of which is the producer-consumer problem: there are M producers that put items into a fixed-size buffer, and the buffer is shared with N consumers that remove items from it • [Figure: producer1 … producerM and consumer1 … consumerN all accessing the shared buffer] • The problem is to devise a solution that synchronizes the producer and consumer accesses to the buffer; accesses must be synchronized because if multiple producers and/or consumers access the buffer simultaneously, values may get lost, retrieved twice, etc. • In the bounded-buffer version, the buffer has some fixed capacity N
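As an added illustration (not part of the original slides), here is a minimal Java sketch of the bounded-buffer problem using java.util.concurrent.ArrayBlockingQueue, whose put() and take() block when the buffer is full or empty; the class name and item values are invented for the example. Later slides show how such a buffer can be built by hand with condition variables, monitors, and Ada tasks.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumerDemo {
        public static void main(String[] args) {
            // Bounded buffer of capacity 8; put() blocks when full, take() when empty.
            BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(8);

            Runnable producer = () -> {
                try {
                    for (int i = 0; i < 100; i++) {
                        buffer.put(i);                  // blocks if the buffer is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };

            Runnable consumer = () -> {
                try {
                    for (int i = 0; i < 100; i++) {
                        System.out.println("consumed " + buffer.take());  // blocks if empty
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };

            new Thread(producer).start();
            new Thread(consumer).start();
        }
    }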

  10. Dining Philosophers • Five philosophers sit at a table, alternating between eating noodles and thinking. In order to eat, a philosopher must have two chopsticks. However, there is a single chopstick between each pair of plates, so if one is eating, neither neighbor can eat. A philosopher puts down both chopsticks when thinking. • Devise a solution that ensures: no philosopher starves; and a hungry philosopher is only prevented from eating by his neighbor(s). • A first attempt (Smalltalk-style pseudocode):

    philosopher := [ [true] whileTrue: [
        self get: left.
        self get: right.
        self eat.
        self release: left.
        self release: right.
        self think. ] ]

  Deadlock! If every philosopher picks up the left chopstick at the same time, each waits forever for the right one.

  11. How about this instead?

    philosopher := [ [true] whileTrue: [
        [self have: left and: right] whileFalse: [
            self get: left.
            [right notInUse]
                ifTrue: [self get: right]
                ifFalse: [self release: left] ].
        self eat.
        self release: left.
        self release: right.
        self think. ] ]

  Livelock! The philosophers can keep grabbing and releasing the left chopstick in lockstep, staying busy forever without any of them ever eating.

  12. Liveness and Deadlock • Liveness is a characteristic that a program unit may or may not have • In sequential code, it means the unit will eventually complete its execution • In a concurrent environment, a task can easily lose its liveness • If all tasks in a concurrent environment lose their liveness, the result is deadlock (all tasks blocked waiting) or livelock (all tasks still executing but making no progress)

  13. Race Conditions • A race condition occurs when two different threads of a program write to the same variable and its resulting value depends on the order in which the threads write to it • Transient errors, hard to debug • E.g. c = c + 1, which is really three machine instructions: load c, add 1, store c; two threads can interleave these steps and lose an update • Solution: acquire exclusive access to the shared resource before execution can continue • Issues: lockout, starvation
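A small Java sketch of this race (added here, not from the slides): two threads increment a shared counter; the plain int counter typically loses updates, while java.util.concurrent.atomic.AtomicInteger makes the read-modify-write atomic.

    import java.util.concurrent.atomic.AtomicInteger;

    public class RaceDemo {
        static int unsafeCounter = 0;                        // incremented as load, add, store
        static AtomicInteger safeCounter = new AtomicInteger();

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                for (int i = 0; i < 100_000; i++) {
                    unsafeCounter++;                         // not atomic: updates can be lost
                    safeCounter.incrementAndGet();           // atomic read-modify-write
                }
            };
            Thread t1 = new Thread(work), t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join();  t2.join();
            // unsafeCounter is usually less than 200000; safeCounter is always 200000.
            System.out.println(unsafeCounter + " vs " + safeCounter.get());
        }
    }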

  14. Task Execution States • Assuming some mechanism for synchronization (e.g. a scheduler), tasks can be in a variety of states: • New - created but not yet started • Runnable or ready - ready to run but not currently running (no available processor) • Running • Blocked - has been running, but cannot now continue (usually waiting for some event to occur) • Dead - no longer active in any sense
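As an added aside (not in the original slides), Java makes a comparable set of states visible through Thread.getState(); a tiny sketch, noting that the state observed right after start() depends on timing:

    public class StateDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread t = new Thread(() -> {
                try { Thread.sleep(100); } catch (InterruptedException e) { /* ignore */ }
            });
            System.out.println(t.getState());   // NEW: created but not yet started
            t.start();
            System.out.println(t.getState());   // typically RUNNABLE (or TIMED_WAITING once asleep)
            t.join();
            System.out.println(t.getState());   // TERMINATED: "dead"
        }
    }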

  15. Design Issues • Competition and cooperation synchronization • Controlling task scheduling • How and when tasks start and end execution • Alternatives: • Semaphores • Monitors • Message Passing

  16. Semaphores • Simple mechanism that can be used to provide synchronization of tasks • Devised by Edsger Dijkstra in 1965 for competition synchronization, but can also be used for cooperation synchronization • A data structure consisting of an integer and a queue that stores task descriptors • A task descriptor is a data structure that stores all the relevant information about the execution state of a task

  17. Semaphore Operations • Two atomic operations, P and V • Consider a semaphore s: • P (from the Dutch “passeren”, to pass) • P(s) – if s > 0 then assign s = s – 1; otherwise block (enqueue) the thread that calls P • Often referred to as “wait” • V (from the Dutch “vrijgeven”, to release) • V(s) – if a thread T is blocked on s, then wake up T; otherwise assign s = s + 1 • Often referred to as “signal”
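A sketch of these operations in Java (an addition; java.util.concurrent.Semaphore offers an equivalent acquire()/release() pair): a counting semaphore built from a synchronized class, using the standard monitor-based variant in which V increments and then wakes a waiter.

    public class CountingSemaphore {
        private int s;                               // the semaphore's integer value

        public CountingSemaphore(int initial) { s = initial; }

        // P / wait: block while s == 0, then decrement
        public synchronized void P() throws InterruptedException {
            while (s == 0) {
                wait();                              // enqueue the calling thread
            }
            s = s - 1;
        }

        // V / signal: increment and wake one blocked thread, if any
        public synchronized void V() {
            s = s + 1;
            notify();
        }
    }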

  18. Dining Philosophers (with a semaphore)

    wantBothSticks := Semaphore new.
    philosopher := [ [true] whileTrue: [
        [self haveBothSticks] whileFalse: [
            wantBothSticks wait.
            (left available and: [right available]) ifTrue: [
                self get: left.
                self get: right. ].
            wantBothSticks signal. ].
        self eat.
        self release: left.
        self release: right.
        self think. ] ]
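For contrast, one standard deadlock-free approach, sketched in Java (an addition, not from the slides): impose a global order on the chopsticks so every philosopher locks the lower-numbered one first, which breaks the circular wait. This prevents deadlock, though fairness (no starvation) still depends on the lock implementation.

    import java.util.concurrent.locks.ReentrantLock;

    public class DiningPhilosophers {
        public static void main(String[] args) {
            int n = 5;
            ReentrantLock[] chopsticks = new ReentrantLock[n];
            for (int i = 0; i < n; i++) chopsticks[i] = new ReentrantLock();

            for (int i = 0; i < n; i++) {
                int left = i, right = (i + 1) % n;
                // Always acquire the lower-numbered chopstick first (global ordering).
                int first = Math.min(left, right), second = Math.max(left, right);
                new Thread(() -> {
                    while (true) {
                        chopsticks[first].lock();
                        chopsticks[second].lock();
                        try {
                            // eat
                        } finally {
                            chopsticks[second].unlock();
                            chopsticks[first].unlock();
                        }
                        // think
                    }
                }).start();
            }
        }
    }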

  19. The trouble with semaphores • No way to statically check for the correctness of their use • Leaving out a single wait or signal event can create many different issues • Getting them just right can be tricky • Per Brinch Hansen (1973): • “The semaphore is an elegant synchronization tool for an ideal programmer who never makes mistakes.”

  20. Locks and Condition Variables • A semaphore may be used for either of two purposes: • Mutual exclusion: guarding access to a critical section • Synchronization: making processes suspend/resume • This dual use can lead to confusion: it may be unclear which role a semaphore is playing in a given computation… • For this reason, newer languages may provide distinct constructs for each role: • Locks: guarding access to a critical section • Condition Variables: making processes suspend/resume • Locks provide for mutually-exclusive access to shared memory; condition variables provide for thread/process synchronization.

  21. Locks • Like a semaphore, a lock has two associated operations: • acquire() – try to lock the lock; if it is already locked, suspend execution • release() – unlock the lock; awaken a waiting thread (if any) • These can be used to ‘guard’ a critical section; each thread wraps its accesses in an acquire/release pair:

    Lock sharedLock;
    Object sharedObj;

    // in each thread:
    sharedLock.acquire();
    // access sharedObj
    sharedLock.release();

  • A Java class has a hidden lock accessible via the synchronized keyword
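For a concrete Java version (added as an illustration), java.util.concurrent.locks.ReentrantLock provides lock()/unlock() in exactly this role; the try/finally ensures the lock is released even if the critical section throws.

    import java.util.concurrent.locks.ReentrantLock;

    public class LockedCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private int count = 0;

        public void increment() {
            lock.lock();             // acquire: blocks if another thread holds the lock
            try {
                count = count + 1;   // critical section
            } finally {
                lock.unlock();       // release: always runs, even on an exception
            }
        }

        public int get() {
            lock.lock();
            try {
                return count;
            } finally {
                lock.unlock();
            }
        }
    }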

  22. Condition Variables • A Condition is a predefined type, available in some languages, that can be used to declare variables for synchronization • When a thread needs to suspend execution inside a critical section until some condition is met, a Condition can be used • There are three operations for a Condition: • wait() – suspend immediately; enter a queue of waiting threads • signal(), aka notify() in Java – awaken a waiting thread (usually the first in the queue), if any • broadcast(), aka notifyAll() in Java – awaken all waiting threads, if any • The Java language has no built-in Condition type, but every Java object has an anonymous condition variable that can be manipulated via wait, notify & notifyAll (the java.util.concurrent.locks library later added an explicit Condition interface)
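An added Java sketch of explicit condition variables, using the Lock/Condition pair from java.util.concurrent.locks mentioned above; the OneSlotBuffer name is invented for the example.

    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    public class OneSlotBuffer<T> {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notFull  = lock.newCondition();
        private final Condition notEmpty = lock.newCondition();
        private T slot = null;

        public void put(T item) throws InterruptedException {
            lock.lock();
            try {
                while (slot != null) notFull.await();   // wait(): suspend until there is room
                slot = item;
                notEmpty.signal();                      // signal(): wake one waiting consumer
            } finally {
                lock.unlock();
            }
        }

        public T get() throws InterruptedException {
            lock.lock();
            try {
                while (slot == null) notEmpty.await();  // suspend until an item is available
                T item = slot;
                slot = null;
                notFull.signal();                       // wake one waiting producer
                return item;
            } finally {
                lock.unlock();
            }
        }
    }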

  23. Monitor motivation • A Java class has a hidden lock accessible via the synchronized keyword • With bare locks and semaphores, deadlocks, livelocks, and failures of mutual exclusion are easy to produce • Just as control structures were “higher level” than the goto, language designers began looking for higher-level ways to synchronize processes • In 1973, Brinch Hansen and Hoare proposed the monitor, a class whose methods are automatically accessed in a mutually exclusive manner • A monitor prevents simultaneous access by multiple threads

  24. Monitors • The idea: encapsulate the shared data and its operations to restrict access • A monitor is an abstract data type for shared data • Shared data is resident in the monitor (rather than in the client units) • All access to the shared data is through the monitor's operations • The monitor implementation guarantees synchronized access by allowing only one access at a time • Calls to monitor procedures are implicitly queued if the monitor is busy at the time of the call

  25. Monitor Visualization • [Figure: a buffer monitor with a public interface of put(obj) and get(obj); a hidden lock with an entry queue; private condition variables notEmpty and notFull; and private data myHead, myTail, mySize, N, myValues] • The compiler ‘wraps’ calls to put() and get() as follows:

    buf.lock.acquire();
    … call to put or get
    buf.lock.release();

  • If the lock is locked, the thread enters the entry queue • Each condition variable has its own internal queue, in which waiting threads wait to be signaled…
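A concrete Java sketch of such a monitor (added for illustration, reusing the field names from the figure): the synchronized methods provide the mutual exclusion and entry queue, and the object's single built-in condition variable stands in for notEmpty and notFull.

    // A monitor-style bounded buffer: synchronized methods give mutual exclusion,
    // and wait/notifyAll provide the condition-variable queues.
    public class BoundedBuffer {
        private final int[] myValues;
        private int myHead = 0, myTail = 0, mySize = 0;

        public BoundedBuffer(int n) { myValues = new int[n]; }

        public synchronized void put(int v) throws InterruptedException {
            while (mySize == myValues.length) wait();   // plays the role of notFull
            myValues[myTail] = v;
            myTail = (myTail + 1) % myValues.length;
            mySize++;
            notifyAll();                                // wake any waiting consumers
        }

        public synchronized int get() throws InterruptedException {
            while (mySize == 0) wait();                 // plays the role of notEmpty
            int v = myValues[myHead];
            myHead = (myHead + 1) % myValues.length;
            mySize--;
            notifyAll();                                // wake any waiting producers
            return v;
        }
    }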

  26. Evaluation of Monitors • A better way to provide competition synchronization than semaphores are • Equally powerful as semaphores: • Semaphores can be used to implement monitors • Monitors can be used to implement semaphores • Support for cooperation synchronization is very similar to that of semaphores, so it has the same reliability issues

  27. Distributed Synchronization • Semaphores, locks, condition variables, and monitors are shared-memory constructs, and so are only useful on a tightly-coupled multiprocessor • They are of no use on a distributed multiprocessor • On a distributed multiprocessor, processes can communicate via message passing, using send() and receive() primitives • If the message-passing system has no storage, then the send/receive operations must be synchronized: the sender waits until the receiver is ready, and only then is the message transmitted • If the message-passing system has storage to buffer the message, then the send() can proceed asynchronously: the message is buffered even if the receiver is not ready, and the receiver can then retrieve the message when it is ready...
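As an added Java illustration of the two cases: SynchronousQueue has no storage, so a put() blocks until a take() arrives (a synchronized send/receive), while LinkedBlockingQueue buffers the message so the send can proceed asynchronously.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.SynchronousQueue;

    public class MessagePassingDemo {
        public static void main(String[] args) throws InterruptedException {
            // No storage: put() blocks until a receiver calls take().
            SynchronousQueue<String> channel = new SynchronousQueue<>();
            new Thread(() -> {
                try {
                    System.out.println("received: " + channel.take());
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }).start();
            channel.put("hello");           // completes only when the receiver is ready

            // Buffered: put() returns immediately; the receiver retrieves it when ready.
            LinkedBlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
            mailbox.put("queued message");  // stored even though no receiver is waiting yet
            System.out.println("buffered: " + mailbox.take());
        }
    }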

  28. Tasks • In 1980, Ada introduced the task, with 3 characteristics: • its own thread of control; • its own execution state; and • mutually exclusive subprograms (entry procedures) • Entry procedures are self-synchronizing subprograms that another task can invoke for task-to-task communication. If task t has an entry procedure p, then another task t2 can execute t.p( argument-list ); In order for p to execute, t must execute: accept p ( parameter-list ); • If t executes accept p and t2 has not called p, t will automatically wait • If t2 calls p and t has not accepted p, t2 will automatically wait

  29. Rendezvous • [Figure: a timeline of tasks t and t2, in which t2 calls t.p(args) while t reaches accept p(params)] • When t and t2 are both ready, p executes: • t2’s argument-list is evaluated and passed to t.p’s parameters • t2 suspends • t executes the body of p (begin … end p;), using its parameter values • return values (or out or in out parameters) are passed back to t2 • t continues execution; t2 resumes execution • This interaction is called a rendezvous between t and t2. It does not depend on shared memory, so t and t2 can be on a uniprocessor, a tightly-coupled multiprocessor, or a distributed multiprocessor.

  30. Example Problem • How can we rewrite what’s below to complete more quickly?

    procedure sumArray is
       N: constant integer := 1000000;
       type RealArray is array(1..N) of float;
       anArray: RealArray;

       function sum(a: RealArray; first, last: integer) return float is
          result: float := 0.0;
       begin
          for i in first..last loop
             result := result + a(i);
          end loop;
          return result;
       end sum;

    begin
       -- code to fill anArray with values omitted
       put( sum(anArray, 1, N) );
    end sumArray;

  31. Divide-And-Conquer via Tasks

    procedure parallelSumArray is
       -- declarations of N, RealArray, anArray, Sum() as before …

       task type ArraySliceAdder is
          entry SumSlice(Start: in Integer; Stop: in Integer);
          entry GetSum(Result: out float);
       end ArraySliceAdder;

       task body ArraySliceAdder is
          i, j: Integer;
          Answer: Float;
       begin
          accept SumSlice(Start: in Integer; Stop: in Integer) do
             i := Start;  j := Stop;           -- get ready
          end SumSlice;
          Answer := Sum(anArray, i, j);        -- do the work
          accept GetSum(Result: out float) do
             Result := Answer;                 -- report outcome
          end GetSum;
       end ArraySliceAdder;

    -- continued on next slide…

  32. Divide-And-Conquer via Tasks (ii)

    -- continued from previous slide …
       firstHalfSum, secondHalfSum: Float;
       T1, T2 : ArraySliceAdder;              -- T1, T2 start & wait on accept
    begin
       -- code to fill anArray with values omitted
       T1.SumSlice(1, N/2);                   -- start T1 on 1st half
       T2.SumSlice(N/2 + 1, N);               -- start T2 on 2nd half
       T1.GetSum( firstHalfSum );             -- get 1st half sum from T1
       T2.GetSum( secondHalfSum );            -- get 2nd half sum from T2
       put( firstHalfSum + secondHalfSum );   -- we’re done!
    end parallelSumArray;

  Using two tasks T1 and T2, this parallelSumArray version requires roughly 1/2 the time required by sumArray (on a multiprocessor). Using three tasks, the time will be roughly 1/3 the time of sumArray. …
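A rough Java analogue of this divide-and-conquer (added here; the Ada version above is the slides' own), splitting the array between two threads and joining them to collect the partial sums:

    public class ParallelSum {
        public static void main(String[] args) throws InterruptedException {
            double[] a = new double[1_000_000];
            // code to fill a with values omitted

            double[] partial = new double[2];              // one slot per worker
            Thread t1 = new Thread(() -> partial[0] = sum(a, 0, a.length / 2));
            Thread t2 = new Thread(() -> partial[1] = sum(a, a.length / 2, a.length));
            t1.start(); t2.start();
            t1.join();  t2.join();                         // wait for both halves
            System.out.println(partial[0] + partial[1]);
        }

        static double sum(double[] a, int first, int last) {   // sums a[first..last)
            double result = 0.0;
            for (int i = first; i < last; i++) result += a[i];
            return result;
        }
    }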

  33. Producer-Consumer in Ada • To give the producer and consumer separate threads, we can define the behavior of one in the ‘main’ procedure:

    procedure ProducerConsumer is
       buf: Buffer;
       it: Item;
    begin
       -- producer task
       loop
          -- produce an Item in it
          buf.put(it);
       end loop;
    end ProducerConsumer;

  and the behavior of the other in a separate task:

    task consumer;
    task body consumer is
       it: Item;
    begin
       loop
          buf.get(it);
          -- consume Item it
       end loop;
    end consumer;

  We can then build a Buffer task with put() and get() as (auto-synchronizing) entry procedures...

  34. Capacity-1 Buffer • A single-value buffer is easy to build using an Ada task type:

    task type Buffer is
       entry get(it: out Item);
       entry put(it: in Item);
    end Buffer;

    task body Buffer is
       B: Item;
    begin
       loop
          accept put(it: in Item) do
             B := it;
          end put;
          accept get(it: out Item) do
             it := B;
          end get;
       end loop;
    end Buffer;

  • As a task type, variables of this type (e.g., buf) will automatically have their own thread of execution • The body of the task is a loop that accepts calls to put() and get() in strict alternation, which causes the buffer to alternate between being empty and nonempty

  35. Capacity-N Buffer • An N-value buffer is a bit more work: Ada provides the select-when statement to guard an accept, and perform it if and only if a given condition is true:

    -- task declaration is as before …
    task body Buffer is
       N: constant integer := 1024;
       package B is new Queue(N, Items);
    begin
       loop
          select
             when not B.isFull =>
                accept put(it: in Item) do
                   B.append(it);
                end put;
          or
             when not B.isEmpty =>
                accept get(it: out Item) do
                   it := B.first;
                   B.delete;
                end get;
          end select;
       end loop;
    end Buffer;

  • We can accept any call to get() so long as we are not empty, and any call to put() so long as we are not full

  36. The Importance of Clusters • Scientific computation is increasingly performed on clusters • Cost-effective: created from commodity parts • Scientists want more computational power • Cluster computational power is easy to increase by adding processors ⇒ cluster size keeps increasing!

  37. Clusters Are Not Perfect • Failure rates are increasing • The number of moving parts is growing (processors, network connections, disks, etc.) • Mean Time Between Failures (MTBF) is shrinking • Anecdotal: every 20 minutes for Google’s cluster • How can we deal with these failures?

  38. Options for Fault-Tolerance • Redundancy in space • Each participating process has a backup process • Expensive! • Redundancy in time • Processes save state and then rollback for recovery • Lighter-weight fault tolerance

  39. Today’s Answer: Redundancy in Time • Programmers place checkpoints • Small checkpoint size • Synchronous • Every process checkpoints in the same place in the code • Global synchronization before and after checkpoints

  40. What’s the Problem? • Future systems will be larger • Checkpointing will hurt program performance • Many processes checkpointing synchronously will result in network and file system contention • Checkpointing to local disk not viable • Application programmers are only willing to pay 1% overhead for fault-tolerance • The solution: • Avoid synchronous checkpoints

  41. Understanding Staggered Checkpointing • [Figure: processes-vs.-time diagrams contrasting today's synchronous checkpoints, which cause contention, with staggered checkpoints and their recovery lines] • Today: more processes, more data, synchronous checkpoints ⇒ contention! • That’s easy! We’ll stagger the checkpoints… • Not so fast: there is communication between processes, so a staggered recovery line [Randell 75] may cut across messages • If a message’s receive is saved but its send is not, the saved state is inconsistent: it could not have existed • If the send is saved but the receive is not, the state is consistent: it could have existed, so the recovery line is VALID

  42. Identify All Possible Valid Recovery Lines • [Figure: a processes-vs.-time diagram of three processes (0, 1, 2) whose events carry vector timestamps such as [1,0,0], [2,3,2], and [2,4,3]] • There are so many!

  43. Coroutine • A coroutine is two or more procedures that share a single thread of execution, each exercising mutual control over the other:

    procedure A;
    begin
       -- do something
       resume B;
       -- do something
       resume B;
       -- do something
       -- …
    end A;

    procedure B;
    begin
       -- do something
       resume A;
       -- do something
       resume A;
       -- …
    end B;

  44. Summary • Concurrent computations consist of multiple entities. • Processes in Smalltalk • Tasks in Ada • Threads in Java • OS-dependent in C++ • On a shared-memory multiprocessor: • The Semaphore was the first synchronization primitive • Locks and condition variables separated a semaphore’s mutual-exclusion usage from its synchronization usage • Monitors are higher-level, self-synchronizing objects • Java classes have an associated (simplified) monitor • On a distributed system: • Ada tasks provide self-synchronizing entry procedures
