Learn about deadlock in distributed systems, causes, resource allocation graphs, handling approaches, prevention, avoidance, detection, recovery, and current research. Understand conditions for deadlocks and methods for handling them effectively.
DEADLOCKS IN DISTRIBUTED SYSTEMS Radhika Pasumarthi
Outline: • Definition • Fundamental causes of deadlocks • Resource allocation graphs and wait for Graph • Approaches to handling deadlocks • Deadlock prevention • Deadlock avoidance • Deadlock detection and recovery • Current Research • References
What is Deadlock? • A set of processes is in a deadlock state if every process in the set is waiting for an event (a release) that can only be caused by some other process in the same set. • Two common places where deadlocks may occur are processes in an operating system (distributed or centralized) and transactions in a database. • The resources may be either physical or logical. • Physical resources: printers, tape drives, memory space, CPU cycles. • Logical resources: files, semaphores, monitors. • The simplest example: process A is waiting for a resource held by process B, and process B is waiting for a resource held by process A. • Several processes can be involved in a deadlock when there exists a cycle of processes waiting for each other: process A waits for B, which waits for C, which waits for A.
Conditions for a deadlock to occur • Mutual Exclusion: At least one of the resources is non-sharable, i.e. only a limited number of processes can use it at a time. If it is requested by a process while it is being used by another one, the requesting process has to wait until the resource is released. • Hold and Wait: There must be at least one process that is holding at least one resource and waiting for other resources that are being held by other processes. • No Preemption: No resource can be taken away from a process before the holding process completes its task with that resource; a resource is released only voluntarily by its holder. (A protocol that breaks this condition: if a process that is holding some resources requests another resource that cannot be allocated to it, it must release all resources currently allocated to it.) • Circular Wait: There exists a set of processes (P1, P2, ..., Pn) such that P1 is waiting for a resource held by P2, P2 is waiting for a resource held by P3, ..., Pn-1 is waiting for a resource held by Pn, and Pn is waiting for a resource held by P1.
One protocol to ensure that the circular wait condition never holds is to impose a linear ordering on all resource types. Each process may then request resources only in increasing order of priority. For example, set priorities r1=1, r2=2, r3=3 and r4=4. With these priorities, if process P wants to use r1 and r3, it must first request r1, then r3.
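The ordering rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the RANK table, the LOCKS dictionary and the helper names acquire_in_order and release_all are hypothetical, and threading.Lock stands in for the resources r1 to r4. Two workers name the same resources in opposite orders, yet both end up locking r1 before r3, so neither can hold r3 while waiting for r1.

```python
import threading

# Hypothetical global ranking of the resources (r1 < r2 < r3 < r4).
RANK = {"r1": 1, "r2": 2, "r3": 3, "r4": 4}
LOCKS = {name: threading.Lock() for name in RANK}

def acquire_in_order(names):
    """Acquire the named locks in increasing rank; return the order used."""
    ordered = sorted(set(names), key=RANK.__getitem__)
    for name in ordered:
        LOCKS[name].acquire()
    return ordered

def release_all(held):
    """Release locks in the reverse of the order they were acquired."""
    for name in reversed(held):
        LOCKS[name].release()

def worker(wanted, log):
    held = acquire_in_order(wanted)   # never requests a lower rank while holding a higher one
    log.append(tuple(held))
    release_all(held)

log = []
threads = [
    threading.Thread(target=worker, args=(["r3", "r1"], log)),
    threading.Thread(target=worker, args=(["r1", "r3"], log)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Sorting the request by rank is one simple way to enforce the rule; a system could equally reject out-of-order requests instead of reordering them.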
As an example, consider the traffic deadlock in the following figure
Example (Contd..) • Consider each section of the street as a resource. • Mutual exclusion condition applies, since only one vehicle can be on a section of the street at a time. • Hold-and-wait condition applies, since each vehicle is occupying a section of the street and waiting to move on to the next section of the street. • No-preemption condition applies, since a section of the street that is occupied by a vehicle cannot be taken away from it. • Circular wait condition applies, since each vehicle is waiting for the next vehicle to move. That is, each vehicle in the traffic is waiting for a section of street held by the next vehicle in the traffic. • The simple rule to avoid traffic deadlock is that a vehicle should only enter an intersection if it is assured that it will not have to stop inside the intersection.
Resource Allocation Graphs • Resource allocation graphs are drawn to make the allocation relations between processes and resources easy to see. In these graphs, processes are represented by circles and resources are represented by boxes. Each resource box contains a number of dots indicating the number of instances of that resource. • If the resource allocation graph contains no cycles, then there is no deadlock in the system at that instant. • If the resource allocation graph contains a cycle, then a deadlock may exist. • If there is a cycle, and the cycle involves only resources which have a single instance, then a deadlock has occurred.
Resource allocation Graphs (contd..) • [Figure: legend and example graph; processes and [reusable] resources with multiplicity 2.] • Request edge: from process Pi to resource Rj. • Assignment edge: from resource Rj to process Pi. • Example: P1 holds 2 copies of resource R1, and P2 holds one copy of resource R1 and requests one copy of resource R2.
• In the first example graph there are three cycles, so a deadlock may exist. In fact P1, P2 and P3 are deadlocked. • In the second example graph there is a cycle, but there is no deadlock: if P4 releases R2, R2 may be allocated to P3, which breaks the cycle.
Resource Allocation Graph & Wait-for Graph • If all resources have only a single instance, then we can define a deadlock detection algorithm that uses a variant of the resource allocation graph, called a wait-for graph. • The wait-for graph has an edge from Pi to Pj when Pi is waiting for a resource held by Pj. • [Figure: a resource allocation graph over processes P1 to P5 and resources R1 to R5, and the corresponding wait-for graph over P1 to P5.]
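A minimal sketch of this detection idea, assuming every resource has a single instance. The function names wait_for_graph and has_cycle and the dictionary layout are illustrative choices, not taken from the slides. The resource allocation graph is collapsed into a wait-for graph, and a depth-first search then looks for a cycle.

```python
def wait_for_graph(requests, assignments):
    """Collapse a single-instance resource allocation graph into a wait-for graph.

    requests:    {process: set of resources it is waiting for}
    assignments: {resource: process currently holding it}
    Returns      {process: set of processes it waits for}
    """
    wfg = {}
    for proc, wanted in requests.items():
        wfg.setdefault(proc, set())
        for res in wanted:
            holder = assignments.get(res)
            if holder is not None and holder != proc:
                wfg[proc].add(holder)
                wfg.setdefault(holder, set())
    return wfg

def has_cycle(graph):
    """Depth-first search; a grey (in-progress) node reached again means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)

# P1 waits for R1 (held by P2) and P2 waits for R2 (held by P1): a cycle.
requests = {"P1": {"R1"}, "P2": {"R2"}, "P3": set()}
assignments = {"R1": "P2", "R2": "P1", "R3": "P3"}
wfg = wait_for_graph(requests, assignments)
deadlocked = has_cycle(wfg)
```

With single-instance resources a cycle in the wait-for graph is both necessary and sufficient for deadlock, which is why a plain cycle check is enough here.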
Methods for handling Deadlocks • The Ostrich algorithm • Deadlock prevention • Deadlock detection and recovery. • Deadlock avoidance
Methods for handling deadlocks (contd..) • Deadlock detection • The basic approach is similar to deadlock detection in single-processor systems • Basically, analyze whether there is a cycle in the wait-for graph • In distributed systems, the wait-for graph is not held on any single processor • Different algorithms are needed to simulate the analysis • Deadlock avoidance • Impractical in single-processor systems • Impractical in distributed systems • Deadlock prevention • Linear ordering: for static resources, where the resources to be accessed are known in advance • Ordering accesses by timestamps: otherwise
Deadlock detection – Centralized Algorithm • A central site is responsible for collecting all the information needed to build the global wait-for graph (WFG) • Every node reports to the central site: • When a process waits to enter a critical section (CS) • When a process enters a CS • When a process releases a CS
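The three kinds of report can be sketched as methods on a coordinator object. This is a simplified illustration with assumptions: the class name Coordinator and its method names are hypothetical, each critical section admits one process at a time, each process waits for at most one critical section, and message delays (which cause false deadlocks in practice) are ignored.

```python
class Coordinator:
    """Central site that rebuilds the global wait-for graph from node reports."""

    def __init__(self):
        self.holder = {}    # critical section -> process currently inside it
        self.waiting = {}   # process -> critical section it is waiting to enter

    def report_wait(self, proc, cs):
        self.waiting[proc] = cs

    def report_enter(self, proc, cs):
        self.waiting.pop(proc, None)
        self.holder[cs] = proc

    def report_release(self, proc, cs):
        if self.holder.get(cs) == proc:
            del self.holder[cs]

    def deadlocked(self):
        """Return the processes that lie on a cycle of the global WFG."""
        result = set()
        for start in self.waiting:
            chain, proc = [], start
            # Follow "waits for the holder of" edges until the chain ends or repeats.
            while proc in self.waiting and proc not in chain:
                chain.append(proc)
                proc = self.holder.get(self.waiting[proc])
            if proc in chain:
                result.update(chain[chain.index(proc):])
        return result

site = Coordinator()
site.report_enter("P1", "A")
site.report_enter("P2", "B")
site.report_wait("P1", "B")
before = site.deadlocked()      # P1 waits for P2, but P2 is not waiting
site.report_wait("P2", "A")
site.report_wait("P3", "A")     # P3 waits on the cycle but is not part of it
after = site.deadlocked()
```

Because every process has at most one outgoing wait-for edge in this sketch, following the chain from each waiting process is enough; a general WFG would need a full cycle search.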
Chandy-Misra-Haas algorithm (1983) • Determine the dependency among processes • If Pi is waiting for a resource which is occupied by Pj, then Pi is said to depend on Pj • If Pi depends on Pj and they are in the same home node, then Pi locally depends on Pj; otherwise Pi remotely depends on Pj • 1. If Pi locally depends on Pj and Pj locally depends on Pk, then Pi locally depends on Pk • 2. If Pi locally depends on itself, then declare deadlock • 3. If Pi locally depends on Pj and Pj remotely depends on Pk, then send probe(i,j,k) to the home node of Pk
Upon receiving probe(i,j,k) • If Pk does not depend on any process, then do nothing • else if k = i, then declare deadlock • else if Dk(i) = true (Pk has already processed a probe initiated by Pi) • Possibly Pi depends on Pk along two different paths • In this case do nothing • else set Dk(i) = true, and if Pk locally depends on Pm and Pm remotely depends on Pn, then send probe(i,m,n) to the home node of Pn
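The probe rules can be simulated in a single program by replacing message passing with a queue. This is a sketch under assumptions, not the published algorithm verbatim: the function names cmh_detect and local_deps are hypothetical, the dictionary waits_for stands in for each node's local knowledge, a blocked process may forward probes for its own remote dependencies as well as those of its local dependents, and the extra check "Pk locally depends on Pi" is added so a cycle whose last edge is local is still reported.

```python
from collections import deque

def local_deps(p, waits_for, home):
    """Processes that p locally depends on (transitively, on p's home node)."""
    seen, stack = set(), [p]
    while stack:
        q = stack.pop()
        for r in waits_for.get(q, ()):
            if home[r] == home[p] and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def cmh_detect(i, waits_for, home):
    """Return True if initiator i is found to be deadlocked by probe chasing."""
    if i in local_deps(i, waits_for, home):
        return True                       # rule 2: local deadlock
    processed = set()                     # the k for which D_k(i) is true
    probes = deque()

    def send_probes(k):
        # rule 3: follow local dependencies of k, then cross a node boundary
        for m in {k} | local_deps(k, waits_for, home):
            for n in waits_for.get(m, ()):
                if home[n] != home[m]:
                    probes.append((i, m, n))

    send_probes(i)
    while probes:
        _, _, k = probes.popleft()
        if not waits_for.get(k):
            continue                      # Pk is not blocked: do nothing
        if k == i or i in local_deps(k, waits_for, home):
            return True                   # probe came back to the initiator
        if k in processed:
            continue                      # second path to Pk: do nothing
        processed.add(k)
        send_probes(k)
    return False

home = {"P1": "A", "P2": "B", "P3": "B"}
cycle = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
chain = {"P1": {"P2"}, "P2": {"P3"}}
```

Here cmh_detect("P1", cycle, home) follows probe(1,1,2) and then probe(1,3,1) back to P1, while the chain example dies out at the unblocked P3.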
Path Pushing Algorithm • Each node performs a local analysis of the wait-for graph; if deadlocks are detected, it breaks them locally. • If the dependency path involves external nodes, then • Send the wait-for graph to the external nodes • The wait-for graph is propagated and expanded • A deadlock can be detected if there is one • Expansion stops when it encounters nodes with no dependency on other nodes
Diffusion computation based algorithm • Based on the OR dependency model • Node X is waiting for node Y or node Z • A node is blocked only if all its blocking paths are blocked • Examples: • X requested R1 or R2; R1 is held by Y, R2 is held by Z • X has two threads T1 and T2; T1 requested resource R1, T2 requested resource R2, and X can continue as long as T1 or T2 can continue. • [Figure: X waits for Y or Z.]
Algorithm (contd…) • Variables • DSx: the set of processes Px has an OR-dependency on • numx: the number of reply messages Px has received, initialized to 0 • waitx: true whenever Px is fully blocked, false otherwise • recvx: whether Px has already received the query, initialized to false • senderx: the sender that sent a query to Px and has not yet been replied to (only one) • Algorithm • Similar to the edge chasing algorithm • Send a query along all wait-for edges • We assume that there is only one query going on; for the general case, we just need more variables to distinguish different queries • In this algorithm, we also need to send replies • Any node that has received replies from all its edges during the computation is involved in a deadlock situation
Algorithm (contd…) • A blocked process Pi initiates a diffusion computation: if waiti then send query(i,i,j) to all Pj ∈ DSi; • When Pk receives the message query(i,j,k) from Pj: if waitk and not recvk then { recvk := true; senderk := Pj; send query(i,k,m) to all Pm ∈ DSk; } if waitk and recvk then send reply(i,k,j) to Pj; (two possible cases) • Case "2 paths" [figure: nodes W, X, Y, Z]: the query reaches Pk along two different paths. Sending a reply back along the current path still leaves the original path open. • Case "cyclic" [figure: nodes W, X, Y, Z]: the query has travelled around a cycle. In this case a reply must of course be sent back.
Diffusion computation based algorithm (contd..) • When Pk receives the message reply(i,j,k) from Pj: if waitk then { numk := numk + 1; if numk = |DSk| then { if (i = k) then declare a deadlock; else send message reply(i,k,senderk) to senderk; } } • Example [figure: nodes W, X, Y, Z]: • Probe from Y arrives at W first • Probe from Z arrives at W second • W sends reply to Z • W is not blocked • Z is not blocked until W is blocked • Z should not declare deadlock even though it receives the reply from its only wait-for edge
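The query and reply rules can be traced with a small single-program simulation. This is a sketch under assumptions: the function name or_model_deadlocked is hypothetical, a queue replaces real message passing, only one computation runs at a time, the wait-for sets do not change while it runs, and a deadlock is declared when the initiator itself has collected replies on all of its edges.

```python
from collections import deque

def or_model_deadlocked(initiator, ds):
    """Simulate one diffusion computation in the OR model.

    ds maps each blocked process to the set of processes it has an
    OR-dependency on; a process with no (or an empty) entry is not blocked.
    """
    def waiting(p):
        return bool(ds.get(p))

    if not waiting(initiator):
        return False
    num = {p: 0 for p in ds}          # replies received so far
    recv = {initiator}                # processes that already saw the query
    sender = {}                       # who first sent the query to each process
    msgs = deque(("query", initiator, j) for j in sorted(ds[initiator]))

    while msgs:
        kind, j, k = msgs.popleft()   # message of `kind` from Pj to Pk
        if not waiting(k):
            continue                  # an active process discards the message
        if kind == "query":
            if k not in recv:         # first query: engage and propagate
                recv.add(k)
                sender[k] = j
                msgs.extend(("query", k, m) for m in sorted(ds[k]))
            else:                     # later query: reply immediately
                msgs.append(("reply", k, j))
        else:
            num[k] += 1
            if num[k] == len(ds[k]):  # replies on every outgoing edge
                if k == initiator:
                    return True
                msgs.append(("reply", k, sender[k]))
    return False

# X waits for Y or Z, and both Y and Z wait for X: every path is blocked.
blocked = {"X": {"Y", "Z"}, "Y": {"X"}, "Z": {"X"}}
# Same, but Z is not blocked, so X can still continue through Z.
open_path = {"X": {"Y", "Z"}, "Y": {"X"}}
```

In the second case Z silently drops the query, so X collects only one of its two replies and no deadlock is declared, matching the rule that a node is blocked only if all of its paths are blocked.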
Deadlock Prevention • Deadlock prevention means using resources in such a way that we cannot get into deadlocks. In real life we may decide that left turns are too dangerous, so we only make right turns. It takes longer to get there, but it works. In terms of deadlocks, we may constrain our use of resources so that we do not have to worry about deadlocks. Here we explore this idea with two examples.
Linear Ordering of Resources
acquire(A); acquire(B); acquire(C);
use C
use A and C
use A, B, C
release(A); release(B);
acquire(E);
use C and E
release(C); release(E);
acquire(D);
use D
release(D);
Hierarchical Ordering of Resources • Another strategy, usable when resources are hierarchically structured, is to lock them in hierarchical order. We assume that the resources are organized in a tree (or a forest) representing containment. We can lock any node or group of nodes in the tree. The resources we are interested in are nodes in the tree, usually leaves. Then the following rule guarantees that deadlocks cannot occur. • HO: The nodes currently locked by a process must lie [simultaneously and at all times until the desired resources are acquired] on all paths from the root to the desired resources. • Here is an example of the use of this rule, locking a single resource at a time.
Hierarchical Ordering of Resources • Then if a process wants to use the resources e, f, i, k it uses in sequence the commands • lock(a); lock(b); lock(h); unlock(a); lock(d); unlock(b); lock(i); lock(j); unlock(h); lock(k); unlock(j); lock(e); lock(f); unlock(d);
Wait-die vs. Wound-wait
WAIT-DIE POLICY
• Old process wants a resource held by a young process: it will wait.
• Young process wants a resource held by an old process: it dies.
WOUND-WAIT POLICY
• Old process wants a resource held by a young process: it preempts the young process.
• Young process wants a resource held by an old process: it will wait.
Wait-die • As we have pointed out before, killing a transaction is relatively harmless, since by definition it can be restarted safely later. • Wait-die: • If an old process wants a resource held by a young process, the old one will wait. • If a young process wants a resource held by an old process, the young process will be killed. • Observation: The young process, after being killed, will start up again and be killed again. This cycle may repeat many times before the old one releases the resource.
Wound-wait • When a conflict arises, instead of killing the process making the request, we can kill the resource owner. Without transactions, killing a process might have severe consequences. With transactions, these effects will vanish magically when the transaction dies. • Wound-wait: (we allow preemption & ancestor worship) • Once we assume the existence of transactions, we can do something that had previously been forbidden: take resources away from running processes. • If an old process wants a resource held by a young process, the old one will preempt the young process, which is wounded and killed, then restarts and waits. • If a young process wants a resource held by an old process, the young process will wait.
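Both policies reduce to a comparison of timestamps, which can be sketched as two small functions. The function names and the returned strings are illustrative assumptions; the only convention relied on is that a smaller timestamp means an older process.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester is killed."""
    if requester_ts < holder_ts:      # requester is older
        return "requester waits"
    return "requester dies"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester preempts; a younger requester waits."""
    if requester_ts < holder_ts:      # requester is older
        return "holder is preempted"
    return "requester waits"

# Timestamps: the process started at time 5 is older than the one started at 9.
old, young = 5, 9
decisions = {
    ("wait-die", "old wants"): wait_die(old, young),
    ("wait-die", "young wants"): wait_die(young, old),
    ("wound-wait", "old wants"): wound_wait(old, young),
    ("wound-wait", "young wants"): wound_wait(young, old),
}
```

In both schemes a killed process should keep its original timestamp when it restarts, so that it eventually becomes the oldest and cannot be starved.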
Current/future research • Tetsuya Maruta, Sen'ichi Onoda, Yoshitomo Ikkai, Takashi Kobayashi, Norihisa Komoda (2000) - A Deadlock Detection Algorithm for Business Processes Workflow Models. • Pattern-based deadlock detection • Soojung Lee (2001) - Efficient Generalized Deadlock Detection and Resolution in Distributed Systems • Uses a spanning tree of probes and replies from a starting node to find deadlocks. • Jonghun Park (2004) developed an "Order-based Deadlock Prevention Protocol with Parallel Requests" • High-performance protocol that looks very promising for web services
References • A. Tanenbaum, "Distributed Operating Systems", Prentice Hall, 1995. • Mukesh Singhal, "Advanced Concepts in Operating Systems", McGraw-Hill, 1994. • http://ieeexplore.ieee.org