Deadlock and Livelock

Z. Lu / A. Jantsch / I. Sander
Chapter 14

Overview
Interconnect Network
Introduction
Topology (regular, irregular)
Deadlock, Livelock
Router Architecture (pipelined, classic)
Routing (algorithms, mechanics)
Network Interface (message passing, shared memory)
Flow Control (circuit switching, packet switching (SAF, wormhole, virtual channel))
Performance Analysis and QoS
Implementation
Evaluation
Summary
Concepts
SoC Architecture

Deadlock
• How to define it?
  • Definitions
  • Examples in circuit-switching networks and in packet-buffer and flit-buffer flow control networks
• How to illustrate it?
  • Graph representation for resource dependency and deadlock configuration
• How to solve it?

Deadlock
• Deadlock can occur in an interconnection network when a group of agents (usually packets) cannot make progress because they are waiting on each other to release resources (buffers, channels)
• If a sequence of waiting agents forms a cycle, the network is deadlocked

A Deadlock Example (Circuit-Switched Network)
• Connection A holds channels u and v and wants to acquire channel w
• Connection B holds channels w and x and wants to acquire channel u
• Since neither connection A nor connection B will release its channels, the network is deadlocked
How would you solve this deadlock problem?

Deadlock
• Deadlock paralyzes the network, which can have catastrophic consequences
• Two possible solutions
  • Avoid deadlocks
  • Recover from deadlocks
• Almost all networks today use deadlock avoidance

Agents and Resources
• Depending on the type of connection, different agents and resources are involved
Is deadlock a problem for other bufferless flow control schemes?

Wait-For and Hold Relations
• Agents and resources are related by wait-for and hold relations
• Agent A
  • Holds resources u, v
  • Waits for resource w
• Agent B
  • Holds resources w, x
  • Waits for resource u

Wait-For and Hold Relations
• If an agent holds a resource, then the resource can be viewed as "waiting" for the agent to release it
• Thus each hold relation induces a wait-for relation in the opposite direction
Figure: "A holds u" is redrawn as "u waits for A"

Wait-For and Hold Relations
• Replacing each hold relation with a wait-for relation in the opposite direction generates the lower figure
• The arrows in the figure reveal a cycle, which shows that the configuration is deadlocked

Cycle in the wait-for graph means deadlock
• Deadlock occurs if
  1. Agents hold and do not release a resource while waiting for another resource
  2. A cycle exists between waiting agents, such that there exists a set of agents $A_0, A_1, \dots, A_{n-1}$, where agent $A_i$ holds resource $R_i$ while waiting for resource $R_{(i+1) \bmod n}$ for $i = 0, 1, \dots, n-1$
Condition 1 is not sufficient. Give an example.
Conditions 1 and 2 are sufficient. Give examples.
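The cycle condition can be checked mechanically. The following is a minimal sketch, not part of the original slides, that builds the wait-for graph for the circuit-switching example (A holds u, v and waits for w; B holds w, x and waits for u). The function names `wait_for_graph` and `is_deadlocked` are illustrative assumptions, and each agent is assumed to wait for at most one resource.

```python
from graphlib import TopologicalSorter, CycleError

def wait_for_graph(holds, waits):
    """Build wait-for edges as {node: set of nodes it waits for}.

    Each hold relation is reversed: a held resource "waits for" its holder.
    """
    graph = {}
    for agent, resources in holds.items():
        for r in resources:
            graph.setdefault(r, set()).add(agent)   # resource waits for holder
    for agent, r in waits.items():
        graph.setdefault(agent, set()).add(r)       # agent waits for resource
    return graph

def is_deadlocked(holds, waits):
    """A cycle in the wait-for graph means the configuration is deadlocked."""
    try:
        TopologicalSorter(wait_for_graph(holds, waits)).prepare()
        return False
    except CycleError:
        return True

holds = {"A": ["u", "v"], "B": ["w", "x"]}
print(is_deadlocked(holds, {"A": "w", "B": "u"}))  # True: A -> w -> B -> u -> A
print(is_deadlocked(holds, {"A": "w"}))            # False: B is not waiting
```

The second call illustrates that holding and waiting alone (condition 1) does not produce a deadlock as long as no cycle closes.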

Resource Dependences
• Whenever it is possible for an agent holding $R_i$ to wait on $R_{i+1}$, we say that a resource dependence exists from $R_i$ to $R_{i+1}$, and denote it as $R_i \succ R_{i+1}$

Resource Dependences
• A cycle in the resource dependence graph indicates that it is possible for a deadlock to occur
• A cycle in this graph is a necessary, but not sufficient, condition for deadlock
Figure: Resource dependence graph with the cycle $u \succ v \succ w \succ x \succ u$

Deadlock with packet-buffer flow control
• Again the four-node network is taken as the example, but this time packet-buffer flow control with a single packet buffer per node is used
• Agents are packets
• Shared resources are buffers
• A packet holds buffer $B_i$ while acquiring the next buffer $B_{(i+1) \bmod 4}$
• The resource dependence graph says that deadlock might occur
Figure: Resource dependence graph over buffers B0, B1, B2, B3
Why are channels not used as resources for deadlock analysis this time?

Deadlock with packet-buffer flow control: Wait-for Graph
• The upper configuration is deadlocked
• The lower one is not! Packet 3 can acquire buffer 0
Figure: Two wait-for graphs over packets P0 to P3 and buffers B0 to B3: a deadlocked configuration (upper) and a configuration that is not deadlocked (lower)

Deadlock with flit-buffer flow control
• Again the four-node network is taken as the example, but this time flit-buffer flow control with two virtual channels per physical channel is used
• Agents are packets
• Shared resources are virtual channels
• The resource dependence graph says that deadlock might occur
Figure: Resource dependence graph
How is the resource dependence graph obtained?
Can we use flits rather than packets as agents to do deadlock analysis?

Deadlock with flit-buffer flow control
• The example shows a deadlocked configuration
• Packet P0 holds virtual channels u0 and v0 and tries to acquire w0
• Packet P1 holds virtual channels w0 and x0 and tries to acquire u0
• Although free virtual channels remain, a cycle has formed in the network

Deadlock Avoidance
• Deadlock can be avoided by eliminating cycles in the resource dependence graph
• This can be done by imposing a partial order on the resources and then insisting that agents acquire these resources in ascending order
• Then no cycle is possible, since in any cycle at least one agent holding a higher-numbered resource would have to wait on a lower-numbered resource, which the ordering forbids
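The ordering argument can be written out as a short proof by contradiction. This is a sketch for the case where the resources are numbered; $\mathrm{idx}(R)$ is an assumed notation, not from the slides, for the number assigned to resource $R$.

```latex
Suppose the resource dependence graph contains a cycle
$R_0 \succ R_1 \succ \dots \succ R_{n-1} \succ R_0$.
Ascending allocation means every dependence $R_i \succ R_{i+1}$ satisfies
$\mathrm{idx}(R_i) < \mathrm{idx}(R_{i+1})$, so
\[
  \mathrm{idx}(R_0) < \mathrm{idx}(R_1) < \dots < \mathrm{idx}(R_{n-1}) < \mathrm{idx}(R_0),
\]
which is a contradiction. Hence the resource dependence graph is acyclic,
and deadlock cannot occur.
```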

Deadlock Avoidance
• What network design aspects may contribute to the deadlock problem?
  • Topology
  • Flow control
  • Routing
• Solutions
  • General (topology-independent) solutions
  • Topology-specific solutions

Deadlock Avoidance Techniques
• Resource allocation
  • Distance classes
  • Dateline classes
• Restrict physical routes
  • Dimension-order routing
  • The turn model for k-ary n-mesh networks
• Topology-dependent

Distance Classes
• Resources are grouped into numbered classes
• Allocation is restricted so that resources can only be acquired in ascending class order

Distance Classes Example
• A packet at distance i from its source node needs to allocate a resource from class i
• At the source, packets are injected into resource class 0
• Uphill-only resource allocation rule: at each hop, the packet acquires a resource of the next higher class
Figure: Each node contains 5 buffer classes, ordered from bottom to top. As packets A and B progress, their buffer classes increase.

Distance Classes Example
• Using distance classes, the resource dependence graph can look like this
• There are no cycles => deadlock cannot occur!
How many classes do we need per node?
Figure: A four-node ring network using buffer classes based on distance. Each node i has 4 classes, with buffer $B_{ji}$ handling traffic at node i that has taken j hops toward its destination.
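The effect of distance classes on the dependence graph can be checked with a small script. This is a minimal sketch under stated assumptions: a unidirectional four-node ring, buffers identified by a `(class, node)` tuple, and a hypothetical helper `has_cycle`; none of these names come from the slides.

```python
from graphlib import TopologicalSorter, CycleError

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(dst, set()).add(src)
    try:
        TopologicalSorter(graph).prepare()
        return False
    except CycleError:
        return True

N = 4  # nodes in the ring (assumed unidirectional)

# One buffer per node: B_i depends on B_((i+1) mod N)
single_class = [(i, (i + 1) % N) for i in range(N)]

# Distance classes: buffer (j, i) is class j at node i.
# Every hop moves to the next node AND the next higher class.
distance_classes = [((j, i), (j + 1, (i + 1) % N))
                    for j in range(N - 1) for i in range(N)]

print(has_cycle(single_class))      # True: deadlock is possible
print(has_cycle(distance_classes))  # False: the class index only increases
```

Because a packet in this ring travels at most N - 1 = 3 hops, classes 0 to 3 suffice here, which matches the four classes per node in the figure caption.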

Distance Classes
• Distance classes provide a very general way to order resources in any topology
• Distance classes are very costly, since they require a number of buffers (or virtual channels) proportional to the diameter of the network
• However, for some topologies the cost can be reduced significantly by exploiting specific topology properties

Dateline Classes
• For a ring, the number of classes needed can be reduced considerably
• Each node has only two buffers
  • Class "0" buffer: $B_{0i}$
  • Class "1" buffer: $B_{1i}$
• Packets enter the ring in buffer $B_{0i}$
• When they cross the dateline, they are placed into buffer $B_{1i}$ until they reach their destination
• The result is an acyclic graph => deadlock cannot occur!
Node 3 also sends packets to node 0. Does this result in deadlock?
Why is the dateline class specific to the ring? Does it work for a 4-node array (a ring without wrap-around connections)?
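The dateline rule can be traced for every source and destination pair to see which buffer dependences actually arise. This is a minimal sketch assuming a unidirectional four-node ring with the dateline placed between node 3 and node 0; the placement and the names `ring_dependences` and `has_cycle` are illustrative assumptions.

```python
from graphlib import TopologicalSorter, CycleError

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(dst, set()).add(src)
    try:
        TopologicalSorter(graph).prepare()
        return False
    except CycleError:
        return True

def ring_dependences(n, use_dateline):
    """Collect buffer dependences (class, node) -> (class, node) over all routes."""
    edges = set()
    for src in range(n):
        for dst in range(n):
            node, cls = src, 0            # packets are injected into class 0
            while node != dst:
                nxt = (node + 1) % n
                # Crossing from node n-1 to node 0 crosses the dateline.
                nxt_cls = 1 if (use_dateline and nxt == 0) else cls
                edges.add(((cls, node), (nxt_cls, nxt)))
                node, cls = nxt, nxt_cls
    return edges

print(has_cycle(ring_dependences(4, use_dateline=False)))  # True
print(has_cycle(ring_dependences(4, use_dateline=True)))   # False
```

In this sketch the routes starting at node 3 are included, and the graph stays acyclic because a packet that has crossed the dateline never needs the class 1 buffer at node 3.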

Restricted Physical Routes
• Dividing the resources into classes makes it possible to build a deadlock-free network, but can be very costly because of the large number of resources needed
• An alternative is to restrict the routing function so that the resulting resource dependence graph is acyclic

Dimension Order Routing
• Dimension-order routing guarantees deadlock freedom in k-ary n-meshes
• Within the first dimension (here x), a packet traveling in the +x/-x direction can only wait on a channel in the +x/-x, +y, or -y direction
• In the second dimension, a packet traveling in the +y/-y direction can only wait on a channel in the +y/-y direction
• These relations can be used to number the channels so that every packet follows increasingly numbered channels
Figure: Enumeration of a 3x3 mesh in dimension-order routing. Right-going channels are numbered first, then left, up, and down.
Will a different numbering order change the correctness? What if the channels are numbered left, right, down, and up?
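Instead of numbering the channels by hand, the channel dependences generated by a routing function can be enumerated and checked for cycles. This is a minimal sketch for a 3x3 mesh; `xy_route`, `yx_route`, `channel_dependences`, and `has_cycle` are hypothetical helper names, and a channel is represented as a `(from_node, to_node)` pair.

```python
from graphlib import TopologicalSorter, CycleError

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(dst, set()).add(src)
    try:
        TopologicalSorter(graph).prepare()
        return False
    except CycleError:
        return True

def xy_route(src, dst):
    """Nodes visited from src to dst, routing the x dimension first."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def yx_route(src, dst):
    """Same as xy_route, but routing the y dimension first."""
    flipped = xy_route(src[::-1], dst[::-1])
    return [node[::-1] for node in flipped]

def channel_dependences(k, route):
    """Pairs (held channel, next channel) over all routes in a k x k mesh."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    deps = set()
    for s in nodes:
        for d in nodes:
            path = route(s, d)
            channels = list(zip(path, path[1:]))
            deps.update(zip(channels, channels[1:]))
    return deps

xy = channel_dependences(3, xy_route)
yx = channel_dependences(3, yx_route)
print(has_cycle(xy))       # False: dimension-order routing is acyclic
print(has_cycle(xy | yx))  # True: mixing both orders allows all eight turns
```

The second result shows why the order of dimensions has to be fixed: either order alone is acyclic, but allowing both reintroduces cycles.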

The Turn Model
• A more general model for mesh networks is the "Turn Model"
• The eight possible turns in a 2D mesh can be combined to create 2 abstract cycles
• To avoid deadlock, at least one turn must be removed from each cycle
Figure: Counterclockwise cycle (left turns) and clockwise cycle (right turns)

Deadlock Situation
• Four packets travelling in different directions try to turn left and wind up in a circular wait
• If any one of the packets had not turned, deadlock would have been avoided
• The algorithm should not prohibit more turns than necessary. Otherwise, its adaptiveness would be reduced.
Source: "The Turn Model for Adaptive Routing" by C. J. Glass and L. M. Ni

Dimension Order Routing
• In dimension-order routing with the x direction routed first (x-y routing), only the following turns are allowed
Figure: Allowed turns in the counterclockwise cycle (left turns) and the clockwise cycle (right turns)

The Turn Model
• Assume we remove the N-W (+y, -x) turn in the counterclockwise graph
• In the clockwise graph, either the S-W (-y, -x), N-E (+y, +x), or E-S (+x, -y) turn can then be eliminated to yield a deadlock-free network, resulting in 3 deadlock-free algorithms
Figure: Counterclockwise and clockwise turn cycles over North (+y), West (-x), East (+x), and South (-y). Removing S-W (-y, -x) gives West-First, removing N-E (+y, +x) gives North-Last, and removing E-S (+x, -y) gives Negative-First.

The Three Deadlock-free Algorithms
• West-first: Because all turns to the west are prohibited, a packet must start out in that direction in order to travel west.
• North-last: Because all turns when travelling north are prohibited, a packet should only travel north when that is the last direction it needs to travel.
• Negative-first: Because all turns from a positive direction to a negative direction are prohibited, a packet must start out in a negative direction in order to travel in a negative direction.

The Turn Model: West-First
• In the West(-x)-First model a packet first has to make all its west hops (1)
• Packets shall be routed up (clockwise) or down (counterclockwise) (2)
• Packets shall then be routed to the east (+x) (3)
• Packets shall be routed down (clockwise) or up (counterclockwise) if needed (4)
• This scheme shall be reflected by the numbering!
Figure: Channel ordering induced by the west-first turn model.
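A routing function following the west-first rule can be sketched in a few lines. This is a minimal, hedged example of a minimal-path (no misrouting) variant; the function name `west_first_directions` and the coordinate convention (x grows to the east, y grows to the north) are assumptions for illustration.

```python
def west_first_directions(cur, dst):
    """Return the set of output directions permitted at node cur.

    Minimal west-first routing: if the destination lies to the west,
    the packet must go west now, because no later turn to the west is allowed.
    Otherwise it may choose adaptively among the productive directions.
    """
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx < 0:
        return {"W"}            # all west hops first, no adaptivity
    options = set()
    if dx > 0:
        options.add("E")
    if dy > 0:
        options.add("N")
    if dy < 0:
        options.add("S")
    return options              # empty set: packet has arrived

print(west_first_directions((2, 0), (0, 2)))  # {'W'}
print(sorted(west_first_directions((0, 0), (2, 2))))  # ['E', 'N']
```

The asymmetry is visible in the output: a packet heading north-west has exactly one choice, while a packet heading north-east can adapt between two directions.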

The Turn Model: Not all turns are equal
• Assume we remove the N-W (+y, -x) turn in the counterclockwise graph
• Additionally eliminating the W-N (-x, +y) turn from the clockwise graph does not make the network deadlock-free
Figure: Counterclockwise and clockwise turn cycles over North (+y), West (-x), East (+x), and South (-y). Removing S-W (-y, -x), N-E (+y, +x), or E-S (+x, -y) gives West-First, North-Last, or Negative-First; removing W-N (-x, +y) is marked as disallowed.
Why is this combination not working?
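Which pairs of prohibited turns actually remove all cycles can be checked by brute force on a small mesh. The sketch below is an illustration, not the method of the slides: it builds the channel dependence graph of a 3x3 mesh in which every turn is allowed except the prohibited ones, and 180-degree turns are assumed to be always excluded. A turn is written as `(incoming direction, outgoing direction)`, so N-W is `("N", "W")`; the function name is hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

def turn_model_has_cycle(k, prohibited):
    """Check the channel dependence graph of a k x k mesh for cycles."""
    graph = {}
    for x in range(k):
        for y in range(k):
            for d_in in STEP:
                # Channel arriving at (x, y) while travelling in direction d_in
                px, py = x - STEP[d_in][0], y - STEP[d_in][1]
                if not (0 <= px < k and 0 <= py < k):
                    continue
                for d_out in STEP:
                    nx, ny = x + STEP[d_out][0], y + STEP[d_out][1]
                    if not (0 <= nx < k and 0 <= ny < k):
                        continue
                    if d_out == OPPOSITE[d_in] or (d_in, d_out) in prohibited:
                        continue
                    c_in = ((px, py), (x, y))
                    c_out = ((x, y), (nx, ny))
                    graph.setdefault(c_out, set()).add(c_in)
    try:
        TopologicalSorter(graph).prepare()
        return False
    except CycleError:
        return True

NW = ("N", "W")
print(turn_model_has_cycle(3, set()))             # True: all turns allowed
print(turn_model_has_cycle(3, {NW, ("S", "W")}))  # False: west-first
print(turn_model_has_cycle(3, {NW, ("N", "E")}))  # False: north-last
print(turn_model_has_cycle(3, {NW, ("E", "S")}))  # False: negative-first
print(turn_model_has_cycle(3, {NW, ("W", "N")}))  # True: still cyclic
```

In the last case, the three remaining left turns (W-S, S-E, E-N) together have the same effect as the prohibited right turn W-N, so the clockwise cycle can still be closed.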

Deadlock with Adaptive Routing
• Adaptive routing networks may be deadlock-free even in the presence of dependence cycles
• There must be a non-zero probability for a packet to escape a dependence cycle
• Deadlock-free networks can be achieved more efficiently because fewer restrictions (on resource usage, on routes) are required

Deadlock Recovery
• Deadlock recovery needs fewer resources than deadlock avoidance and is worthwhile if deadlocks occur rarely
• A deadlock must first be detected
  • A cycle in the "wait-for" graph indicates a deadlock
  • Detection is often done by means of timeout counters
• … and then the deadlock must be resolved
  • Either packets or connections are removed from the network
  • Packets that are deadlocked can enter an "escape buffer" that is used to resolve the deadlock
• Worst-case timing is often unbounded
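Timeout-based detection can be sketched as a per-packet stall counter. This is a minimal illustration with an assumed class name `TimeoutDetector` and an assumed `threshold` parameter; a timeout is only a heuristic, so a packet that is merely stuck in congestion may also be flagged.

```python
class TimeoutDetector:
    """Presume a packet deadlocked after `threshold` consecutive stalled cycles."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.stalled = {}   # packet id -> consecutive cycles without progress

    def tick(self, packet_id, moved):
        """Record one cycle; return True if the packet is presumed deadlocked."""
        if moved:
            self.stalled[packet_id] = 0
        else:
            self.stalled[packet_id] = self.stalled.get(packet_id, 0) + 1
        return self.stalled[packet_id] >= self.threshold

det = TimeoutDetector(threshold=3)
print([det.tick("p0", moved=False) for _ in range(3)])  # [False, False, True]
print(det.tick("p0", moved=True))                       # False: counter reset
```

A recovery mechanism would act on a `True` result, for example by removing the packet or moving it into an escape buffer as listed above.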

Livelock
• In livelock, packets continue to move through the network but do not make progress toward their destination
• Livelock can occur if packets are allowed to take non-minimal routes through a network
• It can be avoided by limiting the number of times a packet can be misrouted
• It can also occur in dropping flow control, if a packet is dropped every time it re-enters the network

Fully Adaptive Routing: Livelock
• Fully adaptive routing may result in livelock!
• Mechanisms must be added to prevent livelock
• Misrouting may only be allowed a fixed number of times
Figure: 16-node network with nodes labeled 00 to 33
Can minimal adaptive routing result in livelock?
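Limiting misrouting to a fixed number of times can be expressed as a small output-selection rule. The sketch below is an assumption-laden illustration: `choose_output`, its parameters, and the policy of waiting once the budget is used up are all hypothetical and not taken from the slides.

```python
def choose_output(productive, free, misroutes, max_misroutes):
    """Pick an output port for a packet.

    productive: ports that bring the packet closer to its destination
    free: ports that are currently available
    Returns (port, new misroute count), or (None, misroutes) if the packet waits.
    """
    for port in productive:
        if port in free:
            return port, misroutes          # productive hop, count unchanged
    if misroutes < max_misroutes:
        for port in free:
            return port, misroutes + 1      # misroute through any free port
    return None, misroutes                  # budget exhausted: wait instead

print(choose_output(["E"], ["N", "E"], 0, 2))  # ('E', 0)
print(choose_output(["E"], ["N"], 0, 2))       # ('N', 1)
print(choose_output(["E"], ["N"], 2, 2))       # (None, 2)
```

Once the budget is exhausted the packet only takes productive hops, so the total number of hops it can make is bounded.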

Node Table Routing: Avoid Livelock
• Livelock may be a problem! For example, a packet passing through node 00 destined for node 11
Figure: Node with North, West, East, and South ports
Is livelock also a problem for source routing?

Livelock
• There are two main techniques to avoid livelock:
• Deterministic avoidance
  • A state is added to a packet to ensure its progress
  • Misroute count or age of packet
  • The packet with the higher age or misroute count wins arbitration
• Probabilistic avoidance
  • If it can be guaranteed that the probability of packet delivery approaches one as time goes to infinity, livelock is avoided
  • The network can be considered livelock-free if there is a non-zero probability of a packet moving towards its destination (and the sum of these probabilities must approach one as time goes to infinity)
What problem does this approach have?
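Age-based arbitration, one form of deterministic avoidance, can be sketched as follows. The function name `arbitrate`, the `(packet_id, age)` request format, and the tie-break on the packet id are assumptions made for this illustration.

```python
def arbitrate(requests):
    """Select the winner among packets competing for one output.

    requests: list of (packet_id, age) pairs.
    The oldest packet wins; ties are broken by the lowest packet_id.
    """
    return min(requests, key=lambda r: (-r[1], r[0]))[0]

print(arbitrate([("a", 3), ("b", 7), ("c", 7)]))  # b
```

The intent of such a rule is that a packet which keeps losing grows older until it is the oldest contender, at which point it can no longer be displaced.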

Summary
• Deadlock means no agent can move forward due to a cyclic dependence on resources
• A cycle in the resource dependence graph means deadlock is possible
• A cycle in the wait-for graph means deadlock
• Deadlocks can be avoided
  • by ordering resources: distance and dateline classes
  • by restricting routes, e.g. the turn model
• Deadlock detection with timeouts
• Livelock in adaptive, non-minimal routing
