A Distributed Algorithm for Minimum Weight Spanning Trees
By Gallager, Humblet, Spira (GHS)
Distributed Algorithms 2014, Igor Zarivach
Agenda
• Introduction
• Review of spanning trees
• Description of GHS algorithm
• Algorithm execution on ring topology
• Complexity analysis
Dijkstra prize in 2004
• An elegant and efficient distributed algorithm for finding a minimum spanning tree in an asynchronous network.
• The problem is important, both theoretically and practically.
• Major algorithmic breakthrough on many fronts:
  • It solved the fundamental problem of symmetry breaking (or leader election) in the setting of a general graph.
  • The algorithm has a surprisingly low message complexity for this important problem.
  • Techniques for multicasting, and for query and reply.
• Beauty and elegance of the algorithm and its presentation.
• An exceptional degree of asynchrony among the nodes.
• Its structure is very intuitive and is easy to comprehend.
• The algorithm is sufficiently complicated and interesting to be a challenge problem for formal verification methods.
• Finding a proof is still very much an open problem in protocol verification and formal methods.
• In summary, this paper is a genuine milestone in the area of asynchronous network algorithms; it has changed this field completely, in terms of both algorithmics and analysis techniques.
Problem Statement
• Given: the input graph $G(V, E)$ is a connected undirected graph with $N$ nodes and $E$ edges, each edge having a distinct finite weight.
• Goal: find an asynchronous distributed algorithm that determines the minimum spanning tree (MST) of the graph.
Minimum (Weight) Spanning Tree
[Figure: example graph with edge weights 1 to 16, shown alongside its minimum spanning tree]
Applications
• Efficient broadcasting in networks
• Re-establishing connectivity after node failures
• Leader election
Model
• Communication
  • Asynchronous communication
  • Message passing
  • Messages can pass on an edge in both directions concurrently
• Computation
  • Processors are represented by nodes
  • Assumption: distinct weights on edges (explained later)
  • A processor knows the weights of the edges incident to it
  • A processor knows its unique ID
  • One or more nodes can start the algorithm
• Failures
  • Messages arrive in order with no errors
  • No processor faults
Definitions
• Fragment: a subtree of the MST
• Branch: an edge of the MST inside a fragment
• Outgoing edge: an edge between different fragments
• Fragment’s MWOE: minimum-weight outgoing edge
[Figure: Fragments 1, 2 and 3, with a root, a branch, an outgoing edge and the MWOE labeled]
Two properties of MSTs
• Property 1: Given a fragment $F$ of an MST, let $e$ be a minimum-weight outgoing edge of $F$. Then joining $e$ and its adjacent non-fragment node to $F$ yields another fragment $F'$ of an MST.
• Property 2: If all the edges of a connected graph have different weights, then the MST of the graph is unique.
[Figure: fragment $F$ extended along its MWOE $e$ to fragment $F'$; $e$ becomes a branch]
Algorithm GHS High Level
• Each fragment finds its MWOE asynchronously
• When the MWOE is found, the fragment attempts to combine with the fragment on the other end
• We will show how and when to combine the fragments so that the algorithm is correct and has good message complexity
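The high-level idea can be tried out in a centralized, round-based sketch. This is not the asynchronous GHS protocol: it is a sequential Borůvka-style simulation in which every fragment picks its MWOE in each round. The function name `mst_by_fragments` and the edge format `(u, v, weight)` are assumptions made for illustration, and edge weights are assumed distinct.

```python
def mst_by_fragments(n, edges):
    """Sequential sketch: nodes are 0..n-1, edges are (u, v, weight) with distinct weights."""
    parent = list(range(n))  # union-find forest; each root stands for one fragment

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    while True:
        best = {}  # fragment root -> (weight, u, v) of its MWOE
        for u, v, w in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # both ends in the same fragment: not an outgoing edge
            for r in (ru, rv):
                if r not in best or w < best[r][0]:
                    best[r] = (w, u, v)
        if not best:
            break  # a single fragment is left
        for w, u, v in best.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may be the MWOE of both of its fragments
                parent[ru] = rv
                tree.add((u, v, w))
    return tree


ring = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
print(sorted(mst_by_fragments(3, ring)))
```

Because the weights are distinct, the edges chosen in one round cannot close a cycle, which is why the sketch needs no extra cycle check beyond the union-find test.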
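Distinct weight edges
Will the algorithm work for equal weight edges?
• If edge weights are not distinct, but nodes have distinct identities, then for an edge $e = (u, v)$ with weight $w(e)$ and $id(u) < id(v)$, let $w'(e) = (w(e), id(u), id(v))$
• We get distinct edge weights by comparing $w'$ lexicographically: order by $w$, with ties broken by the $id$s
• If neither edge weights nor node identities are distinct, there is no distributed algorithm to find the MST
• With equal weights, any two of the tied edges form an MST, but there is no way to break the symmetry
[Figure: example graph with edge weights 5, 5, 5 and 11]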
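A minimal sketch of the tie-breaking idea, assuming node identities are comparable values such as integers. The helper name `distinct_weight` is hypothetical; it relies only on Python's lexicographic tuple comparison.

```python
def distinct_weight(w, id_u, id_v):
    """Extend weight w of edge (u, v) to a key that is unique when node ids are unique."""
    lo, hi = min(id_u, id_v), max(id_u, id_v)  # same key from either endpoint
    return (w, lo, hi)


# Three edges that all have weight 5 still get a strict order.
keys = [distinct_weight(5, 2, 3), distinct_weight(5, 3, 1), distinct_weight(5, 1, 2)]
print(sorted(keys))
```

Both endpoints of an edge compute the same key without exchanging messages, since each knows its own ID and can learn its neighbor's.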
Design - Fragment
• Each fragment behaves asynchronously and independently
• Initially, every fragment consists of a single node
• Upon termination, there will be only one fragment
• Each fragment will have a leader, which initiates fragment operations
• The leader starts an operation by broadcast
• Every node replies to the leader by convergecast
• The spanning tree is used for communication
• When two fragments are merged, the spanning tree is updated
Design - Node
• A node has a pointer to the next node on the path to the leader (its father)
• A node knows to which fragment it belongs
Design - Union
• Fragment $F$ finds its MWOE $e = (u, v)$, with $u \in F$ and $v \in F'$
• $F$ merges into the neighbor fragment $F'$
• $F$ becomes a subtree of the bigger tree $G$
• $u$ becomes the new root of $F$
• Nodes of $F$ update their father pointers accordingly
• $u$ sets its father to $v$
Problem 1 - Cycle
• $F$ and $F'$ might merge concurrently over their common MWOE $e = (u, v)$
• Each endpoint sets its father to the other, so we get a cycle of length two
Solution
• Both $u$ and $v$ become leaders of $G$
• If we need one leader, we can break the symmetry by unique IDs
Problem 2 – Unbalanced fragments
• Choosing the MWOE and updating father pointers costs $O(|F|)$ messages for a fragment $F$
• Worst case:
  • $F$ has already grown large
  • $F$ merges repeatedly with other fragments of size 1
• We get $O(N^2)$ message complexity, but can get $O(N \log N)$ if sizes are equal
Solution
• Merge only a smaller fragment into a larger fragment
• Update the father pointers of the smaller fragment only
• We need to estimate the size of the fragment!
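The gap between the two cases can be seen in a toy cost model. The model is an assumption made for illustration: re-rooting a fragment of size s is charged exactly s messages, and `unbalanced_cost` and `balanced_cost` are hypothetical helper names.

```python
def unbalanced_cost(n):
    """A growing fragment of size 1, 2, ..., n-1 is re-rooted each time it joins a single node."""
    return sum(range(1, n))


def balanced_cost(n):
    """Equal-size fragments merge pairwise; n is assumed to be a power of two."""
    cost, size = 0, 1
    while size < n:
        merges = n // (2 * size)  # number of pairs at this size
        cost += merges * size     # one side of each pair updates its father pointers
        size *= 2
    return cost


for n in (8, 1024):
    print(n, unbalanced_cost(n), balanced_cost(n))
```

In this model the unbalanced total is N(N-1)/2 while the balanced total is (N/2) log2 N, which matches the quadratic versus N log N contrast on the slide.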
Fragment size estimation (Level)
• It is hard to estimate the size of a distributed tree
• Use the Level $L$ as an estimate: a Level $L$ tree has at least $2^L$ nodes
• Each fragment has a Level
  • Level 0: only one node
  • Level $k > 0$: at least $2^k$ nodes
• Lemma: if the Level of fragment $F$ is $L$, then $F$ has at least $2^L$ nodes
• We want to guarantee the Lemma for all fragments
• Level is only a lower bound on size: a Level $L$ fragment can have many more than $2^L$ nodes!
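A short induction sketch of the Lemma, assuming (as the later Merge and Absorb slides state) that a higher Level arises only when two fragments of equal Level merge, and that an absorb keeps the Level of the higher-Level fragment. The notation $|F|$ for the number of nodes in $F$ is introduced here.

```latex
\begin{align*}
\text{Base: }   & \mathrm{Level}(F) = 0 \;\Rightarrow\; |F| = 1 = 2^{0}. \\
\text{Merge: }  & \mathrm{Level}(F_1) = \mathrm{Level}(F_2) = L,\quad |F_1|, |F_2| \ge 2^{L}
                  \;\Rightarrow\; |F_1 \cup F_2| \ge 2^{L} + 2^{L} = 2^{L+1}. \\
\text{Absorb: } & \mathrm{Level}(F) < \mathrm{Level}(F') = L,\quad |F'| \ge 2^{L}
                  \;\Rightarrow\; |F \cup F'| \ge |F'| \ge 2^{L}.
\end{align*}
```

Since no fragment has more than $N$ nodes, the Lemma also bounds every Level by $\log_2 N$.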
Design - Union
• The algorithm guarantees that the MWOE of every fragment $F$ leads to a fragment $F'$ such that
  • Level($F$) $\le$ Level($F'$)
• Otherwise, if Level($F$) $>$ Level($F'$)
  • $F$ waits for $F'$ to grow in Level
• Waiting can lead to deadlocks!!
• Smaller fragments never wait for larger ones; they are immediately absorbed into the larger neighbor
[Figure: $F$ at Level 3 cannot find its MWOE and waits for $F'$ at Level 2]
Design - Union
• Define the “core” edge of the fragment’s tree:
  • “The edge along which the most recent Merge occurred.”
• Lemma: the core changes once per Level
• Example, a Level 1 fragment:
  • Nodes 1 and 2 merged on their common MWOE
  • Node 3 was then absorbed
• Example, a Level 2 fragment:
  • Two fragments merged on their common MWOE
  • Node 4 was then absorbed
Fragment Names and Leaders
• We need to distinguish between fragments
• Levels are not unique
• Use the core for fragment identification
• Fragment name: (core, Level)
• Leaders: the two nodes adjacent to the core
Example
[Figure sequence: nodes 1 to 5 joined by edges of weight 1 to 4; fragments at Level 1 and Level 2 exchange Connect, Test, Accept and Reject messages]
Specific messages:
• Initiate: Broadcast from leader to find MWOE; contains fragment identity.
• Report: Convergecast MWOE responses back to leader.
• Test: Asks whether an edge is outgoing.
• Accept/Reject: Answers to Test.
• Change-core: Sent from leader to endpoint of MWOE.
• Connect: Sent across the MWOE, to connect fragments.
• We say a merge occurs when a Connect message has been sent both ways on the edge (the 2 nodes must have the same Level).
• We say an absorb occurs when a Connect message has been sent on the edge from a lower-Level node to a higher-Level node.
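The merge and absorb definitions can be read as a small decision rule. The sketch below is a hedged illustration, not the protocol itself: the `Fragment` record, the `combine` function and the flag `g_connects_back` are hypothetical names, and the rule is evaluated from the point of view of a fragment `f` that has sent Connect toward `g`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    level: int
    size: int


def combine(f, g, g_connects_back):
    """Outcome after f sent Connect over its MWOE toward g (illustrative rule only)."""
    if f.level < g.level:
        # Connect went from a lower-Level to a higher-Level fragment: absorb.
        return "absorb", Fragment(g.level, f.size + g.size)
    if f.level == g.level and g_connects_back:
        # Connect was sent both ways on the same edge: merge, Level goes up by one.
        return "merge", Fragment(f.level + 1, f.size + g.size)
    return "wait", None  # f must wait until one of the cases above applies


single = Fragment(level=0, size=1)
_, pair = combine(single, single, g_connects_back=True)
_, trio = combine(single, pair, g_connects_back=False)
print(pair, trio)
```

Both outcomes preserve the size bound from the Level slide: a merge doubles the guaranteed minimum, and an absorb only adds nodes to a fragment whose Level is unchanged.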
Description: Find MWOE – Level 0
• A single-node fragment
• The node is in state Sleeping
• The node awakens spontaneously or on receiving a message
• The node chooses its MWOE $e$ from all adjacent edges
• Sends Connect(Level=0) over $e$
• Sets its state to Found
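A minimal sketch of the Level 0 step. The state names Sleeping and Found follow the original GHS paper; the function `wakeup`, the dictionary layout and the message tuple are assumptions made for illustration.

```python
def wakeup(adjacent):
    """adjacent maps each neighbor to the weight of the edge leading to it."""
    mwoe = min(adjacent, key=adjacent.get)  # every edge of a single node is outgoing
    return {
        "state": "Found",             # the node leaves the Sleeping state
        "level": 0,
        "branch": mwoe,               # the chosen edge becomes a Branch
        "send": ("Connect", 0, mwoe)  # Connect(Level=0) over the MWOE
    }


print(wakeup({"b": 4, "c": 2}))
```

For a single-node fragment no Test messages are needed, because the lightest incident edge is necessarily the fragment's MWOE.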
Description: Find MWOE – Level L
• Two Level $(L-1)$ fragments merge over their common MWOE $e$
• The MWOE $e$ is the new core
• The new Level $L$ fragment $G$ has identity (core, $L$)
• Leaders broadcast Initiate() to all nodes of $G$
• Initiate() contains the identity, the Level and the state Find
• Initiate() is also passed to all Level $(L-1)$ fragments waiting to connect to nodes in $G$
• Nodes of $G$ start the Test-Accept-Reject protocol to find the MWOE
• When a node finds its MWOE, a Report is convergecast to the leaders
Description: Find MWOE – Level L (continued)
• Convergecast of Report($W$) on the fragment's inbound edges
• $W(u)$ is defined as follows
  • $u$ is a leaf: $W(u)$ is the weight of the MWOE adjacent to $u$, or $\infty$ if there is none
  • $u$ is an internal node: $W(u) = \min_v w(\mathrm{MWOE}(v))$, where $v$ ranges over the nodes in the subtree rooted at $u$
• Every node of $G$ remembers the edge leading to the MWOE in its subtree (best edge)
• Best edges form a path from the core to the node $u$ adjacent to the fragment's MWOE
• Leaders exchange Report messages on the core; one of them sends Change-core along the best-edge path
• Every node on the path updates its inbound edge to point toward the MWOE
• $u$ sends Connect($L$) over the MWOE
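The Report convergecast can be sketched as a recursion over the fragment tree. This is a sequential stand-in for the message exchange, under assumed names: `children` maps a node to its children, `local_w` holds the weight of each node's own lightest outgoing edge (missing means none), and `best_child` records the best edge per node.

```python
import math


def report(children, local_w, u, best_child):
    """Return W(u); fill best_child[u] with the child toward the MWOE (None = u's own edge)."""
    w_best = local_w.get(u, math.inf)
    best_child[u] = None
    for c in children.get(u, ()):
        w = report(children, local_w, c, best_child)  # Report(W) arriving from child c
        if w < w_best:
            w_best = w
            best_child[u] = c  # best edge now leads into c's subtree
    return w_best


children = {"a": ["b", "c"], "b": ["d"]}
local_w = {"b": 7, "c": 4, "d": 2}
best = {}
print(report(children, local_w, "a", best), best)
```

Following `best_child` from the root reproduces the best-edge path that the Change-core message travels along.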
Test-Accept-Reject Protocol
• Bookkeeping: each node keeps a list of incident edges in order of weight, classified as:
  • Branch (in the MST),
  • Rejected (leads to the same fragment), or
  • Basic (not yet classified).
• A node $u \in F$ tests only Basic edges, sequentially in order of weight:
  • $u$ sends a Test message with its (core, Level); the recipient $v \in F'$ compares.
  • If the (core, Level) pairs are the same, $v$ sends Reject (same fragment), and the edge is reclassified as Rejected.
  • If the (core, Level) pairs are unequal and Level($F'$) $\ge$ Level($F$), then $v$ sends Accept (different fragment). $u$ does not reclassify the edge.
  • If Level($F'$) $<$ Level($F$), then $v$ delays responding until Level($F'$) $\ge$ Level($F$).
• This is the waiting that could lead to deadlocks
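The recipient's side of the protocol reduces to a three-way decision. The sketch below follows the rules as stated on this slide; the function name `reply_to_test` and its parameters are hypothetical, and "Delay" stands for holding the Test until the recipient's Level has grown.

```python
def reply_to_test(my_core, my_level, test_core, test_level):
    """Decide how the recipient answers a Test carrying the sender's (core, Level)."""
    if (my_core, my_level) == (test_core, test_level):
        return "Reject"  # same fragment: the edge is not outgoing
    if my_level >= test_level:
        return "Accept"  # different fragment and the recipient's Level is high enough
    return "Delay"       # recipient's Level is lower: answer later


print(reply_to_test(my_core=7, my_level=2, test_core=7, test_level=2))
print(reply_to_test(my_core=7, my_level=2, test_core=3, test_level=1))
print(reply_to_test(my_core=7, my_level=1, test_core=3, test_level=2))
```

The Delay case is what keeps a low-Level recipient from giving a stale answer while an Initiate with a new fragment identity may still be on its way.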
Merge
• Suppose $F$ and $F'$ have the same MWOE $e = (u, v)$ and the same Level
• Level($F$) = Level($F'$) = $L$
• Both $u$ and $v$ send Connect($L$) over $e$, one in each direction
• $e$ becomes the new core of a Level $L+1$ fragment
• Nodes $u$ and $v$ send Initiate($e$, $L+1$, Find)
Absorb
• Suppose $F$ is absorbed into fragment $F'$ via an edge $e = (u, v)$, with $u \in F$ and $v \in F'$, while $F'$ is working on determining its MWOE.
• Level($F$) $<$ Level($F'$)
• Node $u$ sends Connect(Level($F$))
• Node $v$ immediately sends Initiate(core($F'$), Level($F'$), state)
• state:
  • If $v$ has not yet reported its local MWOE, send Initiate(Find)
  • Otherwise, send Initiate(Found). We will see why the new fragment's MWOE can't be from $F$.
Correctness
Given Properties 1 and 2, it is sufficient to verify:
• The MWOE is correctly chosen by every fragment
• No deadlocks occur due to Waits
MWOE Correctness (Async Absorb)
Case: $F$ is absorbed into $F'$ over the edge $(u, v)$, with $u \in F$ and $v \in F'$, after $v$ has reported MWOE($v$). We need to prove that the reported MWOE($v$) is still valid after the Absorb.
Claim 1: The reported MWOE($v$) cannot be the edge $(u, v)$.
Proof:
• Since MWOE($v$) has already been reported, it must lead to a node with Level $\ge$ Level($F'$).
• But the Level of $F$ is still $<$ Level($F'$) when the absorb occurs.
• So MWOE($v$) is a different edge, one whose weight $<$ weight($u, v$).
Claim 2: The MWOE of the combined fragment is not outgoing from a node in $F$.
Proof:
• $(u, v)$ is the MWOE of $F$, so there are no edges outgoing from $F$ with weight $<$ weight($u, v$).
• So there are no edges outgoing from $F$ with weight $<$ the already-reported MWOE($v$).
• So the MWOE of the combined fragment isn't outgoing from $F$.
Liveness
Lemma: After any finite sequence of merges and absorbs, either the forest consists of one tree (so we’re done), or some merge or absorb is enabled.
Proof:
• Consider the current “fragment digraph”:
  • Nodes represent fragments
  • Directed edges represent MWOEs
• Take the minimum-weight edge joining two different fragments: it is the MWOE of both, so there must be some pair $F$, $F'$ whose MWOEs point to each other.
• We can combine these fragments, using either merge or absorb:
  • If same Level, merge; else absorb.
• So, merging and absorbing are enough to proceed.
• If one of $F$, $F'$ waits, it waits only for a fragment of smaller Level
• But a lowest-Level fragment is NEVER blocked and can grow by Merge or Absorb
[Figure: fragment digraph]
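The key step of the proof, that some pair of fragments always points at each other, can be checked on small inputs. This is a centralized sketch under assumed names: `frag_of` maps each node to a fragment label, edges are `(u, v, weight)` with distinct weights, and `mutual_mwoe_edges` is a hypothetical helper.

```python
def mutual_mwoe_edges(frag_of, edges):
    """Return the edges that are the MWOE of both fragments they join."""
    mwoe = {}  # fragment label -> its minimum-weight outgoing edge
    for u, v, w in edges:
        fu, fv = frag_of[u], frag_of[v]
        if fu == fv:
            continue  # internal edge: not part of the fragment digraph
        for f in (fu, fv):
            if f not in mwoe or w < mwoe[f][2]:
                mwoe[f] = (u, v, w)
    return {
        e for e in mwoe.values()
        if mwoe[frag_of[e[0]]] == e and mwoe[frag_of[e[1]]] == e
    }


# Path 0 - 1 - 2 - 3 with every node in its own fragment.
frag_of = {0: "A", 1: "B", 2: "C", 3: "D"}
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 7)]
print(mutual_mwoe_edges(frag_of, edges))
```

Whenever at least two fragments remain in a connected graph, the lightest edge between different fragments is always in the returned set, so the set is never empty.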
Simulation – Ring
• Communication
  • Odd link: 1 cycle
  • Even link: 2 cycles
[Figure: ring of three nodes 1, 2, 3 joined by links 1, 2, 3]
[Slides: Initialization, then Step 1 through Step 9. Each slide has four panels (Events, Code, State, Network) for the ring.]