270 likes | 284 Views
Explore the innovative solutions of Tapestry - a self-organizing, robust, and scalable infrastructure for efficient content location and delivery in the presence of heavy loads and node/link failures.
E N D
Tapestry:A Resilient Global-scale Overlay for Service Deployment Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John Kubiatowicz IEEE Journal on Selected Areas in Communications, January 2004
Need for global knowledge to construct neighbor table Static architecture - no provision for node addition or deletion Single root for an object is a bottleneck. It is a single point of failure Tapestry provides innovative solutions to some of the bottlenecks of classical Plaxton routing. Limitations of Plaxton routing
Tapestry Tapestry is a self-organizingrobust scalable wide-area infrastructure for efficient location and delivery of contents in presence of heavy load and node or link failure. It is the backbone of the Oceanstore, a persistent wide-area storage system.
Major features of Tapestry Most contemporary P2P systems did not take take network distances into account when constructing their routing overlay; thus, a single overlay hop may span the diameter of the network. In contrast, Tapestry constructs locally optimal routing tables from initialization and subsequently maintains them.
Major features of Tapestry DHT based systems fix the number and location of object replicas. In contrast, Tapestry allows applications to place objects according to their needs. Tapestry “publishes” location pointers throughout the network to facilitate efficient routing to those objects with low network stretch.
Major features of Tapestry Unlike classical Plaxton routing, Tapestry allows node insertion and deletion to support a dynamic environment. These algorithms are fairly efficient.
Routing and Location Namespace (both nodes and objects) 160 bits using the hash function SHA-1 Each object has has one or more roots H (ObjectID) = RootID Suffix routing from A to B At hth hop, arrive at nearest node Nh that shares suffix with B of length h digits • Example: 5324 routes to 0629 via5324 --> 2349 --> 1429 --> 7629 --> 0629
Tapestry API PUBLISH OBJECT : Publish (make available) object on the local node. UNPUBLISH OBJECT : Best-effort attempt to remove location mappings for an object . ROUTE TO OBJECT : Routes message to location of an object with GUID (globally unique id). ROUTE TO NODE : Route message to application on the exact node .
Requirements for Insert and Delete • No central directory can be used • No hot spot or single point of failure • Reduced danger/threat of DoS. • Must be fast (should contact only a few nodes) • Keep objects available
Node Insertion • Need-to-know nodes are notified of, since the inserted node fills a null entry in their routing tables. • The new node might become the root for some existing objects. References to those objects must be moved to maintain object availability. • The algorithms must construct a near optimal routing table for the inserted node. Nodes near the inserted node are notified, and such nodes may consider using it in their routing tables as an optimization.
Choosing Root in Tapestry Compute H (ObjectID) = RootID. Attempt to route to this node (which should be the root)without knowing if it exists. If it exists, then it becomes the root. But otherwise • Whenever null entry encountered, choose the next “higher” non-null pointer entry (thus, if xx53 does not exist, try xx63), or a secondary entry • If current node S is only non-null pointer in the rest of map, terminate route, and choose S as the root
04345 & 54345 Acknowledged Multicast AlgorithmLocates & Contacts all nodes with a given suffix • Popular tool: Useful in node insertion. Create a tree based on IDs • Starting node knows when all nodes reached The node then sends to any ?0345, any ?1345, any ?2345, etc. if possible ??345 ?1345 ?4345 ?4345 sends to 04345, 54345… if they exist 04345 54345
Three Parts of Insertion • Establish pointers from surrogates to new node. • Notify the need-to-know nodes • Create routing tables & notify other nodes
01234 01334 ????4 ???34 79334 39334 Gate Finding the surrogates surrogates • The new node sends a join message to a surrogate • The primary surrogate multicasts to all other surrogates with similar suffix. • Each surrogate establishes a pointer to the new node. • When all pointers are established, continue new node
Need-to-know nodes • “Need-to-know” = a node with a hole in neighbor table filled by new node • If 01234 is new node, and no 234s existed, must notify ???34 nodes • Acknowledged multicast to all matching nodes • During this time, object requests may go either to new node or former surrogate, but that’s okay • Once done, delete pointers from surrogates.
Constructing the Neighbor Table via a nearest neighbor search • Suppose we have an algorithm A for finding the three nearest neighbors for a given node. • To fill in a slot, apply A to the subnetwork of three nodes that could fill that slot. (For ????1, run A on network of nodes ending in 1, and pick nearest neighbor.)
01234 01334 Finding Nearest Neighbor j-list is the closest k=O(log n) nodes matching in j digits • Let G be such that surrogate matches new node in last j digits of node ID • G sends j-list to new node; new node pings all nodes on j-list. • If one is closer, goto that node. If not, done with this level, and let j = j-1 and goto A. 32134 61524 11111
Is this the nearest node?Yes, with high probability under an assumption New node • Pink circle = ball around new node of radius d(G, new node) • Progress = find any node in pink circle • Consider the ball around the G containing all its j-list. Two cases: • Black ball contain pink ball; found closest node • High overlap between pink ball and G-ball so unlikely pink ball empty while G-ball has k nodes G, matches in j digits
out-neighbors 11111 xxxx1 exiting node 12345 xxx45 xxxx5 11115 54321 In-neighbors Deleting a node
Planned Delete • Notify its out-neighbors: Exiting node says “I’m no longer pointing to you” • To in-neighbors: Exiting node says it is going and proposes at least one replacement. • Exiting node republishes all objects ptrs it stores • Objects rooted at exiting node get new roots
republish republish Republish-On-Delete republish republish republish
Unplanned Delete • Planned delete relied exiting node’s neighbor table. • List of out-neighbors • List of in-neighbors • Closest matching node for each level. • Can we reconstruct this information? • Not easily • Fortunately, we probably don’t need to.
Lazy Handling of Unplanned Delete • A notices B is dead, A fixes its own state • A removes B from routing tables • If removing B produces a hole, A must fill the hole, or be sure that the hole cannot be filled—use acknowledged multicast • A republishes all objects with next hop = B. • Use republish-on-delete as before
3 4 2 NodeID 0x43FE 1 4 3 2 1 3 4 4 3 2 3 4 2 3 1 2 1 2 3 1 Tapestry MeshIncremental suffix-based routing (slide borrowed from the original authors) NodeID 0x79FE NodeID 0x23FE NodeID 0x993E NodeID 0x43FE NodeID 0x73FE NodeID 0x44FE NodeID 0xF990 NodeID 0x035E NodeID 0x04FE NodeID 0x13FE NodeID 0x555E NodeID 0xABFE NodeID 0x9990 NodeID 0x239E NodeID 0x73FF NodeID 0x1290 NodeID 0x423E
Fault detection • Soft-state vs. explicit fault-recovery - Soft-state periodic republish is more attractive • Expected additional traffic for periodic republish is low • Redundant roots for better resilience • Object names hashed w/ small salts i.e. multiple names/roots • Queries and publishing utilize all roots in parallel
Summary of Insertion steps Step 1: Build up N’s routing maps • Send messages to each hop along path from gateway to current node N • The ith hop along the path sends its ith level route table to N • N optimizes those tables where necessary Step 2: Move appropriate data from N’ to N Step 3: Use back pointers from N’ to find nodes which have null entries for N’s ID, tell them to add new entry to N Step 4: Notify local neighbors to modify paths to route through N where appropriate
3 2 1 4 3 2 1 3 4 4 3 2 3 4 2 3 1 2 1 3 1 Dynamic Insertion Exampleborrowed from the original slides 4 NodeID 0x779FE NodeID 0xA23FE NodeID 0x6993E NodeID 0x243FE NodeID 0x243FE NodeID 0x973FE NodeID 0x244FE NodeID 0x4F990 NodeID 0xC035E NodeID 0x704FE NodeID 0x913FE NodeID 0xB555E NodeID 0x0ABFE NodeID 0x09990 NodeID 0x5239E NodeID 0x71290 Gateway 0xD73FF NEW 0x143FE