260 likes | 425 Views
Tuesday, February 03, 2009. “Work expands to fill the time available for its completion.” Parkinson’s 1st Law. Recap. Introduction Coupling Loose | Tight SOA Purpose and Problem History P2P Design Characteristics Overlay routing vs. IP routing Dangers and Attacks
E N D
Tuesday, February 03, 2009 “Work expands to fill the time available for its completion.” • Parkinson’s 1st Law
Recap • Introduction • Coupling • Loose | Tight • SOA • Purpose and Problem • History • P2P Design Characteristics • Overlay routing vs. IP routing • Dangers and Attacks • Current Peer-Peer Concerns
Where are we ? • Introduction • Napster and its legacy –self study • Peer-to-Peer middleware –self study • Routing overlays • Overlay case studies: Pastry, Tapestry • Application case studies: Squirrel, OceanStore, Ivy • Summary
Routing Overlay • In P2P we cannot maintain the database at all the client nodes, giving the location of all the resources • Resource location knowledge must be partitioned and distributed • Each node is made responsible for maintaining • detailed knowledge of the locations of nodes and objects in a portion of the namespace • As well as general knowledge of the topology of the entire name space • High degree of replication of this knowledge is necessary to ensure dependability in the face of the volatile availability of hosts and intermittent network connectivity. • A distributed algorithm known as routing overlays takes responsibility for locating nodes and objects
Distribution of information in a routing overlay Routing overlay takes the responsibility for locating nodes and objects D’s A’s B’s C’s
Routing Overlay • Ensures that any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of them to locate the destination object using pure names • It also maintains the knowledge of location of all the replicas of the object and deliver request to nearest live node • GUID is computed from all or part of the state using a hash function with very high probability, unique. • Uniqueness is verified by searching for another object with the same GUID.
Tasks of Routing Overlay • Main task of Routing Overlay • Client wishing to invoke an operation on an object submits a request including the object’s GUID to the routing overlay, which routes the request to a node at which a replica of the object resides • Other task of Routing Overlay • Node wishing to make new object available to a P2P service computes a GUID for the object and announces it to the routing overlay, which then ensures that the object is reachable by all other clients. • When clients request the removal of the object from the service the routing overlay must make them unavailable. • Nodes may join or leave the service, when a node joins the service, the routing overlay arranges for it to assume some of the responsibilities of other nodes. When a node leaves its responsibilities are distributed amongst the other nodes.
Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry • put(GUID, data) The data is stored in replicas at all nodes responsible for the object identified by GUID. • remove(GUID) Deletes all references to GUID and the associated data. • value = get(GUID) The data associated with GUID is retrieved from one of the nodes responsible for it.
Basic programming interface for distributed object location and routing (DOLR) as implemented by Tapestry • publish(GUID) GUID can be computed from the object (or some part of it, e.g. its name). This function makes the node performing a publish operation by the host for the object corresponding to GUID. • unpublish(GUID) Makes the object corresponding to GUID inaccessible. • sendToObj(msg, GUID, [n]) Following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the object’s state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.
Overlay Case Study: Pastry • It is a message routing infrastructure deployed in several applications • All the nodes & objects are assigned 128-bit GUIDs • For nodes: computed by applying a secure hash function to public key of the node • For objects: computed by applying a secure hash function to the objects name or some part of its stored state • Resulting GUIDs are randomly distributed in the range 0 to 2128 -1 • Clashes between GUIDs for different nodes or objects are extremely unlikely, still pastry can detect & mange this unlikely event • In a network with N participating nodes the Pastry algo will correctly route a message addressed to any GUID in O(log N) steps • If GUID identifies a node which is active, message is delivered to that node otherwise delivered to a active node with closet numeric GUID • Prefix routing
Pastry • Routing steps involve the use of an underlying transport protocol (normally UDP) to transfer the message to a Pastry node that is closer to its destination • Closeness in pastry refers to an entirely artificial space – the space of GUIDs • Real transport of message across internet between two pastry nodes may require lots of IP hops. • For better path option, pastry uses locality metric on network distance in the underlying network (hop count, two way latency) to select appropriate neighbors when setting up the routing tables used at each node • Pastry id fully self organizing • New nodes get info form neighbors to construct the table • Nodes can detect the absence of the node and can update the table • ClasseslessInterdomain Routing for IP Packets
Pastry: Routing Algo • Explanation in two stages • Stage 01: simplified form of the algo which routes messages correctly but inefficiently without a routing table • Each active node stores a leaf set –a vector L (of size 2l) containing the GUIDs and IP addresses of the nodes whose GUIDs are numerically closest on either side of its own. • Leaf sets are maintained by Pastry as nodes join and leave. • Stage 02: full routing algo which routes request to any node in O(log N) messages • NUPastery library provides an implementation of Pastry algorithm • Overlay Weaver toolkit provides multiple routing algorithms, Chord, Kademlia, Koorde, Pastry and Tapestry
Figure 10.6: Circular routing alone is correct but inefficient Based on Rowstron and Druschel [2001] The dots depict live nodes. The space is considered as circular: node 0 is adjacent to node (2128-1). The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l=4). This is a degenerate type of routing that would scale very poorly; it is not used in practice.
Pastry: Routing Algo • Each pastry node maintains a tree structured routing table giving GUIDs and IP addresses for a set of nodes spread through out the entire range of 2128 possible values, with increased density of coverage for GUID numerically close to its own
Routing tables • structured • GUIDs are viewed as hexadecimal values • IP addresses for set of nodes spread hexadecimal values & tables classifies GUIDs based on their hexadecimal prefixes throughout the entire range of 2128 possible GUID values, with increased density of coverage for GUID’s numerically close to its own. • Tables has as many rows as there are hexadecimal digits in a GUID, so for our prototype there are 128/4 = 32 rows • Each row contains 15 entries • Each entry in the table points to one of the potentially many nodes whose GUIDs have the relevant prefix.
Figure 10.8: Pastry routing example Based on Rowstron and Druschel [2001]
Reading Assignment • Reading • Host integration • Host Failure or Departure
Pastry’s routing algorithm • Algo will succeed in delivering the message M to its destination cause lines 1,2 & 7 • They perform action as described in stage 01 • The remaining steps are designed to improve the algorithm's performance by reducing the numbers of hops required
Host integration • Compute GUID • Contact nearby (ref . to network distance)pastry node • New node X sends join request to node A • Node will forward through pastry routing algorithm this join request to node which is closest to A in terms of GUID address , let say node Z • A, Z and all the nodes (B, C, …) through which the join message is routed on its way to Z add additional step to normal pastry routing algo. • This result in transmission of the relevant parts of their respective leaf set to X • X examine them and constructs its own Routing table
Host Failure or Departure • Pastry node is considered failed when its immediate neighbors in GUID space can no longer communicate with it. So repair a leaf containing info of such node is necessary. • For leaf set L repair, node will look for another closest live node and request for its leaf set L’. • L’ will contain a sequence of GUIDs that partly overlap in L, including the appropriate value to replace the failed node. • Other nodes then informed about the failure and they too also perform the same operation
Other issues in pastry • Locality • Highly redundant • Proximity Neighbor selection algo • Locality metric – number of IP hops or measured latency • 30 to 50 percent longer then optimum path • Fault Tolerance • Heartbeat messages are sent by all nodes to nearest neighbours • Failed node info cannot be sent sufficiently rapidly • Nor does it account for malicious nodes • Message delivery guarantee o some extent is achieved through mechanism of retransmission of message through slightly different less optimal route • Dependability • MSPastry – adopts acknowledgement & retransmission as well
Performance • Evaluation of MSPastry • With assumed IP message loss rate 0% MSPastry failed ti deliver 1.5 in 100,000 requests, all message arrived at correct node • With assumed IP message loss rate 5% MSPastry failed ti deliver 3.3 in 100,000 requests, and 1.6 were delivered at wrong node • Performance overhead of overlay MSPastry algo is 2.2 to 5 percent • Extra message for leafset maintenance and initially at setting up the routing table
RT updates • Node failure or departure • Node is considered failed when its immediate neighbors are unable to contact • Node which discovers the failure of the node, looks for the next nearest live node, and request for its leaf set • This leaf set will contain the overlapping info of failed node leaf set. Discovering node will choose the best node from this leaf set to replace the failed node.
Self study • Locality • Fault tolerance • Dependability – MS Pastry • Evaluation of MS Pastry