Distributed Hash-based Lookup for Peer-to-Peer Systems Mohammed Junaid Azad 09305050 Gopal Krishnan 09305915 MTech 1, CSE
Agenda • Peer-to-Peer Systems • Initial Approaches to Peer-to-Peer Systems • Their Limitations • Distributed Hash Tables • CAN: Content Addressable Network • CHORD
Peer-to-Peer Systems • Distributed and decentralized architecture • No centralized server (unlike the client-server architecture) • Any peer can behave as a server
Napster • P2P file-sharing system • A central server stores the index of all the files available on the network • To retrieve a file, the central server is contacted to obtain the location of the desired file • Not a completely decentralized system • The central directory is not scalable • Single point of failure
Gnutella • P2P file-sharing system • No central server stores an index of the files available on the network • The file-location process is decentralized as well • Requests for files are flooded on the network • No single point of failure • Flooding on every request is not scalable
File Systems for P2P systems • The file system would store files and their metadata across nodes in the P2P network • The nodes containing blocks of files could be located using hash-based lookup • The blocks would then be fetched from those nodes
Scalable File Indexing Mechanism • In any P2P system, the file transfer process is inherently scalable • However, the indexing scheme that maps file names to locations is crucial for scalability • Solution: Distributed Hash Table
Distributed Hash Tables • Traditional name and location services provide a direct mapping between keys and values • What are examples of values? A value can be an address, a document, or an arbitrary data item • Distributed hash tables such as CAN/Chord implement a distributed service for storing and retrieving key/value pairs
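To make the key/value abstraction concrete, here is a toy single-process sketch (the class name, the naive hash-mod-N placement, and the example values are all illustrative assumptions; CAN and Chord exist precisely to replace that placement with one that survives nodes joining and leaving):

```python
import hashlib

class ToyDHT:
    """A toy 'distributed' hash table: one dict per node, and every lookup
    is resolved by hashing the key to pick the responsible node."""

    def __init__(self, num_nodes: int):
        self.nodes = [dict() for _ in range(num_nodes)]   # one store per node

    def _node_for(self, key: str) -> dict:
        digest = hashlib.sha1(key.encode()).digest()
        # Naive placement: hash mod N.  Real DHTs avoid this because adding
        # or removing a node would remap almost every key.
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def put(self, key: str, value) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str):
        return self._node_for(key)[key]

dht = ToyDHT(num_nodes=4)
dht.put("song.mp3", "held by peer 10.0.0.7")   # illustrative value
print(dht.get("song.mp3"))
```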
DNS vs. Chord/CAN • DNS provides a host-name-to-IP-address mapping, relies on a set of special root servers, reflects administrative boundaries in its names, and is specialized to finding named hosts or services • Chord can provide the same service (name = key, value = IP), requires no special servers, imposes no naming structure, and can also be used to find data objects that are not tied to particular machines
Example Application using Chord: Cooperative Mirroring • [Diagram: a client and two servers; the client stacks File System over Block Store over Chord, and each server runs Block Store over Chord] • The highest layer provides a file-like interface to the user, including user-friendly naming and authentication • This file system maps its operations to lower-level block operations • Block storage uses Chord to identify the node responsible for storing a block and then talks to the block storage server on that node
What is CAN? • CAN is a distributed infrastructure that provides hash-table-like functionality • CAN is composed of many individual nodes • Each CAN node stores a chunk (zone) of the entire hash table • A request for a particular key is routed by intermediate CAN nodes towards the node whose zone contains that key • The design can be implemented at the application level (no kernel changes required)
Design of CAN • Involves a virtual d-dimensional Cartesian coordinate space • The coordinate space is completely logical • Lookup keys are hashed into this space • The coordinate space is partitioned into zones among all nodes in the system • Every node in the system owns a distinct zone • The distribution of zones across nodes forms an overlay network
Design of CAN (continued) • To store (key, value) pairs, keys are mapped deterministically onto a point P in the coordinate space using a hash function • The (key, value) pair is then stored at the node that owns the zone containing P • To retrieve the entry corresponding to key K, the same hash function is applied to map K to the point P • The retrieval request is routed from the requesting node to the node owning the zone containing P
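A minimal sketch of this deterministic key-to-point mapping, assuming one hash per dimension (the slides do not prescribe a particular hash construction, so the per-dimension SHA-1 trick below is only an illustration):

```python
import hashlib

def key_to_point(key: str, d: int = 2) -> tuple:
    """Map a key deterministically onto a point P in the unit d-dimensional
    coordinate space by hashing the key once per dimension."""
    coords = []
    for i in range(d):
        digest = hashlib.sha1(f"{key}:{i}".encode()).digest()
        # Scale the first 8 bytes of the digest into [0, 1)
        coords.append(int.from_bytes(digest[:8], "big") / 2**64)
    return tuple(coords)

# The (key, value) pair is stored at the node whose zone contains this point,
# and a retrieval for the same key hashes to the same point.
print(key_to_point("song.mp3", d=2))
```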
Routing in CAN • Every CAN node holds the IP address and virtual coordinates of each of its neighbours • Every message to be routed holds the destination coordinates • Using its neighbours' coordinate sets, a node routes a message towards the neighbour with coordinates closest to the destination coordinates • Progress: how much closer the message gets to the destination after being routed to one of the neighbours
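A sketch of this greedy forwarding rule, with assumed data structures: each neighbour is represented here by the centre of its zone, and distances are measured on the torus that CAN's coordinate space forms:

```python
import math

def torus_distance(p, q, side: float = 1.0) -> float:
    """Euclidean distance on a d-dimensional torus of side length `side`."""
    total = 0.0
    for a, b in zip(p, q):
        delta = abs(a - b)
        delta = min(delta, side - delta)   # coordinates wrap around
        total += delta * delta
    return math.sqrt(total)

def next_hop(neighbours: dict, dest: tuple) -> str:
    """Pick the neighbour whose zone centre is closest to the destination.
    `neighbours` maps a neighbour's address to its zone centre."""
    return min(neighbours, key=lambda addr: torus_distance(neighbours[addr], dest))

# Example: forward a message destined for point (0.9, 0.1)
neighbours = {"10.0.0.2": (0.75, 0.25), "10.0.0.3": (0.25, 0.25)}
print(next_hop(neighbours, (0.9, 0.1)))   # -> 10.0.0.2
```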
Routing in CAN (continued…) • For a d-dimensional space partitioned into n equal zones, the routing path length is O(d · n^(1/d)) hops • As the number of nodes increases, the routing path length grows as O(n^(1/d)) • Every node has 2d neighbours • As the number of nodes increases, the per-node state does not change
Allocation of a new node to a zone • First, the new node must find a node already in the CAN (using bootstrap nodes) • The new node randomly chooses a point P in the coordinate space • It sends a JOIN request for point P via any existing CAN node • The request is forwarded using the CAN routing mechanism to the node D owning the zone containing P • D then splits its zone in half and assigns one half to the new node • The new neighbour information is determined for both nodes
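A sketch of the split D performs when handing half of its zone to the newcomer. (CAN specifies an ordering of dimensions for successive splits; the sketch below simply halves the longest side, an assumption made purely for illustration.)

```python
def split_zone(zone):
    """Split a rectangular zone in half; `zone` is a list of (lo, hi)
    intervals, one per dimension.  Returns (kept_half, given_half)."""
    # Illustrative choice: split the dimension with the largest extent
    dim = max(range(len(zone)), key=lambda i: zone[i][1] - zone[i][0])
    lo, hi = zone[dim]
    mid = (lo + hi) / 2.0
    kept, given = list(zone), list(zone)
    kept[dim] = (lo, mid)
    given[dim] = (mid, hi)
    return kept, given

# Node D owns the whole unit square; the new node takes one half.
kept, given = split_zone([(0.0, 1.0), (0.0, 1.0)])
print(kept, given)   # [(0.0, 0.5), (0.0, 1.0)] [(0.5, 1.0), (0.0, 1.0)]
```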
Failure of a node • Even if one of the neighbours fails, messages can be routed through other neighbours in that direction • If a node leaves the CAN, the zone it occupies is taken over by the remaining nodes • If a node leaves voluntarily, it can hand over its database to another node • When a node simply becomes unreachable, the database of the failed node is lost • CAN depends on the data sources to resubmit their data in order to recover it
Features • CHORD is a distributed hash table implementation • Addresses a fundamental problem in P2P: efficient location of the node that stores a desired data item • One operation: given a key, it maps the key onto a node • Data location is achieved by associating a key with each data item • Adapts efficiently • Handles dynamic conditions with frequent node arrivals and departures • Automatically adjusts internal tables to ensure availability • Uses consistent hashing • Load balancing in assigning keys to nodes • Little movement of keys when nodes join and leave
Features (continued) • Efficient Routing • Distributed routing table • Maintains information about only O(log N) nodes • Resolves lookups via O(log N) messages • Scalable • Communication cost and state maintained at each node scale logarithmically with the number of nodes • Flexible Naming • Flat key-space gives applications flexibility to map their own names to Chord keys • Decentralized
Some Terminology • Key: a hash key or its image under the hash function, as per context; an m-bit identifier, using SHA-1 as the base hash function • Node: an actual node or its identifier under the hash function; m is chosen large enough that the probability of a hash collision is negligible • Chord Ring: the identifier circle ordering the 2^m node identifiers • Successor Node: the first node whose identifier is equal to or follows key k in the identifier space • Virtual Node: introduced to limit the bound on keys per node to K/N; each real node runs Ω(log N) virtual nodes, each with its own identifier
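A small sketch of how both node addresses and keys could be mapped onto the same identifier circle: SHA-1 output reduced to m bits (m = 6 here is a toy value chosen only to keep the numbers readable; the terminology above just requires m to be large enough to make collisions unlikely):

```python
import hashlib

M = 6   # toy identifier length in bits; real deployments use many more bits

def identifier(name: str, m: int = M) -> int:
    """Hash a node address or a key name onto the 2^m identifier circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

print(identifier("192.168.0.1:8000"))   # a node identifier
print(identifier("my_file.txt"))        # a key identifier
```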
Consistent Hashing • A consistent hash function is one that changes minimally as the set of nodes changes, so a total remapping of keys is not required • Desirable properties • High probability that the hash function balances load • Minimum disruption: only O(1/N) of the keys move when a node joins or leaves • Every node need not know about every other node, only a small amount of "routing" information • m-bit identifier for each node and key • Key k is assigned to its successor node
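A sketch of the successor rule that consistent hashing uses to place keys (the node identifiers are the toy values used in the finger-table figures later in the talk):

```python
import bisect

def successor(node_ids, key_id):
    """Return the node responsible for key_id: the first node whose
    identifier is equal to or follows key_id on the identifier circle."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]        # wrap around past the largest identifier

nodes = [0, 1, 3]
for key in (1, 2, 6):
    print(f"key {key} -> node {successor(nodes, key)}")
# key 1 -> node 1, key 2 -> node 3, key 6 -> node 0
```

Note that when a node joins, only the keys between its predecessor and itself move to it, which is the O(1/N) disruption mentioned above.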
Scalable Key Location • A very small amount of routing information suffices to implement consistent hashing in a distributed environment • Each node need only be aware of its successor node on the circle • Queries for a given identifier can be passed around the circle via these successor pointers • This resolution scheme is correct, BUT inefficient: it may require traversing all N nodes!
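A sketch of this successor-pointer-only lookup (class and helper names are illustrative; a real implementation would make these calls over the network):

```python
def in_half_open(x, lo, hi):
    """True if x lies in the circular interval (lo, hi] of the identifier space."""
    if lo < hi:
        return lo < x <= hi
    return x > lo or x <= hi          # the interval wraps around zero

class SimpleNode:
    def __init__(self, ident):
        self.id = ident
        self.successor = self          # set properly once the ring is built

    def find_successor(self, key_id):
        """Walk successor pointers until the key falls between a node and its successor."""
        node = self
        while not in_half_open(key_id, node.id, node.successor.id):
            node = node.successor      # one hop per node: O(N) in the worst case
        return node.successor

# Build the 3-node ring {0, 1, 3} and resolve key 6
n0, n1, n3 = SimpleNode(0), SimpleNode(1), SimpleNode(3)
n0.successor, n1.successor, n3.successor = n1, n3, n0
print(n0.find_successor(6).id)        # -> 0
```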
Acceleration of Lookups • Lookups are accelerated by maintaining additional routing information • Each node maintains a routing table with (at most) m entries, called the finger table, where 2^m is the size of the identifier space • The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (see the finger-table example that follows) • s = successor(n + 2^(i-1)) (all arithmetic is modulo 2^m) • s is called the i-th finger of node n, denoted n.finger[i].node
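As an illustration of that definition, a sketch that fills in a node's finger table using the successor rule (the three-node ring with m = 3 matches the figure on the next slide):

```python
import bisect

def build_finger_table(n, node_ids, m=3):
    """Finger i (1-based) of node n points to successor((n + 2^(i-1)) mod 2^m)."""
    ring = sorted(node_ids)

    def successor(key_id):
        i = bisect.bisect_left(ring, key_id)
        return ring[i % len(ring)]

    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % 2 ** m
        table.append((start, successor(start)))   # (start, i-th finger)
    return table

# Ring {0, 1, 3} with m = 3, as in the figure on the next slide
print(build_finger_table(1, [0, 1, 3]))   # [(2, 3), (3, 3), (5, 0)]
```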
Finger Tables (1) • Example ring with m = 3, nodes 0, 1 and 3, and keys 1, 2 and 6 • Node 0 (stores key 6): start 1, 2, 4; interval [1,2), [2,4), [4,0); succ. 1, 3, 0 • Node 1 (stores key 1): start 2, 3, 5; interval [2,3), [3,5), [5,1); succ. 3, 3, 0 • Node 3 (stores key 2): start 4, 5, 7; interval [4,5), [5,7), [7,3); succ. 0, 0, 0
Finger Tables (2) - characteristics • Each node stores information about only a small number of other nodes, and knows more about nodes closely following it on the circle than about nodes farther away • A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k • Repeated queries to nodes that immediately precede the given key will eventually lead to the key's successor
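A sketch of the accelerated lookup: forward the query to the closest finger preceding the key, which roughly halves the remaining identifier-space distance per step. Method names follow the paper's pseudocode, but the in-process classes, the fallback to the plain successor, and the toy ring below are illustrative assumptions:

```python
class FingerNode:
    def __init__(self, ident, m=3):
        self.id = ident
        self.m = m
        self.finger = []               # i-th entry: first node at or after id + 2^(i-1)
        self.successor = self

    @staticmethod
    def _in(x, lo, hi, inclusive_hi=False):
        """Membership test on the circular interval (lo, hi) or (lo, hi]."""
        if lo < hi:
            return lo < x < hi or (inclusive_hi and x == hi)
        return x > lo or x < hi or (inclusive_hi and x == hi)

    def closest_preceding_node(self, key_id):
        """Scan fingers from farthest to nearest for the closest predecessor of key_id."""
        for f in reversed(self.finger):
            if self._in(f.id, self.id, key_id):
                return f
        return self.successor          # fall back so the lookup still makes progress

    def find_successor(self, key_id):
        node = self
        while not self._in(key_id, node.id, node.successor.id, inclusive_hi=True):
            node = node.closest_preceding_node(key_id)
        return node.successor

# Toy ring {0, 1, 3}, m = 3, fingers taken from the earlier finger-table slide
n0, n1, n3 = FingerNode(0), FingerNode(1), FingerNode(3)
n0.successor, n1.successor, n3.successor = n1, n3, n0
n0.finger, n1.finger, n3.finger = [n1, n3, n0], [n3, n3, n0], [n0, n0, n0]
print(n0.find_successor(6).id)         # -> 0 (node 0 is responsible for key 6)
```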
Node Joins – with Finger Tables • Node 6 joins the ring {0, 1, 3}; key 6 moves from node 0 to node 6 • Node 0 (no keys): start 1, 2, 4; interval [1,2), [2,4), [4,0); succ. 1, 3, 6 (third finger updated) • Node 1 (stores key 1): start 2, 3, 5; interval [2,3), [3,5), [5,1); succ. 3, 3, 6 (third finger updated) • Node 3 (stores key 2): start 4, 5, 7; interval [4,5), [5,7), [7,3); succ. 6, 6, 0 (first two fingers updated) • Node 6 (stores key 6): start 7, 0, 2; interval [7,0), [0,2), [2,6); succ. 0, 0, 3
Node Departures – with Finger Tables • Node 1 leaves the ring {0, 1, 3, 6}; key 1 moves to node 3 • Node 0 (no keys): start 1, 2, 4; interval [1,2), [2,4), [4,0); succ. 3, 3, 6 (first finger updated) • Node 3 (stores keys 1 and 2): start 4, 5, 7; interval [4,5), [5,7), [7,3); succ. 6, 6, 0 • Node 6 (stores key 6): start 7, 0, 2; interval [7,0), [0,2), [2,6); succ. 0, 0, 3
Source of Inconsistencies: Concurrent Operations and Failures • A basic "stabilization" protocol is used to keep nodes' successor pointers up to date, which is sufficient to guarantee correctness of lookups • Those successor pointers can then be used to verify the finger table entries • Every node runs stabilize periodically to find newly joined nodes
Stabilization after Join • n joins: predecessor = nil; n acquires ns as its successor via some existing node n'; n notifies ns that it is the new predecessor; ns acquires n as its predecessor • np runs stabilize: np asks ns for its predecessor (now n); np acquires n as its successor; np notifies n; n acquires np as its predecessor • All predecessor and successor pointers are now correct • Fingers still need to be fixed, but old fingers will still work
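A sketch of the stabilize/notify exchange the sequence above describes (method names follow the paper's pseudocode; networking, locking, and the lookup used at join time are omitted):

```python
class StabilizingNode:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    @staticmethod
    def _between(x, lo, hi):
        """True if x lies strictly inside the circular interval (lo, hi)."""
        if lo < hi:
            return lo < x < hi
        return x > lo or x < hi

    def join(self, successor):
        """Join the ring: no predecessor yet, successor learned via some n' (lookup not shown)."""
        self.predecessor = None
        self.successor = successor

    def stabilize(self):
        """Periodically verify the successor and tell it about ourselves."""
        x = self.successor.predecessor
        if x is not None and self._between(x.id, self.id, self.successor.id):
            self.successor = x                 # a node has slipped in between us
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it may be our predecessor."""
        if self.predecessor is None or self._between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# np -> ns ring; n joins in between, then stabilization repairs the pointers
np_node, ns_node, n = StabilizingNode(1), StabilizingNode(5), StabilizingNode(3)
np_node.successor, ns_node.predecessor = ns_node, np_node
n.join(ns_node)           # n acquires ns as its successor
n.stabilize()             # ns acquires n as its predecessor
np_node.stabilize()       # np adopts n as successor; n acquires np as predecessor
print(np_node.successor.id, n.predecessor.id)   # -> 3 1
```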
Failure Recovery • The key step in failure recovery is maintaining correct successor pointers • To help achieve this, each node maintains a successor list of its r nearest successors on the ring • If node n notices that its successor has failed, it replaces it with the first live entry in the list • stabilize will correct finger table entries and successor-list entries pointing to the failed node • Performance is sensitive to the frequency of node joins and leaves relative to the frequency at which the stabilization protocol is invoked
Impact of Node Joins on Lookups: Correctness • For a lookup issued before stabilization has finished: • Case 1: all finger table entries involved in the lookup are reasonably current; the lookup finds the correct successor in O(log N) steps • Case 2: successor pointers are correct, but finger pointers are inaccurate; lookups are still correct but may be slower • Case 3: successor pointers are incorrect, or keys have not yet migrated to newly joined nodes; the lookup may fail, with the option of retrying after a quick pause, during which stabilization fixes the successor pointers
Impact of Node Joins on Lookups: Performance • After stabilization, there is no effect other than increasing the value of N in O(log N) • Before stabilization is complete, finger table entries may be incorrect • This does not significantly affect lookup speed, since the distance-halving property depends only on ID-space distance • Lookup speed is affected only if the new nodes' IDs fall between the target's predecessor and the target • Lookups still take O(log N) time with high probability even if N new nodes join
Handling Failures • Problem: what does a node do if it does not know its new successor after its old successor fails? • The new successor may lie in a gap in the finger table, and Chord would be stuck! • Solution: maintain a successor list of size r, containing the node's first r successors • If the immediate successor does not respond, substitute the next entry in the successor list • A modified version of the stabilize protocol maintains the successor list • closest_preceding_node is modified to search not only the finger table but also the successor list for the most immediate predecessor • If find_successor fails, retry after some time • Voluntary node departures: transfer keys to the successor before departure; notify the predecessor p and successor s before leaving
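A sketch of the successor-list repair step described above; the `successor_list` attribute and the `is_alive` liveness check are stand-ins for the r-entry list and the RPC timeout a real node would use:

```python
def repair_successor(node, r=3):
    """Replace a failed immediate successor with the first live entry in the
    node's successor list (assumed to be kept in ring order by stabilize)."""
    for candidate in node.successor_list[:r]:
        if candidate.is_alive():       # stand-in for an RPC timeout check
            node.successor = candidate
            return candidate
    raise RuntimeError("all r successors appear to have failed")
```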
Theorems • Theorem IV.3: inconsistencies in successor pointers are transient; if any sequence of join operations is executed interleaved with stabilizations, then at some time after the last join the successor pointers will form a cycle over all the nodes in the network • Theorem IV.4: lookups take O(log N) time with high probability even if N nodes join a stable N-node network, once successor pointers are correct, even if finger pointers are not yet updated • Theorem IV.6: if the network is initially stable, even if every node fails with probability ½, the expected time to execute find_successor is O(log N)
Simulation • Implements the iterative style (the alternative is the recursive style): the node resolving a lookup initiates all communication, unlike the recursive style, where intermediate nodes forward the request • Optimizations • During stabilization, a node updates its immediate successor and one other entry in its successor list or finger table • Each of the k unique entries therefore gets refreshed once every k stabilization rounds • The size of the successor list is 1 • A predecessor change is notified to the old predecessor immediately, without waiting for the next stabilization round
Parameters • The mean delay of each packet is 50 ms • The round-trip time is 500 ms • The number of nodes is 10^4 • The number of keys varies from 10^4 to 10^6
Load Balance • Tests the ability of consistent hashing to allocate keys to nodes evenly • The number of keys per node exhibits large variations, which increase linearly with the number of keys • Associating keys with virtual nodes makes the number of keys per node more uniform and significantly improves load balance • The asymptotic value of the query path length is not affected much • The total identifier space covered remains the same on average • The worst-case number of queries does not change • There is not much increase in the routing state maintained • The asymptotic number of control messages is not affected