Learn about Chord, a peer-to-peer lookup service for internet apps. Understand its features, challenges, and solutions. Explore node-file mapping, ring organization, and resolving successor node efficiently.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications* CS587x Lecture, Department of Computer Science, Iowa State University *I. Stoica et al., published in SIGCOMM’01
Freenet • Highlights • Each file is identified by a binary key • Each node maintains a routing table, a list of (host, key) items • A query is sent to the neighbor with the nearest key • Files are replicated along their retrieval path • Problems • Host transience • Keyword search • No guarantee on when a file can be found • Performs poorly when the requested file does not exist
Research challenge • Given a file, can we immediately find the node that stores it? • If no such file exists, the query should still be answered within a bounded number of steps
Motivation example • Example • Assume there are 8 nodes (N0 to N7) and we can arrange them into a ring • Given a file, we can hash it and store it in the node whose id is equal to the hash value • Problems • The number of nodes is not known ahead of time • The set of nodes changes dynamically • Different files must have different hash values • The hash domain should be large enough
Chord at 30,000 feet • Hash node and file • Each node can be identified by a hash value, e.g., H(IP) • This value is called the node’s identifier • Each file can be identified by a hash value, e.g., H(file) • This value is called the file’s key • Node identifiers and file keys share the same domain space • Both are m bits • Mapping between nodes and files • A file with key K is stored in the node identified by successor(K) • If node K exists, this is node K • If node K does not exist, this is the next available node on the ring
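The hashing step can be sketched as follows. This is a minimal illustration, not the lecture's code: SHA-1 is an assumed choice for H, and the address, file name, and m = 6 are hypothetical values picked to keep the numbers small.

```python
import hashlib

M = 6  # identifier length in bits (assumed small value for illustration)

def chord_id(data: bytes, m: int = M) -> int:
    """Hash arbitrary bytes to an m-bit identifier (SHA-1 is an assumption)."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# Node identifiers and file keys come from the same function,
# so both live in the same m-bit space.
node_id = chord_id(b"192.0.2.17")          # H(IP), hypothetical address
file_key = chord_id(b"lecture-notes.pdf")  # H(file), hypothetical name
```

Because both values fall in the range 0 to 2^m - 1, a file key can be compared directly with node identifiers when deciding where the file is stored.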
Chord Ring (example ring: N0, N1, N3, N5, N6) • Chord Ring • Each node is identified by a hash value • These nodes can be organized into a ring, although some positions may be empty • Node-File mapping • K is stored on the node whose identifier is successor(K) • successor(K) = K if node K exists • Otherwise, it is the next available node • Every node knows its successors • N0 knows successor(0), successor(7) • N1 knows successor(1) • N3 knows successor(3), successor(2) • N5 knows successor(5), successor(4) • N6 knows successor(6)
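The node-file mapping on the example ring can be checked with a short sketch. It assumes global knowledge of the node list, which a real Chord node does not have, and assumes m = 3 since the example uses identifiers 0 to 7; the function name is illustrative.

```python
from bisect import bisect_left

def successor(k: int, nodes: list[int], m: int) -> int:
    """First node id that is >= k on the ring, wrapping past 2^m - 1 to the start."""
    ring = sorted(nodes)
    i = bisect_left(ring, k % (2 ** m))
    return ring[i % len(ring)]  # i == len(ring) means we wrapped around

nodes = [0, 1, 3, 5, 6]  # N0, N1, N3, N5, N6 from the slide
responsible = {k: successor(k, nodes, 3) for k in range(8)}
# responsible == {0: 0, 1: 1, 2: 3, 3: 3, 4: 5, 5: 5, 6: 6, 7: 0}
```

The result agrees with the list above: keys 2 and 3 map to N3, keys 4 and 5 to N5, and key 7 wraps around to N0.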
Resolving successor(K) • Naïve Solution 1 • Search the ring until successor(K) is found • LookupCost = O(N), where N is the number of nodes • All nodes may have to be searched
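A sketch of this naive search, assuming each node stores only a pointer to its immediate successor. The `succ` dictionary and the function names are hypothetical; the ring is the five-node example N0, N1, N3, N5, N6.

```python
def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], going clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past the top of the id space

def lookup_by_walking(start: int, key: int, succ: dict[int, int]) -> tuple[int, int]:
    """Follow successor pointers one node at a time; return (owner, hops)."""
    n, hops = start, 0
    while not in_half_open(key, n, succ[n]):
        n = succ[n]
        hops += 1
    return succ[n], hops

succ = {0: 1, 1: 3, 3: 5, 5: 6, 6: 0}  # successor pointers of the example ring
owner, hops = lookup_by_walking(1, 0, succ)  # owner == 0 after 3 hops
```

The hop count grows linearly with the number of nodes between the starting node and the key, which is the O(N) cost noted above.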
Resolving successor(K) • Naïve Solution 2 • Every node keeps a table containing the complete mapping from K to successor(K) • LookupCost = O(1), but table maintenance cost is high • Whenever a host joins or leaves the ring, all other nodes need to update their tables!
Resolving successor(K) • Solutions 1 and 2 are two extremes: • Solution 1 needs no table updates, but may have to search the entire set of nodes to answer a query • Solution 2 can resolve a query immediately, but each host needs to know all other nodes • Challenge: • Can a host know only a small portion of the other nodes, while each query can still be resolved in a limited number of steps? • Which nodes should be known to a node?
Resolving successor(K): Finger Table • Each node maintains m entries • The ith entry at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the ring • s = successor((n + 2^(i-1)) mod 2^m) • finger[i] = first node that succeeds (n + 2^(i-1)) mod 2^m
Finger Table for N1 (ring: N0, N1, N3, N5, N6; this table numbers its entries from 0, so entry finger[i] uses offset 2^i)
finger[0] = successor(N1+1) = N3
finger[1] = successor(N1+2) = N3
finger[2] = successor(N1+4) = N5
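The finger table for N1 can be reproduced with a few lines. This is a sketch that assumes global knowledge of the ring and m = 3; it uses the 0-based numbering of the N1 table, so entry i covers offset 2^i. The function names are illustrative.

```python
from bisect import bisect_left

def successor(k: int, ring: list[int]) -> int:
    """First node id >= k in the sorted ring, wrapping around to ring[0]."""
    return ring[bisect_left(ring, k) % len(ring)]

def finger_table(n: int, nodes: list[int], m: int) -> list[int]:
    """finger[i] = successor((n + 2^i) mod 2^m) for i = 0 .. m-1."""
    ring = sorted(nodes)
    return [successor((n + 2 ** i) % (2 ** m), ring) for i in range(m)]

table_n1 = finger_table(1, [0, 1, 3, 5, 6], 3)  # [3, 3, 5]
```

The same function applied to node 8 on the ring 1, 8, 14, 21, 32, 38, 42, 48, 51, 56 with m = 6 gives [14, 14, 14, 21, 32, 42].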
Another Example (m = 6) • Ring nodes: N1, N8, N14, N21, N32, N38, N42, N48, N51, N56 • Finger Table for N8 • fingers 1, 2, 3: N14 • finger 4: N21 • finger 5: N32 • finger 6: N42 • finger[k] = first node that succeeds (n + 2^(k-1)) mod 2^m
Properties of Finger Table • Each node stores information about only O(log N) other nodes • A node’s finger table generally does not contain enough information to determine the successor of an arbitrary key K • Each node knows more about nodes closely following it on the identifier circle than about nodes farther away • Finger offsets: 1, 2, 4, 8, …
Key Lookup: find_successor(K)
// ask node n to find the successor of id
n.find_successor(id)
    if (id belongs to (n, successor])
        return successor;
    else
        n0 = closest_preceding_node(id);
        return n0.find_successor(id);

// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
    for i = m downto 1
        if (finger[i] belongs to (n, id))
            return finger[i];
    return n;
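The pseudocode translates fairly directly into Python. The sketch below is a single-process simulation, not a networked implementation: `build_ring` is a hypothetical helper that fills every finger table from global knowledge, and fingers are stored 0-based so that `finger[0]` is the immediate successor.

```python
from bisect import bisect_left

def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.finger = []  # finger[0] is the immediate successor

    @property
    def successor(self):
        return self.finger[0]

    def find_successor(self, key):
        if in_half_open(key, self.id, self.successor.id):
            return self.successor
        n0 = self.closest_preceding_node(key)
        return n0.find_successor(key)  # recursive forwarding

    def closest_preceding_node(self, key):
        for f in reversed(self.finger):  # highest finger first
            if in_open(f.id, self.id, key):
                return f
        return self

def build_ring(ids, m):
    """Hypothetical helper: create nodes and fill fingers from global knowledge."""
    ring = sorted(ids)
    nodes = {i: Node(i) for i in ring}
    for n in nodes.values():
        for k in range(m):
            target = (n.id + 2 ** k) % 2 ** m
            n.finger.append(nodes[ring[bisect_left(ring, target) % len(ring)]])
    return nodes

nodes = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56], 6)
owner = nodes[8].find_successor(54)  # owner.id == 56
```

In a deployed system each `find_successor` call on another node would be a remote procedure call; here it is an ordinary method call.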
Lookup Using Finger Table • Example: lookup(54) issued at N8, using the finger table for N8 (ring: N1, N8, N14, N21, N32, N38, N42, N48, N51, N56) • Lookup can be • iterative • recursive
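To make the lookup(54) example concrete, here is a sketch of the iterative style, in which the querying node drives each step itself. It is a simulation under assumptions: finger tables are computed on demand from a global node list, and the fallback to the immediate successor is an added safeguard, not part of the slides.

```python
from bisect import bisect_left

M = 6
RING = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # node ids from the example, sorted

def fingers(n):
    """Finger list of node n, computed from global knowledge (simulation only)."""
    return [RING[bisect_left(RING, (n + 2 ** k) % 2 ** M) % len(RING)]
            for k in range(M)]

def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    return a < x < b if a < b else (x > a or x < b)

def iterative_lookup(start, key):
    """Return (successor of key, nodes contacted in order)."""
    n, path = start, [start]
    while True:
        f = fingers(n)
        succ = f[0]
        if key == succ or in_open(key, n, succ):  # key in (n, succ]
            return succ, path
        # Highest finger strictly between n and key; fall back to the successor.
        n = next((x for x in reversed(f) if in_open(x, n, key)), succ)
        path.append(n)

result = iterative_lookup(8, 54)  # (56, [8, 42, 51])
```

Starting at N8, the query moves to N42 and then N51, whose successor N56 is responsible for key 54. A recursive lookup visits the same nodes, but each node forwards the query instead of returning the next hop to the caller.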
Scalable Lookup Scheme • Each node forwards the query at least halfway along the remaining distance to the target • Theorem: With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N)
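The halving property gives the bound by a short argument. The sketch below uses d_t, an assumed symbol for the identifier-space distance between the current node and the target after t hops, and assumes node identifiers are spread uniformly by the hash function.

```latex
% Each hop at least halves the remaining distance:
d_0 \le 2^m, \qquad d_{t+1} \le \tfrac{1}{2}\, d_t
\;\Longrightarrow\; d_t \le \frac{2^m}{2^t}.

% After t = \log_2 N hops the remaining interval has length at most
d_{\log_2 N} \le \frac{2^m}{N}.
```

An interval of length 2^m / N contains one node on average and O(log N) nodes with high probability, so even stepping through it one successor at a time keeps the total at O(log N) hops.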
Create() • Creates a new Chord ring
n.create()
    predecessor = nil;
    successor = n;
Join() • Assumptions • Each node maintains a finger table and knows its predecessor correctly • Node n follows three steps to join • Initialize the predecessor and fingers of node n • Node n learns its predecessor and fingers by asking an existing node n’ to look them up, at a cost of O(log^2 N) • Update the fingers of existing nodes • Node n becomes the ith finger of node p if and only if • p precedes n by at least 2^(i-1), and • The ith finger of node p succeeds n • Transfer all keys (or files) that node n is now responsible for • These keys can only come from the immediate successor of node n
Finger Table for N2 (N2 joins the ring N0, N1, N3, N5, N6; X, Y, Z are the entries to be looked up)
finger[0] = successor(N2+1) = X
finger[1] = successor(N2+2) = Y
finger[2] = successor(N2+4) = Z
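The effect of the three join steps can be checked on the N2 example. This sketch does not implement the join protocol's messages; it assumes global knowledge and m = 3, recomputes the tables before and after N2 joins, and compares them. Entries are 0-based, so finger[i] uses offset 2^i.

```python
from bisect import bisect_left

M = 3

def successor(k, nodes):
    """First node id >= k mod 2^M on the ring, wrapping around."""
    ring = sorted(nodes)
    return ring[bisect_left(ring, k % 2 ** M) % len(ring)]

def finger_table(n, nodes):
    return [successor(n + 2 ** i, nodes) for i in range(M)]

before = [0, 1, 3, 5, 6]
after = before + [2]  # N2 joins

# Step 1: fingers of the new node
new_fingers = finger_table(2, after)  # [3, 5, 6]

# Step 2: (node, entry) pairs of existing nodes whose finger now points to N2
changed = {(p, i) for p in before for i in range(M)
           if finger_table(p, before)[i] != finger_table(p, after)[i]}
# {(0, 1), (1, 0), (6, 2)}

# Step 3: keys whose responsible node changes
moved = [k for k in range(2 ** M) if successor(k, before) != successor(k, after)]
# [2], handed over by N3, the immediate successor of N2
```

Each changed entry satisfies the condition stated above: for example, N6 precedes N2 by 4 = 2^2 positions and its entry for that offset previously pointed to N3, which succeeds N2.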
Stabilization: Dealing with Concurrent Node Joins and Failures • Node n periodically asks its immediate successor for the successor’s predecessor p • Checks whether p should be n’s successor instead • Also notifies n’s successor about n’s existence, so that the successor may change its predecessor to n, if necessary • Periodically check whether finger table entries are correct • New nodes initialize their finger tables • Existing nodes incorporate new nodes into their finger tables • Periodically check whether the predecessor has failed • If yes, clear the predecessor pointer
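The periodic checks can be sketched as three small methods. This is an in-process illustration under assumptions: the `alive` flag stands in for a real failure detector, finger repair is omitted, and the new node is assumed to have already learned its successor through a lookup.

```python
def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None
        self.alive = True  # assumed stand-in for a failure detector

    def stabilize(self):
        """Ask the successor for its predecessor and adopt it if it sits between us."""
        p = self.successor.predecessor
        if p is not None and in_open(p.id, self.id, self.successor.id):
            self.successor = p
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it may be our predecessor."""
        if self.predecessor is None or in_open(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

    def check_predecessor(self):
        if self.predecessor is not None and not self.predecessor.alive:
            self.predecessor = None

# Two-node ring N0 <-> N3, then N1 joins knowing only its successor N3.
a, b, c = Node(0), Node(3), Node(1)
a.successor, a.predecessor = b, b
b.successor, b.predecessor = a, a
c.successor = b
c.stabilize()  # N3 learns about N1 and sets its predecessor to N1
a.stabilize()  # N0 sees N1 between itself and N3 and adopts it as successor
```

After the two rounds the ring reads N0, N1, N3 with consistent successor and predecessor pointers, even though no node was told about the join explicitly.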
Impact of Node Joins on Lookups • Correctness • If finger table entries are reasonably current • Lookup finds the correct successor in O(log N) steps • If successor pointers are correct but finger tables are incorrect • Lookup is still correct but slower • If successor pointers are incorrect • Lookup may fail
Problem of Stabilization • Stabilization won’t correct a Chord system that • has split into multiple disjoint cycles, or • forms a single cycle that loops multiple times around the identifier space
Failure and Replication • Correctness of the protocol relies on each node knowing its correct successor • To improve robustness • Each node maintains a successor list of r nodes • Each node replicates its data to its immediate successor
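A sketch of how a successor list can be used when the immediate successor fails. The function name and the `is_alive` callback are hypothetical; the list values come from the m = 6 example ring with r = 3.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first entry of the successor list that still responds, or None."""
    for s in successor_list:
        if is_alive(s):
            return s
    return None

# N8 keeps its r = 3 nearest successors from the example ring.
successor_list = [14, 21, 32]
failed = {14}  # assume N14 has just failed

new_successor = first_live_successor(successor_list, lambda s: s not in failed)
# new_successor == 21
```

Because N14's data was replicated to its immediate successor N21, the node that N8 falls back to already holds the keys N14 was responsible for.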
Voluntary Node Departures • Can be treated as node failures • Two possible enhancements • The leaving node may transfer all its keys to its successor • The leaving node may notify its predecessor and successor about each other so that they can update their links
Advantages of Chord • Load Balance: The distributed hash function spreads keys evenly over the nodes • Decentralization: Fully distributed • Scalability: Lookup cost grows as the logarithm of the number of nodes • Availability: Automatically adjusts internal tables to reflect changes • Flexible Naming: No constraints on key structure
Conclusion • Efficient location of the node that stores a desired data item is a fundamental problem in P2P networks • The Chord protocol solves it in an efficient, decentralized manner • Routing information: O(log N) nodes • Lookup: O(log N) nodes • Update: O(log^2 N) messages • It also adapts dynamically to topology changes introduced while the system is running
Critiques • Maintaining the Chord ring and up-to-date finger tables may be expensive or impossible • Especially when nodes join and leave frequently • A malicious set of Chord participants could present an incorrect view of the Chord ring • Possible check: node n periodically asks other nodes to do a lookup for n