
Peer-to-Peer (P2P) systems

Presentation Transcript


  1. Peer-to-Peer (P2P) systems DHT, Chord, Pastry, …

  2. Unstructured vs Structured • Unstructured P2P networks allow resources to be placed at any node. The network topology is arbitrary, and growth is spontaneous. • Structured P2P networks simplify resource location and load balancing by defining a topology and rules for resource placement. • They guarantee efficient search even for rare objects. What are the rules? The Distributed Hash Table (DHT).

  3. Hash Tables • Store arbitrary keys and satellite data (values) • put(key, value) • value = get(key) • Lookup must be fast • Compute a hash function h(key) that returns a storage cell • Chained hash table: store the key (and optional value) there
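
As a concrete illustration, here is a minimal chained hash table sketch in Python; the bucket count and the use of Python's built-in hash() are illustrative choices, not something specified on the slide.

    # Minimal chained hash table: h(key) selects a bucket, collisions are chained in a list.
    class ChainedHashTable:
        def __init__(self, num_buckets=64):              # bucket count chosen arbitrarily
            self.buckets = [[] for _ in range(num_buckets)]

        def _cell(self, key):
            return hash(key) % len(self.buckets)          # h(key) -> storage cell

        def put(self, key, value):
            bucket = self.buckets[self._cell(key)]
            for i, (k, _) in enumerate(bucket):
                if k == key:                              # key already present: overwrite
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))

        def get(self, key):
            for k, v in self.buckets[self._cell(key)]:
                if k == key:
                    return v
            raise KeyError(key)

    table = ChainedHashTable()
    table.put("title", "MP3 data...")
    print(table.get("title"))                             # -> MP3 data...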

  4. What Is a DHT? • Single-node hash table: key = Hash(name) put(key, value) get(key) -> value • How do I do this across millions of hosts on the Internet? • Distributed Hash Table

  5. DHT layering [figure]: a distributed application issues put(key, data) / get(key) to the distributed hash table (DHash), which uses the lookup service (Chord) to resolve lookup(key) to a node IP address; the data itself is stored across the nodes. • The application may be distributed over many nodes • The DHT distributes data storage over many nodes

  6. Why the put()/get() interface? • API supports a wide range of applications • DHT imposes no structure/meaning on keys
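
As a minimal illustration of that interface (a hypothetical sketch, not an API defined in these slides), keys can be treated as opaque byte strings, so the DHT attaches no structure or meaning to them:

    from abc import ABC, abstractmethod

    class DHT(ABC):
        """put()/get() is the whole interface; keys are opaque bytes."""

        @abstractmethod
        def put(self, key: bytes, value: bytes) -> None:
            """Store value under key somewhere in the overlay."""

        @abstractmethod
        def get(self, key: bytes) -> bytes:
            """Retrieve the value previously stored under key."""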

  7. Distributed Hash Table • Hash table functionality in a P2P network: lookup of data indexed by keys • Key-hash → node mapping • Assign a unique live node to a key • Find this node in the overlay network quickly and cheaply • Maintenance, optimization • Load balancing: maybe even change the key-hash → node mapping on the fly • Replicate entries on more nodes to increase robustness

  8. Distributed Hash Table

  9. The lookup problem [figure]: a publisher stores (key=“title”, value=MP3 data…) at one of the nodes N1–N6; a client elsewhere on the Internet issues Lookup(“title”). How does the query find the node holding the data?

  10. Centralized lookup (Napster) [figure]: the publisher registers SetLoc(“title”, N4) with a central DB; the client asks the DB Lookup(“title”) and is pointed to N4, which holds key=“title”, value=MP3 data…. Simple, but O(N) state and a single point of failure.

  11. Flooded queries (Gnutella) [figure]: the client's Lookup(“title”) is flooded from node to node until it reaches the publisher's node holding key=“title”, value=MP3 data…. Robust, but a large number of messages per lookup.

  12. Routed queries (Chord, Pastry, etc.) [figure]: the client's Lookup(“title”) is routed hop by hop toward the node responsible for key=“title”, value=MP3 data….

  13. Routing challenges • Define a useful key nearness metric • Keep the hop count small • Keep the tables small • Stay robust despite rapid change • Freenet: emphasizes anonymity • Chord: emphasizes efficiency and simplicity

  14. What is Chord? What does it do? • In short: a peer-to-peer lookup service • Solves problem of locating a data item in a collection of distributed nodes, considering frequent node arrivals and departures • Core operation in most p2p systems is efficient location of data items • Supports just one operation: given a key, it maps the key onto a node

  15. Chord Characteristics • Simplicity, provable correctness, and provable performance • Each Chord node needs routing information about only a few other nodes • Resolves lookups via messages to other nodes (iteratively or recursively) • Maintains routing information as nodes join and leave the system

  16. Mapping onto Nodes vs. Values • Traditional name and location services provide a direct mapping between keys and values • What are examples of values? A value can be an address, a document, or an arbitrary data item • Chord can easily implement a mapping onto values by storing each key/value pair at the node to which that key maps

  17. Napster, Gnutella etc. vs. Chord • Compared to Napster and its centralized servers, Chord's decentralized design avoids single points of control or failure • Compared to Gnutella and its widespread use of broadcasts, Chord avoids scalability problems by routing with only a small amount of information per node

  18. Addressed Difficult Problems (1) • Load balance: the distributed hash function spreads keys evenly over the nodes • Decentralization: Chord is fully distributed; no node is more important than any other, which improves robustness • Scalability: lookup cost grows logarithmically with the number of nodes, so even very large systems are feasible

  19. Addressed Difficult Problems (2) • Availability: Chord automatically adjusts its internal tables to ensure that the node responsible for a key can always be found • Flexible naming: no constraints on the structure of keys – the key-space is flat, giving applications flexibility in how they map names to Chord keys

  20. Chord properties • Efficient: O(log(N)) messages per lookup • N is the total number of servers • Scalable: O(log(N)) state per node • Robust: survives massive failures

  21. The Base Chord Protocol (1) • Specifies how to find the locations of keys • How new nodes join the system • How to recover from the failure or planned departure of existing nodes

  22. Consistent Hashing • Hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1 • ID(node) = hash(IP, Port) • ID(key) = hash(key) • Properties of consistent hashing: • Function balances load: all nodes receive roughly the same number of keys • When an Nth node joins (or leaves) the network, only an O(1/N) fraction of the keys are moved to a different location
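
A small sketch of this ID assignment in Python, assuming identifiers truncated to m = 16 bits for readability (Chord uses m-bit identifiers derived from the full SHA-1 output):

    import hashlib

    M = 16                                      # identifier bits; illustrative value

    def chord_id(data: str) -> int:
        """ID = SHA-1(data), reduced to an m-bit identifier."""
        digest = hashlib.sha1(data.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** M)

    node_id = chord_id("192.0.2.7:4000")        # ID(node) = hash(IP, Port)
    key_id = chord_id("some-key")               # ID(key)  = hash(key)
    print(node_id, key_id)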

  23. Chord IDs • Key identifier = SHA-1(key) • Node identifier = SHA-1(IP, Port) • Both are uniformly distributed • Both exist in the same ID space • How to map key IDs to node IDs?

  24. Chord IDs • Consistent hashing (SHA-1) assigns each node and object an m-bit ID • IDs are ordered on an ID circle ranging from 0 to 2^m - 1 • New nodes assume slots in the ID circle according to their ID • Key k is assigned to the first node whose ID ≥ k → successor(k)

  25. Consistent hashing [figure]: a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80. A key is stored at its successor: the node with the next-higher ID (K5 and K20 at N32, K80 at N90).
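
A sketch of successor(k) over a sorted list of node IDs, reproducing the example above (a simplification: a real Chord node never sees the full membership list):

    import bisect

    def successor(key_id, node_ids):
        """First node ID >= key_id on the circle, wrapping past the highest ID to the lowest."""
        ids = sorted(node_ids)
        i = bisect.bisect_left(ids, key_id)
        return ids[i] if i < len(ids) else ids[0]

    nodes = [32, 90, 105]                 # N32, N90, N105 on the 7-bit circle
    print(successor(5, nodes))            # K5  -> 32
    print(successor(20, nodes))           # K20 -> 32
    print(successor(80, nodes))           # K80 -> 90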

  26. Successor Nodes [figure]: an identifier circle with m = 3 (identifiers 0–7), nodes 0, 1 and 3, and keys 1, 2 and 6: successor(1) = 1, successor(2) = 3, successor(6) = 0.

  27. Node Joins and Departures [figure]: on the same identifier circle, after a node with ID 7 joins, successor(6) = 7; after node 1 departs, successor(1) = 3.

  28. Consistent Hashing – Join and Departure • When a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n. • When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
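
Under the same sorted-list simplification, a sketch of which keys change hands when a node joins (function names are illustrative; the values reuse the 3-bit circle example from the earlier slides):

    import bisect

    def successor(key_id, node_ids):
        ids = sorted(node_ids)
        i = bisect.bisect_left(ids, key_id)
        return ids[i] if i < len(ids) else ids[0]

    def keys_moved_on_join(new_node, old_nodes, keys):
        """Keys whose responsible node changes when new_node joins (they move from its successor)."""
        return [k for k in keys
                if successor(k, old_nodes) != successor(k, old_nodes + [new_node])]

    # Nodes 0, 1, 3 hold keys 1, 2, 6; when node 6 joins, it takes over key 6 (previously at node 0).
    print(keys_moved_on_join(6, [0, 1, 3], [1, 2, 6]))    # [6]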

  29. Consistent Hashing – Node Join [figure]: when a new node joins the identifier circle, the keys that now map to it move from its successor to the new node.

  30. Consistent Hashing – Node Departure [figure]: when a node leaves the identifier circle, its keys are reassigned to its successor.

  31. Scalable Key Location • A very small amount of routing information suffices to implement consistent hashing in a distributed environment • Each node need only be aware of its successor node on the circle • Queries for a given identifier can be passed around the circle via these successor pointers

  32. A Simple Key Lookup • Pseudocode for finding the successor:

    // ask node n to find the successor of id
    n.find_successor(id)
      if (id ∈ (n, successor])
        return successor;
      else
        // forward the query around the circle
        return successor.find_successor(id);
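
A minimal runnable rendering of this scheme in Python, where each node knows only its successor; the interval test must handle wrap-around on the circle (class and helper names are illustrative):

    def in_interval(x, a, b):
        """True if x lies in the half-open ring interval (a, b], with wrap-around."""
        if a < b:
            return a < x <= b
        return x > a or x <= b                    # the interval wraps past zero

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self                 # a one-node ring points at itself

        def find_successor(self, key_id):
            if in_interval(key_id, self.id, self.successor.id):
                return self.successor
            return self.successor.find_successor(key_id)   # forward the query around the circle

    # Ring with nodes 0, 1, 3 (m = 3): the lookup for key 6 walks the circle and ends at node 0.
    n0, n1, n3 = Node(0), Node(1), Node(3)
    n0.successor, n1.successor, n3.successor = n1, n3, n0
    print(n0.find_successor(6).id)                # -> 0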

  33. Scalable Key Location • Resolution scheme correct, BUT inefficient: • it may require traversing all N nodes!

  34. Acceleration of Lookups • Lookups are accelerated by maintaining additional routing information • Each node maintains a routing table with (at most) m entries, called the finger table (where 2^m is the size of the identifier space) • The ith entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle • s = successor(n + 2^(i-1)) (all arithmetic mod 2^m) • s is called the ith finger of node n, denoted by n.finger(i).node
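
A sketch that computes one node's finger table from a full membership list, matching the running 3-bit example (a simplification: a real node acquires its fingers through lookups rather than from a global list):

    import bisect

    def successor(key_id, node_ids):
        ids = sorted(node_ids)
        i = bisect.bisect_left(ids, key_id)
        return ids[i] if i < len(ids) else ids[0]

    def finger_table(n, node_ids, m):
        """finger[i] = successor((n + 2^(i-1)) mod 2^m) for i = 1..m."""
        return [successor((n + 2 ** (i - 1)) % 2 ** m, node_ids) for i in range(1, m + 1)]

    # Nodes 0, 1, 3 with m = 3: node 0's fingers (starts 1, 2, 4) point at nodes 1, 3, 0.
    print(finger_table(0, [0, 1, 3], 3))          # [1, 3, 0]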

  35. Scalable Key Location – Finger Tables [figure]: finger tables and key assignments for nodes 0, 1 and 3 on the 3-bit circle:
    Node 0: starts 1, 2, 4 (0+2^0, 0+2^1, 0+2^2); successors 1, 3, 0; keys: 6
    Node 1: starts 2, 3, 5 (1+2^0, 1+2^1, 1+2^2); successors 3, 3, 0; keys: 1
    Node 3: starts 4, 5, 7 (3+2^0, 3+2^1, 3+2^2); successors 0, 0, 0; keys: 2

  36. Finger Tables [figure]: the same example with the finger intervals shown:
    Node 0: starts 1, 2, 4; intervals [1,2), [2,4), [4,0); successors 1, 3, 0; keys: 6
    Node 1: starts 2, 3, 5; intervals [2,3), [3,5), [5,1); successors 3, 3, 0; keys: 1
    Node 3: starts 4, 5, 7; intervals [4,5), [5,7), [7,3); successors 0, 0, 0; keys: 2

  37. Finger Tables - characteristics • Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away • A node’s finger table generally does not contain enough information to determine the successor of an arbitrary key k • Repeated queries to nodes that immediately precede the given key will eventually lead to the key’s successor

  38. Node Joins – with Finger Tables [figure]: node 6 joins the ring of nodes 0, 1 and 3. It builds its own finger table (starts 7, 0, 2; intervals [7,0), [0,2), [2,6); successors 0, 0, 3) and takes over key 6 from node 0; node 0's third finger changes from 0 to 6, and node 3's first and second fingers change from 0 to 6.

  39. Node Departures – with Finger Tables [figure]: node 1 leaves the ring of nodes 0, 1, 3 and 6. Its key (key 1) is reassigned to its successor, node 3, and node 0's first finger, which pointed at node 1, now points at node 3.

  40. Simple Key Location Scheme [figure]: on a larger ring (N1, N8, N14, N21, N32, N38, N42, N48), a query for key K45 is forwarded node by node via successor pointers until it reaches N48, the successor of K45.

  41. Scalable Lookup Scheme [figure]: the finger table for node N8 on the ring N1, N8, N14, N21, N32, N38, N42, N48, N51, N56: fingers 1, 2 and 3 point at N14, finger 4 at N21, finger 5 at N32 and finger 6 at N42. finger[k] = first node that succeeds (n + 2^(k-1)) mod 2^m.

  42. Chord key location • Look up in the finger table the furthest node that precedes the key • -> O(log N) hops

  43. Scalable Lookup Scheme

    // ask node n to find the successor of id
    n.find_successor(id)
      n0 = find_predecessor(id);
      return n0.successor;

    // ask node n to find the predecessor of id
    n.find_predecessor(id)
      n0 = n;
      while (id not in (n0, n0.successor])
        n0 = n0.closest_preceding_finger(id);
      return n0;

    // return closest finger preceding id
    n.closest_preceding_finger(id)
      for i = m downto 1
        if (finger[i].node ∈ (n, id))
          return finger[i].node;
      return n;
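
A compact runnable sketch of the same routing logic in Python over a statically built ring (a simplification for illustration: finger tables are computed from the full node list, and the interval tests handle wrap-around). Starting at N8, the lookup for key 54 is forwarded to N42, then N51, and returns N56.

    M = 6                                         # identifier bits for this example ring

    def between(x, a, b, inclusive_right=False):
        """True if x lies in the ring interval (a, b) or (a, b], with wrap-around."""
        if a == b:
            return x != a or inclusive_right
        if a < b:
            return a < x < b or (inclusive_right and x == b)
        return x > a or x < b or (inclusive_right and x == b)

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = None
            self.finger = []                      # finger[i-1] = successor(id + 2^(i-1))

        def closest_preceding_finger(self, key_id):
            for f in reversed(self.finger):       # scan fingers from furthest to nearest
                if between(f.id, self.id, key_id):
                    return f
            return self

        def find_predecessor(self, key_id):
            n0 = self
            while not between(key_id, n0.id, n0.successor.id, inclusive_right=True):
                n0 = n0.closest_preceding_finger(key_id)
            return n0

        def find_successor(self, key_id):
            return self.find_predecessor(key_id).successor

    def build_ring(ids):
        """Wire up successor pointers and finger tables from the full membership list."""
        ids = sorted(ids)
        nodes = {i: Node(i) for i in ids}
        def succ(k):
            return nodes[next((i for i in ids if i >= k), ids[0])]
        for n in nodes.values():
            n.successor = succ((n.id + 1) % 2 ** M)
            n.finger = [succ((n.id + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]
        return nodes

    ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
    print(ring[8].find_successor(54).id)          # lookup(54) from N8 -> 56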

  44. Lookup Using Finger Table [figure]: lookup(54) issued at N8 is forwarded to N42 (N8's furthest finger preceding 54), then to N51, and finally reaches N56, the successor of key 54.

  45. Scalable Lookup Scheme • Each node forwards the query at least halfway along the remaining distance to the target • Theorem: with high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N)
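
For example, since every hop at least halves the remaining distance to the target, a lookup in a network of about one million nodes contacts on the order of log2(10^6) ≈ 20 nodes.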

  46. “Finger table” allows log(N)-time lookups [figure]: each successive finger of node N80 covers half of the remaining distance around the ring: 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128.

  47. Dynamic Operations and Failures Need to deal with: • Node Joins and Stabilization • Impact of Node Joins on Lookups • Failure and Replication • Voluntary Node Departures

  48. Node Joins and Stabilization • A node’s successor pointer must be kept up to date for lookups to execute correctly • Each node periodically runs a “Stabilization” protocol • It updates finger tables and successor pointers

  49. Node Joins and Stabilization • Contains 6 functions: • create() • join() • stabilize() • notify() • fix_fingers() • check_predecessor()

  50. Create() • Creates a new Chord ring

    n.create()
      predecessor = nil;
      successor = n;
