
Scalable Content-Addressable Network (CAN): A Peer-to-Peer Information System Seminar

In this seminar, speaker Vladimir Eske discusses the CAN system, focusing on its basic architecture, improvements, and a summary of its key features. CAN, as a distributed and fault-tolerant system, overcomes the limitations of centralized and completely decentralized file distribution systems like Napster and Gnutella. By utilizing Cartesian space and hash tables, CAN ensures scalable peer-to-peer file distribution with efficient insertion, lookup, and deletion operations. The architecture of CAN incorporates nodes, zones, and neighbors, facilitating decentralized communication and access to the system through DNS domains and Bootstrap servers. Join this seminar to explore the innovative design and routing algorithms of CAN for robust peer-to-peer information systems.


Presentation Transcript


  1. Seminar "Peer-to-peer Information Systems": A Scalable Content-Addressable Network (CAN). Speaker: Vladimir Eske. Advisor: Dr. Ralf Schenkel. November 2003

  2. Content: 1. Basic architecture (a. Data Model, b. CAN Routing, c. CAN construction); 2. Architecture improvements; 3. Summary

  3. What is CAN? CAN stands for Content-Addressable Network. Napster's problem: a centralized File Index. • Single point of failure: low data availability. • Not scalable: no way to decentralize it except to build a new system. Gnutella's problem: the File Index is completely decentralized. • Network flooding: low data availability. • Not scalable: no way to group data. The goal was to build a scalable peer-to-peer file distribution system.

  4. What is CAN? CAN is a distributed, Internet-scale hash table. CAN provides Insertion, Lookup and Deletion operations on (Key, Value) pairs, e.g. (file name, file address). CAN features: • CAN is completely distributed (it does not require any centralized control). • The CAN design is scalable: every part of the system maintains only a small amount of control state, independent of the number of parts. • CAN is fault-tolerant (it still provides routing even when some part of the system has crashed).

  5. CAN architecture 1. The hash table works on a d-dimensional Cartesian coordinate space on a d-torus, i.e. the space is cyclic in every dimension (in a 1-dimensional Cartesian space of unit size, 0.5 + 0.7 = 0.2). A d-valued hash function maps each key to a point: hash(K) = (x1, ..., xd). Distances between points are Cartesian distances, measured with wrap-around.

  9. CAN architecture 1 (cont.): Coordinate Zones. A Zone is a chunk of the entire hash table, i.e. a contiguous piece of the Cartesian space. A Zone is valid if it has a square shape. (A sketch of the data model follows.)
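
A minimal sketch (in Python; the slides contain no code) of the data model just described: a d-valued hash function maps a key to a point on the unit d-torus, and distances wrap around in every dimension. The SHA-1-based hash construction and the unit-size space are illustrative assumptions, not part of the original design.

    import hashlib

    D = 2  # number of dimensions (illustrative choice)

    def hash_to_point(key, d=D):
        """Map a key to d coordinates in [0, 1) using d independent hashes.
        The concrete hash construction is an assumption; CAN only requires
        some uniform d-valued hash function hash(K) = (x1, ..., xd)."""
        coords = []
        for i in range(d):
            digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
            coords.append(int.from_bytes(digest[:8], "big") / 2**64)
        return tuple(coords)

    def torus_distance(p, q):
        """Cartesian (Euclidean) distance on the unit d-torus: every coordinate
        difference wraps around, which is why 0.5 + 0.7 = 0.2 in one dimension."""
        total = 0.0
        for a, b in zip(p, q):
            diff = abs(a - b)
            diff = min(diff, 1.0 - diff)  # wrap-around on the torus
            total += diff * diff
        return total ** 0.5

    print(hash_to_point("myfile.mp3"))
    print(torus_distance((0.9, 0.5), (0.1, 0.5)))  # 0.2, thanks to wrap-around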

  10. CAN architecture 2: CAN Nodes. • A Node is a machine in the network. • A Node is not a Peer. • A Node stores a chunk of the index (hash table). Nodes own Zones: • Every Node owns one distinct Zone. • A Node stores the piece of the hash table and all objects ((K,V) pairs) that belong to its Zone. • All Nodes together cover the whole space (the whole hash table).

  11. CAN architecture 3: Neighbors in CAN. Two nodes are neighbors if their Zones overlap along d-1 dimensions and abut along the remaining dimension. • A Node knows the IP addresses of all its neighbor Nodes. • A Node knows the Zone coordinates of all its neighbors. • A Node can communicate only with its neighbors. (A sketch of the neighbor test follows.)
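
Below is a minimal sketch of that neighbor test, assuming (as in the previous sketch) that a Zone is represented as a list of (lo, hi) intervals, one per dimension, on the unit torus; the representation and helper names are assumptions for illustration.

    def _overlaps(a, b):
        """True if two 1-d intervals share more than a single point."""
        return a[0] < b[1] and b[0] < a[1]

    def _abuts(a, b, size=1.0):
        """True if two 1-d intervals touch end-to-end (with wrap-around)."""
        return (a[1] == b[0] or b[1] == a[0]
                or a[1] % size == b[0] or b[1] % size == a[0])

    def are_neighbors(zone_a, zone_b):
        """CAN neighbor test: overlap along d-1 dimensions, abut along one."""
        overlapping = sum(_overlaps(a, b) for a, b in zip(zone_a, zone_b))
        abutting = sum((not _overlaps(a, b)) and _abuts(a, b)
                       for a, b in zip(zone_a, zone_b))
        return overlapping == len(zone_a) - 1 and abutting == 1

    # Example: two zones in a 2-d space that share a vertical border.
    left = [(0.0, 0.5), (0.0, 0.5)]
    right = [(0.5, 1.0), (0.0, 0.5)]
    print(are_neighbors(left, right))  # True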

  13. CAN architecture: Access. How to get access to the CAN system: 1. The CAN has an associated DNS domain. 2. The CAN domain name is resolved by DNS to the Bootstrap servers' IP addresses. 3. A Bootstrap is a special CAN Node that holds only a list of several Nodes currently in the system. User scenario: 1. A user who wants to join the system sends a request using the CAN domain name. 2. The DNS domain redirects it to one of the Bootstraps. 3. The Bootstrap sends a list of Nodes to the user. 4. The user chooses one of them and establishes a connection. This 3-level access scheme reduces the failure probability: the DNS domain only redirects requests, there are many Bootstraps, and there are many Nodes in each Bootstrap's list. (A sketch of the flow follows.)
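
A minimal sketch of that user scenario; every name here (resolve_bootstraps, fetch_node_list, the example addresses) is a hypothetical placeholder, since the slides describe the flow rather than a concrete API.

    import random

    def resolve_bootstraps(can_domain):
        """Stand-in for the DNS step: the CAN domain name resolves to the
        IP addresses of one or more Bootstrap servers (addresses are made up)."""
        return ["198.51.100.7", "198.51.100.8"]

    def fetch_node_list(bootstrap_ip):
        """Stand-in for asking a Bootstrap for a few Nodes currently in the CAN."""
        return ["203.0.113.10", "203.0.113.11", "203.0.113.12"]

    def join_can(can_domain):
        bootstraps = resolve_bootstraps(can_domain)   # steps 1-2: DNS to Bootstraps
        bootstrap = random.choice(bootstraps)         # any Bootstrap will do
        nodes = fetch_node_list(bootstrap)            # step 3: Bootstrap sends a Node list
        return random.choice(nodes)                   # step 4: pick one Node and connect

    print(join_can("can.example.org"))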

  14. CAN: routing algorithm. 1. Start from some Node. 2. P = hash value of the Key. 3. Greedy forwarding; the current Node: • checks whether it or one of its neighbors contains the point P; • if not, it orders the neighbors by the Cartesian distance between them and the point P, forwards the search request to the closest one, and repeats this step; • otherwise, the answer (the (Key, Value) pair) is sent to the user. (A code sketch follows.)

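A minimal sketch of this greedy forwarding, using the Zone representation from the earlier sketches. The Node class is an assumption for illustration, and measuring distance to a neighbor's Zone center is a simplification of "the neighbor closest to P".

    class Node:
        def __init__(self, name, zone, store=None):
            self.name = name
            self.zone = zone          # list of (lo, hi) intervals, one per dimension
            self.neighbors = []       # list of neighboring Node objects
            self.store = store or {}  # the (Key, Value) pairs this Node owns

        def contains(self, p):
            return all(lo <= x < hi for x, (lo, hi) in zip(p, self.zone))

        def center(self):
            return tuple((lo + hi) / 2 for lo, hi in self.zone)

    def torus_distance(p, q, size=1.0):
        # Cartesian distance on the torus: coordinate differences wrap around.
        return sum(min(abs(a - b), size - abs(a - b)) ** 2
                   for a, b in zip(p, q)) ** 0.5

    def route(start, p, max_hops=1000):
        """Greedy forwarding: stop when the current Node or one of its neighbors
        owns point P; otherwise forward to the neighbor closest to P."""
        current = start
        for _ in range(max_hops):
            if current.contains(p):
                return current
            owner = next((n for n in current.neighbors if n.contains(p)), None)
            if owner is not None:
                return owner
            current = min(current.neighbors,
                          key=lambda n: torus_distance(n.center(), p))
        raise RuntimeError("routing did not converge")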

  23. CAN: routing algorithm. The average path length is the average number of hops needed to reach the destination Node, in the case when: • all Zones have the same volume; • no Node has crashed. (For n equal Zones in d dimensions, the CAN paper gives an average path length of (d/4) * n^(1/d) hops.) For the example grid shown on the slide: Total path length = 0 * 1 + 1 * 2d + 2 * 4d + 3 * 6d + 4 * 7d + 5 * 6d + 6 * 4d + 7 * 2d + 8 * 1. (A small numeric check follows.)

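A small sketch, under the same equal-Zone assumption, that measures the average hop count on a d-dimensional torus grid by brute force and compares it with the (d/4) * n^(1/d) figure; the grid model (one Zone per grid cell, hops = wrapped grid distance) is a simplification for illustration.

    from itertools import product

    def avg_path_length(side, d):
        """Average hop count from one Zone to all side**d equal Zones on a
        d-dimensional torus grid (wrapped grid distance per dimension)."""
        total, count = 0, 0
        for cell in product(range(side), repeat=d):
            total += sum(min(c, side - c) for c in cell)
            count += 1
        return total / count

    d, side = 2, 8
    n = side ** d
    print(avg_path_length(side, d))   # measured: 4.0
    print((d / 4) * n ** (1 / d))     # CAN estimate (d/4) * n^(1/d): 4.0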


  30. CAN: routing algorithm. Fault-tolerant routing: 1. Start from some Node. 2. P = hash value of the Key. 3. Greedy forwarding: • before sending the request, the current Node checks its neighbors' availability; • the request is sent to the best available Node. The destination Node will be reached if there exists at least one path to it. (A sketch follows.)
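
A minimal variation of the routing sketch above for the fault-tolerant case: unreachable neighbors are simply skipped. It reuses the hypothetical Node class and torus_distance helper from that sketch, and is_available stands in for whatever liveness check (e.g. recent heartbeat messages) the system uses.

    def route_fault_tolerant(start, p, is_available, max_hops=1000):
        """Greedy forwarding over available neighbors only."""
        current = start
        for _ in range(max_hops):
            if current.contains(p):
                return current
            candidates = [n for n in current.neighbors if is_available(n)]
            if not candidates:
                raise RuntimeError("no available neighbor; routing fails here")
            owner = next((n for n in candidates if n.contains(p)), None)
            if owner is not None:
                return owner
            current = min(candidates,
                          key=lambda n: torus_distance(n.center(), p))
        raise RuntimeError("routing did not converge")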

  31. CAN construction: New Node arrival 1. A New Node (a server on the Internet) wants to join the system and share a piece of the hash table. • The New Node needs to get access to the CAN. • The system should allocate a piece of the hash table to the New Node. • The New Node should start working in the system, i.e. provide routing. 1. Finding an access point: the New Node uses the access algorithm described earlier: • it sends a request to the CAN domain name; • gets the IP address of one of the Nodes currently in the system; • connects to this Node.

  32. CAN construction: New Node arrival 2. 2. Finding a Zone: 1. The New Node randomly chooses a point P. 2. A JOIN request is sent to the Node that owns P. 3. The request is forwarded via CAN routing. 4. The destination Node (the owner of P) splits its Zone in half: one half is assigned to the New Node, the other half stays with the Old Node. 5. The Zone is split along only one dimension: the greatest dimension with the lowest order. 6. The hash table contents associated with the New Node's Zone are moved from the Old Node to the New Node. (A sketch of the split follows.)

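A minimal sketch of steps 4-6, using the Zone representation assumed earlier. "Greatest dimension with the lowest order" is interpreted here as: cut the longest side, preferring the lowest dimension index on ties; that interpretation, like the helper names, is an assumption.

    def split_zone(zone):
        """Return (old_half, new_half): the Zone cut in half along one dimension."""
        lengths = [hi - lo for lo, hi in zone]
        dim = lengths.index(max(lengths))      # lowest index among the longest sides
        lo, hi = zone[dim]
        mid = (lo + hi) / 2
        old_half, new_half = list(zone), list(zone)
        old_half[dim] = (lo, mid)
        new_half[dim] = (mid, hi)
        return old_half, new_half

    def transfer_contents(store, new_zone, hash_to_point):
        """Move the (K, V) pairs whose hashed point now falls in the new Zone;
        hash_to_point is a d-valued hash like the one sketched earlier."""
        moved = {}
        for key in list(store):
            p = hash_to_point(key)
            if all(lo <= x < hi for x, (lo, hi) in zip(p, new_zone)):
                moved[key] = store.pop(key)
        return moved

    old, new = split_zone([(0.0, 1.0), (0.0, 0.5)])
    print(old, new)  # [(0.0, 0.5), (0.0, 0.5)] [(0.5, 1.0), (0.0, 0.5)]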

  37. CAN construction: New Node arrival 3. 3. Joining the routing: 1. The New Node gets a list of neighbors from the Old Node (the old owner of the split Zone). 2. The Old Node refreshes its list of neighbors: • removes the lost neighbors; • adds the New Node. 3. All neighbors get a message to update their neighbor lists: • remove the Old Node where it is no longer a neighbor; • add the New Node. (A sketch of this bookkeeping follows.)

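A minimal sketch of those neighbor-list updates, reusing the hypothetical Node class and the are_neighbors test from the earlier sketches; it illustrates the bookkeeping, not the slides' exact message protocol.

    def join_routing(old_node, new_node):
        # 1. The New Node starts from the Old Node's neighbor list (plus the Old
        #    Node itself) and keeps only the Nodes that border its new Zone.
        candidates = old_node.neighbors + [old_node]
        new_node.neighbors = [n for n in candidates
                              if are_neighbors(new_node.zone, n.zone)]
        # 2. The Old Node refreshes its own list against its shrunken Zone
        #    and adds the New Node.
        old_node.neighbors = [n for n in old_node.neighbors
                              if are_neighbors(old_node.zone, n.zone)]
        old_node.neighbors.append(new_node)
        # 3. Every former neighbor drops the Old Node if it no longer borders it
        #    and adds the New Node if it now does.
        for n in candidates:
            if n is old_node:
                continue
            n.neighbors = [m for m in n.neighbors
                           if are_neighbors(n.zone, m.zone)]
            if are_neighbors(n.zone, new_node.zone):
                n.neighbors.append(new_node)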

  43. CAN construction: Node departure 1. 1. Node departure: a. If the Zone of one of the neighbors can be merged with the departing Node's Zone to produce a valid Zone, that neighbor takes over the merged Zone. b. Otherwise, one of the neighbors temporarily handles the two separate Zones. • In both cases (a and b): • the data from the departing Node is moved to the receiving Node; • the receiving Node updates its neighbor list; • all of their neighbors are notified about the changes and update their neighbor lists. (A sketch of the merge test follows.)
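
A minimal sketch of the "can these two Zones be merged into a valid Zone?" test, with "valid" approximated as "the union is again a single box"; the Zone representation and this reading of the square-shape rule from slide 9 are assumptions.

    def can_merge(zone_a, zone_b):
        """True if the two Zones differ along exactly one dimension and abut there."""
        differing = [i for i, (a, b) in enumerate(zip(zone_a, zone_b)) if a != b]
        if len(differing) != 1:
            return False
        (alo, ahi), (blo, bhi) = zone_a[differing[0]], zone_b[differing[0]]
        return ahi == blo or bhi == alo

    def merge(zone_a, zone_b):
        """Union of two mergeable Zones: extend the one dimension where they differ."""
        i = [k for k, (a, b) in enumerate(zip(zone_a, zone_b)) if a != b][0]
        merged = list(zone_a)
        merged[i] = (min(zone_a[i][0], zone_b[i][0]),
                     max(zone_a[i][1], zone_b[i][1]))
        return merged

    a = [(0.0, 0.5), (0.0, 0.5)]
    b = [(0.5, 1.0), (0.0, 0.5)]
    print(can_merge(a, b), merge(a, b))  # True [(0.0, 1.0), (0.0, 0.5)]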

  44. CAN construction: Node departure 2. A Node has crashed: • Periodically, every Node sends a message to all its neighbors. • If a Node does not receive a message from one of its neighbors for a period of time t, it starts the TAKEOVER mechanism. • It sends a TAKEOVER message to each neighbor of the crashed Node (the neighbor that did not send a periodic message). • A neighbor that receives the message compares its own Zone with the sender's Zone; if its Zone is smaller, it sends a new TAKEOVER message to all of the crashed Node's neighbors. • The crashed Node's Zone is taken over by the Node that does not get an answer to its message within the period t. • Data stored on the crashed Node is unavailable until its original sources refresh the CAN state. (A sketch of the takeover decision follows.)
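
Timers and messages aside, the takeover decision boils down to "the neighbor with the smallest Zone wins"; a minimal sketch under the same Zone representation as before follows.

    from math import prod

    def zone_volume(zone):
        return prod(hi - lo for lo, hi in zone)

    def takeover_winner(crashed_neighbors):
        """Among the crashed Node's neighbors, the one owning the smallest Zone
        takes over the crashed Zone (ties are broken arbitrarily here)."""
        return min(crashed_neighbors, key=lambda node: zone_volume(node.zone))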

  45. CAN problems. The basic CAN architecture achieves: • scalability and distribution of state; • increased data availability (compared to Napster and Gnutella). Main remaining problems: • Routing latency: path latency (the average number of hops per path) and hop latency (the average real duration of one hop). • Increasing fault tolerance further. • Increasing data availability further.

  46. Content: 1. Basic architecture (a. Data Model, b. CAN Routing, c. CAN construction); 2. Architecture improvements: • Path Latency Improvement • Hop Latency Improvement • Mixed approaches • Construction Improvement; 3. Summary

  47. Path latency improvements 1: Realities (multiple coordinate spaces). • Each Node maintains multiple (r) coordinate spaces; each coordinate space is called a Reality. • All Realities have the same number of Zones, the same data and the same hash function. • Every Node owns a different Zone in each Reality; all Zones are chosen randomly. • The contents of the hash table are replicated in every Reality.

  48. Path latency improvements 2: the extended routing algorithm for Realities. 1. The destination Zone is the same in all Realities (the hash value, and hence the target point, does not change). 2. Each Zone can be owned by many Nodes (one per Reality). 3. Routing uses the basic algorithm with the following extensions: a. every Node on the path checks in which of its Realities its distance to the destination is the smallest; b. the request is forwarded in that best Reality. (A sketch follows.)

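A minimal sketch of the per-hop Reality choice; zones_by_reality (a map from Reality index to the Zone this Node owns there) and the distance helper are assumptions consistent with the earlier sketches.

    def _dist(p, q):
        # wrapped Euclidean distance on the unit torus
        return sum(min(abs(a - b), 1 - abs(a - b)) ** 2
                   for a, b in zip(p, q)) ** 0.5

    def best_reality(zones_by_reality, p):
        """Pick the Reality in which this Node's Zone center lies closest to the
        destination point p; the request is then forwarded within that Reality."""
        def center(zone):
            return tuple((lo + hi) / 2 for lo, hi in zone)
        return min(zones_by_reality,
                   key=lambda r: _dist(center(zones_by_reality[r]), p))

    # Example: a Node owning different Zones in two Realities.
    zones = {0: [(0.0, 0.5), (0.0, 0.5)], 1: [(0.5, 1.0), (0.5, 1.0)]}
    print(best_reality(zones, (0.9, 0.9)))  # Reality 1 is closer to the target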
