
CAN


terah

Presentation Transcript


  1. CAN • Distributed Hash Tables • DHT recap • Uses • Example – CAN

  2. What are DHTs? • A DHT is a topology that provides similar functionality to a typical hash table. • put(key, value) • get(key) • Peers are buckets in the table • with their own local hash tables • Allows a peer to publish a resource onto a network using a key to determine where the data will be stored (i.e. which peer will receive the data). • Using keys presupposes a logical ‘space’ which the keys map onto. • The key is mapped to the space using a hashing function to ensure equal distribution of resources across the network. • Nodes are responsible for sections of this space.
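The put/get interface on slide 2 can be sketched in a few lines. This is a minimal single-process toy, not a networked implementation: the class name `ToyDHT`, the choice of SHA-1, and the equal-width split of a one-dimensional key space are all assumptions made for illustration.

```python
import hashlib


class ToyDHT:
    """Toy DHT: every peer is a bucket with its own local hash table."""

    def __init__(self, peers):
        self.peers = list(peers)
        self.tables = {name: {} for name in self.peers}

    def _owner(self, key):
        # Hash the key to a 160-bit integer, then map it onto one of
        # len(peers) equal sections of the key space (assumed layout).
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        value = int.from_bytes(digest, "big")
        index = value * len(self.peers) // 2 ** 160
        return self.peers[index]

    def put(self, key, value):
        # The key alone decides which peer stores the data.
        self.tables[self._owner(key)][key] = value

    def get(self, key):
        # The same hash leads any caller to the same peer.
        return self.tables[self._owner(key)].get(key)


dht = ToyDHT(["A", "B", "C", "D"])
dht.put("song.mp3", "held by peer 10.0.0.7")
print(dht.get("song.mp3"))
```

Because the owner is computed from the key, a lookup either finds the value or reaches the peer where it would have been stored.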

  3. Why DHTs? • Address the flooding issue without resorting to a hybrid centralized/decentralized architecture. • no super peers, no power law distribution • Typically search can be achieved in O(log n) hops where n is the number of nodes in the network. • only a few neighbors need to be known, typically O(log n) • small neighborhoods and a flat topology make for a robust network that handles churn easily. • DHTs guarantee locating a file (or where it should be) • deterministic • unlike unstructured systems such as Gnutella, KaZaA

  4. DHT Uses • Not just file sharing • BitTorrent trackers in a DHT • super node-like caching (Skype possibly does this) • location-independent naming service • typically DHTs are used as a backbone of relatively powerful nodes supporting weaker nodes • because DHTs are flat and presume equality of capability.

  5. Content Addressable Network - CAN • CAN stands for ‘content-addressable network’. • A network that provides a routing overlay on top of a physical network to optimise publishing and searching for data. • Addresses the ‘flooding’ issue. • CAN is based on the Distributed Hash Table (DHT) concept.

  6. How does CAN work? • The CAN space is defined as a d-dimensional Cartesian coordinate space. • At any given time the entire coordinate space is divided amongst the nodes in the system. • Each node owns its own distinct zone within the overall space. • A uniform hashing algorithm is used to map a key to a point in the coordinate space • k -> coord{x, y, z} • the hash itself is not specified by CAN • as long as it has the properties of a uniform hash, i.e., even distribution. • Messages contain the destination coordinates • and are routed to the peer whose zone contains those coordinates

  7. Example 2-D Space (unit square with corners 0,0 1,0 0,1 1,1; each zone given as x-range, y-range) • A (0-0.5, 0-0.5) • B (0-0.5, 0.5-1) • C (0.5-1, 0-0.5) • D (0.5-0.75, 0.5-1) • E (0.75-1, 0.5-1)
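As a rough illustration of slides 6 and 7, the sketch below hashes a key to a point and looks up which of the five example zones contains it. CAN does not specify the hash, so SHA-256 and the four-bytes-per-coordinate scheme are assumptions, as are the function names `key_to_point` and `owner_of`.

```python
import hashlib

# The five zones of the example 2-D space: (x-range, y-range).
ZONES = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "D": ((0.5, 0.75), (0.5, 1.0)),
    "E": ((0.75, 1.0), (0.5, 1.0)),
}


def key_to_point(key, d=2):
    """Map a key to a point in [0, 1)^d (assumed scheme, d <= 8)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    coords = []
    for i in range(d):
        # Each coordinate uses its own 4 bytes of the digest.
        part = int.from_bytes(digest[4 * i:4 * i + 4], "big")
        coords.append(part / 2 ** 32)
    return tuple(coords)


def owner_of(point):
    """Return the name of the zone whose ranges contain the point."""
    for name, zone in ZONES.items():
        if all(lo <= x < hi for (lo, hi), x in zip(zone, point)):
            return name
    raise ValueError("point lies outside the coordinate space")


point = key_to_point("song.mp3")
print(point, "is stored at node", owner_of(point))
```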

  8. Neighbours • A node is considered a neighbour if its zone overlaps along d-1 dimensions and abuts along one dimension. • A node maintains info about its neighbours – a contact address and its zone coordinates. • An evenly divided space means each node has 2d neighbours

  9. Neighbours (torus: the space wraps) • A (B, C) • B (A, D, E) • C (A, D, E) • D (B, E, C) • E (B, D, C)
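The neighbour rule from slide 8 can be checked against the example zones. The sketch below is one possible reading of "overlap along d-1 dimensions and abut along one", with wrap-around at the edges of the unit square; the helper names are hypothetical.

```python
# The five zones of the example 2-D space: (x-range, y-range).
ZONES = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "D": ((0.5, 0.75), (0.5, 1.0)),
    "E": ((0.75, 1.0), (0.5, 1.0)),
}


def spans_overlap(a, b):
    """True if two ranges share more than a single boundary point."""
    return a[0] < b[1] and b[0] < a[1]


def spans_abut(a, b):
    """True if two ranges touch end to start; 1.0 wraps round to 0.0."""
    return a[1] % 1.0 == b[0] % 1.0 or b[1] % 1.0 == a[0] % 1.0


def are_neighbours(z1, z2):
    """Abut along one dimension and overlap along all the others."""
    dims = range(len(z1))
    return any(
        spans_abut(z1[i], z2[i])
        and all(spans_overlap(z1[j], z2[j]) for j in dims if j != i)
        for i in dims
    )


neighbours = {
    name: sorted(
        other for other in ZONES
        if other != name and are_neighbours(ZONES[name], ZONES[other])
    )
    for name in ZONES
}
for name, others in neighbours.items():
    print(name, others)
```

Diagonal zones such as A and D touch only at a corner, so they fail the overlap test and are not neighbours.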

  10. Routing • Routing happens by following a straight line path through the Cartesian space from source to destination coordinates. • A message with destination point P is routed to the neighbour whose zone coordinates are closest to P. • There are multiple paths available at any point.

  11. Routing [Figure: a message routed from a source point P(x,y) to a destination point P(x,y)]
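Greedy routing as described on slide 10 might look like the following sketch. It reuses the example zones and the neighbour lists from slide 9. Measuring "closest" as the wrap-around distance from the destination point to the nearest edge of a zone is an assumption, as are the function names and the hop limit.

```python
import math

ZONES = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "D": ((0.5, 0.75), (0.5, 1.0)),
    "E": ((0.75, 1.0), (0.5, 1.0)),
}
NEIGHBOURS = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "D", "E"],
    "D": ["B", "E", "C"],
    "E": ["B", "D", "C"],
}


def contains(zone, point):
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))


def axis_distance(x, lo, hi):
    """Distance on the unit circle from x to the range [lo, hi)."""
    if lo <= x < hi:
        return 0.0
    return min(min(abs(x - edge), 1.0 - abs(x - edge)) for edge in (lo, hi))


def zone_distance(zone, point):
    return math.sqrt(
        sum(axis_distance(x, lo, hi) ** 2 for (lo, hi), x in zip(zone, point))
    )


def route(start, point, max_hops=10):
    """Forward to the closest neighbour until the owner of point is reached."""
    path = [start]
    while not contains(ZONES[path[-1]], point):
        if len(path) > max_hops:
            raise RuntimeError("hop limit exceeded")
        nearest = min(
            NEIGHBOURS[path[-1]],
            key=lambda name: zone_distance(ZONES[name], point),
        )
        path.append(nearest)
    return path


print(route("A", (0.8, 0.6)))
```

Each node needs only its own neighbour table to pick the next hop, which is why no node has to know the whole space.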

  12. Joining the CAN • To join the CAN, as with many other systems, a node needs a bootstrap node, i.e. the address and coords of a node already in the system. • When a new node wants to join it randomly chooses a point in the coordinate system. • The message is routed to the node whose coordinate space contains the point. • That node splits its space in half, keeping one half and handing over the other to the new node.

  13. …Joining • 9 wants to join. It finds a bootstrap node (out of band). Let’s say it’s 5. • 9 chooses a point (x,y) in the space. The bootstrap node initiates the routing. • Node 1’s zone contains point (x,y). • Node 1 splits and hands over half its zone and the relevant neighbours to 9. [Figure: the space with zones labelled 1, 2, 3, 4, 5 (bootstrap node), 6 and 8, the point (x,y) inside zone 1, and the new node 9]
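The split step of a join can be sketched as follows. The slides say only that the owner splits its zone in half, so two details here are assumptions: the split is made along the zone's widest dimension, and the existing owner keeps the lower half. Neighbour hand-over is not modelled, and the function names are hypothetical.

```python
def contains(zone, point):
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))


def split_zone(zone):
    """Halve a zone along its widest dimension (assumed policy)."""
    widths = [hi - lo for lo, hi in zone]
    dim = widths.index(max(widths))
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    lower = tuple((lo, mid) if i == dim else span for i, span in enumerate(zone))
    upper = tuple((mid, hi) if i == dim else span for i, span in enumerate(zone))
    return lower, upper


def join(zones, new_node, point):
    """Split the zone that contains point and give one half to new_node."""
    owner = next(name for name, zone in zones.items() if contains(zone, point))
    kept, handed_over = split_zone(zones[owner])
    zones[owner] = kept
    zones[new_node] = handed_over
    return owner


# Start with a single node owning the whole space, then add two more.
zones = {"1": ((0.0, 1.0), (0.0, 1.0))}
join(zones, "2", (0.7, 0.2))
join(zones, "3", (0.2, 0.8))
for name, zone in zones.items():
    print(name, zone)
```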

  14. CAN Structure • When a node wants to leave the network it finds a neighbour it can merge its zone with. • Because the coordinate space is recursively divided in half, the network can be thought of as a binary tree in which every network node/zone is a leaf of the tree. • Internal vertices are previously partitioned zones in a particular dimension. • If a node takes over a sibling’s zone, both child nodes of the tree are merged and become their parent.

  15. CAN Seen as a Tree [Figure: binary partition tree with nodes labelled 1, 3, 4, 2]

  16. Scalability • The number of neighbours maintained by a node is a function of the number of dimensions, not the overall size of the CAN, so more dimensions mean more neighbours. • However, more dimensions also mean shorter routing paths. As the number of nodes n increases, the routing paths grow as O(n^(1/d)). • i.e. there is a tradeoff between neighbour numbers (and hence maintenance overhead) and path length
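To make the tradeoff concrete, the sketch below tabulates neighbour count against path length for a fixed network size. It assumes perfectly equal zones and uses the average path length of (d/4) * n^(1/d) hops given in the original CAN paper; the function names are hypothetical.

```python
def neighbours_per_node(d):
    """An evenly divided d-dimensional space gives each node 2d neighbours."""
    return 2 * d


def average_path_hops(n, d):
    """Average routing path length for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)


n = 10_000
print("d  neighbours  average hops")
for d in (2, 3, 4, 6):
    print(d, neighbours_per_node(d), round(average_path_hops(n, d), 1))
```

Raising d adds two neighbours per dimension while shrinking the exponent on n, so the path length falls quickly at first and then levels off.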

  17. Maintenance of the CAN • Nodes send each other periodic update messages. • zone coords, list of neighbour nodes and their coords • If a node doesn’t hear from a neighbour after a given amount of time, it initiates a TAKEOVER. • starts a timer proportional to its own zone size • the bigger its zone, the longer the timer • All the neighbours of the dead node do this. • The one with the shortest timer times out first and tells the dead node’s other neighbours. • It then tries to merge its zone with the dead node’s zone. • If a complete zone cannot be made through merging, the neighbour with the smallest zone temporarily looks after two zones. • A zone can be merged with another if its coords overlap in d-1 dimensions, the zones abut along the remaining dimension, and they have an equal width.
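The takeover race and the merge test can be sketched together. In this example node E from the example space has died and its neighbours B, C and D each start a timer. The timer constant, the function names, and the strict reading of the merge rule (identical ranges in d-1 dimensions, abutting and equally wide in the remaining one) are assumptions.

```python
from math import prod


def zone_volume(zone):
    return prod(hi - lo for lo, hi in zone)


def takeover_timers(neighbour_zones, seconds_per_unit_volume=10.0):
    """Each neighbour's timer grows with the size of its own zone."""
    return {
        name: seconds_per_unit_volume * zone_volume(zone)
        for name, zone in neighbour_zones.items()
    }


def merge(z1, z2):
    """Return the merged zone, or None if the two cannot form one zone."""
    differing = [i for i, (a, b) in enumerate(zip(z1, z2)) if a != b]
    if len(differing) != 1:
        return None
    i = differing[0]
    (lo1, hi1), (lo2, hi2) = z1[i], z2[i]
    if hi1 != lo2 and hi2 != lo1:
        return None  # the zones do not abut
    if hi1 - lo1 != hi2 - lo2:
        return None  # the zones are not equally wide
    merged = list(z1)
    merged[i] = (min(lo1, lo2), max(hi1, hi2))
    return tuple(merged)


dead_zone = ((0.75, 1.0), (0.5, 1.0))  # zone E
alive = {
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "D": ((0.5, 0.75), (0.5, 1.0)),
}
timers = takeover_timers(alive)
winner = min(timers, key=timers.get)
print(winner, "takes over; merged zone:", merge(alive[winner], dead_zone))
```

Tying the timer to zone size means the least loaded neighbour tends to win the race.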

  18. CAN Repair Algorithm • Most easily envisaged using the binary tree conception. • If a node finds itself with a zone (call it zone B) that it cannot merge with its existing zone (A): • check if B has a sibling that is also a leaf • this is easy: merge • if not, perform a depth-first search down the subtree rooted at B’s sibling, until two sibling leaves are found. • then merge those as usual • (this leaves one node without a zone) • hand over B to the node left without a zone after merging

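  19. Tree Again • depth-first search for a node with a sibling [Figure: partition tree with zones A and B marked]

  20. Tree Again • merge siblings into parent node [Figure: partition tree with zones A and B marked]

  21. Tree Again • assign empty node to unoccupied zone [Figure: partition tree with zones A and B marked]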
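The repair steps on slides 18 to 21 can be sketched on a small partition tree. The `Node` class and function names are hypothetical, and the depth-first search is read as running over the subtree under B's sibling, since B itself is a leaf.

```python
class Node:
    """Vertex of the partition tree; only leaves carry an owner."""

    def __init__(self, owner=None, left=None, right=None):
        self.owner = owner
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self

    def is_leaf(self):
        return self.left is None and self.right is None


def sibling(node):
    parent = node.parent
    return parent.right if parent.left is node else parent.left


def merge_children(parent, owner):
    """Collapse two sibling leaves into their parent, now owned by owner."""
    parent.left = parent.right = None
    parent.owner = owner


def find_sibling_leaves(node):
    """Depth-first search for a vertex whose two children are both leaves."""
    if node.is_leaf():
        return None
    if node.left.is_leaf() and node.right.is_leaf():
        return node
    return find_sibling_leaves(node.left) or find_sibling_leaves(node.right)


def repair(zone_b):
    """Find a permanent owner for leaf zone_b and return that owner's name."""
    sib = sibling(zone_b)
    if sib.is_leaf():
        # Easy case: the sibling's owner absorbs B.
        merge_children(zone_b.parent, sib.owner)
        return sib.owner
    pair = find_sibling_leaves(sib)
    keeper, freed = pair.left.owner, pair.right.owner
    merge_children(pair, keeper)  # keeper now owns the merged zone
    zone_b.owner = freed          # the freed node takes over B
    return freed


# n1 holds zone A and is temporarily looking after zone B as well.
zone_b = Node(owner="n1")
deep_pair = Node(left=Node(owner="n2"), right=Node(owner="n3"))
root = Node(left=zone_b, right=Node(left=Node(owner="n1"), right=deep_pair))
print("zone B is handed to", repair(zone_b))
```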

  22. Resource Availability • If a node fails, then the resources go with it. • So entities that publish need to periodically refresh the data. • Resource duplication is another mechanism. This can be done in two ways: • Overloading coordinate zones • multiple nodes share zones and the resources mapped to them. • They know about each other • data can be either replicated or partitioned • new peers can choose neighbours with a lower latency • Multiple ‘realities’…

  23. Realities Realities can be ‘stacked’ so a node maintains different zones in different realities concurrently. This allows for duplication of resources and shorter routing paths – i.e. in a 2-D coordinate system routing can happen ‘horizontally’ and ‘vertically’.

  24. Further Design Extensions • Caching • because of multiple paths to a key, caches can grow with the popularity of a key • i.e. if I route a particular key many times, I can decide to get the data and store it myself. • Location awareness • use landmarks, e.g. DNS root name servers • measure Round Trip Time (RTT) • order nodes according to RTT • when joining, choose a node whose landmark ordering is similar • means close nodes make up neighbourhoods
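The landmark idea can be sketched as follows. The RTT values, landmark names, and node names are invented for illustration, and scoring similarity as the number of positions where two orderings agree is an assumption, not something the slides specify.

```python
def landmark_ordering(rtts_ms):
    """Return the landmark names sorted by increasing round-trip time."""
    return tuple(sorted(rtts_ms, key=rtts_ms.get))


def similarity(order_a, order_b):
    """Count the positions at which two orderings name the same landmark."""
    return sum(a == b for a, b in zip(order_a, order_b))


def choose_join_target(my_rtts, candidates):
    """Pick the candidate whose landmark ordering best matches our own."""
    mine = landmark_ordering(my_rtts)
    return max(
        candidates,
        key=lambda name: similarity(mine, landmark_ordering(candidates[name])),
    )


# Hypothetical RTT measurements in milliseconds to three landmarks.
my_rtts = {"L1": 20, "L2": 80, "L3": 150}
candidates = {
    "n1": {"L1": 25, "L2": 70, "L3": 160},
    "n2": {"L1": 140, "L2": 90, "L3": 30},
}
print("join near", choose_join_target(my_rtts, candidates))
```

Nodes that see the landmarks in the same order are likely to be close in the underlying network, so joining near such a node keeps neighbourhoods local.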
