A presentation by Group F: Implementing a DHT-Based Load Balancing Solution for the Hadoop Distributed File System. Team members: • Anupam Gangotia • Sri Saran Balaji • Raajamaathangi Kumar • Tejashree Gargoti
Table of Contents • Introduction and Paper 1: presented by Ms. Raajamaathangi Kumar • Papers 2 and 2b: presented by Mr. Sri Saran Balaji • Paper 3: presented by Mr. Anupam Gangotia • Paper 4, summary of the topic and concluding note: presented by Ms. Tejashree Gargoti
Distributed Hash Table • A DHT is a structured peer-to-peer network: it controls where objects are placed and uses directed search protocols to find them. • Many DHTs exist; they differ in the rules they use to associate objects with nodes and in their routing and lookup protocols. • Problem with DHTs: they are not flexible about data placement, which leads to load imbalance because of non-uniform file sizes, time-varying file popularity, and heterogeneous node capacities.
Paper 1: Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems. Overview • The paper introduces two new protocols that refine the consistent hashing used in Chord. • Consistent hashing maps items (files) and nodes (machines) onto a common address space. 1) The first protocol balances the distribution of the key address space among nodes, so the system stays balanced despite the DHT's random mapping. 2) The second protocol balances the distribution of items among nodes.
DHT and load balancing • How is a DHT different from a traditional hash table? • A DHT allows buckets (nodes) to be inserted into and deleted from the network, and each node keeps its part of the hash table up to date as these changes happen. • It uses a routing protocol to keep each node's routing table up to date.
Load balancing in DHT • Load balancing in a DHT works as follows: • The DHT address of each item is randomized with a strong hash function. • Each node is made responsible only for a balanced portion of the address space.
DHT load balancing may fail in the following ways: • Randomization does not split the address space perfectly evenly, so some nodes end up with a larger portion of it. • Some applications, such as database systems, need items stored in a specific order (for example, to support range searches), which this scheme cannot provide.
Load balancing problems • Problem: the address space is not partitioned evenly, so some machines get a larger portion of it. • Address space balancing solution • DHTs use virtual nodes: each machine pretends to be several distinct machines, each participating independently in the DHT protocol. A machine's load is the sum over its virtual nodes, which keeps it close to the average load. • Drawbacks of virtual nodes: - extra space for routing data structures - checking the liveness of, and maintaining, a large number of virtual nodes consumes network bandwidth
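A minimal Python sketch of the address-space imbalance and the virtual-node fix described above. It assumes a 32-bit ring, SHA-1 as the "strong hash function", and hypothetical virtual-node labels of the form `machine#v`; none of these choices come from the paper. Each ring position owns the arc back to its predecessor, as in Chord, and a machine's share is summed over all of its virtual nodes.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # assumption: a small ring keeps the numbers readable
RING_SIZE = 2 ** RING_BITS


def ring_position(label: str) -> int:
    """Map a label onto the ring with SHA-1, reduced modulo RING_SIZE."""
    digest = hashlib.sha1(label.encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def successor(positions: list[int], key: int) -> int:
    """Index of the first position >= key, wrapping around; positions must be sorted."""
    return bisect_left(positions, key) % len(positions)


def address_share(machines: list[str], vnodes: int = 1) -> dict[str, float]:
    """Fraction of the address space each machine owns when it runs `vnodes` virtual nodes.

    A virtual node at position p owns the arc (predecessor, p], as in Chord.
    """
    owners = {}
    for m in machines:
        for v in range(vnodes):
            owners[ring_position(f"{m}#{v}")] = m  # hypothetical virtual-node label
    positions = sorted(owners)
    share = {m: 0.0 for m in machines}
    for idx, pos in enumerate(positions):
        pred = positions[idx - 1]  # idx - 1 == -1 wraps to the last position
        arc = (pos - pred) % RING_SIZE or RING_SIZE  # a lone node owns the whole ring
        share[owners[pos]] += arc / RING_SIZE
    return share
```

Comparing `max(shares.values()) * len(machines)` for `vnodes=1` against a larger `vnodes` typically shows the largest share moving closer to the ideal 1/n, at the cost of tracking many more ring positions, which is exactly the drawback listed above.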
Load balancing problems • Address space balancing (contd.) • The drawbacks of virtual nodes can be overcome by activating only one of each real node's O(log n) virtual nodes at a time. • Steps involved: • Find an inactive virtual node • If the distribution of the overall system changes, the real (parent) node migrates to a different one of its virtual nodes • This approach also prevents address spoofing, because each parent node has only a limited set of legitimate virtual nodes.
Load balancing problems • Item balancing • In database applications, range searches over a list of items are impossible when item keys are randomized by hashing. Also, even with a balanced address space, items may still be unevenly distributed. • Proposed solution: • Allow nodes to move to arbitrary addresses • 'Work stealing': migrate under-loaded nodes to regions of the address space where more items are allocated • Balance weighted items
Proposed load balancing schemes for data storage applications in peer-to-peer systems • The first protocol improves consistent hashing as follows: every node is responsible for an O(1/n) fraction of the address space with high probability, while each real node keeps only one virtual node active. (i) O(log n) degree per real node: each node has O(log n) positions to choose from. (ii) O(log n / log log n) lookup hops; when a node changes position, only O(log log n) other nodes have to change position. (iii) Constant-factor load balance: which virtual nodes are active depends only on the current set of nodes, not on the history of node or item insertions and deletions. • The second protocol handles arbitrary distributions of keys, which requires allowing nodes to move to arbitrary addresses; this addresses the database range-search problem.
Improvements in the current scheme • The protocol avoids three drawbacks of earlier schemes, in which: • the address assigned to a node depends on the rest of the network, not just on the node itself (e.g., the address is not chosen from a fixed list of candidate addresses) • address assignments depend on the construction history • the load balancing guarantee holds only for node insertions
Address space balancing • Ideal state • Given any set of active nodes, each (possibly inactive) potential node “spans” the range of addresses between itself and the succeeding active node on the address ring. • Each node's set of potential nodes depends only on the node itself: their addresses are h(i,1), h(i,2), …, h(i, c log n), where i is the node identifier.
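The ideal-state definition can be made concrete with a small sketch that computes a node's c log n potential addresses and the span of a potential node. The hash family h(i, k), the 32-bit ring and c = 2 are assumptions for illustration; the paper's rule for choosing which potential node to activate is not reproduced here.

```python
import hashlib
import math

RING_SIZE = 2 ** 32  # assumption: small ring for readability


def h(node_id: str, k: int) -> int:
    """Hypothetical instantiation of h(i, k): SHA-1 of "i:k" reduced onto the ring."""
    digest = hashlib.sha1(f"{node_id}:{k}".encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def potential_nodes(node_id: str, n_estimate: int, c: int = 2) -> list[int]:
    """The candidate addresses h(i,1), ..., h(i, c log n) of one real node."""
    count = c * max(1, math.ceil(math.log2(n_estimate)))
    return [h(node_id, k) for k in range(1, count + 1)]


def span(potential: int, active: list[int]) -> int:
    """Length of the range from a potential node to the next active node on the ring."""
    later = [a for a in active if a > potential]
    nxt = min(later) if later else min(active) + RING_SIZE  # wrap around the ring
    return nxt - potential
```

Because the candidates depend only on the node's own identifier, a node cannot pick an arbitrary address, which is what limits address spoofing.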
Item balancing • Each node i occasionally contacts another node j at random, where ℓ_i denotes the load (number of items) of node i. If ℓ_i ≤ εℓ_j or ℓ_j ≤ εℓ_i, the nodes perform a load balancing operation (assume wlog that ℓ_i > ℓ_j), distinguishing two cases. Case 1: i = j+1. Here i is the successor of j and the two nodes handle adjacent address intervals. Node j increases its address so that the (ℓ_i − ℓ_j)/2 items with the lowest addresses in i's interval are reassigned from node i to node j. Both nodes end up with load (ℓ_i + ℓ_j)/2. Case 2: i ≠ j+1. If ℓ_{j+1} > ℓ_i, set i := j+1 and go to case 1. Otherwise, node j moves between nodes i−1 and i to capture half of node i's items. Node j's old items are then handled by its former successor, node j+1.
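The two cases of the item balancing step can be sketched as follows. The ring is modeled abstractly as a list of `[node_id, items]` entries in address order, so changing a node's address is simulated by moving items between neighboring lists; this data layout and the default ε = 0.2 are illustrative assumptions.

```python
def balance_pair(ring, a, b, eps=0.2):
    """One item-balancing step between the nodes at ring positions a and b.

    ring: list of [node_id, items]; ring[(k + 1) % len(ring)] is the successor of
    ring[k], and each node's items are kept in ring order, just before its
    successor's items. Returns True if the ring was changed.
    """
    la, lb = len(ring[a][1]), len(ring[b][1])
    if la == lb or not (la <= eps * lb or lb <= eps * la):
        return False  # loads are within a factor of eps: nothing to do
    i, j = (a, b) if la > lb else (b, a)  # wlog l_i > l_j
    succ_j = (j + 1) % len(ring)
    if i != succ_j and len(ring[succ_j][1]) > len(ring[i][1]):
        i = succ_j  # case 2 fallback: balance with j's heavier successor (case 1)
    if i == succ_j:
        # Case 1: j raises its address and takes the (l_i - l_j) / 2 lowest items of i.
        k = (len(ring[i][1]) - len(ring[j][1])) // 2
        ring[j][1].extend(ring[i][1][:k])
        del ring[i][1][:k]
    else:
        # Case 2: j leaves, and its successor absorbs j's items ...
        node_id, j_items = ring[j]
        ring[succ_j][1][:0] = j_items
        ring.pop(j)
        if i > j:
            i -= 1  # positions after j shift left by one
        # ... then j rejoins just before i, capturing the lower half of i's items.
        heavy = ring[i][1]
        half = len(heavy) // 2
        captured = heavy[:half]
        del heavy[:half]
        ring.insert(i, [node_id, captured])
    return True
```

Calling `balance_pair` on randomly chosen pairs mimics nodes contacting each other at random; the bounds in Lemma 2 and Theorem 3 are properties of the protocol, and this toy model only illustrates a single step.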
Item balancing • Lemma 2: Starting with an arbitrary load distribution, if every node contacts O(log n) random nodes for the above protocol, then with high probability (whp) all nodes end up with a load of at most (16/ε)L, where L is the average load. Another round of every node contacting O(log n) other nodes also brings all loads to at least (ε/16)L.
Item balancing • Theorem 3: If each node contacts Ω(log n) other random nodes per half-life, as well as whenever its own load doubles or halves, then the above protocol has the following properties. (i) With high probability, the load of every node remains between (ε/16)L and (16/ε)L. (ii) The amortized number of items moved due to load balancing is O(1) per item insertion or deletion, and O(L) per node insertion or deletion.
Other Load Balancing Discussions… • Item insertion • Item deletion • Node insertion • Node deletion • Load balancing operation
Other Key Features • Selecting random numbers • Weighted items. Corollary 4: Theorem 3 continues to hold for weighted items with the following changes: (i) load can be balanced only as far as the items' weights allow locally (see the previous discussion); (ii) the amortized total weight moved upon the insertion or deletion of an item with weight w is O(w). • Routing in skewed distributions • Range searches
Paper 2: Simple Load Balancing for Distributed Hash Tables. Problem with the current implementation of Chord: Chord has n peers, each with a unique ID, and a single hash function maps each file to one peer. If the hash function maps many files to the same key, all of them land on a single peer, which becomes fully loaded while the other peers stay lightly loaded. Solution: use more than one hash function, so that each file is mapped to more than one candidate peer.
DHT Implementation using two hash functions • Step 1: A universal hash function h0 maps each peer in the network to a hash ID. • Step 2: k hash functions h1, …, hk map the same file x to k peers in the network. • Step 3: k lookups are executed in parallel to find the peers p1, p2, …, pk responsible for these hash values, according to the mapping given by h0.
DHT Implementation using two hash functions • Step 4: After querying the load of each peer, the peer pi with the lowest load is chosen to store item x. • Step 5: In addition to storing x at peer pi, redirection pointers (x -> pi) are stored at all the other peers pj.
DHT Implementation using two hash functions • Searching algorithm: To search for item x, a peer performs a single query by choosing one hash function hj at random. If peer pj does not have x, it redirects the query to peer pi using the redirection pointer (x -> pi) stored at pj. A soft-state approach is assumed, i.e., items and redirection pointers must be refreshed periodically or they expire.
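Steps 1-5 and the search algorithm can be sketched as a single-process toy model. It assumes SHA-1 with a numeric prefix as the hash family h0, h1, …, hk, a 32-bit ring, and in-memory dictionaries in place of real peers; lookups run sequentially rather than in parallel, and the soft-state refreshing of pointers is not modeled.

```python
import hashlib
import random
from bisect import bisect_left


class MultiChoiceDHT:
    """Toy model: each item is stored at the least loaded of its k candidate peers."""

    RING = 2 ** 32  # assumption: 32-bit identifier space

    def __init__(self, peer_names, k=2):
        self.k = k
        # Step 1: h0 places each peer on the ring.
        self.ring = sorted((self._hash(name, 0), name) for name in peer_names)
        self.ids = [pid for pid, _ in self.ring]
        self.items = {name: set() for name in peer_names}
        self.pointers = {name: {} for name in peer_names}  # item -> peer storing it

    def _hash(self, key, j):
        """Hypothetical hash function h_j: SHA-1 of "j|key" reduced onto the ring."""
        digest = hashlib.sha1(f"{j}|{key}".encode()).digest()
        return int.from_bytes(digest, "big") % self.RING

    def _responsible(self, point):
        """Peer whose ID is the successor of `point` on the ring."""
        idx = bisect_left(self.ids, point) % len(self.ids)
        return self.ring[idx][1]

    def candidates(self, x):
        """Steps 2-3: the peers p_1..p_k responsible for h_1(x)..h_k(x)."""
        return [self._responsible(self._hash(x, j)) for j in range(1, self.k + 1)]

    def insert(self, x):
        cands = self.candidates(x)
        target = min(cands, key=lambda p: len(self.items[p]))  # step 4: least loaded
        self.items[target].add(x)
        for p in cands:  # step 5: redirection pointers at the other candidates
            if p != target:
                self.pointers[p][x] = target
        return target

    def search(self, x, rng=random):
        """Query one random h_j and follow its redirection pointer if needed."""
        p = self.candidates(x)[rng.randrange(self.k)]
        if x in self.items[p]:
            return p
        return self.pointers[p].get(x)
```

Usage: `dht = MultiChoiceDHT([f"peer{i}" for i in range(8)], k=2)`, then `dht.insert("file1")` and `dht.search("file1")`. A search costs one lookup plus at most one redirection, whichever hash function is picked.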
DHT Implementation using two hash functions • Load stealing: • An underutilized peer p1 seeks out load to take from more heavily utilized peers. • When it finds such a peer p2, it takes responsibility for an item x by making a replica of x. • A redirection pointer for x from p2 to p1 is created in p2. • p1 prefers to steal items for which it currently holds a redirection pointer.
DHT Implementation using two hash functions • Load shedding: • An overloaded peer p1 attempts to offload work to a less loaded peer p2. • p1 creates a redirection pointer (x -> p2), so requests for x reaching p1 are forwarded to p2.
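Load stealing and load shedding only add replicas and redirection pointers, so they can be sketched as small functions over per-peer dictionaries: `items[peer]` is a set and `pointers[peer]` maps an item to the peer that requests should be forwarded to. The data layout, the `route` helper and its hop limit are hypothetical; keeping the original copy after shedding is also an assumption.

```python
def steal(items, pointers, thief, victim):
    """Load stealing: under-loaded `thief` takes over one item held by `victim`.

    Items for which thief already holds a pointer to victim are preferred.
    """
    preferred = [x for x, holder in pointers[thief].items() if holder == victim]
    choices = preferred or sorted(items[victim])
    if not choices:
        return None
    x = choices[0]
    items[thief].add(x)           # thief makes a replica of x
    pointers[victim][x] = thief   # victim forwards future requests for x to thief
    pointers[thief].pop(x, None)  # thief now serves x itself
    return x


def shed(items, pointers, overloaded, helper, x):
    """Load shedding: overloaded peer gives helper a replica of x and redirects to it."""
    items[helper].add(x)
    pointers[overloaded][x] = helper


def route(items, pointers, peer, x, max_hops=4):
    """Follow redirection pointers from `peer` to a peer that serves x locally."""
    for _ in range(max_hops):
        nxt = pointers[peer].get(x)
        if nxt is None:
            return peer if x in items[peer] else None
        peer = nxt
    return None  # hop limit reached: treat as a stale or cyclic pointer chain
```

Preferring items the thief already points to means the thief was one of the item's candidate peers, so taking the item over keeps lookups within the candidate set.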
Paper 2b: Hash-Based Proximity Clustering for Load Balancing in Heterogeneous DHT Networks. Existing solutions: • Virtual servers: each real node is assigned O(log n) virtual servers, and keys are mapped onto the virtual servers. This incurs a large space overhead and lowers lookup efficiency. • Churn-resilient algorithm: when the fraction of a node's capacity in use exceeds a threshold, its excess virtual nodes are moved to lighter nodes.
Hash based proximity clustering for load balancing in Heterogeneous DHT Networks • Existing solutions: • Item distribution scheme, randomized load balancing algorithm: every node contacts random other nodes and moves items between them. • It does not consider proximity information. • Proximity-aware algorithm: takes node proximity information into account in load balancing. • It is based on an additional network constructed on top of Chord. • Constructing that network adds extra cost. • Locality-aware randomized load balancing algorithm: handles both proximity and the dynamic features of DHTs.
Hash based proximity clustering for load balancing in Heterogeneous DHT Networks • Landmark clustering: • This approach is used to generate proximity information. • It is based on the observation that nodes close to each other are likely to have similar distances to a few selected landmark nodes. • Process: • Each node measures its distances to m landmarks. • The m distances form the node's coordinates in an m-dimensional Cartesian space. • The space is divided into 2^(mx) grid cells, and each node is numbered with the Hilbert number of the cell it falls into.
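Landmark clustering can be sketched for the simplest case of m = 2 landmarks, where the standard two-dimensional Hilbert curve conversion applies. Using round-trip times as distances, the 200 ms cap and the grid order x = 4 (2^(2·4) = 256 cells) are assumptions for illustration; larger m needs a multi-dimensional Hilbert mapping that is not shown.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a 2-D Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate the quadrant so the sub-curve has the right orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def landmark_hilbert_number(rtts_ms, max_rtt_ms=200.0, order=4):
    """Quantize distances to m = 2 landmarks into grid cells; return the cell's Hilbert number."""
    if len(rtts_ms) != 2:
        raise ValueError("this sketch only handles m = 2 landmarks")
    cells = 1 << order
    coords = [min(cells - 1, int(r / max_rtt_ms * cells)) for r in rtts_ms]
    return hilbert_index(order, coords[0], coords[1])
```

The Hilbert curve is used because cells that are near each other along the curve are also near each other in the landmark space, so nodes with similar landmark distances tend to get close Hilbert numbers.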
Hash based proximity clustering for load balancing in Heterogeneous DHT Networks • Proposed solution: • The clustering distinguishes two kinds of nodes: • Supernodes, which have high capacity and fast connections. • Regular nodes, which have low capacity and slow connections. • An overlay network of supernodes is constructed for load balancing. • Supernodes are designated dynamically according to their capacity. • Each supernode acts as a server for its associated regular nodes. • Regular nodes are clustered and associated with a supernode by consistent hashing of their physical proximity information.
Types of Clusters used in this implementation • The network can be physical (pCluster) or virtual (vCluster). • pCluster: each node is connected to its physically closest supernode, and all supernodes form a DHT. • vCluster: each node is connected to the logically closest supernode in the ID space. • The supernodes in node n's supernode table are physically close to n in pCluster and logically close to n in vCluster.
Types of Clusters used in this implementation • Physical clustering: • In the top-level network, a supernode's Hilbert number is its logical ID, and regular nodes' Hilbert numbers are the keys. • A regular node is assigned to the supernode whose ID is closest to the regular node's ID. • When two supernodes have the same Hilbert number, one is chosen and the others become its backups. • Consistent hashing is used to re-associate regular nodes when supernodes leave and join the network.
Types of Clusters used in this implementation • pCluster • Regular nodes are assigned to a supernode using a variant of consistent hashing: • each regular node is stored at the supernode whose ID is closest to the regular node's ID (its Hilbert number). • As a result, physically close nodes end up in the same cluster. • The Proximity Neighbor Selection algorithm is used to build the supernode routing table: • it selects routing table entries that point to physically close nodes among all node IDs in the desired ID space.
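The pCluster association rule can be sketched as a directory that attaches each regular node to the supernode whose Hilbert number is circularly closest, and turns supernodes with a duplicate Hilbert number into backups. The 256-value Hilbert space, breaking distance ties toward the smaller ID, and the class itself are hypothetical.

```python
HILBERT_SPACE = 2 ** 8  # assumption: Hilbert numbers from a 256-cell grid


def circular_distance(a: int, b: int, size: int = HILBERT_SPACE) -> int:
    """Distance between two IDs on a ring of the given size."""
    d = abs(a - b) % size
    return min(d, size - d)


class PClusterDirectory:
    """Toy model of pCluster association: supernodes keyed by Hilbert number."""

    def __init__(self):
        self.primary = {}   # Hilbert number -> primary supernode name
        self.backups = {}   # Hilbert number -> backup supernode names
        self.regulars = {}  # supernode name -> regular node names

    def add_supernode(self, name: str, hilbert: int) -> None:
        if hilbert in self.primary:  # same Hilbert number: become a backup
            self.backups.setdefault(hilbert, []).append(name)
        else:
            self.primary[hilbert] = name
            self.regulars[name] = []

    def assign_regular(self, name: str, hilbert: int) -> str:
        """Attach a regular node to the supernode whose ID is closest to its Hilbert number."""
        closest = min(self.primary, key=lambda sid: (circular_distance(sid, hilbert), sid))
        supernode = self.primary[closest]
        self.regulars[supernode].append(name)
        return supernode
```

Because IDs are Hilbert numbers, "closest ID" approximates "physically closest", which is how the cluster ends up grouping nearby nodes.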
Node joining algorithm • The Hilbert number of a node n is calculated, and this number is used to find n's supernode s. • If n's capacity is below the threshold, n is assigned to s as a regular node. • Otherwise n is a supernode: it joins the auxiliary supernode network and takes load from its successor and predecessor. • If n's Hilbert number is the same as s's, n is added to s's backup list.
Node leaving and node failure algorithm (pCluster) • When a regular node leaves the network, its supernode is updated. • When a supernode leaves the network, its regular nodes are reassigned to its successor or predecessor, whichever is closer to them. • pCluster uses lazy updates to cope with node failures: each regular node periodically probes its supernode to check whether it is still alive. If a regular node gets no reply from its supernode s within some time t, it assumes that s has failed and reassigns itself to another supernode.
Load Balancing • Regular nodes periodically send their load information to their supernode. • Supernodes manage the load: they ask heavily loaded nodes to move load to lightly loaded nodes. • Each supernode keeps two lists: • Donating list: information about heavily loaded nodes • Starving list: information about lightly loaded nodes • Local load balancing: each supernode first balances load among its own regular nodes, asking nodes in the donating list to move load to nodes in the starving list. • Global load balancing: the supernodes then balance load across the other nodes in the top-level DHT.
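Local load balancing inside one cluster can be sketched as a greedy pass that moves load from heavily loaded to lightly loaded regular nodes. The tolerance band around the cluster average and the heaviest-to-lightest pairing are assumptions, not the paper's exact policy; any imbalance left over would be handled by global balancing in the top-level DHT.

```python
def local_balance(loads, tolerance=0.1):
    """Greedily move load from nodes above the cluster average to nodes below it.

    loads: dict of regular-node name -> load, updated in place.
    Returns a list of (from_node, to_node, amount) transfers.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    avg = sum(loads.values()) / len(loads)
    upper, lower = avg * (1 + tolerance), avg * (1 - tolerance)
    heavy = sorted((n for n in loads if loads[n] > upper), key=loads.get, reverse=True)
    light = sorted((n for n in loads if loads[n] < lower), key=loads.get)
    moves = []
    while heavy and light:
        src, dst = heavy[0], light[0]
        amount = min(loads[src] - avg, avg - loads[dst])  # never overshoot the average
        loads[src] -= amount
        loads[dst] += amount
        moves.append((src, dst, amount))
        if loads[src] <= upper:
            heavy.pop(0)
        if loads[dst] >= lower:
            light.pop(0)
    return moves
```

Each transfer brings at least one of the two nodes back to the average, so the loop ends after at most as many transfers as there are unbalanced nodes.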
vCluster: proximity information is recorded in the original DHT network itself. • When a node wants to join, it contacts the node whose ID is equal to or just greater than its Hilbert number. • The implementation is similar to Chord, except that put(HilbertNumber, LoadInfo) is used instead of put(key, object). • If the contacted node is a regular node, it forwards the details to its supernode.
Types of Clusters used in this implementation • vCluster node joining: • Supernodes rearrange the load information and transfer it to the corresponding supernodes. • As a result, physically close nodes report their load to the same or logically close supernodes. • Problem: node information is first gathered at the supernode whose ID is closest to the Hilbert number H. • After load rearrangement, the load has to be transferred to the respective supernode, regardless of the distance between them.
Paper 3: Efficient Routing for Peer-to-Peer Overlays. Most current peer-to-peer lookup schemes keep • a small amount of routing state per node (logarithmic in the number of overlay nodes), • so that the book-keeping at each member node stays minimal. • This increases lookup latency, and that is a problem! Why lookup latency increases: • each lookup requires contacting several nodes in sequence. • Shortening this sequence can reduce latency considerably.
There are two approaches: one hop and two hop. • In one hop, each node maintains a routing table with complete membership information. • In two hop, each node keeps a fixed fraction of the complete routing state, chosen so that the first hop has low latency and the additional delay stays small.
Key Points • How the algorithm handles membership changes • How the algorithm reacts to node failures, with an informal correctness argument • Asymmetry in the load of individual nodes • Analysis of the bandwidth requirements • Structure followed: • Membership changes • The goal is to disseminate membership changes with low notification delay yet reasonable bandwidth consumption, since bandwidth is likely to be the scarcest resource in the system.
Imposing Hierarchy on the System • The 128-bit circular identifier space is divided into k equal contiguous intervals called slices. • The ith slice contains all nodes currently in the overlay whose node identifiers lie in the range [i·2^128/k, (i+1)·2^128/k). • Each slice has a slice leader. • Each slice is divided into equal-sized intervals called units. • Each unit has a unit leader, dynamically chosen as the successor of the midpoint of the unit's identifier space.
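The slice and unit boundaries reduce to integer arithmetic on 128-bit identifiers. This sketch assumes k slices with u units each and current node IDs held in a sorted Python list; unit leaders are picked as the successor of each unit's midpoint, as described above.

```python
from bisect import bisect_left

ID_SPACE = 2 ** 128  # 128-bit circular identifier space


def slice_of(node_id: int, k: int) -> int:
    """Slice i holds identifiers in [i * 2^128 / k, (i + 1) * 2^128 / k)."""
    return node_id * k // ID_SPACE


def unit_of(node_id: int, k: int, u: int) -> tuple[int, int]:
    """(slice, unit) of an identifier when each of the k slices has u equal units."""
    g = node_id * k * u // ID_SPACE  # global unit number, 0 .. k*u - 1
    return g // u, g % u


def unit_leader(sorted_ids: list[int], k: int, u: int, slice_i: int, unit_j: int) -> int:
    """Successor of the unit's midpoint among the current node IDs (wrapping around)."""
    g = slice_i * u + unit_j
    midpoint = (2 * g + 1) * ID_SPACE // (2 * k * u)
    idx = bisect_left(sorted_ids, midpoint) % len(sorted_ids)
    return sorted_ids[idx]
```

Multiplying before dividing keeps the computation exact with Python's arbitrary-precision integers, so identifiers on a slice boundary fall into the slice that starts there.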
Information Flow in one hop • When a node detects a membership change, it sends an event notification message to its slice leader. • The slice leader collects the event notifications it receives from its own slice and aggregates them for a period t_big before sending a message to the other slice leaders. • To spread out bandwidth use, communication with different slice leaders is not synchronized: the slice leader only ensures that it communicates with each individual slice leader once every t_big seconds. • Messages to different slice leaders are therefore sent at different points in time and contain different sets of events.
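The aggregation behaviour of a slice leader can be sketched with a toy model in which events are queued per destination and each other slice leader receives at most one batch per t_big. Evenly staggering the send times across t_big, and skipping empty batches, are assumptions made for illustration.

```python
class SliceLeader:
    """Toy model of a slice leader batching membership events for other slice leaders."""

    def __init__(self, peer_leaders, t_big):
        self.t_big = t_big
        n = len(peer_leaders)
        # Assumption: first sends are spread evenly across one t_big interval.
        self.next_send = {p: idx * t_big / n for idx, p in enumerate(peer_leaders)}
        self.pending = {p: [] for p in peer_leaders}

    def record(self, event):
        """An event notification arrived from a node in this slice."""
        for queue in self.pending.values():
            queue.append(event)

    def tick(self, now):
        """Return (peer, batch) messages due at time `now`; at most one per peer per t_big."""
        due = []
        for peer, when in list(self.next_send.items()):
            if now >= when:
                if self.pending[peer]:
                    due.append((peer, self.pending[peer]))
                    self.pending[peer] = []
                self.next_send[peer] = when + self.t_big
        return due
```

Usage: with peers `["s1", "s2"]` and `t_big=10.0`, an event recorded before `tick(0.0)` goes to `s1` at once but reaches `s2` only at `tick(5.0)`, batched with anything recorded in between, which is why messages to different slice leaders carry different event sets.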
Information Flow in one hop (contd.) • Slice leaders aggregate the messages they receive for a short period t_wait and then dispatch the aggregated message to all unit leaders of their slices. • A unit leader piggybacks this information on its keep-alive messages to its successor and predecessor. • The information then flows in one direction only: a node that receives it from its predecessor passes it on to its successor, and vice versa. • Nodes at unit boundaries do not send the information to neighboring nodes outside their unit.
Worst-case scenario timeline • t_total = t_detect + t_wait + t_small + t_big • t_detect: the delay between the time an event occurs and when the leader of that slice first learns about it • t_big: the time a slice leader waits between communications with any given other slice leader • t_wait: the period for which slice leaders aggregate messages before communicating with their unit leaders • t_small: the time it takes to propagate information throughout a unit
Advantages • Efficient bandwidth usage: • Well-defined event dissemination trees ensure there is no redundancy in communication. • Aggregating several events into one message avoids sending many small messages. • A three-level hierarchy was chosen because it has low delay, yet keeps bandwidth consumption at the top-level nodes reasonable.