Distributed Systems - Plan 3 Report 2
Siddharth Sarasvati, Karthikeyan Karur Balu
Introduction
• Traditional distributed system issues
  • Load balancing
  • Data integrity
  • Performance
• Common approaches for load balancing
  • Virtual servers
  • ID reassignment
  • Multiple random choice scheme
  • Local probing
Research Paper 1
• Author: Gurmeet Singh Manku
• Title: Balanced binary trees for ID management and load balance in distributed hash tables
• Conference: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing (PODC)
• Year: 2004
• URL: http://dl.acm.org/citation.cfm?id=1011797
The ID Assignment Problem
• How does a new host acquire an ID?
• No global knowledge of the “current set of IDs”
• Low cost (number of messages)
• Almost equal-sized partitions
• This paper presents a low-cost, decentralized algorithm for ID management in DHTs
Naïve ID Assignment
• Choose a random number r in [0, 1) as the new host's ID
• With n hosts in the system, σ = Θ(n log n)
• σ > 100 when n = 4K
• Can we do better? Perhaps, if a new host could “learn” a few existing IDs
σ = partition-balance ratio = ratio of the largest to the smallest partition
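To make σ concrete, here is a minimal sketch that gives every host an independent uniform random ID and measures the partition-balance ratio on the unit circle. The helper names `naive_ids` and `partition_balance_ratio` are hypothetical, and treating each arc between consecutive IDs as one host's partition is an assumption about the ring layout.

```python
import random

def partition_balance_ratio(ids):
    """sigma: ratio of the largest to the smallest partition on the unit circle [0, 1)."""
    if len(ids) == 1:
        return 1.0                      # a single host owns the whole circle
    points = sorted(ids)
    # Each arc between consecutive IDs (wrapping past 1.0) is one host's partition.
    arcs = [(points[(i + 1) % len(points)] - points[i]) % 1.0 for i in range(len(points))]
    return max(arcs) / min(arcs)

def naive_ids(n, seed=0):
    """Naive scheme: every host picks an independent uniform random ID in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

for n in (256, 1024, 4096):
    print(n, round(partition_balance_ratio(naive_ids(n))))
```

The printed ratios grow quickly with n, which is the behaviour the Θ(n log n) bound describes.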
The Algorithm
• Upon arrival, a host identifies the manager of a random number in [0, 1)
• It then identifies the IDs of the c log n hosts adjacent to that manager along the circle
• It splits the largest of those partitions into two and takes over one half
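The three steps above can be simulated in a few lines. This is a sketch under stated assumptions: the c log n "adjacent" hosts are taken to be the manager and its clockwise successors, c = 2 is an arbitrary choice, and `join`, `arc` and `sigma` are hypothetical helper names. Hosts are stored as the sorted start points of their partitions.

```python
import bisect
import math
import random

def arc(starts, i):
    """Length of the partition managed by host i: from its start to the next start, wrapping."""
    length = (starts[(i + 1) % len(starts)] - starts[i]) % 1.0
    return length if length > 0 else 1.0   # only a lone host gets 0 here; it owns everything

def join(starts, rng, c=2.0):
    """Add one host: inspect c*log2(n) hosts next to the manager of a random point, split the largest."""
    n = len(starts)
    r = rng.random()
    manager = (bisect.bisect_right(starts, r) - 1) % n    # host whose partition contains r
    k = min(n, max(1, math.ceil(c * math.log2(n))))       # how many neighbours to inspect
    neighbours = [(manager + j) % n for j in range(k)]     # assumption: clockwise successors
    largest = max(neighbours, key=lambda i: arc(starts, i))
    # The new host takes over the upper half of the largest partition it saw.
    bisect.insort(starts, (starts[largest] + arc(starts, largest) / 2) % 1.0)

def sigma(starts):
    lengths = [arc(starts, i) for i in range(len(starts))]
    return max(lengths) / min(lengths)

rng = random.Random(1)
starts = [0.0]                 # the first host manages all of [0, 1)
for _ in range(4095):
    join(starts, rng)
print(len(starts), sigma(starts))
```

Because every split halves a partition, all partition sizes stay powers of two, so σ is always a power of two as well.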
Balanced Binary Trees
[Figure: binary tree with leaf labels such as .0000, .0001, .01100 and .01101]
• Only leaf nodes correspond to IDs in [0, 1)
• A small fraction of internal nodes are marked active
• For every leaf node, exactly one internal node on the path from that leaf to the root is active
• Insertion is done in 3 steps
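The leaf labels in the figure read naturally as binary fractions. A minimal sketch, assuming a label like `.01101` denotes the bit path from the root, maps a leaf to the sub-interval of [0, 1) it covers; `leaf_interval` is a hypothetical helper name.

```python
from fractions import Fraction

def leaf_interval(label: str) -> tuple[Fraction, Fraction]:
    """Map a leaf label such as '.01101' to the half-open interval of [0, 1) it covers."""
    bits = label.lstrip(".")
    if not bits:                          # the root alone covers all of [0, 1)
        return Fraction(0), Fraction(1)
    width = Fraction(1, 2 ** len(bits))   # a leaf at depth d covers 2**-d of the ID space
    low = int(bits, 2) * width            # read the label as the binary fraction 0.b1b2...bd
    return low, low + width

for label in (".0000", ".0001", ".01100", ".01101"):
    low, high = leaf_interval(label)
    print(label, f"[{low}, {high})")
```

Sibling leaves such as `.01100` and `.01101` cover adjacent halves of their parent's interval, and a leaf's share of the ID space depends only on its depth.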
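Insertion in 3 Steps
• Step 1: RANDOM walk down the tree to a leaf
• Step 2: walk up until the sub-tree has c log n leaves
• Step 3: split the “shallowest leaf” below that sub-tree
• Claim: a newly arrived host needs Θ(R + log n) messages, where R is the routing cost
• Leaves lie on at most 3 different levels; a leaf at depth d owns 2^-d of the ID space, so σ ≤ 2^2 = 4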
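The three insertion steps can be sketched over a set of leaf labels (bit strings). This is a simplified sketch, not Manku's full algorithm: it omits the active-node bookkeeping, uses c = 1 as an arbitrary constant, and models the random walk down as following random bits (equivalent to routing to a uniform random point). Without the active nodes, this sketch does not by itself guarantee σ ≤ 4. `insert_host` and `sigma` are hypothetical names.

```python
import math
import random

def insert_host(leaves, rng, c=1.0):
    """Add one host to a tree whose leaves (bit strings) partition [0, 1)."""
    n = len(leaves)
    # Step 1: random walk down the tree by following random bits until reaching a leaf.
    leaf = ""
    while leaf not in leaves:
        leaf += rng.choice("01")
    # Step 2: walk up until the sub-tree rooted at `node` has at least c*log2(n) leaves.
    needed = max(1, math.ceil(c * math.log2(n)))
    node = leaf
    while node and sum(x.startswith(node) for x in leaves) < needed:
        node = node[:-1]
    # Step 3: split the shallowest leaf below `node` (ties broken by label for determinism).
    shallowest = min(sorted(x for x in leaves if x.startswith(node)), key=len)
    leaves.remove(shallowest)
    leaves.update({shallowest + "0", shallowest + "1"})

def sigma(leaves):
    """A leaf at depth d owns 2**-d of [0, 1), so sigma = 2**(max depth - min depth)."""
    depths = [len(x) for x in leaves]
    return 2 ** (max(depths) - min(depths))

rng = random.Random(7)
leaves = {""}                 # one host owns the whole ID space
for _ in range(255):
    insert_host(leaves, rng)
print(len(leaves), sigma(leaves))
```

Counting leaves by scanning the whole set is only for brevity; in a real DHT each step would be a message to a neighbouring host.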
Features of the Algorithm
• Generality: independent of the overlay network topology
• Low cost: Θ(R + log n) messages
• Optimal number of ID re-assignments
  • Host “departures” require only 1 re-assignment
  • Host “arrivals” require no re-assignments
• Small partition-balance ratio (σ ≤ 4), which is optimal
Research Paper 2
• Authors: Brighten Godfrey, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp and Ion Stoica
• Title: Load Balancing in Dynamic Structured P2P Systems
• Conference: Proceedings of IEEE Infocom, Hong Kong, March 2004
• Year: 2004
• URL: http://www.cs.berkeley.edu/~karthik/research/papers/infocom04.pdf
Goal
• Goal: keep the system in a state in which the load on every node is at most its target
• Load: depends on the particular P2P system, e.g. storage or bandwidth
• Target: the maximum load a node can hold
Random ID Space Distribution
[Figure: Chord ring with Node A, Node B and Node C]
• Each virtual server is responsible for a contiguous region of the ID space
• Each node can be responsible for many virtual servers
• Consider a Chord ring
Random Mapping of Nodes
[Figure: Chord ring with Node A (T=50), Node B (T=35, heavy) and Node C (T=15); load labels L=45, L=41, L=31, L=3]
• Random mapping may result in imbalance, either from the mapping itself or from the addition of new data to the system
ID Space Redistribution
[Figure: Chord ring with Node A (T=50), Node B (T=35, heavy) and Node C (T=15); load labels L=45, L=41, L=31, L=3]
• Find the nodes where L > T and check with other nodes to redistribute the excess load
ID Space Redistribution
[Figure: Chord ring after redistribution with Node A (T=50), Node B (T=35) and Node C (T=15); load labels L=45, L=31, L=14, L=30]
• Result: the goal is maintained, i.e. L ≤ T on every node
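The redistribution in these slides can be sketched as moving whole virtual servers from a heavy node to nodes that can absorb them without exceeding their own target. This is a minimal sketch: the greedy "smallest virtual server first" rule is our own simple heuristic, not necessarily the paper's selection rule, and the per-virtual-server loads in the example are illustrative, assumed numbers. `Node`, `goal_holds` and `shed_load` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    target: float                                                # T: maximum load the node can hold
    virtual_servers: list[float] = field(default_factory=list)  # load of each virtual server

    @property
    def load(self) -> float:                                     # L: total load of its virtual servers
        return sum(self.virtual_servers)

def goal_holds(nodes):
    """The goal: every node satisfies L <= T."""
    return all(node.load <= node.target for node in nodes)

def shed_load(heavy, others):
    """Move virtual servers off `heavy` until L <= T, each to a node that stays within its target."""
    for vs in sorted(heavy.virtual_servers):    # heuristic (assumption): smallest first
        if heavy.load <= heavy.target:
            break
        for node in others:
            if node.load + vs <= node.target:
                heavy.virtual_servers.remove(vs)
                node.virtual_servers.append(vs)
                break
    return heavy.load <= heavy.target

# Illustrative (assumed) split of each node's load into virtual servers.
a = Node("A", 50, [45])
b = Node("B", 35, [11, 30])
c = Node("C", 15, [3])
print(goal_holds([a, b, c]))
shed_load(b, [a, c])
print(goal_holds([a, b, c]), a.load, b.load, c.load)
```

Moving whole virtual servers, rather than arbitrary slices of load, keeps each transfer a contiguous region of the ID space, matching the virtual-server model above.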
Load Balancing Scheme 1: One-to-One
[Figure: ring of heavy (H) and light (L) nodes]
• A light node picks a random ID and contacts the node x responsible for it; it accepts load if x is heavy
• This takes roughly O(N²) operations
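A round of the one-to-one scheme can be sketched as follows. Assumptions: "the owner of a random ID" is approximated by a uniformly chosen node (in a real ring the probability is proportional to the ID-space share), and load is treated as divisible rather than moved in whole virtual servers. `one_to_one_round` is a hypothetical name.

```python
import random

def one_to_one_round(loads, targets, rng):
    """One round of one-to-one balancing; returns how many probes light nodes sent."""
    nodes = list(loads)
    probes = 0
    for light in nodes:
        if loads[light] >= targets[light]:
            continue                          # only nodes with spare capacity probe
        x = rng.choice(nodes)                 # assumption: owner of a random ID ~ uniform node
        probes += 1
        excess = loads[x] - targets[x]
        if x != light and excess > 0:         # x is heavy, so the light node takes load
            moved = min(excess, targets[light] - loads[light])
            loads[x] -= moved
            loads[light] += moved
    return probes

loads = {"A": 10, "B": 0, "C": 2}
targets = {"A": 5, "B": 3, "C": 4}
rng = random.Random(0)
rounds = 0
while any(loads[v] > targets[v] for v in loads):
    one_to_one_round(loads, targets, rng)
    rounds += 1
print(rounds, loads)
```

Light nodes probe blindly, so many probes can miss heavy nodes entirely; this is the inefficiency the directory-based scheme addresses.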
Load Balancing Scheme 2: One-to-Many
[Figure: ring with light nodes L1 to L5, directories D1 and D2, and heavy nodes H1 to H3]
• Light nodes report their load information to directories
• The directories are themselves stored in the DHT
• A heavy node H obtains this information by contacting a directory
• H then contacts a light node that can accept its excess load
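The directory role can be sketched as a small registry of spare capacity. This models a single directory only; how directories are placed in the DHT is left out, and the best-fit choice (the light node with the least spare capacity that still fits the excess) is an assumption rather than the paper's rule. `Directory`, `report` and `find_light` are hypothetical names.

```python
class Directory:
    """Hypothetical directory that light nodes report their spare capacity to."""

    def __init__(self):
        self.spare = {}                   # light node -> spare capacity (target - load)

    def report(self, node, load, target):
        """Called by a node with its current load; only light nodes stay registered."""
        if load < target:
            self.spare[node] = target - load
        else:
            self.spare.pop(node, None)

    def find_light(self, excess):
        """For a heavy node: the light node with the least spare capacity that fits `excess`."""
        fitting = [(spare, node) for node, spare in self.spare.items() if spare >= excess]
        return min(fitting)[1] if fitting else None

d = Directory()
for node, load, target in [("L1", 5, 10), ("L2", 1, 10), ("L3", 9, 10)]:
    d.report(node, load, target)
print(d.find_light(4))
```

A heavy node thus needs one lookup at a directory instead of waiting for a random light node to probe it.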
Research Paper 3
• Authors: Minseok Kwon, Gahyun Park
• Title: Distributed Tries for Load Balancing in Peer-to-Peer Systems
• Conference: Proceedings of IEEE IWQoS, June 2010
• Year: 2010
• URL: http://www.cs.rit.edu/~jmk/papers/trieload.pdf
Algorithm
• Goal: if the trie is balanced, the ID space will be balanced
Basic Idea (New Node Join)
[Figure: trie with a new node joining]
• Optimal path discovery: a new node travels down the trie from the root, always taking the path toward the minimum-depth leaf
• Drawback: requires global knowledge of the ID space
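Optimal path discovery can be sketched directly, which also shows where the global-knowledge drawback comes from: choosing a child requires the minimum leaf depth of each sub-trie. Leaves are bit-string labels, ties go to the `0` child, and `min_leaf_depth` and `optimal_path_join` are hypothetical names.

```python
def min_leaf_depth(leaves, prefix):
    """Depth of the shallowest leaf below `prefix` (requires the whole trie: global knowledge)."""
    return min(len(leaf) for leaf in leaves if leaf.startswith(prefix))

def optimal_path_join(leaves):
    """Walk down from the root toward the minimum-depth leaf, then split it for the new node."""
    node = ""
    while node not in leaves:
        # Follow the child whose sub-trie contains the shallowest leaf (ties go to '0').
        node = min((node + "0", node + "1"), key=lambda child: min_leaf_depth(leaves, child))
    leaves.remove(node)
    leaves.update({node + "0", node + "1"})
    return node

leaves = {""}
for _ in range(100):
    optimal_path_join(leaves)
depths = {len(x) for x in leaves}
print(sorted(depths))
```

Since every join splits a globally shallowest leaf, leaf depths never differ by more than one; the price is that each step consults information about the entire trie.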
Node Join/Leave Process
• A new node y picks a random ID r and locates the host that owns the interval containing r
• Starting from r, y travels up the trie a bounded number of levels, which depends on |id(r)|, the number of bits of id(r)
Hypothesis
• A distributed trie for load balancing in a structured P2P system allows a node to join or leave the system at low cost, R + Θ(log log n), where R denotes the routing cost and n denotes the number of nodes
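To see the size of the gap between the Θ(log n) term of the balanced-binary-tree scheme and the Θ(log log n) term claimed here, plug in an illustrative system of about a million nodes (constants hidden by Θ are ignored):

```latex
n = 2^{20} \approx 10^{6}: \qquad \log_2 n = 20, \qquad \log_2 \log_2 n = \log_2 20 \approx 4.32
```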
Algorithm (Node Join Process)
• |id(r)| = number of bits of id(r)
• While i < log|id(r)| + 4
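A sketch of how the loop bound could fit into a join. Only the bound i < log|id(r)| + 4 comes from the slide; what happens after the walk up (splitting the shallowest leaf in the sub-trie reached) is our assumption, borrowed from the optimal-path idea but applied locally. Leaves are bit-string labels, and `trie_join` is a hypothetical name.

```python
import math
import random

def trie_join(leaves, rng):
    """Join one node into a trie whose leaves (bit strings) partition the ID space [0, 1)."""
    # Locate the leaf that owns a random ID r (random bits = a uniform point in [0, 1)).
    leaf = ""
    while leaf not in leaves:
        leaf += rng.choice("01")
    # Walk up while i < log|id(r)| + 4, where |id(r)| is the number of bits of id(r).
    limit = math.log2(len(leaf)) + 4 if leaf else 0
    node, i = leaf, 0
    while node and i < limit:
        node, i = node[:-1], i + 1
    # Assumption: split the shallowest leaf in the sub-trie reached (ties broken by label).
    target = min(sorted(x for x in leaves if x.startswith(node)), key=len)
    leaves.remove(target)
    leaves.update({target + "0", target + "1"})
    return i                              # levels climbed: about log log n when |id(r)| ~ log n

rng = random.Random(3)
leaves = {""}
climbs = [trie_join(leaves, rng) for _ in range(500)]
print(len(leaves), max(climbs))
```

If IDs are about log n bits long, climbing log|id(r)| + 4 levels costs Θ(log log n) steps beyond the routing cost R, which is the shape of the cost in the hypothesis.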
Deliverables
• Simulate a P2P distributed system with a DHT, and implement the Balanced Binary Tree and Distributed Trie load-balancing algorithms
• Graphs comparing node arrival and departure costs (routing cost, ID reassignments)
Progress
• Comprehensive understanding of the research papers
• Discussions on a generic simulation design that accommodates different load-balancing algorithms as pluggable modules
• Analysis of the discussed load-balancing algorithms
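One way the pluggable design could look is a small interface that every load-balancing module implements, plus a harness that records the arrival and departure costs listed under Deliverables. This is a hypothetical sketch: `LoadBalancer`, `run_churn`, `CostReport` and the toy `FixedCostBalancer` are illustrative names, not part of the project's existing design.

```python
from dataclasses import dataclass, field
from typing import Protocol

class LoadBalancer(Protocol):
    def join(self, host: int) -> int:
        """Add a host; return the number of messages the join cost."""
        ...

    def leave(self, host: int) -> int:
        """Remove a host; return the number of ID re-assignments it caused."""
        ...

@dataclass
class CostReport:
    join_messages: list[int] = field(default_factory=list)
    leave_reassignments: list[int] = field(default_factory=list)

def run_churn(balancer: LoadBalancer, arrivals: int, departures: int) -> CostReport:
    """Drive any pluggable balancer through arrivals then departures and record the costs."""
    report = CostReport()
    for host in range(arrivals):
        report.join_messages.append(balancer.join(host))
    for host in range(departures):
        report.leave_reassignments.append(balancer.leave(host))
    return report

class FixedCostBalancer:
    """Toy stand-in used only to exercise the harness."""
    def join(self, host: int) -> int:
        return 3

    def leave(self, host: int) -> int:
        return 1

report = run_churn(FixedCostBalancer(), arrivals=5, departures=2)
print(report)
```

With `typing.Protocol`, the Balanced Binary Tree and Distributed Trie modules only need matching `join`/`leave` methods; they do not have to inherit from a common base class.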