Towards Efficient Load Balancing in Structured P2P Systems
Yingwu Zhu, Yiming Hu
University of Cincinnati
Outline
• Motivation and Preliminaries
• Load balancing scheme
• Evaluation
Why Load Balancing?
• Structured P2P systems, e.g., Chord, Pastry
  • Object IDs and node IDs are produced by a uniform hash function.
  • This results in an O(log N) imbalance in the number of objects stored at each node.
• Skewed distribution of node capacity
  • Nodes should carry loads proportional to their capacities.
• Other problems: different object sizes, non-uniform distribution of object IDs.
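The imbalance caused by uniform hashing can be observed with a small simulation. The sketch below is only an illustration, not the authors' experiment: it assumes SHA-1 as the uniform hash, a 32-bit identifier ring, and Chord-style successor placement, and the names `ring_id` and `objects_per_node` are hypothetical.

```python
import hashlib
from bisect import bisect_left


def ring_id(name, bits=32):
    """Hash a name onto a ring of 2**bits identifiers (assumed: SHA-1)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def objects_per_node(num_nodes, num_objects):
    """Count the objects stored at each node under successor placement."""
    node_ids = sorted(ring_id(f"node-{i}") for i in range(num_nodes))
    counts = [0] * num_nodes
    for j in range(num_objects):
        key = ring_id(f"object-{j}")
        # The first node ID >= key stores the object; wrap around the ring.
        idx = bisect_left(node_ids, key) % num_nodes
        counts[idx] += 1
    return counts


counts = objects_per_node(num_nodes=64, num_objects=10_000)
mean = sum(counts) / len(counts)
print(f"max/mean objects per node: {max(counts) / mean:.2f}")
```

The ratio printed at the end gives a rough feel for how far the most loaded node sits above the average when every node owns a single region of the ID space.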
Virtual Servers (VS)
• First introduced in Chord/CFS.
• A VS is responsible for a contiguous region of the ID space.
• A node can host multiple VSs.
[Figure: Chord ring with virtual servers hosted by Node A, Node B, and Node C]
Virtual Server Reassignment
• The virtual server is the basic unit of load movement, allowing load to be transferred between nodes.
• L – Load, T – Target Load.
[Figure: Chord ring before reassignment. Node A (T=50), Node B (T=35, heavy), Node C (T=15); node load labels L=45, L=41, L=3; other numeric labels on the ring: 11, 20, 15, 3, 10, 30]
[Figure: Chord ring after reassignment. Node A (T=50), Node B (T=35), Node C (T=15); load labels L=45, L=31, L=14, L=30; no node is marked heavy]
Advantages of Virtual Servers
• Flexible: load is moved in units of a virtual server.
• Simple:
  • VS movement is supported by all structured P2P systems.
  • It is simulated by a leave operation followed by a join operation.
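A minimal sketch of moving one virtual server as a leave followed by a join. The `Node` class, the `transfer` function, and the way loads are split across virtual servers are hypothetical; the numbers are loosely modeled on the reassignment figure, and which virtual server belongs to which node is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    target: float                                          # T: target load
    virtual_servers: dict = field(default_factory=dict)    # VS id -> load

    @property
    def load(self):
        """L: total load of the virtual servers hosted on this node."""
        return sum(self.virtual_servers.values())

    def is_heavy(self):
        return self.load > self.target


def transfer(vs_id, src, dst):
    """Move one virtual server from src to dst and return its load."""
    vs_load = src.virtual_servers.pop(vs_id)   # "leave" on the source node
    dst.virtual_servers[vs_id] = vs_load       # "join" on the destination node
    return vs_load


# Assumed split of loads across virtual servers (not stated in the slides).
node_b = Node("B", target=35, virtual_servers={"vs1": 11, "vs2": 30})
node_c = Node("C", target=15, virtual_servers={"vs3": 3})

transfer("vs1", node_b, node_c)
print(node_b.load, node_b.is_heavy())   # 30 False
print(node_c.load, node_c.is_heavy())   # 14 False
```

In a real DHT the leave and join would also hand over the objects in the virtual server's ID region; the sketch tracks only the load bookkeeping.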
Current Load Balancing Solutions
• Some use the concept of virtual servers.
• However, they:
  • Either ignore the heterogeneity of node capabilities,
  • Or transfer loads without considering proximity relationships between nodes,
  • Or both.
Goals
• Keep each node's load below its target load (the maximum load a node is willing to take).
• High-capacity nodes take more load.
• Load balancing is performed in a proximity-aware manner, to minimize the overhead of load movement (bandwidth usage) and to allow faster, more efficient load balancing.
• Load: depends on the particular P2P system.
  • E.g., storage, network bandwidth, and CPU cycles.
Assumptions
• Nodes in the system are cooperative.
• There is only one bottleneck resource, e.g., storage or network bandwidth.
• The load of each virtual server is stable over the timescale on which load balancing is performed.
Overview of Design
• Step 1: Load balancing information (LBI) aggregation, e.g., load and capacity information.
• Step 2: Node classification, e.g., heavy nodes, light nodes, neutral nodes.
• Step 3: Virtual server assignment (VSA).
• Step 4: Virtual server transferring (VST).
• Proximity-aware load balancing
  • VSA is proximity-aware.
LBI Aggregation and Node Classification
• Relies on a fully decentralized, self-repairing, and fault-tolerant K-nary tree built on top of a DHT (distributed hash table).
• Each K-nary tree node is planted in a DHT node.
• <L, C, Lmin> represents the load, the capacity, and the minimum load of virtual servers, respectively.
[Figure: K-nary tree aggregating <L, C, Lmin> upward. Leaves <12,10,2>, <15,8,3>, <20,10,5>, <15,20,4>; intermediate nodes <27,18,2> and <35,30,4>; root <62,48,2>]
LBI Aggregation and Node Classification
[Figure: the root aggregate <62,48,2> is passed back down the tree to the leaves <12,10,2>, <15,8,3>, <20,10,5>, <15,20,4>, which are then labeled heavy or light]
• Target load of node i: $T_i = (L/C + \epsilon)\, C_i$
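The aggregation and classification steps can be sketched with the tuples from the figure. This is a hedged illustration: the `LBI` class and both function names are hypothetical, ε defaults to 0 here, and the rule that separates light from neutral nodes (spare capacity of at least Lmin) is an assumption, since the slides name the three classes without defining them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LBI:
    load: float          # L
    capacity: float      # C
    min_vs_load: float   # Lmin


def aggregate(children):
    """Combine child tuples: sum loads and capacities, take the minimum Lmin."""
    return LBI(
        load=sum(c.load for c in children),
        capacity=sum(c.capacity for c in children),
        min_vs_load=min(c.min_vs_load for c in children),
    )


def classify(node, system, epsilon=0.0):
    """Classify a node against its target load T_i = (L/C + epsilon) * C_i."""
    target = (system.load / system.capacity + epsilon) * node.capacity
    if node.load > target:
        return "heavy"
    # Assumed rule: light only if the node can absorb the smallest VS.
    if target - node.load >= system.min_vs_load:
        return "light"
    return "neutral"


leaves = [LBI(12, 10, 2), LBI(15, 8, 3), LBI(20, 10, 5), LBI(15, 20, 4)]
left, right = aggregate(leaves[:2]), aggregate(leaves[2:])
root = aggregate([left, right])
print(left, right, root)          # <27,18,2>, <35,30,4>, <62,48,2>
print([classify(leaf, root) for leaf in leaves])
```

The intermediate and root tuples reproduce the values in the figure. Under the assumed neutral rule the first leaf is classified as neutral rather than light, because its spare capacity is below Lmin; the figure itself labels leaves only as heavy or light.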
Virtual Server Assignment
• VSA happens earlier between logically closer nodes.
[Figure: heavy nodes H1 ... Hm+1 and light nodes L1 ... Ln+1, ordered by logical closeness, send VSA information (virtual servers V11, V12, V21, V31, V32, ..., Vm1, Vm2, Vm+1; labels C1 ... Cn+1) up to rendezvous points, which pair them using a best-fit heuristic; unpaired VSA information is passed on toward the final rendezvous point]
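One way a rendezvous point might apply a best-fit heuristic is sketched below. The slides do not spell out the heuristic, so everything here is an assumption: virtual servers are considered from heaviest to lightest, each goes to the light node with the smallest spare capacity that still fits it, and anything left over is reported as unpaired. The function name and data shapes are hypothetical.

```python
def best_fit_pairing(shed, spare):
    """Pair virtual servers with light nodes at one rendezvous point.

    shed:  list of (vs_id, load) offered by heavy nodes.
    spare: dict mapping light node id -> spare capacity.
    Returns (pairs, unpaired, remaining_spare).
    """
    remaining = dict(spare)
    pairs, unpaired = [], []
    # Place the heaviest virtual servers first.
    for vs_id, load in sorted(shed, key=lambda item: item[1], reverse=True):
        candidates = [(cap, node) for node, cap in remaining.items() if cap >= load]
        if not candidates:
            unpaired.append((vs_id, load))   # would go to the next rendezvous point
            continue
        cap, node = min(candidates)          # tightest fit
        pairs.append((vs_id, node))
        remaining[node] = cap - load
    return pairs, unpaired, remaining


pairs, unpaired, remaining = best_fit_pairing(
    shed=[("V11", 8), ("V21", 5), ("V31", 3)],
    spare={"L1": 6, "L2": 9},
)
print(pairs)      # [('V11', 'L2'), ('V21', 'L1')]
print(unpaired)   # [('V31', 3)]
```

Choosing the tightest fit keeps larger spare capacities available for heavier virtual servers that arrive from other parts of the tree.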
Virtual Server Assignment
• DHT identifier space-based VSA:
  • VSA happens earlier between logically closer nodes.
  • It is proximity-ignorant, because nodes that are logically close in the DHT are NOT necessarily physically close.
[Figure: heavy nodes H1, H2 and light nodes L1 to L4 with virtual servers V1 to V3. Nodes in the same color are physically close to each other; H – heavy node, L – light node, Vi – virtual server]
Proximity-Aware VSA
• VSs are assigned between physically close nodes.
[Figure: heavy nodes H1, H2 and light nodes L1 to L4 with virtual servers V1 to V3. Nodes in the same color are physically close to each other; H – heavy node, L – light node, Vi – virtual server]
Proximity-Aware VSA
• Use landmark clustering to generate proximity information, e.g., landmark vectors.
• Use space-filling curves (e.g., the Hilbert curve): landmark vectors → Hilbert numbers, which serve as DHT keys.
• Heavy nodes and light nodes each put their VSA information into the underlying DHT under the resulting DHT keys, aligning physical closeness with logical closeness.
• Each virtual server independently reports the VSA information that is mapped into its responsible region, rather than its own node's VSA information.
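The mapping from landmark vectors to Hilbert numbers can be sketched in two dimensions. This is a simplified illustration under stated assumptions: real landmark vectors have one coordinate per landmark, whereas this sketch uses exactly two; the grid order, the 400 ms latency cap, and the function names are hypothetical. `hilbert_index` follows the standard iterative coordinate-to-index conversion for a 2-D Hilbert curve.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def landmark_vector_to_key(vector_ms, order=4, max_ms=400.0):
    """Quantize a 2-landmark latency vector (ms) and return its Hilbert number."""
    n = 1 << order
    cells = [min(n - 1, int(v / max_ms * n)) for v in vector_ms]
    return hilbert_index(order, cells[0], cells[1])


# Nodes with similar latencies to both landmarks get nearby keys.
print(landmark_vector_to_key([20.0, 35.0]))
print(landmark_vector_to_key([22.0, 30.0]))
print(landmark_vector_to_key([380.0, 310.0]))
```

Because consecutive Hilbert numbers always belong to adjacent grid cells, nodes whose landmark vectors fall in nearby cells tend to publish their VSA information under nearby DHT keys.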
Proximity-Aware Virtual Server Assignment
• VSA happens earlier between physically closer nodes.
[Figure: heavy nodes H1 ... Hm+1 and light nodes L1 ... Ln+1, ordered by physical closeness, send VSA information (virtual servers V11, V12, V21, V31, V32, ..., Vm1, Vm2, Vm+1; labels C1 ... Cn+1) up to rendezvous points, which pair them using a best-fit heuristic; unpaired VSA information is passed on toward the final rendezvous point]
Experimental Setup
• A K-nary tree built on top of a DHT (Chord), with k = 2 and k = 8.
• Two node capacity distributions:
  • Gnutella-like capacity profile, with 5 capacity levels.
  • Zipf-like capacity profile.
• Two load distributions of virtual servers:
  • Gaussian distribution and Pareto distribution.
• Two transit-stub topologies (5,000 nodes):
  • “ts5k-large” and “ts5k-small”.
High Capacity Nodes Carry More Loads
[Figure: Gaussian load distribution + Gnutella-like capacity profile]
High Capacity Nodes Carry More Loads
[Figure: Pareto load distribution + Zipf-like capacity profile]
Proximity-Aware Load Balancing
• More loads are moved within shorter distances by proximity-aware load balancing.
[Figure: CDF of moved load distribution in ts5k-large. Panels: Gaussian load distribution and Gnutella-like capacity profile; Pareto load distribution and Zipf-like capacity profile]
Benefit of Proximity-Aware Scheme
• Load movement cost: LM(d) denotes the load moved over a distance of d hops.
• Benefit:
• Results:
  • For ts5k-large: B = 37-65%
  • For ts5k-small: B = 11-20%
Other Results
• Quantify the overhead of K-nary tree construction:
  • Link stress, node stress.
• The latencies of LBI aggregation and VSA are bounded by O(log N) time.
• The effect of the pairing threshold at rendezvous points.
Conclusions
• Current load balancing approaches using virtual servers have limitations:
  • Either they ignore node capacity heterogeneity,
  • Or they transfer loads without considering proximity relationships between nodes,
  • Or both.
• Our solution:
  • A fully decentralized, self-repairing, and fault-tolerant K-nary tree is built on top of DHTs to perform load balancing.
  • Nodes carry loads in proportion to their capacities.
  • The first work to address the load balancing issue in a proximity-aware manner, thereby minimizing the overhead of load movement and allowing more efficient load balancing.