Distributed Load Balancing for Key-Value Storage Systems • Imranul Hoque • Michael Spreitzer • Malgorzata Steinder
Key-Value Storage Systems • Usage: • Session state, tags, comments, etc. • Requirements: • Scalability • Fast response time • High availability & fault tolerance • Relaxed consistency guarantees • Examples: Cassandra, Dynamo, PNUTS, etc.
Load Balancing in K-V Storage • Hash partitioned vs. range partitioned • Range partitioned data ensures efficient range scan/search • Hash partitioned data helps even distribution [Figure: a table keyed by day of the week is split into tablets, which are placed on Servers 1-4]
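The contrast between the two partitioning schemes can be illustrated with a minimal sketch. The server names, the split points, and the helper names (`hash_partition`, `range_partition`, `range_scan_servers`) are all hypothetical and are not taken from the slides.

```python
import bisect
import hashlib

# Hypothetical cluster of four servers.
SERVERS = ["server1", "server2", "server3", "server4"]

# Hypothetical split points: keys < "g" go to server1, keys < "n" to server2, etc.
SPLIT_POINTS = ["g", "n", "t"]


def hash_partition(key, servers=SERVERS):
    """Place a key by a stable hash, which spreads keys evenly across servers."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def range_partition(key, splits=SPLIT_POINTS, servers=SERVERS):
    """Place a key by its position among the sorted split points."""
    return servers[bisect.bisect_right(splits, key)]


def range_scan_servers(lo, hi, splits=SPLIT_POINTS, servers=SERVERS):
    """Servers that hold keys in [lo, hi] under range partitioning."""
    first = bisect.bisect_right(splits, lo)
    last = bisect.bisect_right(splits, hi)
    return servers[first:last + 1]
```

Under range partitioning a scan touches only a contiguous run of servers, whereas under hash partitioning neighbouring keys may land anywhere, so a scan generally has to contact every server.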
Issues with Load Balancing • Uneven space distribution due to range partitioning • Solution: partition the tablets and move them around • A small number of very popular records [Figure: tablets SUN through SAT placed on Servers 1-4]
Contribution • Algorithms for solving the load balancing problem • Load = space, bandwidth • Evenly distribute the spare capacity • Distributed algorithm, not a centralized one • Reduce the number of moves • Previous solutions: • One dimensional/key-space redistribution/bulk loading
Outline • Motivation • System modeling and assumptions • Algorithms • One-to-one • One-to-n • Move suppression • Design decisions • Experimental results • Emulation of proposed distributed algorithms • Future work
System Modeling and Assumptions [Figure: a table split into tablets, each labeled with a load pair (B1, S1) ... (B5, S5), hosted on Servers A, B, and C with loads (BA, SA), (BB, SB), (BC, SC)] • Assumptions: 1. [...] <= 0.01 in both dimensions 2. # of tablets >> # of nodes
System State • Target zone: helps achieve convergence • Goal: move tablets around so that every server is within the target zone [Figure: servers plotted in the (B, S) plane, with the target point and the surrounding target zone]
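The goal stated above can be written as a simple check. This is a minimal sketch under two assumptions that the slides do not confirm: the target point is the average load over all servers, and the target zone is a square of half-width `half_width` around it. The function names are hypothetical.

```python
def target_point(loads):
    """Average (bandwidth, space) load over all servers (assumed target point)."""
    n = len(loads)
    avg_b = sum(b for b, _ in loads.values()) / n
    avg_s = sum(s for _, s in loads.values()) / n
    return (avg_b, avg_s)


def in_target_zone(load, target, half_width):
    """True if a server's load is within half_width of the target in both dimensions."""
    return (abs(load[0] - target[0]) <= half_width
            and abs(load[1] - target[1]) <= half_width)


def balanced(loads, half_width):
    """True if every server lies inside the (assumed square) target zone."""
    target = target_point(loads)
    return all(in_target_zone(load, target, half_width) for load in loads.values())
```

For example, with two servers at (0.2, 0.4) and (0.4, 0.2), `balanced` is false for a half-width of 0.05 and true for 0.15.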
Load Balancing Algorithms • Phase 1: • Global averaging scheme • Variance of the approximation of the average decreases exponentially fast • Phase 2: • One-to-one gossip • One-to-n gossip • Move suppression [Figure: timeline t alternating between Phase 1 and Phase 2]
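Phase 1 can be illustrated with a standard pairwise gossip averaging scheme. The slides do not say which averaging protocol is used, so the random pairing below is an assumption, and `gossip_average` is a hypothetical name.

```python
import random
from statistics import pvariance


def gossip_average(values, rounds, seed=0):
    """Run pairwise gossip averaging; return final estimates and variance per round."""
    rng = random.Random(seed)
    estimates = list(values)
    history = [pvariance(estimates)]
    for _ in range(rounds):
        # Pair nodes at random; with an odd node count one node sits the round out.
        order = list(range(len(estimates)))
        rng.shuffle(order)
        for i, j in zip(order[::2], order[1::2]):
            mean = (estimates[i] + estimates[j]) / 2
            estimates[i] = estimates[j] = mean  # both nodes adopt the pair's mean
        history.append(pvariance(estimates))
    return estimates, history
```

Each exchange preserves the sum of the estimates and never increases their variance, so every node's estimate drifts toward the global average without any central coordinator.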
One-to-One Gossip • Point selection strategy • Midpoint strategy • Greedy strategy • Tablet transfer strategy • Move to the selected point with minimum cost (space transferred)
Tablet Transfer Strategy [Figure: Server 1, Server 2, and the target for Server 1 plotted in the (B, S) plane]
Tablet Transfer Strategy (2) • Start with an empty bag • Goal: take vectors from the servers so that they add up to the target vector • If slope(bag + left + right) < slope(target): • Add right to bag, move right • Otherwise, add left to bag, move left [Figure: left and right candidate vectors at Server 1]
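One possible reading of the bag-filling rule is sketched below. Several details are assumptions, not stated on the slide: tablets are (bandwidth, space) vectors sorted by slope, `left` starts at the flattest vector and `right` at the steepest, and the loop stops once the bag reaches the target in either dimension. The name `fill_bag` is hypothetical.

```python
import math


def add(u, v):
    """Component-wise sum of two (bandwidth, space) vectors."""
    return (u[0] + v[0], u[1] + v[1])


def slope_less(u, v):
    """True if slope(u) < slope(v); cross-multiplied to avoid dividing by zero."""
    return u[1] * v[0] < v[1] * u[0]


def fill_bag(tablets, target):
    """Pick tablet vectors whose sum approaches the target vector."""
    # Assumed ordering: flattest vector first, steepest last.
    ordered = sorted(tablets, key=lambda t: math.atan2(t[1], t[0]))
    left, right = 0, len(ordered) - 1
    bag, total = [], (0.0, 0.0)
    # Assumed stop rule: the bag has reached the target in some dimension.
    while left <= right and total[0] < target[0] and total[1] < target[1]:
        probe = add(add(total, ordered[left]), ordered[right])
        if slope_less(probe, target):
            pick = ordered[right]  # too flat so far: take the steeper vector
            right -= 1
        else:
            pick = ordered[left]   # otherwise take the flatter vector
            left += 1
        bag.append(pick)
        total = add(total, pick)
    return bag
```

With tablets `[(2, 1), (1, 2), (2, 1), (1, 2)]` and target `(3, 3)`, this sketch picks `(2, 1)` and then `(1, 2)`, which sum exactly to the target. With coarser tablets the bag can overshoot in one dimension.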
Initial Configurations • Uniform • Two extreme • Mid quadrant
Point Selection Strategy • Midpoint strategy • + Guaranteed convergence • + No need to run phase 1 • - Lots of extra movement • Visualization demo • Uniform • Two extreme • Mid quadrant [Figure: Server 1 and Server 2 plotted in the (B, S) plane]
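The midpoint strategy can be sketched in a few lines, assuming each server's load is a (bandwidth, space) pair and that both gossiping servers aim for the midpoint of their two load points. The helper names are hypothetical.

```python
def midpoint(p1, p2):
    """Point halfway between two servers' (bandwidth, space) loads."""
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)


def transfer_vector(sender, goal):
    """Net load the sender must hand over to land on the goal point.

    A negative component means the net flow in that dimension goes the other way.
    """
    return (sender[0] - goal[0], sender[1] - goal[1])
```

For servers at (0.75, 0.25) and (0.25, 0.5), the midpoint is (0.5, 0.375), so the first server must shed (0.25, -0.125). Because the pair always moves regardless of where the global target lies, no estimate of the average is needed, at the price of extra movement.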
Point Selection Strategy (2) • Greedy strategy • Take the point closer to the target • Move it to the target, if this • improves the position of the other point • does not worsen it by more than δ • Reduces movement • Takes a long time to converge in some cases [Figure: Server 1 and Server 2]
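A minimal sketch of the greedy rule follows. It assumes Euclidean distance in the (bandwidth, space) plane and that the total load of the pair is conserved, so the other server absorbs whatever the nearer server sheds. `greedy_step` is a hypothetical name.

```python
import math


def dist(p, q):
    """Euclidean distance between two (bandwidth, space) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def greedy_step(p1, p2, target, delta):
    """Return the new (p1, p2) after a greedy move, or None if it is rejected."""
    p1_is_near = dist(p1, target) <= dist(p2, target)
    near, far = (p1, p2) if p1_is_near else (p2, p1)
    # The near point jumps to the target; the far point absorbs the difference.
    new_far = (far[0] + near[0] - target[0], far[1] + near[1] - target[1])
    worsening = dist(new_far, target) - dist(far, target)
    if worsening > delta:
        return None  # the other point would get worse by more than delta
    return (target, new_far) if p1_is_near else (new_far, target)
```

An improvement of the other point shows up as a negative `worsening`, so a single comparison against `delta` covers both acceptance conditions. A rejected move is where a fallback would take over.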
DHT + Midpoint • Greedy + fallback to DHT: • Convergence problem exists for some configurations • Visualization Demo • Solution: • Greedy + fallback to DHT with Midpoint • Demo: uniform, two extreme, mid quadrant • Alternate approach: • Greedy + fallback to Midpoint • Trade-off: movement cost vs. DHT overhead
Experimental Evaluation • Uniform configuration • Greedy + DHT (Midpoint) • Midpoint • Greedy + Midpoint (No DHT) • Effect of varying target zone • Effect of failed gossip count • Metrics • Amount of space moved • # of gossip rounds • Multiple tablet move
Effect of Varying Target Zone • Larger target zone = fast convergence, less accuracy • Target zone width should depend on the target point value
Effect of Failed Gossip Count (Greedy) • Large failed gossip count = more time in greedy mode, more unproductive gossip at the end
One-to-N Gossip • Contact a few random nodes • Locked/unlocked mode • Pick the most profitable one • Distance from the target is minimized • Advantage • Better choices • Initial results • Locked mode: may lead to deadlock • Unlocked mode: in most cases the other nodes start a transfer
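The partner choice in one-to-n gossip might look like the sketch below. How "most profitable" is scored is an assumption: here each contacted peer is rated by how close a midpoint exchange with it would leave this node to the target. Locking is not modeled, and `one_to_n_choice` is a hypothetical name.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two (bandwidth, space) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def midpoint(p1, p2):
    """Point halfway between two load points."""
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)


def one_to_n_choice(me, peers, target, k, rng=random):
    """Contact k random peers and return the name of the most profitable one."""
    contacted = rng.sample(sorted(peers), min(k, len(peers)))
    # Assumed profit measure: distance to the target after a midpoint exchange.
    return min(contacted, key=lambda name: dist(midpoint(me, peers[name]), target))
```

For a node at (1.0, 1.0) with target (0.5, 0.5), a peer at (0.0, 0.0) is preferred over peers at (0.5, 0.5) or (1.0, 0.0), because the exchange lands this node exactly on the target.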
Move Suppression • Two global stages • Stage 1: • One-to-One gossip, but moves are hypothetical • Stage 2: • Change to chosen placement • Advantage • Tablet not moved multiple times • Challenges • When to switch to Stage 2 from Stage 1
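The two stages can be sketched as bookkeeping followed by a diff. The data layout (a tablet-to-server map and a list of hypothetical moves) and the name `plan_moves` are assumptions for illustration. The sketch does not address when to switch stages.

```python
def plan_moves(initial, hypothetical_moves):
    """Collapse a sequence of hypothetical moves into one physical move per tablet.

    initial: dict mapping tablet -> server it currently lives on.
    hypothetical_moves: list of (tablet, destination) pairs decided during Stage 1.
    Returns a dict mapping tablet -> (source, destination) for Stage 2.
    """
    placement = dict(initial)
    for tablet, destination in hypothetical_moves:
        placement[tablet] = destination  # Stage 1: record only, move no data
    # Stage 2: move each tablet at most once, straight to its final server.
    return {
        tablet: (initial[tablet], placement[tablet])
        for tablet in initial
        if placement[tablet] != initial[tablet]
    }
```

For example, if tablet t1 hops A to B to C and tablet t2 hops A to B and back to A during Stage 1, the plan contains the single move t1: A to C, and t2 is never transferred.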
Future Work • Handling initial placement • Frequency of running the placement algorithm • Considering the network hierarchy • Handling failures • Extending to heterogeneous resources • Questions?