Replication Degree Customization for High Availability

Presentation Transcript


  1. Replication Degree Customization for High Availability
     Ming Zhong (Google), Kai Shen and Joel Seiferas (University of Rochester)

  2. Problem Context
     • Replication degree: a tradeoff between availability and space
     • Skewed data popularity distributions
        • intuitive to highly replicate popular objects
        • goal: improve availability under a given space constraint
     • Should we worry about space constraints today?
        • decentralized wide-area systems: high machine failure/inaccessibility rates demand high-degree replication
        • 0.265 failure rate from a PlanetLab machine accessibility trace
        • centrally-managed local-area clusters: data often must be kept in memory, and memory cost is relatively high
     EuroSys 2008
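To make the 0.265 figure concrete, here is a minimal sketch that counts how many replicas an object needs to reach a target unavailability. It assumes independent machine failures, so that k replicas are all unavailable with probability p^k. The function name `replicas_needed` and the 0.001 target are illustrative assumptions, not taken from the slides.

```python
def replicas_needed(p: float, target_unavailability: float) -> int:
    """Smallest replica count k such that p**k <= target_unavailability.

    Assumes each replica fails independently with probability p
    (an assumption of this sketch).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target_unavailability < 1.0:
        raise ValueError("target_unavailability must be strictly between 0 and 1")
    k = 1
    # Add replicas one at a time until the target is met.
    while p ** k > target_unavailability:
        k += 1
    return k


if __name__ == "__main__":
    # Failure rate quoted on the slide; the 0.001 target is hypothetical.
    print(replicas_needed(0.265, 0.001))
```

Under these assumptions, a per-machine failure rate of 0.265 needs 6 replicas to push unavailability below 0.001, whereas a failure rate of 0.01 would need only 2.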

  3. Related Work
     • uniform-degree replication for availability or durability
        • Farsite, CFS, GFS, Glacier, IrisStore, MOAT
     • simple adaptive approach in some production systems
        • higher replication for the most popular objects; lower replication for the rest
     • other uses of skewed data object popularity
        • fast search/lookup: Cohen et al., Beehive
     • other ways to improve data availability in distributed systems
        • replica placement: Farsite
        • erasure coding: Reed-Solomon

  4. Basic Analytical Result
     • Problem formulation:
        • $p$ is the machine failure probability ⇒ the unavailability of an object with $k$ replicas is $p^k$
        • a system with $n$ objects: object $i$ has popularity $r_i$ and size $s_i$
        • find object replication degrees $k_1, k_2, \ldots, k_n$ that minimize the expected unavailability $\sum_{1 \le i \le n} r_i \cdot p^{k_i}$
        • subject to the space constraint $\sum_{1 \le i \le n} s_i \cdot k_i \le K$
     • Result:
        • Lagrangian function: $\sum_{1 \le i \le n} r_i \cdot p^{k_i} + \lambda \cdot \left( \sum_{1 \le i \le n} s_i \cdot k_i - K \right)$
        • the optimum is reached when the function's partial derivatives with respect to the $k_i$'s and $\lambda$ are all zero
        • therefore $k_i = C + \log_{1/p}(r_i / s_i)$, where $C$ is a constant
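The step from the Lagrangian to the closed form is not spelled out on the slide. A sketch of the omitted algebra follows, assuming the $k_i$ are treated as continuous variables and the space constraint is tight at the optimum (both are assumptions of this sketch):

```latex
\begin{align*}
\frac{\partial}{\partial k_i}\Bigl[\sum_{1 \le j \le n} r_j \, p^{k_j}
  + \lambda \Bigl(\sum_{1 \le j \le n} s_j k_j - K\Bigr)\Bigr]
  &= r_i \, p^{k_i} \ln p + \lambda s_i = 0 \\
\Rightarrow\quad p^{k_i} &= \frac{\lambda \, s_i}{r_i \ln(1/p)} \\
\Rightarrow\quad k_i &= \log_{1/p}\!\frac{r_i}{s_i}
  + \underbrace{\log_{1/p}\!\frac{\ln(1/p)}{\lambda}}_{C} \\
\sum_{1 \le i \le n} s_i k_i = K
\quad\Rightarrow\quad
C &= \frac{K - \sum_{1 \le i \le n} s_i \log_{1/p}(r_i/s_i)}{\sum_{1 \le i \le n} s_i}
\end{align*}
```

The constant $C$ is the same for every object because it depends only on $p$ and $\lambda$; the last line fixes its value from the space budget $K$.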

  5. Challenges in Systems Context
     • Basic analytical result:
        • optimal object replication degree $k_i = C + \log_{1/p}(r_i / s_i)$
        • $p$ is the machine failure probability; $r_i$ is the object popularity; $s_i$ is the object size
     • Systems issues:
        • complex system models: multi-object operations, nonuniform machine failure rates
        • realistic workload behaviors: skewness and stability of object popularity
        • maintenance overhead: replica creation/deletion under dynamic system changes
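The closed-form result can be evaluated directly once the constant C is chosen so that the space budget is used exactly. The sketch below does that for the continuous solution only: it does not round degrees to integers or enforce a minimum of one replica, and the function names and sample numbers are hypothetical.

```python
import math


def optimal_degrees(popularity, sizes, p, space_budget):
    """Continuous degrees k_i = C + log_{1/p}(r_i / s_i).

    C is chosen so that sum(s_i * k_i) equals space_budget.
    No rounding or clamping is applied (a simplification of this sketch).
    """
    base = math.log(1.0 / p)
    # Per-object offset log_{1/p}(r_i / s_i).
    offsets = [math.log(r / s) / base for r, s in zip(popularity, sizes)]
    weighted = sum(s * o for s, o in zip(sizes, offsets))
    c = (space_budget - weighted) / sum(sizes)
    return [c + o for o in offsets]


def expected_unavailability(popularity, degrees, p):
    """Sum of r_i * p**k_i over all objects."""
    return sum(r * p ** k for r, k in zip(popularity, degrees))


if __name__ == "__main__":
    # Hypothetical workload: three unit-size objects with skewed popularity.
    r = [0.7, 0.2, 0.1]
    s = [1.0, 1.0, 1.0]
    ks = optimal_degrees(r, s, p=0.1, space_budget=9.0)
    print(ks)
    print(expected_unavailability(r, ks, 0.1))
    print(expected_unavailability(r, [3, 3, 3], 0.1))  # uniform baseline
```

With equal sizes, the more popular objects receive higher degrees, and the expected unavailability comes out lower than for uniform three-way replication using the same total space.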

  6. Complex System Models
     • The basic analytical result can be adapted to handle complex system models.
     • Multi-object operations
        • the unavailability of a multi-object operation is approximately the sum of the unavailabilities of the individual objects
        • when the $x_i$'s are small, $1 - \prod_{1 \le i \le m} (1 - x_i) \approx \sum_{1 \le i \le m} x_i$
     • Nonuniform machine failure rates
        • redefine $p$ as the average per-machine failure rate
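The sum approximation for multi-object operations is easy to check numerically. The sketch below compares the exact expression with the sum, assuming the objects fail independently; the function names and sample values are hypothetical.

```python
import math


def exact_unavailability(xs):
    """1 - prod(1 - x_i): the operation fails if any object is unavailable.

    Assumes the objects are unavailable independently of one another.
    """
    return 1.0 - math.prod(1.0 - x for x in xs)


def approx_unavailability(xs):
    """First-order approximation: the sum of per-object unavailabilities."""
    return sum(xs)


if __name__ == "__main__":
    small = [0.001, 0.002, 0.003]  # hypothetical per-object unavailabilities
    print(exact_unavailability(small), approx_unavailability(small))
    large = [0.3, 0.4]  # the approximation degrades when x_i are not small
    print(exact_unavailability(large), approx_unavailability(large))
```

The sum never underestimates the exact value (it is the union bound), and the gap shrinks as the individual unavailabilities get smaller.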

  7. Object Popularity/Size Skewness
     • Popularity to size skewness correlation
     • Trace-driven examination:
     • The most popular objects are not necessarily subject to the most replication

  8. Object Popularity Stability
     • Stable object popularity is important for learning the object popularity and for low adjustment overhead
     • Illustration of stability across month-long trace segments:
     • Not designed to handle flash crowds

  9. Dynamic Maintenance Overhead
     • System adaptation may require dynamic maintenance
        • object popularity may change over time
     • Low maintenance overhead due to
        • stable object popularities
        • stable replica assignment in the analytical result
           • object replication degree $k_i = C + \log_{1/p}(r_i / s_i)$
        • coarse-grain adaptation
           • e.g., support only two replication degrees (low/high)
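One way to picture the coarse-grain, two-degree variant is sketched below. It gives every object the low degree and then upgrades objects to the high degree in decreasing order of popularity per unit size while the space budget allows. This upgrade rule, the function names, and the sample numbers are assumptions of the sketch; the slides do not specify how the two levels are assigned.

```python
def two_level_degrees(popularity, sizes, low, high, space_budget):
    """Assign each object either `low` or `high` replicas within space_budget."""
    n = len(sizes)
    degrees = [low] * n
    used = low * sum(sizes)
    if used > space_budget:
        raise ValueError("space budget cannot hold the low degree for all objects")
    # Rank by popularity/size, the quantity inside the logarithm of the result.
    order = sorted(range(n), key=lambda i: popularity[i] / sizes[i], reverse=True)
    for i in order:
        extra = (high - low) * sizes[i]
        if used + extra <= space_budget:
            degrees[i] = high
            used += extra
    return degrees


def maintenance_cost(old_degrees, new_degrees):
    """Return (replica creations, replica deletions) between two assignments."""
    created = sum(max(b - a, 0) for a, b in zip(old_degrees, new_degrees))
    deleted = sum(max(a - b, 0) for a, b in zip(old_degrees, new_degrees))
    return created, deleted


if __name__ == "__main__":
    sizes = [1, 1, 1, 1]
    week1 = two_level_degrees([0.50, 0.30, 0.15, 0.05], sizes, 2, 4, 12)
    week2 = two_level_degrees([0.45, 0.32, 0.17, 0.06], sizes, 2, 4, 12)
    print(week1, week2, maintenance_cost(week1, week2))
```

In this hypothetical example the popularity values drift slightly between the two weeks but the ranking does not change, so the assignment stays the same and no replicas are created or deleted.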

  10. Trace-driven Evaluation: Availability Improvement
      • Availability improvement on four real application traces and two machine failure patterns
      • Compared to uniform replication under the same space constraint:
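The comparison against uniform replication can be illustrated on a toy workload. The sketch below is not the evaluation method from the talk: it uses a simple greedy heuristic (a swapped-in technique) to pick integer degrees, adding one replica at a time to the object with the largest unavailability reduction per unit of space. All names and numbers are hypothetical, and only the 0.265 failure rate is taken from the slides.

```python
import heapq


def greedy_degrees(popularity, sizes, p, space_budget):
    """Integer replication degrees chosen greedily within space_budget.

    Every object starts with one replica. Each step adds a replica to the
    object with the highest gain r_i * p**k_i * (1 - p) / s_i that still fits.
    This is a heuristic sketch, not a provably optimal allocation.
    """
    n = len(sizes)
    degrees = [1] * n
    used = sum(sizes)
    if used > space_budget:
        raise ValueError("space budget cannot hold one replica of every object")
    heap = [(-popularity[i] * p * (1 - p) / sizes[i], i) for i in range(n)]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        if used + sizes[i] > space_budget:
            continue  # this object no longer fits, so drop it from the heap
        degrees[i] += 1
        used += sizes[i]
        gain = popularity[i] * p ** degrees[i] * (1 - p) / sizes[i]
        heapq.heappush(heap, (-gain, i))
    return degrees


def expected_unavailability(popularity, degrees, p):
    """Sum of r_i * p**k_i over all objects."""
    return sum(r * p ** k for r, k in zip(popularity, degrees))


if __name__ == "__main__":
    r, s, p = [0.7, 0.2, 0.1], [1, 1, 1], 0.265
    custom = greedy_degrees(r, s, p, space_budget=9)
    print(custom, expected_unavailability(r, custom, p))
    print([3, 3, 3], expected_unavailability(r, [3, 3, 3], p))
```

On this toy input the heuristic assigns degrees [4, 3, 2], which uses the same nine units of space as uniform three-way replication while giving a lower expected unavailability.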

  11. Trace-driven Evaluation: Changing Space Constraint
      • Results on the PlanetLab machine failure pattern
      • Availability improvement is independent of the space limit
      • High replication is needed for some decentralized wide-area systems

  12. Trace-driven Evaluation: Dynamic Maintenance Overhead
      • Overhead of weekly changes in replication degree
         • number of replica creations/deletions
         • size of replica creations

  13. Conclusion
      • Results
         • analytical result: replication is optimal when each object's replication degree is linear in log(popularity/size)
         • addresses systems issues in complex system models, realistic workload behaviors, and maintenance overhead
      • Big picture: skewed and stable data distributions motivate per-object adaptation in distributed system management
         • adapt replication degree for high availability [this paper]
         • adapt the number of Bloom filter hash functions for a low false-positive rate [other result]
         • adapt co-placement of correlated data objects for fast multi-object operations [other result]