
Volley: Automated Data Placement for Geo-Distributed Cloud Services


glain

Presentation Transcript


  1. Volley: Automated Data Placement for Geo-Distributed Cloud Services

  2. Why is data placement important? • Users want lower latency • Cloud service operators want to limit cost • Partitioning data across datacenters (DCs)

  3. Commercial cloud service trace analysis • Live Messenger • Live Mesh • Traces cover all users and devices that accessed these services over the entire trace month • Clients are identified by application-level unique identifiers

  4. Challenge of data placement

  5. Challenge: Geographic Diversity

  6. Challenge: Data Sharing

  7. Challenge: Data Inter-dependency • Data inter-dependency in Live Mesh

  8. Challenge: Datacenter Capacity • The industry rush to build additional datacenters is motivated in part by individual datacenters reaching their capacity constraints as new users are added. This in turn requires automatic mechanisms that rapidly migrate application data to new datacenters to take advantage of their capacity.

  9. Challenge: User Mobility

  10. Proven algorithms do not apply to this problem

  11. Volley

  12. Volley Algorithm • Three phases

  13. Data placement heuristics • Common IP: put data close to the IP address that accesses it most frequently • oneDC: put all data in one datacenter • Hash: randomly allocate data • Volley
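
The three baseline heuristics on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper: the function names, the `ip_to_datacenter` lookup table, and the use of a SHA-256 hash as the source of "random" allocation are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def one_dc(item_id, datacenters):
    """oneDC: every item goes to the same datacenter (here, the first one)."""
    return datacenters[0]


def hash_placement(item_id, datacenters):
    """Hash: spread items pseudo-randomly by hashing the item id.

    Assumption: a stable hash stands in for random allocation, so the
    same item always maps to the same datacenter.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(datacenters)
    return datacenters[index]


def common_ip(access_ips, ip_to_datacenter):
    """Common IP: place the item near the IP that accesses it most often.

    `access_ips` is the list of client IPs seen in the item's access log;
    `ip_to_datacenter` is a hypothetical table mapping each IP to its
    closest datacenter.
    """
    most_common_ip, _count = Counter(access_ips).most_common(1)[0]
    return ip_to_datacenter[most_common_ip]
```

For example, `common_ip(["10.0.0.1", "10.0.0.2", "10.0.0.1"], {"10.0.0.1": "dc-east", "10.0.0.2": "dc-west"})` picks `"dc-east"`, because `10.0.0.1` appears most often in the log.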

  14. Evaluation Metrics • Capacity skew • Inter-datacenter traffic • Latency

  15. Capacity Skew • Hash > Volley > Common IP > oneDC

  16. Inter-datacenter Traffic • oneDC > Volley > Common IP > Hash

  17. Latency • Volley > Common IP > oneDC > Hash

  18. Evaluation (heuristics ranked best to worst) • Capacity skew: Hash > Volley > Common IP > oneDC • Inter-DC traffic: oneDC > Volley > Common IP > Hash • Latency: Volley > Common IP > oneDC > Hash

  19. Improvement of Volley • Iteration count: in Phase 2, additional iterations bring no significant improvement; 5 iterations are enough; Phase 3 determines the capacity skew • Re-computation: does make sense; reason: data migration

  20. Conclusion • Data placement is vital in cloud services • Volley has a comprehensive advantage: it simultaneously reduces user latency and operator cost; reduces datacenter capacity skew by over 2X; reduces inter-DC traffic by over 1.8X; reduces user latency by 30% at the 75th percentile; runs in under 16 clock-hours for 400 machine-hours of computation across 1 week of traces • Re-computation of the Volley algorithm is necessary

  21. Let’s go on… • Limitations of the evaluation conducted in the paper • No good contrast • Can geo-distance stand for latency? • Client mobility? • Large space for development

  22. Thank You!

  23. Phase 1: Calculate the geographic centroid for each data item
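
As a rough illustration of Phase 1, a weighted centroid of client locations on a sphere can be computed as below. This sketch swaps in a simple technique (average the 3D unit vectors, then project back onto the sphere) and is not claimed to be the exact procedure from the paper. The function name and the `(lat, lon, weight)` input format are assumptions, with the weight standing for how often a client at that location accessed the item.

```python
import math


def geographic_centroid(points):
    """Weighted centroid of points on a sphere.

    points: list of (lat_deg, lon_deg, weight) tuples.
    Returns (lat_deg, lon_deg) of the centroid.
    """
    x = y = z = 0.0
    for lat, lon, weight in points:
        phi, lam = math.radians(lat), math.radians(lon)
        # Convert each location to a 3D unit vector and accumulate,
        # scaled by its access weight.
        x += weight * math.cos(phi) * math.cos(lam)
        y += weight * math.cos(phi) * math.sin(lam)
        z += weight * math.sin(phi)
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("centroid is undefined: weighted locations cancel out")
    # Project the averaged vector back onto the sphere surface.
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```

Averaging in 3D rather than averaging raw latitude and longitude avoids the wrap-around problem at the ±180° meridian, which matters when the clients of one item sit on both sides of the Pacific.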

  24. Phase 2: Refine the centroid for each data item iteratively • considering client locations and data inter-dependencies • using a weighted spring model that attracts data items, but on a spherical coordinate system
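
The spring-model idea in Phase 2 can be sketched as an iterative relaxation on the unit sphere. This is a simplified stand-in, not the update rule from the paper: each data item is pulled toward the weighted average of the clients and other data items it communicates with, then re-projected onto the sphere. The function names, the `edges` format, and the `step` parameter are hypothetical; the default of 5 iterations follows the iteration-count slide.

```python
import math


def to_vec(lat, lon):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def to_latlon(v):
    """Convert a 3D unit vector back to latitude/longitude in degrees."""
    x, y, z = v
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return lat, math.degrees(math.atan2(y, x))


def refine(items, clients, edges, iterations=5, step=0.5):
    """Pull each data item toward the things it communicates with.

    items:   {item_id: (lat, lon)} movable data items (e.g. Phase 1 output)
    clients: {client_id: (lat, lon)} fixed client locations
    edges:   list of (item_id, other_id, weight); `other_id` may be a
             client or another data item (a data inter-dependency)
    step:    fraction of the way an item moves toward its pull per iteration
    """
    pos = {k: to_vec(*p) for k, p in items.items()}
    fixed = {k: to_vec(*p) for k, p in clients.items()}
    for _ in range(iterations):
        new_pos = {}
        for item, v in pos.items():
            pull, total = [0.0, 0.0, 0.0], 0.0
            for a, b, weight in edges:
                if a == item:
                    other = b
                elif b == item:
                    other = a
                else:
                    continue
                target = pos[other] if other in pos else fixed[other]
                for i in range(3):
                    pull[i] += weight * target[i]
                total += weight
            if total == 0.0:
                new_pos[item] = v  # no communication partners: stay put
                continue
            blended = [(1 - step) * v[i] + step * pull[i] / total
                       for i in range(3)]
            norm = math.sqrt(sum(c * c for c in blended))
            # Re-project onto the sphere; keep the old position if the
            # blend collapses to the origin (exactly antipodal pull).
            new_pos[item] = tuple(c / norm for c in blended) if norm else v
        pos = new_pos
    return {k: to_latlon(v) for k, v in pos.items()}
```

With one item at (0°, 0°) and a single client at (0°, 90°), one iteration at `step=0.5` moves the item to the midpoint of the great-circle arc, and further iterations keep closing the gap, which matches the intuition of a spring contracting toward the client.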
