Volley: Automated Data Placement for Geo-Distributed Cloud Services
Why is data placement important? • Users want lower latency • Cloud service operators want to limit cost • Both depend on how data is partitioned across datacenters (DCs)
Commercial cloud service trace analysis • Live Messenger • Live Mesh • Traces cover all users and devices that accessed these services over an entire month • Clients are identified by application-level unique identifiers
Challenge: Data Inter-dependency • Data inter-dependency in Live Mesh
Challenge: Datacenter Capacity • The rush in industry to build additional datacenters is motivated in part by individual datacenters reaching their capacity constraints as new users are added. This in turn requires automatic mechanisms to rapidly migrate application data to new datacenters to take advantage of their capacity.
Volley Algorithm • Three phases
Data placement heuristics • Common IP: put data close to the IP address that accesses it most frequently • oneDC: put all data in one datacenter • Hash: randomly allocate data • Volley
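The three baseline heuristics above can be sketched in a few lines. This is only an illustrative sketch, not the implementation evaluated in the paper: the datacenter names, the function names, and the `nearest_dc` lookup table are all hypothetical, and the Hash heuristic is modelled here as a stable hash of the item identifier.

```python
import hashlib
from collections import Counter

# Hypothetical datacenter names, for illustration only.
DATACENTERS = ["dc-us-west", "dc-us-east", "dc-europe", "dc-asia"]

def place_one_dc(item_id, datacenters=DATACENTERS):
    """oneDC: every item goes to the same datacenter."""
    return datacenters[0]

def place_hash(item_id, datacenters=DATACENTERS):
    """Hash: spread items across datacenters using a stable hash of the id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return datacenters[int.from_bytes(digest[:8], "big") % len(datacenters)]

def place_common_ip(access_ips, nearest_dc):
    """Common IP: place the item near the IP that accesses it most often.

    access_ips: list of client IPs, one entry per access.
    nearest_dc: assumed lookup table from IP to its closest datacenter.
    """
    most_common_ip, _count = Counter(access_ips).most_common(1)[0]
    return nearest_dc[most_common_ip]
```

For example, an item accessed twice from one IP and once from another would be placed at the datacenter that `nearest_dc` maps the first IP to.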
Evaluation Metrics • Capacity skew • Inter-datacenter traffic • Latency
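To make the three metrics concrete, here is a rough sketch of how they might be computed from a placement. The definitions are assumptions for illustration, not the paper's exact formulas: capacity skew is taken as the ratio of the most-loaded to the least-loaded datacenter (by item count), inter-datacenter traffic as the fraction of dependent item pairs split across datacenters, and latency as a nearest-rank percentile over per-request latencies.

```python
import math
from collections import Counter

def capacity_skew(placement, datacenters):
    """Assumed definition: max load / min load, counting items per datacenter."""
    counts = Counter(placement.values())
    loads = [counts.get(dc, 0) for dc in datacenters]
    if min(loads) == 0:
        return float("inf")  # at least one datacenter holds nothing
    return max(loads) / min(loads)

def inter_dc_fraction(placement, dependencies):
    """Fraction of dependent (item, item) pairs placed in different datacenters."""
    if not dependencies:
        return 0.0
    crossing = sum(1 for a, b in dependencies if placement[a] != placement[b])
    return crossing / len(dependencies)

def latency_percentile(latencies_ms, percentile):
    """Nearest-rank percentile of a non-empty list of request latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(percentile / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

Under these assumed definitions, lower is better for all three values.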
Capacity Skew • Hash > Volley > Common IP > oneDC
Inter-datacenter Traffic • oneDC > Volley > Common IP > Hash
Latency • Volley > Common IP > oneDC > Hash
Evaluation (rankings listed best to worst) • Capacity skew: Hash > Volley > Common IP > oneDC • Inter-DC traffic: oneDC > Volley > Common IP > Hash • Latency: Volley > Common IP > oneDC > Hash
Improvement of Volley • Iteration count: in Phase 2, additional iterations bring no significant improvement; 5 iterations are enough • Phase 3 determines the capacity skew • Re-computation: does make sense • Reason: data migration
Conclusion • Data placement is vital for cloud services • Volley has a comprehensive advantage: it simultaneously reduces user latency and operator cost • Reduces datacenter capacity skew by over 2× • Reduces inter-DC traffic by over 1.8× • Reduces user latency by 30% at the 75th percentile • Runs in under 16 clock-hours for 400 machine-hours of computation across 1 week of traces • Re-computation of the Volley algorithm is necessary
Let’s go on... • Limitations of the evaluation conducted in the paper • No strong baseline for comparison • Can geographic distance stand in for latency? • Client mobility? • Much room for further development
Phase 1: Calculate the geographic centroid for each data item
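One simple way to picture Phase 1 is as an access-weighted average of client locations on the globe. The sketch below is an approximation chosen for brevity, not necessarily the paper's exact procedure: it converts each (latitude, longitude) to a 3D unit vector, takes the weighted sum, and projects back onto the sphere. The function names and the `(lat, lon, weight)` input format are assumptions.

```python
import math

def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def to_latlon(x, y, z):
    """Convert a 3D vector (any non-zero length) back to degrees."""
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

def weighted_centroid(accesses):
    """accesses: list of (lat_deg, lon_deg, weight), e.g. weight = access count."""
    sx = sy = sz = 0.0
    for lat, lon, weight in accesses:
        x, y, z = to_xyz(lat, lon)
        sx += weight * x
        sy += weight * y
        sz += weight * z
    if math.sqrt(sx * sx + sy * sy + sz * sz) < 1e-12:
        raise ValueError("accesses cancel out; centroid is undefined")
    return to_latlon(sx, sy, sz)
```

Working with unit vectors avoids the wrap-around problem at the ±180° meridian that a naive average of longitudes would have.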
Phase 2: Iteratively refine the centroid for each data item • Considers client locations and data inter-dependencies • Uses a weighted spring model that attracts data items to each other and to their clients, but on a spherical coordinate system
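The spring-model idea can be illustrated with a simple relaxation loop. This is a hedged sketch under assumed details rather than the paper's update rule: on each iteration, every data item moves to the weighted mean (on the unit sphere) of its own current position and the positions of everything it is connected to, where fixed clients and other movable items both pull on it. The `inertia` parameter, the edge format, and all names are hypothetical.

```python
import math

def _to_xyz(lat_deg, lon_deg):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def _to_latlon(x, y, z):
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))

def refine(items, clients, edges, iterations=5, inertia=1.0):
    """Pull each item toward its clients and its dependent items.

    items:   dict item_id -> (lat, lon), the initial (Phase 1) locations
    clients: dict client_id -> (lat, lon), fixed locations
    edges:   list of (item_id, other_id, weight); other_id is a client or an item
    inertia: assumed weight on the item's current position (damps movement)
    """
    locations = dict(items)
    for _ in range(iterations):
        updated = {}
        for item_id, (lat, lon) in locations.items():
            x, y, z = _to_xyz(lat, lon)
            sx, sy, sz = inertia * x, inertia * y, inertia * z
            for a, b, weight in edges:
                if a == item_id:
                    other = b
                elif b == item_id:
                    other = a
                else:
                    continue
                pos = locations[other] if other in locations else clients[other]
                ox, oy, oz = _to_xyz(*pos)
                sx, sy, sz = sx + weight * ox, sy + weight * oy, sz + weight * oz
            if math.sqrt(sx * sx + sy * sy + sz * sz) < 1e-12:
                updated[item_id] = (lat, lon)  # pulls cancel out; stay put
            else:
                updated[item_id] = _to_latlon(sx, sy, sz)
        locations = updated  # all items move simultaneously
    return locations
```

The default of 5 iterations mirrors the observation that a handful of iterations is enough; with one item and one equally weighted client, each iteration halves the remaining angular distance between them.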