Volley: Automated Data Placement for Geo-Distributed Cloud Services

Why is data placement important? • Users want lower latency • Cloud service operators want to limit cost • Data is partitioned across datacenters (DCs)

Commercial cloud service trace analysis • Live Messenger • Live Mesh • Traces cover all users and devices that accessed these services over one entire month • Clients are identified by application-level unique identifiers

Challenge of data placement

Challenge: Geographic Diversity

Challenge: Data Sharing

Challenge: Data Inter-dependency • Data inter-dependency in Live Mesh

Challenge: Datacenter Capacity • The rush in industry to build additional datacenters is motivated in part by individual datacenters reaching their capacity constraints as new users are added. This in turn requires automatic mechanisms to rapidly migrate application data to new datacenters to take advantage of their capacity.

Challenge: User Mobility

Proven algorithms do not apply to this problem

Volley

Volley Algorithm • Three phases

Data placement heuristics • Common IP: put data close to the IP address that accesses it most frequently • oneDC: put all data in one datacenter • Hash: allocate data randomly across datacenters • Volley
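The three baseline heuristics above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the datacenter coordinates, and the use of great-circle distance to pick the "closest" datacenter are hypothetical, and client IP addresses are assumed to have already been geolocated to (latitude, longitude) pairs.

```python
import hashlib
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def place_one_dc(datacenters):
    """oneDC: every item goes to the same datacenter (here: first by name)."""
    return sorted(datacenters)[0]


def place_hash(item_id, datacenters):
    """Hash: spread items pseudo-randomly by hashing the item identifier."""
    names = sorted(datacenters)
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]


def place_common_ip(access_locations, datacenters):
    """Common IP: pick the datacenter nearest the most frequent client location."""
    most_common_location, _count = Counter(access_locations).most_common(1)[0]
    return min(datacenters,
               key=lambda name: haversine_km(datacenters[name], most_common_location))


# Hypothetical datacenter locations as (lat, lon) in degrees.
DATACENTERS = {
    "us-west": (47.6, -122.3),
    "europe": (53.3, -6.3),
    "asia": (1.35, 103.8),
}

# One item accessed twice from Paris and once from San Francisco.
ACCESSES = [(48.8, 2.3), (48.8, 2.3), (37.7, -122.4)]
```

With these inputs, `place_common_ip(ACCESSES, DATACENTERS)` selects the European datacenter because the Paris location dominates the access log, while `place_hash` ignores location entirely.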

Evaluation Metrics • Capacity skew • Inter-datacenter traffic • Latency

Capacity Skew • Hash > Volley > Common IP > oneDC

Inter-datacenter Traffic • oneDC > Volley > Common IP > Hash

Latency • Volley > Common IP > oneDC > Hash

Evaluation (">" means "performs better than") • Capacity skew: Hash > Volley > Common IP > oneDC • Inter-DC traffic: oneDC > Volley > Common IP > Hash • Latency: Volley > Common IP > oneDC > Hash

Improvement of Volley • Iteration count: in phase 2, additional iterations bring no significant improvement; 5 iterations are enough; phase 3 determines the capacity skew • Re-computation: does make sense; reason: data migration

Conclusion • Data placement is vital in cloud services • Volley has a comprehensive advantage: it simultaneously reduces user latency and operator cost • Reduces datacenter capacity skew by over 2X • Reduces inter-DC traffic by over 1.8X • Reduces user latency by 30% at the 75th percentile • Runs in under 16 clock-hours for 400 machine-hours of computation across 1 week of traces • Re-computation of the Volley algorithm is necessary

Let’s go on... • Limitations of the evaluation conducted in the paper • No good contrast • Can geo-distance stand for latency? • Client mobility? • Ample room for further development

Thank You!

Phase 1: Calculate the geographic centroid for each data item
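A geographic centroid on a sphere can be sketched as follows. This is a simplified stand-in, not necessarily the procedure Volley itself uses: it converts each client location to a 3D unit vector, takes the access-weighted average, and projects the result back onto the sphere. The function names and the (lat, lon, weight) input layout are assumptions made for illustration.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def weighted_centroid(accesses):
    """Return (lat, lon) of the weighted centroid of (lat, lon, weight) triples.

    Weights would typically be access counts from the request log.
    """
    x = y = z = 0.0
    for lat, lon, weight in accesses:
        vx, vy, vz = to_unit_vector(lat, lon)
        x += weight * vx
        y += weight * vy
        z += weight * vz
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        # Happens for empty input or perfectly opposing points.
        raise ValueError("centroid is undefined for these inputs")
    # Clamp to guard against tiny floating-point overshoot before asin.
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```

For example, two equally weighted clients on the equator at longitudes 0 and 90 yield a centroid on the equator at longitude 45. Averaging raw latitude and longitude values instead would give wrong answers near the date line, which is why the sketch works in 3D.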

Phase 2: Refine the centroid for each data item iteratively • Considers client locations and data inter-dependencies • Uses a weighted spring model that attracts data items, but on a spherical coordinate system
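The idea of pulling an item along the sphere toward the things it communicates with can be sketched as below. The update rule is an illustrative assumption, not the formula from the paper: each neighbor (a client or a dependent data item) pulls the item a fraction `step * weight / total_weight` of the way along the great circle. In this simplified version the neighbors stay fixed, whereas in a full refinement the other data items would move too. All names and the default `step` are hypothetical; the default of 5 iterations follows the iteration count mentioned in these slides.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_lat_lon(vec):
    """Convert a 3D unit vector back to (lat, lon) in degrees."""
    x, y, z = vec
    return (math.degrees(math.asin(max(-1.0, min(1.0, z)))),
            math.degrees(math.atan2(y, x)))


def slerp(a, b, t):
    """Move fraction t along the great circle from unit vector a to b."""
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)
    s = math.sin(omega)
    if s < 1e-12:
        # Identical or antipodal points: no unique great circle, stay put.
        return a
    fa = math.sin((1.0 - t) * omega) / s
    fb = math.sin(t * omega) / s
    return tuple(fa * p + fb * q for p, q in zip(a, b))


def refine_location(start, neighbors, iterations=5, step=0.5):
    """Iteratively pull `start` (lat, lon) toward weighted neighbors.

    neighbors: list of ((lat, lon), weight), where weight might be the
    number of messages exchanged with that client or data item.
    """
    total = sum(weight for _, weight in neighbors)
    if total <= 0:
        raise ValueError("neighbors must have positive total weight")
    position = to_unit_vector(*start)
    for _ in range(iterations):
        for (lat, lon), weight in neighbors:
            target = to_unit_vector(lat, lon)
            position = slerp(position, target, step * weight / total)
    return to_lat_lon(position)
```

With a single neighbor and `step=0.5`, each iteration halves the remaining angular distance, so an item starting at longitude 0 with a neighbor at longitude 90 on the equator sits at longitude 45 after one iteration.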
