150 likes | 289 Views
ClouDiA : A Deployment Advisor for Public Clouds. Tao Zou , Ronan Le Bras, Marcos Vaz Salles*, Alan Demers , Johannes Gehrke. Cornell University * University of Copenhagen (DIKU). Instance Allocation in Public Clouds. Cloud Provider’s View . Cloud Tenant’s View .
E N D
ClouDiA: A Deployment Advisor for Public Clouds Tao Zou, Ronan Le Bras, Marcos VazSalles*, Alan Demers, Johannes Gehrke Cornell University *University of Copenhagen (DIKU)
Instance Allocation in Public Clouds Cloud Provider’s View Cloud Tenant’s View
Instance Allocation in Public Clouds Core Cloud Provider’s View Aggregation …… Sub-Aggregation TOR TOR TOR ……
Network Latencies in Public Clouds • Mean Latency Heterogeneity in EC2 • Mean Latency Stability in EC2 Challenge: How to avoid long links? Challenge: Can this be done out-of-the-box? (i.e. no changes to the cloud or the application)
Communication Graphs Examples of Latency Sensitive Applications Scientific Simulation Key-value Store (time-to-solution) (response time) Opportunity: Communication graphs are not complete. A careful logical to physical mapping can help! Longest Link Search Aggregation Service Pipelines (response time) (response time) Opportunity: Gamble with over-allocation. Longest Path
Architecture of ClouDiA ClouDiA Public Cloud Tenant Allocate Instances (+ Extra Instances) Communication Graph Get Measurements Objectives Search Mapping Start Application Deployment Plan Terminate Extra Instances
Measuring Network Distance • Goal: obtain a reliable estimate of link costs • Approximations that are easy to obtain • Hop counts • Length of Common IP Prefix • Accurate distance: pair-wise network latencies • Large number of measurements for each pair • To observe enough latency jitters • Interferences at end points • Heavily application and network dependent • Can’t model exactly measure without interference do not work
Measuring Network Latencies • Without interference • Most efficient method: “staged” Stage i Stage i+1 • No concurrent send/recv at end points • Parallelism with minimal coordination
Architecture of ClouDiA ClouDiA Public Cloud Tenant Allocate Instances (+ Extra Instances) Communication Graph Get Measurements Objectives Search Mapping Start Application Deployment Plan Terminate Extra Instances
Search Mapping (Longest Link) • NP-Hard • To find a solution of cost • To find a solution of cost • To find a solution of cost • Goal: find a “good” solution within timeout • Two formulations: • Mixed-Integer Programming • O() boolean variables • Constraint Programming (*) • O() integer variables • an objective cost has to be given a priori Hard to approximate
Searching using Constraint Programming • A lot of distinct latency values • Bi-section search? finding “no solution” takes time • k-means clustering? works well with proper k Give an objective c: 1. Remove all links with cost > c 2. Find a subgraph isomorphism Give an objective c: 1. Remove all links with cost > c 2. Find a subgraph isomorphism k=4 cost 0 Timeout Timeout
Experimental Settings • 100 to 150 m1.large instances in EC2 • IBM ILOG CPLEX Optimizer/CP Optimizer • Multi-core, a single machine • Three workloads: • Behavioral simulation • Synthetic aggregation query • Key-value store
Effect of Over-Allocation • 100 instances + 10%-50% over-allocation • Get Measurements + Search Deployment < 10 minutes
Overall Improvement • 10% over-allocation
Thank you! Conclusion • ClouDiA is a deployment advisor for public clouds • Out-of-the-box • 15%-55% time reduction for latency sensitive applications • Could be adopted by cloud services providers • With API changes