Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center IEEE INFOCOM 2010
INTRODUCTION • The scalability of modern data centers has become a practical concern and has attracted significant attention in recent years. • Existing solutions require changes in the network architecture and the routing protocols to balance traffic load. • With an increasing trend towards more communication-intensive applications in data centers, the bandwidth usage between virtual machines (VMs) is growing rapidly.
INTRODUCTION • This paper proposes traffic-aware virtual machine (VM) placement to improve network scalability. • Many VM placement solutions consolidate VMs to save CPU, physical memory and power, yet without considering the consumption of network resources. • This paper tackles the scalability issue by optimizing the placement of VMs on host machines.
INTRODUCTION • e.g., VMs with large mutual bandwidth usage are assigned to host machines in close proximity. • The authors design a two-tier approximation algorithm that efficiently solves the VM placement problem for very large problem sizes.
INTRODUCTION • Contributions • 1. We address the scalability issue of data center networks with network-aware VM placement. We formulate it as an optimization problem, prove its hardness and propose a novel two-tier algorithm.
INTRODUCTION • Contributions • 2. We analyze the impact of data center network architectures and traffic patterns on the scalability gains attained by network-aware VM placement.
INTRODUCTION • Contributions • 3. We measure traffic patterns in production data center environments, and use the data to evaluate the proposed algorithm as well as the impact analysis.
INTRODUCTION • Problem definition: the Traffic-aware VM Placement Problem (TVMPP), formulated as an optimization problem. • Input: traffic matrix, cost matrix • Output: where VMs should be placed • Goal: minimize the total cost
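To make the objective concrete, here is a minimal Python sketch (all names are illustrative, not from the paper) of evaluating the total cost of one candidate placement; this is the quantity TVMPP minimizes:

import numpy as np

def placement_cost(D, C, placement):
    """Total communication cost of one placement.

    D[i, j]: traffic rate from VM i to VM j
    C[a, b]: communication cost from slot a to slot b
    placement[i]: slot hosting VM i
    """
    n = len(placement)
    return sum(D[i, j] * C[placement[i], placement[j]]
               for i in range(n) for j in range(n) if i != j)

A brute-force search over all n! placements of this cost is exactly what the hardness result below rules out for realistic n.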
Data Center Traffic Patterns • Examine traces from two data-center-like systems: • 1. A data warehouse hosted by IBM Global Services: incoming and outgoing traffic rates for 17 thousand VMs. • 2. A second system: incoming and outgoing TCP connections for 68 VMs, measured over 10 days.
Data Center Traffic Patterns • Uneven distribution of traffic volumes from VMs
Data Center Traffic Patterns • Stable per-VM traffic at large timescales.
Data Center Traffic Patterns • Weak correlation between traffic rate and latency.
Data Center Traffic Patterns • The potential benefit: increased network scalability and reduced average traffic latency. • The observed traffic stability over large timescales suggests that it is feasible to find good placements based on past traffic statistics.
Problem Formulation • Cij: the communication cost from slot i to j • Dij: traffic rate from VM i to j • ei: external traffic rate for VMi • gi: communication cost between VMi and the gateway
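Reconstructed from these definitions (the original slide renders the formula as an image), TVMPP seeks a one-to-one placement $\pi$ of VMs onto slots minimizing

$$\min_{\pi} \; \sum_{i,j} D_{ij}\, C_{\pi(i)\pi(j)} \;+\; \sum_{i} e_i\, g_{\pi(i)}$$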
Problem Formulation • The above objective function is equivalent to the quadratic assignment form $\min_{X} \sum_{i,j} D_{ij}\,(X C X^{\top})_{ij}$, where X ranges over permutation matrices encoding the VM-to-slot mapping.
Problem Formulation • We define Cij as the number of switches on the routing path from slot i to slot j. • Accordingly, optimizing TVMPP is equivalent to minimizing the average traffic latency incurred in the network infrastructure. • Both offline and online placement scenarios are considered.
Complexity Analysis • TVMPP falls into the category of the Quadratic Assignment Problem (QAP). • QAP: n facilities are assigned to n locations, given pairwise distances and flows. • QAP is NP-hard; solving instances of size > 15 to optimality is practically impossible.
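For reference, the standard Koopmans-Beckmann statement of the QAP (not spelled out on the slide): given flows $f_{ij}$ between $n$ facilities and distances $d_{kl}$ between $n$ locations, find

$$\min_{\pi \in S_n} \sum_{i,j} f_{ij}\, d_{\pi(i)\pi(j)}$$

TVMPP instantiates this with $f = D$ (traffic) and $d = C$ (communication cost).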
Complexity Analysis • Theorem 1: For a TVMPP problem defined on a data center with one of the topologies in Figure 4, finding the TVMPP optimum is NP-hard.
Theorem 1 • This can be proved by a reduction from the Balanced Minimum K-cut Problem (BMKP). • BMKP: given an undirected, weighted graph G = (V,E) with n vertices, a k-cut is a subset of E whose removal partitions G into k components; BMKP asks for a minimum-weight k-cut whose components have equal size. • BMKP is NP-hard.
Theorem 1 • Now consider a data center network. Regardless of the topology used, we can always create a network that satisfies: • n slots partitioned into k slot-clusters of equal size n/k • every two slots are connected with a certain cost • cost within the same cluster: ci • cost across clusters: co, with co > ci
Theorem 1 • Suppose there are n VMs with traffic matrix D. • Assigning these n VMs to the n slots yields a TVMPP instance. • Defining a graph with the n VMs as nodes and D as edge weights yields a BMKP instance. • It can be shown that when the TVMPP solution is optimal, the associated BMKP solution is also optimal, and vice versa.
Theorem 1 • When the TVMPP solution is optimal, swapping any two VMs i, j that have been assigned to two slot-clusters r1, r2 respectively increases the k-cut weight. • Let s1 denote the set of VMs assigned to r1. • Let s2 denote the set of VMs assigned to r2.
Theorem 1 • Such a swap increases the TVMPP objective value (by optimality), and the change in the k-cut weight is that same change scaled by 1/(co − ci), hence also positive.
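A compact way to see the equivalence (a reconstruction of the argument from the construction above, not the slide's exact algebra): with intra-cluster cost $c_i$ and inter-cluster cost $c_o$, every VM pair pays at least $c_i$, and pairs split across clusters pay an extra $c_o - c_i$:

$$\mathrm{obj}(\pi) \;=\; c_i \sum_{u \neq v} D_{uv} \;+\; (c_o - c_i)\, W_{\mathrm{cut}}(\pi)$$

where $W_{\mathrm{cut}}(\pi)$ is the total traffic crossing slot-cluster boundaries, i.e. the weight of the induced balanced k-cut. The first term is constant and $c_o > c_i$, so minimizing the TVMPP objective is exactly minimizing the balanced k-cut weight.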
ALGORITHMS • Proposition 1: Suppose 0 ≤ a1 ≤ a2 ≤ . . . ≤ an and 0 ≤ b1 ≤ b2 ≤ . . . ≤ bn. Then for any permutation π on [1, . . . , n] the following (rearrangement) inequalities hold: $\sum_{i=1}^{n} a_i b_{n-i+1} \;\le\; \sum_{i=1}^{n} a_{\pi(i)} b_i \;\le\; \sum_{i=1}^{n} a_i b_i$
ALGORITHMS • First, following Proposition 1, solving TVMPP intuitively amounts to finding a mapping of VMs to slots such that VM pairs with heavy mutual traffic are assigned to slot pairs with low-cost connections.
ALGORITHMS • The second design principle is divide-and-conquer: we partition VMs into VM-clusters and slots into slot-clusters. • VM-clusters are obtained via a classical min-cut graph algorithm, which ensures that VM pairs with high mutual traffic rates fall within the same VM-cluster. • Slot-clusters are obtained via standard clustering techniques, which ensure that slot pairs with low-cost connections belong to the same slot-cluster.
ALGORITHMS • SlotClustering: minimum k-clustering, NP-hard (a 2-approximation exists); O(nk). • VMMinKcut: minimum k-cut algorithm; O(n⁴). • Assign VM-clusters to slot-clusters, then VMs to slots within each cluster pair. • Recursive call on each cluster pair (see the sketch below).
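A minimal Python sketch of the two-tier idea. The greedy helper below is an illustrative stand-in for the paper's minimum k-clustering and min k-cut subroutines, and the recursion within each cluster pair is elided:

import numpy as np

def greedy_clusters(score, k, prefer_high):
    """Equal-size greedy clustering: pick a seed, then attach the
    (n/k - 1) unassigned nodes with the best score to that seed."""
    n = score.shape[0]
    size = n // k                        # assumes k divides n
    unassigned = set(range(n))
    sign = -1.0 if prefer_high else 1.0
    clusters = []
    for _ in range(k):
        seed = min(unassigned, key=lambda u: sign * score[u].sum())
        rest = sorted(unassigned - {seed},
                      key=lambda v: sign * score[seed, v])[:size - 1]
        cluster = [seed] + rest
        clusters.append(cluster)
        unassigned -= set(cluster)
    return clusters

def two_tier_place(D, C, k):
    """Cluster VMs by traffic and slots by cost, pair heavy VM-clusters
    with cheap slot-clusters (Proposition 1), then assign within pairs."""
    traffic = D + D.T                    # symmetrize pairwise rates
    vm_clusters = greedy_clusters(traffic, k, prefer_high=True)
    slot_clusters = greedy_clusters(C, k, prefer_high=False)

    def internal(M, members):
        return M[np.ix_(members, members)].sum()

    vm_clusters.sort(key=lambda c: -internal(traffic, c))   # heavy first
    slot_clusters.sort(key=lambda c: internal(C, c))        # cheap first
    placement = {}                       # VM index -> slot index
    for vms, slots in zip(vm_clusters, slot_clusters):
        for vm, slot in zip(vms, slots): # the real algorithm recurses here
            placement[vm] = slot
    return placement

On traffic matrices with strong cluster structure, one can check with the placement_cost sketch above that this heuristic beats a random permutation.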
IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • Global Traffic Model: each VM sends traffic to every other VM at an equal, constant rate. • Under this model the VM-to-VM term takes the same value for any permutation matrix X, so TVMPP reduces to minimizing the external-traffic term $\sum_i e_i\, g_{\pi(i)}$,
IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • which is the classical Linear Sum Assignment Problem (LSAP), solvable in O(n³). • Random placement is used as the baseline for comparison.
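As an illustration, an assignment problem of this shape can be solved exactly with an off-the-shelf solver; a small sketch with made-up numbers:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical external traffic rates (per VM) and gateway costs (per slot).
e = np.array([5.0, 1.0, 3.0])   # e_i: external traffic rate of VM i
g = np.array([2.0, 4.0, 1.0])   # g_j: cost from slot j to the gateway

cost = np.outer(e, g)           # cost[i, j] = e_i * g_j
vms, slots = linear_sum_assignment(cost)
print(dict(zip(vms.tolist(), slots.tolist())))  # VM -> slot, minimum total cost

Consistent with Proposition 1, the optimum pairs the largest e values with the smallest g values.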
IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • Partitioned Traffic Model: each VM belongs to a group of VMs and sends traffic only to other VMs in the same group. • The Gilmore-Lawler Bound (GLB) is a lower bound on the optimal objective value of a QAP instance.
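For reference, the standard GLB construction (a hedged reconstruction; the slide does not spell it out): for each facility-location pair $(i,j)$, let $l_{ij}$ be the minimal scalar product of the $i$-th row of the flow matrix against the $j$-th column of the distance matrix (one sorted ascending, the other descending); the optimum of the LSAP with cost matrix $(l_{ij})$ lower-bounds the QAP optimum.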
DISCUSSION • Combining VM migration with dynamic routing protocols • VM placement by joint network and server resource optimization
ARiA: A Protocol for Dynamic Fully Distributed Grid Meta-Scheduling Amos Brocco, Apostolos Malatras, Ye Huang, Béat Hirsbrunner Department of Informatics, University of Fribourg, Switzerland ICDCS 2010
INTRODUCTION • An advantage of grid systems is their ability to guarantee efficient meta-scheduling (optimal allocation of jobs across a pool of sites with diverse local scheduling policies). • The centralized nature of current meta-scheduling solutions is not well suited to the envisioned increases in scale and dynamicity.
INTRODUCTION • This paper focuses on grid task meta-scheduling, presenting a fully distributed protocol named ARiA that achieves efficient global dynamic scheduling across multiple sites. • The meta-scheduling process is performed online and takes into account the availability of new resources as well as changes in allocation policies.
ARiA PROTOCOL • Job Submission Phase • Job Acceptance Phase • Dynamic Rescheduling Phase
Job Submission Phase • Jobs are assigned a universal unique identifier (UUID) • Nodes receiving job submissions are referred to as initiators for these jobs • Initiators issue resource discovery queries across the grid peer-to-peer overlay by broadcasting REQUEST messages
Job Acceptance Phase • If the request cannot be satisfied, the message is forwarded further on the peer-to-peer overlay; • otherwise, a cost value for the job, based on actual resources and the current schedule, is computed and sent back to the job's initiator by means of an ACCEPT message. • The initiator evaluates incoming ACCEPT responses, selects the best-qualified node, and sends it an ASSIGN message.
Dynamic Rescheduling Phase • The assignee attempts to find candidates for rescheduling jobs in its queue whose execution has not yet started, by sending INFORM messages. • The structure of INFORM messages mirrors that of REQUEST messages.
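A hypothetical Python sketch of the message flow described above (all class and method names are illustrative, not the paper's actual formats):

from dataclasses import dataclass

@dataclass
class Request:
    job_id: str          # universal unique identifier (UUID) of the job
    initiator: str       # node that received the submission
    requirements: dict   # resources the job needs
    ttl: int             # remaining overlay hops

@dataclass
class Accept:
    job_id: str
    node_id: str
    cost: float          # bid based on current resources and schedule

def handle_request(node, msg: Request):
    """Bid with an ACCEPT if the job can run here; otherwise keep
    forwarding the REQUEST over the peer-to-peer overlay."""
    if node.can_satisfy(msg.requirements):            # illustrative method
        bid = Accept(msg.job_id, node.node_id, node.estimate_cost(msg))
        node.send(msg.initiator, bid)
    elif msg.ttl > 0:
        for peer in node.random_neighbors():          # illustrative method
            node.send(peer, Request(msg.job_id, msg.initiator,
                                    msg.requirements, msg.ttl - 1))

def choose_assignee(accepts):
    """The initiator picks the lowest-cost bid and targets its ASSIGN."""
    return min(accepts, key=lambda a: a.cost).node_id

INFORM handling during rescheduling would follow the same pattern, with the current assignee playing the initiator's role for jobs still waiting in its queue.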
EVALUATION • For the evaluation of ARiA, an overlay of 500 nodes with a target average path length of 9 hops was deployed in a custom simulator. • The average node degree attained during simulations was 4, resulting in about 2000 overlay links.
EVALUATION • In all scenarios, a total of 1000 jobs is submitted to random nodes on the grid. Unless otherwise specified, jobs are submitted at 10-second intervals. • When dynamic rescheduling is enabled, INFORM messages are sent for at most 2 scheduled jobs every 5 minutes.
EVALUATION • REQUEST messages are forwarded on the overlay for at most 9 hops; at each step, at most 4 random neighbors of the current node are contacted. • For INFORM messages a more lightweight approach is followed, with at most 8 hops and up to 2 neighbors.