Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center IEEE INFOCOM 2010
INTRODUCTION • The scalability of modern data centers has become a practical concern and has attracted significant attention in recent years. • Existing solutions require changes in the network architecture and the routing protocols to balance traffic load. • With an increasing trend towards more communication-intensive applications in data centers, the bandwidth usage between virtual machines (VMs) is growing rapidly.
INTRODUCTION • This paper proposes traffic-aware virtual machine (VM) placement to improve network scalability. • Many VM placement solutions consolidate VMs to save CPU, physical memory and power, yet without considering the consumption of network resources. • This paper tackles the scalability issue by optimizing the placement of VMs on host machines.
INTRODUCTION • e.g., VMs with large mutual bandwidth usage are assigned to host machines in close proximity. • The authors design a two-tier approximation algorithm that efficiently solves the VM placement problem for very large problem sizes.
INTRODUCTION • Contributions • 1. We address the scalability issue of data center networks with network-aware VM placement. We formulate it as an optimization problem, prove its hardness and propose a novel two-tier algorithm.
INTRODUCTION • Contributions • 2. We analyze the impact of data center network architectures and traffic patterns on the scalability gains attained by network-aware VM placement.
INTRODUCTION • Contributions • 3. We measure traffic patterns in production data center environments, and use the data to evaluate the proposed algorithm as well as the impact analysis.
INTRODUCTION • Problem definition: the Traffic-aware VM Placement Problem (TVMPP), formulated as an optimization problem. • Input: traffic matrix, cost matrix • Output: where VMs should be placed • Goal: minimize the total cost
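To make the objective concrete, here is a minimal Python sketch (all names are illustrative, not from the paper) of evaluating the total cost of one candidate placement; this is the quantity TVMPP minimizes:

import numpy as np

def placement_cost(D, C, placement):
    """Total communication cost of one placement.

    D[i, j]: traffic rate from VM i to VM j
    C[a, b]: communication cost from slot a to slot b
    placement[i]: slot hosting VM i
    """
    n = len(placement)
    return sum(D[i, j] * C[placement[i], placement[j]]
               for i in range(n) for j in range(n) if i != j)

A brute-force search over all n! placements of this cost is exactly what the hardness result below rules out for realistic n.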
Data Center Traffic Patterns • Examine traces from two data-center-like systems: • 1. A data warehouse hosted by IBM Global Services: incoming and outgoing traffic rates for 17 thousand VMs. • 2. A second system: incoming and outgoing TCP connections for 68 VMs, measured over 10 days.
Data Center Traffic Patterns • Uneven distribution of traffic volumes from VMs
Data Center Traffic Patterns • Stable per-VM traffic at large timescales.
Data Center Traffic Patterns • Weak correlation between traffic rate and latency.
Data Center Traffic Patterns • The potential benefit: increased network scalability and reduced average traffic latency. • The observed traffic stability over large timescales suggests that it is feasible to find good placements based on past traffic statistics.
Problem Formulation • Cij: the communication cost from slot i to j • Dij: traffic rate from VM i to j • ei: external traffic rate for VMi • gi: communication cost between VMi and the gateway
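Reconstructed from these definitions (the original slide renders the formula as an image), TVMPP seeks a one-to-one placement $\pi$ of VMs onto slots minimizing

$$\min_{\pi} \; \sum_{i,j} D_{ij}\, C_{\pi(i)\pi(j)} \;+\; \sum_{i} e_i\, g_{\pi(i)}$$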
Problem Formulation • The above objective function is equivalent to the quadratic assignment form $\min_{X} \sum_{i,j} D_{ij}\,(X C X^{\top})_{ij}$, where X ranges over permutation matrices encoding the VM-to-slot mapping.
Problem Formulation • We define Cij as the number of switches on the routing path from slot i to slot j. • Accordingly, optimizing TVMPP is equivalent to minimizing the average traffic latency incurred in the network infrastructure. • Both offline and online placement scenarios are considered.
Complexity Analysis • TVMPP falls into the category of the Quadratic Assignment Problem (QAP). • QAP: n facilities are assigned to n locations, given pairwise distances and flows. • QAP is NP-hard; solving instances of size > 15 to optimality is practically impossible.
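For reference, the standard Koopmans-Beckmann statement of the QAP (not spelled out on the slide): given flows $f_{ij}$ between $n$ facilities and distances $d_{kl}$ between $n$ locations, find

$$\min_{\pi \in S_n} \sum_{i,j} f_{ij}\, d_{\pi(i)\pi(j)}$$

TVMPP instantiates this with $f = D$ (traffic) and $d = C$ (communication cost).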
Complexity Analysis • Theorem 1: For a TVMPP problem defined on a data center with one of the topologies in Figure 4, finding the TVMPP optimum is NP-hard.
Theorem 1 • This can be proved by a reduction from the Balanced Minimum K-cut Problem (BMKP). • BMKP: given an undirected, weighted graph G = (V,E) with n vertices, a k-cut is a subset of E whose removal partitions G into k components; BMKP asks for a minimum-weight k-cut whose components have equal size. • BMKP is NP-hard.
Theorem 1 • Now consider a data center network. Regardless of the topology used, we can always create a network that satisfies: • n slots partitioned into k slot-clusters of equal size n/k • every two slots are connected with a certain cost • cost within the same cluster: ci • cost across clusters: co, with co > ci
Theorem 1 • Suppose there are n VMs with traffic matrix D. • Assigning these n VMs to the n slots yields a TVMPP instance. • Defining a graph with the n VMs as nodes and D as edge weights yields a BMKP instance. • It can be shown that when the TVMPP solution is optimal, the associated BMKP solution is also optimal, and vice versa.
Theorem 1 • When the TVMPP solution is optimal, swapping any two VMs i, j that have been assigned to two slot-clusters r1, r2 respectively increases the k-cut weight. • Let s1 denote the set of VMs assigned to r1. • Let s2 denote the set of VMs assigned to r2.
Theorem 1 • Such a swap increases the TVMPP objective value (by optimality), and the change in the k-cut weight is that same change scaled by 1/(co − ci), hence also positive.
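A compact way to see the equivalence (a reconstruction of the argument from the construction above, not the slide's exact algebra): with intra-cluster cost $c_i$ and inter-cluster cost $c_o$, every VM pair pays at least $c_i$, and pairs split across clusters pay an extra $c_o - c_i$:

$$\mathrm{obj}(\pi) \;=\; c_i \sum_{u \neq v} D_{uv} \;+\; (c_o - c_i)\, W_{\mathrm{cut}}(\pi)$$

where $W_{\mathrm{cut}}(\pi)$ is the total traffic crossing slot-cluster boundaries, i.e. the weight of the induced balanced k-cut. The first term is constant and $c_o > c_i$, so minimizing the TVMPP objective is exactly minimizing the balanced k-cut weight.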
ALGORITHMS • Proposition 1: Suppose 0 ≤ a1 ≤ a2 ≤ . . . ≤ an and 0 ≤ b1 ≤ b2 ≤ . . . ≤ bn. Then for any permutation π on [1, . . . , n] the following (rearrangement) inequalities hold: $\sum_{i=1}^{n} a_i b_{n-i+1} \;\le\; \sum_{i=1}^{n} a_{\pi(i)} b_i \;\le\; \sum_{i=1}^{n} a_i b_i$
ALGORITHMS • First, following Proposition 1, solving TVMPP intuitively amounts to finding a mapping of VMs to slots such that VM pairs with heavy mutual traffic are assigned to slot pairs with low-cost connections.
ALGORITHMS • The second design principle is divide-and-conquer: we partition VMs into VM-clusters and slots into slot-clusters. • VM-clusters are obtained via a classical min-cut graph algorithm, which ensures that VM pairs with high mutual traffic rates fall within the same VM-cluster. • Slot-clusters are obtained via standard clustering techniques, which ensure that slot pairs with low-cost connections belong to the same slot-cluster.
ALGORITHMS • SlotClustering: minimum k-clustering, NP-hard (a 2-approximation exists); O(nk). • VMMinKcut: minimum k-cut algorithm; O(n⁴). • Assign VM-clusters to slot-clusters, then VMs to slots within each cluster pair. • Recursive call on each cluster pair (see the sketch below).
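A minimal Python sketch of the two-tier idea. The greedy helper below is an illustrative stand-in for the paper's minimum k-clustering and min k-cut subroutines, and the recursion within each cluster pair is elided:

import numpy as np

def greedy_clusters(score, k, prefer_high):
    """Equal-size greedy clustering: pick a seed, then attach the
    (n/k - 1) unassigned nodes with the best score to that seed."""
    n = score.shape[0]
    size = n // k                        # assumes k divides n
    unassigned = set(range(n))
    sign = -1.0 if prefer_high else 1.0
    clusters = []
    for _ in range(k):
        seed = min(unassigned, key=lambda u: sign * score[u].sum())
        rest = sorted(unassigned - {seed},
                      key=lambda v: sign * score[seed, v])[:size - 1]
        cluster = [seed] + rest
        clusters.append(cluster)
        unassigned -= set(cluster)
    return clusters

def two_tier_place(D, C, k):
    """Cluster VMs by traffic and slots by cost, pair heavy VM-clusters
    with cheap slot-clusters (Proposition 1), then assign within pairs."""
    traffic = D + D.T                    # symmetrize pairwise rates
    vm_clusters = greedy_clusters(traffic, k, prefer_high=True)
    slot_clusters = greedy_clusters(C, k, prefer_high=False)

    def internal(M, members):
        return M[np.ix_(members, members)].sum()

    vm_clusters.sort(key=lambda c: -internal(traffic, c))   # heavy first
    slot_clusters.sort(key=lambda c: internal(C, c))        # cheap first
    placement = {}                       # VM index -> slot index
    for vms, slots in zip(vm_clusters, slot_clusters):
        for vm, slot in zip(vms, slots): # the real algorithm recurses here
            placement[vm] = slot
    return placement

On traffic matrices with strong cluster structure, one can check with the placement_cost sketch above that this heuristic beats a random permutation.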
IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • Global Traffic Model: each VM sends traffic to every other VM at an equal, constant rate. • Under this model the VM-to-VM term takes the same value for any permutation matrix X, so TVMPP reduces to minimizing the external-traffic term $\sum_i e_i\, g_{\pi(i)}$,
IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • which is the classical Linear Sum Assignment Problem (LSAP), solvable in O(n³). • Random placement is used as the baseline for comparison.
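As an illustration, an assignment problem of this shape can be solved exactly with an off-the-shelf solver; a small sketch with made-up numbers:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical external traffic rates (per VM) and gateway costs (per slot).
e = np.array([5.0, 1.0, 3.0])   # e_i: external traffic rate of VM i
g = np.array([2.0, 4.0, 1.0])   # g_j: cost from slot j to the gateway

cost = np.outer(e, g)           # cost[i, j] = e_i * g_j
vms, slots = linear_sum_assignment(cost)
print(dict(zip(vms.tolist(), slots.tolist())))  # VM -> slot, minimum total cost

Consistent with Proposition 1, the optimum pairs the largest e values with the smallest g values.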
IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • Partitioned Traffic Model: each VM belongs to a group of VMs and sends traffic only to other VMs in the same group. • The Gilmore-Lawler Bound (GLB) is a lower bound on the optimal objective value of a QAP instance.
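For reference, the standard GLB construction (a hedged reconstruction; the slide does not spell it out): for each facility-location pair $(i,j)$, let $l_{ij}$ be the minimal scalar product of the $i$-th row of the flow matrix against the $j$-th column of the distance matrix (one sorted ascending, the other descending); the optimum of the LSAP with cost matrix $(l_{ij})$ lower-bounds the QAP optimum.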
DISCUSSION • Combining VM migration with dynamic routing protocols • VM placement by joint network and server resource optimization
ARiA: A Protocol for Dynamic Fully Distributed Grid Meta-Scheduling Amos Brocco, Apostolos Malatras, Ye Huang, Béat Hirsbrunner Department of Informatics, University of Fribourg, Switzerland ICDCS 2010
INTRODUCTION • An advantage of grid systems is their ability to guarantee efficient meta-scheduling (optimal allocation of jobs across a pool of sites with diverse local scheduling policies). • The centralized nature of current meta-scheduling solutions is not well suited to the envisioned increases in scale and dynamicity.
INTRODUCTION • This paper focuses on grid task meta-scheduling, presenting a fully distributed protocol named ARiA that achieves efficient global dynamic scheduling across multiple sites. • The meta-scheduling process is performed online and takes into account the availability of new resources as well as changes in allocation policies.
ARiA PROTOCOL • Job Submission Phase • Job Acceptance Phase • Dynamic Rescheduling Phase
Job Submission Phase • Jobs are assigned a universal unique identifier (UUID) • Nodes receiving job submissions are referred to as initiators for these jobs • Initiators issue resource discovery queries across the grid peer-to-peer overlay by broadcasting REQUEST messages
Job Acceptance Phase • If the request cannot be satisfied, the message is forwarded further on the peer-to-peer overlay; • otherwise, a cost value for the job, based on actual resources and the current schedule, is computed and sent back to the job's initiator by means of an ACCEPT message. • The initiator evaluates incoming ACCEPT responses, selects the best-qualified node, and sends it an ASSIGN message.
Dynamic Rescheduling Phase • The assignee attempts to find candidates for rescheduling jobs in its queue whose execution has not yet started, by sending INFORM messages. • The structure of INFORM messages mirrors that of REQUEST messages.
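A hypothetical Python sketch of the message flow described above (all class and method names are illustrative, not the paper's actual formats):

from dataclasses import dataclass

@dataclass
class Request:
    job_id: str          # universal unique identifier (UUID) of the job
    initiator: str       # node that received the submission
    requirements: dict   # resources the job needs
    ttl: int             # remaining overlay hops

@dataclass
class Accept:
    job_id: str
    node_id: str
    cost: float          # bid based on current resources and schedule

def handle_request(node, msg: Request):
    """Bid with an ACCEPT if the job can run here; otherwise keep
    forwarding the REQUEST over the peer-to-peer overlay."""
    if node.can_satisfy(msg.requirements):            # illustrative method
        bid = Accept(msg.job_id, node.node_id, node.estimate_cost(msg))
        node.send(msg.initiator, bid)
    elif msg.ttl > 0:
        for peer in node.random_neighbors():          # illustrative method
            node.send(peer, Request(msg.job_id, msg.initiator,
                                    msg.requirements, msg.ttl - 1))

def choose_assignee(accepts):
    """The initiator picks the lowest-cost bid and targets its ASSIGN."""
    return min(accepts, key=lambda a: a.cost).node_id

INFORM handling during rescheduling would follow the same pattern, with the current assignee playing the initiator's role for jobs still waiting in its queue.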
EVALUATION • For the evaluation of ARiA, an overlay of 500 nodes with a target average path length of 9 hops was deployed in a custom simulator. • The average node degree attained during simulations was 4, resulting in about 2000 overlay links.
EVALUATION • In all scenarios, a total of 1000 jobs is submitted to random nodes on the grid. Unless otherwise specified, jobs are submitted at 10-second intervals. • When dynamic rescheduling is enabled, INFORM messages are sent for at most 2 scheduled jobs every 5 minutes.
EVALUATION • REQUEST messages are forwarded on the overlay for at most 9 hops; at each step, at most 4 random neighbors of the current node are contacted. • For INFORM messages a more lightweight approach is followed, with at most 8 hops and up to 2 neighbors.