
Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement



  1. Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement. Xiaoqiao Meng, Vasileios Pappas, Li Zhang, IBM T.J. Watson Research Center. IEEE INFOCOM 2010

  2. INTRODUCTION • The scalability of modern data centers has become a practical concern and has attracted significant attention in recent years. • Existing solutions require changes in the network architecture and the routing protocols in order to balance traffic load. • With an increasing trend towards more communication-intensive applications in data centers, the bandwidth usage between virtual machines (VMs) is rapidly growing.

  3. INTRODUCTION • This paper proposes using traffic-aware virtual machine (VM) placement to improve network scalability. • Many VM placement solutions seek to consolidate VMs for CPU, physical memory and power consumption savings, yet without considering the consumption of network resources. • This paper tackles the scalability issue by optimizing the placement of VMs on host machines.

  4. INTRODUCTION • E.g., VMs with large mutual bandwidth usage are assigned to host machines in close proximity. • The authors design a two-tier approximate algorithm that efficiently solves the VM placement problem for very large problem sizes.

  5. INTRODUCTION • Contributions • 1. We address the scalability issue of data center networks with network-aware VM placement. We formulate it as an optimization problem, prove its hardness and propose a novel two-tier algorithm.

  6. INTRODUCTION • Contributions • 2. We analyze the impact of data center network architectures and traffic patterns on the scalability gains attained by network-aware VM placement.

  7. INTRODUCTION • Contributions • 3. We measure traffic patterns in production data center environments, and use the data to evaluate the proposed algorithm as well as the impact analysis.

  8. INTRODUCTION • Problem definition: the Traffic-aware VM Placement Problem (TVMPP), formulated as an optimization problem. • Input: a traffic matrix and a cost matrix • Output: where VMs should be placed • Goal: minimize the total cost
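As a concrete reading of this definition, here is a minimal sketch of how the total cost could be evaluated for a candidate placement; the function and variable names are illustrative, not from the paper:

```python
# D[i][j]: traffic rate from VM i to VM j (traffic matrix)
# C[a][b]: communication cost from slot a to slot b (cost matrix)
# placement[i]: slot assigned to VM i
def tvmpp_objective(D, C, placement):
    total = 0.0
    for i, src_slot in enumerate(placement):
        for j, dst_slot in enumerate(placement):
            total += D[i][j] * C[src_slot][dst_slot]
    return total
```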

  9. Data Center Traffic Patterns • Examine traces from two data-center-like systems: • 1. A data warehouse hosted by IBM Global Services: the incoming and outgoing traffic rates for 17 thousand VMs. • 2. A second system: the incoming and outgoing TCP connections for 68 VMs, measured over 10 days.

  10. Data Center Traffic Patterns • Uneven distribution of traffic volumes from VMs

  11. Data Center Traffic Patterns • Stable per-VM traffic at large timescales:

  12. Data Center Traffic Patterns • Weak correlation between traffic rate and latency:

  13. Data Center Traffic Patterns • The potential benefit: increased network scalability and reduced average traffic latency. • The observed traffic stability over large timescales suggests that it is feasible to find good placements based on past traffic statistics.

  14. Data Center Network Architectures

  15. Data Center Network Architectures

  16. Problem Formulation • Cij: the communication cost from slot i to slot j • Dij: the traffic rate from VM i to VM j • ei: the external traffic rate for VM i • gi: the communication cost between VM i and the gateway
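The objective function itself was a figure in the original deck; reconstructed from the definitions above (matching the paper's formulation), it reads:

\min_{\pi} \; \sum_{i,j} D_{ij}\, C_{\pi(i)\pi(j)} \;+\; \sum_{i} e_i\, g_{\pi(i)}

where \pi is the assignment of VMs to slots.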

  17. Problem Formulation • The above objective function is equivalent to the trace form below, where X is a permutation matrix.
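The equivalent form was an image in the deck; in standard QAP trace notation (a reconstruction, assuming a symmetric cost matrix C) it can be written as:

\min_{X \in \Pi_n} \; \operatorname{tr}\left( D\, X\, C\, X^{\top} \right)

where \Pi_n denotes the set of n x n permutation matrices and X_{ik} = 1 exactly when VM i is placed in slot k.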

  18. Problem Formulation • We define Cij as the number of switches on the routing path from VM i to VM j. • Accordingly, optimizing TVMPP is equivalent to minimizing the average traffic latency caused by the network infrastructure. • Offline scenario & online scenario
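As an illustration of this cost definition, here is a sketch for a conventional three-tier tree; the slot coordinates and the exact switch counts per case are assumptions for a generic tree, not taken from the paper:

```python
# Number of switches on the path between two slots in a three-tier tree.
# Slots are identified by (pod, rack, host) triples (an assumption).
def tree_cost(slot_a, slot_b):
    pod_a, rack_a, host_a = slot_a
    pod_b, rack_b, host_b = slot_b
    if slot_a == slot_b:
        return 0          # same host: no switch on the path
    if (pod_a, rack_a) == (pod_b, rack_b):
        return 1          # same rack: only the top-of-rack switch
    if pod_a == pod_b:
        return 3          # same pod: ToR -> aggregation switch -> ToR
    return 5              # different pods: ToR -> agg -> core -> agg -> ToR
```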

  19. Complexity Analysis • TVMPP falls into the category of the Quadratic Assignment Problem (QAP). • QAP: n items are assigned to n locations, given distances between locations and flows between items. • QAP is NP-hard; finding the optimum of QAP instances with size > 15 is practically impossible.

  20. Complexity Analysis • Theorem 1: For a TVMPP problem defined on a data center that uses one of the topologies in Figure 4, finding the TVMPP optimum is NP-hard.

  21. Theorem 1 • This can be proved by a reduction from the Balanced Minimum K-cut Problem (BMKP). • BMKP: given an undirected, weighted graph G = (V,E) with n vertices, a k-cut on G is defined as a subset of E whose removal partitions G into k equal-sized components; BMKP asks for the k-cut of minimum weight. • BMKP is NP-hard.

  22. Theorem 1 • Now consider a data center network. Regardless of which topology is used, we can always create a network topology that satisfies: • n slots partitioned into k slot-clusters of equal size n/k • every two slots connected with a certain cost: • within the same cluster: ci • across clusters: co, with co > ci

  23. Theorem 1 • Suppose there are n VMs with traffic matrix D. • By assigning these n VMs to the n slots, we obtain a TVMPP instance. • If we define a graph with the n VMs as nodes and D as edge weights, we obtain a BMKP instance. • It can be shown that when the TVMPP solution is optimal, the associated BMKP solution is also optimal, and vice versa.

  24. Theorem 1 • When the TVMPP solution is optimal, if we swap any two VMs i, j that have been assigned to slot-clusters r1 and r2 respectively, the k-cut weight will increase. • Let s1 denote the set of VMs assigned to r1 • Let s2 denote the set of VMs assigned to r2

  25. Theorem 1 • After such a swap, the TVMPP objective value increases: • The amount of change in the k-cut weight is:
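Both expressions were images in the original deck. The step they support can be reconstructed from the cost structure of slide 22: with intra-cluster cost c_i and inter-cluster cost c_o, the objective splits into a constant plus a multiple of the k-cut weight,

\sum_{i,j} D_{ij}\, C_{\pi(i)\pi(j)} \;=\; c_i \sum_{i,j} D_{ij} \;+\; (c_o - c_i)\, W_{\mathrm{cut}},

where W_cut is the total traffic crossing slot-cluster boundaries, i.e. exactly the weight of the induced k-cut. Since the first term is a constant and c_o > c_i, the TVMPP objective and the k-cut weight increase or decrease together, which gives the claimed equivalence.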

  26. ALGORITHMS • Proposition 1: Suppose 0 ≤ a1 ≤ a2 ≤ … ≤ an and 0 ≤ b1 ≤ b2 ≤ … ≤ bn; then the following inequalities hold for any permutation π on [1, …, n]:
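The inequalities themselves were an image in the deck; Proposition 1 is the classical rearrangement inequality:

\sum_{i=1}^{n} a_i\, b_{n+1-i} \;\le\; \sum_{i=1}^{n} a_{\pi(i)}\, b_i \;\le\; \sum_{i=1}^{n} a_i\, b_i.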

  27. ALGORITHMS • First, according to Proposition 1, solving TVMPP is intuitively equivalent to finding a mapping of VMs to slots such that VM pairs with heavy mutual traffic are assigned to slot pairs with low-cost connections.

  28. ALGORITHMS • The second design principle is divide-and-conquer: we partition VMs into VM-clusters and partition slots into slot-clusters. • VM-clusters are obtained via a classical min-cut graph algorithm, which ensures that VM pairs with a high mutual traffic rate fall within the same VM-cluster. • Slot-clusters are obtained via standard clustering techniques, which ensure that slot pairs with low-cost connections belong to the same slot-cluster.

  29. ALGORITHMS • SlotClustering: minimum k-clustering, NP-hard (a 2-approximation exists), O(nk) • VMMinKcut: minimum k-cut algorithm, O(n⁴) • Assign VMs to slots • Recursive call (a simplified sketch follows below)
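The full pipeline above relies on NP-hard subroutines; as a runnable illustration of its guiding principle (heavy-traffic VMs onto low-cost slots, per Proposition 1), here is a greatly simplified greedy stand-in. It is not the authors' SlotClustering / VMMinKcut algorithm:

```python
# Greedy placement: VMs with large aggregate traffic go to slots with
# small aggregate cost. A simplification of the two-tier algorithm's
# design principle, not the two-tier algorithm itself.
def greedy_placement(D, C):
    n = len(D)
    # sort VMs by total in+out traffic, descending
    vms = sorted(range(n), key=lambda i: -(sum(D[i]) + sum(row[i] for row in D)))
    # sort slots by total cost to all other slots, ascending
    slots = sorted(range(n), key=lambda a: sum(C[a]))
    return {vm: slot for vm, slot in zip(vms, slots)}
```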

  30. ALGORITHMS

  31. IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • Global Traffic Model: each VM sends traffic to every other VM at an equal and constant rate. • Under this model the quadratic term of the objective takes the same value for any permutation matrix X, since every VM exchanges the same traffic with every other VM. • This simplifies the TVMPP problem to the following:

  32. IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • which is the classical Linear Sum Assignment Problem (LSAP). The complexity of LSAP is O(n³). • Random placement:
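For context, LSAP is exactly what off-the-shelf solvers handle; here is a toy sketch using SciPy's Hungarian-style solver (SciPy is an assumption of this writeup, not a tool mentioned in the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# L[i][k]: cost of assigning VM i to slot k (the linearized objective)
L = np.random.default_rng(0).random((8, 8))   # toy random instance
vm_idx, slot_idx = linear_sum_assignment(L)   # optimal assignment, O(n^3)
print(list(zip(vm_idx, slot_idx)), L[vm_idx, slot_idx].sum())
```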

  33. IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS

  34. IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS • Partitioned Traffic Model: each VM belongs to a group of VMs and sends traffic only to other VMs in the same group. • The GLB is a lower bound on the optimal objective value of a QAP instance.
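GLB is the Gilmore-Lawler Bound. Its standard construction (reconstructed here from the QAP literature, not from the slide) applies Proposition 1 row by row: for each VM i and slot k, pair the off-diagonal traffic row of D, sorted ascending, with the cost row of C, sorted descending, and solve an LSAP over the resulting linear costs:

l_{ik} = \left\langle \hat{D}_{i\cdot},\, \check{C}_{k\cdot} \right\rangle, \qquad \mathrm{GLB} = \min_{\pi} \sum_{i} l_{i\pi(i)}.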

  35. IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS

  36. IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS

  37. Observation

  38. EVALUATION

  39. DISCUSSION • Combining VM migration with dynamic routing protocols • VM placement by joint network and server resource optimization

  40. ARiA: A Protocol for Dynamic Fully Distributed Grid Meta-Scheduling. Amos Brocco, Apostolos Malatras, Ye Huang, Béat Hirsbrunner, Department of Informatics, University of Fribourg, Switzerland. ICDCS 2010

  41. INTRODUCTION • An advantage of grid systems is their ability to guarantee efficient meta-scheduling (the optimal allocation of jobs across a pool of sites with diverse local scheduling policies). • The centralized nature of current meta-scheduling solutions is not well suited to the envisioned increase in scale and dynamicity.

  42. INTRODUCTION • This paper focuses on grid task meta-scheduling, presenting a fully distributed protocol named ARiA to achieve efficient global dynamic scheduling across multiple sites. • The meta-scheduling process is performed online and takes into account the availability of new resources as well as changes in actual allocation policies.

  43. ARiA PROTOCOL • Job Submission Phase • Job Acceptance Phase • Dynamic Rescheduling Phase

  44. Job Submission Phase • Jobs are assigned a universal unique identifier (UUID) • Nodes receiving job submissions are referred to as initiators for these jobs • Initiators issue resource discovery queries across the grid peer-to-peer overlay by broadcasting REQUEST messages

  45. Job Acceptance Phase • If the request cannot be satisfied, the message is forwarded further on the peer-to-peer overlay; • otherwise, a cost value for the job, based on actual resources and the current schedule, is computed and sent back to the job's initiator by means of an ACCEPT message. • The initiator evaluates incoming ACCEPT responses, selects the best-qualified node, and sends it an ASSIGN message.

  46. Dynamic Rescheduling Phase • The assignee attempts to find candidates for rescheduling jobs in its queue whose execution has not yet started, by means of INFORM messages. • The structure of INFORM messages mirrors that of REQUEST messages.
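A condensed sketch of the first two phases described above; the class layout, the toy cost model and the method names are illustrative, not the ARiA implementation, and the Dynamic Rescheduling Phase (INFORM) is omitted for brevity:

```python
import uuid

class Node:
    def __init__(self, capacity):
        self.capacity = capacity      # free resources at this node
        self.neighbors = []           # peer-to-peer overlay links
        self.queue = []               # jobs scheduled locally

    # Job Submission Phase: this node acts as the job's initiator.
    def submit(self, job_size, ttl=3):
        job_id = uuid.uuid4()         # universal unique identifier (UUID)
        accepts = self.on_request(job_id, job_size, ttl, visited=set())
        if not accepts:
            return None               # no node could accept the job
        # Job Acceptance Phase: pick the lowest-cost ACCEPT, send ASSIGN.
        node, _cost = min(accepts, key=lambda a: a[1])
        node.on_assign(job_id, job_size)
        return node

    # REQUEST handling: answer with an ACCEPT (node, cost) if the job
    # fits, otherwise keep forwarding on the overlay while TTL remains.
    def on_request(self, job_id, job_size, ttl, visited):
        visited.add(id(self))
        if job_size <= self.capacity:
            return [(self, len(self.queue) + job_size)]   # toy cost value
        if ttl == 0:
            return []
        return [a for nb in self.neighbors if id(nb) not in visited
                  for a in nb.on_request(job_id, job_size, ttl - 1, visited)]

    # ASSIGN handling: the selected node enqueues the job.
    def on_assign(self, job_id, job_size):
        self.queue.append((job_id, job_size))
```

Submitting a job to any node then triggers REQUEST flooding, ACCEPT collection at the initiator, and a single ASSIGN to the cheapest bidder.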

  47. EVALUATION • For the evaluation of ARiA, an overlay of 500 nodes with a target average path length of 9 hops was deployed in a custom simulator. • The average node degree attained during simulations was 4, resulting in about 2000 overlay links.

  48. EVALUATION • In all scenarios, a total of 1000 jobs is submitted to random nodes on the grid. Unless otherwise specified, jobs are submitted at 10-second intervals. • When dynamic rescheduling is enabled, INFORM messages are sent for at most 2 scheduled jobs every 5 minutes.

  49. EVALUATION • REQUEST messages are forwarded on the overlay for at most 9 hops; at each step, at most 4 random neighbors of the current node are contacted. • For INFORM messages, a more lightweight approach is followed, with at most 8 hops and up to 2 neighbors contacted per step (a sketch of this bounded flooding follows below).
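The forwarding discipline described here amounts to a bounded random flood; a minimal sketch, with the hop and fanout limits taken from the slide and everything else (node structure, the deliver handler) assumed:

```python
import random

# Bounded flooding on the overlay: at most max_hops hops and at most
# fanout random neighbors per step (9 hops / 4 neighbors for REQUEST,
# 8 hops / 2 neighbors for INFORM, per the evaluation setup).
def forward(node, message, max_hops, fanout, seen=None):
    seen = seen if seen is not None else {node}
    if max_hops == 0:
        return
    candidates = [n for n in node.neighbors if n not in seen]
    for nb in random.sample(candidates, min(fanout, len(candidates))):
        if nb in seen:                # may have been reached via recursion
            continue
        seen.add(nb)
        nb.deliver(message)           # hypothetical message handler
        forward(nb, message, max_hops - 1, fanout, seen)
```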

  50. EVALUATION
