
Network Support for Cloud Services



Presentation Transcript


  1. Network Support for Cloud Services Lixin Gao, UMass Amherst

  2. Outline • Data center networking • Design issues • Resource sharing • Asynchronous computation model

  3. Conventional Data Center Networks • Hierarchical tree structure • High-speed core switches are expensive • Hard to scale

  4. Data Center Network Design • Commodity hardware • Server • Switch • Scalable • Fat tree, DCell, BCube, VL2, …
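
Fat tree is one of the commodity-switch designs listed above. As a rough sizing sketch, assuming the standard k-ary fat tree built entirely from identical k-port switches (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches), the host and switch counts follow directly from k. The function name `fat_tree_size` is hypothetical.

```python
def fat_tree_size(k: int) -> tuple[int, int]:
    """Return (hosts, switches) for a k-ary fat tree of identical k-port switches.

    Assumes the standard construction: k pods, each with k/2 edge switches and
    k/2 aggregation switches, plus (k/2)^2 core switches.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    hosts = k * half * half          # k pods x k/2 edge switches x k/2 hosts per edge switch
    edge = aggregation = k * half    # k/2 of each per pod
    core = half * half
    return hosts, edge + aggregation + core


print(fat_tree_size(4))   # small 4-ary example
print(fat_tree_size(48))  # 48-port commodity switches
```

With 48-port switches this gives 27,648 hosts from 2,880 switches, which is the sense in which such designs scale using only commodity parts.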

  5. DPillar Structure • Devices • All servers have two ports • All switches have n ports • Server and switch columns • k columns • Server naming • (col, label), label • Connecting rule • Servers in and , their labels differ at only
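
The symbols in the connecting rule above were lost in extraction. The sketch below assumes one reading of it: a label is a k-digit string with digits in [0, n/2), switch column i sits between server columns i and (i + 1) mod k, and servers in those two columns whose labels differ only at digit i attach to the same switch. Under that assumption every server uses both of its ports and every switch uses all n ports. The function names and data layout are hypothetical.

```python
from collections import Counter
from itertools import product


def dpillar(n: int, k: int):
    """Build a DPillar-style topology under an ASSUMED reading of the connecting rule.

    Servers are (column, label) pairs; a label is a k-tuple of digits in [0, n/2).
    Switches are (column, key) pairs, where key is the label with digit `column` removed.
    Returns (servers, switches, links); each link is a (server, switch) pair.
    """
    if n < 2 or n % 2 or k < 2:
        raise ValueError("need an even n >= 2 and k >= 2")
    half = n // 2
    labels = list(product(range(half), repeat=k))
    servers = [(col, lab) for col in range(k) for lab in labels]
    switches = set()
    links = []
    for i in range(k):  # switch column i joins server columns i and (i + 1) % k
        for lab in labels:
            # Labels that agree everywhere except digit i map to the same switch.
            sw = (i, lab[:i] + lab[i + 1:])
            switches.add(sw)
            links.append(((i, lab), sw))            # port facing server column i
            links.append((((i + 1) % k, lab), sw))  # port facing server column i + 1
    return servers, sorted(switches), links


def port_usage(links):
    """Count how many links (ports in use) each server and each switch has."""
    per_server = Counter(server for server, _ in links)
    per_switch = Counter(switch for _, switch in links)
    return per_server, per_switch
```

Under these assumptions the network holds k·(n/2)^k servers; for example, `dpillar(48, 3)` yields 3 · 24^3 = 41,472 dual-port servers.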

  6. Design Issues • Inexpensive • Scale to a large number of servers • Fault-tolerant routing • Load balancing

  7. Network Resource Sharing within a Data Center • Virtualization of CPU (Xen), memory (Difference Engine), storage (SAN) • Network resources can become the bottleneck • Shuffling and sorting in MapReduce • Synchronization among tasks slows down computation • Backup of VMs • Bandwidth sharing • Granularity: point-to-point or group-based • Fair share: centralized vs. distributed • Privacy: public cloud vs. private cloud
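
The slide contrasts centralized and distributed fair sharing without defining "fair". One common definition is max-min fairness. The sketch below computes it centrally for flows that share a single link, using water-filling: split the leftover capacity equally, cap each flow at its demand, and repeat. The max-min criterion, the single-link model, and the name `max_min_share` are assumptions for illustration.

```python
def max_min_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair allocation of one link's capacity (water-filling, centralized)."""
    alloc = {flow: 0.0 for flow in demands}
    remaining = capacity
    active = {flow for flow, want in demands.items() if want > 0}
    while active and remaining > 1e-12:
        share = remaining / len(active)  # equal split of whatever is left
        for flow in sorted(active):
            give = min(share, demands[flow] - alloc[flow])  # never exceed the demand
            alloc[flow] += give
            remaining -= give
        # Flows whose demand is met drop out; the rest compete for the leftover.
        active = {flow for flow in active if demands[flow] - alloc[flow] > 1e-12}
    return alloc


print(max_min_share(10.0, {"A": 2.0, "B": 4.0, "C": 8.0}))
```

With 10 units of capacity and demands A=2, B=4, C=8, flows A and B receive their full demands and C gets the remaining 4. A distributed scheme would have to reach the same allocation without a single node seeing every demand.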

  8. MapReduce Model • Map: generate key-value pairs • Shuffle and sort • Reduce: aggregate the values for a key from multiple sources
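
A single-process sketch of the three phases, using word count as a hypothetical example job (the slide does not name one). A real MapReduce run spreads map and reduce tasks over many machines, and the shuffle step is where intermediate data crosses the network.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, shuffle/sort, and reduce in one process (illustration only)."""
    # Map: each record yields zero or more (key, value) pairs.
    pairs = [kv for record in records for kv in mapper(record)]
    # Shuffle and sort: group values by key and visit keys in sorted order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: aggregate all values collected for each key.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def wc_map(line):
    for word in line.split():
        yield (word, 1)


def wc_reduce(word, counts):
    return sum(counts)


print(map_reduce(["a b a", "b c"], wc_map, wc_reduce))  # {'a': 2, 'b': 2, 'c': 1}
```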

  9. Iterative Computations • YouTube video suggestion • BFS • PageRank • Clustering • Pattern recognition

  10. Synchronous Model • Easy to implement with MapReduce • However, • Overhead of synchronization and sorting • Slow convergence; wasted CPU and network resources • Many iterative computations can be performed asynchronously • PageRank, shortest path, adsorption, link proximity estimation, belief propagation, …

  11. Shortest Paths [Figure: weighted example graph; the source node has distance 0 and all other nodes start at ∞; a map step and a reduce step update the tentative distances.]

  12. Shortest Paths: Parallel execution [Figure: continued map/reduce execution on the same graph; more nodes now have finite tentative distances.]

  13. Shortest Paths: Parallel execution [Figure: continued parallel map/reduce execution on the same graph.]
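
The slides step a shortest-path computation through synchronous map and reduce rounds. A sketch of that pattern, assuming an adjacency-list graph with non-negative edge weights: in each round, map emits a candidate distance along every edge, reduce keeps the minimum per node, and the next round starts only after every node has been reduced (the synchronization barrier). The graph below is hypothetical, not the one in the figure.

```python
import math


def sssp_sync(graph, source):
    """Synchronous, MapReduce-style single-source shortest paths.

    graph: {node: [(neighbor, weight), ...]}; every neighbor must also be a key.
    Returns (distances, number_of_rounds).
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    rounds = 0
    while True:
        rounds += 1
        # Map: each node re-emits its own distance and offers dist + w to each neighbor.
        emitted = list(dist.items())
        for u, d in dist.items():
            if d < math.inf:
                emitted.extend((v, d + w) for v, w in graph[u])
        # Shuffle + reduce: keep the smallest candidate distance for each node.
        new = {v: math.inf for v in graph}
        for v, d in emitted:
            new[v] = min(new[v], d)
        # Barrier: stop only when a whole round changes nothing.
        if new == dist:
            return dist, rounds
        dist = new


g = {"s": [("a", 3), ("b", 1)], "a": [("c", 1)], "b": [("a", 1), ("c", 5)], "c": []}
print(sssp_sync(g, "s"))
```

On this graph the distances settle after three rounds and a fourth round confirms that nothing changed. Every round processes every node and waits at the barrier, even when only one distance moved, which is the overhead the synchronous model pays.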

  14. An Asynchronous Model • A general framework • Eliminate synchronization • Scheduling policy • Prove correctness for a wide range of applications • PageRank, Personalized PageRank • Link Proximity Estimation • Commute time, Katz metric, shortest path • Bayesian Inference • Scheduling policies • Top-k query
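
One way to remove the barrier, sketched here as an assumption rather than as the talk's exact framework, is a delta-based accumulative update. Each node keeps a pending change, adds it to its own value when scheduled, and pushes a damped share of it to its out-neighbors, with no global round. The scheduling policy below (always process the node with the largest pending delta) is one illustrative choice. The name `pagerank_async`, the damping factor 0.85, and the unnormalized form R(v) = (1 - d) + d · Σ R(u) / outdeg(u) are assumptions.

```python
def pagerank_async(graph, d=0.85, eps=1e-10):
    """Asynchronous, delta-based PageRank (illustrative sketch).

    graph: {node: [out-neighbors]}; every node needs at least one out-link.
    Converges to R(v) = (1 - d) + d * sum over u -> v of R(u) / outdeg(u).
    Returns (values, number_of_node_updates).
    """
    if any(not outs for outs in graph.values()):
        raise ValueError("every node must have at least one out-link")
    if not graph:
        return {}, 0
    value = {v: 0.0 for v in graph}
    delta = {v: 1.0 - d for v in graph}  # pending change not yet applied
    updates = 0
    while True:
        # Scheduling policy (assumed): largest pending delta first.
        j = max(delta, key=delta.get)
        dj = delta[j]
        if dj < eps:
            return value, updates
        delta[j] = 0.0
        value[j] += dj  # apply the accumulated change locally
        share = d * dj / len(graph[j])
        for k in graph[j]:  # push to neighbors without waiting for anyone else
            delta[k] += share
        updates += 1


vals, n = pagerank_async({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(vals, n)
```

Each update removes dj from the pool of pending deltas and adds back only d · dj, so the total pending amount shrinks by (1 - d) · dj per step and the loop terminates for any d < 1. Prioritizing large deltas spends work where values are still changing most.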

  15. Shortest Path [Figure panels: Facebook dataset; SSSP-m dataset]

  16. PageRank [Figure panels: Google webgraph; PageRank-m webgraph]

  17. Conclusions • Network design within the data center • Design based on commodity hardware • Network resource sharing • Asynchronous computation framework • Reduced bandwidth requirement • Efficient computation

  18. An Example of an Outage: planet02.csc.ncsu.edu experienced packet loss on July 30, 2005

  19. Causes of Outages • Most lost packets are caused by routing outages

  20. Towards Five-Nines Reliability • Exploiting redundancy on Internet paths • Multiple routing instances to ensure consistency • Exploiting multiple sites within a cloud • Site selection through route monitoring • Delivery through a private WAN
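
"Five nines" means 99.999% availability. The arithmetic below shows the downtime budget this implies and why redundancy helps. The second line assumes that path failures are independent; real Internet paths often share links or routing state, so treat it as an optimistic bound.

```latex
\begin{aligned}
\text{allowed downtime} &= (1 - 0.99999) \times 365 \times 24 \times 60\ \text{min} \approx 5.26\ \text{min per year} \\
A_{\text{redundant}} &= 1 - \prod_{i=1}^{m} (1 - A_i) \qquad \text{(independent paths)} \\
A_i = 0.999,\; m = 2 \;&\Rightarrow\; A_{\text{redundant}} = 1 - (10^{-3})^2 = 0.999999
\end{aligned}
```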

  21. Packet Loss due to Routing Failures • Failover events: 76% packets lost • Recovery events: 26% packets lost [Figure panels: Failover; Recovery]

  22. Round-trip Delay [Figure panels: Failover; Recovery] Failover events have a significant impact on packet round-trip delays. In the worst case, round-trip delays can exceed 900 msec.

  23. Reordering during Failover Events The number of reordered packets is small; however, the offset of reordered packets is large, so real-time applications need larger buffers.
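
To make "offset" concrete, this sketch assumes one common definition: a late packet's offset is the number of higher-numbered packets that arrived before it. The function name and the sample arrival order are hypothetical.

```python
def reorder_offsets(arrivals: list[int]) -> dict[int, int]:
    """Map each late packet's sequence number to its offset.

    Offset (assumed definition): how many higher-numbered packets arrived before it.
    Packets that were not overtaken are omitted from the result.
    """
    offsets: dict[int, int] = {}
    seen: list[int] = []
    for seq in arrivals:
        overtaken_by = sum(1 for s in seen if s > seq)
        if overtaken_by:
            offsets[seq] = overtaken_by
        seen.append(seq)
    return offsets


print(reorder_offsets([1, 2, 4, 5, 6, 7, 3, 8]))  # {3: 4}
```

In this arrival order only packet 3 is reordered, but its offset is 4, so a receiver that delivers in order must hold packets 4 through 7 until packet 3 shows up. That is the pattern on the slide: few reordered packets, but large offsets that call for larger buffers.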
