Optimizing Network Performance In Replicated Hosting

Peter Steenkiste (CMU) with Ningning Hu (CMU), Oliver Spatscheck (AT&T), Jia Wang (AT&T). Motivation: the question of how to use latency to select a replicated web server has been well studied. How about using available bandwidth?

Presentation Transcript


  1. Optimizing Network Performance In Replicated Hosting • Peter Steenkiste (CMU) with Ningning Hu (CMU), Oliver Spatscheck (AT&T), Jia Wang (AT&T) • Carnegie Mellon University

  2. Motivation • The question of how to use latency to select a replicated web server has been well studied • How about using available bandwidth?

  3. Outline • Pathneck • Internet end user RTT distribution and access bandwidth distribution • Optimization results • For RTT • For bandwidth • For data transmission time

  4. Pathneck: Recursive Packet Train (RPT) • [Figure: a packet train of 20 measurement packets (60 B each, TTL 1, 2, ..., 20), then 60 load packets (500 B each, TTL 100), then 20 measurement packets (60 B each, TTL 20, ..., 2, 1)] • Two measurement packets are dropped at each router • ICMP packets allow the source to estimate the train length at each hop • Changes in train length provide bounds on the available bandwidth of each link
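
The train layout on this slide can be sketched as a list of packets. This is only an illustration, assuming the packet counts, sizes, and TTL values read off the slide's figure (20 measurement packets of 60 B, 60 load packets of 500 B with TTL 100, 20 more measurement packets); the `Packet` type and `build_rpt` function are hypothetical names, not part of the Pathneck tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ttl: int    # initial TTL set by the sender
    size: int   # packet size in bytes
    kind: str   # "measurement" or "load"


def build_rpt(n_meas=20, n_load=60, meas_size=60, load_size=500, load_ttl=100):
    """Build a recursive packet train: head measurement packets with
    TTL 1..n_meas, load packets with a large TTL, and tail measurement
    packets with TTL n_meas..1 (values assumed from the slide's figure)."""
    head = [Packet(ttl, meas_size, "measurement") for ttl in range(1, n_meas + 1)]
    load = [Packet(load_ttl, load_size, "load") for _ in range(n_load)]
    tail = [Packet(ttl, meas_size, "measurement") for ttl in range(n_meas, 0, -1)]
    return head + load + tail


train = build_rpt()
total_bytes = sum(p.size for p in train)
```

Because the head and tail TTLs mirror each other, the first and last surviving packets expire at the same router, which is what lets the source see one pair of ICMP replies per hop.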

  5. Pathneck Operation • [Figure: a packet train with TTLs 1 2 3 4 100 100 100 100 100 4 3 2 1 travelling from source S through routers R1, R2, and R3; the TTLs decrement at each hop, the head and tail packets expire at each router, and gap values g1, g2, and g3 are obtained for successive hops]
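
The TTL bookkeeping in this figure can be traced with a few lines of code. This is a minimal sketch, assuming each router decrements every TTL by one and discards packets whose TTL reaches zero (the point at which it would send an ICMP message back to the source). The function name is hypothetical, and the model covers only TTLs, not timing or gap values.

```python
def forward_through_hops(ttls, n_hops):
    """For each hop, return (packets dropped at that hop, TTLs forwarded on)."""
    history = []
    current = list(ttls)
    for _ in range(n_hops):
        decremented = [t - 1 for t in current]
        dropped = sum(1 for t in decremented if t == 0)   # expired at this router
        current = [t for t in decremented if t > 0]       # forwarded to next hop
        history.append((dropped, current))
    return history


# The small train from the figure: 4 + 4 measurement packets, 5 load packets.
figure_train = [1, 2, 3, 4, 100, 100, 100, 100, 100, 4, 3, 2, 1]
history = forward_through_hops(figure_train, n_hops=3)
for hop, (dropped, remaining) in enumerate(history, start=1):
    print(f"R{hop}: dropped {dropped}, forwards {remaining}")
```

Each router removes exactly the first and last packet of the train it receives, so the train that reaches the next hop is again bracketed by packets with TTL 1.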

  6. Pathneck Properties • Pathneck is an active probing tool designed for locating Internet bottlenecks • It is efficient and effective • It also provides route, delay, and bandwidth information • For technical details, see www.cs.cmu.edu/~hnn/pathneck • We improve Pathneck to cover the last hop • This allows us to measure the RTT and the access bandwidth of many end users

  7. Methodology • Measurement sources: 18 nodes from a large tier-1 ISP • 14 in the US, 3 in Europe, and 1 in East-Asia • A large fraction of paths cover other ISPs • They play the role of possible replica sites • Measurement destinations: 164,130 IP addresses from different prefixes • 67,271 IPs correspond to real online hosts • Firewalls etc. sometimes require us to use an intermediate node as a “virtual” destination • They play the role of clients accessing the web

  8. Results • Internet end user RTT distribution and access bandwidth distribution • Optimization results • For RTT • For bandwidth • For data transmission time

  9. RTT Distribution • The RTT “views” of Internet clients from different geographical locations are significantly different • [Figure: RTT distributions as seen from Europe, US-NE, and East-Asia]

  10. Bandwidth Distribution • The bandwidth “views” are much more alike • [Figure: bandwidth distributions as seen from East-Asia, Europe, and US-NE]

  11. End Access Bandwidth Distribution • Low access bandwidth still dominates among end users • [Figure annotations: 40% < 2.2Mbps; 50% < 4.2Mbps; 62.5% < 10Mbps; limited by downstream bandwidth of measurement source]

  12. Bottleneck Location Distribution • 75% of bottleneck links are at the last two hops • There is little chance of avoiding these bottlenecks using replication • However, when access bandwidth is higher than 40Mbps, content replication can help to improve performance

  13. Results • Internet end user RTT distribution and access bandwidth distribution • Optimization results • For RTT • For bandwidth • For data transmission time

  14. Optimization Algorithm • We use a simple greedy algorithm to optimize the performance of our replication infrastructure • In each step, select the replication node that has the largest marginal utility • The greedy algorithm has been shown to obtain results very close to optimal • For our study, it is only 0.1% worse than the optimal results from brute-force search
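
The greedy step described on this slide can be sketched as follows. The slide does not give the utility function, so this example assumes the objective is the mean, over all clients, of the RTT to each client's best selected replica, and it reports marginal utility as the relative reduction of that objective against the previous step. The function names and the RTT table are hypothetical, made-up values for illustration and are not measurement results from the study.

```python
def mean_best_rtt(rtt, chosen):
    """Mean over clients of the lowest RTT to any chosen replica."""
    n_clients = len(next(iter(rtt.values())))
    return sum(min(rtt[r][c] for r in chosen) for c in range(n_clients)) / n_clients


def greedy_select(rtt, k):
    """Pick up to k replicas, each time adding the one that lowers the
    objective the most. Returns a list of (replica, cost, marginal_utility);
    the first step has no baseline, so its marginal utility is None."""
    chosen, steps, prev = [], [], None
    for _ in range(min(k, len(rtt))):
        candidates = [r for r in rtt if r not in chosen]
        best = min(candidates, key=lambda r: mean_best_rtt(rtt, chosen + [r]))
        cost = mean_best_rtt(rtt, chosen + [best])
        gain = None if prev is None else (prev - cost) / prev
        chosen.append(best)
        steps.append((best, cost, gain))
        prev = cost
    return steps


# Hypothetical RTTs in ms: one row per candidate replica, one column per client.
rtt_ms = {
    "us-east":   [10, 20, 150, 200],
    "europe":    [90, 100, 20, 180],
    "east-asia": [200, 190, 220, 30],
}
steps = greedy_select(rtt_ms, k=3)
for name, cost, gain in steps:
    print(name, cost, gain)
```

For a bandwidth objective the same loop applies with `max` in place of `min` inside the objective and in the selection, since higher bandwidth is better.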

  15. RTT Optimization • RTT optimization results have a clear geographical pattern • The first 5 replicas provide most of the benefit • [Figure labels: US-Central, East-Asia, US-West, Europe, US-East]

  16. Marginal Utility of RTT Optimization • The first 5 nodes yield significant improvement (i.e., larger than 5%) • [ Marginal utility: the relative performance improvement from a specific node ]

  17. Bandwidth Optimization • The first 2 replicas provide most of the benefit

  18. Marginal Utility for Bandwidth Optimization • Only the first 2 (3) nodes yield significant improvement

  19. For Well-provisioned Access Links • Replication can indeed improve bandwidth performance for end users with access bandwidth larger than 40Mbps • [Figure annotations: 74%, 35%, 54Mbps]

  20. Data Transmission Time • End users’ data transmission time depends on delay, bandwidth, and data size • We estimate data transmission time using a simplified TCP model: a slow-start phase followed by a congestion-avoidance phase • Assumes no packet loss • Slow start: transfer time is delay sensitive • Congestion avoidance: transfer time is bandwidth sensitive • Data size determines whether replication should optimize delay or bandwidth • Use “slow-start size” as the crossover point • Results: 70% of paths have a slow-start size larger than 10KB • This is larger than the average web page
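
The slide does not spell out the model's formulas, so the following is one plausible reading rather than the authors' exact method. It assumes the congestion window doubles once per RTT, starting from `init_segments` segments of `mss` bytes, until it reaches the bandwidth-delay product; after that the remaining data is sent at the available bandwidth. There is no packet loss and connection setup is ignored. All function names, parameter names, and example values are assumptions.

```python
def slow_start_size(rtt_s, bw_bps, mss=1460, init_segments=2):
    """Bytes sent before the window reaches the bandwidth-delay product."""
    bdp = bw_bps / 8 * rtt_s          # bytes needed in flight to fill the path
    cwnd = mss * init_segments
    sent = 0
    while cwnd < bdp:
        sent += cwnd
        cwnd *= 2
    return sent


def transfer_time(size_bytes, rtt_s, bw_bps, mss=1460, init_segments=2):
    """Estimated seconds to send size_bytes under the simplified model."""
    bdp = bw_bps / 8 * rtt_s
    cwnd = mss * init_segments
    remaining = size_bytes
    elapsed = 0.0
    # Slow-start phase: one window per RTT, so time depends on delay only.
    while remaining > 0 and cwnd < bdp:
        remaining -= cwnd
        elapsed += rtt_s
        cwnd *= 2
    # Bandwidth-limited phase: the rest is paced by the available bandwidth.
    if remaining > 0:
        elapsed += remaining * 8 / bw_bps
    return elapsed


# Hypothetical path: 100 ms RTT, 8 Mbps available bandwidth.
small = transfer_time(10_000, 0.1, 8e6, mss=1000, init_segments=1)
small_fast_link = transfer_time(10_000, 0.1, 16e6, mss=1000, init_segments=1)
large = transfer_time(1_127_000, 0.1, 8e6, mss=1000, init_segments=1)
```

In this sketch a transfer smaller than the slow-start size finishes in the same time even if the bandwidth doubles, while a larger transfer spends most of its time in the bandwidth-limited phase, which is the crossover behaviour the slide describes.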

  21. Data Transmission Time (2) • The transmission times for 10KB, 100KB, 1MB and 10MB are 0.4s, 1.1s, 6.4s, and 59.2s, respectively

  22. Related Work • Content replication with different optimization metrics • Geographic location, network hops, and latency • Retrieval cost, update cost, storage cost • QoS guarantees, … • Greedy algorithm used in replica selection

  23. Conclusion • We quantify the Internet end-node access-bandwidth distribution and the bottleneck location distribution • Two differences distinguish optimization for bandwidth from optimization for RTT • Geographic location is not important for bandwidth optimization • For throughput, only well-provisioned end users can benefit from content replication
