Optimizing Network Performance In Replicated Hosting
Peter Steenkiste (CMU), with Ningning Hu (CMU), Oliver Spatscheck (AT&T), and Jia Wang (AT&T)
Carnegie Mellon University
Motivation
• How to use latency to select a replicated web server has been well studied
• How about using available bandwidth?
Outline
• Pathneck
• Internet end-user RTT distribution and access bandwidth distribution
• Optimization results
  • For RTT
  • For bandwidth
  • For data transmission time
Pathneck: Recursive Packet Train (RPT)
(Figure: 20 measurement packets of 60 B with TTLs 1 to 20, then 60 load packets of 500 B with TTL 100, then 20 measurement packets of 60 B with TTLs 20 down to 1.)
• Two measurement packets are dropped at each router
• The resulting ICMP packets allow the source to estimate the train length at each hop
• Changes in train length provide bounds on the available bandwidth of each link
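The train layout on this slide can be written down directly. The sketch below is a minimal Python model, assuming each packet is just a (TTL, size) pair; the helper names `build_rpt` and `expiring_at` are hypothetical, and the load-packet TTL of 100 is taken from the slide's figure. It checks that exactly two measurement packets, one from each end of the train, expire at every router up to hop 20.

```python
# Hypothetical model of the Recursive Packet Train (RPT) layout on this slide.
# Each packet is a (ttl, size_bytes) tuple; timing is not modeled here.

N_MEASURE = 20      # measurement packets on each side of the load
MEASURE_SIZE = 60   # bytes per measurement packet
N_LOAD = 60         # load packets in the middle
LOAD_SIZE = 500     # bytes per load packet
LOAD_TTL = 100      # value shown in the slide's figure; large enough not to expire early


def build_rpt(n_measure=N_MEASURE, n_load=N_LOAD):
    """Return the train in send order: TTLs 1..n, then the load, then TTLs n..1."""
    head = [(ttl, MEASURE_SIZE) for ttl in range(1, n_measure + 1)]
    load = [(LOAD_TTL, LOAD_SIZE)] * n_load
    tail = [(ttl, MEASURE_SIZE) for ttl in range(n_measure, 0, -1)]
    return head + load + tail


def expiring_at(train, hop):
    """Indices of packets whose TTL reaches 0 at router `hop` (1-based).

    Each router decrements the TTL by one, so a packet sent with TTL h is
    dropped at router h, which then returns an ICMP time-exceeded message.
    """
    return [i for i, (ttl, _) in enumerate(train) if ttl == hop]


train = build_rpt()
print(len(train), sum(size for _, size in train))  # prints: 100 32400
for hop in (1, 2, 20):
    print(hop, expiring_at(train, hop))  # [0, 99], [1, 98], [19, 80]
```

Because the TTLs are mirrored, the two packets expiring at hop h always bracket everything still left in the train, which is what lets the gap between their ICMP replies stand for the train length at that hop.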
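Pathneck Operation
(Figure: a simplified train with TTLs 1 2 3 4, five load packets with TTL 100, then TTLs 4 3 2 1, leaves source S. At each router R1, R2, R3 every TTL is decremented, the two measurement packets whose TTL reaches 0 are dropped, and the gaps g1, g2, g3 between the resulting pairs of ICMP replies give the train length at each hop.)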
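The per-hop behavior in the figure can be reproduced with a small timing model. This is a minimal sketch, not Pathneck's implementation: it assumes FIFO store-and-forward links with no cross traffic and no propagation delay, and that an ICMP reply is generated the instant a packet expires. The names `make_train` and `simulate` and the link capacities are hypothetical. It shows the gap widening at the narrow link and staying wide afterwards, and that the bits crossing a link divided by the gap recover that link's rate.

```python
# Hypothetical timing model of Pathneck's per-hop measurement.
# Assumptions: store-and-forward FIFO links, no cross traffic, no propagation
# delay, and an ICMP reply generated the instant an expired packet arrives.


def make_train(n_measure, n_load, probe_bytes=60, load_bytes=500, load_ttl=100):
    head = [(ttl, probe_bytes) for ttl in range(1, n_measure + 1)]
    tail = [(ttl, probe_bytes) for ttl in range(n_measure, 0, -1)]
    return head + [(load_ttl, load_bytes)] * n_load + tail


def simulate(train, capacities_bps):
    """Return {hop: (gap_s, rate_bps)} for each router where two packets expire.

    capacities_bps[i] is the capacity of the link leading into router i + 1.
    gap_s is the time between the two ICMP replies (the train length at that hop);
    rate_bps is the bits that arrived within that gap divided by the gap.
    """
    packets = [(ttl, size, 0.0) for ttl, size in train]  # all queued at t = 0
    results = {}
    for hop, cap in enumerate(capacities_bps, start=1):
        link_free = 0.0
        arrived = []
        for ttl, size, ready in packets:
            start = max(ready, link_free)          # wait until the link is idle
            link_free = start + size * 8 / cap     # serialization delay
            arrived.append((ttl - 1, size, link_free))  # router decrements TTL
        expired = [i for i, p in enumerate(arrived) if p[0] == 0]
        if len(expired) == 2:
            first, last = expired
            gap = arrived[last][2] - arrived[first][2]
            bits = 8 * sum(p[1] for p in arrived[first + 1 : last + 1])
            results[hop] = (gap, bits / gap)
        packets = [p for p in arrived if p[0] > 0]  # expired packets are dropped
    return results


train = make_train(n_measure=4, n_load=5)   # the simplified train in the figure
caps = [100e6, 100e6, 10e6, 100e6]          # hypothetical: hop 3 is a 10 Mbps link
for hop, (gap, rate) in simulate(train, caps).items():
    print(f"hop {hop}: gap {gap * 1e6:.1f} us, rate {rate / 1e6:.1f} Mbps")
```

In this run the gap jumps from about 259 us at hop 2 to 2144 us at hop 3, and the rate at hops 3 and 4 comes out at 10 Mbps. Hop 2 reports about 86 Mbps even though its link runs at 100 Mbps, because that link sits idle between packets, so the estimate is approximate. With no cross traffic the rate reduces to the link capacity; competing traffic would stretch the train further, which is the intuition behind using train length to bound available bandwidth.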
Pathneck Properties
• Pathneck is an active probing tool designed for locating Internet bottlenecks
• It is efficient and effective
• It also provides route, delay, and bandwidth information
• For technical details, see www.cs.cmu.edu/~hnn/pathneck
• We improved Pathneck so that it also covers the last hop
  • This allows us to measure the RTT and the access bandwidth of many end users
Methodology
• Measurement sources: 18 nodes in a large tier-1 ISP
  • 14 in the US, 3 in Europe, and 1 in East Asia
  • A large fraction of the paths traverse other ISPs
  • They play the role of candidate replica sites
• Measurement destinations: 164,130 IP addresses from different prefixes
  • 67,271 of these IPs correspond to real online hosts
  • Firewalls and similar devices sometimes require us to use an intermediate node as a “virtual” destination
  • They play the role of clients accessing the web
Results
• Internet end-user RTT distribution and access bandwidth distribution
• Optimization results
  • For RTT
  • For bandwidth
  • For data transmission time
RTT Distribution
• The RTT “views” of Internet clients, as seen from sources in different geographic locations, differ significantly
(Figure: RTT distributions measured from sources in Europe, US-NE, and East Asia.)
Bandwidth Distribution
• The bandwidth “views” are much more alike
(Figure: bandwidth distributions measured from sources in East Asia, Europe, and US-NE.)
End Access Bandwidth Distribution
• Low access bandwidth still dominates among end users
  • 62.5% of end users have access bandwidth below 10 Mbps, 50% below 4.2 Mbps, and 40% below 2.2 Mbps
  • Measurements are limited by the downstream bandwidth of the measurement source
Bottleneck Location Distribution
• 75% of bottleneck links are within the last two hops
  • Replication has little chance of avoiding these bottlenecks
• However, when access bandwidth is higher than 40 Mbps, content replication can help improve performance
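The reasoning on this slide can be made concrete with a deliberately crude model, which is an assumption for illustration only: treat a client's end-to-end bandwidth from a replica as the minimum of its access link and the rest of the path, and let each client use the best selected replica. When the access link is the bottleneck, adding a replica cannot raise that minimum; only clients whose access link is faster than the rest of the path gain. The function name and all numbers are hypothetical.

```python
# Hypothetical model: a client's end-to-end bandwidth from a replica is the
# smaller of its access-link bandwidth and the bandwidth of the rest of the path,
# and the client is served by whichever selected replica gives the most bandwidth.


def client_bandwidth(access_mbps, path_mbps, selected):
    """Best bandwidth (Mbps) the client sees from any replica in `selected`."""
    return max(min(access_mbps, path_mbps[r]) for r in selected)


path_mbps = {"A": 20.0, "B": 60.0}  # hypothetical non-access bandwidth to two replicas

for access in (2.0, 100.0):  # a low-end user and a well-provisioned one
    one = client_bandwidth(access, path_mbps, ["A"])
    two = client_bandwidth(access, path_mbps, ["A", "B"])
    print(f"access {access} Mbps: {one} -> {two} Mbps")
```

The 2 Mbps client stays at 2 Mbps whether one or two replicas exist, while the 100 Mbps client goes from 20 to 60 Mbps. The model does not reproduce the 40 Mbps threshold itself; that figure comes from the measurements.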
Optimization Algorithm
• We use a simple greedy algorithm to optimize the performance of our replication infrastructure
  • In each step, select the replication node with the largest marginal utility
• Greedy algorithms have been shown to obtain results very close to optimal
  • In our study, the greedy result is only 0.1% worse than the optimal result from brute-force search
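A minimal sketch of the greedy step, assuming the objective is the mean RTT with each client served by its closest selected replica; the RTT matrix is made up, and brute force is included only for comparison, as on the slide. Choosing the candidate that minimizes the new mean RTT is the same as choosing the one with the largest marginal utility. Brute force must examine every k-subset of the candidates, which is why the greedy shortcut matters as the number of sites grows.

```python
from itertools import combinations

# Hypothetical RTT matrix in ms: RTT[c][r] is the RTT from client c to candidate replica r.
RTT = [
    [10, 80, 120],
    [15, 70, 110],
    [90, 20, 60],
    [100, 85, 12],
]


def mean_rtt(selected, rtt):
    """Average RTT when every client uses its closest selected replica."""
    return sum(min(row[r] for r in selected) for row in rtt) / len(rtt)


def greedy(rtt, k):
    """Add, one at a time, the replica that lowers the mean RTT the most."""
    candidates = range(len(rtt[0]))
    selected = []
    for _ in range(k):
        best = min(
            (r for r in candidates if r not in selected),
            key=lambda r: mean_rtt(selected + [r], rtt),
        )
        selected.append(best)
    return selected


def brute_force(rtt, k):
    """Exhaustive search over all k-subsets of candidates, for comparison."""
    subsets = combinations(range(len(rtt[0])), k)
    return list(min(subsets, key=lambda s: mean_rtt(s, rtt)))


g, b = greedy(RTT, 2), brute_force(RTT, 2)
print(g, mean_rtt(g, RTT))  # [0, 2] 24.25
print(b, mean_rtt(b, RTT))  # [0, 2] 24.25
```

For bandwidth optimization the same loop applies with the objective changed to the mean of each client's best bandwidth, maximized instead of minimized.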
RTT Optimization
• RTT optimization results have a clear geographic pattern
• The first 5 replicas provide most of the benefit
(Figure: the locations of the first five replicas, labeled US-Central, East Asia, US-West, Europe, and US-East.)
Marginal Utility of RTT Optimization
• Each of the first 5 nodes yields a significant improvement (larger than 5%)
• [Marginal utility: the relative performance improvement contributed by a specific node]
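One way to compute the bracketed quantity, assuming "relative" means relative to the performance before the node was added; the function name and the RTT values are hypothetical. The first node has no predecessor, so here it serves as the baseline, and the 5% cut-off from the slide marks which later nodes count as significant.

```python
def marginal_utilities(perf, higher_is_better=False):
    """Relative improvement contributed by each node after the first.

    perf[k] is the performance metric with the first k + 1 selected nodes,
    e.g. mean RTT (lower is better) or mean bandwidth (higher is better).
    """
    out = []
    for prev, cur in zip(perf, perf[1:]):
        gain = (cur - prev) if higher_is_better else (prev - cur)
        out.append(gain / prev)
    return out


# Hypothetical mean RTTs (ms) after greedily adding nodes 1..6.
rtt_ms = [100, 60, 45, 40, 39, 38.9]
mu = marginal_utilities(rtt_ms)
significant = [k + 2 for k, u in enumerate(mu) if u > 0.05]  # 1-based node numbers
print([round(u, 3) for u in mu])  # [0.4, 0.25, 0.111, 0.025, 0.003]
print(significant)                # [2, 3, 4]
```

Passing `higher_is_better=True` applies the same definition to bandwidth, where an improvement is an increase.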
Bandwidth Optimization
• The first 2 replicas provide most of the benefit
Marginal Utility of Bandwidth Optimization
• Only the first 2 (or 3) nodes yield a significant improvement
For Well-Provisioned Access Links
• Replication can indeed improve bandwidth for end users whose access bandwidth exceeds 40 Mbps
(Figure annotations: 74%, 35%, 54 Mbps.)
Data Transmission Time
• End users’ data transmission time depends on delay, bandwidth, and data size
• We estimate data transmission time using a simplified TCP model with a slow-start phase and a congestion-avoidance phase
  • Assumes no packet loss
  • Slow start: transfer time is delay-sensitive
  • Congestion avoidance: transfer time is bandwidth-sensitive
• Data size determines whether replication should optimize delay or bandwidth
  • We use the “slow-start size” (the amount of data transferred during slow start) as the crossover point
• Results: 70% of paths have a slow-start size larger than 10 KB
  • This is larger than the average web page
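The two-phase model can be sketched as follows. This is an illustrative reconstruction, not the authors' exact model: it assumes slow start doubles the window every RTT, starting from 2 segments of 1460 bytes, until the window reaches the bandwidth-delay product, after which data flows at the path bandwidth; connection setup is ignored, and slow-start rounds are counted as whole RTTs. The names `slow_start_size`, `transfer_time`, and `metric_to_optimize` and all parameter values are hypothetical.

```python
# Hypothetical parameters: 1460-byte MSS and an initial window of 2 segments.
MSS = 1460
INIT_CWND = 2


def slow_start_size(rtt_s, bw_bps, mss=MSS, init_cwnd=INIT_CWND):
    """Bytes sent in slow start before the window reaches the bandwidth-delay product."""
    bdp = bw_bps * rtt_s / 8
    sent, cwnd = 0, init_cwnd * mss
    while cwnd < bdp:
        sent += cwnd
        cwnd *= 2  # window doubles every RTT; no losses
    return sent


def transfer_time(size, rtt_s, bw_bps, mss=MSS, init_cwnd=INIT_CWND):
    """Seconds to deliver `size` bytes: whole RTTs in slow start, then line rate."""
    bdp = bw_bps * rtt_s / 8
    sent, cwnd, t = 0, init_cwnd * mss, 0.0
    while sent < size and cwnd < bdp:
        sent += cwnd
        cwnd *= 2
        t += rtt_s
    if sent >= size:
        return t                               # finished during slow start
    return t + (size - sent) * 8 / bw_bps      # remainder is bandwidth-limited


def metric_to_optimize(size, rtt_s, bw_bps):
    """Delay matters for transfers that fit in slow start; bandwidth otherwise."""
    return "rtt" if size <= slow_start_size(rtt_s, bw_bps) else "bandwidth"


rtt, bw = 0.1, 10e6  # hypothetical path: 100 ms RTT, 10 Mbps
print(slow_start_size(rtt, bw))                   # 183960
print(round(transfer_time(10_000, rtt, bw), 3))   # 0.3
print(metric_to_optimize(10_000, rtt, bw))        # rtt
print(metric_to_optimize(10_000_000, rtt, bw))    # bandwidth
```

On this hypothetical path a 10 KB transfer ends inside slow start, so it costs whole RTTs and a closer replica helps most; a 10 MB transfer spends about 7.9 s of its roughly 8.45 s in the bandwidth-limited phase.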
Data Transmission Time (2)
• The transmission times for 10 KB, 100 KB, 1 MB, and 10 MB are 0.4 s, 1.1 s, 6.4 s, and 59.2 s, respectively
Related Work
• Content replication with different optimization metrics
  • Geographic location, network hops, and latency
  • Retrieval cost, update cost, and storage cost
  • QoS guarantees, …
• Greedy algorithms used in replica selection
Conclusion
• We quantify the distribution of Internet end-node access bandwidth and of bottleneck locations
• Two differences distinguish optimizing for bandwidth from optimizing for RTT:
  • Geographic location is not important for bandwidth optimization
  • For throughput, only well-provisioned end users can benefit from content replication