An Integrated Approach to Improving Web Performance. Lili Qiu Cornell University. B-exam December, 2000. Acknowledgements. Robbert van Renesse, George Varghese, Ken Birman, Zygmunt Haas, Eva Tardos Venkata N. Padmanabhan, Geoff Voelker, Yin Zhang, Srinivasan Keshav. Outline.
An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000
Acknowledgements • Robbert van Renesse, George Varghese, Ken Birman, Zygmunt Haas, Eva Tardos • Venkata N. Padmanabhan, Geoff Voelker, Yin Zhang, Srinivasan Keshav
Outline • Motivation & Open Issues • Solutions • Study the workload of a busy Web server • Optimize TCP performance for Web transfers • Provision the content distribution networks • Summary & Other Work
Motivation • The Web is the dominant source of traffic in the Internet today • Accounts for over 70% of wide-area traffic • Web performance is often unsatisfactory • WWW – World Wide Wait • Consequence: losing potential customers! [Figure labels: network congestion; overloaded Web server]
Challenges in Providing Highly Efficient Web Services • Workload characterization • The workload of busy Web sites is not well understood • Protocol inefficiency • Mismatch between Web transfers and the TCP protocol • Infrastructure provisioning • Current trend: Content Distribution Networks • Problem: Where to place replicas? [Figure labels: Protocol Inefficiency; Workload Characterization; Infrastructure Provisioning]
Our Solutions • Web Workload Characterization • Study the workload of a busy Web server • Improve protocol efficiency • Optimize TCP startup performance for Web transfers • Provision Web replication infrastructure • Develop placement algorithms for content distribution networks (CDNs)
Part I Web Workload Characterization • The Content and Access Dynamics of a Busy Web Site: Findings and Implications. Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 2000. (Joint work with V. N. Padmanabhan)
Motivation • Solid understanding of Web workload is critical for designing robust and scalable systems • Missing piece in previous work: workload of busy Web servers [Figure: clients reach servers across the Internet through proxies and replicas]
Overview • MSNBC server site • a large news site • consistently ranked among the busiest sites in the Web • server cluster with 40 nodes • 25 million accesses a day (HTML content alone) • Period studied: Aug. – Oct. 99 & Dec. 17, 98 flash crowd • Server logs • HTTP access logs • Content Replication System (CRS) logs • HTML content logs • Data analysis • Content dynamics • Access dynamics
Temporal Stability of File Popularity • Methodology • Consider the traces from a pair of days • Pick the top n popular documents from each day • Compute the overlap • Results • One day apart: significant overlap (80%) • Two months apart: smaller overlap (20-80%) • Ten months apart: very small overlap (mostly below 20%) • The set of popular documents remains stable for days
Spatial Locality in Client Accesses • Domain membership is significant except when there is a “hot” event of global interest
Spatial Distribution of Client Accesses • Cluster clients using network-aware clustering [KW00] • IP addresses with the same address prefix belong to a cluster • Top 10, 100, 1000, 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests respectively • A small number of client clusters contribute most of the requests.
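The overlap methodology above can be sketched in a few lines. This is a minimal illustration only: the function names and the toy traces are hypothetical, and a trace is assumed to be a plain list with one document URL per request, which is not necessarily how the MSNBC logs were processed.

```python
from collections import Counter

def top_n(requests, n):
    """Return the set of the n most requested documents in one day's trace."""
    return {doc for doc, _ in Counter(requests).most_common(n)}

def popularity_overlap(day_a, day_b, n):
    """Fraction of the top-n documents that the two days have in common."""
    shared = top_n(day_a, n) & top_n(day_b, n)
    return len(shared) / n

# Hypothetical toy traces: each entry is one request for a document.
day1 = ["/a"] * 3 + ["/b"] * 2 + ["/c"] * 2 + ["/d"]
day2 = ["/b"] * 3 + ["/a"] * 2 + ["/e"] * 2 + ["/c"]

# "/a" and "/b" are in both top-3 sets, so the overlap is 2 out of 3.
overlap = popularity_overlap(day1, day2, 3)
```

Repeating the computation for pairs of days that are one day, two months, and ten months apart would give the kind of comparison reported on the slide.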
The Applicability of Zipf's Law to Web Requests • Web requests follow a Zipf-like distribution • Request frequency ∝ 1/i^α, where i is a document’s ranking • The value of α is much larger in MSNBC traces • 1.4 – 1.8 in MSNBC traces • smaller than or close to 1 in the proxy traces • close to 1 in the small departmental server logs [ABC+96] • Highest when there is a hot event
Impact of larger α • Accesses in MSNBC traces are much more concentrated • 90% of the accesses are accounted for by • Top 2-4% of files in MSNBC traces • Top 36% of files in proxy traces (Microsoft proxies and the proxies studied in [BCF+99]) • Top 10% of files in small departmental server logs reported in [AW96] • Popular news sites like MSNBC see much more concentrated accesses • Reverse caching and replication can be very effective!
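To see why a larger Zipf exponent concentrates accesses, the sketch below computes what fraction of the top-ranked files covers 90% of requests under an idealized Zipf-like distribution. The function name and the file count are hypothetical, and the model is a pure power law rather than the measured traces, so it shows the trend only and is not expected to reproduce the percentages on the slide.

```python
def files_needed_for_share(alpha, num_files, share=0.9):
    """Smallest fraction of top-ranked files that covers `share` of all
    accesses when request frequency is proportional to 1 / rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_files + 1)]
    target = share * sum(weights)
    running = 0.0
    for count, weight in enumerate(weights, start=1):
        running += weight
        if running >= target:
            return count / num_files
    return 1.0

# Hypothetical population of 10,000 documents.
concentrated = files_needed_for_share(alpha=1.6, num_files=10_000)
spread_out = files_needed_for_share(alpha=0.8, num_files=10_000)
```

With the larger exponent, far fewer files are needed to reach the same share of accesses, which is the argument for reverse caching and replication.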
Part II Transport Layer Optimization for the Web • Speeding Up Short Data Transfers: Theory, Architectural Support, and Simulation Results. Proceedings of NOSSDAV 2000 (Joint work with Yin Zhang and Srinivasan Keshav)
Motivation • Characteristics of Web data transfers • Short & bursty [Mah97] • Use TCP • Problem: Short data transfers interact poorly with TCP!
TCP/Reno Basics • Slow Start • Exponential growth of the congestion window • Slow: about log(n) round trips for n segments • Congestion Avoidance • Linear probing of bandwidth • Fast Retransmission • Triggered by 3 duplicate ACKs
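The log(n) cost of slow start can be checked with a small counting loop. This is a simplified sketch: it assumes no losses, no delayed ACKs, no receiver-window limit, and a window that exactly doubles every round trip; the function name and the default initial window of one segment are assumptions made for illustration.

```python
def slow_start_rounds(num_segments, initial_cwnd=1):
    """Round trips needed to send num_segments when cwnd doubles every RTT."""
    cwnd = initial_cwnd
    sent = 0
    rounds = 0
    while sent < num_segments:
        sent += cwnd   # one window of segments goes out this round trip
        cwnd *= 2      # idealized slow start: the window doubles per RTT
        rounds += 1
    return rounds

# A 7-segment transfer takes 3 round trips (1 + 2 + 4 segments).
rounds_for_short_transfer = slow_start_rounds(7)
```

For short Web transfers most of the completion time is spent in these early round trips, before the window has grown to match the available bandwidth.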
Related Work • P-HTTP [PM94] • Reuses a single TCP connection for multiple Web transfers, but still pays slow start penalty • T/TCP [Bra94] • Cache connection count, RTT • TCP Control Block Interdependence [Tou97]: • Cache cwnd, but large bursts cause losses • Rate Based Pacing [VH97] • 4K Initial Window [AFP98] • Fast Start [PK98, Pad98] • Need router support to ensure TCP friendliness
Our Approach • Directly enter Congestion Avoidance • Choose optimal initial congestion window • A Geometry Problem: Fitting a block to the service rate curve to minimize completion time
Optimal Initial cwnd • Minimize completion time by having the transfer end at an epoch boundary.
Shift Optimization • Minimize initial cwnd while keeping the same integer number of RTTs • Before optimization: cwnd = 9 • After optimization: cwnd = 5
TCP/SPAND • Estimate network state by sharing performance information • SPAND: Shared PAssive Network Discovery [SSK97] • Directly enter Congestion Avoidance, starting with the optimal initial cwnd • Avoid large bursts by pacing [Figure: Web servers share a performance gateway at the edge of the Internet]
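One way to read the shift optimization is as a search for the smallest initial window that still finishes in the same number of round trips. The sketch below assumes a loss-free transfer that starts directly in congestion avoidance, with cwnd growing by one segment per RTT; it does not model the service-rate curve from the slides, and the function names and example numbers are hypothetical rather than taken from the slide's figure.

```python
def rtts_needed(num_segments, initial_cwnd):
    """Round trips to send num_segments when cwnd grows by one segment per RTT."""
    cwnd, sent, rtts = initial_cwnd, 0, 0
    while sent < num_segments:
        sent += cwnd
        cwnd += 1      # congestion avoidance: linear window growth
        rtts += 1
    return rtts

def shift_initial_cwnd(num_segments, initial_cwnd):
    """Smallest initial cwnd that needs no more RTTs than initial_cwnd does."""
    budget = rtts_needed(num_segments, initial_cwnd)
    for cwnd in range(1, initial_cwnd + 1):
        if rtts_needed(num_segments, cwnd) <= budget:
            return cwnd
    return initial_cwnd

# Hypothetical example: 40 segments, starting from an initial cwnd of 12.
# Both 12 and 9 finish in 4 RTTs (9 + 10 + 11 + 12 = 42 >= 40), but 8 needs 5.
shifted = shift_initial_cwnd(40, 12)
```

A smaller initial window that costs no extra round trips sends a smaller first burst, which is the motivation stated on the slide.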
Implementation Issues • Scope for sharing and aggregation • 24-bit heuristic • network-aware clustering [KW00] • Collecting performance information • Performance reports, New TCP option, Windmill’s approach, … • Information aggregation • Sliding window average • Retrieving estimation of network state • Explicit query, active push, … • Pacing • Leaky bucket based pacing
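Three of the pieces listed above (the 24-bit heuristic, sliding-window aggregation, and pacing) can be sketched together. Everything named here is hypothetical: the `SharedEstimates` class, the window of 8 reports, and the choice to pace by spacing segments evenly over one RTT are illustrative assumptions, not the TCP/SPAND implementation.

```python
import ipaddress
from collections import defaultdict, deque

WINDOW = 8  # hypothetical number of recent reports kept per client network

def network_key(client_ip):
    """Map an IPv4 address to its /24 network (the 24-bit heuristic)."""
    return ipaddress.ip_network(f"{client_ip}/24", strict=False)

class SharedEstimates:
    """Sliding-window average of performance reports, shared per /24 network."""

    def __init__(self, window=WINDOW):
        self._reports = defaultdict(lambda: deque(maxlen=window))

    def report(self, client_ip, value):
        """Record one measurement (for example an RTT sample) for a client."""
        self._reports[network_key(client_ip)].append(value)

    def estimate(self, client_ip):
        """Average of the retained reports for the client's network, or None."""
        samples = self._reports.get(network_key(client_ip))
        if not samples:
            return None
        return sum(samples) / len(samples)

def pacing_interval(rtt_seconds, cwnd_segments):
    """Gap between segments when one window is spread evenly over one RTT."""
    return rtt_seconds / cwnd_segments

estimates = SharedEstimates()
estimates.report("10.1.2.3", 100.0)     # hypothetical RTT samples in ms
estimates.report("10.1.2.200", 200.0)
shared_rtt = estimates.estimate("10.1.2.77")   # same /24, so the average is 150.0
```

A new connection from 10.1.2.77 benefits from reports made by its neighbours, which is the point of sharing; network-aware clustering [KW00] would replace `network_key` with a prefix lookup.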
Opportunity for Sharing • MSNBC: 90% of requests arrive within 5 minutes of the most recent request from the same client network (using the 24-bit heuristic)
Cost for Sharing • MSNBC: 15,000-25,000 different client networks in a 5-minute interval during peak hours (using 24-bit heuristic)
Simulation Results • Methodology • Download files in rounds • Performance Metric • Average completion time • TCP flavors considered • reno-ssr: Reno with slow start restart • reno-nssr: Reno w/o slow start restart • newreno-ssr: NewReno with slow start restart • newreno-nssr: NewReno w/o slow start restart
Summary • TCP/SPAND significantly reduces latency for short data transfers • 35-65% compared to reno-ssr / newreno-ssr • 20-50% compared to reno-nssr / newreno-nssr • Even higher for fatter pipes • TCP/SPAND is TCP-friendly • TCP/SPAND is incrementally deployable • Server-side modification only • No modification at client-side
Part III Provision Content Distribution Networks (CDNs) • On the Placement of Web Server Replicas. To appear in INFOCOM'2001. (Joint work with V. N. Padmanabhan and G. M. Voelker)
Introduction to CDNs • Content providers want to offer better service to their clients at lower cost • Increasing deployment of content distribution networks (CDNs) • Akamai, Digital Island, Exodus … • Idea: a network of servers • Features: • Outsourcing infrastructure • Improve performance by moving content closer to end users • Flash crowd protection [Figure: CDN servers placed between content providers and clients]
Placement of CDN servers • Goal • minimize users’ latency or bandwidth usage • Minimum K-median problem • Select K centers to minimize the sum of assignment costs • Cost can be latency, bandwidth, or another metric we want to optimize • NP-hard problem [Figure: CDN servers placed between content providers and clients]
Placement Algorithms • Tree-based algorithm [LGG+99] • Assume the underlying topologies are trees, and model placement as a dynamic programming problem • O(N^3 M^2) for choosing M replicas among N potential places • Random • Pick the best among several random assignments • Hot spot • Place replicas near the clients that generate the largest load
Placement Algorithms (Cont.) • Greedy algorithm
    Greedy(N, M) {
        for i = 1 .. M {
            for each remaining candidate site R {
                cost[R] = total cost after placing an additional replica at R
            }
            select the candidate site with the lowest cost
        }
    }
• Super Optimal algorithm • Lagrangian relaxation + subgradient method
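The greedy pseudocode above can be made concrete as follows. This is a sketch under stated assumptions: cost is taken to be each client's load times its distance to the nearest chosen replica, `dist[c][r]` is a hypothetical client-to-site distance matrix, and ties are broken by the lowest site index. The toy topology is invented for illustration.

```python
def placement_cost(replicas, dist, load):
    """Total cost: each client's load times its distance to the nearest replica."""
    return sum(
        load[c] * min(dist[c][r] for r in replicas)
        for c in range(len(load))
    )

def greedy_placement(dist, load, num_replicas):
    """Pick num_replicas sites one at a time, each time adding the site
    that gives the lowest total cost together with those already chosen."""
    candidates = set(range(len(dist[0])))
    chosen = []
    for _ in range(num_replicas):
        best = min(
            sorted(candidates),  # sorted so ties go to the lowest site index
            key=lambda r: placement_cost(chosen + [r], dist, load),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Hypothetical toy input: four nodes on a line act as both clients and sites.
positions = [0, 1, 2, 10]
load = [1, 1, 1, 5]
dist = [[abs(a - b) for b in positions] for a in positions]

# The heavily loaded node 3 is chosen first, then node 1 in the middle
# of the remaining cluster.
replicas = greedy_placement(dist, load, 2)
```

Each round evaluates every remaining site against the full client set, so the sketch favours clarity over speed; it requires `num_replicas` to be no larger than the number of candidate sites.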
Simulation Methodology • Network topology • Randomly generated topologies • Using GT-ITM Internet topology generator • Real Internet network topology • AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers • Web Workload • Real server traces • MSNBC, ClarkNet, NASA Kennedy Space Center • Performance Metric • Relative performance: cost_practical / cost_super-optimal
Effects of Imperfect Knowledge about Input Data • Predict load using moving window average [Figure panels: (a) perfect knowledge about topology; (b) topology knowledge accurate to within a factor of 2]
Summary • First experimental study on placement of CDNs • Knowledge about client workload and topology is crucial for provisioning CDNs • The greedy algorithm performs the best • Within a factor of 1.1 – 1.5 of super-optimal • The greedy algorithm is insensitive to noise • Stays within a factor of 2 of the super-optimal when the salted error is a factor of 4 • The hot spot algorithm performs nearly as well • Within a factor of 1.6 – 2 of super-optimal • How to obtain inputs • Moving window average for load prediction • Using BGP router data to obtain topology information
Contributions • Workload characterization • Study the workload of the MSNBC Web site • Protocol efficiency • Optimize TCP startup performance for Web transfers • Infrastructure provisioning • Develop placement algorithms for Content Distribution Networks [Figure labels: Protocol Efficiency; Workload Characterization; Infrastructure Provisioning]
Other Work • Available at http://www.cs.cornell.edu/lqiu/papers/papers.html • Fast Firewall Implementations for Software and Hardware-based Routers. Submitted to ACM SIGMETRICS’2001. • Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet. Proceedings of IEEE INFOCOM'2000, Tel-Aviv, Israel, March 2000. • On Individual and Aggregate TCP Performance. 7th International Conference on Network Protocols (ICNP'99), Toronto, Canada, October 1999.
Contributions • Study the workload of a busy Web server • Develop placement algorithms for Content Distribution Networks • Optimize TCP startup performance for short Web transfers
Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms • Internet telephony is subject to • Variable loss rate • Variable delay • Previous work has addressed the two problems separately • Use FEC for loss recovery • Use playout buffer adaptation for delay jitter compensation
Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms (Cont.) • Our work • Demonstrate the interaction between playout algorithm and FEC • Playout algorithm should depend on both FEC and network loss conditions and network jitter • Propose several playout algorithms that provide this coupling • Demonstrate the effectiveness of the algorithms through simulations
On Individual and Aggregate TCP Performance • Motivation • TCP behavior under many competing TCP connections has not been sufficiently explored • Our work • Use extensive simulations to investigate the individual and aggregate TCP performance for many concurrent connections
On Individual and Aggregate TCP Performance (Cont.) • Major findings • When all connections have the same RTT • Wc > 3*Conn ⇒ global synchronization • Conn < Wc < 3*Conn ⇒ local synchronization • Wc < Conn ⇒ some connections are shut off • Adding random processing time ⇒ synchronization and consistent discrimination become less pronounced • Derive the general characterization of overall throughput, goodput, and loss probability • Quantify the round-trip bias for connections with different RTTs
Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment • Motivation • IETF recommends widespread deployment of RED in routers • Most previous work studies RED in relatively homogeneous environments • Our work • Investigate the interaction of RED with five types of heterogeneity