Web Performance Modeling Issues
Daniel A. Menascé
Department of Computer Science
George Mason University
http://www.cs.gmu.edu/faculty/menasce.html
© 1998 Menascé, D. A. All Rights Reserved.
Outline
• E-commerce facts.
• WWW Traffic Characterization.
• Improving Web Performance.
• Predicting Web Performance.
• An Example.
• Concluding Remarks.
Part I E-commerce Facts
Electronic Commerce: online sales are soaring. “… IT and electronic commerce can be expected to drive economic growth for many years to come.” (The Emerging Digital Economy, US Dept. of Commerce, 1998.)
Caution Signs Along the Road. There will be jolts and delays along the way for electronic commerce: congestion is the most obvious challenge. (Gross & Sager, Business Week, June 22, 1998, p. 166.)
What people are saying about Web performance…
• “Tripod’s Web site is our business. If it’s not fast and reliable, there goes our business.” Don Zereski, Tripod’s vice-president of Technology (Internet World)
• “Sites have been concentrating on the right content. Now, more of them -- specially e-commerce sites -- realize that performance is crucial in attracting and retaining online customers.” Gene Shklar, Keynote, The New York Times, 8/8/98
• “Capacity is King.” Mike Krupit, Vice President of Technology, CDnow, 06/01/98
• “Being able to manage hit storms on commerce sites requires more than just buying more plumbing.” Harry Fenik, vice president of technology, Zona Research, LANTimes, 6/22/98
E-commerce facts
• Businesses will exchange $327 billion in goods and services by the year 2002.
• Cisco Systems sells $4 billion/yr on the Web at a cost savings of $363 million.
• General Electric estimates that e-commerce will save it $500 million over the next three years.
• Boeing booked $100 million in spare parts in the first seven months of activity of its Web site.
• Texas Instruments fills 60,000 orders a month through its Web site, meeting delivery deadlines 95% of the time.
Business in the Internet Age (Business Week, June 22, 1998)
Part II WWW Traffic Characteristics
WWW Traffic Characteristics
• Unpredictable in nature.
• Self-similar, i.e., bursty over several time scales.
• Load spikes can be many times higher than average traffic.
• Workload characterization studies done at:
  • client side
  • proxy cache
  • server
  • Web
• See http://www.parc.xerox.com/istl/projects/http-ng/web-characterization-reading.html
Workload Characterization at the Client Side
Cunha, Bestavros, and Crovella (1995)
• Half a million requests from instrumented Mosaic in an academic setting.
• The distribution of document sizes, popularity of documents as a function of size, distribution of user requests for documents, and number of references to documents as a function of overall rank in popularity can be modeled by power-law distributions.
• 22% of the requests generated by the browser were cache misses.
• 96% of the total requests were for HTML files and only 1% for CGI bin requests.
• Current studies show dynamically generated pages ranging from 2 to 6% (Almeida98).
• 79% of requests were for external servers.
• Less than 10% of requests were for unique URLs, i.e., URLs not previously referenced.
• 9.6% of accesses were to HTML files with an average size of 6.4 KB and 69% to images with an average size of 14 KB.
Workload Characterization at the Client Side
Tauscher and Greenberg (1997)
• Six weeks of WWW usage by 23 users.
• 58% of pages visited are revisits.
• Users tend to revisit recently visited pages more often than pages visited less recently.
Workload Characterization at the Proxy Server
Abrams, Standridge, Abdulla, Williams, and Fox (1995)
• Six months of data from 3 educational sites.
• Trace-driven simulation of a caching proxy server.
• The maximum cache hit rate was between 30 and 50% for infinite-size caches, regardless of cache design.
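The idea of a trace-driven simulation with an infinite-size cache can be sketched in a few lines of Python. This is a minimal illustration, not the simulator used in the study: the function name and the toy trace are hypothetical, and the sketch ignores document modifications and uncacheable responses, which a real proxy study would have to handle.

```python
from typing import Hashable, Iterable


def infinite_cache_hit_rate(trace: Iterable[Hashable]) -> float:
    """Fraction of requests served by a cache that never evicts anything."""
    seen = set()
    hits = 0
    total = 0
    for url in trace:
        total += 1
        if url in seen:
            hits += 1       # requested before, so it is already cached
        else:
            seen.add(url)   # first reference: unavoidable miss, then cache it
    return hits / total if total else 0.0


# Hypothetical toy trace, not data from the study.
trace = ["/a", "/b", "/a", "/c", "/a", "/b", "/d", "/e", "/f", "/g"]
rate = infinite_cache_hit_rate(trace)  # 3 hits out of 10 requests
```

With an infinite cache every miss is a first reference, so the hit rate is bounded by how often documents are re-referenced in the trace, independently of the replacement policy.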
Workload Characterization at the Server
Arlitt and Williamson (1996)
• Six WWW servers: academic and commercial.
• Number of requests ranged from 188K to 3.5M per site.
• Search for invariants.
• HTML and image files account for 90-100% of requests.
• The average size of a transferred document does not exceed 21 KB.
• Less than 3% of the requests are for distinct files.
• The file size distribution is Pareto with $0.40 < \alpha < 0.63$, i.e., the distribution is heavy-tailed.
• Ten percent of the files accessed account for 90% of server requests and 90% of the bytes transferred.
• File inter-reference times are exponentially distributed and independent.
• At least 70% of the requests come from remote sites. These requests account for at least 60% of the bytes transferred.
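To make the heavy-tail claim concrete, the following Python sketch evaluates the tail of a Pareto distribution and draws samples by inverse-transform sampling. It assumes the standard Pareto form P[X > x] = (k/x)^α for x ≥ k; the shape α = 0.5 is picked from inside the reported range, while the scale k = 1000 bytes and the seed are hypothetical values chosen only for illustration.

```python
import random


def pareto_tail(x: float, alpha: float, k: float) -> float:
    """P[X > x] for a Pareto distribution with shape alpha and scale k."""
    return 1.0 if x < k else (k / x) ** alpha


def pareto_sample(alpha: float, k: float, rng: random.Random) -> float:
    """Draw one value by inverting the CDF F(x) = 1 - (k/x)^alpha."""
    u = rng.random()                       # uniform in [0, 1)
    return k / (1.0 - u) ** (1.0 / alpha)  # 1 - u lies in (0, 1], never zero


rng = random.Random(42)                    # hypothetical seed
alpha, k = 0.5, 1000.0                     # k (bytes) is an assumed scale
sizes = [pareto_sample(alpha, k, rng) for _ in range(10_000)]

# Probability that a file is more than 1000 times the minimum size.
tail_prob = pareto_tail(1_000_000.0, alpha, k)
```

With α = 0.5 roughly 3% of files exceed 1000 times the minimum size, and because α < 1 the distribution has no finite mean, which is why a few very large files can dominate the bytes transferred.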
Workload Characterization at the Server
Crovella and Bestavros (1996)
• Traces of users using Mosaic reflecting requests to over half a million documents.
• Purpose: show the presence of self-similarity in Web traffic and explain it through the underlying characteristics of the WWW workload.
• File sizes have a heavy-tailed distribution.
• This distribution may explain the fact that transmission time distributions are also heavy-tailed.
Workload Characterization at the Server
Almeida and Oliveira (1996)
• Used fractal models to study the document reference pattern at Web servers.
• Used an LRU stack model to study references to documents stored in two Web sites.
• Found strong evidence of self-similarity in the document reference pattern.
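An LRU stack model turns a reference stream into a sequence of stack distances: the depth at which each requested document sits in a stack ordered by recency of use. The Python sketch below is a simple illustration of that transformation, not the authors' implementation; the function name and the toy trace are hypothetical, and the list-based stack is quadratic in the worst case, which is fine for small traces only.

```python
from typing import Hashable, List, Optional, Sequence


def lru_stack_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Stack distance of each reference (1 = top); None for a first reference."""
    stack: List[Hashable] = []            # most recently used document first
    distances: List[Optional[int]] = []
    for doc in trace:
        if doc in stack:
            depth = stack.index(doc) + 1  # position before this reference
            stack.remove(doc)
        else:
            depth = None                  # never referenced before
        distances.append(depth)
        stack.insert(0, doc)              # referenced document moves to the top
    return distances


# Hypothetical toy trace.
distances = lru_stack_distances(["a", "b", "a", "a", "c", "b"])
# distances == [None, None, 2, 1, None, 3]
```

A stream dominated by small distances indicates that recently requested documents are likely to be requested again soon.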
Web Traffic Workload Characterization
Bray (1996)
• Over 11 million Web pages were analyzed in 1995.
• The average page size was 6,518 bytes with a standard deviation of 31,678 bytes.
• About 50% of the pages were found to have at least one embedded image and 15% were found to have exactly one image.
• Over 80% of the sites are pointed to by only a few (between 1 and 10) other sites.
• Almost 80% of the sites contain no links to off-site URLs.
• Around 45% of the files had no extension and 37% were html files. The next most popular were .gif and .txt files, with 2.5% each.
Web Workload Characterization
• File sizes and request sizes are heavy-tailed.
• Popularity:
  • Zipf’s Law: the number of references, P, to a file tends to be inversely proportional to its rank r: P = k/r
• Temporal locality:
  • refers to the likelihood that once a document has been requested it will be requested again in the near future.
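As a worked illustration of P = k/r, the Python sketch below computes the reference counts that Zipf's Law would predict for a set of files. The constant k is chosen so that the counts add up to the total number of references; that normalization, the function name, and the numbers in the example are assumptions made for illustration.

```python
from typing import List


def zipf_reference_counts(total_refs: float, n_files: int) -> List[float]:
    """Expected references per file, by rank, under P = k / r."""
    harmonic = sum(1.0 / r for r in range(1, n_files + 1))
    k = total_refs / harmonic             # makes the counts sum to total_refs
    return [k / r for r in range(1, n_files + 1)]


# Hypothetical example: 1100 references spread over 3 files.
counts = zipf_reference_counts(1100, 3)   # approximately [600, 300, 200]
```

The most popular file receives twice the references of the second and three times those of the third, which is the inverse proportionality stated by the law.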
Web Workload Characterization
• SURGE (Barford and Crovella, ACM Sigmetrics 1998): workload generator that mimics real Web users.
• SURGE exercises Web servers quite differently from most commonly used benchmarks (e.g., SPECweb96):
  • maintains a higher number of open connections
  • results in much higher CPU load
Part III Improving Web Performance
Improving Web Performance Through Caching and Prefetching
• Prefetching and caching of inlines. (Dodge and Menascé, 1998)
• Prefetching Results of Queries to Search Engines. (Foxwell and Menascé, 1998)
No Caching/Prefetching of Inlines [Figure: message sequence between Browser and Server. Labels: HTTP request, server disk, HTTP document, HTML document parsed by the browser, inline 1 request, inline 1 file, inline 2 request, inline 2 file.]
Caching/Prefetching of Inlines
[Figure: model of Web browsers, network, and Web server with CPU, disk, and cache. Branch labels: h (cache), 1 - h (disk).]
Response Time of Inline Files (in sec) vs. Cache Size (KB)
Improving Web Performance Through Caching and Prefetching
• Prefetching and caching of inlines. (Dodge and Menascé, 1998)
• Prefetching Results of Queries to Search Engines. (Foxwell and Menascé, 1998)
Probability of Access for Lycos Queries vs. URL Position
Hit Ratio of Query Results
Hit Ratio vs. Threshold for Lycos Queries
Part IV Predicting Web Performance
The Impact of Burstiness
• As shown by some measurements (Banga and Druschel 1997), the maximum throughput of a Web server decreases as burstiness increases.
• How can we represent the effects of burstiness in performance models?
• We know that the maximum throughput is equal to the inverse of the maximum service demand, i.e., the service demand of the bottleneck resource.
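The bottleneck bound in the last bullet can be illustrated with a short Python sketch. The resource names and service demands are hypothetical values, in seconds per request, chosen only to show the arithmetic.

```python
from typing import Dict, Tuple


def max_throughput(service_demands: Dict[str, float]) -> Tuple[str, float]:
    """Return the bottleneck resource and the bound 1 / (largest demand)."""
    bottleneck = max(service_demands, key=service_demands.get)
    return bottleneck, 1.0 / service_demands[bottleneck]


# Hypothetical service demands in seconds per request.
demands = {"cpu": 0.010, "disk": 0.025, "network": 0.015}
resource, x_max = max_throughput(demands)  # disk limits the server to ~40 requests/sec
```

Any effect that inflates the service demand at the bottleneck, such as burstiness, therefore lowers the maximum throughput directly.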
WWW Traffic Burst [Figure: bytes (axis ticks at 10^6 and 10^7) vs. chronological time (slots of 1000 sec).]
Traffic Burstiness on the Web
• a: ratio between the maximum observed request rate and the average request rate during an observation period.
• b: fraction of time during which the instantaneous arrival rate exceeds the average arrival rate.
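Both parameters can be estimated from a log by counting arrivals in equal-length time slots. The Python sketch below assumes that the per-slot rate is an acceptable stand-in for the instantaneous arrival rate; the function name and the sample counts are hypothetical.

```python
from typing import Sequence, Tuple


def burstiness_parameters(arrivals_per_slot: Sequence[float]) -> Tuple[float, float]:
    """Estimate (a, b) from request counts in equal-length time slots."""
    n = len(arrivals_per_slot)
    if n == 0 or sum(arrivals_per_slot) == 0:
        raise ValueError("need at least one slot with arrivals")
    avg = sum(arrivals_per_slot) / n
    a = max(arrivals_per_slot) / avg                     # peak rate over average rate
    b = sum(1 for x in arrivals_per_slot if x > avg) / n  # share of slots above average
    return a, b


# Hypothetical counts for ten slots; the average is 4 requests per slot.
a, b = burstiness_parameters([2, 2, 2, 10, 2, 2, 12, 2, 2, 4])
# a is 3.0 (peak of 12 over average of 4); b is 0.2 (2 of 10 slots above average)
```

Because all slots have the same length, ratios of counts equal ratios of rates, so the slot duration does not appear in the computation.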
The Impact of Burstiness (Menascé and Almeida, 1998)
• To account for burstiness, we write the service demand of the bottleneck resource as $D = D_f + \alpha \times b$.
• $D_f$ is the portion of the service demand that does not depend on burstiness.
• $\alpha$ is a factor used to inflate the service demand according to the burstiness factor $b$. It is given by $\alpha = (U_1/X_0^1 - U_2/X_0^2)/(b_1 - b_2)$.
• The measurement interval is divided into two subintervals, 1 and 2, to obtain $U_i$, $X_0^i$, and $b_i$.
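The computation of the inflation factor can be sketched in Python as follows. The measurements are hypothetical, and recovering the burstiness-independent demand by applying the service demand equation to subinterval 1 is an assumption that follows algebraically from the slide rather than a step stated on it.

```python
def inflation_factor(u1: float, x1: float, b1: float,
                     u2: float, x2: float, b2: float) -> float:
    """alpha = (U1/X1 - U2/X2) / (b1 - b2), from two measurement subintervals."""
    if b1 == b2:
        raise ValueError("subintervals must differ in burstiness")
    return (u1 / x1 - u2 / x2) / (b1 - b2)


def bottleneck_demand(d_f: float, alpha: float, b: float) -> float:
    """Service demand of the bottleneck resource: D = D_f + alpha * b."""
    return d_f + alpha * b


# Hypothetical measurements: utilization, throughput (requests/sec), burstiness.
u1, x1, b1 = 0.18, 9.0, 0.2
u2, x2, b2 = 0.12, 8.0, 0.1

alpha = inflation_factor(u1, x1, b1, u2, x2, b2)  # about 0.05
d_f = u1 / x1 - alpha * b1                        # assumed: solve D = D_f + alpha*b in subinterval 1
demand = bottleneck_demand(d_f, alpha, 0.3)       # about 0.025 sec at b = 0.3
x_max = 1.0 / demand                              # about 40 requests/sec
```

In this example, raising the burstiness factor from 0.1 to 0.3 raises the bottleneck demand from about 0.015 to 0.025 seconds, which cuts the maximum throughput from about 67 to 40 requests per second.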
Effects of Burstiness on Performance [Figure]
Part V Predicting Web Performance: An Example
Upgrading the Capacity of Your Link to the ISP
Using QN models to predict Web Performance
Results of QN Model
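The model and results behind these slides did not survive in this transcript. As a rough illustration of how a queuing network (QN) model can answer the link-upgrade question, the Python sketch below uses the standard open-network result for load-independent queues, where the residence time at resource i is D_i / (1 - λ D_i). The arrival rate, resource names, and service demands are all hypothetical and are not the values used in the presentation.

```python
from typing import Dict


def response_time(arrival_rate: float, demands: Dict[str, float]) -> float:
    """Average response time of an open QN: sum of D_i / (1 - lambda * D_i)."""
    total = 0.0
    for name, demand in demands.items():
        utilization = arrival_rate * demand
        if utilization >= 1.0:
            raise ValueError(f"{name} is saturated (utilization {utilization:.2f})")
        total += demand / (1.0 - utilization)
    return total


arrival_rate = 5.0  # hypothetical: requests per second

# Hypothetical service demands in seconds per request.
before = {"cpu": 0.02, "disk": 0.04, "isp_link": 0.15}
after = dict(before, isp_link=0.05)  # assumed: link three times faster

r_before = response_time(arrival_rate, before)  # about 0.67 sec
r_after = response_time(arrival_rate, after)    # about 0.14 sec
```

In this made-up scenario the link is the bottleneck at 75% utilization, so tripling its capacity cuts the predicted response time by roughly a factor of five, far more than the reduction in the link's own service demand would suggest.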
Concluding Remarks
• The Web is becoming an important element of the IPG.
• Understanding the nature of the Web workload is crucial to being able to predict its performance.
• New workload characterization studies for e-commerce sites are required (use of dynamic pages, XML, etc.).
• Need performance models for the Web that capture the effects of Web traffic characteristics on performance.
Capacity Planning for Web Performance: Metrics, Models, and Methods. Daniel Menascé and Virgilio Almeida. Prentice Hall, June 1998.