Whole Page Performance Leeann Bent and Geoffrey M. Voelker University of California, San Diego
Whole Page Performance? • Extensive previous work on how specific techniques affect individual object downloads. • Caching, prefetching, CDNs, DNS caching. • However, users download pages of objects. • It is not clear how individual object performance maps onto whole page performance. • Goal: Study whole page performance. • Extent to which different optimizations are used. • Effect on downloading whole pages of objects. WWCCD ‘02
Related Work • [Krishnamurthy and Wills99] look at: • Parallel (HTTP 1.0), persistent, and pipelined connections. • In addition to caching, range requests, and content placed on different servers. • Top-level pages of popular sites. • Focus on pages where all optimizations are used. • Our study: • Follow-on work, with a different perspective. • Use real user workloads. • All pages, not just top-level pages on popular servers. • Not all pages use optimizations. • Base page + embedded objects. • Connection optimizations + CDNs + DNS.
Overview • Introduction • Methodology • Results • Conclusion
Methodology Overview • Use Medusa to: • Record everyday browsing from six users over four days. • Replay traces, toggling performance options: • Parallel connections. • Using CDNs. • Complete DNS caching. • Persistent connections. • Compute download costs for whole pages.
The Medusa Proxy [Figure; labels: User Driven Behavior, Trace Driven Behavior, Internet]
Page Download Time • Page download time: • Time required to download the base page and all embedded objects. • Reflects user-perceived web performance. • Calculated using object download time. • Determine object download time from just after DNS lookup to connection close or full object return (persistent). • Incorporate original recorded DNS times where appropriate.
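The calculation described on this slide can be sketched as follows. This is a minimal illustration, not Medusa's implementation: the function name, the `(dns_ms, download_ms)` tuple layout, and the scheduling policy (each object goes, in trace order, to whichever connection frees up first) are all assumptions made for the example, and the timings in the usage note are made up.

```python
import heapq

def page_download_time(objects, connections=1, include_dns=True):
    """Estimate page download time in ms from per-object timings.

    `objects` is a list of (dns_ms, download_ms) pairs in trace order.
    Assumed policy (hypothetical): the next object is fetched on
    whichever connection becomes idle first.
    """
    if connections < 1:
        raise ValueError("connections must be >= 1")
    # Each heap entry is the time at which one connection becomes idle.
    free_at = [0.0] * connections
    heapq.heapify(free_at)
    finish = 0.0
    for dns_ms, download_ms in objects:
        # Ideal DNS caching is modeled by dropping the DNS time entirely.
        cost = download_ms + (dns_ms if include_dns else 0.0)
        start = heapq.heappop(free_at)
        end = start + cost
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish
```

With the made-up timings `[(10, 100), (0, 200), (5, 300)]`, one connection gives the serial sum of all object costs, while two connections let the third object start as soon as the first finishes.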
Example [Figure: individual object times and the resulting page download times for serial and parallel (2 connections) download; values shown: 854 ms, 259 ms, 580 ms, 580 ms]
Traces • Six users: April 27 - 30 (Sat. - Tues.). • Originally 22,228 objects and 1,455 pages. • Remove error pages. • Replay data gathered May 6 - 7 (Mon. - Tues.) and June 22 - 27 (Sat. - Thurs.). • Minimize warming effects by taking the median of 5 consecutive page downloads.
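Taking the median of repeated downloads, as the last bullet describes, might look like the sketch below. The function name and the sample timings are hypothetical; only the use of a median over consecutive downloads comes from the slide.

```python
from statistics import median

def representative_time(samples_ms):
    """Return the median of repeated page download times (ms).

    The median keeps one slow (cold) download from dominating the
    result, unlike the mean.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    return median(samples_ms)
```

For example, with five made-up samples `[900, 310, 300, 305, 320]`, the slow first download does not shift the result the way it would shift an average.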
Optimization Combinations • Parallel Connections (1) • Medusa tracks the number of concurrent connections used during the trace. • Used to replay parallel download. • CDN Usage (2) • When CDN usage is disabled, remove CDN references. • Replace with references to origin servers. • When CDN usage is enabled, traces are left intact. • DNS Caching (3) • Simulate ideal DNS caching by excluding DNS time. • Normal DNS: add original DNS lookup times from the trace. • Persistent Connections (4) • Use whichever protocol (HTTP 1.0/1.1) was recorded in the original trace.
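Four on/off options yield up to 2^4 = 16 replay configurations. The sketch below enumerates them; the option names are labels chosen for this example, and the slides do not state whether every combination was actually replayed.

```python
from itertools import product

# Hypothetical labels for the four options numbered (1)-(4) on the slide.
OPTIONS = ("parallel", "cdn", "dns_cache", "persistent")

def all_configurations():
    """Return every on/off combination of the four options as dicts."""
    return [
        dict(zip(OPTIONS, flags))
        for flags in product((False, True), repeat=len(OPTIONS))
    ]
```

The first entry has every option off (serial, no CDN, normal DNS, non-persistent) and the last has every option on.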
Overview • Introduction • Methodology • Results • Conclusion
Whole Page Optimizations • Parallel gives large improvement. • CDN improvement small. • 2.5% • DNS improvement consistent. • 7.4% • 6.7% • Persistent connections not as helpful as expected. • 1.5%
Overall Trace Conclusions • Parallelism has the greatest effect. • Parallelism used aggressively on all pages. • All other options provide incremental benefits. • Does not mean other optimizations don’t work. • Some overheads may be relatively small. • Average over all pages. • Not all pages implement all optimizations. • We don’t simulate more aggressive use of options than found in original trace. • A closer look…
Ideal DNS Caching • Average DNS costs: • Per object: 7.1 ms • Per page: 529 ms • DNS improvement moderate across the board. • 5 - 14% improvement across all pages. • Provides moderate benefit to all pages. • Not all objects require full DNS lookups. • DNS caching is already effective in the traces.
Objects Per Page • We would expect some other optimizations to have a greater effect (e.g. persistent connections). • Looking at all pages in the trace doesn’t tell the whole story. • Less opportunity for connection optimizations on small pages. • A page with one object counts as much as a page with 152 objects. • Optimizations are more effective on a page with 152 objects. • Separate out effects of optimizations in pages with different numbers of objects: • Median number of objects per page is 5. • Average number of objects per page is 15.
Page Breakdown • 1-5 objects • 1: 21% • 2-5: 63% • 6+ objects improvements. • 6-15: 157% • 16+: 183% • Persistent • 1.95% • 18.5%
Page Breakdown Conclusions • Performance optimizations depend on the number of objects per page. • Optimizations are more effective when there are more objects per page. • Especially connection optimizations. • Single-object pages see moderate improvement. • Can usually only benefit from DNS caching and CDNs. • Persistent connections benefit them only if the page is on the same server as the previous page. • And 26% of pages had one object.
Persistent Connections • Still don’t see much improvement for persistent connections. • Expected to see more benefit for 16+ objects. • Not all pages use persistent connections. • 20% of pages in our trace use them (229 pages). • 2211 objects or 16.1%. • 9.65 objects per page. • Look at only pages that contain persistent connections.
Persistent Connections • Persistent connections are useful if: • Many objects were downloaded over persistent connections in the original trace. • Objects are downloaded from few servers. • For pages with < 6 objects: • 2 out of 3 objects downloaded with persistent connections. • Average page size 3. • On average, 1.32 persistent objects per server. • For pages with >= 16 objects: • Average 18 objects with persistent connections. • On average, 3.92 persistent objects per server.
Mostly Persistent Pages • We know what it takes to see improvement from persistent connections: • Look at large pages where persistent connections are used extensively (>50% of objects). • Pages that can benefit, do: • Pages with 6+ objects improve 33-50%.
CDN • Previous study showed CDNs highly effective for individual objects. [Koletsou01] • What is the effect on whole page performance? • Few pages with explicit Akamai-hosted objects. • 48 pages or 5.2% of pages. • 216 objects or 1.6% of total downloaded objects. • Average of 4.5 CDN objects per page. • Looked at improvements for CDN-only pages: • CDNs improve CDN-containing pages 6% - 30%.
Conclusions • Parallel connections have greatest impact. • Universally applicable and easy to implement. • Other options give incremental performance across all pages. • Some optimizations provide consistent, but moderate, improvement across all pages. • Some optimizations are not implemented on all pages. • Provide benefit when used extensively.
Conclusions • Can we draw a correlation between object and real-world whole page performance? • Depends. • Not all optimizations are widely used. • When optimizations are used to full advantage, they are effective.
Medusa Available http://ramp.ucsd.edu/~lbent/Medusa/index.html
Medusa Proxy Functionality • Trace and Replay • Record requests and replay. • Parallel connections. • Persistent connections. • Transformation • CDN/no CDN replay. • Performance Measurement • Request latency. • DNS overhead. • Optimization options • Use parallel connections. • Use persistent connections. • HTTP 1.0 and HTTP 1.1. • Always attempt, never attempt, mirror trace attempt.
Page Delimitation • Determining pages: • Necessary for: • Calculating total page costs. • Limiting optimizations to within one page. • Parallel connections. • Can analyze page and draw object dependencies. • High overhead. • May impact user. • Use inter-object times in the original trace data. • Use 2-second inter-object times.
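Splitting a trace into pages by inter-object time could be sketched as below. The 2-second threshold comes from the slide; the function name, the use of request start timestamps in seconds, and the choice to start a new page when the gap is greater than or equal to the threshold are assumptions for illustration.

```python
def split_into_pages(request_times_s, gap_s=2.0):
    """Group request timestamps (seconds, ascending) into pages.

    A new page starts whenever the gap since the previous request is
    at least `gap_s` seconds (assumed boundary rule).
    """
    pages = []
    current = []
    previous = None
    for t in request_times_s:
        if previous is not None and t - previous >= gap_s:
            pages.append(current)
            current = []
        current.append(t)
        previous = t
    if current:
        pages.append(current)
    return pages
```

Called on the made-up timestamps `[0.0, 0.3, 0.9, 4.0, 4.2, 10.0]`, this yields three pages, because the gaps before 4.0 and before 10.0 both exceed two seconds.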
Akamaized URLs • Akamai accounts for 85%-98% of CDN hosted objects [ref]. • Will not account for sites completely hosted on Akamai hosts. • Filter example: • Akamaized: http://a1964.g.akamai.net/f/1964/2730/1h/app.whenu.com/image.gif • Origin: http://app.whenu.com/image.gif
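A filter of this kind might be written as in the sketch below. The pattern is inferred only from the single example URL on this slide (an `a<number>.g.akamai.net` host followed by four path segments and then the origin host and path); it is an assumption, not a description of Akamai's URL scheme in general or of Medusa's actual filter.

```python
import re

# Assumed layout, based on the one example on the slide:
#   http://a<digits>.g.akamai.net/<seg>/<seg>/<seg>/<seg>/<origin-host>/<path>
_AKAMAIZED = re.compile(
    r"^http://a\d+\.g\.akamai\.net/(?:[^/]+/){4}(?P<origin>.+)$"
)

def to_origin_url(url):
    """Rewrite an Akamaized URL to its origin-server form.

    URLs that do not match the assumed pattern are returned unchanged.
    """
    match = _AKAMAIZED.match(url)
    if match is None:
        return url
    return "http://" + match.group("origin")
```

Applied to the slide's Akamaized example, this returns the origin URL shown beneath it; an origin URL passed in directly comes back unchanged.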
Interleaved Requests • Requests may get interleaved when recorded in parallel mode and replayed in serial mode. • E.g.: • Connection 0 requests: www.cnn.com, www.cnn.com/style.css. • Connection 1 requests: ar.atwola.com. • Requests may be ordered in the trace as: • www.cnn.com, ar.atwola.com, www.cnn.com/style.css. • Negates benefit of parallel connections.
Page Characterization: Objects per Page
Object Types • Identified object type by clues in URL: • 80% of URLs are images (.gif, .jpg). • 5.6% HTML files (.htm, .html). • 3.8% CGI, Perl, or Java (?, .pl, .class). • 3.3% JavaScript (.js). • 3.6% unidentified (no suffix, pdf, txt, etc.).
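Classifying objects from URL clues could be sketched as follows. The suffix lists come from the slide; the category names, the function name, and the precedence (a `?` anywhere in the URL is checked before the suffix) are assumptions for this example.

```python
from urllib.parse import urlparse

def classify(url):
    """Guess an object's type from its URL alone."""
    path = urlparse(url).path.lower()
    # Assumed precedence: a query marker wins over any file suffix.
    if "?" in url or path.endswith((".pl", ".class")):
        return "dynamic"
    if path.endswith((".gif", ".jpg")):
        return "image"
    if path.endswith((".htm", ".html")):
        return "html"
    if path.endswith(".js"):
        return "javascript"
    # No suffix, or a suffix outside the lists above (pdf, txt, ...).
    return "unidentified"
```

Matching is case-insensitive on the path, so `logo.GIF` and `logo.gif` fall in the same category.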
Persistent Connection/Browser • Persistent connections appear correlated with browser: • IE - 12% pages, 15.8% objects. • Netscape - 19.5% pages, 10.0% objects. • Omniweb - 66.0% pages, 72.4% objects. • Mozilla 5.0/Gecko - 95.8% pages, 91.3% objects.
Persistent Connection Pages • Still not as improved as expected: • Better than for only large pages: • Serial 7.28% vs. 1.98% • Parallel 24.03% vs. 18.5% • Medians don’t show improvements in all cases.
Mostly Persistent Pages
Persistent Connections per Page
Same as previous 16+
Ad-Servers • Identified as hosts whose names contain the phrases “ads” or “adserver”. • YES: http://rmads.msn.com/images_47144_date_0429_50.jpg. • NO: http://graphics4.nytimes.com/ads/scottrade_sov.gif.
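The host-name rule on this slide can be sketched as below. The two phrases come from the slide; the function name and the plain substring match are assumptions, and the actual matching used in the study may have been stricter.

```python
from urllib.parse import urlparse

AD_PHRASES = ("ads", "adserver")

def is_ad_server(url):
    """Return True if the URL's host name contains an ad phrase.

    Only the host is inspected, so "/ads/" in the path does not count.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(phrase in host for phrase in AD_PHRASES)
```

This reproduces the YES and NO examples above. A plain substring match also flags unrelated hosts whose names happen to contain "ads" (a host beginning with "downloads", for instance), so a real filter would likely need a tighter rule.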
Ad-Servers and DNS • Number of pages with ad-servers. • 9.5% of pages, 1.53% of total objects. • Average of 2.4 ads per page. • Objects not hosted on content server. • DNS lookup may be large part of lookup cost. • DNS caching doesn’t give great improvement: • DNS caching improves parallel case 10.9%. • Compared with 12.2% over all pages. • DNS caching improves parallel, persistent case 8%. • Compared with 6.3% over all pages. • DNS caching improves parallel, persistent w/ CDN 4.7%. • Compared to 6.3%.