This article discusses the birth and evolution of the web, the basics of Web 2.0 and JavaScript, and the importance of performance in web services. It also introduces a tool for diagnosing and optimizing performance issues in large-scale web services.
Web Overview • The birth of the Web: 1989 • Now the Web is about everything • Business (HR systems, e.g. NUHR) • Online shopping (Amazon), banking (Citibank, Chase) • Communications (Gmail, Facebook) • Has become mission-critical • Performance • Security
Web 2.0 • Web 1.0 • Basic HTML + Images • What is Web 2.0? • No one really gives a clear definition • Features • AJAX ( Asynchronous JavaScript and XML) • DOM (Document Object Model) • Flash • CSS (Cascading Style Sheets) • User involvement: Wiki, Blog, Social Networks
Web 2.0 Basics - JavaScript • JavaScript • A scripting language with C/C++-like syntax • Dynamic, weakly typed language • eval() • No need to declare object types • Web 2.0 websites are JavaScript heavy • Google Maps (510KB) • Google Calendar (152KB) • Facebook (558KB)
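A rough illustration of the dynamic, weakly typed behavior and eval() mentioned above; the variable names are made up for the example:

// A variable's type can change at runtime -- nothing is declared.
var x = 42;            // number
x = "forty-two";       // now a string
x = { value: 42 };     // now an object

// Weak typing: operands are coerced automatically.
console.log("2" * 3);  // 6   (string coerced to number)
console.log("2" + 3);  // "23" (number coerced to string)

// eval() compiles and runs a string as code at runtime.
var sum = eval("1 + 2 + 3");
console.log(sum);      // 6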
DOM (Document Object Model) <html> <head> <title>Sample Document</title> </head> <body> <h1>An HTML Document</h1> <p>This is a <i>simple</i> document.</p> </body> </html> • One of the first JavaScript/DOM-heavy apps: Gmail • DOM Event API: keyboard and mouse events • DOM CSS API
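A small sketch of the DOM Event and CSS APIs named above; the element id and the style changes are hypothetical, chosen only to show the calls:

// DOM Event API: react to a mouse event on an element.
var heading = document.getElementById("title");   // hypothetical id
heading.addEventListener("click", function (evt) {
  // DOM CSS API: change styles from JavaScript.
  heading.style.color = "red";
  heading.style.fontSize = "2em";
});

// Keyboard events are registered the same way on the document.
document.addEventListener("keydown", function (evt) {
  console.log("key pressed: " + evt.key);
});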
AJAX (Asynchronous JavaScript and XML) req = new XMLHttpRequest(); function callback () { … } function handler () { if (req.readyState == 4 && req.status == 200) { callback(req.responseText); } } req.onreadystatechange = handler; req.open("GET", url, true); req.send(null); • XMLHttpRequest • registers a callback function so the request runs asynchronously • lets JavaScript fetch the URL directly • the response can be either plain text or XML • Foundation of popular web apps: Google Maps, Gmail, Facebook, etc. • Can transfer any object between the browser and the Web server, e.g. XML or JSON (JavaScript Object Notation)
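Since the reply can also be JSON rather than XML, here is a minimal variant of the snippet above that parses a JSON response; the URL is a placeholder:

var req = new XMLHttpRequest();
req.onreadystatechange = function () {
  if (req.readyState == 4 && req.status == 200) {
    var data = JSON.parse(req.responseText);   // JSON instead of XML
    console.log(data);
  }
};
req.open("GET", "/api/items", true);           // placeholder URL
req.send(null);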
WebProphet: Automating Performance Prediction for Web Services Zhichun Li, Ming Zhang, Zhaosheng Zhu, Yan Chen, Albert Greenberg and Yi-min Wang Northwestern University Microsoft Research
Large-scale Web Services • Most large-scale online services today are web-based • Web search, maps, webmail, calendars, online stores, etc. • Provided by Online Service Providers (OSPs) • MSN, Google, Yahoo, Amazon, etc. • Hosted by multiple data centers around the world • Increasingly complex • Yahoo Maps: 110 embedded objects, complex object dependencies, and 670KB of JavaScript
Performance Is Important • Amazon: 1% sales loss for every 100ms of extra delay • Google found that 500ms of extra delay reduced revenue by up to 20% • Need a tool to understand and improve user-perceived performance
Potential Performance Problems • Large web services are complex • Browser: complex UI → large browser delay; poor object dependency → more RTTs (an online map needs 40~60 HTTP objects) • DNS: complex DNS redirection → long DNS queries (CNAME); different servers → more DNS queries • Internet: RTT and packet loss interact with TCP • Frontend DCs: overload, long response time • OSP internal network: RTT, packet loss • Backend DCs: overload, long response time for dynamic contents • Need a tool to diagnose why a page is slow and where the bottleneck is
Performance Prediction Problem • Many techniques can be used for performance optimization, but trying them one by one is far too costly • What will the performance be under hypothetical optimization strategies? • How can we quickly evaluate the predicted performance?
Outline • Motivation • Design • Dependency Extraction • Performance Prediction • Implementation • Evaluation • Conclusion
Client-Side Performance Prediction • Provider-based techniques • Hard to consider multiple data sources • Hard to capture object dependencies • Hard to capture page rendering time
The Page Load Time Decomposition • Page load time depends on the object dependencies and the load time of each object i • Load time of an object = client delay + network delay + server delay • Network delay = DNS delay + TCP 3-way handshake + data transfer (affected by RTT and packet loss)
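A minimal sketch of this decomposition: the load time of one object is the sum of its delay components, and the page load time is when the last object finishes, given start times already fixed by the dependency graph. The field names are made up for illustration:

// Per-object load time = client delay + DNS + TCP handshake + server + data transfer.
function objectLoadTime(o) {
  return o.clientDelay + o.dnsDelay + o.tcpHandshake + o.serverDelay + o.dataTransfer;
}

// With per-object start times determined by the object dependencies,
// the page load time is the latest finish time.
function pageLoadTime(objects) {
  return Math.max.apply(null, objects.map(function (o) {
    return o.startTime + objectLoadTime(o);
  }));
}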
System Architecture • Measurement Engine → Dependency Extractor → PDGs → Performance Predictor → Results, with new scenarios as a second input to the Performance Predictor
Outline • Motivation • Design • Dependency Extraction • Performance Prediction • Implementation • Evaluation • Conclusion
What are dependencies? • The embedded objects in an HTML page depend on the page itself • Object requests generated by JavaScript depend on the corresponding .JS files • External CSS and JavaScript files block the other embedded objects in the HTML page • Event triggers, e.g. when image B triggers its "onload" event, image A is loaded by JavaScript
Dependency Definitions • Descendant(X): objects that depend on X • Ancestor(X): objects that X depends on • Parent(X): the objects that X directly depends on; "directly" means the parent can be the last ancestor to finish before X starts loading • Based on the parent relationships, build the PDG (parental dependency graph)
Discover Ancestors and Descendants • We discover the Descendant(X) sets by injecting timing perturbations through an HTTP proxy: delay the delivery of X and observe which objects' load times shift accordingly
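A sketch of the inference step, assuming we have object start times from an unperturbed page load and from a load in which the proxy delayed object X by delta milliseconds: objects whose start times shift by roughly delta are taken as descendants of X. The tolerance threshold and field names are illustrative, not from the paper:

// baseline, perturbed: maps from object URL to observed start time (ms).
// delta: artificial delay injected for X through the HTTP proxy.
function discoverDescendants(baseline, perturbed, delta) {
  var descendants = [];
  var tolerance = delta * 0.5;            // illustrative threshold
  for (var url in baseline) {
    var shift = perturbed[url] - baseline[url];
    if (Math.abs(shift - delta) <= tolerance) {
      descendants.push(url);              // its start moved with X, so it depends on X
    }
  }
  return descendants;
}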
Extract non-stream parents • Stream vs. non-stream: HTML pages are stream objects (the browser parses them incrementally as they arrive); other object types are non-stream • Non-stream parent extraction works from the descendant sets, e.g. Descendant(A)={B,D}, Descendant(B)={D}
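One plausible way to turn descendant sets into candidate non-stream parents, shown here only as a sketch: treat the descendant sets as a reachability relation and keep, for each object X, the ancestors with no other ancestor of X between them and X. On the example above this keeps B rather than A as the candidate parent of D:

// descendant: map from object to the array of objects that depend on it,
// e.g. { A: ["B", "D"], B: ["D"] }.
function candidateParents(x, descendant) {
  var ancestors = Object.keys(descendant).filter(function (a) {
    return descendant[a].indexOf(x) >= 0;
  });
  // a is kept if no other ancestor b of x lies "between" them,
  // i.e. there is no b with a -> b and b -> x in the reachability relation.
  return ancestors.filter(function (a) {
    return !ancestors.some(function (b) {
      return b !== a && descendant[a].indexOf(b) >= 0;
    });
  });
}

// candidateParents("D", { A: ["B", "D"], B: ["D"] })  =>  ["B"]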
Extract stream parents • 1) Load the HTML page very slowly and record the offset at which the dependent object Z starts, Offset(Z) • 2) Delay the other known non-stream parents to rule them out
Outline • Motivation • Design • Dependency Extraction • Performance Prediction • Implementation • Evaluation • Conclusion
Performance Prediction Procedure • Inputs: packet trace, PDG, and a new scenario • Extract object timing information from the trace • Annotate client delays • Adjust the timing of each object according to the new scenario • Simulate the page load process over the PDG
Object Timing Info • Basic object timing info: DNS lookup time, TCP handshaking time, request transfer time, server response time, reply transfer time • Adding client delay info: the client delay between Parent(X) finishing and X starting
Adjust Object Timing Info • DNS: adjust the DNS lookup time directly • Server response time: change the response time directly • RTT: adjust each RTT-dependent component by a multiple of ΔRTT (ΔRTT for the TCP handshake, m·ΔRTT and n·ΔRTT for the request and reply transfers, where m and n are the numbers of round trips)
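A sketch of this adjustment under a hypothetical "what-if" scenario, using per-object timing records like those on the previous slide; the field names and the shape of the scenario object are made up for illustration:

// timing: { dnsLookup, tcpHandshake, requestTransfer, serverResponse, replyTransfer,
//           requestRounds (m), replyRounds (n) }  -- all times in ms.
// scenario: { newDnsLookup, newServerResponse, deltaRtt }  -- any field may be absent.
function adjustTiming(timing, scenario) {
  var t = {
    dnsLookup: timing.dnsLookup,
    tcpHandshake: timing.tcpHandshake,
    requestTransfer: timing.requestTransfer,
    serverResponse: timing.serverResponse,
    replyTransfer: timing.replyTransfer
  };
  if (scenario.newDnsLookup !== undefined) t.dnsLookup = scenario.newDnsLookup;
  if (scenario.newServerResponse !== undefined) t.serverResponse = scenario.newServerResponse;
  if (scenario.deltaRtt !== undefined) {
    t.tcpHandshake    += scenario.deltaRtt;                         // handshake: one RTT
    t.requestTransfer += timing.requestRounds * scenario.deltaRtt;  // m round trips
    t.replyTransfer   += timing.replyRounds   * scenario.deltaRtt;  // n round trips
  }
  return t;
}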
Simulating Page Load Process I • Browser behaviors
Simulating Page Load Process II • Page load process • Find the earliest candidate C in the CandidateQueue • Load C according to the conditions in the previous slide • Find new candidates whose parents are all available • Adjust the timings of the new candidates • Insert the new candidates into the CandidateQueue
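A simplified sketch of this event-driven loop, assuming a PDG in which each object lists its parents and already carries an adjusted client delay and load duration; browser constraints such as connection limits from the previous slide are left out, so this is not the full simulator:

// objects: map url -> { parents: [urls], clientDelay, duration }
function simulatePageLoad(objects, rootUrl) {
  var finish = {};                              // url -> finish time
  var queue = [{ url: rootUrl, start: 0 }];     // CandidateQueue

  while (queue.length > 0) {
    // 1) Find and load the earliest candidate C.
    queue.sort(function (a, b) { return a.start - b.start; });
    var c = queue.shift();
    finish[c.url] = c.start + objects[c.url].duration;

    // 2) Find new candidates whose parents have all finished,
    //    adjust their start times, and insert them into the queue.
    for (var url in objects) {
      if (finish[url] !== undefined) continue;
      if (queue.some(function (q) { return q.url === url; })) continue;
      var parents = objects[url].parents;
      if (parents.every(function (p) { return finish[p] !== undefined; })) {
        var ready = parents.length
          ? Math.max.apply(null, parents.map(function (p) { return finish[p]; }))
          : 0;
        queue.push({ url: url, start: ready + objects[url].clientDelay });
      }
    }
  }
  return finish;   // page load time = the maximum finish time
}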
Outline • Motivation • Design • Dependency Extraction • Performance Prediction • Implementation • Evaluation • Conclusion
WebProphet Framework • Web agent: browser agent with control plug-in, Web proxy, and pcap trace logger capture network traces • Dependency extractor: produces PDGs from the traces • Performance predictor: trace analyzer annotates object timing info; page simulator replays the page load under new scenario input and outputs results • Web robot and scripting API drive application transaction script snippets • The whole system is about 12,000 lines of code
Dependency Extraction Results • Google and Yahoo Search • Validation: manual code analysis
Dependency Extraction Results • Google and Yahoo Maps • Validation: create fake pages with the same PDGs and validate against the fake pages
Prediction Accuracy • Evaluate both the median and the 95th percentile • Controlled experiment • 50% of cases with prediction error less than 6.1% • 90% of cases with prediction error less than 16.2% • PlanetLab experiment • Prediction error of the median less than 6.1% • Prediction error of the 95th percentile less than 10.7%
Usage Scenarios • Analyze how to improve Yahoo Maps • Only want to optimize a small number of objects • Use a greedy-based search • Evaluated 2,176 hypothetical scenarios and found: • Moving 5 objects to a CDN: 14.8% improvement • Halving the client delays of 14 objects: 26.6% • Combining both: 40.1%
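A sketch of a greedy search like the one described above, using the simulated page load time as the objective; predictLoadTime stands in for the simulator and applyOptimization for a single scenario change, both hypothetical helpers:

// candidates: single-object optimizations (e.g. "move object i to a CDN",
// "halve the client delay of object j"). Greedily pick up to k of them.
function greedySearch(pdg, candidates, k, predictLoadTime, applyOptimization) {
  var chosen = [];
  var current = pdg;
  for (var step = 0; step < k; step++) {
    var best = null, bestTime = predictLoadTime(current);
    candidates.forEach(function (opt) {
      if (chosen.indexOf(opt) >= 0) return;            // already applied
      var t = predictLoadTime(applyOptimization(current, opt));
      if (t < bestTime) { best = opt; bestTime = t; }
    });
    if (best === null) break;                          // no further improvement
    chosen.push(best);
    current = applyOptimization(current, best);
  }
  return { chosen: chosen, loadTime: predictLoadTime(current) };
}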
Outline • Motivation • Design • Dependency Extraction • Performance Prediction • Implementation • Evaluation • Conclusion
Conclusions • Developed a novel technique to extract the object dependencies of complex web pages • Implemented a simple yet effective model to simulate the page load process • Applied WebProphet to Yahoo Maps to show that it is useful for performance optimization