Web Performance Optimization: scaling the witchcrafts

Web Performance Optimization: scaling the witchcrafts Xiaoliang “David” Wei • www.facebook.com/DavidWei

Agenda

A web site is a distributed system • A web page request can involve many parties: • Client browser • Web server (and backend caches, DBs) • Content distributor

Life of a web page request • From a user clicking event to the presentation of the page at the browser: Browsers Web Servers

Life of a web page request • From a user clicking event to the presentation of the page at the browser: • time to send to and receive from web server through Internet Browsers • ½ NetTime Web Servers

Life of a page request • From a user clicking event to the presentation of the page at the browser: • time to send to and receive from web server through Internet • time to generate HTML at web server Browsers • ½ NetTime Web Servers • GenTime

Life of a page request • From a user clicking event to the presentation of the page at the browser: • time to send to and receive from web server through Internet • time to generate HTML at web server Browsers • NetTime Web Servers • GenTime

Life of a page request • From a user clicking event to the presentation of the page at the browser: • time to send to and receive from web server through Internet • time to generate HTML at web server • time to download multimedia contents and to render the page at browser Browsers Content Distribution Network (CDN) Render Time • NetTime Web Servers • GenTime

Web service: a distributed system • Questions: • Scalability • Consistency • Performance Browsers Content Distribution Network (CDN) Render Time • NetTime Web Servers • GenTime

Web Performance Optimization

Performance impacts business • Google: +400ms => -0.6% daily search per user • Bing: +500ms => -1% click, +1.2 sec time to click (2x) • Facebook: user session time stays the same, faster => more page views

Status of web performance optimization • Decade-old protocols: TCP, HTTP, HTML • Fast-changing industry: browser war • Optimization is mostly witchcraft • Some rules emerging (e.g. Steve Souder’s books) • Large scale optimization is difficult

Web performance: TCP / HTTP / HTML • TCP overhead: 3-way handshake, slow start => persistent conn • TCP performance: depends on loss rate and distance => multiple conn • HTTP persistent connection: maximum 2 per server • HTTP Pipeline:failed, sequential loading • HTML: very limited description for priority/concurrency • => Big resources, multiple connections are preferred

Web performance: Browsers • most browsers blocked on JavaScript download / execution; some blocked on Stylesheet • Stylesheet application might be slow • JS execution might be slow • Maximum connections per browser • => The less the better

Web performance: “best practice rules” • Minimize HTTP requests • Gzip resources • Preload Components • Reduce DNS Lookup • Split Requests Across Domain

Web performance: trade-offs • Minimize HTTP requests  Increased unused bytes • Gzip resources  Increased CPU utilization • Preload Components  Increased bandwidth cost • Reduce DNS Lookup Split Requests Across Domain • Split Requests Across Domain  Reduce DNS Lookup

Web performance optimization • Improve speed • Reduce cost (both latency penalty and $ cost) • Operate on the equilibrium • Easy for a single page, hard for large scale systems

Performance optimization: challenges • Large scale • Viral growth • Agile development

Challenges: Large scale • Thousands of servers in multiple locations • Millions of users per minute • Billions of page views served per day • Hundreds of thousands of developers • Thousands of web resource components

Challenges: Viral growth • Concept of “Platform”: A single web page can have many features from many developers • A feature could be adopted by millions of users in a day • Features come and go

Challenges: Agile development • Fast release, allow mistake, fix quickly • Frequent code base changes • Continuous release

Web performance: the gap • Large scale • Viral growth • Agile development “Reduce # of HTTP Requests” Package JS/CSS; Sprite Image Large Scale Operation

Web performance: fill the gap • Large scale • Viral growth • Agile development “Reduce # of HTTP Requests” Package JS/CSS; Sprite Image Interface, Model, and Infrastructure Large Scale Operation

Adaptive Performance Optimization

Adaptive Performance optimization • Example: “Best Practices for Speeding Up You Web Site” – YUI Rule 1: Minimize HTTP Request • Combined files: package Javscript and Stylesheets • Sprite Images • …

Performance optimization: Case Study • Day 1: Some smart engineers start a project! <Print css tag for feature A> <Print css tag for feature B> <Print css tag for feature C> <print HTML of feature A> <print HTML of feature B> <print HTML of feature C> … “Let’s write a new page with features A, B and C!”

Day 2: Some smart engineers read YUI blog and says… <Print css tag for feature A> <Print css tag for feature B> <Print css tag for feature C> <print HTML of feature A> <print HTML of feature B> <print HTML of feature C> … “A & B & C are always used; let’s package them together!”

Day 2: Awesome! <Print css tag for feature A&B&C> <print HTML of feature A> <print HTML of feature B> <print HTML of feature C> …

Day 3: feature C evolves… <Print css tag for feature A & B & C> <print HTML of feature A> <print HTML of feature B> If (users_signup_for_C()) { <print HTML of feature C>} …

Day 3: <Print css tag for feature A & B & C> <print HTML of feature A> <print HTML of feature B> If (users_signup_for_C()) { <print HTML of feature C>} … A&B are always used, while C is not. ..

Day 4: feature C is deprecated <Print css tag for feature A & B & C> <print HTML of feature A> <print HTML of feature B> // no one uses C { <print HTML of feature C>} …

Day 4: we start to send unused bits <Print css tag for feature A & B & C> <print HTML of feature A> <print HTML of feature B> // no one uses C { <print HTML of feature C>} … It is hard to remember we should remove C here.

One months later… <Print css tag for feature A & B & C & D & E & F & G…> if (F is used) <print HTML of feature F> <print HTML of feature G> if (F is not used) { <print HTML of feature E>} … Thousands of dead CSS bytes in the package.

Performance Optimization: the gap • One months later… <Print css tag for feature A & B & C & D & E & F & G…> if (F is used) <print HTML of feature F> <print HTML of feature G> if (F is not used) { <print HTML of feature E>} … “Reduce # of HTTP Requests” Package JS/CSS; Sprite Image Large Scale Operation

Adaptive Static Resource Optimization Fill the gap: • Build interface to separate requirement declaration and delivery of static resources • Model the trade-off of optimization • Infrastructure to analyze requirement logs and globally optimize the delivery performance “Reduce # of HTTP Requests” Package JS/CSS; Sprite Image Interface, Model, and Infrastructure Large Scale Operation

Adaptive Packager: Designs

Interfaces • Back to Day 1: require_static(A_css); <render HTML of feature A> require_static(B_css); <render HTML of feature B> require_static(C_css);<render HTML of feature C> <deliver all required CSS> <print all rendered HTML> Separate Declaration from actual Delivery Requirement Declaration lives with HTML Global Optimization on Delivery

Packager: Global JS/CSS Optimization Offline analysis Online process require_static(A_css); <render HTML of feature A> require_static(B_css); <render HTML of feature B> require_static(C_css); <render HTML of feature C> <deliver all required CSS> <print all rendered HTML> Usage Pattern logs Packaging algorithm “Optimal” packages

Packager: Usage Pattern Logs Usage Pattern logs

Packager: Cost/Benefit Model • To package two files A & B: • “Cost”: for page requests that only uses A, we waste the bytes of B, vice versa • “Benefit”: for page requests that uses both A and B: we save one round trip • Bytes / Bandwidth ~ Latency • “Profit” to be maximized: Benefit – Cost

Packager: Cost/Benefit Model • Assume: latency = 40ms, and bandwidth = 1 Mbps • A+B: 40ms * 11.131M • No cost, pure gain. • Definitely package

Packager: Cost/Benefit Model • Assume: latency = 40ms, and bandwidth = 1 Mbps • B+C: 40ms * 11.1M • – 300B / 1Mbps * 0.031M • Benefit larger than cost • OK to package

Packager: Cost/Benefit Model • Assume: latency = 40ms, and bandwidth = 1 Mbps • B+D: 40ms * 1K • – 2K / 1Mbps * 11.13M • Cost larger than benefit • Don’t package

Packager: Optimal packages • Pkg 1: A, B, C • Pkg 2: E, F, G Usage Pattern logs Packaging algorithm “Optimal” packages

Packager: Cost/Benefit Model The weight (count) of each page load The usage matrix (0/1) on the left for m page loads and n resources The package definition (0/1 matrix). P[i,j] = 1 if resource j is in package i. The byte size of each resource

Packager: Cost/Benefit Model The usage matrix (0/1) on the left for m page loads and n resources The weight (count) of each page load Packages for each page load The byte size of each resource Byte size of each package The package definition (0/1 matrix)

Packager: Cost/Benefit Model The usage matrix (0/1) on the left for m page loads and n resources Byte Cost The weight (count) of each page load RT Gain The byte size of each resource The package definition (0/1 matrix)

Packager: Cost/Benefit Model The usage matrix (0/1) on the left for m page loads and n resources The weight (count) of each page load The byte size of each resource The package definition (0/1 matrix)

Packager: Algorithms • Greedy algorithm • Simulated Annealing • Clustering algorithms • … Packaging algorithm

Adaptive Static Resource Optimization Adaptive Packaging / Spriting • Cross-feature optimizations (e.g. search + ads) • Adaptive to change of user behaviors and code developments • Same methodology works for image spriting

Web Performance Optimization: scaling the witchcrafts

Web Performance Optimization: scaling the witchcrafts

Presentation Transcript

Deterministic Optimization Models

Search Engine Optimization

Process Optimization

Allometric scaling to predict pharmacokinetic and pharmacodynamic parameters in man

Dynamic Optimization 03.378 Dynamic Optimization BP/LP (nur für das Wahlpflichtfach Umweltökonomie) 2st Di 10-12, Geoma

Automatic Performance Tuning and Sparse-Matrix-Vector-Multiplication (SpMV)

Scaling and Bolting or Shotcrete

Optimizing NetScaler for Enterprise Applications

Profiling tools

What is Search Engine Optimization (SEO)?

C66x Code Optimization

Chapter 1: Introduction to Scaling Networks

QUERY OPTIMIZATION AND QUERY PROCESSING

Performance Management

Using random models in derivative free optimization

Optimization of Java-Like Languages for Parallel and Distributed Environments

(Nonlinear) Multiobjective Optimization

6 . Distributed Query Optimization

Automatic Performance Tuning of Sparse Matrix Kernels: Recent Progress

Scales and Scaling in Biology and Ecology

BGP 102: Scaling the Network

Automatic Performance Tuning Sparse Matrix Kernels