Admission Control and Request Scheduling in Dynamic E-Commerce Web Sites

Admission Control and Request Scheduling in Dynamic E-Commerce Web Sites Sameh Elnikety, Erich Nahum, John Tracey, Willy Zwaenepoel C.S. Dept. EPFL IBM T.J.Watson Research Center

Dynamic Content 1 2 3

Increasing Online Commerce • $11B in 3rd Quarter 2002 (up 37%) • $11B in last 2 months of 2002 (up 40%) (Source: News.com)

Two Key Problems • Overloaded Web Sites: • The “Slashdot Effect” • Unanticipated load causes site to crash • Unresponsive Web Sites: • The “Abandoned Shopping Cart’’ • Unacceptable delays lead to reduced usage • Reduced usage leads to reduced $$$ How can we address these problems for dynamic sites?

Generating Dynamic Content Web Server Dynamic Content Generator Database Server http • Consists of 3 Components: • Web Server: static content • Dynamic Content Generator: Java servlets • DB Server: state of the business

Outline • Motivation & Background • The Gatekeeper Proxy • Admission Control • Request Scheduling • Experimental Environment • Results • Summary and Conclusions

Ideal Throughput Actual Load Admission Control • To prevent overload, perform admission control: • Notion of capacity in the system • Identify the job ahead of time & amount of work generated • Only let jobs in if they won’t overload system • Once you reach full capacity: • Make jobs wait • Drop jobs

The Gatekeeper Transparent Proxy Web Server Dynamic Content Generator Gate Keeper Database Server http • Transparently intercepts DB requests • connections to the DB via the JDBC interface • Maintains several measurement-based estimates: • Total capacity of the database • Current estimate of DB load • Work generated by each query type

Estimating Work by Query Type Web Server Dynamic Content Generator Gate Keeper Database Server http • Key Observations: • Queries of the same type take (roughly) the same time • Different queries differ greatly in execution time • Any web site has a finite number of query types • Gatekeeper maintains per-query work estimates

Service Time Distributions

TPC-W: Execution Times (note times are in log scale)

Estimating System Capacity Web Server Dynamic Content Generator Gate Keeper Database Server http • Query execution time = load or work units of a job • Database capacity = max # work units before overload • Rough approximation • Unit approximates resource usage • Use binary search to determine capacity • More elaborate methods (adaptive, control theoretic, etc)

Admission Control - Example Q3 Q2 Q1 1 700 Q3 Q2 Q1 2 695 Q1 Q3 3 195 Q2 Q3 Q2 4 200

Scheduling: Theory and Practice • Theory: SRPT scheduling is best • SRPT: shortest remaining processing time • Proven to have minimum response time (Schrage 68) • Perfect prediction of work costs • Pre-emption has zero overhead, does not affect service time • Practice: not so simple • Pre-emption isn’t free (context switch costs, cache affinity) • Priorities and inheritance • Deadlock (e.g., Q1 is holding a lock when pre-empted) • Gatekeeper: • Use shortest job first (SJF) policy • Once a job (query) is admitted, it is never pre-empted

Request Scheduling - Example (0+500) + (500+10) = 1010  505 (0+10) + (10+500) = 520  260 10 500 500 10

Outline • Motivation & Background • The Gatekeeper Proxy • Experimental Environment • Software & Hardware • Metrics & Methodology • Results • Summary and Conclusions

Workload Generation Requests • Workload generators typically used for experimental server performance evaluation • Many available for use with static content: • WebStone, SPECweb, SURGE, httperf, WaspClient • Only 1 available for e-Commerce: TPC-W Responses

TPC-W • Transaction Processing Council (TPC-W) • TPC more known for database workloads like TPC-D • Provides specification, not source • Use the implementation from Dynaserver project at Rice • Models a large e-commerce site: Amazon • Web serving, searching, browsing, shopping carts • Secure purchasing (SSL), best sellers, new products • Customer registration, administrative updates • Persistent data • Static images on Web Server • All others on back-end database

TPC-W: Snapshot Image Promo Shopping Cart Next Interaction

TPC-W: Interactions • 14 Interactions, e.g.: • Home (read-only query) • Best sellers (complex) • Secure payment (ssl) • Shopping cart (update query) • Workload Mixes • Browsing (95% read-only) • Shopping (80% read-only) • Ordering (50% read-only)

TPC-W: Queries SELECT c_uname FROM customer WHERE c_id = 10 SELECT i_id, i_title, a_fname, a_lname FROM item, author, order_line WHERE item.i_id = order_line.ol_i_id AND item.i_a_id = author.a_id AND order_line.ol_o_id > (SELECT MAX(o_id)-3333 FROM orders) AND item.i_subject = ‘ARTS’ GROUP BY i_id, i_title, a_fname, a_lname ORDER BY SUM(ol_qty) DESC FETCH FIRST 50 ROWS ONLY 3 ms 4000 ms

TPC-W: Frequencies

Software Web Server Dynamic Content Generator Database Server http

Hardware Apache Tomcat MySQL DB2 http sql

Emulated Clients Emulated Clients Apache Tomcat MySQL DB2 http sql • Remote Browser Emulator • Session duration • Think time • Markov model • Load is a function of the number of clients

Experiments • Performance Metrics: • Throughput (interactions/minute) • Response time (msec, submission to completion) • Examine each as a function of load (# of clients) • Examine two locking approaches: • Locking in the database (slower, more general) • Locking in the application server (faster, less general) • Methodology: • Average of 5 runs • Each run lasts 600 seconds • Measurement starts after 100 second warm-up • 90 % confidence intervals

Outline • Motivation & Background • The Gatekeeper Proxy • Experimental Environment • Results • Admission Control • Request Scheduling • Summary and Conclusions

Admission Control - Throughput

Admission Control - Explanation (Captured using systat utility on Linux)

Admission Control - Explanation • Memory Pressure • Clients 200 to 300 • Captured using Rabbit (Athlon performance counters) • L1 data cache miss increases 24% • L1 DTLB miss & L2 DLTB hit increases 25% • L1 DTLB miss & L2 DLTB miss increases 23% • Database Processes • Kernel linear and logarithmic overhead (e.g., maintain the ready queue) • Database logarithmic overhead (e.g., list operations, sorting, searching)

Throughput – DB Lock Contention

Throughput - DB2

Outline • Motivation & Background • The Gatekeeper Proxy • Experimental Environment • Results • Admission Control • Request Scheduling • Summary and Conclusions

Request Scheduling - Response Time

Response Time - DB Lock Contention

Request Scheduling - Explanation 10000 1 1 1 1 10000 10000 1 1 1 1 10000/5

Request Scheduling - Analysis • Same throughput, lower response time • Response time = Waiting time + Execution (service) time • Fairness • FIFO: all wait for same amount of time • SJF: favors short requests Q: How much are long jobs penalized?

Request Scheduling - Explanation • Short Job: “Exec Search” • Response time breakdown: • Service time unchanged • 400 ms • Waiting time reduced • 8000 ms -> 100 ms • 80x difference!

Request Scheduling - Explanation • Long Job: “Admin Response” • Response time breakdown: • Service time unchanged • 4800 ms • Waiting time increases • 12890 ms -> 15621 ms • Wait time increases 21 % • Response time increases 13 %

Request Scheduling - Explanation • Average over all requests • Response time breakdown: • Service time unchanged • 428 ms • Waiting time decreases • 8856 ms -> 225 ms

Preventing Starvation Aging mechanism, locking in App Server

Preventing Starvation Aging mechanism, locking in DB

Related Work • Admission Control/QoS for Static Content Web Servers: • Bhatti99, Li00, Voigt01, Abdelzaher02, Pradhan02, Voigt02 • Identify content via IP addr, URL, Cookie • Provide throughput/resp. time/BW guarantees • Request Scheduling: • Crovella99, Bansal01, Schroeder02 • Use SRPT scheduling for static content servers • Better response time, reasonable fairness, better overload protection • Dynamic Content: • Dynaserver project at Rice/EPFL • Iyengar97, Challenger00: Fragments, dependency graphs, caching • Akamai Edge Side Includes

Summary • Presented the Gatekeeper Proxy • Transparent, DB-independent • Admission Control • Consistent performance during overload • Improves throughput 10 % • Request Scheduling using SJF • Improves response time 14 times • Penalizes long jobs only 13 %

Future Work • Workloads where application server is bottleneck • Place Gatekeeper in front of application server • Workload characterization • Get dynamic site traces from IGS • See if TPC-W is representative • System support for dynamic content • Use Linux profiling support to identify bottlenecks • Implement and evaluate improvements • Scaling issues in multiple-tiered Web sites • Content-aware back-end redirection

Thank You!

TPC-W Queries

TPC-W Resources (Shopping Mix) Conclusion: Bottleneck is DB Lock contention

Admission Control and Request Scheduling in Dynamic E-Commerce Web Sites

Admission Control and Request Scheduling in Dynamic E-Commerce Web Sites

Presentation Transcript

Web programming and e-commerce

E-Commerce and Web Security

CIS679: Scheduling, Resource Configuration and Admission Control

E-Commerce and the Web

Dynamic Scheduling

Dynamic Scheduling

Dynamic scheduling

Web and E-Commerce Strategy

DYNAMIC PATIENT ADMISSION SCHEDULING PROBLEM

Dynamic Scheduling Using Pools and Dynamic Pools in Dynamic Domains

Dynamic Scheduling

Dynamic Scheduling

Admission Control and Dynamic Adaptation for a Proportional-Delay DiffServ-Enabled Web Server

A Method for Transparent Admission Control and Request Scheduling in E-Commerce Web Sites

Dynamic scheduling

Web Pages, Web Sites, And E-Commerce

CIS679: Scheduling, Resource Configuration and Admission Control

Web e-Commerce

Web e-Commerce

Web Data Scraping from popular E-commerce Sites

Measuring Price Discrimination and Steering on E-commerce Web Sites

Dynamic Scheduling and Dynamic Percolation