Admission Control and Request Scheduling in Dynamic E-Commerce Web Sites • Sameh Elnikety, Erich Nahum, John Tracey, Willy Zwaenepoel • C.S. Dept., EPFL • IBM T.J. Watson Research Center
Dynamic Content (figure: numbered steps 1, 2, 3)
Increasing Online Commerce • $11B in 3rd Quarter 2002 (up 37%) • $11B in last 2 months of 2002 (up 40%) (Source: News.com)
Two Key Problems • Overloaded Web Sites: • The “Slashdot Effect” • Unanticipated load causes site to crash • Unresponsive Web Sites: • The “Abandoned Shopping Cart” • Unacceptable delays lead to reduced usage • Reduced usage leads to reduced $$$ • How can we address these problems for dynamic sites?
Generating Dynamic Content (diagram labels: http, Web Server, Dynamic Content Generator, Database Server) • Consists of 3 components: • Web Server: static content • Dynamic Content Generator: Java servlets • DB Server: state of the business
Outline • Motivation & Background • The Gatekeeper Proxy • Admission Control • Request Scheduling • Experimental Environment • Results • Summary and Conclusions
Admission Control (figure labels: Ideal, Actual, Throughput, Load) • To prevent overload, perform admission control: • Maintain a notion of capacity in the system • Identify each job ahead of time, along with the amount of work it generates • Only let jobs in if they won’t overload the system • Once you reach full capacity: • Make jobs wait • Drop jobs
The Gatekeeper Transparent Proxy (diagram labels: http, Web Server, Dynamic Content Generator, Gatekeeper, Database Server) • Transparently intercepts DB requests • Intercepts connections to the DB via the JDBC interface • Maintains several measurement-based estimates: • Total capacity of the database • Current estimate of DB load • Work generated by each query type
Estimating Work by Query Type (diagram labels: http, Web Server, Dynamic Content Generator, Gatekeeper, Database Server) • Key Observations: • Queries of the same type take (roughly) the same time • Different queries differ greatly in execution time • Any web site has a finite number of query types • Gatekeeper maintains per-query work estimates
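The per-query bookkeeping described on this slide can be sketched as below. This is a minimal illustration, not the Gatekeeper's actual code: the names `query_type` and `QueryWorkEstimator` are hypothetical, and both the literal-stripping rule used to group queries into types and the running mean used as the estimate are assumptions that the slides do not specify.

```python
import re
from collections import defaultdict


def query_type(sql):
    """Map a SQL string to a type by replacing literals with '?' (assumed rule)."""
    template = re.sub(r"'[^']*'|\b\d+\b", "?", sql)
    return " ".join(template.split())  # collapse whitespace


class QueryWorkEstimator:
    """Keeps a running mean of observed execution time per query type."""

    def __init__(self):
        self._count = defaultdict(int)
        self._mean_ms = defaultdict(float)

    def record(self, qtype, elapsed_ms):
        # Incremental mean update: no need to store every sample.
        self._count[qtype] += 1
        n = self._count[qtype]
        self._mean_ms[qtype] += (elapsed_ms - self._mean_ms[qtype]) / n

    def estimate(self, qtype, default_ms=0.0):
        # Unseen query types fall back to a caller-supplied default.
        if self._count.get(qtype, 0) == 0:
            return default_ms
        return self._mean_ms[qtype]


estimator = QueryWorkEstimator()
name_lookup = query_type("SELECT c_uname FROM customer WHERE c_id = 10")
estimator.record(name_lookup, 3.0)
estimator.record(name_lookup, 5.0)
```

Because the number of query types on a site is finite, the two dictionaries stay small no matter how many requests pass through.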
TPC-W: Execution Times (note: times are on a log scale)
Estimating System Capacity (diagram labels: http, Web Server, Dynamic Content Generator, Gatekeeper, Database Server) • A query's execution time gives the load (work units) of a job • Database capacity = max # of work units before overload • Rough approximation • Work unit approximates resource usage • Use binary search to determine capacity • More elaborate methods are possible (adaptive, control-theoretic, etc.)
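The binary search mentioned above might look like the following sketch. It assumes a hypothetical probe `overloads(limit)` that runs the workload with the given work-unit limit and reports whether the database became overloaded, and it assumes that overload is monotone in the limit. The slides do not say how overload was detected, so the probe is left to the caller.

```python
def find_capacity(overloads, low, high):
    """Return the largest limit in [low, high] that does not overload.

    Assumes overloads(low) is False and that once a limit overloads,
    every larger limit does too.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if overloads(mid):
            high = mid - 1  # mid is too much work; capacity is below it
        else:
            low = mid       # mid is safe; capacity is at least mid
    return low


# Illustrative probe only: pretend the database overloads above 700 units.
capacity = find_capacity(lambda limit: limit > 700, 1, 10_000)
```

Each probe is an experiment run, so the logarithmic number of steps is what makes this practical.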
Admission Control - Example (figure: four numbered steps, 1 to 4, showing queries Q1, Q2 and Q3 with the values 700, 695, 195 and 200)
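The admit-or-wait rule can be sketched as follows. This is an assumption-laden illustration rather than the Gatekeeper's implementation: `AdmissionGate` is a hypothetical name, waiting queries are held in FIFO order, every query is assumed to cost no more than the total capacity, and the numbers in the usage lines are made up rather than taken from the figure.

```python
from collections import deque


class AdmissionGate:
    """Admits a query only if its estimated cost fits under the capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.load = 0           # work units currently inside the database
        self.waiting = deque()  # (name, cost) pairs held back, oldest first

    def submit(self, name, cost):
        """Return True if the query runs now, False if it must wait."""
        if not self.waiting and self.load + cost <= self.capacity:
            self.load += cost
            return True
        self.waiting.append((name, cost))
        return False

    def complete(self, cost):
        """Release a finished query's work units; return newly admitted names."""
        self.load -= cost
        admitted = []
        while self.waiting and self.load + self.waiting[0][1] <= self.capacity:
            name, waiting_cost = self.waiting.popleft()
            self.load += waiting_cost
            admitted.append(name)
        return admitted


gate = AdmissionGate(capacity=100)
gate.submit("Q1", 60)           # admitted, load is 60
gate.submit("Q2", 30)           # admitted, load is 90
gate.submit("Q3", 20)           # would reach 110, so Q3 waits
released = gate.complete(60)    # Q1 finishes, Q3 now fits
```

Dropping jobs instead of queueing them, the other option listed on the Admission Control slide, would replace the `append` with a rejection.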
Scheduling: Theory and Practice • Theory: SRPT scheduling is best • SRPT: shortest remaining processing time • Proven to have minimum mean response time (Schrage 68) • Assumes perfect prediction of work costs • Assumes pre-emption has zero overhead and does not affect service time • Practice: not so simple • Pre-emption isn’t free (context switch costs, cache affinity) • Priorities and inheritance • Deadlock (e.g., Q1 is holding a lock when pre-empted) • Gatekeeper: • Use shortest job first (SJF) policy • Once a job (query) is admitted, it is never pre-empted
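Request Scheduling - Example • Two jobs with service times 500 and 10 • Long job first (500, then 10): total response time (0+500) + (500+10) = 1010, mean 505 • Short job first (10, then 500): total response time (0+10) + (10+500) = 520, mean 260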
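The arithmetic in this example can be reproduced with a few lines. The sketch assumes a single server, no pre-emption, and both jobs present at time zero; `mean_response_time` is a hypothetical helper name.

```python
def mean_response_time(service_times):
    """Mean response time when jobs run back to back in the given order."""
    clock = 0
    total = 0
    for service in service_times:
        clock += service  # this job finishes once its own service completes
        total += clock    # response time = waiting time + service time
    return total / len(service_times)


jobs = [500, 10]
long_first = mean_response_time(jobs)           # (500 + 510) / 2
short_first = mean_response_time(sorted(jobs))  # (10 + 510) / 2
print(long_first, short_first)                  # prints 505.0 260.0
```

Total work is identical in both orders, so throughput is unchanged; only the waiting time of the short job moves.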
Outline • Motivation & Background • The Gatekeeper Proxy • Experimental Environment • Software & Hardware • Metrics & Methodology • Results • Summary and Conclusions
Workload Generation (figure labels: Requests, Responses) • Workload generators are typically used for experimental server performance evaluation • Many available for use with static content: • WebStone, SPECweb, SURGE, httperf, WaspClient • Only 1 available for e-Commerce: TPC-W
TPC-W • From the Transaction Processing Performance Council (TPC) • TPC is better known for database workloads like TPC-D • Provides a specification, not source code • We use the implementation from the Dynaserver project at Rice • Models a large e-commerce site: Amazon • Web serving, searching, browsing, shopping carts • Secure purchasing (SSL), best sellers, new products • Customer registration, administrative updates • Persistent data • Static images on Web Server • All others on back-end database
TPC-W: Snapshot (screenshot labels: Image, Promo, Shopping Cart, Next Interaction)
TPC-W: Interactions • 14 Interactions, e.g.: • Home (read-only query) • Best sellers (complex) • Secure payment (SSL) • Shopping cart (update query) • Workload Mixes • Browsing (95% read-only) • Shopping (80% read-only) • Ordering (50% read-only)
TPC-W: Queries • Simple query (3 ms): SELECT c_uname FROM customer WHERE c_id = 10 • Complex query (4000 ms): SELECT i_id, i_title, a_fname, a_lname FROM item, author, order_line WHERE item.i_id = order_line.ol_i_id AND item.i_a_id = author.a_id AND order_line.ol_o_id > (SELECT MAX(o_id)-3333 FROM orders) AND item.i_subject = 'ARTS' GROUP BY i_id, i_title, a_fname, a_lname ORDER BY SUM(ol_qty) DESC FETCH FIRST 50 ROWS ONLY
Software (diagram labels: http, Web Server, Dynamic Content Generator, Database Server)
Hardware (diagram labels: http, sql, Apache, Tomcat, MySQL, DB2)
Emulated Clients (diagram labels: Emulated Clients, http, sql, Apache, Tomcat, MySQL, DB2) • Remote Browser Emulator • Session duration • Think time • Markov model • Load is a function of the number of clients
Experiments • Performance Metrics: • Throughput (interactions/minute) • Response time (msec, submission to completion) • Examine each as a function of load (# of clients) • Examine two locking approaches: • Locking in the database (slower, more general) • Locking in the application server (faster, less general) • Methodology: • Average of 5 runs • Each run lasts 600 seconds • Measurement starts after 100 second warm-up • 90 % confidence intervals
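A 90% confidence interval over five runs can be computed as in the sketch below. The slides do not state which method was used; this assumes the usual Student's t interval, whose two-sided 90% critical value for 4 degrees of freedom is about 2.132. The function name and the sample throughput figures are hypothetical.

```python
import math
import statistics

T_CRIT_90_DF4 = 2.132  # two-sided 90% Student's t critical value, 4 d.o.f.


def confidence_interval_90(runs):
    """Return (low, high) for the mean of exactly five measurements."""
    if len(runs) != 5:
        raise ValueError("the critical value above is only valid for 5 runs")
    mean = statistics.mean(runs)
    half_width = T_CRIT_90_DF4 * statistics.stdev(runs) / math.sqrt(len(runs))
    return mean - half_width, mean + half_width


# Made-up throughput numbers (interactions/minute), for illustration only.
low, high = confidence_interval_90([980, 1010, 995, 1005, 990])
```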
Outline • Motivation & Background • The Gatekeeper Proxy • Experimental Environment • Results • Admission Control • Request Scheduling • Summary and Conclusions
Admission Control - Explanation (figure captured using the sysstat utility on Linux)
Admission Control - Explanation • Memory Pressure • Clients 200 to 300 • Captured using Rabbit (Athlon performance counters) • L1 data cache misses increase 24% • L1 DTLB miss & L2 DTLB hit increases 25% • L1 DTLB miss & L2 DTLB miss increases 23% • Database Processes • Kernel linear and logarithmic overhead (e.g., maintaining the ready queue) • Database logarithmic overhead (e.g., list operations, sorting, searching)
Outline • Motivation & Background • The Gatekeeper Proxy • Experimental Environment • Results • Admission Control • Request Scheduling • Summary and Conclusions
Request Scheduling - Explanation (figure: job lengths 10000 and 1, 1, 1, 1; label 10000/5)
Request Scheduling - Analysis • Same throughput, lower response time • Response time = Waiting time + Execution (service) time • Fairness • FIFO: all wait for same amount of time • SJF: favors short requests • Q: How much are long jobs penalized?
Request Scheduling - Explanation • Short Job: “Exec Search” • Response time breakdown: • Service time unchanged • 400 ms • Waiting time reduced • 8000 ms -> 100 ms • 80x difference!
Request Scheduling - Explanation • Long Job: “Admin Response” • Response time breakdown: • Service time unchanged • 4800 ms • Waiting time increases • 12890 ms -> 15621 ms • Wait time increases 21 % • Response time increases 13 %
Request Scheduling - Explanation • Average over all requests • Response time breakdown: • Service time unchanged • 428 ms • Waiting time decreases • 8856 ms -> 225 ms
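Since response time is waiting time plus service time, the average-case improvement follows directly from the figures on this slide. The sketch below only redoes that arithmetic; the variable names are illustrative.

```python
def response_time_ms(waiting_ms, service_ms):
    """Response time = waiting time + execution (service) time."""
    return waiting_ms + service_ms


SERVICE_MS = 428                             # unchanged by scheduling
before = response_time_ms(8856, SERVICE_MS)  # 9284 ms
after = response_time_ms(225, SERVICE_MS)    # 653 ms
speedup = before / after
print(f"{speedup:.1f}x")                     # prints 14.2x
```

Waiting time dominates the original response time, which is why cutting it yields roughly a 14-fold improvement even though service time does not change.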
Preventing Starvation Aging mechanism, locking in App Server
Preventing Starvation Aging mechanism, locking in DB
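One common way to add aging to SJF is to discount a job's cost by the time it has already waited, so that a long job eventually outranks newly arrived short ones. The slides do not give the aging formula the Gatekeeper used, so the linear rule, the `aging_rate` parameter, and the `pick_next` helper below are all assumptions made for illustration.

```python
def pick_next(waiting, now, aging_rate):
    """Remove and return the job with the lowest aged cost.

    waiting: list of (arrival_time, cost_ms, name) tuples.
    Aged cost = cost_ms - aging_rate * time already spent waiting.
    """
    best = min(waiting, key=lambda job: job[1] - aging_rate * (now - job[0]))
    waiting.remove(best)
    return best


# With no aging this is plain SJF: the short job wins.
queue = [(0, 5000, "long"), (900, 400, "short")]
first_plain = pick_next(queue, now=1000, aging_rate=0)[2]

# With aging, the long job's 1000 time units of waiting outweigh its cost.
queue = [(0, 5000, "long"), (900, 400, "short")]
first_aged = pick_next(queue, now=1000, aging_rate=10)[2]
```

Setting `aging_rate` to zero recovers pure SJF, and very large values approach FIFO, so the parameter trades mean response time against the worst-case wait of long jobs.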
Related Work • Admission Control/QoS for Static Content Web Servers: • Bhatti99, Li00, Voigt01, Abdelzaher02, Pradhan02, Voigt02 • Identify content via IP addr, URL, Cookie • Provide throughput/resp. time/BW guarantees • Request Scheduling: • Crovella99, Bansal01, Schroeder02 • Use SRPT scheduling for static content servers • Better response time, reasonable fairness, better overload protection • Dynamic Content: • Dynaserver project at Rice/EPFL • Iyengar97, Challenger00: Fragments, dependency graphs, caching • Akamai Edge Side Includes
Summary • Presented the Gatekeeper Proxy • Transparent, DB-independent • Admission Control • Consistent performance during overload • Improves throughput 10 % • Request Scheduling using SJF • Improves response time 14 times • Penalizes long jobs only 13 %
Future Work • Workloads where application server is bottleneck • Place Gatekeeper in front of application server • Workload characterization • Get dynamic site traces from IGS • See if TPC-W is representative • System support for dynamic content • Use Linux profiling support to identify bottlenecks • Implement and evaluate improvements • Scaling issues in multiple-tiered Web sites • Content-aware back-end redirection
TPC-W Resources (Shopping Mix) • Conclusion: bottleneck is DB lock contention