Caching Dynamic Web Content: Designing and Analyzing an Aspect-Oriented Solution
Sara Bouchenak – INRIA, France
Alan Cox – Rice University, Houston
Steven Dropsho – EPFL, Lausanne
Sumit Mittal – IBM Research, India
Willy Zwaenepoel – EPFL, Lausanne

Dynamic Web Content
[Figure: three-tier architecture. Labels: Client, Internet, Web server (Web tier), Application server (Business tier), Database server (Database tier), HTTP request, HTTP response, SQL req., SQL res., Cache.]
• Motivation for Caching
  • Represents a large portion of web requests
    • Stock quotes, bidding-buying status on auction site, best-sellers on bookstore
  • Generation places a huge burden on application servers

Caching Dynamic Web Content
• Dynamic Content Not easy to Cache
  • Ensure consistency, invalidate cached entries due to updates
    • Write requests can modify entries used by read requests
  • Caching logic inserted at different points in the application
    • Entry and exit of requests, access to underlying database
    • Correlation between requests and their database accesses
• Most solutions rely on “manually” understanding complex application logic

Our Contributions
• Design a cache “AutoWebCache” that
  • Ensures consistency of cached documents
  • Inserts caching logic transparently to the application
    • Makes use of aspect-oriented programming
• Analysis of the cache
  • Transparency of injecting caching logic
  • Improvement in response time for test-bed applications

Dynamic Web Caching – Solution Approach
[Figure: the three-tier architecture (Client, Internet, Web server, Application server, Database server; HTTP request/response, SQL req./res.) with AutoWebCache attached. Labels on AutoWebCache: Cache Check, Request info, Database access, Caching Logic, Cache inserts, invalidations, Web Page Cache.]
• Consistency
  • Correlation between read and write requests
• Transparency
  • Capture information flow

Outline
• Design of AutoWebCache
  • Maintaining cache consistency
    • Determine relationship between reads and updates
  • Cache Structure
• Aspectizing Web Caching
  • Insertion of caching logic transparently
• Evaluation
  • Analysis of effectiveness, transparency
• Conclusion

Maintaining Cache Consistency – Read Requests
• Response to read-only requests cached
• Read SQL queries recorded with cache entry

Index: URI (readHandlerName + readHandlerArgs) | Cached web page | Associated Read Queries
URI1 | WebPage1 | { Read Query 11, Read Query 12, … }
URI2 | WebPage2 | { Read Query 21, Read Query 22, … }
… | … |

Maintaining Cache Consistency – Write Requests
• Result not cached
• Write SQL queries recorded
• Intersect write SQL queries with read queries of cached pages
• Invalidate if non-zero intersection
[Figure: two diagrams of a write set (WS) and a read set (RS), one labeled "No Invalidation" and one labeled "Invalidation".]

Invalidating Cache Entries

Index: URI (readHandlerName + readHandlerArgs) | Cached web page | Associated Read Queries
URI1 | WebPage1 | { Read Query 11, Read Query 12, … }
URI2 | WebPage2 | { Read Query 21, Read Query 22, … }
URI3 | WebPage3 | { Read Query 31, Read Query 32, … }
URIn | |

[Figure annotations: an incoming "Write Query" is matched against the entries; an entry is marked "Remove".]

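The read and write handling described on these slides can be sketched in plain Java. This is a minimal illustration under assumptions, not AutoWebCache's actual code: the class `PageCacheSketch`, its method names, and the pluggable `intersects` test are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch of a consistent page cache; names are illustrative only.
class PageCacheSketch {
    // One cache entry: the generated page plus the read queries it depends on.
    static final class Entry {
        final String page;
        final List<String> readQueries;

        Entry(String page, List<String> readQueries) {
            this.page = page;
            this.readQueries = new ArrayList<>(readQueries);
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    // (readQuery, writeQuery) -> true if the write may affect the read.
    private final BiPredicate<String, String> intersects;

    PageCacheSketch(BiPredicate<String, String> intersects) {
        this.intersects = intersects;
    }

    // Returns the cached page for the URI, or null on a miss.
    String get(String uri) {
        Entry e = entries.get(uri);
        return e == null ? null : e.page;
    }

    // Caches the response of a read-only request together with its read queries.
    void add(String uri, String page, List<String> readQueries) {
        entries.put(uri, new Entry(page, readQueries));
    }

    // Removes every page that has a read query intersecting the write query.
    // Returns the number of pages removed.
    int invalidate(String writeQuery) {
        int removed = 0;
        Iterator<Entry> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            for (String readQuery : e.readQueries) {
                if (intersects.test(readQuery, writeQuery)) {
                    it.remove();
                    removed++;
                    break;
                }
            }
        }
        return removed;
    }
}
```

The intersection test is left as a parameter because its precision is a separate design choice; a coarse test invalidates more pages than necessary but never fewer.
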
Query Analysis Engine
• Determines intersection between SQL queries
• Three levels of granularity for intersection
  • Column based
  • Value based
  • Extra query based
• Balance precision with complexity

Column Based Intersection
Invalidate if Column_Read = Column_Updated

Table T:
a | b | c
5 | 8 | 7
1 | 10 | 9

Read: SELECT T.a FROM T WHERE T.b = 8
Write: UPDATE T SET T.c = 7 WHERE T.b = 10 : Ok
Write: UPDATE T SET T.a = 12 WHERE T.b = 10 : Invalidate

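A column-based test can be sketched as a set overlap check. The class and method names below are hypothetical, and the sketch assumes that the columns read include both the selected columns and the predicate columns; the slide does not state which columns count as read.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the column-based intersection test.
class ColumnIntersection {
    // True if the update writes any column the read depends on (same table only).
    static boolean intersects(String readTable, Set<String> columnsRead,
                              String writeTable, Set<String> columnsUpdated) {
        if (!readTable.equals(writeTable)) {
            return false;
        }
        for (String column : columnsUpdated) {
            if (columnsRead.contains(column)) {
                return true;
            }
        }
        return false;
    }

    // Convenience helper for building a column set.
    static Set<String> cols(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }
}
```

For the slide's example, the read uses columns a and b, so an update of c does not intersect while an update of a does.
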
Value Based Intersection
Invalidate if Rows_Read = Rows_Updated

Table T:
a | b | c
5 | 8 | 7
1 | 10 | 9

Read: SELECT T.a FROM T WHERE T.b = 8
Write: UPDATE T SET T.a = 7 WHERE T.b = 10 : Ok (column-based would invalidate)
Write: UPDATE T SET T.a = 12 WHERE T.b = 8 : Invalidate

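A value-based test can be sketched for the simplest case, where each query has a single equality predicate. This is an assumption made for brevity; real WHERE clauses are richer, and the names `ValueIntersection` and `EqPredicate` are hypothetical.

```java
// Hypothetical sketch of the value-based intersection test.
class ValueIntersection {
    // A simplified predicate of the form "column = value".
    static final class EqPredicate {
        final String column;
        final int value;

        EqPredicate(String column, int value) {
            this.column = column;
            this.value = value;
        }
    }

    // Assumes the column-based test already reported an overlap.
    // The row sets are provably disjoint only when both predicates
    // constrain the same column to different values.
    static boolean rowsMayOverlap(EqPredicate read, EqPredicate write) {
        if (read.column.equals(write.column)) {
            return read.value == write.value;
        }
        // Different predicate columns: undecidable here, so invalidate conservatively.
        return true;
    }
}
```

The conservative branch is where this level loses precision: when the two predicates name different columns, the rows cannot be compared from the query text alone.
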
Extra Query Based Intersection
Generate extra query to find missing values

Table T ("??" in the figure marks the missing value):
a | b | c
5 | 8 | 7
1 | 10 | 9

Read: SELECT T.a FROM T WHERE T.b = 8
Write: UPDATE T SET T.a = 3 WHERE T.c = 9
Extra query: SELECT T.b FROM T WHERE T.c = 9
Result: Ok (value-based would invalidate)

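The extra-query step can be sketched as follows, again assuming single equality predicates. The names are hypothetical, and the `db` function merely stands in for a database call; it is not a JDBC API.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the extra-query intersection test.
class ExtraQueryIntersection {
    // Builds the query that fetches the read-predicate column
    // for the rows selected by the write predicate.
    static String extraQuery(String table, String readPredColumn,
                             String writePredColumn, int writePredValue) {
        return "SELECT " + table + "." + readPredColumn
             + " FROM " + table
             + " WHERE " + table + "." + writePredColumn + " = " + writePredValue;
    }

    // db maps a query string to the column values it returns (stand-in for the database).
    // Invalidate only if an updated row carries the value the read query selects on.
    static boolean mustInvalidate(int readPredValue, String extraQuery,
                                  Function<String, List<Integer>> db) {
        return db.apply(extraQuery).contains(readPredValue);
    }
}
```

In the slide's example the extra query returns 10 for the row with c = 9, which differs from the read predicate value 8, so the cached page stays valid. The price of this precision is one additional database round trip per check.
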
Dynamic Web Caching – Solution Approach
[Figure: the three-tier architecture (Client, Internet, Web server, Application server, Database server; HTTP request/response, SQL req./res.) with AutoWebCache attached. Labels on AutoWebCache: Cache Check, Request info, Database access, Caching Logic, Cache inserts, invalidations, Web Page Cache.]
• Transparency
  • Capture information flow

Aspect-Oriented Programming (AOP)
• Modularize cross-cutting concerns - Aspects
  • Logging, billing, exception handling
• Works on three principles
  • Capture the execution points of interest – Pointcuts (1)
    • Method calls, exception points, read/write accesses
  • Determine what to do at these pointcuts – Advice (2)
    • Encode cross-cutting logic (before/after/around)
  • Bind Pointcuts and Advice together – Weaving (3)
    • AspectJ compiler for Java

Insertion of Caching Logic
[Figure: the original web application, the caching library, and the weaving rules are combined by aspect weaving (AspectJ) into the cache-enabled web application version.]

Aspectizing Read Requests

Original code of a read-only request handler:
    // Execute SQL queries …
    SQL query 1
    SQL query 2
    …
    // Generate a web document
    webDoc = …
    // Return the web document
    …

Caching logic woven in (figure labels: "Capture main", "Collect SQL query info"):
• Capturing request entry: Cache check
    String cachedDoc = Cache.get (uri, inputInfo);
    if (cachedDoc != null) return cachedDoc; // Cache hit
• Capturing SQL queries: Collecting dependency info
• Capturing request exit: Cache insert
    Cache.add(webDoc, uri, inputInfo, dependencyInfo); // Cache miss

Aspectizing Write Requests

Original code of a write request handler:
    // Execute SQL queries …
    SQL query 1
    SQL query 2
    …
    …
    // Return

Caching logic woven in (figure labels: "Capture main", "Collect SQL query info"):
• Capturing SQL queries: Collecting invalidation info
• Capturing request exit: Cache invalidation
    // Cache consistency
    Cache.remove(invalidationInfo);

Capturing Servlet’s main Method

    // Pointcut for Servlets’ main method
    pointcut servletMainMethodExecution(...) :
        execution(void HttpServlet+.doGet(HttpServletRequest, HttpServletResponse)) ||
        execution(void HttpServlet+.doPost(HttpServletRequest, HttpServletResponse));

• Pointcut captures entry and exit points of web request handlers
  • Cache Checks and Inserts for Read Requests
  • Invalidations for Update Requests

Weaving Rules for Cache Checks and Inserts

    // Advice for read-only requests
    around(...) : servletMainMethodExecution (...) {
        // Pre-processing: Cache check
        String cachedDoc;
        cachedDoc = ... call Cache.get of AutoWebCache
        if (cachedDoc != null) { ... return cachedDoc }
        // Normal execution of the request
        proceed(...);
        // Post-processing: Cache insert
        ... call Cache.add of AutoWebCache
    }

Weaving Rules for Cache Invalidations

    // Advice for write requests
    after(...) : servletMainMethodExecution (...) {
        // Cache invalidation
        ... call Cache.remove of AutoWebCache
    }

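The control flow of the two advices can be imitated in plain Java, without AspectJ, by passing the original handler body as a callback that plays the role of `proceed`. This is only an emulation of the flow under assumed names (`AroundAdviceSketch`, `handleRead`, `handleWrite`); it invalidates by URI for brevity, whereas the slides invalidate based on collected query information.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plain-Java emulation of the around and after advices.
class AroundAdviceSketch {
    private final Map<String, String> cache = new HashMap<>();
    int handlerRuns = 0; // how often the real read handler executed

    // Plays the role of the around advice for read-only requests.
    String handleRead(String uri, Supplier<String> proceed) {
        String cachedDoc = cache.get(uri);   // pre-processing: cache check
        if (cachedDoc != null) {
            return cachedDoc;                // cache hit, handler is skipped
        }
        String webDoc = proceed.get();       // normal execution of the request
        handlerRuns++;
        cache.put(uri, webDoc);              // post-processing: cache insert
        return webDoc;
    }

    // Plays the role of the after advice for write requests.
    void handleWrite(String uriToInvalidate, Runnable proceed) {
        proceed.run();                       // the write handler runs first
        cache.remove(uriToInvalidate);       // then the stale page is dropped
    }
}
```

The around form is needed for reads because a hit must skip the handler entirely, which before and after advice cannot do.
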
Weaving Rules for Collecting Consistency Information

    // Pointcut for SQL query calls
    pointcut sqlQueryCall( ) :
        call(ResultSet PreparedStatement.executeQuery()) ||
        call(int PreparedStatement.executeUpdate());

    // Advice for SQL query calls
    after( ) : sqlQueryCall ( ) { ... collect consistency info ... }

• After each SQL query, note
  • Query template
  • Query instance values

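What the advice might record per query can be sketched as a small per-request log. The class `RequestQueryLog` and its methods are hypothetical, and how the log is bound to the current request (for example through a thread-local) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical per-request log of the SQL queries a handler executed.
class RequestQueryLog {
    static final class QueryInfo {
        final String template;     // SQL text with '?' placeholders
        final List<Object> values; // values bound to the placeholders, in order

        QueryInfo(String template, List<Object> values) {
            this.template = template;
            this.values = Collections.unmodifiableList(values);
        }
    }

    private final List<QueryInfo> queries = new ArrayList<>();

    // What the after-advice on executeQuery/executeUpdate would call.
    void note(String template, Object... values) {
        queries.add(new QueryInfo(template, new ArrayList<>(Arrays.asList(values))));
    }

    // Read at request exit to build dependency or invalidation info.
    List<QueryInfo> collected() {
        return Collections.unmodifiableList(queries);
    }
}
```

Keeping the template separate from the instance values lets queries that differ only in their bound values share one template.
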
Transparency of AutoWebCache
• Ability to Capture Information Flow
  • Entry and exit points of request handlers
    • e.g. doGet(), doPost() APIs for Java Servlets
  • Modification to underlying data sets
    • e.g. JDBC calls for SQL requests
• Multiple sources of dynamic behavior
  • Currently handle dynamic behavior from SQL queries
  • Need standard interfaces for all sources

Hidden State Problem

    …
    Number number = getRandom ( );
    Image img = getImage (number);
    displayImage (img);
    …
[Figure label: request execution]

• Request does not contain all information for response creation
• Occurs when random numbers, timers etc. are used by the application
• Subsequent requests result in different responses
• Duty of developer to declare such requests non-cacheable

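The effect of hidden state, and the developer's opt-out, can be sketched as follows. The class `HiddenStateSketch` and the `declareNonCacheable` method are hypothetical; the slides do not show how AutoWebCache lets a request be declared non-cacheable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: a URI-keyed cache freezes a response that depends on hidden state.
class HiddenStateSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Set<String> nonCacheable = new HashSet<>();

    // The developer marks a request whose response depends on hidden state.
    void declareNonCacheable(String uri) {
        nonCacheable.add(uri);
        cache.remove(uri);
    }

    String serve(String uri, Supplier<String> handler) {
        if (nonCacheable.contains(uri)) {
            return handler.get(); // always regenerate the response
        }
        // Cached by URI only: the handler's hidden inputs are invisible here.
        return cache.computeIfAbsent(uri, k -> handler.get());
    }
}
```

Until the request is declared non-cacheable, every caller receives the first response, because nothing in the URI or the SQL queries reveals that the handler's output changes.
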
Use of Application Semantics
• Aspect-oriented programming relies on code syntax
  • Cannot capture semantic concepts
• In TPC-W application
  • Best Seller requests allow dirty reads for 30 sec
  • Conforms to specification clauses 3.1.4.1 and 6.3.3.1
• Application semantics can be used to improve performance
  • Best seller cache entry time-out set for 30 sec

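A time-out on a cache entry can be sketched with an injectable clock. The class `TimeoutCacheSketch` is hypothetical and shows only the expiry; exempting such entries from write invalidation during the time-out window is not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of cache entries that expire after a time-out.
class TimeoutCacheSketch {
    private static final class Entry {
        final String page;
        final long expiresAtMillis;

        Entry(String page, long expiresAtMillis) {
            this.page = page;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final LongSupplier clockMillis; // injected so the sketch is testable

    TimeoutCacheSketch(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    void add(String uri, String page, long ttlMillis) {
        entries.put(uri, new Entry(page, clockMillis.getAsLong() + ttlMillis));
    }

    // Returns the page while it is fresh; drops it and reports a miss once expired.
    String get(String uri) {
        Entry e = entries.get(uri);
        if (e == null) {
            return null;
        }
        if (clockMillis.getAsLong() >= e.expiresAtMillis) {
            entries.remove(uri);
            return null;
        }
        return e.page;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`, and a best-seller page would be added with a time-out of 30,000 ms.
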
Evaluation Environment
• RUBiS
  • Auction site based on eBay
  • Browsing items, bidding, leaving comments etc.
  • Large number of requests that can be satisfied quickly
• TPC-W
  • Models an on-line bookstore
  • Listing new products, best-sellers, shopping cart etc.
  • Small number of requests that are database intensive
• Client Emulator
  • Client browser emulator generates requests
  • Average think time, session time conform to TPC-W v1.8 specification
  • Cache warmed for 15 min, statistics gathered over 30 min

[Figure: Response Time for RUBiS – Bidding Mix. Response time (ms, axis 0 to 140) versus number of clients (0 to 1000). Series: No cache, AutoWebCache.]

[Figure: Relative Benefits for different Requests in RUBiS. Percent of requests (axis 0 to 25) per request type, split into Hits and Misses. Request types: Put Bid, Put Cmt, Buy Now, About Me, View Bids, View Item, View User, Search Rgn, Search Cat, Browse Cat, Browse Rgn.]

[Figure: Response Time for TPC-W – Shopping Mix. Response time (ms, axis 1 to 10000) versus number of clients (50 to 400). Series: No cache, AutoWebCache, Optimization for Semantics.]

[Figure: Relative Benefits for different Requests in TPC-W. Percent of requests (axis 0 to 25) per request type, split into Hits based on app. semantics, Hits, and Misses. Request types: best sellers, order display, order inquiry, new products, product detail, admin request, search request, execute search, home interaction.]

Conclusion
• AutoWebCache - a cache that
  • Ensures consistency of cached documents
    • Query Analysis
  • Insertion of caching logic transparent to application
    • Make use of aspect-oriented programming
• Transparency of AutoWebCache
  • Well-defined, standard interfaces for information flow
  • Presence of hidden states
  • Use of application semantics

SQL Query Structure

SELECT T.a FROM T WHERE T.b=10
• Column(s) Selected: T.a
• Table Concerned: T
• Predicate Condition: T.b=10

UPDATE T SET T.c WHERE 20 < T.d < 35
• Column(s) Updated: T.c
• Table Concerned: T
• Predicate Condition: 20 < T.d < 35

[Figure: Response Time for RUBiS – Bidding Mix. Response time (ms, axis 0 to 140) versus number of clients (0 to 1000). Series: No cache, AC column based, AC value based, AC extra query, Hand-coded.]

[Figure: Response Time for TPC-W – Shopping Mix. Response time (ms, axis 1 to 10000) versus number of clients (0 to 450). Series: No cache, AC column based, AC value based, AC extra query, Hand-coded.]

Cache Structure in AutoWebCache

Index: SQL String | <value vector, URI> pair
ReadQueryTemplate1 | <instance values1a, URI1>, <instance values1b, URI41>, <instance values1c, URI57>
ReadQueryTemplate2 | <instance values2a, URI7>
ReadQueryTemplate3 | <instance values3a, URI12>
… | …

Index: URI (readHandlerName + readHandlerArgs) | Cached web page
URI1 | WebPage1
URI2 | WebPage2
… | …

[Figure annotation: if a Write Query invalidates ReadQueryTemplate1 with instance values1a, the affected entry is marked "Remove".]

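The two indexes can be sketched as two maps. The class `TemplateIndexSketch` is hypothetical, and invalidation by exact match on the instance values is a simplification chosen for brevity; a real check would apply the query analysis to decide which value vectors are affected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-index cache structure.
class TemplateIndexSketch {
    // One <value vector, URI> pair stored under a read query template.
    static final class Dependency {
        final List<?> values;
        final String uri;

        Dependency(List<?> values, String uri) {
            this.values = values;
            this.uri = uri;
        }
    }

    private final Map<String, List<Dependency>> byTemplate = new HashMap<>();
    private final Map<String, String> pagesByUri = new HashMap<>();

    // Registers a page and one read query (template plus values) it depends on.
    void add(String uri, String page, String template, List<?> values) {
        pagesByUri.put(uri, page);
        byTemplate.computeIfAbsent(template, k -> new ArrayList<>())
                  .add(new Dependency(values, uri));
    }

    String get(String uri) {
        return pagesByUri.get(uri);
    }

    // Removes pages that depend on the template with exactly these values.
    // Returns the number of pages removed.
    int invalidate(String template, List<?> values) {
        List<Dependency> deps = byTemplate.get(template);
        if (deps == null) {
            return 0;
        }
        int removed = 0;
        for (Iterator<Dependency> it = deps.iterator(); it.hasNext();) {
            Dependency d = it.next();
            if (d.values.equals(values)) {
                it.remove();
                if (pagesByUri.remove(d.uri) != null) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Indexing by template means a write query is compared once per template and then only against the value vectors under the templates it affects, instead of against every read query of every cached page.
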
Evaluation
• Analysis of AutoWebCache
  • Effect on performance of applications
  • Relation of application semantics to cache efficiency
  • Relative benefit of caching on different read-only requests
  • Usefulness of AOP techniques in implementing the caching system

[Figure: Breakdown of Response Times for Requests in RUBiS. Response time (ms, axis 0 to 350) per request type. Series: Overall avg. response time; Extra time for a Miss (on top of overall response time). Request types: Put Bid, Put Cmt, Buy Now, View Item, About Me, View Bids, View User, Search Cat, Browse Cat, Search Rgn, Browse Rgn.]

[Figure: Breakdown of Response Times for Requests in TPC-W. Response time (ms, axis 0 to 350) per request type. Series: Overall avg. response time; Extra time for a Miss (on top of overall response time). Request types: best sellers, order inquiry, order display, product detail, new products, admin request, execute search, search request, home interaction.]

Key Aspect-Oriented Programming Concepts
• “Join points” identify executable points in system
  • Method calls, read and write accesses, invocations
• “Pointcuts” allow capturing of various join points
• “Advice” specifies actions to be performed at pointcuts
  • Before or after the execution of a pointcut
  • Encode the cross-cutting logic

Conclusion
• Dynamic Content Not easy to Cache
  • Ensure consistency, invalidate cached entries as a result of updates
    • AutoWebCache – Query Analysis
  • Caching logic inserted at different points in the application
    • Entry and exit of requests, access to underlying database
  • Most solutions rely on understanding complex application logic
    • AutoWebCache – Transparent insertion of caching logic using AOP
• Transparency affected by
  • Well-defined, standard interfaces for information flow
  • Presence of hidden states
  • Use of application semantics

Web Caching versus Query Caching
• The two are complementary
• Web caching useful when app server is bottleneck
• Documents can be cached nearer to the client, distributed
• Can make use of application semantics with web page caching (best seller for TPC-W)