
On the Scale and Performance of Cooperative Web Proxy Caching

On the Scale and Performance of Cooperative Web Proxy Caching. A. Wolman, G.M. Voelker, N. Sharma, A. Karlin, H.M. Levy, University of Washington. Why web caching? It reduces download time and Internet bandwidth usage.


Presentation Transcript


  1. On the Scale and Performance of Cooperative Web Proxy Caching A. Wolman, G.M. Voelker, N. Sharma, A. Karlin, H.M. Levy, University of Washington

  2. Web Caching (diagram labels: request for http://l4ka.org/, Hit, Miss, Internet)

  3. Why Web Caching?! • Reduce download time • Reduce Internet bandwidth usage, which saves money

  4. Cooperative Caching • Sharing and coordination of cache state among multiple communicating caches • A request miss occurs • The local cache forwards the request to other nodes • Requests server…

  5. Cooperative Caching • A proxy forwards a missed request to other proxies to determine if: • Another proxy holds the requested document • That document can be returned faster • Their distance (inter-proxy communication latency) • The client population served
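The lookup order described on slides 4 and 5 (local cache, then peer proxies, then the origin server) can be sketched as follows. This is a minimal illustration, not the protocol studied in the paper: the `Proxy` class, the `origin_server` function, and the sequential peer query are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Proxy:
    """A toy proxy cache mapping URL -> document body (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}
        self.peers: List["Proxy"] = []

    def lookup(self, url: str) -> Optional[str]:
        # Answer only from this proxy's own cache.
        return self.store.get(url)

    def fetch(self, url: str, origin: Callable[[str], str]) -> Tuple[str, str]:
        """Return (document, source); source is 'local', 'peer:<name>' or 'origin'."""
        doc = self.lookup(url)
        if doc is not None:
            return doc, "local"
        # Local miss: ask each cooperating peer before going to the server.
        for peer in self.peers:
            doc = peer.lookup(url)
            if doc is not None:
                self.store[url] = doc
                return doc, f"peer:{peer.name}"
        # Nobody has it: fetch from the origin server and cache the result.
        doc = origin(url)
        self.store[url] = doc
        return doc, "origin"


def origin_server(url: str) -> str:
    # Stand-in for a real HTTP fetch (assumption for the example).
    return f"<body of {url}>"


a, b = Proxy("a"), Proxy("b")
a.peers, b.peers = [b], [a]
first = b.fetch("http://l4ka.org/", origin_server)[1]   # nobody has it yet
second = a.fetch("http://l4ka.org/", origin_server)[1]  # found at peer b
third = a.fetch("http://l4ka.org/", origin_server)[1]   # now cached locally
```

Each peer query in a real deployment costs inter-proxy latency, which is why the slide lists distance as a factor in whether the peer copy is actually faster.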

  6. Cooperative Caching • Ideal Goal • The collection of cooperative caches achieves the hit rate of an ideal single proxy acting over the combined population of all the proxies • In Reality • Performance will be lower • Proxies will not have perfect knowledge • Proxies will pay the overhead • Inter-proxy communication latency

  7. Cooperative Caches Overall Hit Rate?

  8. Cooperative Caching • Hierarchical • Hierarchy of cache levels • Client’s request is forwarded up • Distributed • Hash function maps URL to one of caches • Peer-to-Peer • Eliminates the proxy servers • Meta-information stored in clients or the server
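For the distributed scheme, "hash function maps URL to one of caches" might look like the sketch below. The choice of SHA-256 and plain modulo placement is an assumption for illustration; the slide does not say which hash or placement rule is meant, and the cache names are hypothetical.

```python
import hashlib
from typing import List


def cache_for_url(url: str, caches: List[str]) -> str:
    """Pick the cache responsible for a URL by hashing the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]


caches = ["cache-0", "cache-1", "cache-2"]  # hypothetical cache names
owner = cache_for_url("http://l4ka.org/", caches)
```

Every client that applies the same function sends a given URL to the same cache, so each document is stored once. With plain modulo placement, changing the number of caches remaps most URLs; consistent hashing is one common way to limit that.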

  9. Outline… • For what range of client populations can cooperative caching work effectively?

  10. Cache Traces • Microsoft • University of Washington • Traces cover the same period of time

  11. Cache Traces • The University of Washington • 50,000 students, faculty and staff… • 200 small, independent organizations • Microsoft Corporation • 40,000 employees • Collected simultaneously May 7…14, 1999

  12. Overall Trace Statistics

  13. Simulation Methodology • Infinite sized caches • No expiration for objects • No compulsory misses (cold start) • Ideal vs. Practical Cache (cacheability)
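A trace-driven simulation under these assumptions (infinite cache, no expiration) can be sketched in a few lines. The trace format, a list of `(url, cacheable)` pairs, is an assumption for the example. The slide does not spell out how cold-start misses were discounted, so this sketch simply counts every first reference as a miss.

```python
from typing import Iterable, Tuple

Request = Tuple[str, bool]  # (url, cacheable) -- hypothetical trace record


def hit_rate(trace: Iterable[Request], ideal: bool) -> float:
    """Request hit rate of an infinite cache with no expiration.

    ideal=True  : every object is treated as cacheable.
    ideal=False : only objects flagged cacheable are stored.
    """
    seen = set()
    hits = total = 0
    for url, cacheable in trace:
        total += 1
        if url in seen:
            hits += 1
        elif ideal or cacheable:
            seen.add(url)  # infinite cache: nothing is ever evicted
    return hits / total if total else 0.0


trace = [("/a", True), ("/b", False), ("/a", True),
         ("/b", False), ("/c", True), ("/a", True)]
ideal_rate = hit_rate(trace, ideal=True)       # 3 hits out of 6 requests
practical_rate = hit_rate(trace, ideal=False)  # 2 hits out of 6 requests
```

The gap between the two rates is the cost of uncacheable objects, which is the "Ideal vs. Practical" distinction on the slide.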

  14. Request Hit-Rate / # Clients • Growing a cache's population beyond about 2,500 clients does not increase the hit rate significantly!

  15. Byte Hit-Rate / # Clients (UW) Shared Objects are smaller on average than others!

  16. Object Request Latency More clients do not reduce object latency significantly.

  17. Bandwidth / # Clients There is no relation between number of clients and bandwidth utilization!

  18. Locality: Proxies and Organizations • University of Washington • Museum of Art and Natural History • Music Department • Schools of Nursing and Dentistry • Scandinavian Languages • Computer Science comparable to cooperating businesses

  19. Local and Global Proxy Hit rates

  20. Randomly populated vs. UW organizations • Locality is minimal (about 4%)

  21. Large-scale Experiment • Microsoft: 60K clients • University of Washington: 23K clients

  22. Cooperative Caching: Microsoft + UW • Disappointing… Why?!?

  23. Very small increase… Why!?? • Unpopular documents are universally unpopular! • Unlikely that a miss in one of these large populations will find the document in the other population’s proxy!

  24. Even more disappointing…! • For the most popular documents, cooperation does not help either…

  25. Summary and Conclusions • Cooperative caching is effective only for small populations (< 2500 clients)… • Such populations can be handled by a single server… • Locality is not significant…

  26. Caching Technologies for Web Applications • Based on the presentation by C. Mohan from IBM Corp. • Some pictures are taken from the reference presentation

  27. Outline • Introduction: Caching • Main principles • Different caching approaches • Focus: database caching in e-Commerce • Three case studies • Open issues and discussions

  28. Introduction • Motivation of caching in web applications: • Improving performance in many contexts • Processor caches in hardware • Buffer pools in DBMSs • Web proxy servers • Focus of the presentation: • e-commerce transactional/Database applications, not Internet search, etc.

  29. Main Features of e-Business Applications • Large number of users • Dominant loads can grow without natural limit • Users are customers (and must be kept satisfied) • Multiple channels of access available for many applications • Scalability • Using caching to make it cheaper and faster • 24x365 availability • Manageability and security are critical considerations

  30. Caching principles • What to cache? • Web pages, fragments of pages, SQL query results, data, execution results, … • Where to place the cache? • Client, proxy, edge-of-net, ISP, edge-of-enterprise, application server, web server, DBMS • Caching and invalidation policies: • Push and pull, freshness maintenance, transparency, … • Other requirements to enable caching: • Routing, failover, authentication, …

  31. HTTP Caching • Multiple caches between browser and server • HTTP headers control • Whether or not to cache a page • TTL for caching • Full pages and images can be cached • Unable to cache HTML fragments
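As an illustration of "HTTP headers control whether or not to cache a page" and the TTL, the sketch below reads the standard `Cache-Control` response header the way a shared (proxy) cache might. It is deliberately simplified and is an assumption about what the slide has in mind: it ignores `Expires`, `no-cache` revalidation, and heuristic freshness, and it looks the header up by its exact name. The function name `shared_cache_ttl` is hypothetical.

```python
from typing import Dict, Optional


def shared_cache_ttl(headers: Dict[str, str]) -> Optional[int]:
    """Return a TTL in seconds if a shared cache may store the response, else None."""
    directives: Dict[str, str] = {}
    for part in headers.get("Cache-Control", "").split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')
    # no-store forbids caching; private forbids storage in shared caches.
    if "no-store" in directives or "private" in directives:
        return None
    # s-maxage applies to shared caches and takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return None  # no explicit lifetime given: this sketch declines to cache
```

For example, `shared_cache_ttl({"Cache-Control": "public, max-age=3600"})` yields a one-hour lifetime, while a response marked `private` is not stored by the proxy at all.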

  32. Caching HTML fragments • When part of a page is too volatile to cache, the rest of the page can still be cached

  33. Goals of fragment caching • Achieve benefits of cache for personalized pages • Improved price/performance • Improved response time latency • Reducing cache storage requirements • By sharing common fragments among multiple pages
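The idea on slides 32 and 33 can be sketched as a page assembled from fragments, where shared fragments are cached and the volatile, personalized one is rendered on every request. The fragment keys, the `get_fragment` helper, and the page layout are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict

fragment_cache: Dict[str, str] = {}
render_counts: Dict[str, int] = {}  # how often each fragment was rendered


def get_fragment(key: str, render: Callable[[], str], cacheable: bool = True) -> str:
    """Return a page fragment, rendering it only when it is not cached."""
    if cacheable and key in fragment_cache:
        return fragment_cache[key]
    render_counts[key] = render_counts.get(key, 0) + 1
    html = render()
    if cacheable:
        fragment_cache[key] = html
    return html


def product_page(user: str) -> str:
    parts = [
        get_fragment("header", lambda: "<header>Shop</header>"),
        get_fragment("catalog", lambda: "<ul><li>Item</li></ul>"),
        # Too volatile/personal to cache: rendered for every request.
        get_fragment("greeting", lambda: f"<p>Hello, {user}</p>", cacheable=False),
    ]
    return "".join(parts)


page_alice = product_page("alice")
page_bob = product_page("bob")
```

After two page views the header and catalog have each been rendered once and stored once, while only the greeting was rendered twice. Storing one copy of each common fragment instead of one copy per page is the storage saving the slide refers to.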

  34. Database caching • Corporate data is the backbone of many eCommerce applications • Existing approaches • Application-aware caching model • Suits app-specific data • Replication • Not dynamic enough to adapt to changing access patterns • Alternative: caching database data • Scalability • Reduced response time • Reduced cost of ownership • Improved throughput, congestion control, availability, QoS

  35. Classical Web Setup 1

  36. Classical Web Setup 2

  37. Web setup with Data Caching

  38. Edge server cache added

  39. Types of caching

  40. Middle-tier cache requirements • Application's SQL shouldn’t have to change • Application’s DB schema shouldn’t have to change • Support failover of nodes • Support dynamic addition/deletion of app server nodes • Limits on update propagation latencies

  41. Middle-tier data model choices • Cloned: each table identical to a backend table • Pros: • DDL (Data Definition Language) definition is easy • Every query can be satisfied anywhere • Cons: • Updates need to reach multiple nodes • Lots of data copying during node addition • Subset: each table is a proper subset of a backend table • Pros: • Updates can be handled at minimal number of sites • Smaller DBs in cache nodes, increasing the performance • Cons: • More complex routing • Complexity in DDL spec and query processing • More complex updating

  42. Cache Refresh • Refresh brings new content or invalidates existing cache • Requirement: mutual consistency of related data, to guarantee transaction • Automatic • Time-driven • Immediate • Synchronous • Asynchronous • On-demand
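A time-driven refresh can be sketched as a cache that reloads its whole snapshot once a fixed interval has elapsed. The class name, the `load_all` callback, and the explicit `tick(now)` call (used instead of a real timer so the example stays deterministic) are assumptions for illustration. The 1800-second interval mirrors the "every 30 minutes or so" figure given in the eBay case study on slide 49.

```python
from typing import Callable, Dict, Optional


class TimeDrivenCache:
    """Reloads all cached data from the backend at a fixed interval (sketch)."""

    def __init__(self, load_all: Callable[[], Dict[str, str]], interval_seconds: float) -> None:
        self.load_all = load_all
        self.interval = interval_seconds
        self.snapshot: Dict[str, str] = {}
        self.last_refresh: Optional[float] = None

    def tick(self, now: float) -> bool:
        """Refresh if the interval has elapsed; return True when a refresh ran."""
        if self.last_refresh is None or now - self.last_refresh >= self.interval:
            self.snapshot = dict(self.load_all())  # copy: isolate from backend
            self.last_refresh = now
            return True
        return False

    def get(self, key: str) -> Optional[str]:
        return self.snapshot.get(key)


backend = {"item": "v1"}
cache = TimeDrivenCache(lambda: backend, interval_seconds=1800)
cache.tick(0)                       # initial load
backend["item"] = "v2"              # backend changes after the load
stale = cache.get("item")           # still the old value
refreshed_early = cache.tick(900)   # only 15 minutes elapsed: no refresh
refreshed_late = cache.tick(1800)   # 30 minutes elapsed: refresh runs
fresh = cache.get("item")
```

Reloading the whole snapshot at once is one simple way to keep related rows mutually consistent, at the price of serving data that may be up to one interval old.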

  43. Update Approaches • Push • Reduced response time for first hit • Overwrite: less total path length than invalidation + pull • Pull • No need to cache everything • Cache upon access • Hottest data stay longer in cache • Personalized data cached only where needed
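The difference between the two approaches can be sketched with two toy caches subscribed to the same backend: on a write, the pull cache only invalidates its entry and reloads it on the next access, while the push cache overwrites its entry immediately. All class and method names here are hypothetical.

```python
from typing import Dict, List


class Backend:
    """Toy backend store that notifies subscribed caches on every write."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.subscribers: List["PullCache"] = []

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        for cache in self.subscribers:
            cache.on_update(key, value)


class PullCache:
    """Caches upon access; an update only invalidates the entry."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self.entries: Dict[str, str] = {}
        backend.subscribers.append(self)

    def on_update(self, key: str, value: str) -> None:
        self.entries.pop(key, None)  # invalidate; reload on next read

    def read(self, key: str) -> str:
        if key not in self.entries:
            self.entries[key] = self.backend.data[key]  # pull on a miss
        return self.entries[key]


class PushCache(PullCache):
    """An update overwrites the entry, so the first read is already a hit."""

    def on_update(self, key: str, value: str) -> None:
        self.entries[key] = value


backend = Backend()
pull, push = PullCache(backend), PushCache(backend)
backend.write("price", "10")
pushed_before_read = "price" in push.entries  # True: data was pushed
pulled_before_read = "price" in pull.entries  # False: nothing accessed yet
pull_value = pull.read("price")               # pull fetches on first access
backend.write("price", "12")
```

The push cache holds every written key whether or not anyone reads it, while the pull cache holds only what has been accessed, which matches the trade-off listed on the slide.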

  44. IBM Almaden’s DBCache Project

  45. DBCache Components • Seamless access to front-end data and back-end data • Front-end data a subset of back-end data • Queries can be partitioned with some going to front-end DB and rest going to backend DB • Data freshness guarantee • Challenges • Query optimization • Failure handling • Tracking changing workload

  46. NEC’s CachePortal • URL-query caching

  47. Case Study: IBM WebSphere Commerce Suite (WCS) Cache • Usually for caching catalog data • Admin influences what gets cached

  48. Case study: Olympics

  49. Case study: eBay • As of Oct 2000 • 18.9 million registered users • Millions of items in 4,500 categories • 600k new items added daily by users • 4M page views per month • Average usage per month by a user is 119.6 minutes • Caching system: • Backend: Sun machines running Oracle • Front-ends: Intel boxes running some specialized DBMS • Data cached in FEs refreshed every 30 minutes or so
