Meta algorithms for Hierarchical Web Caches

Nikolaos Laoutaris, Sofia Syntila, Ioannis Stavrakakis • {laoutaris,grad0585,ioannis}@di.uoa.gr • Department of Informatics and Telecommunications, University of Athens, 15784 Athens, Greece

Introduction • The rapid growth of the Internet and the WWW has increased: • the network traffic • the user-perceived latency • the load on web servers • Caching has been employed in order to: • reduce access latency • reduce bandwidth consumption • balance server load • improve data availability

Contemporary hierarchical caches • Characteristic of contemporary hierarchical caches: • Leave Copy Everywhere (LCE): a hit for a document at a level-l cache leads to the caching of the document in all intermediate caches on the path towards the leaf cache that received the initial request. (Figure: request path from the client up to the hit, with a copy left at each cache that missed.)

New approach • We introduce three new Meta Algorithms that revise the standard behavior of hierarchical caches by: • operating before and independently of the actual replacement algorithm running in each individual cache (hence the “Meta”) • keeping copies in a subset of the intermediate caches instead of all of them • We compare these algorithms against: • the de facto one (LCE) • the one proposed by Che, Tung and Wang (JSAC, Sep. 2002) • Additionally, we introduce a simple load balancing algorithm based on the concept of meta algorithms

Advantages of the new algorithms • Significant reduction of the average hit distance (delay/traffic reduction gain) over LCE in most cases • Suitable for storage-constrained applications • Low complexity • Memoryless • Do not require additional information (e.g., object request frequencies) • Little or no change to the protocols used to implement the existing hierarchical caches

The Prob algorithm • Each intermediate cache keeps a copy with probability p, and does not keep a copy with probability 1-p. (Figure: request path from the client up to the hit; each cache that missed keeps a copy with probability p.)
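The Prob decision can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `prob_copy_levels` and the level numbering (leaf = level 1) are assumptions made here, not part of the slides. With p = 1 every cache below the hit keeps a copy, which reproduces the LCE behavior.

```python
import random


def prob_copy_levels(hit_level, p, rng=random):
    """Return the levels (leaf = 1, an assumed numbering) that keep a copy.

    Every cache strictly below the level of the hit decides independently,
    keeping a copy with probability p.
    """
    return [level for level in range(hit_level - 1, 0, -1) if rng.random() < p]
```

For example, `prob_copy_levels(3, 1.0)` returns `[2, 1]`, while `prob_copy_levels(3, 0.0)` returns an empty list.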

The LCD algorithm • Leave a copy only at the cache that resides immediately below the location of the hit on the path to the requesting client. • Requires multiple requests to bring a document to a leaf cache. (Figure: request path from the client up to the hit; only the cache immediately below the hit receives a copy.)

The MCD algorithm • Similar to LCD, with the difference that a hit at level l moves the requested document to the underlying cache (whereas LCD copies the document). • Deletes the requested document from the cache where the hit occurred† • †The document does not have to be physically deleted but can rather be marked for eviction. (Figure: request path from the client up to the hit; the document is deleted at the hit location and copied to the cache immediately below.)
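The difference between LCD and MCD can be illustrated with a small Python sketch over a single client-to-root path of LRU caches. The class `LRUCache`, the function `request` and the list layout (`path[0]` is the leaf, `path[-1]` the root) are hypothetical names chosen for this sketch. For simplicity MCD deletes the document outright, whereas the slide notes it could instead be marked for eviction.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache used only for this illustration."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def lookup(self, doc):
        if doc in self.items:
            self.items.move_to_end(doc)  # refresh recency on a hit
            return True
        return False

    def insert(self, doc):
        self.items[doc] = True
        self.items.move_to_end(doc)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

    def remove(self, doc):
        self.items.pop(doc, None)


def request(path, doc, policy="LCD"):
    """Serve doc along path (path[0] = leaf, path[-1] = root).

    Returns the index of the hit; len(path) stands for the origin server.
    """
    hit = len(path)
    for i, cache in enumerate(path):
        if cache.lookup(doc):
            hit = i
            break
    if hit > 0:
        path[hit - 1].insert(doc)  # copy only one level below the hit
        if policy == "MCD" and hit < len(path):
            path[hit].remove(doc)  # MCD moves the document instead of copying
    return hit
```

With a three-level path, three consecutive requests for the same new document are needed before it reaches the leaf cache, which matches the remark that LCD requires multiple requests to bring a document to a leaf.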

The Filter algorithm (Che et al.) • Each cache is seen as a low-pass filter with a cutoff frequency given by the inverse of its characteristic time • The characteristic time of cache m is approximated by: (current time - last access time of the replaced document) • Rarely requested objects are not cached, so their requests pass the filter and flow to upper levels • A hit for document i at level l on behalf of client k leads to the caching of i in an intermediate cache m on the path to k when m satisfies the filter condition, i.e. when λki is high enough relative to the cutoff frequency of m • λki is the frequency with which client k requests document i • Filter is not memoryless (it requires frequency estimation)
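A rough Python sketch of the filtering idea follows. The exact admission condition is not reproduced in this transcript, so the comparison used here (request frequency at least equal to the cutoff frequency, the inverse of the characteristic time) is an assumption, as are the function names.

```python
def characteristic_time(now, last_access_of_replaced):
    """Approximate the characteristic time of a cache at replacement time."""
    return now - last_access_of_replaced


def filter_keeps_copy(request_rate, char_time):
    """Assumed admission test: keep a copy only if the request rate of the
    document reaches the cutoff frequency 1 / char_time of the cache."""
    return char_time > 0 and request_rate >= 1.0 / char_time
```

Under this assumption a cache with a characteristic time of 40 time units has a cutoff frequency of 0.025 requests per time unit, so a document requested 0.05 times per unit would be cached there and one requested 0.01 times per unit would not.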

The Filter algorithm (cont.) • When a document is evicted from a cache at level l, the algorithm forces its caching at level l+1 (upwards) if it is not already cached there (this may lead to a domino effect) • * Assume that caches (1,1), (2,1), (3,1) are full. (Figure: request path from the client up to the hit, with caches labelled “leave copy” and “don’t leave copy”.)

Design Principles • Prob, LCD and MCD take advantage of the following three design principles: • Avoid the amplification of replacement errors • Filter out one-timer documents • Rationalize the degree of replication

1. Avoid the amplification of replacement errors • Replacement error: when document i is evicted while there exists a document j that, if evicted instead, would lead to an improved hit ratio. • LCE: in an L-level hierarchical cache, a request for an unpopular document leads to its caching in all L caches → L replacement errors → amplification of replacement errors • Prob, LCD and MCD reduce the extent of the amplification by reducing the number of copies triggered by a single request

2. Filter out one-timer documents • Measured proxy workloads contain a high percentage of so-called one-timer documents • One-timers: documents that are requested only once • Caching a one-timer document leads to the worst type of replacement error that can occur • LCE: deprives popular documents of valuable storage capacity by allowing one-timers to clog all caches • LCD, MCD: one-timers cannot affect any cache other than the root cache • Prob: filters out one-timers by using a small p (caching probability)

3. Rationalize the degree of replication • LCE places copies in all intermediate caches to achieve two goals: • Have a nearby copy to service other clients connected to leaf caches • Have a “backup” copy for the requesting client in case its leaf copy is evicted • Storing a large number of replicas is not always beneficial: • when the demand pattern is non-homogeneous • when storage capacity is limited • Prob, LCD and MCD create fewer copies, allowing more distinct documents to be cached • This improves the exclusivity† of caches (Wong, Wilkes, Usenix 2002) • Exclusivity relates to the ability to avoid the ineffective caching of the same documents at multiple levels • †We would like to thank an anonymous IPCCC reviewer for bringing Wong and Wilkes’s work to our attention

Synthetic Simulations • Zipf-like document popularity distribution (a=0.9) • Simulated hierarchical cache: regular Q-ary tree with L levels (Q=2,L=3) • Documents originate from an origin server (L+1 level) • Each client is co-located with a leaf cache • A client represents the population of an organization • Replacement policy at each cache: LRU • Storage capacity equally allocated to the caches • Further improvements if the dimensioning of the caches is optimized (Laoutaris et al., Information Processing Letters, March 2004)
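The synthetic workload described above can be sketched in Python as follows. This is a minimal illustration, not the simulator used for the results: the function names, the seed handling and the choice of `random.choices` for sampling are assumptions. A Zipf-like distribution assigns the document of rank r a probability proportional to 1/r^a.

```python
import random


def zipf_like_weights(n_docs, a=0.9):
    """Normalized Zipf-like popularities: rank r gets weight 1 / r**a."""
    raw = [1.0 / (rank ** a) for rank in range(1, n_docs + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_requests(n_docs, n_requests, a=0.9, seed=0):
    """Draw document ranks (1 = most popular) from the Zipf-like law."""
    rng = random.Random(seed)
    weights = zipf_like_weights(n_docs, a)
    return rng.choices(range(1, n_docs + 1), weights=weights, k=n_requests)


def caches_in_tree(q, levels):
    """Number of caches in a regular q-ary tree with the given cache levels."""
    return sum(q ** i for i in range(levels))
```

For the configuration on the slide (Q=2, L=3), `caches_in_tree(2, 3)` gives 7 caches: one root, two intermediate caches and four leaves.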

Average hit distance for Prob • [+] A small p filters out one-timers more effectively • [-] Cost paid: slower convergence to steady state

Average hit distance for LCE, Prob, LCD, MCD …

… Average hit distance for LCE, Prob, LCD, MCD • The following may be noted: • LCE has the worst performance • Prob(0.2) ranks second worst for all S • The performance of MCD and LCD is always better than that of LCE and Prob • Filter, although not memoryless, is outperformed by LCD and closely matched by MCD

Non-stationary demand … • Non-stationary document sets are common in the web • Simulation scenario: every W requests, M documents out of the total N that can be requested are replaced by M new ones • Models volatility in user access patterns
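The replacement scenario can be sketched as below. The slide does not say which M documents are replaced or how popularity is assigned to the new ones, so this sketch assumes victims are picked uniformly at random and that requests are drawn uniformly from the active set; both choices, like the function names, are illustrative assumptions.

```python
import random


def replace_documents(active, next_id, m, rng):
    """Replace m randomly chosen active documents by m brand-new identifiers."""
    for idx in rng.sample(range(len(active)), m):
        active[idx] = next_id
        next_id += 1
    return next_id


def nonstationary_requests(n, w, m, total, seed=0):
    """Generate `total` requests; every w requests, m of the n documents change."""
    rng = random.Random(seed)
    active = list(range(n))  # identifiers of the currently requestable documents
    next_id = n
    requests = []
    for t in range(total):
        if t > 0 and t % w == 0:
            next_id = replace_documents(active, next_id, m, rng)
        requests.append(rng.choice(active))  # assumed uniform popularity
    return requests, active
```

The active set always holds N distinct documents; larger M (or smaller W) means a more volatile workload.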

… Non-stationary demand • Hit distance increases with the volatility (captured here by M) • LCE: for small M it is the worst performer; for large M it outperforms all other algorithms. Why? LCE is able to track the new demand more quickly, since it requires a single request to bring a new document to the leaf level • Prob, LCD, MCD and Filter require multiple requests to bring a copy of a new document to the leaf cache • However, the volatility required to make LCE better than the new algorithms is too high and is not typical of measured workloads, which appear quite stable (Chen et al., JSAC, Aug. 2003)

Trace-driven Simulations • Description of traces: • Traces were filtered to keep only requests for cacheable documents • Two types of caches were studied: • Leaf caches (duration: one week): UoA, NTUA • Root caches of the NLANR hierarchy (duration: one day): Boulder, Colorado; Palo Alto, California; Pittsburgh, Pennsylvania; Urbana-Champaign; San Diego, California; Silicon Valley, California

Urbana-Champaign: requests: 815194, docs: 279375, 1-timers: 72%

Silicon Valley, California: requests: 1299024, docs: 726075, 1-timers: 82%

Boulder, Colorado: requests: 698691, docs: 365060, 1-timers: 81%

Pittsburgh, Pennsylvania: requests: 709180, docs: 405680, 1-timers: 84%

San Diego, California: requests: 193769, docs: 94457, 1-timers: 83%

Palo Alto: requests: 273511, docs: 137497, 1-timers: 76%

UoA: requests: 282540, docs: 41088, 1-timers: 71%

NTUA: requests: 580460, docs: 234432, 1-timers: 73%
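The three figures reported for each trace can be computed from a request log along the following lines. The function name and the input format (a plain list of document identifiers) are assumptions. The one-timer share is taken here as a fraction of distinct documents, which is the only reading consistent with the numbers above: 72% of the 815194 Urbana-Champaign requests, for instance, would exceed its 279375 documents.

```python
from collections import Counter


def trace_summary(requests):
    """Return request count, distinct documents and one-timer share (% of docs)."""
    counts = Counter(requests)
    docs = len(counts)
    one_timers = sum(1 for c in counts.values() if c == 1)
    pct = 100.0 * one_timers / docs if docs else 0.0
    return {"requests": len(requests), "docs": docs, "one_timer_pct": pct}
```

For the toy trace `["a", "b", "a", "c", "d"]` this yields 5 requests, 4 documents and a one-timer share of 75%.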

Results • Filter is inferior to the best-performing algorithm, LCD, across all traces • Filter is more complicated than LCD • Average hit distance (AHD): AHD_Prob > AHD_MCD > AHD_LCD • LCE compared to LCD: • is inferior under all six NLANR traces • is almost as good under the UoA trace • is slightly better under the NTUA trace • LCE performs better when S/N is large (S: storage, N: # of docs)

Load Balancing… • LCE gives rise to the “filtering effect” (Williamson, ACM ToIT, Feb. 2002) • The “filtering effect”: popular documents gather at the leaf caches. It leads to: • Poor hit ratios at upper levels • The servicing of most of the requests at the lower-level caches (causing load imbalance) • Solution? A simple load balancing mechanism: • Threshold based / fully distributed • Each cache • calculates its load • accepts new copies of documents only when its load is below the threshold • Some popular documents are denied admission to the leaf level and thus reside only at upper levels → this allows load to flow upwards

…Load Balancing • Load: we count as load only the requests that lead to hits (we neglect the relatively smaller load due to misses) • Nearby hit → small propagation delay. But this does not always lead to a small total delivery delay → When? → When the low-level cache is overloaded (processing then takes too long) • With the proposed load balancing mechanism we accept an increase in propagation delay in exchange for a lower end-system processing delay
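The threshold rule described on these two slides can be sketched as a small Python class. The class name, the cumulative hit counter used as "load" and the absence of any storage limit are simplifying assumptions; the slides state only that load counts hits and that new copies are accepted while the load is below the threshold.

```python
class LoadAwareCache:
    """Cache that refuses new copies once its hit load reaches a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold  # load threshold, supplied by the caller
        self.load = 0               # assumed: cumulative number of hits served
        self.docs = set()           # storage limits are ignored in this sketch

    def serve(self, doc):
        """Return True on a hit; only hits count as load."""
        if doc in self.docs:
            self.load += 1
            return True
        return False

    def offer_copy(self, doc):
        """Accept a new copy only while the load is below the threshold."""
        if self.load < self.threshold:
            self.docs.add(doc)
            return True
        return False
```

Once a leaf cache has served enough hits, further popular documents are turned away and therefore stay at upper levels, which is how load is pushed upwards.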

Simulations… • Load balancing may be applied to all discussed meta algorithms • Our experiments evaluate the effectiveness of the LB mechanism: • using trace data • under the LCE algorithm • LCE-LB: a variation of LCE that keeps copies at all intermediate caches, provided that a cache has not reached its load threshold TH • n: # of caches • k: controls the intensity of the desired load balancing

LCE without LB

(2) LCE with LB (k=1) • No change relative to the no-LB case (previous slide) • LB becomes effective after k=2

(3) LCE with LB (k=8) • The effect of LB becomes clear for k=8 • With k=16 all levels get almost the same amount of load (see the paper for more results under several values of k)

Summary of LB-related results • The previous figures show that: • As k increases, load tends to be more evenly distributed among levels • The distribution of load under LCE-LB with k=1 is almost identical to the one under LCE. Why? • The load constraint under k=1 is too loose • The load constraint is almost equal to the maximum load that is assigned under LCE • Load balancing becomes effective for k>2 • Almost perfect LB for high values of k

The cost paid for having LB • The average hit distance (propagation delay) increases with the intensity of LB (that is, with k)

Conclusions • We introduced three new Meta Algorithms • We compared these algorithms against • the de facto one • the one proposed by Che, Tung and Wang • We showed that these algorithms are useful in a variety of situations • LCD (the best one) seems to be performing well under all studied scenarios • We introduced a simple load balancing algorithm, based on the concept of meta algorithms that deals effectively with the “filtering effect”

Post IPCCC work† • We have derived an approximate mathematical model for predicting the performance of LCE analytically • The model predicts accurately the actual performance and gives further insights as to why LCD outperforms LCE • We have shown that LCE performs better than the DEMOTE algorithm of Wong and Wilkes (not discussed in this paper) †Nikolaos Laoutaris, Hao Che, Ioannis Stavrakakis, "The LCD interconnection of LRU caches and its analysis," submitted work, 2004.
