
Web Cache Replacements


Presentation Transcript


  1. Web Cache Replacements 張燕光 (Yeim-Kuan Chang), Dept. of Computer Science and Information Engineering, National Cheng Kung University ykchang@mail.ncku.edu.tw

  2. Introduction • Which page should be removed from the cache? • Goal: find a replacement algorithm that yields a high hit rate. • Differences from traditional caching • nonhomogeneity of object sizes • with equal access frequency but different sizes, optimizing hit rate alone favors smaller objects • hence the byte hit rate metric 2

  3. Introduction • Other considerations • transfer time cost • expiration time • frequency • Which measurement metrics? • Admission control? • When or how often to perform replacement? • How many documents to remove? 3

  4. Measurement Metrics • Hit Rate (HR): • % requests satisfied by cache • (shows fraction of requests not sent to server) • Volume measures: • Weighted hit rate (WHR): Byte Hit Ratio • % client-requested bytes returned by proxy (shows fraction of bytes not sent by server) • Fraction of packets not sent • Reduction in distance traveled (e.g., hop count) • Latency Time 4
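The two headline metrics on this slide can be computed directly from a request trace. A minimal sketch (the trace format and function name are ours, not from the slides):

```python
def hit_metrics(requests):
    """Compute hit rate (HR) and weighted/byte hit rate (WHR) for a trace.

    `requests` is a list of (url, size_bytes, hit) tuples, where `hit`
    indicates whether the cache satisfied the request.
    """
    hits = sum(1 for _, _, h in requests if h)
    hit_bytes = sum(s for _, s, h in requests if h)
    total_bytes = sum(s for _, s, _ in requests)
    return hits / len(requests), hit_bytes / total_bytes

# Two hits on small objects: HR looks good, WHR does not.
trace = [("a", 1000, True), ("b", 50000, False),
         ("c", 1000, True), ("d", 2000, False)]
hr, whr = hit_metrics(trace)
```

This illustrates the slide's point: a cache full of small objects can score a high HR while serving only a tiny fraction of the requested bytes.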

  5. Three Categories • Traditional replacement policies and their direct extensions: • LRU, LFU, … • Key-based replacement policies • Cost-based replacement policies 5

  6. Traditional replacement • Least Recently Used (LRU) evicts the object that was requested least recently • prunes off as many of the least recently used objects as necessary to make room for the newly accessed object. • This may involve zero, one, or many replacements. 6
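The "zero, one, or many replacements" behavior is the key difference from fixed-size-block LRU. A minimal sketch for variable-size objects (class and method names are ours):

```python
from collections import OrderedDict

class LRUCache:
    """Illustrative LRU for variable-size Web objects: inserting a new
    object prunes zero, one, or many least recently used objects until
    it fits within `capacity` bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.store = OrderedDict()   # url -> size, least recently used first

    def access(self, url, size):
        if url in self.store:
            self.store.move_to_end(url)              # hit: most recently used
            return True
        while self.store and self.used + size > self.capacity:
            _, old_size = self.store.popitem(last=False)   # evict LRU object
            self.used -= old_size
        if size <= self.capacity:                    # admit only if it fits
            self.store[url] = size
            self.used += size
        return False

cache = LRUCache(100)
cache.access("a", 60)
cache.access("b", 30)
cache.access("c", 50)    # needs room: evicts "a", the LRU object
```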

  7. Traditional replacement • Least Frequently Used (LFU) evicts the object that is accessed least frequently. • Pitkow/Recker [78] evicts objects in LRU order, except when all objects were accessed within the same day, in which case the largest one is removed. 7
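LFU's victim selection is just a minimum over access counts. A sketch with illustrative names (the counter structure is ours):

```python
from collections import Counter

freq = Counter()   # access count per cached object

def lfu_victim(cached_urls):
    """LFU sketch: pick the cached object with the fewest accesses."""
    return min(cached_urls, key=lambda u: freq[u])

freq.update(["a", "a", "b", "c", "c", "c"])
victim = lfu_victim({"a", "b", "c"})   # "b" has only one access
```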

  8. Key-based Replacement • The idea in key-based policies is to sort objects based upon a primary key, break ties based on a secondary key, break remaining ties based on a tertiary key, and so on. 8

  9. Key-based Replacement • LRUMIN: • This policy is biased in favor of smaller sized objects so as to minimize the number of objects replaced. • Let the size of the incoming object be S. Suppose that this object will not fit in the cache. • If there are any objects in the cache which have size at least S, we remove the least recently used such object from the cache. • If there are no objects with size at least S, then we start removing objects in LRU order of size at least S/2, then objects of size at least S/4, and so on until enough free cache space has been created. 9
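The threshold-halving described above can be sketched as follows (the cache representation and function name are ours; ties and partial frees are handled naively):

```python
def lrumin_evict(cache, need):
    """LRUMIN sketch. `cache` maps url -> (size, last_access_time).
    Frees `need` bytes by evicting, in LRU order, objects of size
    >= need; failing that, size >= need/2, then >= need/4, and so on."""
    victims, freed, threshold = [], 0, need
    while freed < need and threshold >= 1:
        candidates = sorted(
            (u for u, (size, _) in cache.items()
             if size >= threshold and u not in victims),
            key=lambda u: cache[u][1])        # least recently used first
        for u in candidates:
            if freed >= need:
                break
            victims.append(u)
            freed += cache[u][0]
        threshold /= 2                        # relax the size threshold
    return victims

cache = {"a": (10, 1), "b": (40, 2), "c": (15, 3)}
big = lrumin_evict(cache, 30)                        # only "b" is >= 30 bytes
small = lrumin_evict({"a": (10, 1), "c": (15, 3)}, 20)  # falls back to >= 10
```

Evicting one large object instead of many small ones is exactly the bias the slide describes.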

  10. Key-based Replacement • SIZE policy: • Objects are removed in order of size, largest first. • Ties on size are somewhat rare; when they occur, they are broken by time since last access: the object accessed longest ago is removed first. 10
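The SIZE policy's primary/secondary key ordering is a single sort. A sketch with illustrative data (url -> (size, last_access) is our representation):

```python
cache = {
    "a": (500, 100),
    "b": (900, 120),
    "c": (900, 90),    # same size as "b" but accessed longer ago
}
# Primary key: size, largest first. Secondary key: older last-access first.
eviction_order = sorted(cache, key=lambda u: (-cache[u][0], cache[u][1]))
```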

  11. Key-based Replacement • LRU-Threshold [2] is the same as LRU, but objects larger than a certain threshold size are never cached. • Hyper-G [78] is a refinement of LFU that breaks ties using recency of last use and size. • Lowest Latency First [77] minimizes average latency by evicting the document with the lowest download latency first. 11

  12. Cost-based Replacement • Employ a potential cost function derived from factors such as • time since last access, • entry time of the object into the cache, • transfer time cost, • object expiration time, and so on. • GreedyDual-Size (GD-Size) associates a cost with each object and evicts the object with the lowest cost/size ratio. • Hybrid [77] associates a utility function with each object and evicts the one with the least utility, to reduce total latency. 12

  13. Cost-based Replacement • Lowest Relative Value (LRV) evicts the object with the lowest utility value. • Least Normalized Cost Replacement (LCN-R) [70] employs a rational function of the access frequency, the transfer time cost, and the size. • Bolot/Hoschka [10] employs a weighted rational function of the transfer time cost, the size, and the time since last access. 13

  14. Cost-based Replacement • Size-Adjusted LRU (SLRU) orders objects by the ratio of cost to size and evicts those with the best cost-to-size ratio. • The server-assisted scheme models the value of caching an object in terms of its fetching cost, size, next request time, and cache prices during the period between requests; it evicts the object of least value. • Hierarchical GreedyDual (Hierarchical GD) performs object placement and replacement cooperatively in a cache hierarchy. 14

  15. GreedyDual • GreedyDual was originally proposed by Young and Tarjan for the case where pages in a cache have the same size but incur different costs to fetch from secondary storage. • A value H is associated with each cached page p when the page is brought into the cache. • H is set to the cost of bringing the page into the cache • the cost is always nonnegative. • (1) The page with the lowest H value (minH) is replaced, and (2) all remaining pages reduce their H values by minH. 15

  16. GreedyDual • If a page is accessed, its H value is restored to the cost of bringing it into the cache. • Thus the H values of recently accessed pages retain a larger portion of the original cost than those of pages that have not been accessed for a long time. • By reducing H values as time goes on and restoring them upon access, the algorithm integrates locality and cost concerns in a seamless fashion. 16

  17. GreedyDual-Size • Set H to cost/size upon access to a document, where cost is the cost of bringing the document in and size is the document size in bytes. • This extended version is called GreedyDual-Size. • The definition of cost depends on the goal of the replacement algorithm; cost is set to • 1 if the goal is to maximize hit ratio • the downloading latency if the goal is to minimize average latency • the network cost if the goal is to minimize total cost 17

  18. GreedyDual-Size • Implementation: • A naive version must decrement every cached page's H by min H(q) each time a page q is replaced, which may be very inefficient. • The improved algorithm on the next slide avoids this with a running offset L. • Maintaining a priority queue based on H: • handling a hit requires O(log k) time, and • handling an eviction requires O(log k) time, since in both cases the queue needs updating. 18

  19. GreedyDual-Size • Algorithm GreedyDual-Size(document p): • /* Initialize L ← 0 */ • If p is already in memory: • H(p) ← L + cost(p)/size(p) • If p is not in memory: • while there is not enough room in memory for p: • L ← min H(q) over all q in cache • evict a q such that H(q) = L • put p into memory and set H(p) ← L + cost(p)/size(p) 19
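The algorithm above can be sketched in Python with a heap plus lazy deletion, so both hits and evictions stay O(log k); the class layout and stale-entry handling are our own implementation choices, not from the slides:

```python
import heapq

class GreedyDualSize:
    """GreedyDual-Size sketch using the running offset L: instead of
    decrementing every H on eviction, newly (re)valued objects get L
    added, which is equivalent and much cheaper."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0
        self.H = {}          # url -> current H value
        self.size = {}       # url -> size in bytes
        self.heap = []       # (H, url) entries; may contain stale pairs

    def access(self, url, size, cost):
        if url not in self.H:
            while self.used + size > self.capacity and self.H:
                h, victim = heapq.heappop(self.heap)
                if self.H.get(victim) != h:
                    continue                 # stale heap entry, skip it
                self.L = h                   # L <- min H(q) over the cache
                self.used -= self.size.pop(victim)
                del self.H[victim]
            self.size[url] = size
            self.used += size
        self.H[url] = self.L + cost / size   # H(p) <- L + cost(p)/size(p)
        heapq.heappush(self.heap, (self.H[url], url))

gd = GreedyDualSize(100)
gd.access("a", 60, 1)    # cost 1 everywhere: the hit-ratio variant
gd.access("b", 30, 1)
gd.access("c", 50, 1)    # evicts "a", which has the lowest H = 1/60
```

With cost = 1 this maximizes hit ratio, per the previous slide; swapping in latency or network cost gives the other variants.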

  20. Hybrid Algorithm (HYB) • Motivated by Bolot and Hoschka's algorithm. • HYB is a hybrid of several factors, considering not only download time but also the number of references to a document and the document size. HYB selects for replacement the document i with the lowest value of the following expression: 20

  21. HYB • The utility function is defined as (Cs + Wb/bs) · np^Wn / Zp, where • Cs is the estimated time to connect to the server • bs is the estimated bandwidth to the server • Zp is the size of the document • np is the number of times the document has been referenced • Wb and Wn are constants that set the relative importance of bs and np, respectively 21
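Assuming the reconstructed form (Cs + Wb/bs) · np^Wn / Zp, the utility is a one-liner; the defaults below follow the later experiment slide (WB = 8 KB, WN = 0.9), and units are illustrative:

```python
def hyb_value(cs, bs, zp, nref, wb=8192, wn=0.9):
    """HYB utility sketch: the document with the LOWEST value is evicted.
    cs = connect time, bs = bandwidth, zp = size, nref = reference count."""
    return (cs + wb / bs) * (nref ** wn) / zp
```

As expected, more references raise the value (keep the document) and a larger size lowers it (evict sooner).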

  22. Latency Estimation Algo. (LAT) [REF] • Motivated by estimating the time required to download a document, then replacing the document with the smallest download time. • Apply some function to combine (e.g., smooth) the time samples into an estimate of how long downloading the document will take. • Keeping a per-document estimate is probably not practical. • Alternative: keep statistics of past downloads per server rather than per document (less storage). • For each server j, the proxy maintains • clatj: estimated latency (time) to open a connection to the server • cbwj: estimated bandwidth of the connection (in bytes/second) 22

  23. Latency Estimation Algo. (LAT) [REF] • When a new document is received from server j, the connection establishment latency (sclat) and bandwidth for that document (scbw) are measured, and the estimates are updated as follows: clatj = (1 − ALPHA) clatj + ALPHA sclat cbwj = (1 − ALPHA) cbwj + ALPHA scbw • ALPHA is a smoothing constant, set to 1/8 as in TCP's smoothed RTT estimation • Let ser(i) denote the server on which document i resides, and si the document size. The cache replacement algorithm LAT selects for replacement the document i with the smallest download time estimate di: • di = clatser(i) + si/cbwser(i) 23
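The two update equations and the download-time estimate translate directly; a minimal sketch (function names are ours):

```python
ALPHA = 1 / 8   # smoothing constant, as in TCP's RTT estimator

def update_estimates(clat, cbw, sclat, scbw, alpha=ALPHA):
    """EWMA update of server j's latency/bandwidth estimates (LAT sketch)."""
    return (1 - alpha) * clat + alpha * sclat, (1 - alpha) * cbw + alpha * scbw

def download_estimate(clat, cbw, size):
    """Estimated time to re-fetch document i: d_i = clat_ser(i) + s_i/cbw_ser(i)."""
    return clat + size / cbw

clat, cbw = update_estimates(0.1, 1.0e6, 0.2, 2.0e6)
```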

  24. Latency Estimation Algo. (LAT) • One detail remains: • a proxy runs at the application layer of the network protocol stack and therefore cannot obtain connection latency samples sclat directly. • The following heuristic is used instead to estimate connection latency. A constant CONN is chosen (e.g., 2 Kbytes). Every document the proxy receives whose size is less than CONN serves as an estimate of the connection latency sclat. • Every document whose size exceeds CONN is used as a bandwidth sample: scbw = size / (download time of document − current value of clatj). 24

  25. Experiment 1 25

  26. Experiment 2 (WB = 8 KB, WN = 0.9, CONN = 2 KB) 26

  27. Lowest Relative Value (LRV) • time from the last access, t: has a large influence on the probability of a new access • the probability of a new access conditioned on the time from the last access can be expressed as (1 − D(t)) • number of previous accesses, i: this parameter lets the proxy select a relatively small number of documents with a much higher probability of being accessed again • document size, s: this seems to be the most effective parameter for making a selection among documents with only one access 27

  28. Distribution of interaccess times, D(t) 28

  29. Probability density function of interaccess times, d(t) 29

  30. Lowest Relative Value (LRV) • We compute the probability that a document is accessed again, Pr(i, t, s), as follows: Pr(i, t, s) = P1(s)(1 − D(t)) if i = 1 Pr(i, t, s) = Pi(1 − D(t)) otherwise • Pi: conditional probability that a document is referenced i+1 times given that it has been accessed i times • P1(s): percentage of documents of size s with at least 2 accesses • D(t): distribution of times between consecutive requests to the same document, fitted as • D(t) = 0.035 log(t+1) + 0.45(1 − e^(−t/2·10⁶)) 30
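A sketch of the computation above; note the exponent −t/2·10⁶ is reconstructed from the stray "2E6" fragment on the slide, and the tables P and P1 are assumed to be supplied by the caller from trace statistics:

```python
import math

def D(t):
    """LRV's fitted interaccess-time distribution (exponent reconstructed
    from the slide's '2E6' fragment; valid over the trace's time range)."""
    return 0.035 * math.log(t + 1) + 0.45 * (1 - math.exp(-t / 2e6))

def pr_again(i, t, s, P, P1):
    """Pr(i, t, s): probability of a further access. P maps access count
    i -> P_i; P1(s) gives the fraction of size-s documents re-accessed."""
    return (P1(s) if i == 1 else P[i]) * (1 - D(t))
```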

  31. Lowest Relative Value (LRV) 31

  32. Lowest Relative Value (LRV) 32

  33. Lowest Relative Value (LRV) • Percentage of wrong choices in discarding documents vs. the number of accesses issued to the document at the moment of the choice (cache size: 500 MB). 33

  34. Lowest Relative Value (LRV) • Cumulative number of wrong choices in discarding documents vs. the number of accesses issued to the document at the moment of the choice (cache size: 500 MB). 34

  35. Performance from Pei Cao • Uses hit ratio, byte hit ratio, reduced latency, and reduced hops • reduced latency = the sum of downloading latencies for the pages that hit in the cache, as a percentage of the sum of all downloading latencies • reduced hops = the sum of the network costs for the pages that hit in the cache, as a percentage of the sum of the network costs of all Web pages • the network cost of each document is modeled as hops • each Web server has a hop value of 1 or 32; 1/8 of the servers are assigned hop value 32 and 7/8 hop value 1 • the hop value can be thought of either as the number of network hops traveled by a document or as the monetary cost associated with it 35

  36. Performance from Pei Cao • GD-Size(1) sets the cost of each document to 1, thus trying to maximize hit ratio • GD-Size(packets) sets the cost of each document to 2 + size/536, i.e., the estimated number of network packets sent and received on a miss to the document • 1 packet for the request, 1 for the reply, and size/536 extra data packets, assuming a 536-byte TCP segment size • it tries to maximize both hit ratio and byte hit ratio • Finally, GD-Size(hops) sets the cost of each document to its hop value, trying to minimize network costs 36
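The packet-count cost is simple enough to state as code (the function name is ours):

```python
def packet_cost(size, mss=536):
    """GD-Size(packets) cost sketch: 1 request packet + 1 reply packet
    + size/mss extra data packets, assuming a 536-byte TCP segment."""
    return 2 + size / mss
```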

  37. Performance from Pei Cao • See Cao’s paper: page 4 37

  38. Bolot/Hoschka's algorithm '96 • Considers the following variables: • ti, the time since document i was last referenced • Si, the size of the document • rtti, the time it took to retrieve the document • ttli, the time to live of the document (i.e., the expected time until the document will be updated at the remote site, which is also the time interval until the cached document becomes stale). • Assign a weight to each cached document i: Wi = W(ti, Si, rtti, ttli). • With W(ti, Si, rtti, ttli) = 1/ti, documents are replaced according to the time of last reference; this models the LRU algorithm. • With W(ti, Si, rtti, ttli) = Si, documents are cached on the basis of size only. 38

  39. Bolot and Hoschka's algorithm • Proposed weight function: • W(ti, Si, rtti, ttli) = (w1rtti + w2Si)/ttli + (w3 + w4Si)/ti • where w1, w2, w3, and w4 are constants. • The second term on the right-hand side captures temporal locality. • The first term captures the cost associated with retrieving documents (waiting cost, storage cost in the cache); the multiplying factor 1/ttli indicates that the cost of retrieving a document increases as its useful lifetime decreases. • ttli is the expiration time provided by Web servers. 39

  40. Bolot and Hoschka's algorithm • It remains to define the parameters wi. • The goal might be to maximize the hit ratio, to minimize the perceived retrieval time for a random user, or to minimize the cache size for a given hit ratio, etc. • This can be expressed as a standard optimization problem and solved using variants of the Lagrange multiplier technique. • The authors use the following algorithms: • Algo 1: W(ti, Si, rtti, ttli) = w3/ti • Algo 2: W(ti, Si, rtti, ttli) = w1rtti + w2Si + (w3 + w4Si)/ti • W(ti, Si, rtti, ttli) is expressed in terms of bytes, with w1 = 5000 b/s, w2 = 1000, w3 = 10000 bs, and w4 = 10 s in all cases. 40
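The proposed weight function with the slide's constants can be sketched as follows (documents with the smallest W are evicted first; the function name is ours, and units follow the slide's conventions):

```python
def weight(t, s, rtt, ttl, w1=5000, w2=1000, w3=10000, w4=10):
    """Bolot/Hoschka weight sketch: (w1*rtt + w2*S)/ttl + (w3 + w4*S)/t.
    t = time since last reference, s = size, rtt = retrieval time,
    ttl = time to live. Smallest W is evicted first."""
    return (w1 * rtt + w2 * s) / ttl + (w3 + w4 * s) / t

# e.g. a 100-byte document, fetched in 1 s, idle 10 s, with 100 s to live
w = weight(t=10, s=100, rtt=1, ttl=100)
```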

  41. Key-based Replacement (P.4) 41

  42. Key-based Replacement • Removal policies form a taxonomy defined in terms of a sorting procedure, with two phases: • First, sort the documents in the cache according to one or more keys (a primary key, secondary key, etc.). • Then remove zero or more documents from the head of the sorted list until a criterion is satisfied. 42

  43. Williams's Paper • Undergrad (U): • About 30 workstations in an undergraduate CS lab from April to October 1995 (190 days); 173,384 valid accesses requiring transmission of 2.19 GB of static Web documents; representative of a group of clients working in close confines (within speaking distance). • Classroom (C): • 26 workstations in a classroom; 30,316 valid accesses requiring transmission of 405.7 MB of static documents. • Clients tend to make requests when asked to do so by an instructor. • However, results for workloads BR, BL, and G are upper bounds on what real proxies would experience, because a real proxy would probably not cache requests from clients in .cs.vt.edu to servers in .cs.vt.edu. • Workload BR is representative of a cache positioned at the point of connection of the Virginia Tech campus to the Internet. Such a cache is useful because it avoids consuming bandwidth. 43

  44. Williams’s Paper • Graduate (G): • at least 25 users, containing 46,834 valid accesses requiring transmission of 610.92MB of static web pages for most of the spring 1995 semester. • representative of clients in one department dispersed throughout a building in separate or in common work areas. • Remote Client Backbone Accesses (BR): • Every URL request appearing on the Ethernet backbone of domain .cs.vt.edu with a client outside that domain naming a Web server inside that domain for a 38 day period in September and October 1995, representing 180,132 requests requiring transmission of 9.61GB of static Web pages. • This workload may be representative of a few servers on one large departmental LAN serving documents to world-wide clients. 44

  45. Williams's Paper • Local Client Backbone Accesses (BL): • Every URL request appearing on the Computer Science Department backbone with a client from within the department, naming any server in the world, for a 37-day period in September and October 1995; 53,881 accesses requiring transmission of 644.55 MB of static Web pages. The requests are for servers both within and outside the .cs.vt.edu domain. 45

  46. Workload Summary (Paper)
  Workload  Days  Accesses  Size (GB)  %Refs     %Bytes
  U          185   188,674   2.26       graphics  graphics
  C           95    13,127   0.15       text      graphics
  G           78    45,400   0.56       graphics  graphics
  BR          37   227,210   9.38       graphics  audio
  BL          37    91,188   0.64       graphics  graphics 46

  47. Experiment Overview • Trace-driven simulation • Compare removal policies, viewed as sorting problems • Answer: • 1. Maximum theoretical HR, WHR • 2. Best replacement policy • 3. Effectiveness of second level cache • 4. Effectiveness of partitioning cache by media type (Question raised by Kwan, McGrath, Reed, Nov. 95, IEEE Computer) 47

  48. Simulation Assumptions • Valid access: • a legal request • the document "passes" the cache (only requests with HTTP return code 200 are simulated) • Definition of hit: • In reality, a "hit" is either • the proxy has the doc, and the doc is estimated consistent, or • the proxy has the doc, the doc is estimated inconsistent, and a conditional GET returns no doc (304 Not Modified) • But 3 of the workload traces lack last-modified times, so we use an alternate definition: • hit = match in URL and size 48

  49. Simulation Assumptions • When a URL in the common log file has size zero: • if the URL appeared earlier with a non-zero size, use the last size in the simulation • otherwise the URL is probably a dynamic doc; do not cache it in the simulation 49

  50. Exp 1: Max Theoretical HR, WHR • Simulate an infinite cache (plot 7-day moving average) • Workload U (undergrad): • seasonal variation (e.g., new students in fall access new URLs) • cumulative HR = 44.9%, WHR = 31.4% • Workload C (classroom): • did not show the expected high hit rate • HR increased near exams • Workload BR (remote clients on backbone): • hit rates over 90% due to proximity of proxy to servers 50
