This paper explores objective-optimal algorithms for long-term web prefetching, including hit rate and bandwidth greedy algorithms, and evaluates their performance using simulation results. The aim is to reduce retrieval latency and user access time while minimizing bandwidth usage.
Objective-Optimal Algorithms for Long-term Web Prefetching Bin Wu & Ajay Kshemkalyani Dept. of Computer Science, Univ. of Illinois at Chicago ajayk@cs.uic.edu
Outline • Problem definition and background • Web prefetching algorithms • Performance metrics • Objective-Greedy algorithms (O(n) time) • Hit rate greedy (also hit rate optimal) • Bandwidth greedy (also bandwidth optimal) • H/B greedy • H/B-Optimal algorithm (expected O(n) time) • Simulation results • Conclusions
Introduction Web caching reduces user-perceived latency • Client-server model • The bottleneck occurs at the server side • Means of improving performance: • local cache, proxy server, server farm, etc. • Cache management: LRU, GreedyDual-Size, etc. On-demand caching vs. (long-term) prefetching • Prefetching is effective in dynamic environments. • Clients subscribe to web objects • Server “pushes” fresh copies into web caches • Selection of prefetched objects is based on long-term statistical characteristics, maintained by content distribution servers (CDSs)
Introduction • Web prefetching • Caches web objects in advance • Updated by web server • Reduces retrieval latency and user access time • Requires more bandwidth and increases traffic. • Performance metrics • Hit rate • Bandwidth usage • Balance of the two
Object Selection Criteria • Popularity (Access frequency) • Lifetime • Good Fetch • APL
Web Object Characteristics • Access frequency • A Zipf-like request model is widely used in web traffic modeling. • The relationship between the access frequency p(i) and the popularity rank i of a web object follows a power law: p(i) ∝ 1/i^α
Web Object Characteristics • The generalized Zipf-like distribution of web requests is calculated as p(i) = k / i^α • k is a normalization constant, i is the object ID (popularity rank), and α is a Zipf parameter: • 0.986 (Cunha et al.), • 0.75 (Nishikawa et al.) and • 0.64 (Breslau et al.)
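As a concrete illustration (not from the slides), a minimal Python sketch of generating the normalized Zipf-like access probabilities for n objects, assuming p(i) = k / i^α:

```python
import numpy as np

def zipf_probabilities(n, alpha=0.75):
    """Return access probabilities p(i) = k / i**alpha for ranks i = 1..n,
    where k is chosen so that the probabilities sum to 1."""
    ranks = np.arange(1, n + 1)
    weights = 1.0 / ranks**alpha
    k = 1.0 / weights.sum()          # normalization constant
    return k * weights

# Example: probabilities for 10,000 objects with alpha = 0.75 (Nishikawa et al.)
p = zipf_probabilities(10_000, alpha=0.75)
print(p[:5], p.sum())                # most popular objects, plus a sanity check
```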
Web Object Characteristics • Size of objects • Average object size: 10–15 KB. • No strong correlation between object size and its access frequency. • Lifetime of web objects • Average time interval between updates • Weak correlation between access frequency and lifetime.
Caching Architecture • Prefetching selection algorithms use these global statistics as input: • Estimates of object reference frequencies • Estimates of object lifetimes • Content distribution servers cooperate to maintain these statistics • When an object is updated at the origin server, the new version is sent to every cache that has subscribed to it.
Solution space for web prefetching • Two extreme cases: • Passive caches (no prefetching) • Least network bandwidth and lowest cache hit rate • Prefetching all objects • 100% cache hit rate • Huge amount of unnecessary bandwidth • Existing algorithms use different object-selection criteria and prefetch the objects whose criterion value exceeds a threshold.
Steady State Properties • The steady-state hit rate for object i is defined as its freshness factor, f(i) • Overall hit rate: • In particular (Venkataramani et al.):
Steady State Properties • Steady-state bandwidth for object i • Total bandwidth: • In particular:
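A plausible reconstruction of the overall metrics, consistent with the long-term prefetching literature (the paper's exact definitions may differ), assuming p(i) is the access probability, a the aggregate request rate, s(i) the size, l(i) the lifetime, and f(i) = 1 for prefetched objects (they are always fresh):

H = \sum_i p(i)\, f(i)
B = \sum_{i\ \text{prefetched}} \frac{s(i)}{l(i)} \;+\; \sum_{i\ \text{not prefetched}} a\, p(i)\, \bigl(1 - f(i)\bigr)\, s(i)

Prefetched objects pay one object transfer per lifetime; non-prefetched objects pay one transfer per miss.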
Objective Metrics • Hit rate – benefit • Bandwidth – cost • H/B model – balance of benefit and cost • Basic H/B • Enhanced H/B • (Jiang et al.)
Existing Prefetching Algorithms • Popularity [Markatos et al.] • Keep the most popular objects in the system • Update these objects immediately when they change • Criterion – the object’s popularity • Expected to achieve a high hit rate • Lifetime [Jiang et al.] • Keep objects with the longest lifetimes • Mainly considers network resource demands • Threshold – the expected lifetime of the object • Expected to minimize bandwidth usage
Existing Prefetching Algorithms • Good Fetch [Venkataramani et al.] • Computes the probability that an object is accessed before it changes. • Prefetch objects with a “high probability of being accessed during their average lifetime” • Prefetch object i if this probability exceeds a threshold. • Objects with higher access frequencies and longer update intervals are more likely to be prefetched • Balances the benefit (hit rate increase) against the cost (bandwidth increase) of keeping an object.
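A minimal sketch of Good Fetch selection, assuming the usual form of the criterion, P_goodfetch(i) = 1 − (1 − p_i)^(a·l_i), with a the aggregate request rate, p_i the access probability, and l_i the lifetime; the threshold value is only illustrative:

```python
def good_fetch_selection(objects, a, threshold=0.5):
    """Select objects whose probability of being accessed during their
    average lifetime exceeds the threshold.

    objects: list of (object_id, p_i, lifetime_i) tuples.
    a: aggregate request arrival rate (requests per unit time).
    """
    selected = []
    for obj_id, p_i, l_i in objects:
        # Probability that at least one of the ~a*l_i requests arriving
        # in one lifetime is a request for object i.
        p_good_fetch = 1.0 - (1.0 - p_i) ** (a * l_i)
        if p_good_fetch > threshold:
            selected.append(obj_id)
    return selected
```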
Existing Prefetching Algorithms • APL [Jiang et al.] • Computes the apl values of web objects. • The apl of an object represents its “expected number of accesses during its lifetime” • Prefetch object i if its apl exceeds a threshold. • Tends to improve hit rate; attempts to balance the benefit (hit rate) against the cost (bandwidth).
Existing Prefetching Algorithms • Enhanced APL • n > 1 prefers objects with higher popularity (emphasizes hit rate) • n < 1 prefers objects with longer lifetimes (emphasizes network bandwidth)
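A sketch of APL-style selection under the assumption that apl_i = a·p_i·l_i (expected accesses per lifetime) and that the enhanced variant raises the popularity term to an exponent n, i.e. uses a·p_i^n·l_i; the exact weighting in Jiang et al. may differ:

```python
def apl_selection(objects, a, threshold, n=1.0):
    """Prefetch objects whose (enhanced) apl value exceeds the threshold.

    objects: list of (object_id, p_i, lifetime_i) tuples.
    n > 1 emphasizes popularity (hit rate); n < 1 emphasizes lifetime (bandwidth).
    """
    return [obj_id for obj_id, p_i, l_i in objects
            if a * (p_i ** n) * l_i > threshold]
```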
Objective-Greedy Algorithms • Existing algorithms choose prefetching criteria based on intuition • These intuitions are not aimed at any specific performance metric • These intuitions consider only individual objects’ characteristics, not the global impact • None of them gives optimal performance under any metric • Simple counter-examples can be shown
Objective-Greedy Algorithms • Objective-Greedy algorithms select criteria that intentionally improve performance under specific metrics. • E.g., the Hit Rate-Greedy algorithm aims to improve the overall hit rate and thus reduce the latency of object requests.
H/B-Greedy Prefetching • Consider the H/B value of on-demand caching. • If object j is prefetched, then H/B is updated to a new value.
H/B-Greedy Prefetching • We define the ratio of the updated H/B value to the original H/B value as the increase factor of object j, incr(j). • incr(j) indicates the factor by which H/B increases if object j is selected.
H/B-Greedy Prefetching • H/B-Greedy prefetching selects the m objects with the greatest increase factors, as sketched below. • The selection considers only the effect of prefetching each object individually. • Hence H/B-Greedy is still not an optimal algorithm in terms of the H/B value.
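A sketch of the H/B-Greedy selection, assuming incr(j) is the ratio of the updated H/B to the on-demand H/B; delta_h and delta_b are hypothetical names for the precomputed hit-rate and bandwidth changes caused by prefetching an object:

```python
def hb_greedy(objects, H, B, m):
    """Pick the m objects with the largest H/B increase factors.

    objects: list of (object_id, delta_h, delta_b) tuples, where delta_h and
    delta_b are the hit-rate and bandwidth increases that prefetching the
    object would cause.
    H, B: hit rate and bandwidth of pure on-demand caching.
    """
    def incr(delta_h, delta_b):
        # Assumed definition: ratio of the updated H/B to the current H/B.
        return ((H + delta_h) / (B + delta_b)) / (H / B)

    ranked = sorted(objects, key=lambda o: incr(o[1], o[2]), reverse=True)
    return [obj_id for obj_id, _, _ in ranked[:m]]
```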
Hit Rate-Greedy Prefetching • To maximize the overall hit rate given the number of objects to prefetch, m, we select the m objects with the greatest hit-rate contributions (the increase in overall hit rate from prefetching that object); see the sketch after the next slide. • This algorithm is optimal in terms of hit rate.
Bandwidth-Greedy Prefetching • To minimize the total bandwidth given m, the number of objects to prefetch, we select the m objects with the least bandwidth contributions (the increase in total bandwidth from prefetching that object), as sketched below. • Bandwidth-Greedy prefetching is optimal in terms of bandwidth consumption.
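A small sketch covering both greedy selections on this slide and the previous one, assuming each object's contribution has already been computed; delta_h and delta_b are hypothetical names for the per-object hit-rate gain and bandwidth increase:

```python
import heapq

def hit_rate_greedy(contributions, m):
    """Select the m objects with the greatest hit-rate contribution.
    contributions: dict mapping object_id -> delta_h (hit-rate gain)."""
    return heapq.nlargest(m, contributions, key=contributions.get)

def bandwidth_greedy(contributions, m):
    """Select the m objects with the least bandwidth contribution.
    contributions: dict mapping object_id -> delta_b (bandwidth increase)."""
    return heapq.nsmallest(m, contributions, key=contributions.get)
```

Both run in O(n log m) time with a heap; a linear-time selection would match the O(n) bound stated in the outline.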
H/B-Optimal Prefetching • An optimal algorithm for the H/B metric is provided by a solution to the following selection problem. • This is equivalent to the maximum weighted average problem with pre-selected items.
Maximum Weighted Average The maximum weighted average problem: • There are n courses in total, with different credit hours and scores • Select m (m < n) courses • Maximize the GPA of the m selected courses Solution: • If m = 1, then select the course with the highest score • What if m > 1? • A misleading intuition: select the m courses with the highest scores.
A Course Selection Problem • If m = 2 and we select the 2 courses with the highest scores, C and B, the GPA is 93.33. But if we select C and D, the GPA is 93.57. • Question: how do we select m courses such that the GPA is maximized? Answer: Eppstein & Hirschberg solved this problem.
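A self-contained toy example (with hypothetical credit hours and scores, not the slide's numbers) makes the same point: picking the highest-scoring courses does not necessarily maximize the credit-weighted GPA.

```python
from itertools import combinations

# Hypothetical courses: name -> (credit_hours, score)
courses = {"B": (4.0, 92), "C": (1.0, 98), "D": (1.0, 91)}

def gpa(selected):
    """Credit-weighted average score of the selected courses."""
    credits = sum(courses[c][0] for c in selected)
    points = sum(courses[c][0] * courses[c][1] for c in selected)
    return points / credits

print(gpa(["C", "B"]))   # 93.2 -- the two highest-scoring courses
print(gpa(["C", "D"]))   # 94.5 -- better, even though D scores lower than B
best = max(combinations(courses, 2), key=gpa)
print(best)              # ('C', 'D')
```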
With Pre-selected Items Maximum weighted average with pre-selected items: • There are n courses in total, with different credit hours and scores • Courses A and E (for example) must be selected, plus: • Select m additional courses (m is given, m < n), such that the resulting GPA is maximized
Pre-selection is not trivial • Selection domain B–I, no pre-selection, m = 2: optimal subset {B,C}, GPA 88.33 • Selection domain B–I, A pre-selected, m = 2: one candidate subset {A,D,H}, GPA 75.61, is better than {A,B,C}, GPA 70.625 • Conclusion: {B,C} is not contained in the optimal subset of the pre-selected problem.
H/B-Optimal vs. Course Selection • The problem is formulated as shown below, where v0 = 5.0*70 + 2.0*75 = 500 and w0 = 5.0 + 2.0 = 7.0 in the previous example. • It is equivalent to the H/B-Optimal selection problem.
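One plausible way to write the formulation (the slide's own formula is not reproduced here), with v_i = credit_i · score_i and w_i = credit_i, and v_0, w_0 the totals of the pre-selected courses:

\max_{S,\; |S| = m}\; A(S) \;=\; \frac{v_0 + \sum_{i \in S} v_i}{\,w_0 + \sum_{i \in S} w_i\,}

Roughly, the mapping to H/B-Optimal treats v_0, w_0 as the on-demand hit rate and bandwidth and v_i, w_i as the changes caused by prefetching object i; the paper's exact correspondence may differ.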
H/B-Optimal algorithm design • The selection of m courses is not trivial • For course i, we define an auxiliary function ri(x) • For a given number m, we define a utility function F(x)
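A reconstruction following Eppstein & Hirschberg's maximum weighted average formulation (the slide's formulas are assumed, not quoted): a natural choice is

r_i(x) \;=\; v_i - x\, w_i, \qquad F(x) \;=\; (v_0 - x\, w_0) \;+\; \text{sum of the } m \text{ largest } r_i(x)

so that F is strictly decreasing in x and the optimal average A* is the unique root of F(A*) = 0.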
H/B-Optimal algorithm • Lemma 1: Suppose A* is the maximum GPA we are computing; then for any subset S' ⊆ S of size m: • Lemma 1 indicates that the optimal subset contains the courses that have the m largest ri(A*) values
H/B-Optimal algorithm design • Example: n = 6, m = 4 • Each line is ri(x) • Assume we know A* • The optimal subset consists of the 4 courses with the largest ri(A*) values. • Dilemma: A* is unknown
H/B-Optimal algorithm design • Lemma 2: narrows the range of A* • (xl, xr) is the current range of A*
H/B-Optimal algorithm design • If F(xl) > 0 and F(xr) < 0, then A* is in (xl, xr) • Compute the value of F((xl+xr)/2): – if F((xl+xr)/2) > 0, then A* > (xl+xr)/2 – if F((xl+xr)/2) < 0, then A* < (xl+xr)/2 – if F((xl+xr)/2) = 0, then A* = (xl+xr)/2 (Lemma 2) • This narrows the range of A* by half
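A sketch of the range-narrowing step, assuming the utility function F from the earlier slide (v0 − x·w0 plus the m largest ri(x) = vi − x·wi); in the actual algorithm this bisection is interleaved with the randomized pruning rather than run to convergence:

```python
import heapq

def F(x, courses, v0, w0, m):
    """Utility function: (v0 - x*w0) plus the m largest r_i(x) = v_i - x*w_i.
    courses: list of (v_i, w_i) pairs."""
    r_values = [v - x * w for v, w in courses]
    return (v0 - x * w0) + sum(heapq.nlargest(m, r_values))

def narrow_range(xl, xr, courses, v0, w0, m, steps=1):
    """Halve the range (xl, xr) known to contain A*, `steps` times."""
    for _ in range(steps):
        mid = (xl + xr) / 2.0
        if F(mid, courses, v0, w0, m) > 0:
            xl = mid          # A* lies above the midpoint
        else:
            xr = mid          # A* lies at (or below) the midpoint
    return xl, xr
```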
H/B-Optimal algorithm design • Why keep narrowing down the range of A*? • If the intersection of rj(x) and rk(x) falls outside the range, then the ordering of rj(x) and rk(x) is determined within the range, and so is the ordering of rj(A*) and rk(A*), by comparing their slopes. • If the range is narrow enough that no r(x) lines intersect within it, then the total ordering of all r(A*) values is determined. • The optimization problem is then solved: just select the m candidates with the highest r(A*) values. • This is the main idea for solving the optimization problem.
H/B-Optimal algorithm design • However, computing the total ordering requires O(n²) time • A randomized approach is used instead; this randomized algorithm: • Iteratively reduces the problem domain to a smaller one • Maintains 4 sets, X, Y, E, Z, all initially empty
H/B-Optimal algorithm design In each iteration, randomly select a course i and compare it with each of the other courses k. There are 4 possibilities:
1) if rk(A*) > ri(A*): insert k into set X
2) if rk(A*) < ri(A*): insert k into set Y
3) if wk = wi and vk = vi: insert k into set E
4) if undetermined: insert k into set Z
Then repeat the following loop:
loop: narrow the range of A* by half
compare ri(A*) with rk'(A*) for each k' in Z
if the comparison is now determined, move k' to X or Y accordingly
until |Z| is sufficiently small (i.e., |Z| < |S|/32)
H/B-Optimal algorithm design • At this point the sets X and Y have enough members. • Next, examine and compare the sizes of X, Y, and E:
H/B-Optimal algorithm design 1) If |X|+|E| > m: there are at least m courses whose r(A*) values are greater than the r(A*) values of all courses in Y, so all members of Y may be removed. Then |S| = |S| − |Y|.
H/B-Optimal algorithm design 2) If |Y|+|E| > |S|−m: all members of X are among the top m courses, so all members of X must be in the optimal set. Collapse X into a single course (this course is included in the final optimal set). Then |S| = |S| − |X| + 1 and m = m − |X| + 1.
H/B-Optimal algorithm design • In either case, the resulting domain is smaller. • By iteratively removing or collapsing courses, the problem domain finally contains only one course: the course formed by collapsing all courses in the optimal set. • Complexity (briefly; assume Sb is the domain before an iteration and Sa the domain after): 1) each iteration takes expected time O(|Sb|); 2) the expected size is |Sa| = (207/256)|Sb|. The recurrence relation of the iteration, T(n) = O(n) + T((207/256)n), resolves to linear expected time.
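Unrolling the recurrence gives a geometric series, which confirms the linear expected time:

T(n) \;=\; c\,n + T\!\left(\tfrac{207}{256}\,n\right) \;\le\; c\,n \sum_{k \ge 0} \left(\tfrac{207}{256}\right)^{k} \;=\; \tfrac{256}{49}\, c\, n \;=\; O(n)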
H/B-Greedy vs. H/B-Optimal • H/B-Greedy is an approximation to H/B-Optimal • H/B-Greedy achieves a higher H/B value than any of the existing algorithms • H/B-Greedy is easier to implement than H/B-Optimal
Simulation Results • Evaluation of H/B-Greedy prefetching • Figure 1: H/B, for a total of 1,000 objects. • Figure 2: H/B, for a total of 10,000 objects. • Figure 3: H/B, for a total of 100,000 objects. • Figure 4: H/B, for a total of 1,000,000 objects. • Evaluation of the H-Greedy and B-Greedy algorithms • Figure 5: H-Greedy algorithm. • Figure 6: B-Greedy algorithm. • Figure 7: B-Greedy algorithm, zoomed in.