Objective-Optimal Algorithms for Long-term Web Prefetching Bin Wu & Ajay Kshemkalyani Dept. of Computer Science, Univ. of Illinois at Chicago ajayk@cs.uic.edu
Outline • Problem definition and background • Web prefetching algorithms • Performance metrics • Objective-Greedy algorithms (O(n) time) • Hit rate greedy (also hit rate optimal) • Bandwidth greedy (also bandwidth optimal) • H/B greedy • H/B-Optimal algorithm (expected O(n) time) • Simulation results • Conclusions
Introduction • Web caching reduces user-perceived latency • Client-server mode • Bottleneck occurs at the server side • Means of improving performance: local cache, proxy server, server farm, etc. • Cache management: LRU, Greedy-Dual-Size, etc. • On-demand caching vs. (long-term) prefetching • Prefetching is effective in dynamic environments • Clients subscribe to web objects • The server “pushes” fresh copies into web caches • Selection of prefetched objects is based on long-term statistical characteristics maintained by content distribution servers (CDS)
Introduction • Web prefetching • Caches web objects in advance • Updated by web server • Reduces retrieval latency and user access time • Requires more bandwidth and increases traffic. • Performance metrics • Hit rate • Bandwidth usage • Balance of the two
Object Selection Criteria • Popularity (Access frequency) • Lifetime • Good Fetch • APL
Web Object Characteristics • Access frequency • A Zipf-like request model is used in web traffic modeling. • The relationship between access frequency p and popularity rank i of a web object:
Web Object Characteristics • The generalized Zipf-like distribution of web requests: p(i) = k / i^α • k is a normalization constant, i is the object ID (popularity rank), and α is the Zipf parameter: • 0.986 (Cunha et al.), • 0.75 (Nishikawa et al.), and • 0.64 (Breslau et al.)
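The Zipf-like model above can be sketched in a few lines. This is an illustrative aid rather than code from the slides; n and α are arbitrary choices (α = 0.75 follows the Nishikawa et al. estimate):

```python
# Sketch of the Zipf-like access-frequency model p(i) = k / i^alpha.
# n and alpha are illustrative choices, not values fixed by the slides.
def zipf_frequencies(n, alpha=0.75):
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    k = 1.0 / sum(weights)              # normalization constant
    return [k * w for w in weights]

p = zipf_frequencies(10000)
assert abs(sum(p) - 1.0) < 1e-9        # frequencies sum to 1
assert p[0] > p[1] > p[2]              # lower rank => higher frequency
```

Smaller α flattens the distribution, spreading requests more evenly across objects.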
Web Object Characteristics • Size of objects • Average object size: 10–15 KB. • No strong correlation between object size and its access frequency. • Lifetime of web objects • Average time interval between updates • Weak correlation between access frequency and lifetime.
Caching Architecture • Prefetching selection algorithms use these global statistics as input: • Estimates of object reference frequencies • Estimates of object lifetimes • Content distribution servers cooperate to maintain these statistics • When an object is updated at the origin server, the new version is sent to every cache that has subscribed to it.
Solution space for web prefetching • Two extreme cases: • Passive caches (non-prefetching): least network bandwidth and lowest cache hit rate • Prefetching all objects: 100% cache hit rate, but a huge amount of unnecessary bandwidth • Existing algorithms use different object-selection criteria and prefetch the objects whose criterion value exceeds a threshold.
Steady State Properties • The steady-state hit rate of object i is its freshness factor f(i): the fraction of requests to i that find a fresh cached copy • For a non-prefetched object, f(i) = a·p_i·l_i / (1 + a·p_i·l_i), where a is the aggregate request rate, p_i the access probability, and l_i the average lifetime; for a prefetched object, f(i) = 1 • Overall hit rate: H = Σ_i p_i·f(i) • In particular, for pure on-demand caching, H = Σ_i p_i·(a·p_i·l_i)/(1 + a·p_i·l_i) (Venkataramani et al.)
Steady State Properties • Steady-state bandwidth for object i (of size s_i): • if prefetched: b(i) = s_i / l_i (one refresh per lifetime) • if on-demand: b(i) = a·p_i·(1 − f(i))·s_i (one fetch per miss) • Total bandwidth: B = Σ_i b(i) • In particular, for pure on-demand caching: B = Σ_i a·p_i·(1 − f(i))·s_i
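The steady-state quantities can be sketched directly in code. This is a hedged reconstruction using the notation above (a: aggregate request rate, p_i: access probability, l_i: lifetime, s_i: size), not code from the paper:

```python
def freshness(a, p, l):
    # f(i) = a*p_i*l_i / (1 + a*p_i*l_i): probability a request finds a fresh copy
    return (a * p * l) / (1.0 + a * p * l)

def hit_rate(a, p, l, prefetched):
    # prefetched objects are always fresh (f = 1); others hit with probability f(i)
    return sum(p[i] if i in prefetched else p[i] * freshness(a, p[i], l[i])
               for i in range(len(p)))

def bandwidth(a, p, l, s, prefetched):
    # prefetched: one refresh of size s_i per lifetime l_i
    # on-demand: each miss (rate a*p_i*(1 - f(i))) fetches s_i
    return sum(s[i] / l[i] if i in prefetched
               else a * p[i] * (1.0 - freshness(a, p[i], l[i])) * s[i]
               for i in range(len(p)))
```

Prefetching an object raises H by p_i·(1 − f(i)) and changes B by s_i/l_i − a·p_i·(1 − f(i))·s_i, which is what the greedy criteria below exploit.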
Objective Metrics • Hit rate – benefit • Bandwidth – cost • H/B model – balance of benefit and cost • Basic H/B • Enhanced H/B (Jiang et al.)
Existing Prefetching Algorithms • Popularity [Markatos et al.] • Keep the most popular objects in the system • Update these objects immediately when they change • Criterion – the object’s popularity • Expected to achieve a high hit rate • Lifetime [Jiang et al.] • Keep objects with the longest lifetimes • Mostly considers network resource demands • Threshold – the expected lifetime of the object • Expected to minimize bandwidth usage
Existing Prefetching Algorithms • Good Fetch [Venkataramani et al.] • Computes the probability that an object is accessed before it changes • Prefetches objects with a “high probability of being accessed during their average lifetime” • Prefetch object i if this probability exceeds a threshold • Objects with higher access frequencies and longer update intervals are more likely to be prefetched • Balances the benefit (hit rate increase) against the cost (bandwidth increase) of keeping an object
Existing Prefetching Algorithms • APL [Jiang et al.] • Computes the apl values of web objects • The apl of an object represents the “expected number of accesses during its lifetime” • Prefetch object i if its apl exceeds a threshold • Tends to improve hit rate; attempts to balance benefit (hit rate) against cost (bandwidth)
Existing Prefetching Algorithms • Enhanced APL • n > 1 prefers objects with higher popularity (emphasizes hit rate) • n < 1 prefers objects with longer lifetime (emphasizes network bandwidth)
Objective-Greedy Algorithms • Existing algorithms choose prefetching criteria based on intuition • These intuitions are not aimed at any specific performance metric • They consider only individual objects’ characteristics, not the global impact • None of them gives optimal performance on any metric • Simple counter-examples can be shown
Objective-Greedy Algorithms • Objective-Greedy algorithms select criteria to intentionally improve performance on specific metrics. • E.g., the Hit Rate-Greedy algorithm aims to improve the overall hit rate, thus reducing the latency of object requests.
H/B-Greedy Prefetching • Consider the H/B value of pure on-demand caching: H0/B0, where H0 = Σ_i p_i·f(i) and B0 = Σ_i a·p_i·(1 − f(i))·s_i • If object j is then prefetched, H/B is updated to: (H0 + p_j·(1 − f(j))) / (B0 + s_j/l_j − a·p_j·(1 − f(j))·s_j)
H/B-Greedy Prefetching • We define the increase factor of object j as incr(j) = [(H0 + p_j·(1 − f(j))) / H0] · [B0 / (B0 + s_j/l_j − a·p_j·(1 − f(j))·s_j)] • incr(j) indicates the factor by which H/B increases if object j alone is selected.
H/B-Greedy Prefetching • H/B-Greedy prefetches the m objects with the greatest increase factors. • The selection considers only each object’s individual effect on H/B, not the joint effect of the selected set. • Hence H/B-Greedy is still not an optimal algorithm in terms of the H/B value.
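A minimal sketch of H/B-Greedy selection, assuming the incr(j) form above; H0 and B0 are the hit rate and bandwidth of pure on-demand caching, passed in precomputed:

```python
def incr(j, a, p, l, s, H0, B0):
    # Multiplicative gain in H/B from prefetching object j alone (sketch).
    f = (a * p[j] * l[j]) / (1.0 + a * p[j] * l[j])     # freshness factor f(j)
    dH = p[j] * (1.0 - f)                               # hit-rate gain
    dB = s[j] / l[j] - a * p[j] * (1.0 - f) * s[j]      # bandwidth change
    return ((H0 + dH) / H0) * (B0 / (B0 + dB))

def hb_greedy(m, a, p, l, s, H0, B0):
    # Prefetch the m objects with the greatest increase factors.
    order = sorted(range(len(p)),
                   key=lambda j: incr(j, a, p, l, s, H0, B0), reverse=True)
    return set(order[:m])
```

Ranking every object and taking the top m costs one pass plus a sort; a selection algorithm brings this to the O(n) time claimed in the outline.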
Hit Rate-Greedy Prefetching • To maximize the overall hit rate given the number of objects to prefetch, m, we select the m objects with the greatest hit-rate contribution: p_i·(1 − f(i)) • This algorithm is optimal in terms of hit rate.
Bandwidth-Greedy Prefetching • To minimize the total bandwidth given m, the number of objects to prefetch, we select the m objects with the least bandwidth contribution: s_i/l_i − a·p_i·(1 − f(i))·s_i • Bandwidth-Greedy prefetching is optimal in terms of bandwidth consumption.
H/B-Optimal Prefetching • The optimal algorithm for the H/B metric is provided by a solution to the following selection problem. • It is equivalent to the maximum weighted average problem with pre-selected items.
Maximum Weighted Average • Maximum Weighted Average Problem: • There are n courses in total, with different credit hours and scores • Select m (m < n) courses • Maximize the GPA of the m selected courses • Solution: • If m = 1, select the course with the highest score • What if m > 1? • A misleading intuition: select the m courses with the highest scores.
A Course Selection Problem • If m = 2 and we select the 2 courses with the highest scores, C and B, the GPA is 93.33 • But if we select C and D instead, the GPA is 93.57 • Question: how do we select m courses such that the GPA is maximized? • Answer: Eppstein & Hirschberg solved this problem
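The effect can be reproduced with hypothetical data (the slide’s own credit/score table did not survive extraction, so these numbers are made up): a heavy-credit course with a high score can drag the average below a lighter, slightly lower-scoring one.

```python
from itertools import combinations

# Hypothetical (score, credit_hours) data -- NOT the slide's original table.
courses = {'B': (95, 5.0), 'C': (98, 2.0), 'D': (94, 1.0)}

def gpa(names):
    # Credit-weighted average score of the chosen courses.
    v = sum(courses[n][0] * courses[n][1] for n in names)
    w = sum(courses[n][1] for n in names)
    return v / w

# The two highest scores are C (98) and B (95), yet {C, D} wins:
best = max(combinations(courses, 2), key=gpa)
assert set(best) == {'C', 'D'}
```

Brute force over all C(n, m) subsets works for toy sizes but is exponential in general, which motivates the algorithm on the following slides.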
With Pre-selected Items • Maximum weighted average with pre-selected items: • There are n courses in total, with different credit hours and scores • Courses A and E (for example) must be selected; in addition: • Select m additional courses (m is given, m < n) such that the resulting GPA is maximized
Pre-selection is not trivial • Selection domain B–I, no pre-selection, m = 2: optimal subset {B, C}, GPA 88.33 • Selection domain B–I, A pre-selected, m = 2: the candidate subset {A, D, H} (GPA 75.61) is better than {A, B, C} (GPA 70.625) • Conclusion: {B, C} is not contained in the optimal subset of the pre-selected problem.
H/B-Optimal vs. Course Selection • The problem is formulated as: maximize (v0 + Σ_{i∈S} v_i) / (w0 + Σ_{i∈S} w_i) over subsets S with |S| = m, where v_i = credit_i·score_i and w_i = credit_i • In the previous example, v0 = 5.0·70 + 2.0·75 = 500 and w0 = 5.0 + 2.0 = 7.0 • This is equivalent to the H/B-Optimal selection problem: v0, w0 are the hit rate and bandwidth of pure on-demand caching, and v_i, w_i are the changes caused by prefetching object i.
H/B-Optimal algorithm design • The selection of m courses is not trivial • For course i, we define the auxiliary function r_i(x) = v_i − x·w_i • And for a given number m, we define the utility function F(x) = (v0 − x·w0) + the sum of the m largest r_i(x) values
H/B-Optimal algorithm • Lemma 1: Suppose A* is the maximum GPA we are computing. Then for any subset S' ⊆ S with |S'| = m, Σ_{i∈S'} r_i(A*) ≤ Σ_{i∈S*} r_i(A*), where S* is the optimal subset. • Lemma 1 indicates that the optimal subset contains the courses with the m largest r_i(A*) values.
H/B-Optimal algorithm design • n = 6, m = 4 • Each line is r_i(x) plotted against x • Assume we know A* • The optimal subset contains the 4 courses with the largest r_i(A*) values • Dilemma: A* is unknown
H/B-Optimal algorithm design • Lemma 2: F(x) > 0 if x < A*, F(x) < 0 if x > A*, and F(A*) = 0 • Lemma 2 narrows the range of A*; (x_l, x_r) denotes the current A*-range
H/B-Optimal algorithm design • If F(x_l) > 0 and F(x_r) < 0, then A* ∈ (x_l, x_r) • Compute the value of F((x_l + x_r)/2): • if F((x_l + x_r)/2) > 0, then A* > (x_l + x_r)/2 • if F((x_l + x_r)/2) < 0, then A* < (x_l + x_r)/2 • if F((x_l + x_r)/2) = 0, then A* = (x_l + x_r)/2 (Lemma 2) • This narrows the range of A* by half
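The halving step amounts to bisection on F. As an illustrative alternative to the expected-linear randomized algorithm described next, A* can be found by plain bisection; courses are hypothetical (v, w) pairs with v = credit·score and w = credit, and pre holds the pre-selected items:

```python
def F(x, courses, pre, m):
    # Utility function: F(x) = sum over pre-selected of (v - x*w)
    # plus the sum of the m largest r_i(x), where r_i(x) = v_i - x*w_i.
    # F is decreasing in x and F(A*) = 0 at the optimal average A*.
    r = sorted((v - x * w for v, w in courses), reverse=True)
    return sum(v - x * w for v, w in pre) + sum(r[:m])

def optimal_average(courses, pre, m, eps=1e-9):
    # Bisection sketch: O(n log n) per probe rather than expected-linear overall.
    lo, hi = 0.0, max(v / w for v, w in courses + pre)
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if F(mid, courses, pre, m) > 0:
            lo = mid            # A* lies above mid (Lemma 2)
        else:
            hi = mid            # A* lies at or below mid
    return (lo + hi) / 2.0
```

Once the range is narrow enough, ranking courses by r_i(A*) recovers the optimal subset itself, as the next slides explain.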
H/B-Optimal algorithm design • Why keep narrowing down the range of A*? • If the intersection of r_j(x) and r_k(x) falls outside the range, then the ordering of r_j(x) and r_k(x) is fixed within the range (determined by comparing their slopes), and hence so is the ordering of r_j(A*) and r_k(A*). • If the range is narrow enough that no intersections of r(x) lines fall within it, the total ordering of all r(A*) values is determined. • The optimization problem is then solved: just select the m candidates with the highest r(A*) values. • This is the main idea behind solving the optimization problem.
H/B-Optimal algorithm design • However, computing the total ordering requires O(n²) time • A randomized approach is used instead; the randomized algorithm: • Iteratively reduces the problem domain to a smaller one • Maintains 4 sets: X, Y, E, Z, initially empty
H/B-Optimal algorithm design • In each iteration, randomly select a course i and compare it with each of the other courses k. There are 4 possibilities:
1) if r_k(A*) > r_i(A*): insert k into set X
2) if r_k(A*) < r_i(A*): insert k into set Y
3) if w_k = w_i and v_k = v_i: insert k into set E
4) if undetermined: insert k into set Z
Then repeat the following loop:
loop: narrow the range of A* by half
compare r_i(A*) with r_k'(A*) for each k' in Z
if now determined, move k' to X or Y accordingly
until |Z| is sufficiently small (i.e., |Z| < |S|/32)
H/B-Optimal algorithm design • At this point the sets X and Y have enough members. • Next, examine and compare the sizes of X, Y, and E:
H/B-Optimal algorithm design • 1) If |X| + |E| > m: at least m courses have r(A*) values greater than the r(A*) value of every course in Y, so all members of Y may be removed. Then: |S| = |S| − |Y|
H/B-Optimal algorithm design • 2) If |Y| + |E| > |S| − m: all members of X are among the top m courses, so they must be in the optimal set. Collapse X into a single course (this course is included in the final optimal set). Then: |S| = |S| − |X| + 1; m = m − |X| + 1
H/B-Optimal algorithm design • In either case, the resulting domain has reduced size. • By iteratively removing or collapsing courses, the problem domain finally has only one course remaining: a course formed by collapsing all courses in the optimal set. • Complexity (expected time, briefly; let S_b be the domain before an iteration and S_a after): • 1) Each iteration takes expected time O(|S_b|) • 2) Expected size |S_a| = (207/256)|S_b| • The recurrence relation of the iteration, T(n) = O(n) + T((207/256)n), resolves to linear time complexity.
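The recurrence resolves to linear time by summing the geometric series (c is the constant hidden in the O(n) term):

```latex
T(n) = cn + T\!\left(\tfrac{207}{256}\,n\right)
     \le cn \sum_{k=0}^{\infty} \left(\tfrac{207}{256}\right)^{k}
     = cn \cdot \frac{256}{49} = O(n).
```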
H/B-Greedy vs. H/B-Optimal • H/B-Greedy is an approximation to H/B-Optimal • H/B-Greedy achieves a higher H/B metric than any existing algorithm • H/B-Greedy is easier to implement than H/B-Optimal
Simulation Results • Evaluation of H/B-Greedy Prefetching • Figure 1: H/B, for total object number = 1,000 • Figure 2: H/B, for total object number = 10,000 • Figure 3: H/B, for total object number = 100,000 • Figure 4: H/B, for total object number = 1,000,000 • Evaluation of the H-Greedy and B-Greedy algorithms • Figure 5: H-Greedy algorithm • Figure 6: B-Greedy algorithm • Figure 7: B-Greedy algorithm, zoomed in