Updating Web views distributed over wide area networks. Sidiropoulos Antonis, Katsaros Dimitrios, Aristotle Univ. of Thessaloniki, Greece. Presentation by: Katsaros Dimitrios
[Figure: Content Distribution Networks. A Web client, the origin Web server and CDN cache servers connected across the INTERNET, with numbered interaction steps 1 to 4]
Content Distribution Networks • Advantages • prevention of the flash crowd problem • avoidance of network congestion • reduction of user-perceived latency • e.g., Akamai • launched in early 1999 • 12,000 servers • in 1,000 networks
Outline • Related work & Motivation • Proposed method • Preliminary performance evaluation • Conclusions & Future work
Best-effort cache coherency • Lack of bandwidth to disseminate all updates • Many caches • A single point where updates are generated
Related work • Static Web object caching/prefetching (Katsaros & Manolopoulos, ACM SAC’04; Nanopoulos, Katsaros & Manolopoulos, IEEE TKDE’03) • Dynamic Web object caching/prefetching • the cache plays the central role, i.e., prefetching (Cho & Garcia-Molina, SIGMOD’00; Gal & Eckstein, J.ACM’01) • minimizing bandwidth consumption and query latency under constraints on the age or accuracy of cached objects (Bright & Raschid, VLDB’02; Cohen & Kaplan, Computer Networks’02; Olston & Widom, SIGMOD’01) • strong cache coherence maintenance (Challenger, Iyengar & Dantzig, INFOCOM’99) • update dissemination, best-effort but with a single cache (Labrinidis & Roussopoulos, VLDB’01) • caches and sources cooperate, best-effort caching (Olston & Widom, SIGMOD’02) • optimal transmission of updates, but with fixed assumptions about update rates and transmission capabilities (Wang, Evans & Kwok, Information Systems Frontiers’03)
Web object freshness • Freshness of object O over period [t_i, t_j] • Freshness of database D with N objects
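A minimal formalization of these two notions, assuming the widely used definitions of Cho & Garcia-Molina (SIGMOD’00); the authors' own formulas may differ in detail:

```latex
F(O; t) =
\begin{cases}
  1 & \text{if the cached copy of } O \text{ is up-to-date at time } t,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
F(O; [t_i, t_j]) = \frac{1}{t_j - t_i} \int_{t_i}^{t_j} F(O; t)\, dt,
\qquad
F(D; [t_i, t_j]) = \frac{1}{N} \sum_{k=1}^{N} F(O_k; [t_i, t_j]).
```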
Weighted Web object freshness • The access pattern of Web objects is skewed • Objects with higher access rates contribute more to what is perceived as database freshness • For a database with N objects O_i, each with popularity f_{O_i}, the freshness F(D,T) weights the freshness of each object O_i by its popularity f_{O_i}
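One way to compute popularity-weighted freshness, as a sketch: it assumes time-averaged per-object freshness and the normalized form F(D,T) = Σ f_{O_i} F(O_i,T) / Σ f_{O_i}. The object names, popularities and stale intervals are hypothetical.

```python
from fractions import Fraction

def object_freshness(stale_intervals, start, end):
    """Fraction of [start, end] during which the cached object is fresh.

    stale_intervals: non-overlapping (a, b) pairs inside [start, end]
    during which the cached copy is out of date.
    """
    stale = sum(b - a for a, b in stale_intervals)
    return Fraction(end - start - stale, end - start)

def weighted_freshness(objects, start, end):
    """Popularity-weighted average of per-object freshness (assumed normalization)."""
    total_pop = sum(pop for pop, _ in objects.values())
    return sum(
        Fraction(pop, total_pop) * object_freshness(stale, start, end)
        for pop, stale in objects.values()
    )

# Hypothetical objects: name -> (popularity, stale intervals within [0, 10]).
objects = {
    "home.html": (6, [(0, 2)]),   # popular, stale for 2 of 10 time units
    "news.html": (3, [(4, 9)]),   # stale for 5 of 10 time units
    "about.html": (1, []),        # always fresh
}
print(weighted_freshness(objects, 0, 10))  # prints 73/100
```

Dividing by the total popularity keeps the result in [0, 1]; if popularities are already normalized to sum to 1, that division is a no-op.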
Maintain best-effort coherency • Devise a sequence of update disseminations that maximizes F(D,T) • Hence, “best-effort” cache coherence maintenance is a non-preemptive scheduling problem
FIFO scheduling • Assume sufficient • network resources • processing resources • Use FIFO (First-Come-First-Served) scheduling • Visualize the scheduling problem with 2-dimensional Gantt charts (Goemans & Williamson, SIAM Journal on Discrete Mathematics’00)
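Why the area in a 2-D Gantt chart measures divergence, as a sketch under stated assumptions: all refreshes are pending at time 0 and are sent one at a time in the order π(1), ..., π(n); refresh π(k) has cost p_{π(k)} and popularity f_{π(k)}; and an object counts as stale until its refresh completes at time C. While π(k) is being sent, the height of the staircase is the total popularity of the refreshes not yet completed, so swapping the order of summation gives:

```latex
\text{Area}
  = \sum_{k=1}^{n} p_{\pi(k)} \sum_{j=k}^{n} f_{\pi(j)}
  = \sum_{j=1}^{n} f_{\pi(j)} \sum_{k=1}^{j} p_{\pi(k)}
  = \sum_{j=1}^{n} f_{\pi(j)}\, C_{\pi(j)},
\qquad
C_{\pi(j)} = \sum_{k=1}^{j} p_{\pi(k)} .
```

Maximizing popularity-weighted freshness therefore amounts to minimizing total weighted completion time on a single machine. If the thick line instead joins the rectangles' corners diagonally, the area under it is smaller by (1/2) Σ_j f_j p_j, a term that does not depend on the order, so the same schedule is optimal under either reading.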
Example of updates • Three refreshes are pending in the server's queue, Refresh1, Refresh2 and Refresh3, which arrived in that order
[Figure: 2-D Gantt chart for FIFO, refreshes 1, 2, 3 in arrival order; x-axis: cost, y-axis: popularity] Divergence = 1 - Freshness = area under the thick polygonal line = 64
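The divergence of a given dissemination order can be computed as total popularity times completion time, assuming all refreshes are pending at time 0. The popularity and cost values below are hypothetical (the chart's exact values are not recoverable from this text), so the result differs from the 64 on the slide.

```python
def divergence(order):
    """Area under the 2-D Gantt chart staircase: sum of popularity * completion time.

    order: sequence of (name, popularity, cost) tuples in dissemination order.
    """
    clock = 0
    total = 0
    for _name, popularity, cost in order:
        clock += cost                 # this refresh completes at time `clock`
        total += popularity * clock   # its object was stale until then
    return total

# Hypothetical queue in arrival (FIFO) order: (name, popularity, cost).
queue = [("Refresh1", 2, 4), ("Refresh2", 4, 3), ("Refresh3", 5, 1)]
print(divergence(queue))  # prints 76
```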
[Figure: the same FIFO Gantt chart] Can we do better?
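[Figure: 2-D Gantt chart with the refreshes reordered as 3, 2, 1; x-axis: cost, y-axis: popularity] Yes! Schedule first the refresh with the maximum popularity/cost ratio • Divergence = 1 - Freshness = area under the thick polygonal line = 58 (a reduction of about 9.4%, i.e., roughly 10%, even for this small example)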
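A sketch of the reordering on the same kind of hypothetical queue (values invented for illustration, not the slide's). Sorting by popularity/cost is Smith's ratio rule, which is known to minimize total weighted completion time for independent jobs on one machine. The brute-force check over all orders confirms this for the example.

```python
from itertools import permutations

def divergence(order):
    """Sum of popularity * completion time (area under the 2-D Gantt chart)."""
    clock, total = 0, 0
    for _name, popularity, cost in order:
        clock += cost
        total += popularity * clock
    return total

def largest_slope_order(pending):
    """Disseminate refreshes by decreasing popularity/cost (costs assumed > 0)."""
    return sorted(pending, key=lambda r: r[1] / r[2], reverse=True)

# Hypothetical pending refreshes: (name, popularity, cost), in arrival order.
pending = [("Refresh1", 2, 4), ("Refresh2", 4, 3), ("Refresh3", 5, 1)]

fifo = divergence(pending)
lsr = divergence(largest_slope_order(pending))
best = min(divergence(p) for p in permutations(pending))
print([name for name, _, _ in largest_slope_order(pending)])  # ['Refresh3', 'Refresh2', 'Refresh1']
print(fifo, lsr, best)  # 76 37 37
```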
Largest Slope Rule (LSR) scheduling • Select for dissemination the update with the largest popularity/cost ratio • This rule can be proved optimal when updates are independent • It is no longer optimal in the presence of dependencies • It remains a very efficient heuristic even when dependencies exist
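With dependencies (for instance, a view that can only be refreshed after the view it is derived from), one plausible way to keep applying the rule is to pick the largest-ratio refresh among those whose prerequisites have already been disseminated. This is an illustrative assumption, not necessarily the exact heuristic evaluated here. The tiny hypothetical instance shows how it can miss the optimum.

```python
from itertools import permutations

def divergence(order, jobs):
    """Sum of popularity * completion time for a sequence of job names."""
    clock, total = 0, 0
    for name in order:
        popularity, cost, _deps = jobs[name]
        clock += cost
        total += popularity * clock
    return total

def greedy_lsr(jobs):
    """Repeatedly schedule the ready refresh with the largest popularity/cost."""
    done, order = set(), []
    while len(order) < len(jobs):
        ready = [n for n, (_, _, deps) in jobs.items()
                 if n not in done and deps <= done]
        nxt = max(ready, key=lambda n: jobs[n][0] / jobs[n][1])
        order.append(nxt)
        done.add(nxt)
    return order

def feasible(order, jobs):
    """True if every job appears after all of its prerequisites."""
    seen = set()
    for name in order:
        if not jobs[name][2] <= seen:
            return False
        seen.add(name)
    return True

# Hypothetical jobs: name -> (popularity, cost, set of prerequisite names).
jobs = {
    "A": (1, 1, set()),
    "B": (1, 10, set()),
    "C": (100, 1, {"B"}),   # C can only be sent after B
}
greedy = greedy_lsr(jobs)
best = min((p for p in permutations(jobs) if feasible(p, jobs)),
           key=lambda p: divergence(p, jobs))
print(greedy, divergence(greedy, jobs))        # ['A', 'B', 'C'] 1212
print(list(best), divergence(best, jobs))      # ['B', 'C', 'A'] 1122
```

Here the greedy rule postpones B because its own ratio is low, even though B unlocks the very popular C.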
Simulated System Hardware [Figure: Parasol node MasterCDN with CPU:0, CPU:1 and CPU:2, connected through Parasol network links and routers/gateways to CDN server 1, CDN server 2, ..., CDN server n]
Simulated System Model [Figure: the Master CDN contains the Scheduler algorithm, ViewUpdater, Dispatcher and DBMS, which exchange relation updates, DB updates and requests for view updates (numbered steps 1 to 6); the CDN1, CDN2, ..., CDNn updaters serve edge servers CDN1, CDN2, ..., CDNn]
masterCDN components [Figure: node MasterCDN. Relation updates enter the Rel. Queue; the Scheduler algorithm maintains the pool of views to be updated; the DBMS, ViewUpdater and Dispatcher run on CPU:0, CPU:1 and CPU:2; the Dispatcher fills one pool of views to transmit for each of CDN1updater, CDN2updater, ..., CDNnupdater]
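A toy walk through these components, based on our reading of the diagram. Relation updates are drained from a relation queue. A ViewUpdater step finds every view reachable in a view derivation hierarchy. A dispatcher step copies each view into one transmit pool per CDN updater. The hierarchy, names and function names are all hypothetical, and the scheduler is left out: the pool is kept in discovery order, whereas a real scheduler would reorder it (for example with LSR) before dispatch.

```python
from collections import deque

# Hypothetical view derivation hierarchy: source -> views derived directly from it.
DERIVES = {
    "R1": ["V1", "V2"],
    "R2": ["V2"],
    "V2": ["V3"],
}

def affected_views(relation):
    """ViewUpdater step: every view reachable from `relation` (breadth-first)."""
    found, frontier = [], deque(DERIVES.get(relation, []))
    while frontier:
        view = frontier.popleft()
        if view not in found:
            found.append(view)
            frontier.extend(DERIVES.get(view, []))
    return found

def dispatch(views, n_cdns):
    """Dispatcher step: copy each refreshed view into every CDN updater's pool."""
    pools = {f"CDN{i}": deque() for i in range(1, n_cdns + 1)}
    for view in views:
        for pool in pools.values():
            pool.append(view)
    return pools

relation_queue = deque(["R1", "R2"])  # pending relation updates (Rel. Queue)
to_update = []                         # pool of views to be updated
while relation_queue:
    for view in affected_views(relation_queue.popleft()):
        if view not in to_update:
            to_update.append(view)
pools = dispatch(to_update, n_cdns=3)
print(to_update)  # ['V1', 'V2', 'V3']
print({name: list(pool) for name, pool in pools.items()})
```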
Methodology • Synthetic setup (a sample CDN with 10 edge servers) • Synthetic data generator • Models network nodes, network bandwidth, document sizes, relations, views, the view derivation hierarchy, update rates and popularity • Examine the impact of: • update rate • number of relations
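A toy generator in the spirit of this methodology. The Zipf-shaped popularity and the `density` parameter are assumptions, not the authors' generator, and so are all names. `density` is read here as the probability that a view derives from any given earlier relation or view, one possible meaning of the "dep_density" varied in the experiments.

```python
import random

def zipf_popularity(n, theta=1.0):
    """Zipf-like popularity: the k-th most popular view gets weight 1 / k**theta."""
    weights = [1 / k ** theta for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def random_hierarchy(n_relations, n_views, density, rng):
    """Random view derivation DAG: each view derives from relations or earlier views."""
    sources = [f"R{i}" for i in range(n_relations)]
    parents = {}
    for v in range(n_views):
        name = f"V{v}"
        picks = [s for s in sources if rng.random() < density]
        parents[name] = picks or [rng.choice(sources)]  # every view needs a source
        sources.append(name)  # later views may derive from this one, keeping it acyclic
    return parents

rng = random.Random(42)
hierarchy = random_hierarchy(n_relations=100, n_views=50, density=0.02, rng=rng)
popularity = zipf_popularity(len(hierarchy))
```

Raising `density` produces the denser dependency graphs under which, per the earlier slide, LSR is only a heuristic.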
Freshness vs. (#Rel, dep_density) [Figure: four panels. Top: 100 relations; bottom: 500 relations; left: sparse dependencies; right: dense dependencies]
Conclusions & Future work • Conclusions • we proposed a best-effort cache coherence maintenance scheme for the edge servers of a CDN • it is a purely push-based dissemination method • the scheme is based on the LSR scheduling algorithm • we presented preliminary results that indicate its efficiency • Future work • Organize the edge servers into a (possibly deep) hierarchy, so as to parallelize update dissemination
References • L. Bright and L. Raschid, Using Latency-Recency Profiles for Data Delivery on the Web, Proc. of the VLDB, pp. 550-561, 2002. • J. Challenger, A. Iyengar and P. Dantzig, A Scalable System for Consistently Caching Dynamic Web Data, Proc. of the IEEE INFOCOM, 1999. • J. Cho and H. Garcia-Molina, Synchronizing a Database to Improve Freshness, Proc. of the ACM SIGMOD, pp. 117-128, 2000. • E. Cohen and H. Kaplan, Refreshment Policies for Web Content Caches, Computer Networks, 38(6), pp. 795-808, 2002. • A. Gal and J. Eckstein, Managing Periodically Updated Data in Relational Databases: A Stochastic Modeling Approach, Journal of the ACM, 48(6), pp. 1141-1183, 2001. • M.X. Goemans and D.P. Williamson, Two-Dimensional Gantt Charts and a Scheduling Algorithm of Lawler, SIAM Journal on Discrete Mathematics, 13(3), pp. 281-294, 2000. • D. Katsaros and Y. Manolopoulos, Caching in Web Memory Hierarchies, Proc. of the ACM SAC, 2004. • A. Labrinidis and N. Roussopoulos, Update Propagation Strategies for Improving the Quality of Data on the Web, Proc. of the VLDB, 2001. • A. Nanopoulos, D. Katsaros and Y. Manolopoulos, A Data Mining Algorithm for Generalized Web Prefetching, IEEE Trans. on Knowledge and Data Engineering, 15(5), pp. 1155-1169, 2003. • C. Olston and J. Widom, Adaptive Precision Setting for Cached Approximate Values, Proc. of the ACM SIGMOD, pp. 355-366, 2001. • C. Olston and J. Widom, Best-Effort Cache Synchronization with Source Cooperation, Proc. of the ACM SIGMOD, pp. 73-84, 2002. • J.W. Wang, D. Evans and M. Kwok, On Staleness and the Delivery of Web Pages, Information Systems Frontiers, 5(2), pp. 129-136, 2003.
Contact information Sidiropoulos Antonis Dept. of Informatics Aristotle University Thessaloniki, 54124, Greece asidirop@csd.auth.gr http://users.auth.gr/~asidirop Katsaros Dimitrios Dept. of Informatics Aristotle University Thessaloniki, 54124, Greece dkatsaro@csd.auth.gr http://skyblue.csd.auth.gr