Mining Web Logs for Prediction Models in WWW Caching and Prefetching. Qiang Yang, Haining Henry Zhang, Carolina Ruiz. Professor: Wan-Shiou Yang. Presenter: He-Min Chu. Date: 2005/11/04
Outline • Introduction • Previous Work In Proxy caching And Prefetching • Building Association-based Prediction Models • Experimental Results • Integrated Predictive Caching And Prefetching • Conclusions And Future Work
Introduction • The WWW is growing fast, so researchers need ways to contain network traffic --> web caching • Performance improvement strategies • Web caching maintains a small but highly efficient set of retrieved results in a cache. • Prefetching retrieves documents that are highly likely to be requested in the near future.
Introduction (Con.) • Many web servers keep a server access log of their users. • Logs can be used to train a prediction model for future document accesses. • Obtain frequent access patterns from web logs and mine association rules for path prediction. • Integrate the association-based prediction model into proxy caching and prefetching algorithms.
Previous Work In Proxy caching And Prefetching • "Page replacement policy": decides which existing page a new page will replace. • Rank objects by a key value computed from factors such as size, frequency and cost. When a replacement is to be made, lower-ranked objects are evicted from the cache. • Ex. GDSF: K(p) = L + F(p) * C(p) / S(p), where L is an inflation (aging) value, F(p) is the access frequency of page p, C(p) is the cost of fetching it and S(p) is its size.
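The GDSF ranking can be illustrated with a minimal Python sketch. The class name `GDSFCache`, the `access` method, and the default cost of 1.0 are assumptions made for illustration, not part of the slides. The sketch follows the usual GDSF convention of setting L to the key of the most recently evicted object.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int          # S(p), in bytes
    cost: float        # C(p), cost of fetching the object
    freq: int = 0      # F(p), accesses seen while the object is cached
    key: float = 0.0   # K(p), the ranking key


class GDSFCache:
    """Illustrative GDSF cache (hypothetical class, not from the slides)."""

    def __init__(self, capacity):
        self.capacity = capacity   # total bytes available
        self.used = 0              # bytes currently occupied
        self.inflation = 0.0       # L, raised to the key of each evicted object
        self.entries = {}          # url -> Entry

    def access(self, url, size, cost=1.0):
        """Record one request; return True on a hit, False on a miss."""
        entry = self.entries.get(url)
        hit = entry is not None
        if not hit:
            if size > self.capacity:
                return False       # object can never fit, so it is not cached
            # Evict the lowest-ranked objects until the new one fits.
            while self.used + size > self.capacity:
                victim = min(self.entries, key=lambda u: self.entries[u].key)
                self.inflation = self.entries[victim].key
                self.used -= self.entries[victim].size
                del self.entries[victim]
            entry = Entry(size=size, cost=cost)
            self.entries[url] = entry
            self.used += size
        entry.freq += 1
        # K(p) = L + F(p) * C(p) / S(p)
        entry.key = self.inflation + entry.freq * entry.cost / entry.size
        return hit
```

Because L only grows, objects that were popular long ago eventually rank below newly inserted ones, which is how GDSF ages out stale entries.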
Previous Work In Proxy caching And Prefetching (Con.) • Previous work • Prefetching popular documents • Prefetching the pages referenced by hyperlinks • Considering the access frequency of the hyperlinks • This work • Extracts useful knowledge from large-scale web logs and applies it to web caching and prefetching.
Building Association-based Prediction Models • Extracting Embedded Objects • such as images, audio and video files • Mining Frequent Sequences • accumulate the occurrence counts of sequences and prune sequences whose support is below the minimum support
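The counting and pruning step might look like the following Python sketch. It assumes that a "sequence" is a contiguous run of page requests within one user session (an n-gram), that embedded objects have already been filtered out, and that support is the raw occurrence count. The function name and the parameters `max_len` and `min_support` are hypothetical.

```python
from collections import Counter


def mine_frequent_sequences(sessions, max_len=3, min_support=2):
    """Count contiguous request sequences and keep the frequent ones.

    sessions: list of sessions, each a list of page URLs in request order.
    Returns a dict mapping each frequent sequence (a tuple) to its count.
    """
    counts = Counter()
    for session in sessions:
        for n in range(1, max_len + 1):
            # Slide a window of length n over the session.
            for i in range(len(session) - n + 1):
                counts[tuple(session[i:i + n])] += 1
    # Prune sequences whose support is below the minimum support.
    return {seq: c for seq, c in counts.items() if c >= min_support}
```

For example, with the sessions `[["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]` and the defaults above, the sequences `("A", "B")` and `("B", "C")` survive with support 2, while `("B", "D")` and all length-3 sequences are pruned.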
Building Association-based Prediction Models (Con.) • Constructing Association Rules • S_1 S_2 … S_{k-1} -> S_k (conf) • S_1 S_2 … S_{k-1} -> O_i (conf)
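A rule of the first form can be derived from sequence counts as in the sketch below. It assumes the standard association-rule definition of confidence, conf = support(S_1 … S_k) / support(S_1 … S_{k-1}); the slides do not spell the definition out. The function name and the `min_conf` threshold are hypothetical.

```python
def build_rules(freq, min_conf=0.5):
    """Turn frequent-sequence counts into rules S_1 ... S_{k-1} -> S_k.

    freq: dict mapping a sequence (tuple of URLs) to its support count.
    Returns a dict mapping (lhs, rhs) to the rule's confidence.
    """
    rules = {}
    for seq, support in freq.items():
        if len(seq) < 2:
            continue                      # a rule needs a non-empty left-hand side
        lhs, rhs = seq[:-1], seq[-1]
        lhs_support = freq.get(lhs)
        if not lhs_support:
            continue                      # prefix was pruned or never counted
        conf = support / lhs_support      # assumed definition of confidence
        if conf >= min_conf:
            rules[(lhs, rhs)] = conf
    return rules
```

With counts `{("A",): 2, ("B",): 3, ("A", "B"): 2, ("B", "C"): 2}`, this yields A -> B with confidence 1.0 and B -> C with confidence 2/3.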
Building Association-based Prediction Models (Con.) • Prediction Algorithm
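One plausible way to apply the rules at request time is sketched below. It assumes that a rule fires when its left-hand side matches the most recent requests of the current session, and that the confidences of all firing rules are summed per predicted document to give a weight W. Both the matching scheme and the summation are assumptions, and `predict_weights` is a hypothetical name.

```python
from collections import defaultdict


def predict_weights(recent, rules):
    """Estimate how likely each document is to be requested next.

    recent: list of URLs most recently requested in the current session.
    rules:  dict mapping (lhs_tuple, rhs_url) to a confidence value.
    Returns a dict mapping each predicted URL to its accumulated weight W.
    """
    weights = defaultdict(float)
    for (lhs, rhs), conf in rules.items():
        n = len(lhs)
        # A rule fires when its left-hand side equals the last n requests.
        if 0 < n <= len(recent) and tuple(recent[-n:]) == lhs:
            weights[rhs] += conf
    return dict(weights)
```

Summing confidences rewards documents predicted by several rules of different lengths; taking only the longest matching rule would be an equally reasonable design.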
Experimental Results • W(p): predicted future access frequency of page p • Rank key value: K(p) = L + (W(p) + F(p)) * C(p) / S(p)
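The effect of the extra term can be checked with a small worked example in Python. The function name `rank_key` and the sample numbers are illustrative assumptions, not values from the experiments.

```python
def rank_key(inflation, freq, cost, size, predicted=0.0):
    """K(p) = L + (W(p) + F(p)) * C(p) / S(p); predicted=0 gives plain GDSF."""
    return inflation + (predicted + freq) * cost / size


# Two pages with identical history: F = 2, C = 1, S = 1000 bytes, L = 0.
plain = rank_key(0.0, freq=2, cost=1.0, size=1000)
# The second page is also predicted to be requested soon (W = 1.5).
boosted = rank_key(0.0, freq=2, cost=1.0, size=1000, predicted=1.5)

# The predicted page ranks higher, so it is evicted later.
print(plain, boosted)
```

Setting W(p) to zero recovers the original GDSF key, so the predictive policy behaves exactly like GDSF for pages that no rule predicts.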
Experimental Results (Con.) • Log data sources • EPA: 24 hours • NASA: 17 days • GDSF • Hit ratio: access_hits / access_times • Byte hit ratio: hit_bytes / access_bytes
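The two metrics can be computed from a replayed trace as in the following sketch. The record layout (object size in bytes, plus a hit flag) and the function name `hit_ratios` are assumptions for illustration; the variable names mirror those on the slide.

```python
def hit_ratios(records):
    """Compute (hit ratio, byte hit ratio) from a replayed trace.

    records: iterable of (size_in_bytes, was_hit) pairs, one per request.
    """
    access_times = access_hits = 0
    access_bytes = hit_bytes = 0
    for size, was_hit in records:
        access_times += 1
        access_bytes += size
        if was_hit:
            access_hits += 1
            hit_bytes += size
    if access_times == 0 or access_bytes == 0:
        return 0.0, 0.0   # empty trace: report zero rather than divide by zero
    return access_hits / access_times, hit_bytes / access_bytes
```

The two numbers can diverge sharply: a cache that mostly hits on small objects has a high hit ratio but a low byte hit ratio.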
Experimental Results (Con.) • The n-gram-based algorithm outperforms the other algorithms at all of the selected cache sizes. • Users' access patterns are much more stable over this extended period of time.
Integrated Predictive Caching And Prefetching • The hit rate and byte hit rate grow more slowly than the cache size. • Trade the minor hit rate loss in caching for the greater reduction of network latency from prefetching. • Almost all prefetching methods require a prediction model -> n-gram model
Integrated Predictive Caching And Prefetching (Con.) • Partition memory • cache-buffer • prefetch-buffer • A prefetching agent keeps pre-loading the prefetch-buffer with the documents predicted to have the highest Wi. • If a hit occurs in the prefetch-buffer, the requested object is moved into the cache-buffer according to the original replacement algorithm.
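The partitioned design can be sketched in Python as follows. To stay short, the sketch makes several simplifying assumptions that are not in the slides: both buffers are sized in object slots rather than bytes, the cache-buffer holds at least one slot, and the caller supplies each object's replacement key. The class and method names are hypothetical.

```python
class PartitionedCache:
    """Illustrative cache-buffer plus prefetch-buffer (hypothetical class)."""

    def __init__(self, cache_slots, prefetch_slots):
        self.cache_slots = cache_slots        # assumed to be at least 1
        self.prefetch_slots = prefetch_slots
        self.cache = {}                       # url -> replacement key K(p)
        self.prefetch = set()                 # urls pre-loaded by the agent

    def preload(self, weights):
        """Refill the prefetch-buffer with the highest-W uncached documents."""
        ranked = sorted(weights, key=weights.get, reverse=True)
        candidates = [url for url in ranked if url not in self.cache]
        self.prefetch = set(candidates[:self.prefetch_slots])

    def request(self, url, key):
        """Serve one request; return 'cache', 'prefetch' or 'miss'."""
        if url in self.cache:
            self.cache[url] = key             # refresh the ranking key
            return "cache"
        source = "prefetch" if url in self.prefetch else "miss"
        self.prefetch.discard(url)
        # Move the object into the cache-buffer, evicting the lowest key if full.
        if self.cache and len(self.cache) >= self.cache_slots:
            victim = min(self.cache, key=self.cache.get)
            del self.cache[victim]
        self.cache[url] = key
        return source
```

A prefetch hit avoids the network fetch at request time, which is where the latency saving comes from, while the cache-buffer keeps using its normal key-based eviction.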
Integrated Predictive Caching And Prefetching (Con.) • Reduce network latency
Integrated Predictive Caching And Prefetching (Con.) • Increase network load
Conclusions And Future Work • Applied association rules mined from web logs to improve the GDSF algorithm. • By integrating path-based prediction into caching and prefetching, it is possible to improve both the hit rate and byte hit rate. • The approach can be extended by taking into account other statistical features such as data transmission rates.