On Filter Effects in Web Caching Hierarchies • Carey Williamson • Department of Computer Science • University of Calgary
Introduction • “The Web is both a blessing and a curse…” • Blessing: • Internet available to the masses • Seamless exchange of information • Curse: • Internet available to the masses • Stress on networks, protocols, servers, users • Motivation: techniques to improve the performance and scalability of the Web
Why is the Web so slow? • Client-side bottlenecks (PC, modem) • Solution: better access technologies • Server-side bottlenecks (busy Web site) • Solution: faster, scalable server designs • Network bottlenecks (Internet congestion) • Solutions: caching, replication; improved protocols for client-server communication
Example of a Web Proxy Cache (figure: Web clients, a proxy server, and Web servers)
Our Previous Work • Evaluation of Canada’s national Web caching infrastructure for CANARIE’s CA*net II backbone • Workload characterization and evaluation of CA*net II Web caching hierarchy (IEEE Network, May/June 2000) • Developed Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching architectures • Developed synthetic Web proxy workload generator called ProWGen [Busari/Williamson INFOCOM 2001]
CA*net II Web Caching Hierarchy (Dec 1998) • Selected measurement points for our traffic analyses; 6-9 months of data from each (figure labels: USask, CANARIE (Ottawa), To NLANR)
Caching Hierarchy Overview • Cache hit ratios (empirically observed) at each proxy level: • Top-Level/International (20-50 GB): 5-10% • National (10-20 GB): 15-20% • Regional/Univ. (5-10 GB): 30-40% • Clients (C) sit below the regional/university proxies
Some Observations on Multi-Level Caching... • Caching hierarchy not very effective, due to a “diminishing returns” effect • Reason: workload characteristics change as you move up the caching hierarchy (due to filtering effects, etc.) • Bigger caches aren’t really the answer • Better caching system design might be...
Research Goals • Develop better understanding of cache filter effects (intuitively, quantitatively) • Try to do something about it! • Idea #1: Try different cache replacement policies at different levels of hierarchy • Idea #2: Try partitioning cache content in overall hierarchy based on size or type to limit replication, etc.
Talk Overview • Background/Motivation • Understanding Cache Filtering Effects • Exploiting Cache Filtering Effects • Summary and Conclusions
Simulation Model (figure: Web clients, lower-level (child) proxy servers, an upper-level (parent) proxy server, and Web servers)
Experimental Methodology • Trace-driven simulation (empirical traces) • Multi-factor experimental design • Cache size • 1 MB to 32 GB • Cache Replacement Policy • Recency-based LRU (currently active docs) • Frequency-based LFU-Aging (popular docs) • Size-based GD-Size (favours smaller docs) • Analyze workload characteristics
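A minimal sketch of how a trace-driven simulation with a pluggable replacement policy could be organized. The names `LRUCache`, `GDSizeCache`, and `simulate` are illustrative assumptions, not the simulator used in this work. The GD-Size variant shown assumes a cost of 1 for every document (which favours small documents); LFU-Aging is not shown.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-capacity cache that evicts the least recently used document."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()  # doc_id -> size, oldest first

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, admit the document."""
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)  # mark as most recently used
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        while self.used + size > self.capacity:
            _, evicted_size = self.docs.popitem(last=False)
            self.used -= evicted_size
        self.docs[doc_id] = size
        self.used += size
        return False


class GDSizeCache:
    """Greedy-Dual-Size with cost 1 (assumed): H = L + 1/size, evict min H."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # the running "L" value
        self.docs = {}  # doc_id -> (H, size)

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, admit the document."""
        if doc_id in self.docs:
            # A hit restores the document's value relative to the current L.
            self.docs[doc_id] = (self.inflation + 1.0 / size, size)
            return True
        if size > self.capacity:
            return False
        while self.used + size > self.capacity:
            victim = min(self.docs, key=lambda d: self.docs[d][0])
            h, victim_size = self.docs.pop(victim)
            self.inflation = h  # L rises to the evicted document's value
            self.used -= victim_size
        self.docs[doc_id] = (self.inflation + 1.0 / size, size)
        self.used += size
        return False


def simulate(cache, trace):
    """Replay (doc_id, size_bytes) requests; return the document hit ratio."""
    hits = sum(cache.access(doc_id, size) for doc_id, size in trace)
    return hits / len(trace)
```

Usage: build a trace as a list of `(doc_id, size_bytes)` pairs, then call `simulate(LRUCache(capacity), trace)` and `simulate(GDSizeCache(capacity), trace)` for each cache size of interest.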
Web Workload Characteristics • “One-timers” (60-70% of docs are requested only once, so caching them is useless!) • Zipf-like document referencing popularity • Heavy-tailed file size distribution (i.e., most files are small, but most bytes are in big files) • Zero correlation between document size and document popularity (debate!) • Temporal locality (temporal correlation between recent past and near-future references) [Mahanti et al. PER 2000]
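As a small illustration of the “one-timer” characteristic, the sketch below computes the fraction of distinct documents in a request stream that are referenced exactly once. The function name `one_timer_fraction` and the toy request list are assumptions for illustration only.

```python
from collections import Counter


def one_timer_fraction(requests):
    """Fraction of distinct documents that are requested exactly once."""
    counts = Counter(requests)
    one_timers = sum(1 for c in counts.values() if c == 1)
    return one_timers / len(counts)


# Toy stream: "a" is requested twice; "b", "c", and "d" are one-timers.
print(one_timer_fraction(["a", "b", "a", "c", "d"]))  # 0.75
```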
Zipf-Like Referencing • An intrinsic “power-law” relationship in the way that humans organize, access, and use information (e.g., library books, English words in text, movie rentals, Web sites, Web pages, ...) • Plotting item popularity versus relative rank on a log-log scale yields a straight line
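The straight-line property suggests one way to estimate a Zipf slope from a trace: rank documents by request count and fit a least-squares line to log(count) versus log(rank). The sketch below is one simple approach under that assumption; the function name `zipf_slope` is hypothetical, and the fitting procedure used in this work may differ.

```python
import math
from collections import Counter


def zipf_slope(requests):
    """Least-squares slope of log(count) versus log(rank).

    Needs at least two distinct documents in the request stream.
    """
    counts = sorted(Counter(requests).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A stream in which the document at rank r is requested roughly 1000/r times should give a slope close to -1.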
Example: Zipf-Like Document Popularity Profile for UofS Trace
Quiz Time: What do you get AFTER the cache? (figure: candidate popularity profiles (a), (b), (c), (d)) • Answer: (c)
Simulation Results for Input Workload Traces with Different Initial Zipf Slopes
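To explore the filter effect by hand, the sketch below passes a synthetic Zipf-like request stream through a single LRU cache and compares per-document request counts before the cache with counts in the miss stream that a parent cache would see. It assumes unit-size documents, independent requests, and illustrative parameter values; the results in this talk come from empirical traces, and the names `zipf_trace` and `miss_stream` are hypothetical.

```python
import random
from collections import Counter, OrderedDict


def zipf_trace(num_docs, num_requests, alpha, seed=1):
    """Draw independent requests where rank r has weight 1 / r**alpha."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    return rng.choices(range(num_docs), weights=weights, k=num_requests)


def miss_stream(trace, capacity_docs):
    """Pass a trace through an LRU cache; return the requests that miss."""
    cache = OrderedDict()
    misses = []
    for doc in trace:
        if doc in cache:
            cache.move_to_end(doc)  # hit: absorbed by this cache level
            continue
        misses.append(doc)  # miss: forwarded to the next level
        cache[doc] = True
        if len(cache) > capacity_docs:
            cache.popitem(last=False)  # evict least recently used
    return misses


trace = zipf_trace(num_docs=1000, num_requests=50000, alpha=0.8)
filtered = miss_stream(trace, capacity_docs=100)

before = Counter(trace)
after = Counter(filtered)
# Compare request counts for the most popular documents before/after the cache.
for doc, count in before.most_common(5):
    print(doc, count, after[doc])
```

Varying `alpha` and `capacity_docs` shows how the shape of the miss stream depends on the input slope and the cache size.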
Research Questions: Multi-Level Caches • In a multi-level caching hierarchy, can overall caching performance be improved by using different cache replacement policies at different levels of the hierarchy? • In a multi-level caching hierarchy, can overall performance be improved by keeping disjoint document sets at each level of the hierarchy?
Simulation Model (figure: Web clients, lower-level (child) proxy servers, an upper-level (parent) proxy server, and Web servers; child workloads with complete overlap, partial overlap (50%), or no overlap)
Performance Metrics • Document Hit Ratio • Percent of requested docs found in cache (HR) • Byte Hit Ratio • Percent of requested bytes found in cache (BHR)
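A minimal sketch of how the two metrics can be computed from a per-request log. The log format, a list of `(size_bytes, was_hit)` pairs, and the function name `hit_ratios` are assumptions for illustration.

```python
def hit_ratios(log):
    """Return (HR, BHR) in percent for (size_bytes, was_hit) request records."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in log:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    return 100.0 * hits / requests, 100.0 * hit_bytes / total_bytes


# Three small hits and one large miss: high HR, low BHR.
log = [(1000, True), (1000, True), (1000, True), (97000, False)]
hr, bhr = hit_ratios(log)
print(hr, bhr)  # 75.0 3.0
```

The toy log shows why the two metrics can diverge when file sizes are heavy-tailed: a cache can serve most requests while serving only a small share of the bytes.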
Experiment 1: Different Policies at Different Levels of the Hierarchy (figure panels: (a) Hit Ratio, (b) Byte Hit Ratio; results shown for children and parent)
Experiment 2: Sensitivity to Workload Overlap • The greater the degree of workload overlap amongst the child proxies, the greater the role for the parent cache • In the “no overlap” scenario, the parent cache has negligible hit ratios, particularly when child caches are large
Experiment 3: Size-based Partitioning • Partition files across the two levels of the hierarchy based on size (e.g., keep small files at the lower level and large files at the upper level, or vice versa) • Three size thresholds for “small”... • 5,000 bytes • 10,000 bytes • 100,000 bytes
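The partitioning rule can be sketched as a function that decides which level is allowed to cache a file. The function name `caching_level` is hypothetical, and treating a file of exactly the threshold size as “small” is an assumption; the slides do not say which side of the boundary is used.

```python
SMALL_THRESHOLDS = (5_000, 10_000, 100_000)  # bytes, the three thresholds tried


def caching_level(size_bytes, threshold, small_at_lower=True):
    """Return 'lower' or 'upper': the only level allowed to cache this file.

    Assumption: a file counts as small when size_bytes <= threshold.
    """
    is_small = size_bytes <= threshold
    if is_small == small_at_lower:
        return "lower"
    return "upper"


# Small files at the lower level, large files at the upper level.
print(caching_level(4_000, threshold=5_000))    # lower
print(caching_level(20_000, threshold=10_000))  # upper
# Reversed policy: large files at the lower level.
print(caching_level(20_000, threshold=10_000, small_at_lower=False))  # lower
```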
Small files at the lower level; large files at the upper level (figure: results for children and parent at size thresholds of 5,000 bytes and 10,000 bytes)
Large files at the lower level; small files at the upper level (figure: results for children and parent at size thresholds of 5,000 bytes and 10,000 bytes)
Summary: Multi-Level Caches • Different policies at different levels • LRU/LFU-Aging at the lower level + GD-Size at the upper level provided improvement in performance • GD-Size + GD-Size provided better performance in hit ratio, but with some penalty in byte hit ratio • Size-threshold approach • Small files at the lower level + large files at the upper level provided improvement in performance • Reversing this policy offered no performance advantage
Conclusions • Existing multi-level caching hierarchies are not always that effective, due to cache filtering effects • “Heterogeneous” caching architectures may better exploit workload characteristics and improve Web caching performance
For More Information... • M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, Dept of Computer Science, U. Saskatchewan, June 2000 • C. Williamson, “On Filter Effects in Web Caching Hierarchies”, ACM Transactions on Internet Technology, 2002 (to appear). • Email: carey@cpsc.ucalgary.ca • http://www.cpsc.ucalgary.ca/~carey/