
On the Sensitivity of Web Proxy Cache Performance to Workload Characteristics

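This presentation examines how sensitive Web proxy cache performance is to workload characteristics such as one-time referencing, Zipf slope, heavy-tailed file sizes, correlation between size and popularity, and temporal locality. It introduces ProWGen, a synthetic Web proxy workload generator, and uses it in a simulation study of single-level proxy caches.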





Presentation Transcript


  1. On the Sensitivity of Web Proxy Cache Performance to Workload Characteristics • Mudashiru Busari • Carey Williamson • Department of Computer Science, University of Saskatchewan

  2. Talk Outline • Introduction and Motivation • ProWGen: Proxy Workload Generator • Tool for Synthetic Web Proxy Workloads • Simulation Study • Simulation Evaluation of Web Proxy Caches • Conclusions and Future Work

  3. Introduction • “The Web is both a blessing and a curse…” • Blessing: • Internet available to the masses • Seamless exchange of information • Curse: • Internet available to the masses • Stress on networks, protocols, servers, users • Motivation: techniques to improve the performance and scalability of the Web

  4. Why is the Web so slow? • Client-side bottlenecks (PC, modem) • Solution: better access technologies • Server-side bottlenecks (busy Web site) • Solution: faster, scalable server designs • Network bottlenecks (Internet congestion) • Solutions: caching, replication; improved protocols for client-server communication

  5. Our Previous Work • Evaluation of Canada’s national Web caching infrastructure for CANARIE’s CA*net II backbone • Workload characterization and evaluation of CA*net II Web caching hierarchy (IEEE Network, May/June 2000) • Developed Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching architectures

  6. CA*net II Web Caching Hierarchy (Dec 1998) • Selected measurement points for our traffic analyses; 3-6 months of data from each • [Figure labels: USask; CANARIE (Ottawa); To NLANR]

  7. Caching Hierarchy Overview • Cache hit ratios (empirically observed) • Top-Level/International proxy (20-50 GB): 5-10% • National proxies (10-20 GB): 15-20% • Regional/Univ. proxies (5-10 GB): 30-40% • [Figure: tree of proxies with C nodes at the bottom level]

  8. Overview of This Paper • Constructed a synthetic Web proxy workload generation tool (ProWGen) that captures the salient characteristics of empirical Web proxy workloads • Used ProWGen to evaluate the sensitivity of proxy caches to selected Web proxy workload characteristics

  9. Research Methodology • Design, construction, and parameterization of aggregate workload models, based on empirical traces (Web proxy access logs) • Validation of ProWGen (statistically, and versus empirical workloads) • Simulation evaluation of single-level caches • Sensitivity to workload characteristics • Effect of cache size • Effect of cache replacement policy

  10. ProWGen: Key Workload Characteristics • “One-timers” (60-70% of docs are referenced only once, so caching them is useless) • Zipf-like document referencing popularity • Heavy-tailed file size distribution (i.e., most files are small, but most bytes are in big files) • Correlations (if any) between document size and document popularity (debate!) • Temporal locality (temporal correlation between recent past and near future references) [Mahanti et al. Perf.Eval. 2000]
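To make the "Zipf-like popularity" bullet concrete, here is a minimal sketch, assuming that popularity follows P(r) proportional to r^(-slope) for popularity rank r. The function names `zipf_probabilities` and `top_share` are hypothetical and are not part of ProWGen.

```python
def zipf_probabilities(num_docs, slope):
    """Return P(r) for ranks r = 1..num_docs, with P(r) proportional to r**(-slope)."""
    weights = [rank ** (-slope) for rank in range(1, num_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(probabilities, fraction):
    """Share of all references that go to the most popular `fraction` of documents."""
    cutoff = max(1, int(len(probabilities) * fraction))
    return sum(probabilities[:cutoff])


if __name__ == "__main__":
    # Illustrative values only: compare a shallow and a steep slope.
    for slope in (0.6, 0.8, 1.0):
        probs = zipf_probabilities(10_000, slope)
        print(slope, round(top_share(probs, 0.10), 3))
```

A steeper slope concentrates more references on the most popular documents, which is why the slope matters for caching.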

  11. ProWGen (Conceptual View) • [Figure labels: Input Parameters (1, Z, a, c, L); ProWGen Software; Synthetic Workload]

  12. ProWGen (Conceptual View) • [Figure build: adds a Zipf plot (P versus r)]

  13. ProWGen (Conceptual View) • [Figure build: Zipf plot (P versus r)]

  14. ProWGen (Conceptual View) • [Figure build: adds an LLCD plot (F versus s) next to the Zipf plot]

  15. ProWGen (Conceptual View) • [Figure build: adds a Correlation scale (-1, 0, +1) next to the Zipf and LLCD plots]

  16. ProWGen: Workload Modeling Details • Modeled workload characteristics • One-time referencing • Zipf-like referencing behaviour (Zipf’s Law) • File size distribution • Body – lognormal distribution • Tail – Pareto Distribution • Correlation between file size and popularity • Temporal locality • Static probabilities in finite-size LRU stack model • Dynamic probabilities in finite-size LRU stack model
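The file size model above (lognormal body, Pareto tail) can be sketched as follows. This is an illustration under stated assumptions, not ProWGen's actual code: the slides do not say how the body and tail are joined, so this sketch simply picks the tail with a fixed probability and redraws body samples that land at or beyond the start of the tail. The names `lognormal_params` and `sample_file_size` are hypothetical.

```python
import math
import random


def lognormal_params(mean, std):
    """Convert the mean/std of a lognormal variable to (mu, sigma) of the underlying normal."""
    sigma_sq = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - sigma_sq / 2.0, math.sqrt(sigma_sq)


def sample_file_size(rng, mean, std, tail_start, tail_index, tail_fraction):
    """Draw one file size in bytes from a lognormal body with a Pareto tail."""
    if rng.random() < tail_fraction:
        # Pareto tail: paretovariate returns values >= 1, so sizes are >= tail_start.
        return tail_start * rng.paretovariate(tail_index)
    mu, sigma = lognormal_params(mean, std)
    while True:
        # Body: redraw until the size falls below the start of the tail (assumption).
        size = rng.lognormvariate(mu, sigma)
        if size < tail_start:
            return size


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [sample_file_size(rng, 7000, 11000, 10000, 1.322, 0.22) for _ in range(5)]
    print([round(s) for s in sizes])
```

Note that truncating the body shifts its mean below the nominal value, so a real generator would need to account for that when matching an empirical trace.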

  17. Validation of ProWGen • To establish that the synthetic workloads possess the desired characteristics (quantitative and qualitative), and that the characteristics are similar to those in empirical workloads • Example: analyze 5 million requests from a proxy server trace and parameterize ProWGen to generate a similar workload

  18. Workload Synthesis (Parameter: Value) • Total number of requests: 5,000,000 • Unique documents (of total requests): 34% • One-timers (of unique documents): 72% • Zipf slope: 0.807 • Tail index: 1.322 • Documents in the tail: 22% • Beginning of the tail (bytes): 10,000 • Mean of the lognormal file size distribution: 7,000 • Standard deviation: 11,000 • Correlation between file size and popularity: Zero • LRU stack model for temporal locality: Static and Dynamic • LRU stack size: 1,000
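As a worked reading of the percentages on this slide, the sketch below derives absolute counts from them. It also computes an upper bound on the document hit ratio, under the assumption (not stated on the slide) that an unbounded cache misses only on the first request to each document and that documents never change. The function name `derived_counts` is hypothetical.

```python
def derived_counts(total_requests, unique_fraction, one_timer_fraction):
    """Turn the slide's percentages into absolute counts and a hit-ratio bound."""
    unique_docs = round(total_requests * unique_fraction)
    one_timers = round(unique_docs * one_timer_fraction)
    # With an unbounded cache, only the first request to each document misses.
    max_hit_ratio = 1.0 - unique_docs / total_requests
    return unique_docs, one_timers, max_hit_ratio


if __name__ == "__main__":
    unique_docs, one_timers, bound = derived_counts(5_000_000, 0.34, 0.72)
    print(unique_docs, one_timers, bound)
```

For the values on the slide this gives 1,700,000 unique documents, of which 1,224,000 are one-timers, and a hit ratio that cannot exceed 66% under the stated assumption.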

  19. Zipf-like Referencing Behaviour • Empirical trace: slope = 0.81 • Synthetic trace: slope = 0.83
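One common way to obtain a slope like the ones quoted here is a least-squares fit of log(reference count) against log(popularity rank). The slides do not say which fitting method was used, so the sketch below is an assumption; `zipf_slope` is a hypothetical name. It expects at least two documents with a positive reference count.

```python
import math


def zipf_slope(reference_counts):
    """Least-squares slope of log(count) versus log(rank), returned as a positive number."""
    counts = sorted((c for c in reference_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted line slopes downward; report its magnitude as the Zipf slope.
    return -sxy / sxx
```

Applying the same function to the per-document reference counts of the empirical and the synthetic trace gives two slopes that can be compared directly.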

  20. Transfer Size Distribution • [Figure: curves for references and bytes transferred]

  21. Simulation Evaluation of Single-Level Web Proxy Caches: Some Research Questions • In a single-level proxy cache, how sensitive is Web proxy caching performance to certain workload characteristics (one-timers, Zipf slope, heavy-tail index)? • How does the degree of sensitivity change depending on the cache replacement policy?

  22. Simulation Model • [Figure labels: Web Clients; Aggregate Workload; Proxy server; Web Servers]

  23. Experimental Design: Factors and Levels • Cache size • 1 MB to 32 GB • Cache Replacement Policy • Recency-based LRU • Frequency-based LFU-Aging • Size-based GD-Size • Workload Characteristics • One-timers, Zipf slope, tail index, correlation, temporal locality model
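Of the three policies, the size-based one is the least obvious, so here is a minimal sketch of GreedyDual-Size. It assumes a cost of 1 per document, which favours small documents; the slides do not say which cost function the study used. The class name `GDSizeCache` and its interface are hypothetical, and document sizes are assumed to be positive.

```python
import heapq


class GDSizeCache:
    """GreedyDual-Size with unit cost: priority H = L + 1/size, evict the smallest H."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the running value L, raised on each eviction
        self.entries = {}      # doc_id -> (priority, size)
        self.heap = []         # (priority, doc_id); may contain stale entries

    def _set_priority(self, doc_id, size):
        priority = self.inflation + 1.0 / size
        self.entries[doc_id] = (priority, size)
        heapq.heappush(self.heap, (priority, doc_id))

    def access(self, doc_id, size):
        """Return True on a hit, False on a miss (the document is then cached if it fits)."""
        if doc_id in self.entries:
            # A hit restores the document's priority relative to the current L.
            self._set_priority(doc_id, self.entries[doc_id][1])
            return True
        if size > self.capacity:
            return False
        while self.used + size > self.capacity:
            priority, victim = heapq.heappop(self.heap)
            # Skip heap entries that no longer match the cached priority.
            if victim in self.entries and self.entries[victim][0] == priority:
                self.used -= self.entries[victim][1]
                del self.entries[victim]
                self.inflation = priority
        self._set_priority(doc_id, size)
        self.used += size
        return False
```

Because large documents start with a low priority, they tend to be evicted first, which typically trades byte hit ratio for document hit ratio.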

  24. Performance Metrics • Document Hit Ratio • Percent of requested docs found in cache (HR) • Byte Hit Ratio • Percent of requested bytes found in cache (BHR)
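The two metrics can be computed in a trace-driven simulation along the following lines. This is a simplified sketch that uses LRU replacement and a trace of (document id, size) pairs; it is not the authors' simulator, the name `simulate_lru` is hypothetical, and the trace is assumed to be non-empty.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (doc_id, size) requests through an LRU cache; return (HR, BHR) in percent."""
    cache = OrderedDict()   # doc_id -> size, least recently used first
    used = 0
    hits = hit_bytes = requests = total_bytes = 0
    for doc_id, size in trace:
        requests += 1
        total_bytes += size
        if doc_id in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(doc_id)   # mark as most recently used
            continue
        if size > capacity_bytes:
            continue                    # too large to cache at all
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[doc_id] = size
        used += size
    return 100.0 * hits / requests, 100.0 * hit_bytes / total_bytes
```

HR counts every request equally, while BHR weights each request by its size, so a policy that keeps many small documents can score well on HR and poorly on BHR.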

  25. Simulation Results (Preview) • Cache performance is very sensitive to: • Slope of Zipf-like doc referencing popularity • Temporal locality property • Correlations between size and popularity • Cache performance relatively insensitive to: • One-timers • Tail index of heavy-tailed file size distribution

  26. Sensitivity to One-timers (LRU) • [Figure panels: (a) Doc Hit Ratio; (b) Byte Hit Ratio]

  27. Sensitivity to Zipf Slope (LRU) • Difference of 0.2 in Zipf slope impacts performance by as much as 10-15% in hit ratio and byte hit ratio • [Figure panels: (a) Hit Ratio; (b) Byte Hit Ratio]

  28. Sensitivity to Heavy Tail Index (LRU Replacement Policy) • [Figure panels: (a) Doc Hit Ratio; (b) Byte Hit Ratio]

  29. Sensitivity to Heavy Tail Index (GD-Size Replacement Policy) • Difference of 0.2 in heavy tail index impacts performance by less than 3% • [Figure panels: (a) Hit Ratio; (b) Byte Hit Ratio]

  30. Sensitivity to Correlation (LRU) • [Figure panels: (a) Doc Hit Ratio; (b) Byte Hit Ratio]

  31. Sensitivity to Temporal Locality (LRU) • [Figure panels: (a) Doc Hit Ratio; (b) Byte Hit Ratio]

  32. Summary: Single-Level Caches • Cache performance is sensitive to: • Slope of Zipf-like document referencing popularity (steeper slope implies better caching) • Temporal locality • Correlation between size and popularity • Cache performance is insensitive to: • One-timers • Tail index of heavy-tailed file size distribution

  33. Conclusions • ProWGen is a useful tool for the generation of synthetic Web proxy workloads for the evaluation of Web proxy caches and Web proxy caching architectures • Web proxy cache performance is quite sensitive to Zipf slope, temporal locality, and correlations (if any) between document size and document popularity

  34. Future Work • Extend and improve ProWGen • Request arrival process (timestamps) • File modifications, types, and lifetimes • Web page structure (spatial locality) • Scaling the workload model(s)... • Evaluate multi-level Web proxy caches • Port to network emulation testbed

  35. For More Information... • M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, Dept of Computer Science, U. Saskatchewan, June 2000 • ProWGen tool: http://www.cs.usask.ca/faculty/carey/software/ • Email: carey@cs.usask.ca • Web: http://www.cs.usask.ca/faculty/carey/
