Proxy Caching: Duct Tape of the Internet [Diagram: Origin Server, Proxy Cache, Object Store] Blake Scholl <bscholl@cmu.edu>, Vishal Soni <vsoni@andrew.cmu.edu>
Please feel free to ask questions!
The Internet, circa 1996-1997 [Diagram labels: Election Results, Web Site, ISP, Women in Lingerie, ISP, Web Site. Speech bubbles: “Stupid Website!”, “Where’s my paper Playboy?”]
The Internet, circa 1998-2001 [Diagram labels: Cache, CDN, Starr Report, Web Site, ISP, Contested Election!, ISP, Web Site. Speech bubbles: “Why does Bill get all the women?”, “When is Gore going to give up?”]
The Value of a Proxy • Reduced bandwidth consumption • Reduced access latency • Reduced overload on web servers • Improved reliability • Improved usage data collection
Reverse Proxying (“server accelerator”) [Diagram: Internet, Proxy Cache, Web Server, Proxy Cache]
Forward Proxying [Diagram: Internet, two Proxy Caches, three Web Servers]
• There are billions of unique pages on the Internet, totaling at least terabytes of data. • The total amount of data on the Internet is growing rapidly. • A proxy can hope to store only gigabytes of data. • How can forward caching ever work? Page requests are heavy-tailed: a small set of popular pages accounts for a large share of all requests. AOL’s caches see around 40% hit rates!
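Why a small cache can still see a useful hit rate under heavy-tailed demand can be sketched with a toy simulation. This is an illustration under stated assumptions, not a model of AOL's caches: request popularity is assumed to follow a Zipf-like law (weight 1/rank^alpha), the cache is assumed to use LRU replacement with equal-sized objects, and the names `zipf_cum_weights` and `lru_hit_rate` are hypothetical.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_cum_weights(num_objects, alpha=1.0):
    """Cumulative Zipf-like weights: the object of rank r gets weight 1 / r**alpha."""
    return list(accumulate(1.0 / (rank ** alpha)
                           for rank in range(1, num_objects + 1)))


def lru_hit_rate(num_objects, cache_slots, num_requests, alpha=1.0, seed=0):
    """Replay a synthetic request stream through an LRU cache; return the hit rate."""
    rng = random.Random(seed)
    requests = rng.choices(range(num_objects),
                           cum_weights=zipf_cum_weights(num_objects, alpha),
                           k=num_requests)
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for obj in requests:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)         # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least recently used object
    return hits / num_requests


if __name__ == "__main__":
    # The cache holds only 1% of the objects in both runs.
    print("skewed demand :", lru_hit_rate(5000, 50, 20000, alpha=1.0))
    print("uniform demand:", lru_hit_rate(5000, 50, 20000, alpha=0.0))
```

With uniform demand (alpha = 0) the hit rate stays near the fraction of objects the cache can hold, while skewed demand lets the same cache serve a much larger share of requests.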
Proxy Jargon [Diagram: Origin Server, Proxy Cache, End User, Object Store]
The Anatomy of a Proxy [Diagram: Origin Server, Proxy Cache, End User, Server Personality, Client Personality, Object Store]
Multilevel Caching [Diagram: Origin Server, End User, and six Proxy Caches]
Arbitrary Graphs of Caches! [Diagram: two Origin Servers and eight Proxy Caches]
Research Questions I • What cache architecture is good? • How do users find caches? • How do caches find upstream caches? • Where should I place the caches? • What data should I cache? • Should my caches cooperate? How?
Research Questions II • Should I prefetch data from servers? • What gets cached and what gets bumped? (placement/replacement) • How do I keep content fresh? (Cache coherency) • How do I manage my caches? • What should I do about dynamic content?
Consequences of a Poor System: • Stale Content • Increased Latency • New failure point • Underestimated statistics • Difficult Administration • Bottleneck (?)
What makes a good cache system? • Fast Access • Robustness • Transparency • Scalability • Efficiency • Adaptivity • Network Stability • Load balancing • Tolerance of heterogeneity • Simplicity
AOL First Hack: Distribute Proxies [Diagram: Origin Server; AT&T, UUNet, Sprint; four Proxies]
An Improvement: Cache Hierarchy [Diagram: Origin Server; proxy tiers labeled Backbone Networks, ISP POPs, and Customer Networks]
Hierarchical Cache Issues • Difficult to place cache servers at network core • Increased latency at each level • Bottleneck at high-level caches (?) • Redundant data storage (?)
Try Again: Distributed Caches [Diagram: Origin Server and five Proxies]
Distributed Caching • Pros: • Load Sharing • Fault Tolerant • Cons: • Higher Connection Times • Higher Bandwidth usage
Even Better: Hybrid Scheme [Diagram: www.news.com, www.cmu.edu, six Proxies]
Even Better: Hybrid Scheme [Same diagram, with a request labeled “http://www.news.com/index.html?”]
Even Better: Hybrid Scheme [Same diagram, with a request labeled “http://www.cmu.edu?”]
Evaluation: Hybrid Caching • The Good: • Improved flexibility • Better load-balancing of hot spots • Shorter Connection Times • The Bad: • Much more complex • Good cache routing / resolution is essential
Cache Resolution & Routing • How do users find caches? • How do caches find other caches? • Goal: • Quick data location and retrieval. • Two Fundamentally Different Approaches: • Configuration/Indirection • Transparent Proxying
Transparent Proxying: Cache Routing via Magic [Diagram: Origin Server, Smarts, Proxy Cache]
Transparent Proxying: Cache Routing via Magic [Diagram: Origin Server, two Smarts, two Proxy Caches]
Transparent Proxying: “Hybrid” Architecture Magically! [Diagram: www.news.com, www.cmu.edu, six Proxies]
Indirect Resolution & Routing • Indirection via DNS, HTTP redirects, or embedded URLs • Common approaches • Grow a caching distribution tree away from each popular server towards sources of high demand. • Resolve requests via a cache routing table or a hash function.
Cache Resolution / Routing • Cache Routing Table • Harvest cache organizes caches in a hierarchy • Adaptive Web Caching uses a mesh of caches • Povey and Harrison scheme • CacheMesh system • Legedza and Guttag
Cache Resolution / Routing • Hashing Function • Cache Array Routing Protocol • Array membership list, URL • Summary Cache • Summary of URLs of cached docs • Karger, Lewin, Leighton, et al. • Consistent hashing (Akamai System)
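The hash-based resolution idea above can be sketched with a small consistent-hashing ring. This is a minimal illustration, not the Akamai system or the Cache Array Routing Protocol: the class name `ConsistentHashRing`, the cache names, the use of SHA-256, and the number of virtual points per cache are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit integer position on the ring (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Each cache owns several points on a ring; a URL belongs to the next point clockwise."""

    def __init__(self, caches, replicas=100):
        self.replicas = replicas  # virtual points per cache (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> cache name
        for cache in caches:
            self.add(cache)

    def add(self, cache):
        for i in range(self.replicas):
            point = _hash(f"{cache}#{i}")
            if point in self._owner:
                continue          # extremely unlikely collision: keep the existing owner
            self._owner[point] = cache
            bisect.insort(self._points, point)

    def remove(self, cache):
        for i in range(self.replicas):
            point = _hash(f"{cache}#{i}")
            if self._owner.get(point) == cache:
                del self._owner[point]
                self._points.remove(point)

    def lookup(self, url):
        if not self._points:
            raise LookupError("no caches in the ring")
        # First point strictly after the URL's position, wrapping around the ring.
        idx = bisect.bisect_right(self._points, _hash(url)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
    print(ring.lookup("http://www.cmu.edu/"))
```

The property that makes this attractive for cache routing is that removing one cache only remaps the URLs that cache owned; every other URL keeps resolving to the same cache.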
Prefetching • Anticipate document requests and preload / prefetch into local cache • Between browser clients and web servers • Traces… • Between proxies and web servers • Pushing… • Between browser clients and proxies
Prefetching • Summary • Browser <-> Server, Proxy <-> Server • Increases WAN traffic • Browser <-> Proxy • Affects traffic only over LANs. • Prefetch based on either popularity or access patterns
Cache Replacement • Traditional replacement • LRU, LFU, Pitkow/Recker • Key-based replacement • Breaking ties… • Size, LRU-Min, LRU-Threshold, Hyper-G, Lowest Latency First • Cost-based replacement • GreedyDual-Size, Hybrid, Lowest Relative Value, Least Normalized Cost, Bolot/Hoschka, SLRU, Server-assisted, Hierarchical GreedyDual
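As one concrete instance of cost-based replacement, here is a hedged sketch of the GreedyDual-Size idea: every object carries a priority H = L + cost/size, the object with the lowest H is evicted, and the running "inflation" value L is raised to the evicted priority so that long-untouched objects age out. The class name `GreedyDualSizeCache`, the default cost of 1.0, and the linear scan for the victim (chosen for clarity over speed) are assumptions of this example.

```python
class GreedyDualSizeCache:
    """Toy GreedyDual-Size cache keyed by URL; sizes are in bytes and must be positive."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # the running value L
        self.entries = {}     # url -> (priority H, size)

    def access(self, url, size, cost=1.0):
        """Record a request for url; return True on a hit, False on a miss."""
        hit = url in self.entries
        if hit:
            size = self.entries[url][1]       # keep the stored size on a hit
        else:
            if size > self.capacity:
                return False                  # object can never fit; do not cache it
            while self.used + size > self.capacity:
                victim = min(self.entries, key=lambda u: self.entries[u][0])
                victim_priority, victim_size = self.entries.pop(victim)
                self.inflation = victim_priority  # raise L to the evicted priority
                self.used -= victim_size
            self.used += size
        # On both hits and insertions the priority is (re)set to L + cost / size.
        self.entries[url] = (self.inflation + cost / size, size)
        return hit


if __name__ == "__main__":
    cache = GreedyDualSizeCache(capacity_bytes=100)
    for url, size in [("/big.iso", 90), ("/logo.gif", 10), ("/index.html", 20)]:
        print(url, "hit" if cache.access(url, size) else "miss")
    print(sorted(cache.entries))
```

With a uniform cost of 1.0 the policy prefers to keep small objects, which favors hit rate; passing the fetch latency or byte count as `cost` shifts the goal toward latency or bandwidth savings.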
Cache Replacement • Oooo… Caveat • Performance of replacement depends on traffic characteristics. No known policy outperforms all others for all types of web access patterns.
Cache Coherency • Stale pages need to be updated. • Web cache coherency issues differ from those in distributed systems • Different access patterns, larger scale, single update location (web servers) • Weak/Strong coherency …
Cache Coherency • Strong coherency • Client validation • Server invalidation • Weak coherency • Adaptive TTL • Piggyback invalidation
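Adaptive TTL, listed above under weak coherency, is commonly described as giving an object a lifetime proportional to how long it had gone unmodified when it was fetched. The sketch below assumes that formulation; the 10% factor, the clamping bounds, and the names `CachedObject`, `adaptive_ttl`, and `is_fresh` are illustrative choices, not values from the slides.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    fetched_at: float     # when the proxy stored its copy (seconds since epoch)
    last_modified: float  # modification time reported by the origin server


def adaptive_ttl(obj, factor=0.1, min_ttl=60.0, max_ttl=86400.0):
    """TTL grows with the object's age at fetch time, clamped to [min_ttl, max_ttl]."""
    age_at_fetch = max(0.0, obj.fetched_at - obj.last_modified)
    return min(max_ttl, max(min_ttl, factor * age_at_fetch))


def is_fresh(obj, now, **ttl_options):
    """True while the copy may be served without revalidating with the origin."""
    return now - obj.fetched_at < adaptive_ttl(obj, **ttl_options)


if __name__ == "__main__":
    old_page = CachedObject(fetched_at=1_000_000.0, last_modified=900_000.0)
    print(adaptive_ttl(old_page))                 # lifetime in seconds
    print(is_fresh(old_page, now=1_005_000.0))
```

Objects that have not changed for a long time are trusted for longer, while recently modified objects are revalidated sooner; the copy can still go stale within its TTL, which is why this counts as weak coherency.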
Caching Contents • Three roles of a cache… • Data cache, connection cache, computation cache. • Dynamic caching • How to make more data cacheable?
User Access Pattern Prediction • Use a client’s access pattern to predict future requests. • Group resources likely to be accessed together. • Use a Prediction by Partial Match (PPM) model to determine which page is likely to be accessed in the near future. • Privacy concerns (?)
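Access-pattern prediction can be illustrated with a first-order Markov predictor, which uses only the current page as context. This is deliberately simpler than a full Prediction by Partial Match model, which combines several context lengths. The class name `NextPagePredictor`, the sample sessions, and the 0.5 confidence threshold are assumptions for the example.

```python
from collections import Counter, defaultdict


class NextPagePredictor:
    """Counts page-to-page transitions and suggests likely next pages to prefetch."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # page -> Counter of following pages

    def train(self, session):
        """Learn from one ordered list of pages requested by a single client."""
        for current, following in zip(session, session[1:]):
            self.transitions[current][following] += 1

    def predict(self, current, threshold=0.5):
        """Return pages whose observed probability of following `current` meets the threshold."""
        counts = self.transitions.get(current)
        if not counts:
            return []
        total = sum(counts.values())
        return [page for page, n in counts.most_common() if n / total >= threshold]


if __name__ == "__main__":
    predictor = NextPagePredictor()
    predictor.train(["/", "/news", "/sports"])
    predictor.train(["/", "/news"])
    predictor.train(["/", "/about"])
    print(predictor.predict("/"))
```

Raising the threshold trades prefetch coverage for less wasted bandwidth, which matters most when the prefetch crosses the WAN rather than the LAN.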
Load Balancing • Eliminate Hot Spots • Use replication to store copies of hot pages/services throughout the Internet. Spread work across several servers.
Aside: Cache Clusters [Diagram: Layer 4+ Switch and five Proxy Caches]
Additional issues • Proxy Placement(?) • Web Traffic characteristics
Conclusion • Alleviate server bottlenecks • Minimize user access latency • Proxy placement – under-researched • Other issues • Dynamic caching, security, fault tolerance • Buzzwords • Scalable, robust, adaptive, stable.