A Survey of Web Caching Schemes for the Internet
Jia Wang
Web Caching Schemes
Agenda
• The World Wide Web
• Problem and solution (caching)
• Proxy servers
• Advantages of web caching
• Disadvantages of web caching
• Elements of a WWW caching system
• Desirable properties of a WWW caching system
• Problems in designing caching systems for the WWW
• Caching architecture
The World Wide Web
• The WWW can be considered a large distributed information system.
• Its size is growing exponentially.
• In May 1999 it included 600 million static web pages.
• It grows by 15% per month.
• It is very popular.
[Figure: size of distinct static web pages]
The World Wide Web
• Usage is relatively inexpensive.
• Accessing information is very fast.
• Documents appeal to a wide range of interests.
• But this popularity leads to:
  • Network congestion
  • Server overloading
Problem
• Internet backbone capacity increases by 60% per year.
• Bandwidth is not growing fast enough to keep up with the growth of the Web.
• Without a solution, the WWW will become too congested and lose its entire appeal.
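A rough back-of-the-envelope check of this gap can be sketched as follows. It uses the two figures quoted in the slides (15% growth per month in static pages, 60% growth per year in backbone capacity). Treating the monthly rate as compounding, and treating page count as a crude proxy for demand, are assumptions made only for illustration.

```python
# Growth figures quoted in the slides.
MONTHLY_WEB_GROWTH = 0.15       # static web pages, per month
YEARLY_BACKBONE_GROWTH = 0.60   # backbone capacity, per year


def annual_factor_from_monthly(monthly_rate: float) -> float:
    """Yearly growth factor, ASSUMING the monthly rate compounds for 12 months."""
    return (1.0 + monthly_rate) ** 12


web_factor = annual_factor_from_monthly(MONTHLY_WEB_GROWTH)
backbone_factor = 1.0 + YEARLY_BACKBONE_GROWTH

print(f"Web pages grow by a factor of about {web_factor:.2f} per year")
print(f"Backbone capacity grows by a factor of {backbone_factor:.2f} per year")
print(f"Ratio of the two factors: about {web_factor / backbone_factor:.2f}")
```

Under these assumptions the Web grows by a factor of roughly 5.35 per year while capacity grows by a factor of 1.6, which is the mismatch that motivates caching.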
Solution
• Caching: placing popular objects at locations close to the clients.
Proxy servers
• HTTP servers operated by companies, typically for security reasons.
• They are the bottleneck of the connection between the clients and the Internet.
• They are shared by all clients inside the firewall.
• Clients belonging to the same organization share common interests, so they probably access the same set of documents.
• Thus, a document that was previously requested and cached on the proxy server is likely to result in future hits.
• Caching the most popular web pages on the proxy server can:
  • Save network bandwidth
  • Lower access latency for the client
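The idea of a shared proxy cache can be sketched in a few lines. This is a minimal illustration, not a real HTTP proxy: the class name `CachingProxy`, the `fake_origin` function, and the sample URLs are all hypothetical, and the cache is an unbounded in-memory dictionary with no expiry.

```python
from typing import Callable, Dict, List


class CachingProxy:
    """Toy shared cache: serves repeated requests without contacting the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._cache:
            # Hit: answer locally, saving bandwidth and a round trip.
            self.hits += 1
            return self._cache[url]
        # Miss: fetch from the origin server and keep a copy for later clients.
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


# Hypothetical origin server that records every request it receives.
origin_requests: List[str] = []


def fake_origin(url: str) -> bytes:
    origin_requests.append(url)
    return f"<html>{url}</html>".encode("utf-8")


proxy = CachingProxy(fake_origin)
# Several clients behind the same firewall ask for overlapping documents.
for client_request in ["/index.html", "/news.html", "/index.html", "/index.html"]:
    proxy.get(client_request)

print(f"hits={proxy.hits} misses={proxy.misses} origin requests={len(origin_requests)}")
```

Only the first request for each document reaches the origin; the repeated requests are answered from the proxy.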
Advantages of web caching
• Reduces bandwidth consumption, which decreases network traffic and lessens network congestion.
• Reduces access latency:
  • Frequently used documents are cached nearby.
  • With less traffic on the network, documents that are not cached are also retrieved with a shorter delay.
Advantages of web caching (cont.)
• Reduces the workload of the remote server.
• Data can be accessed when the remote server is down (enhanced robustness).
• Allows analysis of an organization's usage patterns.
• Cooperation between caches increases efficiency.
Disadvantages of web caching
• Cached data is not updated automatically, so it can become stale.
• A cache miss can increase latency (extra proxy processing).
• Bottleneck effect: the number of clients per proxy has to be limited.
• A single proxy is a single point of failure.
• Information providers cannot monitor the number of visits per site.
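The latency trade-off between hits and misses can be made concrete with a simple model. The sketch below assumes that a hit costs a fixed time, that a miss costs the direct origin time plus a fixed proxy overhead, and that all requests are alike. The function names and the millisecond figures are illustrative assumptions, not measurements.

```python
def mean_latency_with_proxy(hit_ratio: float, hit_ms: float,
                            origin_ms: float, proxy_overhead_ms: float) -> float:
    """Expected latency per request under the simple hit/miss model above."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ms = origin_ms + proxy_overhead_ms  # a miss pays the extra proxy processing
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def break_even_hit_ratio(hit_ms: float, origin_ms: float,
                         proxy_overhead_ms: float) -> float:
    """Hit ratio at which the proxy is exactly as fast as going direct."""
    denominator = origin_ms + proxy_overhead_ms - hit_ms
    if denominator <= 0:
        raise ValueError("a hit must be faster than a miss")
    return proxy_overhead_ms / denominator


# Illustrative numbers only (milliseconds).
HIT_MS, ORIGIN_MS, OVERHEAD_MS = 20.0, 200.0, 30.0

threshold = break_even_hit_ratio(HIT_MS, ORIGIN_MS, OVERHEAD_MS)
print(f"break-even hit ratio: {threshold:.3f}")
for ratio in (0.0, 0.25, 0.5):
    latency = mean_latency_with_proxy(ratio, HIT_MS, ORIGIN_MS, OVERHEAD_MS)
    print(f"hit ratio {ratio:.2f}: {latency:.1f} ms (direct: {ORIGIN_MS:.1f} ms)")
```

In this model the proxy lowers average latency only when the hit ratio exceeds the break-even value; below it, the extra processing on misses dominates.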
Elements of a WWW caching system
• Documents can be cached at the clients, the proxies, and the servers.
[Figure: elements of a WWW caching system]
Desirable properties of a WWW caching system
• Fast access
• Robustness
• Transparency
• Scalability
• Efficiency
• Adaptivity
• Stability
• Load balancing
• Ability to deal with heterogeneity
• Simplicity
Fast access
• Reduce web access latency to a minimum.
• Access should be fast especially in comparison with servers that do not use caching techniques.
Robustness
• Robustness = availability to the user.
• Eliminate single points of failure.
• In case of failure, fail gracefully.
• Recover easily from failure.
Transparency
• The caching system should be transparent to the user.
• The user should notice only:
  • Faster response
  • Higher availability
Scalability
• Scale well with the increasing size and density of the network.
• All protocols should be as lightweight as possible.
Efficiency
• Impose minimal additional burden on the network (in control and data packets).
• Do not adopt any scheme that leads to under-utilization of the network.
Adaptivity
• Adapt to dynamic changes in user demand and in the network environment.
• Achieve optimal performance.
Stability
• Do not introduce instabilities into the network.
Load balancing
• Distribute load evenly across the entire network.
• No bottlenecks or hot spots.
Ability to deal with heterogeneity
• Adapt to a range of network architectures (hardware and software).
Simplicity
• The mechanism should be simple to deploy.
• Simpler schemes are easier to implement and more likely to be accepted as international standards.
What problems do we face in designing caching systems for the WWW?
Problems in designing caching systems for the WWW
• Caching system architecture: how cache proxies are organized (hierarchical, distributed, or hybrid).
• Proxy placement: where to place a cache proxy in order to optimize performance.
• Caching contents: what can be cached in the caching system.
• Proxy cooperation: how proxies cooperate with each other.
• Data sharing: what kind of data or information can be shared among cooperating proxies.
• Cache resolution/routing: how a proxy decides where to fetch a page requested by a client.
• Prefetching: how a proxy decides what and when to prefetch from web servers or other proxies to reduce access latency.
Problems in designing caching systems for the WWW (cont.)
• Cache placement/replacement: how a proxy decides which pages to store in its cache and which pages to remove from it.
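As one concrete instance of a replacement policy, the sketch below implements least-recently-used (LRU) eviction. The slides do not prescribe a policy; LRU is chosen here only because it is a widely known baseline. The class name `LRUCache` and the sample URLs are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Fixed-capacity page cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Ordered from least recently used (front) to most recently used (back).
        self._pages: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)  # mark as most recently used
        return self._pages[url]

    def put(self, url: str, body: bytes) -> None:
        self._pages[url] = body
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # drop the least recently used page

    def urls(self) -> List[str]:
        return list(self._pages)


cache = LRUCache(capacity=2)
cache.put("/a.html", b"A")
cache.put("/b.html", b"B")
cache.get("/a.html")          # /a.html becomes the most recently used page
cache.put("/c.html", b"C")    # cache is full, so /b.html is evicted
print(cache.urls())
```

Other policies (for example, ones that weigh document size or fetch cost) would change only the eviction step in `put`.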
Problems in designing caching systems for the WWW (cont.)
• Cache coherency: how a proxy maintains data consistency.
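One simple way to bound staleness is to give every cached copy a time-to-live (TTL) and refetch it once the TTL has expired. The slides do not name a specific coherency mechanism, so this sketch is only an assumed example of a weak-consistency scheme. The class `TTLCache`, the 60-second TTL, and the fake clock are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Cache whose entries are trusted for ttl_seconds, then refetched."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float, clock: Callable[[], float]) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, stored_at)

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # still considered fresh (may in fact be stale)
        body = self._fetch(url)  # missing or expired: go back to the origin
        self._entries[url] = (body, now)
        return body


# Hypothetical origin content and a controllable clock for the demonstration.
origin_pages: Dict[str, bytes] = {"/news.html": b"version 1"}
fake_now: List[float] = [0.0]

ttl_cache = TTLCache(lambda url: origin_pages[url], ttl_seconds=60.0,
                     clock=lambda: fake_now[0])

first = ttl_cache.get("/news.html")
origin_pages["/news.html"] = b"version 2"   # the origin changes the document
fake_now[0] = 30.0
within_ttl = ttl_cache.get("/news.html")    # TTL not expired: old copy is served
fake_now[0] = 61.0
after_ttl = ttl_cache.get("/news.html")     # TTL expired: fresh copy is fetched
print(first, within_ttl, after_ttl)
```

The middle request shows the cost of this approach: a client can receive an outdated copy until the TTL runs out.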
Problems in designing caching systems for the WWW (cont.)
• Control information distribution: how control information (e.g. URLs) is distributed among proxies.
• Dynamic data caching: how to deal with data that is not cacheable.
Caching architecture
• Hierarchical
  • Caches are placed at multiple levels of the network: national, regional, institutional, and bottom.
Hierarchical architecture
• Bottom level: client/browser caches.
• When a web page is not found at one level, the request is passed up to the next level: from bottom to institutional, then regional, then national.
Hierarchical architecture
• After the web page is found, it is forwarded back down the hierarchy (national, regional, institutional, bottom), and each level leaves a copy in its cache.
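The lookup and copy-on-the-way-down behaviour can be sketched as a chain of caches, each asking the level above on a miss. This is a simplified illustration: the class `CacheLevel`, the level names, and the `origin` function are hypothetical, and it is assumed here that the top (national) level fetches from the origin server when it also misses.

```python
from typing import Callable, Dict, List


class CacheLevel:
    """One cache in the hierarchy; `upstream` is the next level up or the origin."""

    def __init__(self, name: str, upstream: Callable[[str], bytes]) -> None:
        self.name = name
        self._upstream = upstream
        self.store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url in self.store:
            return self.store[url]
        # Page not found at this level: ask the level above.
        body = self._upstream(url)
        # Forward the page to the requester and leave a copy at this level.
        self.store[url] = body
        return body


origin_requests: List[str] = []


def origin(url: str) -> bytes:
    origin_requests.append(url)
    return f"content of {url}".encode("utf-8")


national = CacheLevel("national", origin)
regional = CacheLevel("regional", national.get)
institutional_a = CacheLevel("institutional-a", regional.get)
institutional_b = CacheLevel("institutional-b", regional.get)
browser_a = CacheLevel("browser-a", institutional_a.get)
browser_b = CacheLevel("browser-b", institutional_b.get)

browser_a.get("/paper.html")   # misses everywhere, fetched once from the origin
browser_b.get("/paper.html")   # misses locally, found at the shared regional cache
print(f"origin requests: {len(origin_requests)}")
```

The second client is served from the regional cache, so the origin sees one request, but the page now exists in six caches, which illustrates the duplicate-copy cost of the hierarchy.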
Hierarchical architecture
• Advantages:
  • Bandwidth efficient, especially when some cache servers lack high-speed connectivity.
  • Popular web pages are diffused efficiently toward the demand.
• Disadvantages:
  • Cache servers need to be placed at key access points in the network, which requires coordination among caches.
  • Each level adds a delay.
  • High-level caches are bottlenecks.
  • Multiple copies of the same document are stored at different cache levels.
Distributed architecture
• Caches exist at the bottom level only; there are no intermediate caching levels.
• Each cache server holds metadata about the data stored on other servers.
• A hierarchy is used only to distribute information about the location of copies.
• The actual documents are not copied within that hierarchy.
• Advantages:
  • Traffic flows through low network levels, which are less congested.
  • No additional disk space is required at intermediate network levels.
  • Better load sharing.
  • More fault tolerant.
• Disadvantages:
  • High connection times.
  • Higher bandwidth usage.
  • Administrative issues.
Distributed architecture
• Examples:
  • ICP (Internet Cache Protocol, Harvest group): retrieves data from neighboring caches and parent caches.
  • CARP (Cache Array Routing Protocol): the URL space is divided among an array of caches, and each cache stores only the documents whose URLs hash to it.
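The hash-partitioning idea behind CARP can be illustrated with a small sketch. This is not the CARP specification: the real protocol defines its own hash function and per-member load factors. The sketch only assumes the general approach of scoring each (cache, URL) pair and sending the URL to the highest-scoring cache. The function names, cache names, and the use of SHA-256 are all illustrative choices.

```python
import hashlib
from typing import Dict, List


def score(cache_name: str, url: str) -> int:
    """Deterministic score for a (cache, URL) pair (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(f"{cache_name}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(caches: List[str], url: str) -> str:
    """Route the URL to the cache with the highest score for it."""
    if not caches:
        raise ValueError("at least one cache is required")
    return max(caches, key=lambda name: score(name, url))


caches = ["cache-a", "cache-b", "cache-c"]
urls = [f"/page/{i}.html" for i in range(100)]

owner_before: Dict[str, str] = {u: pick_cache(caches, u) for u in urls}
# Simulate removing cache-c from the array.
owner_after: Dict[str, str] = {u: pick_cache(caches[:2], u) for u in urls}
moved = [u for u in urls if owner_before[u] != owner_after[u]]

print(f"{len(moved)} of {len(urls)} URLs changed owner after removing cache-c")
```

Because every URL has exactly one owner, no document is stored twice in the array, and with this scoring scheme only the URLs that belonged to the removed cache need a new owner.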