
Caching and Content Distribution Networks

Explore key insights on caching mechanisms like web browser caching, proxy caches, and push/pull-based approaches for optimized content delivery. Learn about cache efficiency, consistency, and cooperative caching infrastructure.

lcromer

Presentation Transcript


  1. Caching and Content Distribution Networks

  2. Some Interesting Observations • The top 1% of all documents account for 20% to 35% of proxy requests • The top 10% account for 45% to 55% of requests • It takes 25% to 40% of all documents to account for 70% of requests • It takes 70% to 80% of all documents to account for 90% of requests

  3. Web Caching • As an example, we use the web to illustrate caching and other related issues. [Figure: a browser exchanges requests and responses with a web server, either directly or through a web proxy cache.]

  4. Web Browser Caching • Web browsers have their own caches. When a page is downloaded from a site the web page is put into the browser cache. • This is especially useful in those cases when the back button is pressed. • If a new copy is needed then a “refresh” can be done. • No page stays permanently in the cache. There is limited room. • A replacement algorithm is needed to determine which cached page should be purged.
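
The slide above does not name a particular replacement algorithm. As one illustration, the sketch below assumes least-recently-used (LRU) eviction; the class name `LRUPageCache` and its methods are hypothetical and do not describe how any real browser works.

```python
from collections import OrderedDict


class LRUPageCache:
    """Tiny browser-style cache that evicts the least recently used page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # url -> page body, least recently used first

    def get(self, url):
        if url not in self.pages:
            return None  # cache miss: the caller must download the page
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url, body):
        self.pages[url] = body
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # purge the least recently used page
```

With a capacity of two, storing a third page purges whichever of the first two was touched least recently.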

  5. Web Browser Caching • Client pull • The server provides the content with instructions on when the client should ask for a refreshed copy of the content or if the content should be cached. • Server push • The server transmits page information to the screen. • The browser application displays the information and leaves the connection to the server open. • With an open connection, the server can continue to push updated pages for your screen to display on an ongoing basis. You can close the connection by closing the page. • The server is in control • Browser caches are different from proxy caches (discussed next).

  6. Web Caching • Proxy caches (also called proxy servers) • Intercept HTTP requests from clients • Serve the object if it is in the cache • If not, go to the object's home server • On behalf of the user, get the object and possibly deposit it in the cache before returning it to the user • Usually deployed at the edges of a network • Wide-area bandwidth savings, improved response time, and increased availability of static web-based objects • A browser may have to be configured to point to the proxy server. • Usually a proxy cache is purchased and installed by an ISP
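
The hit/miss behaviour of a proxy cache can be sketched as follows. This is a minimal in-memory model, not a real HTTP proxy: `ProxyCache` and the `fetch_from_origin` callable are hypothetical names, and the sketch assumes every object is cacheable.

```python
class ProxyCache:
    """In-memory model of a proxy that serves hits and fetches misses."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin is an assumed callable: url -> object body
        self.fetch_from_origin = fetch_from_origin
        self.store = {}
        self.hits = 0
        self.misses = 0

    def handle_request(self, url):
        if url in self.store:
            self.hits += 1
            return self.store[url]  # served from the cache
        self.misses += 1
        body = self.fetch_from_origin(url)  # go to the object's home server
        self.store[url] = body  # deposit in the cache before returning
        return body
```

Only the first request for a URL reaches the origin; later requests are answered locally, which is where the bandwidth and response-time savings come from.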

  7. Push-Based Approach • Server tracks all proxies that have requested objects • If a web page is modified, notify each proxy • Notification types • Indicate object has changed [invalidate] • Send new version of object [update] • How to decide between invalidate and updates? • Pros and cons? • One approach: send updates for more frequently accessed objects, invalidate for rest [Figure: the web server pushes notifications to the proxy.]
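
The "updates for frequently accessed objects, invalidations for the rest" policy can be sketched like this. The classes, the per-object request counter, and the `hot_threshold` cutoff are all assumptions made for illustration; the slide does not say how "frequently accessed" is measured.

```python
class Proxy:
    """Passive proxy: it only reacts to server notifications."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def on_invalidate(self, url):
        self.cache.pop(url, None)  # drop the stale copy

    def on_update(self, url, body):
        self.cache[url] = body  # replace with the new version


class PushServer:
    """Server that remembers which proxies requested which objects."""

    def __init__(self, hot_threshold=3):
        self.objects = {}
        self.subscribers = {}     # url -> set of proxies that fetched it
        self.request_counts = {}  # url -> number of requests seen
        self.hot_threshold = hot_threshold  # assumed popularity cutoff

    def get(self, proxy, url):
        self.subscribers.setdefault(url, set()).add(proxy)
        self.request_counts[url] = self.request_counts.get(url, 0) + 1
        proxy.cache[url] = self.objects[url]
        return self.objects[url]

    def modify(self, url, body):
        self.objects[url] = body
        hot = self.request_counts.get(url, 0) >= self.hot_threshold
        for proxy in self.subscribers.get(url, ()):
            if hot:
                proxy.on_update(url, body)  # popular object: send new version
            else:
                proxy.on_invalidate(url)    # otherwise: just invalidate
```

The `subscribers` and `request_counts` tables are exactly the per-proxy state that the server must keep in a push-based design.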

  8. Push-Based Approaches • Advantages • Provide tight consistency [minimal stale data] • Proxies can be passive • Disadvantages • Need to maintain state at the server • Recall that HTTP is stateless • Need mechanisms beyond HTTP • State may need to be maintained indefinitely • Not resilient to server crashes • These disadvantages are the reason why push-based approaches are not used

  9. Pull-Based Approaches • The proxy is entirely responsible for maintaining consistency • The proxy periodically polls the server to see if the object has changed • Use If-Modified-Since HTTP requests: a proxy sends this header to tell a remote server to return a copy only if the object has been modified. • Key question: When should a proxy poll? • Server-assigned Time-to-Live (TTL) values • No guarantee that the object will not change in the interim [Figure: the proxy polls the web server and receives a response.]
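
The server side of an If-Modified-Since poll can be sketched as a small decision function. `conditional_get` is a hypothetical helper, not part of any HTTP library; it only models the choice between a full 200 response and a bodiless 304 Not Modified.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since_header):
    """Return (status, send_body) for a poll carrying If-Modified-Since.

    last_modified is a timezone-aware datetime for the stored object;
    the header value is an HTTP date string, or None if the header is absent.
    """
    if if_modified_since_header is not None:
        since = parsedate_to_datetime(if_modified_since_header)
        if last_modified <= since:
            return 304, False  # unchanged: the proxy keeps its cached copy
    return 200, True  # changed, or unconditional request: send a fresh copy


# Example header value a proxy might send when polling.
example_header = format_datetime(
    datetime(2024, 1, 1, 13, 0, 0, tzinfo=timezone.utc), usegmt=True
)
```

A 304 answer costs one round trip but no object transfer, which is why polling with this header is cheaper than refetching the object each time.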

  10. Pull-Based Approach: Intelligent Polling • Proxy can dynamically determine the refresh interval • Compute based on past observations • Start with a conservative refresh interval • Increase interval if object has not changed between two successive polls • Decrease interval if object is updated between two polls • Adaptive: No prior knowledge of object characteristics needed
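
The adaptive rule above can be sketched as a single function. The multiplicative factors and the bounds are assumptions chosen for illustration; the slide only says to increase the interval when the object is unchanged and decrease it when the object was updated.

```python
def next_refresh_interval(interval, changed,
                          min_interval=60.0, max_interval=86400.0,
                          grow=2.0, shrink=0.5):
    """Return the next polling interval in seconds.

    interval: current interval; changed: whether the object differed
    between the last two polls. All default values are illustrative.
    """
    factor = shrink if changed else grow
    # Clamp so the proxy never polls too aggressively or too rarely.
    return max(min_interval, min(max_interval, interval * factor))
```

Starting from the conservative minimum, an object that never changes is polled less and less often, while a single observed change pulls the interval back down.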

  11. Pull-Based Approach • Advantages • Server remains stateless • Resilient to both server and proxy failures • Disadvantages • Weaker consistency guarantees (objects can change between two polls and proxy will contain stale data until next poll) • High message overhead

  12. A Hybrid Approach: Leases • Lease: Duration of time for which server agrees to notify proxy of modification • Issue lease on first request, send notification until expiry • Need to renew lease upon expiry • Smooth tradeoff between state and messages exchanged • Zero duration => polling, Infinite leases => server-push • Efficiency depends on the lease duration • Limited use [Figure: the client reads through the proxy; the proxy sends "Get + lease req" to the server, which answers with "Reply + lease" and later sends "Invalidate/update".]
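
A minimal sketch of the lease bookkeeping on the server, assuming a single fixed lease duration and explicit timestamps passed in by the caller. `LeaseServer` and its methods are hypothetical names.

```python
class LeaseServer:
    """Server that notifies a proxy only while that proxy's lease is live."""

    def __init__(self, lease_duration):
        self.lease_duration = lease_duration
        self.objects = {}
        self.leases = {}  # (proxy_id, url) -> expiry time

    def get(self, proxy_id, url, now):
        # Every request issues (or renews) a lease alongside the reply.
        expiry = now + self.lease_duration
        self.leases[(proxy_id, url)] = expiry
        return self.objects[url], expiry

    def modify(self, url, body, now):
        """Store the new version; return the proxies that must be notified."""
        self.objects[url] = body
        notified = []
        for (proxy_id, leased_url), expiry in list(self.leases.items()):
            if leased_url != url:
                continue
            if now < expiry:
                notified.append(proxy_id)  # live lease: invalidate or update
            else:
                del self.leases[(proxy_id, leased_url)]  # expired: drop state
        return sorted(notified)
```

With a duration of zero no lease is ever live, so the server sends nothing and proxies must poll; a very large duration approaches pure server push.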

  13. Cooperative Caching • Caching infrastructure can have multiple web proxies • Proxies can be arranged in a hierarchy or other structures • Proxies can cooperate with one another • Answer client requests • Propagate server notifications • Uses a combination of HTTP and ICP (Internet Cache Protocol). • ICP can be used by one cache to quickly ask another cache if it has an object. • HTTP is used to actually retrieve the object.

  14. Problems • Caching proxies do not serve all Internet users. • Content providers (say, Web servers) cannot rely on existence and correct implementation of caching proxies. • Accounting issues with caching proxies: • Example: www.cnn.com needs to know the number of hits to the advertisements displayed on the web page.

  15. Content Distribution Networks (CDN) • Business Model: A content provider such as www.cnn.com or Yahoo pays a CDN company (such as Akamai) to get its content to the requesting users with short delays. • A CDN provides a mechanism for • Replicating content on multiple servers in the Internet • Providing clients with a means to determine the servers that can deliver the content fastest.

  16. Terminology • Content: Any publicly accessible combination of text, images, applets, frames, MP3, video, flash, virtual reality objects, etc. • Content Provider: Any individual, organization, or company that has content that it wishes to make available to users. • Origin Server: Content provider’s server, where the content is first uploaded. • Surrogate Server (sometimes called edge server): Content distributor’s server, where the replicated content is kept.

  17. Players • Content Provider: Yahoo, MSNBC, CNN (sends content) • Content Distributor: Akamai, Digital Island, AT&T • H/W and S/W Vendor: Cisco, Lucent, Inktomi, CacheFlow (sells servers, installs servers) • Hosting Provider: Exodus

  18. CDN: Distribution • The CDN company places hundreds of CDN servers in Internet hosting centers. • The CDN replicates its customers’ content in the CDN servers. Whenever a customer updates its content (e.g., a web page), the CDN redistributes the fresh content to the CDN servers. • The CDN provides a mechanism so that when a user requests content, the content is provided by the CDN server that can most rapidly deliver the content to the user. • This can be the closest CDN server to the user (perhaps in the same ISP as the user) or may be a CDN server with a congestion-free path to the user.

  19. CDN: Distribution [Figure: the origin server in North America pushes content to the Akamai CDN distribution node, which pushes content to CDN servers in South America, Asia, and Europe.]

  20. CDN: Functional Components • Distribution Service • Redirection Service • Accounting and Billing system

  21. CDN: Distribution Service • The content provider determines which of its objects it wants the CDN to distribute. • The content provider tags and then pushes this content to a CDN node, which in turn replicates and pushes the content to all its CDN servers.

  22. CDN: Distribution Service • When a browser in a user’s host is instructed to retrieve a specific object (specified using a URL), how does the browser determine whether it should retrieve the object from the origin server or from one of the CDN servers? • As an example, suppose the hostname of the content provider is www.cnn.com • Suppose the hostname of the CDN company is www.akamai.com

  23. CDN: Redirection • Users get an HTML document from www.cnn.com; this could be index.html • The file index.html uses a modified URL for content that has been replicated. • Example: If the GIF files are what has been replicated, then <img src="http://cnn.com/af/x.gif"> may be modified as follows: <img src="http://a73.g.akamaitech.net/7/23/cnn.com/af/x.gif"> • The browser needs to resolve the aXYZ.g.akamaitech.net hostname for replicated content.
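
The URL modification in the example above can be sketched with a regular expression. `rewrite_for_cdn` is a hypothetical helper, and treating `a73.g.akamaitech.net/7/23` as an opaque prefix is an assumption; the slides do not explain what the individual path components mean.

```python
import re


def rewrite_for_cdn(html, origin_host, cdn_prefix, extensions=(".gif",)):
    """Point src="http://origin_host/..." at the CDN for replicated file types.

    cdn_prefix is treated as an opaque string, e.g. "a73.g.akamaitech.net/7/23".
    """
    pattern = re.compile(
        r'(src=")http://' + re.escape(origin_host) + r'(/[^"]*)(")'
    )

    def rewrite(match):
        path = match.group(2)
        if not path.endswith(tuple(extensions)):
            return match.group(0)  # not replicated: leave the URL untouched
        return (f"{match.group(1)}http://{cdn_prefix}/"
                f"{origin_host}{path}{match.group(3)}")

    return pattern.sub(rewrite, html)
```

Only URLs whose path ends in a replicated extension are rewritten, so the HTML page itself and other objects continue to come from the origin server.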

  24. CDN: Redirection • DNS is configured so that all queries about g.akamaitech.net that arrive at a DNS server are sent to an authoritative DNS server for g.akamaitech.net. This is referred to as an Akamai DNS server (authoritative DNS server) • When the Akamai DNS server receives the query, it extracts the IP address of the requesting browser. • Based on the IP address and information that it has about the Internet (called a map), the IP address of an Akamai server (surrogate server) is returned to the requesting browser based on policy, e.g., select the server that is the fewest hops away.
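
The "fewest hops" policy can be sketched as a lookup in a map keyed by client network. The map layout, the function name `pick_surrogate`, and the example addresses are assumptions; the slides do not describe the real format of Akamai's map.

```python
import ipaddress


def pick_surrogate(client_ip, hop_map):
    """Return the surrogate IP with the fewest hops to the client's network.

    hop_map is a hypothetical "map": {network_cidr: {surrogate_ip: hop_count}}.
    Returns None when the client's network is not in the map.
    """
    address = ipaddress.ip_address(client_ip)
    for cidr, hops in hop_map.items():
        if address in ipaddress.ip_network(cidr):
            return min(hops, key=hops.get)  # surrogate with the fewest hops
    return None


# Illustrative map using private and documentation address ranges.
example_map = {
    "10.0.0.0/8": {"192.0.2.1": 3, "192.0.2.2": 7},
    "172.16.0.0/12": {"192.0.2.1": 9, "192.0.2.2": 2},
}
```

Any other policy (load, bandwidth, congestion) could replace the hop count as the value being minimized without changing the structure of the lookup.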

  25. CDN Redirection • The Akamai DNS server IP address is now in the cache of the local DNS server. • This implies that it is not always necessary to go to the root DNS server. • The TTL associated with the IP address of an Akamai server (surrogate) is relatively small. • This is done for performance reasons. • Akamai content distribution servers are caches

  26. CDN Redirection • What if content is not there? • If the requested content is not found, then the surrogate will ask other surrogates within a specified region for it. • If the requested content is still not found or is stale, then a request is made to the original web site.
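
The fallback order described above (local copy, then nearby surrogates, then the origin) can be sketched as follows. Caches are modelled as plain dictionaries, and `resolve`, `fetch_from_origin`, and `is_stale` are hypothetical names; how staleness is judged is left to the caller.

```python
def resolve(url, local, peers, fetch_from_origin, is_stale=lambda entry: False):
    """Return (content, source) for a request arriving at one surrogate.

    local: this surrogate's cache (dict); peers: caches of other surrogates
    in the region (list of dicts); fetch_from_origin: assumed callable.
    """
    entry = local.get(url)
    if entry is not None and not is_stale(entry):
        return entry, "local"
    for peer in peers:  # ask other surrogates within the region
        entry = peer.get(url)
        if entry is not None and not is_stale(entry):
            local[url] = entry  # keep a copy for later requests
            return entry, "peer"
    entry = fetch_from_origin(url)  # last resort: the original web site
    local[url] = entry
    return entry, "origin"
```

After the first miss the content is stored locally, so repeated requests for the same URL no longer leave the surrogate.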

  27. CDN Redirection [Figure: CNN.com publishes its images to the CDN (PUT /images/*.gif). The client requests GET www.cnn.com/index.html and receives index.html, which contains <img src="http://www.cdn.com/cnn/images/1.gif">. The client's local DNS server sends the DNS query "cdn.com ?" to the authoritative DNS server for cdn.com and receives 64.236.24.28. The client then sends GET /cnn/images/1.gif to that address and receives 1.gif.]

  28. CDN Selection • The tricky issue is selecting which local content server to use for a particular request • Want to spread load evenly • Want minimal impact if a server is added or removed. • In Akamai, each surrogate server sends measurement results to the Network Operations Command Center (NOCC). • Measurement results include number of active TCP connections, HTTP request arrival rate, bandwidth availability, etc. • This information is used by the Akamai DNS server.
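
The slide does not say how the two goals (even load, minimal impact when a server is added or removed) are met. One well-known technique with both properties is consistent hashing; the sketch below uses it purely as an illustration and is not presented as the method from these slides. `HashRing` and its parameters are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping content keys to servers."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server, to even out load
        self.ring = []            # sorted list of (point, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _point(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._point(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [entry for entry in self.ring if entry[1] != server]

    def lookup(self, key):
        # First ring point at or after the key's hash, wrapping around.
        index = bisect.bisect(self.ring, (self._point(key), ""))
        return self.ring[index % len(self.ring)][1]
```

Removing a server only remaps the keys that server held, and adding a server only moves keys onto the new server; all other assignments stay put.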

  29. Accounting Mechanism • Accounting mechanisms collect and track information related to request routing, distribution and delivery. • Information is gathered in real time and put into log files for each CDN component. • This gets sent to the Network Operations Command Center (NOCC).

  30. Full Site Delivery vs. Partial Site Delivery • Full Site Delivery: All the contents are delivered by the CDN (including HTML, images, and other objects). • Partial Site Delivery: Only images, streaming media, and other bandwidth-intensive objects are delivered by the CDN.

  31. CDNs and Content • Content suitable for CDNs • Images • Streaming media • Java applets • Static information • Content not suitable • Dynamic information • Personalized information

  32. Current Akamai Customers

  33. Summary • We have examined replication and issues related to the design and implementation of a replicated system. • Many choices and tradeoffs to consider
