Squirrel: A peer-to-peer web cache

Sitaram Iyer
Joint work with Ant Rowstron (MSRC) and Peter Druschel

Peer-to-peer Computing
Decentralize a distributed protocol:
• Scalable
• Self-organizing
• Fault tolerant
• Load balanced
Not automatic!!

Web Caching
Reduces: 1. latency, 2. external bandwidth, 3. server load.
Deployed at ISPs, corporate network boundaries, etc.
Cooperative web caching: a group of web caches tied together and acting as one web cache.

Web Cache
[Diagram: browsers on the LAN, each with its own browser cache, share a centralized web cache that reaches the web server over the Internet.]
Sharing!

Decentralized Web Cache
[Diagram: browsers on the LAN, each with its own browser cache, and the web server on the Internet; there is no centralized cache.]
• Why?
• How?

Why peer-to-peer?
• Cost of dedicated web cache: No additional hardware
• Administrative costs: Self-organizing
• Scaling needs upgrading: Resources grow with clients
• Single point of failure: Fault-tolerant by design

Setting
• Corporate LAN
• 100 - 100,000 desktop machines
• Single physical location
• Each node runs an instance of Squirrel
• Sets it as the browser’s proxy

Pastry
Peer-to-peer object location and routing substrate
Distributed Hash Table: reliably map an object key to a live node
Routes in log_{2^b}(N) steps (e.g. 3-4 steps for 100,000 nodes, with 2^b = 16)
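As a rough illustration of the key-to-node mapping and the hop bound, here is a minimal Python sketch. It assumes a circular 128-bit id space, SHA-1 truncated to 128 bits as the URL hash, and a routing base of 2^b = 16 (b = 4); the function names (`key_for`, `home_node`, `expected_hops`) are hypothetical and this is not Pastry's actual API. Real Pastry reaches the closest node through prefix routing, not by scanning a global list of nodes.

```python
import hashlib
import math

ID_BITS = 128  # assumed size of the circular id space


def key_for(url):
    """Hash a URL to a key (SHA-1 truncated to ID_BITS; illustrative choice)."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def circular_distance(a, b):
    """Distance between two ids on a ring of size 2**ID_BITS."""
    d = abs(a - b)
    return min(d, (1 << ID_BITS) - d)


def home_node(key, live_node_ids):
    """Return the live node id numerically closest to the key (ties: smaller id)."""
    return min(live_node_ids, key=lambda n: (circular_distance(n, key), n))


def expected_hops(n_nodes, b=4):
    """Routing bound log_{2^b}(N); base 16 when b = 4."""
    return math.log(n_nodes) / math.log(2 ** b)


if __name__ == "__main__":
    nodes = [key_for("desktop-%d" % i) for i in range(1000)]  # made-up node names
    key = key_for("http://example.com/index.html")
    print(hex(home_node(key, nodes)))
    print(round(expected_hops(100000), 2))  # about 4.15
```

With 100,000 nodes and base 16 the bound comes out just above 4, which is in line with the 3-4 steps quoted on the slide.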

Home-store model
[Diagram: on the LAN, the client hashes the URL to locate the home node; the Internet lies outside the LAN.]

Home-store model
[Diagram: client and home node.]
…that’s how it works!
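The home-store flow (hash the URL, route to the home node, let the home node store the object) can be sketched as follows. This is a simplified illustration, not Squirrel's implementation: the class and method names are hypothetical, the home is found by scanning all node ids instead of Pastry routing, and freshness checks, conditional requests and cache eviction are left out.

```python
import hashlib

RING = 1 << 128  # assumed circular 128-bit id space


def key_for(text):
    """SHA-1 truncated to 128 bits (illustrative hash choice)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:16], "big")


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_cache = {}  # objects this node's own browser requested
        self.home_store = {}   # objects this node stores in its role as home


class HomeStoreLan:
    """Toy model of a LAN running the home-store scheme."""

    def __init__(self, node_ids, fetch_from_origin):
        self.nodes = {i: Node(i) for i in node_ids}
        self.fetch_from_origin = fetch_from_origin  # callable: url -> object
        self.origin_fetches = 0

    def home_of(self, url):
        key = key_for(url)

        def dist(n):
            d = abs(n - key)
            return min(d, RING - d)

        return self.nodes[min(self.nodes, key=lambda n: (dist(n), n))]

    def request(self, client_id, url):
        client = self.nodes[client_id]
        if url in client.local_cache:       # hit in the client's own cache
            return client.local_cache[url]
        home = self.home_of(url)            # route the request to the home node
        if url not in home.home_store:      # home miss: home goes to the origin
            home.home_store[url] = self.fetch_from_origin(url)
            self.origin_fetches += 1
        obj = home.home_store[url]
        client.local_cache[url] = obj       # client keeps a copy as well
        return obj
```

In this sketch a second client asking for the same URL is served from the home node's store, so only the first request leaves the LAN.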

Directory model Client nodes always store objects in local caches. Main difference between the two schemes: whether the home node also stores the object. In the directory model, it only stores pointers to recent clients, and forwards requests to them.
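A comparable sketch for the directory model follows: the home node keeps only pointers to recent clients and forwards each request to one of them, chosen at random. Names are hypothetical, the bound `MAX_POINTERS = 4` is an assumption made for illustration, and the sketch omits freshness checks, conditional requests and eviction (so a delegate is assumed to still hold the object).

```python
import hashlib
import random

RING = 1 << 128    # assumed circular 128-bit id space
MAX_POINTERS = 4   # assumed bound on directory entries per URL


def key_for(text):
    """SHA-1 truncated to 128 bits (illustrative hash choice)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:16], "big")


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_cache = {}  # objects this node's own browser requested
        self.directory = {}    # as home: url -> ids of recent clients


class DirectoryLan:
    """Toy model of a LAN running the directory scheme."""

    def __init__(self, node_ids, fetch_from_origin, seed=0):
        self.nodes = {i: Node(i) for i in node_ids}
        self.fetch_from_origin = fetch_from_origin  # callable: url -> object
        self.origin_fetches = 0
        self.rng = random.Random(seed)

    def home_of(self, url):
        key = key_for(url)

        def dist(n):
            d = abs(n - key)
            return min(d, RING - d)

        return self.nodes[min(self.nodes, key=lambda n: (dist(n), n))]

    def request(self, client_id, url):
        client = self.nodes[client_id]
        if url in client.local_cache:
            return client.local_cache[url]
        home = self.home_of(url)
        delegates = home.directory.setdefault(url, [])
        if delegates:
            # Home forwards to a random delegate, which serves the client.
            delegate = self.nodes[self.rng.choice(delegates)]
            obj = delegate.local_cache[url]
        else:
            # No directory for this URL: the client goes to the origin.
            obj = self.fetch_from_origin(url)
            self.origin_fetches += 1
        client.local_cache[url] = obj
        delegates.append(client_id)     # the client becomes a delegate
        del delegates[:-MAX_POINTERS]   # keep only the most recent pointers
        return obj
```

The home node here never holds the object in its home role; it holds only the short pointer list, which is the difference from the home-store scheme.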

Directory model
[Diagram: client and home node on the LAN, with the Internet outside.]

Directory model
[Diagram: client, home node, and a delegate picked as a random entry.]

Full directory protocol (skip)
[Diagram: message exchange among client, home (with its directory), delegate and origin server. Recoverable labels: a: no dir, go to origin (also d); a, d: req to origin server; b: not-modified; c, e: req to delegate; c, e: object; e: cGET req to origin server, answered with the object or not-modified.]

Recap
• Two endpoints of design space, based on the choice of storage location.
• At first sight, both seem to do about as well (e.g. hit ratio, latency).

Quirk
Consider a
• Web page with many images, or
• Heavily browsing node
In the Directory scheme: many home nodes point to one delegate.
Home-store: natural load balancing.
.. evaluation on trace-based workloads ..
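The quirk can be made concrete with a small counting sketch. It assumes one client has just fetched a page with many images and asks which node would serve a later request for each image under each scheme. The function name is hypothetical, and taking the key modulo the node count is only a stand-in for Pastry routing.

```python
import hashlib
from collections import Counter


def key_for(text):
    """SHA-1 truncated to 128 bits (illustrative hash choice)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:16], "big")


def serving_load(urls, node_ids, first_client):
    """Count, per scheme, which node serves a later request for each URL."""
    directory = Counter()
    home_store = Counter()
    for url in urls:
        # Directory: each home holds a single pointer, to first_client,
        # so that one delegate serves every object.
        directory[first_client] += 1
        # Home-store: the object lives at its home, which the hash spreads
        # across nodes (modulo stands in for Pastry routing here).
        home_store[node_ids[key_for(url) % len(node_ids)]] += 1
    return directory, home_store


if __name__ == "__main__":
    urls = ["http://example.com/img%d.png" % i for i in range(200)]
    d, h = serving_load(urls, list(range(50)), first_client=7)
    print("directory max load:", max(d.values()))
    print("home-store max load:", max(h.values()))
```

In the directory case the single delegate carries all 200 objects, while in the home-store case the hash spreads them over many nodes.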

Trace characteristics

Total external bandwidth, Redmond
[Plot: total external bandwidth in GB (lower is better) against per-node cache size in MB (0.001 to 100, log scale); curves for No web cache, Directory, Home-store and Centralized cache.]

Total external bandwidth, Cambridge
[Plot: total external bandwidth in GB (lower is better) against per-node cache size in MB (0.001 to 100, log scale); curves for No web cache, Directory, Home-store and Centralized cache.]

LAN Hops, Redmond
[Plot: fraction of cacheable requests (0% to 100%) against total hops within the LAN (0 to 6); curves for Centralized, Home-store and Directory.]

LAN Hops, Cambridge
[Plot: fraction of cacheable requests (0% to 100%) against total hops within the LAN (0 to 5); curves for Centralized, Home-store and Directory.]

Load in requests per sec, Redmond
[Plot: number of such seconds (log scale, 1 to 100000) against max objects served per-node / second (0 to 50); curves for Home-store and Directory.]

Load in requests per sec, Cambridge
[Plot: number of such seconds (log scale, 1 to 1e+07) against max objects served per-node / second (0 to 50); curves for Home-store and Directory.]

Load in requests per min, Redmond
[Plot: number of such minutes (log scale, 1 to 100) against max objects served per-node / minute (0 to 350); curves for Home-store and Directory.]

Load in requests per min, Cambridge
[Plot: number of such minutes (log scale, 1 to 10000) against max objects served per-node / minute (0 to 120); curves for Home-store and Directory.]

Conclusion
• It is possible to decentralize web caching.
• Performance is comparable to a centralized cache.
• The decentralized cache is better in terms of cost, administration, scalability and fault tolerance.

(backup) Storage utilization

(backup) Fault tolerance

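(backup) Full home-store protocol
[Diagram: the client sends a request over the LAN to the home node; the home node may send a request over the WAN to the origin server. Recoverable labels: a: object or notmod from home; b: req from home to origin server; b: object or notmod from origin.]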

(backup) Full directory protocol
[Diagram: message exchange among client, home (with its directory), delegate and origin server. Recoverable labels: a: no dir, go to origin (also d); a, d: req to origin server; b: not-modified; c, e: req to delegate; c, e: object; e: cGET req to origin server, answered with the object or not-modified.]
