Peer-to-Peer Networking Credit slides from J. Pang, B. Richardson, I. Stoica
Why Study P2P • Huge fraction of traffic on networks today • >= 50%! • Exciting new applications • Next level of resource sharing • Vs. timesharing, client-server • E.g., access 10s-100s of TB at low cost.
Number of Users (chart of file-sharing network populations) • Others include BitTorrent, eDonkey, iMesh, Overnet, Gnutella • BitTorrent (and others) gaining share from FastTrack (Kazaa).
What is P2P used for? • Use resources of end-hosts to accomplish a shared task • Typically share files • Play games • Search for patterns in data (SETI@home)
What’s new? • Taking advantage of resources at the edge of the network • Fundamental shift in computing capability • Increase in absolute bandwidth over WAN
Peer to Peer • Systems: • Napster • Gnutella • KaZaA • BitTorrent • Chord
Key issues for P2P systems • Join/leave • How do nodes join/leave? Who is allowed? • Search and retrieval • How to find content? • How are metadata indexes built, stored, distributed? • Content Distribution • Where is content stored? How is it downloaded and retrieved?
4 Key Primitives • Join • How to enter/leave the P2P system? • Publish • How to advertise a file? • Search • How to find a file? • Fetch • How to download a file?
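The four primitives amount to a small interface; the sketch below (Python, with hypothetical names not taken from any real client) is one way to write it down, and the systems that follow differ mainly in how they would implement each method.

```python
from abc import ABC, abstractmethod


class PeerNode(ABC):
    """Hypothetical interface for the four P2P primitives (illustration only)."""

    @abstractmethod
    def join(self, bootstrap_address: str) -> None:
        """Enter the P2P system via some known contact point."""

    @abstractmethod
    def publish(self, filename: str) -> None:
        """Advertise a locally stored file to the system."""

    @abstractmethod
    def search(self, filename: str) -> list[str]:
        """Return addresses of peers believed to hold the file."""

    @abstractmethod
    def fetch(self, filename: str, peer_address: str) -> bytes:
        """Download the file directly from a peer."""
```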
Publish and Search • Basic strategies: • Centralized (Napster) • Flood the query (Gnutella) • Route the query (Chord) • Different tradeoffs depending on application • Robustness, scalability, legal issues
Napster: History • In 1999, S. Fanning launches Napster • Peaked at 1.5 million simultaneous users • Jul 2001, Napster shuts down
Napster: Overview • Centralized Database: • Join: on startup, client contacts central server • Publish: reports list of files to central server • Search: query the server => return someone that stores the requested file • Fetch: get the file directly from peer
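A minimal sketch of the centralized-database idea, assuming a single in-memory index keyed by filename (toy code, not Napster's actual protocol or message format):

```python
class NapsterStyleIndex:
    """Toy central index: peers publish file lists, searches are O(1) lookups."""

    def __init__(self):
        # filename -> set of peer addresses that reported having it
        self.index: dict[str, set[str]] = {}

    def publish(self, peer_address: str, filenames: list[str]) -> None:
        # On join, a peer reports the files it stores.
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_address)

    def search(self, filename: str) -> list[str]:
        # The server only answers "who has it"; the fetch is peer-to-peer.
        return sorted(self.index.get(filename, set()))


server = NapsterStyleIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
server.publish("123.2.0.18", ["A"])
print(server.search("A"))   # ['123.2.0.18']
```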
Napster: Publish (diagram) • Peer 123.2.21.23 announces “I have X, Y, and Z!” • Server records insert(X, 123.2.21.23)
Napster: Search (diagram) • Client asks the server “Where is file A?” • Server replies search(A) --> 123.2.0.18 • Client fetches A directly from 123.2.0.18
Napster: Discussion • Pros: • Simple • Search scope is O(1) • Controllable (pro or con?) • Cons: • Server maintains O(N) State • Server does all processing • Single point of failure
Gnutella: History • In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella • Soon many other clients: Bearshare, Morpheus, LimeWire, etc. • In 2001, many protocol enhancements including “ultrapeers”
Gnutella: Overview • Query Flooding: • Join: on startup, client contacts a few other nodes; these become its “neighbors” • Publish: no need • Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender. • Fetch: get the file directly from peer
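A minimal sketch of query flooding over an in-memory neighbor graph (the real protocol also carries message IDs, decrements a TTL per hop, and suppresses duplicates across paths; the topology and file placement below are made up):

```python
def flood_search(topology, files, start, filename, ttl=4, seen=None):
    """Return the set of peers holding `filename` within `ttl` hops of `start`."""
    if seen is None:
        seen = set()
    if start in seen or ttl < 0:
        return set()
    seen.add(start)                       # avoid revisiting a node
    hits = {start} if filename in files.get(start, ()) else set()
    for neighbor in topology.get(start, ()):
        hits |= flood_search(topology, files, neighbor, filename, ttl - 1, seen)
    return hits


topology = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1"], "n4": ["n2"]}
files = {"n4": ["A"], "n3": ["B"]}
print(flood_search(topology, files, "n1", "A"))   # {'n4'}
```

Every reachable node within the TTL sees the query, which is where the O(N) search scope in the discussion slide comes from.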
Gnutella: Search (diagram) • Query “Where is file A?” floods from neighbor to neighbor • Peers that have file A send a Reply back toward the querier
Gnutella: Discussion • Pros: • Fully de-centralized • Search cost distributed • Cons: • Search scope is O(N) • Search time is O(???) • Nodes leave often, network unstable
Aside: All Peers Equal? (diagram) • Peers connect over very different links: 56 kbps modems, 1.5 Mbps DSL, a 10 Mbps LAN
Aside: Network Resilience (partial topology) • Shown under two failure modes: Random (30% die) vs. Targeted (4% die) • From Saroiu et al., MMCN 2002
KaZaA: History • In 2001, KaZaA created by Dutch company Kazaa BV • Single network called FastTrack used by other clients as well: Morpheus, giFT, etc. • Eventually protocol changed so other clients could no longer talk to it • Most popular file sharing network today with >10 million users (number varies)
KaZaA: Overview • “Smart” Query Flooding: • Join: on startup, client contacts a “supernode” ... may at some point become one itself • Publish: send list of files to supernode • Search: send query to supernode, supernodes flood query amongst themselves. • Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
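One way to sketch the supernode idea, assuming each supernode keeps a Napster-style index for its own child peers and floods queries only among other supernodes (class and method names are illustrative, not FastTrack's):

```python
class Supernode:
    """Illustrative supernode: indexes its child peers, floods other supernodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}              # filename -> set of child peer addresses
        self.other_supernodes = []   # supernode-level overlay neighbors

    def publish(self, peer_address, filenames):
        for f in filenames:
            self.index.setdefault(f, set()).add(peer_address)

    def search(self, filename, seen=None):
        seen = seen if seen is not None else set()
        if self.name in seen:
            return set()
        seen.add(self.name)
        hits = set(self.index.get(filename, set()))
        for sn in self.other_supernodes:   # flood among supernodes only
            hits |= sn.search(filename, seen)
        return hits


a, b = Supernode("SN-A"), Supernode("SN-B")
a.other_supernodes, b.other_supernodes = [b], [a]
a.publish("123.2.21.23", ["X"])
b.publish("123.2.0.18", ["A"])
print(a.search("A"))   # {'123.2.0.18'}
```

Flooding among the much smaller set of supernodes, rather than all peers, is what shrinks the query scope.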
KaZaA: Network Design (diagram) • Ordinary peers attach to “Super Nodes”, which interconnect among themselves
KaZaA: File Insert (diagram) • Peer 123.2.21.23 tells its supernode “I have X!” • Supernode records insert(X, 123.2.21.23)
KaZaA: File Search (diagram) • Query “Where is file A?” goes to the local supernode and is flooded among supernodes • Replies: search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50
KaZaA: Discussion • Pros: • Tries to take into account node heterogeneity: • Bandwidth • Host Computational Resources • Host Availability (?) • Rumored to take into account network locality • Cons: • Mechanisms easy to circumvent • Still no real guarantees on search scope or search time
P2P systems • Napster • Launched P2P • Centralized index • Gnutella • Focus is simple sharing • Uses simple flooding • Kazaa • More intelligent query routing • BitTorrent • Focus on download speed, fairness in sharing
BitTorrent: History • In 2002, B. Cohen debuted BitTorrent • Key Motivation: • Popularity exhibits temporal locality (Flash Crowds) • E.g., Slashdot effect, CNN on 9/11, new movie/game release • Focused on Efficient Fetching, not Searching: • Distribute the same file to all peers • Single publisher, multiple downloaders • Has some “real” publishers: • Blizzard Entertainment using it to distribute the beta of their new game
BitTorrent: Overview • Swarming: • Join: contact centralized “tracker” server, get a list of peers. • Publish: Run a tracker server. • Search: Out-of-band. E.g., use Google to find a tracker for the file you want. • Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
BitTorrent: Sharing Strategy • Employ “Tit-for-tat” sharing strategy • “I’ll share with you if you share with me” • Be optimistic: occasionally let freeloaders download • Otherwise no one would ever start! • Also allows you to discover better peers to download from when they reciprocate
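A rough sketch of the tit-for-tat idea, assuming we track how fast each peer has recently uploaded to us (this is not BitTorrent's exact choking algorithm, just the selection rule it embodies):

```python
import random


def choose_unchoked(upload_rates, regular_slots=3):
    """Pick peers to upload to: the best recent uploaders plus one random
    'optimistic' slot, so newcomers and freeloaders occasionally get a
    chance to start reciprocating."""
    by_rate = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(by_rate[:regular_slots])
    remaining = [p for p in upload_rates if p not in unchoked]
    if remaining:
        unchoked.add(random.choice(remaining))   # optimistic unchoke
    return unchoked


rates = {"peerA": 90_000, "peerB": 40_000, "peerC": 0, "peerD": 75_000, "peerE": 10_000}
print(choose_unchoked(rates))   # top-3 uploaders plus one random other peer
```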
BitTorrent: Summary • Pros: • Works reasonably well in practice • Gives peers incentive to share resources; avoids freeloaders • Cons: • Central tracker server needed to bootstrap swarm (is this really necessary?)
DHT: History • In 2000-2001, academic researchers said “we want to play too!” • Motivation: • Frustrated by popularity of all these “half-baked” P2P apps :) • We can do better! (so we said) • Guaranteed lookup success for files in system • Provable bounds on search time • Provable scalability to millions of nodes • Hot Topic in networking ever since
DHT: Overview • Abstraction: a distributed “hash-table” (DHT) data structure: • put(id, item); • item = get(id); • Implementation: nodes in system form a distributed data structure • Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
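A toy sketch of the put/get abstraction, assuming a Chord-like rule that a key lives on the node whose id is the first one at or after the key on a circular id space (all state sits in one process here; a real DHT distributes it and routes lookups in O(log N) hops):

```python
import hashlib


class ToyDHT:
    """Single-process sketch of the DHT abstraction: put(id, item) / get(id)."""

    BITS = 16   # size of the toy circular id space

    def __init__(self, node_ids):
        self.nodes = {n: {} for n in sorted(node_ids)}   # node id -> local store

    def _hash(self, key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (1 << self.BITS)

    def _responsible(self, key_id):
        # First node id >= key_id, wrapping around the ring.
        for n in self.nodes:
            if n >= key_id:
                return n
        return next(iter(self.nodes))

    def put(self, key, item):
        self.nodes[self._responsible(self._hash(key))][key] = item

    def get(self, key):
        return self.nodes[self._responsible(self._hash(key))].get(key)


dht = ToyDHT([1000, 20000, 45000, 60000])
dht.put("fileA", "123.2.0.18")
print(dht.get("fileA"))   # '123.2.0.18'
```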
DHT: Overview (2) • Structured Overlay Routing: • Join: On startup, contact a “bootstrap” node and integrate yourself into the distributed data structure; get a node id • Publish: Route publication for file id toward a close node id along the data structure • Search: Route a query for file id toward a close node id. Data structure guarantees that query will meet the publication. • Fetch: • P2P fetching
Tapestry: a DHT-based P2P system focusing on fault-tolerance Danfeng (Daphne) Yao
A very big picture – P2P networking • Observe • Dynamic nature of the computing environment • Network Expansion • Require • Robustness, or fault tolerance • Scalability • Self-administration, or self-organization
Goals to achieve in Tapestry • Decentralization • Routing uses local data • Efficiency, scalability • Robust routing mechanism • Resilient to node failures • An easy way to locate objects
IDs in Tapestry • NodeID • ObjectID • IDs are computed using a hash function • The hash function is a system-wide parameter • Use ID for routing • Locating an object • Finding a node • A node stores its neighbors’ IDs
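A sketch of the system-wide ID hash, assuming SHA-1 and fixed-length octal IDs to match the “1732” (Octal) neighbor-map example later; the digit length is an arbitrary choice for illustration:

```python
import hashlib

ID_DIGITS = 8   # assumed ID length in octal digits; a system-wide parameter
ID_BASE = 8


def tapestry_id(name: str) -> str:
    """Hash an arbitrary name (node address or object name) into a fixed-length
    base-8 digit string, sketching Tapestry's system-wide ID hash."""
    h = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    digits = []
    for _ in range(ID_DIGITS):
        digits.append(str(h % ID_BASE))
        h //= ID_BASE
    return "".join(reversed(digits))


print(tapestry_id("node-123.2.21.23"), tapestry_id("objectA"))
```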
Tapestry routing (diagram): suffix-based routing of a message toward node 03254598, matching one more trailing digit of the destination ID at each hop
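The routing rule can be sketched as: each hop forwards to a neighbor that matches the destination ID in at least one more trailing digit (IDs and node lists below are made up, and the real implementation looks the next hop up in its neighbor map rather than scanning a list):

```python
def suffix_match_len(a: str, b: str) -> int:
    """Number of trailing digits two equal-length IDs share."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def next_hop(current_id: str, dest_id: str, known_nodes: list[str]) -> str:
    """Forward to any node that extends the matched suffix by one digit."""
    need = suffix_match_len(current_id, dest_id) + 1
    for node in known_nodes:
        if suffix_match_len(node, dest_id) >= need:
            return node
    return current_id   # no better node known: we act as the (surrogate) root


print(suffix_match_len("11114598", "03254598"))                      # 4
print(next_hop("11110000", "03254598", ["22224598", "99998888"]))    # '22224598'
```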
Each object is assigned a root node (diagram: node S holds a copy of O; the root carries rootId for O) • Root node: a unique node for object O with the longest matching suffix • Root knows where O is • Publish: node S routes a msg <objectID(O), nodeID(S)> to root of O • Closest location is stored if multiple copies exist • Location query: a msg for O is routed towards O's root • How to choose the root node for an object?
Surrogate routing: unique mapping without global knowledge • Problem: how to choose a unique root node of an object deterministically • Surrogate routing • Choose as an object's root (surrogate) the node whose nodeId is closest to the objectId • Small overhead, but no need for global knowledge • The number of additional hops is small
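A sketch of what surrogate routing computes, assuming “closest” means sharing the longest trailing suffix: every participant applying the same rule agrees on one root without global coordination. Real Tapestry resolves this hop by hop and breaks ties by skipping to the next non-empty neighbor-map slot; the lexicographic tie-break and the IDs below are only to keep the toy deterministic.

```python
def suffix_distance(node_id: str, object_id: str) -> int:
    """Fewer matching trailing digits => larger distance."""
    match = 0
    for x, y in zip(reversed(node_id), reversed(object_id)):
        if x != y:
            break
        match += 1
    return len(object_id) - match


def surrogate_root(object_id: str, node_ids: list[str]) -> str:
    """Deterministically pick the node 'closest' to the objectId as its root."""
    return min(node_ids, key=lambda n: (suffix_distance(n, object_id), n))


nodes = ["0335", "6175", "4175", "7777"]
print(surrogate_root("2175", nodes))   # '4175' (ties with '6175' on suffix '175')
```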
Surrogate example (diagram): finding the root node for objectId = 6534 • Several neighbor-map entries for 6534 are empty, so routing falls through to the next filled entries • Nodes shown include 2534, 7534, 2234, and 1114
Tapestry: adaptable, fault-resilient, self-managing • Basic routing and location • Backpointer list in neighbor map • Multiple <objectId(O), nodeId(S)> mappings are stored • More semantic flexibility: selection operator for choosing returned objects
Neighbor Map For “1732” (Octal) — routing-table levels 4, 3, 2, 1 (level 1 resolves the last digit); entries shown as “1732” are slots the node fills itself
Level:    4      3      2      1
digit 0:  0732   x032   xx02   xxx0
digit 1:  1732   x132   xx12   xxx1
digit 2:  2732   x232   xx22   1732
digit 3:  3732   x332   1732   xxx3
digit 4:  4732   x432   xx42   xxx4
digit 5:  5732   x532   xx52   xxx5
digit 6:  6732   x632   xx62   xxx6
digit 7:  7732   1732   xx72   xxx7
A single Tapestry node also holds: Object location pointers <objectId, nodeId> • Hotspot monitor <objId, nodeId, freq> • Object store
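A sketch of how such a neighbor map could be built and indexed, where table[level][digit] holds some node that matches the local ID in the last `level` digits and has `digit` in the next position (level 0 here corresponds to the slide's level 1; the names and fill rule are illustrative):

```python
def build_neighbor_map(my_id: str, known_nodes: list[str], base: int = 8):
    """Build a per-level, per-digit routing table for suffix routing."""
    L = len(my_id)
    table = [[None] * base for _ in range(L)]
    for node in known_nodes + [my_id]:
        # Longest shared suffix with my_id decides which level the node fills.
        match = 0
        while match < L and node[L - 1 - match] == my_id[L - 1 - match]:
            match += 1
        if match == L:                        # the node is ourselves
            for level in range(L):            # fill our own diagonal slots
                table[level][int(my_id[L - 1 - level])] = my_id
            continue
        digit = int(node[L - 1 - match])      # differing digit at that level
        if table[match][digit] is None:
            table[match][digit] = node
    return table


nm = build_neighbor_map("1732", ["0732", "4732", "2232", "3552", "7431"])
print(nm[0])   # slide's level 1: known nodes indexed by their last digit
```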
Fault-tolerant routing: detect, operate, and recover • Expected faults • Server outages (high load, hw/sw faults) • Link failures (router hw/sw faults) • Neighbor table corruption at the server • Detect: TCP timeout, heartbeats to nodes on backpointer list • Operate under faults: backup neighbors • Second chance for failed server: probing msgs
Fault-tolerant location: avoid single point of failure • Multiple roots for each object • rootId = f(objectId + salt) • Redundant, but reliable • Storage servers periodically republish location info • Cached location info on routers times out • New and recovered objects periodically advertise their location info
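One way to sketch the salted multi-root rule rootId = f(objectId + salt): hash the object ID together with a small set of well-known salt values, giving several independent roots (the hash choice, digit length, and salt scheme are assumptions for illustration):

```python
import hashlib


def salted_roots(object_id: str, num_roots: int = 3, digits: int = 4, base: int = 8):
    """Derive several root IDs for one object by hashing objectId + salt."""
    roots = []
    for salt in range(num_roots):
        h = int(hashlib.sha1(f"{object_id}:{salt}".encode()).hexdigest(), 16)
        out = []
        for _ in range(digits):
            out.append(str(h % base))
            h //= base
        roots.append("".join(reversed(out)))
    return roots


print(salted_roots("6534"))   # e.g. three root IDs; any surviving root can answer queries
```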
Dynamic node insertion • Populate new node’s neighbor maps • Routing messages to its own nodeId • Copy and optimize neighbor maps • Inform relevant nodes of its entries to update neighbor maps