
Peer-to-Peer Networking

Presentation Transcript


  1. Peer-to-Peer Networking Credit slides from J. Pang, B. Richardson, I. Stoica

  2. Why Study P2P • Huge fraction of traffic on networks today • >=50%! • Exciting new applications • Next level of resource sharing • Vs. timesharing, client-server • E.g. Access 10’s-100’s of TB at low cost.

  3. Share of Internet Traffic

  4. Number of Users • Others include BitTorrent, eDonkey, iMesh, Overnet, Gnutella • BitTorrent (and others) gaining share from FastTrack (Kazaa).

  5. What is P2P used for? • Use resources of end-hosts to accomplish a shared task • Typically share files • Play games • Search for patterns in data (Seti@Home)

  6. What’s new? • Taking advantage of resources at the edge of the network • Fundamental shift in computing capability • Increase in absolute bandwidth over WAN

  7. Peer to Peer • Systems: • Napster • Gnutella • KaZaA • BitTorrent • Chord

  8. Key issues for P2P systems • Join/leave • How do nodes join/leave? Who is allowed? • Search and retrieval • How to find content? • How are metadata indexes built, stored, distributed? • Content Distribution • Where is content stored? How is it downloaded and retrieved?

  9. 4 Key Primitives • Join • How to enter/leave the P2P system? • Publish • How to advertise a file? • Search • how to find a file? • Fetch • how to download a file?
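
The four primitives amount to a small peer-facing API. A minimal sketch of that interface, using hypothetical names that mirror the slide's primitives (Peer, join, publish, search, fetch):

```python
from abc import ABC, abstractmethod

class Peer(ABC):
    """Hypothetical interface capturing the four P2P primitives."""

    @abstractmethod
    def join(self, bootstrap_address: str) -> None:
        """Enter the P2P system via a known node or server."""

    @abstractmethod
    def publish(self, filename: str) -> None:
        """Advertise a locally stored file to the system."""

    @abstractmethod
    def search(self, filename: str) -> list[str]:
        """Return addresses of peers that claim to have the file."""

    @abstractmethod
    def fetch(self, filename: str, peer_address: str) -> bytes:
        """Download the file directly from a peer."""
```

Each system below (Napster, Gnutella, KaZaA, BitTorrent, DHTs) can be read as a different implementation of these four methods.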

  10. Publish and Search • Basic strategies: • Centralized (Napster) • Flood the query (Gnutella) • Route the query (Chord) • Different tradeoffs depending on application • Robustness, scalability, legal issues

  11. Napster: History • In 1999, S. Fanning launches Napster • Peaked at 1.5 million simultaneous users • Jul 2001, Napster shuts down

  12. Napster: Overview • Centralized Database: • Join: on startup, client contacts central server • Publish: reports list of files to central server • Search: query the server => return someone that stores the requested file • Fetch: get the file directly from peer
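
As a concrete illustration of the centralized-database approach, here is a toy sketch with an in-memory dictionary standing in for Napster's server (class and method names are illustrative, not Napster's actual protocol):

```python
class CentralIndex:
    """Toy stand-in for a Napster-style central server."""

    def __init__(self):
        self.index = {}                    # filename -> set of peer addresses

    def publish(self, peer_addr, filenames):
        for name in filenames:             # peer reports its file list on join
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, filename):
        return self.index.get(filename, set())   # O(1) lookup at the server

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
print(server.search("X"))                  # {'123.2.21.23'}; the fetch itself is peer-to-peer
```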

  13. Napster: Publish (diagram: a peer at 123.2.21.23 announces “I have X, Y, and Z!” and the central server records insert(X, 123.2.21.23))

  14. Napster: Search (diagram: a peer asks the central server “Where is file A?”, the server replies search(A) --> 123.2.0.18, and the peer then fetches the file directly from 123.2.0.18)

  15. Napster: Discussion • Pros: • Simple • Search scope is O(1) • Controllable (pro or con?) • Cons: • Server maintains O(N) State • Server does all processing • Single point of failure

  16. Gnutella: History • In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella • Soon many other clients: Bearshare, Morpheus, LimeWire, etc. • In 2001, many protocol enhancements including “ultrapeers”

  17. Gnutella: Overview • Query Flooding: • Join: on startup, client contacts a few other nodes; these become its “neighbors” • Publish: no need • Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender. • Fetch: get the file directly from peer
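
A minimal sketch of query flooding over a neighbor graph, assuming an in-memory adjacency structure and a hop limit (TTL) to bound the flood; all identifiers are illustrative:

```python
def flood_search(nodes, start, filename, ttl=4):
    """Breadth-first flood: ask neighbors, who ask their neighbors, up to ttl hops."""
    visited, frontier, hits = {start}, [start], []
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            if filename in nodes[node]["files"]:
                hits.append(node)            # in Gnutella the reply travels back to the sender
            for nbr in nodes[node]["neighbors"]:
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return hits

nodes = {
    "A": {"files": set(), "neighbors": ["B", "C"]},
    "B": {"files": {"fileA"}, "neighbors": ["A"]},
    "C": {"files": set(), "neighbors": ["A", "B"]},
}
print(flood_search(nodes, "A", "fileA"))     # ['B']; search scope grows with N
```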

  18. Gnutella: Search (diagram: the query “Where is file A?” floods from neighbor to neighbor; nodes that have file A reply “I have file A.” back to the sender)

  19. Gnutella: Discussion • Pros: • Fully de-centralized • Search cost distributed • Cons: • Search scope is O(N) • Search time is O(???) • Nodes leave often, network unstable

  20. Aside: Search Time?

  21. Aside: All Peers Equal? (diagram: peers sit behind very different links, e.g. 56kbps modems, 1.5Mbps DSL, and 10Mbps LAN)

  22. Aside: Network Resilience (partial-topology diagrams from Saroiu et al., MMCN 2002, comparing a random failure of 30% of nodes with a targeted failure of 4% of nodes)

  23. KaZaA: History • In 2001, KaZaA created by Dutch company Kazaa BV • Single network called FastTrack used by other clients as well: Morpheus, giFT, etc. • Eventually protocol changed so other clients could no longer talk to it • Most popular file sharing network today with >10 million users (number varies)

  24. KaZaA: Overview • “Smart” Query Flooding: • Join: on startup, client contacts a “supernode” ... may at some point become one itself • Publish: send list of files to supernode • Search: send query to supernode, supernodes flood query amongst themselves. • Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
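
A rough sketch of the two-tier idea: ordinary peers publish their file lists to a supernode, and queries flood only among supernodes (the classes and the recursive flood are illustrative simplifications):

```python
class Supernode:
    """Toy supernode: indexes the files of its attached ordinary peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}                      # filename -> set of peer addresses
        self.neighbors = []                  # other supernodes

    def publish(self, peer_addr, filenames):
        for f in filenames:
            self.index.setdefault(f, set()).add(peer_addr)

    def search(self, filename, visited=None):
        visited = visited or set()
        visited.add(self.name)
        hits = set(self.index.get(filename, set()))
        for sn in self.neighbors:            # flood among supernodes only
            if sn.name not in visited:
                hits |= sn.search(filename, visited)
        return hits

sn1, sn2 = Supernode("SN1"), Supernode("SN2")
sn1.neighbors, sn2.neighbors = [sn2], [sn1]
sn2.publish("123.2.0.18", ["A"])
print(sn1.search("A"))                       # {'123.2.0.18'}; fetch then goes peer-to-peer
```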

  25. KaZaA: Network Design (diagram: ordinary peers attach to “Super Nodes”, which form the upper tier of the overlay)

  26. KaZaA: File Insert (diagram: peer 123.2.21.23 tells its supernode “I have X!” and the supernode records insert(X, 123.2.21.23))

  27. KaZaA: File Search (diagram: the query “Where is file A?” is sent to a supernode and flooded among supernodes; replies come back as search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50)

  28. KaZaA: Discussion • Pros: • Tries to take into account node heterogeneity: • Bandwidth • Host Computational Resources • Host Availability (?) • Rumored to take into account network locality • Cons: • Mechanisms easy to circumvent • Still no real guarantees on search scope or search time

  29. P2P systems • Napster • Launched P2P • Centralized index • Gnutella: • Focus is simple sharing • Using simple flooding • Kazaa • More intelligent query routing • BitTorrent • Focus on Download speed, fairness in sharing

  30. BitTorrent: History • In 2002, B. Cohen debuted BitTorrent • Key Motivation: • Popularity exhibits temporal locality (Flash Crowds) • E.g., Slashdot effect, CNN on 9/11, new movie/game release • Focused on Efficient Fetching, not Searching: • Distribute the same file to all peers • Single publisher, multiple downloaders • Has some “real” publishers: • Blizzard Entertainment using it to distribute the beta of their new game

  31. BitTorrent: Overview • Swarming: • Join: contact centralized “tracker” server, get a list of peers. • Publish: Run a tracker server. • Search: Out-of-band. E.g., use Google to find a tracker for the file you want. • Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
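
A simplified sketch of swarming: take a peer list (as a tracker would return it) and repeatedly fetch a missing chunk from any peer that has it. The data layout is a stand-in, and real clients use a rarest-first chunk order rather than a random one:

```python
import random

def swarm_download(tracker_peers, num_chunks):
    """Toy swarming loop; assumes the swarm collectively holds every chunk."""
    have = {}                                          # chunk index -> data
    while len(have) < num_chunks:
        missing = [i for i in range(num_chunks) if i not in have]
        chunk = random.choice(missing)
        for peer in tracker_peers:
            if chunk in peer["chunks"]:
                have[chunk] = peer["chunks"][chunk]    # "fetch" the chunk from that peer
                break
    return b"".join(have[i] for i in range(num_chunks))

peers = [{"addr": "p1", "chunks": {0: b"he", 1: b"ll"}},
         {"addr": "p2", "chunks": {1: b"ll", 2: b"o!"}}]
print(swarm_download(peers, 3))                        # b'hello!'
```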

  32. BitTorrent: Sharing Strategy • Employ “Tit-for-tat” sharing strategy • “I’ll share with you if you share with me” • Be optimistic: occasionally let freeloaders download • Otherwise no one would ever start! • Also allows you to discover better peers to download from when they reciprocate
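
A rough sketch of that policy: upload to the peers that have recently given you the most, plus one randomly chosen “optimistic” peer so that newcomers (and the occasional freeloader) get a chance to start. The slot count and rate bookkeeping here are illustrative, not BitTorrent's actual parameters:

```python
import random

def choose_unchoked(upload_rate_from, regular_slots=3):
    """Tit-for-tat: reward the best uploaders, optimistically unchoke one other peer."""
    ranked = sorted(upload_rate_from, key=upload_rate_from.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    others = [p for p in upload_rate_from if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))   # optimistic unchoke: give a stranger a chance
    return unchoked

rates = {"p1": 50, "p2": 10, "p3": 0, "p4": 35, "p5": 0}
print(choose_unchoked(rates))                 # three best reciprocators plus one random peer
```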

  33. BitTorrent: Summary • Pros: • Works reasonably well in practice • Gives peers incentive to share resources; avoids freeloaders • Cons: • Central tracker server needed to bootstrap swarm (is this really necessary?)

  34. DHT: History • In 2000-2001, academic researchers said “we want to play too!” • Motivation: • Frustrated by popularity of all these “half-baked” P2P apps :) • We can do better! (so we said) • Guaranteed lookup success for files in system • Provable bounds on search time • Provable scalability to millions of nodes • Hot Topic in networking ever since

  35. DHT: Overview • Abstraction: a distributed “hash-table” (DHT) data structure: • put(id, item); • item = get(id); • Implementation: nodes in system form a distributed data structure • Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...

  36. DHT: Overview (2) • Structured Overlay Routing: • Join: On startup, contact a “bootstrap” node and integrate yourself into the distributed data structure; get a node id • Publish: Route publication for file id toward a close node id along the data structure • Search: Route a query for file id toward a close node id. Data structure guarantees that query will meet the publication. • Fetch: • P2P fetching
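
A compact sketch of the put/get abstraction: hash the id into the node id space and store the item on the node whose id is closest on the ring. The hashing and the global scan over nodes are simplifications; a real DHT such as Chord reaches that node by routing in O(log N) hops:

```python
import hashlib

def key_id(key, bits=16):
    """Hash a key into the DHT's id space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** bits)

class ToyDHT:
    def __init__(self, node_ids, bits=16):
        self.bits = bits
        self.nodes = {n: {} for n in node_ids}         # node id -> local store

    def _closest_node(self, ident):
        ring = 2 ** self.bits                          # distance measured around the ring
        return min(self.nodes, key=lambda n: min((n - ident) % ring, (ident - n) % ring))

    def put(self, key, item):
        self.nodes[self._closest_node(key_id(key, self.bits))][key] = item

    def get(self, key):
        return self.nodes[self._closest_node(key_id(key, self.bits))].get(key)

dht = ToyDHT([10, 3000, 45000, 60123])
dht.put("fileA", "123.2.0.18")                         # publish routes toward the close node id
print(dht.get("fileA"))                                # the query meets the publication there
```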

  37. Tapestry: a DHT-based P2P system focusing on fault-tolerance Danfeng (Daphne) Yao

  38. A very big picture – P2P networking • Observe • Dynamic nature of the computing environment • Network Expansion • Require • Robustness, or fault tolerance • Scalability • Self-administration, or self-organization

  39. Goals to achieve in Tapestry • Decentralization • Routing uses local data • Efficiency, scalability • Robust routing mechanism • Resilient to node failures • An easy way to locate objects

  40. IDs in Tapestry • NodeID • ObjectID • IDs are computed using a hash function • The hash function is a system-wide parameter • Use ID for routing • Locating an object • Finding a node • A node stores its neighbors’ IDs
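
A small sketch of how such IDs might be derived with a system-wide hash function; the SHA-1 choice and the 8-octal-digit length are assumptions made to match the examples on the following slides:

```python
import hashlib

def tapestry_id(name, digits=8, base=8):
    """Hash a node address or object name into a fixed-length base-8 ID string."""
    h = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    out = []
    for _ in range(digits):
        out.append(str(h % base))
        h //= base
    return "".join(reversed(out))

print(tapestry_id("node-192.168.1.5:4000"))   # an 8-digit octal nodeID
print(tapestry_id("my-object.txt"))           # objectIDs come from the same hash function
```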

  41. Tapestry routing (diagram): suffix-based routing, illustrated with example ID 03254598
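
In suffix-based routing, each hop forwards to a neighbor that matches at least one more trailing digit of the destination ID. A minimal next-hop sketch (IDs as strings of octal digits; the neighbor list is illustrative):

```python
def next_hop(current_id, dest_id, neighbor_ids):
    """Forward to a neighbor sharing a longer suffix with dest_id than we do."""
    def suffix_len(a, b):
        n = 0
        while n < len(a) and a[len(a) - 1 - n] == b[len(b) - 1 - n]:
            n += 1
        return n

    matched = suffix_len(current_id, dest_id)
    for nbr in neighbor_ids:
        if suffix_len(nbr, dest_id) > matched:
            return nbr
    return None          # no better neighbor: this node acts as the destination's surrogate

print(next_hop("87651118", "03254598", ["11111198", "22224598", "33333338"]))
# '11111198' -- it matches the trailing "98", one digit more than the current node's "8"
```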

  42. Each object is assigned a root node • Root node: a unique node for object O with the longest matching suffix • The root knows where O is • Publish: storage node S routes a msg <objectID(O), nodeID(S)> to the root of O • If multiple copies exist, the closest location is stored • Location query: a msg for O is routed towards O’s root • How to choose the root node for an object?

  43. Surrogate routing: unique mapping without global knowledge • Problem: how to choose a unique root node of an object deterministically • Surrogate routing • Choose as an object’s root (surrogate) the node whose nodeId is closest to the objectId • Small overhead, but no need for global knowledge • The number of additional hops is small
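
A toy sketch of root selection, under the simplifying assumptions that “closest” means smallest numeric distance in the id space and that the full node list is known; real surrogate routing reaches the same node hop by hop, skipping empty neighbor-map entries deterministically, so no node needs global knowledge:

```python
def surrogate_root(object_id, node_ids):
    """Global-knowledge simplification: the node whose id is closest to the object id."""
    return min(node_ids, key=lambda n: (abs(n - object_id), n))   # deterministic tie-break

nodes = [1000, 4250, 7777]          # illustrative node ids
print(surrogate_root(4221, nodes))  # 4250: every node computes the same root for object 4221
```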

  44. Surrogate example (diagram): finding the root node for objectId = 6534; the object is stored at node 1114, and node 2234 ends up as its root

  45. Tapestry: adaptable, fault-resilient, self-managing • Basic routing and location • Backpointer list in neighbor map • Multiple <objectId(O), nodeId(S)> mappings are stored • More semantic flexibility: selection operator for choosing returned objects

  46. Neighbor Map For “1732” (Octal) • One routing level per ID digit (levels 4, 3, 2, 1), with one entry per octal digit; x marks wildcard digits:
  Level 4: 0732  1732  2732  3732  4732  5732  6732  7732
  Level 3: x032  x132  x232  x332  x432  x532  x632  1732
  Level 2: xx02  1732  xx22  xx32  xx42  xx52  xx62  xx72
  Level 1: xxx0  xxx1  1732  xxx3  xxx4  xxx5  xxx6  xxx7
  • A single Tapestry node also keeps object location pointers <objectId, nodeId>, a hotspot monitor <objId, nodeId, freq>, and an object store
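
One way to represent such a neighbor map in memory, assuming IDs are fixed-length octal strings and levels are counted from the last digit as on the slide (purely illustrative):

```python
def build_neighbor_map(node_id, base=8):
    """Empty neighbor map: one level per digit, one slot per possible digit.

    At level i (counted from the right), the slot for digit d should hold some node
    whose id ends with d followed by the last i-1 digits of node_id; the slot for
    node_id's own digit at that level is the node itself.
    """
    levels = []
    for i in range(1, len(node_id) + 1):
        row = {str(d): None for d in range(base)}    # None marks an entry still to be filled
        row[node_id[-i]] = node_id                   # our own slot at this level
        levels.append(row)
    return levels

nmap = build_neighbor_map("1732")
print(nmap[0])   # level 1: the slot for digit '2' holds '1732' itself, the rest await neighbors
```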

  47. Fault-tolerant routing: detect, operate, and recover • Expected faults • Server outages (high load, hw/sw faults) • Link failures (router hw/sw faults) • Neighbor table corruption at the server • Detect: TCP timeout, heartbeats to nodes on backpointer list • Operate under faults: backup neighbors • Second chance for failed server: probing msgs
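
A rough sketch of the “operate under faults” part: try the primary neighbor for a route and fall back to backup neighbors when it appears dead. Failure detection and the later re-probing of the failed server are only hinted at in comments; addresses are placeholders:

```python
import socket

def send_with_failover(message, primary, backups, timeout=2.0):
    """Try the primary neighbor; on connection failure or timeout, use a backup neighbor."""
    for host, port in [primary, *backups]:
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(message)
                return (host, port)              # routing kept operating despite the fault
        except OSError:
            continue                             # remember this node for later probing msgs
    raise RuntimeError("primary and all backup neighbors unreachable")

# Usage (placeholder addresses):
# send_with_failover(b"msg", ("10.0.0.1", 7000), [("10.0.0.2", 7000), ("10.0.0.3", 7000)])
```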

  48. Fault-tolerant location: avoid single point of failure • Multiple roots for each object • rootId = f(objectId + salt) • Redundant, but reliable • Storage servers periodically republish location info • Cached location info on routers times out • New and recovered objects periodically advertise their location info
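
A small sketch of the salted-root idea, assuming SHA-1 as the hash and a 16-bit id space (both are illustrative choices):

```python
import hashlib

def root_ids(object_id, num_roots=3, bits=16):
    """Derive several independent root ids for one object: rootId = hash(objectId + salt)."""
    roots = []
    for salt in range(num_roots):
        digest = hashlib.sha1(f"{object_id}:{salt}".encode()).hexdigest()
        roots.append(int(digest, 16) % (2 ** bits))
    return roots

print(root_ids("6534"))   # three roots; the object stays locatable if one root node fails
```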

  49. Dynamic node insertion • Populate new node’s neighbor maps • Routing messages to its own nodeId • Copy and optimize neighbor maps • Inform relevant nodes of its entries to update neighbor maps
