
Peer-to-Peer Systems



  1. Peer-to-Peer Systems Presented By: Nazanin Dehghani Supervisor: Dr. Naser Yazdani

  2. Peer-to-Peer Architecture • A more dynamic structure while still being a distributed system • Unlike the client/server model, no client is bound statically to a specific server

  3. Peer-to-Peer Definition • “a computer network in which each computer in the network can act as a client or server for the other computers in the network, allowing shared access to files and peripherals without the need for a central server.”

  4. Peer-to-Peer Applications

  5. Peer-to-Peer Systems • Properties • Nodes share their resources, such as memory, bandwidth and processing power, directly with each other • P2P networks should be robust to node churn

  6. Primitives • Common Primitives • Join: how do I begin participating? • Publish: how do I advertise my file? • Search: how do I find a file? • Fetch: how do I retrieve a file?

  7. Architecture of P2P Systems • Overlay Network • Graph Structure • Structured (aware of the topology of the overlay network) • Unstructured

  8. How Did it Start? • A killer application: Napster • Free music over the Internet • Key idea: share the content, storage and bandwidth of individual (home) users

  9. Model • Each user stores a subset of files • Each user has access to (can download) files from all users in the system

  10. Main Challenge • Find where a particular file is stored • [diagram: nodes A–F in an overlay; one node asks "where is E?"]

  11. Other Challenges • Scale: up to hundreds of thousands or millions of machines • Dynamicity: machines can come and go at any time

  12. Napster • Assume a centralized index system that maps files (songs) to machines that are alive • How to find a file (song): • Query the index system → it returns a machine that stores the required file • Ideally this is the closest/least-loaded machine • FTP the file from that machine • Advantages: • Simplicity; easy to implement sophisticated search engines on top of the index system • Disadvantages: • Robustness, scalability (?)
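The centralized index can be pictured as a simple map from file names to the machines currently sharing them. The sketch below is only an illustration of the idea; the class and method names (CentralIndex, publish, search) are invented here and are not Napster's actual protocol.

    # Minimal sketch of a Napster-style central index; illustrative names, not the real protocol.
    class CentralIndex:
        def __init__(self):
            self.locations = {}              # file name -> set of machines that store it

        def publish(self, machine, filename):
            self.locations.setdefault(filename, set()).add(machine)

        def search(self, filename):
            # Returns the machines that claim to have the file; the client then
            # fetches the file directly from one of them (e.g. the closest one).
            return self.locations.get(filename, set())

    index = CentralIndex()
    index.publish("m5", "E")
    index.publish("m6", "F")
    print(index.search("E"))                 # {'m5'}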

  13. Napster: Example • [diagram: the central index maps files A–F to machines m1–m6; a query for E is answered with m5, and the file is then fetched directly from m5]

  14. Gnutella • Distribute the file location • Idea: flood the request • How to find a file: • Send the request to all neighbors • Neighbors recursively multicast the request • Eventually a machine that has the file receives the request, and it sends back the answer • Advantages: • Totally decentralized, highly robust • Disadvantages: • Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
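The flooding idea can be sketched as a recursive broadcast with a hop limit (TTL). The graph representation and function name below are assumptions made for illustration.

    # Sketch of Gnutella-style flooding with a TTL (hop limit); illustrative only.
    def flood_query(network, start, filename, ttl):
        """network: node -> (neighbors, local_files). Returns the nodes that have the file."""
        hits, visited = set(), set()

        def forward(node, remaining_ttl):
            if node in visited or remaining_ttl < 0:
                return
            visited.add(node)
            neighbors, local_files = network[node]
            if filename in local_files:
                hits.add(node)                    # the answer travels back along the query path
            for peer in neighbors:
                forward(peer, remaining_ttl - 1)  # neighbors recursively re-broadcast

        forward(start, ttl)
        return hits

    net = {
        "A": (["B", "C"], set()),
        "B": (["A", "D"], set()),
        "C": (["A"], {"xyz"}),
        "D": (["B"], set()),
    }
    print(flood_query(net, "A", "xyz", ttl=2))    # {'C'}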

  15. Gnutella • Queries are flooded for a bounded number of hops • No guarantees on recall • [diagram: a query for "xyz" is flooded through the overlay until a node holding xyz is reached]

  16. Distributed Hash Tables (DHTs) • Abstraction: a distributed hash-table data structure • insert(id, item); • item = query(id); (or lookup(id);) • Note: item can be anything: a data object, document, file, pointer to a file… • Proposals • CAN, Chord, Kademlia, Pastry, Tapestry, etc
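As a point of reference, the interface above can be mimicked in a single process; a real DHT such as CAN, Chord, Kademlia, Pastry or Tapestry partitions the same table across many nodes. A minimal sketch:

    # Single-process stand-in for the DHT abstraction: insert(id, item) / query(id).
    # In a real DHT the table is partitioned across nodes; the interface stays the same.
    class LocalDHT:
        def __init__(self):
            self.table = {}

        def insert(self, key, item):
            self.table[key] = item        # item can be a data object, a document, a pointer to a file, ...

        def query(self, key):             # also called lookup(id)
            return self.table.get(key)

    dht = LocalDHT()
    dht.insert("song.mp3", "stored at m5")
    print(dht.query("song.mp3"))          # stored at m5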

  17. DHT Design Goals • Make sure that an item (file), once published under an identifier, can always be found • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes

  18. Structured Networks • Distributed Hash Tables (DHTs) • Hash table interface: put(key, item), get(key) • O(log n) hops • Guarantees on recall • [diagram: put(K1, I1) stores item I1 under key K1 at the responsible node; get(K1) later retrieves I1]

  19. Chord

  20. Chord • In short: a peer-to-peer lookup service • Solves the problem of locating a data item in a collection of distributed nodes, considering frequent node arrivals and departures • The core operation in most P2P systems is efficient location of data items • Supports just one operation: given a key, it maps the key onto a node

  21. Chord Characteristics • Simplicity, provable correctness, and provable performance • Each Chord node needs routing information about only a few other nodes • Resolves lookups via messages to other nodes (iteratively or recursively) • Maintains routing information as nodes join and leave the system

  22. Napster, Gnutella etc. vs. Chord • Compared to Napster and its centralized servers, Chord avoids single points of control or failure through decentralization • Compared to Gnutella and its widespread use of broadcasts, Chord avoids the scalability problem by keeping only a small amount of routing information per node

  23. Addressed Difficult Problems (1) • Load balance: a distributed hash function spreads keys evenly over the nodes • Decentralization: Chord is fully distributed, no node is more important than any other; this improves robustness • Scalability: lookup cost grows logarithmically with the number of nodes in the network, so even very large systems are feasible

  24. Addressed Difficult Problems (2) • Availability: Chord automatically adjusts its internal tables to ensure that the node responsible for a key can always be found

  25. Consistent Hashing • The hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1 • ID(node) = hash(IP, Port); ID(key) = hash(key) • Properties of consistent hashing: • The function balances load: all nodes receive roughly the same number of keys • When an Nth node joins (or leaves) the network, only an O(1/N) fraction of the keys are moved to a different location
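Identifier assignment can be sketched with Python's hashlib; m = 3 bits matches the small example circles on the following slides. The helper name chord_id and the sample address are assumptions for illustration.

    import hashlib

    M = 3                                    # identifier length in bits (2**M positions on the circle)

    def chord_id(text: str, m: int = M) -> int:
        """m-bit identifier derived from a SHA-1 hash, as in consistent hashing."""
        digest = hashlib.sha1(text.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** m)

    # ID(node) = hash(IP, Port); ID(key) = hash(key)
    print(chord_id("192.168.0.5:4000"))      # a node identifier in [0, 7]
    print(chord_id("popeye.mp4"))            # a key identifier in [0, 7]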

  26. Successor Nodes • [diagram: 3-bit identifier circle (identifiers 0–7) with nodes 0, 1 and 3 and keys 1, 2 and 6; successor(1) = 1, successor(2) = 3, successor(6) = 0]

  27. Node Joins and Departures • [diagram: on the same identifier circle, node 7 joins, so key 6 now has successor(6) = 7; node 1 departs, so key 1 now has successor(1) = 3]

  28. Scalable Key Location • A very small amount of routing information suffices to implement consistent hashing in a distributed environment • Each node need only be aware of its successor node on the circle • Queries for a given identifier can be passed around the circle via these successor pointers • This resolution scheme is correct, BUT inefficient: it may require traversing all N nodes!
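A sketch of that successor-only resolution, assuming the ring is given as a table of successor pointers (here the 3-bit example with nodes 0, 1 and 3); the helper names are invented for illustration.

    # Successor-pointer-only lookup: walk the circle until the key falls between the
    # current node and its successor. Correct, but may visit all N nodes.
    def in_interval(x, a, b, m=3):
        """True if x lies in the half-open circular interval (a, b]."""
        size = 2 ** m
        return x != a and (x - a) % size <= (b - a) % size

    def lookup_naive(successors, start, key, m=3):
        node, hops = start, 0
        while True:
            succ = successors[node]           # each node knows only its successor
            if in_interval(key, node, succ, m):
                return succ, hops             # succ is the node responsible for the key
            node, hops = succ, hops + 1

    ring = {0: 1, 1: 3, 3: 0}                 # successor pointers for nodes 0, 1, 3 (m = 3)
    print(lookup_naive(ring, 0, 6))           # (0, 2): key 6 is stored at node 0, found after 2 hops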

  29. Acceleration of Lookups • Lookups are accelerated by maintaining additional routing information • Each node maintains a routing table with (at most) m entries, called the finger table (where identifiers are m bits, i.e. the circle has 2^m positions) • The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on next slide) • s = successor(n + 2^(i-1)) (all arithmetic mod 2^m) • s is called the i-th finger of node n, denoted by n.finger(i).node

  30. Finger Tables (1) • [example with m = 3 and nodes 0, 1, 3 on the identifier circle:] • Node 0 – fingers: start 1, interval [1,2), succ. 1; start 2, interval [2,4), succ. 3; start 4, interval [4,0), succ. 0 – keys: 6 • Node 1 – fingers: start 2, interval [2,3), succ. 3; start 3, interval [3,5), succ. 3; start 5, interval [5,1), succ. 0 – keys: 1 • Node 3 – fingers: start 4, interval [4,5), succ. 0; start 5, interval [5,7), succ. 0; start 7, interval [7,3), succ. 0 – keys: 2
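For the example above, the finger tables can be reproduced directly from the definition on slide 29, assuming full membership is known (a real node learns its fingers through lookups instead). A minimal sketch:

    # Finger table construction for the 3-bit example: entry i of node n is
    # successor(n + 2**(i-1)), 1 <= i <= m, all arithmetic mod 2**m.
    M = 3
    NODES = sorted([0, 1, 3])

    def successor(ident, nodes=NODES, m=M):
        for n in nodes:
            if n >= ident:
                return n
        return nodes[0]                       # wrap around the circle

    def finger_table(n, m=M):
        return [successor((n + 2 ** (i - 1)) % 2 ** m) for i in range(1, m + 1)]

    for n in NODES:
        print(n, finger_table(n))             # 0 [1, 3, 0] / 1 [3, 3, 0] / 3 [0, 0, 0]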

  31. Finger Tables (2) – characteristics • Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away • A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k • Repeated queries to nodes that immediately precede the given key will eventually lead to the key's successor

  32. Node Joins – with Finger Tables • [diagram: node 6 joins the circle with nodes 0, 1 and 3; its finger table becomes start 7, interval [7,0), succ. 0; start 0, interval [0,2), succ. 0; start 2, interval [2,6), succ. 3 – key 6 moves from node 0 to node 6, and existing fingers that now fall to node 6 (e.g. node 3's first two fingers) are updated to point to it]

  33. Node Departures – with Finger Tables • [diagram: node 1 departs from the circle with nodes 0, 1, 3 and 6; key 1 moves to node 3, and finger table entries that pointed at node 1 (e.g. node 0's first finger) are updated to point to node 3]

  34. Chord “Finger Table” • [diagram: node N80's fingers cover 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the ring] • Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i • In other words, the i-th finger points 1/2^(n-i) of the way around the ring

  35. Chord Routing • Upon receiving a query for item id, a node: • Checks whether it stores the item locally • If not, forwards the query to the largest node in its successor table that does not exceed id • [diagram: identifier circle with nodes 0, 1, 2 and 6, each holding a successor table of entries (id + 2^i → succ); a query(7) issued at node 1 is forwarded via these tables to node 0, which stores item 7]
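That forwarding rule can be sketched as follows, reusing the 3-bit example and the finger tables computed above rather than the circle in this slide's diagram; node state is held in plain dictionaries purely for illustration.

    # Finger-based lookup sketch (m = 3, nodes 0, 1, 3): forward the query to the
    # closest preceding finger of the key, so each hop roughly halves the distance.
    M = 3
    FINGERS = {0: [1, 3, 0], 1: [3, 3, 0], 3: [0, 0, 0]}
    SUCCESSOR = {0: 1, 1: 3, 3: 0}

    def between(x, a, b, m=M):
        """True if x lies strictly inside the circular interval (a, b)."""
        size = 2 ** m
        return 0 < (x - a) % size < (b - a) % size

    def find_successor(node, key, hops=0):
        succ = SUCCESSOR[node]
        if key == succ or between(key, node, succ):
            return succ, hops                           # succ is responsible for the key
        for finger in reversed(FINGERS[node]):          # largest finger not past the key
            if between(finger, node, key):
                return find_successor(finger, key, hops + 1)
        return find_successor(succ, key, hops + 1)

    print(find_successor(0, 6))                         # (0, 1): query(6) resolved at node 0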

  36. Node Join • Compute ID • Use an existing node to route to that ID in the ring. • Finds s = successor(id) • ask s for its predecessor, p • Splice self into ring just like a linked list • p->successor = me • me->successor = s • me->predecessor = p • s->predecessor = me
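The splice described on this slide is exactly a circular doubly-linked-list insertion. A minimal sketch (ignoring finger-table repair and key transfer, which Chord handles separately):

    # Ring splice for a joining node: the same four pointer updates as on the slide.
    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self
            self.predecessor = self

    def join(new, s):
        """Insert `new` into the ring just before existing node `s` (its successor)."""
        p = s.predecessor            # ask s for its predecessor, p
        p.successor = new            # p->successor = me
        new.successor = s            # me->successor = s
        new.predecessor = p          # me->predecessor = p
        s.predecessor = new          # s->predecessor = me

    a, b = Node(0), Node(3)
    join(b, a)                       # two-node ring: 0 <-> 3
    c = Node(1)
    join(c, b)                       # node 1 joins; successor(1) = 3, so splice in before node 3
    print(a.successor.id, c.successor.id, b.successor.id)   # 1 3 0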

  37. Chord Summary • Routing table size? • log N fingers • Routing time? • Each hop is expected to halve the distance to the desired id ⇒ expect O(log N) hops

  38. BitTorrent

  39. Fairness • What if somebody only downloads and never uploads? • What is the policy? • Incentive mechanism

  40. Fetching Data • Once we know which node(s) have the data we want... • Option 1: fetch from a single peer • Problem: have to fetch from a peer who has the whole file • Peers are not useful sources until they have downloaded the whole file • At which point they probably log off :) • How can we fix this?

  41. Chunk Fetching • More than one node may have the file • How to tell? • Must be able to distinguish identical files • Not necessarily the same filename • The same filename is not necessarily the same file... • Use a hash of the file • Common: MD5, SHA-1, etc. • How to fetch? • Get bytes [0..8000] from A, [8001..16000] from B • Alternative: erasure codes
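Identifying "the same file" by its content hash rather than by its name might look like the sketch below (SHA-1 is used here to match the slide; the helper name and the example call are hypothetical).

    import hashlib

    def file_digest(path, algo="sha1", chunk_size=8192):
        """Content hash of a file: two files with the same digest are treated as identical,
        whatever their names, so their bytes can be fetched from different peers."""
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical usage: peers advertising the same digest can serve different byte
    # ranges of the file, e.g. bytes [0..8000] from A and [8001..16000] from B.
    # print(file_digest("popeye.mp4"))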

  42. BitTorrent • Written by Bram Cohen (in Python) in 2001 • “Pull-based” “swarming” approach • Each file split into smaller pieces • Nodes request desired pieces from neighbors • As opposed to parents pushing data that they receive • Pieces not downloaded in sequential order • Encourages contribution by all nodes

  43. BitTorrent • Piece Selection • Rarest first • Random first selection • Peer Selection • Tit-for-tat • Optimistic unchoking
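Rarest-first piece selection can be sketched by counting how many neighbors hold each piece and requesting the least-replicated piece we still need; the function name and bitfield representation are assumptions, and the "random first" policy for a brand-new peer is shown as a flag.

    from collections import Counter
    import random

    def pick_piece(have, neighbor_bitfields, bootstrap=False):
        """Return the index of the next piece to request, or None if nothing new is available.
        neighbor_bitfields: list of sets of piece indices each neighbor holds."""
        available = set().union(*neighbor_bitfields) - have
        if not available:
            return None
        if bootstrap:
            return random.choice(sorted(available))            # "random first": finish any piece quickly
        counts = Counter(p for bf in neighbor_bitfields for p in bf)
        return min(available, key=lambda p: counts[p])         # rarest first

    print(pick_piece({0}, [{0, 1, 2}, {1, 2}, {2, 3}]))        # 3: held by only one neighbor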

  44. BitTorrent Swarm • Swarm • Set of peers all downloading the same file • Organized as a random mesh • Each node knows list of pieces downloaded by neighbors • Node requests pieces it does not own from neighbors

  45. How a node enters a swarm for file “popeye.mp4” • File popeye.mp4.torrent hosted at a (well-known) webserver • The .torrent has address of tracker for file • The tracker, which runs on a webserver as well, keeps track of all peers downloading file

  46. How a node enters a swarm for file “popeye.mp4” • Step 1: the peer downloads popeye.mp4.torrent from the (well-known) webserver, e.g. www.bittorrent.com

  47. How a node enters a swarm for file “popeye.mp4” • Step 2: the peer contacts the tracker named in the .torrent and receives the addresses of other peers

  48. How a node enters a swarm for file “popeye.mp4” • Step 3: the peer connects to those peers and joins the swarm

  49. Contents of .torrent file • URL of tracker • Piece length – Usually 256 KB • SHA-1 hashes of each piece in file • For reliability • “files” – allows download of multiple files
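Logically, the metainfo described on this slide looks like the dictionary sketched below; the real .torrent file is bencoded, "pieces" is the concatenation of 20-byte SHA-1 hashes (one per piece), and all values shown here are placeholders.

    # Logical structure of a .torrent metainfo file (placeholders only; the real file is bencoded).
    metainfo = {
        "announce": "http://tracker.example.com/announce",     # URL of the tracker
        "info": {
            "name": "popeye.mp4",
            "piece length": 256 * 1024,                        # usually 256 KB
            "pieces": b"<20-byte SHA-1 of piece 0><20-byte SHA-1 of piece 1>...",
            "length": 700 * 1024 * 1024,                       # single-file mode; multi-file
                                                               # torrents use a "files" list instead
        },
    }
    print(metainfo["info"]["piece length"])                    # 262144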

  50. Terminology • Seed: peer with the entire file • Original Seed: The first seed • Leech: peer that’s downloading the file • Fairer term might have been “downloader”
