Peer-to-Peer Systems Presented By: Nazanin Dehghani Supervisor: Dr. Naser Yazdani
Peer-to-Peer Architecture • A more dynamic structure than the classic client-server model, while still being a distributed system • In the client-server model, every client is bound statically to a specific server; peers are not
Peer-to-Peer Definition • “a computer network in which each computer in the network can act as a client or server for the other computers in the network, allowing shared access to files and peripherals without the need for a central server.”
Peer-to-Peer Systems • Properties • Nodes share their resources, such as memory, bandwidth and processing power, directly with each other • P2P networks should be robust to node churn
Primitives • Common Primitives • Join: how do I begin participating? • Publish: how do I advertise my file? • Search: how do I find a file? • Fetch: how do I retrieve a file?
Architecture of P2P Systems • Overlay Network • Graph Structure • Structured: the overlay topology is tightly controlled, and nodes are aware of it • Unstructured: no enforced topology
How Did it Start? • A killer application: Napster • Free music over the Internet • Key idea: share the content, storage and bandwidth of individual (home) users
Model • Each user stores a subset of files • Each user has access to (can download) files from all users in the system
Main Challenge • Find where a particular file is stored • (diagram: peers A–F in an overlay; a query “E?” must locate the peer storing file E)
Other Challenges • Scale: up to hundreds of thousands or millions of machines • Dynamicity: machines can come and go at any time
Napster • Assume a centralized index system that maps files (songs) to machines that are alive • How to find a file (song) • Query the index system, which returns a machine that stores the required file • Ideally this is the closest/least-loaded machine • ftp the file from that machine • Advantages: • Simplicity, easy to implement sophisticated search engines on top of the index system • Disadvantages: • Robustness, scalability (?)
Napster: Example • (diagram: machines m1–m6 hold files A–F; the central index maps A → m1, B → m2, C → m3, D → m4, E → m5, F → m6) • A query “E?” is answered with m5, and the file is then fetched from m5 directly
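A minimal sketch of the centralized index in this example (the file-to-machine mapping comes from the slide; the code itself is illustrative):

```python
index = {"A": "m1", "B": "m2", "C": "m3", "D": "m4", "E": "m5", "F": "m6"}

def locate(song):
    """The central server maps each file to a live machine that stores it."""
    return index.get(song)   # ideally the closest / least-loaded machine

print(locate("E"))           # m5 -- the client then ftp's the file from m5
```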
Gnutella • Distribute the file location • Idea: flood the request • How to find a file: • Send the request to all neighbors • Neighbors recursively multicast the request • Eventually a machine that has the file receives the request, and it sends back the answer • Advantages: • Totally decentralized, highly robust • Disadvantages: • Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
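A small sketch of flooding with a TTL, as described above; the neighbors adjacency dict stands in for the real overlay, and all names are illustrative:

```python
def flood_query(node, filename, neighbors, files, ttl, seen=None):
    """Return the set of nodes that answer the query within the TTL."""
    seen = seen if seen is not None else set()
    if node in seen or ttl < 0:
        return set()
    seen.add(node)
    hits = {node} if filename in files.get(node, ()) else set()
    for nbr in neighbors.get(node, ()):           # forward to all neighbors
        hits |= flood_query(nbr, filename, neighbors, files, ttl - 1, seen)
    return hits

neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
files = {"d": {"xyz"}}
print(flood_query("a", "xyz", neighbors, files, ttl=2))   # {'d'}
```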
Gnutella • Queries are flooded for a bounded number of hops • No guarantees on recall • (diagram: a query for “xyz” propagates through the overlay until it reaches a node holding xyz)
Distributed Hash Tables (DHTs) • Abstraction: a distributed hash-table data structure • insert(id, item); • item = query(id); (or lookup(id);) • Note: item can be anything: a data object, document, file, pointer to a file… • Proposals • CAN, Chord, Kademlia, Pastry, Tapestry, etc
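A toy illustration of this put/get abstraction; a plain dict stands in for the distributed implementation, and the names here are illustrative rather than any particular DHT's API:

```python
class ToyDHT:
    def __init__(self):
        self._table = {}                 # in a real DHT this is spread over nodes

    def put(self, key, item):
        self._table[key] = item          # insert(id, item)

    def get(self, key):
        return self._table.get(key)      # item = query(id), a.k.a. lookup(id)

dht = ToyDHT()
dht.put("popeye.mp4", "pointer to the file at peer 10.0.0.5:6881")
print(dht.get("popeye.mp4"))
```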
DHT Design Goals • Make sure that an item (file), if it is stored somewhere in the system, can always be found • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes
Structured Networks • Distributed Hash Tables (DHTs) • Hash table interface: put(key, item), get(key) • O(log n) hops • Guarantees on recall • (diagram: put(K1, I1) stores item I1 under key K1 at the responsible node; a later get(K1) retrieves it)
Chord In short: a peer-to-peer lookup service Solves problem of locating a data item in a collection of distributed nodes, considering frequent node arrivals and departures Core operation in most p2p systems is efficient location of data items Supports just one operation: given a key, it maps the key onto a node
Chord Characteristics Simplicity, provable correctness, and provable performance Each Chord node needs routing information about only a few other nodes Resolves lookups via messages to other nodes (iteratively or recursively) Maintains routing information as nodes join and leave the system
Napster, Gnutella etc. vs. Chord Compared to Napster and its centralized servers, Chord avoids single points of control or failure through a decentralized design Compared to Gnutella and its widespread use of broadcasts, Chord avoids the scalability problem by keeping only a small amount of routing information at each node
Addressed Difficult Problems (1) Load balance: distributed hash function, spreading keys evenly over nodes Decentralization: Chord is fully distributed, no node is more important than any other, which improves robustness Scalability: logarithmic growth of lookup costs with the number of nodes in the network, so even very large systems are feasible
Addressed Difficult Problems (2) Availability: Chord automatically adjusts its internal tables to ensure that the node responsible for a key can always be found
Consistent Hashing The hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1 ID(node) = hash(IP, Port) ID(key) = hash(key) Properties of consistent hashing: The function balances load: all nodes receive roughly the same number of keys When an N-th node joins (or leaves) the network, only an O(1/N) fraction of the keys is moved to a different location
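A small sketch of this ID assignment, assuming SHA-1 output truncated to m bits; the exact encoding of (IP, port) and of keys is an assumption made here for illustration:

```python
import hashlib

M = 32   # identifier length in bits (illustrative; Chord can use up to 160)

def chord_id(data: str) -> int:
    """Map an arbitrary string to an m-bit identifier with SHA-1."""
    digest = hashlib.sha1(data.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

node_id = chord_id("192.168.0.7:4000")   # ID(node) = hash(IP, Port)
key_id  = chord_id("popeye.mp4")         # ID(key)  = hash(key)
print(node_id, key_id)
```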
Successor Nodes • Example identifier circle with m = 3: nodes 0, 1 and 3; keys 1, 2 and 6 • Each key is stored at its successor node: successor(1) = 1, successor(2) = 3, successor(6) = 0
Node Joins and Departures • On the same circle: if node 7 joins, key 6 now belongs to it, so successor(6) = 7 • If node 1 leaves, its key moves to node 3, so successor(1) = 3
Scalable Key Location A very small amount of routing information suffices to implement consistent hashing in a distributed environment Each node need only be aware of its successor node on the circle Queries for a given identifier can be passed around the circle via these successor pointers Resolution scheme correct, BUT inefficient: it may require traversing all N nodes!
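A toy, single-process sketch of this successor-only resolution; node identifiers come from the running example, and the function names are illustrative:

```python
NODES = [0, 1, 3]   # node identifiers on a circle of size 8, as in the example

def successor_of(n):
    """The next node clockwise after node n."""
    i = NODES.index(n)
    return NODES[(i + 1) % len(NODES)]

def responsible_for(n, key):
    """Node n stores key if key lies in (predecessor(n), n] on the circle."""
    pred = NODES[(NODES.index(n) - 1) % len(NODES)]
    return (pred < key <= n) if pred < n else (key > pred or key <= n)

def lookup(start, key):
    node, hops = start, 0
    while not responsible_for(node, key):
        node = successor_of(node)        # only successor pointers are used
        hops += 1
    return node, hops

print(lookup(1, 6))   # (0, 2): the query may walk most of the circle, up to N hops
```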
Acceleration of Lookups Lookups are accelerated by maintaining additional routing information Each node maintains a routing table with (at most) m entries, called the finger table, where 2^m is the size of the identifier space The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on next slide) s = successor(n + 2^(i-1)), with all arithmetic mod 2^m s is called the i-th finger of node n, denoted by n.finger(i).node
Finger Tables (1) • Example with m = 3, nodes 0, 1, 3 and keys 1, 2, 6 • Node 0 (stores key 6): start 1, 2, 4; interval [1,2), [2,4), [4,0); successor 1, 3, 0 • Node 1 (stores key 1): start 2, 3, 5; interval [2,3), [3,5), [5,1); successor 3, 3, 0 • Node 3 (stores key 2): start 4, 5, 7; interval [4,5), [5,7), [7,3); successor 0, 0, 0
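A minimal sketch that reproduces the example finger tables above (m = 3, nodes {0, 1, 3}); the helper names are illustrative:

```python
M = 3
RING = 2 ** M                  # identifier space of size 8
NODES = sorted([0, 1, 3])      # the nodes in the example

def successor(ident):
    """First node whose identifier is >= ident, wrapping around the circle."""
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]            # wrapped past the largest node

def finger_table(n):
    rows = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % RING
        end = (n + 2 ** i) % RING          # the interval is [start, end)
        rows.append((start, (start, end), successor(start)))
    return rows

for n in NODES:
    print(f"node {n}: {finger_table(n)}")
# node 0: [(1, (1, 2), 1), (2, (2, 4), 3), (4, (4, 0), 0)]
# node 1: [(2, (2, 3), 3), (3, (3, 5), 3), (5, (5, 1), 0)]
# node 3: [(4, (4, 5), 0), (5, (5, 7), 0), (7, (7, 3), 0)]
```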
Finger Tables (2) - characteristics Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away A node’s finger table generally does not contain enough information to determine the successor of an arbitrary key k Repetitive queries to nodes that immediately precede the given key will lead to the key’s successor eventually
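A local-memory sketch of this lookup procedure; the method names follow the Chord paper (find_successor, closest_preceding_finger), but a real deployment resolves these calls with messages between nodes:

```python
M = 3
RING = 2 ** M

def between(x, a, b):
    """True if x lies in the circular open interval (a, b) on the ring."""
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.fingers = []        # finger i holds successor(ident + 2^i), i = 0..M-1
        self.successor = None    # same as fingers[0]

    def closest_preceding_finger(self, key):
        # scan fingers from farthest to nearest
        for f in reversed(self.fingers):
            if between(f.ident, self.ident, key):
                return f
        return self

    def find_successor(self, key):
        node = self
        # walk until key falls between node and its successor on the circle
        while not (between(key, node.ident, node.successor.ident)
                   or key == node.successor.ident):
            node = node.closest_preceding_finger(key)
        return node.successor

def build_ring(ids):
    ids = sorted(ids)
    nodes = {i: Node(i) for i in ids}
    succ = lambda x: next((n for n in ids if n >= x), ids[0])
    for n in nodes.values():
        n.fingers = [nodes[succ((n.ident + 2 ** i) % RING)] for i in range(M)]
        n.successor = n.fingers[0]
    return nodes

nodes = build_ring([0, 1, 3])
print(nodes[3].find_successor(1).ident)   # 1
print(nodes[0].find_successor(6).ident)   # 0
```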
Node Joins – with Finger Tables • Node 6 joins the circle of nodes 0, 1 and 3 • Node 6 builds its own finger table: start 7, 0, 2; interval [7,0), [0,2), [2,6); successor 0, 0, 3 • Key 6 moves from node 0 to node 6 • Existing fingers that now fall to node 6 are updated, e.g. node 0's entry for start 4 and node 3's entries for starts 4 and 5 change from 0 to 6
Node Departures – with Finger Tables • Node 1 leaves the circle • Its key (1) moves to its successor, node 3 • Fingers that pointed to node 1 are updated, e.g. node 0's entry for start 1 now points to node 3
Chord “Finger Table” • (diagram: node N80's fingers reach 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring) • Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i • In other words, the i-th finger points 1/2^(m-i) of the way around the ring
Chord Routing • Upon receiving a query for item id, a node: • Checks whether it stores the item locally • If not, forwards the query to the largest node in its successor table that does not exceed id • (diagram: example ring with nodes 0, 1, 2 and 6 and their successor tables; a query for item 7 is forwarded along successor-table entries until it reaches node 0, which stores item 7)
Node Join • Compute ID • Use an existing node to route to that ID in the ring. • Finds s = successor(id) • ask s for its predecessor, p • Splice self into ring just like a linked list • p->successor = me • me->successor = s • me->predecessor = p • s->predecessor = me
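A literal rendering of this splice as a sketch; locating s = successor(id) is assumed to have happened already (e.g., via the lookup sketched earlier), and the Peer class is illustrative:

```python
class Peer:
    def __init__(self, ident):
        self.ident = ident
        self.successor = self        # a lone node points to itself
        self.predecessor = self

def splice_in(me, s):
    """Insert 'me' just before node s, exactly like a doubly linked list insert."""
    p = s.predecessor                # ask s for its predecessor
    p.successor = me                 # p->successor = me
    me.successor = s                 # me->successor = s
    me.predecessor = p               # me->predecessor = p
    s.predecessor = me               # s->predecessor = me

ring = Peer(0)
splice_in(Peer(3), ring)             # ring is now 0 <-> 3
splice_in(Peer(1), ring.successor)   # s = successor(1) = 3; ring is now 0 <-> 1 <-> 3
```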
Chord Summary • Routing table size? • Log N fingers • Routing time? • Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
Fairness • What if somebody only downloads and never uploads? • What should the policy be? • Incentive mechanisms
Fetching Data • Once we know which node(s) have the data we want... • Option 1: Fetch from a single peer • Problem: Have to fetch from a peer who has the whole file. • Peers are not useful sources until they have downloaded the whole file • At which point they probably log off. :) • How can we fix this?
Chunk Fetching • More than one node may have the file. • How to tell? • Must be able to distinguish identical files • Not necessarily the same filename • The same filename is not necessarily the same file... • Use a hash of the file • Common: MD5, SHA-1, etc. • How to fetch? • Get bytes [0..8000] from A, [8001..16000] from B • Alternative: Erasure Codes
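A sketch of the ideas above: identify a file by a hash of its content rather than its name, and assign byte ranges to different peers; the peer names and the chunk size here are illustrative:

```python
import hashlib

CHUNK = 8000   # illustrative chunk size, roughly matching the byte ranges above

def file_id(data: bytes) -> str:
    """Identify a file by its content, not its filename (SHA-1 here; MD5 also common)."""
    return hashlib.sha1(data).hexdigest()

def plan_chunks(size, peers):
    """Assign byte ranges round-robin across the peers that hold the file."""
    ranges = [(lo, min(lo + CHUNK, size) - 1) for lo in range(0, size, CHUNK)]
    return [(peers[i % len(peers)], r) for i, r in enumerate(ranges)]

print(plan_chunks(20000, ["A", "B"]))
# [('A', (0, 7999)), ('B', (8000, 15999)), ('A', (16000, 19999))]
```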
BitTorrent • Written by Bram Cohen (in Python) in 2001 • “Pull-based” “swarming” approach • Each file split into smaller pieces • Nodes request desired pieces from neighbors • As opposed to parents pushing data that they receive • Pieces not downloaded in sequential order • Encourages contribution by all nodes
BitTorrent • Piece Selection • Rarest first • Random first selection • Peer Selection • Tit-for-tat • Optimistic unchoking
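A sketch of rarest-first piece selection, assuming each node knows which pieces its neighbors hold; the data structures are illustrative:

```python
from collections import Counter
import random

def pick_piece(my_pieces, neighbor_pieces):
    """neighbor_pieces: dict peer -> set of piece indices that peer holds."""
    counts = Counter(p for pieces in neighbor_pieces.values() for p in pieces
                     if p not in my_pieces)
    if not counts:
        return None
    rarest = min(counts.values())
    # break ties randomly among the equally rare pieces
    return random.choice([p for p, c in counts.items() if c == rarest])

neighbors = {"n1": {0, 1, 2}, "n2": {1, 2}, "n3": {2}}
print(pick_piece(my_pieces={2}, neighbor_pieces=neighbors))   # 0 (held only by n1)
```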
BitTorrent Swarm • Swarm • Set of peers all downloading the same file • Organized as a random mesh • Each node knows list of pieces downloaded by neighbors • Node requests pieces it does not own from neighbors
How a node enters a swarm for file “popeye.mp4” • The file popeye.mp4.torrent is hosted at a (well-known) webserver (e.g., www.bittorrent.com) • The .torrent has the address of the tracker for the file • The tracker, which runs on a webserver as well, keeps track of all peers downloading the file • Step 1: the peer downloads popeye.mp4.torrent from the webserver • Step 2: the peer contacts the tracker and receives the addresses of other peers • Step 3: the peer joins the swarm and starts exchanging pieces
Contents of .torrent file • URL of tracker • Piece length – Usually 256 KB • SHA-1 hashes of each piece in file • For reliability • “files” – allows download of multiple files
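As a rough sketch, the fields above map onto a dictionary like the following; the real .torrent file is bencoded and the key names follow the BitTorrent metainfo format, but treat the details here as illustrative:

```python
torrent = {
    "announce": "http://tracker.example.com/announce",   # URL of the tracker
    "info": {
        "piece length": 256 * 1024,                      # usually 256 KB
        "pieces": b"<concatenated 20-byte SHA-1 hash of each piece>",
        "files": [                                       # allows multiple files
            {"path": ["popeye.mp4"], "length": 700 * 1024 * 1024},
        ],
    },
}
```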
Terminology • Seed: peer with the entire file • Original Seed: The first seed • Leech: peer that’s downloading the file • Fairer term might have been “downloader”