Peer-to-Peer Systems Presented By: Nazanin Dehghani Supervisor: Dr. Naser Yazdani
Peer-to-Peer Architecture • A more dynamic structure than the classic client-server model, while still being a distributed system • In the client-server model, every client is bound statically to a specific server; peers are not
Peer-to-Peer Definition • “a computer network in which each computer in the network can act as a client or server for the other computers in the network, allowing shared access to files and peripherals without the need for a central server.”
Peer-to-Peer Systems • Properties • Nodes share their resources, such as memory, bandwidth and processing power, directly with each other • P2P networks should be robust to node churn
Primitives • Common Primitives • Join: how do I begin participating? • Publish: how do I advertise my file? • Search: how do I find a file? • Fetch: how do I retrieve a file?
Architecture of P2P Systems • Overlay Network • Graph Structure • Structured: the overlay topology is tightly controlled, and nodes are aware of it • Unstructured: no enforced topology
How Did it Start? • A killer application: Napster • Free music over the Internet • Key idea: share the content, storage and bandwidth of individual (home) users
Model • Each user stores a subset of files • Each user has access to (can download) files from all users in the system
Main Challenge • Find where a particular file is stored • (diagram: peers A–F in an overlay; a query “E?” must locate the peer storing file E)
Other Challenges • Scale: up to hundreds of thousands or millions of machines • Dynamicity: machines can come and go at any time
Napster • Assume a centralized index system that maps files (songs) to machines that are alive • How to find a file (song) • Query the index system, which returns a machine that stores the required file • Ideally this is the closest/least-loaded machine • ftp the file from that machine • Advantages: • Simplicity, easy to implement sophisticated search engines on top of the index system • Disadvantages: • Robustness, scalability (?)
Napster: Example • (diagram: machines m1–m6 hold files A–F; the central index maps A → m1, B → m2, C → m3, D → m4, E → m5, F → m6) • A query “E?” is answered with m5, and the file is then fetched from m5 directly
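A minimal sketch of the centralized index in this example (the file-to-machine mapping comes from the slide; the code itself is illustrative):

```python
index = {"A": "m1", "B": "m2", "C": "m3", "D": "m4", "E": "m5", "F": "m6"}

def locate(song):
    """The central server maps each file to a live machine that stores it."""
    return index.get(song)   # ideally the closest / least-loaded machine

print(locate("E"))           # m5 -- the client then ftp's the file from m5
```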
Gnutella • Distribute the file location • Idea: flood the request • How to find a file: • Send the request to all neighbors • Neighbors recursively multicast the request • Eventually a machine that has the file receives the request, and it sends back the answer • Advantages: • Totally decentralized, highly robust • Disadvantages: • Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
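A small sketch of flooding with a TTL, as described above; the neighbors adjacency dict stands in for the real overlay, and all names are illustrative:

```python
def flood_query(node, filename, neighbors, files, ttl, seen=None):
    """Return the set of nodes that answer the query within the TTL."""
    seen = seen if seen is not None else set()
    if node in seen or ttl < 0:
        return set()
    seen.add(node)
    hits = {node} if filename in files.get(node, ()) else set()
    for nbr in neighbors.get(node, ()):           # forward to all neighbors
        hits |= flood_query(nbr, filename, neighbors, files, ttl - 1, seen)
    return hits

neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
files = {"d": {"xyz"}}
print(flood_query("a", "xyz", neighbors, files, ttl=2))   # {'d'}
```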
Gnutella • Queries are flooded for a bounded number of hops • No guarantees on recall • (diagram: a query for “xyz” propagates through the overlay until it reaches a node holding xyz)
Distributed Hash Tables (DHTs) • Abstraction: a distributed hash-table data structure • insert(id, item); • item = query(id); (or lookup(id);) • Note: item can be anything: a data object, document, file, pointer to a file… • Proposals • CAN, Chord, Kademlia, Pastry, Tapestry, etc
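A toy illustration of this put/get abstraction; a plain dict stands in for the distributed implementation, and the names here are illustrative rather than any particular DHT's API:

```python
class ToyDHT:
    def __init__(self):
        self._table = {}                 # in a real DHT this is spread over nodes

    def put(self, key, item):
        self._table[key] = item          # insert(id, item)

    def get(self, key):
        return self._table.get(key)      # item = query(id), a.k.a. lookup(id)

dht = ToyDHT()
dht.put("popeye.mp4", "pointer to the file at peer 10.0.0.5:6881")
print(dht.get("popeye.mp4"))
```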
DHT Design Goals • Make sure that an item (file), if it is stored somewhere in the system, can always be found • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes
Structured Networks • Distributed Hash Tables (DHTs) • Hash table interface: put(key, item), get(key) • O(log n) hops • Guarantees on recall • (diagram: put(K1, I1) stores item I1 under key K1 at the responsible node; a later get(K1) retrieves it)
Chord In short: a peer-to-peer lookup service Solves problem of locating a data item in a collection of distributed nodes, considering frequent node arrivals and departures Core operation in most p2p systems is efficient location of data items Supports just one operation: given a key, it maps the key onto a node
Chord Characteristics Simplicity, provable correctness, and provable performance Each Chord node needs routing information about only a few other nodes Resolves lookups via messages to other nodes (iteratively or recursively) Maintains routing information as nodes join and leave the system
Napster, Gnutella etc. vs. Chord Compared to Napster and its centralized servers, Chord avoids single points of control or failure through a decentralized design Compared to Gnutella and its widespread use of broadcasts, Chord avoids the scalability problem by keeping only a small amount of routing information at each node
Addressed Difficult Problems (1) Load balance: distributed hash function, spreading keys evenly over nodes Decentralization: Chord is fully distributed, no node is more important than any other, which improves robustness Scalability: logarithmic growth of lookup costs with the number of nodes in the network, so even very large systems are feasible
Addressed Difficult Problems (2) Availability: Chord automatically adjusts its internal tables to ensure that the node responsible for a key can always be found
Consistent Hashing The hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1 ID(node) = hash(IP, Port) ID(key) = hash(key) Properties of consistent hashing: The function balances load: all nodes receive roughly the same number of keys When an N-th node joins (or leaves) the network, only an O(1/N) fraction of the keys is moved to a different location
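A small sketch of this ID assignment, assuming SHA-1 output truncated to m bits; the exact encoding of (IP, port) and of keys is an assumption made here for illustration:

```python
import hashlib

M = 32   # identifier length in bits (illustrative; Chord can use up to 160)

def chord_id(data: str) -> int:
    """Map an arbitrary string to an m-bit identifier with SHA-1."""
    digest = hashlib.sha1(data.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

node_id = chord_id("192.168.0.7:4000")   # ID(node) = hash(IP, Port)
key_id  = chord_id("popeye.mp4")         # ID(key)  = hash(key)
print(node_id, key_id)
```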
Successor Nodes • Example identifier circle with m = 3: nodes 0, 1 and 3; keys 1, 2 and 6 • Each key is stored at its successor node: successor(1) = 1, successor(2) = 3, successor(6) = 0
Node Joins and Departures • On the same circle: if node 7 joins, key 6 now belongs to it, so successor(6) = 7 • If node 1 leaves, its key moves to node 3, so successor(1) = 3
Scalable Key Location A very small amount of routing information suffices to implement consistent hashing in a distributed environment Each node need only be aware of its successor node on the circle Queries for a given identifier can be passed around the circle via these successor pointers Resolution scheme correct, BUT inefficient: it may require traversing all N nodes!
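A toy, single-process sketch of this successor-only resolution; node identifiers come from the running example, and the function names are illustrative:

```python
NODES = [0, 1, 3]   # node identifiers on a circle of size 8, as in the example

def successor_of(n):
    """The next node clockwise after node n."""
    i = NODES.index(n)
    return NODES[(i + 1) % len(NODES)]

def responsible_for(n, key):
    """Node n stores key if key lies in (predecessor(n), n] on the circle."""
    pred = NODES[(NODES.index(n) - 1) % len(NODES)]
    return (pred < key <= n) if pred < n else (key > pred or key <= n)

def lookup(start, key):
    node, hops = start, 0
    while not responsible_for(node, key):
        node = successor_of(node)        # only successor pointers are used
        hops += 1
    return node, hops

print(lookup(1, 6))   # (0, 2): the query may walk most of the circle, up to N hops
```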
Acceleration of Lookups Lookups are accelerated by maintaining additional routing information Each node maintains a routing table with (at most) m entries, called the finger table, where 2^m is the size of the identifier space The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on next slide) s = successor(n + 2^(i-1)), with all arithmetic mod 2^m s is called the i-th finger of node n, denoted by n.finger(i).node
Finger Tables (1) • Example with m = 3, nodes 0, 1, 3 and keys 1, 2, 6 • Node 0 (stores key 6): start 1, 2, 4; interval [1,2), [2,4), [4,0); successor 1, 3, 0 • Node 1 (stores key 1): start 2, 3, 5; interval [2,3), [3,5), [5,1); successor 3, 3, 0 • Node 3 (stores key 2): start 4, 5, 7; interval [4,5), [5,7), [7,3); successor 0, 0, 0
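A minimal sketch that reproduces the example finger tables above (m = 3, nodes {0, 1, 3}); the helper names are illustrative:

```python
M = 3
RING = 2 ** M                  # identifier space of size 8
NODES = sorted([0, 1, 3])      # the nodes in the example

def successor(ident):
    """First node whose identifier is >= ident, wrapping around the circle."""
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]            # wrapped past the largest node

def finger_table(n):
    rows = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % RING
        end = (n + 2 ** i) % RING          # the interval is [start, end)
        rows.append((start, (start, end), successor(start)))
    return rows

for n in NODES:
    print(f"node {n}: {finger_table(n)}")
# node 0: [(1, (1, 2), 1), (2, (2, 4), 3), (4, (4, 0), 0)]
# node 1: [(2, (2, 3), 3), (3, (3, 5), 3), (5, (5, 1), 0)]
# node 3: [(4, (4, 5), 0), (5, (5, 7), 0), (7, (7, 3), 0)]
```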
Finger Tables (2) - characteristics Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away A node’s finger table generally does not contain enough information to determine the successor of an arbitrary key k Repetitive queries to nodes that immediately precede the given key will lead to the key’s successor eventually
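A local-memory sketch of this lookup procedure; the method names follow the Chord paper (find_successor, closest_preceding_finger), but a real deployment resolves these calls with messages between nodes:

```python
M = 3
RING = 2 ** M

def between(x, a, b):
    """True if x lies in the circular open interval (a, b) on the ring."""
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.fingers = []        # finger i holds successor(ident + 2^i), i = 0..M-1
        self.successor = None    # same as fingers[0]

    def closest_preceding_finger(self, key):
        # scan fingers from farthest to nearest
        for f in reversed(self.fingers):
            if between(f.ident, self.ident, key):
                return f
        return self

    def find_successor(self, key):
        node = self
        # walk until key falls between node and its successor on the circle
        while not (between(key, node.ident, node.successor.ident)
                   or key == node.successor.ident):
            node = node.closest_preceding_finger(key)
        return node.successor

def build_ring(ids):
    ids = sorted(ids)
    nodes = {i: Node(i) for i in ids}
    succ = lambda x: next((n for n in ids if n >= x), ids[0])
    for n in nodes.values():
        n.fingers = [nodes[succ((n.ident + 2 ** i) % RING)] for i in range(M)]
        n.successor = n.fingers[0]
    return nodes

nodes = build_ring([0, 1, 3])
print(nodes[3].find_successor(1).ident)   # 1
print(nodes[0].find_successor(6).ident)   # 0
```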
Node Joins – with Finger Tables • Node 6 joins the circle of nodes 0, 1 and 3 • Node 6 builds its own finger table: start 7, 0, 2; interval [7,0), [0,2), [2,6); successor 0, 0, 3 • Key 6 moves from node 0 to node 6 • Existing fingers that now fall to node 6 are updated, e.g. node 0's entry for start 4 and node 3's entries for starts 4 and 5 change from 0 to 6
Node Departures – with Finger Tables • Node 1 leaves the circle • Its key (1) moves to its successor, node 3 • Fingers that pointed to node 1 are updated, e.g. node 0's entry for start 1 now points to node 3
Chord “Finger Table” • (diagram: node N80's fingers reach 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring) • Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i • In other words, the i-th finger points 1/2^(m-i) of the way around the ring
Chord Routing • Upon receiving a query for item id, a node: • Checks whether it stores the item locally • If not, forwards the query to the largest node in its successor table that does not exceed id • (diagram: example ring with nodes 0, 1, 2 and 6 and their successor tables; a query for item 7 is forwarded along successor-table entries until it reaches node 0, which stores item 7)
Node Join • Compute ID • Use an existing node to route to that ID in the ring. • Finds s = successor(id) • ask s for its predecessor, p • Splice self into ring just like a linked list • p->successor = me • me->successor = s • me->predecessor = p • s->predecessor = me
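A literal rendering of this splice as a sketch; locating s = successor(id) is assumed to have happened already (e.g., via the lookup sketched earlier), and the Peer class is illustrative:

```python
class Peer:
    def __init__(self, ident):
        self.ident = ident
        self.successor = self        # a lone node points to itself
        self.predecessor = self

def splice_in(me, s):
    """Insert 'me' just before node s, exactly like a doubly linked list insert."""
    p = s.predecessor                # ask s for its predecessor
    p.successor = me                 # p->successor = me
    me.successor = s                 # me->successor = s
    me.predecessor = p               # me->predecessor = p
    s.predecessor = me               # s->predecessor = me

ring = Peer(0)
splice_in(Peer(3), ring)             # ring is now 0 <-> 3
splice_in(Peer(1), ring.successor)   # s = successor(1) = 3; ring is now 0 <-> 1 <-> 3
```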
Chord Summary • Routing table size? • Log N fingers • Routing time? • Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
Fairness • What if somebody only downloads and never uploads? • What should the policy be? • Incentive mechanisms
Fetching Data • Once we know which node(s) have the data we want... • Option 1: Fetch from a single peer • Problem: Have to fetch from a peer who has the whole file. • Peers are not useful sources until they have downloaded the whole file • At which point they probably log off. :) • How can we fix this?
Chunk Fetching • More than one node may have the file. • How to tell? • Must be able to distinguish identical files • Not necessarily the same filename • The same filename is not necessarily the same file... • Use a hash of the file • Common: MD5, SHA-1, etc. • How to fetch? • Get bytes [0..8000] from A, [8001..16000] from B • Alternative: Erasure Codes
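A sketch of the ideas above: identify a file by a hash of its content rather than its name, and assign byte ranges to different peers; the peer names and the chunk size here are illustrative:

```python
import hashlib

CHUNK = 8000   # illustrative chunk size, roughly matching the byte ranges above

def file_id(data: bytes) -> str:
    """Identify a file by its content, not its filename (SHA-1 here; MD5 also common)."""
    return hashlib.sha1(data).hexdigest()

def plan_chunks(size, peers):
    """Assign byte ranges round-robin across the peers that hold the file."""
    ranges = [(lo, min(lo + CHUNK, size) - 1) for lo in range(0, size, CHUNK)]
    return [(peers[i % len(peers)], r) for i, r in enumerate(ranges)]

print(plan_chunks(20000, ["A", "B"]))
# [('A', (0, 7999)), ('B', (8000, 15999)), ('A', (16000, 19999))]
```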
BitTorrent • Written by Bram Cohen (in Python) in 2001 • “Pull-based” “swarming” approach • Each file split into smaller pieces • Nodes request desired pieces from neighbors • As opposed to parents pushing data that they receive • Pieces not downloaded in sequential order • Encourages contribution by all nodes
BitTorrent • Piece Selection • Rarest first • Random first selection • Peer Selection • Tit-for-tat • Optimistic unchoking
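A sketch of rarest-first piece selection, assuming each node knows which pieces its neighbors hold; the data structures are illustrative:

```python
from collections import Counter
import random

def pick_piece(my_pieces, neighbor_pieces):
    """neighbor_pieces: dict peer -> set of piece indices that peer holds."""
    counts = Counter(p for pieces in neighbor_pieces.values() for p in pieces
                     if p not in my_pieces)
    if not counts:
        return None
    rarest = min(counts.values())
    # break ties randomly among the equally rare pieces
    return random.choice([p for p, c in counts.items() if c == rarest])

neighbors = {"n1": {0, 1, 2}, "n2": {1, 2}, "n3": {2}}
print(pick_piece(my_pieces={2}, neighbor_pieces=neighbors))   # 0 (held only by n1)
```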
BitTorrent Swarm • Swarm • Set of peers all downloading the same file • Organized as a random mesh • Each node knows list of pieces downloaded by neighbors • Node requests pieces it does not own from neighbors
How a node enters a swarm for file “popeye.mp4” • The file popeye.mp4.torrent is hosted at a (well-known) webserver (e.g., www.bittorrent.com) • The .torrent has the address of the tracker for the file • The tracker, which runs on a webserver as well, keeps track of all peers downloading the file • Step 1: the peer downloads popeye.mp4.torrent from the webserver • Step 2: the peer contacts the tracker and receives the addresses of other peers • Step 3: the peer joins the swarm and starts exchanging pieces
Contents of .torrent file • URL of tracker • Piece length – Usually 256 KB • SHA-1 hashes of each piece in file • For reliability • “files” – allows download of multiple files
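As a rough sketch, the fields above map onto a dictionary like the following; the real .torrent file is bencoded and the key names follow the BitTorrent metainfo format, but treat the details here as illustrative:

```python
torrent = {
    "announce": "http://tracker.example.com/announce",   # URL of the tracker
    "info": {
        "piece length": 256 * 1024,                      # usually 256 KB
        "pieces": b"<concatenated 20-byte SHA-1 hash of each piece>",
        "files": [                                       # allows multiple files
            {"path": ["popeye.mp4"], "length": 700 * 1024 * 1024},
        ],
    },
}
```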
Terminology • Seed: peer with the entire file • Original Seed: The first seed • Leech: peer that’s downloading the file • Fairer term might have been “downloader”