File Sharing: Hash/Lookup
Yossi Shasho (HW in last slide)
Based on Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Partially based on The Impact of DHT Routing Geometry on Resilience and Proximity
Partially based on Building a Low-latency, Proximity-aware DHT-Based P2P Network http://www.computer.org/portal/web/csdl/doi/10.1109/KSE.2009.49
Some slides liberally borrowed from: Carnegie Mellon Peer-2-Peer 15-411, Petar Maymounkov and David Mazières' Kademlia Talk, New York University
Peer-2-Peer • Distributed systems without any centralized control or hierarchical organization. • Long list of applications: • Redundant storage • Permanence • Selection of nearby servers • Anonymity, search, authentication, hierarchical naming and more • Core operation in most p2p systems is efficient location of data items
Think Big • /home/google/ • One namespace, thousands of servers • Map each key (=filename) to a value (=server) • Hash table? Think again • What if a new server joins? A server fails? • How to keep track of all servers? • What about redundancy? And proximity? • Not scalable, centralized, fault-intolerant • Lots of new problems come up…
DHT: Overview • Abstraction: a distributed “hash-table” (DHT) data structure: • put(id, item); • item = get(id); • Scalable, Decentralized, Fault Tolerant • Implementation: nodes in system form a distributed data structure • Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
DHT: Overview (2) • Many DHTs exist; Chord and Kademlia are the two covered in these slides
DHT: Overview (3) • Good properties: • Distributed construction/maintenance • Load-balanced with uniform identifiers • O(log n) hops / neighbors per node • Can exploit underlying network proximity
Consistent Hashing • When adding rows (servers) to a hash table, we don't want all keys to change their mappings • When adding the Nth row, we want ~1/N of the keys to change their mappings • Is this achievable? Yes.
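A minimal sketch of the idea, assuming SHA-1 identifiers on a small ring; the class and server names are illustrative, not part of Chord itself:

```python
import hashlib
from bisect import insort, bisect_left

BITS = 16                                   # small ring for illustration

def ring_id(name: str) -> int:
    """Hash an arbitrary string onto the identifier circle [0, 2^BITS)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** BITS)

class ConsistentHashRing:
    def __init__(self):
        self.node_ids = []                  # sorted node identifiers

    def add_node(self, name: str) -> None:
        insort(self.node_ids, ring_id(name))

    def successor(self, key_id: int) -> int:
        """Node responsible for key_id: first node clockwise from it (wrapping)."""
        idx = bisect_left(self.node_ids, key_id) % len(self.node_ids)
        return self.node_ids[idx]

ring = ConsistentHashRing()
for i in range(8):                          # 8 servers
    ring.add_node(f"server-{i}")
keys = [ring_id(f"file-{i}") for i in range(10_000)]
before = {k: ring.successor(k) for k in keys}
ring.add_node("server-8")                   # the "Nth row" joins
moved = sum(ring.successor(k) != before[k] for k in keys)
print(f"{moved / len(keys):.1%} of keys changed owner")   # ~1/N in expectation
```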
Chord: Overview • Just one operation: item = get(id) • Each node needs routing info about only a few other nodes • O(log N) for lookup, O(log² N) for join/leave • Simple, provable correctness, provable performance • Apps built on top of it do the rest
Chord: Geometry • Identifier space [1,N], example: binary strings • Keys (filenames) and values (server IPs) on the same identifier space • Keys & values evenly distributed • Now, put this identifier space on a circle • Consistent Hashing: A key is stored at its successor.
Chord: Geometry (2) • A key is stored at its successor: the node with the next-higher ID • Example ring (circular ID space) with nodes N32, N90, N105 and keys K5, K20, K80: • Get(5) = 32 • Get(20) = 32 • Get(80) = 90 • Who maps to 105? Nobody.
Chord: Back to Consistent Hashing • "When adding the Nth row, we want ~1/N of the keys to change their mappings." (The problem, a few slides back) • Same ring after nodes N15 and N50 join: • Get(5) = 32 → 15 (the only key that moves) • Get(20) = 32 (unchanged) • Get(80) = 90 (unchanged) • Who maps to 105? Still nobody.
Chord: Basic Lookup • get(k): if I have k, return "ME"; else forward to P = my successor and return P.get(k) • Example (figure): N10 asks "Where is key 80?" and the query is passed node by node around the ring (N32, N60, …) until N90 answers "N90 has K80" • Each node remembers only its successor • O(N) lookup time – no good!
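A toy sketch of this naive successor-only lookup, using in-process node objects for the ring in the figure (it assumes the requested key is stored somewhere, otherwise the query would circle forever):

```python
class NaiveNode:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self            # set once the ring is wired up
        self.keys = set()                # keys this node is responsible for

    def get(self, key: int, hops: int = 0):
        if key in self.keys:
            return self.id, hops         # "ME"
        return self.successor.get(key, hops + 1)   # forward to the next node

# Ring from the figure: N10, N32, N60, N90, N105, N120; K80 lives at N90.
nodes = [NaiveNode(i) for i in (10, 32, 60, 90, 105, 120)]
for a, b in zip(nodes, nodes[1:] + nodes[:1]):
    a.successor = b
nodes[3].keys.add(80)                    # N90 = successor(80)
print(nodes[0].get(80))                  # N10 asks "Where is key 80?" -> (90, 3)
```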
Chord: “Finger Table” • Previous lookup was O(N). We want O(log N) • N80's finger table from the figure (successors left blank):
i = 0: 80 + 2^0 = 81 → __
i = 1: 80 + 2^1 = 82 → __
i = 2: 80 + 2^2 = 84 → __
…
• Entry i in the finger table of node n is the first node n' such that n' ≥ n + 2^i • In other words, the i-th finger of n points 1/2^(m−i) of the way around the ring (m = number of ID bits), which is why the figure's arcs shrink 1/2, 1/4, 1/8, …, 1/128
Chord: “Finger Table” Lookups • get(k): if I have k, return "ME"; else P ← closest finger ≤ k (instead of the plain successor) and return P.get(k) • (Same N80 finger table and figure as the previous slide)
Chord: “Finger Table” Lookups • get(k): if I have k, return "ME"; else P ← closest finger ≤ k and return P.get(k) • Example ring: N2, N9, N19, N31, N49, N65, N74, N81, N90, with key K40 • The figure shows two finger tables: N19's (19+2^0 = 20 → N31, 19+2^1 = 21 → N31, …, 19+2^4 = 35 → N49) and N65's (65+2^0 = 66 → N74, 65+2^1 = 67 → N74, …) • The query "Where is key 40?" hops along fingers until it reaches N49, the successor of K40: "40!"
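A sketch of finger-table routing on the ring above (7-bit identifiers assumed; `build_ring` wires the fingers up centrally just for the demo, whereas real nodes build them via the join protocol):

```python
class ChordNode:
    def __init__(self, node_id: int, m: int = 7):
        self.id, self.m, self.ring = node_id, m, 2 ** m
        self.fingers = []                          # fingers[i] = successor(id + 2^i)

    def _between(self, x: int, a: int, b: int) -> bool:
        """True if x lies strictly inside the clockwise arc (a, b)."""
        return a < x < b if a < b else (x > a or x < b)

    def find_successor(self, key: int, hops: int = 0):
        succ = self.fingers[0]                     # finger 0 = immediate successor
        if key == succ.id or self._between(key, self.id, succ.id):
            return succ, hops + 1
        for f in reversed(self.fingers):           # closest finger preceding the key
            if self._between(f.id, self.id, key):
                return f.find_successor(key, hops + 1)
        return succ.find_successor(key, hops + 1)

def build_ring(ids, m: int = 7):
    ids = sorted(ids)
    nodes = {i: ChordNode(i, m) for i in ids}
    for nid, node in nodes.items():
        for k in range(m):
            target = (nid + 2 ** k) % node.ring
            node.fingers.append(nodes[min((i for i in ids if i >= target), default=ids[0])])
    return nodes

# The slide's ring: N2, N9, N19, N31, N49, N65, N74, N81, N90; K40's successor is N49.
nodes = build_ring([2, 9, 19, 31, 49, 65, 74, 81, 90])
owner, hops = nodes[65].find_successor(40)         # "Where is key 40?"
print(owner.id, hops)                              # 49, reached in a handful of hops
```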
Chord: Example • Assume an identifier space [0..7] (a 3-bit ring) • Node n1 joins • Responsible for all keys • (Succ == successor) • n1's successor table: i = 0: 1+2^0 = 2 → 1; i = 1: 1+2^1 = 3 → 1; i = 2: 1+2^2 = 5 → 1
Chord: Example • Node n2 joins • n1's successor table: target 2 → 2 (was 1); target 3 → 1; target 5 → 1 • n2's successor table: target 3 → 1; target 4 → 1; target 6 → 1
Chord: Example • Nodes n0 and n6 join • n0's successor table: target 1 → 1; target 2 → 2; target 4 → 6 • n1's successor table: target 2 → 2; target 3 → 6 (was 1); target 5 → 6 (was 1) • n2's successor table: target 3 → 6 (was 1); target 4 → 6 (was 1); target 6 → 6 (was 1) • n6's successor table: target 7 → 0; target 0 → 0; target 2 → 2
Chord: Example • Nodes: n0, n1, n2, n6 • Items: 1 and 7 • Item 1 is stored at its successor n1; item 7 is stored at its successor n0 • (Successor tables as on the previous slide)
Chord: Routing • Upon receiving a query for item id, a node: • Checks if it stores the item locally • If not, forwards the query to the largest node i in its finger table such that i ≤ id • Example (figure): query(7) travels along fingers, reaching n6 and then n0, which stores item 7
Chord: Node Join • Node n joins: need one existing node, n', in hand • Initialize fingers of n • Ask n' to look them up (log N fingers to init) • Update fingers of the rest • Few nodes need to be updated • Look them up and tell them n is new in town • Transfer keys
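A fully local toy sketch of the three join steps (initialize fingers, refresh other nodes' fingers, transfer keys). In the real protocol the lookups go through n' and only O(log N) nodes are touched, but the end state is the same; names here are illustrative:

```python
M, RING = 3, 8                           # 3-bit ring, matching the n0/n1/n2/n6 example

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.fingers = []                # fingers[i] = node responsible for id + 2^i
        self.keys = set()

def successor_of(ident, nodes):
    """First node clockwise from ident (a central search standing in for a lookup)."""
    ids = sorted(n.id for n in nodes)
    target = min((i for i in ids if i >= ident % RING), default=ids[0])
    return next(n for n in nodes if n.id == target)

def rebuild_fingers(node, nodes):
    node.fingers = [successor_of(node.id + 2 ** i, nodes) for i in range(M)]

def join(new, existing):
    nodes = existing + [new]
    rebuild_fingers(new, nodes)          # 1. initialize fingers of the new node
    for n in existing:                   # 2. update fingers of the rest
        rebuild_fingers(n, nodes)
    succ = new.fingers[0]                # 3. transfer the keys the new node now owns
    moved = {k for k in succ.keys if successor_of(k, nodes) is new}
    succ.keys -= moved
    new.keys |= moved
    return nodes

# Replay the earlier example: n1, n2 hold items 1 and 7; then n0 and n6 join.
n1, n2 = Node(1), Node(2)
nodes = [n1, n2]
for n in nodes:
    rebuild_fingers(n, nodes)
n1.keys = {1, 7}
nodes = join(Node(0), nodes)             # key 7 moves to n0, its new successor
nodes = join(Node(6), nodes)
print({n.id: n.keys for n in nodes})     # {1: {1}, 2: set(), 0: {7}, 6: set()}
```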
Chord: Improvements • Every 30s, ask your successor for its predecessor • Fix your own successor based on this • Also, pick and verify a random finger • Rebuild finger table entries this way • Keep a successor list of r successors • Deal with unexpected node failures • Can use these to replicate data
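A hedged sketch of that maintenance loop. The function names follow the spirit of the Chord paper's stabilize/notify/fix-fingers routines, but the node object (with id, ring, predecessor, successor, fingers, find_successor) is assumed from the earlier illustrative sketches, and failure handling is left out:

```python
import random
import threading

STABILIZE_INTERVAL = 30.0        # seconds, per the slide
R = 3                            # length of the successor list

def in_open_arc(x: int, a: int, b: int) -> bool:
    """True if x lies strictly on the clockwise arc (a, b) of the ring."""
    return a < x < b if a < b else (x > a or x < b)

def stabilize(node):
    """Ask our successor for its predecessor and repair our successor pointer."""
    x = node.successor.predecessor
    if x is not None and in_open_arc(x.id, node.id, node.successor.id):
        node.successor = x                      # a node joined in between us
    notify(node.successor, node)                # "I might be your predecessor"

def notify(node, candidate):
    if node.predecessor is None or in_open_arc(candidate.id, node.predecessor.id, node.id):
        node.predecessor = candidate

def fix_random_finger(node):
    """Pick and re-verify one random finger, as the slide describes."""
    i = random.randrange(len(node.fingers))
    node.fingers[i], _ = node.find_successor((node.id + 2 ** i) % node.ring)

def refresh_successor_list(node):
    """Keep r successors; these survive single failures and can hold replicas."""
    lst, cur = [], node.successor
    for _ in range(R):
        lst.append(cur)
        cur = cur.successor
    node.successor_list = lst

def maintenance(node):
    stabilize(node)
    fix_random_finger(node)
    refresh_successor_list(node)
    threading.Timer(STABILIZE_INTERVAL, maintenance, args=(node,)).start()
```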
Chord: Performance • Routing table size? • log N fingers • Routing time? • Each hop is expected to halve the distance to the desired id => expect O(log N) hops • Node joins • Query for the fingers => O(log N) • Update other nodes' fingers => O(log² N)
Chord: Performance (2) • Real measurements: lookup time vs. number of nodes (figure)
Chord: Performance (3) • Comparing to other DHTs
Chord: Performance (4) • Promises a few (O(log N)) hops on the overlay • But on the physical network, a single hop can be quite far • Figure: a Chord network with N=8 nodes and an m=8-bit key space
Applications employing DHTs • eMule (KAD implements Kademlia, a DHT) • An anonymous network (≥ 2 million downloads to date) • BitTorrent (≥ 4.1.2 beta) • Trackerless BitTorrent, allows anonymity (thank god) • Clients A & B handshake • A: "I have DHT, it's on port X" • B: pings port X of A • B gets a reply => starts adjusting – nodes, rows…
Kademlia (KAD) • Distance between A and B is A XOR B • Nodes are treated as leaves in a binary tree • A node's position in A's tree is determined by the longest postfix it shares with A • A's ID: 010010101 • B's ID: 101000101
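A tiny illustration of the XOR metric using the two IDs from the slide (9-bit strings purely for readability; real KAD IDs are much longer):

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b

A = int("010010101", 2)    # A's ID from the slide
B = int("101000101", 2)    # B's ID from the slide
d = xor_distance(A, B)
print(f"{d:09b}", d)       # 111010000 = 464: they differ in the high bits,
                           # so they are far apart under the XOR metric
```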
Kademlia: Postfix Tree • A node's position in A's tree is determined by the longest postfix it shares with A (=> log N subtrees) • Figure: the space of 160-bit numbers (00…00 to 11…11) drawn as a binary tree of nodes/peers, with our node's neighbours grouped into subtrees by how much of their ID they share with ours (labelled "no common prefix", "common prefix: 0", "common prefix: 00", "common prefix: 001" in the figure)
Kademlia: Lookup • Consider a query for ID 111010… initiated by node 0011100… • Figure: the query jumps between subtrees of the 160-bit ID space, each step landing in a subtree that shares more of the target's ID
Kademlia: K-Buckets • Consider the routing table for a node with prefix 0011 • Its binary tree is divided into a series of subtrees • The routing table is composed of a k-bucket corresponding to each of these subtrees • In a 2-bucket example, each bucket holds at least 2 contacts for its key range • A contact consists of <IP:Port, NodeID>
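A rough sketch of filing contacts into k-buckets. It uses the Kademlia paper's convention that a contact's bucket is chosen by the most significant bit in which its ID differs from ours (i.e., which subtree it falls in); `RoutingTable` and the addresses are illustrative, and real implementations ping the oldest contact before evicting it rather than simply dropping it:

```python
from collections import deque

ID_BITS = 160
K = 2                      # bucket size; the slide's example uses k = 2

def bucket_index(my_id: int, other_id: int) -> int:
    """Bucket (subtree) for other_id: position of the most significant differing bit,
    which equals ID_BITS - 1 - (length of the shared prefix)."""
    return (my_id ^ other_id).bit_length() - 1

class RoutingTable:
    def __init__(self, my_id: int):
        self.my_id = my_id
        # One k-bucket per subtree; the leftmost entry is the least recently seen.
        self.buckets = [deque(maxlen=K) for _ in range(ID_BITS)]

    def add_contact(self, node_id: int, addr: str):
        """A contact is <IP:Port, NodeID>; a full deque silently drops the oldest here."""
        self.buckets[bucket_index(self.my_id, node_id)].append((addr, node_id))

table = RoutingTable(my_id=0b0011 << (ID_BITS - 4))            # our ID starts with 0011
table.add_contact(0b0010 << (ID_BITS - 4), "10.0.0.7:4000")    # shares prefix 001
table.add_contact(0b1100 << (ID_BITS - 4), "10.0.0.9:4000")    # no shared prefix
```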
Summary • 1. The Problem • 2. Distributed hash tables (DHT) • 3. Chord: a DHT scheme • Geometry • Lookup • Node Joins • Performance • 4. Extras
Homework • Load balance is achieved when all servers in the Chord network are responsible for (roughly) the same number of keys • Still, with some probability, one server can be responsible for significantly more keys • How can we lower the upper bound on the number of keys assigned to a server? • Hint: simulation
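Not an answer, just a possible starting point for the simulation hint: measure how far the busiest server is from the average load, then experiment with changes that shrink that gap (the parameters below are arbitrary):

```python
import hashlib

def h(value: str, bits: int = 32) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** bits)

def max_load(num_servers: int = 100, num_keys: int = 100_000) -> float:
    """Ratio between the busiest server's key count and the average."""
    ring = sorted(h(f"server-{i}") for i in range(num_servers))
    counts = {nid: 0 for nid in ring}
    for k in range(num_keys):
        key = h(f"key-{k}")
        owner = next((nid for nid in ring if nid >= key), ring[0])   # successor on the ring
        counts[owner] += 1
    return max(counts.values()) / (num_keys / num_servers)

print(max_load())   # often a few times the average -- the imbalance to reduce
```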