Peer To Peer Distributed Systems Pete Keleher
Why Distributed Systems? • Aggregate resources! • memory • disk • CPU cycles • Proximity to physical stuff • things with sensors • things that print • things that go boom • other people • Fault tolerance! • Don’t want one tsunami to take everything down
Why Peer To Peer Systems? • What’s peer to peer?
(Traditional) Client-Server [diagram: a single central server with many clients connected to it]
Peer To Peer • Lots of reasonable machines • No one machine loaded more than others • No one machine irreplaceable!
Peer-to-Peer (P2P) • Where do the machines come from? • “found” resources • SETI@home • BOINC • existing resources • computing “clusters” (32, 64, …) • What good is a peer to peer system? • all those things mentioned before, including storage: files, MP3s, leaked documents, porn …
The lookup problem [diagram: a Publisher somewhere among nodes N1..N6 in the Internet holds (Key=“title”, Value=MP3 data…); a Client calls Lookup(“title”). How does the query find the right node?]
Centralized lookup (Napster) [diagram: Publisher N4 registers SetLoc(“title”, N4) with a central DB; the Client asks the DB Lookup(“title”) and is pointed at N4, which holds (Key=“title”, Value=MP3 data…)] Simple, but O(N) state at the central server and a single point of failure
Flooded queries (Gnutella) [diagram: the Client’s Lookup(“title”) is flooded from neighbor to neighbor until it reaches Publisher N4, which holds (Key=“title”, Value=MP3 data…)] Robust, but worst case O(N) messages per lookup
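To make the flooding idea concrete, here is a minimal sketch of a Gnutella-style TTL-bounded flood over a neighbor graph. The topology, TTL value, and function names are illustrative assumptions, not Gnutella’s actual protocol:

```python
# Sketch of Gnutella-style flooding: a query fans out to all neighbors,
# bounded by a time-to-live (TTL). Worst case it touches every link,
# hence O(N) messages per lookup. Topology and names are made up.

def flood_lookup(start, key, neighbors, published, ttl=4):
    """neighbors: node -> list of neighbors; published: node -> set of keys."""
    frontier, seen, messages = [start], {start}, 0
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            if key in published.get(node, set()):
                return node, messages          # found the publisher
            for nb in neighbors.get(node, []):
                messages += 1                  # one query message per link
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return None, messages                      # TTL expired without a hit

graph = {"N1": ["N2", "N3"], "N2": ["N1", "N4"],
         "N3": ["N1", "N4"], "N4": ["N2", "N3"]}
print(flood_lookup("N1", "title", graph, {"N4": {"title"}}))  # ('N4', 6)
```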
Routed queries (Freenet, Chord, etc.) [diagram: the Client’s Lookup(“title”) is routed hop by hop toward the Publisher holding (Key=“title”, Value=MP3 data…)] Bad load balance.
Routing challenges • Define a useful key nearness metric. • Keep the hop count small. • O(log N) • Keep the routing tables small. • O(log N) • Stay robust despite rapid changes.
Distributed Hash Tables to the Rescue! • Load Balance: Distributed hash function spreads keys evenly over the nodes (consistent hashing). • Decentralization: Fully distributed (robustness). • Scalability: Lookup cost grows as the log of the number of nodes. • Availability: Automatically adjusts internal tables to reflect changes. • Flexible Naming: No constraints on key structure.
What’s a Hash? • Wikipedia: any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer • Example: assume N is a large prime and ‘a’ means the ASCII code for the letter ‘a’ (it’s 97). Then H(“pete”) = H(“pet”) × N + ‘e’ = (H(“pe”) × N + ‘t’) × N + ‘e’ = 451845518507 • H(“pete”) mod 1000 = 507, H(“peter”) mod 1000 = 131, H(“petf”) mod 1000 = 986 • It’s a deterministic random number generator!
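A minimal sketch of this rolling hash. The prime N = 1000003 is an assumption (the slide does not say which prime it used), so the exact numbers differ from the 451845518507 example above, but the behavior is the same:

```python
# Sketch of the slide's polynomial rolling hash: H(s + c) = H(s) * N + ord(c).
# N = 1000003 is an assumed prime; a different N gives different values.

N = 1000003  # a large prime (assumption)

def H(s: str) -> int:
    h = 0
    for c in s:
        h = h * N + ord(c)      # e.g. H("pete") = H("pet") * N + ord('e')
    return h

# Tiny input changes scatter the output across the range: a deterministic
# "random" number generator.
for word in ["pete", "peter", "petf"]:
    print(word, H(word) % 1000)
```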
Chord (a DHT) • m-bit identifier space for both keys and nodes. • Key identifier = SHA-1(key). • Node identifier = SHA-1(IP address). • Both are uniformly distributed. • How to map key IDs to node IDs?
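A sketch of that identifier assignment using Python’s hashlib; the key name and IP address below are made-up examples:

```python
# Sketch of Chord's identifier assignment. SHA-1 yields 160-bit digests,
# so with m = 160 the final mod is a no-op; it is kept to show the
# general m-bit identifier space.
import hashlib

m = 160

def chord_id(data: str) -> int:
    digest = hashlib.sha1(data.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

key_id  = chord_id("my-file.mp3")    # Key identifier  = SHA-1(key)
node_id = chord_id("128.8.128.38")   # Node identifier = SHA-1(IP address)
print(hex(key_id), hex(node_id))
```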
Consistent hashing [Karger 97] [diagram: circular 7-bit ID space holding nodes N32, N90, N105 and keys K5, K20, K80] A key is stored at its successor: the node with the next higher ID (e.g., K80 is stored at N90)
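The successor rule is easy to state in code. A sketch over the slide’s 7-bit example, treating the ring as a sorted list of node IDs:

```python
# Sketch of "a key is stored at its successor" on a sorted ring of node IDs,
# using the nodes from the slide's 7-bit example.
import bisect

nodes = [32, 90, 105]             # node identifiers, sorted around the ring

def successor(key_id: int) -> int:
    i = bisect.bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]  # wrap around the circular ID space

assert successor(5)   == 32       # K5  -> N32
assert successor(20)  == 32       # K20 -> N32
assert successor(80)  == 90       # K80 -> N90
assert successor(110) == 32       # past N105, wraps back to N32
```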
Basic lookup [diagram: “Where is key 80?” is forwarded around the ring from N10 through N32 and N60 until N90 answers “N90 has K80”]
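A sketch of this successor-only lookup on the ring from the figure; the Node class and helper names are illustrative:

```python
# Sketch of the basic lookup: each node knows only its successor, so the
# query walks the ring one node at a time (O(N) hops worst case).

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor: "Node" = self   # fixed up once the ring is built

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the half-open ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(n: Node, key_id: int) -> Node:
    while not in_interval(key_id, n.id, n.successor.id):
        n = n.successor                 # forward the query around the ring
    return n.successor

# The ring from the figure: N10 -> N32 -> N60 -> N90 -> N105 -> N120 -> N10
ring = [Node(i) for i in [10, 32, 60, 90, 105, 120]]
for a, b in zip(ring, ring[1:] + ring[:1]):
    a.successor = b

print(find_successor(ring[0], 80).id)   # 90: "N90 has K80"
```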
“Finger table” allows log(N)-time lookups [diagram: N80’s fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring] Every node knows m other nodes in the ring
Finger i points to successor of n + 2^(i-1) [diagram: from N80, the finger with start 80 + 2^5 = 112 points to N120, the successor of 112] Each node knows more about the portion of the circle close to it
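A sketch of building that finger table for the slide’s 7-bit ring. Fingers here are 0-indexed, so finger[i] holds the successor of n + 2^i; the slide’s 1-indexed statement describes the same table:

```python
# Sketch of finger-table construction in a 7-bit ID space.
# finger[i] = successor(n + 2^i), i = 0..m-1 (0-indexed).
import bisect

m = 7  # 7-bit identifier space, as in the slides

def build_fingers(n_id: int, all_ids: list) -> list:
    ids = sorted(all_ids)
    fingers = []
    for i in range(m):
        start = (n_id + 2 ** i) % (2 ** m)
        j = bisect.bisect_left(ids, start)
        fingers.append(ids[j % len(ids)])   # successor of start on the ring
    return fingers

# For N80 on the ring {10, 32, 60, 90, 105, 120},
# the finger starts are 81, 82, 84, 88, 96, 112, 16:
print(build_fingers(80, [10, 32, 60, 90, 105, 120]))
# -> [90, 90, 90, 90, 105, 120, 32]   (the 112 finger points to N120)
```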
Lookups take O(log(N)) hops [diagram: Lookup(K19) issued at N80 jumps by fingers, covering at least half the remaining distance each hop, until it reaches N20, the successor of K19]
Joining: linked list insert [diagram: N36 joins a ring where N40, the successor of N25, holds K30 and K38] Step 1: N36 does Lookup(36) to find its place in the ring. Invariants to maintain: (1) each node’s successor is correctly maintained; (2) for every key k, node successor(k) is responsible for k.
Join (2) Step 2: N36 sets its own successor pointer (to N40). Initialize the new node’s finger table.
Join (3) Step 3: Set N25’s successor pointer to N36. Update finger pointers of existing nodes.
Join (4) Step 4: Copy keys 26..36 from N40 to N36, transferring keys (K30 moves to N36; K38 stays at N40).
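A minimal sketch of the four join steps on the successor-ring view alone. Real Chord also initializes the new node’s fingers and updates existing nodes’ fingers (or defers that to stabilization); the class and method names here are illustrative, not from the paper:

```python
# Sketch of the four join steps from the slides, on the successor ring only.

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor: "Node" = self
        self.keys: set = set()

def between(x, a, b):   # x in the ring interval (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(n: Node, key_id: int) -> Node:
    while not between(key_id, n.id, n.successor.id):
        n = n.successor
    return n.successor

def join(new: Node, known: Node) -> None:
    succ = find_successor(known, new.id)          # 1. Lookup(new.id)
    new.successor = succ                          # 2. set own successor
    pred = known
    while pred.successor is not succ:             #    find succ's predecessor
        pred = pred.successor
    pred.successor = new                          # 3. set predecessor's pointer
    moved = {k for k in succ.keys if between(k, pred.id, new.id)}
    succ.keys -= moved                            # 4. copy keys in (pred, new]
    new.keys |= moved

# Example from the slides: N36 joins between N25 and N40 (N40 holds K30, K38).
n25, n40 = Node(25), Node(40)
n25.successor, n40.successor = n40, n25
n40.keys = {30, 38}
n36 = Node(36)
join(n36, n25)
print(n36.keys, n40.keys)   # {30} {38}
```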
Stabilization Protocol • To handle concurrent node joins/fails/leaves. • Keep successor pointers up to date, then verify and correct finger table entries. • Incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failure. • Nodes periodically run stabilization protocol. • Won’t correct a Chord system that has split into multiple disjoint cycles, or a single cycle that loops multiple times around the identifier space.
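A sketch of the stabilize/notify pair, following the shape of the pseudocode in the Chord paper: each node periodically asks its successor for the successor’s predecessor, adopts that node if it has slipped in between them, and notifies its successor of its own existence. Run at every node, this repairs successor pointers after concurrent joins; finger entries are refreshed separately.

```python
# Sketch of Chord's periodic stabilization (after the paper's pseudocode).

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor: "Node" = self
        self.predecessor = None

def between(x, a, b):   # x in the open ring interval (a, b)
    return (a < x < b) if a < b else (x > a or x < b)

def stabilize(n: Node) -> None:
    x = n.successor.predecessor
    if x is not None and between(x.id, n.id, n.successor.id):
        n.successor = x                  # a newer node slipped in ahead of us
    notify(n.successor, n)

def notify(n: Node, candidate: Node) -> None:
    # candidate believes it may be n's predecessor
    if n.predecessor is None or between(candidate.id, n.predecessor.id, n.id):
        n.predecessor = candidate
```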
Take Home Points • Hashing is used to uniformly distribute data and nodes across a range. • Random distribution balances load. • Awesome systems paper: • identify commonality across algorithms • restrict work to implementing that one simple abstraction • use it as a building block