CSE6809-Distributed Search Techniques. Lecture-2 DHT. Nov 24, 2007. What is a DHT? A hash table is a data structure that maps “keys” to “values” through the interface put(key, value) and get(key). A Distributed Hash Table (DHT) is similar, but spread across the Internet; the challenge is to locate content.
CSE6809-Distributed Search Techniques Lecture-2 DHT Nov 24, 2007
What is a DHT?
• Hash Table
  • data structure that maps “keys” to “values”
• Interface
  • put(key, value)
  • get(key)
• Distributed Hash Table (DHT)
  • similar, but spread across the Internet
  • challenge: locate content
What is a DHT? (cont.)
• Single-node hash table:
  Key = hash(data)
  put(key, value)
  get(key) -> value
• Distributed Hash Table (DHT):
  Key = hash(data)
  Lookup(key) -> node-IP@
  Put(node-IP@, PUT, key, value)
  Get(node-IP@, GET, key) -> value
• Idea:
  • Assign particular nodes to hold particular content (or reference to content)
  • Every node supports a routing function (given a key, route messages to node holding key)
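The three DHT steps above (Key = hash(data), Lookup(key) -> node, then Put/Get at that node) can be sketched in a few lines. This is only an illustrative, in-process toy: the class names `Node` and `ToyDHT`, the use of SHA-1, and the modulo placement rule are all assumptions made for brevity, not something the slides prescribe. Real DHTs replace `lookup` with multi-hop overlay routing.

```python
import hashlib


class Node:
    """One DHT participant; holds the (key, value) pairs assigned to it."""

    def __init__(self, address):
        self.address = address  # stands in for node-IP@
        self.store = {}


class ToyDHT:
    """Hypothetical stand-in for a DHT: every node lives in one local list."""

    def __init__(self, addresses):
        self.nodes = [Node(a) for a in addresses]

    @staticmethod
    def make_key(data: bytes) -> int:
        # Key = hash(data); SHA-1 is an assumption, any uniform hash would do.
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    def lookup(self, key: int) -> Node:
        # Lookup(key) -> node. Assumption: plain modulo placement.
        return self.nodes[key % len(self.nodes)]

    def put(self, key: int, value):
        # Put(node, PUT, key, value): store at the responsible node.
        self.lookup(key).store[key] = value

    def get(self, key: int):
        # Get(node, GET, key) -> value, or None if the key is unknown.
        return self.lookup(key).store.get(key)
```

Modulo placement is the simplest possible assignment of keys to nodes, but it remaps almost every key when a node joins or leaves, which is one reason practical designs use a different ID selection scheme.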
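What is a DHT? (cont.)
[Figure: layered architecture. The distributed application calls put(key, value) and get(key) -> value on the distributed hash table; the distributed hash table calls lookup(key) -> node IP address on the lookup service, which spans the nodes.]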
DHT in action
[Figure: a network of nodes, each holding a K/V table; put(K1,V1) and get(K1) are issued for the pair (K1,V1).]
Iterative vs. Recursive Routing
[Figure: a network of nodes, each holding a K/V table.]
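The slide gives only the title, so the following is a hedged sketch of the usual distinction rather than the lecture's own definition. In iterative routing the querier contacts each hop itself and is told whom to ask next; in recursive routing each node forwards the query onward. The ring topology, the ownership rule (`key % ring_size`), and all names here are assumptions for illustration.

```python
class RingNode:
    """A node that knows only its successor on a ring (an assumed topology)."""

    def __init__(self, node_id, ring_size):
        self.node_id = node_id
        self.ring_size = ring_size
        self.successor = None  # filled in by build_ring

    def owns(self, key):
        # Assumed ownership rule: node i holds keys with key % ring_size == i.
        return key % self.ring_size == self.node_id

    def next_hop(self, key):
        """Iterative style: tell the querier which node to ask next."""
        return self if self.owns(key) else self.successor

    def route(self, key, path):
        """Recursive style: forward the query on the querier's behalf."""
        path.append(self.node_id)
        if self.owns(key):
            return self
        return self.successor.route(key, path)


def build_ring(n):
    nodes = [RingNode(i, n) for i in range(n)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % n]
    return nodes


def iterative_lookup(start, key):
    # The querier drives every step and contacts each hop directly.
    path = [start.node_id]
    current = start
    while True:
        nxt = current.next_hop(key)
        if nxt is current:
            return current, path
        current = nxt
        path.append(current.node_id)


def recursive_lookup(start, key):
    # The querier sends one message; the nodes pass it along.
    path = []
    owner = start.route(key, path)
    return owner, path
```

Both functions visit the same nodes in this toy; the difference is who sends the messages, which affects latency and how much control the querier keeps when a hop fails.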
Peers vs Infrastructure
• Peer:
  • Application users provide nodes for DHT
  • Examples: file sharing, etc.
• Infrastructure:
  • Set of managed nodes provide DHT service
  • Perhaps serve many applications
DHT Design Goals
• An “overlay” network with:
  • Decentralization and self-organization, i.e. no central authority, local routing decisions
  • Flexibility in mapping keys to physical nodes and routing
  • Robustness to joining/leaving
  • Scalability, i.e. low communication overhead
  • Efficiency, i.e. low latency
• A consistent “storage” mechanism with:
  • No guarantees on persistence
  • Maintenance via soft state
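"Maintenance via soft state" with "no guarantees on persistence" is commonly realized by letting stored entries expire unless the publisher refreshes them. The sketch below shows that idea under stated assumptions: the class name `SoftStateStore`, the TTL parameter, and passing the clock value `now` explicitly are illustrative choices, not part of the lecture.

```python
class SoftStateStore:
    """Key-value store whose entries vanish unless re-put within ttl seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (value, expiry time)

    def put(self, key, value, now):
        # A put both stores and refreshes: the expiry is pushed forward.
        self.entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if now >= expires_at:
            # The entry was never refreshed in time; drop it lazily.
            del self.entries[key]
            return None
        return value
```

A publisher that stays alive keeps calling `put` periodically; one that leaves simply stops, and its entries disappear without any explicit delete message.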
The Partitioning Problem
[Figure: nodes on the Internet labelled with binary IDs from the interval 0 to 1: .000, .0010, .0011, .010, .011, .100, .1010, .1011, .1100, .1101, .111]
SOLUTION: ID Selection Scheme
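One way to read the node labels in the figure is as binary prefixes that split the interval from 0 to 1 into disjoint pieces, with each node responsible for the keys whose binary expansion starts with its ID. That reading is an assumption, as are the function names and the use of SHA-1 below; the sketch only illustrates how such an ID assignment would partition the key space.

```python
import hashlib

# Node IDs as labelled in the figure, written without the leading "."
NODE_IDS = ["000", "0010", "0011", "010", "011", "100",
            "1010", "1011", "1100", "1101", "111"]


def key_bits(data: bytes, width: int = 16) -> str:
    """Hash data to a point in [0, 1), given as its first `width` binary digits."""
    digest = hashlib.sha1(data).digest()
    return format(int.from_bytes(digest[:4], "big"), "032b")[:width]


def responsible_node(bits: str, node_ids=NODE_IDS) -> str:
    """Return the single node ID that is a prefix of the key's bits."""
    matches = [nid for nid in node_ids if bits.startswith(nid)]
    if len(matches) != 1:
        raise ValueError("node IDs do not partition the key space")
    return matches[0]


def partition_size(node_id: str) -> float:
    """Fraction of the key space covered by a node with this prefix."""
    return 2.0 ** -len(node_id)
```

Under this reading the eleven IDs cover the whole interval exactly once, but not in equal shares: three-digit IDs own 1/8 of the space and four-digit IDs own 1/16.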
Lookup Problem
[Figure labelled: Internet]
SOLUTION: Overlay Routing Network
The Big Picture
• P2P-related challenges:
  • Dynamism
  • Scale
• Partition Problem -> ID Selection Scheme
  • Design Goals:
    • Equi-sized partitions
    • Replicas for fault tolerance
    • Low add/delete cost
• Lookup Problem -> Overlay Routing Network
  • Design Goals:
    • Small no. of connections
    • Low lookup latency
    • IP-layer network proximity
    • Routing load balance
    • Resilience to network partitions
    • Low add/delete cost
DHT Applications
• global file systems
  • OceanStore, CFS, PAST, Pastiche, UsenetDHT
• naming services
  • Chord-DNS, Twine, SFR
• DB query processing
  • PIER, Wisc
• Internet-scale data structures
  • PHT, Cone, SkipGraphs
• communication services
  • i3, MCAN, Bayeux
• event notification
  • Scribe, Herald
• file sharing
  • OverNet
Systems We Will Study
• Basic DHT techniques
  • Chord
  • CAN
  • Pastry/Tapestry
  • Kademlia
  • SkipGraph/SkipNet
• DHT-extensions
  • Squid
  • pSearch
  • Twine
  • i3
Distributed Search Requirements
• Decentralization
• Efficiency
• Scalability
• Flexibility
• Completeness
• Fault-resilience
• Load balancing
• Others
  • Autonomy
  • Anonymity
  • Ranking of results