Distributed Hash Tables
Abdo Achkar, 11-22-05, Villanova University
Overview
• Intro to Hash tables
• Distributed Hash tables
• IDA encoding
• Chord protocol
• DHash API
Hash tables
• Definition:
   • An array of pointers to linked lists
   • A hash function that maps each key to an index in that array
Hash Tables, The data structure
• An array of pointers to linked lists of type T, where T is the type of the data structure that holds both the key and the data: T = typeof(<Key, Data>)
[Figure: each array slot points to a linked list of <Key, Data> nodes]
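A minimal C++ sketch of this layout, assuming string keys and string data. The names `Node`, `HashTable`, `insert`, `find`, and `kTableSize` are illustrative rather than taken from the slides, and 101 buckets is an arbitrary example size. Each slot of `buckets_` is a pointer to the head of a linked list of <Key, Data> nodes, and keys that hash to the same slot share that list.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// One element of a bucket's linked list: T = <Key, Data>
struct Node {
    std::string key;
    std::string data;
    std::unique_ptr<Node> next;
};

class HashTable {
public:
    // Insert the data under key, overwriting any existing entry
    void insert(const std::string& key, const std::string& data) {
        std::unique_ptr<Node>& head = buckets_[index(key)];
        for (Node* n = head.get(); n != nullptr; n = n->next.get()) {
            if (n->key == key) { n->data = data; return; }  // key already present
        }
        // Prepend a new node to the bucket's list
        auto node = std::make_unique<Node>();
        node->key = key;
        node->data = data;
        node->next = std::move(head);
        head = std::move(node);
    }

    // Return the data stored under key, or std::nullopt if it is absent
    std::optional<std::string> find(const std::string& key) const {
        for (const Node* n = buckets_[index(key)].get(); n != nullptr; n = n->next.get()) {
            if (n->key == key) return n->data;
        }
        return std::nullopt;
    }

private:
    static constexpr std::size_t kTableSize = 101;  // example bucket count

    // Additive hash in the style of the hash-function slide, reduced modulo the table size
    static std::size_t index(const std::string& key) {
        std::size_t sum = 0;
        for (unsigned char c : key) sum = (sum + c) % kTableSize;
        return sum;
    }

    // The array of pointers to linked lists
    std::array<std::unique_ptr<Node>, kTableSize> buckets_;
};
```

Lookups stay O(1) on average only while the lists stay short, which is why real tables usually grow the number of buckets as keys are added; this sketch keeps it fixed for brevity.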
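Hash Tables, The hash function
• Takes some data as input and returns an integer based on that data.
• Ex:
    #include <cstddef>
    #include <cstring>

    const int tableSize = 101;  // number of buckets (example value)

    int hash(const char* data) {
        int sum = 0;
        std::size_t len = std::strlen(data);  // compute the length once, not on every iteration
        for (std::size_t i = 0; i < len; i++)
            sum = (sum + (unsigned char)data[i]) % tableSize;  // unsigned char keeps sum non-negative
        return sum;
    }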
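Benefits of Hash tables
• Average-case lookup (seek) time of O(1); the worst case is O(n) when many keys land in the same list
• Easy to implement (e.g., in C++)
• Can improve performance dramatically, for example when looking up records stored in files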
Distributed Hash Tables
• Definition: a hash table whose keys, and the data stored under them, are spread across many nodes in a network.
[Figure: keys and fragments of data stored on Node 0 and Node 1]
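Why is DHash important?
• Load balance: keys are spread evenly across the nodes
• Decentralization: no node is more important than any other
• Scalability: lookup cost grows only logarithmically with the number of nodes
• Availability: data remains reachable while nodes join and fail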
IDA algorithm
• IDA (Rabin's Information Dispersal Algorithm) splits a block of s bytes into f fragments, each of size s/k.
• Any k distinct fragments are sufficient to reconstruct the original block.
[Figure: a block split into f fragments]
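A minimal C++ sketch of an IDA-style encoder and decoder, assuming arithmetic in the prime field GF(257) and Vandermonde rows (1, x, x², …, x^(k−1)) with x = j + 1 for fragment j, so that any k rows are linearly independent. The names `ida_encode`, `ida_decode`, `row`, and `Fragment` are hypothetical; fragments hold field elements (0..256) rather than bytes, and the caller must remember the original block size s. The field, matrix, and fragment format used by DHash itself may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

constexpr int P = 257;  // prime field GF(257): every byte value 0..255 is a field element

// a^e mod P by square-and-multiply
int pow_mod(int a, int e) {
    long long r = 1, b = a % P;
    for (; e > 0; e >>= 1) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
    }
    return static_cast<int>(r);
}

// Inverse of a nonzero element via Fermat's little theorem: a^(P-2) = a^-1 mod P
int inv_mod(int a) { return pow_mod(a, P - 2); }

struct Fragment {
    int index;              // j: which matrix row produced this fragment
    std::vector<int> data;  // ceil(s/k) field elements, each in 0..256
};

// Row j of the f x k Vandermonde matrix: (1, x, x^2, ..., x^(k-1)) with x = j + 1
std::vector<int> row(int j, int k) {
    std::vector<int> r(k);
    int v = 1;
    for (int i = 0; i < k; i++) { r[i] = v; v = v * (j + 1) % P; }
    return r;
}

// Split an s-byte block into f fragments of ceil(s/k) elements each
std::vector<Fragment> ida_encode(const std::vector<std::uint8_t>& block, int k, int f) {
    std::size_t segs = (block.size() + k - 1) / k;
    std::vector<Fragment> out;
    for (int j = 0; j < f; j++) {
        std::vector<int> a = row(j, k);
        Fragment fr{j, std::vector<int>(segs)};
        for (std::size_t m = 0; m < segs; m++) {
            long long acc = 0;
            for (int i = 0; i < k; i++) {
                std::size_t pos = m * k + i;
                int b = pos < block.size() ? block[pos] : 0;  // zero-pad the last segment
                acc = (acc + 1LL * a[i] * b) % P;
            }
            fr.data[m] = static_cast<int>(acc);
        }
        out.push_back(std::move(fr));
    }
    return out;
}

// Rebuild the s-byte block from the first k fragments in frags (distinct indices required)
std::vector<std::uint8_t> ida_decode(const std::vector<Fragment>& frags, int k, std::size_t s) {
    if (frags.size() < static_cast<std::size_t>(k)) throw std::invalid_argument("need k fragments");
    // Augmented matrix [A | F]: row r = Vandermonde row of fragment r, then its data
    std::vector<std::vector<long long>> m(k);
    for (int r = 0; r < k; r++) {
        std::vector<int> a = row(frags[r].index, k);
        m[r].assign(a.begin(), a.end());
        m[r].insert(m[r].end(), frags[r].data.begin(), frags[r].data.end());
    }
    // Gauss-Jordan elimination mod P reduces A to the identity, leaving A^-1 F on the right
    for (int c = 0; c < k; c++) {
        int piv = c;
        while (piv < k && m[piv][c] == 0) piv++;
        if (piv == k) throw std::invalid_argument("fragment indices must be distinct");
        std::swap(m[piv], m[c]);
        long long iv = inv_mod(static_cast<int>(m[c][c]));
        for (long long& v : m[c]) v = v * iv % P;
        for (int r = 0; r < k; r++) {
            if (r == c || m[r][c] == 0) continue;
            long long factor = m[r][c];
            for (std::size_t col = 0; col < m[r].size(); col++)
                m[r][col] = ((m[r][col] - factor * m[c][col]) % P + P) % P;
        }
    }
    // Entry (i, seg) of A^-1 F is byte i of segment seg, i.e. block[seg * k + i]
    std::vector<std::uint8_t> block(s);
    for (std::size_t pos = 0; pos < s; pos++)
        block[pos] = static_cast<std::uint8_t>(m[pos % k][k + pos / k]);
    return block;
}
```

Because every set of k distinct rows forms an invertible Vandermonde matrix, any k of the f fragments recover the block, while k − 1 fragments leave the system underdetermined.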
Choosing values for k and f
• k and f are chosen to suit 8192-byte blocks.
• k = 7 creates 1170-byte fragments (8192/7 ≈ 1170), each small enough to fit in a single IP packet together with the RPC overhead.
• With k = 7, each block can be stored as f = 14 fragments and still be reconstructed even if up to 7 of them are lost.
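The numbers on this slide can be checked directly, taking s = 8192 bytes, k = 7, and f = 14:

```latex
\[
\begin{aligned}
\text{fragment size} &= \frac{s}{k} = \frac{8192}{7} \approx 1170.3 \text{ bytes} \\
\text{bytes stored per block} &= f \cdot \frac{s}{k} = 14 \cdot \frac{8192}{7} = 16384 = 2s \\
\text{fragments that may be lost} &= f - k = 14 - 7 = 7
\end{aligned}
\]
```

So this choice doubles the storage cost of a block (f/k = 2) in exchange for tolerating the loss of half its fragments.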
Chord protocol
• Implements a hash-like lookup operation that maps 160-bit data keys to hosts.
• Assigns hosts identifiers from the same 160-bit space as the keys.
• The identifier space can be viewed as a circular linked list sorted by identifier; each key belongs to its successor, the first host whose identifier equals or follows the key.
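A global-view C++ sketch of the key-to-host mapping, assuming a 16-bit identifier space in place of Chord's 160-bit SHA-1 space so the identifiers stay readable. `Ring`, `add_node`, `remove_node`, `successor`, and `host_for` are hypothetical names. Real Chord nodes do not see the whole ring; they locate a key's successor by routing queries through other nodes, which this sketch does not model.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Identifier ring: node identifier -> host address.
// Assumption: a 16-bit identifier space stands in for Chord's 160-bit space.
class Ring {
public:
    void add_node(std::uint16_t id, const std::string& host) { nodes_[id] = host; }
    void remove_node(std::uint16_t id) { nodes_.erase(id); }

    // successor(key): the first node whose identifier is >= key,
    // wrapping around to the smallest identifier past the top of the circle
    std::uint16_t successor(std::uint16_t key) const {
        if (nodes_.empty()) throw std::runtime_error("empty ring");
        auto it = nodes_.lower_bound(key);
        if (it == nodes_.end()) it = nodes_.begin();  // wrap around
        return it->first;
    }

    // The host responsible for storing key
    const std::string& host_for(std::uint16_t key) const {
        return nodes_.at(successor(key));
    }

private:
    std::map<std::uint16_t, std::string> nodes_;  // kept sorted by identifier
};
```

Because keys and hosts share one sorted, circular space, removing a host only moves the keys it held to the next host along the ring; every other key keeps its owner.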
Chord (cont’)
• Each node knows the identity of its successor (IP address, Chord identifier, and synthetic coordinates)
• The successor list is updated when a node
   • Joins
   • Exits
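A C++ sketch of how a node's successor list changes as other nodes join and exit, again assuming a small 16-bit identifier space and a global view of the ring; `successor_list` and the list length `r` are hypothetical. In Chord itself, nodes learn about joins and departures through a periodic stabilization protocol rather than by reading a shared membership set.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: the first r nodes that follow `id` clockwise on the ring.
// Assumption: a 16-bit identifier space stands in for Chord's 160-bit space.
std::vector<std::uint16_t> successor_list(const std::set<std::uint16_t>& ring,
                                          std::uint16_t id, std::size_t r) {
    std::vector<std::uint16_t> out;
    // A node never lists itself, so at most (ring size - 1) entries if id is a member
    std::size_t others = ring.size() - ring.count(id);
    auto it = ring.upper_bound(id);  // first node strictly after id
    for (std::size_t n = 0; n < std::min(r, others); n++) {
        if (it == ring.end()) it = ring.begin();  // wrap around the circle
        out.push_back(*it);
        ++it;
    }
    return out;
}
```

A join inserts the new node into the lists of the nodes just before it, and an exit removes the departed node and pulls the next one into those lists, which is why a node that keeps several successors can survive the failure of its immediate successor.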
Chord API
HTab API
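Block Insert: put(Key k, Block b)
    // Place one fragment on each of the first 14 successors of k
    void put(Key k, Block b) {
        std::vector<Fragment> frags = IDAencode(b);   // f = 14 fragments
        std::vector<Node> succs = lookup(k, 14);      // the 14 successors of k
        for (int i = 0; i < 14; i++)
            send(succs[i].ipaddr, k, frags[i]);
    }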
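Block get(k)
    // Fetch the block stored under key k
    Block get(Key k) {
        // collect fragments from the successors
        std::vector<Fragment> frags;
        std::vector<Node> succs = lookup(k, 7);        // look up at least 7 successors
        sort_by_latency(succs);                        // try the closest nodes first
        for (std::size_t i = 0; i < succs.size() && i < 14; i++) {
            // download a fragment
            auto [ret, data] = download(k, succs[i]);
            if (ret == OK)
                frags.push_back(data);
            // try to decode the fragments collected so far
            auto [dret, block] = IDAdecode(frags);
            if (dret == OK)
                return (SHA1(block) != k) ? FAILURE : block;  // reject a block that fails the integrity check
            // out of candidates: append the last node's successor list
            if (i == succs.size() - 1) {
                std::vector<Node> newsuccs = get_successor_list(succs[i]);
                sort_by_latency(newsuccs);
                succs.insert(succs.end(), newsuccs.begin(), newsuccs.end());
            }
        }
        return FAILURE;
    }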
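A self-contained C++ sketch of just the fragment-gathering part of get(k): successors are tried in order of increasing latency, unreachable ones are skipped, and the loop stops once k distinct fragments are in hand. `Successor` and `collect_fragments` are hypothetical; IDA decoding, the SHA-1 check, and extending the successor list are deliberately left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of one successor holding one fragment of a block
struct Successor {
    int fragment_index;  // which of the f fragments this node stores
    double latency_ms;   // estimated round-trip time to the node
    bool up;             // whether the node answers the download
};

// Contact successors from lowest to highest latency and stop as soon as
// k distinct fragments are collected. Returns the fragment indices gathered,
// or std::nullopt if fewer than k fragments are reachable.
// Only fragment counting is modeled: no decoding and no SHA-1 check.
std::optional<std::vector<int>> collect_fragments(std::vector<Successor> succs, std::size_t k) {
    std::sort(succs.begin(), succs.end(),
              [](const Successor& a, const Successor& b) { return a.latency_ms < b.latency_ms; });
    std::vector<int> got;
    for (const Successor& s : succs) {
        if (!s.up) continue;  // download failed: try the next node
        if (std::find(got.begin(), got.end(), s.fragment_index) == got.end())
            got.push_back(s.fragment_index);  // keep only distinct fragments
        if (got.size() == k) return got;  // enough to reconstruct the block
    }
    return std::nullopt;
}
```

With f = 14 and k = 7, up to 7 of the 14 successors can be down before collect_fragments gives up, which matches the margin described on the slide about choosing k and f.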
Questions?