DHT* Applications
Jeffrey Pang
CMU NetTalk, Dec. 5, 2003
* and DOLR

Brief Review of DHTs
• Many DHTs:
  • PRR Trees, Pastry, Tapestry
  • Chord, Symphony
  • CAN
  • SkipNet, Kademlia, Koorde, Viceroy, etc.
• Good Properties:
  • Distributed construction/maintenance
  • Load-balanced with uniform identifiers
  • O(log n) hops / neighbors per node
  • Provides underlying network proximity
Jeffrey Pang, Carnegie Mellon, NetTalk

Brief Review of DHTs
[Figure: example DHT routing structures: a zone partition with a source and a key; hex-digit identifiers; an identifier circle with pred(x) = 010110000, x = 010110110, succ(x) = 010111110]

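The identifier-circle picture can be made concrete with a small sketch. This is only an illustration of "first node clockwise from a key", using the three ids shown on the slide; the function name `successor` and the sorted-list representation are assumptions, not part of any particular DHT.

```python
from bisect import bisect_left

def successor(node_ids, key):
    """Return the first node id at or clockwise after key on the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key)
    return ring[i % len(ring)]  # wrap around past the largest id

# The three ids from the slide: pred(x), x, succ(x)
nodes = [0b010110000, 0b010110110, 0b010111110]
```

A real DHT finds this node in O(log n) overlay hops instead of consulting a global sorted list.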
Overview of Talk
• Review of DHTs
• DHT vs DOLR
• Storage
• Multicast
• Database
• Misc.
• API and Infrastructure Proposals

DHT vs. DOLR
• Distributed Hash Table Paradigm:
  • Location of objects determined by overlay
  • put(key, object)
  • get(key) returns object
• Distributed Object Location and Routing Paradigm:
  • Location of objects determined by application
  • Application publishes pointers in overlay
  • publish(key, id)
  • locate(key)

DHT Paradigm
[Figure: obj stored in the overlay under key; operations put(key, object) and get(key)]

DOLR Paradigm
[Figure: back pointers stored in the overlay under key; operations publish(key, id) and locate(key)]
Of course, many apps use a little bit of both paradigms...

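The difference between the two paradigms can be sketched in a few lines. This is a single-process toy, not a real overlay: the class names `ToyDHT` and `ToyDOLR` and the "numerically closest node" placement rule are assumptions made for illustration.

```python
class ToyDHT:
    """DHT paradigm: the overlay picks the node that stores the object."""
    def __init__(self, node_ids):
        self.stores = {n: {} for n in node_ids}

    def _root(self, key):
        # Hypothetical placement rule: node id numerically closest to key.
        return min(self.stores, key=lambda n: abs(n - key))

    def put(self, key, obj):
        self.stores[self._root(key)][key] = obj

    def get(self, key):
        return self.stores[self._root(key)].get(key)


class ToyDOLR:
    """DOLR paradigm: the app keeps the object; the overlay keeps pointers."""
    def __init__(self):
        self.pointers = {}  # key -> ids of nodes that hold a copy

    def publish(self, key, node_id):
        self.pointers.setdefault(key, set()).add(node_id)

    def locate(self, key):
        return self.pointers.get(key, set())
```

With `ToyDHT`, the caller never learns where the object lives; with `ToyDOLR`, `locate` returns the node ids and the caller contacts one of them itself.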
Storage Systems
• Mnemosyne [Hand & Roscoe, IPTPS02]
  • steganographic storage
• PAST [Rowstron & Druschel, SOSP01]
  • file-based storage substrate
• CFS [Dabek, et al., SOSP01]
  • single writer cooperative storage
• Ivy [Muthitacharoen, et al., OSDI02]
  • small group read/write storage
• OceanStore [Kubiatowicz, et al., ASPLOS00, FAST03]
  • global-scale persistent storage

Mnemosyne
• Target:
  • Data that requires privacy and plausible deniability
• Uses:
  • Tapestry as DHT
• Basic idea:
  • Compute n hashes for a block: h_0, h_1 = H(h_0), ..., h_{n-1} = H(h_{n-2})
  • Store the (encrypted) block at the addresses h_0, ..., h_{n-1} (mod X, where X is the size of the store)
  • Given h_0 and the key, try to look up and decrypt each replica in turn (success if it passes a validity check)
  • In a p2p overlay, use part of the hash value as a node address, the other part as the block address on that node
• Importance:
  • Simple. Only uses the basic get/put operators.
  • ... but requires end nodes to obey block addresses

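The hash chain on this slide is easy to sketch. The choice of SHA-256 for H and the function name `block_addresses` are assumptions for illustration; the slide does not say which hash Mnemosyne uses.

```python
import hashlib

def block_addresses(h0, n, store_size):
    """Addresses h_0, H(h_0), H(H(h_0)), ... each reduced mod store_size."""
    addrs, h = [], h0
    for _ in range(n):
        addrs.append(int.from_bytes(h, "big") % store_size)
        h = hashlib.sha256(h).digest()  # assumed stand-in for H
    return addrs

# Derive h_0 from a secret name, then compute three replica addresses.
h0 = hashlib.sha256(b"secret block name").digest()
addrs = block_addresses(h0, n=3, store_size=1 << 20)
```

A reader who knows h_0 can regenerate the same address list and try each replica in turn.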
PAST
• Target:
  • Wide area heterogeneous storage (e.g., web)
• Uses:
  • Pastry as DHT
• Basic Idea:
  • Store a file at h = H(file); look it up with h
  • Replicate file at leaf-set of root (l nearest nodes in id-space)
  • Cache file along lookup paths
  • Deal with heterogeneity using virtual nodes and replica diversion
• Importance:
  • Graceful degradation under high utilization

PAST Space Management
[Figure: PAST space management]

PAST Caching
[Figure: PAST caching]

CFS
• Target:
  • Single writer, multiple readers (e.g., FTP)
• Uses:
  • Chord as DHT
• Basic Idea:
  • FS implemented on top of DHash layer
  • DHash replication, caching, load balancing same as PAST
  • Secure updates and deletion using signed root block and cryptographic hashes to identify directory and file blocks
  • Pre-fetch blocks of the same file/directory
• Importance:
  • “Real-life” evaluation comparable to FTP

CFS File System Structure
[Figure: root block (public key, signature) holds H(D); directory block D holds H(F); file block F holds H(B1) and H(B2); data blocks B1, B2]

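The hash-linked structure above can be sketched with a dictionary standing in for the block store. This is a minimal illustration, assuming SHA-1 content hashes and made-up block encodings; the signed root block is left out, and `put_block` / `get_block` are hypothetical helper names.

```python
import hashlib

def put_block(store, data):
    """Store a block under the hash of its content and return that key."""
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key

def get_block(store, key):
    """Fetch a block and verify that it still matches its content hash."""
    data = store[key]
    if hashlib.sha1(data).hexdigest() != key:
        raise ValueError("block does not match its content hash")
    return data

store = {}
b1 = put_block(store, b"first data block")
b2 = put_block(store, b"second data block")
file_block = put_block(store, f"{b1}\n{b2}".encode())             # F lists H(B1), H(B2)
dir_block = put_block(store, f"paper.txt {file_block}".encode())  # D lists H(F)
```

Because every pointer is a content hash, a reader who trusts the root can detect a tampered block anywhere below it.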
CFS “Real-Life” Evaluation
[Figure: CFS vs. pair-wise TCP]

Ivy
• Target:
  • Read/write storage for small groups (e.g., CVS)
• Uses:
  • Chord as DHT
• Basic Idea:
  • Implemented on top of DHash layer (identical to CFS)
  • Each FS has a view consisting of n logs, one per writer
  • Write operations go to personal log
  • Reads reconstruct data by reading all logs in view; occasionally snapshot FS to prevent long traversals
  • Consistency using version vectors (application resolvers for concurrent versions; e.g., created during partition)
• Importance:
  • Another “real-life” evaluation, but disappointing
  • Practical model for read/write in a p2p environment

Ivy Log Structure
[Figure: a view pointing to the log heads of Alice and Bob; log records include write, create, delete, link, ex-create]

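Reading through a view of per-writer logs can be sketched as a replay. This is a simplification: records are ordered by a single integer timestamp `ts`, which stands in for Ivy's version vectors, and the record fields and `read_file` name are assumptions for illustration.

```python
def read_file(view, path):
    """Replay every log in the view to rebuild one file's contents."""
    records = sorted(
        (rec for log in view.values() for rec in log),
        key=lambda rec: rec["ts"],  # assumed total order, not version vectors
    )
    data = None
    for rec in records:
        if rec["path"] != path:
            continue
        if rec["op"] == "write":
            data = rec["data"]
        elif rec["op"] == "delete":
            data = None
    return data

# Each writer appends only to its own log.
view = {
    "alice": [{"ts": 1, "op": "write", "path": "/a", "data": "v1"},
              {"ts": 4, "op": "write", "path": "/a", "data": "v3"}],
    "bob":   [{"ts": 2, "op": "write", "path": "/a", "data": "v2"},
              {"ts": 3, "op": "delete", "path": "/b"}],
}
```

The cost of replaying every record on each read is what the occasional snapshots are meant to bound.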
Ivy Wide Area Performance
[Figures: Modified Andrew Benchmark; CVS]

OceanStore
• Target:
  • Global storage as a “utility”
• Uses:
  • Tapestry as DOLR
• Basic Idea:
  • Use Tapestry for (all) object and service location
  • Writes go to an Inner-Ring, serialized using Byzantine Agreement
  • Writes create new versions of blocks, which are permanently dispersed into archive using erasure codes
  • Reads go to closest replica in a dissemination tree rooted at Inner-Ring
• Importance:
  • Wide area Byzantine commit
  • Performance of strong crypto in critical path
  • Caching in a DOLR (only participating nodes involved)

OceanStore Update Path
[Figure: OceanStore update path]

OceanStore Object Model
[Figure: OceanStore object model]

OceanStore Inner Ring Perf.
[Figure: OceanStore inner ring performance]

OceanStore Read Perf.
[Figures: streaming reads from replicas; archive read]

Multicast Applications
• Bayeux [Zhuang, et al., NOSSDAV01]
  • Simple single tree per source on DOLR
• Scribe [Rowstron, et al., NGC01, INFOCOM03]
  • Simple single tree per source on DHT
• SplitStream [Castro, et al., SOSP03]
  • Multiple disjoint trees per source
• i3 [Stoica, et al., SIGCOMM02]
  • Internet Indirection Infrastructure (mobility, {multi,any}cast, service composition)

Bayeux
• Target:
  • Multimedia Streaming
• Uses:
  • Tapestry as DOLR
• Basic Idea:
  • Advertise session with fake file in Tapestry
  • Clients join by routing a message to the source id (after learning of it by looking up the session)
  • All intermediate routers on path join tree
  • Support multiple roots by having multiple sources advertise a session (lookups converge to “closest”)
  • Take advantage of routing redundancy to provide best performance (shortest link) / tolerate faults (predict link reliability)
• Importance:
  • Relatively simple (no “frills”) multicast on a DOLR

Scribe
• Target:
  • Event notification / pubsub systems (e.g., IM)
• Uses:
  • Pastry or CAN as DHTs
• Basic Idea:
  • Publications routed to root in Pastry
  • Recursively forwarded to all children in tree
  • Subscriptions cause all nodes on path to root to join tree
  • When your parent dies, repair by routing to a new parent
  • More complex ways to load balance (e.g., make children into grandchildren) described in later JSAC article
• Importance:
  • Another simple multicast on a DHT
  • Building block for more complex applications

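The join and forward steps can be sketched with a plain child table. This toy assumes the overlay route from subscriber to root is already known and passed in as a list; `join`, `publish`, and the `children` dictionary are illustrative names, not Scribe's actual interface.

```python
def join(children, path_to_root):
    """Add a subscriber; path_to_root lists nodes from subscriber to root."""
    for child, parent in zip(path_to_root, path_to_root[1:]):
        kids = children.setdefault(parent, set())
        if child in kids:
            break  # the rest of the path is already part of the tree
        kids.add(child)

def publish(children, root, deliver):
    """Forward a publication from the root down to every tree member."""
    stack = [root]
    while stack:
        node = stack.pop()
        deliver(node)
        stack.extend(children.get(node, ()))

children = {}
join(children, ["a", "b", "root"])
join(children, ["c", "b", "root"])  # stops early: b is already in the tree
```

Stopping at the first node already in the tree is what keeps join traffic away from the root as the group grows.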
SplitStream
• Target:
  • P2P streaming / bulk file transfer
• Uses:
  • Pastry
• Basic Idea:
  • Split content into k stripes
  • Construct k interior-node disjoint Scribe trees
  • Distribute one stripe per tree
  • Receivers choose number of stripes to receive (e.g., trade off quality for inbound capacity)
  • Limit out-degree of nodes with join-heuristics (later)
• Importance:
  • All nodes share in forwarding of data (w.h.p.)
  • Nifty use of Pastry ids to construct forest (next slide)

SplitStream Forest Construction
• Notice that all interior nodes of a tree must have the same first digit in their node id
  • Pastry routing: first hop will match first digit
• Source sends stripes to k different trees
  • Root trees at nodes with different first digits
  • If each digit is b bits, make k = 2^b stripes
• Each node will be an interior node of at most one tree (the tree that matches its first digit)

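The first-digit rule can be sketched directly. The 16-bit id width, b = 4, and the function names are assumptions chosen for the example.

```python
def first_digit(node_id, id_bits, b):
    """Most significant base-2**b digit of an id that is id_bits wide."""
    return node_id >> (id_bits - b)

def interior_stripe(node_id, id_bits=16, b=4):
    """Index (0 .. 2**b - 1) of the one stripe tree this node may forward in."""
    return first_digit(node_id, id_bits, b)

k = 2 ** 4  # number of stripes when each digit is b = 4 bits
```

For example, a node with id 0xA123 can be an interior node only in the tree for stripe 0xA and is a leaf in the other k - 1 trees.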
SplitStream Limiting Out-Degree
• If too many children, kick one out
• First, orphaned child tries “push-down”
  • Can I join a sibling?
  • And continue recursively on sibling’s children
• Second, use the spare capacity group
  • Independent Scribe multicast tree
  • Composed of nodes that have spare capacity
  • Orphan anycasts message to this group
  • Receiver of anycast starts DFS of spare capacity tree until it finds a node that has the desired stripe
  • Orphan joins that node
  • If in-degree = k, this never fails*

SplitStream Overhead
[Figure: Forest Construction Control Message Overhead under High Churn]

i3 - Internet Indirection Infrastructure
• Target:
  • Rendezvous-based communication (IP indirection)
• Uses:
  • Chord
• Basic Idea:
  • Receivers insert triggers (id, receiver_id) into DHT
  • Senders send to id, meet at triggers, which send to receivers
  • Supports:
    • Mobility: reinsert your trigger when you move
    • Multicast & anycast: use longest-prefix match on ids to build tree
    • Service composition: use stacks of triggers, which act like source routing in IP
• Importance:
  • Very low level, best-effort service built on DHT

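The trigger mechanism can be sketched as a rendezvous table. This toy uses exact-match ids only; longest-prefix matching and trigger stacks are left out, and the `ToyI3` class and its method names are assumptions for illustration.

```python
class ToyI3:
    """Single-process rendezvous table: senders and receivers meet at an id."""
    def __init__(self):
        self.triggers = {}  # id -> set of receiver addresses
        self.inboxes = {}   # receiver address -> delivered packets

    def insert_trigger(self, ident, receiver):
        self.triggers.setdefault(ident, set()).add(receiver)

    def remove_trigger(self, ident, receiver):
        self.triggers.get(ident, set()).discard(receiver)

    def send(self, ident, data):
        # The sender names only the id; the triggers decide who receives.
        for receiver in self.triggers.get(ident, ()):
            self.inboxes.setdefault(receiver, []).append(data)
```

Mobility is a remove followed by an insert with the new address; multicast falls out when several receivers insert triggers with the same id.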
PIER [Huebsch, et al., VLDB03]
• Target:
  • in situ distributed querying (e.g., network monitoring)
• Uses:
  • CAN as DHT
• Basic Idea:
  • Tables named by (namespace, resourceID); e.g., (application, primary_key)
  • Store tables in DHT keyed by this pair
  • Look up tuples by routing to a table's key(s) and having the end nodes do an lscan for you
  • Join N_R and N_S by creating a new namespace N_Q in the DHT and rehashing tuples to N_Q, which determines matches
• Importance:
  • Another simple multicast on a DHT
  • Building block for more complex applications

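The rehash-based join can be sketched in one process. Here a list of per-node dictionaries stands in for the temporary namespace N_Q, and Python's built-in `hash` stands in for the DHT's key hashing; `rehash_join` and the tuple layout are assumptions, not PIER's actual interface.

```python
from collections import defaultdict

def rehash_join(r_tuples, s_tuples, r_key, s_key, num_nodes):
    """Equi-join R and S by rehashing both on the join attribute."""
    # N_Q: one bucket table per node, join value -> (R tuples, S tuples)
    nq = [defaultdict(lambda: ([], [])) for _ in range(num_nodes)]
    for t in r_tuples:
        nq[hash(t[r_key]) % num_nodes][t[r_key]][0].append(t)
    for t in s_tuples:
        nq[hash(t[s_key]) % num_nodes][t[s_key]][1].append(t)
    results = []
    for node in nq:  # each node finds matches among its own tuples only
        for rs, ss in node.values():
            results.extend((r, s) for r in rs for s in ss)
    return results

R = [{"id": 1, "host": "a"}, {"id": 2, "host": "b"}]
S = [{"rid": 1, "load": 0.5}, {"rid": 3, "load": 0.9}]
joined = rehash_join(R, S, "id", "rid", num_nodes=4)
```

Tuples with equal join values always land on the same node, so no node needs to see the whole of either table.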
PIER Performance
[Figure: PIER performance]

Misc. Applications
• POST [Mislove, et al., HotOS03]
  • Collaborative Applications
• Approximate Object Location [Zhou, et al., Middleware03]
  • Collaborative Spam Filtering

POST
• Target:
  • Toolbox for collaborative apps (e.g., email, IM, etc.)
• Uses:
  • Pastry as DHT
• Basic Idea:
  • Use PAST as storage substrate
  • Use Scribe as notification system
  • Assume certificate authority for assigning user IDs, keys
• Example: Email
  • Insert new mail into PAST (encrypted)
  • Notify recipient using Scribe (delegate if not online)
• Importance:
  • Use second level systems as substrate for more complex applications (see also OceanStore: email, nfs, web cache)

Approximate Object Location
• Target:
  • Collaborative filtering (e.g., spam detection)
• Uses:
  • Tapestry as DOLR
• Basic Idea:
  • Calculate checksums of all strings of length L in message. Select N of them deterministically (“feature” vector)
  • Two messages match if enough features match
  • To mark spam, insert my node into Tapestry keyed by each feature
  • To detect spam, look up its features. Will get back a set of nodes that marked each feature as spam (“votes”).
• Importance:
  • Scary, but looking more and more useful, e.g., recent DoS attacks on RBLs
  • They have a plug-in for Outlook that works

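The feature-vector idea can be sketched as follows. CRC32 as the checksum, "keep the N smallest" as the deterministic selection rule, and the default values of L, N, and the match threshold are all assumptions for illustration; the slide does not specify them.

```python
import zlib

def features(message, L, N):
    """Checksum every length-L substring; keep the N smallest checksums."""
    sums = {zlib.crc32(message[i:i + L]) for i in range(len(message) - L + 1)}
    return set(sorted(sums)[:N])

def matches(a, b, L=8, N=10, threshold=7):
    """Two messages match if enough of their selected features coincide."""
    return len(features(a, L, N) & features(b, L, N)) >= threshold

spam = b"buy cheap pills now at example dot com"
```

Each feature would then serve as a key: marking spam publishes the marker's node under every feature, and detection locates each feature and counts the votes returned.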
API & Infrastructure Proposals
• One Ring to Rule them All [Castro, et al., SIGOPS02]
  • Bootstrapping multiple overlays
• Common P2P API [Dabek, et al., IPTPS03]
  • DHT/DOLR as a library
• OpenHash [Karp, et al., IPTPS04*]
  • DHT as a service
*submitted

One Ring to Rule them All
• Goal:
  • Bootstrap multiple overlays
• Basic Idea:
  • Everyone joins a “universal” Pastry ring
  • This ring implements PAST, Scribe, and distributed search (see Harren, et al., IPTPS02)
  • Advertise your overlay service in the search engine
  • Store your code and certificates in PAST
  • Upgrades disseminated through Scribe
• Importance:
  • How one might use an overlay to manage overlays
  • Interesting title for a Microsoft paper :)

Common P2P API
• Goal:
  • Common API for structured overlays
• Basic Idea:
  • First, described a common layer that both DHT and DOLR could be implemented on
  • Second, looked at applications developed so far
    • See what abstractions can be derived
    • Described what DHT “library” functions might be
• Importance:
  • How much has to be exposed to application developers?
  • Any DHT App can be implemented on any DHT

Common P2P API Classification
[Figure: Common P2P API classification]

Common P2P API: API
• void route(key, msg, nodeHint)
• void forward(key, msg, nextHop)
• void deliver(key, msg)
• node[] localLookup(key, num, safeFlag)
• node[] neighborSet(num)
• node[] replicaSet(key, maxRank)
• void update(node, joinedFlag)
• bool range(node, rank, keyRange)

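To show how an application might sit on top of these calls, here is a toy put/get service written only against route() and a deliver() upcall. It is a single-process sketch: `ToyOverlay`, `DHTApp`, the message tuples, and the "numerically closest node" rule are assumptions, and the nodeHint argument and the other calls are left out.

```python
class ToyOverlay:
    """Single-process stand-in for a key-based routing layer."""
    def __init__(self, node_ids):
        self.apps = {n: None for n in node_ids}  # node id -> application

    def route(self, key, msg):
        # Hypothetical rule: deliver at the node id closest to the key.
        root = min(self.apps, key=lambda n: abs(n - key))
        self.apps[root].deliver(key, msg)


class DHTApp:
    """A put/get application that only sees deliver() upcalls."""
    def __init__(self):
        self.store = {}

    def deliver(self, key, msg):
        op, payload = msg
        if op == "put":
            self.store[key] = payload
        elif op == "get":
            payload.append(self.store.get(key))  # payload is a reply list


overlay = ToyOverlay([10, 50, 90])
for node_id in overlay.apps:
    overlay.apps[node_id] = DHTApp()
overlay.route(48, ("put", "x"))
reply = []
overlay.route(48, ("get", reply))
```

The application never names a node; swapping in a different overlay behind route() would leave `DHTApp` unchanged.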
OpenHash
• Goal:
  • DHT as a single service multiple apps can use
• Basic Idea:
  • Some simple apps only require get/put. Support these “out of the box”
  • App operations can be classified as “endpoint” operators (at root/successor) or “hop-by-hop” operators (on path to root)
  • Support endpoint operators
    • App specific code lives on nodes “outside” the main DHT
    • Route app specific requests only to nodes that have the app’s code
  • Argue that there is no need to support hop-by-hop operators
    • Most functionality can be achieved another way
• Importance:
  • How one might deploy DHT as an active service
  • Allow people other than academics to deploy these apps?

OpenHash ReDir Algorithm
[Figure: rendezvous points for X; find successor for k]

Conclusion
• DHT Apps not going away
  • Are they still struggling to find a purpose?
  • Would any of these apps be better off not on top of a DHT?
• Using basic apps to build more complex ones:
  • CFS, Ivy build on DHash
  • POST, OneRing build on PAST, Scribe
  • SplitStream builds on Scribe
• Starting to notice that no one besides researchers is using DHTs
  • 3+ years of research...
  • How to make them useful to real people?