“Tuple Space” Scalability: Use a DHT! Antony Rowstron Microsoft Research Cambridge
Linda-like languages: Looking back to the early days • Originally proposed for parallel processing • Shared memory versus message passing • Simple: in, out, rd, (inp, rdp) • Complex compile-time analysis • Closed systems • Translate “shared memory” to “message passing” • Challenge: performance better than message passing • Limited success
Linda: a paradigm for open systems: “The second wave” • Exploit temporal and spatial separation between communicating processes • Many different extensions proposed • New primitives • Multiple tuple spaces • Access control • Open systems • New run-time systems required • Scale: • Networks of workstations through to the Internet
Linda runtimes: An overview • [Figure: out(<10, “hello”>) and in(<?int, “hello”>) meet through the Linda runtime, which holds the tuple <10, “hello”>]
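To make the out/in pairing in the overview concrete, here is a minimal single-process tuple space. It is a sketch, not the runtime discussed in the talk: the class name `TupleSpace`, the method name `in_` (because `in` is reserved in Python), and the convention that a Python type such as `int` plays the role of the formal `?int` are all assumptions of this example.

```python
import threading


class TupleSpace:
    """Single-node tuple space sketch (hypothetical, not a real Linda runtime).

    Templates use Python types as formals: int stands in for ?int.
    """

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        if len(template) != len(tup):
            return False
        for field, value in zip(template, tup):
            if isinstance(field, type):
                # Formal: matches any value of exactly that type
                if type(value) is not field:
                    return False
            elif type(field) is not type(value) or field != value:
                # Actual: type and value must both agree
                return False
        return True

    def _find(self, template, remove):
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i) if remove else tup
        return None

    def out(self, *tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake callers blocked in in_/rd

    def inp(self, *template):
        """Non-blocking in: remove and return a match, or None."""
        with self._cond:
            return self._find(template, remove=True)

    def rdp(self, *template):
        """Non-blocking rd: return a match without removing it, or None."""
        with self._cond:
            return self._find(template, remove=False)

    def in_(self, *template):  # 'in' is a Python keyword
        """Blocking in: wait until a match exists, then remove it."""
        with self._cond:
            while (t := self._find(template, remove=True)) is None:
                self._cond.wait()
            return t

    def rd(self, *template):
        """Blocking rd: wait until a match exists, then return it."""
        with self._cond:
            while (t := self._find(template, remove=False)) is None:
                self._cond.wait()
            return t


ts = TupleSpace()
ts.out(10, "hello")            # out(<10, "hello">)
print(ts.rd(int, "hello"))     # (10, 'hello'), tuple stays
print(ts.in_(int, "hello"))    # (10, 'hello'), tuple removed
print(ts.inp(int, "hello"))    # None
```

Matching on exact types keeps `True` from matching `?int`, since Python's `bool` is a subclass of `int`.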
Linda runtimes I • [Figure: in(<?int, “hello”>) and out(<10, “hello”>), with the tuple <10, “hello”> shown at several places in the runtime]
Linda runtimes II • [Figure: the template <?int, “hello”> and the tuple <10, “hello”> both map to H(<int, string>), the hash of their shared type signature <int, string>]
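The figure's key idea is that a tuple and a template meet on the same node because both hash to H(<int, string>). A minimal sketch, assuming SHA-1 as the hash and a hypothetical runtime of 16 nodes (neither comes from the talk):

```python
import hashlib

NUM_NODES = 16  # hypothetical runtime size


def signature(fields):
    """Type signature of a tuple or template; formals and actuals map alike."""
    return tuple(f.__name__ if isinstance(f, type) else type(f).__name__
                 for f in fields)


def node_for(fields):
    """Hash the type signature, e.g. ('int', 'str'), to pick a storage node."""
    digest = hashlib.sha1(repr(signature(fields)).encode()).digest()
    return int.from_bytes(digest, "big") % NUM_NODES


tuple_node = node_for((10, "hello"))       # out(<10, "hello">)
template_node = node_for((int, "hello"))   # in(<?int, "hello">)
print(tuple_node == template_node)         # True: both hash H(<int, string>)
```

Only the types are hashed because a template's formals carry no value, so the value fields cannot contribute to the key.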
The main challenge: Hashing • [Figure: the tuple <10, “hello”> is placed using H(<int, string>)]
The challenge: The hashing issue • Distributing the load needs a good hash function • Uniform distribution • But, Linda: • Tuples and templates • Open systems: hashing resorts to field types only • Small set of input symbols for the hash function • <?int>, <?bool>, <?float>, <?string>, etc. • 1-element templates map to ~10 unique keys • 2-element templates map to ~100 unique keys • Outcome: Difficult to implement scalable runtimes
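The ~10 and ~100 figures follow from counting signatures: with about ten field types, an n-element template has at most 10^n possible type signatures, however many tuples exist. A short sketch, where the list of ten type names is a hypothetical example:

```python
from itertools import product

# Hypothetical set of ~10 field types an open runtime might hash on
FIELD_TYPES = ["int", "bool", "float", "string", "char",
               "long", "short", "double", "byte", "date"]


def distinct_signatures(arity):
    """Every possible type signature for tuples/templates of this arity."""
    return set(product(FIELD_TYPES, repeat=arity))


for arity in (1, 2, 3):
    print(arity, len(distinct_signatures(arity)))
# 1 10
# 2 100
# 3 1000
```

This is an upper bound: in practice most tuples of an application share a handful of signatures, so the usable key set is smaller still.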
Get rid of the runtime's hash function • Move the hash function into the application • E.g. a Distributed Hash Table (DHT) • Simple API: • Put(key, value) • Get(key) • Looks very familiar (out ≈ Put, in ≈ Get) • Outcome: Possible to implement scalable runtimes
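A minimal sketch of why moving the hash into the application helps: 10,000 made-up <string, string> records placed by type signature all land on one node, while hashing an application-chosen key (the host name) spreads them across every node. It assumes 16 nodes and SHA-1 with modulo placement, a stand-in for a real DHT's closest-id rule.

```python
import hashlib
from collections import Counter

NUM_NODES = 16  # hypothetical number of nodes


def node_of(key: str) -> int:
    """Map a key string to a node: SHA-1, then modulo the node count."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % NUM_NODES


# 10,000 hypothetical <string, string> tuples: (host name, IP address)
records = [(f"host{i}.example.com", f"10.0.{i // 256}.{i % 256}")
           for i in range(10_000)]

# Runtime hashes the type signature: every record is <string,string>
by_type = Counter(node_of("<string,string>") for _ in records)

# Application supplies the key: hash the host name itself
by_app_key = Counter(node_of(name) for name, _ in records)

print(len(by_type), len(by_app_key))  # 1 16
```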
DHTs: Peeking under the covers • Large id space • NodeIds picked randomly from the space • Keys picked randomly from the space • Each key is managed by its root node: • the live node whose id is closest to the key • [Figure: keys and nodeIds placed in the id space, with the root node for a key]
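The root-node rule can be sketched in a few lines. This assumes a 32-bit circular id space (real DHTs use larger spaces, such as 128 or 160 bits), keys derived from a truncated SHA-1, and eight random nodeIds; all of these are illustrative choices, not details from the talk.

```python
import hashlib
import random

ID_BITS = 32              # hypothetical id-space size
ID_SPACE = 2 ** ID_BITS


def key_id(name: str) -> int:
    """Place a key in the id space by hashing it (first 4 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


def ring_distance(a: int, b: int) -> int:
    """Distance between two ids on the circular id space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def root_node(key: int, live_nodes: list[int]) -> int:
    """Root node for a key: the live node whose id is closest to the key."""
    return min(live_nodes, key=lambda n: (ring_distance(n, key), n))


rng = random.Random(42)
nodes = [rng.randrange(ID_SPACE) for _ in range(8)]  # nodeIds picked randomly
k = key_id("msrc401.europe.microsoft.com")
root = root_node(k, nodes)

# If the root fails, the next-closest live node becomes the key's root
nodes.remove(root)
new_root = root_node(k, nodes)
```

Because "closest live node" is defined over whoever is currently alive, responsibility for a key moves automatically when nodes join or leave.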
Node routing state • [Figure: routing state of node 203231: its leaf set and routing table] • Topology-aware routing table • NodeIds and keys written in some base 2^b (e.g., 4) • Prefix constraints on nodeIds for each slot • Pick the closest node (in the network) satisfying the slot's constraints
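A sketch of how the slot constraints fill a routing table for node 203231, assuming base 4 (b = 2), 6-digit ids, and a hypothetical latency table standing in for network proximity. The row is the length of the prefix shared with the owner; the column is the candidate's next digit. The leaf set is not modelled.

```python
B = 2                # bits per digit, so ids are written in base 2**B = 4
BASE = 2 ** B
ID_DIGITS = 6        # hypothetical id length, matching ids like 203231


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two equal-length ids have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


def build_routing_table(own_id, proximity):
    """Slot [row][col] holds the closest candidate sharing `row` leading
    digits with own_id and having digit `col` at position `row`."""
    table = [[None] * BASE for _ in range(ID_DIGITS)]
    for node in proximity:
        if node == own_id:
            continue
        row = shared_prefix_len(own_id, node)
        col = int(node[row])   # the first digit where node differs
        current = table[row][col]
        if current is None or proximity[node] < proximity[current]:
            table[row][col] = node
    return table


# Hypothetical latencies (ms) from node 203231 to nodes it knows about
latency = {"313221": 30, "322021": 10, "210000": 5, "201111": 20, "203300": 7}
table = build_routing_table("203231", latency)
print(table[0][3])   # 322021: beats 313221 for slot "3..." on latency
```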
Routing • Prefix matching: each hop resolves an extra key digit • [Figure: route(m, 323310) issued at node 203231 passes through 313221, 322021 and 323211, each sharing one more leading digit with the key 323310]
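The route in the figure can be reproduced with greedy prefix routing. This is a simplification under stated assumptions: every hop sees all node ids, the smallest qualifying id stands in for the network-closest one, and the final step picks the numerically closest node (non-circular) in place of a real leaf set.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two equal-length ids have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


def route(source: str, key: str, nodes: list[str]) -> list[str]:
    """Greedy prefix routing over base-4 ids (hypothetical simplification)."""
    path = [source]
    current = source
    while True:
        p = shared_prefix_len(current, key)
        if p == len(key):
            return path                  # current node's id equals the key
        # Routing-table slot: nodes whose ids start with one more key digit
        candidates = sorted(n for n in nodes if n.startswith(key[:p + 1]))
        if not candidates:
            break
        current = candidates[0]          # stand-in for "closest in the slot"
        path.append(current)
    # Leaf-set stand-in: finish at the numerically closest node (base 4)
    root = min(nodes, key=lambda n: abs(int(n, 4) - int(key, 4)))
    if root != current:
        path.append(root)
    return path


nodes = ["203231", "313221", "322021", "323211"]
print(route("203231", "323310", nodes))
# ['203231', '313221', '322021', '323211']
```

Each hop increases the shared prefix by at least one digit, so a 6-digit id space bounds the prefix-matching phase at 6 hops.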
Example: DNS service • Linda: • Add DNS entry: • out(“msrc401.europe.microsoft.com”, 157.58.16.56) • Look up DNS entry: • rd(“msrc401.europe.microsoft.com”, ?IPAddress) • DHTs: • Add DNS entry: • Put(SHA1(“msrc401.europe.microsoft.com”), 157.58.16.56) • Look up DNS entry: • IPAddress = Get(SHA1(“msrc401.europe.microsoft.com”))
Example: DNS service • Linda: • Add DNS entry: • out(“msrc401.europe.microsoft.com”, 157.58.16.56) • Look up DNS entry: • in(“msrc401.europe.microsoft.com”, ?IPAddress) • DHTs: • Add DNS entry: • Put(SHA1(“msrc401.europe.microsoft.com”), 157.58.16.56) • Look up DNS entry: • IPAddress = Get(SHA1(“msrc401.europe.microsoft.com”))
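The DHT half of the DNS example, as a minimal single-process sketch. `ToyDHT` is a hypothetical stand-in for a networked DHT exposing Put/Get; the point is that the application, not the runtime, computes the key with SHA-1. In this sketch Get leaves the entry in place, like rd; an in would also remove it.

```python
import hashlib


def sha1_key(name: str) -> str:
    """Application-side hash: the DHT key is SHA-1 of the host name."""
    return hashlib.sha1(name.encode()).hexdigest()


class ToyDHT:
    """Single-process stand-in for a DHT's Put/Get API (no networking)."""

    def __init__(self):
        self._store = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str):
        return self._store.get(key)   # None if the key is absent


dht = ToyDHT()
name = "msrc401.europe.microsoft.com"
dht.put(sha1_key(name), "157.58.16.56")   # Put(SHA1(name), 157.58.16.56)
ip = dht.get(sha1_key(name))              # IPAddress = Get(SHA1(name))
print(ip)                                 # 157.58.16.56
```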
The Drawback: Nothing comes free! • Range/complex queries • But in, out, rd (and inp, rdp) do not really support enumeration • E.g. find the host names associated with IP addresses 192.10.10.1 to 192.10.10.254 • Vanilla Linda: for (int i = 1; i < 255; i++) { IPAddress addr = new IPAddress("192.10.10." + i); Tuple t = rdp(?string, addr); } • Extensions: Tuple[] tuples = fetch(?string, 192.10.10.1 -> 192.10.10.254);
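The same drawback shows up with a DHT: hashing destroys key ordering, so a range query degenerates into one exact-match lookup per address, just like the vanilla Linda loop. A sketch, assuming a hypothetical `CountingDHT` that counts Get calls and three made-up hosts registered under SHA-1 of their IP address:

```python
import hashlib


def sha1_key(s: str) -> str:
    return hashlib.sha1(s.encode()).hexdigest()


class CountingDHT:
    """Toy exact-match store that counts lookups (hypothetical DHT stand-in)."""

    def __init__(self):
        self._store = {}
        self.lookups = 0

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        self.lookups += 1
        return self._store.get(key)


dht = CountingDHT()
for last, host in [(5, "alpha"), (17, "beta"), (200, "gamma")]:
    dht.put(sha1_key(f"192.10.10.{last}"), host)

# No ordering survives hashing: probe every address in the range
hosts = []
for i in range(1, 255):
    h = dht.get(sha1_key(f"192.10.10.{i}"))
    if h is not None:
        hosts.append(h)

print(hosts, dht.lookups)   # ['alpha', 'beta', 'gamma'] 254
```

In a real DHT each of those 254 lookups may be routed to a different node, which is why a fetch-style range operation is hard to offer on top of plain hashing.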
Questions? • Question: “Should you be using a DHT?” • Sub-questions: • “Do we need an implicit hash function?” • “Do we need complex querying/matching?” • “Do we need great scalability?”