A Case Study in Building Layered DHT Applications Yatin Chawathe Sriram Ramabhadran, Sylvia Ratnasamy, Anthony LaMarca, Scott Shenker, Joseph Hellerstein
Building distributed applications • Distributed systems are designed to be scalable, available and robust • What about simplicity of implementation and deployment? • DHTs proposed as simplifying building block • Simple hash-table API: put, get, remove • Scalable content-based routing, fault tolerance and replication
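As a rough illustration of the hash-table API the slide refers to (not OpenDHT's actual client interface; the class name and signatures here are hypothetical), a DHT behaves like a table where each key can hold multiple values:

```python
class SimpleDHT:
    """Minimal in-memory stand-in for a DHT's put/get/remove interface.
    A real DHT routes each operation to the node responsible for
    hash(key) and replicates the stored values."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # DHTs typically allow multiple values under one key
        self._table.setdefault(key, []).append(value)

    def get(self, key):
        return self._table.get(key, [])

    def remove(self, key, value):
        if value in self._table.get(key, []):
            self._table[key].remove(value)
```

Everything Place Lab needs from the DHT is expressed through these three calls.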
Can DHTs help? • Can we layer complex functionality on top of unmodified DHTs? • Can we outsource the entire DHT operation to a third-party DHT service, e.g., OpenDHT? • Existing DHT applications fall into two classes • Simple unmodified DHT for rendezvous or storage, e.g., i3, CFS, FOOD • Complex apps that modify the DHT for enhanced functionality, e.g., Mercury, CoralCDN
Outline • Motivation • A case study: Place Lab • Range queries with Prefix Hash Trees • Evaluation • Conclusion
A Case Study: Place Lab • Positioning service for location-enhanced apps • Clients locate themselves by listening for known radio beacons (e.g., WiFi APs) • Database of APs and their known locations • Place Lab service computes maps of AP MAC address ↔ lat,lon [Diagram: “war-drivers” submit neighborhood logs { AP → lat,lon }; clients download local WiFi maps { lat,lon → list of APs }]
Why Place Lab • Developed by a group of ubicomp researchers • Not experts in system design and management • Centralized deployment since March 2004 • Software downloaded by over 6,000 sites • Concerns over organizational control argue for decentralizing the service • But want to avoid the implementation and deployment overhead of a distributed service
How DHTs can help Place Lab • Automatic content-based routing • Route logs by AP MAC address to the appropriate Place Lab server • Robustness and availability • DHT managed entirely by a third party • Provides automatic replication and failure recovery of database content [Diagram: “war-drivers” submit neighborhood logs → DHT storage and routing → Place Lab servers compute AP locations → clients download local WiFi maps]
Downloading WiFi Maps • Clients perform geographic range queries • Download segments of the database, e.g., all access points in Philadelphia • Can we perform this entirely on top of an unmodified third-party DHT? • DHTs provide exact-match queries, not range queries [Diagram: clients issue range queries for local WiFi maps against the DHT-hosted database]
Supporting range queries • Prefix Hash Trees (PHTs) • Index built entirely with put, get, remove primitives • No changes to DHT topology or routing • Binary tree structure • Node label is a binary prefix of the values stored under it • Nodes split when they get too big • Stored in DHT with node label as key • Allows for direct access to interior and leaf nodes [Diagram: PHT trie rooted at R; leaves such as R00, R010, R011, R10, R110, R111 store 4-bit keys 0000–1111]
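A minimal sketch of the "node label as key" idea: hashing a node's prefix label yields the DHT key under which that node is stored, so any interior or leaf node is reachable with a single get. The helper name and the choice of SHA-1 are illustrative assumptions, not the exact scheme from the talk:

```python
import hashlib

def node_key(label: str) -> bytes:
    """A PHT node with binary-prefix label (e.g. "R011") is stored in
    the DHT under hash(label); clients can then fetch any node in the
    trie directly, without walking down from the root."""
    return hashlib.sha1(label.encode("ascii")).digest()
```

Direct addressing is what makes the binary-search lookup on the next slide possible: the client can probe an arbitrary prefix depth without touching its ancestors.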
PHT operations • Lookup(K): find the leaf node whose label is a prefix of K; binary search across K’s bits, O(log log D) where D = size of the key space • Insert(K, V): lookup the leaf node for K; if full, split the node into two; put value V into the leaf node • Query(K1, K2): lookup the node for P, where P = longest common prefix of K1 and K2; traverse the subtree rooted at the node for P [Diagram: PHT trie highlighting the lookup path to a leaf and the subtree a range query traverses]
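The Lookup(K) binary search can be sketched as follows. This assumes (hypothetically) that `dht_get(label)` returns a node record with an `"is_leaf"` flag, or `None` when no node with that label exists; the PHT invariant that every ancestor of a leaf exists as an interior node makes the search correct:

```python
def pht_lookup(dht_get, key_bits: str):
    """Find the PHT leaf whose label is a prefix of key_bits by binary
    search over prefix lengths: O(log log D) DHT gets, where D is the
    size of the key space (so len(key_bits) = log D)."""
    lo, hi = 0, len(key_bits)
    while lo <= hi:
        mid = (lo + hi) // 2
        node = dht_get("R" + key_bits[:mid])
        if node is None:
            hi = mid - 1          # no node this deep: leaf has a shorter label
        elif node["is_leaf"]:
            return node           # label is a prefix of key_bits, and a leaf
        else:
            lo = mid + 1          # interior node: leaf lies deeper
    return None
```

Each probe is an independent DHT get on a directly computed label, which is why no modification to DHT routing is needed.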
2-D geographic queries • Convert lat/lon into a 1-D key using z-curve linearization • Interleave lat/lon bits to create the z-curve key, e.g., (5, 6) = (0101, 0110) → 00110110 = 54 • Linearized query results may not be contiguous • Start at the longest common prefix subtree • Visit child nodes only if they can contribute to the query result [Diagram: 8×8 lat/lon grid traversed by the z-curve, alongside the PHT subtree rooted at P = R000…00 that a range query visits]
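The bit-interleaving step can be written directly; this sketch assumes fixed-width coordinates and places each latitude bit before the corresponding longitude bit, matching the slide's (5, 6) → 54 example:

```python
def z_interleave(lat: int, lon: int, nbits: int) -> int:
    """Linearize a 2-D point onto the z-curve by interleaving the
    nbits-wide binary representations of lat and lon, latitude bit
    first at each position."""
    key = 0
    for i in range(nbits - 1, -1, -1):
        key = (key << 1) | ((lat >> i) & 1)
        key = (key << 1) | ((lon >> i) & 1)
    return key
```

Because the z-curve preserves prefix locality, nearby points share long key prefixes and land in nearby PHT leaves, but a rectangular query region can still map to several non-contiguous key ranges, hence the subtree pruning described above.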
Ease of implementation and deployment • 2,100 lines of code to hook Place Lab into underlying DHT service • Compare with 14,000 lines for the DHT • Runs entirely on top of deployed OpenDHT service • DHT handles fault tolerance and robustness, and masks failures of Place Lab servers
Flexibility of DHT APIs • Range queries use only the get operation • Updates use combination of put, get, remove • But… • Concurrent updates can cause inefficiencies • No support for concurrency in existing DHT APIs • A test-and-set extension can be beneficial to PHTs and a range of other applications • put_conditional: perform the put only if value has not changed since previous get
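A toy sketch of the proposed put_conditional extension, using compare-against-previous-value semantics as an illustrative assumption (the talk proposes the primitive but this is not its exact specification):

```python
class CasDHT:
    """In-memory sketch of a DHT with a test-and-set extension:
    put_conditional succeeds only if the stored value is unchanged
    since the caller's previous get, so two PHT clients splitting the
    same leaf concurrently cannot silently overwrite each other."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

    def put_conditional(self, key, expected, value):
        if self._store.get(key) != expected:
            return False          # concurrent update detected; caller retries
        self._store[key] = value
        return True
```

With plain put, the loser of a concurrent update overwrites the winner; with put_conditional, the loser observes failure, re-reads the node, and retries.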
PHT insert performance • Median insert latency is 1.45 sec • w/o caching = 3.25 sec; with caching = 0.76 sec
PHT query performance • Queries on average take 2–4 seconds • Varies with block size • Smaller (or very large) block size implies longer query time
Conclusion • Concrete example of building complex applications on top of vanilla DHT service • DHT provides ease of implementation and deployment • Layering allows inheriting of robustness, availability and scalable routing from DHT • Sacrifices performance in return