Beehive: Achieving O(1) Lookup Performance in P2P Overlays for Zipf-like Query Distributions Venugopalan Ramasubramanian (Rama) and Emin Gün Sirer Cornell University
introduction • caching is widely used to improve latency and to decrease overhead • passive caching • caches distributed throughout the network • store objects that are encountered • not well suited for a large class of applications
problems with passive caching • no performance guarantees • heavy-tail effect • large percentage of queries to unpopular objects • ad-hoc heuristics for cache management • introduces coherency problems • difficult to locate all copies • weak consistency model
overview of beehive • general replication framework for structured DHTs • decentralization, self-organization, resilience • properties • high performance: O(1) average lookup time • scalable: minimize number of replicas and reduce storage, bandwidth, and network load • adaptive: promptly respond to changes in popularity – flash crowds
prefix-matching DHTs • lookup for object 0121 • $\log_b N$ hops • several RTTs on the Internet [figure: nodes 2012, 0021, 0112, 0122 and object 0121]
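The hop count on this slide can be illustrated with a minimal sketch of greedy prefix routing. It assumes a global view of the four node identifiers shown on the slide instead of per-node routing tables, so it is a simplification and not how Pastry or Beehive route in practice; the function names are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits the two identifiers have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(start, obj, nodes):
    """Greedy prefix routing over a globally known node list (an assumption).

    Each hop moves to a node that shares more prefix digits with the
    object than the current node, preferring the smallest improvement.
    """
    path = [start]
    current = start
    while True:
        have = shared_prefix_len(current, obj)
        better = [n for n in nodes if shared_prefix_len(n, obj) > have]
        if not better:
            return path  # no closer node exists: current is the home node
        current = min(better, key=lambda n: shared_prefix_len(n, obj))
        path.append(current)


nodes = ["0021", "0112", "0122", "2012"]
path = route("2012", "0121", nodes)
hops = len(path) - 1
```

Each hop resolves at least one more digit of the object identifier, which is where the $\log_b N$ bound comes from.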
key intuition • tunable latency • adjust number of objects replicated at each level • fundamental space-time tradeoff [figure: nodes 0021, 0112, 0122, 2012]
analytical model • optimization problem: minimize total number of replicas, s.t. average lookup performance $\le C$ • configurable target lookup performance • continuous range, sub one-hop • minimizing number of replicas decreases storage and bandwidth overhead
analytical model • zipf-like query distributions with parameter $\alpha$ • number of queries to the $r$th most popular object $\propto 1/r^{\alpha}$ • fraction of queries for the $m$ most popular of $M$ objects $\approx (m^{1-\alpha} - 1) / (M^{1-\alpha} - 1)$ • level of replication • nodes share $i$ prefix-digits with the object • $i$ hop lookup latency • replicated on $N/b^i$ nodes
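A short sketch of the two quantities on this slide may help: the approximate share of queries captured by the most popular objects, and the number of nodes holding a level-$i$ replica. The function names are hypothetical, and the exact sum is included only for comparison with the slide's approximation.

```python
def zipf_fraction_approx(m, M, alpha):
    """Slide's approximation (m^(1-a) - 1) / (M^(1-a) - 1); needs alpha != 1."""
    e = 1.0 - alpha
    return (m ** e - 1.0) / (M ** e - 1.0)


def zipf_fraction_exact(m, M, alpha):
    """Exact share of queries for the top m objects when the r-th most
    popular object receives queries in proportion to 1 / r^alpha."""
    total = sum(r ** -alpha for r in range(1, M + 1))
    top = sum(r ** -alpha for r in range(1, m + 1))
    return top / total


def nodes_at_level(N, b, i):
    """A level-i object is replicated on N / b^i nodes."""
    return N / b ** i


# Parameters taken from the evaluation slide: 40960 objects, alpha = 0.91.
approx = zipf_fraction_approx(1000, 40960, 0.91)
exact = zipf_fraction_exact(1000, 40960, 0.91)
```

The approximation comes from replacing the sum with an integral, so it is least accurate for very small $m$.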
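optimization problem • minimize (storage/bandwidth): $x_0 + x_1/b + x_2/b^2 + \dots + x_{K-1}/b^{K-1}$ • such that (average lookup time is $C$ hops): $K - (x_0^{1-\alpha} + x_1^{1-\alpha} + x_2^{1-\alpha} + \dots + x_{K-1}^{1-\alpha}) \le C$ • and $x_0 \le x_1 \le x_2 \le \dots \le x_{K-1} \le 1$ • $b$: base • $K$: $\log_b(N)$ • $x_i$: fraction of objects replicated at level $i$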
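To make the objective and the constraint concrete, the following sketch evaluates both for a candidate assignment of replication fractions. It follows the slide's expressions as read here (storage cost as the sum of $x_i / b^i$, average lookup time as $K$ minus the sum of $x_i^{1-\alpha}$); the function names are hypothetical.

```python
def storage_cost(x, b):
    """Objective: x_0 + x_1/b + ... + x_{K-1}/b^(K-1), the per-object
    replica count as a fraction of the number of nodes."""
    return sum(xi / b ** i for i, xi in enumerate(x))


def average_hops(x, alpha):
    """Constraint left-hand side: K - sum(x_i^(1 - alpha)), with K = len(x)."""
    K = len(x)
    return K - sum(xi ** (1.0 - alpha) for xi in x)


def is_feasible(x, alpha, C):
    """Checks average lookup time <= C and x_0 <= x_1 <= ... <= x_{K-1} <= 1."""
    ordered = all(x[i] <= x[i + 1] for i in range(len(x) - 1))
    return ordered and x[-1] <= 1.0 and average_hops(x, alpha) <= C


everything_everywhere = [1.0, 1.0, 1.0]   # every object at level 0
nothing_replicated = [0.0, 0.0, 0.0]      # plain DHT lookups
```

The two extreme assignments show the space-time tradeoff: replicating everything gives zero hops at maximal storage cost, while replicating nothing costs no extra storage and leaves the full $K$ hops.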
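optimal closed-form solution

$$x^*_i = \begin{cases} \left[ \dfrac{d^{\,i} (K' - C)}{1 + d + \dots + d^{K'-1}} \right]^{\frac{1}{1-\alpha}}, & 0 \le i \le K' - 1 \\ 1, & K' \le i \le K \end{cases}$$

where $d = b^{(1-\alpha)/\alpha}$ • $K'$ (typically 2 or 3) is determined by setting $x^*_{K'-1} \le 1$, i.e. $d^{K'-1} (K' - C) / (1 + d + \dots + d^{K'-1}) \le 1$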
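The closed-form solution can be evaluated numerically with a sketch like the one below. It is one reading of the slide's formula and not the authors' code: it assumes $0 < \alpha < 1$, an integer $K$, a target $0 \le C < K$, and that $K'$ is the largest value in $1..K$ satisfying the slide's condition. The function name is hypothetical.

```python
def beehive_fractions(b, alpha, C, K):
    """Return [x_0, ..., x_{K-1}]: fraction of objects replicated at each level.

    Assumptions (not stated on the slide): 0 < alpha < 1, integer K,
    0 <= C < K, and K' is the largest value in 1..K for which
    d^(K'-1) * (K' - C) / (1 + d + ... + d^(K'-1)) <= 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not 0.0 <= C < K:
        raise ValueError("target C must satisfy 0 <= C < K")

    d = b ** ((1.0 - alpha) / alpha)

    def geometric_sum(kp):
        return sum(d ** j for j in range(kp))

    def top_term(kp):
        # Base of x*_{K'-1} before the 1/(1 - alpha) exponent is applied.
        return d ** (kp - 1) * (kp - C) / geometric_sum(kp)

    k_prime = max(kp for kp in range(1, K + 1) if top_term(kp) <= 1.0)
    geom = geometric_sum(k_prime)

    fractions = []
    for i in range(K):
        if i < k_prime:
            base = d ** i * (k_prime - C) / geom
            fractions.append(base ** (1.0 / (1.0 - alpha)))
        else:
            fractions.append(1.0)  # levels K' and above hold every object
    return fractions


# Base 16, alpha from the MIT DNS trace, target of one hop; K = 3 is an
# illustrative choice, not a value given on the slides.
x = beehive_fractions(16, 0.91, 1.0, 3)
achieved_hops = len(x) - sum(v ** (1.0 - 0.91) for v in x)
```

Under these assumptions the constraint is met with equality, so `achieved_hops` equals the target `C` up to floating-point error, and only a small fraction of objects is replicated at the lowest levels.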
beehive: system overview • estimation • popularity of objects, zipf parameter • local measurement, limited aggregation • replication • apply analytical model independently at each node • push new replicas to nodes at most one hop away
beehive replication protocol [figure: object 0121 with home node E; level 3 (prefix 012*): node E; level 2 (prefix 01*): nodes B, E, I; level 1 (prefix 0*): nodes A, B, C, D, E, F, G, H, I]
mutable objects • leverage the underlying structure of DHT • replication level indicates the locations of all the replicas • proactive propagation to all nodes from the home node • home node sends to one-hop neighbors with i matching prefix-digits • level i nodes send to level i+1 nodes
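The claim that the replication level indicates the locations of all replicas can be sketched as a prefix filter: a level-$i$ object lives on every node sharing at least $i$ prefix digits with it. The node identifiers below are the four from the prefix-matching slide, used purely for illustration, and the function name is hypothetical.

```python
def replica_holders(obj, level, nodes):
    """Nodes that hold a replica of obj when it is replicated at `level`:
    those whose identifier shares the first `level` digits with obj."""
    return [n for n in nodes if n[:level] == obj[:level]]


nodes = ["0021", "0112", "0122", "2012"]
obj = "0121"

level0 = replica_holders(obj, 0, nodes)  # every node
level1 = replica_holders(obj, 1, nodes)  # prefix 0*
level2 = replica_holders(obj, 2, nodes)  # prefix 01*
level3 = replica_holders(obj, 3, nodes)  # prefix 012*, the home node here
```

Because the replica set is a pure function of the object identifier and its level, an update needs no directory of copies: knowing the level is enough to know where the update must reach.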
implementation and evaluation • implemented using Pastry as the underlying DHT • evaluation using a real DNS workload • MIT DNS trace (zipf parameter 0.91) • 1024 nodes, 40960 objects • compared with passive caching on Pastry • main properties evaluated • lookup performance • storage and bandwidth overhead • adaptation to changes in query distribution
evaluation: lookup performance • passive caching is not very effective because of the heavy-tailed query distribution and mutable objects • beehive converges to the target of 1 hop
evaluation: overhead [figures: storage overhead; bandwidth overhead]
evaluation: flash crowds [figure: lookup performance]
Cooperative Domain Name System (CoDoNS) • replacement for legacy DNS • secure authentication through DNSSEC • incremental deployment path • completely transparent to clients • uses legacy DNS to populate resource records on demand • deployed on PlanetLab
advantages of CoDoNS • higher performance than legacy DNS • median latency of 7 ms for CoDoNS (PlanetLab), 39 ms for legacy DNS • resilience against denial of service attacks • self configuration after host and network failures • fast update propagation
conclusions • model-driven proactive caching • O(1) lookup performance with optimal replicas • beehive: a general replication framework • structured overlays with uniform fan-out • high performance, resilience, improved availability • well-suited for latency sensitive applications www.cs.cornell.edu/people/egs/beehive
typical values of zipf parameter • MIT DNS trace: $\alpha = 0.91$ • Web traces:
security issues in beehive • underlying DHT • corruption in routing tables • [Castro, Druschel, Ganesh, Rowstron, Wallach] • beehive • misrepresentation of popularity • remove outliers • application • corruption of data • certificates (e.g. DNSSEC)