Cost Aware Resource Management for Decentralized Network Services Venugopalan Ramasubramanian (Rama) Microsoft Research Silicon Valley / Cornell University
Introduction • decentralized services have become increasingly important • e.g. name systems, CDNs, publish-subscribe • low latency, constant availability, and high scalability • current services often fall short of required performance • ad hoc techniques
Problems with Ad hoc Techniques • no performance guarantees • unable to quantify/bound performance • unable to tune resource utilization to meet performance targets • tailored to specific workloads • e.g. opportunistic caching relies on the “90/10” rule • heavy-tailed popularity distributions • mutable objects
Principled Approach • fundamental cost-performance tradeoff • e.g. lookup latency vs. memory / bandwidth consumption • resource allocation problem • which node hosts which object? • depends on popularity, size, update rate, etc.
Prior Work • Scalability • high complexity even to express the problem • number of objects x number of nodes (M x N) • Decentralization • objects are distributed among multiple nodes • expensive to perform resource allocation centrally
Cost-Aware Resource Management Framework • high performance, robust, and scalable services • Mathematical Optimization • system-wide performance goals become constraints to optimization problems: • min. cost s.t. performance meets target • max. performance s.t. cost ≤ limit • Structured Overlays • decentralization and self-organization • well-defined topology with bounded diameter and node degree
Decentralized Internet Services • name service for the Internet • Cooperative Domain Name System (CoDoNS) • content distribution network • Cooperative Beehive Web (CoBWeb) • on-line data monitoring • Cornell On-line News Aggregator (CorONA)
Scalable Resource Allocation • structured overlay • each object has a home node • DAG rooted at home node reaching all nodes • uniform branching-factor • allocate resources at well-defined levels • level ℓ means all nodes ℓ hops away from home node • low complexity resource allocation • Number of objects x Diameter (e.g. M x log N) • practical and scalable
Structured Overlays: Pastry • prefix-matching routing • $\log_b N$ hops • [Figure: object 0121 = hash(“cs.cornell.edu”); nodes 2012, 0021, 0112, 0122; home node marked]
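A minimal sketch of prefix-matching routing, using the four node IDs from the slide as a toy network. The helper names `shared_prefix_len` and `route` are hypothetical, and the sketch assumes every node knows every other node; a real Pastry node consults its own routing table and leaf set instead.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(start: str, key: str, nodes: list[str]) -> list[str]:
    """Follow prefix-matching hops from `start` toward `key`.

    Toy assumption: every node knows all of `nodes`.
    """
    path = [start]
    current = start
    while True:
        have = shared_prefix_len(current, key)
        better = [n for n in nodes if shared_prefix_len(n, key) > have]
        if not better:
            return path
        # Take the smallest improvement: resolve about one digit per hop.
        current = min(better, key=lambda n: shared_prefix_len(n, key))
        path.append(current)


nodes = ["0021", "0112", "0122", "2012"]
print(route("2012", "0121", nodes))
```

Each hop matches at least one more digit of the key, which is why the hop count is bounded by the number of digits, roughly log_b N.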
Opportunistic Caching in Pastry • [Figure: object 0121 = hash(“cs.cornell.edu”); nodes 2012, 0021, 0112, 0122; home node marked]
Structured Resource Allocation • analytically model performance-overhead tradeoff • object replicated at all nodes with ℓ matching prefix-digits • lookup latency: ℓ hops • replicas: $N/b^{\ell}$ • inexpensive to locate and update replicas
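The tradeoff on this slide can be tabulated directly. The sketch below assumes an idealized overlay with uniformly distributed IDs, so that exactly N/b^ℓ nodes share ℓ prefix digits with the object; the function name `level_tradeoff` and the sample values (N = 1024, b = 4) are illustrative only.

```python
def level_tradeoff(n_nodes: int, base: int, diameter: int):
    """Return (level, lookup_hops, replicas) for each level 0..diameter."""
    return [(lvl, lvl, n_nodes / base**lvl) for lvl in range(diameter + 1)]


# 1024 nodes with base-4 IDs give a diameter of log_4(1024) = 5 hops.
for level, hops, replicas in level_tradeoff(1024, 4, 5):
    print(f"level {level}: {hops} hops, {replicas:.0f} replicas")
```

Each step down in level removes one hop from the lookup but multiplies the replica count by b.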
Outline • Introduction • Honeycomb Framework • Optimization Analysis • Implementation • Applications • Evaluation • Conclusions
Analytical Modeling • level of allocation (ℓ) • object hosted at all nodes ℓ hops from the home node • optimization problem: find optimal values of $\ell_i$ • min. $\sum_i C_i(\ell_i)$, s.t. $\sum_i P_i(\ell_i) \le T$ • max. $\sum_i P_i(\ell_i)$, s.t. $\sum_i C_i(\ell_i) \le T$ • performance variables • lookup latency, update latency • cost variables • memory consumption, network overhead, number of nodes
Optimization Problem: Lookup Latency • $\min \sum_{i=1}^{M} c_i\, b^{\ell_i}$ (total overhead) s.t. $\sum_{i=1}^{M} q_i (D - \ell_i) \le T_L$ (avg. lookup latency) • $T_L$: target lookup latency in hops • $q_i$: relative query frequency • $c_i$: replication cost of object $i$ • objects $M$, nodes $N$, branching factor $b$, diameter $D$
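To make the two sides of this problem concrete, the sketch below evaluates the total overhead and the average lookup latency for one candidate assignment of levels. It assumes the objective and constraint are sums over objects, with an object at level ℓ hosted on b^ℓ nodes and looked up in D - ℓ hops; the function names and the three-object workload are made up for illustration.

```python
def total_cost(levels, costs, b):
    """Total overhead: sum of c_i * b**l_i over all objects."""
    return sum(c * b**l for c, l in zip(costs, levels))


def avg_latency(levels, freqs, diameter):
    """Average lookup latency in hops: sum of q_i * (D - l_i)."""
    return sum(q * (diameter - l) for q, l in zip(freqs, levels))


freqs = [0.7, 0.2, 0.1]   # relative query frequencies, summing to 1
costs = [1, 1, 1]         # c = 1 counts hosting nodes
levels = [3, 1, 0]        # the popular object is replicated most widely

print(total_cost(levels, costs, b=2))          # 8 + 2 + 1 = 11
print(avg_latency(levels, freqs, diameter=3))  # about 0.7 hops
```

Replicating only the most popular object at every node already brings the average below one hop in this toy workload, which is the effect the optimization exploits.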
Resource Allocation for Lookup Performance • target avg. lookup latency in hops • sub-one hop, fractional values (e.g., 0.5 hops) • indirectly specifies cache hit ratio • worst case lookup latency • lower bound on ℓ • optimizes multiple overhead metrics • number of nodes: c = 1 • memory: c = size of object • bandwidth: c = size x update rate
Analytical Optimization (Beehive) • Zipf popularity distribution (e.g. DNS, Web, RSS) • analytically tractable (one parameter α) • closed-form solution: $x^*_\ell = \left[ \frac{b'^{\ell}\,(D - C)}{1 + b' + \dots + b'^{D-1}} \right]^{\frac{1}{1-\alpha}}$, where $b' = b^{(1-\alpha)/\alpha}$ • inexpensive to compute and apply [Ramasubramanian and Sirer NSDI 04]
Numerical Optimization • general-purpose approach • any popularity distribution (including Zipf) • many cost metrics (fine-grained bandwidth consumption) • many performance metrics (update latency) • optimization problem is NP-Hard • Multiple choice Knapsack problem • discrete, convex, and separable • fast and accurate approximation algorithm • O(M D log(M D)) running time • at most one object per node (more or less than optimum)
Numerical Optimization 2 • Lagrange multiplier: min. C(ℓm) + λ [P(ℓm) – T] • bisection-based bracketing algorithm • upper- and lower-bound solutions that differ in one channel yield a near-optimal solution • pre-computing and sorting the λs before iterating yields an O(M D log(M D)) algorithm
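A rough sketch of the Lagrangian idea, applied to the lookup-latency problem (cost c·b^ℓ, latency q·(D - ℓ) per object). For a fixed λ each object picks its level independently, and a plain bisection on λ brackets the smallest multiplier that meets the target. This is not the pre-sorted O(M D log(M D)) variant mentioned on the slide; the function names and workload are assumptions for illustration.

```python
def best_levels(lam, costs, freqs, b, diameter):
    """For a fixed multiplier, each object minimizes c*b**l + lam*q*(D - l)."""
    return [
        min(range(diameter + 1),
            key=lambda l: c * b**l + lam * q * (diameter - l))
        for c, q in zip(costs, freqs)
    ]


def avg_latency(levels, freqs, diameter):
    return sum(q * (diameter - l) for q, l in zip(freqs, levels))


def solve(costs, freqs, b, diameter, target, iters=50):
    """Bisection on lam; returns the feasible (upper-bracket) solution."""
    if target < 0:
        raise ValueError("target latency must be non-negative")

    def feasible(lam):
        levels = best_levels(lam, costs, freqs, b, diameter)
        return avg_latency(levels, freqs, diameter) <= target

    lo, hi = 0.0, 1.0
    while not feasible(hi):      # grow the bracket until the target is met
        hi *= 2
    for _ in range(iters):       # shrink it; hi always stays feasible
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return best_levels(hi, costs, freqs, b, diameter)


print(solve([1, 1, 1], [0.7, 0.2, 0.1], b=2, diameter=3, target=1.0))
```

The returned assignment is the feasible end of the bracket; the infeasible end differs from it in the level of a single object, which matches the "differ in one channel" observation above.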
Honeycomb • cost-aware resource allocation framework for structured overlays • properties: • system-wide performance goals • scalability and failure resilience • quick adaptation to workload • fast update propagation
Scalable Resource Management • independent decisions • local aggregation • estimate popularity • communication only with overlay neighbors • replicas managed by one-hop neighbors
Decentralized Optimization • global optimum requires global information • Using local knowledge alone leads to sub-optimal solutions • solution: • approximate tradeoffs for non-local channels • aggregate coarse-grained information between neighbors
Decentralized Optimization 2 • approximate parameters • cluster channels with similar values of P(ℓ) / C(ℓ) • constant number of clusters per level
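One possible way to form a constant number of clusters is to bucket channels by the logarithm of their performance-to-cost ratio and keep only per-bucket totals. The sketch below is an assumption about how such clustering could look, not the scheme used in Honeycomb; the function name, the bin boundaries, and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def cluster_by_ratio(channels, n_clusters, min_ratio, max_ratio):
    """Group (perf, cost) pairs into log-spaced bins of perf / cost.

    Returns {bin_index: {"count", "perf", "cost"}} with summed values,
    so the summary size depends on n_clusters, not on len(channels).
    """
    span = math.log(max_ratio / min_ratio)
    clusters = defaultdict(lambda: {"count": 0, "perf": 0.0, "cost": 0.0})
    for perf, cost in channels:
        # Clamp the ratio into the supported range before binning.
        ratio = min(max(perf / cost, min_ratio), max_ratio)
        idx = int(math.log(ratio / min_ratio) / span * n_clusters)
        idx = min(idx, n_clusters - 1)
        entry = clusters[idx]
        entry["count"] += 1
        entry["perf"] += perf
        entry["cost"] += cost
    return dict(clusters)


channels = [(2.0, 1.0), (3.0, 1.0), (500.0, 1.0)]
print(cluster_by_ratio(channels, n_clusters=4, min_ratio=1, max_ratio=10000))
```

A node can then exchange these few aggregate entries with its neighbors instead of one record per channel.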
Decentralized Optimization 3 • Aggregating Clusters • Exchange clusters with one-hop neighbors • Hierarchical aggregation through structured overlay
Adaptation to Workload Changes • popularity of objects may change drastically • flash-crowds, denial of service attacks • nodes measure popularity for local objects and aggregate popularity estimates with neighbors
Adaptation to Workload Changes 2 • orders of magnitude difference in query rates of popular and unpopular objects • solution: combine inter-arrival times and query counts • estimation times proportional to the query rate of the object • monitoring overhead proportional to the query rate of the object • quick detection of large increases in query rate
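As an illustration of combining query counts with inter-arrival times, the sketch below estimates a query rate from the time spanned by the last few arrivals. It is a simple windowed estimator chosen for illustration, not necessarily the estimator used in Honeycomb; the class name `RateEstimator` and the window size are assumptions.

```python
from collections import deque


class RateEstimator:
    """Estimate queries per second from the last `window` arrival times."""

    def __init__(self, window: int = 8):
        self.arrivals = deque(maxlen=window)  # old timestamps drop off

    def record(self, timestamp: float) -> None:
        self.arrivals.append(timestamp)

    def rate(self) -> float:
        if len(self.arrivals) < 2:
            return 0.0  # not enough samples yet
        elapsed = self.arrivals[-1] - self.arrivals[0]
        if elapsed <= 0:
            return float("inf")
        # (count - 1) inter-arrival gaps were observed over `elapsed`.
        return (len(self.arrivals) - 1) / elapsed


est = RateEstimator(window=4)
for t in [0.0, 1.0, 2.0, 3.0]:
    est.record(t)
print(est.rate())  # 3 gaps over 3 seconds
```

Because the window holds a fixed number of arrivals, a popular object fills it quickly, so a sudden increase in its query rate shows up after only a few queries.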
Honeycomb: Fast Update Propagation • single integer (replication level) indicates locations of all objects • no TTL required • proactively propagate updates • use neighbors in the underlying overlay • increasing version numbers differentiate versions • lazy updates in background
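A minimal sketch of version-numbered update propagation between overlay neighbors. The `Replica` class and its fields are hypothetical; the point is only that a monotonically increasing version number lets each node drop stale or duplicate updates, which also stops the push from looping.

```python
class Replica:
    """A node holding one object and pushing updates to its neighbors."""

    def __init__(self, name: str):
        self.name = name
        self.version = 0
        self.value = None
        self.neighbors = []

    def receive(self, version: int, value) -> None:
        if version <= self.version:
            return  # stale or already seen: do not apply or forward
        self.version, self.value = version, value
        for neighbor in self.neighbors:
            neighbor.receive(version, value)


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.neighbors = [b, c]
b.neighbors = [a, c]
c.neighbors = [a, b]

a.receive(1, "10.0.0.1")
print([(n.name, n.version, n.value) for n in (a, b, c)])
```

No TTL is needed in this sketch because a replica is replaced as soon as a newer version arrives, rather than expiring on a timer.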
Outline • Introduction • Honeycomb Framework • Applications • Name service (CoDoNS) • Content distribution network (CoBWeb) • On-line data monitoring system (CorONA) • Evaluation • Conclusions
CoDoNS: Cooperative Domain Name System • legacy DNS has fundamental problems • poor failure resilience due to limited replication • high response times due to multi-hop lookups • no support for spontaneous updates • cooperative cache for DNS bindings LegacyDNS [Ramasubramanian and Sirer SIGCOMM 04]
CoDoNS: Cooperative Domain Name System • structured, proactive caching of name-data mappings • targets avg. lookup latency of 0.5 hops • minimizes memory consumption • updates pushed proactively to all caching nodes • self-certifying data to preserve integrity (DNSSEC) • incremental deployment path • safety-net for legacy DNS • deployed on PlanetLab
CobWeb: Cooperative Beehive Web • Web caches • passive, client driven • Content Distribution Networks • active, replication driven • e.g. Akamai, Digital Island (commercial), CoDeeN, CoralCDN (academia) • web caching solutions based on heuristics • ideal cache hit rate (60-70%) [Wolman et al. 01] • achieved cache hit rate (20%-40%) [Breslau et al. 99, Wolman et al. 01]
CobWeb: Cooperative Beehive Web • CobWeb is a cooperative web cache • high cache hit rate through structured, proactive caching • low network overhead using object size and update rate • adaptation to flash crowds • CobWeb performance goals • min. network bandwidth s.t. cache hit rate meets a target • max. cache hit rate s.t. network bandwidth is all consumed
CobWeb: Cooperative Beehive Web • user interfaces • append cob-web.org to URLs • e.g., http://slashdot.org.cob-web.org:8888 • DNS redirection, URL rewriting • Meridian finds closest node to the client • deployed on PlanetLab • more than 10 million requests per day
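The URL convention above is easy to apply mechanically. The sketch below rewrites a URL by appending the suffix and port from the slide's example to the hostname; the function name `cobwebify` is hypothetical, and the sketch drops any port present in the original URL.

```python
from urllib.parse import urlsplit, urlunsplit


def cobwebify(url: str, suffix: str = "cob-web.org", port: int = 8888) -> str:
    """Rewrite `url` so its host becomes <host>.<suffix>:<port>."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"URL has no hostname: {url!r}")
    netloc = f"{parts.hostname}.{suffix}:{port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(cobwebify("http://slashdot.org"))
print(cobwebify("http://slashdot.org/articles?id=1"))
```

The path and query string are left unchanged, so only the host part of a link needs rewriting.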
Corona: Monitoring Online Data • continuously monitoring and detecting changes is crucial • e.g., web pages, sensors, databases • content servers only provide query-based interface • naïve approach through repeated, independent polling • bad update performance • high server load
Corona: Monitoring Online Data • publish-subscribe interface for monitoring web urls • cooperative polling • resource allocation decides how many nodes poll each channel [Ramasubramanian, Peterson, and Sirer NSDI 06]
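To see why the number of polling nodes matters, consider an idealized model in which n nodes poll the same channel with the same interval and evenly staggered start times, so that some node polls every interval/n seconds. This staggering is an assumption made for the sketch, as are the function names; under it, an update waits on average interval/(2n) before being detected, while the server sees n polls per interval.

```python
def expected_detection_time(poll_interval: float, n_pollers: int) -> float:
    """Mean wait until the next poll, for evenly staggered pollers."""
    return poll_interval / (2 * n_pollers)


def detection_delay(update_time: float, poll_interval: float, n_pollers: int) -> float:
    """Wait from `update_time` until the next poll by any node.

    Polls happen at multiples of poll_interval / n_pollers.
    """
    gap = poll_interval / n_pollers
    return (-update_time) % gap


print(expected_detection_time(60, 1))   # one node polling every 60 s
print(expected_detection_time(60, 4))   # four cooperating nodes
print(detection_delay(10, 60, 4))       # update at t=10 s, next poll at t=15 s
```

Resource allocation then amounts to choosing n per channel: more pollers shorten detection time but raise the load on the content server.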
Corona: Performance Goals • Corona Lite: • Min. update detection time s.t. network load is bounded • Corona Fast: • Min. network load s.t. update detection time meets a target • Corona Fair: • Min. relative update detection time s.t. network load is bounded • ratio of update detection time to update interval
Outline • Introduction • Honeycomb Framework • Applications • Evaluation • Conclusions
CoDoNS: Lookup Latency • MIT-DNS trace: 265,111 queries, 30,000 names, 65 nodes • CoDoNS gives 1.5 to 2 times lower latency
CoBWeb: Lookup Performance • NLANR workload: 1024 nodes, 10,000 objects, 100,000 queries
CoBWeb vs. Opportunistic Caching: Lookup Latency
CoBWeb vs. Opportunistic Caching: Storage Overhead
CoBWeb: Flash Crowd, Lookup Latency
CoBWeb: Flash Crowd, Network Bandwidth
Corona: Update Performance • Corona improves update detection time from 15 min to 45 sec • Corona keeps load lower than legacy RSS
Corona: Update Performance • heuristics vs. Corona
Conclusions • enables high performance, robust, and scalable network services • principled approach for achieving performance goals in distributed systems • mathematical optimization and structured overlays • CoDoNS, CobWeb, and Corona
Other Research in Wireless Networks • SHARP: hybrid adaptive routing protocol for mobile ad hoc networks [Mobihoc 03] • combines proactive and reactive approaches to routing to achieve high performance efficiently • SRL: bidirectional abstraction to support routing protocols on asymmetric mobile ad hoc networks [INFOCOM 02] • Anonymous Gossip: improving multicast reliability on mobile ad hoc networks [ICDCS 01]