Cost Aware Resource Management for Decentralized Network Services

Venugopalan Ramasubramanian (Rama), Microsoft Research Silicon Valley / Cornell University

Introduction • decentralized services have become increasingly important • e.g. name systems, CDNs, publish-subscribe • they require low latency, constant availability, and high scalability • current services often fall short of the required performance • they rely on ad hoc techniques

Problems with Ad hoc Techniques • no performance guarantees • unable to quantify or bound performance • unable to tune resource utilization to meet performance targets • tailored to specific workloads • e.g. opportunistic caching relies on the “90/10” rule • breaks down for heavy-tailed popularity distributions • breaks down for mutable objects

Principled Approach • fundamental cost-performance tradeoff • e.g. lookup latency vs. memory / bandwidth consumption • resource allocation problem • which node hosts which object? • depends on popularity, size, update rate, etc.

Prior Work • Scalability • high complexity even to express the problem • number of objects x number of nodes (M x N) • Decentralization • objects are distributed among multiple nodes • expensive to perform resource allocation centrally

Cost-Aware Resource Management Framework • high-performance, robust, and scalable services • Mathematical Optimization • system-wide performance goals become constraints of optimization problems: • min. cost s.t. performance meets target • max. performance s.t. cost ≤ limit • Structured Overlays • decentralization and self-organization • well-defined topology with bounded diameter and node degree

Decentralized Internet Services • name service for the Internet • Cooperative Domain Name System (CoDoNS) • content distribution network • Cooperative Beehive Web (CoBWeb) • on-line data monitoring • Cornell On-line News Aggregator (CorONA)

Scalable Resource Allocation • structured overlay • each object has a home node • DAG rooted at the home node reaching all nodes • uniform branching factor • allocate resources at well-defined levels • level ℓ means all nodes ℓ hops away from the home node • low-complexity resource allocation • number of objects x diameter (e.g. M x log N) • practical and scalable

Structured Overlays: Pastry • prefix-matching routing • lookups reach the home node in $\log_b N$ hops [figure: object 0121 = hash(“cs.cornell.edu”) routed from node 2012 through 0021 and 0112 to its home node 0122]
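A minimal sketch of prefix-matching routing on the figure's identifiers. It assumes a static list of node IDs and greedy forwarding to a node that matches one more digit of the key per hop; the helper names `shared_prefix_len` and `route` are hypothetical. Real Pastry uses per-node routing tables and leaf sets, and picks the numerically closest node as the home node.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two identifiers have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(key: str, start: str, nodes: list[str]) -> list[str]:
    """Greedy prefix routing (sketch): each hop extends the matched prefix of the key."""
    path = [start]
    current = start
    while True:
        matched = shared_prefix_len(current, key)
        # Nodes that match more digits of the key than the current node does
        better = [n for n in nodes if shared_prefix_len(n, key) > matched]
        if not better:
            return path  # no closer node: current node acts as the home node
        # Take the smallest extension, mimicking one routing-table row per hop
        current = min(better, key=lambda n: shared_prefix_len(n, key))
        path.append(current)


print(route("0121", "2012", ["2012", "0021", "0112", "0122"]))
```

Each hop fixes at least one more digit, so a lookup needs at most as many hops as there are digits, i.e. about $\log_b N$.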

Opportunistic Caching in Pastry [figure: lookup for object 0121 = hash(“cs.cornell.edu”) via nodes 2012, 0021, and 0112 to home node 0122]

Structured Resource Allocation • analytically model the performance-overhead tradeoff • object replicated at all nodes with ℓ matching prefix digits • lookup latency: ℓ hops • replicas: $N/b^{\ell}$ • inexpensive to locate and update replicas
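The tradeoff on this slide can be tabulated directly. This is a sketch assuming N nodes whose base-b identifiers are uniformly distributed, so about $N/b^{\ell}$ nodes share any given ℓ-digit prefix; `replication_levels` is a hypothetical helper.

```python
def replication_levels(N: int, b: int) -> list[tuple[int, int, float]]:
    """For each level l (l matching prefix digits): (l, worst-case hops, approx. replicas)."""
    D = 0
    while b ** D < N:  # diameter: digits needed to distinguish N nodes
        D += 1
    # Replicating at level l costs ~N / b**l copies and bounds lookups at l hops
    return [(l, l, N / b ** l) for l in range(D + 1)]


for level, hops, replicas in replication_levels(65536, 16):
    print(f"level {level}: at most {hops} hops, ~{replicas:g} replicas")
```

With N = 65536 and b = 16, each step down in level cuts lookup latency by one hop and multiplies the replica count by 16.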

Outline • Introduction • Honeycomb Framework • Optimization Analysis • Implementation • Applications • Evaluation • Conclusions

Analytical Modeling • level of allocation (ℓ) • object hosted at all nodes ℓ hops from the home node • optimization problem: find optimal values of $\ell_i$ • min. $\sum_i C_i(\ell_i)$ s.t. $\sum_i P_i(\ell_i) \le T$ • max. $\sum_i P_i(\ell_i)$ s.t. $\sum_i C_i(\ell_i) \le T$ • performance variables • lookup latency, update latency • cost variables • memory consumption, network overhead, number of nodes

Optimization Problem: Lookup Latency • min. $\sum_i c_i \, b^{\ell_i}$ (total overhead) s.t. $\sum_i q_i (D - \ell_i) \le T_L$ (avg. lookup latency) • $T_L$: target lookup latency in hops • $q_i$: relative query frequency of object i • $c_i$: replication cost of object i • M objects, N nodes, branching factor b, diameter D

Resource Allocation for Lookup Performance • target avg. lookup latency of $T_L$ hops • sub-one-hop, fractional values (e.g., 0.5 hops) • indirectly specifies the cache hit ratio • worst-case lookup latency • imposes a lower bound on ℓ • optimizes multiple overhead metrics • number of nodes: c = 1 • memory: c = object size • bandwidth: c = object size x update rate
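To make the lookup-latency formulation concrete, the sketch below evaluates the objective $\sum_i c_i b^{\ell_i}$ and the constraint $\sum_i q_i (D - \ell_i)$ for a given level assignment, with $c_i$ chosen per overhead metric as listed above. The `Obj` type, the metric names, and the units are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    q: float            # relative query frequency (assumed to sum to 1)
    size: float         # object size (unit is an assumption, e.g. KB)
    update_rate: float  # updates per unit time (unit is an assumption)


def cost_coefficient(o: Obj, metric: str) -> float:
    """Replication cost c_i for the overhead metric being minimized."""
    if metric == "nodes":
        return 1.0
    if metric == "memory":
        return o.size
    if metric == "bandwidth":
        return o.size * o.update_rate
    raise ValueError(f"unknown metric: {metric}")


def evaluate(objs: list[Obj], levels: list[int], b: int, D: int,
             metric: str) -> tuple[float, float]:
    """Return (total overhead, avg. lookup latency in hops).

    Level l counts hops from the home node: about b**l nodes host the
    object and a lookup takes D - l hops.
    """
    overhead = sum(cost_coefficient(o, metric) * b ** l for o, l in zip(objs, levels))
    latency = sum(o.q * (D - l) for o, l in zip(objs, levels))
    return overhead, latency


objs = [Obj(0.7, 4.0, 0.1), Obj(0.2, 10.0, 1.0), Obj(0.1, 1.0, 0.0)]
print(evaluate(objs, [2, 1, 0], b=16, D=4, metric="nodes"))
```

An assignment is feasible when the returned latency is at most $T_L$; the optimizer searches for the cheapest feasible one.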

Analytical Optimization (Beehive) • Zipf popularity distribution (e.g. DNS, Web, RSS) • analytically tractable (one parameter α) • closed-form solution: $x^*_\ell = \left[ \frac{b'^{\ell}(D - C)}{1 + b' + \dots + b'^{D-1}} \right]^{\frac{1}{1-\alpha}}$, where $b' = b^{(1-\alpha)/\alpha}$ (α: Zipf parameter, C: target avg. lookup latency) • inexpensive to compute and apply [Ramasubramanian and Sirer NSDI 04]
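A sketch of evaluating the closed form. It assumes the Beehive convention as we read it: $x^*_\ell$ is the fraction of the most popular objects replicated at level ℓ or lower, levels are prefix-digit levels, and 0 < α < 1. When the formula would yield a fraction above 1, the sketch shrinks the number of levels it spreads over (k' in place of D) and sets the remaining fractions to 1. The function name is hypothetical.

```python
def beehive_closed_form(b: int, alpha: float, D: int, C: float) -> list[float]:
    """Fractions x[l], l = 0..D-1, for Zipf(alpha) popularity and target latency C."""
    if not 0 < alpha < 1:
        raise ValueError("this closed form assumes 0 < alpha < 1")
    if not 0 < C < D:
        raise ValueError("target latency must satisfy 0 < C < D")
    d = b ** ((1 - alpha) / alpha)  # b' on the slide
    # Try k' = D first; shrink k' while the largest fraction would exceed 1.
    for k in range(D, 0, -1):
        if k <= C:
            break
        denom = sum(d ** i for i in range(k))  # 1 + b' + ... + b'^(k-1)
        xs = [(d ** i * (k - C) / denom) ** (1 / (1 - alpha)) for i in range(k)]
        if xs[-1] <= 1.0:  # fractions grow with l, so the last one is the largest
            return xs + [1.0] * (D - k)
    raise ValueError("no feasible k' found")


print(beehive_closed_form(b=16, alpha=0.9, D=3, C=1.0))
```

A useful check: under this model the average latency is $D - \sum_\ell x_\ell^{1-\alpha}$, and the returned fractions make it equal C.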

Numerical Optimization • general-purpose approach • any popularity distribution (including Zipf) • many cost metrics (fine-grained bandwidth consumption) • many performance metrics (update latency) • optimization problem is NP-hard • multiple-choice knapsack problem • discrete, convex, and separable • fast and accurate approximation algorithm • O(M D log(M D)) running time • error of at most one object per node (one more or one fewer than the optimum)

Numerical Optimization 2 • Lagrange multiplier: min. $\sum_m C_m(\ell_m) + \lambda \left[ \sum_m P_m(\ell_m) - T \right]$ • bisection-based bracketing algorithm • upper- and lower-bound solutions that differ in one channel yield a near-optimal solution • pre-computing and sorting the λ values before iterating yields an O(MD log(MD)) algorithm
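The bracketing idea can be sketched with plain bisection on λ. This is a simplified stand-in, not the O(MD log(MD)) algorithm with pre-sorted λ values: for a fixed λ every object independently picks the level minimizing $C + \lambda P$, and bisection finds the smallest λ whose relaxed solution meets $\sum P \le T$. The cost and latency tables use the lookup-latency model ($c_i b^{\ell}$, $q_i(D-\ell)$); the helper names are hypothetical.

```python
def choose_levels(lam, costs, perfs):
    """For a fixed multiplier, each object picks the level minimizing C + lam * P."""
    return [min(range(len(c)), key=lambda l: c[l] + lam * p[l])
            for c, p in zip(costs, perfs)]


def total(table, levels):
    return sum(row[l] for row, l in zip(table, levels))


def lagrangian_allocate(costs, perfs, T, hi=1e9, iters=100):
    """Bisection on lam; returns (feasible upper bracket, cheaper lower bracket)."""
    if total(perfs, choose_levels(hi, costs, perfs)) > T:
        raise ValueError("target T is infeasible")
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total(perfs, choose_levels(mid, costs, perfs)) <= T:
            hi = mid  # feasible: try a smaller penalty (cheaper solution)
        else:
            lo = mid  # infeasible: latency must be penalized more
    return choose_levels(hi, costs, perfs), choose_levels(lo, costs, perfs)


b, D = 4, 2
q = [0.6, 0.3, 0.1]
costs = [[b ** l for l in range(D + 1)] for _ in q]       # c_i = 1 (node count)
perfs = [[qi * (D - l) for l in range(D + 1)] for qi in q]
upper, lower = lagrangian_allocate(costs, perfs, T=1.0)
print(upper, lower)
```

Here the two brackets differ in one object, as the slide describes. With only three objects that one object is a large share of the total, so the feasible bracket can cost noticeably more than the true optimum (levels [1, 1, 1] also meet T = 1.0 at lower cost).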

Honeycomb • cost-aware resource allocation framework for structured overlays • properties: • system-wide performance goals • scalability and failure resilience • quick adaptation to workload • fast update propagation

Scalable Resource Management • independent decisions • local aggregation • popularity estimation • communication only with overlay neighbors • replicas managed by one-hop neighbors

Decentralized Optimization • global optimum requires global information • using local knowledge alone leads to sub-optimal solutions • solution: • approximate tradeoffs for non-local channels • aggregate coarse-grained information between neighbors

Decentralized Optimization 2 • approximate parameters • cluster channels with similar values of P(ℓ) / C(ℓ) • constant number of clusters per level

Decentralized Optimization 3 • Aggregating Clusters • Exchange clusters with one-hop neighbors • Hierarchical aggregation through structured overlay
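One way to picture clustering and hierarchical aggregation is sketched below. The bucketing scheme is an assumption, not taken from the slides: each channel is placed by level and by the log2 of its tradeoff ratio P(ℓ)/C(ℓ) into one of a constant number of buckets, a cluster keeps only a count and the sums of P and C, and neighbors merge clusters bucket by bucket. `NUM_BUCKETS`, `summarize`, and `merge` are hypothetical names.

```python
import math
from collections import defaultdict

NUM_BUCKETS = 16  # hypothetical: constant number of clusters per level


def bucket(ratio: float) -> int:
    """Map a tradeoff ratio P/C to a log2-scale bucket in [0, NUM_BUCKETS)."""
    idx = math.floor(math.log2(ratio)) + NUM_BUCKETS // 2
    return min(max(idx, 0), NUM_BUCKETS - 1)


def summarize(channels):
    """channels: iterable of (level, P, C) with P, C > 0.

    Returns {(level, bucket): [count, sum_P, sum_C]}.
    """
    summary = defaultdict(lambda: [0, 0.0, 0.0])
    for level, p, c in channels:
        cluster = summary[(level, bucket(p / c))]
        cluster[0] += 1
        cluster[1] += p
        cluster[2] += c
    return dict(summary)


def merge(mine, theirs):
    """Combine a neighbor's clusters with ours; size stays bounded by levels * NUM_BUCKETS."""
    out = {key: list(vals) for key, vals in mine.items()}
    for key, (count, p, c) in theirs.items():
        cluster = out.setdefault(key, [0, 0.0, 0.0])
        cluster[0] += count
        cluster[1] += p
        cluster[2] += c
    return out


local = summarize([(1, 2.0, 1.0), (1, 3.0, 1.0)])
neighbor = summarize([(1, 2.5, 1.0), (2, 0.5, 1.0)])
print(merge(local, neighbor))
```

Because the number of clusters per level is fixed, the message exchanged with each one-hop neighbor stays the same size no matter how many channels it summarizes.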

Adaptation to Workload Changes • popularity of objects may change drastically • flash-crowds, denial of service attacks • nodes measure popularity for local objects and aggregate popularity estimates with neighbors

Adaptation to Workload Changes 2 • orders of magnitude difference in query rates of popular and unpopular objects • solution: combine inter-arrival times and query counts • estimation times proportional to the query rate of the object • monitoring overhead proportional to the query rate of the object • quick detection of large increases in query rate
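A hypothetical estimator that combines a query count with inter-arrival timing. It remembers the timestamps of the last `window` queries and divides that count by the time from the oldest one to now. A popular object refills the window quickly, so a sudden jump in query rate shows up after only `window` new queries. Including the time since the last query lets an object that stops being queried decay. The class and its parameters are illustrative, not Honeycomb's actual estimator.

```python
from collections import deque


class QueryRateEstimator:
    """Sketch: query rate from the last `window` arrivals plus elapsed time."""

    def __init__(self, window: int = 8):
        self.arrivals = deque(maxlen=window)  # timestamps of recent queries

    def record(self, t: float) -> None:
        self.arrivals.append(t)

    def rate(self, now: float) -> float:
        """Queries per unit time from the oldest remembered arrival up to now."""
        if not self.arrivals or now <= self.arrivals[0]:
            return 0.0
        return len(self.arrivals) / (now - self.arrivals[0])


est = QueryRateEstimator(window=8)
for i in range(8):
    est.record(float(i))           # one query per time unit
print(est.rate(8.0))
for i in range(8):
    est.record(8.0 + 0.01 * i)     # flash crowd: 100 queries per time unit
print(est.rate(8.08))
```

Monitoring cost here is one timestamp per query, so it grows with the object's query rate.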

Honeycomb: Fast Update Propagation • a single integer (the replication level) identifies the locations of all replicas of an object • no TTL required • updates propagated proactively • using neighbors in the underlying overlay • increasing version numbers distinguish versions • lazy updates in background
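A sketch of proactive, versioned propagation. It assumes the home node pushes an update down the DAG for as many hops as the object's replication level, and that each node sits at a fixed hop distance from the home node. A node installs an update only if its version number is higher, so duplicate or stale deliveries are ignored. The `Node` class is hypothetical, and lazy background updates are not modeled.

```python
from typing import Optional


class Node:
    def __init__(self, name: str):
        self.name = name
        self.children: list["Node"] = []  # downstream neighbors in the DAG
        self.version = 0
        self.value: Optional[str] = None

    def apply(self, version: int, value: str, hops_left: int) -> int:
        """Install the update if newer, then push it one hop further.

        Returns the number of nodes updated in this subtree.
        """
        if version <= self.version:
            return 0  # stale or duplicate update: ignore
        self.version, self.value = version, value
        updated = 1
        if hops_left > 0:
            for child in self.children:
                updated += child.apply(version, value, hops_left - 1)
        return updated


home, a, b, c = Node("home"), Node("a"), Node("b"), Node("c")
home.children = [a, b]
a.children = [c]
print(home.apply(1, "x", hops_left=1))  # level 1: home, a, b
print(home.apply(2, "y", hops_left=2))  # level 2: also reaches c
```

Since the replication level alone determines which nodes hold copies, the home node needs no per-replica list and no TTL to reach every replica.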

Outline • Introduction • Honeycomb Framework • Applications • Name service (CoDoNS) • Content distribution network (CoBWeb) • On-line data monitoring system (CorONA) • Evaluation • Conclusions

CoDoNS: Cooperative Domain Name System • legacy DNS has fundamental problems • poor failure resilience due to limited replication • high response times due to multi-hop lookups • no support for spontaneous updates • cooperative cache for DNS bindings [Ramasubramanian and Sirer SIGCOMM 04]

CoDoNS: Cooperative Domain Name System • structured, proactive caching of name-data mappings • targets an avg. lookup latency of 0.5 hops • minimizes memory consumption • updates pushed proactively to all caching nodes • self-certifying data to preserve integrity (DNSSEC) • incremental deployment path • safety net for legacy DNS • deployed on PlanetLab

CobWeb: Cooperative Beehive Web • Web caches • passive, client-driven • Content Distribution Networks • active, replication-driven • e.g. Akamai, Digital Island (commercial); CoDeeN, CoralCDN (academic) • Web caching solutions are based on heuristics • ideal cache hit rate: 60-70% [Wolman et al. 01] • achieved cache hit rate: 20-40% [Breslau et al. 99, Wolman et al. 01]

CobWeb: Cooperative Beehive Web • CobWeb is a cooperative Web cache • high cache hit rate through structured, proactive caching • low network overhead by accounting for object size and update rate • adaptation to flash crowds • CobWeb performance goals • min. network bandwidth s.t. cache hit rate meets a target • max. cache hit rate s.t. network bandwidth stays within the available bandwidth

CobWeb: Cooperative Beehive Web • user interfaces • append cob-web.org to URLs • e.g., http://slashdot.org.cob-web.org:8888 • DNS redirection, URL rewriting • Meridian finds the closest node to the client • deployed on PlanetLab • more than 10 million requests per day
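The "append cob-web.org" interface is a simple hostname rewrite. The sketch below reproduces the slide's example with Python's standard `urllib.parse`. The helper name `cobweb_url` is hypothetical. The suffix and port 8888 are taken from the example URL. Any port in the original URL is dropped, which is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def cobweb_url(url: str, suffix: str = "cob-web.org", port: int = 8888) -> str:
    """Rewrite http://host/path into http://host.<suffix>:<port>/path."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no hostname: {url}")
    netloc = f"{parts.hostname}.{suffix}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(cobweb_url("http://slashdot.org/"))
```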

Corona: Monitoring Online Data • continuously monitoring and detecting changes is crucial • e.g., Web pages, sensors, databases • content servers only provide a query-based interface • naïve approach: repeated, independent polling • poor update performance • high server load

Corona: Monitoring Online Data • publish-subscribe interface for monitoring web urls • cooperative polling • resource allocation decides how many nodes poll each channel [Ramasubramanian, Peterson, and Sirer NSDI 06]

Corona: Performance Goals • Corona Lite: • Min. update detection time s.t. network load is bounded • Corona Fast: • Min. network load s.t. update detection time meets a target • Corona Fair: • Min. relative update detection time s.t. network load is bounded • ratio of update detection time to update interval
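A Corona-Lite-style sketch under a simple assumed model: n nodes polling a channel with interval τ, staggered evenly, detect an update after τ/(2n) on average and put n/τ polls per unit time on the server. With a fixed total number of pollers (bounded network load), the sketch minimizes the subscriber-weighted detection time $\sum_m s_m \tau / (2 n_m)$ greedily. Greedy is optimal here because each extra poller helps a channel less than the previous one. The subscriber weighting and the helper names are assumptions.

```python
import heapq


def detection_time(tau: float, n: int) -> float:
    """Expected update detection time with n evenly staggered pollers (assumed model)."""
    return tau / (2 * n)


def allocate_pollers(subscribers: list[float], budget: int) -> list[int]:
    """Greedy sketch: min sum_m s_m * tau / (2 n_m) s.t. sum_m n_m <= budget, n_m >= 1."""
    m = len(subscribers)
    if budget < m:
        raise ValueError("need at least one poller per channel")
    pollers = [1] * m
    # Max-heap (negated) of marginal gains s * (1/n - 1/(n+1)); tau/2 is a common factor
    heap = [(-s * 0.5, i) for i, s in enumerate(subscribers)]
    heapq.heapify(heap)
    for _ in range(budget - m):
        _, i = heapq.heappop(heap)
        pollers[i] += 1
        n = pollers[i]
        heapq.heappush(heap, (-subscribers[i] * (1 / n - 1 / (n + 1)), i))
    return pollers


alloc = allocate_pollers([100, 1], budget=4)
print(alloc, [detection_time(60.0, n) for n in alloc])
```

Under the same model, Corona Fair's relative detection time fits this loop if each weight $s_m$ is divided by the channel's update interval.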

Outline • Introduction • Honeycomb Framework • Applications • Evaluation • Conclusions

CoDoNS: Lookup Latency • MIT-DNS trace: 265,111 queries, 30,000 names, 65 nodes • CoDoNS gives 1.5 to 2 times lower latency

CoBWeb: Lookup Performance • NLANR workload: 1024 nodes, 10,000 objects, 100,000 queries

CoBWeb vs. Opportunistic Caching Lookup Latency

CoBWeb vs. Opportunistic Caching Storage Overhead

CoBWeb: Flash Crowd Lookup Latency

CoBWeb: Flash Crowd Network Bandwidth

Corona: Update Performance • Corona improves update detection time from 15 min to 45 sec • Corona keeps load lower than legacy RSS

Corona: Update Performance Heuristics vs. Corona

Conclusions • cost-aware resource management enables high-performance, robust, and scalable network services • principled approach for achieving performance goals in distributed systems • mathematical optimization and structured overlays • CoDoNS, CobWeb, and Corona

Other Research in Wireless Networks • SHARP: hybrid adaptive routing protocol for mobile ad hoc networks [MobiHoc 03] • combines proactive and reactive approaches to routing to achieve high performance efficiently • SRL: bidirectional abstraction to support routing protocols on asymmetric mobile ad hoc networks [INFOCOM 02] • Anonymous Gossip: improving multicast reliability on mobile ad hoc networks [ICDCS 01]
