Towards a Scalable, Adaptive and Network-aware Content Distribution Network
Yan Chen
EECS Department, UC Berkeley
Outline
• Motivation and Challenges
• Our Contributions: SCAN system
• Case Study: Tomography-based overlay network monitoring system
• Conclusions
Motivation
• The Internet has evolved into a commercial infrastructure for service delivery
  • Web delivery, VoIP, streaming media …
• Challenges for Internet-scale services
  • Scalability: 600M users, 35M Web sites, 2.1 Tb/s
  • Efficiency: bandwidth, storage, management
  • Agility: dynamic clients/network/servers
  • Security, etc.
• Focus on content delivery: Content Distribution Networks (CDNs)
  • About 4 billion Web pages in total, growing by 7M pages daily
  • Annual traffic growth of 200% projected for the next 4 years
Challenges for CDN
• Replica Location
  • Find nearby replicas with good DoS attack resilience
• Replica Deployment
  • Dynamics, efficiency
  • Client QoS and server capacity constraints
• Replica Management
  • Scalability of replica index state maintenance
• Adaptation to Network Congestion/Failures
  • Scalability and accuracy of overlay monitoring
SCAN: Scalable Content Access Network
[Architecture figure]
• Provision: Dynamic Replication + Update Multicast Tree Building
• Replica Management: (Incremental) Content Clustering
• Network DoS Resilient Replica Location: Tapestry
• Network End-to-End Distance Monitoring - Internet Iso-bar: latency; TOM: loss rate
Replica Location
• Existing Work and Problems
  • Centralized, replicated and distributed directory services
  • No security benchmarking: which one has the best DoS attack resilience?
• Solution
  • Proposed the first simulation-based network DoS resilience benchmark
  • Applied it to compare the three directory services
  • DHT-based distributed directory services have the best resilience in practice
• Publication
  • 3rd Int. Conf. on Information and Communications Security (ICICS), 2001
Replica Placement/Maintenance
• Existing Work and Problems
  • Static placement
  • Dynamic but inefficient placement
  • No coherence support
• Solution
  • Dynamically place a close-to-optimal number of replicas under client QoS (latency) and server capacity constraints
  • Self-organize replicas into a scalable application-level multicast tree for disseminating updates
  • Uses overlay network topology only
• Publication
  • IPTPS 2002, Pervasive Computing 2002
Replica Management
• Existing Work and Problems
  • Cooperative access for good efficiency requires maintaining replica indices
  • Per-Website replication: scalable, but poor performance
  • Per-URL replication: good performance, but unscalable
• Solution (see the clustering sketch below)
  • Clustering-based replication reduces the overhead significantly without sacrificing much performance
  • Proposed a unique online Web object popularity prediction scheme based on hyperlink structures
  • Online incremental clustering and replication to push replicas before they are accessed
• Publication
  • ICNP 2002, IEEE J-SAC 2003
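A hedged sketch of clustering-based replication: group URLs by the similarity of their client-access patterns and make one placement decision per cluster, so replica state is kept per cluster instead of per URL. The slides only say "clustering-based"; the correlation-distance metric and the scipy hierarchical clustering used here are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_urls(access_matrix, num_clusters):
    """Group URLs whose accesses come from similar client populations.

    access_matrix[u, g] = # of accesses to URL u from client group g.
    Correlation distance is an illustrative choice: URLs fetched by the
    same client groups land in the same cluster and are replicated together.
    """
    Z = linkage(access_matrix, method="average", metric="correlation")
    return fcluster(Z, t=num_clusters, criterion="maxclust")
```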
Adaptation to Network Congestion/Failures
• Existing Work and Problems
  • Latency estimation
    • Clustering-based: built on network proximity, inaccurate
    • Coordinate-based: assumes symmetric distances, unscalable to update
  • General metrics: O(n²) measurements for n end hosts
• Solution
  • Latency: Internet Iso-bar - clustering based on latency similarity to a small number of landmarks
  • Loss rate: Tomography-based Overlay Monitoring (TOM) - selectively monitor a basis set of O(n log n) paths to infer the loss rates of all other paths
• Publication
  • Internet Iso-bar: SIGMETRICS PER 2002
  • TOM: SIGCOMM IMC 2003
SCAN Architecture
• Leverage a Distributed Hash Table (Tapestry) for
  • Distributed, scalable location with guaranteed success
  • Search with locality
[Architecture figure: data plane (data source, replica caches with always-update vs. adaptive coherence, clients, Tapestry mesh; Dynamic Replication/Update and Replica Management; Replica Location) over a network plane (Web server, SCAN servers, Overlay Network Monitoring)]
Methodology
[Figure: algorithm design, realistic simulation, analytical evaluation and PlanetLab tests, iterated]
Simulation inputs:
• Network topology
• Web workload
• Network end-to-end latency measurement
TOM Outline
• Goal and Problem Formulation
• Algebraic Modeling and Basic Algorithms
• Scalability Analysis
• Practical Issues
• Evaluation
• Application: Adaptive Overlay Streaming Media
• Conclusions
Goal: a scalable, adaptive and accurate overlay monitoring system to detect end-to-end congestion/failures.
Existing Work
• General metrics: RON (O(n²) measurements)
• Latency estimation
  • Clustering-based: IDMaps, Internet Iso-bar, etc.
  • Coordinate-based: GNP, ICS, Virtual Landmarks
• Network tomography
  • Focuses on inferring the characteristics of physical links rather than end-to-end paths
  • Limited measurements -> under-constrained system, unidentifiable links
Problem Formulation
Given an overlay of n end hosts and O(n²) paths, how can we select a minimal subset of paths to monitor so that the loss rates/latency of all other paths can be inferred?
Assumptions:
• Topology is measurable
• Can only measure the E2E path, not the individual links
Our Approach
Select a basis set of k paths that fully describe the O(n²) paths (k ≪ O(n²)):
• Monitor the loss rates of the k paths, and infer the loss rates of all other paths
• Applicable to any additive metric, such as latency
[Figure: end hosts report topology measurements to an Overlay Network Operation Center]
Algebraic Model
[Figure: overlay hosts A, B, C, D over physical links 1, 2, 3; overlay path p1 traverses links 1 and 2]
Path loss rate p, link loss rate l: loss is multiplicative along a path, e.g. 1 − p1 = (1 − l1)(1 − l2), and a log transform makes it additive (see below).
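Written out in the deck's own notation (b for measured path quantities, x for link quantities, G for the routing matrix), the log transform that turns multiplicative loss into a linear system:

```latex
\begin{align*}
  1 - p_j &= \prod_{i \,\in\, \text{path } j} (1 - l_i)
    && \text{(multiplicative loss along a path)} \\
  b_j &= \log\frac{1}{1 - p_j}, \qquad x_i = \log\frac{1}{1 - l_i}
    && \text{(log transform)} \\
  b &= G\,x, \qquad G_{ji} =
    \begin{cases} 1 & \text{if path } j \text{ traverses link } i \\
                  0 & \text{otherwise} \end{cases}
    && \text{(linear system)}
\end{align*}
```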
Putting All Paths Together
[Figure: the same topology with all overlay paths shown]
In total r = O(n²) paths and s links, with s ≪ r; stacking the per-path equations gives b = Gx, where G is the r × s routing matrix.
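A minimal numpy sketch of b = Gx on a toy instance; the three-link, two-path topology is illustrative, not the slide's exact figure.

```python
import numpy as np

# Toy routing matrix G: r paths x s links; G[j, i] = 1 iff path j uses link i.
# Path 0 traverses links 0 and 1; path 1 traverses links 0 and 2.
G = np.array([[1, 1, 0],
              [1, 0, 1]], dtype=float)

link_loss = np.array([0.01, 0.05, 0.00])   # per-link loss rates l_i
x = np.log(1.0 / (1.0 - link_loss))        # x_i = log(1 / (1 - l_i))

b = G @ x                                  # measured path vector b = Gx
path_loss = 1.0 - np.exp(-b)               # recover p_j = 1 - e^{-b_j}
print(path_loss)                           # [0.0595 0.01]
```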
Sample Path Matrix
[Figure: vector-space picture over links x1, x2, x3; the path/row space (measured), e.g. (1,1,0), vs. the null space (unmeasured), e.g. (1,−1,0)]
• x1 − x2 is unknown => cannot compute x1 and x2 individually
• The set of such unmeasurable vectors forms the null space of G
• To separate identifiable vs. unidentifiable components: x = xG + xN
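A sketch of that split: projecting x onto the row space of G yields the identifiable part xG, and the remainder xN lies in the null space, invisible to any measurement.

```python
import numpy as np

G = np.array([[1, 1, 0]], dtype=float)   # one measured path over links 1 and 2

x = np.array([0.3, 0.1, 0.2])            # true (unknowable) link vector
xG = np.linalg.pinv(G) @ (G @ x)         # projection onto the row space of G
xN = x - xG                              # null-space component

print(xG)        # [0.2 0.2 0. ] -- only the sum x1 + x2 is identifiable
print(G @ xN)    # [0.] -- the null-space part contributes nothing to b
```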
Intuition through Topology Virtualization
[Figure: the vector-space picture again, plus an example topology where consecutive physical links are merged into virtual links]
Virtual links:
• Minimal path segments whose loss rates are uniquely identifiable
• Can fully describe all paths
• xG is composed of virtual links
All E2E paths lie in the path space, i.e., GxN = 0 for any null-space component xN.
More Examples of Virtualization
[Figure: real links (solid) and all of the overlay paths (dotted) traversing them, with the corresponding virtual links; rank(G) = 2 in the first example and rank(G) = 3 in the second]
Basic Algorithms
• Select k = rank(G) linearly independent paths to monitor
  • Use QR decomposition
  • Leverage sparse matrices: O(rk²) time and O(k²) memory
  • E.g., 79 sec for n = 300 (r = 44,850) and k = 2,541
• Compute the loss rates of all other paths: solve the reduced k-path system for xG, then read off b = GxG
  • O(k²) time and O(k²) memory
  • E.g., 1.89 sec for the example above
(A sketch of both steps follows.)
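A dense-matrix sketch of both steps. The paper's implementation exploits sparsity for the stated bounds; here QR with column pivoting on Gᵀ stands in as one way to pick the k = rank(G) independent rows.

```python
import numpy as np
from scipy.linalg import lstsq, qr

def select_basis_paths(G):
    """Pick k = rank(G) linearly independent rows of G (the paths to monitor)."""
    _, R, piv = qr(G.T, pivoting=True)            # column pivoting on G^T
    tol = np.finfo(float).eps * max(G.shape) * abs(R[0, 0])
    k = int(np.sum(np.abs(np.diag(R)) > tol))     # numerical rank
    return piv[:k]

def infer_all_paths(G, basis_idx, b_measured):
    """From measurements on the k basis paths, infer b (hence loss) everywhere."""
    xG, *_ = lstsq(G[basis_idx], b_measured)      # minimal-norm solution (row space)
    return G @ xG

# Demo: 3 paths over 2 links (A->B, B->C, A->B->C), so rank(G) = 2.
G = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
x_true = np.log(1.0 / (1.0 - np.array([0.02, 0.05])))
idx = select_basis_paths(G)                       # 2 of the 3 paths suffice
b_all = infer_all_paths(G, idx, G[idx] @ x_true)
print(1.0 - np.exp(-b_all))                       # loss rates of all 3 paths
```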
Scalability Analysis
Is k ≪ O(n²)? For a power-law Internet topology:
• When the majority of end hosts are on the overlay
  • If the Internet were a pure hierarchy (tree): k = O(n)
  • If the Internet had no hierarchy at all (worst case, a clique): k = O(n²)
  • The Internet has a moderately hierarchical structure [TGJ+02]: k = O(n) (with proof)
• When a small portion of end hosts are on the overlay
  • For reasonably large n (e.g., 100), k = O(n log n) (extensive linear regression tests on both synthetic and real topologies; sketch below)
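A sketch of the kind of regression test the slide mentions: given measured ranks k for overlays of increasing size n (e.g., via `np.linalg.matrix_rank` on each G), fit k against c·n·log n. The fit-through-the-origin form is an assumption about how the tests were run.

```python
import numpy as np

def fit_nlogn(ns, ks):
    """Least-squares fit of k ~ c * n log n through the origin; returns (c, R^2)."""
    feats = np.asarray(ns, dtype=float) * np.log(np.asarray(ns, dtype=float))
    ks = np.asarray(ks, dtype=float)
    c = float(feats @ ks / (feats @ feats))
    resid = ks - c * feats
    r2 = 1.0 - resid @ resid / ((ks - ks.mean()) @ (ks - ks.mean()))
    return c, r2
```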
TOM Outline
• Goal and Problem Formulation
• Algebraic Modeling and Basic Algorithms
• Scalability Analysis
• Practical Issues
• Evaluation
• Application: Adaptive Overlay Streaming Media
• Summary
Practical Issues
• Tolerance of topology measurement errors
  • Router aliases
  • Incomplete routing info
• Measurement load balancing
  • Randomly order the paths for scanning and selection
• Adaptation to topology changes
  • Designed efficient algorithms for incremental updates
  • Add/remove a path: O(k²) time (vs. O(n²k²) to reinitialize); see the sketch below
  • Add/remove end hosts, and routing changes
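The core incremental decision is whether a new path adds rank. The paper achieves O(k²) by updating a triangular factor; this sketch instead solves a small least-squares problem, which is asymptotically worse but makes the decision explicit.

```python
import numpy as np

def adds_rank(basis_rows, g_new, tol=1e-9):
    """True if path vector g_new is NOT spanned by the monitored basis rows."""
    B = np.asarray(basis_rows, dtype=float).T          # s x k
    coeffs, *_ = np.linalg.lstsq(B, g_new, rcond=None)
    residual = g_new - B @ coeffs
    return np.linalg.norm(residual) > tol

# If adds_rank(...) is False, the new path's loss rate is already inferable
# from the existing k measurements; otherwise, add it to the monitored set.
```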
Evaluation Metrics
• Path loss rate estimation accuracy
  • Absolute error |p − p′|
  • Error factor [BDPT02]
• Lossy path inference: coverage and false positive ratio
• Measurement load balancing
  • Coefficient of variation (CV)
  • Maximum vs. mean ratio (MMR)
• Speed of setup, update and adaptation
(A sketch of these metrics follows.)
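These metrics are straightforward to compute. The error-factor floor ε follows the [BDPT02] convention of clipping tiny loss rates before taking ratios; the exact ε used in the experiments is not stated on the slide.

```python
import numpy as np

def abs_error(p, p_hat):
    return np.abs(p - p_hat)

def error_factor(p, p_hat, eps=1e-3):
    """[BDPT02]-style relative error: clip at floor eps, take the worse ratio."""
    p, p_hat = np.maximum(p, eps), np.maximum(p_hat, eps)
    return np.maximum(p / p_hat, p_hat / p)

def load_balance_stats(per_node_load):
    loads = np.asarray(per_node_load, dtype=float)
    cv = loads.std() / loads.mean()     # coefficient of variation
    mmr = loads.max() / loads.mean()    # maximum vs. mean ratio
    return cv, mmr
```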
Evaluation
• Extensive simulations
• Experiments on PlanetLab
  • 51 hosts, each from a different organization
  • 51 × 50 = 2,550 paths
  • On average k = 872
• Results on accuracy
  • Avg real loss rate: 0.023
  • Absolute error: mean 0.0027; 90th percentile < 0.014
  • Error factor: mean 1.1; 90th percentile < 2.0
  • On average 248 out of 2,550 paths have no or incomplete routing information
  • No router aliases resolved
Evaluation (cont'd)
• Results on speed
  • Path selection (setup): 0.75 sec
  • Path loss rate calculation: 0.16 sec for all 2,550 paths
• Results on load balancing
  • Significantly reduces CV and MMR, by up to a factor of 7.3
[Figure: load distribution with vs. without load balancing]
TOM Outline
• Goal and Problem Formulation
• Algebraic Modeling and Basic Algorithms
• Scalability Analysis
• Practical Issues
• Evaluation
• Application: Adaptive Overlay Streaming Media
• Conclusions
Motivation
• Traditional streaming media systems treat the network as a black box
  • Adaptation is performed only at the transmission end points
• Overlay relay can effectively bypass congestion/failures
• Built an adaptive streaming media system that leverages
  • TOM for real-time path info
  • An overlay network for adaptive packet buffering and relay
(A relay-selection sketch follows.)
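One way TOM's inferred loss rates could drive relay selection, as a hedged sketch: treating loss on the two overlay hops as independent (a modeling assumption, not something the slides state), a one-hop relay through r has combined loss 1 − (1 − p_sr)(1 − p_rd).

```python
def best_one_hop_relay(loss, src, dst, hosts):
    """Pick the relay minimizing end-to-end loss; loss[a][b] comes from TOM.

    Returns (relay, loss) with relay=None if the direct path is already best.
    Assumes loss rates on the two hops are independent.
    """
    direct = loss[src][dst]
    relayed, relay = min(
        ((1 - (1 - loss[src][r]) * (1 - loss[r][dst]), r)
         for r in hosts if r not in (src, dst)),
        default=(direct, None),
    )
    return (relay, relayed) if relayed < direct else (None, direct)
```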
Adaptive Overlay Streaming Media
[Demo topology: UC Berkeley, Stanford, UC San Diego and HP Labs, with congestion on the direct path]
• Implemented with a Winamp client and a SHOUTcast server
• Congestion introduced with a Packet Shaper
• Skip-free playback: server buffering and rewinding
• Total adaptation time < 4 seconds
Summary
• A tomography-based overlay network monitoring system
  • Selectively monitors a basis set of O(n log n) paths to infer the loss rates of O(n²) paths
  • Works in real time, adapts to topology changes, balances load well and tolerates topology errors
  • Both simulation and real Internet experiments are promising
• Built an adaptive overlay streaming media system on top of TOM
  • Bypasses congestion/failures for smooth playback within seconds
Tie Back to SCAN
[Architecture figure, as before]
• Provision: Dynamic Replication + Update Multicast Tree Building
• Replica Management: (Incremental) Content Clustering
• Network DoS Resilient Replica Location: Tapestry
• Network End-to-End Distance Monitoring - Internet Iso-bar: latency; TOM: loss rate
Contributions of My Thesis
• Replica location
  • Proposed the first simulation-based network DoS resilience benchmark and quantified three types of directory services
• Replica placement
  • Dynamically place a close-to-optimal number of replicas
  • Self-organize replicas into a scalable application-level multicast tree for disseminating updates
• Replica management
  • Cluster objects to significantly reduce management overhead with little performance sacrifice
  • Online incremental clustering and replication to adapt to changes in users' access patterns
• Scalable overlay network monitoring
Existing CDNs Fail to Address These Challenges
• No coherence for dynamic content
• Unscalable network monitoring: O(M × N), where M = # of client groups and N = # of server farms
• Non-cooperative replication is inefficient
Network Topology and Web Workload
• Network topology
  • Pure-random, Waxman and transit-stub synthetic topologies
  • An AS-level topology from 7 widely-dispersed BGP peers
• Web workload
  • Aggregate MSNBC Web clients by BGP prefix
    • BGP tables from a BBNPlanet router
  • Aggregate NASA Web clients by domain name
  • Map the client groups onto the topology
Network E2E Latency Measurement
• NLANR Active Measurement Project data set
  • 111 sites in America, Asia, Australia and Europe
  • Round-trip time (RTT) between every pair of hosts, every minute
  • 17M measurements daily
  • Raw data: Jun. – Dec. 2001, Nov. 2002
• Keynote measurement data
  • Measures TCP performance from about 100 worldwide agents
  • Heterogeneous core network: various ISPs
  • Heterogeneous access networks: dial-up 56K, DSL and high-bandwidth business connections
  • Targets: 40 most popular Web servers + 27 Internet Data Centers
  • Raw data: Nov. – Dec. 2001, Mar. – May 2002
Absolute and Relative Errors
For each experiment, take the 95th percentile of the absolute and relative errors over the estimates for all 2,550 paths.
Lossy Path Inference Accuracy
• 90 out of 100 runs have coverage over 85% and a false positive ratio under 10%
• Many errors come from boundary effects around the 5% lossy-path threshold
PlanetLab Experiment Results
• Loss rate distribution
• Metrics
  • Absolute error |p − p′|: average 0.0027 over all paths, 0.0058 over lossy paths
  • Relative error (error factor) [BDPT02]
  • Lossy path inference: coverage and false positive ratio
• On average k = 872 paths monitored out of 2,550
Experiments on PlanetLab
• 51 hosts, each from a different organization
  • 51 × 50 = 2,550 paths
• Simultaneous loss rate measurement
  • 300 trials, 300 msec each
  • In each trial, send a 40-byte UDP packet to every other host (sketch below)
• Simultaneous topology measurement: traceroute
• Experiments: 6/24 – 6/27
  • 100 experiments in peak hours
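The sender side of that measurement, as a minimal sketch. The port number and the reading of "40-byte" as payload size are assumptions about details the slide leaves out; a receiver (not shown) counts packets per sender, giving loss rate = 1 − received/trials.

```python
import socket
import time

def probe(peers, trials=300, interval=0.3, port=12345):
    """In each of `trials` rounds (one every `interval` seconds), send one
    40-byte UDP packet to every other host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytes(40)
    for _ in range(trials):
        for host in peers:
            sock.sendto(payload, (host, port))
        time.sleep(interval)
```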
Motivation
• With single-node relay
  • Loss rate improvement: among 10,980 lossy paths,
    • 5,705 paths (52.0%) have their loss rate reduced by 0.05 or more
    • 3,084 paths (28.1%) change from lossy to non-lossy
  • Throughput improvement, estimated with a TCP throughput formula (see below)
    • 60,320 paths (24%) have non-zero loss rate, so throughput is computable
    • Among them, 32,939 paths (54.6%) have throughput improved and 13,734 paths (22.8%) have throughput doubled or more
• Implication: use overlay paths to bypass congestion and failures
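The slide's throughput formula is elided. A common choice consistent with "throughput computable only for non-zero loss rate" is the Mathis et al. model, used here purely as an assumption:

```python
import math

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. model: throughput ~ (MSS/RTT) * sqrt(3/2) / sqrt(p).

    Diverges as p -> 0, matching the slide's note that throughput is only
    computable for paths with non-zero loss rate. The use of this particular
    model is an assumption; the slide does not name its formula.
    """
    if loss_rate <= 0:
        raise ValueError("model requires a non-zero loss rate")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss_rate)
```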
SCAN
• Coherence for dynamic content
• Cooperative clustering-based replication
• Scalable network monitoring: O(M + N)
Problem Formulation
• Subject to a total replication cost constraint (e.g., # of URL replicas)
• Find a scalable, adaptive replication strategy that reduces the average access cost
(One plausible formalization follows.)
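A plausible formalization of this objective, written as an assumption (the precise cost model is in the ICNP 2002 paper, not on the slide): with A the set of client accesses, R_j the replica set for object j, and c(a, R_j) the cost of serving access a from its nearest replica,

```latex
\begin{align*}
  \min_{\{R_j\}} \;\; \frac{1}{|A|} \sum_{a \in A} c\!\left(a,\, R_{\mathrm{obj}(a)}\right)
  \qquad \text{subject to} \quad \sum_{j} |R_j| \;\le\; R_{\max}
\end{align*}
```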
SCAN: Scalable Content Access Network
[Architecture figure; red: my work, black: out of scope]
• CDN Applications (e.g. streaming media)
• Provision: Cooperative Clustering-based Replication
• Coherence: Update Multicast Tree Construction
• Network Distance/Congestion/Failure Estimation
• User Behavior/Workload Monitoring
• Network Performance Monitoring