Scalability and Accuracy in a Large-Scale Network Emulator
Amin Vahdat, Ken Yocum, Kevin Walsh, Priya Mahadevan, Dejan Kostić, Jeff Chase, and David Becker
Duke University
Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002)
Introduction
• Evaluating Internet-scale distributed systems
  • E.g. peer-to-peer, overlay, wide-area replication
• Real-world deployment: realistic scenarios
  • Difficult to deploy and administer
  • Results not reproducible and not necessarily representative of future behaviour
• Simulation: e.g. ns
  • More control
  • May miss important system interactions
• Emulation
  • Run unmodified code on target platforms
  • More control: can subject system traffic to constraints (bandwidth, latency, loss rate, topology, …)
  • Thus far limited to small and static systems → ModelNet
Goal of ModelNet
• The environment should support:
  • Unmodified applications
  • Reproducible results
  • Experimentation under a broad range of network topologies and dynamically changing network characteristics
  • Large-scale experiments with a large number of nodes and high traffic
ModelNet Architecture
• Scalable Internet emulation environment
  • Based on dummynet, extended to improve accuracy and to support multi-hop and multi-core emulation
• Edge nodes run a user-specified OS and applications
  • Each application instance is a virtual edge node (VN) with a unique IP address in the emulated topology
  • Route their traffic through the core routers
• Core nodes emulate the behaviour of the configured target network
  • Capture effects of congestion and cross-traffic
  • Use emulated links, or pipes
ModelNet Phases
• CREATE
  • Generate the network topology as a GML graph (*) (see the sketch below)
  • Can use Internet traces, BGP dumps, or synthetic topology generators
  • User can annotate the graph to specify packet loss rates, failure distributions, etc.
(*) GML – graph modeling language
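A minimal sketch of what the CREATE phase produces, using Python and networkx to build an annotated topology and serialize it as GML. The attribute names (bandwidth_kbps, latency_ms, plr) and the tiny transit-stub shape are illustrative assumptions, not ModelNet's actual schema:

```python
# Hypothetical CREATE-phase sketch: build an annotated router topology
# and write it out as GML for later distillation. Attribute names are
# illustrative; ModelNet's real GML schema may differ.
import networkx as nx

g = nx.Graph()

# A small transit-stub style topology: two transit routers, each with
# two stub routers attached.
g.add_edge("t0", "t1", bandwidth_kbps=155000, latency_ms=5, plr=0.0)
for transit, stubs in [("t0", ["s0", "s1"]), ("t1", ["s2", "s3"])]:
    for stub in stubs:
        g.add_edge(transit, stub, bandwidth_kbps=45000, latency_ms=2, plr=0.0)

# Client (VN) access links: low bandwidth, with a user-annotated loss rate.
for i, stub in enumerate(["s0", "s1", "s2", "s3"]):
    g.add_edge(stub, f"vn{i}", bandwidth_kbps=1500, latency_ms=10, plr=0.01)

nx.write_gml(g, "topology.gml")  # input to the DISTILL phase
```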
ModelNet Phases
• DISTILL
  • Transform the GML graph into a pipe topology that models the target network
  • May simplify the network
  • Trades accuracy for reduced emulation cost
ModelNet Phases
• ASSIGN
  • Map the distilled topology onto the core nodes, balancing load across them
  • The ideal assignment is an NP-complete problem
  • Mapping pipes to cores depends on routing, link properties, and traffic load
  • Uses a simple greedy k-clusters assignment: randomly pick one topology node for each core node, then let the cores greedily select from connected nodes in round-robin order (see the sketch below)
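A hedged sketch of the greedy k-clusters heuristic described above; the data structures and the fallback for disconnected leftovers are assumptions, not the paper's exact implementation:

```python
# Hypothetical sketch of greedy k-clusters ASSIGN: seed one random
# node per core, then let cores claim neighbours of their growing
# cluster in round-robin until every node has an owner.
import random
import networkx as nx

def assign_k_clusters(g: nx.Graph, num_cores: int) -> dict:
    seeds = random.sample(list(g.nodes), num_cores)
    owner = {node: core for core, node in enumerate(seeds)}
    frontier = [set(g.neighbors(n)) for n in seeds]
    while len(owner) < g.number_of_nodes():
        progress = False
        for core in range(num_cores):            # round-robin over cores
            candidates = frontier[core] - owner.keys()
            if candidates:
                node = candidates.pop()          # claim a connected node
                owner[node] = core
                frontier[core] |= set(g.neighbors(node))
                progress = True
        if not progress:                         # disconnected leftovers
            node = next(n for n in g.nodes if n not in owner)
            owner[node] = random.randrange(num_cores)
    return owner  # node -> core index; pipes follow their endpoints
```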
ModelNet Phases
• BIND
  • Assign VNs to edge nodes
  • Can have multiple VNs per physical edge node
  • Bind each physical edge node to a single core
  • Install the set of pipes of the distilled topology, and routing tables with the shortest path between each VN pair
  • Configure edge nodes with an IP address for each VN (see the sketch below)
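A small sketch of the VN-placement and addressing step. The round-robin placement policy and the 10.0.0.0/8 addressing are illustrative assumptions, not necessarily ModelNet's exact scheme:

```python
# Hypothetical BIND-phase sketch: place VNs on physical edge nodes
# round-robin and give each VN its own private IP in the emulated
# topology. Policy and address range are assumptions for illustration.
import ipaddress

def bind_vns(num_vns: int, edge_nodes: list) -> dict:
    base = ipaddress.ip_address("10.0.0.1")
    binding = {}
    for vn in range(num_vns):
        binding[f"vn{vn}"] = {
            "edge_node": edge_nodes[vn % len(edge_nodes)],  # round-robin
            "ip": str(base + vn),
        }
    return binding

print(bind_vns(4, ["edge0", "edge1"]))
```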
ModelNet Phases
• RUN
  • Execute the target applications on the edge nodes
The Core
• Principal tasks (in steady state):
  • Receive packets from the network interface
  • Move packets
    • Pipe to pipe
    • Pipe to final destination
• Moving packets has strictly higher priority than receiving packets
  • Preferentially emulate packets already in the core → CPU saturation results in packets dropped at the physical level rather than inside the emulation
The Core
• Traffic routing
  • Links are emulated as pipes
  • Pre-computed shortest paths for all VN pairs require O(n²) space
  • A route is an ordered list of pipes
  • Packets move through pipes by reference (packet descriptor); see the sketch below
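A sketch of the routing state this slide describes: all-pairs shortest paths precomputed into O(n²) route entries, each an ordered list of pipes. The dictionary layout and attribute names are illustrative assumptions (the real core keeps this state inside the FreeBSD kernel):

```python
# Hypothetical sketch of the core's routing state.
import networkx as nx

# Tiny distilled topology: two VNs attached to a two-router core.
g = nx.Graph()
g.add_edge("vn0", "r0", bandwidth_kbps=1500, latency_ms=10)
g.add_edge("r0", "r1", bandwidth_kbps=45000, latency_ms=5)
g.add_edge("r1", "vn1", bandwidth_kbps=1500, latency_ms=10)

# One pipe per direction of each link; a pipe holds link parameters
# and a FIFO of packet descriptors (references, not payload copies).
pipes = {}
for u, v, d in g.edges(data=True):
    for a, b in ((u, v), (v, u)):
        pipes[(a, b)] = {"bw_kbps": d["bandwidth_kbps"],
                         "lat_ms": d["latency_ms"],
                         "queue": []}

# Pre-computed all-pairs shortest paths: O(n^2) route entries, each an
# ordered list of pipe ids that a packet descriptor will traverse.
vns = ["vn0", "vn1"]
routes = {}
for src in vns:
    paths = nx.shortest_path(g, source=src)
    for dst in vns:
        if dst != src:
            hops = paths[dst]
            routes[(src, dst)] = [(hops[i], hops[i + 1])
                                  for i in range(len(hops) - 1)]

print(routes[("vn0", "vn1")])  # [('vn0', 'r0'), ('r0', 'r1'), ('r1', 'vn1')]
```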
The Core
• Packet scheduling
  • Heap of pipes sorted by earliest deadline (the exit time of the first packet in the queue)
  • Scheduler executes once per clock tick (10 KHz) and runs at the kernel's highest priority
  • Pops the pipes whose deadline is no later than the current time
  • Moves their packets to the next destination (tail of the next pipe, or the VN)
  • Calculates new deadlines and reinserts the pipes into the heap (see the sketch below)
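A user-space Python stand-in for this in-kernel scheduler, reusing the `pipes` and `routes` structures from the previous sketch. The deadline formula (transmission time plus propagation latency) is a simplification of dummynet's pipe model, and the invariant that a non-empty pipe appears exactly once in the heap is an assumption of this sketch:

```python
# Hypothetical sketch of the earliest-deadline pipe scheduler.
import heapq

TICK_S = 1.0 / 10_000  # 10 KHz scheduler clock

def exit_deadline(now, pipe, pkt):
    # Transmission time at the pipe's bandwidth plus propagation
    # latency; a simplification of the real pipe model.
    tx_s = pkt["size_bytes"] * 8 / (pipe["bw_kbps"] * 1000)
    return now + tx_s + pipe["lat_ms"] / 1000

def run_tick(heap, pipes, routes, now, deliver):
    # Pop every pipe whose head packet's deadline has passed.
    while heap and heap[0][0] <= now:
        _, pipe_id = heapq.heappop(heap)
        pipe = pipes[pipe_id]
        pkt = pipe["queue"].pop(0)
        pkt["hop"] += 1
        route = routes[(pkt["src"], pkt["dst"])]
        if pkt["hop"] < len(route):
            # Move the descriptor to the tail of the next pipe.
            nxt_id = route[pkt["hop"]]
            nxt = pipes[nxt_id]
            nxt["queue"].append(pkt)
            if len(nxt["queue"]) == 1:  # pipe was idle: schedule it
                heapq.heappush(heap, (exit_deadline(now, nxt, pkt), nxt_id))
        else:
            deliver(pkt)                # hand off to the destination VN
        if pipe["queue"]:               # more packets queued: reinsert
            head = pipe["queue"][0]
            heapq.heappush(heap, (exit_deadline(now, pipe, head), pipe_id))
```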
The Core
• Multi-core configuration
  • The next pipe may reside on a different core node
  • Transfer the packet descriptor to the next node
  • Packet contents are buffered at the entry core node and forwarded to the destination upon delivery of the packet (see the sketch below)
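A small sketch of the descriptor/payload split described above; the field names and the keyed payload buffer are illustrative assumptions:

```python
# Hypothetical sketch of multi-core packet handling: only the compact
# descriptor hops between core nodes, while the payload stays buffered
# at the ingress core and is forwarded directly to the destination
# edge node once the descriptor completes its route.
from dataclasses import dataclass

@dataclass
class PacketDescriptor:
    pkt_id: int        # key for the payload buffered at the ingress core
    src: str           # source VN
    dst: str           # destination VN
    size_bytes: int
    hop: int = 0       # index into the route's ordered pipe list
    ingress_core: str = ""

def finish_route(desc: PacketDescriptor, payload_buffers: dict, send) -> None:
    """Runs on the core holding the last pipe of the route."""
    # Release the buffered payload straight to the destination edge
    # node; the descriptor itself is discarded.
    payload = payload_buffers[desc.ingress_core].pop(desc.pkt_id)
    send(desc.dst, payload)
```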
Scalability Issues
• Bandwidth limitation
  • Traffic through the ModelNet core is limited by the cluster's internal physical bandwidth
• Memory requirement
  • ModelNet must buffer up to the full bandwidth-delay product of the target network (see the worked example below)
• Routing protocol
  • Assumes a perfect routing protocol: shortest paths between all pairs of hosts
  • Instantaneous discovery of the new shortest path upon node or link failure
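A back-of-the-envelope illustration of the memory requirement; the numbers are assumptions for the example, not figures from the paper:

```python
# Illustrative bandwidth-delay product: if the emulated topology keeps
# 1 Gb/s of traffic in flight over a 100 ms delay, the core must be
# able to buffer that entire product in memory.
link_rate_bps = 1_000_000_000   # 1 Gb/s of in-flight target traffic
delay_s = 0.100                 # 100 ms end-to-end delay

buffer_bytes = link_rate_bps * delay_s / 8
print(f"{buffer_bytes / 1e6:.1f} MB of packet buffering")  # 12.5 MB
```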
Setup for Experiments
• Core routers:
  • 1.4 GHz Pentium IIIs with 1 GB memory
  • FreeBSD-4.5-STABLE
  • Connected via a 1 Gb/s switch
• Edge nodes:
  • 1 GHz Pentium IIIs with 256 MB memory
  • Linux 2.4.17
  • Connected via 100 Mb/s Ethernet
Baseline Accuracy
• Goal: accurately emulate target packet characteristics on a hop-by-hop basis
• Use kernel logging to track performance and accuracy
• Run the ModelNet scheduler at the highest kernel priority
• Results:
  • Each hop is accurately emulated to the granularity of the hardware timer (100 μs)
  • Accuracy is maintained up to 100% CPU utilization
• Future improvement:
  • Adjust packet departure handling on subsequent hops to correct for emulation errors
Capacity
• Quantified as a function of load and number of hops
• Single core
  • 1 Gb/s link
  • 1–5 edge nodes
  • Each with up to 24 netperf senders (24 VNs) and 24 receivers
  • 1 Gb/s Ethernet connection
• For 1 hop:
  • At 120 flows the CPU is 50% utilized
  • The network link is the bottleneck
• For more than 4 hops:
  • The CPU is the bottleneck
Additional Cores
• Deliver higher throughput
• Increasing probability that a packet's path crosses a node boundary → cross-core traffic
  • Introduces communication overhead
• Ability to scale depends on
  • The application's communication characteristics
  • The partitioning of the topology (minimize cross-core traffic)
VN Multiplexing
• Mapping of VNs to physical edge nodes
• Enables larger-scale emulations
• Affects emulation accuracy and scalability
  • Context-switch overhead
  • Scheduling behaviour
  • Resource contention at edge nodes
Tradeoff: Accuracy vs. Scalability
• Impractical to model every packet and link for a large portion of the Internet
• Instead, create a controlled Internet-like execution context for applications
• Reduce overhead by making approximations that minimally impact application behaviour
• Ideally, automate the tradeoff to satisfy resource conditions and report the degree of inaccuracy to the user
Distillation
• Hop-by-hop emulation
  • Distilled topology is isomorphic to the target network
  • Accurate, but highest per-packet cost
• End-to-end emulation
  • Collapse each path to a single pipe → full mesh
  • Lowest overhead
  • Can capture raw network latency, bandwidth, and loss rate
  • Cannot emulate link contention among competing flows
Distillation
• Walk-in
  • Preserve the first walk-in links of each path, replace the interior by a full mesh
  • Breadth-first traversal finds successive frontier sets (the first frontier set is the set of all VNs)
  • Each packet traverses at most (2 × walk-in) + 1 pipes (see the sketch below)
  • Cannot model contention in the interior
• Walk-out
  • Models an under-provisioned core
  • Extends the walk-in algorithm to preserve the inner core
  • Finds the "topological center" by generating successive frontiers until one of size one or zero is found
  • Collapses the paths between walk-in and walk-out
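A simplified sketch of walk-in distillation on a per-path basis: keep the first and last `walk_in` links of each VN-pair path and collapse the interior into a single mesh pipe, so each path has at most 2 × walk_in + 1 pipes. This is an assumption-laden approximation of the paper's frontier-set algorithm (it also skips aggregating latency/bandwidth/loss over the collapsed interior):

```python
# Hypothetical walk-in distillation sketch (per VN-pair path).
import itertools
import networkx as nx

def walk_in_distill(g: nx.Graph, vns: list, walk_in: int) -> nx.Graph:
    distilled = nx.Graph()
    for src, dst in itertools.combinations(vns, 2):
        path = nx.shortest_path(g, src, dst)
        edges = list(zip(path, path[1:]))
        if len(edges) <= 2 * walk_in + 1:
            distilled.add_edges_from(edges)            # short path: keep whole
        else:
            distilled.add_edges_from(edges[:walk_in])  # preserved head links
            distilled.add_edges_from(edges[-walk_in:]) # preserved tail links
            # Collapse the interior into one mesh pipe between the two
            # frontier nodes. A real distiller would also aggregate
            # latency, bandwidth, and loss along the collapsed span.
            distilled.add_edge(path[walk_in], path[-walk_in - 1], mesh=True)
    return distilled
```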
Distillation
• Ring topology
  • 20 routers, interconnected at 20 Mb/s
  • 20 VNs connected to each router by 2 Mb/s links (400 VNs total)
  • VNs partitioned into generator and receiver sets
  • Each generator sends to a random receiver
• Pipe counts:
  • Hop-by-hop: 419 pipes
  • End-to-end: 79,800 pipes (full mesh over 400 VNs: 400 × 399 / 2)
  • Last-mile only: 400 edge links and 190 interior links (full mesh over 20 routers: 20 × 19 / 2)
Changing Network Characteristics
• Needed to evaluate adaptive Internet systems
• The user can:
  • Directly incorporate generators for competing traffic
    • Accurate for emulating "background" cross traffic
    • Consumes resources at edge nodes and bandwidth in the core
  • Modify pipe parameters during emulation to inject cross traffic dynamically
    • Low overhead; scales independently of the traffic rate
    • Does not capture all details of Internet packet dynamics (e.g. slow start, bursty traffic)
    • Not responsive to congestion → emulation error grows with the link utilization level
• Fault injection
Case Studies
• Network of Gnutella clients
  • 10,000 nodes (100 VNs on each of 100 edge nodes)
• Support for emulation of ad hoc wireless environments
  • Implemented, but not presented in this paper
• CFS (1)
  • Able to reproduce results from the CFS implementation running on the RON (2) testbed (published by another group)
• Replicated web services
  • Replay of a trace of requests to IBM's main website
  • Able to show that one additional replica improves latency, while a third replica is only marginally beneficial
  • The ability to emulate contention on interior links was crucial for obtaining these results
• Adaptive overlays
  • ACDC: an overlay that adapts to changing network conditions
  • ModelNet and ns-2 produced similar experimental results
(1) CFS – Cooperative File System
(2) RON – Resilient Overlay Network (MIT)
Related Work
• Many other efforts on emulation
  • Mostly focused on specific, static, and small-scale systems
• Netbed (Emulab)
  • Similar to ModelNet, except that ModelNet focuses on scalable emulation of large-scale networks
  • The ModelNet efforts will be integrated into Netbed
• Competing research by the WASP (1) project
  • Emulates network characteristics at the end host
  • Requires emulation software on all edge nodes
  • Cannot capture congestion of multiple flows on a single pipe
(1) WASP – Wide Area Server Performance (J.-Y. Pan, H. Bhanoo, E. Nahum, M. Rosu, C. Faloutsos, and S. Seshan)
Summary
• ModelNet is designed to support
  • Unmodified applications
  • Reproducible results
  • A broad range of network topologies and dynamically changing characteristics
  • Large-scale experiments
• Provides a means of balancing accuracy and cost
• Presented case studies that show the generality of the approach