Why We STILL Don’t Know How To Simulate Networks Mostafa H. Ammar College of Computing Georgia Institute of Technology Atlanta, GA
Disclaimer • My Personal Perspective: • Networking Researcher and not Simulationist. • Have written and used discrete event computer simulations for over 25 years • Involved in COMPASS project at GT for the last 7 years
The Main Message • The use of simulation has been growing in the networking community • Current shifts in the networking research landscape have increased the importance of simulation as a tool for evaluation • There is a crisis of credibility causing people to question the validity of simulations • Why and How to Fix it?
Evaluating Networks: A Spectrum • A spectrum of approaches • Mathematical Analysis • Computer Simulation • Computer Emulation • Prototype Testbed • Real network testing/deployment • Moving down the list: increased cost/overhead • Moving up the list: decreased realism/accuracy
A Brief History of Network Simulation • In the beginning, a combination of: • Mathematical analysis • Small-scale prototypes • Simulation • However, simulation was primitive and accessible only to people who had computers and knew how to program them.
Early Examples of Network Simulation • Kleinrock’s thesis (1962) used simulation to validate his Independence assumption. “I invented effective dynamic routing procedures and also established the analytic model by which you could calculate delay . . . and to simulate it I had to make some fundamental assumptions - I simulated the hell out of it to show that the assumptions worked.” Leonard Kleinrock (LK) http://www.computer.org/internet/v1n3/kleinrock9702.htm
Early Examples of Network Simulation • Paul Baran, On Distributed Communications: II. Digital Simulation of Hot-Potato Routing in a Broadband Distributed Communications Network http://www.rand.org/publications/RM/RM3103 • Excerpt from “II. The Simulated Network: Description”: “The size of the network simulated was limited by the amount of storage available in the IBM 7090 computer using FORTRAN. A heavy storage requirement was dictated by the need for each simulated node or station to maintain a table of recorded handover numbers--the tag appended to each message indicating the number of times that message has been relayed. For each node, a table containing handover numbers to every other node via every one of up to a maximum of eight links is needed.”
The Rise of Network Simulation • As computing became more accessible, more and more people started doing simulations • Papers using simulation • INFOCOM: 10% in 1985, ~60% in 1992-98 • SIGCOMM: 4/29 in 1989, 13/26 in 1998, 11/30 in 2004
The Main Message • The use of simulation has been growing in the networking community • Current shifts in the networking research landscape have increased the importance of simulation as a tool for evaluation • There is a crisis of credibility causing people to question the validity of simulations • Why and How to Fix it?
Networking Research Landscape • Early efforts dealt with relatively simple phenomena on small-scale networks. • Current research deals with complex phenomena on large-scale networks • A long story …
Network Research Landscape • Systems are • Less tractable mathematically • Difficult to prototype • And yet everyone has access to abundant computing • => Simulation more viable and often the only evaluation tool available
The Main Message • The use of simulation has been growing in the networking community • Current shifts in the networking research landscape have increased the importance of simulation as a tool for evaluation • There is a crisis of credibility causing people to question the validity of simulations • Why and How to Fix it?
Crisis of Credibility • “Some claim that stochastic simulation as a performance evaluation tool of various dynamic systems, including telecommunication networks, is misused, and that the spread of this phenomenon is so wide that one can speak about a deep credibility crisis. It is even claimed that one cannot rely on the majority of the published results of performance evaluation studies of dynamic systems based on stochastic simulation.” From: Pawlikowski, K., Jeong, H.-D. J., Lee, J.-S. R.: On Credibility of Simulation Studies of Telecommunication Networks. IEEE Communications Magazine, Jan. 2002, 132-139.
Crisis of Credibility • “I favor a stamp: WARNING: COMPUTER SIMULATION – MAY BE ERRONEOUS and UNVERIFIABLE. Like on cigarettes.” Michael Crichton in “State of Fear”
Crisis of Credibility • From: Cavin, Sasson and Schiper, “On the Accuracy of MANET Simulators” • [Figure: results compared across ns-2, OPNET and GloMoSim]
Crisis of Credibility • A Typical Paper Review “This paper should be rejected because its evaluation section is weak. The simulation (uses questionable models) and/or (simulates too small a network) and/or (does not have a valid statistical analysis of the simulation output) and/or … (your own critique here).”
The Main Message • The use of simulation has been growing in the networking community • Current shifts in the networking research landscape have increased the importance of simulation as a tool for evaluation • There is a crisis of credibility causing people to question the validity of simulations • Why and How to Fix it?
Reasons for the Credibility Crisis • Confusion regarding the role of simulation • Impossibility of simulating Internet-scale networks • Difficulty in building realistic models • Lack of standards for validation and repeatability
The Roles of Simulation • To validate approximate analysis • To get/confirm first-order insights into new techniques • To understand complex interactions among various entities/procedures • To perform relative evaluation among alternatives • To answer questions regarding deployability in a real network
The Roles of Simulation • Different tools may be needed for different roles • The burden on accuracy, repeatability and validity is highly dependent on the role • The intended role is not always (rarely?) stated up front
A Personal Experience • Parts and Holes in a Manufacturing Transfer Line
A Significant Failure • Simulation has not been able to answer wide-scale deployability questions • Multicast • QoS • RED • … • Perhaps it’s a matter of simulation scale
Reasons for the Credibility Crisis • Confusion regarding the role of simulation • Impossibility of simulating Internet-scale networks • Difficulty in building realistic models • Lack of standards for validation and repeatability
Large-Scale Network Simulation • Large-scale network simulation offers a way to: • Verify validity of simulation results on small networks • Examine issues of scale • Validate theoretical models for large networks • But it has been quite challenging to build large-scale simulations • Reference: Fujimoto, Perumalla, Park, Wu, Ammar, Riley, "Large Scale Simulation: How Big? How Fast?," Proceedings of the 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), October 2003.
Quantifying Simulator Performance • Execution time: T ≈ (NF × PF × HF) / PTS • NF = number of flows • PF = packets sent per flow • HF = average hops per flow • PTS = simulator speed (simulated packet transmissions / sec) • The numerator NF × PF × HF is the number of packet transmissions (hops) to be simulated • Ignores lost packets and protocol-generated packets (e.g., acks) • Example • 500,000 active UDP flows, 1.0 Mbps per flow, average of 8 hops to reach the destination • Assume 1 KByte packets (125 packets per sec per flow) • Workload: simulate 500 million packet transmissions per second of network operation
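The example workload can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the function names are made up for illustration, 1 KByte is taken as 1000 bytes so that the rate matches the slide's 125 packets per second, and the simulator speed of 500,000 PTS is borrowed from the back-of-the-envelope slide later in the deck as an assumed value.

```python
def packet_transmissions(num_flows, packets_per_flow, hops_per_flow):
    """Numerator of T ≈ (NF * PF * HF) / PTS: packet-hops to simulate."""
    return num_flows * packets_per_flow * hops_per_flow


def execution_time(num_flows, packets_per_flow, hops_per_flow, pts):
    """Wallclock seconds needed at a simulator speed of `pts` transmissions/sec."""
    return packet_transmissions(num_flows, packets_per_flow, hops_per_flow) / pts


# Numbers from the slide example.
flows = 500_000                  # active UDP flows
rate_bps = 1.0e6                 # 1.0 Mbps per flow
packet_bits = 1000 * 8           # 1 KByte packets, taken as 1000 bytes (assumption)
hops = 8                         # average hops to the destination

pkts_per_sec = rate_bps / packet_bits                    # packets per second per flow
workload = packet_transmissions(flows, pkts_per_sec, hops)

# Assumed simulator speed, not a measured value.
assumed_pts = 500_000
slowdown = execution_time(flows, pkts_per_sec, hops, assumed_pts)

print(f"{pkts_per_sec:.0f} packets/s per flow")
print(f"{workload:,.0f} packet transmissions per simulated second")
print(f"{slowdown:,.0f} wallclock seconds per simulated second at {assumed_pts:,} PTS")
```

Under these assumptions a sequential simulator would need on the order of a thousand wallclock seconds for each second of network operation, which is the motivation for the parallel approaches that follow.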
Scalability of Packet Level Simulators • [Figure: simulator speed in PTS (traffic that can be simulated in real time, axis from 10^2 to 10^10) versus network size (hosts, routers, etc., axis from 1 to 10^8), with regions for sequential simulation, space-parallel simulation (parallel discrete event simulation, marked “our focus”) and time-parallel simulation]
Approaches to Parallel Network Simulation • Build “from scratch” approach: • Substantial effort to build & validate new models • Users must learn a new simulator • SSFNet, Qualnet, Javasim • Federated simulation approach: • Simulators integrated via a software backplane/RTI • Exploit existing software, validated models & user base • Heterogeneous simulations • PDNS • [Figure: a single large-scale parallel network simulator contrasted with several NS instances connected through a backplane/RTI]
Hardware Platforms • Sequential: Sun / Solaris • Ultra-80, UltraSPARC-II 450MHz • 4GB memory • Parallel: Intel / RedHat Linux 7.3 • 8-way Pentium-III XEON (2MB L2 cache) SMP • 550MHz clock speed • 4GB memory • 17 SMPs (136 CPUs) connected via Gigabit Ethernet • Performance measurements are conservative (due to hardware performance)
Sequential Performance Comparison (Single Campus Network, ~500 nodes and links) • [Table: per-simulator results] • * A packet transmission involves simulating a packet transmission over a single link • ** Includes NixVectors optimization • Average end-to-end delay differed by less than 3%
PDNS Performance on Cluster (Perumalla/Park) • [Figure: simulator speed in PTS] • Each processor simulates ~5000 nodes and links • Up to 120 processors simulating 645,600 nodes
Lemieux Supercomputer • 750 HP-Alpha ES45 servers • 4Gbytes memory per server • 4 CPUs per server • 1GHz CPU • 3000 CPUs total • 64-bit computing • Quadrics interconnect Pittsburgh Supercomputing Center http://www.psc.edu/machines/tcs/lemieux.html
PDNS Performance on PSC (Perumalla) • 147K PTS on one CPU • Campus network topology, FTP traffic (500 packets/flow, TCP) • Scale problem size & number of CPUs (up to ~4 million network nodes) • Performance up to 106 million PTS
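To put the two PTS figures in perspective, the short sketch below compares the peak rate with the single-CPU rate and applies the peak rate to the 500-million-transmission workload from the earlier example. The slide does not state how many CPUs produced the peak, so no parallel efficiency is computed; the variable names are illustrative only.

```python
single_cpu_pts = 147_000        # PTS on one CPU (from the slide)
peak_pts = 106_000_000          # best reported PTS (from the slide)

# How many single-CPU simulators the peak rate is worth.
ratio = peak_pts / single_cpu_pts

# Workload from the earlier example: 500 million transmissions per simulated second.
workload = 500_000_000
slowdown_at_peak = workload / peak_pts

print(f"peak rate is about {ratio:.0f}x the single-CPU rate")
print(f"about {slowdown_at_peak:.1f} wallclock seconds per simulated second at peak")
```

Even at the peak rate, the example workload would still run several times slower than real time.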
But… Can we build an Internet-scale Simulation? • A “back-of-the-envelope” calculation • 100 million Internet hosts • 1 router for every 100 and each router has 4 links • 50% of end-hosts have 56Kbps access and 50% have 10Mbps access • Router to router links are as follows: 50% @ 10Mbps, 40% @ 100Mbps, 5% @ 655Mbps and 5% @ 2.4Gbps • Utilization is 50% for access links and 10% for network links • 1% of hosts have active connections • Average packet size = 5000 bits • George Riley, Mostafa Ammar, "Simulating Large Networks: How Big is Big Enough?" Proceedings of First International Conference on Grand Challenges for Modeling and Simulation, January 2002.
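Some of the quantities implied by these assumptions can be derived directly. The sketch below only works out the router count, the number of active hosts and the per-link packet rates on access links; it does not attempt to reproduce the event count from the cited paper, and the constant and function names are illustrative.

```python
HOSTS = 100_000_000             # Internet hosts
HOSTS_PER_ROUTER = 100          # 1 router for every 100 hosts
ACTIVE_FRACTION = 0.01          # 1% of hosts have active connections
PACKET_BITS = 5000              # average packet size
ACCESS_UTILIZATION = 0.5        # 50% utilization on access links

routers = HOSTS // HOSTS_PER_ROUTER
active_hosts = round(HOSTS * ACTIVE_FRACTION)


def packets_per_second(link_bps, utilization, packet_bits=PACKET_BITS):
    """Packets per second carried by one link at the given utilization."""
    return link_bps * utilization / packet_bits


dialup_pps = packets_per_second(56_000, ACCESS_UTILIZATION)       # 56 Kbps access
fast_pps = packets_per_second(10_000_000, ACCESS_UTILIZATION)     # 10 Mbps access

print(f"{routers:,} routers, {active_hosts:,} active hosts")
print(f"{dialup_pps:.1f} packets/s on a 56 Kbps access link")
print(f"{fast_pps:.0f} packets/s on a 10 Mbps access link")
```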
Back of the Envelope Calculation (cont’d) • 2.9 x 10^11 events per second • Assume we can process 10^6 events per second (~500,000 PTS) • => 290,000 CPU seconds (4 days) for every second of Internet time! • => need 300 Terabytes of memory in ns, not including routing table space! • => need 14 Terabytes for event logging for each second of simulation time! • Requires 1000 parallel CPUs with 300 GB of main memory and 1.4 TB of disk storage in each! • Would not speed things up much; it simply allows the simulation to run
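The headline numbers on this slide follow from simple division, sketched below. The event count and processing rate are taken from the slide; the perfect-speedup figure at the end is an idealized bound added for illustration, and the slide itself cautions that parallelism here mainly makes the run fit in memory.

```python
events_per_internet_second = 2.9e11   # from the slide
events_per_cpu_second = 1e6           # assumed processing rate, from the slide

cpu_seconds = events_per_internet_second / events_per_cpu_second
cpu_days = cpu_seconds / 86_400       # seconds in a day

cpus = 1000
total_memory_tb = 300
memory_per_cpu_gb = total_memory_tb * 1000 / cpus   # decimal units, 1 TB = 1000 GB

# Best case only: assumes perfect speedup across all CPUs.
ideal_parallel_seconds = cpu_seconds / cpus

print(f"{cpu_seconds:,.0f} CPU seconds per second of Internet time")
print(f"about {cpu_days:.1f} days on a single CPU")
print(f"{memory_per_cpu_gb:.0f} GB of memory per CPU across {cpus} CPUs")
print(f"{ideal_parallel_seconds:.0f} s per Internet second with perfect speedup")
```

The single-CPU figure comes to roughly 3.4 days, which the slide rounds up to 4 days.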
Wait a few years and computing power will catch up • Possibly … but the network itself is also growing. • Even with Moore’s Law increase in processing power we will need 300x10^6 CPU seconds for every wallclock second (assuming typical Internet growth). • Open Question: What is the right simulation size to explore Internet-scale performance issues?
Many Challenges Remain • Tools & Parallel Simulation Issues • Robust performance • Making parallel simulation more transparent, “automatic” (BenchMap and AutoPart) • Access to HPC platforms • Visualization Tools • Modeling issues [Floyd/Paxson] • Building credible large-scale models and scenarios • Verifying and validating large-scale simulations • Topology? Traffic? • Methodologies and tools to effectively utilize the simulators
Reasons for the Credibility Crisis • Confusion regarding the role of simulation • Impossibility of simulating Internet-scale networks • Difficulty in building realistic models • Lack of standards for validation and repeatability
Building Realistic Models • The Simulation Modeler’s Dilemma: • One needs to eliminate “unimportant” details in the simulation in order to speed it up (avoid kitchen-sink simulations) • But how can one tell if a detail is unimportant? • Simulate and see if there is any difference, although this is considered wasted effort • Perhaps we should encourage these kinds of results!
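One way to make "simulate and see if there is any difference" concrete is to run several replications with and without the detail and compare the outputs statistically. The sketch below is a hypothetical helper, not a method from the talk: it uses a normal-approximation 95% interval on the difference of means (a t quantile would be more appropriate for so few replications), and the sample values are invented purely to exercise the function.

```python
from math import sqrt
from statistics import mean, stdev


def detail_matters(detailed, abstract, z=1.96):
    """Compare replication outputs from a detailed and an abstract model.

    Returns (relative_difference, significant), where `significant` is True
    when the difference of means exceeds z standard errors.
    """
    diff = mean(detailed) - mean(abstract)
    # Standard error of the difference of two independent sample means.
    se = sqrt(stdev(detailed) ** 2 / len(detailed)
              + stdev(abstract) ** 2 / len(abstract))
    relative = diff / mean(abstract)
    return relative, abs(diff) > z * se


# Invented throughput values (Mbps) from five replications of each model.
with_packet_detail = [4.1, 3.9, 4.0, 4.2, 3.8]
without_packet_detail = [6.0, 6.1, 5.9, 6.2, 5.8]

rel, significant = detail_matters(with_packet_detail, without_packet_detail)
print(f"relative difference: {rel:+.1%}, significant: {significant}")
```

If the difference is both large and significant, the detail is not "unimportant" for the metric in question; if not, dropping it is easier to defend, and either outcome is worth reporting.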
Incorporating Packet-Level Details in P2P Simulations • Access bandwidth affects throughput significantly • Models that do not capture packet-level details do not reveal the difference • He, Ammar, Riley, Raj, Fujimoto, "Mapping Peer Behavior to Packet-Level Details: A Framework for Packet-Level Simulation of Peer-to-Peer Systems," Proceedings of MASCOTS 2003.
Building Realistic Models • A significant challenge especially for large-scale simulation • Significant attention to topology modeling but very little understanding of other important issues • Workload Modeling • Cross-layer interactions (particularly for wireless networks) • Modeling of operations and overheads
Cross-layer modeling • A perfect instance of the Modeler’s Dilemma • Split-stack composition may be helpful • Xu, Riley, Ammar, Fujimoto, "Split Protocol Stack Network Simulations using the Dynamic Simulation Backplane," Proceedings of the Ninth International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'01), August 2001
Simulation Split Vertically • Each simulator simulates a portion of the protocol stack of the entire network • [Figure: the same nodes A through F appear in both Simulator 1 and Simulator 2]
Splitting Protocol Stack • Protocol stack split between TCP and IP • [Figure: the two parts of the stack are simulated in ns2 and Glomosim]
Workload Modeling • See our work presented at this conference on generating TCP workloads to match observed network utilization. • Qi He, Constantinos Dovrolis, Mostafa Ammar, "A Methodology for the Optimal Configuration of TCP Traffic in Network Simulation under Link Load Constraints," Proceedings of the 38th Annual Simulation Symposium, San Diego, April 2005.
Reasons for the Credibility Crisis • Confusion regarding the role of simulation • Impossibility of simulating Internet-scale networks • Difficulty in building realistic models • Lack of standards for validation and repeatability
Simulation Validation and Repeatability • The issue: Given that the simulation model is correct, how can one trust the results from the simulation? • Two types of problems • Technical • Social