Parallel Simulation Made Easy With OMNeT++
Y. Ahmet Şekerciuğlu (1), András Varga (2), Gregory K. Egan (1)
(1) CTIE, Monash University, Melbourne, Australia
(2) Omnest Global, Inc.
What is OMNeT++?
• An open-source, generic simulation framework, best suited for simulating communication networks, multiprocessors and other complex distributed systems (further examples: queuing systems, hardware architectures, server farms, business processes, call centers)
• C++-based simulation kernel plus a set of libraries and tools (GUI and command-line); platforms: Unix, Windows
• Active user community (the mailing list has about 240 subscribers)
• Home page: www.omnetpp.org
• A commercial version also exists: www.omnest.com
Model Structure
Component-oriented approach:
• The basic building block is a simple module (programmed in C++).
• Simple modules can be grouped to form compound modules.
• Modules are connected with each other.
Defining the Topology
• NED (Network Description Language) defines the topology: what modules exist, and how they are connected and assembled to form larger modules
• The graphical editor GNED operates directly on NED files

    //
    // Host with an Ethernet interface
    //
    module EtherStation
        parameters: ...
        gates: ...
        submodules:
            app: EtherTrafficGen;
            llc: EtherLLC;
            mac: EtherMAC;
        connections:
            app.out --> llc.hl_in;
            app.in <-- llc.hl_out;
            llc.ll_in <-- mac.hl_out;
            llc.ll_out --> mac.hl_in;
            mac.ll_in <-- in;
            mac.ll_out --> out;
    endmodule
Defining the Behaviour
Behaviour is encapsulated in simple modules. A simple module:
• sends messages, reacts to received messages
• collects statistics
Simple modules are programmed in C++:
• one can choose between process-oriented and event-oriented programming
• the simulation class library covers commonly needed functionality, such as:
  • random number generation,
  • statistics collection (histograms, etc.),
  • queues and other containers,
  • support for topology discovery and routing, etc.
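The event-oriented style mentioned above can be illustrated with a minimal sketch in plain C++. It does not use the OMNeT++ class library: `Event`, `Simulation`, `SimpleModule` and `Relay` are hypothetical stand-ins invented for this example. It only shows the idea of a kernel that pops events from a future event set in timestamp order and calls a per-module handler, which in turn sends messages and collects a statistic.

```cpp
#include <queue>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the OMNeT++ classes.
struct Event {
    double time;      // simulation time at which the event fires
    int    moduleId;  // index of the destination module
};

// Comparator that keeps the earliest timestamp on top of the queue.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

class Simulation;

// Event-oriented module: the kernel calls handleMessage() once per event.
class SimpleModule {
public:
    virtual ~SimpleModule() = default;
    virtual void handleMessage(const Event& ev, Simulation& sim) = 0;
};

class Simulation {
public:
    int addModule(SimpleModule* m) {
        modules_.push_back(m);
        return static_cast<int>(modules_.size()) - 1;
    }
    void schedule(double time, int moduleId) { fes_.push(Event{time, moduleId}); }
    double now() const { return now_; }

    // Process events in timestamp order until none remain at or before timeLimit.
    void run(double timeLimit) {
        while (!fes_.empty() && fes_.top().time <= timeLimit) {
            Event ev = fes_.top();
            fes_.pop();
            now_ = ev.time;
            modules_[ev.moduleId]->handleMessage(ev, *this);
        }
    }

private:
    std::priority_queue<Event, std::vector<Event>, LaterFirst> fes_;  // future event set
    std::vector<SimpleModule*> modules_;
    double now_ = 0.0;
};

// Example module: counts arrivals and forwards each message to a peer after a fixed delay.
class Relay : public SimpleModule {
public:
    explicit Relay(double delay) : delay_(delay) {}
    void setPeer(int peer) { peer_ = peer; }
    void handleMessage(const Event&, Simulation& sim) override {
        ++received_;                              // collect a statistic
        sim.schedule(sim.now() + delay_, peer_);  // "send" to the peer
    }
    int received() const { return received_; }

private:
    double delay_;
    int peer_ = -1;
    int received_ = 0;
};
```

Two `Relay` objects registered with `addModule()`, pointed at each other with `setPeer()` and started with a single `schedule(0.0, ...)` call will bounce a message back and forth until `run()` reaches its time limit.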
Running the Model Under the GUI
Without extra programming, one can:
• run or single-step the simulation
• monitor state of simulation and execution speed
• examine model object tree
• see what’s happening inside the model
• explore modules and see message flow
• examine scheduled events
Exploring Model Internals
• trace what one module is doing
• step to next event in a module
• look at state variables and statistics
• find out pointer values for C++ debugging (gdb)
• examine contents of queues, messages and other objects
Exploring Model Internals
• find all messages, all statistics objects or all queues (NEW)
• or find any objects by their names and inspect them
• look at results being recorded
• and much more…
Modular Architecture
[Diagram: SIM, ENVIR (contains main()), user interface (CMDENV or TKENV or ...), Simulation model, Model Component Library]
• UI and simulations are separated, and interact via a well-defined API
• provides command-line and graphical user interfaces; user interfaces can be customized, or specialized ones can be created
• enables embedding of simulations into larger applications
Large-Scale Network Simulations
PDES: Parallel Discrete Event Simulation
Motivation:
• speedup: make use of multiple CPUs to reduce execution time
• ability to run large models by distributing resource requirements
We want to use clusters that can provide supercomputing power at affordable cost: inexpensive workstations connected via a high-speed network
• example: the VPAC Linux cluster contains 96 IBM xSeries (dual 2.8 GHz Xeon) PCs running Linux 2.4, yielding 629.7 Gflops; the Myrinet interconnect provides 4 μs end-to-end delay
• communication method: MPI
  • MPI (Message Passing Interface) is a standard for high-performance computing
  • several implementations exist: LAM/MPI, MPICH, plus vendor-specific implementations
Why do we need large-scale simulations?
• Research on Internet protocols and technologies relies extensively on simulation
• Systems are too large and too complex for analytic treatment
• Small experimental networks do not reflect large-scale dynamics
• Large-scale simulations (10,000-1,000,000 nodes) are needed to:
  • … properly understand the dynamics of routing protocols
  • … test various extensions proposed to improve the performance of current Internet protocols
  • … demonstrate the scalability of multicast protocols
  • … plus more
Parallel DES
[Diagram: LP1 (on CPU1), LP2 (on CPU2), LP3 (on CPU3)]
• Partitioning into Logical Processes (LPs):
  • Each partition maps to a separate LP with its own virtual time and list of scheduled events (Future Events Set)
  • LPs are executed on different processors
• A synchronization mechanism (e.g. null messages; Chandy-Misra-Bryant, 1979) is needed to prevent causality violations
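The synchronization requirement above can be made concrete with a small sketch of the conservative rule behind null messages. This is an illustration under simplifying assumptions, not the OMNeT++ implementation: `InputChannel` and `LogicalProcess` are hypothetical names, messages on each channel are assumed to arrive in timestamp order, and the lookahead is assumed to be a single constant per LP.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch of the conservative (null message) rule for one LP.
struct InputChannel {
    double clock = 0.0;  // timestamp of the last message (real or null) received
};

struct LogicalProcess {
    double now = 0.0;                  // local virtual time
    double lookahead = 0.0;            // assumed constant minimum delay on outgoing links
    std::vector<InputChannel> inputs;  // one entry per neighbouring LP

    // Earliest Input Time: no future message can arrive with a smaller timestamp,
    // provided each channel delivers messages in timestamp order.
    double earliestInputTime() const {
        double eit = std::numeric_limits<double>::infinity();
        for (const InputChannel& ch : inputs) eit = std::min(eit, ch.clock);
        return eit;
    }

    // An event is safe to process if it is strictly earlier than anything
    // that may still arrive from another LP.
    bool isSafe(double eventTime) const { return eventTime < earliestInputTime(); }

    // Timestamp promised to neighbours in a null message:
    // "I will not send you anything earlier than this."
    double nullMessageTime() const { return now + lookahead; }

    // Receiving any message (real or null) advances that channel's clock.
    void onMessage(std::size_t channel, double timestamp) {
        inputs[channel].clock = std::max(inputs[channel].clock, timestamp);
    }
};
```

An LP that has no safe event blocks until a neighbour's message, real or null, raises its earliest input time. That is why lookahead greater than zero is needed for the partitions to make progress.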
PDES Support in OMNeT++
• To try running existing OMNeT++ models in parallel, you only need to:
  1. enable parallel simulation: parallel-simulation=true
  2. specify partitioning in the configuration file
  3. run
PDES Support in OMNeT++
• Nearly every model can be run in parallel. Constraints:
  • modules may communicate only by sending messages (no direct method calls or member access) unless mapped to the same processor
  • no global variables
  • limitations on direct sending (no sending to a submodule of another module, unless mapped to the same processor)
  • lookahead must be present in the form of link delays
  • currently only static topologies are supported (this can be improved)
• Models run without modification (no special instrumentation needed)
• Partitioning is part of the configuration; no model change is required
  • follows the “separation of model from experiments” principle
• Code will be publicly released before the end of 2003 (available on request until then)
Extensible PDES Architecture
• Pluggable communication library (“transport layer”); currently implemented:
  • MPI (Message Passing Interface),
  • named pipes,
  • shared directory (for demonstration and debugging purposes only)
• Pluggable PDES algorithm; currently implemented:
  • Null Message Algorithm,
  • Ideal Simulation Protocol (for benchmarking),
  • no synchronization (to demonstrate the need for synchronization)
Parallel Simulation Architecture
[Diagram: each Partition (LP) contains the Simulation Model and the Simulation Kernel; the kernel's parallel simulation subsystem consists of a Synchronization layer (event scheduling, sending, receiving) and a Communication layer, which sits on a communications library (MPI, sockets, etc.)]
Communication Layer
• Must implement the following abstract interface:

    /**
     * Provides an abstraction layer above MPI,
     * PVM, shared-memory communications, etc…
     */
    class cParsimCommunications
    {
        virtual void init() = 0;
        virtual void shutdown() = 0;
        virtual cCommBuffer *createCommBuffer() = 0;
        virtual void recycleCommBuffer(cCommBuffer *buffer) = 0;
        virtual void send(cCommBuffer *buffer, int tag, int destination) = 0;
        virtual void broadcast(cCommBuffer *buffer, int tag) = 0;
        virtual void receiveBlocking(cCommBuffer *buffer, int& rcvdTag, int& srcProcId) = 0;
        virtual bool receiveNonblocking(cCommBuffer *buffer, int& rcvdTag, int& srcProcId) = 0;
        virtual void synchronize() = 0;
    };

    class cMPICommunications : public cParsimCommunications { … };
    class cNamedPipeCommunications : public cParsimCommunications { … };
    class cFileCommunications : public cParsimCommunications { … };

• Communication buffers encapsulate pack/unpack operations. The cCommBuffer interface (abstract class) has multiple implementations for MPI, etc.
• Simulation objects are able to pack/unpack themselves to/from communication buffers, using methods from the cCommBuffer interface.
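The pack/unpack idea can be sketched in self-contained C++ as follows. `ByteBuffer` and `JobMessage` are hypothetical names made up for this illustration; the actual cCommBuffer method names and signatures are not shown on the slide and may well differ. The sketch copies raw bytes, so it assumes sender and receiver share the same architecture and data layout.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical byte buffer; the real cCommBuffer interface may differ.
class ByteBuffer {
public:
    void pack(int v)    { append(&v, sizeof v); }
    void pack(double v) { append(&v, sizeof v); }
    void pack(const std::string& s) {
        pack(static_cast<int>(s.size()));  // length prefix, then the characters
        append(s.data(), s.size());
    }

    void unpack(int& v)    { extract(&v, sizeof v); }
    void unpack(double& v) { extract(&v, sizeof v); }
    void unpack(std::string& s) {
        int n = 0;
        unpack(n);
        if (n < 0) throw std::out_of_range("negative string length");
        s.resize(static_cast<std::size_t>(n));
        if (n > 0) extract(&s[0], s.size());
    }

    // True once everything that was packed has been unpacked again.
    bool exhausted() const { return readPos_ == data_.size(); }

private:
    void append(const void* p, std::size_t n) {
        const char* c = static_cast<const char*>(p);
        data_.insert(data_.end(), c, c + n);
    }
    void extract(void* p, std::size_t n) {
        if (readPos_ + n > data_.size()) throw std::out_of_range("buffer underflow");
        std::memcpy(p, data_.data() + readPos_, n);
        readPos_ += n;
    }

    std::vector<char> data_;
    std::size_t readPos_ = 0;
};

// A simulation object that knows how to pack and unpack itself.
struct JobMessage {
    std::string name;
    double arrivalTime = 0.0;
    int destGate = 0;

    void packInto(ByteBuffer& b) const {
        b.pack(name);
        b.pack(arrivalTime);
        b.pack(destGate);
    }
    void unpackFrom(ByteBuffer& b) {  // must read fields in the order they were packed
        b.unpack(name);
        b.unpack(arrivalTime);
        b.unpack(destGate);
    }
};
```

Keeping serialization inside the object means a transport (MPI, pipes, files) only has to move an opaque byte buffer and never needs to know about message types.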
Model Partitioning
• OMNeT++ uses placeholder modules and proxy gates:
[Diagram: CPU0 holds nodeA and a placeholder for nodeB; CPU1 holds a placeholder for nodeA and nodeB; the two partitions are linked by the communication layer (MPI, pipe, etc.)]
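One way to picture the placeholder idea is the following sketch, in which a sender delivers to a `Node` without knowing whether the destination is real or a stand-in for a module hosted elsewhere. All names here (`Node`, `RealNode`, `PlaceholderNode`, `RemoteDelivery`) are hypothetical and invented for illustration; the slide does not show how OMNeT++ implements placeholders and proxy gates internally.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; names are illustrative, not OMNeT++ classes.
struct Message {
    std::string name;
    double arrivalTime;
};

// Record of a message handed to the transport layer for another partition.
struct RemoteDelivery {
    int partition;       // partition that hosts the real module
    std::string module;  // name of the destination module
    Message msg;
};

class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}
    virtual ~Node() = default;
    virtual void deliver(const Message& msg) = 0;
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// A module instantiated in this partition: messages arrive locally.
class RealNode : public Node {
public:
    using Node::Node;
    void deliver(const Message& msg) override { inbox.push_back(msg); }
    std::vector<Message> inbox;
};

// Stand-in for a module that lives in another partition: instead of
// handling the message, it queues it for the communication layer.
class PlaceholderNode : public Node {
public:
    PlaceholderNode(std::string name, int remotePartition,
                    std::vector<RemoteDelivery>& outbox)
        : Node(std::move(name)), partition_(remotePartition), outbox_(outbox) {}

    void deliver(const Message& msg) override {
        RemoteDelivery d{partition_, name(), msg};
        outbox_.push_back(d);
    }

private:
    int partition_;
    std::vector<RemoteDelivery>& outbox_;
};
```

Because both classes share the `Node` interface, model code that sends from nodeA to nodeB looks the same whether nodeB is local or remote.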
Model Partitioning, cont’d
• If compound modules themselves are distributed across LPs, the solution is slightly more complicated:
[Diagram: CPU0, CPU1 and CPU2 each hold a placeholder for the compound module; inside it, each CPU has one real simple module and placeholders for the simple modules hosted on the other CPUs]
Placeholder Approach
• Advantage of the placeholder approach: when simulating telecommunication networks, all nodes (routers, ASes, hosts, etc.) are present (at least as placeholders) in all LPs, so algorithms such as topology discovery for routing can proceed unhindered.
[Diagram: LP1 (on CPU1), with placeholders marked]
Synchronization Layer
• Parallel simulation protocols must implement the following abstract interface:

    /**
     * Abstract base class for parallel simulation algorithms...
     */
    class cParsimSynchronizer : public cScheduler
    {
        virtual void startRun() = 0;
        virtual void endRun() = 0;

        /**
         * Scheduler function -- it comes from cScheduler interface...
         */
        virtual cMessage *getNextEvent() = 0;

        /**
         * Hook, called when a cMessage is sent out of the segment...
         */
        virtual void processOutgoingMessage(cMessage *msg, int procId, int moduleId, int gateId, void *data) = 0;
    };
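Synchronization Layer
• Currently implemented parallel simulation algorithms:
  • Null Message Algorithm
  • Ideal Simulation Protocol (for benchmarking)
  • no synchronization (to demonstrate the need for synchronization)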
Example: Distributed CQN
[Diagram: the network distributed over CPU0, CPU1 and CPU2]
• Closed Queuing Network (CQN) described in “Performance Evaluation of Conservative Algorithms”, R. Bagrodia et al., 2000
• N tandem queues (switch+queues); exponential service times; propagation delay on all links
• Lookahead: propagation delay on links
Example: Distributed CQN
• OMNeT++ model for CQN
• wraps tandems into compound modules
Configuring for Parallel Execution
Configuration file:

    [General]
    # enable parallel simulation
    parallel-simulation=true
    # select communication library and parallel simulation protocol
    #parsim-communications-class="cFileCommunications"
    parsim-communications-class="cMPICommunications"
    parsim-synchronization-class="cNullMessageProtocol"

    [Partitioning]
    # assign modules to processors
    *.tandemQueue[0]*.segment-id=0
    *.tandemQueue[1]*.segment-id=1
    *.tandemQueue[2]*.segment-id=2

Each partition is simulated in its own process.
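To show how wildcard entries such as those in the [Partitioning] section could map module paths to segments, here is a small self-contained sketch. It is an assumption-laden illustration: `wildcardMatch`, `PartitionRule` and `segmentFor` are hypothetical helpers, `*` is taken to match any run of characters, the first matching rule wins, and module paths such as `cqn.tandemQueue[1].queue[0]` are made up. OMNeT++'s own pattern-matching rules may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal glob match where '*' matches any run of characters (including none).
// Assumption: a simplification; OMNeT++'s own pattern rules may differ.
bool wildcardMatch(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen in pattern
    std::size_t mark = 0;                  // text position that '*' currently absorbs up to
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;
            mark = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1;  // backtrack: let the last '*' absorb one more character
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match nothing
    return p == pattern.size();
}

// One line of the partitioning section, without the ".segment-id" suffix.
struct PartitionRule {
    std::string pattern;
    int segmentId;
};

// Returns the segment of the first rule that matches, or -1 if none does.
int segmentFor(const std::vector<PartitionRule>& rules, const std::string& modulePath) {
    for (const PartitionRule& r : rules) {
        if (wildcardMatch(r.pattern, modulePath)) return r.segmentId;
    }
    return -1;
}
```

With the three rules from the configuration above, every submodule of one tandem queue ends up in the same segment, which matches the intent of wrapping each tandem into a compound module.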
CQN Partitioning in Tkenv
If the simulation runs under the GUI, placeholder modules and proxy gates are shown.
Running Parallel Simulation
If the GUI is used, the operation of the Null Message Algorithm can be followed in trace windows.
Experimental Results
The simulation framework presented here was used to verify the efficiency criterion for the Null Message Algorithm:
• LE >> 1 and λ=LE/P >> 1 are necessary for efficient PDES execution
• see the paper “A Practical Efficiency Criterion For The Null Message Algorithm”, András Varga, Y. Ahmet Şekerciuğlu, Gregory K. Egan, in the Proceedings
Ongoing Work
• Optimisations of the parallel simulation kernel
• Support for node mobility across LPs
• Testing large-scale IPv6 simulations (using the IPv6Suite for OMNeT++, developed at CTIE, Monash University, Australia)
• Further verification and refinement of the efficiency criteria for the Null Message Algorithm