
Parallel Discrete Event Simulation (PDES) at ORNL

An overview of the high-performance PDES kernel at ORNL and its applications to global and local events, emergencies, network simulation, and more. It describes the unique features of this scalable micro-kernel, including real-time execution with multiple synchronization modes, its diversity of applications, its scalable mixed-mode capabilities, and its support for current and future defense systems.

Presentation Transcript


  1. Parallel Discrete Event Simulation (PDES) at ORNL
  Kalyan S. Perumalla, Ph.D.
  Oak Ridge National Laboratory

  2. PDES: Selected application areas
  Application domains span global and local events, emergencies, protection and awareness systems, and current and future defense systems:
  • Network simulation: Internet protocols, security, P2P designs, …
  • Traffic simulation: emergency planning/response, environmental policy analysis, urban planning, …
  • Social dynamics simulation: operations planning, foreign policy, marketing, …
  • Sensor simulations: wide-area monitoring, situational awareness, border surveillance, …
  • Organization simulations: command and control, business processes, …

  3. High Performance PDES kernel requirements
  • Global time synchronization
    • Total time-stamped ordering of events
    • Paramount for accuracy
  • Fast synchronization
    • Scalable, application-independent time-advance mechanisms
    • Critical for real-time and as-fast-as-possible execution
  • Support for fine-grained events
    • Minimal overhead relative to event processing times
    • Application computation is typically only 5 µs to 50 µs per event
  • Conservative, optimistic, and mixed modes
    • Need support for the principal synchronization approaches
    • Useful to choose the mode on a per-entity basis at initialization
    • Desirable to vary the mode dynamically during simulation
  • General-purpose API
    • Reusable across multiple applications
    • Accommodates multiple techniques: lookahead, state saving, reverse computation, multicast, etc.
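
The requirements above can be made concrete with a small example. The sketch below is illustrative only, not the µsik API; the Event and LogicalProcess types and the numeric values are invented for this example. It shows time-stamp-ordered event processing and a lookahead-based conservative bound on how far a logical process may safely advance.

```cpp
// Illustrative sketch (not the µsik API): a logical process that executes
// events in timestamp order and computes the conservative "safe" time it may
// advance to, given the timestamps seen on its input channels plus lookahead.
#include <algorithm>
#include <cstdio>
#include <queue>
#include <vector>

struct Event { double ts; };                      // hypothetical event record
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const { return a.ts > b.ts; }
};

struct LogicalProcess {
    std::priority_queue<Event, std::vector<Event>, LaterFirst> fel;  // future event list
    std::vector<double> channel_time;  // last timestamp seen on each input channel
    double lookahead = 2.0;            // minimum delay this LP adds to events it sends

    // Conservative bound: no event with a smaller timestamp can still arrive.
    double safe_time() const {
        double t = *std::min_element(channel_time.begin(), channel_time.end());
        return t + lookahead;
    }

    // Process pending events strictly below the safe bound, in timestamp order.
    void advance() {
        while (!fel.empty() && fel.top().ts < safe_time()) {
            Event e = fel.top(); fel.pop();
            std::printf("executing event at t=%g\n", e.ts);  // stand-in for a user handler
        }
    }
};

int main() {
    LogicalProcess lp;
    lp.channel_time = {5.0, 7.0};                  // two input channels
    lp.fel.push({2.0}); lp.fel.push({6.5}); lp.fel.push({9.0});
    lp.advance();  // safe time = min(5, 7) + 2 = 7, so only t=2.0 and t=6.5 execute
}
```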

  4. µsik: a unique PDES "micro-kernel"
  (Charts: aggregate event rate in millions of events/s and µs per event vs. number of processors, 128 to 1024, for the configurations LP = 1 thousand / MSG = 1 million, LP = 1 million / MSG = 100 million, and LP = 1 million / MSG = 1 billion)
  Unique mixed-mode kernel
  • The only scalable mixed-mode kernel in the world
  • Supports conservative, optimistic, and mixed modes in a single kernel
  Used in a variety of applications
  • DES-based vehicular traffic models
  • DES-based plasma physics models
  • DES-based neurological models
  • Largest Internet simulations
  Some recent results of the fine-grained PDES benchmark (PHOLD):
  • Among the largest/fastest scalability results in parallel discrete event simulation
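
For readers unfamiliar with PHOLD: it is a synthetic workload in which an initial population of messages ("pucks") circulates among logical processes, each processed message spawning a new one destined for a random LP at a later timestamp. The toy, sequential rendition below uses invented parameters far smaller than the configurations cited above and has no parallelism; it is only meant to convey the workload's structure.

```cpp
// A toy, sequential rendition of the PHOLD workload (illustration only;
// the actual benchmark runs in parallel under µsik at far larger scale).
#include <cstdio>
#include <queue>
#include <random>
#include <vector>

struct Msg { double ts; int dest; };
struct LaterFirst { bool operator()(const Msg& a, const Msg& b) const { return a.ts > b.ts; } };

int main() {
    const int num_lps = 1000;      // number of logical processes (scaled-down example)
    const int num_msgs = 10000;    // initial message population ("pucks")
    const double end_time = 10.0;

    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick_lp(0, num_lps - 1);
    std::exponential_distribution<double> delay(1.0);  // mean inter-event delay of 1.0

    std::priority_queue<Msg, std::vector<Msg>, LaterFirst> fel;
    for (int i = 0; i < num_msgs; ++i) fel.push({delay(rng), pick_lp(rng)});

    long long processed = 0;
    while (!fel.empty() && fel.top().ts < end_time) {
        Msg m = fel.top(); fel.pop();
        ++processed;
        // Each processed message spawns one new message to a random LP,
        // a fixed increment (lookahead) plus a random delay later.
        fel.push({m.ts + 0.1 + delay(rng), pick_lp(rng)});
    }
    std::printf("processed %lld events up to t=%g\n", processed, end_time);
}
```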

  5. µsik scaled to over 10^4 processors
  (Charts: speedup, µs per event, parallel efficiency, and millions of events/s vs. number of processors, up to roughly 16,384, for conservative, mixed, and optimistic modes)
  • Some recent results of the fine-grained PDES benchmark
  • On Blue Gene Watson (BGW) at the IBM T.J. Watson Research Center
  • The well-known PHOLD benchmark, with 1 million logical processes and 10 million pucks
  • The largest and fastest scalability results in PDES recorded to date
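
The plotted quantities relate through the standard definitions below. These are textbook formulas, not values taken from the charts: speedup over a sequential run, parallel efficiency, and the aggregate event rate implied by the per-event cost on P processors.

```latex
\[
  S(P) = \frac{T_{\text{seq}}}{T_P}, \qquad
  E(P) = \frac{S(P)}{P}, \qquad
  \text{aggregate event rate} \;\approx\; \frac{P \times 10^{6}}{\mu\text{s per event}}
  \ \text{events/s}.
\]
```

Parallel efficiency near 1.0 therefore indicates near-linear scaling with processor count.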

  6. µsik micro-kernel internals
  (Diagram: user LPs inside the micro-kernel, each with its processed event list, future event list, and local virtual time; kernel processes rank their LPs in three priority queues ordered by the earliest committable, processable, and emittable time stamps)
  When to update the kernel queues?
  • A new LP is added or deleted
  • An LP executes an event
  • An LP receives an event
  Legend: LP = logical process; KP = kernel process; ECTS = earliest committable time stamp; EPTS = earliest processable time stamp; EETS = earliest emittable time stamp; PEL = processed event list; FEL = future event list; LVT = local virtual time
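
A hypothetical data layout mirroring the structures named on this slide is sketched below; the type names and the ranking logic are illustrative, and the real µsik internals differ in detail.

```cpp
// Hypothetical layout of the per-LP structures (FEL, PEL, LVT) and a kernel
// process that ranks its LPs by earliest timestamp; not actual µsik code.
#include <queue>
#include <vector>

struct Event { double ts; /* payload elided */ };
struct LaterFirst { bool operator()(const Event& a, const Event& b) const { return a.ts > b.ts; } };

struct LP {                                       // logical process
    std::priority_queue<Event, std::vector<Event>, LaterFirst> fel;  // future event list
    std::vector<Event> pel;                       // processed event list (kept until commit)
    double lvt = 0.0;                             // local virtual time

    double epts() const { return fel.empty() ? 1e300 : fel.top().ts; }  // earliest processable ts
};

struct KP {                                       // kernel process: schedules its LPs
    std::vector<LP*> lps;

    // The kernel re-ranks its LPs whenever one is added/deleted,
    // executes an event, or receives an event.
    LP* next_processable() const {
        LP* best = nullptr;
        for (LP* lp : lps)
            if (!best || lp->epts() < best->epts()) best = lp;
        return best;
    }
};

int main() {
    LP a, b;
    a.fel.push({3.0}); b.fel.push({1.5});
    KP kp; kp.lps = {&a, &b};
    return kp.next_processable() == &b ? 0 : 1;   // the LP with the smaller EPTS is ranked first
}
```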

  7. libSynk: µsik's synchronization core
  (Diagram: µsik processes run on top of the µsik layer, which uses libSynk; within libSynk, the TM Red and TM Null time-management modules and the RM Bar reduction module sit over the FM messaging transports, FM Myr, FM TCP, FM ShM, and FM MPI, which in turn use the OS/hardware and the network; an arrow from X to Y implies X uses Y)
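
The "TM Red" label suggests reduction-based time management. As an illustration of the general idea only (not the libSynk API), the sketch below computes a global lower bound on future event times with a single MPI_Allreduce; a production implementation must additionally account for messages still in transit.

```cpp
// Illustration only (not the libSynk API): a reduction-based global time
// bound computed with MPI_Allreduce. Real implementations must also handle
// transient (in-flight) messages before trusting the bound.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_next_event = 10.0 + rank;   // earliest local event (placeholder value)
    double local_lookahead  = 0.5;           // minimum delay on outgoing events
    double local_bound = local_next_event + local_lookahead;

    double global_bound = 0.0;               // no new event can precede this time
    MPI_Allreduce(&local_bound, &global_bound, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);

    std::printf("rank %d: global time bound = %g\n", rank, global_bound);
    MPI_Finalize();
}
```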

  8. µsik micro-kernel capabilities
  µsik is currently able to support the following:
  • Lookahead-based conservative and/or optimistic execution
  • Reverse computation-based optimistic execution
  • Checkpointing-based optimistic execution
  • Resilient optimistic execution (zero rollbacks)
  • Constrained, out-of-order execution
  • Preemptive event processing
  • Any combination of the above
  • Automated, network-throttled flow control
  • User-level event retraction
  • Process-specific limits on optimism
  • Dynamic process addition/deletion
  • Shared and/or distributed memory execution
  • Process-oriented views
  It accommodates the addition of the following:
  • Synchronized multicast
  • Optimistic dynamic memory allocation
  • Automated load balancing
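
Reverse computation, one of the capabilities listed above, can be illustrated with a trivial example (not µsik's API; the state type and handlers are invented): the reverse handler exactly undoes the forward handler, so rolling back this event type requires no saved state.

```cpp
// Reverse computation on a trivial state: the forward handler perturbs the
// state, and the reverse handler inverts it exactly, so rollback needs no
// checkpoint for this event type. Illustration only, not µsik's API.
#include <cassert>
#include <cstdint>

struct State { int64_t count; uint32_t rng; };

// A tiny invertible LCG step; 3435973837 is the multiplicative inverse of 5 mod 2^32.
static uint32_t rng_fwd(uint32_t x) { return x * 5u + 1u; }
static uint32_t rng_rev(uint32_t x) { return (x - 1u) * 3435973837u; }

void event_forward(State& s) { s.count += 1; s.rng = rng_fwd(s.rng); }
void event_reverse(State& s) { s.rng = rng_rev(s.rng); s.count -= 1; }

int main() {
    State s{0, 12345u}, original = s;
    event_forward(s);
    event_forward(s);
    event_reverse(s);   // rollback of the second event
    event_reverse(s);   // rollback of the first event
    assert(s.count == original.count && s.rng == original.rng);
}
```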

  9. SensorNet: Parallel simulation / immersive test-bed
  (Diagram: CB sensor network testbed framework, current and envisioned components: SimDriver, SimComm, TinyViz, script interpreter (Tython/Python), reflected simulation script, external event buffer, SAF, weather model/simulator (SCIPUFF), TOSSIM, communication models, plume models, sensor net simulator, operations simulator, live devices, and environmental model proxy)
  • Seamless integrated testbed to incorporate a variety of important simulations, stimulations, and live devices
  • Achieves unified capabilities and significant fidelity for test and evaluation of CB sensor device-based designs, concepts, and operations

  10. SensorNet: Simulation-based analysis for plume tracking
  (Figures: sensor network layout with nodes numbered 0 to 25 arranged in a grid, a base station, and a source release location; mean concentration (10^-6 kg/m^3) vs. time (s) traces for no-wind, windy, and turbulent conditions; surface dosage maps, CHEM at T = 17.0 h, in kg/cm^3 from 10^-2 to 10^-8, over latitude/longitude for no winds, light winds, and turbulent winds)
  • The environmental phenomenon exhibits high variability
  • The phenomenon drives the sensor network's computation and communication
  • The trace of the sensed phenomenon gathered at the base station reflects this high variability
  • Communication effects induce unpredictable gaps in the series
  • Accurate, integrated simulation of the phenomenon and the communication captures their complex interdependencies

  11. SCATTER: Ultra-scale PDES-based mobility simulations
  Our approach: SCATTER
  • DES models (vs. time-stepped)
  • Parallel execution (vs. sequential)
  • Scalability to high-performance computing: 10^2 to 10^3 CPUs
  • Important behaviors: kinetic + non-kinetic
  Goal: a scalable tool for transportation and energy/event/emergency research
  • Regional scale: multiple states, 10^6 to 10^7 intersections
  • Current tool capabilities: at most 10^4 intersections
  • Faster-than-real-time execution is very useful
  (Chart: speed vs. fidelity; network-flow methods are faster but lower in accuracy, good for rough estimates of evacuation delay; OREMS, CORSIM, MITSIM, and TRANSIMS are progressively slower but higher in accuracy, good for emissions estimates and traffic analysis; SCATTER targets the desirable fast, high-fidelity region)
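
The "DES vs. time-stepped" distinction can be illustrated with a toy example (not SCATTER's actual models; the link travel time and network size are invented): vehicles advance by scheduling their next intersection-arrival events, so work is proportional to activity rather than to the number of intersections times the number of time steps.

```cpp
// Toy event-driven vehicle movement (illustration only, not SCATTER's models):
// each vehicle is advanced by scheduling its next intersection arrival,
// so quiet intersections incur no cost, unlike a fixed-time-step sweep.
#include <cstdio>
#include <queue>
#include <vector>

struct Arrival { double ts; int vehicle; int intersection; };
struct LaterFirst { bool operator()(const Arrival& a, const Arrival& b) const { return a.ts > b.ts; } };

int main() {
    const double link_travel_time = 30.0;   // assumed seconds between intersections
    const int num_intersections = 5;
    const double end_time = 300.0;

    std::priority_queue<Arrival, std::vector<Arrival>, LaterFirst> fel;
    fel.push({0.0, 0, 0});                  // two vehicles entering the corridor
    fel.push({12.0, 1, 0});

    while (!fel.empty() && fel.top().ts < end_time) {
        Arrival a = fel.top(); fel.pop();
        std::printf("t=%5.1f  vehicle %d reaches intersection %d\n",
                    a.ts, a.vehicle, a.intersection);
        if (a.intersection + 1 < num_intersections)
            fel.push({a.ts + link_travel_time, a.vehicle, a.intersection + 1});
    }
}
```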

  12. SCATTER: Benchmark performance
  (Charts: speedup over a sequential run vs. number of processors, and event-processing speed in µs/event vs. number of vehicles, for networks of 9 and 1089 nodes; speedup over real time, i.e. real time / simulated time, vs. number of vehicles)
  • Significantly faster than real time with 1 million vehicles
  • Very low event-processing overhead (µs per event)
  • Significant speedup with parallel execution

  13. Contacts
  Kalyan S. Perumalla
  Oak Ridge National Laboratory
  (865) 241-1315
  perumallaks@ornl.gov
