130 likes | 145 Views
Explore uses of High Performance PDES kernel at ORNL for global and local events, emergencies, network simulation, and more. Discover the unique features of this scalable micro-kernel for real-time execution with various synchronization modes. Learn about its application diversity, scalable mixed-mode kernel capabilities, and future defense system support.
E N D
Parallel Discrete Event Simulation (PDES) at ORNL Kalyan S. Perumalla, Ph.D. Oak Ridge National Laboratory
PDES: Selected application areas Global and local events Emergencies • Network simulation • Internet protocols, security, P2P designs, … • Traffic simulation • Emergency planning/ response, environmental policy analysis, urban planning, … • Social dynamics simulation • Operations planning, foreign policy, marketing, … • Sensor simulations • Wide area monitoring, situational awareness, border surveillance, … • Organization simulations • Command and control, business processes, … Protection and awareness systems Current and future defense systems
High Performance PDES kernel requirements • Global time synchronization • Total time-stamped ordering of events • Paramount for accuracy • Fast synchronization • Scalable, application-independent, time-advance mechanisms • Critical for real-time and as-fast-as-possible execution • Support for fine-grained events • Minimal overhead relative to event processing times • Application computation is typically only 5 µs to 50 µs per event • Conservative, optimistic, and mixed modes • Need support for the principal synchronization approaches • Useful to choose mode on per-entity basis at initialization • Desirable to vary mode dynamically during simulation • General-purpose API • Reusable across multiple applications • Accommodates multiple techniques • Lookahead, state saving, reverse computation, multicast, etc.
70 3 5 60 3 0 50 2 5 2 0 40 Aggregate events/s (M) µs per event 1 5 30 1 0 20 5 10 0 0 0 1 2 8 2 5 6 3 8 4 5 1 2 6 4 0 7 6 8 8 9 6 1 0 2 4 128 256 512 1024 LP = 1 thousand, MSG = 1 million LP = 1 million, MSG = 100 million LP = 1 million, MSG = 1 billion Number of processors Number of processors LP = 1 thousand, MSG = 1 million LP = 1 million, MSG = 100 million LP = 1 million, MSG = 1 billion µsik—unique PDES “micro-kernel” Unique mixed-mode kernel • The only scalable mixed-mode kernel in the world • Supports conservative, optimistic, and mixed modes in a single kernel Used in a variety of applications • DES-based vehicular traffic models • DES-based plasma physics models • DES-based neurological models • Largest internet simulations • Some recent results of fine-grained PDES benchmark (phold) • Among the largest/fastest scalability results in parallel discrete event simulation
17000 50 Conservative Optimistic 13000 40 Conservative Mixed 30 Speedup 9000 µs per event Mixed 20 5000 Optimistic 10 1000 0 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 Number of processors Number of processors 1.4 600 Conservative Mixed Optimistic 1.2 500 Conservative 1.0 400 Mixed 0.8 Millions of events/s Parallel efficiency 300 Optimistic 0.6 200 0.4 100 0.2 0 0 2,048 4,096 8,192 16,384 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 Number of processors Number of processors µsik scaled to over 104 processors • Some recent results of fine-grained PDES benchmark • On Blue Gene Watson (BGW) at IBM TJ Watson Research Center • Well-known PHOLD benchmark, with 1 million logical processes, 10 million pucks • The largest and fastest scalability results in PDES recorded to date
µsik micro-kernel internals Micro-kernel User LPs Future event listProcessed event listLocal virtual time LP PEL→t LVT FEL→t Pc LP LP ECTS Q LP LP Commitable • When update kernel Q’s? • New LP added or deleted • LP executes an event • LP receives an event Pp EPTS Q Kernel LPs Processable KP LP = Logical processKP = Kernel processECTS = Earliest committable time stampEPTS = Earliest processable time stampEETS = Earliest emittable time stampPEL = Processed event listFEL = Future event listLVT = Local virtual time Pe KP KP EETS Q KP Emittable
libSynk: µsik’s synchronization core µsik TM µsik Process µsik Process µsik Process TM Red TM Null RM RM Bar libSynk FM OS/Hardware FM Myr FM TCP Network FM ShM FM MPI X Y Implies X uses Y
µsik micro-kernel capabilities • μsik is currently able to support the following: • Lookahead-based conservative and/or optimistic execution • Reverse computation-based optimistic execution • Checkpointing-based optimistic execution • Resilient optimistic execution (zero rollbacks) • Constrained, out-of-order execution • Preemptive event processing • Any combinations of the above • Automated, network-throttled flow control • User-level event retraction • Process-specific limits to optimism • Dynamic process addition/deletion • Shared and/or distributed memory execution • Process-oriented views • It accommodates addition of the following: • Synchronized multicast • Optimistic dynamic memory allocation • Automated load-balancing
SimDriver SimComm Tiny Viz Script Interpreter Tython Python Reflected Simulation Script External Event Buffer SensorNet: Parallel simulation/ immersive test-bed SAF Weather Model TOSSIM Comm. Models Plume Models SCIPUFF Weather Simulator Sensor Net Simulator Operations Simulator PlumeModels LiveDevices Environ-mentalModel Proxy CB Sensor Network Testbed Framework Current Envisioned • Seamless integrated testbed to incorporate a variety of important simulations, stimulations, and live devices • Achieves unified capabilities and significant fidelity for test and evaluation of CB sensor device-based designs, concepts, and operations
Sensor Network Layout 0 1 6 11 16 21 Base Station 2 7 12 17 22 3 8 13 18 23 Source Release 4 9 14 19 24 5 10 15 20 25 40 No Winds Winds Turbulent 30 20 10 0 Mean Concentration (10-6 kgs/m3) 40 No Winds Winds Turbulent 30 20 10 0 300 400 500 600 Time (s) SensorNet: Simulation-based analysis for plume tracking • Environmental phenomenon exhibits high variability • Phenomenon drives the sensor network’s computation and communication • Trace gathered at base station of sensed phenomenon reflects high variability • Communication effects induce unpredictable gaps in series • Accurate, integrated simulation of phenomenon and communication captures complex interdependencies 50.5 No Winds Kgs/cm3 48.5 1.00E-02 46.6 1.00E-03 1.00E-04 44.6 1.00E-05 1.00E-06 42.6 1.00E-07 Surface Dosage CHEM at T = 17.0 h 1.00E-08 40.6 50.5 Light Winds 48.5 46.6 Latitude 44.6 42.6 Surface Dosage CHEM at T = 17.0 h 40.6 50.5 Turbulent Winds 48.5 46.6 44.6 Surface Dosage CHEM at T = 17.0 h 42.6 40.6 -96.0 -91.2 -86.4 Longitude
SCATTER: Ultra-Scale PDES-based mobility simulations Our approach: SCATTER • DES models • vs time-stepped • Parallel execution • vs sequential • Scalability to high-performance computing • 102–103 CPUs • Important behaviors • Kinetic + non-kinetic • Scalable tool for transportation and energy/event/emergency research • Regional scale: multiple states • 106–107 intersections • Current tool capabilities • At most 104 intersections • Faster than real time is very useful Faster Network flow methods SCATTER Good for rough estimates of evacuation delay, . . . Speed OREMS Desirable Fidelity Lower accuracy Higher accuracy CORSIM Good for emissions estimates, traffic analysis, . . . MITSIM TRANSIMS Slower
Parallel Speedup Event processing speed Nodes = 9 Nodes = 1089 Speedup over sequential run µs/event Number of processors Number of vehicles SCATTER: Benchmark performance Speedup over real-time Significantly faster than real time with 1 million vehicles! Nodes = 9 Real time/simulated time Nodes = 1089 Number of vehicles Very low event processing overhead (ms)! Significant speedup with parallel execution!
Contacts Kalyan S. Perumalla Oak Ridge National Laboratory (865) 241-1315 perumallaks@ornl.gov 13 Perumalla_PDES_0611