Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations. Qi (Jacky) Liu and Gabriel Wainer Department of Systems and Computer Engineering Carleton University Ottawa, Canada. Outline. Motivation & Background. Fine-Grained Event Parallelism. Event Processing Kernel.
Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations Qi (Jacky) Liu and Gabriel Wainer Department of Systems and Computer Engineering Carleton University Ottawa, Canada
Outline • Motivation & Background • Fine-Grained Event Parallelism • Event Processing Kernel • Parallel DEVS Simulation on Cell • Experimental Results • Conclusion & Future Work
Motivation • Accelerate general-purpose DEVS-based simulations on heterogeneous CMP architectures like the Cell processor • Develop new parallelization strategies based on fine-grained event-level parallelism inherent in the simulation process • Exploit multi-grained parallelism simultaneously at different levels of the system • Allow general users to gain performance transparently w/o being distracted by multicore programming details • Provide some generalizable methods & insight for PDES on emerging CMP architectures
Cell Processor Overview • Nine-core heterogeneous CMP with two distinct ISAs • Software-managed LS with explicitly-addressed DMA transfer • Low-latency EIB channels – 32-bit mailbox & signal messages
Discrete-EVent System Specification (DEVS) • Parallel DEVS (P-DEVS) Formalism • Cell-DEVS Formalism
Parallel Simulation with CD++ • Structured Simulation Process • Flat LP Structure • (I) LP and model init. • (@) model output • (*) model state trans. • (D) model sync. • (X) model input data • (Y) model output data
Fine-Grained Event Parallelism • Event-embarrassing parallelism • Independent events within a step • Executed in an arbitrary order • Event-streaming parallelism • Causally-related events between consecutive steps • Executed in a pipelined fashion • Phase-changing events • Exchanged between NC & FC • Natural fork & join points • Data-flow oriented parallelization
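The fork-and-join pattern behind event-embarrassing parallelism can be sketched as follows. This is a minimal illustration, not the CD++/Cell implementation: Python threads stand in for SPEs, and `transition`, `run_step`, and the sample data are hypothetical names and values introduced only for this example. It assumes each event touches only its own simulator's state, which is what makes the execution order irrelevant.

```python
from concurrent.futures import ThreadPoolExecutor


def transition(state: float, x: float) -> float:
    """Hypothetical stand-in for a compute-intensive state transition."""
    return state + 2.0 * x


def run_step(states: dict, events: dict, workers: int = 4) -> dict:
    """Process all independent events of one step concurrently, then join."""
    new_states = dict(states)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fork: every event reads and writes only its own simulator's state,
        # so the events may run in any order.
        futures = {sim: pool.submit(transition, states[sim], x)
                   for sim, x in events.items()}
        # Join: the next phase may start only after every result is in.
        for sim, fut in futures.items():
            new_states[sim] = fut.result()
    return new_states


states = {0: 2.0, 1: 4.0, 2: 8.0}   # simulator id -> state
events = {0: 1.0, 2: 3.0}           # simulator id -> input for this step
result = run_step(states, events)
```

Simulators without an event in the step (id 1 here) keep their state, and reversing the order of `events` gives the same `result`.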
Event Processing Kernel • Hydrological Watershed Simulation • 320×320×2 with 204,800 Simulators • Compute-intensive state transitions • Over 300 million events across 663 phases • Cell-DEVS model defined in CD++ spec. lang. • Simulation Profile on the PPE • SEK: Concurrent exec. across SPEs - 98.02% (event-embarrassing parallelism) • Pipelined exec. between PPE & SPEs - 1.15% (event-streaming parallelism)
Parallel DEVS Simulation on Cell – Overview • EVENT-STREAMING PARALLELISM (TWO-STAGE PIPELINE) • COMPUTE-I/O PARALLELISM • THREAD PARALLELISM • VECTOR PARALLELISM (SPE SIMD) • EVENT-EMBARRASSING PARALLELISM • DATA-STREAMING PARALLELISM (DOUBLE-BUFFERED DMA AT THREE LAYERS)
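The idea of a two-stage pipeline with a bounded buffer between the stages can be sketched as below. This is only an analogy under stated assumptions: Python threads stand in for the PPE (producer) and an SPE (consumer), a `queue.Queue` of size one stands in for the hand-off buffer, and no real DMA is involved. The names `producer`, `consumer`, and `batches` are hypothetical.

```python
import queue
import threading


def producer(batches, q):
    """Stage 1: prepare the next batch while stage 2 works on the previous one."""
    for batch in batches:
        q.put(batch)      # blocks if the hand-off buffer is still full
    q.put(None)           # sentinel: no more batches


def consumer(q, results):
    """Stage 2: process batches in arrival order until the sentinel shows up."""
    while (batch := q.get()) is not None:
        results.append(sum(x * x for x in batch))


batches = [[1, 2], [3, 4], [5, 6]]
q = queue.Queue(maxsize=1)    # one batch in flight, one being processed
results = []
stage1 = threading.Thread(target=producer, args=(batches, q))
stage2 = threading.Thread(target=consumer, args=(q, results))
stage1.start()
stage2.start()
stage1.join()
stage2.join()
```

With one producer and one consumer the queue preserves batch order, so `results` ends up as `[5, 25, 61]`.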
Parallel DEVS Simulation on Cell – LP Virtualization • Purpose • Map active Simulators to a limited group of SPE threads • Fit into the small on-chip LS • Assign each SPE a reusable task operating on a stream of data • Facilitate fine-grained dynamic load-balancing between SPEs • Solution • Turn Simulators (and associated atomic models) into virtual LPs • Separate event-processing logic (wrapped in SPE threads) from state data (maintained in main memory buffers) • Match the states of active Simulators to available SPE threads dynamically at each virtual time – SEK job scheduling
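The separation described above can be sketched roughly as follows: state data sits in a shared table (standing in for main-memory buffers), while a single stateless kernel function (standing in for the reusable SPE task) is applied to whichever simulators are active at the current virtual time. All names here (`SimState`, `kernel`, `schedule`) are hypothetical, the transition rule is a placeholder, and the simple modulo assignment is just one possible matching policy.

```python
from dataclasses import dataclass


@dataclass
class SimState:
    """State of one virtual Simulator, kept in a shared table."""
    value: float
    active: bool = False


def kernel(state: SimState, t: float) -> None:
    """Reusable, stateless event-processing task (placeholder transition)."""
    state.value += t
    state.active = False


def schedule(states: list, n_workers: int) -> list:
    """Match the currently active simulators to the available workers."""
    jobs = [[] for _ in range(n_workers)]
    active = [i for i, s in enumerate(states) if s.active]
    for k, sim_id in enumerate(active):
        jobs[k % n_workers].append(sim_id)
    return jobs


# Ten simulators, of which ids 0, 3, 6, 9 are active at this virtual time.
states = [SimState(0.0, active=(i % 3 == 0)) for i in range(10)]
jobs = schedule(states, n_workers=2)
for worker_jobs in jobs:            # on Cell, each list would go to one SPE
    for sim_id in worker_jobs:
        kernel(states[sim_id], t=1.0)
```

Because the kernel carries no per-simulator state, any worker can process any simulator, which is what allows the matching to be redone at every virtual time.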
Parallel DEVS Simulation on Cell – More Details • Virtual Simulator State Mgmt. • Decentralized Event Mgmt.
Parallel DEVS Simulation on Cell – More Details • Rule Evaluation on SPEs • SEK Job Scheduling
Platform and Configuration • IBM BladeCenter QS22 • 3.2GHz PowerXCell 8i × 2 • 32GB RAM • Red Hat Enterprise Linux 5.2 • IBM SDK for Multicore Acceleration 3.1 • Parallel DEVS simulator on Cell: CD++/Cell • SEK job scheduling policy: round-robin or shortest-queue-first • CD++ event-logging turned off to minimize the impact of file I/O
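The two scheduling policies named on this slide can be illustrated with a small sketch. The slides do not define them, so this reads "round-robin" as cycling through the queues in order and "shortest-queue-first" as giving each job to the queue with the fewest pending jobs. Both readings, and the function names, are assumptions made for illustration.

```python
from itertools import cycle


def round_robin(jobs, n_queues):
    """Deal jobs to the queues in strict rotation, ignoring queue length."""
    queues = [[] for _ in range(n_queues)]
    for job, q in zip(jobs, cycle(range(n_queues))):
        queues[q].append(job)
    return queues


def shortest_queue_first(jobs, queues):
    """Append each job to the currently shortest queue (ties: lowest index)."""
    for job in jobs:
        min(queues, key=len).append(job)
    return queues


rr = round_robin([1, 2, 3, 4], n_queues=3)
# Queues that already hold pending work of different lengths.
sqf = shortest_queue_first([1, 2, 3, 4], [["a", "b", "c"], [], ["d"]])
```

Round-robin is cheaper to compute, while the shortest-queue variant steers new jobs away from workers that are already backed up.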
Total Simulation Time with Watershed Model • Performance gain with just one SPE: 5.84× • OO C++ code on PPE vs. SIMD-aware C code on SPEs • memory latency & cache miss vs. data locality & double-buffered DMA • Low-level optimizations on SPEs (LS data alignment, call stack usage, branch minimization, loop unrolling, in-line substitution, pipelined event execution) • Overall performance with 8 SPEs: 33.06×
Speedups over (PPE with 1 SPE) Version • Speedup grows more slowly as SPEs are added • Higher overhead for SEK job scheduling and orchestration • Increased DMA contention & channel stalls
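As a rough cross-check, the relative speedup of the 8-SPE run over the 1-SPE run can be derived from the two figures on the previous slide. This assumes both the 5.84× and the 33.06× gains are measured against the same PPE-only baseline, which the slides do not state explicitly. The variable names are illustrative.

```python
gain_1_spe = 5.84     # from the slides: PPE with 1 SPE
gain_8_spe = 33.06    # from the slides: PPE with 8 SPEs

# Assumption: both gains share one baseline, so their ratio is the
# speedup of the 8-SPE version over the 1-SPE version.
relative_speedup = gain_8_spe / gain_1_spe
efficiency = relative_speedup / 8     # fraction of ideal 8x scaling

print(f"{relative_speedup:.2f}x, {efficiency:.1%}")   # prints "5.66x, 70.8%"
```

Under that assumption, eight SPEs give about 5.66× over one SPE, or roughly 71% of linear scaling, which is consistent with the sub-linear growth noted above.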
Conclusion • Formalism-Based Design Methodology • Facilitate model reuse & portability • Reduce validation & verification cost • Performance-Centric Approach • Accelerate event processing for compute-intensive DEVS models • Minimize communication & synchronization overhead • Achieve fine-grained dynamic load balancing • New Parallelization Strategy for PDES • Exploit fine-grained event parallelism from a data-flow perspective • Combine multi-grained parallelism at different system levels • Break LP boundaries with LP virtualization • Insight for PDES on Heterogeneous CMP Architectures • Match workload characteristics to functional specialization of cores • Address data locality, memory latency, & code optimization issues
Future Work • Porting different types of models to Cell for performance testing • Transparency • Minimal knowledge (and learning curve) required from users • Integrating with existing conservative/optimistic approaches • Combine with cluster-level LP-based conservative simulation • Using both synchronous & asynchronous algorithms • Combine with cluster-level Time Warp optimistic simulation • Using Lightweight Time Warp (DS-RT 2008, PADS 2009) • Testing on large-scale hybrid supercomputers • Using Cell processor in new ways
Questions? This research was supported in part by the MITACS Accelerate Ontario program, Canada, and by the IBM T. J. Watson Research Center, NY. liuqi@sce.carleton.ca http://www.sce.carleton.ca/~liuqi/ ARS Lab: http://cell-devs.sce.carleton.ca/ars/
Some Applications • Defense & Emergency Planning • Battlefield Simulations • Crowd Behavior & Evacuation Analysis
Some Applications • Biomedical & Environmental Analysis • Deformable Membrane • Presynaptic Nerve • Krebs Cycle in living organisms • Forest fire propagation • Watershed formation