Efficient I/O Using Dedicated Cores in Large-Scale HPC Simulations
Matthieu Dorier, ENS Cachan Brittany, IRISA, matthieu.dorier@irisa.fr
Challenges
• HPC simulations running on over 100,000 cores
• Petabytes of data to be stored for subsequent visualization and analysis
• Heavy pressure on the file system
• Huge storage space requirements
• How to efficiently write large data?
• How to efficiently retrieve scientific insights?

The Damaris Approach: Dedicated I/O Cores

[Figure: a multicore SMP node. The compute cores share memory with a dedicated core, which handles asynchronous storage to files and in-situ visualization. Image by Roberto Sisneros.]

• Damaris: Leave a core, go faster!
• Dedicate one or a few cores per SMP node
• Communicate data through shared memory
• Through an adaptable plugin system, use this core to asynchronously…
    • Process, compress and aggregate the data
    • Write it to files
    • Analyze it while the simulation runs
    • Transparently connect to visualization backends
• Check out an online demo from your smartphone!

Fortran usage example:

    program example
      real, dimension(64,16,2) :: my_data
      ...
      call dc_initialize("my_config.xml", …)
      call dc_write("temperature", my_data, ierr)
      call dc_signal("do_statistics", ierr)
      call dc_end_iteration(ierr)
      call dc_finalize(ierr)
      ...
    end program example

Matching XML configuration:

    <mesh name="my3Dgrid" type="rectilinear">
      <coordinate name="x"/>
      <coordinate name="y"/>
    </mesh>
    <layout name="my3Dlayout" dim="3,5*N/2"/>
    <variable name="temperature" layout="my3Dlayout" mesh="my3Dgrid"/>
    <event name="do_statistics" library="mylib.so" action="my_function"/>
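The dedicated-core idea described above can be illustrated with a small sketch. This is not Damaris and does not use its API: it is a Python analogy, assuming that a single background thread stands in for the dedicated core and that a `queue.Queue` stands in for the shared-memory segment. The names `dedicated_writer` and `run_simulation`, and the output file naming, are hypothetical.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path

_STOP = object()  # sentinel telling the writer that no more data will arrive


def dedicated_writer(inbox: queue.Queue, out_dir: Path) -> None:
    """Stand-in for the dedicated core: drain the inbox and write files."""
    while True:
        item = inbox.get()
        if item is _STOP:
            break
        iteration, name, data = item
        path = out_dir / f"{name}_{iteration:04d}.json"
        path.write_text(json.dumps(data))


def run_simulation(iterations: int, out_dir: Path) -> None:
    """Stand-in for the compute cores: produce data, hand it off, keep going."""
    inbox: queue.Queue = queue.Queue()
    writer = threading.Thread(target=dedicated_writer, args=(inbox, out_dir))
    writer.start()
    for it in range(iterations):
        temperature = [float(it + i) for i in range(8)]  # fake compute step
        inbox.put((it, "temperature", temperature))  # hand-off, no file I/O here
    inbox.put(_STOP)
    writer.join()  # wait for pending writes before the run ends


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_simulation(3, Path(tmp))
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

The point of the design is that the compute loop never touches the file system: it only enqueues data, so slow or jittery writes are absorbed by the writer rather than by the simulation.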
• Experiments on up to 9216 cores of Kraken with the CM1 atmospheric simulation
• Damaris…
    • Achieves nearly perfect scalability, with a more than 3x speedup compared to collective I/O
    • Improves the aggregate throughput by a factor of 15 compared to collective I/O
    • Completely hides I/O performance variability from the point of view of the simulation
    • Improves the overall application run time
    • Aggregates data into bigger files and allows an overhead-free 600% compression ratio
    • Spares time in dedicated cores that can be used for in-situ visualization
• Experiments on 816 cores of Grid'5000 with the Nek5000 CFD code
• Damaris…
    • Completely hides the run-time impact of in-situ visualization and analysis within dedicated cores
    • Provides interactivity through a connection with the VisIt software

[Figure: Nek5000 runs. Image by Matthieu Dorier.]
• All cores used by Nek5000, no visualization.
• Without Damaris: the VisIt visualization software connects to the running simulation to perform in-situ visualization.
• With Damaris: the dedicated cores are used to perform in-situ visualization in the background, without impacting Nek5000.
• 1 core out of 24 in each node is dedicated. No visualization.

[Figure: CM1 images by Leigh Orf.]

Damaris was evaluated on:
• JaguarPF (ORNL)
• Kraken (NICS)
• Intrepid (ANL)
• BlueWaters (NCSA)
• Grid'5000

Damaris constitutes one of the first results of the JLPC to be validated for use on BlueWaters.

http://damaris.gforge.inria.fr
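As a rough worked example of why giving up 1 core out of 24 can still shorten the run, the sketch below computes a break-even point. The model is an assumption and does not come from the poster: compute time is taken to scale inversely with the number of compute cores, and the dedicated cores are taken to hide I/O entirely. The function name `break_even_io_ratio` is hypothetical.

```python
def break_even_io_ratio(cores_per_node: int, dedicated: int = 1) -> float:
    """I/O-to-compute time ratio above which dedicating cores pays off.

    Assumed model (not from the poster): compute time scales inversely with
    the number of compute cores, and the dedicated cores hide I/O entirely.
    """
    if not 0 < dedicated < cores_per_node:
        raise ValueError("need 0 < dedicated < cores_per_node")
    compute_cores = cores_per_node - dedicated
    # Without dedicated cores: T  = Tc + Tio
    # With dedicated cores:    T' = Tc * cores_per_node / compute_cores
    # T' < T  <=>  Tio / Tc > dedicated / compute_cores
    return dedicated / compute_cores


if __name__ == "__main__":
    print(f"{break_even_io_ratio(24):.4f}")
```

Under these assumptions, with 24 cores per node and one dedicated core, the trade is favorable as soon as synchronous I/O would cost more than 1/23 of the compute time.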