Lessons Learned From the MPI-Hybrid Parallelism for Streamlines on Large Multi-Core Clusters Project.
Lessons Learned From the MPI-Hybrid Parallelism for Streamlines on Large Multi-Core Clusters Project
E. WES BETHEL (LBNL), CHRIS JOHNSON (UTAH), KEN JOY (UC DAVIS), SEAN AHERN (ORNL), VALERIO PASCUCCI (LLNL), JONATHAN COHEN (LLNL), MARK DUCHAINEAU (LLNL), BERND HAMANN (UC DAVIS), CHARLES HANSEN (UTAH), DAN LANEY (LLNL), PETER LINDSTROM (LLNL), JEREMY MEREDITH (ORNL), GEORGE OSTROUCHOV (ORNL), STEVEN PARKER (UTAH), CLAUDIO SILVA (UTAH), XAVIER TRICOCHE (UTAH), ALLEN SANDERSON (UTAH), HANK CHILDS (LLNL)
David Camp (IDAV)
www.vacet.org

MPI-Hybrid
• Other VACET projects have shown good performance gains with MPI-Hybrid
• This project explored the MPI-Hybrid style with two standard streamline algorithms: Load on Demand (LOD) and Static Domains
• This talk covers some of the problems encountered and the performance gains achieved

Baseline test for MPI-Hybrid
• Original MPI test
  • Ran in 100 seconds on 128 cores
• First MPI-Hybrid test
  • Ran in ~20,000 seconds on 128 cores
• Final MPI-Hybrid test, after many fixes
  • Ran in 15 seconds on 128 cores

VTK
• At the heart of the VTK pipeline is the data time stamp
  • It drives VTK's data flow model
• Every action in VTK changes the data time stamp
  • vtkTimeStamp::Modified()
• A small test found
  • Modified() was called ~1,000,000 times
  • A pthread_mutex_lock protects the time stamp

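The cost pattern described on this slide can be sketched as follows. This is a minimal illustration and not VTK's code: `LockedClock`, `AtomicClock`, and `hammer` are hypothetical names, and the atomic counter is only one possible alternative to a mutex, not necessarily the fix used in this project. Both clocks hand out increasing time values, but the locked version serializes every caller on one mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical global "modified time" counter guarded by a mutex.
// Every tick from every thread contends for the same lock.
class LockedClock {
public:
    std::uint64_t tick() {
        std::lock_guard<std::mutex> guard(mutex_);
        return ++time_;
    }
private:
    std::mutex mutex_;
    std::uint64_t time_ = 0;
};

// Hypothetical lock-free variant: one atomic increment per tick.
class AtomicClock {
public:
    std::uint64_t tick() {
        return time_.fetch_add(1, std::memory_order_relaxed) + 1;
    }
private:
    std::atomic<std::uint64_t> time_{0};
};

// Tick a clock from several threads and return how many ticks were issued.
template <typename Clock>
std::uint64_t hammer(Clock& clock, int threads, int ticks_per_thread) {
    assert(threads > 0 && ticks_per_thread >= 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&clock, ticks_per_thread] {
            for (int i = 0; i < ticks_per_thread; ++i) clock.tick();
        });
    }
    for (auto& worker : pool) worker.join();
    return clock.tick() - 1;  // ticks issued before this final call
}
```

Timing `hammer` on each clock with a million ticks gives a rough feel for how much a per-call lock costs once several threads share one counter.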
Crashing in VTK
• VTK: thread safe?
  • The documentation said it was thread safe
  • The crashes looked like memory corruption
• VTK documents say thread safe
  • But many functions were documented as "Not Thread Safe"
  • Some as "This Method is Thread Safe if first called from a Single Thread and the dataset is not Modified"
• The real answer: VTK is not thread safe
  • vtkObjectBase did not protect its reference count variable, so concurrent updates to the count were lost
  • Memory was being deleted before its lifetime had truly ended

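The reference-count problem can be illustrated with a small sketch. It is not VTK's vtkObjectBase: `RefCounted`, `retain`, `release`, and `share_across_threads` are hypothetical names. The count here is a `std::atomic<int>`, so concurrent increments and decrements are never lost; with a plain `int` the same workload could drop updates and delete the object while other threads still hold it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical reference-counted object with an atomic count.
class RefCounted {
public:
    explicit RefCounted(std::atomic<int>* destroyed) : destroyed_(destroyed) {}

    void retain() { count_.fetch_add(1, std::memory_order_relaxed); }

    void release() {
        const int previous = count_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);
        if (previous == 1) delete this;  // last reference is gone
    }

private:
    ~RefCounted() { destroyed_->fetch_add(1); }  // record each destruction

    std::atomic<int> count_{1};  // the creator holds the first reference
    std::atomic<int>* destroyed_;
};

// Share one object across threads; returns how many times it was destroyed.
int share_across_threads(int threads, int rounds) {
    std::atomic<int> destroyed{0};
    RefCounted* object = new RefCounted(&destroyed);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        object->retain();  // reference owned by this worker
        pool.emplace_back([object, rounds] {
            for (int i = 0; i < rounds; ++i) {
                object->retain();
                object->release();
            }
            object->release();  // drop the worker's reference
        });
    }
    object->release();  // drop the creator's reference
    for (auto& worker : pool) worker.join();
    return destroyed.load();
}
```

A correct count destroys the object exactly once, after the last holder releases it.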
C++ Exceptions Across Shared Libraries
• The streamline code used an exception to handle a data boundary condition
• Linux used a pthread_mutex_lock to handle this exception
• The code was changed to remove the exception

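One way to remove an exception from a hot path is to return a status value instead. The sketch below is an assumption about the general shape of such a change, not the project's actual code: `Bounds`, `StepStatus`, `advance`, and `integrate` are hypothetical names, and the integration is reduced to one dimension with a constant velocity.

```cpp
#include <cassert>

// Hypothetical 1-D domain extent.
struct Bounds {
    double lo;
    double hi;
};

// Status code used in place of a thrown exception.
enum class StepStatus { Ok, LeftDomain };

// Take one step; report leaving the domain through the return value.
StepStatus advance(double& x, double velocity, double dt, const Bounds& domain) {
    const double next = x + velocity * dt;
    if (next < domain.lo || next > domain.hi) return StepStatus::LeftDomain;
    x = next;
    return StepStatus::Ok;
}

// Integrate until the point leaves the domain or max_steps is reached.
// Returns the number of steps that stayed inside the domain.
int integrate(double x, double velocity, double dt, const Bounds& domain, int max_steps) {
    assert(dt > 0.0 && max_steps >= 0);
    int steps = 0;
    while (steps < max_steps && advance(x, velocity, dt, domain) == StepStatus::Ok) {
        ++steps;
    }
    return steps;
}
```

The boundary condition becomes an ordinary branch, so no exception machinery runs when a streamline leaves its data block.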
VTK – Object Creation
• VTK forces you to use its New function
• VTK uses a factory method pattern
  • vtkObjectFactory
  • Used to override VTK classes with custom versions
  • It uses strcmp to match objects
• strcmp was the most-called function in the streamlines test

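Why string matching can dominate a profile is easy to see in a toy model. The sketch below is not vtkObjectFactory's implementation: `OverrideFactory`, `has_override`, and `compares_for` are hypothetical names, and the linear scan is an assumption made for illustration. Each object creation checks the requested class name against every registered override, so the number of `strcmp` calls grows with creations times registered names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical factory that matches class names by string comparison.
struct OverrideFactory {
    std::vector<const char*> overridden;  // names with custom versions
    std::size_t compares = 0;             // strcmp calls made so far

    bool has_override(const char* class_name) {
        for (const char* name : overridden) {
            ++compares;
            if (std::strcmp(name, class_name) == 0) return true;
        }
        return false;
    }
};

// Count the strcmp calls caused by creating the same class repeatedly.
std::size_t compares_for(OverrideFactory& factory, const char* class_name, int creations) {
    assert(creations >= 0);
    factory.compares = 0;
    for (int i = 0; i < creations; ++i) factory.has_override(class_name);
    return factory.compares;
}
```

In this model a class with no override pays the full scan on every creation, which is the common case when only a few classes are overridden.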
I/O
• Found that I/O was better in the MPI-only runs
  • They were doing multiple I/O operations concurrently by default, by running four processes per node
• Changed the streamline code to thread its I/O

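Threaded I/O inside a single process can be sketched with `std::async`. This is a minimal illustration under stated assumptions, not the project's code: `load_block` is a hypothetical stand-in that fabricates a block in memory instead of reading a file, and `load_blocks_threaded` is a hypothetical helper that issues all loads concurrently and then collects them in order.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical loader: stands in for reading one data block from disk.
std::vector<double> load_block(int block_id, std::size_t values) {
    return std::vector<double>(values, static_cast<double>(block_id));
}

// Start every load on its own thread, then gather results in request order.
std::vector<std::vector<double>> load_blocks_threaded(const std::vector<int>& block_ids,
                                                      std::size_t values) {
    assert(values > 0);
    std::vector<std::future<std::vector<double>>> pending;
    for (int id : block_ids) {
        pending.push_back(std::async(std::launch::async, load_block, id, values));
    }
    std::vector<std::vector<double>> blocks;
    for (auto& result : pending) blocks.push_back(result.get());
    return blocks;
}
```

With real file reads, several requests are in flight at once from one process, which is roughly what four MPI processes per node provided by default.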
Conclusion – Hard Work Pays Off
• Original MPI test
  • Run on Jaguar
  • 100 seconds (10,000 streamlines, 128 cores)
• Original MPI test with code improvements
  • Run on Franklin
  • 45 seconds (20,000 streamlines, 128 cores)
• MPI-Hybrid test
  • Run on Franklin
  • 15 seconds (20,000 streamlines, 128 cores)
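
As a worked comparison using only the numbers above: the two Franklin runs share the same machine, streamline count, and core count, so their times can be compared directly. The Jaguar run used a different machine and half as many streamlines, so it is left out of the ratio.

```latex
\[
\text{speedup}_{\text{Franklin}} = \frac{45\ \text{s}}{15\ \text{s}} = 3,
\qquad
\frac{20{,}000\ \text{streamlines}}{15\ \text{s}} \approx 1333\ \text{streamlines/s}
\quad\text{vs.}\quad
\frac{20{,}000\ \text{streamlines}}{45\ \text{s}} \approx 444\ \text{streamlines/s}
\]
```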