Optimization and evaluation of parallel I/O in BIPS3D parallel irregular application. Performance Modelling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS 2007). Rosa Filgueira, David E. Singh, Florin Isaila, Jesús Carretero, Antonio G. Loureiro.
Summary • I. Description of the problem • II. Main objectives • III. Parallel I/O storage • IV. Evaluation • V. Optimizing the I/O • VI. Conclusions
I. Description of the problem (I) • BIPS3D is a semiconductor device simulator based on finite element methods. • This work optimizes and evaluates parallel I/O for BIPS3D.
I. Description of the problem (II) • The mesh is divided into several sub-domains (METIS). • Each processor computes only on its local data. • The results are stored sequentially. • This sequential storage is a major bottleneck.
II. Main objectives (I) • Objectives: • Evaluating the cost of sequential I/O. • Implementing parallel I/O techniques. • Developing a method for selecting the most appropriate I/O technique based on the network type, mesh size and data set size. • Introducing a new data clustering technique called Interval Data Grouping (IDG).
II. Main objectives (II) • Several I/O configurations have been implemented and evaluated: • Sequential I/O over NFS. • Sequential I/O over PVFS. • Parallel I/O over PVFS (unoptimized). • Parallel I/O over PVFS with two-phase I/O. • Parallel I/O over PVFS with List I/O.
III. Parallel I/O • All processors write their local data to disk. • Each processor constructs a view over the file using the distribution provided by METIS. • [Figure: a 20-node mesh (nodes 1 to 20), the METIS distribution for partitions 0 and 1, and the resulting file views.] • View over the file for processor 1: 1 2 4 8 9 11 12 13 14 15 • View over the file for processor 2: 3 5 6 7 10 16 17 19 20
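The file-view idea can be illustrated without MPI: each processor maps its METIS-assigned node numbers to byte offsets in the shared file and writes only at those offsets. The sketch below is a minimal illustration, not the BIPS3D implementation. It assumes 1-based node numbers, one 8-byte value per node, and an in-memory `bytearray` standing in for the file. The names `RECORD_SIZE`, `view_offsets`, `write_through_view` and `read_node` are hypothetical.

```python
import struct

RECORD_SIZE = 8  # assumption: one float64 result per mesh node


def view_offsets(nodes):
    """Byte offset in the shared file of each (1-based) node in a view."""
    return [(n - 1) * RECORD_SIZE for n in nodes]


def write_through_view(shared_file, nodes, values):
    """Write one processor's local values at the offsets its view selects."""
    for offset, value in zip(view_offsets(nodes), values):
        shared_file[offset:offset + RECORD_SIZE] = struct.pack("<d", value)


def read_node(shared_file, node):
    """Read back the value stored for a (1-based) node."""
    offset = (node - 1) * RECORD_SIZE
    return struct.unpack("<d", shared_file[offset:offset + RECORD_SIZE])[0]


# Views taken from the slide's example (20-node mesh).
view_p1 = [1, 2, 4, 8, 9, 11, 12, 13, 14, 15]
view_p2 = [3, 5, 6, 7, 10, 16, 17, 19, 20]

shared_file = bytearray(20 * RECORD_SIZE)  # stand-in for the file on disk
# Each processor stores its node number as a dummy result value.
write_through_view(shared_file, view_p1, [float(n) for n in view_p1])
write_through_view(shared_file, view_p2, [float(n) for n in view_p2])
```

Because the two views are disjoint, the processors never write to the same bytes. In a real MPI-IO program the offsets would typically be described once with a derived datatype and installed as the file view rather than computed for every write.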
IV. Evaluation (I) • We ran tests with: • Different networks (Myrinet and Fast Ethernet). • Different meshes.
IV. Evaluation (II) • A parameter (Load) is used to increase the size of the mesh. • Note that this parameter also changes the grain size of the accesses. • [Figure: example meshes.] • Mesh: 1 2 3 4 5 • Mesh with load 2: 1 1’ 2 2’ 3 3’ 4 4’ 5 5’ • Mesh with load 3: 1 1’ 1’’ 2 2’ 2’’ 3 3’ 3’’ 4 4’ 4’’ 5 5’ 5’’
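The effect of Load on access granularity can be sketched as follows. This is an illustration under stated assumptions: every node is replicated `load` times, the replicas are stored next to each other in the file, all replicas of a node are written by the processor that owns the node, and each value takes 8 bytes. The function names are hypothetical.

```python
def apply_load(mesh, load):
    """Replicate each node `load` times, keeping replicas adjacent."""
    return [(node, replica) for node in mesh for replica in range(load)]


def contiguous_runs(owned_nodes):
    """Group a sorted list of node numbers into runs of consecutive nodes."""
    runs = []
    for n in owned_nodes:
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)
        else:
            runs.append([n])
    return runs


def access_sizes(owned_nodes, load, record_size=8):
    """Bytes written per contiguous access for one processor's nodes."""
    return [len(run) * load * record_size
            for run in contiguous_runs(owned_nodes)]


mesh = [1, 2, 3, 4, 5]
loaded = apply_load(mesh, 2)  # 1 1' 2 2' 3 3' 4 4' 5 5'

# A processor owning nodes 1, 2 and 4 issues two accesses at any load;
# only the size of each access grows with the load.
sizes_load_1 = access_sizes([1, 2, 4], load=1)
sizes_load_3 = access_sizes([1, 2, 4], load=3)
```

Under these assumptions the number of non-contiguous accesses stays constant while each access becomes `load` times larger, which is the change in grain size the slide refers to.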
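IV. Evaluation: Myrinet • [Figure; panel titles: Two-phase I/O, List I/O]
IV. Evaluation: Fast Ethernet • [Figure; panel title: List I/O]
IV. Evaluation: Decision tree • [Figure: decision tree; values shown: Nx = 70,000, Nld = 50, Nld = 90]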
V. Optimizing the I/O • We introduce a novel data grouping technique: Interval Data Grouping (IDG). • The goal of IDG is to group data for I/O so as to increase locality and reduce disk write time.
V. Optimizing the I/O: Distribution of example mesh • [Figure: example mesh with nodes 0 to 7, each marked Local or Shared.] • METIS assignment: Processor 0: 0 1 3 5 7; Processor 1: 2 4 6 • BIPS3D data distribution: Processor 0: 0 1 2 3 4 5 7; Processor 1: 1 2 4 5 6 7
V. Optimizing the I/O: Distribution of example mesh (II) • The IDG algorithm has two stages: • Node classification: • Analyze the mesh structure and the METIS distribution to classify each mesh node as shared or local. • Disk access scheduler: • Each local node is written by the processor it belongs to. • For each shared node, the most appropriate processor is chosen by looking at its previous and subsequent nodes.
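The node classification stage can be sketched as below. The slides do not give the actual procedure, so this sketch assumes that a node is shared exactly when more than one processor holds a copy of it in the BIPS3D data distribution, and local otherwise. `classify_nodes` is a hypothetical name.

```python
def classify_nodes(data_distribution):
    """Split nodes into local (one holder) and shared (several holders).

    data_distribution maps a processor id to the nodes it holds a copy of.
    Returns (local, shared): local maps node -> its only holder,
    shared maps node -> sorted list of holders.
    """
    holders = {}
    for proc, nodes in data_distribution.items():
        for node in nodes:
            holders.setdefault(node, []).append(proc)
    local = {n: procs[0] for n, procs in holders.items() if len(procs) == 1}
    shared = {n: sorted(procs) for n, procs in holders.items() if len(procs) > 1}
    return local, shared


# BIPS3D data distribution of the 8-node example mesh from the slides.
bips3d_distribution = {
    0: [0, 1, 2, 3, 4, 5, 7],
    1: [1, 2, 4, 5, 6, 7],
}
local, shared = classify_nodes(bips3d_distribution)
# local  -> nodes 0 and 3 belong to processor 0, node 6 to processor 1
# shared -> nodes 1, 2, 4, 5 and 7 are held by both processors
```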
V. Optimizing the I/O: Distribution of example mesh (III) • [Figure: example mesh with nodes 0 to 7, each marked Local or Shared.] • METIS assignment: Processor 0: 0 1 3 5 7; Processor 1: 2 4 6 • BIPS3D data distribution: Processor 0: 0 1 2 3 4 5 7; Processor 1: 1 2 4 5 6 7 • IDG distribution: Processor 0: 0 1 2 3 4 5; Processor 1: 6 7
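One possible disk access scheduler is sketched below. The slides say only that a shared node's writer is chosen by looking at its previous and subsequent nodes, so the specific rule here is an assumption: prefer the processor that writes the previous node, otherwise the owner of the next node if that node is local, otherwise the lowest-numbered holder. `idg_schedule` and `holders` are hypothetical names. On the example mesh this rule reproduces the IDG distribution shown on the slide.

```python
def idg_schedule(num_nodes, holders):
    """Pick one writer per node so that writers form long intervals.

    holders maps node -> sorted list of processors holding a copy.
    Returns a dict node -> processor chosen to write it.
    """
    writer = {}
    for node in range(num_nodes):
        candidates = holders[node]
        if len(candidates) == 1:           # local node: only one choice
            writer[node] = candidates[0]
            continue
        prev_writer = writer.get(node - 1)
        next_holders = holders.get(node + 1, [])
        if prev_writer in candidates:      # extend the current interval
            writer[node] = prev_writer
        elif len(next_holders) == 1 and next_holders[0] in candidates:
            writer[node] = next_holders[0]  # join the interval that follows
        else:
            writer[node] = candidates[0]    # assumed tie-break
    return writer


# Holders derived from the BIPS3D data distribution of the example mesh.
holders = {0: [0], 1: [0, 1], 2: [0, 1], 3: [0],
           4: [0, 1], 5: [0, 1], 6: [1], 7: [0, 1]}
writer = idg_schedule(8, holders)

by_proc = {}
for node, proc in writer.items():
    by_proc.setdefault(proc, []).append(node)
```

Here processor 0 ends up writing the single interval 0 to 5 and processor 1 the interval 6 to 7, instead of the scattered nodes of the METIS assignment.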
V. Optimizing the I/O: evaluation (I) • We have combined IDG with List I/O for different meshes and different loads. • We have compared the performance of IDG with other strategies: • METIS: original node distribution. • Random: each shared node is assigned to a partition randomly. • First Position: each shared node is assigned to the first of the partitions it belongs to.
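The baseline strategies can be sketched on the 8-node example mesh. The slides do not state how locality is measured, so the metric used here, the total number of contiguous node intervals written across all processors, is an assumption, and all function names are hypothetical. Fewer intervals means fewer non-contiguous disk accesses.

```python
import random


def count_intervals(nodes):
    """Number of runs of consecutive node numbers in a list of nodes."""
    nodes = sorted(nodes)
    return sum(1 for i, n in enumerate(nodes)
               if i == 0 or n != nodes[i - 1] + 1)


def total_intervals(writer):
    """Sum of contiguous intervals over all processors (assumed metric)."""
    by_proc = {}
    for node, proc in writer.items():
        by_proc.setdefault(proc, []).append(node)
    return sum(count_intervals(nodes) for nodes in by_proc.values())


# node -> processors holding a copy (BIPS3D data distribution, example mesh)
holders = {0: [0], 1: [0, 1], 2: [0, 1], 3: [0],
           4: [0, 1], 5: [0, 1], 6: [1], 7: [0, 1]}

# METIS: original assignment (processor 0: 0 1 3 5 7, processor 1: 2 4 6).
metis = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1, 7: 0}

# First Position: each node goes to the first partition it belongs to.
first_position = {n: min(procs) for n, procs in holders.items()}

# Random: each shared node goes to a randomly chosen holder (seed is arbitrary).
rng = random.Random(0)
random_choice = {n: rng.choice(procs) for n, procs in holders.items()}

metis_cost = total_intervals(metis)                    # 7 intervals
first_position_cost = total_intervals(first_position)  # 3 intervals
random_cost = total_intervals(random_choice)           # depends on the seed
```

For comparison, the IDG distribution shown for this mesh (processor 0: 0 to 5, processor 1: 6 and 7) consists of 2 intervals under the same metric.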
VI. Conclusions • We have optimized and evaluated the parallel I/O operations of the BIPS3D simulator. • We have built a decision tree for choosing the best I/O configuration. • We have introduced a novel technique that exploits the data replication of mesh nodes to schedule disk accesses. With this proposal, the performance of the parallel I/O operations is improved.