Data Flow Pattern Analysis of Scientific Applications Michael Frumkin Parallel Systems & Applications Intel Corporation May 6, 2005
Outline • Why Data Flow Pattern Analysis? • CFD Applications • The NAS Parallel Benchmarks • The NAS Grid Benchmarks • Trace File Analysis • Conclusions
Why Data Flow Pattern Analysis? • Scientific applications • model a few natural processes • new effects are added infrequently • their influence on the existing data flows is insignificant • Knowledge of data flow in a program helps with • program understanding • program optimization, parallelization, multithreading • building an application performance model
Design of Scientific Applications • Time is represented as an outer loop • Iterations over time steps • Space is represented by structured/unstructured grids • Important for understanding data locality • Data access patterns • Spatial parallelism • Physics is represented by an operator at each grid point • Data flow • Operator level of parallelism/dependence
CFD Data Flow Patterns • Solve the Navier-Stokes equation K(u_{i+1}) = L u_i • u is a five-dimensional vector • K is a nonlinear operator • Solver • RHS computation
ADI Pattern • ADI method K ~ Kx*Ky*Kz • Multilevel parallelism [Figure: two diagrams of the x-solve, y-solve, and z-solve sweeps, one labeled ADI method and one labeled Multipartition]
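One way to read the factorization K ~ Kx*Ky*Kz is as three successive directional solves per time step. The sketch below treats the factors as linear operators applied in sequence; the intermediate vectors v and w are names introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
K(u_{i+1}) &= L u_i, \qquad K \approx K_x K_y K_z \\
K_x\, v &= L u_i && \text{(x-solve)} \\
K_y\, w &= v && \text{(y-solve)} \\
K_z\, u_{i+1} &= w && \text{(z-solve)}
\end{align*}
```

Each directional solve couples points along one axis only, so the lines along that axis can be processed independently of each other.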
Explicit Operators • Stencil operators (explicit methods) • At each point of a 3-dimensional mesh, apply a seven-point or 27-point stencil
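As an illustration of a seven-point stencil operator, here is a minimal sketch in C. The function name `stencil7`, the flattened array layout, and the two weights `c0` and `c1` are assumptions made for this example; the slides do not specify coefficients or storage.

```c
#include <assert.h>
#include <stddef.h>

/* Index of point (i, j, k) in a flattened nx*ny*nz array (i varies fastest). */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return i + nx * (j + ny * k);
}

/* Apply a seven-point stencil to every interior point of `in`, writing `out`.
 * c0 weights the center point, c1 weights each of the six face neighbors.
 * Boundary points of `out` are left untouched. */
void stencil7(const double *in, double *out,
              size_t nx, size_t ny, size_t nz, double c0, double c1)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t k = 1; k + 1 < nz; k++)
        for (size_t j = 1; j + 1 < ny; j++)
            for (size_t i = 1; i + 1 < nx; i++)
                out[idx(i, j, k, nx, ny)] =
                    c0 * in[idx(i, j, k, nx, ny)] +
                    c1 * (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                          in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                          in[idx(i, j, k - 1, nx, ny)] + in[idx(i, j, k + 1, nx, ny)]);
}
```

Because every output point reads only from `in`, all interior points can be updated independently, which is the source of the spatial parallelism in explicit methods.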
Lower-Upper Triangular • Dependence matrices: (-1 0 0; 0 -1 0; 0 0 -1) and (1 0 0; 0 1 0; 0 0 1), with rows separated by semicolons • Two-dimensional pipeline • Hyperplane algorithm
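The hyperplane algorithm can be sketched as follows, assuming the lower-triangular dependence vectors (-1,0,0), (0,-1,0), (0,0,-1) read off the first dependence matrix. The recurrence itself (right-hand side plus a weighted sum of the three predecessors) and the names `lower_sweep_hyperplane` and `at` are illustrative assumptions, not the LU benchmark's actual update.

```c
#include <assert.h>
#include <stddef.h>

/* Value at (i, j, k), or 0.0 when any index falls below the mesh. */
static double at(const double *v, long i, long j, long k, long n)
{
    if (i < 0 || j < 0 || k < 0) return 0.0;
    return v[i + n * (j + n * k)];
}

/* Lower-triangular sweep on an n*n*n mesh, ordered by hyperplanes.
 * Point (i,j,k) depends on (i-1,j,k), (i,j-1,k), (i,j,k-1), which all lie
 * on hyperplane h-1 where h = i+j+k. Points sharing the same h are therefore
 * independent of each other and could be updated in parallel. */
void lower_sweep_hyperplane(double *v, const double *rhs, long n, double a)
{
    assert(n > 0);
    for (long h = 0; h <= 3 * (n - 1); h++) {   /* hyperplanes: sequential */
        for (long k = 0; k < n; k++) {          /* within one: parallelizable */
            for (long j = 0; j < n; j++) {
                long i = h - j - k;
                if (i < 0 || i >= n) continue;
                v[i + n * (j + n * k)] = rhs[i + n * (j + n * k)]
                    + a * (at(v, i - 1, j, k, n) + at(v, i, j - 1, k, n)
                           + at(v, i, j, k - 1, n));
            }
        }
    }
}
```

The sweep visits the same points as a plain lexicographic triple loop and produces the same values; only the visiting order changes, which exposes the parallelism inside each hyperplane.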
Multigrid V-Cycle [Figure: V-cycle diagram with stages labeled Projection, Smoothing, and Interpolation & Smoothing]
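The order of stages in a V-cycle can be sketched recursively. The code below only records which stage runs when; it assumes the usual reading of the diagram (projection on the way down, smoothing on the coarsest grid, interpolation and smoothing on the way up). The `Stage` enum and `v_cycle` function are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PROJECTION, SMOOTHING, INTERP_SMOOTHING } Stage;

/* Append the stages of a V-cycle starting on `level` (0 = coarsest grid)
 * to `out`, which holds `count` entries so far and has room for `cap`.
 * Returns the new number of entries. */
size_t v_cycle(int level, Stage *out, size_t count, size_t cap)
{
    assert(level >= 0);
    assert(count < cap);
    if (level == 0) {
        out[count++] = SMOOTHING;         /* coarsest grid: smooth only */
        return count;
    }
    out[count++] = PROJECTION;            /* move the problem to a coarser grid */
    count = v_cycle(level - 1, out, count, cap);
    assert(count < cap);
    out[count++] = INTERP_SMOOTHING;      /* interpolate back, then smooth */
    return count;
}
```

Starting from level 4, this yields four projections, one smoothing step, and four interpolation-and-smoothing steps.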
Data Flow Analysis • BT x_solve (serial) call graph • Loop nest: do k=1,ksize; do j=1,jsize; do i=1,isize
Nest Data Flow Graph • Each arc represents an affinity relation [Figure: graph with nodes do_45, do_134, do_330]
NAS Parallel Benchmarks (www.nas.nasa.gov/Software/NPB) • Application Benchmarks • CFD: BT, SP, LU • Data Intensive: DC, DT, BTIO • Computational Chemistry: UA • Kernel Benchmarks: FT, CG, MG, IS • Verification • Performance Model • FORTRAN, C, HPF, Java* • Serial, MPI, OpenMP, Java* Threads * Other names and brands may be claimed as the property of others.
NPB Performance on Altix* ** ** Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing.
Basic Data Flow Patterns • Shuffles: Sorting, FFT, Routing • Gather/Scatter: Conjugate Gradient, MD and FE codes, Sparse matrices • Transpose: FFT, Sorting • Tree: Parallel prefix, Reduction, Sorting
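To make the tree pattern concrete, here is a minimal sketch of a reduction organized as a binary tree. It runs serially; the function name `tree_reduce_sum` and the in-place layout are choices made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `n` values in place with a binary-tree pattern: at each step, element
 * i absorbs element i + stride, and then the stride doubles. The additions
 * within one step touch disjoint elements, so they could run in parallel.
 * The result ends up in a[0]; the array contents are overwritten. */
double tree_reduce_sum(double *a, size_t n)
{
    assert(n > 0);
    for (size_t stride = 1; stride < n; stride *= 2)
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            a[i] += a[i + stride];
    return a[0];
}
```

The outer loop runs about log2(n) times, which is the depth of the tree; the data flow between steps is what distinguishes this pattern from a simple left-to-right sum.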
HPC Challenge Benchmarks (icl.cs.utk.edu/hpcc) • HPL* • DGEMM* • STREAM* • PTRANS* • FFTE* • RandomAccess* • Effective Bandwidth b_eff*
Programming With Directed Graphs • Implemented in DT of NPB and in NGB • Arc • Arc* newArc(Node *tail, Node *head) • AttachArc(DGraph *dg) • deleArc(Arc *ar) • Node • newNode(char *name) • Node* AttachNode(DGraph *dg) • deleteNode(Node *nd) • DGraph • DGraph* newDGraph(char *name) • writeGraph(DGraph *dg, char* fname) • DGraph * readGraph(char* fname)
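A directed-graph interface in this spirit might look like the sketch below. It is not the DT or NGB code: the `dg_*` function names, the fixed-capacity arrays, and the index-based arcs are all assumptions chosen to keep the example short and self-contained.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 64
#define MAX_ARCS  256
#define NAME_LEN  32

typedef struct { char name[NAME_LEN]; } Node;
typedef struct { int tail, head; } Arc;      /* node indices: tail -> head */
typedef struct {
    char name[NAME_LEN];
    Node nodes[MAX_NODES];
    int  num_nodes;
    Arc  arcs[MAX_ARCS];
    int  num_arcs;
} DGraph;

/* Reset `dg` to an empty graph called `name` (truncated to fit). */
void dg_init(DGraph *dg, const char *name)
{
    assert(dg != NULL && name != NULL);
    memset(dg, 0, sizeof *dg);
    strncpy(dg->name, name, NAME_LEN - 1);
}

/* Add a node; returns its index, or -1 if the graph is full. */
int dg_attach_node(DGraph *dg, const char *name)
{
    if (dg->num_nodes >= MAX_NODES) return -1;
    Node *nd = &dg->nodes[dg->num_nodes];
    memset(nd, 0, sizeof *nd);
    strncpy(nd->name, name, NAME_LEN - 1);
    return dg->num_nodes++;
}

/* Add an arc tail -> head; returns its index, or -1 on bad input or overflow. */
int dg_attach_arc(DGraph *dg, int tail, int head)
{
    if (tail < 0 || tail >= dg->num_nodes) return -1;
    if (head < 0 || head >= dg->num_nodes) return -1;
    if (dg->num_arcs >= MAX_ARCS) return -1;
    dg->arcs[dg->num_arcs].tail = tail;
    dg->arcs[dg->num_arcs].head = head;
    return dg->num_arcs++;
}

/* Number of arcs entering `node`. */
int dg_in_degree(const DGraph *dg, int node)
{
    int deg = 0;
    for (int a = 0; a < dg->num_arcs; a++)
        if (dg->arcs[a].head == node) deg++;
    return deg;
}
```

A caller would initialize a graph with `dg_init`, attach nodes such as `do_45` and `do_134`, and connect them with `dg_attach_arc`; a node with in-degree zero is then a source of the data flow.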
Directed Graphs Around • Parse trees • File systems • Application task graphs • Device schematics • Visualization and layout tools: VCG tool, Edge tool, Tom Sawyer Software, Commercial tools
Cart3D* • Task graphs are rapidly growing • Performs CFD analysis on complex geometries • Uses six executables • Intersect*: intersects geometry • Cubes*: produces Cartesian meshes • Reorder*: reorders meshes • Mgprep*: coarsens mesh • flowCart*: convergence acceleration • Clic*: analyzes the flow • Executables communicate via files • Returns relevant forces: Lift, Drag, Side Force
The NAS Grid Benchmarks • Reflect the task-level programming paradigm • Contain four patterns • Embarrassingly Distributed (ED) • Helical Chain (HC) • Visualization Pipeline (VP) • Mixed Bag (MB) [Figure: task graphs of the four patterns, each running from Launch to Report: ED with SP tasks; HC with BT, SP, LU tasks; VP with BT, MG, FT tasks; MB with LU, MG, FT tasks]
Data Dependent Patterns • Intermittent patterns • Useful for application performance tuning • Visualization is important • Exploits the human eye's ability to detect patterns • Automatic Pattern Mining • Automatic trace analysis using OLAP • MPI communication patterns
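One simple form of automatic pattern mining over MPI traces is a roll-up of message records into a communication matrix, followed by a test for a known shape. The sketch below is only in the spirit of an OLAP aggregation; the `MsgEvent` record layout and the functions `comm_matrix` and `is_ring_pattern` are hypothetical and are not taken from any trace tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One point-to-point message from a trace (hypothetical record layout). */
typedef struct { int src, dst; long bytes; } MsgEvent;

/* Roll up events into an nprocs*nprocs matrix:
 * m[src * nprocs + dst] = total bytes sent from src to dst. */
void comm_matrix(const MsgEvent *ev, size_t nev, int nprocs, long *m)
{
    assert(nprocs > 0);
    memset(m, 0, (size_t)nprocs * (size_t)nprocs * sizeof *m);
    for (size_t e = 0; e < nev; e++) {
        assert(ev[e].src >= 0 && ev[e].src < nprocs);
        assert(ev[e].dst >= 0 && ev[e].dst < nprocs);
        m[(size_t)ev[e].src * (size_t)nprocs + (size_t)ev[e].dst] += ev[e].bytes;
    }
}

/* Return 1 if every nonzero entry satisfies dst == (src + 1) % nprocs,
 * i.e. all traffic follows a ring; return 0 otherwise. */
int is_ring_pattern(const long *m, int nprocs)
{
    for (int s = 0; s < nprocs; s++)
        for (int d = 0; d < nprocs; d++)
            if (m[(size_t)s * (size_t)nprocs + (size_t)d] != 0
                && d != (s + 1) % nprocs)
                return 0;
    return 1;
}
```

Other shapes (nearest neighbor on a grid, all-to-all, tree) can be tested the same way by changing the predicate on the (src, dst) pair.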
Conclusions • Data Flow in Applications: Application Parallelization, Application Understanding, Application Mapping, Application Performance