"Characterizing the Relationship between ILU-type Preconditioners and the Storage Hierarchy"
Diego Rivera1, David Kaeli1 and Misha Kilmer2
www.ece.neu.edu/students/drivera/tlg/tunlib.html

Motivation
• Prior work targeted Krylov subspace methods
• However, little has been done in the case of preconditioners
• “Nothing will be more central to computational science in the next century than the art of transforming a problem that appears intractable into another whose solution can be approximated rapidly. For Krylov subspace matrix iterations, this is preconditioning” from Numerical Linear Algebra by Trefethen and Bau (1997).

Common target applications
• Weather simulations
• Turbulence problems in airplanes
• DNA models
• Linear system: A(m,m)x(m) = b(m)
• Computational time is a barrier in these applications
• Parallel processing can be used to lower this barrier
• The sparsity of the data reduces the effectiveness of direct parallel computation
• Preconditioners can be used to accelerate the convergence of Krylov subspace methods
• A drawback of these approaches is that it is difficult to choose good values for their tuning parameters
• Choosing good values depends heavily on the structure of the non-zero elements of the coefficient matrix
• In our work we have found that it also depends on the memory hierarchy of the machine used to compute the solution
• What about tuning memory access patterns of preconditioner techniques?

Objective
• To improve the performance of preconditioners targeting sparse matrices
• To accelerate the memory accesses associated with these codes

Target preconditioners

Preconditioner    Description of parameters
ILU()             level-of-fill
ILUT              level-of-fill, drop tolerance
ILUTP             level-of-fill, drop tolerance and tolerance ratio (permtol)
ILUD              drop tolerance, diagonal compensation parameter
ILUDP             drop tolerance, diagonal compensation parameter and tolerance ratio (permtol)
...               ...

• Approximate inverse preconditioner: SPAI, MR, etc.

Ax = b -> Preconditioner -> M^{-1}Ax = M^{-1}b -> Iterative method -> Solution to the linear system

Evaluation environment

                         Intel XEON 3.06 GHz      Ultra Sparc-III 750 MHz
Level 1                  8 KB 4-way for data      64 KB 4-way for data
Level 2                  512 KB 8-way             8 MB 2-way
Level 3                  1 MB 8-way               N/A
RAM                      2 GB                     1 GB
Replacement algorithm    pseudo-LRU (all levels)  pseudo-random (all levels)

• The PIN tool was used to capture cache events. LRU and random replacement policies were modeled
• Several matrices were evaluated. Results from four representative matrices are shown below:

Matrices

Name        Non-zero elements    Rows         Numerical symmetry (NS)    NS/B
Raefsky3    1,488,768            21,200       48%                        4.3
Ldoor       42,493,817           952,203      100%                       1.06
Cage14      27,130,349           1,505,785    21%                        0.48
Torso3      4,429,042            259,156      0%                         0

Results for the ILUD preconditioner with the GMRES method: 14 possible values for each parameter (drop tolerance, diagonal compensation parameter). There are 378 possible combinations.

[Figure: data access pattern in the ith iteration of the outer loop. Legend: data accessed but not modified; the ith row; data accessed and modified; data not accessed.]
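The memory access pattern of the outer loop can be illustrated with a minimal sketch. It assumes the row-wise (IKJ) form of incomplete LU and uses the simplest variant, ILU(0), which keeps only the non-zero pattern of A; it is not the poster's ILUT or ILUD code. The function name ilu0_ikj, the dense list-of-lists storage and the 3x3 matrix are hypothetical choices made for illustration. In iteration i, only row i is modified, some earlier rows are read without being modified, and later rows are not touched.

```python
def ilu0_ikj(a):
    """Row-wise (IKJ) ILU(0) of a small square matrix given as a list of lists.

    Returns (lu, access_log). lu holds the unit-lower factor L below the
    diagonal and U on and above it. Each access_log entry is (i, rows_read):
    the earlier rows that iteration i read without modifying them.
    """
    n = len(a)
    lu = [row[:] for row in a]                      # work on a copy of A
    pattern = [[v != 0 for v in row] for row in a]  # ILU(0): keep A's pattern
    access_log = []
    for i in range(n):                              # outer loop: one row per iteration
        rows_read = []
        for k in range(i):
            if not pattern[i][k]:
                continue                            # structural zero: nothing to eliminate
            rows_read.append(k)                     # row k is read, never written
            lu[i][k] /= lu[k][k]                    # multiplier, stored in L
            for j in range(k + 1, n):
                if pattern[i][j]:                   # update existing entries only (no fill-in)
                    lu[i][j] -= lu[i][k] * lu[k][j]
        access_log.append((i, rows_read))
    return lu, access_log


if __name__ == "__main__":
    # Hypothetical "arrow" matrix: exact LU would create fill at (1,2) and (2,1).
    a = [[4.0, 1.0, 1.0],
         [1.0, 4.0, 0.0],
         [1.0, 0.0, 4.0]]
    lu, log = ilu0_ikj(a)
    for i, rows_read in log:
        print(f"iteration {i}: modifies row {i}, reads rows {rows_read}")
```

Variants with a level-of-fill or drop tolerance would admit some entries outside the original pattern, which changes how many earlier rows and columns each iteration touches; that is the knob the tuning parameters control.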
[Figure: Correlation of load accesses and execution time. Machines: Ultra Sparc-III, Intel XEON. Cache levels: DTLB, DL1, L2, L3 and DTLB, DL1, L2. Matrices: Raefsky3, Ldoor, Torso3, Cage14. Annotation: the numerical-symmetry/matrix-bandwidth relation decreases in the marked direction.]

[Figure: Error norm for the first 13 parameter pairs, sorted in increasing order of execution time, for ILUT and GMRES.]

Plans and future work
• Developing a benchmark suite for evaluating how fill-in can best be used for a given memory hierarchy and application code
• Arriving at an algorithmic approach to select the best values of the preconditioner parameters for a given memory hierarchy
• Proposing a new portable ILU-type preconditioner that does dynamic matrix fill-in:
    • Reordering technique for improving temporal locality
    • Adapting the number of non-zero elements to the block size of the highest cache level for improving spatial locality

Acknowledgement
This project is supported by the National Science Foundation’s Computing and Communication Foundations Division, grant number CCF-0342555, and the Institute for Complex Scientific Software (ICSS).