Performance Enhancements in MSC.Nastran for Large Scale Design Optimization on Cray SV1 Computers
Dr. D. Obrist, Dr. H. Misra, Cray Inc.
Dr. S. Zhang, Dr. D. Chou, MSC.Software Corp.
Large Scale Design Optimization
• joint project between MSC.Software Corp. and Cray Inc. (October 2000 - July 2001)
• … with the goal to enhance the design optimization capabilities of MSC.Nastran on the Cray SV1(ex)
• 2-4x shorter turnaround time
Characteristics of Large Scale Optimization Problems
• millions of degrees of freedom
• hundreds of design variables and responses
• hundreds of modes
Consequences: prohibitive turnaround times for simulations, excessively large I/O
List of enhancements
• exploit the sparsity of the design model
• improved data management to process DSADJ in a single pass - improved vectorization
• highly optimized matrix-matrix multiplications from the Cray Scientific Library
• optimized sparse matrix I/O
• parallelization of DSADJ and DSVG1
• misc. improvements (GP5, EMG, MPYAD, PARTN, MERGE, SADD5, etc.)
Sparsity of the design model
In many design optimization tasks, only a small number of elements are modified during the design process (sparse design set).
[Figure: displacement sets u_p, u_m, û_p]
Example: data recovery sub-dmap DISPRS
Data recovery sub-dmap DISPRS
• amount of I/O is reduced by 4x
• CPU time is reduced by 5x
• Industry example:
  • design model is 25% sparse
  • 2,091,102 DOF
  • 251 modes
  • 128 design variables
  • 2931 retained responses
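To illustrate why a sparse design set cuts both I/O and CPU time in data recovery, here is a minimal plain-Python sketch. It assumes a modal data recovery of the form u = phi * q and recovers only the rows (DOF) attached to design elements. The names `phi`, `q`, `design_dof` and the toy sizes are hypothetical; this is not the DISPRS implementation.

```python
def recover(phi_rows, q):
    """Return u = phi @ q for the given rows of phi (plain-Python matmul)."""
    nsol = len(q[0])
    return [
        [sum(p * q[m][s] for m, p in enumerate(row)) for s in range(nsol)]
        for row in phi_rows
    ]

# Hypothetical toy data: 8 DOF, 2 modes, 3 solution vectors.
phi = [[float(i + 1), float(i % 3)] for i in range(8)]  # mode shapes (assumed)
q = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]                 # modal coordinates (assumed)

# Hypothetical design set: only DOF 1 and 6 belong to design elements.
design_dof = [1, 6]

u_full = recover(phi, q)                             # touches all 8 rows
u_design = recover([phi[i] for i in design_dof], q)  # touches 2 rows only

# The design-set result matches the corresponding rows of the full recovery.
assert u_design == [u_full[i] for i in design_dof]
rows_skipped = 1 - len(design_dof) / len(phi)
print(f"rows skipped: {rows_skipped:.0%}")
```

Rows that no design element references are never read or multiplied, so the saving in this sketch scales with the fraction of DOF outside the design set.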
Improved data management in DSADJ - single pass
Instead of loading all nodal displacements in multiple passes …
… only the displacements of the element nodes are loaded on demand.
• single pass: reduced scalar overhead
• longer vectors: improved vectorization
[Figure: displacement matrix u of size ndof x nsol, processed in passes 1-6 versus a single pass]
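The contrast between the two loading schemes can be sketched as follows. The sketch assumes the displacements are stored as an ndof x nsol table and that each element is described by a list of DOF indices; the `visits` counter is a rough stand-in for per-element scalar overhead. All names and sizes are hypothetical, and this is not the DSADJ code.

```python
def multi_pass(u, elements, cols_per_pass):
    """Load every DOF for a few solution columns at a time; revisit all elements per pass."""
    nsol = len(u[0])
    gathered = [[[] for _ in dofs] for dofs in elements]
    visits = 0
    for start in range(0, nsol, cols_per_pass):
        block = [row[start:start + cols_per_pass] for row in u]  # all ndof rows
        for e, dofs in enumerate(elements):
            visits += 1  # per-element overhead is paid again in each pass
            for k, dof in enumerate(dofs):
                gathered[e][k].extend(block[dof])  # short vector piece
    return gathered, visits

def single_pass(u, elements):
    """Load only the rows an element needs, across all solution columns at once."""
    gathered = []
    visits = 0
    for dofs in elements:
        visits += 1  # each element is set up exactly once
        gathered.append([list(u[dof]) for dof in dofs])  # full-length vectors
    return gathered, visits

ndof, nsol = 10, 6
u = [[100 * d + s for s in range(nsol)] for d in range(ndof)]
elements = [[0, 1, 2], [2, 3, 4], [7, 8, 9]]  # DOF 5 and 6 are never needed

a, visits_multi = multi_pass(u, elements, cols_per_pass=1)
b, visits_single = single_pass(u, elements)
assert a == b  # both schemes deliver the same element displacements
print(visits_multi, visits_single)  # 18 3
```

In the single-pass version each inner vector has length nsol rather than `cols_per_pass`, which is the property that favors a vector machine.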
Industry Example I: DSADJ performance 7x improved!
Industry Example II: DSADJ performance 9x improved!
Industry Example III: DSADJ performance 13x improved!
Industry Example IV: DSVG1 performance 10x improved!
Parallelization of DSVG1 and DSADJ
Parallel runs of the Industry Example III (2 million DOF) with 1, 2, and 4 processors.
[Figure: FLOPS and I/O for each run]
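The slides do not state how the work was divided among processors. One common approach, assumed here purely for illustration, is to split independent per-design-variable work into interleaved chunks. The function `sensitivity` is a hypothetical stand-in for the real computation, and Python threads only demonstrate the partitioning; they do not reproduce the measured speedups.

```python
from concurrent.futures import ThreadPoolExecutor

def sensitivity(design_variable):
    """Hypothetical stand-in for the work done for one design variable."""
    return design_variable * design_variable

def process_chunk(chunk):
    """Process one processor's share and keep each result paired with its variable."""
    return [(v, sensitivity(v)) for v in chunk]

def run_parallel(design_variables, nproc):
    """Split the design variables into nproc interleaved chunks and process them concurrently."""
    chunks = [design_variables[p::nproc] for p in range(nproc)]
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        partial = list(pool.map(process_chunk, chunks))
    results = dict(pair for chunk in partial for pair in chunk)
    return [results[v] for v in design_variables]  # restore the original order

design_variables = list(range(1, 9))
serial = [sensitivity(v) for v in design_variables]
for nproc in (1, 2, 4):
    # The partitioned run must reproduce the serial result for every processor count.
    assert run_parallel(design_variables, nproc) == serial
```

Interleaved chunks are used so that each worker receives a similar mix of cheap and expensive items when cost varies along the list.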
Total Improvements over one Design Cycle
Overall improvement: 2-4x!
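Module-level gains of 7-13x and a cycle-level gain of 2-4x are consistent if the accelerated modules account for only part of the design cycle. A minimal sketch using Amdahl's law makes the relationship concrete; the time fractions below are hypothetical and are not taken from the slides.

```python
def overall_speedup(fraction, module_speedup):
    """Amdahl's law: speedup of the whole cycle when only `fraction` of it gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / module_speedup)

# Hypothetical shares of the original design-cycle time spent in the accelerated modules.
for fraction in (0.6, 0.8):
    for module_speedup in (7, 13):  # range of module speedups quoted in the slides
        s = overall_speedup(fraction, module_speedup)
        print(f"fraction={fraction:.1f} module={module_speedup}x overall={s:.2f}x")
```

With these assumed fractions, every combination lands between 2x and 4x, which is the range reported for the full design cycle.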
Conclusions
• The turnaround time for a large design optimization task is dramatically reduced (2-4x) ...
• The performance is independent of the open core memory size ...
Larger problems can be solved in less time!