Case studies in Optimizing High Performance Computing Software
Jan Westerholm
High Performance Computing, Department of Information Technologies
Faculty of Technology / Åbo Akademi University

FINHPC / Åbo Akademi
Objectives
• Sub-project in FINHPC
• Three-year duration: 01.07.2005-30.06.2008
• Objective: to improve code that individuals and research groups have written and are running on CSC machines
  • faster code, in many cases with exactly the same numerical results as before
  • ability to run bigger problems
• Work approach: apply well-known techniques from computer science
• Faster programs may allow higher-quality results
• Better throughput for everybody

Limitations
• We will use:
  • parallelization techniques
  • code optimization
  • cache utilization (particularly the L2 cache)
  • microprocessor pipeline continuity
  • data blocking: grid scan order
  • introduction of new data structures
  • replacement of very simple algorithms
    • sorting (quicksort instead of bubble sort)
  • open-source libraries

Limitations
• We will not:
  • introduce better physics, chemistry, etc.
  • replace the chosen basic numerical technique
  • replace individual algorithms unless they are clearly modularized (e.g. matrix inversion as a library routine)

Three case studies
• Lattice-Boltzmann fluid simulation: 3DQ19
• Protein covariance analysis: Covana
• Fusion reactor simulation: Elmfire

3DQ19: Lattice Boltzmann fluid mechanics
• Jyväskylä University / Jussi Timonen, Keijo Mattila; ÅA / Anders Gustafsson
• Physical background:
  • phase-space distribution simulated in time
  • Boltzmann's equation: drift term and collision term
  • physical quantities = moments of the distribution

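For reference, the textbook lattice Boltzmann equation with the single-relaxation-time (BGK) collision operator has exactly the drift/collision split listed above. The slides do not give 3DQ19's own discretization, so this is the standard form rather than the program's. Here $f_i$ is the distribution along the discrete velocity $\mathbf{c}_i$ (presumably $i = 0,\dots,18$ for a 19-velocity, three-dimensional lattice), $\tau$ is the relaxation time, $w_i$ are the lattice weights, and lattice units with sound speed $c_s^2 = 1/3$ are assumed.

```latex
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\; t + \Delta t)
  = f_i(\mathbf{x}, t)
  - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right]

\rho = \sum_i f_i, \qquad \rho\,\mathbf{u} = \sum_i f_i\,\mathbf{c}_i

f_i^{\mathrm{eq}} = w_i\,\rho \left[ 1 + 3\,(\mathbf{c}_i \cdot \mathbf{u})
  + \tfrac{9}{2}\,(\mathbf{c}_i \cdot \mathbf{u})^2
  - \tfrac{3}{2}\,\mathbf{u} \cdot \mathbf{u} \right]
```

The shift of $\mathbf{x}$ on the left-hand side is the drift (streaming) term, the bracket on the right is the collision term, and $\rho$ and $\rho\mathbf{u}$ are the moments that give the physical density and momentum.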
3DQ19: Program Profiling
Flat profile:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 33.96     43.65    43.65       50   873.00  1230.10  everything2to1()
 30.79     83.22    39.57       50   791.40  1148.50  everything1to2()
 27.79    118.93    35.71 49000000     0.00     0.00  relaxation_BGK()
  2.30    121.89     2.96                             shmem_msgs_available
  1.19    123.42     1.53      100    15.30    15.30  send_west()
  1.11    124.85     1.43      100    14.30    14.30  send_east()
  0.82    125.91     1.06                             recv_message
  0.45    126.49     0.58                             sock_msg_avail_on_fd
  0.37    126.97     0.48      100     4.80     4.80  per_bound_xslice()
  0.33    127.40     0.43        1   430.00   430.00  init_fluid()
  0.31    127.80     0.40        1   400.00   400.00  local_profile_y()
  0.23    128.10     0.30                             socket_msgs_available
  0.19    128.34     0.24        1   240.00   240.00  calc_mass()
  0.04    128.39     0.05                             net_recv
  0.03    128.43     0.04        1    40.00    40.00  allocation()
  0.02    128.46     0.03                             main

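The profile shows `relaxation_BGK()` called 49,000,000 times for 35.71 s, about 0.73 µs per call (hence the 0.00 ms/call entries). If each of the 100 calls to `everything1to2()`/`everything2to1()` relaxes every node once, that is 490,000 node updates per sweep, so this per-node collision is the natural target for optimization. The slides do not show its code; the C below is a minimal sketch assuming the textbook D3Q19 BGK model. The names `bgk_relax_node`, `lb_c` and `lb_w` are hypothetical, and the ordering of the 19 velocities is arbitrary.

```c
#include <assert.h>

/* D3Q19 lattice: 19 discrete velocities (rest, 6 face, 12 edge neighbours). */
#define Q 19

static const int lb_c[Q][3] = {
    { 0, 0, 0},
    { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
    { 1, 1, 0}, {-1,-1, 0}, { 1,-1, 0}, {-1, 1, 0},
    { 1, 0, 1}, {-1, 0,-1}, { 1, 0,-1}, {-1, 0, 1},
    { 0, 1, 1}, { 0,-1,-1}, { 0, 1,-1}, { 0,-1, 1}
};

/* Standard D3Q19 weights: 1/3 (rest), 1/18 (face), 1/36 (edge). */
static const double lb_w[Q] = {
    1.0 / 3.0,
    1.0 / 18.0, 1.0 / 18.0, 1.0 / 18.0, 1.0 / 18.0, 1.0 / 18.0, 1.0 / 18.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0
};

/* BGK collision at one lattice node, in place (lattice units, c_s^2 = 1/3).
   Density and velocity are computed as moments of f, then each f_i is
   relaxed towards the local equilibrium with relaxation time tau. */
static void bgk_relax_node(double f[Q], double tau)
{
    double rho = 0.0, ux = 0.0, uy = 0.0, uz = 0.0;
    for (int i = 0; i < Q; ++i) {
        rho += f[i];
        ux += f[i] * lb_c[i][0];
        uy += f[i] * lb_c[i][1];
        uz += f[i] * lb_c[i][2];
    }
    assert(rho > 0.0 && tau > 0.5);  /* tau > 1/2 keeps the viscosity positive */
    ux /= rho;
    uy /= rho;
    uz /= rho;

    const double usq = ux * ux + uy * uy + uz * uz;
    const double omega = 1.0 / tau;
    for (int i = 0; i < Q; ++i) {
        const double cu = lb_c[i][0] * ux + lb_c[i][1] * uy + lb_c[i][2] * uz;
        const double feq = lb_w[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq);
        f[i] -= omega * (f[i] - feq);
    }
}
```

Because the collision must conserve mass and momentum, comparing $\sum_i f_i$ and $\sum_i f_i \mathbf{c}_i$ before and after a call is a cheap check when rewriting such a kernel for speed while keeping the numerical results unchanged.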
3DQ19: Optimizations
• Parallelization: already well done!
• Code optimization:
  • blocking: grid scan order
  • anti-dependency: remove write-after-read dependencies so that blocks of code become independent
  • deep fluid: mark the grid points that have no solid neighbours

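The slides list these optimizations without showing 3DQ19's code. As a minimal sketch of two of them, assuming a 64³ grid stored x-fastest with one solid/fluid flag per node, the C below (a) precomputes a "deep fluid" mask marking nodes that are fluid and have no solid node among their 18 D3Q19 neighbours, so that a real kernel could process those nodes without any boundary handling, and (b) sweeps the grid in BLOCK³ tiles instead of plane by plane, one common way of changing the grid scan order for better cache reuse. The names `NX`, `BLOCK`, `mark_deep_fluid`, `blocked_sweep` and `count_nodes` are hypothetical, and the periodic wrap-around at the grid edges is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical grid and tile sizes; a real code would tune BLOCK to the L2 cache. */
#define NX 64
#define NY 64
#define NZ 64
#define BLOCK 16

/* Linear index of node (x, y, z), with x varying fastest in memory. */
#define IDX(x, y, z) ((((size_t)(z) * NY) + (size_t)(y)) * NX + (size_t)(x))

/* deep[n] = 1 if node n and all of its 18 D3Q19 neighbours are fluid, else 0.
   Neighbours wrap around periodically at the grid edges (an assumption). */
static void mark_deep_fluid(const unsigned char *solid, unsigned char *deep)
{
    assert(solid != NULL && deep != NULL);
    for (int z = 0; z < NZ; ++z)
        for (int y = 0; y < NY; ++y)
            for (int x = 0; x < NX; ++x) {
                int ok = !solid[IDX(x, y, z)];
                for (int dz = -1; dz <= 1 && ok; ++dz)
                    for (int dy = -1; dy <= 1 && ok; ++dy)
                        for (int dx = -1; dx <= 1 && ok; ++dx) {
                            /* D3Q19 has no corner directions (|dx|+|dy|+|dz| == 3). */
                            if (abs(dx) + abs(dy) + abs(dz) == 3)
                                continue;
                            int xn = (x + dx + NX) % NX;
                            int yn = (y + dy + NY) % NY;
                            int zn = (z + dz + NZ) % NZ;
                            if (solid[IDX(xn, yn, zn)])
                                ok = 0;
                        }
                deep[IDX(x, y, z)] = (unsigned char)ok;
            }
}

/* Per-node work, passed as a callback only to keep the sketch short. */
typedef void (*node_fn)(size_t n, int is_deep, void *ctx);

/* Visit every node exactly once, one BLOCK^3 tile at a time, so that a real
   kernel working on a tile keeps that tile's data in cache. */
static void blocked_sweep(const unsigned char *deep, node_fn fn, void *ctx)
{
    for (int zb = 0; zb < NZ; zb += BLOCK)
        for (int yb = 0; yb < NY; yb += BLOCK)
            for (int xb = 0; xb < NX; xb += BLOCK)
                for (int z = zb; z < zb + BLOCK && z < NZ; ++z)
                    for (int y = yb; y < yb + BLOCK && y < NY; ++y)
                        for (int x = xb; x < xb + BLOCK && x < NX; ++x) {
                            size_t n = IDX(x, y, z);
                            fn(n, deep[n], ctx);
                        }
}

/* Example kernel standing in for the collision step: counts[0] counts all
   nodes visited, counts[1] the deep-fluid nodes that could take a fast path. */
static void count_nodes(size_t n, int is_deep, void *ctx)
{
    size_t *counts = ctx;
    (void)n;
    counts[0]++;
    if (is_deep)
        counts[1]++;
}
```

The mask is computed once, since the solid geometry does not change between time steps, and its cost is then amortized over every sweep. Whether tiling pays off depends on how the streaming step accesses neighbouring nodes, so the tile size is something to measure rather than assume.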
3DQ19: Results on three parallel systems (times in seconds)

                          Athlon 1800   IBMSC   AMD64
everything1to2()                 18.8   19.48   10.06
everything2to1()                19.34   18.78   10.52
send_west()                       8.4    0.68    1.96
send_east()                      8.31    1.17    3.14
Total time                      55.15   40.28   25.76
Time gained                     27.48   14.13   14.76
Run-time reduction                33%     26%     36%

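The percentages in the last row appear to be the time gained divided by the original run time (total time plus time gained), i.e. the fraction of run time saved rather than a ratio of speeds. This reading is inferred from the numbers, which match it on all three systems; for the Athlon 1800:

```latex
\frac{27.48}{55.15 + 27.48} = \frac{27.48}{82.63} \approx 0.333
\qquad\text{whereas}\qquad
\frac{55.15 + 27.48}{55.15} = \frac{82.63}{55.15} \approx 1.50
```

Expressed as a ratio of old to new run time, the same results correspond to speed-ups of about 1.50 (Athlon 1800), 1.35 (IBMSC) and 1.57 (AMD64).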
2nd case study: Covana
Protein covariance analysis
• Institute of Medical Technology, University of Tampere / Mauno Vihinen, Bairong Chen; ÅA / André Norrgård
• Biological background:
  • physico-chemical groups of amino acids
  • protein function from structure
  • pair and triple correlations between amino acids
• Web server for covariance analysis

Covana: Protein covariance analysis
• Protein sequences: calculate correlations between columns of amino acids
• Typical size:
  • 50-150 sequences (rows)
  • 300-1500 amino acids per sequence (columns)
• Example of one aligned sequence ('-' marks a gap):
>Q9XW32_CAEEL/9-307
IDVTKPTFLLTFYSIHGTFALVFNILGIFLIMK-NPKIVKMYKGFMINMQ-ILSLLADAQ
TTLLMQPVYILPIIGGYTNGLLWQVFR----LSSHIQMAMF---LLLLY---------LQ
VASIVCAIVTKYHVVSNIGKLSDRSI-LFWIF---VIVYHGCAFVITGFFSVS-CLARQ-
-EEENLIK------T-KFPNAISVFTLEN--VAIYDLQVN---KWMMITTILFAFMLTSS
IVISFY--FSVRLLKTLPSKRNTISARSFRGHQIAVTSLM-AQAT-VPFLVL---IIP--
IGTIVYLFVHVLP------NAQ-----EISNIMMAV--YSFHASLST---FVMIISTPQY

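The slides do not say which correlation statistic Covana computes. Whatever the measure (mutual information and chi-square are common examples), a pair correlation between two alignment columns starts from a joint count of residue pairs. The C below is a minimal sketch of that step; `pair_counts`, `sym_index` and the 21-symbol alphabet (20 amino acids plus the gap '-') are hypothetical choices, and treating a gap as an ordinary symbol is an assumption.

```c
#include <assert.h>
#include <string.h>

/* 20 amino acids plus the alignment gap symbol '-'. */
#define NSYM 21
static const char ALPHABET[] = "ACDEFGHIKLMNPQRSTVWY-";

/* Map a residue letter to 0..NSYM-1, or -1 for anything else (e.g. 'X'). */
static int sym_index(char ch)
{
    const char *p = (ch != '\0') ? strchr(ALPHABET, ch) : NULL;
    return p ? (int)(p - ALPHABET) : -1;
}

/* Joint counts of residue pairs (a, b) in alignment columns i and j over
   nseq aligned sequences of equal length. Rows with an unrecognized symbol
   in either column are skipped. Returns the number of rows counted. */
static int pair_counts(const char *const *seqs, int nseq, int i, int j,
                       int counts[NSYM][NSYM])
{
    assert(seqs != NULL && nseq >= 0 && i >= 0 && j >= 0);
    memset(counts, 0, sizeof(int) * NSYM * NSYM);
    int used = 0;
    for (int s = 0; s < nseq; ++s) {
        int a = sym_index(seqs[s][i]);
        int b = sym_index(seqs[s][j]);
        if (a < 0 || b < 0)
            continue;
        counts[a][b]++;
        used++;
    }
    return used;
}
```

With 300-1500 columns there are between 44,850 and 1,124,250 distinct column pairs, which is why the cost and memory footprint of this inner step matter.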
Covana: Code optimization
• Efficient data structures: dynamic memory allocation
• Efficient generic algorithms: sorting
• Avoiding recalculation

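The three techniques on this slide can be illustrated together. The C below is a minimal sketch, not Covana's code: it allocates an array sized exactly for the number of column pairs instead of a fixed worst-case array, scores each unordered pair once (assuming the measure is symmetric, so (j, i) is never recomputed), and ranks the results with the C standard library's `qsort`, which typically runs in O(n log n) time compared with O(n²) for bubble sort. The record type `pair_score`, the functions `rank_pairs` and `by_score_desc`, and the toy scoring function `toy_score` are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* One scored pair of alignment columns (hypothetical record layout). */
typedef struct {
    int i, j;
    double score;
} pair_score;

/* qsort comparator: highest score first. */
static int by_score_desc(const void *pa, const void *pb)
{
    double a = ((const pair_score *)pa)->score;
    double b = ((const pair_score *)pb)->score;
    return (a < b) - (a > b);
}

/* Score each unordered column pair exactly once, then rank the pairs.
   The result array is sized exactly ncol*(ncol-1)/2; caller frees *out. */
static size_t rank_pairs(int ncol, double (*score)(int i, int j, void *ctx),
                         void *ctx, pair_score **out)
{
    assert(ncol >= 2 && score != NULL && out != NULL);
    size_t n = (size_t)ncol * (size_t)(ncol - 1) / 2;
    pair_score *p = malloc(n * sizeof *p);
    *out = p;
    if (p == NULL)
        return 0;
    size_t k = 0;
    for (int i = 0; i < ncol; ++i)
        for (int j = i + 1; j < ncol; ++j) {
            p[k].i = i;
            p[k].j = j;
            p[k].score = score(i, j, ctx);
            ++k;
        }
    qsort(p, n, sizeof *p, by_score_desc);
    return n;
}

/* Toy stand-in for a real correlation measure; counts its calls via ctx. */
static double toy_score(int i, int j, void *ctx)
{
    ++*(int *)ctx;
    return (double)((i * 7 + j * 3) % 11);
}
```

For 6 columns, `rank_pairs(6, toy_score, &calls, &ranked)` returns 15 ranked pairs and calls the scoring function exactly 15 times.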
Covana: Results
• Run time:
  • Original: 227.8 s
  • Final version: 2.0 s
  • Improvement: 112 times faster
• Memory usage:
  • Original: 3250 MB
  • Final version: 37 MB
  • Improvement: 88 times less
• Disk space usage:
  • Original: 277 MB
  • Final version: 21 MB
  • Improvement: 13 times less

3rd case study: ELMFIRE
Tokamak fusion reactor simulation
• Jukka Heikkinen, Salomon Janhunen, Timo Kiviniemi / Advanced Energy Systems / HUT; ÅA / Artur Signell
• Physical background:
  • particle simulation with averaged gyrokinetic Larmor orbits
  • turbulence and plasma modes

Elmfire: Tokamak fusion reactor simulation
• Goal 1: Computer platform independence
  • replacing proprietary library routines for random number generation with open-source routines
  • replacing proprietary library routines for the distributed solution of sparse linear systems with open-source library routines
• Goal 2: Scalability
  • Elmfire originally ran on at most 8 processors
  • new data structures for sparse matrices were devised that make element updates efficient

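The slides do not describe the new sparse-matrix data structures. One illustrative design, offered here as a hypothetical sketch rather than Elmfire's actual structure, keeps each row's nonzeros in a column-sorted dynamic array: adding to an existing element is then a binary search within one row, and a new element is an insertion in that row only, instead of a rebuild of a compressed matrix. Once assembly is finished, the rows are packed into compressed sparse row (CSR) arrays, a layout that many sparse solver libraries accept. All names (`sp_matrix`, `sp_add`, `sp_to_csr`, ...) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *col;     /* column indices, kept sorted */
    double *val;  /* values matching col */
    int nnz, cap;
} sp_row;

typedef struct {
    int nrows;
    sp_row *rows;
} sp_matrix;

static int sp_init(sp_matrix *m, int nrows)
{
    m->nrows = nrows;
    m->rows = calloc((size_t)nrows, sizeof *m->rows);
    return m->rows != NULL ? 0 : -1;
}

/* A[i][j] += v: binary search in row i; insert in column order if absent. */
static int sp_add(sp_matrix *m, int i, int j, double v)
{
    assert(i >= 0 && i < m->nrows && j >= 0);
    sp_row *r = &m->rows[i];
    int lo = 0, hi = r->nnz;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (r->col[mid] < j) lo = mid + 1; else hi = mid;
    }
    if (lo < r->nnz && r->col[lo] == j) { r->val[lo] += v; return 0; }
    if (r->nnz == r->cap) {  /* grow this row's storage geometrically */
        int ncap = r->cap ? 2 * r->cap : 4;
        int *nc = realloc(r->col, (size_t)ncap * sizeof *nc);
        if (!nc) return -1;
        r->col = nc;
        double *nv = realloc(r->val, (size_t)ncap * sizeof *nv);
        if (!nv) return -1;
        r->val = nv;
        r->cap = ncap;
    }
    memmove(&r->col[lo + 1], &r->col[lo], (size_t)(r->nnz - lo) * sizeof *r->col);
    memmove(&r->val[lo + 1], &r->val[lo], (size_t)(r->nnz - lo) * sizeof *r->val);
    r->col[lo] = j;
    r->val[lo] = v;
    r->nnz++;
    return 0;
}

/* Pack into CSR arrays (row_ptr has nrows+1 entries); caller frees all three. */
static int sp_to_csr(const sp_matrix *m, int **row_ptr, int **col_idx, double **vals)
{
    int total = 0;
    for (int i = 0; i < m->nrows; ++i) total += m->rows[i].nnz;
    size_t cap = total > 0 ? (size_t)total : 1;  /* avoid malloc(0) */
    *row_ptr = malloc((size_t)(m->nrows + 1) * sizeof **row_ptr);
    *col_idx = malloc(cap * sizeof **col_idx);
    *vals = malloc(cap * sizeof **vals);
    if (!*row_ptr || !*col_idx || !*vals) {
        free(*row_ptr); free(*col_idx); free(*vals);
        *row_ptr = NULL; *col_idx = NULL; *vals = NULL;
        return -1;
    }
    (*row_ptr)[0] = 0;
    for (int i = 0; i < m->nrows; ++i) {
        const sp_row *r = &m->rows[i];
        if (r->nnz > 0) {
            memcpy(*col_idx + (*row_ptr)[i], r->col, (size_t)r->nnz * sizeof **col_idx);
            memcpy(*vals + (*row_ptr)[i], r->val, (size_t)r->nnz * sizeof **vals);
        }
        (*row_ptr)[i + 1] = (*row_ptr)[i] + r->nnz;
    }
    return 0;
}

static void sp_free(sp_matrix *m)
{
    for (int i = 0; i < m->nrows; ++i) {
        free(m->rows[i].col);
        free(m->rows[i].val);
    }
    free(m->rows);
    m->rows = NULL;
    m->nrows = 0;
}
```

Because each row is independent, rows owned by different processes can be assembled without touching each other's storage, which is one reason a row-wise layout is a reasonable starting point for scalability.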
Conclusions
• Software can be improved!
  • by taking modern microprocessor architecture into account:
    • cache utilization
    • pipeline
  • by using well-established computer science methods

Conclusions
• In 1 case out of 3, a clear impact on run time was achieved
• In 2 cases out of 3, previously intractable results can now be obtained
• Are these three cases representative of code running on CSC machines?
  • the next two cases are under study!

What have we learnt?
• Computer scientists with minimal prior knowledge of, e.g., the physical sciences can contribute to HPC
• Are supercomputers needed to the extent they are used today at CSC?
• Interprocess communication is often a bottleneck
• Parallel computing with 1000 processors may become routine in the future for certain types of problems
• Who should do the coding?
  • Should code for production use (intensive cycles of use, maintainability) be outsourced?

Co-workers:
• Mats Aspnäs, Ph.D.
• Anders Gustafsson, M.Sc.
• Artur Signell, M.Sc.
• André Norrgård

THANK YOU!