
Short Report on the laohu GPU cluster usage at NAOC



  1. Changhua Li, National Astronomical Observatory of China
     Short Report on the laohu GPU cluster usage at NAOC

  2. Introduction of Laohu
     The Laohu GPU cluster was built in 2009; its peak single-precision performance is 160 TFLOPS (85 nodes × 2 C1060 cards × 933 GFLOPS per card ≈ 160 TFLOPS).
     Total cost: 6 million RMB in 2009 (4/1 Min. of Finance ZDYZ2008-2-A06/NAOC).
     Hardware configuration: 85 nodes + InfiniBand + 140 TB storage.
     Node hardware: Lenovo R740, 2 Xeon E5520 CPUs, 24 GB memory, 500 GB disk, 2 Nvidia C1060 GPU cards.

  3. Laohu upgrade
     Each C1060 has 240 cores, 4 GB memory, and a 933 GFLOPS single-precision peak.
     In Sep. 2013 we bought 59 K20 GPU cards for 59 nodes, spending 1.18 million RMB.
     The new Laohu configuration is therefore 59 hosts with one K20 GPU card each and 26 hosts with 3 C1060 GPU cards each. In theory, the peak single-precision performance is 280 TFLOPS (59 × 3.52 TFLOPS per K20 plus 26 × 3 × 0.933 TFLOPS for the C1060s ≈ 280 TFLOPS).

  4. LAOHU Architecture

  5. Laohu management system: LSF
     Platform LSF (Load Sharing Facility) is a suite of distributed resource management products that:
     • connects computers into a cluster (or "grid");
     • monitors the load on each system;
     • distributes, schedules and balances the workload;
     • controls access and load by policies;
     • analyzes the workload;
     • provides a High Performance Computing (HPC) environment.
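For day-to-day work, a few standard LSF commands cover most of the workflow; a minimal sketch (the script name is just a placeholder):

     bqueues              # list all queues and their limits
     bhosts               # show the state and load of each node
     bsub < myjob.lsf     # submit a job script to the scheduler
     bjobs                # list your running and pending jobs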

  6. Laohu queues for GPU jobs
     GPU queues:
     • gpu_16: K20 hosts, max cores: 16, min cores: 4, total cores limit: 32
     • gpu_8: K20 hosts, max cores: 8, min cores: 2, total cores limit: 24
     • gpu_k20_test: K20 hosts, only 2 cores per job, total cores limit: 3
     • gpu_c1060: C1060 hosts, max cores: 30, min cores: 2, total cores limit: 66
     • gpu_c1060_test: C1060 hosts, only 3 cores per job, total cores limit: 9
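A queue's min/max core range maps onto LSF's "-n min,max" request syntax; for example, a minimal sketch of a direct submission to gpu_16 (the executable name is a placeholder):

     bsub -q gpu_16 -n 4,16 -o out.test ./my_gpu_app   # accept anywhere from 4 to 16 cores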

  7. Laohu queues for CPU jobs
     CPU queues:
     • cpu_32: 25-32 nodes with 7/5 CPU cores per node (192 cores) per job; up to two such jobs may run. Maximum running time: 1 week.
     • cpu_large: 8-22 nodes with 7/5 CPU cores per node (total: 48 cores); any number of jobs may run. Maximum running time: 1 week.
     • cpu_small: 2-8 nodes with 7/5 CPU cores per node per single job; as many jobs as needed to fill 8 nodes / 48 CPU cores may run. Maximum running time: 1 week.
     • cpu_test: 1-5 nodes with 7/5 CPU cores per node (total: 30 cores); as many jobs as needed to fill 5 nodes / 30 CPU cores may run. Maximum running time: 3 hours.
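For quick checks before a long run, the cpu_test queue can be combined with LSF's interactive option; a minimal sketch (the command is only an illustration):

     bsub -q cpu_test -I -n 5 hostname   # reserve 5 cores interactively; the command runs on the first allocated node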

  8. CPU job submit script
     Sample 1: cpujob.lsf
     #!/bin/sh
     #BSUB -q cpu_32                                # job queue; modify according to user
     #BSUB -a openmpi-qlc
     #BSUB -R 'select[type==any] span[ptile=6]'     # resource requirement of host
     #BSUB -o out.test                              # output file
     #BSUB -n 132                                   # the maximum number of CPU cores
     mpirun.lsf --mca btl "openib,self" Gadget2wy WJL.PARAM   # modify for the user's program
     Exec method: bsub < cpujob.lsf
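Once submitted, the job can be watched with standard LSF tools; a minimal sketch (the job ID is a placeholder):

     bjobs -l 12345   # detailed status of job 12345
     bpeek 12345      # view the stdout produced so far by the running job
     bkill 12345      # cancel the job if something went wrong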

  9. GPU job submit script
     Sample 2: gpujob.lsf
     #!/bin/sh
     #BSUB -q gpu_32                  # job queue
     #BSUB -a openmpi-qlc
     #BSUB -R 'select[type==any]'     # resource requirement of host
     #BSUB -o out.test                # output file
     #BSUB -e out.err
     #BSUB -n 20                      # the maximum number of CPU cores
     mpirun.lsf --prefix "/usr/mpi/gcc/openmpi-1.3.2-qlc" -x "LD_LIBRARY_PATH=/export/cuda/lib:/usr/mpi/gcc/openmpi-1.3.2-qlc/lib64" ./phi-GRAPE.exe   # modify for the user's program
     Exec method: bsub < gpujob.lsf
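To verify that an allocated node really exposes its GPU, one quick sanity check is to run nvidia-smi through the K20 test queue; a minimal sketch:

     bsub -q gpu_k20_test -n 2 -I nvidia-smi   # should report one Tesla K20 on the execution host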

  10. LAOHU Monitoring: http://laohu.bao.ac.cn

  11. LAOHU Software
      CUDA 4.0 / CUDA 5.0
      OpenMPI / Intel MPI, etc.
      GCC 4.1 / GCC 4.5
      Intel Compiler
      Math libraries: BLAS, GSL, CFITSIO, FFT, ...
      Gnuplot, PGPLOT
      Gadget
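With this stack, a mixed MPI + CUDA code is typically built by compiling the device code with nvcc and linking with the MPI compiler wrapper; a minimal sketch, assuming the /export/cuda install path seen in the job scripts and placeholder source names (sm_35 targets the K20, sm_13 the C1060):

     nvcc -arch=sm_35 -c kernel.cu -o kernel.o                    # device code for the K20 nodes
     mpicc -c main.c -o main.o                                    # MPI host code
     mpicc main.o kernel.o -L/export/cuda/lib -lcudart -o myapp   # link against the CUDA runtime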

  12. LAOHU Users

  13. LAOHU CPU utilization ratio, 2012 (avg. 74%)

  14. LAOHU CPU utilization ratio, 2013 (avg. 64%)

  15. LAOHU Application List
      1. N-body simulations (NBODY6++, phiGPU, galactic nuclei, star clusters)
      2. N-body simulations (Gadget2, galactic dynamics)
      3. Correlator (test only)
      4. Gravitational microlensing
      5. Local spiral formation through major mergers
      6. Dark energy survey
      7. TREND, Monte Carlo simulation of extreme-high-energy Extensive Air Showers (EAS)
      8. Parallelization of the Herschel Interactive Processing Environment
      9. HII region and PDR modeling based on the CLOUDY code
      10. Reconstructing the primordial power spectrum and the dark energy equation of state
      ...

  16. LAOHU Achievements
      • Berczik, P., Nitadori, K., Zhong, S., Spurzem, R., Hamada, T., Wang, X. W., Berentzen, I., Veles, A., Ge, W., Proceedings of the International Conference on High Performance Computing: "High Performance Massively Parallel Direct N-body Simulations on Large GPU Clusters"
      • Amaro-Seoane, P., Miller, M. C., Kennedy, G. F., Monthly Notices of the Royal Astronomical Society: "Tidal disruptions of separated binaries in galactic nuclei"
      • Just, A., Yurin, D., Makukov, M., Berczik, P., Omarov, C., Spurzem, R., Vilkoviskij, E. Y., The Astrophysical Journal: "Enhanced Accretion Rates of Stars on Supermassive Black Holes by Star-Disk Interactions in Galactic Nuclei"
      • Taani, A., Naso, L., Wei, Y., Zhang, C., Zhao, Y., Astrophysics and Space Science: "Modeling the spatial distribution of neutron stars in the Galaxy"
      • Olczak, C., Spurzem, R., Henning, T., Kaczmarek, T., Pfalzner, S., Harfst, S., Portegies Zwart, S., Advances in Computational Astrophysics: Methods, Tools, and Outcome: "Dynamics in Young Star Clusters: From Planets to Massive Stars"
      • Spurzem, R., Berczik, P., Zhong, S., Nitadori, K., Hamada, T., Berentzen, I., Veles, A., Advances in Computational Astrophysics: Methods, Tools, and Outcome: "Supermassive Black Hole Binaries in High Performance Massively Parallel Direct N-body Simulations on Large GPU Clusters"
      • Khan, F. M., Preto, M., Berczik, P., Berentzen, I., Just, A., Spurzem, R., The Astrophysical Journal: "Mergers of Unequal-mass Galaxies: Supermassive Black Hole Binary Evolution and Structure of Merger Remnants"
      • Li, S., Liu, F. K., Berczik, P., Chen, X., Spurzem, R., The Astrophysical Journal: "Interaction of Recoiling Supermassive Black Holes with Stars in Galactic Nuclei"
      ...
      Full list: http://silkroad.bao.ac.cn/web/index.php/research/publications

  17. HPC/GPU Training

  18. Astronomy Cloud Project

  19. Astronomy Cloud Architecture

  20. Thanks! Email: lich@nao.cas.cn
