Sun HPC10000 Richard Frost San Diego Supercomputer Center
Hardware Overview
• Architecture:
  • 64 GB Shared Memory (UMA)
  • 64 UltraSPARC II processors
  • 400 MHz, 800 Mflops per processor
  • I/O: 6.4 GB/s peak per SBus, 100 MB/s/SBus sustained
• Network
  • Sun Ultra Port Architecture (UPA)
  • Gigaplane-XB: 16 x 16 data crossbar links every system board
  • 4 separate address buses
  • Bandwidth = 200 MB/sec per PE, ~12.8 GB/sec total
  • Read latency of ~400 ns
• Peak speed: 51.2 Gflops
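The aggregate figures on this slide follow directly from the per-processor numbers. The short Python sketch below only re-does that arithmetic as a sanity check; the helper name `aggregate` and the decimal (1000-based) unit conversion are assumptions made for illustration, not something stated on the slide.

```python
# Per-processor figures quoted on the slide.
NUM_PES = 64
MFLOPS_PER_PE = 800        # 800 Mflops at 400 MHz implies 2 flops per clock cycle
BANDWIDTH_MB_PER_PE = 200  # interconnect bandwidth per PE, in MB/sec


def aggregate(per_pe, num_pes=NUM_PES):
    """Scale a per-processor figure to the whole machine (mega -> giga, decimal)."""
    return per_pe * num_pes / 1000.0


peak_gflops = aggregate(MFLOPS_PER_PE)
total_bandwidth_gb = aggregate(BANDWIDTH_MB_PER_PE)

print(f"Peak speed:      {peak_gflops:.1f} Gflops")
print(f"Total bandwidth: {total_bandwidth_gb:.1f} GB/sec")
```

Both results match the totals quoted on the slide (51.2 Gflops and about 12.8 GB/sec).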
Software Overview
• Queuing system
  • LSF = Load Sharing Facility
  • For MPI jobs
    • pam is part of the ‘hpc’ queue but not part of the others
    • Choose # of PEs in LSF script
    • Put pam in front of executable for non-hpc queues
  • For threaded (OpenMP or Pthreads) jobs
    • Choose # of PEs in LSF script
    • Set number of threads in LSF script
    • Start executable without pam
Software Overview
• Compilers and Programming Tools
  • Serial: f77, f90, cc, CC
  • MPI: tmf90, tmf77, tmcc, tmCC (must use -lmpi flag)
  • Sun Fortran OpenMP: tmf90 (must use -mp=openmp -xparallel)
  • KAI C and Fortran OpenMP: guidef90, guidec
  • Debuggers and Performance Monitors from Sun and KAI
• Libraries
  • Sun Performance Library (serial) and SSL (parallel) numerical libraries
• Applications software and Tools
  • Abaqus, Gaussian98, Nastran, SPRNG, …
  • Prism, TotalView debuggers
  • See http://www.npaci.edu/Applications/
Programming Models
• Data Parallel
  • OpenMP can be very effective
  • MPI for portability, especially when some data sets exceed the Sun’s 60+ GB of memory
• Task Parallel
  • Either OpenMP or MPI
  • Choice depends on application needs, including problem size
• Task Farming
  • MPI is best choice
• Event-driven Dense Communication
  • MPI is best choice
• Multi-Level Parallelism
  • Sun MPI + Sun F90 OpenMP recommended
  • Sun MPI + KAI OpenMP (guide) will work in simple situations
Recommended Porting Methods
• Port MPI code ‘as is’
  • Modify makefile for compiler options (tmXX … -lmpi)
  • Submit jobs using LSF script
  • Put pam in front of executable for non-hpc queues
• Port OpenMP code ‘as is’
  • Modify makefile for compiler options (-mp=openmp -xparallel)
  • Set number of threads in LSF script
  • Submit job without pam
• Use Prism or TotalView for debugging and tuning
Performance Considerations
• CPUs
  • Most applications will be CPU bound
  • Large memory jobs (> 1 GB) must use 64-bit addressing
• Network/Interconnect
  • Good bandwidth with moderate to large messages
  • Moderately high latency for intermittent small messages
  • Messaging “warm-up” can improve data locality, i.e., the 1st iteration might be slowest
• I/O
  • Sun was 1st vendor to implement complete MPI I/O
  • Tuned to architecture
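Because the first iteration may be the slowest, timings are easier to interpret when each iteration is timed separately and the warm-up iteration is reported apart from the rest. The sketch below shows that measurement pattern in Python for brevity; a real code on this machine would be Fortran or C with MPI or OpenMP. The names `time_iterations`, `steady_state_mean`, and `exchange_step` are hypothetical, and `exchange_step` is only a stand-in workload, not an actual message exchange.

```python
import time


def time_iterations(step, n_iters):
    """Run step() n_iters times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(n_iters):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return timings


def steady_state_mean(timings, warmup=1):
    """Average the timings after discarding the first `warmup` iterations."""
    steady = timings[warmup:]
    if not steady:
        raise ValueError("need more iterations than warm-up steps")
    return sum(steady) / len(steady)


def exchange_step():
    # Hypothetical stand-in for one compute + message-exchange iteration.
    sum(i * i for i in range(100_000))


timings = time_iterations(exchange_step, 5)
print(f"first iteration:   {timings[0]:.4f} s")
print(f"steady-state mean: {steady_state_mean(timings):.4f} s")
```

Reporting the two numbers separately keeps a one-time warm-up cost from distorting per-iteration estimates for long runs.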
Performance Results/Comparisons
• Performance comparison: HPC10000 and T90
  • CMRR Monte Carlo code for thermal instability: wall clock time on 40 HPC10K procs is 2.5 hrs; on 1 T90 proc, 18.05 hrs (for this case 1 T90 proc is equivalent to ~5.5 HPC10K procs)
  • Scaling for this case:
    #Procs   Time (hrs)   Speedup
    -----------------------------
    20       5.5          1.0
    30       3.8          1.45
    40       2.8          1.96
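The speedup column is the 20-processor time divided by each run's time, and the T90 equivalence is the ratio of processor-hours. The Python sketch below recomputes both from the numbers on the slide and adds parallel efficiency relative to the 20-processor run, which is not on the slide and is included only as an illustration. The function names `scaling_table` and `t90_equivalent` are hypothetical.

```python
# Scaling data from the slide: (processors, wall-clock hours).
runs = [(20, 5.5), (30, 3.8), (40, 2.8)]


def scaling_table(runs):
    """Return (procs, hours, speedup, efficiency) rows relative to the first run."""
    base_procs, base_hours = runs[0]
    rows = []
    for procs, hours in runs:
        speedup = base_hours / hours          # measured speedup vs. baseline
        ideal = procs / base_procs            # speedup under perfect scaling
        rows.append((procs, hours, speedup, speedup / ideal))
    return rows


def t90_equivalent(hpc_procs, hpc_hours, t90_hours):
    """Number of HPC10K processors that match one T90 processor on this case."""
    return hpc_procs * hpc_hours / t90_hours


print(f"{'#Procs':>6} {'Time (hrs)':>10} {'Speedup':>8} {'Efficiency':>10}")
for procs, hours, speedup, eff in scaling_table(runs):
    print(f"{procs:>6} {hours:>10.1f} {speedup:>8.2f} {eff:>10.0%}")

# Comparison figures quoted on the slide: 40 procs, 2.5 hrs vs. 18.05 hrs on 1 T90.
print(f"1 T90 proc ~ {t90_equivalent(40, 2.5, 18.05):.1f} HPC10K procs")
```

The recomputed speedups round to the slide's 1.45 and 1.96, and the equivalence rounds to the quoted ~5.5 processors. Note that the comparison uses the 2.5 hr figure for 40 processors, while the scaling table lists 2.8 hrs; the sketch uses each number where the slide does.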
User Experiences
• Half the machine allocated to NPACI users on 1/1/00
  • Universities: SUNY Buffalo, Auburn U, Stanford U, Massachusetts General Hospital, Penn State, Cornell U, UCSD, UNM, ...
  • Topics: parallel adaptive FEM for bone structures, plasma processes in the magnetosheath, LES, light scattering of neuron cells, study of mesoscale microstructures, SMP performance, etc.
• Other projects, including SAC projects:
  • Bertram: Magnetic recording studies, CMRR, Physics, UCSD
  • Bower: Neuron modeling, Neuroscience, Caltech
  • Hauschildt: Stellar atmosphere, Astronomy, U Georgia
  • Bourne: Protein Data Bank, SDSC/UCSD
Future Developments…
• 64-CPU ultra to be upgraded to Solaris 8
  • Same (identical) environment as gaos
• ultra becomes “compute only” engine
  • ultra logins restricted to necessary sysadmin tasks
  • user logins restricted to gaos
• Sun to release OpenMP for C