Roadrunner Supercluster University of New Mexico -- National Computational Science Alliance Paul Alsing
Alliance/UNM Roadrunner SuperCluster Cactus Workshop
Alliance/UNM Roadrunner SuperCluster • Strategic Collaborations with • Alta Technologies • Intel Corp. • Node configuration • Dual 450 MHz Intel Pentium II processors • 512 KB cache, 512 MB ECC SDRAM • 6.4 GB IDE hard drive • Fast Ethernet and Myrinet NICs
Alliance/UNM Roadrunner • Interconnection Networks • Control: 72-port Fast Ethernet Foundry switch with 2 Gigabit Ethernet uplinks • Data: Four Myrinet Octal 8-port switches • Diagnostic: Chained serial ports
A Peek Inside Roadrunner
Roadrunner System Software • Red Hat Linux 5.2 (6.0) • SMP Linux kernel 2.2.12 • MPI (Argonne’s MPICH 1.1.2) • Portland Group Compiler Suite • Myricom GM Drivers (1.086) and MPICH-GM (1.1.2.7) • Portable Batch System (PBS)
Portland Group Compiler Suite • HPF: parallel Fortran (HPF) for clusters • F90: parallel SMP Fortran 90 • F77: parallel SMP Fortran 77 • CC: parallel SMP C/C++ • DBG: symbolic debugger • PROF: performance profiler
Roadrunner System Libraries • BLAS • LAPACK • ScaLAPACK • PETSc • FFTW • Cactus • Globus Grid Infrastructure
Parallel Job Scheduling • Node-based resource allocation • Job monitoring and auditing • Resource reservations
Computational Grid • National Technology Grid • Globus Infrastructure • Authentication • Security • Heterogeneous environments • Distributed applications • Resource monitoring
For More Information • Contact information: http://www.alliance.unm.edu/ help@alliance.unm.edu • To apply for an account: http://www.alliance.unm.edu/accounts accounts@alliance.unm.edu
Easy to Use rr% ssh -l username rr.alliance.unm.edu rr% mpicc -o prog helloWorld.c rr% qsub -I -l nodes=64 r021% mpirun prog
Job Monitoring with PBS
Roadrunner Performance
Roadrunner Ping-Pong Time
Roadrunner Bandwidth
Applications on RR • MILC QCD (Bob Sugar, Steve Gottlieb) • A body of high-performance research software for SU(3) and SU(2) lattice gauge theory on several different (MIMD) parallel computers in current use • ARPI3D (Dan Weber) • 3-D numerical weather prediction model that simulates the rise of a moist warm bubble in a standard atmosphere • AS-PCG (Danesh Tafti) • 2-D Navier-Stokes solver • BEAVIS (Marc Ingber, Andrea Mammoli) • 1994 Gordon Bell Prize-winning dynamic simulation code for particle-laden viscous suspensions
Applications: CACTUS • 3D Numerical Relativity Toolkit for Computational Astrophysics (Courtesy of Gabrielle Allen and Ed Seidel) • Roadrunner performance under the Cactus application benchmark shows near-perfect scalability.
CACTUS Performance (Graphs courtesy of O. Wehrens)
CACTUS Scaling (Graphs courtesy of O. Wehrens)
CACTUS: The evolution of a pure gravitational wave A subcritical Brill wave (amplitude = 4.5), showing the Newman-Penrose quantity as volume-rendered 'glowing clouds'. The lapse function is shown as a height field in the bottom part of the picture. (Courtesy of Werner Benger)
Future Directions • TeraScale computing • “A SuperCluster in every lab” • Efficient use of SMP nodes • Scalable interconnection networks • High-performance I/O • Advanced programming models for hybrid (SMP and Grid-based) clusters
Exercises • Login to Roadrunner: % ssh roadrunner.alliance.unm.edu -l cactusXX • Request an interactive session: % qsub -I -l nodes=n • Create a Myrinet node-configuration file: % gmpiconf $PBS_NODEFILE (to use 1 CPU per node) % gmpiconf2 $PBS_NODEFILE (to use 2 CPUs per node) • Run the job: % mpirun cactus_wave wavetoyf90.par (on 1 CPU per node) % mpirun -np 2*n cactus_wave wavetoyf90.par (on 2 CPUs per node)
Compiling Cactus: WaveToy • Login to Roadrunner: % ssh roadrunner.alliance.unm.edu -l cactusXX • In your .cshrc, select one MPIHOME (season to taste): • #setenv MPIHOME /usr/parallel/mpich-eth.pgi # Ethernet / Portland Group • #setenv MPIHOME /usr/parallel/mpich-eth.gnu # Ethernet / GNU • setenv MPIHOME /usr/parallel/mpich-gm.pgi # Myrinet / Portland Group • #setenv MPIHOME /usr/parallel/mpich-gm.gnu # Myrinet / GNU • If you modify .cshrc, make sure to: % source .cshrc; rehash • % echo $MPIHOME # should read /usr/parallel/mpich-gm.pgi
Compiling Cactus: WaveToy • Create the WaveToy configuration: % gmake wave-config F90=pgf90 MPI=MPICH MPICH_DIR=$MPIHOME • Compile WaveToy: % gmake wave • Copy all .par files into the executable directory (optional): % cd ~/Cactus/exe % foreach file (`find ~/Cactus -name "*.par" -print`) foreach> cp $file . foreach> end
Running WaveToy on RoadRunner • Run wave interactively on RoadRunner • PBS job scheduler: request interactive nodes % qsub -I -l nodes=4 (note: -I = interactive) • Note: the prompt changes from a front-end node name like [cactus01@rr exe] to a compute-node name, e.g. [cactus01@r034 exe] • Note: compile on the front-end and run on the compute nodes (open 2 windows) • PBS job scheduler: set up a node-configuration file % gmpiconf $PBS_NODEFILE • Note: cat ~/.gmpi/conf-xxxx.rr will show the specific node names • Run the job from ~/Cactus/exe % mpirun cactus_wave wavetoyf90.par % mpirun -np 2 cactus_wave wavetoyf90.par
Running WaveToy on RoadRunner • Run wave in batch mode on RoadRunner • PBS script (call it, e.g., wave.pbs): #PBS -l nodes=4 # pbs script for wavetoy: 1 processor per node gmpiconf $PBS_NODEFILE mpirun ~/Cactus/exe/cactus_wave wavetoyf90.par # (use the full path) • Submit the batch PBS job: % qsub wave.pbs 234.44 (PBS responds with your job_id #) % qstat -a (check the status of your job) % qstat -n (check status, and see the nodes you are on) % qdel 234.44 (remove the job from the queue) % dsh killall cactus_wave (if things hang, mess up, etc.)