
The MSU Institute for Cyber Enabled Research As a Resource for Theoretical Physics


Presentation Transcript


  1. The MSU Institute for Cyber Enabled Research As a Resource for Theoretical Physics 2010/11/09 – Eric McDonald

  2. Outline • The Institute for Cyber Enabled Research (iCER) • The High Performance Computing Center (HPCC) • (Upcoming Computational Physics Talks at the Institute) • Work on the NuShellX Nuclear Shell Model Code Using HPCC Resources • How Can iCER Serve NSCL and FRIB? • NSCL-iCER Liaising • Questions? Discussion

  3. What is iCER? • Established in 2009 to encourage and support the application of advanced computing resources and techniques by MSU researchers. • Goal is to maintain and enhance the university's national and international standing in computational disciplines and research thrusts.

  4. Organization of iCER • High Performance Computing Center • HPC Programmers / Domain Specialists • Research Consultants and Faculty • Clerical Staff • External Advisory Board • Steering Committee • Executive Committee • Directorate

  5. What does iCER provide? • High Performance Computing Systems (via HPCC) • Buy-in Opportunities (departmental liaisons / domain specialists, privileged access to additional HPCC hardware) • Education (weekly research seminars and HPCC technical talks, hosting of workshops and virtual schools, special topic courses) • Consulting • Collaboration and Grant Support

  6. What is HPCC? • Established in late 2004 to provide a campus-wide high performance computing facility for MSU researchers. • HPC systems are free to use. No project or department accounts are charged. • One goal is to help researchers capture research and funding opportunities that might not otherwise be possible.

  7. Organization of HPCC • Systems Administrators (full-time on-call staff; each has one or more areas of specialty, such as databases, scheduling, sensors, or storage) • iCER Domain Specialists / HPC Programmers (have “superuser” privileges on HPCC systems and work closely with the systems administrators) • Clerical Staff • Director

  8. HPCC Hardware History I • 2005: 'green'. 1 “supercomputer” (128 Intel IA-64 cores), 512 GiB of RAM, ~500 GFLOPS. • 2005: 'amd05' cluster. 128 nodes (512 AMD Opteron x86-64 cores), 8 GiB of RAM per node (1 TiB total), ~2.4 TFLOPS. • 2007: 'intel07' cluster. 128 nodes (1024 Intel Xeon x86-64 cores), 8 GiB of RAM per node (1 TiB total), ~9.5 TFLOPS.

  9. HPCC Hardware History II • 2009: 'amd09' cluster. 5 “fat nodes” (144 AMD Opteron x86-64 cores), 128 or 256 GiB per node (1.125 TiB total). • 2010: 'gfx10' GPGPU cluster. 32 nodes (256 Intel Xeon x86-64 cores), 18 GiB per node (576 GiB total), 2 nVidia Tesla M1060 GPGPU accelerators per node, 480 GPGPU cores per node (15360 GPGPU cores total).

  10. HPCC Hardware History III • Coming Soon: 'intel10' cluster. 188 nodes (1504 Intel Xeon x86-64 cores), 24 GiB per node (4.40625 TiB total), ~14.7 TFLOPS. • (Installation pictures on next slide.) • What's next? You tell us. More fat nodes, another GPGPU cluster, ...?

  11. 'intel10' Installation Collage

  12. Hardware Buy-In Program I • When HPCC is planning to purchase a new cluster, users are given a window in which they may purchase nodes to add to it. • Buy-in users have privileged access to their nodes in that they can preempt other users' jobs on those nodes. • When buy-in nodes are not specifically requested by their purchasers, jobs are scheduled on them just as on the non-buy-in nodes.

  13. Hardware Buy-In Program II • HPCC maintains a great deal of infrastructure which buy-in users can regard as a bonus: • Power • Cooling • Support Contracts • Mass Storage • High-Speed Interconnects • On-Call Staff • Security • Sensor Monitoring • Compare to building and maintaining your own cluster....

  14. Data Storage I • Initial home directory storage quota is 50 GB. Can be temporarily boosted in 50 or 100 GB increments, up to 1 TB, if a good research reason is given. • http://www.hpcc.msu.edu/quota • Shared research spaces may also be requested. • http://www.hpcc.msu.edu/contact • Allocations beyond 1 TB are sold in 1 TB chunks, currently at US$500 per TB. This factors in staff time and infrastructure costs as well as raw storage costs.

  15. Data Storage II • Snapshots of home directories are taken every hour. Users can recover accidentally-deleted files from these snapshots without intervention from systems administrators. • Home and research directories are automatically backed up daily. • Replication to off-site storage units is performed. • Scratch space is not backed up.

  16. Disk Array

  17. High Speed Networking • Most nodes are connected via an Infiniband fabric. • Individual node throughputs can reach up to 10 Gbps. • Access to network data storage over this link. • MPI libraries have Infiniband support, and so MPI-parallelized codes can pass messages over these low latency links. (As low as 1.1 μs endpoint-to-endpoint message passing.)
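As a concrete illustration of MPI message passing over these links, below is a minimal ping-pong latency probe in C. It assumes only a standard MPI installation, the usual compiler wrapper (e.g. 'mpicc'), and a launcher such as 'mpirun -np 2'; it is a generic sketch, not an HPCC-supplied benchmark.

    /* Minimal MPI ping-pong: rank 0 and rank 1 bounce a single byte back and
     * forth and report the average one-way latency.  Illustrative sketch only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, i;
        const int reps = 10000;
        char byte = 0;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("average one-way latency: %.2f microseconds\n",
                   (t1 - t0) / (2.0 * reps) * 1.0e6);

        MPI_Finalize();
        return 0;
    }

Whether the measured latency approaches the quoted 1.1 μs depends on which nodes the scheduler assigns and on how the MPI library is configured.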

  18. Software Overview I • Operating System Kernel: Linux 2.6.x • Compiler collections (C, C++, Fortran 77*, Fortran 90/95/2003/2008-ish*) from multiple vendors available. • GCC (FSF): 4.1.2 (default), 4.4.4 • Intel: 9.0, 10.0.025, 11.0.083 (default), 11.1.072 • Open64: 4.2.3 • Pathscale: 2.2.1, 2.5, 3.1 (default) • PGI: 9.0, 10.1 (default), 10.9 * Some restrictions may apply. See compiler documentation for details.

  19. Software Overview II • MPI Libraries • MVAPICH (essentially MPICH with Infiniband support): 1.1.0 • OpenMPI: 1.3.3, 1.4.2 (default version depends on compiler version) • Math Libraries • Intel Math Kernel Library (MKL): 9.1.021, 10.1.2.024 (default) • Fastest Fourier Transform in the West (FFTW): 2.1.5 (with MPI support), 3.2.2 (default)
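To show what using the math library stack looks like in practice, here is a small C program against the FFTW 3.x interface (matching the 3.2.2 install listed above). The build line in the comment is a typical one, not an HPCC-specific recipe.

    /* 1-D complex DFT with FFTW 3.x.  Typical build: cc -std=c99 fft.c -lfftw3 -lm */
    #include <fftw3.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const int N = 64;
        const double PI = 3.141592653589793;
        fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * N);
        fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N);
        fftw_plan plan = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

        /* A single cosine at frequency 4; the spectrum should peak in bin 4
         * with magnitude N/2. */
        for (int i = 0; i < N; i++) {
            in[i][0] = cos(2.0 * PI * 4.0 * i / N);  /* real part      */
            in[i][1] = 0.0;                          /* imaginary part */
        }

        fftw_execute(plan);
        printf("|X[4]| = %g (expect %g)\n", hypot(out[4][0], out[4][1]), N / 2.0);

        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
        return 0;
    }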

  20. Software Overview III • Debuggers • gdb 6.6 • TotalView 8.7 (has a GUI and MPI support) • GPGPU Programming Tools • PGI Fortran Compilers (CUDA support in version 10.x) • nVidia CUDA Toolkit: 2.2, 2.3 (default), 3.1
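Before experimenting with kernels, it is handy to confirm what the accelerators report about themselves. The short C program below queries the CUDA runtime for every visible device; it uses only standard CUDA runtime calls, and the build line in the comment is an assumption about typical usage rather than HPCC documentation.

    /* List CUDA-capable devices on a node.
     * Typical build: nvcc devquery.c -o devquery
     * (or a C compiler with the CUDA include/library paths and -lcudart). */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0, i;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (i = 0; i < count; i++) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("device %d: %s, %d multiprocessors, %.1f GiB global memory\n",
                   i, prop.name, prop.multiProcessorCount,
                   prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        }
        return 0;
    }

On a 'gfx10' node this should list the two Tesla M1060 accelerators described earlier.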

  21. Software Overview IV • Matlab 2009a • Version Control Systems • CVS • Git • Mercurial • Subversion • Miscellaneous Developer Tools • cmake • Doxygen

  22. Coming Software... • Math Libraries • AMD Core Math Library (ACML) • ATLAS • PETSc • SuperLU • Trilinos • Mathematica 7.1 • SciPy (including NumPy) • Visualization Tools • ParaView • VisIt

  23. Using HPCC I • Interactive logins via 'gateway.hpcc.msu.edu'. • You need an SSH client. Several good ones exist for Windows; 'PuTTY' is a popular choice. Mac OS X and Linux distributions come bundled with one. • 'gateway' is only a gateway: log in to the developer nodes from there to do your work, although you can submit jobs to the batch queues directly from 'gateway'. • Developer nodes are for compiling codes and running short (<10 minutes) tests.

  24. Using HPCC II • Architectures of developer nodes are representative of cluster compute nodes. • 'white' ↔ 'green' • 'dev-amd05' ↔ 'amd05' cluster • 'dev-intel07' ↔ 'intel07' cluster • 'dev-gfx08' ↔ nothing • 'dev-intel09' ↔ nothing • 'dev-amd09' ↔ 'amd09' fat nodes • 'dev-gfx10' ↔ 'gfx10' GPGPU cluster • 'dev-intel10' ↔ 'intel10' cluster

  25. Using HPCC III

  26. Using HPCC IV • Files can be accessed without an SSH login to 'gateway'. • The file servers containing your home directories and research spaces can be reached over a CIFS connection. CIFS is the same protocol that Windows uses for file sharing; Mac OS X speaks CIFS as well. • For more information: • https://wiki.hpcc.msu.edu/x/VYCe

  27. Upcoming iCER Seminars • Erich Ormand – LLNL • “Is High Performance Computing the Future of Theoretical Science?” • 2010/11/11 (this Thursday) 10:30 AM • BPS 1445A • Bagels and coffee • Joe Carlson – LANL • 2010/12/09 10:30 AM • For more information: • https://wiki.hpcc.msu.edu/x/q4Cy

  28. NuShellX

  29. NuShellX I • Nuclear shell model code. • Developed by Bill Rae from Oxford, who has created other shell model codes: Oxbash, MultiShell, and NuShell. • Uses a Lanczos iterator to find the energies. • Code can also produce transition rates and spectroscopic factors.
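Since the Lanczos method is central here, the sketch below shows the bare algorithm in C on a small symmetric toy matrix: the three-term recurrence builds a tridiagonal projection of the matrix, and bisection on the Sturm sequence then brackets its lowest Ritz value, which approximates the lowest eigenvalue (the ground-state energy in the shell model setting). It omits reorthogonalization and everything else a production code needs, and it is not NuShellX's implementation.

    /* Bare Lanczos iteration on a small symmetric toy matrix
     * (compile as C99, e.g. cc -std=c99 lanczos.c -lm). */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 100   /* dimension of the toy "Hamiltonian"  */
    #define M 30    /* number of Lanczos iterations        */

    /* y = H x for a toy symmetric matrix: diagonal 2*i, off-diagonals -1. */
    static void matvec(const double *x, double *y)
    {
        for (int i = 0; i < N; i++) {
            y[i] = 2.0 * i * x[i];
            if (i > 0)     y[i] -= x[i - 1];
            if (i < N - 1) y[i] -= x[i + 1];
        }
    }

    /* Number of eigenvalues of the m-by-m tridiagonal T (diagonal a,
     * off-diagonal b) lying below x, from the signs of the LDL^T pivots. */
    static int sturm_count(const double *a, const double *b, int m, double x)
    {
        int count = 0;
        double d = 1.0;
        for (int i = 0; i < m; i++) {
            d = a[i] - x - (i > 0 ? b[i - 1] * b[i - 1] / d : 0.0);
            if (fabs(d) < 1e-300) d = -1e-300;   /* avoid division by zero */
            if (d < 0.0) count++;
        }
        return count;
    }

    int main(void)
    {
        static double v[N], vold[N], w[N];
        double a[M], b[M], beta = 0.0, norm = 0.0;

        /* Random normalized starting vector. */
        for (int i = 0; i < N; i++) { v[i] = rand() / (double)RAND_MAX; norm += v[i] * v[i]; }
        norm = sqrt(norm);
        for (int i = 0; i < N; i++) { v[i] /= norm; vold[i] = 0.0; }

        /* Lanczos recurrence: alpha_j = v_j . H v_j, beta_j = |residual|. */
        for (int j = 0; j < M; j++) {
            matvec(v, w);
            double alpha = 0.0;
            for (int i = 0; i < N; i++) alpha += v[i] * w[i];
            a[j] = alpha;
            for (int i = 0; i < N; i++) w[i] -= alpha * v[i] + beta * vold[i];
            beta = 0.0;
            for (int i = 0; i < N; i++) beta += w[i] * w[i];
            beta = sqrt(beta);
            if (j < M - 1) b[j] = beta;
            for (int i = 0; i < N; i++) { vold[i] = v[i]; v[i] = w[i] / beta; }
        }

        /* Lowest Ritz value by bisection between generous fixed brackets. */
        double lo = -1000.0, hi = 1000.0;
        while (hi - lo > 1e-10) {
            double mid = 0.5 * (lo + hi);
            if (sturm_count(a, b, M, mid) >= 1) hi = mid; else lo = mid;
        }
        printf("lowest Ritz value: %.10f\n", 0.5 * (lo + hi));
        return 0;
    }

In practice the small tridiagonal eigenproblem is usually handed to a library routine, and the sparse matrix-vector product is the expensive, heavily optimized part of the calculation.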

  30. NuShellX II • Another iterative solver, which implements a scheme known as Thick Restart Lanczos, is also available. • Part of my project with Alex is to verify the correctness of this implementation and fix accuracy issues that may arise in some cases. • May also investigate implementation of application-level checkpointing in this context.
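For context on what application-level checkpointing involves, here is a hedged sketch in C: periodically write out just enough state for the solver to resume (here, an iteration counter and the active basis vectors), and reload it at startup if a checkpoint file exists. The file layout and choice of state are assumptions for the example, not NuShellX's actual design.

    /* Illustrative application-level checkpointing helpers (not NuShellX code). */
    #include <stdio.h>

    /* Write the iteration counter and basis vectors to 'path'; 0 on success. */
    int save_checkpoint(const char *path, int iter, const double *basis, size_t len)
    {
        int ok;
        FILE *fp = fopen(path, "wb");
        if (!fp) return -1;
        ok = fwrite(&iter, sizeof iter, 1, fp) == 1 &&
             fwrite(&len, sizeof len, 1, fp) == 1 &&
             fwrite(basis, sizeof *basis, len, fp) == len;
        return (fclose(fp) == 0 && ok) ? 0 : -1;
    }

    /* Read them back; 0 on success, -1 if the file is absent or inconsistent. */
    int load_checkpoint(const char *path, int *iter, double *basis, size_t len)
    {
        int ok;
        size_t stored = 0;
        FILE *fp = fopen(path, "rb");
        if (!fp) return -1;             /* no checkpoint: caller starts fresh */
        ok = fread(iter, sizeof *iter, 1, fp) == 1 &&
             fread(&stored, sizeof stored, 1, fp) == 1 &&
             stored == len &&
             fread(basis, sizeof *basis, len, fp) == len;
        fclose(fp);
        return ok ? 0 : -1;
    }

A real restart scheme would typically also write to a temporary name and rename on completion, so that a job killed mid-write cannot corrupt its only checkpoint.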

  31. NuShellX III • Some work has already been done on NuShellX using HPCC: • Ported the code, including Alex's wrappers, to Linux, and provided a Makefile-based build system. • Generalized the code away from the Intel Fortran compiler (ifort) so that other compilers can build it. In particular, HPCC's PGI compilers were used. • Explored the scalability of the OpenMP version of the code using HPCC fat nodes.
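Scalability studies of that kind usually reduce to timing the same work at several thread counts. The small OpenMP probe below shows the pattern; the reduction loop is a stand-in workload rather than one of NuShellX's kernels, and the compile flags mentioned are the usual ones for these compilers, not an HPCC recipe.

    /* OpenMP thread-scaling probe with a toy reduction workload.
     * Build with OpenMP enabled, e.g. -fopenmp (GCC 4.4) or -mp (PGI). */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define LEN 50000000L   /* ~400 MB of doubles; trivial for a fat node */

    int main(void)
    {
        double *x = malloc(LEN * sizeof *x);
        if (!x) return 1;
        for (long i = 0; i < LEN; i++)
            x[i] = 1.0 / (i + 1.0);

        for (int threads = 1; threads <= 8; threads *= 2) {
            omp_set_num_threads(threads);
            double sum = 0.0;
            double t0 = omp_get_wtime();
            #pragma omp parallel for reduction(+:sum)
            for (long i = 0; i < LEN; i++)
                sum += x[i] * x[i];
            printf("%d thread(s): %.3f s (sum = %.6f)\n",
                   threads, omp_get_wtime() - t0, sum);
        }

        free(x);
        return 0;
    }

On NUMA fat nodes, memory placement (first touch) and thread pinning often matter as much as the loop itself, which is part of what such experiments expose.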

  32. NuShellX IV • Some planned work on NuShellX which will use HPCC resources: • Working bugs out of the MPI version. (In progress.) • Performance tuning the MPI version. • Experimenting with a CUDA implementation on the GPGPU cluster. • Goal is to prepare for big, long runs on NERSC and ANL ALCF machines.

  33. What can iCER do for you? • Can work together to pursue grant opportunities, especially in computational physics. • Can provide dedicated computing power via the HPCC buy-in program. • Can provide a cost-free stepping stone to the petascale “leadership class” machines at national labs, Blue Waters, etc.... (No charges for CPU time while debugging and testing on a significant scale.)

  34. iCER Online Resources I • The core of the iCER/HPCC web: • https://wiki.hpcc.msu.edu • iCER Home Page: • http://icer.msu.edu/ • Announcements: • https://wiki.hpcc.msu.edu/x/QAGQ • Calendar: • https://wiki.hpcc.msu.edu/x/q4Cy

  35. iCER Online Resources II • New Account Requests (requires login with MSU NetID and password): • http://www.hpcc.msu.edu/request • Software Installation Requests, Problem Reports, etc... (requires login with MSU NetID and password): • http://rt.hpcc.msu.edu/ • Documentation: • https://wiki.hpcc.msu.edu/x/A4AN

  36. iCER-NSCL Liaising I • A successful partnership between BMB and iCER has produced a domain specialist / HPC programmer who serves as a liaison between the two organizations. • iCER and HPCC leadership are looking to extend this model to other organizations. • Hence this talk, and hence the availability of someone with a physics background to act as a liaison. • (Work with NSCL-at-large is paid for by iCER, not by Alex Brown.)

  37. iCER-NSCL Liaising II • Proposed liaison is fluent in C, C++, Python, and Mathematica. Also has some proficiency in Fortran 77/90. • Proposed liaison has experience with parallelization. • Proposed liaison has accounts at several DOE facilities (and is always looking to add more elsewhere). Knowing what to expect at the petascale can help preparations at the “decaterascale”.

  38. iCER-NSCL Liaising III • Proposed liaison has superuser privileges on the HPCC clusters. • Can rectify many problems without waiting for a systems administrator to address them. • Can install software without needing special permission or assistance. • Proposed liaison is “in the loop”. Participates in both HPCC technical meetings and iCER organizational meetings.

  39. Thank you. Questions? Discussion.
