Compiler and Tools: User Requirements from ARSC
Ed Kornkven
Arctic Region Supercomputing Center DSRC
kornkven@arsc.edu
HPC User Forum, September 10, 2009
Outline
• ARSC and our user community
• User issues and eight needs they have in the HPC environment
• Conclusions
About ARSC
• HPCMP DoD Supercomputing Resource Center, est. 1993
• Owned and operated by the University of Alaska Fairbanks
• Provides HPC resources & support
  • Cray XT5, 3456 cores
  • Sun cluster, 2312 cores
• Supports and conducts research
ARSC User Community
• An HPCMP DoD Supercomputing Resource Center
  • Support of DoD computational research priorities
  • Open research (publishable in open research journals)
• Non-DoD academic research
  • ARSC supports high performance computational research in science and engineering, with an emphasis on high latitudes and the Arctic
• In-house research
  • Oceanography, space physics
  • Heterogeneous computing technologies, multicore systems
• ARSC supports about 300 users; HPCMP supports about 4,000
HPCMP Application Areas
• HPCMP projects are defined by ten Computational Technology Areas (CTAs)
  • Computational Structural Mechanics; Computational Fluid Dynamics; Computational Biology, Chemistry and Materials Science; Computational Electromagnetics and Acoustics; Climate/Weather/Ocean Modeling and Simulation; Signal/Image Processing; Forces Modeling and Simulation; Environmental Quality Modeling and Simulation; Electronics, Networking, and Systems/C4I; Integrated Modeling and Test Environments
• These CTAs encompass many application codes
  • Mostly parallel, with varying degrees of scalability
  • Commercial, community-developed, and home-grown
  • Unclassified and classified
HPCMP Application Suite
• Used for several benchmarking purposes, including system health monitoring, procurement evaluation, and acceptance testing
• Contains applications and test cases
• Composition of the suite fluctuates according to current and projected use
• Past apps include WRF
• Significance: believed to represent the Program’s workload
ARSC Academic Codes
• Non-DoD users’ codes have similar profiles
• Many are community codes
  • E.g., WRF, ROMS, CCSM, Espresso, NAMD
• Also some commercial (e.g., Fluent) and home-grown
• Predominantly MPI + Fortran/C; some OpenMP/hybrid
Need #1
• Protect our code investment by supporting our legacy code base
• MPI-based codes will be around for a while
  • Some are scaling well, even to 10^4 cores (our largest machines)
  • Many are not; lots of users still use 10^2 cores or fewer
  • Some single-node codes might be able to take advantage of many cores
Parallel Programming is Too Unwieldy
• Memory hierarchy stages have different “APIs” (see the sketch below)
  • CPU / registers: mostly invisible (handled by the compiler)
  • Caches: code restructuring for reuse; possibly explicit cache management calls; may have to handle levels differently
  • Socket memory: maintain memory affinity of processes/threads
  • Node memory: explicit language features (e.g., Fortran refs/defs)
  • Off-node memory: different explicit language features (MPI calls)
  • Persistent storage: more language features (I/O, MPI-IO calls)
• Other things to worry about
  • TLB misses
  • Cache bank conflicts
  • New memory layers (e.g., SSD), effect of multicore on memory performance, …
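A minimal, hypothetical sketch of how those layers show up in one small Fortran code: OpenMP directives for the cores on a node, an MPI call for off-node halo data, and MPI-IO for persistent storage. The array, file name, and update loop are invented for illustration, and only one halo direction is shown.

```fortran
! Sketch only: one array touching three different "APIs" in the hierarchy.
program hierarchy_sketch
  use mpi
  implicit none
  integer, parameter :: n = 1024
  real(8) :: u(0:n+1)                      ! local slab with halo cells
  integer :: i, ierr, rank, nprocs, left, right, fh
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  left  = merge(MPI_PROC_NULL, rank-1, rank == 0)
  right = merge(MPI_PROC_NULL, rank+1, rank == nprocs-1)
  u = real(rank, 8)

  ! Node memory / cores: shared-memory parallelism via OpenMP directives
  !$omp parallel do
  do i = 1, n
     u(i) = 2.0d0 * u(i)
  end do
  !$omp end parallel do

  ! Off-node memory: explicit message passing to fill one halo cell
  call MPI_Sendrecv(u(n), 1, MPI_REAL8, right, 0, &
                    u(0), 1, MPI_REAL8, left,  0, &
                    MPI_COMM_WORLD, status, ierr)

  ! Persistent storage: yet another interface (MPI-IO)
  call MPI_File_open(MPI_COMM_WORLD, 'u.dat', &
                     MPI_MODE_CREATE + MPI_MODE_WRONLY, MPI_INFO_NULL, fh, ierr)
  call MPI_File_write_at(fh, int(rank, MPI_OFFSET_KIND) * n * 8, &
                         u(1), n, MPI_REAL8, status, ierr)
  call MPI_File_close(fh, ierr)
  call MPI_Finalize(ierr)
end program hierarchy_sketch
```

Each level uses a different mechanism (directives, library calls, parallel I/O), which is exactly the burden the slide describes.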
Need #2
• Help with the complexity of parallel programming, especially managing memory
• The state of the art is to be an expert in:
  • Architectural features (which constantly change)
  • Multiple languages (Fortran, MPI, OpenMP)
  • Performance analysis tools
  • Coding tricks (which depend on architecture)
Q: Why do so few of our users use performance tools?
• Does the average user have no incentive?
• Or have they given up because it seems too difficult?
Need #3
• Users need to understand what the “performance game” is, and they need tools to help them win.
• Remember the days of “98% vectorized”?
• What expectations (metrics) should users have for their code on today’s machines? (It must not be utilization.)
• What will the rules be in a many-core world?
Beyond Fortran & MPI
• We do have some codes based on other parallel models or languages, e.g.:
  • Charm++: NAMD, ChaNGa
  • Linda: Gaussian (as an optional feature)
  • PETSc: e.g., PISM (Parallel Ice Sheet Model)
• These illustrate some willingness (or need) in our community to break out of the Fortran/MPI box.
• However, the pool of expertise outside the box is even smaller than for MPI.
HPCMP Investments in Software
• HPCMP is investing in new software and software development methodologies, e.g., the PET and CREATE programs
  • User education
  • Modern software engineering methods
  • Transferable techniques and/or code
  • Highly scalable codes capable of speeding up decision making
“New” Programming Models
• At ARSC, we are interested in PGAS languages for improving productivity in new development
• We have a history with PGAS languages
  • Collaboration with a GWU team (UPC)
  • Experience with a tsunami model: parallelization using CAF took days vs. weeks with MPI (see the sketch below)
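As a hedged illustration of why a CAF port can go faster (this is not the actual tsunami code; names and sizes are invented), here is a minimal coarray Fortran sketch: the halo exchange that would need send/receive calls, tags, and buffers in MPI becomes an ordinary assignment with square-bracket image indexing.

```fortran
! Sketch only: a one-sided halo exchange in coarray Fortran (CAF).
program caf_halo_sketch
  implicit none
  integer, parameter :: n = 1024
  real(8) :: u(0:n+1)[*]        ! coarray: one slab per image
  integer :: me, np

  me = this_image()
  np = num_images()
  u  = real(me, 8)
  sync all

  ! Pull halo values directly from neighboring images; no message
  ! tags, requests, or receive buffers to manage.
  if (me > 1)  u(0)   = u(n)[me-1]
  if (me < np) u(n+1) = u(1)[me+1]
  sync all
end program caf_halo_sketch
```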
Need #4
• High-performing implementations of new programming models
  • For starters, timely implementations of the co-array features of Fortran 2008
• Users need some confidence that their investments in these languages will be safe, since their codes will outlive several hardware generations and perhaps the languages themselves.
Beyond Fortran & MPI
• Heterogeneous processors
  • Had a Cray XD1 with FPGAs
    • Very little use
  • Cell processors and GPUs
    • PlayStation cluster
    • IBM QS22
Need #5
• Easier code development for heterogeneous environments
• Cell processors, GPUs, and FPGAs offer tempting performance, but for most users the effort required to use these accelerators is too high.
• Work underway in these areas is encouraging.
Multicore Research
• In collaboration with GWU, we are seeking to better understand multicore behavior on our machines.
• Codes based on Charm++ (NAMD and ChaNGa) performed better on our 16-core nodes than the MPI-based codes we tested.
Need #6
• We need models and methods to effectively use many cores (one current approach is sketched below)
  • Who doesn’t?
• Could the potential of many-core processors go untapped?
  • Vector processors weren’t universally accepted because not all apps were a good fit.
  • If users don’t find a fit with many cores, they will still need to compute.
• It’s up to CS, not users, to make multicore work.
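For context, a hedged sketch of the approach many of our users hand-assemble today for many-core nodes: hybrid MPI + OpenMP, with one MPI rank per node or socket and OpenMP threads filling the cores beneath it. The point of this need is that better models than this should emerge; the program below (invented for illustration) only sets up the hybrid runtime and reports the rank/thread layout.

```fortran
! Sketch only: hybrid MPI + OpenMP startup.
program hybrid_sketch
  use mpi
  use omp_lib
  implicit none
  integer :: ierr, provided, rank, nthreads

  ! Ask for an MPI library that tolerates threads (master thread makes MPI calls).
  call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  !$omp parallel
  !$omp master
  nthreads = omp_get_num_threads()
  print '(a,i0,a,i0,a)', 'rank ', rank, ' running ', nthreads, ' threads'
  !$omp end master
  !$omp end parallel

  call MPI_Finalize(ierr)
end program hybrid_sketch
```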
Need #7
• Corollary to the other requirements: provide new avenues to productive development, but allow them to be adopted incrementally.
  • Probably implies good language interoperability (see the sketch below)
• Tools for analyzing code and giving advice, not just statistics
  • Automatically fix code, or show where the new language will most help
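One way language interoperability enables incremental adoption, sketched under assumptions: a legacy Fortran code adopts a single new C kernel through the standard ISO_C_BINDING interface (Fortran 2003), leaving the rest of the code untouched. The kernel name scale_kernel and its signature are hypothetical.

```fortran
! Sketch only: declare one C kernel so legacy Fortran can call it directly.
module new_kernel_iface
  use iso_c_binding
  implicit none
  interface
     ! Assumed C-side signature: void scale_kernel(double *x, int n, double a);
     subroutine scale_kernel(x, n, a) bind(C, name="scale_kernel")
       import :: c_double, c_int
       real(c_double), intent(inout) :: x(*)
       integer(c_int), value         :: n
       real(c_double), value         :: a
     end subroutine scale_kernel
  end interface
end module new_kernel_iface

! Legacy code then adds:  use new_kernel_iface
! and calls:              call scale_kernel(u, size(u), 2.0_c_double)
```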
Users Run Codes
• Our users want to do science.
• For many users, code development is a negligible part of their HPC use.
• For all of them, it isn’t the main part.
• Most will spend more time running programs than writing them.
Need #8
• Users need help with the process of executing their codes:
  • Setting up runs
  • Launching jobs
  • Monitoring job progress
  • Checkpoint/restart
  • Storing output (TBs on ARSC machines)
  • Cataloguing results
  • Data analysis and visualization
Conclusion
• We are ready for new parallel programming paradigms.
• Much science is being done with today’s machines, so “first, do no harm” applies.
• There are still plenty of opportunities for innovators to make HPC more productive for users.