Hercules Development Status

Presentation Transcript


  1. Hercules Development Status John Urbanic ASTA Presentation July 16th, 2009

  2. A Brief Overview Of Our Experiences and Expectations
  • What Science are we trying to accomplish?
  • What does our code do?
  • How well does it do it?
  • Scalability Issues
  • Debugging Issues
  • What are we shooting for next?
  • Credits and Partners

  3. Our Special Thing: Wavelength-Adaptive Tetrahedral Mesh
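The sizing idea behind a wavelength-adaptive mesh, stated here as the generic rule of thumb rather than the exact Hercules criterion, is that the local element edge length h is tied to the local shear-wave velocity V_s, the highest resolved frequency f_max, and a chosen number of points per wavelength p:

      % Generic wavelength-adaptive sizing rule; the constant p and the exact
      % form used inside Hercules may differ.
      h \;\le\; \frac{V_s}{p \, f_{\max}}

Soft, slow sediments therefore get many small elements while stiff, fast rock gets few large ones, which is the source of the element-size disparities mentioned on the next slide.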

  4. FE Approach
  • Allows us to scale well despite disparities in element sizes
  • Forces us to deal with complex domain decompositions
  • If done right, we end up with one of the best ingredients for scalability: almost completely nearest-neighbor communication
  • Processing each element requires some pointer chasing: we cannot lay out data contiguously in memory (see the sketch below)
  • Makes cache use complicated
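To make the pointer-chasing point concrete, here is a minimal, hypothetical element loop over an unstructured tetrahedral mesh. It is not the actual Hercules data layout; the names (elem_t, accumulate_forces) are invented for illustration. The indirection through the element-to-node connectivity is what prevents contiguous memory access and complicates cache use.

      /* Minimal sketch only -- not the Hercules data structures.  The
       * indirect access through elems[e].node[i] is the "pointer chasing"
       * mentioned above: node data is gathered in mesh-dependent order,
       * so accesses are scattered rather than contiguous. */
      #include <stddef.h>
      #include <stdint.h>

      typedef struct {
          int32_t node[4];          /* global indices of the 4 tet vertices */
      } elem_t;

      void accumulate_forces(size_t n_elems, const elem_t *elems,
                             const double *disp, double *force)
      {
          for (size_t e = 0; e < n_elems; e++) {
              for (int i = 0; i < 4; i++) {
                  int32_t n = elems[e].node[i];   /* scattered, indirect access */
                  force[n] += disp[n];            /* stand-in for the real element update */
              }
          }
      }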

  5. ShakeOut Results and Verification

  6. Scaling Hotspots
  • The solver portion is almost perfectly scalable due to nearest-neighbor communication
  • The exception is the IO code in that loop; no machine has completely scalable IO hardware
  • Stations: easy, a few dozen points
  • Planes: could be one small area, or a bunch stacked to make volumes; resolution is determined by user requirements, not generally by the simulation resolution
  • Volumes: no complete volume dumps for now

  7. Scaling IO
  • To make IO as scalable as possible on any existing system, you must accommodate the limitations and configuration of the output devices. We have done this by (see the sketch below):
  • Spreading writes out from many app nodes
  • MPI'ing that data to many disk-writing nodes
  • Doing that in variable packet sizes
  • Limiting the number of outstanding packets
  • All of these are easily configurable.
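As an illustration of the aggregation idea described above (not the actual Hercules IO code; PACKET_BYTES, MAX_OUTSTANDING, and send_output are invented names standing in for the configurable parameters mentioned on the slide), a compute rank might ship its output to a writer rank in bounded, fixed-size packets like this:

      /* Sketch of the write-aggregation scheme described above -- NOT the
       * actual Hercules IO code.  A compute rank ships output to a writer
       * rank in configurable packet sizes, with a cap on how many
       * non-blocking sends may be in flight at once. */
      #include <mpi.h>
      #include <stddef.h>

      #define PACKET_BYTES    (1 << 20)   /* assumed tunable packet size      */
      #define MAX_OUTSTANDING 4           /* assumed cap on in-flight packets */

      void send_output(const char *buf, size_t nbytes,
                       int writer_rank, MPI_Comm comm)
      {
          MPI_Request req[MAX_OUTSTANDING];
          int         active = 0;

          for (size_t off = 0; off < nbytes; off += PACKET_BYTES) {
              int len = (int)((nbytes - off < PACKET_BYTES) ? nbytes - off
                                                            : PACKET_BYTES);
              if (active == MAX_OUTSTANDING) {        /* throttle: wait for one */
                  int idx;
                  MPI_Waitany(active, req, &idx, MPI_STATUS_IGNORE);
                  req[idx] = req[--active];           /* keep active requests packed */
              }
              MPI_Isend(buf + off, len, MPI_BYTE, writer_rank,
                        /* tag = */ 0, comm, &req[active++]);
          }
          MPI_Waitall(active, req, MPI_STATUSES_IGNORE);
      }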

  8. Scaling Issues
  • IO system: reliability issues have caused us to scale back on IO for these runs
  • Not doing volumes
  • Limited number of planes (we would like thousands)
  • We are also not going to force through any epic (8-hour, 64K-PE) runs just yet
  • Network: looks great thus far, but is seemingly affected by Lustre accesses from other jobs

  9. Hercules Scaling [figure: weak-scaling and strong-scaling plots, seconds vs. processors] Data gathered using Kraken at NICS for a ShakeOut-type problem

  10. Debugging Issues
  • Lustre misdirection. If you see this:
      [2048] MPICH has run out of unexpected buffer space.
      Try increasing the value of env var MPICH_UNEX_BUFFER_SIZE (cur value is 62914560),
      and/or reducing the size of MPICH_MAX_SHORT_MSG_SIZE (cur value is 50000).
      aborting job:
      out of unexpected buffer space
      [NID 4867]Apid 949492: initiated application termination
      Application 949492 exit codes: 255
      Application 949492 exit signals: Killed
  • What would you try next?

  11. Debugging Issues
  • Wrong answer: start playing with MPICH_UNEX_BUFFER_SIZE or MPICH_MAX_SHORT_MSG_SIZE. These will get you nowhere, but will confuse you greatly when they seem to do something due to job load variations.
  • Answer: rerun when the machine is less loaded (job count seems to be the metric here) and the error will likely disappear.

  12. Debugging Issues
  • To be fair, you can easily (and likely will) generate legitimate MPI buffer issues as you scale up. A few helpful hints (and one general mitigation sketched below):
  • Do not take Cray "suggestion" messages too seriously, as per the above.
  • The primary documentation on the growing list of effective environment variables is "man mpi".
  • That documentation is not entirely correct or self-consistent, so if you encounter seemingly contradictory behavior, just note it and move on.
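One general way to reduce pressure on MPICH's unexpected-message buffer, sketched here as a generic technique rather than anything Hercules specifically does, is to pre-post receives before the matching sends are issued, so incoming messages land directly in user buffers instead of the unexpected queue:

      /* Generic technique, not Hercules-specific: pre-posting receives lets
       * incoming messages match immediately instead of piling up in the
       * unexpected-message buffer that the error above complains about. */
      #include <mpi.h>
      #include <stdlib.h>

      void exchange_with_neighbors(double *recvbuf, const double *sendbuf, int count,
                                   const int *neighbors, int n_neighbors, MPI_Comm comm)
      {
          MPI_Request *req = malloc(2 * n_neighbors * sizeof *req);

          for (int i = 0; i < n_neighbors; i++)      /* post all receives first */
              MPI_Irecv(recvbuf + (size_t)i * count, count, MPI_DOUBLE,
                        neighbors[i], 0, comm, &req[i]);

          for (int i = 0; i < n_neighbors; i++)      /* then issue the sends */
              MPI_Isend(sendbuf + (size_t)i * count, count, MPI_DOUBLE,
                        neighbors[i], 0, comm, &req[n_neighbors + i]);

          MPI_Waitall(2 * n_neighbors, req, MPI_STATUSES_IGNORE);
          free(req);
      }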

  13. Totalview
  • Has been very useful to us for the past several years
  • Used at scales of 4-256 cores
  • We have had an outstanding offer to try debugging with 16K cores that we have not yet required, but may very soon
  • This requires a unique license on Kraken, but the Totalview people are willing to work around that for us, and probably for you

  14. Next Steps
  • Fix large source issues
  • Scale to 64K and 128K cores on Kraken and Intrepid
  • Implement non-linear soil response

  15. Regional Nonlinear Soil Response. Study case: the Euroseistest in the Volvi area in Thessaloniki, Greece. von Mises and Drucker-Prager material models incorporated in Hercules with an explicit solution method.
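For readers unfamiliar with the two material models named here, their yield criteria in standard textbook form are shown below (the exact parameterization used inside Hercules may differ); I_1 is the first stress invariant, J_2 the second deviatoric stress invariant, and sigma_y, alpha, k are material parameters.

      % Standard textbook forms, not necessarily the exact Hercules parameterization.
      f_{\mathrm{von\,Mises}}(\boldsymbol{\sigma}) = \sqrt{3 J_2} - \sigma_y \le 0
      \qquad
      f_{\mathrm{Drucker\text{-}Prager}}(\boldsymbol{\sigma}) = \sqrt{J_2} + \alpha I_1 - k \le 0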

  16. Results: Elastic vs. Elastoplastic Synthetics; Stress-Strain Relationships in Time

  17. Non-Linear Implementation
  • The nonlinear terms appear as conditional exceptions in the solver kernel (see the sketch below)
  • This would be a very expensive change if our elements were well vectorized
  • They are not, due to the irregular nature of the mesh, so we don't pay much of a performance price
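A minimal, hypothetical illustration of the "conditional exception" structure, reduced here to a 1D elastic-perfectly-plastic update rather than the actual von Mises / Drucker-Prager return used in Hercules; the point is that the plastic branch is only taken for the subset of elements whose trial stress exceeds yield:

      /* Illustration only -- not the Hercules kernel, and simplified to 1D
       * perfect plasticity.  The conditional branch is the "exception":
       * most elements stay elastic and skip the extra work. */
      static void update_element_stress(double *stress, double strain_inc,
                                        double shear_mod, double yield_stress)
      {
          double trial = *stress + shear_mod * strain_inc;   /* elastic predictor */

          if (trial > yield_stress)                          /* conditional exception */
              trial = yield_stress;                          /* return to yield surface */
          else if (trial < -yield_stress)
              trial = -yield_stress;

          *stress = trial;
      }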

  18. Credits. The whole SCEC group has been valuable on many fronts, but these immediate results are attributable to Jacobo Bielak, Julio Lopez, Leonardo Ramirez Guzman, Haydar Karaoglu, and doubly so to Ricardo Taborda (for his pretty graphics as well as his expertise).
