MetaComputing: An Evaluation of Emerging Systems. David Cronk, Brett Ellis, Graham Fagg (the PI)
Goals • To review current metacomputing systems for the ERDC MSRC • Review was to be relative to an MSRC environment • To cover: • design, installation, maintenance, support, usability, performance...
Goals • Two summaries • General for everybody • Specific to the ‘production’ environment at ERDC
What the report is not • A perfect review of metacomputing systems being used to perform Grand Challenge applications at SuperComputing XYZ. • We were limited to the resources available here, i.e. we: • did not test batch queue systems • did not test multi-site, multi-machine ‘Meta-jobs’
The Report • Overview of Globus and Legion • Human factors • Installation and maintenance • OPRs, LDAP, MDA and CAs… • Or when do you know that Legion works? • You want to run Globus without us? • Ease of Use • accounts, logging in, compiling and running MPI jobs… • Assistance and support
The Report • Site Autonomy • Security issues • Resource management • Grid files and local MayIs • System Functionality • The file system • GASS versus context space • Programming Language support • Fault tolerance • when to reinstall
The Report • Performance • We tested a number of LU solvers, including one from ScaLAPACK, as well as basic MPI performance • Did not test HPF over a meta-MPI, nor F90.
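As a rough illustration of the kind of kernel an LU-solver test exercises, here is a minimal serial sketch in Python. It is not the benchmark used in the report and is not ScaLAPACK; the names `lu_factor`, `lu_solve` and `time_solve` are hypothetical, and the random test matrix is an assumption made only so the example is self-contained. It factors a dense matrix with partial pivoting, solves one right-hand side, and reports the elapsed time together with the residual as a correctness check.

```python
import random
import time


def lu_factor(a):
    """Return (lu, perm): packed L and U factors of a, with partial pivoting."""
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input untouched
    perm = list(range(n))       # perm[i] = original index of the row now at i
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve a x = b using the factors produced by lu_factor."""
    n = len(lu)
    # Forward substitution with the unit lower-triangular L on the permuted b.
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution with the upper-triangular U.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def time_solve(n, seed=0):
    """Time factor + solve on a random n x n system; return (seconds, residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    lu, perm = lu_factor(a)
    x = lu_solve(lu, perm, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return elapsed, residual


if __name__ == "__main__":
    seconds, residual = time_solve(50)
    print(f"n=50: {seconds:.4f} s, max residual {residual:.2e}")
```

A distributed solver would spread the matrix across processes, so its timing also reflects communication cost, which this serial sketch does not model.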
The Report • Performance • Not as expected … and currently under review, i.e. Legion will be faster
Summary • Two possible answers • If certain infrastructure, such as Meta-Queuing and single log-in (as in krb5), already exists at the MSRCs, then these systems only buy us a global file system. • Just run AFS? • Need to run multiple MPI jobs? • Use MPI_Connect/Pacx/.. • MPI_Connect was used for the joint CGWAVE SC98 challenge using machines at ASC & ERDC
Summary OR: • Cannot declare a verdict until after we: • run tests against real-world Meta-Applications that need multiple MPPs at multiple sites on multiple batch queue systems, needing data from different repositories… • Grand challenge again.
Status Allowing the different project teams to review the current document, to allow for correction of our mistakes and defense of their systems. Next? Doing the Meta-Application stuff, hopefully…