The eMinerals minigrid and the National Grid Service: a user's perspective. NGS169 (A. Marmier)
Objectives
• User profile
• Two real resources: eMinerals Minigrid, National Grid Service
• Practical difficulties
• Amateurish rambling (discussion/suggestions)
User Profile 1
• Atomistic modelling community
• Chemistry/physics/materials science
• Potentially big users of eScience (CPU intensive, NOT data)
• VASP, SIESTA, DL_POLY, CASTEP …
User Profile 2
• Relative proficiency with Unix, mainframes, etc.
• Scripting, parallel programming
• Note of caution: the speaker might be biased
eMinerals Virtual Organisation, NERC
The eMinerals project brings together simulation scientists, applications developers and computer scientists to develop UK eScience/grid capabilities for molecular simulations of environmental issues.
Grid prototype: the minigrid
eMinerals: Minigrid
• 3 clusters of 16 Pentiums
• UCL Condor pool
• Earth Science Condor pool (Cambridge)
• SRB vaults
• SRB manager at Daresbury
eMinerals: Minigrid philosophy
Globus 2
• No login possible (except one debug/compile cluster)
• No easy file transfer (have to use SRB, see later)
• Feels very ‘gridy’, but not painless
• Promotes condorG and home-made wrappers
eMinerals: Minigrid example
My_condor_submit script example:

Universe = globus
Globusscheduler = lake.bath.ac.uk/jobmanager-pbs
Executable = /home/arnaud/bin/vasp-lam-intel
Notification = NEVER
transfer_executable = true
Environment = LAMRSH=ssh -x
GlobusRSL = (job_type=mpi)(queue=workq)(count=4)(mpi_type=lam-intel)
Sdir = /home/amr.eminerals/run/TST.VASP3
Sget = INCAR,POTCAR,POSCAR,KPOINTS
Sget = OUTCAR,CONTCAR
SRBHome = /home/srbusr/SRB3_3_1/utilities/bin
log = vasp.log
error = vasp.err
output = vasp.out
Queue
NGS: What?
VERY NICE PEOPLE who offer access to LOVELY clusters
A real GRID approximation
NGS: Resources
“Data” clusters: 20 compute nodes with dual Intel Xeon 3.06 GHz CPUs, 4 GB RAM
• grid-data.rl.ac.uk (RAL)
• grid-data.man.ac.uk (Manchester)
“Compute” clusters: 64 compute nodes with dual Intel Xeon 3.06 GHz CPUs, 2 GB RAM
• grid-compute.leeds.ac.uk (WRG Leeds)
• grid-compute.oesc.ox.ac.uk (Oxford)
Plus other nodes: HPCx, Cardiff, Bristol …
NGS: Setup
grid-proxy-init, gsi-ssh … then a “normal” machine:
• Permanent fixed account (NGS169)
• Unix
• Queuing system
• gsi-ftp for file transfer
A typical session is sketched below.
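A minimal sketch of such a session, assuming the Globus/GSI command-line clients (grid-proxy-init, gsissh, globus-url-copy) are installed locally; the remote file paths are only placeholders:

# Create a short-lived proxy from the user certificate
grid-proxy-init

# Log in interactively to an NGS node with GSI-OpenSSH
gsissh grid-compute.oesc.ox.ac.uk

# Transfer an input file with GridFTP (paths are illustrative)
globus-url-copy file:///home/user/INCAR gsiftp://grid-compute.oesc.ox.ac.uk/home/user/run1/INCAR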
NGS: example

globus-job-run grid-compute.oesc.ox.ac.uk/jobmanager-fork /bin/ls

globusrun -b grid-compute.oesc.ox.ac.uk/jobmanager-pbs example1.rsl

example1.rsl:
&
(executable=DLPOLY.Y)
(jobType=mpi)
(count=4)
(environment=(NGSMODULES intel-math:gm:dl_poly))
Difficulty 1: access
Well-known problem:
• Certificate (setup sketched below)
• Globus-enabled machine
• SRB account (2.0)
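A sketch of the certificate side of the access problem, assuming the standard Globus Toolkit tools; obtaining the signed certificate from the certification authority is an out-of-band, site-specific step:

# Generate a key pair and a certificate request to send to the CA
grid-cert-request

# Once the signed certificate is installed, inspect it
grid-cert-info -subject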
Difficulty 2: usability
How do I submit a job?
• Directly (gsi-ssh …)
• Remotely (globus, condorG)
Direct: login, check queue, submit, (kill), logout (see the sketch below)
Different batch queuing systems (PBS, Condor, LoadLeveler …)
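A sketch of the direct route, assuming a PBS-style queuing system on the remote node; the script name and job id are hypothetical:

# Log in with GSI credentials
gsissh grid-compute.leeds.ac.uk

# On the remote machine: submit, monitor, and (if needed) kill a job
qsub run_dlpoly.pbs      # submit the batch script (hypothetical name)
qstat -u $USER           # check the queue
qdel 12345               # kill a job by id, if necessary

# Log out
exit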
Usability 2
• Usually requires a “script”
• Almost nobody writes their own scripts
• Works by inheritance and adaptation
• At the moment, eScience forces the user to learn the syntax of the B.Q.S. (a typical example is sketched below)
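A minimal sketch of the kind of batch script that gets inherited and adapted, assuming PBS and an MPI code; the resource requests and executable path are placeholders:

#!/bin/bash
#PBS -N vasp_test           # job name
#PBS -l nodes=2:ppn=2       # placeholder resource request
#PBS -l walltime=12:00:00   # maximum run time
#PBS -q workq               # target queue

cd $PBS_O_WORKDIR           # run from the directory the job was submitted from
mpirun -np 4 /home/user/bin/vasp    # hypothetical executable path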
Usability 3
Remote, e.g. example1.rsl:
&
(executable=DLPOLY.Y)
(jobType=mpi)
(count=4)
(environment=(NGSMODULES intel-math:gm:dl_poly))
• Ignores file transfer
• Ignores more complex submit structures
Usability 4
Ignores more complex submit structures:
• abinit < inp.txt
• cpmd.x MgO.inp
=> The user has to learn Globus syntax :o/ (environment and RSL); one possible RSL form is sketched below.
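A hedged sketch of how the abinit case above could be expressed in RSL, using the standard GRAM stdin/stdout attributes; the executable and file names are placeholders:

&
(executable=abinit)
(count=1)
(stdin=inp.txt)
(stdout=log.txt)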
Finally
• At the moment, no real incentive to submit remotely
• Need a mechanism to reward the early adopters: access to special queues
  • Longer walltime?
  • More CPUs?
CONCLUSION
• Submission scripts are very important and useful pieces of information
• Easily accessible examples would save a lot of time
• A mechanism to encourage remote submission is needed (e.g. access to better queues)