Using APST to Run Parameter Sweeps on the Grid
Jim Hayes
Grid Research and Innovation Laboratory (GRAIL)
Parameter Sweep Applications
• “Many” tasks
• Tasks vary in switches and/or input files
• Minimal inter-task communication
• Typically produce files as output
• Found in bio-informatics, neuroscience, computer graphics, discrete-event simulations, protein folding, database searches, etc.
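To make the task structure concrete, here is a minimal Python sketch that enumerates a small sweep. The program name `simulate`, its switches, and the input file names are hypothetical and chosen only for illustration; the point is that every combination of switch values and input file becomes one independent task.

```python
from itertools import product


def sweep_tasks(executable, switches, input_files):
    """Return one command line per combination of switch values and input file."""
    names = sorted(switches)  # fixed order so the output is reproducible
    tasks = []
    for values in product(*(switches[n] for n in names)):
        flags = " ".join(f"--{n}={v}" for n, v in zip(names, values))
        for infile in input_files:
            tasks.append(f"{executable} {flags} {infile}")
    return tasks


if __name__ == "__main__":
    # Hypothetical sweep: 2 seeds x 3 temperatures x 2 input files = 12 tasks.
    tasks = sweep_tasks(
        "simulate",
        {"seed": [1, 2], "temp": [280, 300, 320]},
        ["a.dat", "b.dat"],
    )
    print(len(tasks), "independent tasks")
    print(tasks[0])
```

The task count grows multiplicatively with each swept parameter, which is why even modest sweeps produce “many” tasks.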
PSAs Map Well to Grid
• Can effectively use huge amounts of resources
• Flexibility in task assignment
• Latency tolerant: little communication
• Fault tolerant: restarting a task is sufficient
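The fault-tolerance point can be illustrated with a short Python sketch. It is not APST code; the function name `run_with_restarts` and the retry limit are assumptions made for this example. Because tasks do not communicate, a failed task is simply resubmitted and no other task is touched.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_restarts(tasks, run, max_attempts=3, workers=4):
    """Run independent tasks; on failure, resubmit only the failed task.

    `tasks` must be unique, hashable identifiers; `run` is any callable
    that executes one task and raises an exception on failure.
    """
    results = {}
    attempts = {t: 0 for t in tasks}
    pending = list(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            futures = {pool.submit(run, t): t for t in pending}
            pending = []
            for fut in as_completed(futures):
                task = futures[fut]
                attempts[task] += 1
                try:
                    results[task] = fut.result()
                except Exception:
                    if attempts[task] < max_attempts:
                        pending.append(task)  # restart; other tasks are unaffected
                    else:
                        results[task] = None  # give up after max_attempts
    return results, attempts
```

A real Grid scheduler would also move the restarted task to a different resource if the original one had failed; this sketch only shows that no global rollback is needed.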
Grid Execution Can Be Hard
• Available infrastructure varies
• Some infrastructures have steep learning curves
• Grid is a mix of batch & interactive systems
• Changing Grid environment complicates task/resource mapping
• PSAs on Grid require much bookkeeping: task location/status, input file locations, output tracking, etc.
APST Eases PSA Path to Grid
• Provides a single interface to multiple infrastructures
• Lowers the learning curve
• Does intelligent mixing of batch & interactive
• Incorporates dynamic resource info into smart (re)scheduling
• Handles bookkeeping
APST Example – EOL Project
[Figure labels: Application Tasks; Application Data Files; Preprocessing; Analysis Apps (e.g., psiblast, 123d); Postprocessing; Genome Data; Sequences; Analysis Output; Data Base]
APST Example – EOL Platform
• horizon.sdsc.edu: AIX/LoadLeveler
• saxicolous.sdsc.edu: Linux/PBS
• morpheus.engin.umich.edu: Linux/PBS
• {multivac/nbcr3/nbcr4/nbcr5/nbcr6}.sdsc.edu: Solaris
APST System
• Daemon (apstd) schedules tasks, stages input, spawns processes, returns output
• Client (apst) controls/monitors daemon
• Daemon is user agent (single-user)
• Resource/task spec via XML
APST Infrastructure Support
• Compute: GRAM, SSH
• Storage: GASS, FTP, SCP, SFTP, SRB
• Batch: Condor, LL, LSF, PBS
• Meta-data: MDS, NWS, self-generated
APST XML
• Used for all resource/task descriptions
• Growing familiarity as a “common language”
• Availability of editing tools
• Design philosophy: easy things should be easy (and brief); hard things should be possible
• Primary tags: <storage>, <compute>, <tasks>
• <files>, <gridinfo> used in special circumstances
• Most projects produce XML via application-specific scripts/GUI
APST XML - Storage
<storage>
  <disk id='myDisk' datadir='${HOME}/myData'>
    <ftp|gass|scp|local|sftp|srb server='blue.ufo.edu'/>
  </disk>
</storage>
APST XML - Compute
<compute>
  <host id='myHost' disk='myDisk'>
    <globus|local|ssh server='blue.ufo.edu'/>
    <condor|loadleveler|lsf|pbs|shell/>
  </host>
</compute>
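Since most projects generate their XML from scripts, the following Python sketch shows one way the storage and compute fragments might be produced. It uses only the tag and attribute names that appear on the slides; the helper names `storage_xml` and `compute_xml` are hypothetical, and whether APST accepts this exact output should be checked against the published DTD.

```python
import xml.etree.ElementTree as ET


def storage_xml(disk_id, datadir, protocol, server):
    """Build a <storage> fragment with one <disk> reached via `protocol`."""
    storage = ET.Element("storage")
    disk = ET.SubElement(storage, "disk", {"id": disk_id, "datadir": datadir})
    ET.SubElement(disk, protocol, {"server": server})  # e.g. "sftp" or "srb"
    return ET.tostring(storage, encoding="unicode")


def compute_xml(host_id, disk_id, access, server, batch):
    """Build a <compute> fragment with one <host> that refers to a disk by id."""
    compute = ET.Element("compute")
    host = ET.SubElement(compute, "host", {"id": host_id, "disk": disk_id})
    ET.SubElement(host, access, {"server": server})  # e.g. "ssh" or "globus"
    ET.SubElement(host, batch)                       # e.g. "pbs" or "shell"
    return ET.tostring(compute, encoding="unicode")


if __name__ == "__main__":
    print(storage_xml("myDisk", "${HOME}/myData", "sftp", "blue.ufo.edu"))
    print(compute_xml("myHost", "myDisk", "ssh", "blue.ufo.edu", "pbs"))
```

The host's `disk` attribute carries the same identifier as the disk's `id`, which is how the two fragments on the slides are tied together.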
APST XML - Tasks
<tasks>
  <task executable='myProgram' arguments='arg1 arg2 arg3' input='infile1 infile2' output='outfile1 outfile2'/>
</tasks>
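A parameter sweep needs one <task> element per parameter combination, so the task list is a natural candidate for script generation. The Python sketch below is an illustration only: the helper name `tasks_xml` and the `outfile<N>.dat` naming scheme are assumptions, while the attribute names (executable, arguments, input, output) and the space-separated file lists follow the slide.

```python
import xml.etree.ElementTree as ET


def tasks_xml(executable, arg_sets, input_files):
    """Build a <tasks> fragment with one <task> per argument list."""
    tasks = ET.Element("tasks")
    for i, args in enumerate(arg_sets):
        ET.SubElement(tasks, "task", {
            "executable": executable,
            "arguments": " ".join(args),
            "input": " ".join(input_files),
            "output": f"outfile{i}.dat",  # hypothetical per-task output name
        })
    return ET.tostring(tasks, encoding="unicode")


if __name__ == "__main__":
    # Three tasks that differ only in the value passed to a hypothetical -n switch.
    sweep = [["-n", str(n)] for n in (10, 20, 30)]
    print(tasks_xml("myProgram", sweep, ["infile1", "infile2"]))
```

Giving each task its own output file name keeps results from different tasks from overwriting one another when they are returned.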
APST Advanced Features
• Site-specific executables/paths
• Task priority
• Direct tasks to a host/subset of hosts
• Task-specific working directories
• User estimate of task “cost” for scheduling
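To show why a user-supplied cost estimate helps a scheduler, here is a small Python sketch of a generic greedy heuristic: take tasks in decreasing estimated cost and give each to the currently least-loaded host. This is not APST's scheduling algorithm, which the slides do not describe; the function name, task names, and host names are all hypothetical.

```python
import heapq


def assign_by_cost(task_costs, hosts):
    """Greedy assignment: largest estimated cost first, to the least-loaded host."""
    heap = [(0.0, h) for h in hosts]  # (accumulated estimated load, host)
    heapq.heapify(heap)
    assignment = {h: [] for h in hosts}
    # Sort by decreasing cost; break ties by task name for reproducibility.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, host = heapq.heappop(heap)
        assignment[host].append(task)
        heapq.heappush(heap, (load + cost, host))
    return assignment


if __name__ == "__main__":
    costs = {"t1": 5, "t2": 3, "t3": 2, "t4": 2}
    print(assign_by_cost(costs, ["hostA", "hostB"]))
```

Without cost estimates, a scheduler has to treat all tasks as equal and can end up placing several long tasks on the same host.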
Major Projects Using APST
• Encyclopedia of Life (eol.sdsc.edu)
  • This is an ambitious project seeking to catalog the complete proteome of every living species in a flexible, powerful reference system. This includes calculating three-dimensional models and assigning biological function for all recognizable proteins in all currently known genomes.
• MCell (www.mcell.cnl.salk.edu)
  • This is a general simulator for cellular microphysiology. MCell uses Monte Carlo diffusion and chemical reaction algorithms in 3D to simulate the complex biochemical interactions of molecules inside and outside of living cells.
APST Status
• APST v2.1.1, released 7/29/03, includes all of the features discussed
• APST v2.2 to include task spec shortcuts, GSI client/daemon communication, Ganglia support, bulk file transfer, SGE support, jApst visual client
• Software, tutorial, FAQ, man pages, XML DTD available on-line
For more information
• http://grail.sdsc.edu/projects/apst
• apst@sdsc.edu, apst-users@sdsc.edu
• jhayes@cs.ucsd.edu, casanova@cs.ucsd.edu