A JSDL Application Repository Portal for Heterogeneous Grids and the NGS
David Meredith
NGS Operations, e-Science Centre, Daresbury Laboratory, UK
d.j.meredith@dl.ac.uk
NGS Applications Repository Portal/Portlet Core Functionality
• A JSDL Repository:
  • Search/browse for JSDL (personal and shared) by category of interest (e.g. bioinformatics, chemistry, tutorials/examples). Select, load, modify, save.
  • JSDL documents can be pre-configured and published by domain experts / resource administrators (users benefit from sharing the expertise, artefacts and configuration captured in JSDL).
  • Community formation around a “best practice” approach (OGF).
• JSDL GUI Editor: for authoring, validating, sharing and uploading application descriptions.
• Grid Operations: File Staging, Application Submission, Monitoring (run either ‘out-of-the-box’ or modify/tweak as required).
• Generic: designed to be extensible, so it can be extended to support different Grid middleware technologies and data staging protocols.
JSDL
Ali Anjomshoaa, Fred Brisard, Michel Drescher, Donal K. Fellows, William Lee, An Ly, Steve McGough, Darren Pulsipher, Andreas Savva, Chris Smith
• JSDL 1.0 is an OGF recommendation.
• JSDL 1.0 is published as GFD-R-P.56
  – http://www.ggf.org/gf/docs/?final
• An XML Schema language for describing the requirements of computational jobs for submission to Grids.
• Is agnostic of middleware: no dependencies on Globus, WSRF or gLite (means the portal can be generic and not tied to any particular set of Grid technologies).
• GGF / OGF Standard.
• JSDL documents can be validated against the JSDL and JSDL POSIX XSD schemas to ensure their correctness.

<jsdl:Application>
  <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
  <jsdl-posix:POSIXApplication>
    <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
    <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
    <jsdl-posix:Input>input.dat</jsdl-posix:Input>
    <jsdl-posix:Output>output1.png</jsdl-posix:Output>
  </jsdl-posix:POSIXApplication>
</jsdl:Application>
<jsdl:Resources> ….
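As a minimal sketch of how a client might read such a document, the Python below parses the gnuplot fragment and pulls out the application name, executable, arguments and stdin/stdout names. The fragment is wrapped in the JobDefinition and JobDescription elements that a complete JSDL 1.0 document needs, and the namespace URIs are the ones published with JSDL 1.0. The function name `summarise_job` and the dictionary keys are illustrative assumptions, not part of the portal. No XSD validation is performed here, since the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the JSDL 1.0 specification. The prefixes used here are
# local to this script and need not match the prefixes in the document.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

# The gnuplot fragment from the slide, wrapped in the enclosing elements.
JSDL_DOC = """\
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
        <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
        <jsdl-posix:Input>input.dat</jsdl-posix:Input>
        <jsdl-posix:Output>output1.png</jsdl-posix:Output>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


def summarise_job(xml_text):
    """Return the main POSIX application fields of a JSDL document as a dict."""
    root = ET.fromstring(xml_text)
    app = root.find(".//jsdl:Application", NS)
    posix = app.find("posix:POSIXApplication", NS)
    return {
        "name": app.findtext("jsdl:ApplicationName", "", NS).strip(),
        "executable": posix.findtext("posix:Executable", "", NS).strip(),
        "arguments": [
            (arg.text or "").strip() for arg in posix.findall("posix:Argument", NS)
        ],
        "stdin": posix.findtext("posix:Input", "", NS).strip(),
        "stdout": posix.findtext("posix:Output", "", NS).strip(),
    }


if __name__ == "__main__":
    print(summarise_job(JSDL_DOC))
```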
Grid Heterogeneity
Grid A - NGS (Middleware - GT)
Grid B - EGEE (Middleware - gLite)
• Different middleware adopt different formats for the description of applications and their associated resources (JDL, RSL), and for their subsequent execution on a Grid.
• A number of different data storage resources are also relevant for the management and transfer of data, e.g. GsiFTP, SRB, SRM, WebDAV, (S)FTP.
Grid A: Globus RSL (Resource Specification Language)

&(executable=$(GLOBUSRUN_GASS_URL)/home/ngs0153/cpi)
 (arguments= 30 fileA)
 (jobType=mpi)
 (environment = (NGSMODULES mpich-gm/1.2.5..10-intel8.1:intel/fce/9.1.032) (TMP /tmp))
 (count = 4)
 (hostCount = 8)
 (minMemory = 512)
 (maxWallTime = 3)
 (directory=/home/ngs0153)
 (stdin=/home/ngs0153/cpi.in)
 (stdout=/home/ngs0153/cpi.out)
 (stderr=/home/ngs0153/cpi.err)

Grid B: gLite JDL (Job Description Language)

Type = "Job";
JobType = "Normal";
RetryCount = 3;
Executable = "/home/ngs0153/cpi";
Arguments = "30 fileA";
VirtualOrganisation = "myGridVOproject";
StdInput = "cpi.in";
StdOutput = "cpi.out";
StdError = "cpi.err";
InputSandbox = { "gsiftp://grid-data.rl.ac.uk:2811/home/ngs0153/cpi", "gsiftp://grid-data2.dl.ac.uk:2811/myhome/fileA" };
InputSandboxDestFileName = { "cpi", "fileA" };
OutputSandbox = { "cpi.out" };
OutputSandboxDestURI = { "gsiftp://mygridhome.dl.ac.uk:2811/myhome" };
DeleteOnTermination = { "fileA" };
Environment = { "NGSMODULES=mpich-gm/1.2.5..10-intel8.1:intel/fce/9.1.032", "TMP=/tmp" };
Requirements = ( other.GlueCEInfoLRMSType == "PBS" ) && ( member( GlueCEInfoHostName, {"grid-data.rl.ac.uk:2119" , "mygrid-resource.dl.ac.uk:2119" } ) ) && ( GlueHostProcessorModel == "Intel" );
Rank = -other.GlueCEStateEstimatedResponseTime;
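To illustrate why a neutral description is useful, the sketch below renders one job record into both syntaxes shown above. The `JOB` dictionary, its field names and the two functions are hypothetical and cover only a handful of attributes. This is not the portal's converter, and it does not handle quoting or escaping of special characters.

```python
# Hypothetical neutral job record; the field names are illustrative only.
JOB = {
    "executable": "/home/ngs0153/cpi",
    "arguments": ["30", "fileA"],
    "stdin": "cpi.in",
    "stdout": "cpi.out",
    "stderr": "cpi.err",
    "environment": {"TMP": "/tmp"},
}


def to_rsl(job):
    """Render a subset of the job as a Globus RSL string."""
    env = " ".join(f"({name} {value})" for name, value in job["environment"].items())
    clauses = [
        f"(executable={job['executable']})",
        f"(arguments={' '.join(job['arguments'])})",
        f"(environment={env})",
        f"(stdin={job['stdin']})",
        f"(stdout={job['stdout']})",
        f"(stderr={job['stderr']})",
    ]
    return "&" + "".join(clauses)


def to_jdl(job):
    """Render the same subset of the job as gLite JDL attribute assignments."""

    def quoted(value):
        return '"' + value + '"'

    env = ", ".join(quoted(f"{n}={v}") for n, v in job["environment"].items())
    lines = [
        "Executable = " + quoted(job["executable"]) + ";",
        "Arguments = " + quoted(" ".join(job["arguments"])) + ";",
        "StdInput = " + quoted(job["stdin"]) + ";",
        "StdOutput = " + quoted(job["stdout"]) + ";",
        "StdError = " + quoted(job["stderr"]) + ";",
        "Environment = { " + env + " };",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_rsl(JOB))
    print(to_jdl(JOB))
```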
Catering for Grid Heterogeneity
• Middleware-specific dependencies are added at run time: the JSDL is converted into the middleware-specific format (e.g. RSL).
• Middleware-specific parameters, e.g. the RSL JobType, are catered for in JSDL using XML Schema extensions in place of the <xsd:any> placeholder elements.
• The portal database has to accommodate all middleware variations.

GT2 RSL extension XML schema

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.ggf.org/namespaces/2004/11/jsdl-rsl-1.0.xsd"
            targetNamespace="http://www.ggf.org/namespaces/2004/11/jsdl-rsl-1.0.xsd"
            elementFormDefault="qualified">
  <xsd:element name="jobType" type="jobType"/>
  <xsd:element name="gramMyJob" type="gramMyJob"/>
  <xsd:element name="dryRun" type="boolean" default="no"/>
  <xsd:element name="save_state" type="boolean" default="no"/>
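One way the run-time step could pick up such extension elements is sketched below: every element in the RSL extension namespace is collected from a JSDL document and turned into an RSL clause. The namespace URI is the one from the schema above. Placing the extension element inside jsdl:Application is an assumption made for this example, as is the function name `rsl_extension_clauses`.

```python
import xml.etree.ElementTree as ET

# Target namespace of the GT2 RSL extension schema shown on the slide.
RSL_NS = "http://www.ggf.org/namespaces/2004/11/jsdl-rsl-1.0.xsd"

# Assumed example document: the position of rsl:jobType is illustrative.
JSDL_WITH_EXTENSION = """\
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:rsl="http://www.ggf.org/namespaces/2004/11/jsdl-rsl-1.0.xsd">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl:ApplicationName>cpi</jsdl:ApplicationName>
      <rsl:jobType>mpi</rsl:jobType>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


def rsl_extension_clauses(xml_text):
    """Collect elements from the RSL extension namespace as RSL clauses."""
    prefix = "{" + RSL_NS + "}"
    clauses = []
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree stores qualified names as "{namespace}localname".
        if isinstance(elem.tag, str) and elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            value = (elem.text or "").strip()
            clauses.append(f"({name}={value})")
    return clauses


if __name__ == "__main__":
    print(rsl_extension_clauses(JSDL_WITH_EXTENSION))
```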
The Applications Repository Portal is ‘open’: anyone is free to browse public JSDL documents and to use the JSDL editor without logging in. Login is required to browse personal applications, save and submit jobs, and interact with Grid resources.
List jobs, read job descriptions and load a job to initialise the ‘Active Job’. Changes to the parameters in the GUI update and validate the JSDL template automatically.
‘My Job’ Detail
• Input fields are pre-configured / filled out.
• Fields are taken from the JSDL and JSDL-POSIX extension schemas.
• POSIXApplication is a JSDL extension. It defines standard POSIX elements:
  • stdin, stdout, stderr
  • Working directory
  • Command line arguments
  • Environment variables

<POSIXApplication>
  <Executable ... />
  <Input ... />?
  <Output ... />?
  <Error ... />?
  <WorkingDirectory ... />?
  …
</POSIXApplication>
Environment Variables

<jsdl1:Environment name="TMP">/tmp</jsdl1:Environment>
<jsdl1:Environment name="NGSMODULES">envVarValue1</jsdl1:Environment>
…..
Command Line Arguments
Paste and parse command line arguments (space and/or line separated values).

<jsdl1:Argument>fasta34</jsdl1:Argument>
<jsdl1:Argument>-H</jsdl1:Argument>
<jsdl1:Argument>humanDNA2.input</jsdl1:Argument>
<jsdl1:Argument>/var/data/bioinformatics/..</jsdl1:Argument>
<jsdl1:Argument>S</jsdl1:Argument>
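The paste-and-parse step can be sketched as follows: a pasted command line is split on any run of whitespace (spaces or newlines) and each token becomes one Argument element. The function name is hypothetical, and plain whitespace splitting is an assumption about how the portal tokenises input. Quoted arguments containing spaces would need something like `shlex.split` instead.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 POSIX extension namespace.
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def arguments_to_elements(pasted):
    """Build a POSIXApplication element with one Argument child per token."""
    app = ET.Element(f"{{{POSIX_NS}}}POSIXApplication")
    # str.split() with no separator splits on spaces, tabs and newlines alike.
    for token in pasted.split():
        ET.SubElement(app, f"{{{POSIX_NS}}}Argument").text = token
    return app


if __name__ == "__main__":
    pasted_text = "fasta34 -H\nhumanDNA2.input   S"
    element = arguments_to_elements(pasted_text)
    print(ET.tostring(element, encoding="unicode"))
```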
Named File Systems
Named file systems are used to declare mount points on the consuming system.
File system names are referenced throughout the portal (and the JSDL document) in place of mount points, so a change to a file system's mount point is applied automatically throughout the portal/JSDL.
Used when specifying path information, e.g. locations of files/directories, data staging locations, etc.

<jsdl:FileSystem name="WORKINGDIR">
  <jsdl:MountPoint>/home/ngs0024/myScratchDir</jsdl:MountPoint>
</jsdl:FileSystem>
<jsdl:FileSystem name="DataDir">
  <jsdl:MountPoint>/home/ngs0024/myDataDir</jsdl:MountPoint>
</jsdl:FileSystem>
…
<jsdl1:Output filesystemName="WORKINGDIR">fasta.out</jsdl1:Output>
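The substitution of mount points can be sketched as a lookup from file system name to mount point, followed by a path join. The two function names are hypothetical. The FileSystem declarations are the ones from the slide, wrapped in a jsdl:Resources element so that they parse as one document.

```python
import posixpath
import xml.etree.ElementTree as ET

NS = {"jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl"}

RESOURCES = """\
<jsdl:Resources xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:FileSystem name="WORKINGDIR">
    <jsdl:MountPoint>/home/ngs0024/myScratchDir</jsdl:MountPoint>
  </jsdl:FileSystem>
  <jsdl:FileSystem name="DataDir">
    <jsdl:MountPoint>/home/ngs0024/myDataDir</jsdl:MountPoint>
  </jsdl:FileSystem>
</jsdl:Resources>
"""


def mount_points(resources_xml):
    """Map each named file system to its declared mount point."""
    root = ET.fromstring(resources_xml)
    return {
        fs.get("name"): fs.findtext("jsdl:MountPoint", "", NS).strip()
        for fs in root.findall("jsdl:FileSystem", NS)
    }


def resolve(filesystem_name, relative_path, mounts):
    """Turn a (file system name, relative path) pair into an absolute path."""
    return posixpath.join(mounts[filesystem_name], relative_path)


if __name__ == "__main__":
    mounts = mount_points(RESOURCES)
    print(resolve("WORKINGDIR", "fasta.out", mounts))
```

Because every path is resolved through the lookup, changing a single MountPoint value changes every path that refers to that file system name.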
Stage Data
List of data from across the Grid that should be copied to the consuming system.
Before job: src URI
After job: tgt URI
JSDL does not mandate the protocol / URI format.
Data is staged relative to named file systems.

<jsdl:DataStaging>
  <jsdl:FileName>Mg.psf</jsdl:FileName>
  <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>false</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://ngs.rl.ac.uk:2811/apps/Siesta_mpi/…</jsdl:URI>
  </jsdl:Source>
</jsdl:DataStaging>
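A rough sketch of how the before-job and after-job transfers could be separated: a DataStaging element with a Source is treated as a stage-in, and one with a Target as a stage-out. The host names, file names and URIs in the example document are hypothetical placeholders, and `staging_plan` is an illustrative name rather than a portal function.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
NS = {"jsdl": JSDL_NS}

# Hypothetical staging entries; the URIs are placeholders for illustration.
STAGING = """\
<jsdl:JobDescription xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:DataStaging>
    <jsdl:FileName>input.dat</jsdl:FileName>
    <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:Source>
      <jsdl:URI>gsiftp://host-a.example.org:2811/data/input.dat</jsdl:URI>
    </jsdl:Source>
  </jsdl:DataStaging>
  <jsdl:DataStaging>
    <jsdl:FileName>output.dat</jsdl:FileName>
    <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:Target>
      <jsdl:URI>gsiftp://host-b.example.org:2811/results/output.dat</jsdl:URI>
    </jsdl:Target>
  </jsdl:DataStaging>
</jsdl:JobDescription>
"""


def staging_plan(xml_text):
    """Split DataStaging entries into before-job and after-job transfers."""
    plan = {"stage_in": [], "stage_out": []}
    for entry in ET.fromstring(xml_text).iter(f"{{{JSDL_NS}}}DataStaging"):
        name = entry.findtext("jsdl:FileName", "", NS).strip()
        source = entry.findtext("jsdl:Source/jsdl:URI", None, NS)
        target = entry.findtext("jsdl:Target/jsdl:URI", None, NS)
        if source:
            plan["stage_in"].append((source.strip(), name))
        if target:
            plan["stage_out"].append((name, target.strip()))
    return plan


if __name__ == "__main__":
    print(staging_plan(STAGING))
```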
Candidate Hosts
Candidate hosts: resources that can be used to run the given application.
The candidate host list can contain personal hosts and default hosts (available to all users).
In future, resource broker (RB) matchmaking will be used to select the execution host from the candidate hosts.

<jsdl:CandidateHosts>
  <jsdl:HostName>ngs.rl.ac.uk:2119</jsdl:HostName>
  <jsdl:HostName>clyde.dl.ac.uk:2119</jsdl:HostName>
</jsdl:CandidateHosts>
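As a small illustration of combining personal and default hosts, the sketch below reads the HostName entries from a CandidateHosts element and appends a list of default hosts, dropping duplicates while preserving order. The merge rule, the function name and the `grid.example.org` host are assumptions for the example. The slide does not describe how the portal combines the two lists.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"

CANDIDATES = """\
<jsdl:CandidateHosts xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:HostName> ngs.rl.ac.uk:2119 </jsdl:HostName>
  <jsdl:HostName> clyde.dl.ac.uk:2119 </jsdl:HostName>
</jsdl:CandidateHosts>
"""


def candidate_hosts(xml_text, default_hosts=()):
    """Return hosts from the JSDL followed by defaults, without duplicates."""
    root = ET.fromstring(xml_text)
    # Surrounding whitespace inside HostName elements is stripped.
    personal = [(e.text or "").strip() for e in root.iter(f"{{{JSDL_NS}}}HostName")]
    merged = []
    for host in personal + list(default_hosts):
        if host and host not in merged:
            merged.append(host)
    return merged


if __name__ == "__main__":
    # grid.example.org is a hypothetical default host.
    print(candidate_hosts(CANDIDATES, ["clyde.dl.ac.uk:2119", "grid.example.org:2119"]))
```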
Browse Host / Data Transfer
• File and recursive directory transfers between hosts
• File and directory operations
• Actions for updating application
Technical
• JSF v1.1 (JavaServer Faces) GUI.
• JSR-168 compliant. Vanilla JSF (core spec) is JSR-168 compliant, so the application can be hosted as a Web application or as a portlet within institutional portals (JSF extensions can be problematic).
• Spring v2.0 for managing objects in an n-tier server application (highly recommended, adds J2EE features to non-J2EE apps, e.g. Tomcat/Jetty apps).
  • Declarative transaction demarcation (akin to EJB 3 session beans).
  • Data source management (e.g. JPA PersistenceContext, Hibernate Session).
  • Propagation of the data source across DAOs / session façades during long-running transactions.
• c3p0 pooled database connections.
• JPA (Java Persistence API) for ORM (object-relational mapping). Hibernate 3.2 for the domain model (could use Kodo, TopLink, Apache OpenJPA).
• CoG Kit for the Globus API, from Globus.
• Object / XML data binding framework: XMLBeans / JAXB.
CURRENT
• Staging from more Data Grid + Web protocols (SRB). Browsing / file operations with different data storage resources. Staging across different protocols adds complexity (buffering required).

TODO
• Parametric jobs (parametric JSDL extension schema: defines parametric variables, functions and ranges for modifying the JSDL document on each iteration).
• Middleware extensions, e.g. gLite resource broker, JSDL conversion to JDL (aim to use SAGA).
• Integrate the OMII WHIP artefact sharing framework (gather and bundle remote resources / artefacts together into a self-contained application bundle, e.g. executable for a particular OS, source, input files, data files).
• Support Roles / VOs (for artefact sharing, not just public / personal).
• Shibboleth enable.
• Describe more apps using the NGS Uniform Execution Environment (UEE): a standard way to describe the same application across different (NGS) resources, giving a consistent JSDL description with multiple candidate hosts for the same app.
• Improvements / refinements (AJAXify).
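The buffering mentioned under CURRENT can be pictured as a chunked copy between two streams, so that a file never has to be held in memory in full while it moves from one protocol to another. In the sketch below the in-memory `io.BytesIO` objects stand in for protocol-specific readers and writers, which are not shown. The function name and chunk size are assumptions.

```python
import io


def buffered_transfer(source, sink, chunk_size=64 * 1024):
    """Copy from a readable stream to a writable one in fixed-size chunks.

    Returns the number of bytes copied. Both arguments only need read() and
    write() methods, so any protocol client exposing file-like objects could
    be plugged in (an assumption of this sketch).
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"x" * 200_000)  # stand-in for a remote source
    dst = io.BytesIO()                # stand-in for a remote target
    copied = buffered_transfer(src, dst, chunk_size=4096)
    print(copied, "bytes copied")
```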
Please come and find me at the NGS Stand Demo on the OMII booth (2.00pm Wed) https://portal.ngs.ac.uk
Summary
• Please contact NGS to request more hosted applications.
• JSDL Repository: https://portal.ngs.ac.uk
  • Search/browse for JSDL (personal and shared) by category of interest (e.g. bioinformatics, chemistry, tutorials/examples). Select, load, save application (run either ‘out-of-the-box’ or modify/tweak as required).
  • JSDL documents can be pre-configured and published by domain experts / resource admins (users benefit from sharing the expertise and artefacts captured in JSDL).
  • Community formation around a “best practice” approach (JSDL is an OGF recommendation).
• JSDL GUI Editor: for authoring, validating, sharing and uploading application descriptions.
• Grid Operations: File Staging, Application Submission, Monitoring.
• Generic and not tied to any particular set of Grid technologies. Can be extended to support more middleware and staging protocols.