510 likes | 618 Views
WS – PGRADE Tutorial MTA SZTAKI Laboratory of Parallel and Distributed Systems (LPDS). M. Kozlovszky Research fellow m.kozlovszky@sztaki.hu. www.lpds.sztaki.hu. Contents. History, family of products P-GRADE Portal, WS-PGRADE, gUSE WS-PGRADE in a nutshell
E N D
WS– PGRADE TutorialMTA SZTAKILaboratory of Parallel and Distributed Systems (LPDS) M. Kozlovszky Research fellowm.kozlovszky@sztaki.hu www.lpds.sztaki.hu
Contents • History, family of products • P-GRADE Portal, WS-PGRADE, gUSE • WS-PGRADE in a nutshell • Architecture, high level structures, Workflow handling • WS-PGRADE features • Parameter Study (PS) support in WS-PGRADE • Generator-job-collector type PS • Xconnect-DOTconnect type PS • Repository • Internal Storage (home dir) • Automated workflow submission & Timing • Web services • Embedded Workflows • DB support • Conditional job submission • Howto use WS-PGRADE /Basic + Advanced tasks/ • Case studies • CancerGrid portal, TINKER application
Family of P-GRADE Portal products • P-GRADE portal • Creating (basic) workflows and parameter sweeps for clusters, service grids, desktop grids • www.portal.p-grade.hu • P-GRADE/GEMLCA portal (University of Westminster) • To wrap legacy applications into Grid Services • To add legacy code services to P-GRADE Portal workflows • http://www.cpc.wmin.ac.uk/cpcsite/gemlca • WS-PGRADE • Creating complex workflow and parameter sweeps for clusters, service grids, desktop grids, databases • Creating complex applications using embedded workflows, legacy codes and community components from workflow repository • www.wspgrade.hu
Motivations of creating gUSE • To overcome (most of)the limitations of P-GRADE portal: • To provide better modularity to replace any service • To improve scalability to millions of jobs • To enable advanced dataflow patterns • To interface with wider range of resources • To separate Application Developer view from Application User view WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment) and gUSE (Grid User Support Environment) architecture
References and WS-PGRADE/gUSE installations • WS-PGRADE Portal service is available for • GILDA - Training VO of EGEE and other projects • SEE-GRID - South-Eastern European Grid • VOCE - Virtual Organization Central Europe of EGEE • HunGrid - Hungarian Grid VO of EGEE • NGS - UK National Grid Service • Desktop Grids: SZTAKI DG • Users and projects using WS-PGRADE/gUSE • EDGeS project (Enabling Desktop Grids for e-Science) • Integrating EGEE with BOINC and XtremWeb technologies • User interfaces and tools • ProSim project • In silico simulation of intermolecular recognition • JISC ENGAGE program • University of Westminster Desktop Grid • Using AutoDock on institutional PCs • CancerGrid project • Predicting various properties of molecules to find anti-cancer leads • Creating science gateway for chemists
WS-PGRADE in a nutshell • General purpose, workflow-oriented portal. Supports the development and execution of workflow-based applications • Based on GridSphere • Services supported by the portal: • New functionalities • Web services • DB connectors • Embedded workflows • Job level PS • Conditional jobs • Recursive graph • Multi-generator • Multi-collector • CROSS product PS • DOT product PS WS-PGRADE + gUSE = P-GRADE = Solves service+desktop Grids/clusters interoperability problems at workflow level
WS-PGRADE architecture Graphical User Interface: WS-PGRADE Gridsphere portlets gUSE Filestorage Workflowstorage Filestorage gUSEinformationsystem Autonomous Services: high level middleware service layer WorkflowEngine Applicationrepository Submitters Submitters Submitters Submitters Meta-broker Logging Resources: middleware service layer Local resources, service grid VOs, Desktop Grid resources, Web services, Databases
Concrete Workflow Graph Template Workflow Instance Repository Item Algorithms, executableResource references,Inputs Jobs,Edges,Ports Constraints,Comments,Form Generators Running state,Outputs Application ORProject OR,Workflow part(G,T,CW) Important high-level graph structures in WS-PGRADE Legend:aba must reference baba may reference b
1. Create and edit the Graph structure Concrete Workflow Graph2 Workflow Instance Repository Item Graph Template Constraints,Comments,Form Generators Jobs,Edges,Ports Running state,Outputs Application ORProject OR,Workflow part(G,T,CWI) Algorithms,Resource references,Inputs Jobs,Edges,Ports Observe,Download,Suspend,Delete Edit, CopyDelete Edit New 5. Add name to the template and configure it (job, port, resources, etc.) Workflow handling by Developer New 4. Restrict some parameters/features of the workflow and create a template 6. Export workflow, template, graph, to the repository (reuse it later by you or by end users) New New Configure,Copy, Delete Export Import New Submit 2. Add name to the graph and configure it (job, port, resources, etc.) 7. Reuse an Application, Template, graph or Workflow from repository 3. Submit the Concrete Workflow, observe its status and fetch its result 8. Redefine the originally used graph within the workflow
Concrete Workflow Template Workflow Instance Repository Item Algorithms,Resource references,Inputs Constraints,Comments,Form Generators Running state,Outputs Application ORProject OR,Workflow part(G,T,CW,WI) Observe,Download,Suspend,Delete Workflow handling by end user 2. Set the available parameters on the end-user View Configure,Delete Import 3. Submit the Concrete Workflow, observe its status and fetch its result Submit 1. Import an Application/Template
Contents • History, family of products • P-GRADE Portal, WS-PGRADE, gUSE • WS-PGRADE in a nutshell • Architecture, high level structures, Workflow handling • WS-PGRADE features • Parameter Study (PS) support in WS-PGRADE • Generator-job-collector type PS • Xconnect-DOTconnect type PS • Repository • Internal Storage (home dir) • Workflow activation & Timing • Web services • Embedded Workflows • DB support • Conditional job submission • Howto use WS-PGRADE /Basic + Advanced tasks/ • Case studies • CancerGrid portal, TINKER application
Parameter Study (PS) support in WS-PGRADE • Generator-job-collector type PS • Not totally similar concept as in P-GRADE • P-GRADE: workflow level PS • WS-PGRADE: job level PS • Xconnect-DOTconnect type PS • X (cross) product • . (dot) product
Repository usage & services • Share with others the developed • Graphs • Workflows - Configured graphs • Templates - Restricted workflows • Applications - Semantically tested, trusted Workflows containing the definitions of the eventually referenced Embedded Workflows (including their transitive closures) • Projects - Applications which are not ready/fully functional yet • Build up a science/application gateway for end-users • Restrict certain workflow parameters • Provide easy workflow configuration/parameter setup interface • Repository is acting like an interface between developers and end users Note: Exported/imported “Projects” are not checked for completeness and correctness “Applications” are checked. Name collisions are checked and handled by the system.
Repository Export Step 1:The button Export of the requested WF is selected Step 2:Decide about the way of exporting (in case of Application and Project the Embedded Workflows will be exported with) Step 3:Enter free input text which appearing in the Import List will describe and identify the object for the user. Step 4:Execute the Export Way of working is similar in case of Graph and Template just “Export type” can not be select there
Repository import views • Materials are classified by their type • Latest available lists • Application • Project • Concrete • Template • Graph • To import
Internal Storage (user workspace) Information about the quota of the user allotted storage capacity in the Portal server Columns of individual instances, please note, that outputs can be downloaded separately Inner slider to encounter and access each Instances Columns of bulk download: All or proper parts of all instances of a given WF can be downloaded
Upload Implementation to the internal Storage Step 4: Confirm the operation Step 1: Select the compressed file in the client machine containing the requested Workflow Step 3 (If Step 2 performed): Enter the new name(s) which will not collide Step 2(option): Check the kind of name(s) you want to redefine
Workflow submission A workflow can be submitted by 3 different way: • Interactively started by the user hitting the button Submit belonging to the given concrete workflow on the portlet Workflow/Concrete (default) • Started by a –crontab like - predefined time schedule.The corresponding timetable can be set in the portlet Workflow/Timing • Started by an external event. The corresponding event to be waited for and the name of the Workflow can be defined in the portlet Workflow/Remoting
Workflow Submission(1)Interactively by the user Step 1: The workflow is selected by button “Submit” Step 2: The submission can be confirmed or refused after the optional filling of a free description field identifying Workflow Instance for the user
Workflow Submission(2)Start automatically by an internal timer • Start a workflow at a predefined time • Valid certificate should be available (if grid is used) • Crontab like working • Single run Checklist to select an existing Workflow to be added Confirm button to append a new element to the list Tool to define submission time Definition of a new element to be added List of scheduled Workflows Item can be revoked deleting it from the list The items will leave the list on schedule time expiration even if the Workflow submission has failed by any cause
Workflow Submission(3)Start by external event Two parts must be defined: • On the Portal Server Side- the name of the Workflow and - the identifier “Remote Key” • On the Client Side inside of a WSDL description- the URN of service call- the name of the service- the owner of the workflow (user)- the identifier “Remote Key”- a free string to be associated to Workflow Instance Note, that a Service call referencing a common Remote Key used in more than one Portal Server Side description submits all the associated Workflows
Workflow Submission(3)Start by external event – Portal Side Check list can select each of the available workflows Name of WF to be appended to the list of remotely callable WF- s Definition of Keyword to identify the call Append button List of listening Workflows Any Workflow can be deleted from the listeners.The delete button can accessed only in the not hidden state Button to hide the stored Keyword and the button Delete Button to see a stored Keyword
Workflow Submission(3) Start by external event – Client side: WSDL URL of the Portal SERVER Predefined name of the Service The client should provide a Service call conforming the WSDL.
Concrete WF List Selected WF Selected Job Job Executable JDL/RSL Job is Workflow Job is Service Job is Binary Job Config. History Job Inputs & Outputs Web service support Principle: • Job is a web service user can reach an existing remote service with the following attributes: • Type:Base standard type of the Web Service. The administrator of the portal server sets the list of standards the portal can understand. The default value is Axis. • Services:Selection list of services of the given type having been explored by the Portal Server • Methods:Selection list functions the selected service implements
Web service support (contd.) Step 1. Select Service as Job interpretation class Step 2. Select type of Service to be understood Step 3. Select one from the found services Step 4. Select a Method among the interface routines of Service
Embedded workflow support Principle: Job is an embedded workflow Embedded Workflow To ensure the compatibility of interfaces the embedded workflow must be defined by a Template The dummy job whose execution will be substituted by the call of the embedded one Original Workflow
Embedded workflow support Focus on the caller Job Step 1. Select (embedded) Workflow as Job interpretation class Step 2. Embedded (called) Workflow is selected by the check list showing the possible templated Workflows Focus has been set by selecting a job
External File name Upload Person Name Gender Age A M 15 K F 26 PASSWORD USER SQL URL (UDBC) <DB_URL> L M 24 <any_user> Departments <any>.mdb Age from Person where Gender =“M” SELECT 1 0 15 24 Remote SQL Value DB support (Datasource: SQL Database) <Db_URL> Step 2 (online) Port configuration:set Query Step 3(online) Port configuration:File generation from result set Step 1 (offline) Create Database on a remote site
B1 Do not execute if B1 AND B2 = FALSE B2 Conditional job submission Rule: To any input port a boolean conditional expression may be attached.They will be evaluated upon the value(s) of the referenced port(s). The false value inhibits the execution of the actual and subsequent jobs.The state of the failed job is indicated as “Term_false” the eventual subsequent jobs will remain in “Init” state. Expression Syntax: <File value> <Operator> <Constant string> | <File value> <Operator> <reference of a foreign port>where <Operator>: == | != | contain The comparison of the file value (regarded as a string) associated to the current port and of other operand which may be a string or the string file content of a different port results true or false depending on the string operation, where == means equal, != means not equal, and contain means, that the left operand contains the right operand. AND operation is assumed for more than one ports associated with conditional expressions:
Howto use WS-PGRADE Basic + Advanced tasks • Create Graph, or reuse an existing one • Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs) • Submit and check the running Workflow • Create template and share it as application through the repository
There are two ways to create a Workflow Graph (WfG): Opening the Graph Editor (with the proper button) in the portlet Workflow/Graph Clone an existing one (Saving the actual Graph with a new name) Graph Editor - Graph Creation
Graph Editor detailed - Build up a WF Graph Port Property New Port Right Click New Job Select Output Port name can be changed Comment can be inserted Close by OK ,and drag to Input Port Press mouse… New Job New Port
Howto use WS-PGRADE Basic + Advanced tasks • Create Graph, or reuse an existing one • Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs) • Submit and check the running Workflow • Create template and share it as application through the repository
Development cycle – Creating the Concrete Workflow • Create Concrete Workflow from Graph, template, different workflow • give a name to the Concrete Workflow • give notes
Howto use WS-PGRADE Basic + Advanced tasks • Create Graph, or reuse an existing one • Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs) • Submit and check the running Workflow • Create template and share it as application through the repository
Setup Concrete Workflow • Setup all the job properties • Execution model: • Workflow, Service, Binary • Type:GEMLCA,GLITE,DG,GT-4,GT-2,Local • Grid resource • Type of binary • SEQ, MPI, Java • Number of MPI Nodes • Executable • Additional parameters • Setup all the port properties
Concrete WF List Selected WF Selected Job Job Executable JDL/RSL Job is Workflow Job is Service Job is Binary Job Inputs & Outputs Job Config. History Workflow configuration Hierarchy
Note the inner slider:By moving it you can encounter –and make visible – any Port of the current job Workflow Configuration; step-by-step Select Configure Select a job by mouse click Fill the job property characteristics. Details have discussed previously. Select Port Property Configuration Fill port property characteristics. Details have discussed previously. Select JDL/RSL Configuration Return to main view Insert a definition Select one of the JDL/RSL Configuration Parameters of the list box Save & Upload the Workflow configuration. Remind eventual error messages! Close the configuration of this job Confirm the settings
Howto use WS-PGRADE Basic + Advanced tasks • Create Graph, or reuse an existing one • Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs) • Submit and check the running Workflow • Create template and share it as application through the repository
Info, Submit, Delete/Abort All Job Job Instance List Job Instance Workflow instance List Output File Job List Log File Job List WF visualization STD Out File STD Error File Concrete WF List JDL/RSL file Job Exec Conf Workflow Selected Job History File Basic Workflow management with WS-PGRADE Portal Locate Item Details Configure Locate Item in graph Locate Item Job Executable Job Config. History Suspend/Resume/Delete, In case of PS Workflows the list “Job Instance” may contain more than one elements with different “PID” -s JDL/RSL Job Inputs & Outputs Workflow Instance Port Configurations Short Details Visualize Locate Item View Contents Locate Item Log Output Std Err Std Out
Howto use WS-PGRADE Basic + Advanced tasks • Create Graph, or reuse an existing one • Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs) • Submit and check the running Workflow • Create template and share it as application through the repository
Contents • History, family of products • P-GRADE Portal, WS-PGRADE, gUSE • WS-PGRADE in a nutshell • Architecture, high level structures • Workflow handling • By developers • By end users • WS-PGRADE features • Parameter Study (PS) support in WS-PGRADE • Generator-job-collector type PS • Xconnect-DOTconnect type PS • Workflow timing • web service support • Embedded workflows • DB support • Basic Workflow management with WS-PGRADE Portal • Repository usage • Principles of Workflow management • Case studies • ProSim portal, CancerGrid portal, TINKER application
The CancerGrid project (Case Study 1) • EU Framework Program 6 • Title: Grid Aided Computer System For Rapid Anti-Cancer Drug Design • Project period • January 1, 2007 – December 31, 2009 (+ extended period) • Goals: • Developing focused libraries with a high content of anti-cancer leads, building models for predicting various molecule properties • Developing a computer system based on grid technology, which helps to accelerate and automate the in silico design of libraries for drug discovery processes
The CancerGrid infrastructure Portal Local jobs DG jobs 3GBridge LocalResource Job 1 Job 2 Job N browsing molecules executingworkflows BOINCserver PortalStorage WU 1 WU 2 WU N BOINC client GenWrapper forbatch execution WU X LegacyApplication WU Y moleculedatabase LegacyApplication Portal and DesktopGridserver DG clients from all partners Molecule database server
The CancerGrid portal (gUSE & SZTAKI DG) Workflow execution Workflow development & configuration Integrated components of CancerGrid portal Algorithms configuration Molecule database browser Structure viewer
TINKER Conformer Generator (Case study 2.)P-GRADE/gUSE Workflow- background • The application uses the open-source TINKER library, which contains tool for molecular design: • http://dasher.wustl.edu/tinker/ • The TINKER molecular modeling software (written in Fortran language) is a complete and general package for molecular mechanics and dynamics, with some special features for biopolymers.
Problem: The application to be gridified • A combination of these tools can be used formolecular modeling (so-called QSAR studies for drug development). The original application was developed by a Hungarian biochemist Ferenc Ötvös, PhD, from the Biological Research Center, Szeged, Hungary. • Target end users are biologists or chemists, who need to examine conformers with the TINKER package. • The application generates conformers by unconstrained molecular dynamics at high temperature to overcome conformational bias then finishes each conformer by simulated annealing and/or energy minimization to obtain reliable structures. • The aim is to obtain conformation ensemble(s) to be evaluated by multivariate statistical modeling.
n = 50.000 Generate 1000 K trajectory snapshots START Read input T1 Ti Tn Peptide definition: Sequence, Chirality Parameter File 1 minimize TM1 Parameter File 2 300 K dynamics minimize TD1 TDM1 Parameter File 2 Parameter File 3 Simulated annealing minimize TSA1 TSAM1 Parameter File 2 Parameter File 4 Problem: The application to be gridified (contd.) • The sequential application generates 50.000 conformers and executes TINKER algorithms on them. It runs for about 7 days on a 2 GHz single CPU 1 GB memory machine.
Gridification process… The gridified P-GRADE workflow • In the original sequention application, the TINKER algorithms are called by Bash unix scripts, which are also applicable in EGEE grids. • As it can be seen from the previous figure: • The sequential generation of 50.000 conformers cannot be parallelized. • The rest of the application performs 5 algorithms on each conformers in 3 threads, which can run in parallel. The gridified gUSE workflow
Data handling • The input TINKER package is 4.5 MBs. • The generator job (runs for around 15 hours) creates 50 tarballs containing 1000 conformers each, and the TINKER package. Each one of these files are 6.1 MBs. • Offline runtime and output of the PS jobs: • First (dyn300k+min300k): ~40 min, 2.2 MBs • Second (min1000k): ~35 min, 1.1 MBs • Third (Simann+min): ~60 min, 2.2MBs • The final output file is around 300 MBs. • The generator and collector input and output files are stored in storage elements. • Results: • Tested in 3 VOs (VOCE, SEE-GRID-SCI, Biomed) • A significant speedup with 7 times faster execution achieved in average.