Integrating OGSA-DAI to computational Grid workflows
Tamas Kukla, Tamas Kiss, Gabor Terstyanszky (University of Westminster, UK)
Peter Kacsuk (MTA SZTAKI, Hungary)
Motivation I. What advantages does the integration of databases to workflow solutions provide?
• Several successful Grid workflow systems (e.g. Taverna, Triana, Kepler, P-GRADE)
  • composition, orchestration and execution of computationally intensive processes
• Limited data handling capabilities of Grid workflow solutions
  • Restricted mainly to file based data
  • No or very limited database access
Motivation II. Workflow level interoperation of grid data resources
[Figure: a workflow engine running jobs J1 to J5 across Grid 1 and Grid 2, with file storage systems FS1 and FS2 and databases DB1 and DB2]
Legend: J: Job; FS: File storage system, e.g. SRB or SRM; DB: Database management system
Why OGSA-DAI?
• The Open Grid Services Architecture Data Access and Integration (OGSA-DAI) project is concerned with constructing middleware to assist with access and integration of data from separate data sources via the grid.
• An engineered extensible framework for data access and integration.
• Expose heterogeneous data resources to a grid through web services.
• Interaction with data resources:
  • Queries and updates.
  • Data transformation / compression
  • Data delivery.
• Customise for your project using
  • Additional Activities
  • Client Toolkit APIs
  • Data Resource handlers
• A base for higher-level services
  • federation, mining, visualisation,…
Source: GGF 16, Feb 2006 by Neil Chue Hong
OGSA-DAI integration aspects: Data staging
• Static: databases can be accessed before and after workflow execution, but they cannot be accessed at runtime
• Semi-dynamic: data is accessed during workflow execution, but the parameters of the OGSA-DAI request are already specified before execution and cannot be generated at runtime
• Dynamic: the databases are accessed at runtime and the parameters of the request are also generated during workflow execution
[Figure: timeline for each staging mode marking where requests are specified (s) and executed (e)]
Legend: s: request specification; e: request execution; shown separately for data gathering requests and data uploading requests
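The three staging modes differ only in when a request is specified and when it is executed. A minimal Python sketch of that distinction follows; the names `StagingMode` and `classify_staging` are illustrative assumptions and are not part of OGSA-DAI or P-GRADE.

```python
from enum import Enum


class StagingMode(Enum):
    STATIC = "static"
    SEMI_DYNAMIC = "semi-dynamic"
    DYNAMIC = "dynamic"


def classify_staging(specified_at_runtime: bool, executed_at_runtime: bool) -> StagingMode:
    """Map the two timing questions from the slide onto a staging mode.

    Hypothetical helper: it only encodes the definitions given above.
    """
    if not executed_at_runtime:
        if specified_at_runtime:
            # A request generated during the run cannot be executed
            # before or after the run, so this combination is rejected.
            raise ValueError("request specified at runtime must also execute at runtime")
        return StagingMode.STATIC
    if specified_at_runtime:
        return StagingMode.DYNAMIC
    return StagingMode.SEMI_DYNAMIC
```

For example, a request written before submission but executed while the workflow runs would be classified as `StagingMode.SEMI_DYNAMIC`.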
OGSA-DAI integration aspects: Subject of OGSA-DAI integration
[Figure: a WF management system containing an auxiliary tool, a WF editor (workflow composition) and a WF engine (workflow execution)]
• Auxiliary tool: the workflow management system is extended with an auxiliary tool (typically a portlet)
• Workflow editor: the workflow editor is enabled to communicate with databases exposed via OGSA-DAI services, which allows data gathering during workflow authoring
• Workflow engine: the workflow engine is enhanced to be able to execute the OGSA-DAI requests
OGSA-DAI integration aspects: Request representation
• Port level representation: the OGSA-DAI request is represented as either an input or an output port of a node
  [Figure: an OGSA-DAI port performing data access against an OGSA-DAI service]
• Node level representation: the request is represented as a workflow node that submits it to the OGSA-DAI service and receives the results
  [Figure: an OGSA-DAI node performing data access against an OGSA-DAI service]
OGSA-DAI integration aspects: Supported OGSA-DAI functionalities
• Specific support: only a subset of OGSA-DAI functionalities is supported; this gives a higher level of usability, but restricted functionality
• Generic support: full support for every OGSA-DAI functionality; this could be more complex to use in specific use-cases
OGSA-DAI integration aspects: Client integration level
• Coupled: the OGSA-DAI client becomes part of the workflow system
• Decoupled: the connection is provided via an interface through which the client can be invoked on behalf of the system
The targeted OGSA-DAI integration
[Figure: tree of the OGSA-DAI integration aspects and their options]
• Data staging: Static, Semi-dynamic, Dynamic
• Subject of integration: WF Editor, Auxiliary Tool, WF Engine
• Request representation: Port Level, Node Level
• Functionality support: Specific, General
• Client integration level: Coupled, Decoupled
Implementation environment: P-GRADE Portal
• Open source, general purpose, workflow-oriented computational Grid portal. Supports the development and execution of workflow-based Grid applications: a tool for Grid orchestration
• Based on GridSphere-2
  • Easy to expand with new portlets (e.g. application-specific portlets)
  • Easy to tailor to end-user needs
• Developed by the P-GRADE Portal Alliance (led by SZTAKI)
• Grid services supported by the portal:
What is a P-GRADE Portal workflow?
• A directed acyclic graph where:
  • Nodes represent jobs, either sequential or parallel programs
  • Ports represent the input/output files the jobs expect/produce
  • Arcs represent file transfer between the jobs
• Integration at the required integration level: allow the submission of a general/specific OGSA-DAI command line client application to the Grid as a P-GRADE workflow node
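The graph model above can be sketched in a few lines of Python. This is only an illustration of the idea (jobs as nodes, files as ports, file transfers as arcs), not the P-GRADE workflow format; the job and file names are invented, and `execution_order` is a hypothetical helper built on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Each arc connects an output port (job, file) to an input port (job, file).
# "Q" stands for an OGSA-DAI client node whose result file feeds two jobs.
arcs = [
    (("Q", "results.csv"), ("J1", "input.csv")),
    (("Q", "results.csv"), ("J2", "input.csv")),
    (("J1", "part1.out"), ("J3", "left.in")),
    (("J2", "part2.out"), ("J3", "right.in")),
]


def execution_order(arcs):
    """Return one valid job order; raises CycleError if the graph is not acyclic."""
    predecessors = {}
    for (src_job, _src_file), (dst_job, _dst_file) in arcs:
        predecessors.setdefault(src_job, set())
        predecessors.setdefault(dst_job, set()).add(src_job)
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(arcs)
```

Any order returned here places `Q` before `J1` and `J2`, and both of those before `J3`, which is the constraint a workflow engine has to respect when a job waits for its input files.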
How to submit the OGSA-DAI client to the Grid?
• Direct submission is not feasible
  • Software dependencies
  • Complexity for the user
• Requires an application repository integrated to the workflow engine
• GEMLCA:
  • An application repository extended with a job submitter
  • Open source; a Globus incubator project
  • Deploying code in the GEMLCA repository simply means creating an XML-based description file (supported even from a portlet interface)
  • Users can select previously deployed applications from the repository and run them with custom parameter values
  • GEMLCA is fully integrated to the P-GRADE workflow engine
OGSA-DAI integration through GEMLCA
[Figure: a workflow with an OGSA-DAI node, the GEMLCA repository holding the OGSA-DAI client, computational resources, the OGSA-DAI service and the database; labelled steps: set custom parameter values, submit OGSA-DAI client]
The solution is generic, as any workflow engine can be made capable of communicating with the GEMLCA service (a GT4-based Grid service)
OGSA-DAI integration through GEMLCA
• OGSA-DAI client applications supporting both OGSA-DAI 3.0 Axis (WSI) and GT (WSRF) are deployed in the GEMLCA repository
  • Query client: to submit query statements to a given database exposed by an OGSA-DAI service
  • Update client: to submit update statements to a given database exposed by an OGSA-DAI service
  • Request document client: to execute general OGSA-DAI workflows represented as request documents (database query and update execution, data transfer, data transformation)
Using the query client
[Figure: portal screenshot with the following labelled settings and outputs]
• Selecting Grid
• Setting OGSA-DAI service URL
• Selecting deployed OGSA-DAI client
• Setting Database Resource ID
• Selecting computational site
• Setting query file
• Log file
• Results in CSV file
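The settings listed above and the CSV output can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the `QueryClientSettings` class, the example URL, resource ID and file name are placeholders, and the CSV sample assumes a header row, which the slide does not confirm. It does not call the real OGSA-DAI client.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryClientSettings:
    """Hypothetical container for the values set on the portal screen."""
    service_url: str   # OGSA-DAI service URL
    resource_id: str   # Database Resource ID
    query_file: str    # file holding the query statement


def read_results(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV results into one dict per row (assumes a header row)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


settings = QueryClientSettings(
    service_url="https://example.org/ogsadai/services",  # placeholder URL
    resource_id="ExampleResource",                       # placeholder ID
    query_file="query.sql",                              # placeholder name
)

# Invented sample standing in for the "Results in CSV file" output.
sample_results = "id,name\n1,alpha\n2,beta\n"
rows = read_results(sample_results)
```

A downstream job in the workflow could consume such rows directly, since the results file is passed on like any other output port file.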
An application example: developing a performance rating framework for UK hospitals (Health Care Modelling and Informatics Research Group, UoW)
[Figure: workflow whose nodes are labelled as follows]
• Executes the given OGSA-DAI query
• Generates sampler queries
• Analysis on the sample data
• Gathering results
So this is what we have achieved: Data Transfer Level Interoperation in P-GRADE
[Figure: the portal server with user level storage, and a Grid infrastructure with GridFTP servers, SRB servers, EGEE storage elements, OGSA-DAI services and computing resources; local and remote input files form the input to workflows, local and remote output files form the output from workflows; further labels: data manipulation, control of remote input/output]
Workflow level interoperation of local, GridFTP, SRM and SRB file catalogues and databases exposed by OGSA-DAI
How can the UK e-Science community utilise the solution?
• Deployed at production level in the NGS P-GRADE portal
  • Portal URL: https://grid2-portal.cpc.wmin.ac.uk:8080
  • Information page: http://ngs-portal.cpc.wmin.ac.uk
• Please visit our next demonstration session at the NGS booth (Booth 13, Appleton Tower)
  • Wednesday 10-12
Any questions?
Thank you for your attention …
Email: kisst@wmin.ac.uk
Website: www.cpc.wmin.ac.uk