Design and Execution of Scientific Workflows using Web Services
Ilkay Altintas, Ashraf Memon, Bertram Ludaescher
San Diego Supercomputer Center, University of California San Diego
Outline
• Introduction & Overview – Bertram
• The Kepler System – Ilkay
• Demonstration: Geologic Map Integration – Ashraf
• From Web services to Grid services (and back ;-) – Karan
SDSIC 01/29/2004
The Scientific Workflow (SWF) “Business”
• In silico science:
  • from the wet lab to the information managers, analysts, data miners, …
  • commercially a really big business
• Scientific Workflows – Goals:
  • simplify and automate data management & analysis for scientists
  • support “knowledge discovery workflows”
• Scientific Workflows – Aspects:
  • Capture (reverse-engineer) existing SWFs
    • legacy SWFs: hard-wired, hard to reuse, maintain, change, …
  • Design new SWFs:
    • reuse components and SWFs
    • needs: intuitive modeling paradigm, clear component interaction semantics, …
  • Debug SWFs (test, simulate, validate, verify, …)
  • Operate SWFs (deploy, execute, monitor, steer, archive, re-run, …)
Scientific Workflows: Tools
• Scientific Workflows – Aspects:
  • Data Integration
  • Process Integration
  • Application/tools Integration
• Different tools and angles:
  • PSEs (Problem Solving Environments): SCIRun, … (quite a few)
  • LIMS (Laboratory Information Management Systems): … (many)
  • Workflow systems: … (very many)
  • Signal processing and dataflow systems (AVS, Khoros, Ptolemy, …)
  • Scientific workflow systems (DiscoveryNet/InforSense, PipelinePilot/SciTegic, … Triana, Taverna, …, Kepler, …)
    • often dataflow oriented (but some workflow aspects too)
Web Services and Scientific Workflows in Kepler
• Web services = individual components (“actors”)
• “Minute-Made” Application Integration:
  • Plugging in and harvesting web service components is easy and fast
• Rich SWF modeling semantics (“directors” and more):
  • Different and precise dataflow models of computation
  • Clear and composable component interaction semantics
• Web service composition and application integration tool
• Coming soon:
  • Shrink-wrapped, pre-packaged “Kepler-to-Go” (v0.8)
  • SWFs with structural and semantic data types (better design support)
  • Grid-enabled web services (for big data, big computations, …)
  • Different deployment models (SWF WS, web site, applet, …)
Genomics: Promoter Identification Workflow
Source: Matt Coleman (LLNL)
Ecology: GARP Analysis Pipeline for Invasive Species Prediction
[Figure: pipeline diagram. Steps: EcoGrid Query, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To Ecogrid. Data items, each for native range and invasion area where applicable: (a) species presence & absence points, (b) environmental layers, (c) integrated layers, (d) training sample and test sample, (e) GARP rule set, (f) prediction maps, (g) model quality parameter, (h) selected prediction maps. Sources: registered Ecogrid databases.]
Source: NSF SEEK (Deana Pennington et al., UNM)
Source: NIH BIRN (Jeffrey Grethe, UCSD)
Kepler Team, Projects, Sponsors
• Ilkay Altintas (SDM)
• Chad Berkley (SEEK)
• Shawn Bowers (SEEK)
• Jeffrey Grethe (BIRN)
• Christopher H. Brooks (Ptolemy II)
• Zhengang Cheng (SDM)
• Efrat Jaeger (GEON)
• Matt Jones (SEEK)
• Edward A. Lee (Ptolemy II)
• Kai Lin (GEON)
• Ashraf Memon (GEON)
• Bertram Ludaescher (BIRN, GEON, SDM, SEEK)
• Steve Mock (NMI)
• Steve Neuendorffer (Ptolemy II)
• Mladen Vouk (SDM)
• Yang Zhao (Ptolemy II)
• …
The KEPLER System for Scientific Workflows …
• A framework for design, execution and deployment of scientific workflows
• Caters specifically to the domain scientist
• Builds on Ptolemy II (next slide... :-)
• Collaboration of various projects: http://kepler.ecoinformatics.org
… based on Ptolemy II
• A set of Java packages for heterogeneous, concurrent modeling, design and execution.
• Strengths include:
  • Precisely defined models of computation and component interaction
    • e.g. Process Networks (PN) – data-flow oriented
  • An intuitive GUI that allows rapid workflow composition
  • A modular, reusable and extensible object-oriented environment
  • An XML-based workflow definition – MoML
    • Workflows defined in the Ptolemy II MoML XML schema
    • Easily exchangeable
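As a rough illustration of an XML workflow definition in the MoML style, the Python sketch below embeds a small, simplified MoML-like document and lists its director, actors and connections. The model (a constant source wired to a display) and the helper `summarize` are made up for illustration; the DOCTYPE header that real MoML files carry is omitted, and the class names should be checked against the Ptolemy II release in use.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative MoML-style document (an assumption, not taken from the talk).
MOML = """
<entity name="hello" class="ptolemy.actor.TypedCompositeActor">
  <property name="director" class="ptolemy.domains.sdf.kernel.SDFDirector"/>
  <entity name="Const" class="ptolemy.actor.lib.Const"/>
  <entity name="Display" class="ptolemy.actor.lib.gui.Display"/>
  <relation name="r1" class="ptolemy.actor.TypedIORelation"/>
  <link port="Const.output" relation="r1"/>
  <link port="Display.input" relation="r1"/>
</entity>
"""

def summarize(moml_text):
    """Return (director class, actor names, ports grouped by relation)."""
    root = ET.fromstring(moml_text)
    director = root.find("property[@name='director']").get("class")
    actors = [e.get("name") for e in root.findall("entity")]
    # Group the ports by the relation that connects them.
    wires = {}
    for link in root.findall("link"):
        wires.setdefault(link.get("relation"), []).append(link.get("port"))
    return director, actors, wires

director, actors, wires = summarize(MOML)
print(director)
print(actors)
print(wires)
```

Because the whole workflow is plain XML, it can be inspected, versioned and exchanged with ordinary XML tooling, which is the point of the "easily exchangeable" bullet.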
KEPLER Core Capabilities (1/2)
• Capturing scientific workflows
  • Accessing available workflows through the Grid
• Designing scientific workflows
  • Composition of actors (tasks) to perform a scientific WF
  • Actor prototyping
• Accessing heterogeneous data
  • Data access wizard to search and retrieve Grid-based resources
  • Relational DB access and query
  • Ability to link to EML data sources
KEPLER Core Capabilities (2/2)
• Data transformation actors to link heterogeneous data
• Executing scientific workflows
  • Distributed and/or local computation
  • Various models for computational semantics and scheduling
    • SDF and PN: most common for scientific workflows
  • External computing environments:
    • C++, Python, C (… Perl--planned ...)
• Deploying scientific tasks and workflows as web services (… planned …)
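To make the PN (process network) model a little more concrete, here is a minimal Python sketch, not Kepler or Ptolemy II code: each actor runs in its own thread, channels are FIFO queues, and an actor blocks on read until a token arrives. The actor names (`source`, `scale`, `sink`) and the `DONE` sentinel are illustrative assumptions.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a token stream (illustrative convention)

def source(out_q, values):
    """Emit each value as a token, then signal the end of the stream."""
    for v in values:
        out_q.put(v)
    out_q.put(DONE)

def scale(in_q, out_q, factor):
    """Consume one token at a time and emit it multiplied by factor."""
    while True:
        token = in_q.get()  # blocking read, as in a process network
        if token is DONE:
            out_q.put(DONE)
            return
        out_q.put(token * factor)

def sink(in_q, results):
    """Collect tokens until the stream ends."""
    while True:
        token = in_q.get()
        if token is DONE:
            return
        results.append(token)

def run_network(values, factor):
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    results = []
    actors = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for t in actors:
        t.start()
    for t in actors:
        t.join()
    return results

print(run_network([1, 2, 3], 2))  # prints [2, 4, 6]
```

Under SDF (synchronous dataflow), by contrast, every actor consumes and produces a fixed number of tokens per firing, so a schedule can be computed before execution instead of relying on blocking reads at run time.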
The KEPLER GUI (Vergil)
• Drag and drop utilities, director and actor libraries.
Running the workflow
Distributed SWFs in KEPLER
• Web and Grid Service plug-ins
  • WSDL, GWSDL
  • ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
• WS Harvester
  • Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• WS-deployment interface (…ongoing work…)
• XSLT and XQuery transformers to link non-fitting services together
A Generic Web Service Actor
• Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.
• Configure: select the service operation
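The idea of an actor that configures itself from a WSDL and an operation name can be sketched as follows. This is an assumption-laden illustration in Python, not the Kepler actor: the WSDL fragment, the `getAge` operation and the function `configure_actor` are hypothetical, and only the WSDL 1.1 element names and namespace are standard. The sketch derives input and output "ports" from the message parts of the chosen operation and does not call any service.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical WSDL fragment; a real document would also carry bindings and a service element.
WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:geo"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             targetNamespace="urn:example:geo">
  <message name="GetAgeRequest">
    <part name="rockUnit" type="xsd:string"/>
  </message>
  <message name="GetAgeResponse">
    <part name="age" type="xsd:string"/>
  </message>
  <portType name="GeoPort">
    <operation name="getAge">
      <input message="tns:GetAgeRequest"/>
      <output message="tns:GetAgeResponse"/>
    </operation>
  </portType>
</definitions>
"""

def configure_actor(wsdl_text, operation_name):
    """Derive input and output ports for one operation from its WSDL messages."""
    root = ET.fromstring(wsdl_text)
    # Index each message by name -> list of (part name, part type).
    messages = {
        m.get("name"): [(p.get("name"), p.get("type"))
                        for p in m.findall("wsdl:part", WSDL_NS)]
        for m in root.findall("wsdl:message", WSDL_NS)
    }
    for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS):
        if op.get("name") == operation_name:
            def parts(direction):
                ref = op.find(f"wsdl:{direction}", WSDL_NS).get("message")
                return messages[ref.split(":")[-1]]  # drop the namespace prefix
            return {
                "operation": operation_name,
                "input_ports": parts("input"),
                "output_ports": parts("output"),
            }
    raise KeyError(f"operation {operation_name!r} not found")

actor = configure_actor(WSDL, "getAge")
print(actor)
```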
Set Parameters and Commit
WS Actor after Instantiation
Web Service Harvester
• Imports the web services in a repository into the actor library.
• Can search for web services by keyword.
Composing 3rd-Party WSs
[Figure labels: Output of previous web service; User interaction & Transformations; Input of next web service]
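The composition pattern on this slide (output of one service, a transformation step, input of the next) can be sketched in Python as below. Both services are hypothetical local stand-ins, the XML shapes and coordinate values are invented, and the adapter uses `xml.etree.ElementTree` rather than XSLT or XQuery, purely to keep the sketch self-contained.

```python
import xml.etree.ElementTree as ET

def geocode_service(place):
    # Hypothetical stand-in for the first web service (values are made up).
    return (f"<location><name>{place}</name>"
            "<lat>32.88</lat><lon>-117.24</lon></location>")

def map_service(request_xml):
    # Hypothetical stand-in for the second web service, which expects a <mapRequest>.
    point = ET.fromstring(request_xml).find("point")
    return f"map centered at x={point.get('x')}, y={point.get('y')}"

def transform(location_xml):
    # Adapter: reshape the first service's output into the second service's input.
    loc = ET.fromstring(location_xml)
    request = ET.Element("mapRequest")
    ET.SubElement(request, "point", x=loc.findtext("lon"), y=loc.findtext("lat"))
    return ET.tostring(request, encoding="unicode")

result = map_service(transform(geocode_service("site A")))
print(result)  # prints: map centered at x=-117.24, y=32.88
```

Keeping the adapter as its own step means neither third-party service has to change when their message formats do not fit together.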
More information…
• The WS and Grid standards have changed recently
  • Further changes are expected in the future, following changes in the standards.
• Focus for this talk: web service-based components of Kepler. For more info on other Kepler components:
  • http://kepler.ecoinformatics.org
  • http://kbis.sdsc.edu/SciDAC-SDM/
  • http://ptolemy.eecs.berkeley.edu/ptolemyII/
  • http://seek.ecoinformatics.org
What’s next?
• Ashraf Memon
  • GEON Geological Map Information Integration
    • Conceptual Workflow
    • WS-based Architecture and Design in Kepler
    • DEMO in Kepler
• Karan Bhatia
  • Grid standards and their relations to web services
    • OGSI, OGSA, GWSDL, etc.
  • Informal discussion on WSRF
Problem Description
• Geologic Map Information Integration (GMMI)
  • Integration of heterogeneous geological datasets
• Data sets
  • State geology map datasets (Rocky Mountain area)
  • State boundaries and coastlines
Heterogeneities
• System
  • Different operating systems and vendor databases are used to store and process the data.
• Representational
  • Different formats (shape files, BLOB, binary, spatial data objects, etc.).
• Structural
  • Different schema (table) structures.
Heterogeneities
• Syntactic
  • Different query languages (SQL, Spatial SQL, XQuery, etc.)
• Semantic
  • Different states use different concept maps for storing the data values.
  • Example: some states use the terms “Holocene” and “Pleistocene”, which are subdivisions of the “Quaternary” period in the geologic age hierarchy, while others, lacking the finer geologic detail, record only the parent period (“Quaternary”).
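The semantic mismatch in the example can be resolved by walking the age hierarchy, as in this minimal Python sketch. The records and state names are hypothetical, and the `PARENT` table covers only the few terms needed here; it is a stand-in for an ontology lookup, not the GEON implementation.

```python
# Child -> parent links for a small slice of the geologic age hierarchy.
PARENT = {
    "Holocene": "Quaternary",
    "Pleistocene": "Quaternary",
    "Quaternary": "Cenozoic",
}

def ancestors(term):
    """Return the term followed by all of its ancestors, most specific first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def matches(recorded_age, query_age):
    """True if the recorded age equals the query age or is a subdivision of it."""
    return query_age in ancestors(recorded_age)

# Hypothetical records from three state datasets that use different vocabularies.
records = [
    ("state A", "unit 1", "Holocene"),
    ("state B", "unit 7", "Quaternary"),
    ("state C", "unit 3", "Pleistocene"),
    ("state C", "unit 4", "Cretaceous"),
]

quaternary_units = [(s, u) for s, u, age in records if matches(age, "Quaternary")]
print(quaternary_units)
```

A query for “Quaternary” then returns units from all three states, regardless of whether a state recorded the period itself or one of its subdivisions.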
Using Web Services
Continued…
[Figure labels: Ontology; Legend Generator; Map Assembler; … Web Service for Map Integration; ArcIMS and WMS Services wrapped in WSDL/SOAP]
GMMI WF Designed in Kepler
DataMapper Sub-Workflow
The result in a BrowserDisplay
KEPLER and You
• Kepler …
  • is a community-based, cross-project, open source collaboration
  • uses web services as basic building blocks
  • has a joint CVS repository, mailing lists, web site, …
  • is gaining momentum thanks to contributors and contributions
• BSD-style license allows commercial spin-offs
• A pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…
From Web Services to Grid Services … and back!
Source: Ian Foster’s GlobusWORLD keynote talk