Scientific Workflow systems: Summary and Opportunities for SEEK and e-Science

Presentation Transcript


  1. Scientific Workflow systems: Summary and Opportunities for SEEK and e-Science

  2. Scientific workflow systems
     • Workflows are a way of documenting what has been done (provenance)
     • They can also be seen as a conceptual model of what needs to be done; more descriptive information is needed in the process
     • Combine the conceptual view with the executable workflow
     • Go from napkin diagram to formal conceptual workflow to executable workflow
     • Designing the workflow is as important as executing it
     • Documentation contributes to the reproducibility of results because a workflow creates an exact record of what was done
     • Annotating a workflow's usage history gives new users an idea of its quality, appropriateness, and reliability for their own purposes
     • We need to be able to get more information about a workflow than its WSDL provides
     • Strong ties to semantic mediation, in terms of:
        • Integration, composition, and discovery
        • User interface
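The provenance point on this slide can be made concrete with a minimal Python sketch. It assumes that each workflow step is a plain function over JSON-serialisable data. The names `Workflow`, `ProvenanceRecord`, and `digest` are hypothetical and do not belong to Kepler or any other system mentioned here. The sketch only shows how a run can leave an exact, comparable record of what was done.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


def digest(value: Any) -> str:
    """Stable fingerprint of JSON-serialisable data (hypothetical helper)."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


@dataclass
class ProvenanceRecord:
    step: str
    input_digest: str
    output_digest: str


@dataclass
class Workflow:
    steps: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)
    trace: List[ProvenanceRecord] = field(default_factory=list)

    def add_step(self, name: str, fn: Callable[[Any], Any]) -> "Workflow":
        self.steps.append((name, fn))
        return self  # allows chaining add_step calls

    def run(self, data: Any) -> Any:
        self.trace = []  # each run produces a fresh record of what was done
        for name, fn in self.steps:
            result = fn(data)
            # Record which step ran, on what input, producing what output.
            self.trace.append(ProvenanceRecord(name, digest(data), digest(result)))
            data = result
        return data


if __name__ == "__main__":
    wf = (Workflow()
          .add_step("drop_missing", lambda xs: [x for x in xs if x is not None])
          .add_step("mean", lambda xs: sum(xs) / len(xs)))
    print(wf.run([2.0, None, 4.0]))  # 3.0
    for record in wf.trace:
        print(record)
```

If the same inputs go through the same steps, the trace comes out identical. That is the property the slide ties to reproducibility. A real system would also need to record parameters, component versions, and timestamps.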

  3. Distributed computing
     • Distributed computing with workflows
        • A good idea, but the human cost of coordinating the system is still too high to be practical when ad hoc analytical services are involved
        • Gains may be made by leveraging existing systems such as Condor and Pegasus
        • Process flows could also demonstrate the benefits of infrastructure development to domain scientists

  4. Models of computation
     • Models of computation
        • There is an important point to them, but it has as much to do with how you separate different scientific problems, i.e., does ecology have needs, implicit in the discipline, that differ from those of bioinformatics?
        • We need much clearer ways of communicating about these models, and the need for different models may never arise
        • Partly driven by how you scope a tool's domain of usefulness; for example, if you are handling only web services, you will never need a continuous-time model
        • The user probably should not have to select the model of computation, especially for workflows that can use only one model
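The last point is that the system, rather than the user, should pick the model of computation when only one fits. One way to sketch this is to intersect the models that each actor supports. The labels SDF (synchronous dataflow), PN (process networks), DE (discrete event), and CT (continuous time) follow the directors of Ptolemy II. The capability table, the `choose_model` function, and the default preference order are assumptions for illustration only.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Set

# Hypothetical default order, used only when several models would all work.
DEFAULT_PREFERENCE: Sequence[str] = ("SDF", "PN", "DE", "CT")


def choose_model(actor_models: Dict[str, FrozenSet[str]],
                 preference: Sequence[str] = DEFAULT_PREFERENCE) -> str:
    """Pick a model of computation that every actor in the workflow supports."""
    if not actor_models:
        raise ValueError("workflow has no actors")
    common: Optional[Set[str]] = None
    for models in actor_models.values():
        common = set(models) if common is None else common & models
    if not common:
        raise ValueError("no single model of computation fits every actor")
    if len(common) == 1:
        # Only one model fits, so there is nothing for the user to decide.
        return next(iter(common))
    for model in preference:
        if model in common:
            return model
    # Several fit but none is in the preference list: pick deterministically.
    return sorted(common)[0]


# Hypothetical web-service pipeline: no actor needs continuous time.
web_service_pipeline = {
    "fetch_records": frozenset({"SDF", "PN"}),
    "call_service": frozenset({"PN"}),
    "summarise": frozenset({"SDF", "PN", "DE"}),
}
print(choose_model(web_service_pipeline))  # PN
```

In this example only PN is shared by all three actors, so the choice is made without asking the user. CT never comes up because no actor in a pure web-service pipeline declares it.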

  5. Workflow languages
     • Workflow languages
        • Two separate languages: one for designing the actors and one for the workflow
        • You can describe the workflow without understanding what each component does
        • Another language is needed to describe the semantics of individual components (e.g., OWL-S, Web Service Modeling Ontology (WSMO))
        • Our current efforts focus on describing the semantics of data flow, not of processing
        • The simplest description of a component is its name; it can be extended over time with better and better approximations of a formal specification
        • Inputs and outputs alone are not enough
        • A mathematical description alone is not enough
        • We really need concepts that constrain how a statistical approach is used
        • Mathematically simple models are rare in ecology; complex, arbitrary designs are common and extremely difficult to describe
        • Until we learn how to represent models declaratively, we will never fully understand these complex models
        • Shared language: a good idea, but all current languages incorporate references that can be interpreted only within one specific environment
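This slide makes two points: a component description can start as a bare name and be refined over time, and port types alone are not enough. Both can be illustrated with a small Python sketch. `ComponentDescription`, `can_connect`, and concept labels such as `independent_observations` are hypothetical stand-ins. A real system would draw such concepts from an ontology (for example, one expressed with OWL-S or WSMO) rather than from plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ComponentDescription:
    name: str                              # the simplest possible description
    input_type: Optional[str] = None       # refinement: port types
    output_type: Optional[str] = None
    required_concepts: Set[str] = field(default_factory=set)  # refinement: semantics
    output_concepts: Set[str] = field(default_factory=set)


def can_connect(upstream: ComponentDescription,
                downstream: ComponentDescription) -> Optional[bool]:
    """True or False when the descriptions decide; None when they are too thin."""
    if upstream.output_type is None or downstream.input_type is None:
        return None  # name-only descriptions cannot answer the question
    if upstream.output_type != downstream.input_type:
        return False
    # Matching types is necessary but not sufficient: the downstream component's
    # semantic requirements must also be met by what the upstream one produces.
    return downstream.required_concepts <= upstream.output_concepts


regression = ComponentDescription(
    "linear_regression", input_type="table",
    required_concepts={"independent_observations"})
plot_survey = ComponentDescription(
    "plot_survey", output_type="table",
    output_concepts={"independent_observations"})
repeated_census = ComponentDescription(
    "repeated_census", output_type="table",
    output_concepts={"repeated_measures"})

print(can_connect(ComponentDescription("count_species"), regression))  # None
print(can_connect(plot_survey, regression))      # True
print(can_connect(repeated_census, regression))  # False: types match, semantics do not
```

The repeated-census case is the one the slide warns about. A check on inputs and outputs alone would accept it, and only the added concept constraint rejects it.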

  6. Collaboration opportunities
     • Shared workflow languages
        • Scufl/MoML/DPML/…
     • Shared work on semantic annotation of workflow components
     • Shared ontologies that cross domains
        • SEEK ontologies focus on ecology & environment
        • myGrid ontologies focus on molecular biology
     • Shared case study: conservation genetics
        • Incorporates data from multiple disciplines
        • Brings workflows, mediation, and grid issues together in a single problem
     • Ecoinformatics.org

  7. Acknowledgements
     This material is based upon work supported by the National Science Foundation under awards 0225676 for SEEK and 0225673 (AWSFL008-DS3) for GEON and by the Department of Energy under Contract No. DE-FC02-01ER25486 for SciDAC/SDM and by DARPA under Contract No. F33615-00-C-1703 for Ptolemy. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
     The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
     The Andrew W. Mellon Foundation.
     PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
     Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON