Scientific Workflows

Scientific Workflows. Deana Pennington, PhD; University of New Mexico, LTER Network Office, Sevilleta LTER; PI, CI-Team: Advancing CI-Based Science through Education, Training, and Mentoring of Science Communities; CoPI, Science Environment for Ecological Knowledge (SEEK) project; July 10, 2007.

Presentation Transcript


  1. Scientific Workflows. Deana Pennington, PhD; University of New Mexico, LTER Network Office, Sevilleta LTER; PI, CI-Team: Advancing CI-Based Science through Education, Training, and Mentoring of Science Communities; CoPI, Science Environment for Ecological Knowledge (SEEK) project; July 10, 2007

  2. Informatics and the Research Cycle
     [Diagram labels, in extraction order: Scientific Workflows; Knowledge-intensive; Human cognition; Ontologies; Semantic query; Theory; Data-intensive; Analyses; Data mining; High-performance computing; Bio-inspired algorithms; Scientific visualization; Inductive, Descriptive; Statistics; Web dissemination; E-journals; Dynamic websites; Information visualization; Query; Data Management; Data models; Metadata; Storage; Conduct Analyses; Deductive, Prescriptive; Mechanistic; Conceptual Model; Assumptions; Idealizations; Simplification; Collect Data; Research Design; Results; Hypothesis Generation]

  3. Workflows: Process Support
     [Diagram labels: Scientific Workflow Systems; Analytical Component (x3); Data (x2); Business Workflow Systems; Files (x2)]

  4. Scientific Workflow Systems. SEEK: Kepler Workflow System
     [Diagram labels: Input Data; Site 1; Site 2; Site 3; Site 4; Native functionality; Integration => Transformations; Analytical Component (x3); Data; Derived Data]
     Goals:
     • Visual modeling of end-to-end analytical process
     • Discovery of distributed data and analytical components
     • Easy incorporation of distributed data/components
     • Automated transformation between heterogeneous data/components

  5. Goals:
     • Visual modeling of end-to-end analytical process
     • Discovery of distributed data and analytical components
     • Easy incorporation of distributed data/components
     • Automated transformation between heterogeneous data/components
     • Not linear
     • Involve multiple data sets
     • Involve multiple analytical steps
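The "not linear, multiple data sets, multiple analytical steps" point on this slide can be illustrated with a small dependency graph. The sketch below is not Kepler code; the step names are hypothetical, and it only shows how a branching workflow can be ordered so that every step runs after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the set of steps it depends on.
# Two data sets feed one integration step, and the analysis feeds two outputs,
# so the workflow is a graph rather than a straight line.
dependencies = {
    "load_site_1": set(),
    "load_site_2": set(),
    "integrate": {"load_site_1", "load_site_2"},
    "analyze": {"integrate"},
    "visualize": {"analyze"},
    "archive": {"analyze"},
}

def execution_order(deps):
    """Return one valid run order in which every step follows its inputs."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

Several orders are valid for a graph like this, which is why no single expected output is shown.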

  6. Nested workflows
     [Diagram labels: SW0; ASx; TS1; ASy; ASz; TS2; ASr; Image Processing Pipeline; Signal Processing Pipeline; Integrated Field Data; Search for relevant data and analyses (Query); Ground Sensors; Imagery]

  7. Goals:
     • Visual modeling of end-to-end analytical process
     • Discovery of distributed data and analytical components
     • Easy incorporation of distributed data/components
     • Automated transformation between heterogeneous data/components
     • Scripts: single platform
     • Visual modeling environment: single environment
     • Workflows:
       • Cross-platform
       • Cross-environment
       • Distributed data & analyses

  8. Scientific Workflows. SEEK: EcoGrid => Kepler: EarthGrid
     [Diagram components: Workflow archive; Compute grid; Data grid; Shared Data Registry; Algorithm; Web Service (WSDL); Data Site 1; Data Site 2; Service Broker (UDDI); Metadata; Simulation Model]
     [Diagram interactions: Get Data; Query Data Grid to find data; Query Service Broker to find services; Archive output data to Grid; Archive workflow; Return URL; Return URL & call functions; Get Component]

  9. Goals:
     • Visual modeling of end-to-end analytical process
     • Discovery of distributed data and analytical components
     • Easy incorporation of distributed data/components
     • Automated transformation between heterogeneous data/components
     Generally speaking, an ontology specifies a conceptual model by defining and relating generic concepts representing features of the real or abstract world (within a domain of interest).

  10. Ontologies
     [Diagram labels: Ontology: river; use concepts from (explicitly or implicitly); Informal Conceptual Model: stream; Informal Conceptual Model: tributary; Design Artifact; Schema: STR; Schema: STRM; Schema: TRB; Schema: ABC; Metadata; Data]
     An ontology can then be used as a standard that supports exchange and integration of heterogeneous data sources and applications.
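A minimal sketch of how an ontology can act as a shared standard for integration, using the labels from this slide. The mappings themselves are assumptions made for illustration: the slide does not say which schema label belongs to which informal concept, and `ABC` is left unmapped on purpose.

```python
# Assumed mappings, for illustration only (not taken from SEEK or Kepler).
SCHEMA_TO_CONCEPT = {"STR": "stream", "STRM": "stream", "TRB": "tributary"}
CONCEPT_TO_ONTOLOGY = {"stream": "river", "tributary": "river"}

def ontology_term(schema_label):
    """Resolve a schema label to its shared ontology term, or None if unmapped."""
    concept = SCHEMA_TO_CONCEPT.get(schema_label)
    return CONCEPT_TO_ONTOLOGY.get(concept)

def comparable(label_a, label_b):
    """Two schema labels can be integrated if both resolve to the same term."""
    term_a = ontology_term(label_a)
    term_b = ontology_term(label_b)
    return term_a is not None and term_a == term_b

print(comparable("STR", "TRB"))
print(comparable("STR", "ABC"))
```

Under these assumed mappings, data sets that name the same kind of feature differently can still be matched, while a label with no mapping is reported as not comparable rather than guessed at.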

  11. SEEK’s Observation Ontology (OBOE)
     [Diagram labels: Observation; Measurement; Entity; Characteristic; Standard; Value]
     • Ontologies: Entity, Characteristic, and Standards
     • Limited functionality in Kepler currently (more coming!)
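The concepts named on this slide can be sketched as plain data structures. This is a simplified reading of the slide's labels, assuming an observation is of an entity and holds measurements, each of which records a value for a characteristic according to a standard. It is not OBOE's formal definition, and the example values are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Measurement:
    characteristic: str  # what was measured, e.g. "Height" (hypothetical)
    value: float         # the recorded value
    standard: str        # the unit or scale the value refers to

@dataclass
class Observation:
    entity: str                                  # the thing observed
    measurements: list = field(default_factory=list)

def values_for(observation, characteristic):
    """Collect (value, standard) pairs recorded for one characteristic."""
    return [
        (m.value, m.standard)
        for m in observation.measurements
        if m.characteristic == characteristic
    ]

# Made-up example data.
obs = Observation(entity="Tree")
obs.measurements.append(Measurement("Height", 12.5, "Meter"))
obs.measurements.append(Measurement("Diameter", 0.4, "Meter"))
print(values_for(obs, "Height"))
```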

  12. Scientists design their research at the conceptual workflow level
     • Often done on the fly over the period of time the research is being conducted
     • For automated approaches, this must be well thought out from the beginning
     • However, because of the automation it is easy to modify the analysis and rerun it many times, so you are not locked into the original design
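The last point, that automation makes it cheap to modify an analysis and rerun it, can be shown with a toy example. The workflow below is hypothetical (a filter step followed by a summary step) and the data are made up; the point is only that changing a parameter and rerunning is a one-line change once the steps are written down.

```python
from statistics import mean

def run_workflow(values, threshold):
    """Filter step followed by a summary step."""
    kept = [v for v in values if v >= threshold]
    return {
        "threshold": threshold,
        "n": len(kept),
        "mean": mean(kept) if kept else None,
    }

# Made-up input values.
data = [2.0, 4.0, 6.0, 8.0]

# Rerun the same workflow under two different settings.
results = [run_workflow(data, t) for t in (0.0, 5.0)]
for r in results:
    print(r)
```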

  13. Productivity Example
     [Diagram: one analysis shown at several levels of abstraction]
     • Mental model: Biomass = f(Temp, Soil, et al.)
     • Conceptual workflow: Merge; Model; Predict (labels also include Climate, Temp, Soil, Biomass, Concept)
     • Abstract workflow: chains of DS, TS, and AS steps
     • Executable workflow: “View1”: Excel, GIS, SAS, GIS; “View2”: VBScript, R Script, GA, R
     • Dissemination
     Legend: DS = Data Step; AS = Analysis Step; TS = Transformation Step

  14. Ecological niche modeling conceptual workflow
     [Diagram labels: +A1; +A2; +A3; EcoGrid Query; EcoGrid DataBase; Species presence & absence points; Sample Data; Training sample; Test sample; Data Calculation; GARP rule set; Validation; Model quality parameters; Transformation; Environmental layers; Layer Integration; Scaling; Integrated layers; Map Generation; Native range prediction map; User; Selected prediction maps; Generate Metadata; Archive to EcoGrid]

  15. Ecological niche modeling conceptual workflow
     [Same diagram as slide 14, with two added labels: Spatial location; Temporal extent]

  16. Generic Workflow
     [Diagram labels: +A1; +A2; +A3; EcoGrid Query; EcoGrid DataBase; Occurrence Data: Binary, Categorical or Numeric; Sample Data; Training sample; Test sample; Data Calculation; GARP rule set; Validation; Model quality parameters; Physical Transformation; Environmental layers; Layer Integration; Scaling; Integrated layers; Map Generation; Prediction map; User; Selected prediction maps; Generate Metadata; Archive to EcoGrid]
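The idea of a generic workflow, where the same sample, model, and validate steps are reused with different occurrence data, can be sketched as a function that takes the model-fitting step as a parameter. Everything here is an assumption for illustration: the function names are hypothetical, the records are made up, and the stand-in model is a trivial majority-label predictor, not GARP.

```python
import random

def split_sample(records, test_fraction=0.25, seed=0):
    """Shuffle records reproducibly and split them into training and test samples."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def run_generic_workflow(records, fit, test_fraction=0.25, seed=0):
    """records: list of (features, label) pairs; fit: callable returning a predictor."""
    training, test = split_sample(records, test_fraction, seed)
    predict = fit(training)
    correct = sum(1 for features, label in test if predict(features) == label)
    return predict, correct / len(test)

def fit_majority(training):
    """Stand-in model (not GARP): always predicts the most common training label."""
    labels = [label for _, label in training]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

# Made-up occurrence records: (environmental features, presence label).
records = [((i,), 1) for i in range(8)]
predict, accuracy = run_generic_workflow(records, fit_majority)
print(accuracy)
```

Swapping in a different `fit` function or a different set of records reuses the sampling and validation steps unchanged, which is the kind of reuse the temperature and sinkhole variants of this workflow rely on.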

  17. Temperature Interpolation Workflow
     [Same diagram as slide 16, with these substitutions: Occurrence data: Weather station temperature data; Environmental layers: elevation, aspect, land cover; Prediction map: Interpolated temperature grid]

  18. Sinkhole Interpolation Workflow
     [Same diagram as slide 16, with these substitutions: Occurrence data: Sinkhole occurrence; Environmental layers: Groundwater level, chemistry, etc.; Prediction map: Sinkhole distribution]

  19. Current Benefits
     • Reusable analysis steps, pipelines, and workflows
     • Formal documentation of methods
     • Reproducibility of methods
     • Visual creation and communication of methods
     • Versioning
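One way to picture the documentation, reproducibility, and versioning benefits: once a workflow is written down as data, it can be fingerprinted, so any change to steps or parameters yields a new version identifier. The sketch below is an illustration under that assumption, not a description of how Kepler versions workflows; the spec layout is hypothetical.

```python
import hashlib
import json

def workflow_fingerprint(spec):
    """Hash a canonical JSON form of the spec so key order does not matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical workflow specifications.
spec_v1 = {"steps": ["sample", "model", "validate"], "params": {"test_fraction": 0.25}}
spec_v1_reordered = {"params": {"test_fraction": 0.25}, "steps": ["sample", "model", "validate"]}
spec_v2 = {"steps": ["sample", "model", "validate"], "params": {"test_fraction": 0.5}}

print(workflow_fingerprint(spec_v1) == workflow_fingerprint(spec_v1_reordered))
print(workflow_fingerprint(spec_v1) == workflow_fingerprint(spec_v2))
```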