
Bertram Ludäscher Dept. of Computer Science, UC Davis UC Davis Genome Center

Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows. Bertram Ludäscher, Dept. of Computer Science, UC Davis, and UC Davis Genome Center, ludaesch@ucdavis.edu. Shawn Bowers, UC Davis Genome Center.



Presentation Transcript


  1. Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows
     Bertram Ludäscher, Dept. of Computer Science, UC Davis, and UC Davis Genome Center, ludaesch@ucdavis.edu
     Shawn Bowers, UC Davis Genome Center, sbowers@ucdavis.edu
     seek.ecoinformatics.org | kepler-project.org | www.sdsc.edu | dbis.ucdavis.edu | genomics.ucdavis.edu

  2. Science Environment for Ecological Knowledge
     SEEK is an NSF-funded, multidisciplinary research project to facilitate …
     Access to distributed ecological, environmental, and biodiversity data
     • Enable data sharing & reuse
     • Enhance data discovery at global scales
     Scalable analysis and synthesis
     • Taxonomic, spatial, temporal, conceptual integration of data, addressing data heterogeneity issues
     • Enable communication and collaboration for analysis
     • Enable reuse of analytical components
     • Support scientific workflow design and modeling

  3. SEEK data access, analysis, mediation
     Data Access (EcoGrid)
     • Distributed data network for environmental, ecological, and systematics data
     • Interoperate diverse environmental data systems
     Workflow Tools (Kepler)
     • Problem-solving environment for scientific data analysis and visualization, i.e., “scientific workflows”
     Semantic Mediation (SMS)
     • Leverage ontologies for “smart” data/component discovery and integration

  4. Managing Data Heterogeneity
     • Data comes from heterogeneous sources
       • Real-world observations
       • Spatial-temporal contexts
       • Collection/measurement protocols and procedures
       • Many representations for the same information (count, area, density)
       • Data, Syntax, Schema, Semantic heterogeneity
     • Discovery and “synthesis” (integration) performed manually
       • Discovery often based on intuitive notion of “what is out there”
       • Synthesis of data is very time consuming, and limits use

  5. Scientific workflow systems support data analysis KEPLER

  6. A simple Kepler workflow Composite Component (Sub-workflow) Loops often used in SWFs; e.g., in genomics and bioinformatics (collections of data, nested data, statistical regressions, ...) (T. McPhillips)

  7. A simple Kepler workflow
     • Lists Nexus files to process (project)
     • Reads text files
     • Parses Nexus format
     • Draws phylogenetic trees
     • PhylipPars infers trees from discrete, multi-state characters.
     • Workflow runs PhylipPars iteratively to discover all of the most parsimonious trees.
     • UniqueTrees discards redundant trees in each collection.
     (T. McPhillips)

  8. A simple Kepler workflow An example workflow run, executed as a Dataflow Process Network

  9. SMS motivation
     • Scientific Workflow Life-cycle
       • Resource Discovery
         • discover relevant datasets
         • discover relevant actors or workflow templates
       • Workflow Design and Configuration
         • data-to-actor (data binding)
         • data-to-data (data integration / merging / interlinking)
         • actor-to-actor (actor / workflow composition)
     • Challenge: do all this in the presence of …
       • 100’s of workflows and templates
       • 1000’s of actors (e.g. actors for web services, data analytics, …)
       • 10,000’s of datasets
       • 1,000,000’s of data items
       • … highly complex, heterogeneous data
       – price to pay for these resources: $$$ (lots)
       – scientist’s time wasted: priceless!

  10. Approach & SMS capabilities
      [Diagram: Ontologies · Iterative Development · Semantic Annotation · Resource Discovery · Workflow Validation · Resource Integration · Workflow Elaboration]

  11. Approach & SMS capabilities
      • SEEK KR group is developing OWL-DL ontologies:
        • Various workflow-component ontologies (for categorizing by function, project, scientific discipline, …)
        • Scientific observation ontology (OBOE), an upper ontology for defining and relating observations, measurements, and units
        • Domain-specific ontologies that extend OBOE (standard and derived units, ecology and biodiversity concepts, …)
      [Diagram: Ontologies · Iterative Development · Semantic Annotation · Resource Discovery · Workflow Validation · Resource Integration · Workflow Elaboration]

  12. Approach & SMS capabilities
      • Annotations “connect” resources to ontologies
        • Conceptually describe a resource and/or its “data schema”
        • Annotations provide the means for ontology-based discovery, integration, …
      [Diagram: Ontologies · Iterative Development · Semantic Annotation · Resource Discovery · Workflow Validation · Resource Integration · Workflow Elaboration]

  13. Semantic + Structural Typing
      • Structural Types: Given a structural type language S
        • Datasets, inputs, and outputs can be assigned structural types S
        • Example: S : SpeciesData(site, day, spp, occ)
      • Semantic Types: Given an ontology language O (e.g., OWL-DL)
        • Datasets, inputs, and outputs can be assigned ontology types O
        • Example: $O : \mathit{Observation} \sqcap \exists\,\mathit{obsProperty}.\mathit{SpeciesOccurrence}$
      • Connecting actor A1 (output types $O_{out}$, $S_{out}$) to actor A2 (input types $O_{in}$, $S_{in}$): when $O_{out} \sqsubseteq O_{in}$ (with respect to the ontology O) but $S_{out}$ does not match $S_{in}$, the connection is semantically compatible but structurally incompatible
      • “Hybrid” types: semantic & structural types can be combined using logic constraints
        $\alpha := \forall (site, day, sp, occ)\; \mathit{SpeciesData}(site, day, sp, occ) \rightarrow \exists (y)\; \mathit{Observation}(y),\ \mathit{obsProp}(y, occ),\ \mathit{SpeciesOccurrence}(occ)$
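The distinction between semantic and structural compatibility on slide 13 can be illustrated with a minimal Python sketch. It is not Kepler's or SMS's implementation: the toy class hierarchy `SUPERCLASSES`, the `Port` record, and `check_connection` are hypothetical names introduced here, subsumption is a simple lookup rather than OWL-DL reasoning, and structural compatibility is simplified to equality of attribute names.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: each class maps to its direct superclasses.
# A real system would ask an OWL-DL reasoner instead of this lookup.
SUPERCLASSES = {
    "Observation": set(),
    "SpeciesOccurrence": {"Observation"},
    "SpeciesAbundance": {"SpeciesOccurrence"},
}


def is_subsumed_by(sub: str, sup: str) -> bool:
    """Return True if `sub` equals `sup` or is a transitive subclass of it."""
    if sub == sup:
        return True
    return any(is_subsumed_by(parent, sup) for parent in SUPERCLASSES.get(sub, ()))


@dataclass(frozen=True)
class Port:
    semantic_type: str      # ontology class (the "O" type)
    structural_type: tuple  # schema attribute names (the "S" type)


def check_connection(out_port: Port, in_port: Port) -> dict:
    """Check an output-to-input connection on both typing levels."""
    return {
        # Semantic check: the output type must be subsumed by the input type.
        "semantic_ok": is_subsumed_by(out_port.semantic_type, in_port.semantic_type),
        # Structural check (simplified assumption): schemas must be identical.
        "structural_ok": out_port.structural_type == in_port.structural_type,
    }


a1_out = Port("SpeciesAbundance", ("site", "day", "spp", "occ"))
a2_in = Port("Observation", ("site", "date", "species", "count"))
result = check_connection(a1_out, a2_in)
print(result)
```

In this sketch the connection from A1 to A2 passes the semantic check and fails the structural one, which is the case the slide labels "semantically compatible but structurally incompatible".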

  14. Semantic Type Annotation in Kepler
      • Component input and output port annotation
        • Each port can be annotated with multiple classes from multiple ontologies
        • Annotations are stored within the component metadata

  15. Component Annotation and Indexing
      • Component Annotations
        • New components can be annotated and indexed into the component library (e.g., specializing generic actors)
        • Existing components can also be revised, annotated, and indexed (hiding previous versions)

  16. Approach & SMS capabilities
      • Ontology-based “smart” search
        • Find components by semantic types
        • Find components by input/output semantic types
      • Ontology-based query rewriting for discovery/integration
        • Joint work with GEON project (see SSDBM-04, SWDB-04)
      [Diagram: Ontologies · Iterative Development · Semantic Annotation · Resource Discovery · Workflow Validation · Resource Integration · Workflow Elaboration]

  17. Browse for Components
      Search for Component Name
      Search for Category / Keyword
      Smart Search
      Find a component (here: an actor) in different locations (“categories”)
      • … based on the semantic annotation of the component (or its ports)

  18. Searching in context
      • Search for components with compatible input/output semantic types
        • … searches over actor library
        • … applies subsumption checking on port annotations
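The context search on slide 18 can be sketched as a filter over an actor library that keeps actors whose input-port annotation subsumes the semantic type being offered. Everything here is a hypothetical illustration: the `PARENT` hierarchy, the actor names in `ACTOR_LIBRARY`, and `find_compatible_actors` do not come from Kepler, and a real system would delegate subsumption to an ontology reasoner.

```python
# Hypothetical toy ontology: child class -> parent class (None marks the root).
PARENT = {
    "Observation": None,
    "SpeciesOccurrence": "Observation",
    "Biomass": "Observation",
}


def subsumes(general, specific):
    """Return True if `general` is `specific` or one of its ancestors."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = PARENT.get(current)
    return False


# Hypothetical actor library: actor name -> semantic types of its input ports.
ACTOR_LIBRARY = {
    "PlotObservations": ["Observation"],
    "SpeciesRichness": ["SpeciesOccurrence"],
    "BiomassSummary": ["Biomass"],
}


def find_compatible_actors(output_type):
    """List actors with at least one input port that accepts `output_type`."""
    return sorted(
        name
        for name, input_types in ACTOR_LIBRARY.items()
        if any(subsumes(port_type, output_type) for port_type in input_types)
    )


matches = find_compatible_actors("SpeciesOccurrence")
print(matches)
```

An actor annotated with the more general class `Observation` is returned for a `SpeciesOccurrence` output, while an actor expecting the sibling class `Biomass` is filtered out.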

  19. Approach & SMS capabilities
      • Workflow validation and analysis
        • Check that workflows are semantically & structurally well-typed
        • Infer semantic type annotations of derived data (i.e., type inference)
        • An initial approach and prototype based on mapping composition (see QLQP-05)
      • User-oriented provenance
        • Collect & query data-lineage of WF runs (see IPAW-06)
      [Diagram: Ontologies · Iterative Development · Semantic Annotation · Resource Discovery · Workflow Validation · Resource Integration · Workflow Elaboration]

  20. Workflow validation in Kepler
      • Statically perform semantic and structural type checking
      • Navigate errors and warnings within the workflow
      • Search for and insert “adapters” to fix (structural and semantic) errors …

  21. Approach & SMS capabilities
      • Integrating and transforming data
        • Merge (“smart union”) datasets
        • Find mappings between data schemas for transformation
        • data binding, component connections (see DILS-04)
      [Diagram: Ontologies · Iterative Development · Semantic Annotation · Resource Discovery · Workflow Validation · Resource Integration · Workflow Elaboration]

  22. Smart (Data) Integration: Merge
      • Discover data of interest
      • … connect to merge actor
      • … “compute merge”
        • align attributes via annotations
        • open dialog for user refinement
        • store merge mapping in MOML
      • … enjoy!
      • … your merged dataset
        • almost, can be much more complicated

  23. Under the hood of “Smart Merge” …
      • Exploits semantic type annotations and ontology definitions to find mappings between sources
      • Executing the merge actor results in an integrated data product (via “outer union”)
      Source 1 (annotated Site, Biomass), columns a1 a2 a3 a4, values shown for a1, a3, a4:
        a  5  10
        b  6  11
      Source 2 (annotated Biomass, Site), columns a5 a6 a7 a8, values shown for a6, a8:
        0.1  a
        0.2  c
        0.3  d
      Attribute mappings into Merge: a1 + a8 → a1a8, a3 + a6 → a3a6, a4 → a4
      Merge Result (columns a1, a3, a4):
        a  5.0  10
        b  6.0  11
        a  0.1
        c  0.2
        d  0.3
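The "outer union" behind the merge on slide 23 can be sketched in plain Python. This is an illustration under assumptions, not the Kepler merge actor: `smart_merge` is a hypothetical function, the assignment of the Site and Biomass annotations to columns a1/a8 and a3/a6 is one reading of the slide, and the sketch assumes no source annotates two of its own columns with the same semantic type.

```python
def smart_merge(sources):
    """Outer-union rows from several sources, aligning columns by annotation.

    Each source is a pair (annotations, rows): `annotations` maps a column
    name to its semantic type (None if unannotated) and `rows` is a list of
    dicts keyed by column name.
    """
    columns = []  # output columns, in order of first appearance
    merged = []
    for annotations, rows in sources:
        # Columns sharing a semantic type collapse into one output column;
        # unannotated columns keep their original name.
        rename = {col: (sem or col) for col, sem in annotations.items()}
        for target in rename.values():
            if target not in columns:
                columns.append(target)
        for row in rows:
            merged.append({rename[col]: value for col, value in row.items()})
    # Outer union: attributes a source lacks are padded with None.
    return columns, [{c: r.get(c) for c in columns} for r in merged]


source1 = (
    {"a1": "Site", "a3": "Biomass", "a4": None},
    [{"a1": "a", "a3": 5, "a4": 10}, {"a1": "b", "a3": 6, "a4": 11}],
)
source2 = (
    {"a6": "Biomass", "a8": "Site"},
    [{"a6": 0.1, "a8": "a"}, {"a6": 0.2, "a8": "c"}, {"a6": 0.3, "a8": "d"}],
)

columns, rows = smart_merge([source1, source2])
print(columns)
for row in rows:
    print(row)
```

The merged result has five rows over the columns Site, Biomass, and a4, with a4 left empty for rows that come from the second source. The sketch does not coerce value types or let a user refine the mapping.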

  24. Approach & SMS capabilities
      • Workflow design support
        • (Semi-) automatically combine resource discovery, integration, and validation
        • Abstract → Executable WF
        • … ongoing work!
      [Diagram: Ontologies · Iterative Development · Semantic Annotation · Resource Discovery · Workflow Validation · Resource Integration · Workflow Elaboration · Automated SWF Refinement]

  25. Summary
      • Outlook:
        • Ontologies and semantic annotations for WF design & reuse
        • Put ontologies to actual use in Kepler
        • Continue to develop Kepler tools for annotation (KR observation ontology), discovery, integration, design, …
      • Issues & Challenges:
        • Tools/approaches for ontology (OWL) management, organization, reasoning
        • Open source (distributed) ontology (OWL) storage and reasoning
        • Tools and techniques for robust ontology versioning and extension
      • Acknowledgements
        • Timothy McPhillips, Dave Thau (UC Davis)
        • Mark Schildhauer, Josh Madin, Matt Jones (UCSB)
        • Deana Pennington (UNM)
        • Rich Williams (Microsoft Research)
        • Ferdinando Villa, Sergey Krivov (UVM)
