1 / 45

GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows

Explore GEON's advanced data integration capabilities and the GEON Workbench. Overcome challenges in heterogeneous data formats, systems integration, and semantic mediation. Discover how GEON's ontology-enhanced tools facilitate scientific problem-solving and analysis pipelines.

perezleslie
Download Presentation

GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GEON IT Advances:⁃ Data Integration⁃ GEON Workbench⁃ Scientific Workflows Bertram Ludäscher Kai Lin Ilkay Altintas Efrat Jaeger San Diego Supercomputer Center University of California, San Diego www.geongrid.org

  2. The Problem: Scientific Data Integrationor: … from Questions to Queries … www.geongrid.org

  3. Information Integration Challenges: S4 Heterogeneities • Systems Integration • platforms, devices, data & service distribution, APIs, protocols, …  Grid middleware technologies + e.g. single sign-on, platform independence, transparent use of remote resources, … • Syntax & Structure • heterogeneous data formats (one for each tool ...) • heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …) • heterogeneous schemas(one for each DB ...)  Database mediation technologies + XML-based data exchange, integrated views, transparent query rewriting, … • Semantics • fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, …  Knowledge representation & semantic mediation technologies + “smart” data discovery & integration + e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyways! www.geongrid.org

  4. Information Integration Challenges: S5 Heterogeneities • Synthesis of analysis pipelines, integrated apps & data products, … • How to make use of these wonderful things & put them together to solve a scientist’s problem? • Scientific Problem Solving Environments • GEON Portal and Workbench (“scientist’s view”) + ontology-enhanced data registration, discovery, manipulation + creation and registration of new data products from existing ones, … • GEON Scientific Workflow System (“engineer’s view”) + for designing, re-engineering, deploying analysis pipelines and scientific workflows; a tool to make new tools … + e.g., creation of new datasets from existing ones, dataset registration,… www.geongrid.org

  5. domain knowledge Knowledge representation AGE ONTOLOGY Show formations where AGE = ‘Paleozic’ (without age ontology) Show formations where AGE = ‘Paleozic’ (with age ontology) Nevada +/- a few hundred million years Ontology-Enabled Application Example:Geologic Map Integration www.geongrid.org

  6. Querying by Geologic Age … www.geongrid.org

  7. Querying by Geologic Age: Result www.geongrid.org

  8. Querying by Chemical Composition … (GSC) www.geongrid.org

  9. Querying by Chemical Composition: Results Note the fine differences in shades of gray: DO know: It’s NOT there! DON’T know! (not registered) OK – we got to work on the color coding ;-) www.geongrid.org

  10. Querying w/ British Rock Classification (BRC) Uses a GSC  BRC inter-ontology articulation mapping www.geongrid.org

  11. British Rock Classification Query: Results Uses a GSC  BRC inter-ontology articulation mapping www.geongrid.org

  12. but first: what states are we looking at? The Query: Show sedimentary rocksThe Puzzle: Find the 17 differences in the results… www.geongrid.org

  13. Sedimentary Rocks: BGS Ontology www.geongrid.org

  14. Sedimentary Rocks: GSC Ontology www.geongrid.org

  15. DataTables GeoChemDB GeolAgeDB Need for Knowledge-enabled Integration • A geologist analyzing chemical data from a pluton finds no recognizable correlation between variables. • What possible scenarios can he examine to understand this heterogeneity? • Measured ages also show a scatter • What is the significance of the observed spread in measure time? • Knowledge Representation Research: • concept maps & ontologies • process maps & ontologies • semantic types • … to facilitate (even) “smarter” tools www.geongrid.org

  16. A Prerequisite: Resource Registration (1a) Register ontologies • geologic age; rock classifications (GSC, BGS), seismology; … (1b) optionally: register inter-ontology articulations • e.g. GSC ontology  BGS ontology (2a) Item-level dataset registration • ADN metadata; other controlled vocabularies & ontologies (e.g. geologic age timescale (USGS), SWEET (NASA), …) (2b) Item-detail registration • e.g. associate values in a column with a concept (3) Use ontology-based query UI / application • e.g. query by geologic age and chemical composition www.geongrid.org

  17. Demonstration Preview NOTE: A technology demonstration, not a content demonstration (vocabulary, ontology, maps, …) • Ontology Registration (geologicAge.owl) • Dataset Registration (myShapeFiles.zip) • Item-Level Association (12) • GEONsearch • metadata, spatial, temporal, concept-based • GEONworkbench • use of workspace e.g. composing new maps from existing ones … resume with GEON workflow overview www.geongrid.org

  18. Other distributed apps Kepler, DLESE, … Client Access (via web services) myOntology.owl myDataset.foo Search condition(s) spatialtemporalconcept User actions add delete manipulate metadata metadata ResourceRegistration GEONsearch GEONworkbench GEON Workspace (user) GEON Catalog SRB Log GEONmiddleware external services Gazetteer, DLESE, … Geologic Age, Chronos, … Demonstration Preview User Access (via Portal) www.geongrid.org

  19. Domain Knowledge Ontologies Arizona www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES 19 Dataset to Ontology Registration (Item-level) www.geongrid.org

  20. GEON Search: Concept-based Querying  Portal Demonstration www.geongrid.org

  21. Scientific Problem Solving Environments • GEON Portal and Workbench (“scientist’s view”)  previousdemonstration • a workbench for using existing/integrated tools • Kepler Workflow System (“engineer’s view”) • for (semi-)automating “scientific workflows” and “analysis pipelines” • a tool for making and deploying new tools • some features: • … low-level plumbing to high-level conceptual flows … • connect reusable components (“actors”, “boxes”) to form apps • abstraction via nesting of subworkflows into composite actors • deploy automated workflows on the Grid and/or with custom Uis • demonstrations available (“Kepler2Go-1.” CD for Summer Institute) www.geongrid.org

  22. www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES 22 A Kepler Scientific Workflow inline documentation canvas for design and execution monitoring component (actor) libraries www.geongrid.org

  23. Translating query xml response to web service xml input format. worldImage XML SOAP response Look Inside Sample GEON Dataset Extraction & Processing www.geongrid.org

  24. Annotation form www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES 24 GEON Dataset Registration www.geongrid.org

  25. ADN metadata Metadata display validation www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES 25 GEON Dataset Registration Registering www.geongrid.org

  26. www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES 26 Putting it all together … www.geongrid.org

  27. HPC workflow www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES 27 GEON Workflows & KEPLER http://kepler-project.org www.geongrid.org

  28. Using Kepler for Geological Data Integration Workflows Ilkay Altintas presenting joint GEON work of: Efrat Jaeger Bertram Ludäscher Kai Lin Ashraf Memon San Diego Supercomputer Center University of California, San Diego www.geongrid.org

  29. Some Requirements for a Scientific Workflow System (1/2) • …it should work… (No kidding!) USER REQUIREMENTS: • Design tools-- especially for non-expert users • Ease of use-- fairly simple user interface having more complex features hidden in the background • Reusable generic features • Generic enough to serve to different communities but specific enough to serve one domain (e.g. geosciences) • Extensibility for the expert user-- almost a visual programming interface • Registration and publication of data products and “process products” (=workflows); provenance www.geongrid.org

  30. Some Requirements for a Scientific Workflow System (2/2) TECHNICAL REQUIREMENTS: • Error detection and recovery from failure • Logging information for each workflow • Allow data-intensive and compute-intensive tasks (Maybe at the same time) • HPC+X (From Dr. Berman’s last GSM talk) • Allow status checks and on the fly updates • Visualization… • Semantics and metadata… • Certification, trust, security… Ask the experts in this room  www.geongrid.org

  31. Kepler is… • … a scientific workflow system • … a cross-project collaboration New contributing partners: • Cheminformatics: Resurgence (Kim Baldridge et al.) • Life Sciences: EOL (Mark Miller et al.) • Data Mining: SKIDL (Tony Fountain et al.) • Neuroinformatics: BIRN (coming…) • … an emerging open source tool for “scientific discovery workflows” Kepler 1.0 alpha release  Summer Institute www.geongrid.org www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES 31

  32. Some Recent Actor Additions Browser-based user interface Queries & Transformations Generic WS Invocation SQL Queries File Transfer SMTP-based messaging SRB Access CommandLine Execution Globus Job Execution Real-time data streaming www.geongrid.org

  33. Web Services  Actors (WS Harvester) 1 2 4 3 ”Minute-made” (MM) WS-based application integration • Similarly: MM workflow design & sharing w/o implemented components www.geongrid.org

  34. GEON Contributions to Kepler • System demonstration • Using Kepler Features • GEON workflows in detail • Dataset Registration Model • Processing Datasets on the Fly and Registering with the GEONworkbench www.geongrid.org

  35. Conclusions • Evolving system – GEON is a significant contributor • Plans for new generic and project-specific extensions • Second alpha release available as CD • Installers for Windows, Linux, MacOSX • Daily version tests and JWS installer generation • User manuals and developer documentation is coming soon! • More: next week during the Summer Institute … Kepler project website: http://kepler-project.org Thanks! www.geongrid.org

  36. E N D GEON IT Advances:⁃ Data Integration⁃ GEON Workbench⁃ Scientific Workflows Bertram Ludäscher Kai Lin Ilkay Altintas Efrat Jaeger San Diego Supercomputer Center UC San Diego www.geongrid.org

  37. Related Publications • Semantic Data Registration and Integration • On Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece. • A System for Semantic Integration of Geologic Maps via Ontologies, K. Lin and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003. • Towards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003. • The Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003. • Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Grid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003. • Query Planning and Rewriting • Processing First-Order Queries under Limited Access Patterns, Alan Nash and B. Ludäscher, Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04) Paris, France, June 2004. • Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns, Alan Nash and B. Ludäscher., 9th Intl. Conference on Extending Database Technology (EDBT'04) Heraklion, Crete, Greece, March 2004, LNCS 2992. • Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04) Boston, IEEE Computer Society, April 2004. www.geongrid.org

  38. Related Publications • Scientific Workflows • Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece. • Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004. • An Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers and B. Ludäscher, Intl. Workshop on Data Integration in the Life Sciences (DILS'04), March 25-26, 2004 Leipzig, Germany, LNCS 2994. • A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In the 2nd Intl. Conference on Web Services (ICWS), San Diego, California, July 2004. www.geongrid.org

  39. Additional Material (for questions etc) www.geongrid.org

  40. Multi-Hierarchical Rock Classification System (GSC)… a target ontology (after conversion to OWL) for geologic map registration … Genesis Fabric Composition Texture www.geongrid.org

  41. Inside Ontology-Enabled Map Integration User: “Show formations from Cenozoic!” Age Ontology Cenozoic Query Rewriting Quaternary Tertiary select FORMATION where AGE=“Tertiary” or AGE=“Quaternary” PERIOD FORMATION PERIOD LITHOLOGY ABBREV Arizona Montana West Map Rendering Color Definition www.geongrid.org

  42. Data Source Wrapping and Integration ABBREV Arizona PERIOD FORMATION AGE Idaho NAME Colorado PERIOD LITHOLOGY Utah TYPE PERIOD Nevada FMATN TIME_UNIT Wyoming NAME Livingston formation FORMATION PERIOD Tertiary-Cretaceous Montana West AGE New Mexico NAME PERIOD LITHOLOGY andesitic sandstone Montana East FORMATION PERIOD www.geongrid.org

  43. Gravity Modeling Design Workflow • Idea: Comparing observed & synthetic gravity models • Steps: • Extracting and merging gravity depths from heterogeneous data sources for a Lat/Lon bounding box (databases, web services). • Projecting and interpolating data sources into the same coordinate systems. • Differencing observed and synthetic models. • Displaying Differential raster image. www.geongrid.org

  44. Grid Interpolation • Interpolating queried gravity data on the grid and displaying it using a color schema. • Currently IDW interpolation algorithm supported. Future plans: Minimum Curvature, TIN, Kriging and Spline. • Output: either ascii x,y,z,p or ESRI ascii grid format. • Display: using global mapper service. www.geongrid.org

  45. Gravity Modeling Design Workflow www.geongrid.org

More Related