
Intelligent Distributed Data Management in Earth system science

K. Ronneberger (DKRZ, Germany); S. Kindermann (DKRZ, Germany); T. Brücher (University of Cologne, Germany); H. Ramthun (M&D, Germany); M. Stockhause (MPI-Met, IFM-Geomar, Germany).


Presentation Transcript


  1. Intelligent Distributed Data Management in Earth system science
     K. Ronneberger, DKRZ, Germany
     S. Kindermann, DKRZ, Germany
     T. Brücher, University of Cologne, Germany
     H. Ramthun, M&D, Germany
     M. Stockhause, MPI-Met, IFM-Geomar, Germany

  2. QFLUX: Humidity flux calculation 1st EU-Review May 15.-16 2007

  3. Structure
     • What is Earth system science about?
     • Typical workflows
     • Traditional infrastructure
     • Why can grid technology help?
     • Limits of the current practice
     • Outline of possible and existing use areas
     • How do we use this technology?
     • Conceptual outline of the developing infrastructure
     • Demo of an example workflow
     • Potential impact and vision
     • Next steps and challenges

  4. Earth system sciences
     • Goal: learn about the past, the present, and possible futures of the Earth system
     • Community: distributed internationally and across disciplines, but strongly interconnected
     • Method: analysing, comparing and processing data
     • Input: data from observations and/or other modelling studies
     Typical workflow (diagram): Distributed Climate Data (model data, scenario data, observation data) → (1) Find & Select → Data description → (2) Collect & Prepare → Analysis Dataset → (3) Analyse → Result Dataset → (4) Visualize

  5. An example workflow: “qflux”
     (1) Find & Select relevant and available datasets in the distributed climate data: temperature, specific humidity, wind speed
     (2) Collect & Prepare a temporal and spatial subset of the data (analysis dataset)
     (3) Analyse the integrated transport of humidity between selected levels (result dataset)
     (4) Visualize the selected result
     Locations (in slide order): various data centers & portals; institutional storage & computing facilities; local facilities; personal computer
     Data volumes (in slide order): several PB; ~3.1 TB (300-500 files); ~10.3 GB (28 files); ~76 MB; ~6 MB; ~66 KB
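The analysis step above (integrated transport of humidity between selected levels) can be illustrated with a small sketch. The slides do not give the formula qflux uses, so this assumes the common definition of vertically integrated moisture flux, (1/g) ∫ q·v dp, approximated with the trapezoidal rule. The function name and the sample values are hypothetical, and the sketch uses only specific humidity and wind speed; how qflux uses temperature is not stated in the slides.

```python
# Minimal sketch, NOT the actual qflux code: assumes the standard
# vertically integrated moisture flux (1/g) * integral(q * v dp).
G = 9.80665  # standard gravity in m/s^2


def integrated_humidity_flux(pressure_pa, specific_humidity, wind_speed):
    """Trapezoidal estimate of (1/g) * integral of q*v over pressure.

    pressure_pa: pressure levels in Pa (monotonic, any direction)
    specific_humidity: q in kg/kg at each level
    wind_speed: wind component in m/s at each level
    Returns the flux in kg m^-1 s^-1.
    """
    n = len(pressure_pa)
    if not (n == len(specific_humidity) == len(wind_speed)):
        raise ValueError("all inputs must have the same number of levels")
    if n < 2:
        raise ValueError("at least two levels are needed to integrate")

    total = 0.0
    for k in range(n - 1):
        # Layer thickness in pressure; abs() makes the level order irrelevant.
        dp = abs(pressure_pa[k + 1] - pressure_pa[k])
        flux_lower = specific_humidity[k] * wind_speed[k]
        flux_upper = specific_humidity[k + 1] * wind_speed[k + 1]
        total += 0.5 * (flux_lower + flux_upper) * dp
    return total / G


if __name__ == "__main__":
    # Hypothetical three-level profile between 1000 hPa and 850 hPa.
    levels = [100000.0, 92500.0, 85000.0]
    q = [0.012, 0.009, 0.007]
    v = [5.0, 8.0, 10.0]
    print(integrated_humidity_flux(levels, q, v))
```

In the workflow on this slide, such a calculation would run on the prepared analysis dataset, once per grid point and time step, and produce the much smaller result dataset.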

  6. Potential use of grid technology
     Current issues
     • Search & select: different portals with different authentications and data descriptions
     • Collect & prepare: different access mechanisms of the different providers; pre-processing requires sufficient local facilities
     • Analyse: existing tools and already processed data are available locally and lack a proper description
     • Visualize: detached from the remaining workflow
     Potential use
     • Central unique authentication to a common catalogue with standardized metadata
     • Shared resources with standardized access hiding proprietary access mechanisms
     • Commonly defined tool description
     • Log processing steps and automatically republish processed data
     • Integrate basic visualization (first peep) into the workflow

  7. C3 Grid and EGEE - the components
     Workflow steps shown: find & select, collect & prepare, analyse, visualize
     • Central web portal: single entry point to a common central metadata catalogue (Lucene index) and access facility
     • Standardized metadata: hierarchical description of discovery aspects and some use aspects of the data (ISO 19115/ISO 19139)
     • Standardized access interface: hides the complexity of specific data access mechanisms and pre-processing functionalities (webservice technology)
     • Automatic update and republishing of metadata: metadata of data processing is logged and managed, and can be harvested (AMGA + Java extension, OAI-PMH server)
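The harvesting mentioned in the last component relies on OAI-PMH, in which a harvester sends plain HTTP requests with a `verb` parameter. A minimal sketch of how a `ListRecords` request URL could be assembled is given here. The base URL and the `iso19139` metadata prefix are assumptions for illustration; the slides do not state the actual endpoint or the prefix the C3Grid and EGEE servers advertise.

```python
# Minimal sketch: build an OAI-PMH ListRecords request URL.
# The endpoint and metadata prefix used below are hypothetical examples.
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Return the URL for an OAI-PMH ListRecords request.

    verb and metadataPrefix are required by the protocol;
    from and set are optional selective-harvesting arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # e.g. only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint; a real harvester would fetch this URL and
    # follow resumption tokens until the record list is complete.
    print(build_list_records_url("https://example.org/oai", "iso19139",
                                 from_date="2007-05-15"))
```

Passing a `from` date is what would allow the portal to pick up only metadata that was newly published or updated after a processing step, rather than re-harvesting the whole catalogue.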

  8. Data access in ESR grid projects

  9. Bridging EGEE and C3
     [Diagram]
     German climate data providers: WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS
     Components shown: EGEE, SE, CE, WNs, LFC Catalog, AMGA Metadata Catalog, UI, Data Resource, Metadata, C3Grid data interface, Climate Data Workspace, Lucene Index, Web Portal, C3, OAI-PMH server and Webservice Interface (each shown twice)
     Labelled steps: Publish (ISO 19115/19139); (b) Harvest (OAI-PMH); (f) Publish (ISO 19115/19139); (g) Harvest (OAI-PMH)

  10. Demo
     • Search, discover and select functionalities of the portal
     • Upload and register data to EGEE
     • Trigger the example workflow qflux from the portal

  11. Upload pre-processed data to EGEE
     [Diagram] Workflow steps covered: (1) Find & Select, (2) Collect & Prepare
     Labelled steps: (a) Request (webservice); (b) Retrieve (jdbc or archive); (c) Stage & Provide; (d) Notify; (e) Request (webservice); (f) Transfer & Register (lcg-tools); (f) Publish (ISO 19115/19139); (g) Register (Java-API)
     Components shown: EGEE, SE, CE, WNs, LFC Catalog, AMGA Metadata Catalog, UI, Data Resource, Metadata, C3Grid data interface, Climate Data Workspace, Lucene Index, Web Portal, C3, OAI-PMH server and Webservice Interface (each shown twice)

  12. Trigger qflux workflow
     [Diagram] Workflow steps covered: (3) Analyse, (4) Visualize
     Labelled steps: (a) Request (webservice); (b) Submit (glite); (c) Retrieve (lcg-tools); (d) Update (Java-API); (e) Return graphic; (f) Publish (ISO 19115/19139); (g) Harvest (OAI-PMH)
     Components shown: qflux, EGEE, SE, CE, WNs, LFC Catalog, AMGA Metadata Catalog, UI, Data Resource, Metadata, C3Grid data interface, Climate Data Workspace, Lucene Index, Web Portal, C3, OAI-PMH server and Webservice Interface (each shown twice)

  13. Potential Impact
     • On the German ESR community: ease and accelerate the search, discovery, access and processing of German ESR data
     • On the current and potential EGEE ESR community: provide a framework to exchange and manage ESR data and tools easily and consistently between EGEE and traditional earth science data storage systems
     • On the international ESR community: other portals or infrastructures can be integrated analogously to EGEE
     • On other disciplines: built on international standards and thus easily adaptable and expandable by other disciplines and by further partners

  14. Next steps
     • Expand the demonstrated prototype into a reliable and stable system
     • Port further workflows and some pre-processing functionalities to EGEE
     • Enlarge the user community

  15. Future challenges or missing bricks
     • Establish a comprehensive and consistent security context to control access to (restricted) data with a single sign-on
       • C3Grid is starting to implement a federated AA infrastructure based on Shibboleth
     • Describe analysis services to improve possibilities for discovery, use and sharing
       • First approaches to adapt ISO 19119/19139 as a common metadata format for tool description
     • Modularize workflows to increase flexibility and enable intelligent scheduling
       • First steps to implement a workflow information service
