Data Discovery and Basic Processing within the German Collaborative Climate Community Data and Processing Grid (C3Grid) Project. Heinrich Widmann and Stephan Kindermann Model and Data / DKRZ / Max-Planck-Institute for Meteorology Hamburg, Germany. GO-ESSP at LLNL
Data Discovery and Basic Processing within the German Collaborative Climate Community Data and Processing Grid (C3Grid) Project
Heinrich Widmann and Stephan Kindermann
Model and Data / DKRZ / Max-Planck-Institute for Meteorology, Hamburg, Germany
GO-ESSP at LLNL, Livermore, June 19th – 21st, 2006
C3Grid Home: www.c3grid.de
Overview
• C3Grid Background
• Data Analysis Workflows
• C3Grid Architecture and Interfaces
• Data Discovery and Metadata in C3Grid
• Data Information Service with Lucene
• Data Access and Preprocessing
• Summary
C3Grid Background
• C3Grid
  • Status: month 10 of 36 (phase 1)
  • is the earth system science community grid within the German D-Grid initiative
    • D-Grid includes five further community grid projects (AstroGrid, HEP-Grid, InGrid, MediGrid, TextGrid)
  • is a community-driven grid
• Goal: develop a grid infrastructure appropriate for typical climate analysis workflows
• Stepwise introduction and integration
C3Grid Data Analysis Workflow
Requirements and the grid technologies that address them:
• Metadata: ISO 19115 / ISO 19139
• Discovery: OAI-PMH + Lucene
• Data access (+ preprocessing): community webservice
• Security: Shibboleth
• Scheduling and complex processing: Globus Toolkit 4, WS-GRAM
C3Grid Architecture and Interfaces
[Figure: C3Grid architecture diagram with two highlighted areas: Data Discovery, and Data Access and Basic Processing]
C3Grid Data Discovery and Data Access
[Figure: data flow between the portal, the grid infrastructure, and the resource providers]
• Portal: discovery, workflow composition, data request
• C3 metadata catalog: ISO 19115 / 19139 discovery metadata, collected by an OAI harvester via OAI-PMH
• Grid infrastructure: scheduling, Data Management Service, job submission (WS-GRAM), analysis jobs running on data in workspaces
• Resource providers (World Data Centers (Climate, Mare, RSAT), DWD, PIK, IFM-Geomar, ..): web server / OAI provider for metadata (Prop. Xml, Prop. Rel.), Data Access Web Service with preprocessing of data from DB and files
• A data request carries: oids, time/space constraints, processing constraints
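The harvesting step in this slide relies on OAI-PMH, which is a plain HTTP protocol with a small set of verbs. The following is a minimal sketch of how a harvester might build a `ListRecords` request and read a response. The provider URL, the `iso19139` metadata prefix, and the sample response are assumptions for illustration and are not taken from the C3Grid implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it must not be combined with metadataPrefix or from.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if from_date:
            params["from"] = from_date  # selective (incremental) harvesting
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical, heavily shortened provider response (metadata bodies omitted).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:dataset-1</identifier></header></record>
    <record><header><identifier>oai:example.org:dataset-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://provider.example.org/oai", "iso19139"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would repeat the request with the returned resumption token until the provider sends none, then pass the collected records on to the catalog.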
C3 ISO 19139 Metadata “Profile”
Data items (gridded data), each described by an <MD_Metadata …. > record:
• Raw experiment data: 3D, multi-variable files
• Postprocessed experiment data: 2D, single-variable time series
[Figure: raw data is turned into postprocessed data by post-processing; diagram labels: Archive, Database, Metadata, Metadata Database, “implicit” Metadata]
Schematic metadata record:
<MD_Metadata http://www.isotc211.org/xxx">
  <fileIdentifier ../>
  <resourceConstraints ../>
  <extent … spatial+temporal bounding box .. />
  <contentInfo ..>
    <attributeDescription ../>
  <distributionInfo ..>
  <DS_Series>
    <composed_of>
    <composed_of>
</MD_Metadata>
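To make the metadata profile above more concrete, here is a minimal sketch of reading the file identifier and the spatial bounding box from an ISO 19139 record. The `gmd` and `gco` namespaces are the standard ISO 19139 ones. The sample record, its identifier, and the function name `extract_discovery_fields` are hypothetical, and the sketch does not cover the C3Grid profile itself.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, strongly reduced ISO 19139 record (not a real C3Grid entry).
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example.dataset.0001</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:geographicElement>
            <gmd:EX_GeographicBoundingBox>
              <gmd:westBoundLongitude><gco:Decimal>-10.0</gco:Decimal></gmd:westBoundLongitude>
              <gmd:eastBoundLongitude><gco:Decimal>30.0</gco:Decimal></gmd:eastBoundLongitude>
              <gmd:southBoundLatitude><gco:Decimal>35.0</gco:Decimal></gmd:southBoundLatitude>
              <gmd:northBoundLatitude><gco:Decimal>70.0</gco:Decimal></gmd:northBoundLatitude>
            </gmd:EX_GeographicBoundingBox>
          </gmd:geographicElement>
        </gmd:EX_Extent>
      </gmd:extent>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def extract_discovery_fields(xml_text):
    """Return the file identifier and (west, east, south, north) bounding box."""
    root = ET.fromstring(xml_text)
    file_id = root.findtext("gmd:fileIdentifier/gco:CharacterString",
                            namespaces=NS)
    box = root.find(".//gmd:EX_GeographicBoundingBox", NS)

    def bound(name):
        return float(box.findtext(f"gmd:{name}/gco:Decimal", namespaces=NS))

    bbox = (bound("westBoundLongitude"), bound("eastBoundLongitude"),
            bound("southBoundLatitude"), bound("northBoundLatitude"))
    return {"id": file_id, "bbox": bbox}


if __name__ == "__main__":
    print(extract_discovery_fields(SAMPLE))
```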
C3Grid Data Information Service with Lucene
[Figure: architecture of the Data Information Service (DIS)]
• The portal reaches the DIS through a web service frontend (webserver, Apache Axis + servlet container)
• The DIS is built on Apache Lucene: inverted index, indexing of selected fields, full-text index
• A harvesting backend collects <MD_Metadata>...</MD_Metadata> records via OAI-PMH from the archives (Pangaea, CERA)
• Cache for ISO 19139 documents
[T. Langhammer, ZIB, Berlin]
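The idea of an inverted index with selected fields plus a full-text index can be illustrated with a toy version. This sketch is not Lucene and does not reflect the DIS code. The class `TinyIndex`, the field names, and the sample records are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (underscores kept for CF names)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


class TinyIndex:
    """Toy inverted index: maps (field, term) to the set of matching doc ids."""

    def __init__(self, fields):
        self.fields = set(fields)          # fields that get their own index
        self.postings = defaultdict(set)   # (field, term) -> {doc_id, ...}

    def add(self, doc_id, doc):
        for field, value in doc.items():
            for term in tokenize(value):
                # Every term goes into the catch-all full-text field.
                self.postings[("text", term)].add(doc_id)
                if field in self.fields:
                    self.postings[(field, term)].add(doc_id)

    def search(self, query, field="text"):
        """Return sorted ids of documents containing all query terms."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = [self.postings.get((field, t), set()) for t in terms]
        return sorted(set.intersection(*hits))


if __name__ == "__main__":
    index = TinyIndex(fields=["title", "variable"])
    index.add("cera:1", {"title": "ECHAM5 surface temperature",
                         "variable": "air_temperature",
                         "abstract": "monthly mean time series"})
    index.add("pangaea:7", {"title": "Sediment core temperature record",
                            "variable": "sea_water_temperature",
                            "abstract": "marine sediment measurements"})
    print(index.search("temperature"))
    print(index.search("air_temperature", field="variable"))
```

A lookup only touches the posting sets of the query terms, which is why text search stays fast as the catalog grows.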
C3Grid Data Access and Preprocessing
• Data access interface
  • Community-specific webservice (WSDL)
  • The existing solutions of the individual institutes will be adapted to support the webservice
    • e.g. triggering of local data processing tools
  • Supports database and file-based storage types
• More detailed use metadata are delivered together with the data during the extraction process
C3Grid Data Access/Preprocessing Interface
The stage file webservice request contains:
• ObjectList of OIDs requested
• CFList of standard names
• Space constraints
• Time constraints
• Target directory
• File format, e.g. netCDF or GRIB
• …
[Figure: a SOAP-XML StageFile request arrives at the Data Access Web service; CF standard names are mapped to local variable names, the constraints determine the necessary processing, data is read from files or by DB access, CDO processing is applied, and the resulting data is delivered]
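The request contents listed above can be sketched as a small data structure that is serialized to XML and translated into a local CDO call. This is only an illustration under stated assumptions: the element names, the `CF_TO_LOCAL` mapping, the class `StageRequest`, and both helper functions are hypothetical, and the real interface is defined by the C3Grid WSDL. The CDO operators used (`sellonlatbox`, `seldate`, `selname`, and the `-f` output format option) are standard CDO features.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical mapping from CF standard names to one archive's local names.
CF_TO_LOCAL = {"air_temperature": "tas", "precipitation_flux": "pr"}


@dataclass
class StageRequest:
    oids: list          # object identifiers of the requested data sets
    cf_names: list      # CF standard names of the requested variables
    bbox: tuple         # (west, east, south, north) in degrees
    time_range: tuple   # (start, end) as YYYY-MM-DD strings
    target_dir: str
    file_format: str = "netCDF"


def to_xml(req):
    """Serialize the request; element names are illustrative only."""
    root = ET.Element("StageFileRequest")
    objects = ET.SubElement(root, "ObjectList")
    for oid in req.oids:
        ET.SubElement(objects, "OID").text = oid
    cf_list = ET.SubElement(root, "CFList")
    for name in req.cf_names:
        ET.SubElement(cf_list, "StandardName").text = name
    west, east, south, north = req.bbox
    ET.SubElement(root, "SpaceConstraint", west=str(west), east=str(east),
                  south=str(south), north=str(north))
    start, end = req.time_range
    ET.SubElement(root, "TimeConstraint", start=start, end=end)
    ET.SubElement(root, "TargetDirectory").text = req.target_dir
    ET.SubElement(root, "FileFormat").text = req.file_format
    return ET.tostring(root, encoding="unicode")


def cdo_command(req, infile, outfile):
    """Translate the constraints into a chained CDO command (argument list)."""
    local_names = [CF_TO_LOCAL[name] for name in req.cf_names]
    west, east, south, north = req.bbox
    start, end = req.time_range
    fmt = "nc" if req.file_format.lower() == "netcdf" else "grb"
    return ["cdo", "-f", fmt,
            f"-sellonlatbox,{west},{east},{south},{north}",
            f"-seldate,{start},{end}",
            f"-selname,{','.join(local_names)}",
            infile, outfile]


if __name__ == "__main__":
    request = StageRequest(["oid-1"], ["air_temperature"],
                           (-10.0, 30.0, 35.0, 70.0),
                           ("1990-01-01", "1999-12-31"), "/workspace/job42")
    print(to_xml(request))
    print(" ".join(cdo_command(request, "in.grb", "out.nc")))
```

Keeping the CF-to-local mapping on the provider side lets the portal stay ignorant of each archive's variable naming.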
Summary
• Grid development is application driven
• Discovery is based on
  • ISO 19115/19139 based metadata catalog
  • Hierarchical, two-level metadata scheme
  • Text-based search in the catalog
• Data access is implemented by
  • Proprietary C3Grid data access interface (webservice)
  • Part of the use metadata is provided along with the extracted data
C3Grid Architecture
[Figure: layered architecture diagram]
• User: user interface (GUI, monitoring, job submission) on top of an API (web services)
• Distributed Grid Infrastructure
  • GT4 based
  • new
  • Components: Metadata-Service, Search, Workflow Scheduler, DMS (global), Matchmaking, DIS, Resource Information Service, Staging, Data Transfer Service, Harvesting, Task Execution
• Site C3Grid components: OAI / WS, File Management, DMS (local), Resource Scheduler, Archive Interface, Grid Workspace, Available Resources
• Data kinds: Base Data & Meta Data, Pre-Proc Data, Job Meta Data
• Distributed data archives (DBMS/File) and distributed processing resources