C3-Grid* Federation System for Climate Data Handling
Stephan Kindermann, German Climate Computing Center – DKRZ
* Collaborative Climate Community Grid Project (part of the D-Grid Initiative)
GO-ESSP 2008
Overview
• C3Grid overview: architecture, partners, goals
• C3Grid federation system components:
  • C3Grid ISO discovery metadata and metadata catalog
  • A short interoperability study: C3Grid ISO metadata / GeoNetwork
  • Data access and preprocessing
  • C3Grid security
• C3Grid / IPCC?
C3Grid: Overview
[Architecture diagram. C3Grid data providers: world data centers, universities and research institutes (IFM-Geomar, FU Berlin, Uni Köln, Climate Mare, RSAT, DWD, DKRZ, PIK, GKSS, AWI, MPI-M). Each provider offers (A) ISO discovery metadata for the ISO 19139 discovery catalog and (B) a data access interface. The C3Grid data and job management middleware (grid data / job interface) moves data + metadata into a collaborative grid workspace on D-Grid (SRM, dCache, ...), where C3RC workflows produce result data products + metadata, accessed through the portal.]
(A) Metadata for Data Discovery: Design and Implementation
[Architecture diagram as on the overview slide, highlighting (A): ISO discovery metadata from the C3Grid data providers and the ISO 19139 discovery catalog; the data access interface is shown alongside.]
(A) Metadata – harvesting and lookup components
• Technology:
  • ISO 19115/19139 metadata profile
  • OAI-PMH harvesting catalogue
  • Lucene-based catalogue search
  • GridSphere-based portal
  • Fast range queries
  • Java API + web service interface, made available on sourceforge.net (see also: http://www.panfmp.org)
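The harvesting side can be illustrated with a minimal OAI-PMH client. This is a Python sketch using only the standard library, not the C3Grid/panFMP code (which is Java): it issues ListRecords requests and follows resumption tokens until the provider returns an empty token, which OAI-PMH uses to mark the end of a list. The metadata prefix "iso19139", the base URL and the injected fetch function are assumptions made so the example runs offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request; a resumptionToken request carries no other arguments."""
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [h.text for h in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(fetch, base_url, metadata_prefix):
    """Follow resumption tokens until the provider signals the end of the list."""
    ids, token = parse_page(fetch(list_records_url(base_url, metadata_prefix)))
    while token:
        page_ids, token = parse_page(fetch(list_records_url(base_url, resumption_token=token)))
        ids.extend(page_ids)
    return ids
```

In a real harvester, fetch would perform the HTTP GET (for example with urllib.request) and the full record payloads would be handed to the catalogue's indexer rather than only their identifiers.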
(A) C3Grid ISO 19139 profile
Design criteria:
• No schema extensions: profiling by restriction
• Restriction using Schematron constraints
• "The granularity of the discovery metadata should reflect the logical organization of the data repository at a sufficiently coarse grained level" (1)
• CF-based content description
• Link to the resource metadata infrastructure (GT4-MDS based)
(1) INSPIRE: DT Metadata – Draft Implementing Rules for Metadata (version 2, 02/02/2007)
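Profiling by restriction means a valid C3Grid record stays a valid ISO 19139 record; the profile only adds assertions on top of the standard schema. The sketch below mimics Schematron assert semantics in Python with ElementTree. The three rules are hypothetical examples, not the actual C3Grid Schematron constraints, and a production setup would run the Schematron file itself.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical profile rules: (message, predicate on the root element).
# Like Schematron asserts, each rule only restricts; nothing extends the schema.
RULES = [
    ("fileIdentifier is mandatory",
     lambda root: root.findtext("gmd:fileIdentifier/gco:CharacterString",
                                default="", namespaces=NS).strip() != ""),
    ("at least one contentInfo (CF content description) is required",
     lambda root: root.find("gmd:contentInfo", NS) is not None),
    ("every attributeDescription must name a CF variable",
     lambda root: all((rt.text or "").strip() for rt in
                      root.iterfind(".//gmd:attributeDescription/gco:RecordType", NS))),
]


def check_profile(xml_text):
    """Return the messages of all failed rules; an empty list means the record conforms."""
    root = ET.fromstring(xml_text)
    return [msg for msg, ok in RULES if not ok(root)]


VALID = (
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    'xmlns:gco="http://www.isotc211.org/2005/gco">'
    '<gmd:fileIdentifier><gco:CharacterString>c3grid-exp-1</gco:CharacterString></gmd:fileIdentifier>'
    '<gmd:contentInfo><gmd:MD_CoverageDescription><gmd:attributeDescription>'
    '<gco:RecordType>air_temperature</gco:RecordType>'
    '</gmd:attributeDescription></gmd:MD_CoverageDescription></gmd:contentInfo>'
    '</gmd:MD_Metadata>'
)
```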
(A) C3Grid ISO Profile
• Description at aggregate level (e.g. experiment)
• Aggregate extent description with multiple verticalExtent sections
• Sub-selection happens in the data request
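A sketch of how aggregate-level records with several verticalExtent sections could support sub-selection: a requested vertical range is accepted only if one extent, expressed in the same vertical CRS, covers it. The class names are hypothetical, and the CRS identifiers simply reuse the xlink:href values from this profile's CF example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerticalExtent:
    minimum: float
    maximum: float
    crs: str  # reference to the vertical CRS, e.g. "#verticalCRS_hPa"


@dataclass
class AggregateRecord:
    """One discovery record describing a whole aggregate, e.g. an experiment."""
    identifier: str
    vertical_extents: list  # one VerticalExtent per verticalExtent section


def covers(record, crs, lo, hi):
    """True if a single verticalExtent in the same CRS contains the requested [lo, hi]."""
    lo, hi = min(lo, hi), max(lo, hi)
    return any(
        e.crs == crs and e.minimum <= lo and hi <= e.maximum
        for e in record.vertical_extents
    )


experiment = AggregateRecord("exp-a", [
    VerticalExtent(10.0, 1000.0, "#verticalCRS_hPa"),  # atmosphere levels
    VerticalExtent(0.0, 5000.0, "#verticalCRS_m"),     # ocean depth
])
```

Comparing only within one CRS avoids mixing pressure levels and depths; converting between vertical CRSs is deliberately left out of this sketch.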
(A) C3Grid ISO Profile: CF usage
• Content description based on (extended) CF names
• Link to corresponding vertical CRS

<contentInfo><MD_CoverageDescription>
  <attributeDescription><gco:RecordType>air_temperature</gco:RecordType></attributeDescription>
  <!-- reference to vertical CRS -->
  <dimension xlink:href="#verticalCRS_hPa"><MD_RangeDimension>
    <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
  </MD_RangeDimension></dimension>
</MD_CoverageDescription></contentInfo>
<contentInfo><MD_CoverageDescription>
  <attributeDescription><gco:RecordType>sea_surface_temperature</gco:RecordType></attrib…>
  <!-- reference to vertical CRS -->
  <dimension xlink:href="#verticalCRS_m"><MD_RangeDimension>
    <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
  </MD_RangeDimension></dimension>
</MD_CoverageDescription></contentInfo>
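Reading the CF content description back out of a record is straightforward with ElementTree. This sketch assumes the standard ISO 19139 namespaces (gmd, gco, xlink), which the slide leaves out, and returns one (CF name, vertical CRS reference, unit) tuple per MD_CoverageDescription; it is an illustration, not the C3Grid catalog's own parser.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def content_variables(xml_text):
    """Return (CF name, vertical CRS reference, unit) for each MD_CoverageDescription."""
    root = ET.fromstring(xml_text)
    result = []
    for cov in root.iterfind(".//gmd:MD_CoverageDescription", NS):
        name = cov.findtext("gmd:attributeDescription/gco:RecordType",
                            default="", namespaces=NS).strip()
        dim = cov.find("gmd:dimension", NS)
        crs = dim.get(XLINK_HREF) if dim is not None else None
        unit = cov.findtext("gmd:dimension/gmd:MD_RangeDimension/gmd:descriptor/gco:CharacterString",
                            default="", namespaces=NS).strip()
        result.append((name, crs, unit))
    return result


SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:contentInfo><gmd:MD_CoverageDescription>
    <gmd:attributeDescription><gco:RecordType>air_temperature</gco:RecordType></gmd:attributeDescription>
    <gmd:dimension xlink:href="#verticalCRS_hPa"><gmd:MD_RangeDimension>
      <gmd:descriptor><gco:CharacterString>K</gco:CharacterString></gmd:descriptor>
    </gmd:MD_RangeDimension></gmd:dimension>
  </gmd:MD_CoverageDescription></gmd:contentInfo>
  <gmd:contentInfo><gmd:MD_CoverageDescription>
    <gmd:attributeDescription><gco:RecordType>sea_surface_temperature</gco:RecordType></gmd:attributeDescription>
    <gmd:dimension xlink:href="#verticalCRS_m"><gmd:MD_RangeDimension>
      <gmd:descriptor><gco:CharacterString>K</gco:CharacterString></gmd:descriptor>
    </gmd:MD_RangeDimension></gmd:dimension>
  </gmd:MD_CoverageDescription></gmd:contentInfo>
</gmd:MD_Metadata>"""
```

The CRS reference is returned as the raw fragment identifier; resolving it to the referenced vertical CRS definition elsewhere in the record is left to the caller.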
(A) C3Grid ISO profile
• Data distributor info:
  • Reference to the C3Grid resource metadata catalog (MDS), which names the service endpoints
  • (Optional: service endpoints in the record itself)
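Keeping only a reference to the resource metadata catalog (MDS) in the discovery record presumably lets service endpoints change without re-harvesting the record. A minimal sketch of that indirection, assuming the catalog can be queried as a simple name-to-endpoints mapping (a stand-in for a real Globus MDS query) and that endpoints stored in the record act only as a fallback; DistributorInfo and resolve_endpoints are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DistributorInfo:
    """Hypothetical in-memory view of a record's distributor section."""
    mds_name: str                                  # name registered in the resource catalog (MDS)
    endpoints: list = field(default_factory=list)  # optional endpoints stored in the record itself


def resolve_endpoints(info, resource_catalog):
    """Prefer endpoints currently registered in the catalog; fall back to the record's own."""
    registered = resource_catalog.get(info.mds_name, [])
    return list(registered) if registered else list(info.endpoints)


catalog = {"DKRZ": ["https://staging.example.org/dkrz"]}  # hypothetical catalog content
```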
(A) C3Grid ISO profile
• Data provenance description:
  • For now (data staging output): a simple sequence of ProcessStep descriptions
  • Later (C3Grid processed data): combined Source/ProcessStep blocks plus an external data provenance store
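The current provenance scheme, a simple ordered sequence of process steps, maps onto the ISO 19115 lineage structure (LI_Lineage holding LI_ProcessStep elements). The sketch below emits a minimal gmd fragment with one process step per description; the step texts are hypothetical, and a full record would also carry dates, processors and sources.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def lineage_from_steps(descriptions):
    """Build a gmd:LI_Lineage element with one gmd:processStep per description, in order."""
    lineage = ET.Element(f"{{{GMD}}}LI_Lineage")
    for text in descriptions:
        step = ET.SubElement(lineage, f"{{{GMD}}}processStep")
        li_step = ET.SubElement(step, f"{{{GMD}}}LI_ProcessStep")
        desc = ET.SubElement(li_step, f"{{{GMD}}}description")
        ET.SubElement(desc, f"{{{GCO}}}CharacterString").text = text
    return lineage


# Hypothetical staging steps for one data request
fragment = lineage_from_steps([
    "extract requested region and levels from archive",
    "convert to netCDF",
])
xml_text = ET.tostring(fragment, encoding="unicode")
```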
C3Grid ISO Profile: A short GeoNetwork experiment
• Federation building:
  • OAI-PMH, WebDAV, Z39.50, GeoNetwork (geonet)
• Full ISO metadata support (ISO 19139/19119)
• OGC CSW 2.0 reference implementation
• RSS and GeoRSS news feeds
• SKOS-based thesauri
• Adaptable to new schemas
• Schematron constraint checking
• On roadmap:
  • flexible ISO profile support
  • Shibboleth integration
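Because GeoNetwork exposes a CSW 2.0 interface, a client can query it with a plain HTTP GetRecords request. A sketch of the key-value-pair form, assuming a CSW 2.0.2 endpoint that accepts CQL_TEXT constraints on the AnyText queryable; the endpoint URL is hypothetical and the exact parameters a given server accepts may vary.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def get_records_url(endpoint, text, max_records=10, start=1):
    """Build a CSW 2.0.2 GetRecords KVP request doing a full-text search via CQL."""
    literal = text.replace("'", "''")  # escape quotes inside the CQL string literal
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{literal}%'",
        "startPosition": str(start),
        "maxRecords": str(max_records),
    }
    return endpoint + "?" + urlencode(params)


url = get_records_url("https://geonetwork.example.org/srv/en/csw", "air_temperature")
query = parse_qs(urlsplit(url).query)  # decoded view of the parameters, for inspection
```

Paging through larger result sets works by increasing startPosition in steps of maxRecords.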
C3Grid ISO Profile: A short GeoNetwork experiment
Building complex metadata federations …
• Harvesting via:
  • CSW
  • OAI-PMH
  • GeoNetwork
  • WebDAV
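When the same record reaches a federation through several harvesting channels, the catalogue has to decide which copy to keep. A sketch of one simple policy, assumed here rather than taken from GeoNetwork: key records by fileIdentifier and keep the copy with the newest dateStamp, remembering which source it came from. The source names and dict layout are hypothetical.

```python
def merge_harvests(sources):
    """Merge records from several harvest sources into one federated index.

    `sources` is a list of (source_name, records); each record is a dict with a
    'fileIdentifier' and an ISO 8601 'dateStamp'. When the same record arrives
    from several sources, the copy with the newest dateStamp wins.
    """
    merged = {}
    for source_name, records in sources:
        for rec in records:
            key = rec["fileIdentifier"]
            current = merged.get(key)
            # ISO 8601 timestamps in one uniform format compare correctly as strings
            if current is None or rec["dateStamp"] > current["dateStamp"]:
                merged[key] = dict(rec, harvested_from=source_name)
    return merged


federation = merge_harvests([
    ("csw:node-a", [{"fileIdentifier": "exp-1", "dateStamp": "2008-06-01"}]),
    ("oai:node-b", [{"fileIdentifier": "exp-1", "dateStamp": "2008-09-15"},
                    {"fileIdentifier": "exp-2", "dateStamp": "2008-07-01"}]),
])
```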
C3Grid ISO Profile: A short GeoNetwork experiment
• Import / edit / search: OK
• Missing:
  • content (CF) search
  • vertical search
  • temporal BBox search
  • data staging
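Two of the missing search types, content (CF) search and temporal search, reduce to simple filters once the records are indexed. A sketch over in-memory records, with a hypothetical Record class: a record matches if it lists the requested CF name and its time span overlaps the query interval. A real catalogue would run these as index range queries instead of a linear scan.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    identifier: str
    cf_names: frozenset  # CF standard names from the contentInfo sections
    start: date          # temporal extent of the aggregate
    end: date


def search(records, cf_name=None, start=None, end=None):
    """Return identifiers of records matching a CF name and overlapping [start, end]."""
    hits = []
    for r in records:
        if cf_name is not None and cf_name not in r.cf_names:
            continue
        # Two intervals overlap unless one ends before the other starts
        if start is not None and r.end < start:
            continue
        if end is not None and r.start > end:
            continue
        hits.append(r.identifier)
    return hits


records = [
    Record("exp-a", frozenset({"air_temperature"}), date(1960, 1, 1), date(1999, 12, 31)),
    Record("exp-b", frozenset({"sea_surface_temperature"}), date(2001, 1, 1), date(2100, 12, 31)),
]
```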
A complete portal prototype to search, access and (pre-)process data described by the C3Grid ISO profile was built in 3 weeks, based on the open-source GeoNetwork solution.
(B) Data Access and Preprocessing
[Architecture diagram as on the overview slide, highlighting (B): the data access interface of the C3Grid data providers (world data centers, research institutes, university partners) delivering data + metadata into the collaborative grid workspace, where a data analysis workflow produces result data products + metadata; ISO discovery metadata (A) and the ISO discovery catalog are shown alongside.]
(B) Data Access and Pre-Processing: Implementation
• Data staging requests and processing jobs
• C3Grid Generation 1: secured plain web services (status)
• C3Grid Generation 2: WSRF service interfaces (scheduled November 08)
• Generation 2+: full PKI/SAML security stack
[Diagram. Request side: data IDs, JSDL-based description, selection (lon, lat, alt; time; content: CF), output properties; offer: time / resource estimation. Provider side: Data Staging Web Service (WS GRAM skeleton implementation, status ..), local resource manager, provider staging jars and provider staging scripts, MD DB, flat-file DB, archive; output goes to the distributed C3Grid workspace.]
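The selection parameters on this slide (data IDs; lon, lat, alt; time; CF content; output properties) can be pictured as a small request object. In C3Grid the request is a JSDL-based description handled by the Data Staging Web Service; this Python class only illustrates its content and a few basic consistency checks, and all field names and the default output format are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StagingRequest:
    """Hypothetical shape of a data staging request; field names are illustrative."""
    data_ids: list        # identifiers of the discovered datasets
    lon: tuple            # (west, east) in degrees; not checked, may cross the date line
    lat: tuple            # (south, north) in degrees
    alt: tuple            # (low, high) in the dataset's vertical CRS
    time: tuple           # (start, end) as datetime
    cf_names: list        # content selection by CF standard name
    output_format: str = "netCDF"  # output property

    def problems(self):
        """Return a list of consistency problems; an empty list means the request looks valid."""
        errs = []
        if not self.data_ids:
            errs.append("no data IDs")
        south, north = self.lat
        if not -90 <= south <= north <= 90:
            errs.append("invalid latitude range")
        if self.alt[0] > self.alt[1]:
            errs.append("inverted altitude range")
        if self.time[0] > self.time[1]:
            errs.append("inverted time range")
        if not self.cf_names:
            errs.append("no CF variable selected")
        return errs


req = StagingRequest(
    data_ids=["exp-a"],
    lon=(-30.0, 40.0), lat=(30.0, 75.0), alt=(500.0, 850.0),
    time=(datetime(1961, 1, 1), datetime(1990, 12, 31)),
    cf_names=["air_temperature"],
)
```

Checking the request before submission lets a client fail fast instead of waiting for a provider to reject it.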
C3Grid Middleware Components
• Scheduler: Globus WSRF based; accepts WSL workflow descriptions (compute tasks + data staging tasks)
• Data management: Globus WSRF based; offer negotiation with the scheduler; consistent view of distributed data (later: replica management, caching)
• Globus MDS resource metadata catalog: service registry, resource status
• Issues: dependency on the Globus software stack; no high-level implementation support tools; Globus 4.1.x migration unclear (??); problems with the delegation implementation (insufficient documentation and guidance)
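Offer negotiation between scheduler and data management can be pictured as the scheduler collecting offers (time and resource estimates) and choosing one. The selection rule below, fastest offer within an optional size budget, is an assumption for illustration; the slide does not state C3Grid's actual strategy, and Offer and choose_offer are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical offer returned by a data provider for one staging request."""
    provider: str
    est_seconds: float  # estimated staging time
    est_bytes: int      # estimated size of the staged result


def choose_offer(offers, max_bytes=None):
    """Pick the fastest offer within an optional size budget; ties broken by provider name."""
    eligible = [o for o in offers if max_bytes is None or o.est_bytes <= max_bytes]
    if not eligible:
        return None
    return min(eligible, key=lambda o: (o.est_seconds, o.provider))


offers = [
    Offer("provider-a", est_seconds=600, est_bytes=4_000_000_000),
    Offer("provider-b", est_seconds=900, est_bytes=1_000_000_000),
]
```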
C3Grid Workflow Analysis
[Diagram with task-related and workflow-related components:]
• Interaction and monitoring via the WS-Notification standard
• Handler to facade single / specific tasks
• Monitoring and management of workflow execution
• (Individual) scheduling strategy to optimize the management
• Analysis and preparation of workflows
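Part of analysing and preparing a workflow is ordering its tasks so that data staging finishes before the compute tasks that consume the data. A sketch using Python's graphlib (3.9+) that groups tasks into stages whose members could run in parallel; the task names are hypothetical, and real C3Grid workflows are WSL descriptions handled by the scheduler.

```python
from graphlib import TopologicalSorter


def execution_stages(dependencies):
    """Group workflow tasks into stages; tasks in one stage have no mutual dependencies.

    `dependencies` maps each task to the set of tasks it waits for.
    prepare() raises graphlib.CycleError if the workflow contains a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


workflow = {
    "analysis": {"stage_dwd", "stage_dkrz"},  # compute task needs both staged inputs
    "publish": {"analysis"},                  # result publication after the analysis
}
```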
(C) Security Infrastructure
[Diagram, "Home attributes + VO attributes". Components: identity provider (home organisation); attribute provider (virtual organisation); browser with Shibboleth login; portal; Web Start app; workflow client with GridShib SAML tools; delegation service; SLCS (CA) and MyProxy (DFN); X.509 grid proxy carrying <..SAML assertions..>; C3Grid middleware grid services with GridShib for GT (GRAM / DataRAM); policy; grid resource with personal / group account. SAML assertions flow from the identity and attribute providers to the portal and the grid services.]
(C) Security Infrastructure
• Status:
  • Shibboleth IdPs running at core C3Grid partners
  • Online CA for short-lived credentials tested, set up and operated by DFN (the German NREN)
  • Online CA (DFN-SLCS) accreditation process with EUGridPMA started
  • SLCs contain campus attributes as SAML assertions
  • Java Web Start app to bootstrap SLCS in development at DFN
  • GridShib SAML Tools (v0.6.0) tested
  • Prototype of a Shibbolized GridSphere portal tested
  • Open issues with the GT4 proxy delegation implementation
• Next:
  • Integration of components
  • Virtual home organization for C3 users without a Shibboleth IdP
  • Integration of VO attributes (Shibbolized VOMS)
C3Grid / IPCC Use Case
• (0) IPCC metadata harvested / mirrored in the CERA DB (WDCC)
• Metadata visible in the C3 portal
• User issues an IPCC data import from the external repository
  • User: OpenID IdP + IPCC_Access role at the external repository
  • Download into the C3 repository (??)
• C3Grid grants access to users with the IPCC_Access role
  • Grant procedure? Contact the IdP / attribute service before each workflow execution, or a more offline method?
[Diagram: IPCC data import; C3RC / C3 workspace; analysis workflow; workflow result publication]
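The open grant-procedure question (contact the IdP / attribute service before every workflow execution, or something more offline?) has a common middle ground: cache a user's attributes for a bounded time. The sketch below is one such option, not C3Grid's design; the lookup callable stands in for a real IdP or attribute-service query, and only the role name IPCC_Access comes from the slide. With ttl=0 it degenerates to a check before every execution.

```python
import time

REQUIRED_ROLE = "IPCC_Access"


class CachedRoleCheck:
    """Query the attribute service at most once per `ttl` seconds per user."""

    def __init__(self, lookup, ttl=3600.0, clock=time.monotonic):
        self._lookup = lookup  # user -> iterable of roles (stand-in for IdP/attribute service)
        self._ttl = ttl
        self._clock = clock
        self._cache = {}       # user -> (expiry time, frozenset of roles)

    def roles(self, user):
        now = self._clock()
        hit = self._cache.get(user)
        if hit is not None and hit[0] > now:
            return hit[1]
        roles = frozenset(self._lookup(user))
        self._cache[user] = (now + self._ttl, roles)
        return roles

    def may_run_workflow(self, user):
        """Grant workflow execution only to users holding the required role."""
        return REQUIRED_ROLE in self.roles(user)
```

A shorter ttl reacts faster to revoked roles at the cost of more attribute-service traffic.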
Appendix
C3Grid Content Info (Version 2)
<contentInfo>
  <MD_CoverageDescription>
    <attributeDescription>
      <gco:RecordType>CF_name_with_attribute</gco:RecordType>
    </attributeDescription>
    <contentType>
      <MD_CoverageContentTypeCode
          codeList="http://wis.wmo.int/2006/catalogues/cf-standard-name-table.xml"
          codeListValue="air_temperature">
        air_temperature with a cell_methods attribute including time:mean (interval: 1 day)
      </MD_CoverageContentTypeCode>
    </contentType>
    <dimension xlink:href="#verticalCRS_hPa">
      <MD_RangeDimension>
        <descriptor><gco:CharacterString>K</gco:CharacterString></descriptor>
      </MD_RangeDimension>
    </dimension>
  </MD_CoverageDescription>
</contentInfo>
Security Aspect: C3Grid
[Diagram: step 0, step 1]
(C) Data Reuse of Analysis Results: Metadata Generation
• Context description of analysis data:
  • Aggregation
  • Processing history
[Diagram. Data model: an analysis data product (p_data; time stamp, description, citation info) is_part_of at most one (0..1) parent collection, which can hold many (*) products (has_parent); the product is_generated_by process steps (0..*), each of which has_input sources (description, is_generated_by). Components: portal, WS interface, Lucene+ index, OAI harvester, OAI-PMH server, C3Grid workspace, workflow "quality check", m_tool API prototype (Python).]
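The data model on this slide (a result that is part of at most one parent collection, is generated by process steps, and whose steps take sources as input, plus time stamp, description and citation info) can be sketched with Python dataclasses. This is not the m_tool API: class and method names are hypothetical, sources are simplified to a description only, and mapping these objects to ISO records for the Lucene index and OAI-PMH server is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Source:
    description: str


@dataclass
class ProcessStep:
    description: str
    inputs: list = field(default_factory=list)  # has_input: Source objects


@dataclass
class AnalysisResult:
    description: str
    citation: str
    parent: "AnalysisResult | None" = None       # is_part_of: at most one parent collection
    history: list = field(default_factory=list)  # is_generated_by: ordered ProcessSteps
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def collection_path(self):
        """Descriptions from the top-level collection down to this result."""
        node, path = self, []
        while node is not None:
            path.append(node.description)
            node = node.parent
        return list(reversed(path))

    def lineage(self):
        """Processing history as 'step <- inputs' lines, oldest first."""
        return [f"{s.description} <- {', '.join(i.description for i in s.inputs) or '-'}"
                for s in self.history]


collection = AnalysisResult("experiment collection", "citation A")
result = AnalysisResult(
    "seasonal mean field", "citation B", parent=collection,
    history=[ProcessStep("regrid", [Source("input field")]), ProcessStep("seasonal mean")],
)
```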