Metadata Concepts / Use in Climate Research Stephan Kindermann, Martina Stockhause German Climate Computing Center (DKRZ) Hamburg, Germany
Overview
• Metadata descriptions: sources, usage (data level, preservation level, model level, domain knowledge level)
• Metadata standards, IT principles
Metadata descriptions: sources, usage
(I) Data Description Level:
• source: model run output
• format: GRIB, netCDF3/4 container formats (including basic metadata)
• metadata homogenization: „Climate and Forecast (CF) Convention“ conformance, CMOR2 compliance, controlled vocabularies
• usage: analysis tools, data access scripts, data search („linked data principle“)
(II) Data Preservation Level:
• target: legacy data centers (e.g. WDCC)
• format: internal DB, various external formats, e.g. ISO 19139, DIF, ..
• usage: long-term data storage and access, citation, e.g. using DOIs
Metadata descriptions: sources, usage
(III) Model Description Level:
• source: researcher interviews, online questionnaire
• format: CIM (Common Information Model; Metafor FP7 project: Common Metadata for Climate Modelling Digital Repositories), with Con-CIM: UML, APP-CIM: XSD + vocabularies
• usage: model intercomparison, scientific portals, information space browsing / search
(IV) Semantic Annotation Level:
• source: data metadata, model metadata, domain knowledge metadata
• format: OWL (RDF)
• usage: user navigation in portals, „faceted search“, etc.
• deployments: Earth System Grid CMIP5 portal, IS-ENES portal
B) Metadata standards, IT principles
(I) Data Description Level:
• Metadata: file naming convention based on CVs, building uniform URIs (DRS, Data Reference Syntax)
• DRS structure: Activity/Product/Institute/Model/Exp/frequ/realm/Variable/ensemble
• Data: GRIB, netCDF data containers, 10s of PBytes
• Infrastructure: data servers, MD catalogue servers
• Enabling „linked data“: wget http://server.org/Activity/Product/../ensemble
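The DRS idea of building uniform URIs from controlled-vocabulary values can be sketched in a few lines of Python. This is only an illustration under assumptions: the facet order follows the slide (it may differ from the published DRS document), the function name `drs_url` is hypothetical, and the facet values in the example are illustrative, not taken from the presentation.

```python
from urllib.parse import quote

# Facet order as listed on the slide; names spelled out for readability.
DRS_FACETS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "variable", "ensemble",
)


def drs_url(base_url, facets):
    """Join controlled-vocabulary facet values into a DRS-style URL.

    `facets` maps each facet name in DRS_FACETS to its value.
    """
    missing = [name for name in DRS_FACETS if name not in facets]
    if missing:
        raise ValueError(f"missing DRS facets: {missing}")
    # Percent-encode each value so it stays a single path segment.
    path = "/".join(quote(str(facets[name]), safe="") for name in DRS_FACETS)
    return f"{base_url.rstrip('/')}/{path}"


# Illustrative values only (assumed, not from the slides).
example = {
    "activity": "CMIP5", "product": "output", "institute": "MPI-M",
    "model": "MPI-ESM-LR", "experiment": "historical", "frequency": "mon",
    "realm": "atmos", "variable": "tas", "ensemble": "r1i1p1",
}
print(drs_url("http://server.org", example))
```

Because every dataset is addressed by the same ordered facets, a plain HTTP client such as wget can fetch data without a search step, which is what makes the „linked data“ usage possible.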
B) Metadata standards, IT principles
(II) Data Preservation Level: WDCC Metadata Concept
• Scalability
• Sustainability
• Flexibility
• User-friendly GUIs
Diagram components: CERA GUI; IS-ENES Portal; …; search API; common CV; CERA2 DB schema; OWL conceptual model; QC, DOI assignment, ..; tape archive
B) Metadata standards, IT principles
(III) Model Description Level: Metafor FP7 project: Common Information Model (CIM)
• Formal metadata model of the climate modelling process
• It includes descriptions of the experiments being undertaken, the simulations being run in support of these experiments, the software models and tools being used to implement the simulations, and the data generated by the software.
• CMIP5 use case: CV collection, CMIP5 questionnaire
Metafor CIM overview
Diagram components: CONCIM (UML); automatic translation; ISO, Geography Markup Language (GML) series; APPCIM (XSD); CMIP5 portal(s); IS-ENES portal; Metafor catalogue; CIM instances (interlinked XML files)
Diagram components: automatic XML → RDF translation; ESG OWL instances; IS-ENES (1) portal; CMIP5 gateway(s)
(1) Infrastructure for the European Network for Earth System Modelling
B) Metadata standards, IT principles
(IV) Semantic Annotation Level:
Diagram components: Portal(s); ESG Gateways (RDF triple store, CIM XML, data object XML); OWL ontologies: http://ontologies.ucar.edu/owl; IS-ENES Portal (Content Management System, community content, RDF triple store, rel. DB); evolving OWL model
CMIP5 Quality Control
Diagram components: THREDDS Data Server (MD on data); Metafor / CIM Questionnaire (MD on model + simulation); MD Quality Checks L2; Data Quality Checks L2; QC DB; Metadata Repository; files: data, metadata, CIM metadata; data in prescribed DRS syntax; metadata types: information MD, quality MD, data MD
CMIP5 STD-DOI Publication
Diagram components: THREDDS Data Server (MD on data); Metafor / CIM (MD on model + simulation + data + quality); QC DB; Data Quality Checks L3 (double check, cross checks); TIB: DOI Registration Agency; WDCC: DOI Publication Agent; data node (data, metadata); DOI target page: access to data and metadata; filesystem; STD-DOI catalogue; long-term archive; STD-DOI MD; metadata types: quality MD, data MD, information MD
2010-07-07 16:49:13 INFO triplestorefill.utility Adding item <ComponentModel at /test7/echam> with ID echam at http://localhost:8080/test7/echam
2010-07-07 16:49:13 INFO triplestorefill.sesameconnector Storing RDF... (1118 byte)
2010-07-07 16:49:13 INFO triplestorefill.sesameconnector RDF data:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix isenes: <http://www.enes.org/isenes#> .
isenes:echam rdf:type isenes:ComponentModel .
isenes:echam foaf:page <http://plone.dkrz.de/test7/echam> .
<http://plone.dkrz.de/test7/echam> foaf:topic isenes:echam .
isenes:echam dc:title "ECHAM" .
isenes:echam rdfs:label "ECHAM" .
isenes:echam rdfs:comment "Global circulation model" .
isenes:dkrz isenes:isResponsibleFor isenes:echam .
isenes:echam isenes:hasResponsible isenes:dkrz .
isenes:joachim-biercamp rdfs:label "Joachim Biercamp" .
isenes:joachim-biercamp rdf:type foaf:Person .
isenes:dkrz rdfs:label "DKRZ" .
isenes:dkrz rdf:type foaf:Organization .
isenes:joachim-biercamp isenes:isMemberOf isenes:dkrz .
isenes:dkrz isenes:hasMember isenes:joachim-biercamp .
isenes:dkrz dc:title "DKRZ" .
isenes:joachim-biercamp foaf:mbox "biercamp@dkrz.de"
„save“ → Triple Store
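To make the shape of this RDF easier to follow, here is a minimal sketch that produces the component-model statements from the log for one portal item, using only the Python standard library. It is not the `triplestorefill` code, which is not shown in the slides; the function names (`literal`, `component_triples`, `to_turtle`) are hypothetical, and the output is simple Turtle-style text rather than the result of a real RDF library.

```python
# Prefixes copied from the log output (subset used by this sketch).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "isenes": "http://www.enes.org/isenes#",
}


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def component_triples(item_id, title, comment, page_url, org_id):
    """Return (subject, predicate, object) triples for one component model."""
    subject = f"isenes:{item_id}"
    org = f"isenes:{org_id}"
    page = f"<{page_url}>"
    return [
        (subject, "rdf:type", "isenes:ComponentModel"),
        (subject, "foaf:page", page),
        (page, "foaf:topic", subject),
        (subject, "dc:title", literal(title)),
        (subject, "rdfs:label", literal(title)),
        (subject, "rdfs:comment", literal(comment)),
        # Both directions are stated explicitly, as in the log.
        (org, "isenes:isResponsibleFor", subject),
        (subject, "isenes:hasResponsible", org),
    ]


def to_turtle(triples):
    """Serialize prefixes and triples, one statement per line."""
    header = [f"@prefix {name}: <{uri}> ." for name, uri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + body)


turtle = to_turtle(component_triples(
    "echam", "ECHAM", "Global circulation model",
    "http://plone.dkrz.de/test7/echam", "dkrz",
))
print(turtle)
```

Stating inverse relations such as `isResponsibleFor` / `hasResponsible` twice keeps portal queries simple, at the cost of some redundancy in the store.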
(B) From a user's perspective
Figure: Plone page with „related info“ portlet
(B) From a user's perspective
Figure: Plone page after clicking the „related“ link: faceted search
Summary
• The international CMIP5 / IPCC effort is the key driver for collection / standardization of CVs, metadata, and conceptual models (ontologies)
• Metadata is mainly used for model intercomparison, uniform data search / access, and data processing
• Prepare for Climate Impact Community use cases!
..workshop reminder..
..therefore we would like the presenters to focus on a few points allowing all of us to draw conclusions at the end:
- Usage and quality of descriptive keyword type of metadata used in your domain to manage data.
- Types of usages of this metadata (management, retrieval, research statistics, machine processing, etc).
- The standards used for your metadata descriptions (structure, elements, vocabularies).
- Adherence to common IT principles (explicit syntax, registered semantics, use of PIDs, etc).
- Compliance with the recommendations to be found in the report of the e-IRG task force on Data Management http://www.e-irg.eu/publications/e-irg-task-force-reports.html
Motivation: Virtual Earth System Modeling Resource Centre
• Producers: providers of models, tools, model results, HPC ecosystem, Grid, .., community
• Consumers: ENES community, impact community
• Portal
• E-infrastructure components: CMIP5/AR5/+ data services, ticketing, AAI, collaboration, metadata (CIM, ..), protocols, APIs
• Governance: agreements, commitments, sociology, ..
IS-ENES vERC Portal
Requirement | E-Infra component | Technology used
(A) Community info presentation (models, tools, descriptions, ..) | Content Management System (CMS, Collab. Tool) | Plone + IS-ENES „content-types“
(B) Community development support | Project Management / Ticketing Tool | Redmine
(C) Data portal to AR5 archives | Web Framework | Zope/Plone plugin(s)
(D) CIM metadata | Web service (proxies) | (external) Metafor service(s), (external) ESG-gateway
(E) External content / metadata collection | Info (XML) harvester | Python info collector using Atom, OAI-PMH, .. protocols
(F) Additional value provisioning | „Cross-selling“, semantic interlinking | RDF triple store (Sesame)
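As an illustration of what a harvester speaking OAI-PMH has to do first, the sketch below builds a request URL for one of the six protocol verbs. It is not the IS-ENES info collector, which the slides do not show; the function name `oai_pmh_request_url` and the base URL `http://example.org/oai` are assumptions, and no network request is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_pmh_request_url(base_url, verb, arguments=None):
    """Build an OAI-PMH GET request URL.

    `arguments` is a dict of protocol arguments such as
    {"metadataPrefix": "oai_dc", "from": "2010-01-01"}.
    A dict is used because "from" is a reserved word in Python.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    params.update(arguments or {})
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint; oai_dc is the Dublin Core format
# every OAI-PMH repository is required to support.
url = oai_pmh_request_url(
    "http://example.org/oai", "ListRecords", {"metadataPrefix": "oai_dc"}
)
print(url)
```

The XML returned for such a request would then be parsed and passed on to the interlinking step; that part depends on the harvested source and is left out here.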