SeaDataNet Training Course Introduction to SeaDataNet Metadata Roy Lowry British Oceanographic Data Centre
Overview • An introduction to the SeaDataNet metadata formats covering • Purpose • Entity definition • History • Population • Strengths • Weaknesses
Overview • SeaDataNet metadata formats • European Directory of Marine Organisations (EDMO) • Cruise Summary Report (formerly ROSCOP) • European Directory of Marine Environmental Datasets (EDMED) • European Directory of the Ocean Observing System (EDIOS) • SeaDataNet Common Data Index (CDI) • European Directory of Marine Environmental Research Projects (EDMERP)
EDMO • Purpose • Provides SeaDataNet with an address book of organisations associated with marine data • Provides descriptions of these organisations • Entity definition • Any group of people sharing a common postal address engaged in activities associated with marine data acquisition and use • History • Developed by Maris during SEA-SEARCH in response to a need to improve address metadata management across the project
EDMO • Population • On-line Content Management System fronted by a web form (http://www.sea-search.net/organisations/) • Partners are responsible for maintenance of their national record set • Management supported by a reasonably sophisticated access control system that authenticates users and grants access to the appropriate database subset
EDMO • Strengths • The maintenance tool. Please use it to look after the entries for your country • Provides a single point of entry for SeaDataNet metadata documents associated with a given organisation • Centralisation of metadata common to other catalogues, replacing four independently maintained address metadata repositories • Rich information content, including descriptions, logos and spatial location information
EDMO • Weaknesses • Simple data model is poorly equipped for the management of organisational evolution • Organisations merge, fragment, rename and move • All we can do in EDMO is document this using plain language fields • Text fields contain embedded markup • These look very nice when displayed through the search interface • However, the markup causes problems generating XML documents for record transport between systems • Examples including graphics and relative URLs break when transported by copy/paste
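The XML problem described on this slide can be illustrated with a minimal Python sketch. The field content and the element names (`organisation`, `description`) are invented for illustration and are not taken from the EDMO schema. The sketch only shows that pasting marked-up text straight into an XML string can produce an ill-formed document, whereas letting an XML library serialise the text escapes it safely.

```python
import xml.etree.ElementTree as ET

# Hypothetical free-text field with embedded markup: an unclosed tag,
# a relative URL and a bare ampersand.
description = 'Marine data centre<br>See <a href="staff.html">staff list</a> & contacts'

# Naive string concatenation: the embedded markup becomes part of the XML structure.
naive = "<organisation><description>" + description + "</description></organisation>"
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False  # the unclosed <br> and bare '&' make the document ill-formed

# Letting the library serialise the text escapes '<', '>' and '&'.
org = ET.Element("organisation")
ET.SubElement(org, "description").text = description
safe = ET.tostring(org, encoding="unicode")

# The escaped document parses, and the original text is recovered unchanged.
round_trip = ET.fromstring(safe).find("description").text
```

Escaping keeps the transport document well-formed, but it does not repair the relative URL: `staff.html` still points nowhere once the text leaves its original web context.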
CSR • Purpose • To document the operational and data generation activities of an oceanographic research cruise • Entity definition • A subject of some controversy • I am a metadata purist and support the definition of a ‘cruise’ as the interval of time between leaving port and returning to port • Thus for a 3-leg cruise I would generate 3 CSR records whilst others would generate just one. I do this because: • Combining records is easier than splitting them • Cruise ‘legs’ for some ships can be VERY different (e.g. 3 legs of a Meteor cruise: one JGOFS, one OMEX, one WOCE) • Merging ‘legs’ is a slippery slope – I’ve even encountered a single record covering the activities of two ships three months apart
CSR • Entity definition (continued) • Problem with my definition is that the real world creates grey areas. For example, does a personnel change by pilot boat in an estuary count as ‘docking’? • Others extend the definition to cover any activity collecting oceanographic data (shoehorning) • I believe this is a very bad thing to do • The activity super-class and other activity sub-classes are much better described by other metadata standards (e.g. in OGC Observations and Measurements) • Later on in SeaDataNet we could consider incorporating some of these to further enrich our metadata portfolio • In the meantime remember that it is NOT necessary to have every measurement covered by a CSR. If it isn’t appropriate, don’t create one.
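The point that combining records is easier than splitting them can be sketched in a few lines of Python. The `LegRecord` class, its field names, the ship name and the dates are all hypothetical placeholders, not the CSR schema or real cruise data. Only the three project names come from the Meteor example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LegRecord:
    """Hypothetical per-leg record; field names are illustrative only."""
    ship: str
    start: date
    end: date
    projects: frozenset

def combine(legs):
    """Merge per-leg records for one ship into a single summary record."""
    ships = {leg.ship for leg in legs}
    if len(ships) != 1:
        raise ValueError("refusing to merge records from different ships")
    return LegRecord(
        ship=ships.pop(),
        start=min(leg.start for leg in legs),
        end=max(leg.end for leg in legs),
        projects=frozenset().union(*(leg.projects for leg in legs)),
    )

# Placeholder legs: invented ship name and dates.
legs = [
    LegRecord("RV Example", date(2000, 1, 1), date(2000, 1, 20), frozenset({"JGOFS"})),
    LegRecord("RV Example", date(2000, 1, 22), date(2000, 2, 10), frozenset({"OMEX"})),
    LegRecord("RV Example", date(2000, 2, 12), date(2000, 3, 1), frozenset({"WOCE"})),
]
merged = combine(legs)
```

Combining is a simple aggregation. The reverse cannot be computed from `merged` alone, because the port calls and the assignment of projects to legs are no longer recorded.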
CSR • History • Originally a paper form developed by IOC called a ROSCOP • Replaced in 1990 by the Cruise Summary Report with richer content (but the name ROSCOP stuck) • Numerous on-line databases developed during the 1990s • Primary repositories now DOD for SeaDataNet partners and ICES for non-SeaDataNet
CSR • Population • On-line web-form (http://www.sea-search.net/roscop/welcome.html) • XML schema available for bulk transfers • Strengths • Flexible population mechanisms • Long history with a massive legacy population • Cruise is (or should be) a well defined concept to oceanographers
CSR • Weaknesses • “Parameter” vocabulary • Really a vocabulary describing shipborne activities • No clear equivalent elsewhere for interoperability, but ontological mapping to multiple vocabularies might provide a solution • On-line systems developed using plaintext fields when controlled vocabularies would have made interoperability between repositories more straightforward • Spatial coverage limitations • Coarse-grained • Described using Marsden Squares but BODC has deployed a Web Service to convert these to ISO19115/DIF standard bounding boxes
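The coarseness of 10-degree squares can be shown with a short Python sketch that returns the bounding box of the 10° × 10° cell containing a position. This is an illustration only: it identifies a cell by its south-west corner, does not reproduce the official Marsden Square numbering, and is not the BODC Web Service. The function name `ten_degree_cell_bbox` is hypothetical.

```python
import math

def ten_degree_cell_bbox(lat, lon):
    """Return (west, south, east, north) of the 10-degree cell containing a position."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("position out of range")
    south = min(math.floor(lat / 10) * 10, 80)   # keep lat = 90 in the top cell
    west = min(math.floor(lon / 10) * 10, 170)   # keep lon = 180 in the last cell
    return (west, south, west + 10, south + 10)

# A station in the western English Channel falls in a cell spanning
# 10 degrees of longitude and 10 degrees of latitude.
bbox = ten_degree_cell_bbox(50.5, -4.2)
```

Here `bbox` is `(-10, 50, 0, 60)`, so a single station is represented by a box ten degrees on each side, which is what "coarse-grained" means in practice.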
EDMED • Purpose • To describe marine environmental datasets to promote their discovery • Entity definition • A dataset, but what is a dataset? • ISO19101 defines a dataset as ‘an identifiable collection of data’ which covers everything from the parameters measured on a single water sample to the 7,500,000 CTDs in the USNODC World Ocean Database • Sound judgement is needed to decide upon appropriate granularity • Best approach is to establish objective criteria • Worth remembering that a measurement may be included in more than one dataset • Posing this question to metadata specialists can provide good sport!
EDMED • History • Developed by BODC in late 80s • Adopted by EU MAST Data Committee, then SEA-SEARCH and now SeaDataNet • Population • Form interface to stand-alone Access database that is submitted to BODC for ingestion • XML schema available for bulk transfers • Strengths • Content quality controlled on ingestion, therefore standards are high • Rich content developed during SEA-SEARCH
EDMED • Weaknesses • Developed in splendid isolation, including vocabularies, therefore interoperability with other systems is difficult • Heavy dependence on plaintext fields: a problem that should be addressed during SeaDataNet
EDIOS • Purpose • To describe marine environmental datasets comprising data that are collected repeatedly, regularly and routinely in order to promote their discovery (initially for operational planning purposes) • Entity definition • A dataset comprising data that are collected repeatedly, regularly and routinely, but what is a dataset (cf. EDMED)? • History • Developed as an EU project led by EuroGOOS • Inherited by SeaDataNet
EDIOS • Population • Currently an issue • There is a Word-based form (the MIF) • Developed in parallel to the data model and database with no evidence of communication • Completed MIFs entered into the database at BODC, requiring significant interpretation and information rehashing (long and painful process) • SeaDataNet work in progress • IFREMER/BODC working to produce an XML schema to facilitate large-scale transfer • Maris/BODC developing a web-form based content management system along the lines of EDMO
EDIOS • Strengths • Rich data model based on structured fields with minimal plaintext • Data model includes hierarchical relationships between entities (project one-to-many observing programmes one-to-many measurement series) • Data model includes support for complex spatial objects (polygons not boxes) • Data model is particularly well suited to the description of operational oceanographic systems
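The hierarchy and spatial support listed on this slide can be sketched with Python dataclasses. The class and field names are assumptions made for illustration and are not the EDIOS relational schema. The sketch shows the two one-to-many relationships and a polygon footprint from which a bounding box can still be derived when one is needed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Polygon = List[Tuple[float, float]]  # (lon, lat) vertices; hypothetical representation

@dataclass
class MeasurementSeries:
    name: str
    footprint: Polygon

@dataclass
class ObservingProgramme:
    name: str
    series: List[MeasurementSeries] = field(default_factory=list)

@dataclass
class Project:
    name: str
    programmes: List[ObservingProgramme] = field(default_factory=list)

def bounding_box(polygon):
    """Derive (west, south, east, north) from a polygon footprint."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return (min(lons), min(lats), max(lons), max(lats))

# Invented example: one project, one programme, two series sharing a triangular footprint.
triangle = [(-5.0, 50.0), (-3.0, 50.0), (-4.0, 52.0)]
project = Project("Example project", [
    ObservingProgramme("Example programme", [
        MeasurementSeries("Series A", triangle),
        MeasurementSeries("Series B", triangle),
    ]),
])
series_count = sum(len(p.series) for p in project.programmes)
box = bounding_box(triangle)
```

A box can always be derived from a polygon, but not the other way round, which is one reason to store the richer shape.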
EDIOS • Weaknesses • At the start of SeaDataNet EDIOS had 17 local vocabularies • Extremely poor content governance • Undergoing replacement with managed SeaDataNet standard vocabularies (6 down, 11 to go) • Legacy content has not been systematically quality controlled
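Replacing a local vocabulary with a managed one amounts to applying a mapping table and flagging whatever the table does not cover. The Python sketch below assumes an invented mapping: the local terms and the `STD:` codes are placeholders, not entries from any SeaDataNet vocabulary.

```python
# Hypothetical mapping from normalised local terms to managed-vocabulary codes.
LOCAL_TO_STANDARD = {
    "ctd": "STD:0001",
    "current meter": "STD:0002",
    "tide gauge": "STD:0003",
}

def harmonise(terms, mapping):
    """Split terms into mapped standard codes and unmapped originals."""
    mapped, unmapped = [], []
    for term in terms:
        key = term.strip().lower()  # tolerate case and whitespace variants
        if key in mapping:
            mapped.append(mapping[key])
        else:
            unmapped.append(term)   # left for manual review
    return mapped, unmapped

mapped, unmapped = harmonise(["CTD", " Tide gauge ", "wave buoy"], LOCAL_TO_STANDARD)
```

The unmapped list is the useful output for quality control of legacy content: every term in it needs either a new mapping or a human decision.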
EDIOS • How is EDIOS different from EDMED? • Both are content standards designed to describe datasets • Any dataset described by an EDMED document could be described by an EDIOS document and vice versa • Once vocabularies have been harmonised and some mappings set up it should be possible to generate an EDMED document from an EDIOS document • Generation of an EDIOS document from an EDMED document will never be possible
EDIOS • How is EDIOS different from EDMED? • SeaDataNet convention is to use EDIOS for ‘qualifying’ datasets and EDMED for everything else • EDMED currently has a working population mechanism, but EDIOS does not • Advice to partners • Identify datasets to be described by EDIOS documents, map them to the EDIOS data model (relational schema and Access prototype on BSCW) and gather together the necessary information • Prepare EDMED documents for all other data sets and get them into BODC • Submit EDIOS entries to BODC once the necessary systems are operational
CDI • Purpose • To provide an ultra-light discovery metadata description of accessible SeaDataNet data objects • Used to build a manageable fine-grained index of discrete data objects (millions of entries) • Entity definition • The fundamental SeaDataNet data delivery unit such as a current meter record or a CTD profile • History • Developed by SEA-SEARCH as a pilot for SeaDataNet
CDI • Population • XML schema describing files that should be generated automatically from existing digital indexes • Strengths • Light content makes efficient handling of large numbers of records possible • Weaknesses • Light content restricts available information
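Automatic generation from an existing digital index might look like the Python sketch below. The CSV columns, the element names (`records`, `record`) and the two sample rows are all invented placeholders. The real CDI XML schema defines its own structure, which is not reproduced here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical digital index: one row per data object, placeholder values.
INDEX_CSV = """id,type,lat,lon,start
obj-001,CTD profile,50.5,-4.2,2000-01-01
obj-002,current meter record,51.0,-5.0,2000-02-01
"""

def index_to_records(index_text):
    """Build one light XML record per row of the index."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(index_text)):
        rec = ET.SubElement(root, "record", id=row["id"])
        for key in ("type", "lat", "lon", "start"):
            ET.SubElement(rec, key).text = row[key]
    return root

root = index_to_records(INDEX_CSV)
xml_text = ET.tostring(root, encoding="unicode")
```

Because each record carries only a handful of fields, the same loop scales to very large indexes, which is the trade-off between light content and limited information noted on this slide.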
EDMERP • Purpose • Description of European marine research projects and programmes • Entity definition • A co-ordinated collection of marine data acquisition activities in Europe • History • Developed by Maris during SEA-SEARCH
EDMERP • Population • Access form: resulting mdb file submitted to Maris • On-line content management system planned • Strengths • Provides centralised project metadata • Weaknesses • Local vocabularies and plaintext
That’s All Folks! Questions or Geoff?