CCLRC Scientific Metadata (CSMD) Model April 2004 NESC

Presentation Transcript


  1. CCLRC Scientific Metadata (CSMD) Model April 2004 NESC

  2. Model Motivation • A common general format/standard for Scientific Studies and data holdings metadata does not exist • By proposing a Model and Implementation: • Form a specification for the types of metadata that should be captured by Scientific Studies • Ease citation, collaboration, exploitation and integration • Allow easy integration of distributed heterogeneous metadata systems into a homogeneous (albeit virtual) platform

  3. Structure of Metadata Model • The CCLRC Scientific Metadata model (CSMD) is a study-dataset orientated model with the following components: • Indexing • Provenance • Data Description • Data Location • Access Conditions • Related Material

  4. What influenced CSMD • CIP from Earth Observation • DDI from Social Sciences • DublinCore from the Library community • Publication only metadata • XSIL as used on LIGO • Low level ‘Scientific Data Objects’ focus • CERA from the MPIM • A bit specific to Earth Sciences but close • … hence the need to develop our own General Model – CCLRC Scientific Metadata Model

  5. Some Model aims • Abstract class orientated description of the types of metadata that should be captured by Scientific Studies • Create a common denominator for Scientific Study metadata which forms a specification • Metadata workshop at NIEES 2002: during a discussion on metadata standards the question was asked whether people are capturing metadata at the moment – the simple answer given was no!

  6. CSMD Used on DataPortal • XML Implementation used as Data Interface for DataPortal • Single view of heterogeneous systems/schemas • Acts as a stress test of the model • Limitations feed into Model Requirements • New requirements fed back into implementation

  7. Model Breakdown: Provenance • The Study contains the following metadata: • The Study Name • The Study Institution • The Investigator • Extended Study Information • Abstract • Funding • Start and End times • Investigations

  8. Investigations • A Study can have more than one investigation; possible enumerations are experiment, simulation, measurements etc. – investigations contain: • Name • Investigation Type • Abstract • Resource • Link to DataHolding
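
As a rough illustration of the Study and Investigation structure outlined on the two slides above, the following Python sketch models them as dataclasses. The class names, field names, the use of plain strings for times, and the example values are all assumptions made for illustration; they are not taken from the CSMD XML or relational implementations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Investigation:
    name: str
    investigation_type: str          # e.g. "experiment", "simulation", "measurement"
    abstract: str = ""
    resources: List[str] = field(default_factory=list)
    data_holding_link: Optional[str] = None   # hypothetical identifier of the DataHolding


@dataclass
class Study:
    name: str
    institution: str
    investigator: str
    abstract: str = ""
    funding: str = ""
    start_time: Optional[str] = None  # assumed to be an ISO 8601 string
    end_time: Optional[str] = None
    investigations: List[Investigation] = field(default_factory=list)


# A study may hold more than one investigation.
study = Study(name="Example study", institution="Example Lab", investigator="A. Researcher")
study.investigations.append(Investigation("Run 1", "experiment"))
study.investigations.append(Investigation("Model A", "simulation"))
```

Keeping investigations as a list on the study mirrors the one-to-many relationship described on the slide.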

  9. Topic (for indexing) • Keywords • Discipline (i.e. domain) • Keyword Source (e.g. domain dictionary) • Keyword • Subjects • Discipline • Subject Source (e.g. domain taxonomy) • Subject

  10. Access Condition & Related Material • Access Conditions • Contains a list of users or groups who are allowed access to the metadata and data, or a pointer to an access control system which contains such data for this study • Related Material • One or many links and/or textual descriptions of material related to this study, e.g. earlier studies or parallel studies

  11. Data Description holds a logical description of the Study’s data: • Data Name • Type of Data • Status • Data Topic • Parameters • Related Data Ref • Relation type (e.g. derived) • Data Location contains the link between logical name and physical URIs: • Data Name • Locator(s) • Data
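
The split between Data Description (logical) and Data Location (physical) can be sketched as a lookup from a logical data name to its locators. This is a minimal sketch: the class names, field names, the `resolve` helper and the `example.org` URIs are hypothetical and are not part of the CSMD specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataDescription:
    data_name: str                 # logical name of the data
    data_type: str
    status: str
    related_data_ref: str = ""     # logical name of a related data item, if any
    relation_type: str = ""        # e.g. "derived"


@dataclass
class DataLocation:
    data_name: str                 # same logical name as in DataDescription
    locators: List[str] = field(default_factory=list)  # physical URIs


def resolve(data_name: str, locations: List[DataLocation]) -> List[str]:
    """Return every physical URI recorded for a logical data name."""
    return [uri
            for location in locations
            if location.data_name == data_name
            for uri in location.locators]


# Hypothetical example data.
description = DataDescription("run42-reduced", "reduced", "final",
                              related_data_ref="run42-raw", relation_type="derived")
locations = [
    DataLocation("run42-raw", ["ftp://example.org/raw/run42.dat"]),
    DataLocation("run42-reduced", ["http://example.org/reduced/run42.dat",
                                   "ftp://mirror.example.org/reduced/run42.dat"]),
]
uris = resolve(description.data_name, locations)
```

Because clients only ever hold the logical name, a file can be moved or mirrored by changing its locators without touching the description.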

  12. More on Parameters • Parameters contain a lot of information about the data objects (DO) and collections • A collection/DO can have many parameter entries, each parameter entry contains: • Parameter derivation (e.g. measured/fixed) • The value • The units • Range • Error margin • Parameter aggregation is also supported
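
A parameter entry with the fields listed above might look like the following sketch. The `name` field, the representation of the range as a minimum and maximum, and the `within_range` check are assumptions added for illustration; parameter aggregation is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParameterEntry:
    name: str                            # assumed label, not listed on the slide
    derivation: str                      # e.g. "measured" or "fixed"
    value: float
    units: str
    range_min: Optional[float] = None    # assumed representation of "Range"
    range_max: Optional[float] = None
    error_margin: Optional[float] = None

    def within_range(self) -> bool:
        """Check the value against whichever range bounds are set."""
        if self.range_min is not None and self.value < self.range_min:
            return False
        if self.range_max is not None and self.value > self.range_max:
            return False
        return True


# Hypothetical example: a measured sample temperature.
temperature = ParameterEntry("temperature", "measured", 293.15, "K",
                             range_min=0.0, range_max=1000.0, error_margin=0.05)
```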

  13. Cardinality Issues • The model recommends a certain cardinality of elements • Certain metadata components are necessary for one to have an instance of the implemented model – treating everything as optional is not acceptable • It is thought that implementations may modify this to suit their needs – the model attempts to remain ideal (i.e. most common cardinality)
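
One way an implementation could enforce recommended cardinalities is a table of minimum and maximum counts per element, checked against each record. The element names and the specific bounds below are hypothetical; the slides do not state the cardinalities the model recommends.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cardinalities as (minimum, maximum); a maximum of None means unbounded.
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "study_name": (1, 1),
    "investigator": (1, None),
    "investigation": (1, None),
    "related_material": (0, None),
}


def check_cardinality(record: Dict[str, List[str]]) -> List[str]:
    """Return a human-readable problem for each element whose count is out of bounds."""
    problems = []
    for element, (minimum, maximum) in CARDINALITY.items():
        count = len(record.get(element, []))
        if count < minimum:
            problems.append(f"{element}: expected at least {minimum}, found {count}")
        elif maximum is not None and count > maximum:
            problems.append(f"{element}: expected at most {maximum}, found {count}")
    return problems


# A record with no investigator fails the check; optional elements may be absent.
record = {"study_name": ["Example study"], "investigation": ["Run 1"]}
problems = check_cardinality(record)
```

An implementation that needs different bounds would change only the table, which matches the idea that the model states the most common cardinality and implementations adapt it.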

  14. Enumeration Issues • Enumerations (or controlled vocabularies), e.g. types of investigator or types of institution, are distinct from the model, as taxonomies are • However, they are necessary for the model to work, so implementations (e.g. the CCLRC DataPortal XML implementation of the model) propose some enumerations for common things • It is hoped that implementations will use recognised and relevant controlled vocabularies where they are available

  15. Conformance Level • For a complete metadata study-dataset record a large amount of metadata has to be stored/processed • So it’s useful to have conformance levels • The model uses 5 levels • Each level specifies that more metadata (and indexing information) should be held

  16. Level 1 • Type of Information captured: • Study and Investigation metadata with indexing at the Study level • Level 1 metadata is similar to library/publication style metadata (e.g. DublinCore)

  17. Level 2 • Type of Information captured: • Level 1 + DataHolding metadata (i.e. DataSets and DataObjects)

  18. Level 3 • Type of Information captured: • Level 2 + related material, Access condition, indexing to data collection levels

  19. Level 4 • Type of Information captured: • Level 3 + indexing to data object level and data object parameter information

  20. Level 5 • Type of Information captured: • All metadata components are filled: Level 4 + funding, resources used, facilities used etc.

  21. Conformance Levels • L1 is similar to library/publication style metadata (e.g. DublinCore) • The current DataPortal uses somewhere between L2 and L3 – indexing at study level moving towards collection level but with parameter information • It is envisaged that only new systems designed with CSMD will conform to L4+ • Benefit of conformance levels: the higher the level of conformance to the CSMD, the richer the clients that operate on the data can be • e.g. identifying datasets and objects which link directly to keywords/taxonomies and not just studies
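
Because each level adds requirements to the one below it, a record's conformance level can be computed cumulatively. The sketch below assumes a record is summarised as a set of component names; those names are hypothetical labels for the components listed on the level slides, not identifiers from any CSMD implementation.

```python
from typing import Iterable, List, Set, Tuple

# Additional components required at each level (hypothetical names).
LEVEL_REQUIREMENTS: List[Tuple[int, Set[str]]] = [
    (1, {"study", "investigation", "study_indexing"}),
    (2, {"data_holding"}),
    (3, {"related_material", "access_conditions", "collection_indexing"}),
    (4, {"object_indexing", "object_parameters"}),
    (5, {"funding", "resources", "facilities"}),
]


def conformance_level(present: Iterable[str]) -> int:
    """Return the highest level whose cumulative requirements are all present (0 if none)."""
    available = set(present)
    level = 0
    for candidate, required in LEVEL_REQUIREMENTS:
        if not required <= available:
            break  # a missing component caps the level here
        level = candidate
    return level


# A record with study-level indexing, data holdings and some parameter
# information, but no collection-level indexing, is capped at level 2.
partial = {"study", "investigation", "study_indexing",
           "data_holding", "object_parameters"}
level = conformance_level(partial)
```

The example record resembles the situation described for the DataPortal: it carries some higher-level information yet still scores as level 2 under a strictly cumulative check.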

  22. Facilities using CSMD • CCLRC Facilities (via CCLRC DataPortal): • ISIS - Neutron Spallation at Rutherford Appleton Laboratory (test) • SR – Synchrotron Radiation source at Daresbury Laboratory (test) • British Atmospheric Data Centre (BADC) at RAL (prototype) • External Facilities (via CCLRC DataPortal): • Max-Planck-Institut für Meteorologie (MPIM) in Hamburg • External Projects using CSMD • NERC funded E-mineral ‘environment from the molecular level’ • EPSRC funded E-materials project • Manchester MyGrid project uses an adapted version • ISIS (RAL) have taken data needs in-house and use a model based heavily on CSMD

  23. The Future • Increased use/recommendation for use of controlled vocabularies • Increased support for formal identification systems • Feeding in relevant ideas from other standards • Update XML and Relational implementations so they more closely track the model • Look into internationalisation issues and see if these affect the model or the implementations

  24. More information • Latest Model description • http://www-dienst.rl.ac.uk/library/2002/tr/dltr-2002001.pdf • For an XML implementation, a Relational implementation, or a newer draft of the model documentation, e-mail: • dataportal@dl.ac.uk with the subject containing [metadata model request]
