Joining the Dots: Managing and identifying geolocated data by DOIs and IGSNs. Jens Klump | OCE Science Leader Earth Science Informatics. 20 August 2014. Mineral Resources Flagship.
A few words to introduce myself ...
• 1992 – 1995: B.Sc. in geology and in oceanography, Univ. Cape Town, South Africa.
• 1995: B.Sc. (Honours) in geology (exploration geochemistry), Univ. Cape Town, South Africa.
• 1996 – 1999: PhD in marine geology (biogeochemistry), Univ. Bremen, Germany.
• 1999 – 2000: Training in application and database development and in project management.
• 2000 – 2001: IT project manager for DIE ZEIT (weekly newspaper, Hamburg, Germany).
• 2001 – 2014: Senior research scientist at the German Research Centre for Geosciences GFZ, Potsdam, Germany.
• Since March 2014: CSIRO OCE Science Leader Earth Science Informatics.
Joining the dots | Jens Klump
Previous Work
• Supporting the research data value chain
• Understanding data management in geoscience research
• Development of project and enterprise research data solutions
• Development and implementation of persistent identifiers (DOI, IGSN)
• Integration of data from heterogeneous sources
• Information models to describe data and processes
• Semantic technologies for data interoperability
• Adoption of new technologies
• Studies on HPC, visualisation, 3D printing, internet of things
• Application of information technology to geosciences
• Sensor web enablement in environmental monitoring networks and in the laboratory
• Data-driven research on natural gas hydrates
DOI: Data Publication and Citation
Making data part of the record of science
HTTP Error 404
History of DOI
• “Link rot” was recognised as a problem early on and led to the development of the Handle system of persistent identifiers in 1995.
• DOI was proposed in 1997 and has been in production since 1998.
• The first DOI for data was minted in 2004 in the context of a DFG project.
• A business model had to be found to expand DOI for data to an international scale.
• DataCite was founded in 2009.
• 31 members at present, 3.6 M datasets registered (1.2 M in the last 12 months).
• The total number of journal publications was estimated at 1.8 M articles for 2012.
• Some of the datasets are very fine-grained.
Data in publications http://dx.doi.org/10.1594/GFZ.SDDB.1043
Access to data
• Description
• Citation
• Related materials
• Download data
• Download metadata
  • ISO19115
  • NASA DIF
  • DataCite
  • eSciDoc
http://dx.doi.org/10.1594/GFZ.SDDB.1043
DOI for data
• Resolution
  • Resolution from DOI to URL is provided by the Handle service.
• Granularity?
  • What is the smallest identifiable object?
• Identity?
  • What exactly is identified by a DOI?
• Versioning?
  • Updates, corrections, errata …
• Time series?
  • Continuing time series from environmental monitoring
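The resolution step above can be illustrated with a minimal Python sketch. It only builds the resolver URL for a DOI and does not contact the Handle service. The function name `doi_to_url` and the choice of `https://doi.org/` as the default proxy are assumptions made for this illustration; the slides themselves use `http://dx.doi.org/`.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy. The slides use http://dx.doi.org/ instead.
RESOLVER = "https://doi.org/"

# Prefixes that commonly precede a DOI in citations (all lower case).
KNOWN_PREFIXES = ("doi:", "http://dx.doi.org/", "https://doi.org/")


def doi_to_url(identifier: str, resolver: str = RESOLVER) -> str:
    """Turn 'doi:10.1594/...' or a bare DOI into a resolver URL (hypothetical helper)."""
    doi = identifier.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # A DOI name starts with the directory indicator '10.' and has a prefix/suffix pair.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {identifier!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return resolver + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1594/GFZ.SDDB.1043"))
```

The proxy answers such a URL with a redirect to the landing page registered for the DOI, so the citation stays stable even if the landing page moves.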
The Ship of Theseus Paradox
[Figure: the ship in Year 1, Year 2, Year 3, ... Year n, with one plank changed each year; in Year n a second ship is assembled from the collected planks.]
The Ship of Theseus Paradox
Can any object be identical with another object? Is it the equivalent object we are looking for? What is represented by the identifier?
Formally, the Ship of Theseus Paradox can be approached by introducing the concept of perdurantism. The perdurantist view is that an individual has distinct temporal parts throughout its existence. Perdurantism is usually presented as the antipode to endurantism, the view that an individual is wholly present at every moment of its existence.
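One way to read the perdurantist view in data terms is sketched below, purely as an illustration and not as the scheme used by DataCite or the author. A growing dataset keeps one identifier for the whole series, and every state it passes through is stored as an immutable temporal part that can be referenced on its own. The class names, the example identifier, and the `#v<n>` suffix convention are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TemporalPart:
    """One immutable state of the dataset (a 'plank configuration' of the ship)."""
    version: int
    records: tuple


@dataclass
class IdentifiedSeries:
    """A dataset that persists through time as a sequence of temporal parts."""
    identifier: str  # hypothetical identifier for the series as a whole
    parts: list = field(default_factory=list)

    def append(self, new_records) -> TemporalPart:
        # Each change creates a new part; earlier parts are never modified.
        previous = self.parts[-1].records if self.parts else ()
        part = TemporalPart(len(self.parts) + 1, previous + tuple(new_records))
        self.parts.append(part)
        return part

    def part_id(self, version: int) -> str:
        # Assumed naming convention for citing one temporal part.
        return f"{self.identifier}#v{version}"


if __name__ == "__main__":
    series = IdentifiedSeries("example-series")
    series.append([1.0, 2.0])
    series.append([3.0])
    print(series.part_id(1), series.parts[0].records)
    print(series.part_id(2), series.parts[1].records)
```

Under this reading, citing the series identifier means "the ship, whatever its current planks", while citing a part identifier pins down exactly the state that was used in a publication.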
Single item
Appended time series
Updated item
Snapshots
Collection
Publication of Geodata doi:10.1594/GFZ.SDDB.1202
Repositories vs. Services
• How should data identified by DOI be disseminated?
• File based:
  • Generic, close to the original record of science, OAIS compliant.
  • Of limited use to user agents (machines); often requires manual intervention.
• Services:
  • Machine friendly; use can be automated.
  • Storage is not OAIS compliant.
• File-based data can be transformed into services.
IGSN: International Geo Sample Number
Connecting Geology to the Internet of Things
Internet of Things
“The Internet of Things refers to uniquely identifiable objects (things) and their virtual representations in an Internet-like structure.”
Internet of Things
• Specimens are a basic unit for geoscience observations.
  • Basic unit in data reporting.
  • Basic unit for data discovery, access, and analysis.
• Access to information about the samples is essential for the evaluation and interpretation of specimen-based data.
• Access to physical specimens makes it possible to build more comprehensive datasets and facilitates re-use of resources.
• There is no standard way to access information about specimens.
  • Few online repository catalogues.
  • Few disciplinary catalogues (e.g. Index of Marine & Lacustrine Geological Samples, IODP).
  • Incomplete specimen metadata in publications – if any.
Why do we need identifiers for specimens?
Locations of rock specimens in EarthChem called “M1”.
Globally Unique Identifiers
• Verifying literature data without GUIDs for data, drill holes, or samples required in-depth knowledge of the organisational structures of ocean drilling.
• Data were available, but difficult to find.
• The search involved PANGAEA and SEDIS (IODP).
Literature, Data, Samples
[Figure: literature, data, and samples linked to each other through their identifiers (doi:, hdl:, IGSN).]
Why not use DOI for specimens?
• DOI could be used for specimens.
• Remember, it’s a digital identifier for objects, not only for digital objects.
• Historically, TIB Hannover declined to register DOI for specimens on formal grounds. This was prior to DataCite.
• The use case of dealing with physical specimens called for a different set of rules, even though the structures are similar to DataCite.
• Because it is based on the Handle system, IGSN can easily be merged with DataCite in the future.