Pilot Implementation: Publication and Citation of Scientific Primary Data
Result of CODATA WG, supported by DFG
Jan Brase, Learning Lab Lower Saxony, Uni. Hannover
Michael Lautenschlager, WDC for Climate Model and Data / Max-Planck-Institute for Meteorology
ERPANET WS, Cork, Ireland, 17+18.06.04; IDF Member's Meeting, London, 22.06.04
Roots
• The CODATA¹ National Committee initiated a WG, grant-aided by DFG
• Working period: September 2001 to May 2002
• Result: final report "Konzept zur Zitierfähigkeit wissenschaftlicher Primärdaten" ("Conception of Citing Scientific Primary Data"), Hannover, 29.05.2002
• Continuation: two-year project for pilot implementation, funded by DFG, starting in October 2003
¹ CODATA: Committee on Data for Science and Technology
[Figure: Northern Hemisphere temperature response for scenario IS92a. NH mean temperature anomaly relative to the 1961–1990 mean of the IPCC DDC greenhouse-gas-only experiments. ECHAM4/3: ΔT = 4.3 °C; ECHAM4/2: ΔT = 2.5 °C; ECHAM4/1: ΔT = 0.7 °C. Each curve is associated with approx. 1 TB of numerical data.]
[Figure: ECHAM4/1: temperature, 2000 (range -8 °C to -12 °C). Corresponds to point 1 of the NH temperature anomaly curve: CO₂ = 370 ppmv. ECHAM4/OPYC, greenhouse gas only, according to IS92a.]
[Figure: ECHAM4/2: temperature, 2050 (range -4 °C to -8 °C). Corresponds to point 2 of the NH temperature anomaly curve: CO₂ = 500 ppmv. ECHAM4/OPYC, greenhouse gas only, according to IS92a.]
[Figure: ECHAM4/3: temperature anomaly, 2099 (range 0 °C to -4 °C). Corresponds to point 3 of the NH temperature anomaly curve: CO₂ = 690 ppmv. ECHAM4/OPYC, greenhouse gas only, according to IS92a.]
Problem and Solution
• Shortcomings in data provision and interdisciplinary use:
  – Rules of good scientific practice are not followed in all cases.
  – Data sources are largely unknown.
  – Data are archived without context.
  – Data cannot be cited as independent entities.
• Method of solution: publication of primary data as independent entities
  – Persistent identifier with a global resolving mechanism for data archive and context referencing (scientific data model at archive level)
  – Integration into library catalogues, so that data can be found together with articles
  – STD-DOI application profile: metadata kernel + items for electronic publication (interface between scientific data archives and libraries)
Credits in Science
• "Citation Index": scientific efficiency is "measured" by publications.
• The extra work for data publication (data processing, context documentation, quality assurance) is currently not acknowledged.
• Recommendation: data publications should be included in the standard scientific "Citation Index".
  – Motivates the individual scientist
  – Connects the person with the primary dataset
• Citable data publications
  – support the rules of good scientific practice,
  – encourage interdisciplinary data use,
  – make data searchable in library catalogues together with articles,
  – close the gap between scientific literature and related data sources.
Criteria for Persistent Identifier Allocation
• The critical points are securing data quality and a stable connection between identifier and data entity.
• Allocation is restricted to syntax control and completeness checks, i.e. checking for an expert data description and long-term archiving.
• Scientific quality assurance is expected of the author and will be reviewed during the allocation process.
• Like published articles, published primary data cannot be changed.
• A stable connection between identifier and data entity, as well as long-term availability of the primary data, are essential and must be ensured (e.g. by ICSU WDCs).
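The formal part of the allocation check (syntax control and completeness) can be sketched in Python. This is a minimal illustration under stated assumptions: the snake_case field names mirror the ISO 690-2 items listed on the registration slide, the DOI pattern `10.<digits>/<suffix>` is a simplified rule, and `check_allocation` is a hypothetical helper, not part of the TIB system. Scientific quality review is deliberately left out, since the slide assigns it to the author.

```python
import re

# ISO 690-2 citation items from the "Registration of primary data" slide.
# The snake_case key names are hypothetical, not a published STD-DOI schema.
REQUIRED_FIELDS = (
    "language", "publisher", "publishing_date", "publishing_place",
    "author", "title", "size", "edition",
)

# Hypothetical simplified syntax rule: "10.<digits>/<suffix without spaces>".
DOI_SYNTAX = re.compile(r"10\.\d+/\S+")


def check_allocation(doi: str, url: str, metadata: dict[str, str]) -> list[str]:
    """Return all problems found; an empty list means the record may be registered.

    Only formal checks are made (syntax and completeness). Scientific quality
    assurance stays with the author and the data centre.
    """
    problems = []
    if not DOI_SYNTAX.fullmatch(doi):
        problems.append(f"invalid DOI syntax: {doi!r}")
    if not url.startswith(("http://", "https://")):
        problems.append(f"data URL is not an HTTP(S) URL: {url!r}")
    for field in REQUIRED_FIELDS:
        if not metadata.get(field, "").strip():
            problems.append(f"missing metadata field: {field}")
    return problems


# Usage: an incomplete record with a malformed DOI and a non-HTTP URL.
for problem in check_allocation("1594/WDCC", "ftp://example.org", {"title": "x"}):
    print(problem)
```

Returning the full list of problems, rather than stopping at the first one, lets a data centre fix every gap in a single round trip.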
[Figure: DFG project "Publication and Citation of Scientific Primary Data". Data providers: M&D/MPIM (climate models), GFZ (geophysics) and Marum/AWI (observations), each with data storage and long-term archiving (in WDCs). TIB Hannover acts as registration agency, linked to the International DOI Foundation / Global Handle System, the DDB URN node, and TIB-ORDER / library catalogue.]
Primary data publication
• During her research for the World Data Center Climate (WDCC), the scientist Mrs. Weather collects primary data about the weather in Hannover in 2003.
• As usual, the primary data are tested, evaluated, stored and administered at the WDCC.
• In addition, Mrs. Weather registers the primary data at the TIB (primary data publication by STD-DOI/URN assignment).
Registration of primary data
• After quality assurance, the WDCC transmits to the TIB the URL where the data can be accessed, together with an XML file containing all relevant metadata (generated from the scientific data model).
• This includes all information obligatory for citing electronic media (ISO 690-2):
  – language
  – publisher
  – publishing date
  – publishing place
  – author
  – title
  – size
  – edition
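The registration message (data URL plus an XML metadata file) might look roughly like the following Python sketch. It is an illustration only: the element names `registration`, `url` and `metadata`, the snake_case field names, and the function `build_registration_xml` are hypothetical, since the actual STD-DOI schema is not shown here. Author, title, publisher and year follow Mrs. Weather's example; the URL, language, place, size and edition values are placeholders.

```python
import xml.etree.ElementTree as ET


def build_registration_xml(url: str, metadata: dict[str, str]) -> str:
    """Serialize the data URL and the ISO 690-2 citation items as one XML document.

    Element names are hypothetical; each metadata key must be a valid XML name.
    """
    root = ET.Element("registration")
    ET.SubElement(root, "url").text = url
    meta = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = value
    return ET.tostring(root, encoding="unicode")


example = {
    "language": "en",                  # placeholder
    "publisher": "WDCC",
    "publishing_date": "2003",
    "publishing_place": "Hamburg",     # placeholder
    "author": "Weather",
    "title": "Weather in Hannover for 2003",
    "size": "unknown",                 # placeholder
    "edition": "1",                    # placeholder
}
xml_text = build_registration_xml("http://example.org/W_Han_2003_MMB_2", example)
print(xml_text)
```

Generating the file from a dictionary keeps the XML in step with whatever fields the scientific data model exports.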
Identifier
• The TIB stores this information about the primary data and assigns them a unique identifier for registration: a DOI.
• The DOI (Digital Object Identifier) is a system for the persistent and actionable identification and interoperable exchange of intellectual property on digital networks.
• It is coordinated by the International DOI Foundation (IDF).
Citing primary data
In her publications, Mrs. Weather now cites these primary data with their unique DOI, maintained by the TIB:
doi:10.1594/WDCC/W_Han_2003_MMB_2
• 10.1594 (prefix) stands for the TIB as the registration agency.
• WDCC stands for the respective research institute.
• W_Han_2003_MMB_2 is the internal name of the dataset.
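The three parts of the example DOI can be separated with a few lines of Python. This is a sketch under an explicit assumption: general DOI syntax only fixes "prefix/suffix", so splitting the suffix into institute and internal name follows the TIB example above rather than any DOI rule. `DoiParts` and `parse_doi` are hypothetical names.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    prefix: str     # registration agency, e.g. "10.1594" for the TIB
    institute: str  # research institute, e.g. "WDCC"
    name: str       # institute-internal dataset name


def parse_doi(doi: str) -> DoiParts:
    """Split a DOI in the assumed TIB layout prefix/institute/name."""
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    prefix, _, suffix = doi.partition("/")
    institute, _, name = suffix.partition("/")
    if not prefix.startswith("10.") or not institute or not name:
        raise ValueError(f"not in prefix/institute/name layout: {doi!r}")
    return DoiParts(prefix, institute, name)


print(parse_doi("doi:10.1594/WDCC/W_Han_2003_MMB_2"))
# prints: DoiParts(prefix='10.1594', institute='WDCC', name='W_Han_2003_MMB_2')
```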
Resolving the DOI
• These DOIs can be resolved (so that the cited data can be reached) in any web browser worldwide, in three ways:
  – http://dx.doi.org/10.1594/WDCC/W_Han_2003_MMB_2
  – http://doi.tib-hannover.de:8000/10.1594/WDCC/W_Han_2003_MMB_2
  – Doi://10.1594/WDCC/W_Han_2003_MMB_2 (after installing a browser plugin)
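A small Python helper can produce the three access forms for any DOI. It is a hedged sketch: `resolver_urls` is a hypothetical name, and the plugin form is written as `doi://` in lower case, which is equivalent because URI scheme names are case-insensitive. The DOI is percent-encoded so that suffixes containing characters such as spaces still yield valid URLs.

```python
from urllib.parse import quote


def resolver_urls(doi: str) -> list[str]:
    """Return the three access forms listed on the slide for a given DOI."""
    path = quote(doi, safe="/")  # percent-encode characters not allowed in a URL path
    return [
        f"http://dx.doi.org/{path}",                # central DOI proxy
        f"http://doi.tib-hannover.de:8000/{path}",  # TIB Handle server
        f"doi://{path}",                            # requires a browser plugin
    ]


for url in resolver_urls("10.1594/WDCC/W_Han_2003_MMB_2"):
    print(url)
```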
Usage scenario 1
• Mr. Storm reads publications by Mrs. Weather in a journal and would like to analyse her data from different perspectives.
• In his publication "Comparison of the weather from Hannover and Miami", Mr. Storm cites Mrs. Weather's data using its DOI, thereby acknowledging the original data as a unique entity with its own identity.
• Citation example: Weather, 2003: Weather in Hannover for 2003. [doi:10.1594/WDCC/W_Han_2003_MMB_2]
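The citation pattern in the example ("Author, Year: Title. [doi:...]") can be generated from the registered metadata. The Python function below is a hypothetical sketch that reproduces only this one pattern; it is not an implementation of ISO 690-2 as a whole.

```python
def format_citation(author: str, year: str, title: str, doi: str) -> str:
    """Render "Author, Year: Title. [doi:...]" as in the citation example."""
    return f"{author}, {year}: {title.rstrip('.')}. [doi:{doi}]"


print(format_citation("Weather", "2003", "Weather in Hannover for 2003",
                      "10.1594/WDCC/W_Han_2003_MMB_2"))
# prints: Weather, 2003: Weather in Hannover for 2003. [doi:10.1594/WDCC/W_Han_2003_MMB_2]
```

Stripping a trailing period from the title avoids a doubled ".." when a title already ends with punctuation.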
Usage scenario 2
• Mr. Nice is writing a paper about ice cream sales figures in Hannover in 2003, but he has no information about the weather.
• He uses the TIB, as the central registration agency, to run a metadata search over the registered primary data.
• The result is doi:10.1594/WDCC/W_Han_2003_MMB_2.
• He resolves the DOI and finds the data suitable for his purposes.
• The metadata refer him to the WDCC as publisher and data archive.
• In his paper, he again cites the data using their DOI.
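Mr. Nice's metadata search can be illustrated with a simple keyword match over registered records. This is a minimal sketch, assuming an in-memory list: `Registration`, `REGISTRY` and `search` are hypothetical, and only the first record comes from the slides (the second is an invented placeholder). A real registration agency would query its metadata store instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    doi: str
    title: str
    publisher: str
    year: str


# Hypothetical registry; the second record is an invented placeholder.
REGISTRY = [
    Registration("10.1594/WDCC/W_Han_2003_MMB_2", "Weather in Hannover for 2003", "WDCC", "2003"),
    Registration("10.1594/EXAMPLE/hypothetical_1", "Hypothetical sea-level record", "Example archive", "2002"),
]


def search(registry: list[Registration], *terms: str) -> list[Registration]:
    """Return records whose metadata contain every search term (case-insensitive)."""
    wanted = [t.lower() for t in terms]
    hits = []
    for rec in registry:
        text = " ".join((rec.title, rec.publisher, rec.year)).lower()
        if all(term in text for term in wanted):
            hits.append(rec)
    return hits


for rec in search(REGISTRY, "weather", "hannover", "2003"):
    print(f"doi:{rec.doi} (publisher: {rec.publisher})")
```

Each hit carries the publisher field, which is how the metadata point Mr. Nice to the WDCC as data archive.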
URN
• In cooperation with the German Library (DDB) in Frankfurt, every dataset is also registered with a unique URN, which has the same structure as the DOI:
  – DOI structure: 10.1594/WDCC/W_Han_2003_MMB_2
  – URN structure: Urn:TIB:10.1594/WDCC/W_Han_2003_MMB_2
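Because the URN reuses the DOI unchanged, the mapping in both directions reduces to adding or stripping a prefix. The Python sketch below assumes exactly the structure shown on the slide; the DDB's actual URN assignment rules may differ, and `doi_to_urn` / `urn_to_doi` are hypothetical names.

```python
URN_PREFIX = "Urn:TIB:"  # as written on the slide; URN scheme and namespace compare case-insensitively


def doi_to_urn(doi: str) -> str:
    """Derive the URN by prefixing the DOI (assumed structure from the slide)."""
    return URN_PREFIX + doi


def urn_to_doi(urn: str) -> str:
    """Recover the DOI from a URN of that assumed form."""
    if not urn.lower().startswith(URN_PREFIX.lower()):
        raise ValueError(f"not a TIB URN: {urn!r}")
    return urn[len(URN_PREFIX):]


print(doi_to_urn("10.1594/WDCC/W_Han_2003_MMB_2"))
```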
Current situation
• In cooperation with
  – World Data Center Climate (WDCC), Max Planck Institute for Meteorology, Hamburg
  – Geoforschungszentrum Potsdam (GFZ)
  – World Data Center MARE, Uni. Bremen, and Alfred Wegener Institute, Bremerhaven
  – Learning Lab Lower Saxony, Uni. Hannover
• the TIB Hannover is now the world's first registration agency for scientific and technical data (STD-DOI).
Technical
• A Handle server is installed at the TIB Hannover, so the TIB is able to register and resolve DOIs.
• The TIB has officially received a DOI prefix (10.1594).
• The first datasets have been stored at the TIB manually.
• The automatic registration process is under development.
[Figure: Technical realization. Components: WDCs and GFZ (deliver the data URL with an XML file); metadata storage (Cocoon web server, XML-based, XSL transformation); Handle Server (DOI registration, International DOI Foundation); URN registration (DDB); central library database, Göttingen.]
Outlook
• 2004: We expect about 10,000 datasets by the end of the year.
• 2005: The system is to be extended to other fields of science.
• 2006: The TIB Hannover is to become the central registration agency for scientific primary data.
Further information
• Project webpage: http://www.std-doi.de
• TIB Handle Server: http://doi.tib-hannover.de:8000
• DOI Foundation: http://www.doi.org
• URN registration of the DDB: http://www.persistent-identifier.de