Conception of Citing Scientific Primary Data (Result of CODATA WG, supported by DFG). Michael Lautenschlager, WDC for Climate, Max-Planck-Institut für Meteorologie. IDF Coordination Meeting, Hannover, 04.09.2003.
CODATA

Who are we?
CODATA, the Committee on Data for Science and Technology, is an interdisciplinary Scientific Committee of the International Council for Science (ICSU). We were established over 30 years ago, and our secretariat is housed at 51, Bld de Montmorency, 75016 Paris, France.

What are our objectives?
In short, the purpose of CODATA is to help foster and advance science and technology through developing and sharing knowledge about data and the activities that work with data.
CODATA WG
- The CODATA National Committee initiated the WG, grant-aided by DFG
- Working period: September 2001 to May 2002
- Result: final report "Konzept zur Zitierfähigkeit wissenschaftlicher Primärdaten" ("Conception of Citing Scientific Primary Data"), Hannover, 29.05.2002
- Continuation: one-year project for pilot implementation, funded by DFG
WG-Members
- Carola Kauhs (Head of Library, Max-Planck-Institut für Meteorologie, Hamburg)
- Dr. Michael Lautenschlager (WG-Speaker and Director WDC for Climate; Gruppe Modelle und Daten am Max-Planck-Institut für Meteorologie, Hamburg)
- Dr. Manfred Reinke (Scientific Information Systems, Stiftung Alfred-Wegener-Institut für Polar- und Meeresforschung, Bremerhaven)
- Prof. Dr. Gerhard Schneider (Head of Computing Centre, Universität Freiburg)
- Dr. Irina Sens (Deputy Head of Technische Informationsbibliothek und Universitätsbibliothek Hannover)
- Dr. Uwe Ulbrich (Institute for Geophysics and Meteorology, Universität zu Köln)
- Dr. Joachim Wächter (Head of Data and Computing Centre, GeoForschungsZentrum Potsdam)
Content
- Problems
- Concept
- Scientific and Technical Data DOI
- Pilot Project
- Cost Model
WG Constraints
- Limitation to geo-referenced data
  - Primary data with a defined space-time relation, e.g. observational stations, satellites, climate models
- Limitation to research data
  - Especially data from time-limited projects
  - Widely dispersed, not saved long-term, poorly documented
- Exclusion of data from civil services and agencies
  - Centrally archived and documented, but subject to access restrictions
  - Dissemination partly subject to charges and fees
Problem and Solution
- Shortcomings in data provision and interdisciplinary use
  - Rules of good scientific practice are not followed in all cases.
  - Data sources are largely unknown.
  - Data are archived without context.
- Approach: publication of primary data
  - Persistent Identifier (PI) for long-term data referencing
  - Individual scientists will be motivated to document and to customise their primary data.
- Preferred dissemination via the Internet
  - Standard in science
  - Allows direct data access
"Citation Index": Scientific efficiency is "measured" by publications. Extra work for data publication is currently not acknowledged. Data processing, context documentation, quality assurance. Recommendation: Data publications should be included in the "Citation Index". Motivation of the individual scientist. Connection between person and primary dataset. Citable Data publications support the rules of good scientific practise. encourage inter-disciplinary data utilisation. Credits in Science
Publication of Primary Data in Journals
- Scientific journals
  - Restricted to original scientific work
  - Only limited interest in data publication
  - Copyright on the data is transferred to the publishers
- Example: crystallography
  - Measured spectral data from scientific publications are collected by the publisher in a central database
  - Data access is controlled by the publishers; the scientific community has only limited say over data access
- Primary data are considered self-contained entities
  - Databases and data products form the basis for different publications
  - How to reference and cite primary data entities?
Persistent Identifier
Concept developed for web publications:
- Uniqueness: identification of units of intellectual property
- Metadata kernel: description of the referenced entity
- Immutability: identifiers are allocated only once; the entity is left unchanged
- Stable connection: the connection between identifier and referenced entity is stable
- Central resolution: the entity must be accessible via the identifier
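The properties listed on this slide can be illustrated with a toy in-memory registry. This is only a sketch under stated assumptions: the class name, method names, the identifier `10.9999/example.1` and the `archive.example.org` URLs are all made up for illustration and are not part of any real PI system.

```python
class PersistentIdentifierRegistry:
    """Toy registry illustrating PI properties (hypothetical, in-memory only)."""

    def __init__(self):
        self._records = {}

    def register(self, identifier, location, metadata):
        # Uniqueness / immutability: an identifier is allocated only once.
        if identifier in self._records:
            raise ValueError(f"identifier already allocated: {identifier}")
        # Metadata kernel: a description is stored alongside the location.
        self._records[identifier] = {"location": location, "metadata": dict(metadata)}

    def update_location(self, identifier, new_location):
        # Stable connection: the storage location may change,
        # but the identifier keeps pointing at the same entity.
        self._records[identifier]["location"] = new_location

    def resolve(self, identifier):
        # Central resolution: one entry point maps identifier to location.
        return self._records[identifier]["location"]


registry = PersistentIdentifierRegistry()
registry.register(
    "10.9999/example.1",                      # placeholder identifier
    "https://archive.example.org/old/path",   # placeholder location
    {"title": "Example data set"},
)
registry.update_location("10.9999/example.1", "https://archive.example.org/new/path")
```

After the move, citations that use the identifier still resolve, which is the point of separating the identifier from the storage location.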
Criteria for PI Allocation
- Critical points are securing data quality and a stable connection between identifier and data entity
- PI allocation is restricted to syntax control and completeness, i.e. expert data description and long-term archiving
- Scientific quality assurance is done by the author / originator.
  - High-quality data sets achieve good positions in the "Citation Index"
- A stable connection between PI reference and data entity, as well as long-term availability of the primary data, is essential.
"Digital Object Identifier" (DOI) identifies and administers units of intellectual properties independent of the form and the granularity. DOI consists of an organisation dependent prefix and the identifier. 10.1007/s102360100001 DOI connects object identificiation with URL (= storage location of the object) and with metadaten kernel (= description of the object) Global handle system is provided by IDF (International DOI Foundation), consistent entry point Commercial application: Links to publications across different publishers DOI-System
URN
- Uniform Resource Name (URN)
  - Supervision by IETF (Internet Engineering Task Force)
  - Similar structure and functionality to the DOI, e.g. urn:nbn:de:gbv:089-33217752945
- Application
  - Non-commercial usage in library projects (e.g. registration of online dissertations by the DDB)
  - A central resolution system comparable to DOI is not yet implemented
- The preferable PI for scientific primary data is presently the DOI
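For comparison with the DOI, the example URN on this slide can be split into its namespace identifier and namespace-specific string. This is a minimal sketch following the generic `urn:<namespace>:<namespace-specific string>` layout; the function name is hypothetical, and the sketch does not interpret the internal structure of the NBN part.

```python
def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace-specific string)."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a well-formed URN: {urn!r}")
    # The namespace identifier is case-insensitive, so normalise it.
    return parts[1].lower(), parts[2]


namespace, specific = parse_urn("urn:nbn:de:gbv:089-33217752945")
```

The namespace (`nbn`) plays a role similar to the DOI prefix in that it delegates the rest of the string to a naming authority.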
Application Profile: STD-DOI (Scientific and Technical Data DOI)
- Concept: DOI for primary data under the responsibility of the sciences
  - Allows access regulation without a commercial background
  - Copyright remains with the data originators
- Structure: the DOI metadata kernel is expanded by bibliographic specifications that allow citation as for written publications
  - Allocation of an STD-DOI will be assessed as a data publication
  - The data set / entity is then citable as an independent object, like "Author, publication year: dataset name, STD-DOI"
- The DOI system does not substitute an expert data model, which is located at the expert level of long-term archiving
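The citation pattern quoted on this slide, "Author, publication year: dataset name, STD-DOI", can be sketched as a small formatting function. The function name and every value in the example call (author, year, data set name, DOI) are placeholders invented for illustration, not entries from the project.

```python
def format_data_citation(author, year, dataset_name, std_doi):
    """Render a data citation as 'Author, publication year: dataset name, STD-DOI'."""
    return f"{author}, {year}: {dataset_name}, {std_doi}"


citation = format_data_citation(
    author="Doe",                          # placeholder author
    year=2003,                             # placeholder publication year
    dataset_name="Example model output",   # placeholder data set name
    std_doi="10.9999/example.dataset.1",   # placeholder DOI
)
```

Because the DOI is part of the citation string, a reader can go from the reference list to the data set through the central resolver.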
STD-DOI Metadata
On top of the DOI metadata kernel as defined in the handbook. The items used for the citation of electronic documents are the basis:

Item                               Source
Author                             STD-DOI
Title                              DOI-Kernel
Sub-title                          if applicable
Publication date                   STD-DOI
Institution / Publisher            STD-DOI
Data amount / no. of pages         STD-DOI
Place of publication               STD-DOI
Identification number (DOI, ISBN)  DOI-Kernel
URL                                DOI-Kernel
Language                           if applicable
Edition / version                  if applicable
Volume / series                    if applicable
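The metadata items on this slide could be held in a simple record type with a completeness check, in the spirit of the "syntax control and completeness" criterion for PI allocation. This is a sketch under assumptions: the class, field and function names are hypothetical, and treating the "if applicable" items as optional and all others as mandatory is an interpretation of the slide, not a statement of the actual STD-DOI schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StdDoiRecord:
    # Items marked STD-DOI or DOI-Kernel on the slide (assumed mandatory here)
    author: str
    title: str
    publication_date: str
    publisher: str
    data_amount: str
    place_of_publication: str
    doi: str
    url: str
    # Items marked "if applicable" on the slide (assumed optional here)
    subtitle: Optional[str] = None
    language: Optional[str] = None
    edition: Optional[str] = None
    series: Optional[str] = None


REQUIRED_ITEMS = (
    "author", "title", "publication_date", "publisher",
    "data_amount", "place_of_publication", "doi", "url",
)


def missing_items(record):
    """Return the names of mandatory items that are empty."""
    return [name for name in REQUIRED_ITEMS if not getattr(record, name)]


record = StdDoiRecord(
    author="Doe", title="Example model output", publication_date="2003",
    publisher="", data_amount="1 GB", place_of_publication="Hamburg",
    doi="10.9999/example.dataset.1", url="https://archive.example.org/data/1",
)
```

A registration step could refuse to allocate an identifier while `missing_items(record)` is non-empty; in this placeholder record the publisher is still missing.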
Architecture of Primary Data DOI
[Diagram. Elements: International DOI Foundation; Global Handle System; Registration Agency (DOI-Prefix: xxx; DOI-Metadata Entry; Application Profile: STD-DOI); Agents "Cuneiform Writing", "Weather" and "Crystal" with Sub-Prefixes xxx1, xxx2, xxx3; Primary Data (Data Storage, Long-term Archiving); Contract: Securing of Compliance with Allocation Criteria]
DFG Project "Publication and Citation of Scientific Primary Data"
[Diagram. Elements: International DOI Foundation; Global Handle System; TIB Hannover (Registration Agency); M&D/MPIM (Climate Models); GFZ (Geophysics); Marum/AWI (Observations); Data Storage and Long-term Archiving at each of the three data centres, two of them labelled "in WDC"; DDB (URN node)]
Cost Model
- Pilot phase: project funding (DFG)
  - (P1) Feasibility study including overall costs
  - (P2) Pilot implementation
- Operation: accounting on a workload basis
  - (O1) One-time charge for DOI registration and maintenance, or
  - (O2) One-time charge for registration and annual charge for maintenance
- Support by project funding agencies
  - O1 ("one-time charge") fits the project funding limitations better.
  - It must be allowed to include STD-DOI and long-term archiving in project grants.
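Why the one-time charge fits time-limited project funding better can be made concrete with a small calculation. This is a sketch only: the function name is hypothetical, and the fee and the durations in the example are arbitrary illustrative numbers, not figures from the project.

```python
def charges_after_project_end(model, annual_fee, archive_years, project_years):
    """Maintenance charges that fall due after the funded project has ended."""
    if model == "O1":
        # Everything is paid once, while the grant is still running.
        return 0
    if model == "O2":
        # Annual maintenance continues for the years not covered by the grant.
        return annual_fee * max(0, archive_years - project_years)
    raise ValueError(f"unknown cost model: {model!r}")


# Illustrative numbers only: 10 fee units per year, 10 years of archiving,
# 3 years of project funding.
unfunded_o1 = charges_after_project_end("O1", annual_fee=10, archive_years=10, project_years=3)
unfunded_o2 = charges_after_project_end("O2", annual_fee=10, archive_years=10, project_years=3)
```

Under O2 the remaining seven years of maintenance have no grant to be charged to, while under O1 nothing is left unfunded once the project ends.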