PANGAEA Archiving and Publication of Scholarly Data for the Long Tail of Science. Michael Diepenbroek. What is PANGAEA? Information system for long-term archiving and publication of data from earth & environmental sciences (since 1993)
PANGAEA Archiving and Publication of Scholarly Data for the Long Tail of Science
Michael Diepenbroek
What is PANGAEA?
• Information system for long-term archiving and publication of data from earth & environmental sciences (since 1993)
• Accredited by the "World Meteorological Organization" (WMO) as "World Radiation Monitoring Center" (WRMC) (since 2007)
• Accredited by the "International Council for Science" (ICSU) as World Data Center "Publisher for Earth & Environmental Science" (World Data Center) (since 2001)
PANGAEA - contents
• Integral part of science
• More than 160 European to international projects since 1995 (www.pangaea.de/projects)
• Highly heterogeneous & dynamic
• Multidisciplinary
Total number of datasets: ~350,000
Data volume: <2 PB
Increase: ~5% per year
PANGAEA - technical architecture
[Architecture diagram. Labels shown: hard disk + tape (silo); Sybase IQ; Sybase ASE; editorial system; RDB; warehouse; curators; web server; middleware; ticket system; PANGAEA search engine; IQ interface; various services; users]
PANGAEA - interoperability
Portals
• CARBOOCEAN
• EUR-OCEANS
• IODP - SEDIS
• ICSU WDS portal
• ESONET/EMSO
Broker function
• GBIF, OBIS
Sensor webs
• ESONET/EMSO, Statoil
Conform to global standards
• ISO19xxx, OGC, W3C, OAI
PANGAEA - interoperability
[Diagram. Column headings: data management & long-term archiving; protocols; catalogues; frontends / portals]
[Data management labels: RDB; index; marshaller; XSLT; PANGAEA web frontend; PANGAEA Geoserver (OGC); DOI registration]
[Protocol and format labels: WS (SOAP/WSDL); OGC CSW; OAI-PMH; DIGIR; STD-DOI; gml, kml; ISO19115; Dublin Core; DIF; Darwin Core; ISO690]
[Catalogue and portal labels: Elsevier, Scopus …; INSPIRE; GEOSS; IODP; ICSU WDS harvester; Thomson Reuters; EUR-OCEANS harvester; CARBOOCEAN; PubMed; OpenAire harvester; OCLC; Google harvester; OBIS; GBIF; DOI registry; DataCite]
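The diagram names OAI-PMH and Dublin Core among the channels through which external harvesters collect metadata. As a minimal sketch of what such a harvester does, the Python below builds a standard OAI-PMH `ListRecords` request and reads Dublin Core fields from a response. The endpoint `https://example.org/oai`, the function names, and the sample record are all assumptions made for illustration; they are not PANGAEA's actual address or data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not PANGAEA's real address).
BASE_URL = "https://example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response, shaped like a ListRecords reply with one live and one deleted record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example sediment core data</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:dataset-2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

A real harvester would also fetch the URL over HTTP and follow `resumptionToken` elements to page through large result sets; both are left out here to keep the sketch self-contained.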
The Long Tail of Data
[Figure. Axes: fitness of use; total volume of scientific data. Labels: professionally managed & published data; large-scale monitoring & computed data & disciplinary data centers; unmanaged open access data; unmanaged & non-public data; data from individual scientists, labs, or smaller projects]
Publishing data with PANGAEA
• Citable & persistent (DOI)
• CC-BY License
• Quality data
• QA/QC -> review procedures
• Efficient usage
• (Meta)data & interoperability standards (machine-readable)
• FITNESS OF USE!
[Figure: source files in many formats (XLSX, TXT, DOC, XML, NetCDF, PDF, GRIB, CSV, XLS, …) alongside published data sets]
• OECD principles and guidelines for access to research data (2007)
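To make "citable & persistent (DOI)" concrete, here is a minimal Python sketch that normalizes a DOI, turns it into a resolver link via `https://doi.org/`, and assembles a citation string. The function names, the regular expression (a rough shape check, not a full validator), and the citation layout are assumptions for illustration, not PANGAEA's own citation format. The example DOI is the one given for the Piwowar et al. (2007) paper cited in this deck.

```python
import re
from urllib.parse import quote

# Rough shape check only: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste in front of a bare DOI.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip a leading 'doi:' or resolver URL and check the basic DOI shape."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_url(raw):
    """Return the persistent resolver link for a DOI."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


def format_citation(creators, year, title, publisher, doi):
    """Assemble a citation string (illustrative layout, not an official style)."""
    return f"{'; '.join(creators)} ({year}): {title}. {publisher}, {doi_url(doi)}"


if __name__ == "__main__":
    print(format_citation(
        ["Piwowar HA", "Day RS", "Fridsma DB"],
        2007,
        "Sharing Detailed Research Data Is Associated with Increased Citation Rate",
        "PLoS ONE",
        "doi:10.1371/journal.pone.0000308",
    ))
```

Because the link goes through the DOI resolver rather than a repository URL, the citation stays valid even if the landing page moves.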
Data publication - citability
[Figure: articles and data publications arranged along a time axis]
Impact on citation rates
35% to 69% more citations!
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Courtesy of Jon Sears (AGU)
Collaboration between data centers & science journals
• Linking editorial workflows
• Linking services
Linking infrastructure
[Diagram. Labels: data archives (several, …); catalogues; bibliometrics; publishers]
ICSU WDS perspective
[Diagram. Labels: ICSU WDS; certified data archives; catalogues; registries; journals; bibliometric services; Web of Knowledge; Google Scholar; Scopus; Crossref; DataCite; ORCID; CrossData; Thomson Reuters Citation Indexes]
WDS Certification & accreditation
• Trustworthiness of WDS data holders and service providers
• Evaluation criteria: based on a compilation of international standards and best practices
• Certification authority: WDS Scientific Committee
2014/03: 75 members
WDS/RDA WGs and IGs
• Publishing workflows
• Publishing services
• Incentives (bibliometrics)
• Trusted repositories & services
• Cost compensation models
[Figure. Axes: fitness of use; total volume of scientific data. Labels: e-Infrastructures; scientific research projects]
Some conclusions
• Publishing data benefits providers and has a significant impact on data quality.
• "Fitness of use" is an important aspect of data quality and a prerequisite for integrating data from different sources.
• Certification is key for evaluating the quality of services and data.
• Scalable services are needed to embed data publications into the current scholarly publishing system.