Linking to & from Data from & to ScienceDirect Articles
Presented by: IJsbrand Jan Aalbersberg
Hannover, DataCite Meeting
Date: June 8, 2010
Linking Data in ScienceDirect
• The Past
  • Supplementary data
  • Entity links to databases
• The Present
  • Some considerations
  • PANGAEA-type linking
• A Future
  • Getting even more closely connected
The Past (supplementary data)
• Raw research data delivered as supplementary data
• Available for only a limited number of data set types / formats
• Data distributed over multiple articles and publishers
• Format frozen in time – not maintained for preservation
• Only available for smaller data sets (at most a few tens of MB)
• Limited access due to use of existing publishing platforms
• Data and article remain nicely coupled / packaged
• Supplementary data is always peer-reviewed
The Past (entity linking – manual)
• Authors manually identify (and tag) entities mentioned in their articles whose associated data is present (or registered) in databases such as GenBank, MINT, UniProt, PDB, CCDC, ...
  • Very accurate and unambiguous
  • However, requires author effort
• Publisher takes care of the actual linking
• Reciprocal linking is usually taken care of
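Once authors have tagged their entities, the publisher-side work is mostly bookkeeping: turn each (database, accession) tag into a link in the article, and hand the same pairs back to the databases so they can link to the article. The sketch below illustrates that idea under stated assumptions; the `TaggedEntity` type, the helper names, and the identifiers.org-style URL templates are all hypothetical and not a description of how ScienceDirect actually does it.

```python
from dataclasses import dataclass

# Hypothetical URL templates, for illustration only. identifiers.org-style
# "prefix:accession" URLs are an assumption here; each database documents
# its own preferred linking scheme.
LINK_TEMPLATES = {
    "uniprot": "https://identifiers.org/uniprot:{acc}",
    "pdb": "https://identifiers.org/pdb:{acc}",
}


@dataclass(frozen=True)
class TaggedEntity:
    """An entity the author tagged in the manuscript (hypothetical type)."""
    database: str   # lower-case database key, e.g. "uniprot"
    accession: str  # accession number as supplied by the author


def article_links(entities):
    """Forward links (article -> database record) for databases with a known template."""
    links = {}
    for entity in entities:
        template = LINK_TEMPLATES.get(entity.database)
        if template is not None:  # unknown databases are skipped, not guessed
            links[entity] = template.format(acc=entity.accession)
    return links


def reciprocal_records(article_doi, entities):
    """Reciprocal records (database, accession) -> article DOI, to send to each database."""
    return {(e.database, e.accession): article_doi for e in entities}
```

Because the author supplies the accession explicitly, nothing here has to guess which record is meant; that is where the accuracy of manual tagging comes from, at the cost of the author's effort.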
The Past/Present (entity linking – automatic)
• Entities are sometimes identified automatically (e.g., NextBio and Reflect)
  • Easily extendable to new / other entities
  • Works retrospectively on older content
  • Does introduce recall / precision errors
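The recall / precision trade-off of automatic tagging can be illustrated with a deliberately naive dictionary lookup, scored against a manually tagged "gold" set. This is a generic sketch, not the method used by NextBio or Reflect; the function names, the dictionary, and the sample sentence are hypothetical.

```python
import re


def tag_entities(text, dictionary):
    """Return the dictionary terms that occur in text as whole words.

    A plain, case-sensitive lookup: it needs no author effort and can be
    rerun over old content, but it cannot tell what a term means in context.
    """
    found = set()
    for term in dictionary:
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.add(term)
    return found


def precision_recall(predicted, gold):
    """Precision and recall of a predicted entity set against a gold set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: "CAT" matches an ambiguous term (a false positive),
# while "Mdm2" is missed because the lookup is case-sensitive (a false negative).
text = "We measured p53 and BRCA1 levels and the inhibitor Mdm2; a CAT scan was normal."
dictionary = {"p53", "BRCA1", "CAT", "MDM2"}
gold = {"p53", "BRCA1", "MDM2"}
predicted = tag_entities(text, dictionary)
precision, recall = precision_recall(predicted, gold)
```

Here both precision and recall come out at 2/3: one spurious match and one missed mention, which are exactly the two kinds of error manual tagging avoids.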
The Present (some considerations)
• STM, “Brussels Declaration”, June 2006:
  • “... believe that, as a general principle, data sets, raw data outputs of research, and sets or subsets of that data should wherever possible be made freely accessible ...”
• Data sets should be freely accessible – but at the publisher?
  • Scientists prefer independent data repositories
  • Need for single domain-specific coordination
  • Huge costs for maintenance and preservation
• A proper deposit mechanism is needed
  • Through the publisher? Extra overhead vs. ease of use
• Enforcing deposit prior to publication
  • If community-supported, surely a possibility
• Data set standardization is needed for optimal use
The Present (more considerations)
• Scientists need the combination of the formal publication record and the raw data sets
• For optimal interoperability, close collaboration between publishers and data set repositories is needed
• Publishers should “enable and support” raw data sets
  • Submission: enforce if supported by the community
  • Discoverability: interconnect articles with data sets
  • Reciprocal linking at the deepest level possible
  • PANGAEA-type linking
• Data feeds from publisher to repositories?
• Managing a large number of data set repositories?
  • DataCite as a single discussion partner
The Present (PANGAEA linking)
• Author submits article to publisher
• Author submits data set to repository
• At article publication, the repository links the article DOI to the associated data set DOI, creating the actual connection
• User sees link to ScienceDirect from PANGAEA
• User sees link to PANGAEA from ScienceDirect
[Diagram: USER; SD Server (articles) with the SD Article; PANGAEA Server (data + associations); link between them]
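The key point of PANGAEA-type linking is that a single recorded association (article DOI, data set DOI) drives the links on both sides. A minimal sketch of such an association store follows; the class and method names are hypothetical, the DOIs in the example are placeholders, and only the `https://doi.org/` resolver prefix is a real convention.

```python
from collections import defaultdict

DOI_RESOLVER = "https://doi.org/"


class DoiLinkRegistry:
    """Hypothetical store of article-DOI <-> data-set-DOI associations.

    Follows the flow described above: the repository records the association
    at article publication, and both the publisher and the repository derive
    their links from that one record.
    """

    def __init__(self):
        self._datasets_by_article = defaultdict(set)
        self._articles_by_dataset = defaultdict(set)

    def link(self, article_doi, dataset_doi):
        """Record one association; both directions are updated together."""
        self._datasets_by_article[article_doi].add(dataset_doi)
        self._articles_by_dataset[dataset_doi].add(article_doi)

    def dataset_links(self, article_doi):
        """Links to show on the article page (publisher side)."""
        dois = self._datasets_by_article.get(article_doi, ())
        return sorted(DOI_RESOLVER + doi for doi in dois)

    def article_links(self, dataset_doi):
        """Links to show on the data set page (repository side)."""
        dois = self._articles_by_dataset.get(dataset_doi, ())
        return sorted(DOI_RESOLVER + doi for doi in dois)


# Placeholder DOIs, for illustration only.
registry = DoiLinkRegistry()
registry.link("10.1016/j.example.2010.001", "10.1594/PANGAEA.000001")
```

Updating both directions inside one `link` call is the design choice that keeps the reciprocal links from drifting apart; lookups use `.get` so that querying an unknown DOI does not create empty entries.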
A Future (tighter interoperability)
• Not just a link to / from data and journal article
• But provide an integrated experience for the scientist
• Single page (environment) with data and article
[Diagram: USER; SD Server (articles) with the SD Article; Supplementary Data Server (data sets)]
• Some users prefer it the other way around, so also offer:
[Diagram: USER; Data Set Server (data sets) with the Data Set; Article Server (articles)]
A Future (inline supplementary data)
• Structures submitted as supplementary data files (MOL files)
• Displayed inline through Reaxys application / service
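Displaying a structure inline starts with reading the submitted MOL file. The sketch below extracts a few basic facts from a V2000 molfile: three header lines, a counts line whose first two 3-character fields are the atom and bond counts, then one line per atom with the element symbol in columns 32–34. It is an illustrative, assumption-laden parser (no V3000 support, no validation), not how Reaxys processes these files; the function name and the sample molecule are hypothetical.

```python
def mol_summary(molblock):
    """Return (name, atom_count, bond_count, element_symbols) for a V2000 molfile.

    Assumes the fixed-column V2000 layout; V3000 files are not handled.
    """
    lines = molblock.splitlines()
    name = lines[0].strip()            # header line 1: molecule name
    counts = lines[3]                  # line 4: counts line
    n_atoms = int(counts[0:3])         # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])         # columns 4-6: number of bonds
    atom_lines = lines[4:4 + n_atoms]  # atom block follows the counts line
    # x, y, z occupy 10 columns each, then a space, then a 3-column symbol.
    symbols = [line[31:34].strip() for line in atom_lines]
    return name, n_atoms, n_bonds, symbols


# Hypothetical heavy-atom-only methanol block (C-O, one single bond).
mol = "\n".join([
    "methanol",
    "  hypothetical",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])
```

Fixed-column slicing is used for the counts line rather than `split()`, because in V2000 files with many atoms the numeric fields can run together without separating spaces.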
Creating the best User Experience by integrating Data with Articles requires close collaboration between data set repositories and publishers