Integrate external services in DSpace submission process
How to make self-deposit easy and improve metadata quality and presence of full text
Andrea Bollini – Susanna Mornati
Topics
• Some context:
  • CINECA: a brief overview
  • DSpace as part of a CRIS solution
• Integration of external services:
  • Bibliographic databases: Scopus, PubMed, CrossRef, arXiv, etc.
  • Publisher policies: Sherpa/Romeo
• Make the repository an active actor:
  • Discovering missing content
  • Improving full-text presence
www.cineca.it | Integrate external services in DSpace submission process | OR2013 | July 2013
The company as of last week!
• Interuniversity consortium
• Non-profit
• Founded in 1969
• Headquarters in Bologna
• 57 members:
  • 54 universities
  • 2 research institutes
  • MIUR
• Owned companies: Kion, SCS
• Employees: 400 (+150 Kion)
• Total turnover: €70M
The Merge
• The "merging process" of the three Italian consortia started in September 2012
• It was concluded on July 1st, 2013 (last week!)
• 67 members
• More than 700 employees (+150 Kion)
• The only Italian Interuniversity Consortium 2.0
What CINECA does
How we work with universities
[Diagram. Labels: Cineca Board of Directors; University Customers; Focus Groups; Cineca Technical Board; Tech Road Map; Apps Road Map; U-GOV & SURplus Restricted Board; Requirements; Product Managers Board; Customer Service Board; Technical & Delivery Board]
Solutions for HE
[Diagram. Legend: ERP; Best of Breed. Labels: Authentication (AU); Gateway (GW)]
SURplus: CINECA's CRIS system
• An interoperable infrastructure made of different components
• Ingestion of data from any legacy system adopted by an institution
• Maintenance of specific functional requirements, data model and preferred technologies at the level of applications
• Data warehouse and Business Intelligence tools to facilitate aggregation of data and the application of measurement parameters and algorithms
SURplus: Dimension
• Beginning of activities: 2004
• 9 institutions
• 22 institutional repositories
• Total modules: 77
DSpace: SURplus' Open Archive Module
CINECA is a registered service provider at DuraSpace
Long-term collaboration with the DSpace community, since 2003
The OA Module, developed on DSpace:
• Manages collection and dissemination of research results
• Simplifies data collection processes
• Service integration
Upgrades are periodically released to the open source community
DSpace-CRIS: SURplus' Expertise & Skills
DSpace-CRIS: designed together with the Hong Kong University and released as open source
"dissemination of entities' descriptions in the research environment which go beyond publications"
IR as part of a CRIS system: what changes?
Professional support, HA infrastructure, dedicated team
• Benefits:
  • Strong deposit mandate
  • More funding advocacy
• Issues to mitigate:
  • The IR becomes a critical application
  • Authors perceive deposit as a "requirement":
    • Wasting time
    • Late submission
The information already exists in other databases!
Make the submission process easy
New first submission step
Free search form
Available providers: each provider is a Spring service
Main metadata common to all publication types (article, book, etc.):
• Title of the contribution
• Year
• Authors/Editors
New first submission step
Lookup by unique identifier
Each provider declares which identifiers it is able to manage
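The provider contract described on the last two slides could be sketched as follows. The names `LookupProvider`, `SimpleProvider` and `ProviderRegistry`, and the identifier types assigned to each provider, are hypothetical and are not taken from the DSpace code base; the sketch only shows how a lookup by identifier might be routed to the providers that declare support for that identifier type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical contract: each provider declares the identifier types it can resolve.
interface LookupProvider {
    String name();
    List<String> supportedIdentifiers();
}

// Minimal immutable provider, used only for illustration.
class SimpleProvider implements LookupProvider {
    private final String name;
    private final List<String> identifiers;

    SimpleProvider(String name, String... identifiers) {
        this.name = name;
        this.identifiers = Arrays.asList(identifiers);
    }

    public String name() { return name; }
    public List<String> supportedIdentifiers() { return identifiers; }
}

// Hypothetical registry; in a Spring setup the provider list would be injected.
class ProviderRegistry {
    private final List<LookupProvider> providers;

    ProviderRegistry(List<LookupProvider> providers) {
        this.providers = providers;
    }

    // Returns the names of the providers able to handle the given identifier type,
    // in registration order.
    List<String> providersFor(String identifierType) {
        List<String> names = new ArrayList<>();
        for (LookupProvider p : providers) {
            if (p.supportedIdentifiers().contains(identifierType)) {
                names.add(p.name());
            }
        }
        return names;
    }
}
```

With such a registry, a DOI typed by the submitter would be sent to every provider that lists "doi", while a PubMed ID would only reach the PubMed provider.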
New first submission step
For each result, the providers that match the record are shown.
Grouping is done via DOI
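Grouping by DOI can be sketched as below. The classes `ProviderRecord` and `DoiGrouping` are hypothetical; the sketch lower-cases the DOI before using it as a key because DOI names are case-insensitive, and it simply leaves out records without a DOI, which is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical record returned by a provider; only the fields needed for grouping.
class ProviderRecord {
    final String provider;
    final String doi;

    ProviderRecord(String provider, String doi) {
        this.provider = provider;
        this.doi = doi;
    }
}

class DoiGrouping {
    // DOI names are case-insensitive, so normalise before using them as map keys.
    static String normalize(String doi) {
        return doi.trim().toLowerCase(Locale.ROOT);
    }

    // Groups records that share the same DOI, preserving first-seen order.
    static Map<String, List<ProviderRecord>> groupByDoi(List<ProviderRecord> records) {
        Map<String, List<ProviderRecord>> groups = new LinkedHashMap<>();
        for (ProviderRecord r : records) {
            if (r.doi == null || r.doi.trim().isEmpty()) {
                continue; // assumption: records without a DOI are not grouped here
            }
            groups.computeIfAbsent(normalize(r.doi), k -> new ArrayList<>()).add(r);
        }
        return groups;
    }
}
```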
Modal box: publication details
Records from different providers are merged to get richer metadata
The system guesses a collection for the submission, but the user can change it if required
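The slides do not say how conflicting values are resolved when records are merged. A minimal sketch, assuming that providers are visited in priority order and that the first non-empty value of each field wins, might look like this (`RecordMerger` and the field names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordMerger {
    // Merges field/value maps coming from different providers.
    // Assumption: the list is ordered by provider priority and the first
    // non-empty value found for a field is kept.
    static Map<String, String> merge(List<Map<String, String>> records) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            for (Map.Entry<String, String> entry : record.entrySet()) {
                String value = entry.getValue();
                if (value != null && !value.trim().isEmpty()) {
                    merged.putIfAbsent(entry.getKey(), value);
                }
            }
        }
        return merged;
    }
}
```

Under this rule a record that lacks, say, the ISSN in one source is completed from the next source, which is the "richer metadata" effect described on the slide.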
Manual submission
When lookup fails, the user can always proceed manually
Batch import from external source
Import data (identifiers or structured text) can be entered manually or uploaded as a file
The format/provider must be specified by the user
Batch import from external source
• Requests are processed:
  • Inline, for specific providers and/or within configured data limits
    • The submitter can immediately complete the pre-filled submissions
  • In a background process
    • The submitter will receive a summary email with the import result
    • Pre-filled submissions are available as in-progress submissions in MyDSpace
The legacy batch import feature for JSPUI has already been shared as a pull request on GitHub, see DS-1252
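The choice between inline and background processing could be sketched as follows. `ImportDispatcher`, its constructor parameters and the rule itself are assumptions: the slide says "and/or", and this sketch takes the stricter reading in which a request runs inline only when the provider is allowed to and the number of records is within the configured limit.

```java
import java.util.Set;

class ImportDispatcher {
    enum Mode { INLINE, BACKGROUND }

    private final Set<String> inlineProviders; // providers allowed to run inline (hypothetical setting)
    private final int inlineLimit;             // maximum records processed inline (hypothetical setting)

    ImportDispatcher(Set<String> inlineProviders, int inlineLimit) {
        this.inlineProviders = inlineProviders;
        this.inlineLimit = inlineLimit;
    }

    // Assumption: both conditions must hold for inline processing;
    // everything else is queued and reported by email.
    Mode decide(String provider, int recordCount) {
        boolean allowed = inlineProviders.contains(provider);
        boolean small = recordCount <= inlineLimit;
        return (allowed && small) ? Mode.INLINE : Mode.BACKGROUND;
    }
}
```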
Enhanced Describe step: showing metadata source
Technical details
[Diagram: each lookup provider (PubMed, arXiv, Scopus) retrieves an original record from its source and holds it in a Java bean. Translation logic driven by a mapping file (split and aggregate fields, derive data such as ISSN, journal title, …) produces a normalized record, enhancer plugins act on it, and further translation logic with a repository mapping file produces the DSpace item.]

Example request for a PubMed record:
    WGET http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23297105&retmode=xml&rettype=full

Java bean for the PubMed record:
    public class PubmedItem {
        private String pubmedID;
        private String doi;
        private String issn;
        private String eissn;
        private String journalTitle;
        private String title;
        private String pubblicationModel;
        private String year;
        private String volume;
        private String issue;
        private String language;
        private List<String> type;
        private List<String> primaryKeywords;
        private List<String> secondaryKeywords;
        …
    }

Spring configuration:
    <bean name="pubmedLookupProvider" class="...lookup.PubmedLookupProvider">
        <property name="pubmedService" ref="pubmedService"/>
    </bean>
    <bean name="pubmedService" class="...service.PubmedService"/>

Class hierarchy:
    public class PubmedLookupProvider extends ConfigurableLookupProvider
    public abstract class ConfigurableLookupProvider … implements SubmissionLookupProvider
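As a small illustration of the first step, the request URL shown on the slide can be assembled from a PubMed ID as below. The class `PubmedUrlBuilder` is hypothetical and is not part of the code shown on the slide; the endpoint and query parameters are copied from the slide, and no network call is made.

```java
class PubmedUrlBuilder {
    // Endpoint exactly as shown on the slide.
    static final String EFETCH =
        "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Builds the efetch URL for one PubMed ID, using the parameters from the slide.
    static String efetchUrl(String pubmedId) {
        if (pubmedId == null || !pubmedId.matches("\\d+")) {
            throw new IllegalArgumentException("PubMed ID must be numeric: " + pubmedId);
        }
        return EFETCH + "?db=pubmed&id=" + pubmedId + "&retmode=xml&rettype=full";
    }
}
```

The XML returned by such a request is what the provider would parse into the `PubmedItem` bean before the mapping file is applied.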
Enhanced upload step
• Using the ISSN or EISSN provided in the Describe step, the upload form is improved by showing, on the right side, the publisher policy from the Sherpa/Romeo database
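Since the policy lookup is keyed on the ISSN, it may be worth validating the value typed in the Describe step before querying. The slides do not say whether the implementation does this; the sketch below is only an illustration of the standard ISSN check-digit rule (weights 8 to 2 on the first seven digits, modulo 11, with X standing for 10), and the class name `IssnValidator` is hypothetical.

```java
import java.util.Locale;

class IssnValidator {
    // Returns true when the value is a well-formed ISSN with a correct check digit.
    static boolean isValid(String issn) {
        if (issn == null) {
            return false;
        }
        String s = issn.trim().toUpperCase(Locale.ROOT).replace("-", "");
        if (!s.matches("\\d{7}[\\dX]")) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 7; i++) {
            sum += (s.charAt(i) - '0') * (8 - i); // weights 8, 7, ..., 2
        }
        int check = (11 - sum % 11) % 11;
        char expected = (check == 10) ? 'X' : (char) ('0' + check);
        return s.charAt(7) == expected;
    }
}
```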
Enhanced upload step
Access policy for the bitstream: open access, embargo, intranet, etc.
Deposit of full text to the national database for individual CVs
What is the problem?
• (Very) late submissions cause issues for the repository at both the technical and the organizational level:
  • The system is subjected to periods of intense input activity. DSpace, like IR software in general, scales well for read operations and less well for write operations
  • IR staff involved in the workflow get a lot of tasks to perform in a short period
Make researchers aware
Remind researchers about the IR's presence
Intercept new content early
How do we plan to mitigate the problem?
• Citation databases provide APIs to perform searches (we already use them for the lookup), and in some cases they provide additional APIs or search filters/indexes that allow more refined searches and scanning of the database.
• The interesting filters/indexes are:
  • Time based (much better if related to the insertion in the citation database)
  • Author ID (better if related to a «standard/common» identifier such as ORCID)
  • Affiliation
  • Subject category
Implementation idea
• Allow the researcher to store personal preferences about scanning:
  • Enabled providers (e.g. disable arXiv if you are not a physicist)
  • Frequencies
  • Subject category filters
• Author IDs will be stored in and retrieved from the researcher profile.
• Subject categories could be proposed from previous items or from the researcher profile.
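The per-researcher preferences listed above could be modelled roughly as follows. This is a sketch of an idea that the slides present only as a plan: the class `ScanPreferences`, its fields and the "scan when at least N days have passed" rule are all assumptions.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Set;

// Hypothetical holder for a researcher's scanning preferences.
class ScanPreferences {
    private final Set<String> enabledProviders;
    private final int frequencyDays;
    private final Set<String> subjectCategories;

    ScanPreferences(Set<String> enabledProviders, int frequencyDays, Set<String> subjectCategories) {
        this.enabledProviders = enabledProviders;
        this.frequencyDays = frequencyDays;
        this.subjectCategories = subjectCategories;
    }

    // A provider is scanned only if enabled and if the configured interval has elapsed.
    boolean shouldScan(String provider, LocalDate lastScan, LocalDate today) {
        if (!enabledProviders.contains(provider)) {
            return false;
        }
        if (lastScan == null) {
            return true; // never scanned before
        }
        return ChronoUnit.DAYS.between(lastScan, today) >= frequencyDays;
    }

    // Assumption: an empty filter set means "accept every subject category".
    boolean acceptsCategory(String category) {
        return subjectCategories.isEmpty() || subjectCategories.contains(category);
    }
}
```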
DSpace-CRIS: Researcher profile
Who are the potential targets?
• ORCID
• Scopus
• Web of Science
• arXiv
• PubMed Central
• DBLP
• RePEc
The repository itself!
The repository as a source of missing content?
• The submitter has to match the authors of a publication with the university staff to highlight internal authors
• Sometimes matches are missing
• Other times matches are wrong (homonyms)
• External authors could become «internal» at some point in the future
The repository as a source of missing content?
• Send an email to internal «co-authors» when a submission is made: prevents wrong attribution (and reduces duplication)
• Allow the researcher to unclaim publications from her profile: last chance to fix a wrong attribution
• Allow the researcher to claim publications: fixes missing attribution and/or engages new researchers
The last two features are included in the DSpace-CRIS add-on
Current implementation: claim/unclaim publications in the repository
You can claim it:
• A: Active, simple claim
• S: Make it a selected publication
• H: Claim it but hide it from your public profile
This is the current status of the publication:
• U: Unlinked
Current implementation: claim/unclaim publications in the repository
You can unclaim a publication:
• U: Unlink
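The four one-letter states shown in the claim/unclaim screens can be captured in a small enum. The enum `ClaimStatus` and its flags are hypothetical and do not reproduce DSpace-CRIS code; in particular, treating A and S as visible on the public profile is an assumption derived from the slide text, which only states that H is hidden.

```java
// Hypothetical model of the A / S / H / U states shown in the interface.
enum ClaimStatus {
    ACTIVE('A', true, true),     // simple claim
    SELECTED('S', true, true),   // claimed and marked as a selected publication
    HIDDEN('H', true, false),    // claimed but hidden from the public profile
    UNLINKED('U', false, false); // not linked to the researcher

    final char code;
    final boolean claimed;
    final boolean publiclyVisible; // assumption for A and S

    ClaimStatus(char code, boolean claimed, boolean publiclyVisible) {
        this.code = code;
        this.claimed = claimed;
        this.publiclyVisible = publiclyVisible;
    }

    // Resolves the one-letter code used in the interface.
    static ClaimStatus fromCode(char code) {
        for (ClaimStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown claim status: " + code);
    }
}
```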
Current implementation: claim/unclaim publications in the repository
Improve full-text presence
• Use the Sherpa/Romeo policy database to analyze repository content
• Use external database APIs to find an actual full text (arXiv, PubMed, ... why not the publisher version via library subscription?)
• Send an email to the researcher to validate the PDFs found, or to ask for an «author» version
• Use statistics to encourage upload
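The first and third points above could be combined in a simple selection step: pick the items whose journal policy would allow self-archiving but which still have no full text, and use that list to contact researchers. The sketch below is an assumption about how this might be done; `RepoItem`, `FulltextCandidates` and the choice of the "green" Romeo colour as the selection criterion are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of a repository item: its ID, the Romeo colour of its journal
// (null when the journal is not in Sherpa/Romeo) and whether a full text is attached.
class RepoItem {
    final String id;
    final String romeoColour;
    final boolean hasFulltext;

    RepoItem(String id, String romeoColour, boolean hasFulltext) {
        this.id = id;
        this.romeoColour = romeoColour;
        this.hasFulltext = hasFulltext;
    }
}

class FulltextCandidates {
    // Returns the IDs of items without a full text whose journal is "green".
    static List<String> select(List<RepoItem> items) {
        List<String> ids = new ArrayList<>();
        for (RepoItem item : items) {
            if (!item.hasFulltext && "green".equalsIgnoreCase(item.romeoColour)) {
                ids.add(item.id);
            }
        }
        return ids;
    }
}
```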
Sherpa/Romeo statistics (example)
[Chart. Values shown: 51% ISSN; 36% not in Sherpa; 24,000 items; 32% green; 21,000 items; 7.3% have a full text…; 5.3% open access]
SURplus: forecast for 2014
• 50+ institutional repositories (DSpace)
• 10 research portals (DSpace-CRIS)
www.cineca.it | Innovative Open Source Technologies for a CRIS: SURplus | euroCRIS | May 2013
Thank you!
Andrea Bollini, a.bollini@cineca.it
• SURplus - http://www.cineca.it/en/content/surplus
• DSpace-CRIS - http://cilea.github.com/dspace-cris