The Emerging Framework for Scholarly Communication

Steve Hitchcock, The Open Citation Project (OpCit), Southampton University
These slides were prepared for The Future of Journal Publishing at Nottingham University, 22 March 2002.
OpCit is a joint JISC-NSF International Digital Libraries Project, 1999-2002.

Emerging framework: the hypothesis …Scholarly electronic information will be ‘seamless’ and ‘integrated’

Scholarly electronic information will be ‘seamless’ and ‘integrated’
The provable truth, using Google*
“seamless integration of information”: 500 results, mostly companies offering network and inter-application software
“seamless access to information”: almost 1000 sites, portals and gateways to the fore
“seamless linking”: 450 sites, leading with journal publishers and databases
* Results based on Google searches, November 2001

What is “seamless integration”? From any given document the user might expect to be able to retrieve any related document within one mouse click. Typically what is related is defined, and linked, by the author or publisher or other service provider, and is constrained by the tools and information services at their disposal. Longer term the relation may be anything the user might consider to be related.

Is seamless integration possible for the refereed scholarly literature? For scholarly research papers - those destined for peer reviewed journal publication, by authors who have no intention of receiving direct payment for publication of the work they produce - this prospect raises two subsidiary questions about the ‘seamlessly integrated’ literature: Will it be complete (from the viewpoint of every user)? Will it be free (or appear to be free)? A work may appear to be free to the user when it is accessed via a library, for example. The refereed scholarly literature will need to be complete, everywhere, if seamless integration, even on a modest scale, is to be achieved.

Achieving seamless integration – Web services
Emerging Web services standards are motivated by the need to connect business processes, especially databases, across the Web. The basic platform for Web services is XML plus HTTP, maintaining the ubiquity and simplicity of the Web.
Web services are based on three mechanisms:
to describe a service (e.g. Web Services Description Language, WSDL)
to find a service (e.g. a registry such as Universal Description, Discovery, and Integration, UDDI)
to communicate (e.g. Simple Object Access Protocol, SOAP)
http://www.w3.org/2002/ws/
Digital library architectures are evolving to include Web services-like components, and may ultimately migrate to the emerging standards.
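To make the "XML plus HTTP" platform concrete, here is a minimal sketch of the communication step: wrapping a request in a SOAP 1.1 envelope. The service namespace, the operation name SearchPapers and its query parameter are hypothetical, invented for illustration; a real service would publish its own in a WSDL description.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an imaginary library search service.
SERVICE_NS = "http://example.org/library-search"


def build_soap_request(operation: str, params: dict) -> bytes:
    """Wrap one operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    # The resulting bytes would be sent as the body of an HTTP POST.
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


message = build_soap_request("SearchPapers", {"query": "open access"})
```

The message is plain XML, so any client that can issue an HTTP POST can take part, which is the simplicity the slide refers to.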

Progress in libraries
Site licenses for electronic journals, and more aggregated content from database services
Alternative journals, e.g. support for the Scholarly Publishing & Academic Resources Coalition (SPARC), to increase competition in the journal market by facilitating partnerships with publishers and other journal producers
Open Archives Initiative, interoperability standards to facilitate the efficient dissemination of content
Fast-track standardization of OpenURL, to link users to these subscription and document services, recognising that this vast new array of electronic content would need to be accessible and navigable by users within the library’s information environment

Site licences By licensing access to ‘bundled’ collections of e-journals, libraries can claim to have satisfied their objective of better value for money in terms of cost per page delivered to users. The ‘site’ from which users access content could be an institution, a state-wide group of institutions (e.g. OhioLINK), a national collective, such as in Canada, or even all the people of a nation, as in Iceland. The UK has the National Electronic Site Licence Initiative (NESLI), which brokers deals between publishers and participating institutions. The OhioLINK strategy: “Enablers rather than gatekeepers” OhioLINK claims to have overcome “the entrenched and limiting economic practices of vendors to individual institutions, and the library-imposed, self-limiting, collection development mentality of information rationing that pervades our community.”

Making appropriate connections Site licenses give libraries access to more journal titles. Another outcome of the serials crisis is that fewer non-core journals are subscribed to, and libraries have resorted to just-in-time document delivery and collections from licensed full-text aggregators. Library users may thus have authority to access a paper free of charge via one library service or another. Directing each user to the copy they are entitled to use has become known as the ‘appropriate copy’ problem. OpenURL is a generalized framework for communicating and resolving links and supports software solutions to the appropriate copy problem. OpenURL is described as an ‘interoperability specification’.

Syntax of OpenURL
http://(who you are, where you are, your institution)/(where you want to go)
An OpenURL is mediated by the HTTP protocol.
BASEURL: data about the user, typically inserted during transport between servers. One interim mechanism is to store the BASEURL as a cookie in the user’s browser. The cookie identifies the resolver that provides context-sensitive services for the user.
QUERY: points to the referenced object, which might be
an identifier, e.g. Digital Object Identifier (DOI)
metadata derived from an authored reference
partial metadata, where a secondary service identifies the required document
OpenURL has been proposed as a National Information Standards Organization (NISO) standard http://library.caltech.edu/openurl/
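The BASEURL plus QUERY structure can be sketched in a few lines. This is an illustration under stated assumptions, not a conforming implementation: the resolver addresses, the DOI and the ISSN are hypothetical placeholders, and the key names (id, genre, issn, volume, spage) reflect the draft OpenURL syntax as best recalled and should be checked against the specification.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def make_openurl(base_url: str, **query: str) -> str:
    """Join a user's resolver address (BASEURL) to a description of the target (QUERY)."""
    return f"{base_url}?{urlencode(query)}"


# Hypothetical resolvers for two different institutions.
RESOLVER_A = "http://resolver.univ-a.example/menu"
RESOLVER_B = "http://resolver.univ-b.example/menu"

# QUERY as an identifier (hypothetical DOI).
by_identifier = make_openurl(RESOLVER_A, id="doi:10.1000/example")

# QUERY as metadata derived from an authored reference (placeholder values).
by_metadata = make_openurl(
    RESOLVER_A, genre="article", issn="1234-5679", volume="7", spage="1"
)

# The same reference sent to another institution's resolver: the QUERY is
# identical, only the BASEURL changes, so each user can be routed to the
# copy that their own library has licensed.
same_reference_elsewhere = make_openurl(RESOLVER_B, id="doi:10.1000/example")
```

Because the description of the target is independent of the resolver, the same link can serve users at any institution, which is how OpenURL addresses the appropriate copy problem.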

Example OpenURL architecture OpenURLs might be based on CrossRef–DOI services (from Beit-Arie et al., 2001, D-Lib Magazine, September) http://www.dlib.org/dlib/september01/caplan/09caplan.html

The Open Archives Initiative (OAI)
The OAI (http://www.openarchives.org/) defines a Metadata Harvesting Protocol (MHP), an application-independent interoperability framework that can be used by a variety of communities engaged in publishing content on the Web
Two classes of participants:
Data providers expose metadata about content
Service providers issue protocol requests to data providers
OAI is a very simple, low-barrier-to-entry interface, shifting implementation complexity and operational processing load away from the data repositories to the developers of federated search services, repository redistribution services, etc.
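The division of labour between the two classes of participant can be sketched from the service provider's side. The archive address is hypothetical and the sample response is hand-written and heavily simplified (a real response carries protocol-level wrapper elements and namespaces that vary by protocol version); the ListRecords verb, the oai_dc metadata prefix and the Dublin Core namespace are taken from the protocol and from Dublin Core respectively.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace, used by the oai_dc metadata format.
DC = "{http://purl.org/dc/elements/1.1/}"


def harvest_request(base_url: str, verb: str, **args: str) -> str:
    """Build the HTTP GET URL a service provider sends to a data provider."""
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


def titles_from_response(xml_text: str) -> list:
    """Collect every Dublin Core title found in a harvested response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{DC}title")]


# Hypothetical data provider address.
url = harvest_request(
    "http://archive.example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)

# Simplified stand-in for what a data provider might return.
SAMPLE_RESPONSE = """<ListRecords>
  <record>
    <metadata>
      <dc xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>A sample eprint</dc:title>
        <dc:creator>Author, A.</dc:creator>
      </dc>
    </metadata>
  </record>
</ListRecords>"""

titles = titles_from_response(SAMPLE_RESPONSE)
```

The data provider only has to answer simple HTTP requests with XML; indexing, merging and searching the harvested metadata is left to the service provider, which is the shift of complexity the slide describes.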

Digital library protocols: OAI MHP vs Z39.50? “We should not think about the world becoming partitioned between Z39.50-based resources and MHP-speaking resources, but rather about bridges and gateways. “A Z39.50-speaking server can fairly easily be made MHP-compliant, and I would expect to see the development of gateway or broker services that make Z39.50 servers available for open archives metadata harvesting in cases where the individual server operators do not want to undertake this development work.” Lynch (2001) ARL Bimonthly Report, No. 217, August http://www.arl.org/newsltr/217/mhp.html

OAI service providers: an example The Open Citation project: interposing an OAI service provider between document source and user interface

Creating information interfaces Portals have become important interfaces in the scholarly environment. Portal strategies by publishers (e.g. Elsevier’s ScienceDirect) by associated networked information services (e.g. Ingenta), by library resource discovery networks (e.g. JISC’s RDN) have yet to establish a pre-eminent model. This is because all have concentrated on content, mostly owned content. The best next-generation portals will build services on top of content, and for researchers will become the starting point for all lines of enquiry.

Information interfaces: RDN example JISC RDN is a good example of building on content to provide new services and adaptable interfaces. The individual subject networks, in medicine, engineering, humanities and others, can be searched as though they were one unified repository, and an interface presenting users with this search facility can be embedded in any library Web page. Guiding the implementation of these services is the JISC Information Environment (from Powell and Lyon 2001) http://www.ukoln.ac.uk/distributed-systems/dner/arch/dner-arch.html

Access and interfaces: implications for journals Digital information, rich in media and resources, formal and informal, mediated by multiple services, presents the user with an array of choices that might answer his or her queries most efficiently. Those queries might be expressed as input to a search engine, or by selecting a link. Where might these citations come from? Personal emails, discussion lists, open access services such as OAI, eprint archives, newsletters, library services, Z-gateways and academic subject portals, as well as formal research papers and commercial indexing services. There will be many more. The journal package has traditionally been bound in issues and volumes. With the advent of multiple networked sources mediated by services such as OpenURL, the binding has been unstitched.

What are digital journals for?
Journals will be scaled back to the single essential function of quality control, in the form of managed peer review.
Access to journal contents will be mediated by multiple interfaces (open access services, portals and information interfaces), not just by the journal.
Journals cannot remain the exclusive provider of peer-reviewed papers.

Post-Google Electronic journals exist in a post-Gutenberg and a post-Google information environment By March 2001 the Internet Archive had stored 10 billion Web pages (100 terabytes of data) The ability to locate a specified item of information precisely and instantly among the mass of information available on the Web has profound implications. In the electronic environment the search engine has become the de facto interface to information, rather than the fragmented packages that have migrated from the print world.

Multiple cooperating services in the communication chain
[Diagram. FROM: documents on a server, delivered over http to the client’s user interface. TO: a chain mediated by OpenURL, OAI, JISC IE, site licenses, eprint archives, etc.]

Building eprint archives EPrints.org software for building institutional eprint archives for author self-archiving Version 2.0 February 2002 OAI-compliant Free open source software Developed at the Electronics and Computer Science Department, University of Southampton http://www.eprints.org/

A maximising strategy for authors Authors who self-archive their papers in OAI-compliant institutional or discipline-based eprint archives will Maximise interfaces to their work Maximise access to their work Maximise impact of their work

Maximising access: arXiv example Decreasing citation latencies: The latency of the citation peak has been reducing over the period of the archive, i.e. each year papers are cited sooner and more often

Maximising impact: arXiv example More highly cited papers show higher and more sustained download frequencies

Maximising interfaces
Measuring arXiv access and impact data: the Open Citation project has mined
usage data from selected arXiv mirror server logs
reference lists from 155,000+ arXiv papers
to build CiteBase, an open citation database.
CiteBase, a new interface to the refereed literature http://citebase.eprints.org
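How mined reference lists become a citation database can be illustrated with a toy inverted index. This is only a sketch of the general idea and is not a description of how CiteBase is actually built; the paper identifiers are hypothetical.

```python
from collections import defaultdict


def build_citation_index(reference_lists: dict) -> dict:
    """Invert 'paper -> papers it cites' into 'paper -> papers that cite it'."""
    cited_by = defaultdict(set)
    for citing, references in reference_lists.items():
        for cited in references:
            if cited != citing:  # ignore self-references in malformed data
                cited_by[cited].add(citing)
    return dict(cited_by)


def rank_by_citations(cited_by: dict) -> list:
    """Order papers by number of citing papers, most cited first."""
    return sorted(cited_by, key=lambda paper: (-len(cited_by[paper]), paper))


# Hypothetical reference lists extracted from three papers.
reference_lists = {
    "paper-A": ["paper-B", "paper-C"],
    "paper-B": ["paper-C"],
    "paper-D": ["paper-C", "paper-B"],
}

index = build_citation_index(reference_lists)
ranking = rank_by_citations(index)
```

Once references are inverted in this way, the same index can be combined with usage data, for example to compare how often a paper is downloaded with how often it is cited.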

Initiatives promoting open access to scholarly research papers
Budapest Open Access Initiative (BOAI), funded by George Soros’ Open Society Institute. Open access “gives readers extraordinary power to find and make use of relevant literature, and gives authors and their works vast and measurable new visibility, readership, and impact.” Launched February 2002; has received almost 1800 signatories to date http://www.soros.org/openaccess/read.shtml
Public Library of Science: scientists urge publishers to allow the research reports that have appeared in their journals to be distributed freely by independent, online public libraries of science. Open letter March 2001, received almost 30 000 signatories http://www.publiclibraryofscience.org/

“A dynamic digital archive” Scientists and researchers, Nobel Laureates among them, have produced the clearest declaration of their requirement for access to published research papers – a comprehensive collection that can be efficiently indexed, searched, and linked: “Unimpeded access to these archives and open distribution of their contents will enable researchers to take on the challenge of integrating and interconnecting the fantastically rich, but extremely fragmented and chaotic, scientific literature.” (Roberts et al. 2001)

Links and references
Beit-Arie et al.; Lynch; Powell; Gardner; Sanville; Frazier; Roberts; OAI; BOAI; OpenURL; EPrints.org; PLoS
A copy of these slides will be found on the OpCit site http://opcit.eprints.org/. Look for Papers and Presentations.
Contact Steve Hitchcock sh94r@ecs.soton.ac.uk
