DOIs and the Secondary Publisher; a match made in heaven?
Andrea Powell, Product Development Director, CABI Publishing
THE SCIENCE THE SEARCH THE SOLUTION www.cabi-publishing.org
It is a truth universally acknowledged…
…that a secondary database in possession of millions of bibliographic references, must be in want of a linking solution (with apologies to Jane Austen)
A bit about CABI Publishing
• First publication in 1912
• Applied life sciences publisher
• Database products at the heart of our publishing business (CAB Abstracts and Global Health)
• Primary journals and books now account for 30% of turnover
• Total turnover approx. £12 million
Some facts and figures
• CAB Abstracts (1973-2004) contains 4.5 million bibliographic references
• Our Archive (1912-1972) adds a further 2.2 million references
• Our acquisitions database lists 9000 active publishers from whom we receive content
• We receive about 7500 serials in any one year, from over 125 countries in over 50 languages
Oh, and not forgetting…
…we also cover books, conference proceedings, technical bulletins, “grey literature”, websites, annual reports, theses… (approx. 18% of total)
So what do we do?
• Create a consistently indexed, standardised, searchable database to enable the discovery of this rich content
• And then link the user to the full text as seamlessly as possible
DOIs and CrossRef - a heaven-sent solution?
• Universal, multi-publisher protocol
• Cost-effective, although there were early concerns about escalating look-up fees
• Hurray!
Adding DOIs to the database
• Creation of a new field within the Production Database
• Development and implementation of new workflows to collect DOIs at the most appropriate stage of our process
• Matching our serials list against the CrossRef Metadata Database
Looking up DOIs - the early days
• In early 2002, we were able to achieve 4% matching rates (ought to have been 18%)
• Reasons for poor match rate:
  - timing of deposits
  - poor quality data
  - rigid matching algorithm
  - mismatch between our records and retrieved metadata
Our DOI look-up and implementation process
• Two methods:
  - weekly look-up
  - twice-yearly batch look-up
Weekly look-up
• Automated system built into our weekly mechanism for transferring records from our production system to our live database
• Manual option to re-run this stage is also available if necessary
• Records with an ISSN but no DOI value are selected and extracted into a processing list
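The selection step in the last bullet could be sketched roughly as follows. This is a minimal illustration, not CABI's production code: the `Record` class, its field names and the sample values are hypothetical stand-ins for whatever the production database actually holds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a production database record."""
    pan: str                    # CABI unique identifier (PAN)
    issn: Optional[str] = None
    doi: Optional[str] = None


def select_for_lookup(records: List[Record]) -> List[Record]:
    """Keep records that have an ISSN but no DOI value yet."""
    return [r for r in records if r.issn and not r.doi]


# Invented sample records for illustration only.
records = [
    Record(pan="A1", issn="1234-5678"),
    Record(pan="A2", issn="1234-5678", doi="10.1000/example"),
    Record(pan="A3"),  # no ISSN, e.g. a book or report
]
processing_list = select_for_lookup(records)
print([r.pan for r in processing_list])
```

Only the first sample record qualifies: the second already has a DOI and the third has no ISSN.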
Weekly look-up
• Each field is processed to replace CABI-specific formatting with URL-safe coding
• A single query string is constructed from the data of 50 records
• Each new query is added to the string, separated from its neighbour by a URL-encoded line feed (“%0A”)
• Approximately 3800 look-ups per week
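As a rough sketch of the encoding and batching described above: the function names are invented for illustration, the sample values are made up, and generic percent-encoding stands in for the CABI-specific clean-up, whose details the slides do not give. The pipe separator is assumed from the piped format used for the query.

```python
from typing import Iterator, List
from urllib.parse import quote

BATCH_SIZE = 50  # records per query string, as stated on the slide


def encode_query(fields: List[str]) -> str:
    """Percent-encode each field and join the fields with literal pipes."""
    return "|".join(quote(field, safe="") for field in fields)


def build_query_string(queries: List[List[str]]) -> str:
    """Join encoded queries with a URL-encoded line feed."""
    return "%0A".join(encode_query(q) for q in queries)


def in_batches(queries: List[List[str]], size: int = BATCH_SIZE) -> Iterator[List[List[str]]]:
    """Yield consecutive groups of at most `size` queries."""
    for start in range(0, len(queries), size):
        yield queries[start:start + size]


# Invented sample data: nine fields per record, with the eighth left empty.
sample = [
    ["1234-5678", "", "Smith J", "12", "3", "45", "2003", "", "PAN001"],
    ["8765-4321", "", "O'Neil & Co", "7", "1", "9", "2004", "", "PAN002"],
]
print(build_query_string(sample))
```

At roughly 3800 look-ups per week and 50 records per string, this works out to about 76 requests per week.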
Weekly look-up
• We use a piped format: SN|DO|AU|VL|NO|PP|YR||PA* (*PA is our unique identifier)
• Query string sent to CrossRef via web link:
"http://doi.crossref.org/servlet/query?&usr=cabi&pwd=crpw1683&type=q&area=live&fuzzy=true&format=piped&qdata=SN|DO|AU|VL|NO|PP|YR||PA|%0ASN|DO|AU|VL|NO|PP|YR||PA|"
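The URL assembly could look something like the sketch below. The parameter names and values are copied from the link on the slide. The function name, the placeholder credentials and the sample `qdata` value are assumptions for illustration. The snippet only builds the string and does not send a request, since the slide shows the service as it stood at the time.

```python
from urllib.parse import quote

BASE_URL = "http://doi.crossref.org/servlet/query"  # as shown on the slide


def build_lookup_url(qdata: str, usr: str, pwd: str) -> str:
    """Assemble the look-up URL; `qdata` must already be URL-safe."""
    params = [
        ("usr", quote(usr, safe="")),
        ("pwd", quote(pwd, safe="")),
        ("type", "q"),
        ("area", "live"),
        ("fuzzy", "true"),
        ("format", "piped"),
        ("qdata", qdata),  # pre-encoded; pipes and %0A are kept as-is
    ]
    return BASE_URL + "?" + "&".join(f"{key}={value}" for key, value in params)


url = build_lookup_url(
    "1234-5678||Smith%20J|12|3|45|2003||PAN001|",  # invented sample query
    usr="YOUR_USER",       # placeholder, not a real account
    pwd="YOUR_PASSWORD",   # placeholder, not a real password
)
print(url)
```

The query data is concatenated by hand and not passed through a generic form encoder, because it is already percent-encoded and a second pass would turn "%0A" into "%250A".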
DOI Assignment
• Web feed returned and converted into a text file, which is processed to extract individual queries
• Each query then processed to recover the PAN (unique ID) and DOI data
• PAN matched back to our database and DOI data embedded in record
• BINGO!
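A minimal sketch of this extraction step follows. It assumes that each returned line echoes the piped query with the DOI appended as a final field, and that unmatched queries come back with that field empty. The slides do not show the response layout, so the field positions, the sample response and the dictionary standing in for the database are all illustrative assumptions.

```python
from typing import Dict

PAN_INDEX = 8   # ninth field of SN|DO|AU|VL|NO|PP|YR||PA
DOI_INDEX = 9   # assumed: DOI appended after the echoed query


def extract_dois(response_text: str) -> Dict[str, str]:
    """Map PAN -> DOI for every response line that carries a DOI."""
    matches: Dict[str, str] = {}
    for line in response_text.splitlines():
        fields = line.strip().split("|")
        if len(fields) <= DOI_INDEX:
            continue  # malformed or truncated line
        pan, doi = fields[PAN_INDEX], fields[DOI_INDEX]
        if pan and doi:
            matches[pan] = doi
    return matches


# Invented response text for illustration only.
response = (
    "1234-5678||Smith J|12|3|45|2003||PAN001|10.1000/example.1\n"
    "8765-4321||Jones A|7|1|9|2004||PAN002|\n"
)
database = {"PAN001": {"doi": None}, "PAN002": {"doi": None}}
for pan, doi in extract_dois(response).items():
    database[pan]["doi"] = doi  # embed the DOI in the matching record
print(database)
```

Carrying the PAN inside the query is what makes the match-back trivial: the identifier comes back with the result, so no fuzzy re-matching against the database is needed.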
Twice-yearly batch look-up
• Entirely manual process, using text files and e-mails
• Look-up process much the same, but date range added to selection process
• Piped query strings output in batches of 1000 and prefixed with a CABI e-mail address
• Each file of 1000 queries uploaded via CrossRef website
• Results returned via e-mail and processed to extract PAN and DOI
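The batching of query strings into upload files might be sketched as below. The slides say only that each batch is "prefixed with a CABI e-mail address", so placing the address on its own first line is an assumption. The function name, the example address and the generated queries are likewise invented for illustration.

```python
from typing import List

FILE_SIZE = 1000  # queries per uploaded file, as stated on the slide


def build_batch_files(queries: List[str], email: str, size: int = FILE_SIZE) -> List[str]:
    """Return the text of each batch file: e-mail line first, then the queries."""
    files = []
    for start in range(0, len(queries), size):
        chunk = queries[start:start + size]
        files.append("\n".join([email] + chunk) + "\n")
    return files


# Invented piped queries; a real run would use the selected backfile records.
queries = [f"1234-5678||Smith J|1|1|{n}|2003||PAN{n:04d}|" for n in range(2500)]
files = build_batch_files(queries, "lookups@example.org")
print(len(files), [len(text.splitlines()) - 1 for text in files])
```

With 2500 sample queries this yields three files, the last one partly filled.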
Looking up DOIs - these days
• Now consistently achieving 25-30% matching rate
• Backfile look-ups are even better, at 40%
• But how frequently should we add DOIs to our backfile - is twice a year enough?
• Not yet querying for Books or Conferences, but plan to soon
Getting DOIs to the customer
• A&I databases are typically delivered via a number of third parties, e.g. Ovid, ISI, EBSCO, Dialog…
• It’s taken until late 2004 for some vendors to implement DOIs in our database
• Not all vendors use DOIs for linkage, preferring their proprietary systems
Other ways of linking to full-text
• 40% is good, but that still leaves a lot of unmatched references!
• User demand is for more and more full-text linkage - “good enough” generation won’t pursue non-linking items
• Customers can tailor their own links with Link Resolvers
• CAB Direct provides a default linking solution for subscribers without a Link Resolver
Digital Archives
• Many primary and secondary publishers now digitising their archives
• CAB Abstracts archive adds 2.2 million references, back to 1912
• Full-text linking more difficult with incomplete references, no ISSNs (pre-database era), lack of digital originals
• Issue of timing again writ large!
The bigger picture
• Researchers still use secondary databases heavily in their resource discovery processes
• The amount of material to be indexed increases year by year
• Secondary databases have to keep pace with changes in scholarly communication
• We must put our content where the users are, not the other way round
Thank you
Andrea Powell
a.powell@cabi.org