Reference Linking in Project Euclid
…with some thoughts on the preservation of digital collections.
A presentation at the Workshop on Linking and searching in distributed digital libraries, University of Michigan, Ann Arbor, University Library, March 19, 2002
William R. Kehoe, wrk1@cornell.edu
Digital Library and Information Technologies, Cornell University Library
Overview
• Context – what is Project Euclid?
• Requirements – the constraints for the reference linking system
• Implementation – some design views
• Next Steps – our plans for the future
• Preservation – thinking long-term about digital collections
William R. Kehoe, Digital Library and Information Technology, Cornell University Library
What is Project Euclid?
• A partnership of independent publishers of mathematics and statistics journals
• Publishers provide born-digital versions of their print journals.
• http://projecteuclid.org
Reference Linking: two viewpoints
• The publisher’s point of view
  • Links to multiple resources add value to the electronic version.
  • MR numbers, CrossRef DOIs, and web links are included in the reference when we find them.
• The library’s point of view
  • The appropriate-copy problem: does a link lead to a copy for which the library has viewing/distribution rights?
  • Is the copy an authentic representation of the original?
• Project Euclid represents publishers.
Purpose
References in article files are made available as links on HTML abstract pages.
[Diagram: <<PDF>> to <<HTML>>]
Requirements
• Automatic processing
• Extensibility to multiple reference styles
• Extensibility to multiple input formats
• Low-cost maintenance
• High accuracy
Implementation
[Diagram: a <<PDF>> article (title, author and affiliation, abstract, body, references) becomes an <<XML>> record with the same parts. Stages shown: Conversion, Extraction, Parsing, Look-up, Creating Links, Storing]
Conversion
The converter is Derek Noonburg’s “pdftotext” utility. http://www.foolabs.com/xpdf/home.html
[Diagram: <<PDF>> to Converter to <<Text>>; both files contain title, author and affiliation, abstract, body, references]
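As a rough illustration of the conversion step, the sketch below drives pdftotext from Python. It is not the project's code: the function names are hypothetical, and it assumes only the basic `pdftotext input.pdf output.txt` invocation with the utility available on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def pdftotext_command(pdf_path: str, text_path: str) -> List[str]:
    """Build the argument list for: pdftotext <input.pdf> <output.txt>."""
    return ["pdftotext", pdf_path, text_path]


def convert(pdf_path: str) -> Path:
    """Convert one PDF to a .txt file beside it (needs pdftotext installed)."""
    text_path = Path(pdf_path).with_suffix(".txt")
    # check=True raises CalledProcessError if the converter fails.
    subprocess.run(pdftotext_command(pdf_path, str(text_path)), check=True)
    return text_path
```

Keeping the command construction separate from the call makes it possible to check the arguments without running the converter.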
[Figure: Conversion/Extraction activity diagram]
Extraction
A fragment of the Perl module that extracts the references from the text version of an article
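The Perl fragment itself did not survive in this transcript. To give a feel for the step, here is a minimal Python sketch of one way to pull a reference list out of converted text. It is an illustration only, and it assumes that the list follows a line reading "References" (or "Bibliography") and that each entry starts with a bracketed label such as [Ar]; real journal styles vary.

```python
import re
from typing import List

# Assumed conventions, not taken from the original module.
HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE)
LABEL = re.compile(r"^\s*\[[^\]]+\]\s+")


def extract_references(text: str) -> List[str]:
    """Return the entries that follow the last References heading."""
    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines) if HEADING.match(line)]
    if not starts:
        return []
    entries: List[str] = []
    for line in lines[starts[-1] + 1:]:
        if not line.strip():
            continue  # skip blank lines between entries
        if LABEL.match(line) or not entries:
            entries.append(line.strip())
        else:
            # Continuation line: join it to the entry in progress.
            entries[-1] += " " + line.strip()
    return entries
```

Using the last heading rather than the first guards against the word "References" appearing on its own line earlier in the body.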
Parsing
Method Factory: getMRNum(), getYear(), getDOI(), getTitle(), getJournal(), … more …
Object view: Reference, with elements MRNum, LinkedString, String, Year, DOI, Title, Journal
Parsing
Each element of a Reference is extracted by a subroutine customized for how the element appears in a particular journal style.
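One way to picture the per-style subroutines is a lookup table keyed by journal style and element name. The Python sketch below is an assumption-laden illustration, not the original Perl factory: the style names `style_a` and `style_b` and both year patterns are invented for the example.

```python
import re
from typing import Callable, Dict, Optional

Extractor = Callable[[str], Optional[str]]


def year_in_parentheses(ref: str) -> Optional[str]:
    """Style where the year appears as '(1989)'."""
    match = re.search(r"\((\d{4})\)", ref)
    return match.group(1) if match else None


def year_at_end(ref: str) -> Optional[str]:
    """Style where the year is the last standalone four-digit number."""
    found = re.findall(r"\b(\d{4})\b", ref)
    return found[-1] if found else None


# Hypothetical style names; a real table would have one entry per journal
# style and one extractor per element (year, title, journal, ...).
FACTORY: Dict[str, Dict[str, Extractor]] = {
    "style_a": {"year": year_in_parentheses},
    "style_b": {"year": year_at_end},
}


def get_extractor(style: str, element: str) -> Extractor:
    """Look up the extractor for one element in one journal style."""
    return FACTORY[style][element]
```

Supporting a new reference style then means adding one table entry rather than touching the calling code, which fits the extensibility requirement stated earlier.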
Look-up
Query
• |IEEE Trans. Automat. Control|chang||||1994||||Stability, queue length and delay of deterministic and stochastic queue
• |SIAM J. Control Optim.|Dupuis||||1989||||
Result set
• 0018-9286|IEEE Trans. Automat. Control|Chang|39|5|913|1994|||95b:90029|Stability, queue length, and delay of deterministic and stochastic queueing networks.
• |SIAM J. Control Optim.|Dupuis||||1989||||
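The query and result lines share one pipe-delimited layout of eleven fields. The Python sketch below builds and parses such lines. The field names are inferred from the single populated result on this slide and are assumptions; positions 8 and 9 are empty in every example, so they are left as unknowns rather than guessed.

```python
from typing import Dict, List

# Field order inferred from the populated result line; names are assumptions.
FIELDS: List[str] = [
    "issn", "journal", "author", "volume", "issue", "page",
    "year", "unknown_8", "unknown_9", "mr_number", "title",
]


def build_query(**known: str) -> str:
    """Join the known fields into one line, leaving the others empty."""
    unexpected = set(known) - set(FIELDS)
    if unexpected:
        raise KeyError("unknown field(s): " + ", ".join(sorted(unexpected)))
    return "|".join(known.get(name, "") for name in FIELDS)


def parse_result(line: str) -> Dict[str, str]:
    """Split one result line into a field-name to value mapping."""
    parts = line.split("|", len(FIELDS) - 1)
    parts += [""] * (len(FIELDS) - len(parts))  # pad short lines
    return dict(zip(FIELDS, parts))
```

A result whose `mr_number` field is still empty, like the second line above, signals that the look-up found no match and no link can be created.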
Link Creation
An HTML anchor tag is inserted into the reference string and saved to an XML file. The User Interface module later uses the linkedString element when creating an Article Abstract page on the fly. It doesn’t have to know how to create the link.
• <string>[Ar] V. ARNOLD, A-graded algebras and continued fractions, Comm. Pure Appl. Math. 42 (1989), 993–1000.</string>
• <linkedString>[Ar] V. ARNOLD, A-graded algebras and continued fractions, Comm. Pure Appl. Math. 42 (1989), 993–1000. <a href="http://www.ams.org/mathscinet-getitem?mr=90h:32025" target="_blank">MR 90h:32025</a></linkedString>
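The link-building step can be sketched in a few lines of Python. This is an illustration rather than the project's code: the function name is hypothetical, and the URL prefix and anchor layout are copied from the example on this slide.

```python
from html import escape

# URL prefix as it appears in the slide's example.
MATHSCINET_URL = "http://www.ams.org/mathscinet-getitem?mr="


def linked_string(ref_string: str, mr_number: str) -> str:
    """Append an MR anchor tag to a plain reference string."""
    url = escape(MATHSCINET_URL + mr_number, quote=True)
    anchor = '<a href="%s" target="_blank">MR %s</a>' % (url, escape(mr_number))
    return ref_string + " " + anchor
```

Because the finished anchor is stored with the reference, the page-rendering code only needs to print the string, which is the separation of concerns the slide describes.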
Storing
Stored as an XML file:
<referenceList>
  <reference>
    <refString></refString>
    <linkedString></linkedString>
    <title></title>
    <journal></journal>
    … more elements …
  </reference>
  <reference>
    … elements …
  </reference>
</referenceList>
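A file with this shape can be produced with Python's standard `xml.etree.ElementTree` module, as in the sketch below. It covers only the four elements named on the slide, and the function name is hypothetical. Note one assumption: ElementTree escapes the anchor tag inside linkedString as text, and the slides do not say whether the original files stored it that way or as nested markup.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

# Only the elements named on the slide; the real files hold more.
ELEMENTS = ("refString", "linkedString", "title", "journal")


def build_reference_list(references: Iterable[Dict[str, str]]) -> ET.Element:
    """Build a <referenceList> tree with one <reference> per input dict."""
    root = ET.Element("referenceList")
    for ref in references:
        node = ET.SubElement(root, "reference")
        for name in ELEMENTS:
            ET.SubElement(node, name).text = ref.get(name, "")
    return root


def to_xml(root: ET.Element) -> str:
    """Serialize the tree to a string ready to be written to a file."""
    return ET.tostring(root, encoding="unicode")
```

Reading the file back with `ET.fromstring` returns the linkedString text with the anchor intact, so the display code can insert it into the page as is.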
Display
An element in an XML file provides an HTML link on the article’s abstract page, which links to a MathSciNet page.
Next Steps
• More journals
• Adding DOIs to the abstract page
• Conversion from LaTeX files
• Digitized back issues
Addendum on Digital Preservation
• Libraries and others are considering ways to preserve our digital resources for the long term.
• One possible solution is the LOCKSS system (Lots of Copies Keep Stuff Safe).
• Another solution is to preserve the metadata needed to describe and reconstruct a collection while preserving and providing access to the data files.
The Consultative Committee for Space Data Systems has published a Reference Model for an Open Archival Information System (OAIS). Many people working with digital collections in libraries and archives are using this model to plan for long-term preservation.
Archival Information Package
From the Reference Model for an Open Archival Information System (OAIS)
[Diagram: an Archival Information Package comprises Content Information (a Data Object, here a <<file>> Digital Object, and its Representation Information) and Preservation Description Information (Reference Information, Context Information, Provenance Information, Fixity Information)]
Most digital collections contain some form of the objects in blue. OAIS-compliant systems also contain the metadata objects in yellow.
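The package structure in the diagram can be pictured as a small set of record types. The Python sketch below is an illustration only. The class and field names follow the OAIS terms on the slide, but storing each kind of information as a plain string and using a SHA-256 digest as the fixity value are assumptions made for the example; the reference model does not prescribe either.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContentInformation:
    data_object: bytes                 # the digital object (file contents)
    representation_information: str    # e.g. the file format description


@dataclass
class PreservationDescriptionInformation:
    reference: str    # identifier for the content
    context: str      # relation to other material
    provenance: str   # origin and custody history
    fixity: str       # integrity check value


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation


def sha256_fixity(data: bytes) -> str:
    """Assumed fixity scheme: a labelled SHA-256 hex digest."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def fixity_ok(aip: ArchivalInformationPackage) -> bool:
    """True when the stored fixity value still matches the data object."""
    return sha256_fixity(aip.content.data_object) == aip.pdi.fixity
```

A periodic `fixity_ok` check is one simple use of the preservation metadata: it detects silent corruption of the stored file.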
[Figure: OAIS Functional Model, from the Reference Model for an Open Archival Information System (OAIS)]
For More Information…
• Project Euclid: http://projecteuclid.org
• MR Batch Lookup: http://www.ams.org/mrlookup-support/technical_help.html#http
• Consultative Committee for Space Data Systems: http://www.ccsds.org
• Reference Model for an Open Archival Information System (OAIS): http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
• LOCKSS: http://lockss.stanford.edu