Implementing Reference Linking in PROLA
Mark Doyle
Manager, Product Development
The American Physical Society
http://prola.aps.org/
CrossRef - Boston, MA
The American Physical Society
• 40,000+ members
• Founded in 1898
• Mission: “diffusion and advancement of knowledge of physics”
• Publisher of Physical Review journals and Reviews of Modern Physics
• 14,500 articles per year (100,000 pages per year)
What is PROLA?
• Physical Review Online Archive
• Covers all APS journals from 1893 to the present, but currently only 1893-1998 is available
• Separate subscription from the current-content journals
• One more year is “migrated” into the archive each year
• The APS corpus is 330,000 articles
The Basic Problem
• References in an article’s bibliography need to be linked to the full text of the cited articles
• Citation metadata given: author, journal, volume, page (or other enumeration)
• Identify the metadata, query linking partners, store the results, create links for end users
• Keep links up to date, keep the system robust and fast, keep costs low
Three General Approaches
• Static: query for links at the time of publication, create a static HTML file with the appropriate links, and serve that file
• Dynamic: store linking information in a live database that is queried when the user requests the web page
• Semi-dynamic: pre-query links, update them periodically, and generate the HTML with links dynamically
Semi-Dynamic Approach
• Lower investment in database technology
• Lower costs to mirror
• Fast for the user
• High availability
• Scales well with usage
APS Process Overview
XML File
<references>
  ….
  <citation cid="C3">
    <ref><article><refauth>J. J. Boland</refauth>, <journal>Phys. Rev. Lett.</journal> <volume>67</volume>, <pages>1539</pages> (<date>1991</date>);</article></ref>
    <ref abbrev="prevau"><article><refauth>J. J. Boland</refauth> , <journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, <pages>2458</pages> (<date>1992</date>).</article></ref>
  </citation>
  …..
Process Overview
Parse XML Bibliographic Record
• Parse the XML-tagged references
• The citing article’s DOI suffix becomes the primary key
• Journal, volume, and page information becomes a reference ID (J. Vac. Sci. Technol. A 10, 2458 is mapped to JVacSciTechnolA.10.2458)
• One table holds the DOI, reference ID, citation number, and reference number
• A second table holds the article metadata used in the querying process
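The parsing and ID-mapping step can be sketched in Python with the standard library. This is a minimal sketch, not the PROLA code: the ID rule (strip periods and spaces from the journal abbreviation, then join journal, volume, and first page with dots) is inferred from the two examples in these slides, numbering references by their position inside a citation is an assumption, and `make_ref_id` and `extract_references` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# The citation markup from the "XML File" slide, wrapped in a root element
# so that it is well-formed on its own.
SAMPLE = """<references>
<citation cid="C3"><ref><article><refauth>J. J. Boland</refauth>,
<journal>Phys. Rev. Lett.</journal> <volume>67</volume>, <pages>1539</pages>
(<date>1991</date>);</article></ref>
<ref abbrev="prevau"><article><refauth>J. J. Boland</refauth>,
<journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>,
<pages>2458</pages> (<date>1992</date>).</article></ref></citation>
</references>"""


def make_ref_id(journal: str, volume: str, pages: str) -> str:
    """Build a reference ID such as 'JVacSciTechnolA.10.2458'.

    Assumed rule: drop periods and whitespace from the journal abbreviation,
    then join it with the volume and the first page using '.'.
    """
    abbrev = re.sub(r"[.\s]+", "", journal)
    first_page = pages.split("-")[0].strip()  # keep only the first page of a range
    return f"{abbrev}.{volume.strip()}.{first_page}"


def extract_references(xml_text: str) -> list:
    """Return (citation number, reference number, reference ID) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for citation in root.iter("citation"):
        citation_num = int(citation.get("cid").lstrip("C"))  # 'C3' -> 3
        # A citation may bundle several references; number them from 1.
        for ref_num, ref in enumerate(citation.findall("ref"), start=1):
            article = ref.find("article")
            if article is None:
                continue  # not a journal article (e.g. a book): no reference ID
            journal = article.findtext("journal")
            volume = article.findtext("volume")
            pages = article.findtext("pages")
            if not (journal and volume and pages):
                continue  # incomplete metadata: cannot build an ID
            rows.append((citation_num, ref_num, make_ref_id(journal, volume, pages)))
    return rows
```

On the sample, `extract_references(SAMPLE)` returns `(3, 1, "PhysRevLett.67.1539")` and `(3, 2, "JVacSciTechnolA.10.2458")`; the second tuple lines up with the `cid="3" rid="2"` entry in the XML linking file.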
Database Schema
• ARTICLES (Phys. Rev. DOI, citation number, reference number, reference ID)
• ARTICLE_DATA (ref_id, first author, journal, volume, issue, enumeration, year)
• ARTICLE_LINKS (ref_id, link type, link data)
• QUERY_DATES (ref_id, link type, query date)
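The four tables could be declared roughly as follows. This sketch uses SQLite through Python’s built-in `sqlite3` module only so that it runs anywhere; the production database is not named here. Column names, types, and primary keys are assumptions, and `open_db` is a hypothetical helper.

```python
import sqlite3

# Column names, types, and keys are assumptions; the slide lists only the fields.
SCHEMA = """
CREATE TABLE articles (
    doi_suffix   TEXT    NOT NULL,  -- DOI suffix of the citing Phys. Rev. article
    citation_num INTEGER NOT NULL,  -- number of the citation in the bibliography
    ref_num      INTEGER NOT NULL,  -- position of the reference inside that citation
    ref_id       TEXT    NOT NULL,  -- e.g. 'JVacSciTechnolA.10.2458'
    PRIMARY KEY (doi_suffix, citation_num, ref_num)
);
CREATE TABLE article_data (
    ref_id       TEXT PRIMARY KEY,  -- one metadata row per unique cited work
    first_author TEXT,
    journal      TEXT,
    volume       TEXT,
    issue        TEXT,
    enumeration  TEXT,              -- page or other article number
    year         INTEGER
);
CREATE TABLE article_links (
    ref_id    TEXT NOT NULL,
    link_type TEXT NOT NULL,        -- XREF, ADS, CAS, SPIN, INSPEC, ...
    link_data TEXT NOT NULL,        -- e.g. the DOI for XREF
    PRIMARY KEY (ref_id, link_type)
);
CREATE TABLE query_dates (
    ref_id     TEXT NOT NULL,
    link_type  TEXT NOT NULL,
    query_date TEXT NOT NULL,       -- ISO 8601 date of the last unmatched query
    PRIMARY KEY (ref_id, link_type)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the four tables in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.execute(
    "INSERT INTO article_data VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("JVacSciTechnolA.10.2458", "J. J. Boland", "J. Vac. Sci. Technol. A", "10", None, "2458", 1992),
)
conn.executemany(
    "INSERT INTO article_links VALUES (?, ?, ?)",
    [
        ("JVacSciTechnolA.10.2458", "XREF", "10.1116/1.577984"),
        ("JVacSciTechnolA.10.2458", "ADS", "1992JVST...10.2458B"),
    ],
)
# Same lookup as the "Links in the Database" slide, with a bound parameter.
links = conn.execute(
    "SELECT link_type, link_data FROM article_links WHERE ref_id = ? ORDER BY link_type",
    ("JVacSciTechnolA.10.2458",),
).fetchall()
```

Keying `ARTICLE_DATA` and `ARTICLE_LINKS` on `ref_id` rather than on the citing article means a work cited by many Phys. Rev. articles is stored and queried only once.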
Query CrossRef and others
• Nightly query of CrossRef for new references that don’t yet have a DOI
• Batches are tracked in a Scheduler application
• A table tracks the link source (XREF, ADS, CAS, SPIN, INSPEC) and linking data (the DOI for XREF) for each reference ID
• A query-dates table records when we last queried a reference that didn’t match
• Queries that haven’t matched are rerun periodically
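The nightly selection and the periodic re-query rule can be expressed against `ARTICLE_LINKS` and `QUERY_DATES`. In this sketch a reference is due if it has no link of the requested type and was not queried within the last `retry_days` days. The 30-day interval, the function names, the stripped-down tables, and the sample IDs are all hypothetical, and the actual request to CrossRef is deliberately left out.

```python
import sqlite3
from datetime import date, timedelta

# Stripped-down tables with only the columns this sketch needs.
SCHEMA = """
CREATE TABLE article_data  (ref_id TEXT PRIMARY KEY);
CREATE TABLE article_links (ref_id TEXT, link_type TEXT, link_data TEXT,
                            PRIMARY KEY (ref_id, link_type));
CREATE TABLE query_dates   (ref_id TEXT, link_type TEXT, query_date TEXT,
                            PRIMARY KEY (ref_id, link_type));
"""


def refs_due_for_query(conn, link_type, today, retry_days=30):
    """Reference IDs with no link of this type that were not queried recently.

    retry_days is a hypothetical re-query interval.
    """
    cutoff = (today - timedelta(days=retry_days)).isoformat()
    sql = """
        SELECT d.ref_id FROM article_data AS d
        WHERE NOT EXISTS (SELECT 1 FROM article_links AS l
                          WHERE l.ref_id = d.ref_id AND l.link_type = ?)
          AND NOT EXISTS (SELECT 1 FROM query_dates AS q
                          WHERE q.ref_id = d.ref_id AND q.link_type = ?
                            AND q.query_date > ?)
        ORDER BY d.ref_id
    """
    # ISO 8601 date strings compare correctly as text.
    return [row[0] for row in conn.execute(sql, (link_type, link_type, cutoff))]


def record_results(conn, link_type, today, queried, matches):
    """Store matches as links; stamp unmatched IDs with today's date."""
    for ref_id in queried:
        if ref_id in matches:
            conn.execute("INSERT OR REPLACE INTO article_links VALUES (?, ?, ?)",
                         (ref_id, link_type, matches[ref_id]))
            conn.execute("DELETE FROM query_dates WHERE ref_id = ? AND link_type = ?",
                         (ref_id, link_type))
        else:
            conn.execute("INSERT OR REPLACE INTO query_dates VALUES (?, ?, ?)",
                         (ref_id, link_type, today.isoformat()))
    conn.commit()


# Hypothetical data: A already has a DOI, B was tried 5 days ago,
# C was never tried, D was tried 40 days ago.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO article_data VALUES (?)", [("A",), ("B",), ("C",), ("D",)])
conn.execute("INSERT INTO article_links VALUES ('A', 'XREF', 'hypothetical-doi-A')")
conn.executemany("INSERT INTO query_dates VALUES (?, 'XREF', ?)",
                 [("B", "2002-05-27"), ("D", "2002-04-22")])
today = date(2002, 6, 1)
batch = refs_due_for_query(conn, "XREF", today)
```

Here `batch` is `["C", "D"]`. After `record_results`, a matched reference gets a row in `article_links` and leaves `query_dates`, while an unmatched one is stamped with today’s date and drops out of the nightly batch until the interval expires.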
Links in the Database
SQL> select link_type,link_data from article_links where ref_id='JVacSciTechnolA.10.2458';

LINK_TYPE LINK_DATA
--------- ------------------------------
XREF      10.1116/1.577984
INSPEC    JVTAD600001000000400245800000B
SPIN      JVTAD6000010000004002458000001
ADS       1992JVST...10.2458B
CAS       1:CAS:528:DyaK38XltlygtLg%3D
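To produce the links for one citing article, its references have to be joined to their stored links. A sketch, assuming simplified `articles` and `article_links` tables with hypothetical column names and a placeholder DOI suffix for the citing article; the link values are the ones shown in the query output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (doi_suffix TEXT, citation_num INTEGER, ref_num INTEGER, ref_id TEXT);
CREATE TABLE article_links (ref_id TEXT, link_type TEXT, link_data TEXT);
""")

CITING = "hypothetical-citing-article"  # placeholder, not a real DOI suffix
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (CITING, 3, 1, "PhysRevLett.67.1539"),
    (CITING, 3, 2, "JVacSciTechnolA.10.2458"),
])
conn.executemany("INSERT INTO article_links VALUES (?, ?, ?)", [
    ("JVacSciTechnolA.10.2458", "XREF", "10.1116/1.577984"),
    ("JVacSciTechnolA.10.2458", "INSPEC", "JVTAD600001000000400245800000B"),
])


def links_for_article(conn, doi_suffix):
    """All stored links for one citing article, in bibliography order."""
    return conn.execute("""
        SELECT a.citation_num, a.ref_num, a.ref_id, l.link_type, l.link_data
        FROM articles AS a
        JOIN article_links AS l ON l.ref_id = a.ref_id
        WHERE a.doi_suffix = ?
        ORDER BY a.citation_num, a.ref_num, l.link_type
    """, (doi_suffix,)).fetchall()


rows = links_for_article(conn, CITING)
```

Because this is an inner join, a reference with no stored links (here `PhysRevLett.67.1539`) yields no rows, so internal APS links are not produced by this query.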
Statistics
• 330,000 articles (1893-present)
• 6.4 million (journal) references
• 3 million Phys. Rev. references
• 1.4 million unique non-APS references
• 210,000 CrossRef links (1.8 million links total)
• Counting the APS references that are also in CrossRef, about 30% of our references are in CrossRef
Process Overview
XML Linking File
<?xml version="1.0"?>
<apslinks>
  <citlink cid="1" rid="1">
    <link ref_id="PhysRevLett.62.567" type="APS">PhysRevLett.62.567</link>
  </citlink>
  …
  <citlink cid="3" rid="2">
    <link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
    <link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
    ….
</apslinks>
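A file in this shape can be generated with `xml.etree.ElementTree`. This sketch assumes each input row carries (citation number, reference number, ref_id, link type, link data); `build_linking_file` is a hypothetical name, and the rows reuse the values shown in the file above.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def build_linking_file(rows):
    """Serialize (citation_num, ref_num, ref_id, link_type, link_data) rows
    into an <apslinks> document with one <citlink> per reference."""
    rows = sorted(rows, key=lambda r: (r[0], r[1]))  # stable: keeps link order
    root = ET.Element("apslinks")
    for (cid, rid), group in groupby(rows, key=lambda r: (r[0], r[1])):
        citlink = ET.SubElement(root, "citlink", cid=str(cid), rid=str(rid))
        for _cid, _rid, ref_id, link_type, link_data in group:
            link = ET.SubElement(citlink, "link", ref_id=ref_id, type=link_type)
            link.text = link_data
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


rows = [
    (1, 1, "PhysRevLett.62.567", "APS", "PhysRevLett.62.567"),
    (3, 2, "JVacSciTechnolA.10.2458", "XREF", "10.1116/1.577984"),
    (3, 2, "JVacSciTechnolA.10.2458", "INSPEC", "JVTAD600001000000400245800000B"),
]
print(build_linking_file(rows))
```

Sorting by (cid, rid) before `groupby` matters because `groupby` only merges adjacent rows; Python’s sort is stable, so links keep their input order within each reference.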
Process Overview
Rendered Links
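In the semi-dynamic approach the HTML links are generated when the page is requested, from link data that was queried ahead of time. A minimal sketch of that last step reads one reference’s entries from a linking file and turns them into anchors. Only the DOI resolver `https://doi.org/` is a real URL pattern; the labels, the `LINK_TEMPLATES` table, and `render_links` are assumptions, and link types without a template are skipped rather than guessed.

```python
import html
import xml.etree.ElementTree as ET

LINKING_FILE = """<?xml version="1.0"?>
<apslinks>
  <citlink cid="3" rid="2">
    <link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
    <link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
  </citlink>
</apslinks>"""

# Label and URL template per link type (hypothetical table). Only the DOI
# resolver is a real, stable pattern; other targets must be supplied.
LINK_TEMPLATES = {
    "XREF": ("CrossRef", "https://doi.org/{}"),
}


def render_links(linking_xml, cid, rid, templates=LINK_TEMPLATES):
    """Return HTML anchors for one reference, built at page-request time."""
    root = ET.fromstring(linking_xml)
    anchors = []
    for citlink in root.iter("citlink"):
        if citlink.get("cid") != str(cid) or citlink.get("rid") != str(rid):
            continue
        for link in citlink.findall("link"):
            entry = templates.get(link.get("type"))
            if entry is None:
                continue  # no template for this target: skip rather than guess
            label, pattern = entry
            url = pattern.format(link.text.strip())
            anchors.append(f'<a href="{html.escape(url)}">{html.escape(label)}</a>')
    return " | ".join(anchors)


print(render_links(LINKING_FILE, 3, 2))
```

Keeping URL patterns out of the stored link data means a target that changes its URL scheme needs only a template update, not a re-query of every reference.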
Conclusions
• Simple and pragmatic solutions work
• Marked-up content makes it all fit together (and obviates the need for extensive manual labor)
• Modest resources are needed to implement and maintain the system
• The scheme is easily extended to include other linking targets
Contact information
• http://prola.aps.org/
• doyle@aps.org