Implementing Reference Linking in PROLA



  1. Implementing Reference Linking in PROLA Mark Doyle Manager, Product Development The American Physical Society http://prola.aps.org/ CrossRef - Boston, MA

  2. The American Physical Society • 40,000+ members • Founded in 1898 • Mission: “diffusion and advancement of knowledge of physics” • Publisher of Physical Review journals and Reviews of Modern Physics • 14,500 articles per year (100,000 pages per year)

  3. What is PROLA? • Physical Review Online Archive • Covers all APS journals from 1893 to the present, but only 1893-1998 is available • Separate subscription from current-content journals • 1 year “migrated” each year • APS corpus is 330,000 articles

  4. The Basic Problem • References in an article’s bibliography need to be linked to the full-text article • Citation metadata given: author, journal, volume, page (or other enumeration) • Identify metadata, query linking partners, store results, create links for end users • Keep links up to date, keep system robust and fast, keep costs low

  5. Three General Approaches • Static - Query for links at time of publication, create a static HTML file with the appropriate links, serve that • Dynamic - Store linking information in a live database that is queried when the user requests the web page • Semi-dynamic - Pre-query links, update them periodically, generate HTML with links dynamically

  6. Semi-Dynamic Approach • Lower investment in database technology • Lower costs to mirror • Fast for the user • High availability • Scales well with usage

  7. APS Process Overview

  8. XML File
         <references>
         ….
         <citation cid="C3">
           <ref><article><refauth>J. J. Boland</refauth>, <journal>Phys. Rev. Lett.</journal> <volume>67</volume>, <pages>1539</pages> (<date>1991</date>);</article></ref>
           <ref abbrev="prevau"><article><refauth>J. J. Boland</refauth> , <journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, <pages>2458</pages> (<date>1992</date>).</article></ref>
         </citation>
         …..

  9. Process Overview

  10. Parse XML Bibliographic Record • Parse XML tagged references • Article’s DOI suffix becomes the primary key • Journal, volume, page information becomes a reference ID (J. Vac. Sci. Technol. A 10, 2458 gets mapped to JVacSciTechnolA.10.2458) • Table for DOI, reference id, citation number, reference number • Second table with article metadata for querying process.
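The parsing step on slide 10 can be sketched roughly as follows. This is only an illustration, not the APS code: the function names `make_ref_id` and `parse_references` are hypothetical, and the normalization rule (strip everything except letters and digits from the journal abbreviation) is an assumption inferred from the single mapping shown on the slide.

```python
import re
import xml.etree.ElementTree as ET

# Sample taken from the "XML File" slide (elided parts left out).
SAMPLE = """<references>
<citation cid="C3"><ref><article><refauth>J. J. Boland</refauth>, <journal>Phys. Rev. Lett.</journal> <volume>67</volume>, <pages>1539</pages> (<date>1991</date>);</article></ref> <ref abbrev="prevau"><article><refauth>J. J. Boland</refauth> , <journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, <pages>2458</pages> (<date>1992</date>).</article></ref></citation>
</references>"""


def make_ref_id(journal, volume, page):
    """Build a reference ID such as 'JVacSciTechnolA.10.2458'.

    Assumption: the journal key is the abbreviation with all
    non-alphanumeric characters removed.
    """
    key = re.sub(r"[^A-Za-z0-9]", "", journal)
    return f"{key}.{volume}.{page}"


def parse_references(xml_text):
    """Return one row per <ref>, numbered within its <citation>."""
    rows = []
    root = ET.fromstring(xml_text)
    for citation in root.iter("citation"):
        cid = citation.get("cid")
        for ref_number, ref in enumerate(citation.findall("ref"), start=1):
            article = ref.find("article")
            if article is None:
                continue  # not a journal article reference
            journal = article.findtext("journal", default="").strip()
            volume = article.findtext("volume", default="").strip()
            page = article.findtext("pages", default="").strip()
            if not (journal and volume and page):
                continue  # not enough metadata to build an ID
            rows.append({
                "cid": cid,
                "ref_number": ref_number,
                "ref_id": make_ref_id(journal, volume, page),
                "first_author": article.findtext("refauth", default="").strip(),
                "year": article.findtext("date", default="").strip(),
            })
    return rows


rows = parse_references(SAMPLE)
```

Each row carries the citation number, the reference number within that citation, and the derived reference ID, which is what the two tables described on the slide would need.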

  11. Database Schema • ARTICLES (Phys. Rev. DOI, citation number, reference number, reference id) • ARTICLE_DATA (ref_id, first author, journal, volume, issue, enumeration, year) • ARTICLE_LINKS (ref_id, link type, link data) • QUERY_DATES (ref_id, link type, query date).
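As a rough illustration of the four tables on slide 11, here is a minimal sketch using SQLite as a stand-in for whatever database APS actually runs. The column names, types, and key choices are assumptions; the slide gives only the field lists.

```python
import sqlite3

# Assumed column names and keys; only the field lists come from the slide.
SCHEMA = """
CREATE TABLE articles (
    doi              TEXT    NOT NULL,  -- citing Phys. Rev. article
    citation_number  INTEGER NOT NULL,
    reference_number INTEGER NOT NULL,
    ref_id           TEXT    NOT NULL,  -- e.g. JVacSciTechnolA.10.2458
    PRIMARY KEY (doi, citation_number, reference_number)
);
CREATE TABLE article_data (
    ref_id       TEXT PRIMARY KEY,
    first_author TEXT,
    journal      TEXT,
    volume       TEXT,
    issue        TEXT,
    enumeration  TEXT,  -- page or other enumeration
    year         INTEGER
);
CREATE TABLE article_links (
    ref_id    TEXT NOT NULL,
    link_type TEXT NOT NULL,  -- XREF, ADS, CAS, SPIN, INSPEC
    link_data TEXT NOT NULL,  -- e.g. the DOI for XREF
    PRIMARY KEY (ref_id, link_type)
);
CREATE TABLE query_dates (
    ref_id     TEXT NOT NULL,
    link_type  TEXT NOT NULL,
    query_date TEXT NOT NULL,  -- ISO date of the last unmatched query
    PRIMARY KEY (ref_id, link_type)
);
"""


def create_database(path=":memory:"):
    """Create the four tables in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def links_for(conn, ref_id):
    """Return (link_type, link_data) pairs for one reference ID."""
    cur = conn.execute(
        "SELECT link_type, link_data FROM article_links "
        "WHERE ref_id = ? ORDER BY link_type",
        (ref_id,),
    )
    return cur.fetchall()
```

Keying ARTICLE_LINKS on the reference ID rather than on the citing article means a cited paper is resolved once, however many APS articles cite it.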

  12. Query CrossRef and others • Nightly query of CrossRef for new references that don’t yet have a DOI • Track batches in a Scheduler application • ARTICLE_LINKS table tracks link source (XREF, ADS, CAS, SPIN, INSPEC) and linking data (DOI for XREF) for each reference ID • QUERY_DATES table tracks when we last ran a query that didn’t match • Periodically rerun queries that haven’t matched
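The selection logic behind the nightly and periodic queries might look something like the sketch below. The function name `refs_to_query`, its in-memory inputs, and the 30-day retry interval are all assumptions made for illustration; the slide says only that unmatched queries are rerun "periodically".

```python
from datetime import date, timedelta


def refs_to_query(ref_ids, links, query_dates, link_type="XREF",
                  today=None, retry_after_days=30):
    """Pick the reference IDs that should go into the next query batch.

    ref_ids     -- iterable of reference IDs known to the system
    links       -- set of (ref_id, link_type) pairs already resolved
    query_dates -- dict mapping (ref_id, link_type) to the date of the
                   last query that returned no match
    The 30-day default retry interval is an assumed value.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=retry_after_days)
    batch = []
    for ref_id in ref_ids:
        key = (ref_id, link_type)
        if key in links:
            continue  # already linked, nothing to do
        last_miss = query_dates.get(key)
        if last_miss is None or last_miss <= cutoff:
            # Never queried, or the last miss is old enough to retry.
            batch.append(ref_id)
    return batch
```

New references are picked up immediately because they have no entry in `query_dates`, while references that recently failed to match are left alone until the retry interval has passed.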

  13. Links in the Database
         SQL> select link_type,link_data from article_links where ref_id='JVacSciTechnolA.10.2458';

         LINK_TYPE LINK_DATA
         --------- ------------------------------
         XREF      10.1116/1.577984
         INSPEC    JVTAD600001000000400245800000B
         SPIN      JVTAD6000010000004002458000001
         ADS       1992JVST...10.2458B
         CAS       1:CAS:528:DyaK38XltlygtLg%3D

  14. Statistics • 330,000 articles (1893-present) • 6.4 million (journal) references • 3 million Phys. Rev. references • 1.4 million unique non-APS references • 210,000 CrossRef links (1.8 million links total) • Folding in the APS references which are also in CrossRef, about 30% of our references are in CrossRef

  15. Process Overview

  16. XML Linking File
         <?xml version="1.0"?>
         <apslinks>
         <citlink cid="1" rid="1">
           <link ref_id="PhysRevLett.62.567" type="APS">PhysRevLett.62.567</link></citlink>
         …
         <citlink cid="3" rid="2">
           <link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
           <link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
         ….</apslinks>
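To show how a linking file like this could drive the semi-dynamic step (generating HTML with links at request time from pre-queried data), here is a minimal sketch. The functions `load_links` and `render_links` are hypothetical, and so are the URL templates: resolving a DOI through `https://doi.org/` is standard practice, but the APS pattern below is a placeholder, and link types with no template are simply skipped.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote

# Sample based on the "XML Linking File" slide (elided parts left out).
LINKS_XML = """<?xml version="1.0"?>
<apslinks>
<citlink cid="1" rid="1"><link ref_id="PhysRevLett.62.567" type="APS">PhysRevLett.62.567</link></citlink>
<citlink cid="3" rid="2"><link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link><link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link></citlink>
</apslinks>"""

# Assumed URL patterns; the APS one is a placeholder, not a real endpoint.
URL_TEMPLATES = {
    "XREF": "https://doi.org/{}",
    "APS": "https://example.org/aps/{}",
}


def load_links(xml_text):
    """Map (cid, rid) to a list of (link_type, link_data) pairs."""
    table = {}
    for citlink in ET.fromstring(xml_text).iter("citlink"):
        key = (citlink.get("cid"), citlink.get("rid"))
        table[key] = [
            (link.get("type"), (link.text or "").strip())
            for link in citlink.findall("link")
        ]
    return table


def render_links(table, cid, rid):
    """Return HTML anchors for one reference, or '' if none are known."""
    parts = []
    for link_type, data in table.get((cid, rid), []):
        template = URL_TEMPLATES.get(link_type)
        if template is None:
            continue  # no URL pattern assumed for this link type
        url = template.format(quote(data, safe="/"))
        parts.append(f'<a href="{escape(url)}">{escape(link_type)}</a>')
    return " ".join(parts)


table = load_links(LINKS_XML)
html_for_ref = render_links(table, "3", "2")
```

Because the page is built from a small per-article file rather than a live database query, the web servers and mirrors need only the XML files, which fits the cost and availability points made for the semi-dynamic approach.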

  17. Process Overview

  18. Rendered Links

  19. Conclusions • Simple and pragmatic solutions work • Marked-up content makes it all fit together (obviates the need for extensive manual labor) • Modest resources are needed to implement and maintain the system • Scheme is easily expanded to include other linking targets

  20. Contact information • http://prola.aps.org/ • doyle@aps.org
