
Adam Chandler Cornell University Library

A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers. Adam Chandler, Cornell University Library. Cornell University Library Metadata Working Group Forum, 16 October 2009.


Presentation Transcript


  1. A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers. Adam Chandler, Cornell University Library. Cornell University Library Metadata Working Group Forum, 16 October 2009.

  2. OpenURL model

  3. OpenURL model cont.

     incoming OpenURL:
     http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/

     in our knowledge base?
     title: Library hi tech
     issn: 0737-8831
     start date: 19970101
     end date:

     link-to syntax for Emerald:
     http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
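The resolution step on this slide can be sketched in Perl: split the incoming key/encoded-value query string into elements, then substitute them into the link-to template. This is a minimal illustration, not the WebBridge implementation. The helper names (`kev_decode`, `parse_kev`, `fill_template`) and the mapping from `#@NAME#` placeholders to `rft.*` keys are assumptions inferred from the placeholder names, and the OpenURL is an abbreviated version of the one on the slide.

```perl
use strict;
use warnings;

# Decode one percent-encoded KEV component ("+" stands for a space).
sub kev_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split an OpenURL query string into key => [values]. Keys such as
# rft.au may repeat, so every key maps to a list of values.
sub parse_kev {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)$/s;
    my %kev;
    for my $pair (split /&/, $query // '') {
        next unless $pair =~ /^([^=]+)=(.*)$/s;
        push @{ $kev{ kev_decode($1) } }, kev_decode($2);
    }
    return \%kev;
}

# ASSUMPTION: this placeholder-to-key mapping is guessed from the names
# used in the Emerald link-to syntax; it is not taken from the product.
my %placeholder_key = (
    'ISSN-HYPHEN' => 'rft.issn',
    'DATE'        => 'rft.date',
    'VOLUME'      => 'rft.volume',
    'ISSUE'       => 'rft.issue',
    'SPAGE'       => 'rft.spage',
);

# Replace each #@NAME# placeholder with the first value of its element,
# or with an empty string when the element is missing.
sub fill_template {
    my ($template, $kev) = @_;
    $template =~ s{#\@([A-Z-]+)#}{
        my $key = $placeholder_key{$1};
        defined $key && $kev->{$key} ? $kev->{$key}[0] : ''
    }ge;
    return $template;
}

my $openurl = 'http://linkresolver.library.cornell.edu:4550/resserv?'
    . '&url_ver=z39.88-2004&rft.genre=article&rft.issn=0737-8831'
    . '&rft.date=2009&rft.volume=27&rft.issue=1&rft.spage=151'
    . '&rft.stitle=libr+hi+tech&rft.au=scholze,+f&rft.au=windisch,+n'
    . '&rft_id=info:doi/10.1108%2f07378830910942991/';

my $template = 'http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker'
    . '&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#';

print fill_template($template, parse_kev($openurl)), "\n";
```

A missing element leaves a gap in the generated link rather than raising an error, which is one way an incomplete OpenURL turns into a failed link at the content provider.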

  4. OpenURL is pervasive. Cornell link resolver alone, July 1, 2008 – June 30, 2009: 402,000 OpenURL service requests. Extrapolated to all ARL libraries: 402,000 × 123 ≈ 49 million requests.

  5. Cornell’s top 10 OpenURL sources

     • Web of Knowledge
     • WorldCat Local
     • Google Scholar
     • Webfeat (our “Find Articles” service)
     • EBSCOHost
     • OCLC FirstSearch
     • SilverPlatter
     • Weill Cornell Medical Center
     • SciFinder Scholar
     • PubMed

  6. … but quality of experience is difficult to benchmark

     • Wrong start or end date in the local library's holdings knowledge base (see NISO KBART)
     • Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)
     • Wrong link-to syntax in link resolver
     • Fragile handling of incoming links by content provider

  7. … but quality of experience is difficult to benchmark

     • Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)
     • Subscription errors (especially with the start of a new calendar year)
     • Syntactically incorrect or missing metadata from the OpenURL origin

  8. Literature review: I can identify no systematic study designed and carried out to benchmark the quality of linking. The OpenURL standard was introduced some ten years ago.

  9. Wakimoto, Walker, and Dabbour (2006) Main finding: Users just expect full-text. When they do not get it they are disappointed. Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136

  10. Wakimoto, Walker, and Dabbour (2006) "Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134) Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136

  11. Blake and Knudson (2002) • “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.” Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

  12. Blake and Knudson (2002) • “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.” (NISO KBART role) Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

  13. Blake and Knudson (2002) • “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.” Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

  14. Mellon-funded planning grant for L'Année philologique

      1. Canonical Citation Linking: http://cwkb.org
         In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library
      2. OpenURL Quality
         Is it possible to build a tool for evaluating the quality of OpenURLs from a content provider?

  15. Constant: Core elements used by content providers in their link-to targets

      • title - 64%
      • spage - 64%
      • volume - 61%
      • issue - 60%
      • date - 48%
      • aulast - 47%
      • issn - 35%
      • atitle - 35%
      • DOI - 14%
      • ISBN - 5%

      Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.

  16. Variable: Frequency of element string patterns for all sources

  17. aulast First author's family name. This may be more than one word. In many citations, the author's family name is recorded first and is followed by a comma, e.g. Smith, Fred James is recorded as "aulast=smith"

  18. aulast

          # Count every aulast occurrence, then count which string pattern it follows.
          if ($e =~ /aulast/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^[A-Za-z]+$/) {
                  $patterns{$neworigin}{$newsid}{"aulast_simple"}++;
              } elsif ($elementhash{$e} =~ /^[A-Za-z]+, .+$/) {
                  $patterns{$neworigin}{$newsid}{"aulast_comma"}++;
              } elsif ($elementhash{$e} =~ /^[A-Z][a-z]+( [A-Z]\.)+$/) {
                  $patterns{$neworigin}{$newsid}{"aulast_simpleplusinitial"}++;
              } else {
                  $patterns{$neworigin}{$newsid}{"aulast_other"}++;
              }
          }

  19. aulast_other examples Ryan S Miller Louise D Bryant DAVID J MCKENZIE %C4%90okovi%C4%87 Indu B Ahluwalia Carreras-Sangr%c3%a0 Bautista-Casta%C3%B1o O%27Shea Melissa Ventura Marra Guan XueYing%3B Yu Nan%3B ShangguanXiaoXia
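To see why values like these end up in `aulast_other`, the three regular expressions from the aulast slide can be wrapped in a small function and run on sample values. This is a sketch: `classify_aulast` is a hypothetical helper name, and `'Miller J.'` is an invented sample added only to exercise the `aulast_simpleplusinitial` branch. The other "other" samples are copied from the examples slide.

```perl
use strict;
use warnings;

# Return the pattern bucket for one aulast value, using the regular
# expressions shown on the aulast slide, tested in the same order.
sub classify_aulast {
    my ($value) = @_;
    return 'aulast_simple'            if $value =~ /^[A-Za-z]+$/;
    return 'aulast_comma'             if $value =~ /^[A-Za-z]+, .+$/;
    return 'aulast_simpleplusinitial' if $value =~ /^[A-Z][a-z]+( [A-Z]\.)+$/;
    return 'aulast_other';
}

my %count;
for my $value ('smith', 'Smith, Fred James',
               'Miller J.',                 # invented sample, not from the logs
               'DAVID J MCKENZIE', '%C4%90okovi%C4%87',
               'O%27Shea', 'Bautista-Casta%C3%B1o') {
    my $bucket = classify_aulast($value);
    $count{$bucket}++;
    printf "%-26s %s\n", $bucket, $value;
}
```

Percent-decoding the values first would not move these names into `aulast_simple`, because an apostrophe, a hyphen, or a non-ASCII letter still fails `[A-Za-z]+`. The bucket therefore mixes legitimate family names with genuinely malformed values.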

  20. spage First page number of a start/end (spage-epage) pair. Note that pages are not always numeric.

  21. spage

          # Count every spage occurrence, then count which string pattern it follows.
          if ($e =~ /spage/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^\d+$/) {
                  $patterns{$neworigin}{$newsid}{"spage_number"}++;
              } elsif ($elementhash{$e} =~ /^\d+-\d+$/) {
                  $patterns{$neworigin}{$newsid}{"spage_number_number"}++;
              } elsif ($elementhash{$e} =~ /[A-Za-z].+\d/) {
                  $patterns{$neworigin}{$newsid}{"spage_string_w_number"}++;
              } else {
                  $patterns{$neworigin}{$newsid}{"spage_other"}++;
              }
          }

  22. spage_other examples

      • 1033 (6 pages)
      • 85(19)
      • 575 (11 pages)
      • 283...290
      • PHYS
      • GLRM
      • 58,+VI

  23. date: The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year.
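The "Complete date" rule above can be checked mechanically. The following Perl sketch is an illustration under stated assumptions, not part of the original tool: `is_complete_date` is a hypothetical helper, and it applies the standard Gregorian leap-year rule. It accepts only the full YYYY-MM-DD variant, so shorter forms such as a bare year are rejected here even though they are common in practice.

```perl
use strict;
use warnings;

# Return 1 if $date is a valid "Complete date" (YYYY-MM-DD), else 0.
sub is_complete_date {
    my ($date) = @_;
    my ($y, $m, $d) = $date =~ /\A(\d{4})-(\d{2})-(\d{2})\z/ or return 0;
    return 0 if $m < 1 || $m > 12;

    # Days per month, with February adjusted for leap years.
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    $days[1] = 29 if $leap;

    return ($d >= 1 && $d <= $days[$m - 1]) ? 1 : 0;
}

for my $date ('2009-10-16', '2008-02-29', '2009-02-29', '20091016', '2009') {
    printf "%-12s %s\n", $date, is_complete_date($date) ? 'valid' : 'not a complete date';
}
```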

  24. date

          # Count every date occurrence, then count which string pattern it follows.
          if ($e =~ /date/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^\d{4}$/) {
                  $patterns{$neworigin}{$newsid}{"date_dddd"}++;
              } elsif ($elementhash{$e} =~ /^\d{4}-\d{2}$/) {
                  $patterns{$neworigin}{$newsid}{"date_dddd-dd"}++;
              } elsif ($elementhash{$e} =~ /^\d{4}-\d{2}-\d{2}$/) {
                  $patterns{$neworigin}{$newsid}{"date_dddd-dd-dd"}++;
              } elsif ($elementhash{$e} =~ /^\d{4}-\d{4}$/) {
                  $patterns{$neworigin}{$newsid}{"date_dddd-dddd"}++;
              } elsif ($elementhash{$e} =~ /^\d{8}$/) {
                  $patterns{$neworigin}{$newsid}{"date_dddddddd"}++;
              } else {
                  $patterns{$neworigin}{$newsid}{"date_dateother"}++;
              }
          }

  25. date_other examples

      • 1956 July
      • %7E1994
      • June 5%2C 2002
      • JUN 30 05
      • 2006%282007%29
      • 1922,+April+25th
      • %5B%5B1943-06-19%5D%5D

  26. issn International Standard Serials Number (ISSN). The issn may contain a hyphen, e.g. "1041-5653"

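  27. issn

          # Count every issn occurrence, then count which string pattern it follows.
          # Both patterns are anchored at the end, so a value with trailing text
          # such as "0065-2598%28print%29" is counted as issn_other.
          if ($e =~ /issn/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^\d{4}-\d{3}[\dXx]$/) {
                  $patterns{$neworigin}{$newsid}{"issn_number_number"}++;
              } elsif ($elementhash{$e} =~ /^\d{7}[\dXx]$/) {
                  $patterns{$neworigin}{$newsid}{"issn_number"}++;
              } else {
                  $patterns{$neworigin}{$newsid}{"issn_other"}++;
              }
          }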

  28. issn_other examples

      • 0065-2598%28print%29
      • 0018-5345+%28ISSN+print%29
      • ISSN ISBN 0-9525091-5-6.
      • 0021-8375%28print%29%7C1439-0361%28electronic%29
      • 1471-2164+%28ISSN+online%29
      • 0191-8699%3B0191-8699
      • 0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
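Pattern counts show whether an ISSN is well formed, but not whether it is the right ISSN. As a hedged extension that goes one step beyond the metrics in these slides, the standard ISSN check digit can catch some mistyped values. In this sketch `issn_checksum_ok` is a hypothetical helper. It uses the usual ISSN rule: weight the first seven digits 8 down to 2, sum, and take the result modulo 11, with a check value of 10 written as X.

```perl
use strict;
use warnings;

# Return 1 if $issn ("NNNN-NNNC" or "NNNNNNNC") carries a correct ISSN
# check digit, else 0. Values with any extra text are rejected.
sub issn_checksum_ok {
    my ($issn) = @_;
    $issn =~ /\A(\d{4})-?(\d{3}[\dXx])\z/ or return 0;
    my @chars = split //, uc "$1$2";

    # Weighted sum of the first seven digits (weights 8, 7, ..., 2).
    my $sum = 0;
    $sum += $chars[$_] * (8 - $_) for 0 .. 6;

    my $check    = (11 - $sum % 11) % 11;
    my $expected = $check == 10 ? 'X' : $check;
    return $chars[7] eq $expected ? 1 : 0;
}

for my $issn ('0737-8831', '1041-5653', '0737-8832', '0065-2598%28print%29') {
    printf "%-22s %s\n", $issn, issn_checksum_ok($issn) ? 'check digit ok' : 'rejected';
}
```

A passing check digit only shows the value is a structurally valid ISSN. It cannot tell whether the ISSN belongs to the journal actually being cited.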

  29. How often out of 402,000 Cornell OpenURLs?

  30. flat file output

          logsource  year  quarter  origin  sid                metric                    count
          cornell    2009  Q1       csa     csa:commabs-set-c  atitle                    154
          cornell    2009  Q1       csa     csa:commabs-set-c  atitle_colon              101
          cornell    2009  Q1       csa     csa:commabs-set-c  atitle_other              53
          cornell    2009  Q1       csa     csa:commabs-set-c  aulast                    159
          cornell    2009  Q1       csa     csa:commabs-set-c  aulast_other              4
          cornell    2009  Q1       csa     csa:commabs-set-c  aulast_simple             155
          cornell    2009  Q1       csa     csa:commabs-set-c  date                      159
          cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd                 110
          cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd-dd              49
          cornell    2009  Q1       csa     csa:commabs-set-c  isbn                      6
          cornell    2009  Q1       csa     csa:commabs-set-c  isbn_10                   6
          cornell    2009  Q1       csa     csa:commabs-set-c  issn                      135
          cornell    2009  Q1       csa     csa:commabs-set-c  issn_number-number        135
          cornell    2009  Q1       csa     csa:commabs-set-c  issue                     136
          cornell    2009  Q1       csa     csa:commabs-set-c  issue_number              132
          cornell    2009  Q1       csa     csa:commabs-set-c  issue_number_dash_number  2
          cornell    2009  Q1       csa     csa:commabs-set-c  issue_other               2
          cornell    2009  Q1       csa     csa:commabs-set-c  spage                     153
          cornell    2009  Q1       csa     csa:commabs-set-c  spage_number              153
          cornell    2009  Q1       csa     csa:commabs-set-c  title                     160
          cornell    2009  Q1       csa     csa:commabs-set-c  total                     160
          cornell    2009  Q1       csa     csa:commabs-set-c  volume                    139
          cornell    2009  Q1       csa     csa:commabs-set-c  volume_number             139
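Counts like these become easier to compare across sources once each pattern is expressed as a share of its parent element. The Perl sketch below illustrates that step on a few of the rows above. It assumes seven whitespace-separated columns, with `csa` as the origin and `csa:commabs-set-c` as the sid (the slide text runs those two together). `pattern_share` is a hypothetical helper, not part of the original tool.

```perl
use strict;
use warnings;

# ASSUMPTION: columns are logsource, year, quarter, origin, sid, metric,
# count, separated by whitespace.
my $flat_file = <<'END';
cornell 2009 Q1 csa csa:commabs-set-c date 159
cornell 2009 Q1 csa csa:commabs-set-c date_dddd 110
cornell 2009 Q1 csa csa:commabs-set-c date_dddd-dd 49
cornell 2009 Q1 csa csa:commabs-set-c issue 136
cornell 2009 Q1 csa csa:commabs-set-c issue_number 132
cornell 2009 Q1 csa csa:commabs-set-c issue_number_dash_number 2
cornell 2009 Q1 csa csa:commabs-set-c issue_other 2
cornell 2009 Q1 csa csa:commabs-set-c total 160
END

# Collect metric => count for this single origin/sid combination.
my %count;
for my $line (split /\n/, $flat_file) {
    my ($logsource, $year, $quarter, $origin, $sid, $metric, $n) = split ' ', $line;
    next unless defined $n;
    $count{$metric} = $n;
}

# Percentage of a pattern metric within its parent element, for example
# date_dddd as a share of date. Returns undef for non-pattern metrics.
sub pattern_share {
    my ($count, $metric) = @_;
    my ($element) = $metric =~ /^([a-z]+)_/ or return;
    return unless $count->{$element};
    return 100 * $count->{$metric} / $count->{$element};
}

for my $metric (sort keys %count) {
    my $share = pattern_share(\%count, $metric);
    next unless defined $share;
    printf "%-26s %5.1f%% of %s\n", $metric, $share, ($metric =~ /^([a-z]+)_/)[0];
}
```

Because the pattern buckets for an element are mutually exclusive, their shares should sum to 100% of that element's count, which gives a quick consistency check on the flat file.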

  31. Demonstration http://openurlquality.blogspot.com/

  32. Next steps

      • create a NISO structure to wrap around the metrics: “NISO OpenURL Quality Index”
      • add non-Cornell data from libraries and link resolver vendors (model is agnostic to source)
      • confirm and publicize key elements used by target syntaxes
      • can the quality of the global OpenURL network be modeled mathematically?

  33. How to stay in the loop: http://openurlquality.blogspot.com/

      Adam Chandler
      Database Management and Electronic Resources Research Librarian
      Central Library Operations
      Cornell University Library
      tel: 607-255-5760
      email: alc28@cornell.edu
