Pre-processing OpenURLs
Case Study: University of Kansas
John Miller, jsmiller@ku.edu
Univ. of Kansas
EndUser 2004
Outline
• the problems
• examples
• possible solutions / tools
• examples
• PubMed exception
• Benefits
The problems...
• inaccurate incoming OpenURLs
  • data in the wrong element
• incomplete incoming OpenURLs
  • GENRE sometimes missing
  • Author data missing or hidden
• problematic journal title data
• no logging for statistics
The problems...
• lack of “optional” passing of OpenURL elements to Extended Services within LFP SysAdmin
  • only required elements are passed
  • absence of a required element eliminates the link
• not really possible to pass an entire OpenURL to an External Service simply from within SysAdmin
Examples of problems ...
• Journal Titles:
  • initial articles (Voyager doesn’t handle)
  • quotation marks (Voyager chokes on them)
  • internal dashes (SilverPlatter, et al.)
  • author and other characters at end of title (causes some title search links to fail)
• SilverPlatter:
  • author(s) embedded in the PID rather than in AUTHOR or AULAST
  • Book titles appear in ATITLE rather than TITLE
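The journal title problems listed above could be handled along these lines. This is only a sketch in Python, not the KU Perl program: the function name, the list of articles, and the sample title are all made up for illustration, and replacing every internal dash with a space would also split legitimately hyphenated words.

```python
import re

# Hypothetical list of leading articles; a real list would depend on
# the languages represented in the collection.
INITIAL_ARTICLES = ("the", "a", "an")


def clean_journal_title(title: str) -> str:
    """Return a title that is safer to hand to an OPAC title search."""
    # Drop straight and curly double quotation marks.
    cleaned = re.sub(r'["\u201c\u201d]', "", title)
    # Replace dashes sitting between word characters with a space.
    cleaned = re.sub(r"(?<=\w)-+(?=\w)", " ", cleaned)
    # Collapse runs of whitespace.
    cleaned = " ".join(cleaned.split())
    # Strip a single leading article, but never the whole title.
    first, _, rest = cleaned.partition(" ")
    if rest and first.lower() in INITIAL_ARTICLES:
        cleaned = rest
    return cleaned


# Made-up input combining three of the problems.
print(clean_journal_title('The "Journal-of-Adolescence"'))
```

Trailing author names and other stray characters at the end of a title are not handled here, since the slides do not say how they are recognized.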
Example of SilverPlatter author tagging:
Incoming (URL-encoded): &pid=%3CAN%3E0668115%3C/AN%3E%3CAU%3ELedesma-Liebana%2c-Patricia%3C/AU%3E
Decoded: &pid=<AN>0668115</AN><AU>Ledesma-Liebana,-Patricia</AU>
Added by the pre-processor: &aulast=Ledesma-Liebana,-Patricia
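One way to pull the author out of the SilverPlatter PID is sketched here in Python (the original pre-processor was Perl). The function name is hypothetical, and the sketch assumes the author always sits inside a single <AU>...</AU> pair, which the slides show for one record only.

```python
import re
from urllib.parse import parse_qs


def author_from_pid(query: str):
    """Return the text inside <AU>...</AU> in the pid element, or None."""
    # parse_qs also undoes the percent-encoding (%3C -> <, %2c -> ,).
    params = parse_qs(query, keep_blank_values=True)
    pid = params.get("pid", [""])[0]
    match = re.search(r"<AU>(.*?)</AU>", pid)
    return match.group(1) if match else None


query = "pid=%3CAN%3E0668115%3C/AN%3E%3CAU%3ELedesma-Liebana%2c-Patricia%3C/AU%3E"
print(author_from_pid(query))  # prints Ledesma-Liebana,-Patricia
```

The returned value can then be written into AULAST before the OpenURL is passed on.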
Examples of problems ...
• ISI (Web of Science):
  • journal titles appear only in STITLE, even though not abbreviated
  • mushes the volume and issue together incorrectly for Physical Review D -- is 4 digits, should be 2 or 3
• American Physical Society Journals
  • need to move ARTNUM to SPAGE
• GENRE element often missing
• Only PAGES is supplied, but need SPAGE
• Need formatted data for later use, to get around the “only required are passed” problem
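Three of these fixes (STITLE to TITLE, PAGES to SPAGE, ARTNUM to SPAGE) might look like the Python sketch below. The function name, the lowercase element keys, the order in which the rules are tried, and the sample values are assumptions, not details taken from the KU program. The Physical Review D volume/issue repair is left out because the slides do not give the rule.

```python
import re


def normalize_elements(params: dict) -> dict:
    """Return a copy of the OpenURL elements with missing ones filled in."""
    fixed = dict(params)

    # ISI: the full journal title arrives in STITLE only.
    if not fixed.get("title") and fixed.get("stitle"):
        fixed["title"] = fixed["stitle"]

    # Only PAGES supplied: take the first number as SPAGE and,
    # when a range is given, the second as EPAGE.
    if not fixed.get("spage") and fixed.get("pages"):
        match = re.match(r"\s*(\w+)(?:\s*-\s*(\w+))?", fixed["pages"])
        if match:
            fixed["spage"] = match.group(1)
            if match.group(2) and not fixed.get("epage"):
                fixed["epage"] = match.group(2)

    # APS journals: fall back to the article number as the start page.
    if not fixed.get("spage") and fixed.get("artnum"):
        fixed["spage"] = fixed["artnum"]

    return fixed


# Made-up element values for illustration.
print(normalize_elements({"stitle": "PHYSICAL REVIEW D", "artnum": "084011"}))
print(normalize_elements({"pages": "331-345"}))
```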
Possible solution ...
• Intercept the incoming OpenURL
  • alter it, augment it
  • send the revised OpenURL to LFP
  • offers a generalized, flexible approach that can be improved over time
• Coordinate with what is needed by individual Extended Services (e.g., OPAC searches, ILL form)
  • Use revised and augmented data supplied by the pre-processing program
Tools & Techniques ...
• Pre-processor = fairly simple Perl / CGI program (but could be something else)
  • must be able to receive data from a URL, change it, and send a new URL elsewhere
• substitute pre-processor’s URL for the normal LFP base URL
• a willingness to fudge with some OpenURL elements that are infrequently used and not needed by LFP
  • e.g., a fake BICI element
• create a log record of each “click”
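The receive / change / send cycle can be sketched as follows. The original was Perl; this is Python, the resolver address and function names are hypothetical, and guessing genre=article whenever an ISSN is present is an assumption made only to show one alteration.

```python
import os
from urllib.parse import parse_qsl, urlencode

# Hypothetical resolver address; the real LFP base URL is site-specific.
LFP_BASE_URL = "http://resolver.example.edu/lfp"


def preprocess(query_string: str) -> str:
    """Return the revised OpenURL to send on to the link resolver."""
    # Receive: split the incoming query string into element/value pairs.
    # (dict() keeps only the last value of a repeated element.)
    params = dict(parse_qsl(query_string, keep_blank_values=True))

    # Change: one illustrative fix. Guessing "article" whenever an ISSN
    # is present is an assumption of this sketch, not a rule from the slides.
    if not params.get("genre") and params.get("issn"):
        params["genre"] = "article"

    # Send: rebuild the query string on the resolver's base URL.
    return LFP_BASE_URL + "?" + urlencode(params)


def cgi_redirect(query_string: str) -> str:
    """Build the CGI response that redirects the browser to the resolver."""
    return "Status: 302 Found\r\nLocation: " + preprocess(query_string) + "\r\n\r\n"


if __name__ == "__main__":
    # A web server running this as a CGI script supplies QUERY_STRING.
    print(cgi_redirect(os.environ.get("QUERY_STRING", "")), end="")
```

Sources are then configured with the address of this script instead of the LFP base URL, so every OpenURL passes through it first.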
[Diagram: Source (index citation, catalog record, footnote) / pre-processor (Perl, PHP, etc.) / Link Resolver (parser + knowledge base) / Standard Target / Extended Service]
(fake) BICI=sid|genre|atitle|full_author|title|date|volume|issue|spage|epage|issn|isbn|artnum
• the above string enables LFP SysAdmin to look only for the presence of a BICI as a trigger for a particular extended service -- rather than the existence of a set of OpenURL elements
• the extended service then has access to all of the above elements that exist
• some elements (full_author, spage, epage) sometimes can be derived from others if they do not already exist (“full_author” is a locally-defined tag)
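Packing and unpacking the fake BICI could look like this in Python. The field order is the one shown above; the function names, the lowercase keys, and the choice to swap any literal "|" inside a value for a space are assumptions of the sketch.

```python
# Field order taken from the slide; "full_author" is the locally-defined tag.
BICI_FIELDS = (
    "sid", "genre", "atitle", "full_author", "title", "date", "volume",
    "issue", "spage", "epage", "issn", "isbn", "artnum",
)


def build_bici(params: dict) -> str:
    """Pack the elements into one pipe-delimited string (pre-processor side)."""
    # A literal "|" inside a value would shift every later field,
    # so it is replaced with a space here (an assumption of this sketch).
    return "|".join(params.get(name, "").replace("|", " ") for name in BICI_FIELDS)


def parse_bici(bici: str) -> dict:
    """Unpack the string again (extended-service side)."""
    return dict(zip(BICI_FIELDS, bici.split("|")))


elements = {
    "sid": "ISI:WoK", "genre": "article",
    "title": "BIODIVERSITY AND CONSERVATION", "date": "2004",
    "volume": "13", "issue": "1", "spage": "275", "issn": "0960-3115",
}
bici = build_bici(elements)
print(bici)
print(parse_bici(bici)["title"])
```

Missing elements become empty fields, so the extended service can rely on position alone.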
[Screenshot: extended service URL http://diglib.ku.edu/cgi-bin/illiad?bici=%BICI% with parameter BICI = %BICI%]
Log file
DATE+TIME | SID | GENRE | TITLE | DATE | VOLUME | ISSUE | SPAGE | EPAGE | ISSN | ISBN
20040323094818|CAS:CAPLUS|article|Journal of Pharmaceutical Sciences|2003|92|8|1531||0022-3549|
20040323094920|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|1||0960-3115|
20040323095100|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|207||0960-3115|
20040323095247|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|275||0960-3115|
20040323095518|ISI:WoK|article|BASIC AND APPLIED ECOLOGY|2003|4|5|385||1439-1791|
20040323095749|ISI:WoK|article|CONSERVATION ECOLOGY|2002|6|2||14|1195-5449|
20040323095948|ISI:WoK|article|BIOLOGICAL CONSERVATION|2004|115|1|63||0006-3207|
20040323100026|SP:MLAB|article|Russian Studies in Literature|2001|37|3|89||1061-1975|
20040323100112|SP:MLAB|article|Russian Studies in Literature|2003|39|4|66||1061-1975|
20040323100519|ISI:WoK|article|AGRICULTURE ECOSYSTEMS &|2003|98|1-3|331||0167-8809|
20040323101518|SP:PY|article|American Psychologist|1954|9||632||0003-066X|
20040323101611|SP:PY|article|American Psychologist|1957|12||14||0003-066X|
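A log record in this layout could be produced as in the Python sketch below. The column order follows the header above; the function name, the lowercase element keys, and the handling of a stray "|" inside a value are assumptions.

```python
from datetime import datetime

# Columns after the timestamp, in the order of the header row.
LOG_FIELDS = (
    "sid", "genre", "title", "date", "volume",
    "issue", "spage", "epage", "issn", "isbn",
)


def format_log_record(params: dict, when: datetime) -> str:
    """Return one pipe-delimited log line for a single click."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    # Replace any "|" inside a value so the columns stay aligned.
    values = [params.get(name, "").replace("|", " ") for name in LOG_FIELDS]
    return "|".join([stamp] + values)


click = {
    "sid": "ISI:WoK", "genre": "article",
    "title": "BIODIVERSITY AND CONSERVATION", "date": "2004",
    "volume": "13", "issue": "1", "spage": "275", "issn": "0960-3115",
}
print(format_log_record(click, datetime(2004, 3, 23, 9, 52, 47)))
```

In a running pre-processor the line would be appended to the log file with the current time rather than a fixed one.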
20040323095247 | ISI:WoK | article | BIODIVERSITY AND CONSERVATION | 2004 | 13 | 1 | 275 | | 0960-3115|
Date / Time: 20040323095247
SID: ISI:WoK
GENRE: article
TITLE: BIODIVERSITY AND CONSERVATION
DATE: 2004
VOLUME: 13
ISSUE: 1
SPAGE: 275
ISBN:
ISSN: 0960-3115
ARTNUM:
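Reading a record back into named fields might be done like this. It is a Python sketch with a hypothetical function name; the column names come from the log file header, and short lines are padded with empty values as a precaution.

```python
from datetime import datetime

LOG_COLUMNS = (
    "datetime", "sid", "genre", "title", "date", "volume",
    "issue", "spage", "epage", "issn", "isbn",
)


def parse_log_line(line: str) -> dict:
    """Split one pipe-delimited log line into a dict keyed by column name."""
    values = [value.strip() for value in line.rstrip("\n").split("|")]
    # Pad short lines so every column is present in the result.
    values += [""] * (len(LOG_COLUMNS) - len(values))
    record = dict(zip(LOG_COLUMNS, values))
    record["datetime"] = datetime.strptime(record["datetime"], "%Y%m%d%H%M%S")
    return record


sample = ("20040323095247 | ISI:WoK | article | BIODIVERSITY AND CONSERVATION"
          " | 2004 | 13 | 1 | 275 | | 0960-3115|")
record = parse_log_line(sample)
print(record["datetime"].isoformat(), record["sid"], record["spage"])
```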
March 2004 - “clicked on” titles
Title | Clicks
ASHP Midyear Clinical Meeting | 103
Journal of Personality and Social Psychology | 88
International Journal of Eating Disorders | 74
Social Work | 72
Psychological Reports | 70
Child Development | 67
Journal of the American Academy of Child and Adolescent Psychiatry | 55
Journal of Adolescence | 53
Child Abuse and Neglect | 48
Addictive Behaviors | 48
Journal of Youth and Adolescence | 46
Journal of College Student Development | 45
Drug Top | 43
Child and Adolescent Social Work Journal | 42
Journal of Applied Social Psychology | 42
American Journal of Psychiatry | 42
Journal of Applied Behavior Analysis | 41
Perceptual and Motor Skills | 41
Adolescence | 41
Am J Health Syst Pharm | 41
Nature | 40
Annals of human biology | 40
Smith College Studies in Social Work | 40
Human Biology | 40
(6,626 other titles with fewer than 40 clicks)
(18,644 clicks altogether)
March 2003 - “clicked-from” databases
Database | Clicks
PsycInfo (SilverPlatter) | 6931
Eric (CSA) | 2128
Social Work Abstracts (SilverPlatter) | 1034
MLA Bibliography (SilverPlatter) | 1000
IPA (SilverPlatter) | 867
SciFinder Scholar: CA Plus | 724
Anthropology Plus (RLG) | 552
ArticleFirst (OCLC FS) | 435
Art Index (SilverPlatter) | 429
Biological Abstracts (SilverPlatter) | 344
Web of Science (ISI) | 342
Sociological Abstracts (CSA) | 255
Linguistics and Language Behavior Abstracts (CSA) | 246
Anthropological Index, Royal Anthropological Institute (RLG) | 243
Periodical Abstracts (OCLC FS) | 235
GeoRef (SilverPlatter) | 210
WorldCat (OCLC FS) | 203
America: History and Life (ABC-Clio) | 190
Sports Discus (SilverPlatter) | 174
Education Abstracts (OCLC FS) | 171
Social Service Abstracts (CSA) | 154
Compendex (EV2) | 145
SciFinder Scholar: Medline | 139
Zoological Abstracts | 136
PapersFirst (OCLC FS) | 131
EconLit (SilverPlatter) | 125
(33 other databases with fewer than 40 clicks)
(18,644 clicks altogether)
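Counts like the ones above can be compiled from the log with a few lines of code. This Python sketch tallies any one column of the pipe-delimited log; the function name is hypothetical and the sample lines are copied from the log file slide. Turning a SID such as ISI:WoK into a display name such as "Web of Science (ISI)" would need a lookup table that is not shown in the slides.

```python
from collections import Counter

# Five records copied from the log file slide.
SAMPLE_LOG = """\
20040323094818|CAS:CAPLUS|article|Journal of Pharmaceutical Sciences|2003|92|8|1531||0022-3549|
20040323094920|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|1||0960-3115|
20040323095100|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|207||0960-3115|
20040323095247|ISI:WoK|article|BIODIVERSITY AND CONSERVATION|2004|13|1|275||0960-3115|
20040323100026|SP:MLAB|article|Russian Studies in Literature|2001|37|3|89||1061-1975|
"""


def tally(log_text: str, column: int) -> Counter:
    """Count clicks by the value found in one column of the log."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split("|")
        # Skip malformed lines and empty values.
        if len(fields) > column and fields[column].strip():
            counts[fields[column].strip()] += 1
    return counts


by_source = tally(SAMPLE_LOG, 1)  # column 1 = SID
by_title = tally(SAMPLE_LOG, 3)   # column 3 = TITLE
for name, clicks in by_source.most_common():
    print(name, clicks)
```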
The “PubMed” exception
• All that comes from PubMed initially is a PMID (PubMed Identifier)
• Can log the identifier and the time, but nothing else
• Requires redundant External Services to handle variations
This set-up is combined with custom XML to:
(1) suppress duplicates when an incoming OpenURL satisfies more than one condition; and
(2) supply a standard phrase
<xsl:for-each select="link">
  <xsl:variable name="link-name" select="name"/>
  <xsl:choose>
    <xsl:when test="contains($link-name, 'ILLiad')">
      <xsl:choose>
        <xsl:when test="position() = 1">
          ...
          <ul class="list">
            <li class="list-item">
              <xsl:variable name="orig-url" select="url"/>
              <xsl:variable name="url">
                <xsl:value-of select="$orig-url"/>
              </xsl:variable>
              <a target="_blank" href="{$url}">
                Request a loan or copy of this item (if not available in the KU Libraries)
              </a>
              ...
        </xsl:when>
        <xsl:otherwise/>
      </xsl:choose>
      ...
• from LFPDisplay.xsl
• multiple ILLiad services
  • in priority order in SysAdmin
• this shows only the first one with a standard phrase
Standard ILL phrase: “Request a loan or copy ...”
The Benefits?
• More full text links work
• More OPAC title searches work
• Some impossible services become possible
  • more importantly, they become consistently possible
• Source use statistics are compilable
Contact information: John Miller University of Kansas jsmiller@ku.edu