Data harvesting with JavaScript to enhance record display in the OPAC

Doug Eriksen - Seattle University • May 17 - 20, 2009

Agenda • What can you harvest? • How do you harvest it? • Why harvester.js instead of WebBridge/Pathfinder Pro? • What can you do with the data? • Examples • Implementation Details

What data can you harvest? • Standard numbers • Title • Author • Persistent URL • Logged-in patron name • Almost anything, if you can think of a use for it

JavaScript harvesting • harvester.js - client-side script that scrapes the page for data and loads it into global variables • Custom wwwoptions create global variables • manipulator.js - inserts content built from the variables into empty placeholders built into bib_display.html, botlogo, or toplogo

Why not WebBridge/Pathfinder Pro? • Can’t harvest all the same data • Can’t insert content into arbitrary locations on the bib_display.html page • Limited to  • Can’t be inserted inside toplogo or botlogo • Can’t do Google Books previews or covers

What can you do with the data? • Pass data to another web site or service • Offer Google Books previews, cover images, formatted citations, custom exporters • Build custom links in the manner of WebBridge/Pathfinder Pro • Rewrite and improve your page • Better page title • Better print display • Better bookmarks

Examples • Google Books viewer • WorldCat.org links • Rewrite your page title • Social bookmarking • Custom print view

Google Books button This button only appears if a scan of the book (full or partial) is available for preview
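A minimal sketch of how the button's "only if a preview exists" check might work, assuming the Google Books Dynamic Links service (a JSONP lookup keyed by ISBN) is used. The global name bib_ISBN, the callback name, and both function names are illustrative assumptions, not taken from harvester.js. The sketch also assumes the harvested cell holds a single ISBN.

```javascript
// Build the lookup URL for one ISBN. The script tag that loads this URL
// would call the named callback with the availability data.
function googleBooksLookupUrl(isbn, callbackName) {
  // Keep only digits and the X check character: "0-13-110362-8" -> "0131103628".
  var key = "ISBN:" + String(isbn).replace(/[^0-9Xx]/g, "");
  return "https://books.google.com/books?jscmd=viewapi" +
    "&bibkeys=" + encodeURIComponent(key) +
    "&callback=" + encodeURIComponent(callbackName);
}

// Return the preview URL when a full or partial scan exists, otherwise null.
// The caller shows the button only for a non-null result.
function previewUrlFrom(response) {
  for (var key in response) {
    if (!Object.prototype.hasOwnProperty.call(response, key)) { continue; }
    var book = response[key];
    if (book.preview === "full" || book.preview === "partial") {
      return book.preview_url || null;
    }
  }
  return null;
}
```

In the page, the callback would receive the response object, pass it to previewUrlFrom, and insert the button into a placeholder only when a URL comes back.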

Google Books Viewer This viewer loads right on top of your page in a lightbox. No need to send users to another page, or open a pop-up window.

WorldCat.org links • With a title’s OCLC# you can construct URLs that lead to four very useful resources on WorldCat.org • Other Editions • Formatted Citations • RefWorks Export • Citation File (Endnote) Export

Construct a WorldCat.org URL • For the OCLC# 63186207 these are the URLs: • http://summit.worldcat.org/oclc/63186207/editions/ • http://worldcat.org/oclc/63186207?page=citation • http://worldcat.org.proxy.seattleu.edu/oclc/63186207?page=refworks – Note that I ran this link through my proxy server so RefWorks will recognize my off-campus users • http://worldcat.org/oclc/63186207?page=endnote • Easily built using the bib_OCLC global variable populated by harvester.js • Caution: OCLC could change these URLs anytime
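The four URL patterns above could be generated from the harvested OCLC number roughly as follows. This is a sketch only: the function name worldcatLinks and the proxyHost parameter are hypothetical, and the patterns are copied from this slide, so they stop working if OCLC changes them.

```javascript
// Build the four WorldCat.org links for one OCLC number.
// proxyHost is optional; when given, the RefWorks link is routed through it.
function worldcatLinks(oclcNumber, proxyHost) {
  // Keep digits only, in case the harvested cell carries a prefix or spaces.
  var id = String(oclcNumber).replace(/\D/g, "");
  var base = "http://worldcat.org/oclc/" + id;
  var proxied = proxyHost
    ? "http://worldcat.org." + proxyHost + "/oclc/" + id
    : base;
  return {
    editions: "http://summit.worldcat.org/oclc/" + id + "/editions/",
    citation: base + "?page=citation",
    refworks: proxied + "?page=refworks",
    endnote: base + "?page=endnote"
  };
}
```

With harvester.js loaded, a call such as worldcatLinks(bib_OCLC, "proxy.seattleu.edu") would give the link set to place into a placeholder on the record page.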

Rewrite your page title • You can customize the <title> element on some pages, but many pages get a system-generated <title> • In my catalog they look like this: • <title>Seattle University Libraries/Lemieux</title> • On a bibliographic display I would like to use the title from the record as my page title

Rewrite your page title • <script type="text/javascript"> document.title = "SU library - " + bib_title; </script> • [Screenshots: page title before and after]
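A slightly more defensive variant of the one-liner above, offered as a sketch. It assumes bib_title may be missing on some pages and that record titles sometimes end in cataloging punctuation such as " /". The helper name buildPageTitle is hypothetical.

```javascript
// Build a page title from the record title, or fall back to the existing one.
function buildPageTitle(prefix, recordTitle, fallback) {
  var cleaned = String(recordTitle || "")
    .replace(/\s+/g, " ")              // collapse runs of whitespace
    .replace(/\s*[\/:;,]\s*$/, "")     // drop one trailing punctuation mark
    .trim();
  return cleaned ? prefix + cleaned : fallback;
}

// Only touch the title when running in a browser and harvester.js found a title.
if (typeof document !== "undefined" && typeof bib_title !== "undefined") {
  document.title = buildPageTitle("SU library - ", bib_title, document.title);
}
```

The typeof checks keep the script from throwing on pages where no bibliographic record is displayed.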

Social Bookmarking • Customize a social bookmarking tool, such as the AddThis button • Use the persistent URL instead of the current URL, use the page title of your choice

Custom Print View Default record display in my OPAC

Custom Print View Print view from my OPAC, created with a print style sheet and JavaScript to insert the patron name, and the persistent URL
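One way the patron name and persistent URL might be inserted for the print view, as a sketch. It assumes the globals pName and url_full (created by the wwwoptions shown under Implementation Details) and a hypothetical empty placeholder with id "printFooter" that the print style sheet makes visible only on paper.

```javascript
// Build the lines to print; skips whichever value is missing.
function printFooterLines(patronName, persistentUrl) {
  var lines = [];
  if (patronName) { lines.push("Printed for: " + patronName); }
  if (persistentUrl) { lines.push("Permanent URL: " + persistentUrl); }
  return lines;
}

// Write the lines into the placeholder, one div per line.
function fillPrintFooter(placeholderId) {
  if (typeof document === "undefined") { return; }
  var target = document.getElementById(placeholderId);
  if (!target) { return; }
  var lines = printFooterLines(
    typeof pName !== "undefined" ? pName : "",
    typeof url_full !== "undefined" ? url_full : ""
  );
  for (var i = 0; i < lines.length; i++) {
    var row = document.createElement("div");
    // A text node keeps the patron name from being interpreted as HTML.
    row.appendChild(document.createTextNode(lines[i]));
    target.appendChild(row);
  }
}

fillPrintFooter("printFooter");
```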

Implementation Details • wwwoptions • The wwwoption LOGGEDIN_MSG typically looks like this: • LOGGEDIN_MSG=You are logged into %s as %s • Replace it with: • LOGGEDIN_MSG=You are logged into %s as <script type="text/javascript">var pName = '%s'; document.write(pName);</script>

Implementation Details • wwwoptions • The wwwoption ICON_RECORDLINK typically looks like this: • ICON_RECORDLINK=<div class="bibRecordLink"><a id="recordnum" href="%s" >Permanent URL for this record</a></div> • Replace it with this: • ICON_RECORDLINK=<script type="text/javascript">var url_tail='%s'; var url_full='http://library.seattleu.edu' + url_tail;</script>

Implementation Details • pName = logged-in patron’s name • url_full = persistent URL for the record • Where do we get the rest of the data? • harvester.js

harvester.js • Scans the page for table rows • Entire page, or div of your choosing • Looks for rows with two cells, and compares the contents of the first cell to a list of labels • When it finds a match for a label it harvests the data from the second cell and places it in a variable
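The row-scanning approach described above can be sketched as follows. This is not the harvester.js source. The function names, the labels object, and the "first match wins" rule are assumptions made for illustration.

```javascript
// rows: array of rows, each an array of cell text.
// labels: maps a variable name to the label text to look for.
function harvestRows(rows, labels) {
  var found = {};
  for (var i = 0; i < rows.length; i++) {
    var cells = rows[i];
    if (cells.length !== 2) { continue; }          // only label/value rows
    var label = cells[0].replace(/\s+/g, " ").trim();
    for (var name in labels) {
      if (!Object.prototype.hasOwnProperty.call(labels, name)) { continue; }
      var seen = Object.prototype.hasOwnProperty.call(found, name);
      if (labels[name] === label && !seen) {       // keep the first match
        found[name] = cells[1].replace(/\s+/g, " ").trim();
      }
    }
  }
  return found;
}

// Browser-only adapter: read the cell text of every table row inside the
// container, or inside the whole page when the container is not found.
function rowsFromContainer(containerId) {
  var root = document.getElementById(containerId) || document;
  var trs = root.getElementsByTagName("tr");
  var rows = [];
  for (var i = 0; i < trs.length; i++) {
    var texts = [];
    for (var j = 0; j < trs[i].cells.length; j++) {
      var cell = trs[i].cells[j];
      texts.push(cell.textContent || cell.innerText || "");
    }
    rows.push(texts);
  }
  return rows;
}
```

On a record page the two pieces would be combined as harvestRows(rowsFromContainer("fullRecord"), { bib_title: "Title", bib_OCLC: "Record #" }). Keeping the matching logic separate from the DOM access makes it easy to try against sample data.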

harvester.js • Configuration variables • var label_tableContainer="fullRecord"; • var label_ISBN="ISBN"; • var label_ISSN="ISSN"; • var label_title="Title"; • var label_author="Author"; • var label_LCCN="LCCN"; • var label_OCLC="Record #";
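To illustrate how these label settings might relate to the harvested globals, here is a sketch that turns a set of label_ settings into a lookup from label text to variable name. The label_X to bib_X naming rule is an assumption inferred from bib_OCLC and bib_title, and the function name is hypothetical.

```javascript
// config: object whose keys are the label_ settings and whose values are
// the label text shown in the catalog's record display.
function labelMapFromConfig(config) {
  var map = {};
  for (var key in config) {
    if (!Object.prototype.hasOwnProperty.call(config, key)) { continue; }
    // label_tableContainer names the div to scan; it is not a field label.
    if (key === "label_tableContainer") { continue; }
    if (key.indexOf("label_") !== 0 || !config[key]) { continue; }
    map[config[key]] = "bib_" + key.slice("label_".length);
  }
  return map;
}
```

Because the labels must match the text your catalog actually displays, a site that labels the OCLC number differently would change only the configuration value, not the script.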

Live examples • With code! • http://www.seattleu.edu/lemlib/iug09/

Credits • Tina Ching from Seattle University Law library helped design and present version one of this script. • Adam Brin, from Bryn Mawr, posted a script to the IUG clearinghouse that collects the persistent URL and the book title, and builds a social bookmarking button with the AddThis service. It was based on harvesting table rows and scanning them for labels to spot the data. • Erik Still from Boeing Library Services has been a sounding board and early tester for many of my “improvements” since Tina and I first presented this code.

See it in action • Seattle University, and SU Law • http://library.seattleu.edu • University of Oregon • http://janus.uoregon.edu/ • Mt. Hood Community College • http://lrc-srv.mhcc.edu/ • Bowdoin College • http://phebe.bowdoin.edu • Dowling College • http://library.dowling.edu/
