IR Workshop: Managing Scholarly Assets in Institutional Repositories: Sharing Experiences Among JULAC Libraries. 24 February 2006, HKUST Library. Exploring IR Technologies. Ki Tat LAM, Head of Library Systems, The Hong Kong University of Science and Technology Library, lblkt@ust.hk
Contents • DSpace Software • SRW/U, Usage statistics, OpenURL • Cross-Searching Technologies • Search engines – Google • OAI-PMH - OAIster, Scirus, HKIR • HKIR • Standardization • Author names; subjects; document types; metadata schema • Document deposition versus linking • Research Assessment Exercise
DSpace Software • Jointly created by MIT Libraries and Hewlett-Packard Company [http://www.dspace.org/] • Open source software – released since 2002 • Adopted by HKUST Library for its IR since February 2003 [http://repository.ust.hk/] • Also adopted for HKUST’s Digital University Archives – migrated to DSpace in October 2004 [http://archives.ust.hk/]
DSpace Software [cont.] • HKUST’s Electronic Journals Online searching service will soon be migrated to DSpace [http://lbapps.ust.hk/ej/] • Adopted by CUHK for its IR (known as SiR) since mid-2004 [http://dspace.lib.cuhk.edu.hk/] • Adopted by CityU for its IR since 2005 [http://dspace.cityu.edu.hk/] • Will be adopted by HKIEd for building its IR
IR Software and Services • Open Source Software • DSpace • GNU EPrints • Fedora • See OSI Guide to Institutional Repository Software [http://www.soros.org/openaccess/software/] • Commercial Software • VITAL from VTLS Inc. – powered by Fedora • DigiTool from Ex Libris • Symposia from Innovative Interfaces Inc.
IR Software and Services [cont.] • Commercial Hosting Services • Digital Commons from ProQuest – powered by the bepress platform
DSpace at HKUST (as of 19 February 2006) • Home URL: http://repository.ust.hk/ • IR software: DSpace version 1.3.2 • System software: Fedora Core 4 Linux; Tomcat 5.0; JDK 1.4.2 • Server: Intel Pentium 4 3 GHz; 3 GB RAM; 80 GB hard disk • Content: 2,231 documents from 42 departments • Usage: documents were accessed 74,467 times since October 2004
DSpace at HKUST • Customizations • Document submission form • Add item form • CJK support • Authentication and authorization • SRW/U interface • Collection and Usage statistics • OpenURL linking
DSpace at HKUST [cont.] • SRW/U Interface • Search/Retrieve Web Service (SRW) and Search/Retrieve via URL (SRU) • Base URL: [http://repository.ust.hk/SRW/search/DSpace] • An alternative way of searching the repository, using standard web services • Allows search service providers to issue a federated search across multiple IRs and present the search results in their own interface
Response to the following SRW search request: http://repository.ust.hk/SRW/search/DSpace?query=dc.creator+%3D+%22ip+nancy%22&operation=searchRetrieve&maximumRecords=1&startRecord=1...
XSLT-converted response to the following SRW search request: http://repository.ust.hk/SRW/search/DSpace?query=dc.creator+%3D+%22ip+nancy%22&operation=searchRetrieve&maximumRecords=1&startRecord=1...
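The request URL shown on the two slides above can be assembled programmatically. The following is a minimal sketch, assuming only the parameters visible in the slide URL (the URL is truncated there, so a real request may need more); the function name `build_sru_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL as given on the SRW/U Interface slide.
BASE_URL = "http://repository.ust.hk/SRW/search/DSpace"


def build_sru_url(cql_query, start_record=1, maximum_records=1, base_url=BASE_URL):
    """Return a searchRetrieve URL for a CQL query (hypothetical helper).

    Only the parameters visible on the slide are included; the slide URL
    ends with "...", so additional parameters may be required in practice.
    """
    params = {
        "query": cql_query,
        "operation": "searchRetrieve",
        "maximumRecords": maximum_records,
        "startRecord": start_record,
    }
    # urlencode percent-encodes '=' and '"' and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


url = build_sru_url('dc.creator = "ip nancy"')
print(url)
```

A federated search service could call the same helper with a different `base_url` for each participating IR and merge the XML responses in its own interface.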
DSpace at HKUST [cont.] • Size of the Repository [http://repository.ust.hk/dspace/dbstat.jsp] • Compiles in real time the number of items, collections and communities in the Repository • Top 20 Most Accessed Documents [http://repository.ust.hk/dspace/top20.jsp] • Compiled every month from the Tomcat web access logs • Excludes access by most robots
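A monthly "most accessed" list of this kind could be compiled roughly as follows. This is only a sketch under stated assumptions, not the HKUST implementation: it assumes the access log uses the combined log format, that document downloads have a `/bitstream/<handle>/` path, and that robots can be recognised by substrings in the user agent. The regular expressions, the marker list and the sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Assumed combined log format: host, two unused fields, timestamp,
# request line, status, size, then optional referer and user agent.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)
# Assumed download path pattern: /bitstream/<prefix>/<suffix>/...
HANDLE_IN_PATH = re.compile(r"/bitstream/(?P<handle>\d+(?:\.\d+)*/\d+)/")
# Hypothetical robot markers; a real list would be longer.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def top_documents(lines, limit=20):
    """Count successful, non-robot downloads per handle."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "200":
            continue
        agent = (match.group("agent") or "").lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # skip requests that look like robots
        handle = HANDLE_IN_PATH.search(match.group("path"))
        if handle:
            counts[handle.group("handle")] += 1
    return counts.most_common(limit)


# Invented sample log lines for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [19/Feb/2006:10:00:00 +0800] "GET /dspace/bitstream/1783.1/1805/1/paper.pdf HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '10.0.0.2 - - [19/Feb/2006:10:01:00 +0800] "GET /dspace/bitstream/1783.1/1805/1/paper.pdf HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [19/Feb/2006:10:02:00 +0800] "GET /dspace/bitstream/1783.1/1805/1/paper.pdf HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '10.0.0.4 - - [19/Feb/2006:10:03:00 +0800] "GET /dspace/bitstream/1783.1/42/1/thesis.pdf HTTP/1.1" 200 900 "-" "Mozilla/4.0"',
    '10.0.0.5 - - [19/Feb/2006:10:04:00 +0800] "GET /dspace/handle/1783.1/42 HTTP/1.1" 200 900 "-" "Mozilla/4.0"',
]

ranking = top_documents(SAMPLE_LOG)
print(ranking)
```

User-agent matching only catches robots that identify themselves, which is consistent with the slide's wording that "most" robots are excluded.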
DSpace at HKUST [cont.] • OpenURL • All documents deposited in the HKUST IR must meet the open access criterion • Two solutions for linking to non-open-access documents were explored: • Direct linking to the documents as found in the library's subscribed databases • OpenURL via a link resolver • The OpenURL approach was adopted because it is: • More persistent than vendor-provided URLs • Independent of which databases are subscribed to locally
DSpace at HKUST [cont.] • One disadvantage of the OpenURL approach: what if the in-house link resolver fails to find a target link? For example: • The host of the document is not OpenURL-capable • The database is not subscribed to by the library • The target is not profiled in the local link resolver • A data entry interface was developed to assist in constructing OpenURLs • Demonstration: • Sample item with OpenURL • Staff interface for OpenURL construction
Click on this image to launch HKUST’s WebBridge link resolver to locate the published version • The document deposited in the Repository is a pre-published version
Click on this link to retrieve the article hosted on Elsevier’s ScienceDirect platform
Build OpenURL Edit Item View Item OpenURL constructed
Check INNOPAC for the bib record and then auto-insert the ISSNs into the form • Click this link to test the OpenURL • Click this button to create this OpenURL fragment
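The "OpenURL fragment" produced by the staff interface can be pictured as a query string of citation metadata that is appended to a link resolver's address. Below is a minimal sketch assuming OpenURL 0.1 style keys (`genre`, `issn`, `volume`, `spage` and so on); the resolver address, the helper name and the citation values are hypothetical and not taken from the HKUST system.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; each library would use its own.
RESOLVER_BASE = "http://resolver.example.org/openurl"

# OpenURL 0.1 style keys, emitted in a fixed order for readability.
OPENURL_KEYS = (
    "genre", "issn", "eissn", "date", "volume",
    "issue", "spage", "atitle", "aulast",
)


def build_openurl_fragment(citation):
    """Return the query-string fragment for the non-empty citation fields."""
    pairs = [(key, citation[key]) for key in OPENURL_KEYS if citation.get(key)]
    return urlencode(pairs)


# Invented citation used only to show the shape of the output.
citation = {
    "genre": "article",
    "issn": "1234-5679",
    "date": "2005",
    "volume": "12",
    "issue": "3",
    "spage": "45",
    "atitle": "A sample article",
}

fragment = build_openurl_fragment(citation)
openurl = RESOLVER_BASE + "?" + fragment
print(openurl)
```

Storing only the fragment in the metadata record, rather than the full link, keeps the record usable by any institution's resolver, which matches the persistence argument made for OpenURL above.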
Cross-Searching IRs • Cross-searching approaches • If the IR site is open to robot access, its documents are very likely to be indexed by major search engines such as Google and Yahoo • Indexing services harvest IR metadata using the OAI-PMH protocol: • OAIster from University of Michigan [http://oaister.umdl.umich.edu/] • Scirus from Elsevier [http://www.scirus.com/] • HKIR – an experimental system by HKUST Library [http://lbapps.ust.hk/hkir/]
Draft only: the Scirus search results page will look like this
Cross-Searching IRs [cont.] • OAI-PMH • A protocol developed by the Open Archives Initiative for harvesting metadata from distributed repositories • Most IR software, including DSpace, is OAI-PMH capable • Indexing services such as OAIster are OAI data harvesters • IRs are OAI data providers
OAI-PMH’s XML output in response to a “GetRecord” request • Metadata in the unqualified Dublin Core metadata schema (oai_dc) • OAI-PMH “GetRecord” request by URL: http://repository.ust.hk/dspace-oai/request?verb=GetRecord& ... 1783.1/1805
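A harvester's handling of a GetRecord exchange can be sketched as below. The base URL is the one on the slide; the full record identifier is elided there, so the identifier used here is a hypothetical placeholder, and the sample XML is an invented, trimmed response used only to show the oai_dc structure. The helper names are assumptions as well.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "http://repository.ust.hk/dspace-oai/request"  # from the slide

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build a GetRecord request URL with the three required arguments."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query


def parse_oai_dc(xml_text):
    """Return {element name: [values]} from the oai_dc part of a response."""
    root = ET.fromstring(xml_text)
    dc_root = root.find(".//oai_dc:dc", NS)
    fields = {}
    if dc_root is None:
        return fields
    for element in dc_root:
        name = element.tag.split("}", 1)[-1]  # strip the namespace prefix
        fields.setdefault(name, []).append((element.text or "").strip())
    return fields


# Hypothetical identifier: the real prefix is not shown on the slide.
request_url = get_record_url(OAI_BASE, "oai:repository.example.org:1783.1/1805")

# Invented, trimmed response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:repository.example.org:1783.1/1805</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample title</dc:title>
          <dc:creator>Author, First</dc:creator>
          <dc:creator>Author, Second</dc:creator>
          <dc:identifier>http://repository.example.org/handle/1783.1/1805</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

record = parse_oai_dc(SAMPLE_RESPONSE)
print(request_url)
print(record)
```

Repeated elements such as `dc:creator` are kept as lists, since unqualified Dublin Core allows any element to occur more than once.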
HKIR • HKIR - an experimental system developed by the HKUST Library to demonstrate the features of harvesting and cross-searching the scholarly and research output from the Hong Kong UGC funded institutions [http://lbapps.ust.hk/hkir/] • Powered by the DSpace software • Equipped with OCLC’s OAIHarvester2 software for harvesting OAI metadata from IRs
HKIR [cont.] • Databases harvested (as of 22 Feb 2006): • CUHK SiR [70 records] • CityU Institutional Repository [425 records] • HKUST Electronic Theses [1,681 records] • HKUST Institutional Repository [2,126 records] • HKU Theses Online [13,583 records]
A sample HKIR record • Click on this link to go to the record in CUHK’s IR • This record was harvested from CUHK’s IR and it is in their Fine Arts collection
A sample HKIR record showing fields labeled with qualified Dublin Core elements
HKIR [cont.] • Standardization Issues • Author names standardization • Subject analysis • Free vocabulary versus thesaurus • Adopt same thesaurus among institutions? • Document types • Adopt same set of definitions among institutions? • Metadata schema • Adopt same metadata schema? • Use oai_dc schema for OAI harvesting?
Author names standardization • Author name assigned by HKUST • Author name assigned by CityU
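One common way to spot candidate variants of the same author across institutions is to compare a normalised key rather than the raw string. The sketch below is illustrative only: it assumes names are recorded as "Surname, Given names", the helper `name_key` is hypothetical, and the names are invented rather than taken from the slide.

```python
import re


def name_key(name):
    """Return (surname, initials) for a name written 'Surname, Given names'."""
    surname, _, given = name.partition(",")
    tokens = re.findall(r"[A-Za-z]+", given)
    initials = "".join(token[0] for token in tokens).lower()
    return surname.strip().lower(), initials


# Invented variants of one hypothetical author, as two IRs might record them.
variants = ["Chan, Tai Man", "CHAN, Tai-man", "Chan, T. M."]
keys = {name_key(name) for name in variants}
print(keys)

# A different (hypothetical) person collapses to the same key, so the key
# only proposes candidates; it cannot confirm identity.
collision = name_key("Chan, Tai Ming") == name_key("Chan, Tai Man")
print(collision)
```

The collision in the last lines is the reason shared authority control, rather than string matching alone, would be needed for a cross-institutional repository.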
Document types assigned to the same article differ between institutions
HKIR [cont.] • Problem with loading harvested oai_dc metadata • oai_dc is the most popular metadata schema used by OAI data provider tools, e.g. • Virginia Tech’s VTOAI - used by HKUST and HKU in their Theses databases • OCLC’s OAICat - used by DSpace • oai_dc does not support qualified Dublin Core • The qualified DC fields stored in local DSpace have to be scaled down to simple DC when exporting records to OAI harvesters
HKIR [cont.] • Mapping metadata back to qualified DC for loading into HKIR is challenging • Need to develop an HKIR version of the schema that accepts qualified DC
Metadata in oai_dc schema as received by the OAI harvester • dc:identifier.citation in local IR • dc:identifier.uri in local IR • dc:identifier.openurl in local IR
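The loss of qualifiers described on the last three slides can be made concrete with a small sketch. It assumes field names written as `dc.element.qualifier`; the functions, the heuristic rules and the sample values are hypothetical and only illustrate why the reverse mapping is guesswork.

```python
def to_simple_dc(qualified_record):
    """Drop qualifiers: dc.identifier.uri and dc.identifier.citation both
    become plain 'identifier', as happens on export to oai_dc."""
    simple = {}
    for field, values in qualified_record.items():
        parts = field.split(".")
        element = parts[1] if len(parts) > 1 else parts[0]
        simple.setdefault(element, []).extend(values)
    return simple


def guess_identifier_qualifier(value):
    """Heuristic guess at the lost qualifier (an assumption, easily wrong)."""
    if value.startswith(("http://", "https://")):
        return "uri"
    if "issn=" in value or "genre=" in value:
        return "openurl"
    return "citation"


# Invented record for illustration only.
local_record = {
    "dc.title": ["A sample title"],
    "dc.identifier.citation": ["Sample Journal, v. 12 (2005), p. 45"],
    "dc.identifier.uri": ["http://repository.example.org/handle/1783.1/1805"],
    "dc.identifier.openurl": ["issn=1234-5679&volume=12&spage=45"],
}

harvested = to_simple_dc(local_record)
guesses = [guess_identifier_qualifier(v) for v in harvested["identifier"]]
print(harvested)
print(guesses)
```

The heuristic happens to recover all three qualifiers for this record, but an OpenURL stored as a full `http://` link would be misread as a `uri`, which is why a harvesting schema that carries qualified DC is the more reliable route.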
HKIR [cont.] • Document deposition and linking • Deposit all open access documents in the local IRs • If the published version is under restricted access, deposit the pre-published version and provide a link to the published version • Use OpenURL for linking as long as the document is in a database that can be reached via link resolvers • Otherwise, add the vendor-specific link to the metadata record
HKIR [cont.] • Research Assessment Exercise (RAE) • Assesses the quality of the research output of academic staff • Assists in allocating research funds to the funded institutions • UGC is conducting RAE 2006 [http://www.ugc.edu.hk/eng/ugc/publication/prog/rae/rae.htm] • Each eligible academic staff member submits a maximum of six publications • Assessed by subject panels
HKIR [cont.] • High potential for using the cross-institutional repository to help academic staff submit items and prepare reports • Go electronic – no longer need to collect submissions in printed format • IRRA (Institutional Repositories & Research Assessment) - a project that supports RAE through IRs, for the UK RAE in 2008 [http://irra.eprints.org/] • Developing software for EPrints and DSpace to facilitate RAE tasks • DSpace version to be available in summer 2006
HKIR [cont.] • If we have a cross-institutional repository for Hong Kong IRs, then we may consider adding support for RAE to the system • Next round of UGC RAE is in 2011 or 2012
Sample screen from an IR showing users selecting items for RAE submission [source: http://irra.eprints.org/software/bronze/eprints.html]