International Conference on Developing Digital Institutional Repositories: Experiences and Challenges December 9-10, 2004, Hong Kong DSpace in Action Implementing the HKUST Institutional Repository System. Presented by K.T. Lam Head of Library Systems
International Conference on Developing Digital Institutional Repositories: Experiences and Challenges
December 9-10, 2004, Hong Kong
DSpace in Action
Implementing the HKUST Institutional Repository System
Presented by K.T. Lam, Head of Library Systems
The Hong Kong University of Science and Technology Library
lblkt@ust.hk

Table of Contents
• From Idea to Creation
  • Why have an IR?
  • IR Software Selection
• Major Features
• Future Improvements
• Conclusions
Implementing the HKUST Institutional Repository System / K.T. Lam

From Idea to Creation
• The idea of establishing an IR originated from a staff development workshop at HKUST Library on 26 November 2002, where Kimberly Douglas was invited to speak on “E-prints, OAI and Institutional Repository”.
• After the workshop, a Task Force was formed to investigate the idea.
• After two months of software evaluation, DSpace was selected to build the Repository.

From Idea to Creation (cont.)
• The IR System at HKUST was brought to life in February 2003, with the following configuration and data content:
  • DSpace Version 1.01
  • Server with Intel Pentium III 733 MHz, 512 MB RAM, and RedHat Linux Release 7.3
  • 105 Computer Science Technical Reports

From Idea to Creation (cont.)
• Background / Experience Facilitating the Creation
  • HKUST Library is an early supporter of the Open Access concept - joined SPARC (Scholarly Publishing & Academic Resources Coalition) in 2001
  • Experience of conducting digital library projects, with CJK capabilities
    • Electronic Course Reserve - 1993
    • Digital University Archives and Electronic Theses - 1997
    • etc.

From Idea to Creation (cont.)
• Why have an IR?
  • To create a permanent record of the scholarly output of HKUST
  • No available access to some scholarly works published by our own faculty
  • Collections of working papers, technical reports, research reports floating around
  • Some of our scholarly works are in the public domain

From Idea to Creation (cont.)
• Why have an IR? (cont.)
  • To make HKUST’s scholarly output more globally and openly accessible
  • To support the international Open Access effort.
    “[T]he mission of disseminating knowledge is only half complete if it is not widely and readily available to society” - Berlin Declaration (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html)

From Idea to Creation (cont.)
• IR Software Selection
  • The July/August 2004 issue of Library Technology Reports provides a very detailed discussion of institutional repository systems and functional requirements

From Idea to Creation (cont.)
• IR Software Selection (cont.)
  • Decision in the first meeting of the IR Task Force in mid-December 2002:
    • Follow Caltech's model, i.e. base our IR on open source software with an OAI-PMH interface.
  • We therefore evaluated two IR systems: EPrints and DSpace

From Idea to Creation (cont.)
• IR Software Selection (cont.)
  • EPrints
    • Developed by University of Southampton
    • The very first open source IR software; available since 2000
    • Written in Perl, with MySQL database and Apache Web server

From Idea to Creation (cont.)
• IR Software Selection (cont.)
  • DSpace
    • Jointly developed by MIT Libraries and Hewlett-Packard Company
    • Open source software
    • Released on Sourceforge during our system evaluation period in late December 2002
    • Written in Java, with PostgreSQL database, Lucene search engine, and a Tomcat web servlet container

From Idea to Creation (cont.)
• IR Software Selection (cont.)
  • We chose DSpace (almost two years ago) because:
    • DSpace began its development with the experience gained from EPrints - the very first and most popular open source IR software at that time
    • EPrints did not fully support Unicode and was not Java- and servlet-based
    • Both EPrints and DSpace are open source software, fulfill our functional requirements, and follow state-of-the-art library standards

Current Configuration of IR at HKUST
As of 4 December 2004:
• Home URL: http://repository.ust.hk/
• IR Software: DSpace Version 1.2
• System Software: Fedora Core 2 Linux; Tomcat 5.0; JDK1.4.2
• Server: Intel Pentium 4 2.4GHz, 1GB RAM
• Content: 1650 documents from 38 Departments
• Usage: Documents were accessed 9,051 times in the previous month

Growth (May 2003 to September 2004)

Major Features
• This section covers the following topics:
  • Data structure
  • Document submission form
  • Add item form
  • CJK support
  • OAI data provider
  • SRW/U interface
  • Google pilot project
  • Authentication and authorization

Major Features (cont.)
• Data Structure
  • Document Types
    • Preprints, technical reports, working papers, conference papers, journal articles, presentations, book chapters, patents, theses, etc.
  • Document Formats
    • Mainly PDF files; also contains PowerPoint files

Major Features (cont.)
• Data Structure (cont.)
  • DSpace data model
    • Communities (and sub-communities)
    • Collections
    • Items
    • Metadata
    • Bundles of bitstreams
  • HKUST implementation: Items are grouped by Departments (i.e. communities) and then by Document Types (i.e. collections).

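The hierarchy on this slide can be sketched as a handful of nested record types. This is only an illustration in Python, not DSpace's actual Java classes: the class and field names, the sample handle, and the sample department are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    name: str            # a stored file, e.g. a PDF
    size_bytes: int


@dataclass
class Bundle:
    name: str            # a named group of bitstreams
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    handle: str                                   # persistent identifier
    metadata: Dict[str, List[str]] = field(default_factory=dict)
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str                                     # at HKUST: a document type
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str                                     # at HKUST: a department
    collections: List[Collection] = field(default_factory=list)
    sub_communities: List["Community"] = field(default_factory=list)

    def item_count(self) -> int:
        """Count items in this community, including sub-communities."""
        own = sum(len(c.items) for c in self.collections)
        return own + sum(s.item_count() for s in self.sub_communities)


# Hypothetical sample data: one department holding one technical report.
report = Item(
    handle="hypothetical/1",
    metadata={"title": ["A sample technical report"]},
    bundles=[Bundle("files", [Bitstream("report.pdf", 1024)])],
)
department = Community(
    "Sample Department",
    collections=[Collection("Technical Reports", [report])],
)
print(department.item_count())
```

Grouping by department first and document type second maps directly onto the Community and Collection levels, which is why the HKUST layout needs no changes to the model itself.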
[Screenshot labels: Community; Collections]

[Screenshot labels: CNRI Handle (Persistent Identifier); Document in PDF]

Major Features (cont.)
• Document Submission Form
  • Faculty are apathetic about self-submission
  • DSpace’s submission and workflow functions are too lengthy; might scare off faculty
  • In need of a simple and effortless submission form - as a quick medium for submitting documents

Major Features (cont.)
• Document Submission Form (cont.)
  • Decided to develop our own form
    • Requires only very minimal data entry
    • Non-exclusive distribution license agreement
    • Library IR staff enhance the metadata of the submissions and then add them to DSpace
  -------
  • Written in Perl
  • Submitted data stored in DSpace “Simple Archive Format”

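For readers unfamiliar with the Simple Archive Format, the sketch below writes one item directory containing a `dublin_core.xml` file, a `contents` file listing the bitstreams, and the files themselves. It is a minimal illustration in Python, not the Perl form described on the slide; the function name `write_item`, the directory name `item_000`, and the sample metadata are assumptions.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(archive_dir, item_name, dc_values, files):
    """Write one item directory in DSpace Simple Archive Format.

    dc_values: list of (element, qualifier, value) tuples
    files:     mapping of file name -> file content as bytes
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True)

    # dublin_core.xml: one <dcvalue> per metadata value
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_values:
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The submitted files, plus a "contents" file naming one file per line
    for filename, data in files.items():
        (item_dir / filename).write_bytes(data)
    listing = "".join(f"{name}\n" for name in files)
    (item_dir / "contents").write_text(listing, encoding="utf-8")
    return item_dir


# Hypothetical submission written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    created = write_item(
        tmp,
        "item_000",
        [("title", "none", "A sample working paper"), ("date", "issued", "2004")],
        {"paper.pdf": b"%PDF-1.4 placeholder"},
    )
    print(sorted(p.name for p in created.iterdir()))
```

Storing submissions in this layout means staff can enrich the metadata file by hand or by script before the item is added to DSpace.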
Major Features (cont.)
• Add Item Form
  • Locally developed JSP application that lets Library IR staff add items to DSpace
  • Allows IR staff to:
    • Create a new item from scratch
    • Enhance the metadata from a faculty submission and then add the item to DSpace

Major Features (cont.)
• CJK (Chinese, Japanese, Korean) Support
  • DSpace supports Unicode
  • Problem - Lucene search engine is unable to search by CJK characters
    • Solved by replacing DSpace’s Tokenizer with a CJKTokenizer - but this has an interesting side effect
  • Problem - URL of query containing CJK characters is not properly encoded
    • Solved by setting Tomcat URIEncoding="UTF-8" and adding URLEncode() to one line of the Java source code

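Both fixes can be illustrated outside DSpace. The sketch below is a simplified Python approximation, not Lucene's CJKTokenizer or the patched Java code: it splits runs of CJK characters into overlapping two-character tokens (the general approach a bigram CJK tokenizer takes) and percent-encodes a query as UTF-8 for use in a URL. The character ranges and function names are assumptions chosen for the example.

```python
from itertools import groupby
from urllib.parse import quote


def char_class(ch):
    """Classify a character as CJK, ordinary word character, or separator."""
    # Simplified ranges: CJK ideographs, Japanese kana, Hangul syllables.
    if "\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff" or "\uac00" <= ch <= "\ud7a3":
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def bigram_tokens(text):
    """Tokenize text: overlapping bigrams for CJK runs, lowercased words otherwise."""
    tokens = []
    for kind, group in groupby(text, key=char_class):
        run = "".join(group)
        if kind == "cjk" and len(run) > 1:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        elif kind != "sep":
            tokens.append(run.lower())
    return tokens


def encode_query(query):
    """Percent-encode a query string as UTF-8 so it is safe inside a URL."""
    return quote(query, safe="")


print(bigram_tokens("香港科技大學 Library"))
print(encode_query("香港"))
```

Because CJK text has no spaces between words, indexing overlapping pairs lets a search for a two-character word match anywhere inside a longer run, at the cost of a larger index.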
So, ... a sorting problem. Can you figure out the logic behind it?

Major Features (cont.)
• OAI Data Provider
  • DSpace is OAI-compliant
  • This means that OAI harvesters can easily collect the metadata (in Dublin Core format) from various IRs (including HKUST’s) for their added-value indexing/searching services.
    • For example: OAIster
  • OAI Path to IR at HKUST: http://repository.ust.hk/dspace-oai/request?

http://repository.ust.hk/dspace-oai/request?verb=GetRecord& ... 1783.1/1805

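A harvester talks to this path by appending one of the six OAI-PMH verbs and its arguments as query parameters. The following Python sketch only builds such request URLs; it does not contact the server, and the helper name `oai_request_url` is an assumption. The base URL is the one given on the slide, and `oai_dc` is the Dublin Core metadata prefix that every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode

# Base URL as given on the slide.
OAI_BASE_URL = "http://repository.ust.hk/dspace-oai/request"

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    params = {"verb": verb, **arguments}
    return OAI_BASE_URL + "?" + urlencode(params)


# Ask for all records in unqualified Dublin Core.
list_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
print(list_url)
```

A `GetRecord` request additionally needs an `identifier` argument naming the record, which is what the elided URL on the slide carries.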
Major Features (cont.)
• SRW/U Interface
  • Search and Retrieval for the Web (or by URL)
  • Retains the core functionality of Z39.50 but in the form of web services
  • This means search service providers can broadcast a search to various IRs and deliver the search results in their own GUI
  • SRW/U Interface for the IR at HKUST
    • Based on OCLC’s SRW/U software
    • URL: http://repository.ust.hk/SRW/

The results of an SRW/U search, with XSLT transformation

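The "by URL" variant (SRU) expresses a search as ordinary query parameters, which is what makes broadcasting a search to several repositories straightforward. The Python sketch below builds an SRU 1.1 `searchRetrieve` URL with a CQL query. The endpoint shown is a hypothetical placeholder, since the exact search path under the HKUST SRW/U URL is not given in the slides, and the helper name is likewise an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint for illustration only; not the HKUST search path.
SRU_ENDPOINT = "http://repository.example.org/SRW/search/ir"


def sru_search_url(cql_query, start_record=1, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",   # mandatory in SRU 1.1
        "version": "1.1",                # mandatory in SRU 1.1
        "query": cql_query,              # the CQL query itself
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return SRU_ENDPOINT + "?" + urlencode(params)


# A CQL phrase query; quotes and spaces are percent-encoded by urlencode.
url = sru_search_url('"vapor generator"')
print(url)
```

The response is XML, so a provider can restyle it for its own interface, for example with an XSLT stylesheet as on the results slide.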
Major Features (cont.)
• Google Pilot Project
  • Initiated in March 2004 by the DSpace user community under the leadership of MacKenzie Smith
  • To improve access to DSpace IRs from within Google
  • HKUST is a participant in this project
  • Result - created a restrict=dspace search filter for use in the Google URL. For example: http://www.google.com/search?restrict=dspace&q=collaboration

Major Features (cont.)
• Authentication and Authorization
  • Authentication - by EPerson record created through user registration
  • Authorization - based on the policy settings on the object (community, collection, item, bitstream, etc.)
  • A&A are not a big concern to our IR
    • We do not use DSpace’s submission and workflow functions
    • It is open to the public
    • A&A only required when our library IR staff access DSpace’s administration functions

Major Features (cont.)
• DSpace Authentication and Authorization (cont.)
  • We have, however, customized DSpace to allow for campus-wide LDAP authentication
    • Mainly for a different project that also uses DSpace (Digital University Archives).
    • Transparent creation of EPerson record on-the-fly during authentication
  • We have also investigated the feasibility of hooking DSpace up with Yale’s Central Authentication Services
    • With only limited success - due to cumbersome stage transfer from authentication to authorization

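The "on-the-fly" creation of an EPerson record can be sketched as follows. This is a Python illustration of the control flow only, not the HKUST customization or DSpace's Java API: the directory check is a stand-in for a real LDAP bind, and every name here (`EPersonRegistry`, `fake_directory_check`, the sample account) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EPerson:
    netid: str
    email: str


class EPersonRegistry:
    """Keeps local user records, creating them after a successful directory login."""

    def __init__(self, directory_check: Callable[[str, str], Optional[str]]):
        # directory_check returns the user's email on success, None on failure.
        self._check = directory_check
        self._by_netid: Dict[str, EPerson] = {}

    def login(self, netid: str, password: str) -> Optional[EPerson]:
        email = self._check(netid, password)
        if email is None:
            return None                      # directory rejected the credentials
        if netid not in self._by_netid:
            # First successful login: create the local record transparently.
            self._by_netid[netid] = EPerson(netid, email)
        return self._by_netid[netid]


def fake_directory_check(netid: str, password: str) -> Optional[str]:
    """Stand-in for an LDAP bind against the campus directory."""
    accounts = {"jdoe": ("secret", "jdoe@example.org")}
    stored = accounts.get(netid)
    if stored is not None and stored[0] == password:
        return stored[1]
    return None


registry = EPersonRegistry(fake_directory_check)
person = registry.login("jdoe", "secret")
print(person)
```

Authorization still runs against the local record afterwards, so the record has to exist before any policy check can be made.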
https://archives.ust.hk/ Login to see more…

Future Improvements
• Flatten community+collection structure - 2-level only, not deep enough
• Linked collection - a collection that belongs to more than one community
• Unable to search across multiple collections from multiple communities
• Query Syntax not apparent to users, e.g.
  +water +rapid [for exact word match]
  "vapor generator" [for phrase search]

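One way to hide this syntax from users is to build the query string from separate form fields. The Python sketch below is a hypothetical helper, not part of DSpace: it prefixes each required word with `+` (which marks a term as required in Lucene query syntax) and wraps each phrase in double quotes, matching the two examples on the slide.

```python
def build_query(required_words=(), phrases=()):
    """Compose a Lucene-style query string from required words and phrases."""
    parts = []
    for word in required_words:
        word = word.strip()
        if word:
            parts.append("+" + word)            # term must be present
    for phrase in phrases:
        phrase = phrase.replace('"', "").strip()
        if phrase:
            parts.append('"' + phrase + '"')    # match the words as a phrase
    return " ".join(parts)


print(build_query(["water", "rapid"]))
print(build_query(phrases=["vapor generator"]))
```

A search form with one box for "all of these words" and another for "this exact phrase" could pass its values straight to such a helper.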
Future Improvements (cont.)
• Insufficient capability for sorting search results
• Unable to display the number of items in a community and in a collection
  • We have developed a JSP page to display the size of the Repository
• Does not have the capability of transferring an item from one collection to another; nor a collection from one community to another
DSpace is open source software; its success depends on contributions from its user community

Conclusions
• DSpace was selected about two years ago to build the HKUST IR.
• The IR makes HKUST's scholarly research more openly and globally accessible.
• Installing DSpace is straightforward, but tailoring it to work effectively in your institutional environment is not trivial.

Conclusions (cont.)
• Customization:
  • CJK support with UTF-8 encoding
  • Driven by the fact that faculty are apathetic about self-submission, a simple document submission form was developed.
  • Developed the “Add Item Form” to allow IR staff to add items to DSpace without the need for batch importing

Conclusions (cont.)
• With the following implementations, documents in the Repository are more fully exposed on the Internet for easy harvesting, searching and discovery:
  • DSpace's built-in OAI support
  • OCLC's SRW/U on DSpace
  • Google’s DSpace search filter

Conclusions (cont.)
• Finally, many many thanks to the DSpace team from MIT and HP for developing this high quality open source product!
Thank you! 謝 謝!