The University of Cambridge Universal Catalogue: a work in progress
Patricia Killiard
Head of IT Services, Cambridge University Library
Libraries in the University of Cambridge
• University Library
• Dependent libraries: Medical Library, Scientific Periodicals, Squire Law Library, Betty & Gordon Moore Library
• College libraries
• Departmental & Faculty libraries
• Affiliated Institutions
• Other libraries associated with the University
UC
The Union Catalogue: Beginnings and growth
• Began in 1982 with the Union List of Serials: non-MARC records based on a printed list
• 1985: 5 libraries began contributing short records for books to a Union Catalogue
• 1987: UC first made available to the public with 53,000 records
• 2002: 90+ contributing libraries
• New contributors are still joining
• Software was written in-house and continued to be used until 2002
Standards ...
• To encourage contributions, early records were not held to any bibliographic standards
• Records were brief because of the cost of disk space in the 1980s
• No authority control, even today
• Independence of colleges, faculties and departments means no overall control of standards
... consequences for the UC
• Serials records were non-MARC until 2002
Pre-2002 Union Catalogue Model
• Consortial model with duplicate bibliographic records
• No authority control
• Completely separate from the authority-controlled file for the University Library
• Separate Union List of Serials which was de-duplicated
• Can still be seen at http://linux01.lib.cam.ac.uk/Catalogues/OPAC/xunion.shtml
Advantages and disadvantages of the old UC model
Advantages
• Ability to request 3 preferred libraries first
• Some patron functionality, e.g. patrons able to view books on loan
• Each library’s holdings could be distinguished immediately
Disadvantages
• Lack of de-duplication in the main Union Catalogue
• Large numbers of search results
• Exclusion of the University Library holdings from the UC
• Separation of serials catalogue from monographs
Voyager vision for Cambridge
• Single de-duplicated Universal Catalogue incorporating all public databases, bringing University Library and other databases together
• Based on authority-controlled records
• All patron functionality possible through the UC
• Libraries able to retain local rights over records and patron functionality
• Local subject headings retained
From Consortial Catalogue to Universal Catalogue
• Department/Faculty and College databases in Voyager have multiple owning libraries - no record sharing
• Could move to a Union Catalogue module by allowing record sharing within databases but ...
• Requires political will
• Is very slow since records would merge on an individual basis
• Interim stage of merging confusing for patrons
Cambridge System Hardware
[Diagram: Universal Catalogue, Feeder databases, Web Server]
Hardware specifications
• Sun Fire 4800
• 4 x T3 arrays configured in 2 partner groups
• 2 x 4 x 750 MHz CPUs
• 16 GB memory (8 GB for each domain)
• Disk space: 2 x 18 GB (used for Solaris) and 2 x 9 x 36 GB (in one T3 partner pair) for each domain
• Domain A (Hookea) holds all production databases
• Domain C (Hookec) holds UC
• Web server = Sun 280R: 2 x 750 MHz UltraSPARC III processors, 4 GB memory, 72 GB disk
• Test server = Sun 220R
De-duplication
• Indexes used: 010, 020, 022, 0350, 0359
• Large proportion of records do not have ISBNs or LCCNs
• De-duplication is very loose
• Resulted in very low levels of de-duplication (3-15%)
• The de-duplication rate may actually fall as the file accumulates, because older records without control numbers are being added
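The matching described on this slide can be pictured with a minimal Python sketch. It is not Voyager's duplicate detection: the record layout (a dict with an `id` and lists of values keyed by MARC tag) and the function names are assumptions made for illustration, and the 0350/0359 indexes are treated simply as tag 035.

```python
from collections import defaultdict

# MARC tags named on the slide as match points.
# Assumption: the 0350 and 0359 indexes are both treated as plain 035 here.
MATCH_TAGS = ("010", "020", "022", "035")


def match_keys(record):
    """Return the set of (tag, value) match points found in a record."""
    keys = set()
    for tag in MATCH_TAGS:
        for value in record.get(tag, []):
            keys.add((tag, value.strip()))
    return keys


def group_duplicates(records):
    """Group record ids that share at least one control number.

    Records with no match points always end up alone, which is why short
    records without ISBNs or LCCNs are never de-duplicated.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for r in records:
        for key in match_keys(r):
            if key in first_seen:
                # Same control number seen before: join the two groups.
                parent[find(r["id"])] = find(first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(groups.values())
```

Calling `group_duplicates` on two records that share an 020 value and a third record with no control numbers gives one merged group and one singleton, which illustrates how a file full of short records produces a low de-duplication rate.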
Replace vs Merge in de-duplication
• Bi-directional merge profile should have been available in 2001.2 but is not yet working
• Essential in order to preserve British Education Index and local subject headings in 650._4 and 650._7
• Might be used in future to preserve other fields, e.g. 856 fields
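The difference between replacing and merging can be shown with a small Python sketch. It is an illustration only, not the Voyager merge profile: fields are modelled as hypothetical `(tag, ind1, ind2, value)` tuples, the function names are invented, and only one direction of the merge (keeping protected fields from the record that loses) is shown.

```python
def is_protected(field):
    """True for 650 fields whose second indicator is 4 or 7 (650._4, 650._7)."""
    tag, ind1, ind2, value = field
    return tag == "650" and ind2 in ("4", "7")


def replace_record(existing, incoming):
    """Replace: the incoming record wins outright; local headings are lost."""
    return list(incoming)


def merge_record(existing, incoming):
    """Merge: the incoming record wins, but protected 650 fields from the
    existing record are carried over if the incoming record lacks them."""
    merged = list(incoming)
    for field in existing:
        if is_protected(field) and field not in merged:
            merged.append(field)
    return merged
```

Extending `is_protected` to other tags, such as 856, would be one way to model the future use mentioned in the last bullet.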
Quality Hierarchy
Leader/06   Leader/17   040$a   040$d
*           *           DLC     *
as          *           *       depfacaedb
ab          *           *       depfacaedb
as          *           *       depfacfmdb
ab          *           *       depfacfmdb
as          *           *       depfacozdb
ab          *           *       depfacozdb
as          *           *       collandb
ab          *           *       collandb
as          *           *       collpwdb
ab          *           *       collpwdb
as          *           *       otherdb
ab          *           *       otherdb
*           *           *       cambrdgedb
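One way to read such a hierarchy is as an ordered rule list in which `*` matches anything and the first matching row decides a record's rank. The Python sketch below takes that reading. Two things are assumptions rather than facts from the slide: that rows are listed from most to least preferred, and the record keys (`ldr06`, `ldr17`, `040a`, `040d`), which are hypothetical names for the four columns.

```python
# Rows of the quality hierarchy in slide order.
# Assumption: earlier rows are preferred over later ones.
DATABASES = ["depfacaedb", "depfacfmdb", "depfacozdb",
             "collandb", "collpwdb", "otherdb"]

RULES = [("*", "*", "DLC", "*")]
for db in DATABASES:
    RULES.append(("as", "*", "*", db))
    RULES.append(("ab", "*", "*", db))
RULES.append(("*", "*", "*", "cambrdgedb"))


def rank(record):
    """Return the index of the first rule the record matches.

    Records that match no rule rank after every listed row.
    """
    values = (
        record.get("ldr06", ""),
        record.get("ldr17", ""),
        record.get("040a", ""),
        record.get("040d", ""),
    )
    for position, rule in enumerate(RULES):
        if all(p == "*" or p == v for p, v in zip(rule, values)):
            return position
    return len(RULES)


def preferred(records):
    """Pick the record that should survive de-duplication under this reading."""
    return min(records, key=rank)
```

If the hierarchy in fact runs the other way, swapping `min` for `max` in `preferred` is enough to reverse the preference.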
Trial UC build no. 1: Aug 2001
• First UC build with 2000.1.3, built before the remainder of the system went live
• Contributing files were all test loads of data for all libraries - very slow to configure and build
• UC Phase 2 should have had a link back to holdings records, but a bug in 2000.1.3 prevented it from working
• Upgrade to 2000.2.1 needed to make it work (Oct 2001)
• No UB functionality
• Very generic build using only 010, 020, 022 and 035 to de-duplicate
Trial build no. 2: Nov 2002
• 2 databases: cambrdgedb and depfacaedb with 2001.2 Beta
• Bugs in Sysadmin affected:
  • Duplicate detection profiles
  • Quality hierarchy
  • Bi-directional merge
  • Saving values in Sysadmin generally
• Build failed several times at pre-bulk stage
Trial no. 3: March 2003
• Began March 2003, again with 2 databases
• Early problems with matching location codes and Oracle database names
• Further pre-bulk problems
• Delayed while databases were clustered in March and upgraded to 2001.2.1 in early April
• Build completed but:
  • quality hierarchy failed to work
  • bi-directional merge did not work
  • unable to test patron functionality
Production build
• 21 July: initial load began with 2 databases, cambrdgedb and depfacaedb
• Indexed and reviewed at this stage
• 22 August: load of remaining databases began
• 28 August: load and indexing complete
• Currently under review
• Authorities not loaded
• UB not yet enabled
• Bi-directional merge not yet functioning
Major issues to tackle
• De-duplication of short records with no match points at present
• Authority control in a non-authority controlled environment
• Presentation of results to users:
  • Display doesn’t support multiple libraries in database: shows database name as location rather than holding library
  • Public names in OPAC need to be revised to reflect multiple libraries - 60 characters is not always sufficient
Short record de-duplication
Option 1: Additional indexes
• Creation of index solely for de-duplication purposes
• Manual matching by cataloguers
• Addition of local control number in matching records
• Accurate but extremely slow
• However, additional left-anchored indexes for de-duping, like 015 (BNB numbers), would help
Short record de-duplication
Option 2:
• Combining indexes is probably the best way to tackle the very large numbers of short records
• Algorithm to combine author, title, and publication date would be ideal
Option 3:
• Upgrading all short records through retrocon projects - expensive and not justified if only purpose is de-duplication
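As a rough illustration of Option 2, the Python sketch below builds a single match key from author, title and publication date. The normalisation rules (surname only, punctuation and accents stripped, first four-digit year) are assumptions chosen for the example, not a description of any existing Voyager feature, and the function names are hypothetical.

```python
import re
import unicodedata


def normalise(text):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def combined_key(author, title, date):
    """Build a match key from surname, normalised title and first 4-digit year.

    Assumption: the author heading is in 'Surname, Forename' form.
    """
    surname = normalise(author.split(",")[0])
    year = re.search(r"\d{4}", date)
    return (surname, normalise(title), year.group(0) if year else "")
```

Two short records whose headings differ only in punctuation, capitalisation or forename fullness would then produce the same key. A key this loose can also bring together different editions, so any matches would still need review.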
Serials: a special problem
• Two types of serials records:
  • Short Union List of Serials records: identical for all libraries but multiple copies in each database
  • Upgraded serials records in all department/faculty and college databases
• Need to ensure that:
  • Higher quality records from departments etc. take precedence
  • Standards are controlled so that former Union List of Serials records do not diverge as they are upgraded
Authority control in the UC
• Authority records from the University Library database will be loaded into UC
• Local authorities discarded from Voyager build
• No authorities in 7 out of 8 contributing databases
• Options?
  • Load authorities into all databases? Too much space
  • Introduce authority control into other 7 databases through Web authorities or copying authority records from cambrdgedb - problem of cleaning up existing records
Presentation of search results
• Patrons are interested in library holdings, not database holdings
• Location Limits appear to be possible only by database, not library
• May be able to work with access control groups and holdings sort groups
• Random order of MFHDs very confusing
Patron issues: UB environment ... but not entirely
• Full patron functionality in the UC OPAC was part of the Cambridge contract, but recalls, holds and call slip requests are not yet working
• Patron records from all contributing libraries display in OPAC
• Books on loan, requests, blocks, fines and fees from all libraries display in OPAC
• Circulation clustered environment
• UB installed but no reciprocal borrowing
Top Enhancements
• Additional tools for de-duplication, preferably allowing combinations of indexes
• Fix for the multiple MFHDs being delivered in random order - incomprehensible to the user
• ISBN matching does not ignore text after the first 10 digits (problem nos. 13283, 58877, etc.), so these two fail to match:
  • 020 __ |a 0335203884
  • 020 __ |a 0335203884(pbk)
• Link from the UC record to the record in the contributing database would be very useful for Cambridge
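The ISBN problem above comes down to normalising the 020 |a value before comparing it. A minimal Python sketch of that normalisation follows; the function name is hypothetical, and it handles only 10-character ISBNs, in line with the "first 10 digits" wording on the slide.

```python
import re


def isbn_match_key(subfield_a):
    """Return the leading 10-character ISBN from an 020 |a value, or None.

    Hyphens and spaces are dropped first, so a qualifier such as '(pbk)'
    is ignored whether or not a space precedes it.
    """
    compact = subfield_a.replace("-", "").replace(" ", "")
    match = re.match(r"\d{9}[\dXx]", compact)
    return match.group(0).upper() if match else None
```

With this key, `0335203884` and `0335203884(pbk)` compare as equal, which is the behaviour the enhancement request asks for.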
University of Cambridge Universal Catalogue
Can be seen at: http://hookec.lib.cam.ac.uk