Automating Name Authority Record Updates and Bibliographic File Maintenance: A Proof of Concept. Lucas Mak, Michigan State University Libraries. Catalog Management Interest Group, ALA Annual, Chicago, IL, June 29, 2013.
Authority Control at MSU • 1.5 million authority records (1.1 million NARs) • In-house • NACO institution • Database maintenance • Post-cataloging Authority Control • New Headings Report • Download NARs from SkyRiver • Updates to NARs not necessarily caught • 1XX (no item cataloged under the changed 1XX, so it does not appear in the New Headings Report) • Elements other than 1XX (e.g. 4XX, 670)
LC/NACO NAF RDA Transition • PCC Day 1 for RDA NAR: Mar. 31, 2013 • Phased reissuance of NARs • Phase 1 • Scope • NARs with characteristics known to be at variance with RDA practice • Not candidates for any of the mechanical changes to be made during phase 2 • Adding a 667 note “THIS 1XX FIELD CANNOT BE USED UNDER RDA UNTIL THIS RECORD HAS BEEN REVIEWED AND/OR UPDATED” • Completed Aug. 20, 2012 (436,943 records processed) • Phase 2 • Programmatic changes to 1XX headings that are not acceptable under RDA (e.g., changes to Bible headings, spelling out Dept. and months, etc., abbreviations in the subfield $d for personal names) • Completed March 27, 2013 (371,942 records changed)
Updates of NARs by NACO institutions • Reviewing, upgrading, and recoding Phase 1 records to RDA • Adding any of the 17 new MARC fields (e.g. 046, 372) • Routine NAR maintenance • PCC post-RDA test guidelines “strongly encourage” catalogers to evaluate and recode the “RDA-acceptable AACR2 NARs” to RDA whenever possible
Objectives • To catch changes to NARs • Changes in 1XX • Addition, deletion, or updates of elements other than 1XX • To perform related BFM if 1XX in a NAR is changed
Tasks • To download NARs one-by-one/in bulk • To detect updates to NARs already existing in ILS • To overlay existing NARs with updated ones • To update authorized access points (AAPs) in bib records if the 1XX in a NAR is updated • To automate and link up the above tasks
Task #1: Download NARs • OCLC LCNAF SRU Service • Can be searched by LCCN • Available in multiple schemas, including MARCXML • SRU-based service (HTTP request) • FREE!! • But: • Updated every Monday night • Bulk download – by search term (e.g. after a certain date) • Implementation • Search LCCNs one-by-one by AutoIt script • Around 10 records/sec. retrieved • Download XML files into one folder (files named by LCCN)
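The one-by-one LCCN search can be sketched as a URL builder. This is a minimal Python sketch rather than the AutoIt script used in the project. The request parameters (`operation`, `version`, `query`, `recordSchema`, `maximumRecords`) are standard SRU; the base URL is a placeholder, and the index name `local.LCCN` and the MARCXML schema identifier are assumptions that should be checked against the service's own "explain" response.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the real SRU base URL.
SRU_BASE_URL = "https://sru.example.org/lcnaf"
# Hypothetical index name for LCCN searches (assumption).
LCCN_INDEX = "local.LCCN"
# Common SRU identifier for MARCXML; assumed to be accepted by the service.
MARCXML_SCHEMA = "info:srw/schema/1/marcxml-v1.1"


def build_sru_url(lccn: str, base_url: str = SRU_BASE_URL) -> str:
    """Build a searchRetrieve request for a single LCCN."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{LCCN_INDEX} = "{lccn}"',
        "recordSchema": MARCXML_SCHEMA,
        "maximumRecords": "1",
    }
    return f"{base_url}?{urlencode(params)}"


def output_filename(lccn: str) -> str:
    """Name the downloaded file by LCCN, with whitespace removed."""
    return "".join(lccn.split()) + ".xml"
```

Fetching each URL and saving the response under `output_filename(lccn)` would reproduce the "one folder, files named by LCCN" layout described above.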
Task #2: NAR Update Detection • To compare NARs from ILS and NARs from LC/NACO NAF by XSLT • MARC 005 (timestamp) • If the timestamp is more current on the NAR from NAF → overlay the NAR in ILS
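The timestamp comparison can be sketched in Python (the project itself used XSLT). The sketch assumes each record is a single MARCXML `record` element and relies on the fixed-width 005 format (yyyymmddhhmmss.f), which sorts correctly as a plain string. The function names are illustrative. Treating a local record with no 005 as needing overlay follows the rule given under Data Integrity Issue #3.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def get_005(record_xml: str):
    """Return the 005 timestamp of a MARCXML record, or None if absent."""
    root = ET.fromstring(record_xml)
    field = root.find(".//marc:controlfield[@tag='005']", MARC_NS)
    if field is None or not (field.text or "").strip():
        return None
    return field.text.strip()


def needs_overlay(ils_record_xml: str, naf_record_xml: str) -> bool:
    """True when the NAF copy should replace the copy held in the ILS."""
    local_ts = get_005(ils_record_xml)
    naf_ts = get_005(naf_record_xml)
    if local_ts is None:
        # No local timestamp to compare against: bring in the NAF record.
        return True
    if naf_ts is None:
        # Nothing to compare on the NAF side: leave the local record alone.
        return False
    # Fixed-width numeric strings compare chronologically.
    return naf_ts > local_ts
```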
Task #3: Export/Overlay of NARs • MarcEdit • Export updated NARs into ILS • Through TCP/IP (Host address, Port, .mrc file) • One-by-one (though .mrc file can contain multiple NARs)
Task #4: Updates of Bib AAPs • XSLT • To detect changes in 1XX between old and new NARs • To build AAP conversion table (a TXT file) when 1XX is changed • AutoIt • Automate bib AAP updates by “Global Update” module in ILS • Read old and new AAPs from the TXT file and fill out info required in “Global Update” process
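Detecting a changed 1XX and emitting a conversion-table row might look like the following Python sketch (the project used XSLT for this step). The row layout, old heading and new heading separated by a tab with subfields written as `$a...$d...`, is an assumption; the slides say only that the table is a TXT file.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def heading_1xx(record_xml: str):
    """Return (tag, heading string) for the first 1XX field, or None."""
    root = ET.fromstring(record_xml)
    for field in root.iterfind(".//marc:datafield", MARC_NS):
        tag = field.get("tag", "")
        if tag.startswith("1"):
            parts = [
                f"${sf.get('code')}{(sf.text or '').strip()}"
                for sf in field.findall("marc:subfield", MARC_NS)
            ]
            return tag, "".join(parts)
    return None


def conversion_row(old_xml: str, new_xml: str):
    """Return 'old<TAB>new' when the 1XX changed, otherwise None."""
    old, new = heading_1xx(old_xml), heading_1xx(new_xml)
    if old is None or new is None or old == new:
        return None
    return f"{old[1]}\t{new[1]}"
```

Writing one such row per changed NAR gives a file that a second script can read line by line when filling in the old and new AAPs.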
Task #5: Automation • Use AutoIt to: • Link up various steps in the workflow • Automate searching against OCLC LCNAF SRU Service by compiling and sending HTTP requests • Execute various XSLTs in a predetermined sequence • e.g. NAR comparison → AAP comparison • Read TXT files (LCCN list, AAP conversion table) created by XSLT processes • Run MarcEdit to overlay obsolete NARs • Execute “Global Update” process
Basic Workflow • [Flowchart] ILS → extract NARs by Create Lists → ILS NARs → extract LCCNs by XSLT → search and retrieve by AutoIt → LC/NACO NARs → compare ILS NARs and LC/NACO NARs by XSLT → updated NARs (overlay by MarcEdit) and updated headings (Global Update) → ILS
Data Integrity Issue #1 • No ILS ARN in extracted NARs • Needed for 949 overlay command • Solution • Extract “LCCN” & “ILS ARN” pair through Create Lists • Merge ARN into extracted NARs (907$a) by XSLT/MarcEdit
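The merge step can be sketched in Python as follows (the project used XSLT/MarcEdit). The sketch assumes the input is a single MARCXML `record` element and that the LCCN/ARN pairs have already been loaded into a dictionary keyed by the LCCN with whitespace removed. The function names and the ARN value used in testing are made up for illustration.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}


def lccn_key(record: ET.Element):
    """Return the 010$a value with all whitespace removed, or None."""
    sf = record.find(
        ".//marc:datafield[@tag='010']/marc:subfield[@code='a']", MARC_NS
    )
    if sf is None or not (sf.text or "").strip():
        return None
    return "".join(sf.text.split())


def add_arn(record_xml: str, arn_by_lccn: dict):
    """Append a 907$a holding the ILS ARN; return None if no ARN is known."""
    record = ET.fromstring(record_xml)
    arn = arn_by_lccn.get(lccn_key(record))
    if arn is None:
        return None
    field = ET.SubElement(
        record,
        f"{{{MARC_URI}}}datafield",
        {"tag": "907", "ind1": " ", "ind2": " "},
    )
    subfield = ET.SubElement(field, f"{{{MARC_URI}}}subfield", {"code": "a"})
    subfield.text = arn
    # ElementTree writes a generated namespace prefix; the XML stays valid.
    return ET.tostring(record, encoding="unicode")
```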
Data Integrity Issue #2 • NARs without 010 • 010 contains LCCN • Some LCCNs transposed into 035 • Original prefix (n, no, nb, nr) removed • Prepended with prefix (OCoLC) • Possibly done during system migration • Solution • Search string in 035 (excl. prefix) as keyword in SkyRiver • Retrieve complete LCCN from matched record • Search retrieved LCCN against OCLC Service and download the record
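Deriving the SkyRiver keyword from an 035 value can be sketched as below. This is a minimal Python sketch; the function name is illustrative, and it assumes the 035 holds nothing but the `(OCoLC)` prefix followed by the numeric part of the LCCN.

```python
import re

# Matches the "(OCoLC)" prefix that was prepended to the transposed LCCN.
OCLC_PREFIX = re.compile(r"^\(OCoLC\)\s*")


def keyword_from_035(value: str) -> str:
    """Strip the (OCoLC) prefix, leaving the string to search as a keyword."""
    return OCLC_PREFIX.sub("", value.strip())
```

Because the original letter prefix (n, no, nb, nr) is gone, the keyword alone cannot identify the record; the complete LCCN still has to come from the matched record.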
Data Integrity Issue #3 • Existing NARs without 005 • No timestamp • Bring in the new NAR whenever the old NAR lacks 005
Data Integrity Issue #4 • Local data in NAR • Local call no. (e.g. 050, 090, 053$5) • Institution code & initials (shared catalog) • Copy local data into new NAR before overlay
Search and Retrieval Issue #1 • “Blank” XML File from OCLC LCNAF SRU Service
Search and Retrieval Issue #1 (Cont’d) • No hit for some LCCNs • XML file size: < 2KB • LCCNs in places other than 010$a → not indexed • Cancelled LCCNs (010$z) • Solution • Compile a list of LCCNs with file size < 2KB • Search LCCNs in SkyRiver by Keyword • Get new LCCNs from 010$a • Search OCLC LCNAF SRU Service using new LCCNs • But …
Search and Retrieval Issue #2 • Keyword search in SkyRiver returns multiple hits • Undifferentiated & related NARs • Write LCCNs with multiple hits to a log file for manual review • [Screenshot callouts: person broken out from undifferentiated NAR; original undifferentiated NAR cancelled]
Search and Retrieval Issue #2 (Cont’d) • Keyword search in SkyRiver returns multiple hits • Same numeric part of LCCN with different prefixes • Write LCCNs with multiple hits to a log file for manual review • [Screenshot callouts: NAR contributed via RLIN; NAR contributed via OCLC]
Search and Retrieval Issue #2 (Cont’d) • Keyword search in SkyRiver returns no hit • The LCCN in question no longer exists in NAF • NAR containing cancelled LCCN was cancelled again • Loss of 010$z • Write no-hit LCCNs into log file for manual review
Search and Retrieval Issue #2 (Cont’d) • Keyword search in SkyRiver returns no hit • False negative • Space between prefix and number removed • Hyphen within number removed (e.g. n 85-342238 → n 85342238) • Search normalized LCCNs • Delay in returning result for a search due to slow or unstable Internet connection • Set a longer wait time before trying to copy new LCCN • Run keyword search in SkyRiver in a loop until • Number of entries in the log file equals that of the immediately preceding round, or • File size of the no-hit log file equals zero
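Both remedies can be sketched in Python. `normalize_lccn` takes one reading of the bullets above and drops whitespace and hyphens; it does not attempt the zero-padding that full LCCN normalization would need for short serial numbers. `retry_until_stable` mirrors the two stopping rules: it ends when the no-hit list is empty or no shorter than in the preceding round. The `search` callable is a stand-in for the SkyRiver keyword search and is an assumption of this sketch.

```python
def normalize_lccn(lccn: str) -> str:
    """Remove whitespace and hyphens, e.g. 'n 85-342238' -> 'n85342238'."""
    return "".join(ch for ch in lccn if not ch.isspace() and ch != "-")


def retry_until_stable(lccns, search):
    """Re-run no-hit searches until a round makes no further progress.

    `search` takes an LCCN and returns the new LCCN, or None on no hit.
    Returns (found, still_missing).
    """
    found = {}
    pending = list(lccns)
    while pending:
        no_hits = []
        for lccn in pending:
            result = search(lccn)
            if result is None:
                no_hits.append(lccn)
            else:
                found[lccn] = result
        made_progress = len(no_hits) < len(pending)
        pending = no_hits
        if not made_progress:
            # Same number of no-hits as the preceding round: stop looping.
            break
    return found, pending
```

Whatever remains in `still_missing` corresponds to the entries left in the no-hit log for manual review.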
Global Update Issues • ILS interface navigation • AAPs with diacritics • Found by search in Global Update module but couldn’t be replaced • Code points & exact match in Global Update • Old AAPs not found • Corresponding bib records deleted → “Orphan” NARs • Write LCCN to log file for manual review
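One way a heading with diacritics can be found by search yet fail an exact match is a difference in code points: the same visible text can be stored with precomposed characters or with base letters plus combining marks. The Python sketch below illustrates that general mechanism only. Whether this is the specific mismatch behind the Global Update failures is an assumption, as the slides do not say.

```python
import unicodedata

# The same name written with two different code point sequences.
precomposed = "Dvo\u0159\u00e1k"    # U+0159 and U+00E1 as single characters
decomposed = "Dvor\u030ca\u0301k"   # r + combining caron, a + combining acute


def same_heading(a: str, b: str) -> bool:
    """Compare two headings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A plain `==` comparison treats the two strings as different, while `same_heading` treats them as equal, which is the gap an exact-match replace can fall into.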
Revised Workflow • [Flowchart: the basic workflow with added branches. Node labels: Extract; Search LCCNs; Found & Retrieve; Not Found → Search → Retrieve New LCCN; Not Found/Multiple Hits → Log File; Compare LC/NACO NARs and ILS NARs; Updated AAPs; Updated NARs; Fishy NARs; Merge ARN-LCCN; Global Update; AAPs Not Found → Log File; Overlay by MarcEdit; ILS]
Test Results • 82,398 NARs tested • 81,362 NARs needed to be overlaid* • 4,584 AAPs became obsolete • 10,900 bib records had at least one heading flipped * Many NARs exported from ILS do not contain field 005
Limitations • Identities broken out from undifferentiated NARs can’t be detected • Partially taken care of by “New Headings Report” • AAPs that have no corresponding NARs • Non-Latin script parallel APs in Field 880 • Scalability issues • Slow export using MarcEdit • Slow “Global Update” process • Memory-intensive XSLT process • “Java heap space” out-of-memory error
Possible Enhancements • “Data Exchange” module for NAR overlay • Data Exchange module – record load function • Manual intervention needed • SQL backend of Sierra (Sierra DNA) • Write SQL commands to batch changes • But, EDIT function not yet available through SQL command • AACP (Automatic Authority Control Processing) • Flip AAPs matching 4XX in NARs to corresponding 1XX in an overnight process • Replace “Global Update” with AACP • “Rig” updated NARs by inserting the obsolete AAP as a 4XX • Export “rigged” NARs to ILS to trigger the overnight process • Overlay exported “rigged” NARs in ILS with original updated NARs
Questions? • Lucas Mak (makw@msu.edu)