Identify and correct discrepancies and errors in library catalogs and periodical A-Z lists, especially regarding improbable and incorrect data. Learn methods to resolve issues and improve data management for more accurate holdings statements.
Periodical Holdings Audit: Correcting Discrepancies and Improbabilities in Catalogs and Periodical A-Z Lists
The Problem(s) • Two systems contain periodical holdings information: • The library catalog (III) • The A-Z list (EBSCO Holdings Management/Full Text Finder) • The two systems don’t always agree • Historically maintained by two different departments • Bad data has been copied from one system to another • Data corrupted by EBSCO on original ingest
The Problem(s) Kinds of bad information: • Disagreement between systems • Disagreement between fields within a system • Impossible data (e.g. Holdings: 2010-1940) • Improbable data (e.g. 1869-Present) • Especially improbable for niche publications • Just plain incorrect (hard to detect)
The Solution(s) • Identify known problems • Record problems found in real life • Model the problem in a way detectable by algorithm • “Algorithm”, sadly, does not imply the lack of grunt work • Fix ’em all • Imagine problems you don’t know you have • You probably have those problems too • Fix them too Ken, you might be projecting… Photo: Max Halberstadt, Public Domain
What does “fix” mean? RESEARCH! • Where data imported poorly to EBSCO, sometimes the catalog alone is enough to clarify the correct holdings statement • Often have to check the print holdings in person
Methods • A lot of Excel: • Filter • Copy filter results to new table • Filter again • PHP & MySQL • If you are or have access to a programmer, almost any scripting language would do: Perl, PHP, Python, etc.
Catalog vs. EBSCO data structure • EBSCO’s format does not allow for volume/issue information in a structured way • Only in the free-text, optional CoverageStatement field
Example 1: Records exist in FTF, not in catalog Scenario: Record was deleted from catalog after export to FTF, not deleted from FTF Process improvement: When records are removed or suppressed from the catalog, change them in FTF too. Disagreement
Example 1: Records exist in FTF, not in catalog Detection: • Export “serlist” records from catalog • Export records from FTF • Trim URLs / bib record numbers to 7 digits, e.g. (b1262517) • Compare using: • “Compare two lists” from MIT Bioinformatics & Research Computing • http://jura.wi.mit.edu/bioc/tools/compare.php • Remove FTF-only titles from FTF
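For anyone scripting this step instead of using the web tool, the comparison could be sketched in Python roughly as follows. This is not the tool used in the talk. It assumes that every catalog record number and every FTF URL contains "b" followed by 7 digits, and the two sample lists are made up for illustration (b1999999 is a hypothetical record).

```python
import re

# Assumption: a bib number is "b" followed by 7 digits; any trailing
# check digit or URL suffix is ignored.
BIB_RE = re.compile(r"b\d{7}")

def bib_id(text):
    """Return the first 'b' + 7 digits found in text, or None."""
    match = BIB_RE.search(text)
    return match.group(0) if match else None

def ftf_only(catalog_values, ftf_urls):
    """Bib numbers that appear in the FTF export but not in the catalog export."""
    catalog_ids = {bib_id(v) for v in catalog_values} - {None}
    ftf_ids = {bib_id(u) for u in ftf_urls} - {None}
    return sorted(ftf_ids - catalog_ids)

# Illustrative data only.
catalog = ["b1262517", "b10242582"]
ftf = [
    "http://ezra.wittenberg.edu/record=b1262517~S0",
    "http://ezra.wittenberg.edu/record=b1024258~S0",
    "http://ezra.wittenberg.edu/record=b1999999~S0",  # hypothetical FTF-only record
]
print(ftf_only(catalog, ftf))
```

Swapping the two sets in the subtraction would list catalog-only titles instead.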
Example 2: Coverage “to Present” & End Date • EBSCO’s FTF metadata includes 3 columns related to holdings dates: • CustomCoverageBegin (date only) • CustomCoverageEnd (date only) • CoverageStatement (free text, supports volume #, date, etc.) • The use of multiple fields to cover the same information leads to the potential for discrepancies Disagreement
Example 2: Coverage “to Present” & End Date Detection: • Excel filter: CustomCoverageEnd = ‘Present’ • Excel find: CoverageStatement contains ‘-v’ or ‘- v’
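The same two-step filter could be approximated in a script. The sketch below assumes the FTF export has been loaded as a list of dictionaries keyed by column name (as csv.DictReader would produce). The Title column and all sample rows are invented for illustration.

```python
import re

# '-v' or '- v': a hyphen followed by another volume designation suggests
# the free-text statement describes a closed range.
CLOSED_RANGE = re.compile(r"-\s?v")

def present_but_closed(rows):
    """Rows whose end date says 'Present' while the statement looks closed."""
    return [
        row for row in rows
        if row["CustomCoverageEnd"].strip().lower() == "present"
        and CLOSED_RANGE.search(row["CoverageStatement"])
    ]

# Illustrative rows only.
rows = [
    {"Title": "A", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.1(1990)-v.20(2009)"},
    {"Title": "B", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.49(2012)-"},
    {"Title": "C", "CustomCoverageEnd": "2009-12-31",
     "CoverageStatement": "v.1(1990)- v.20(2009)"},
]
for row in present_but_closed(rows):
    print(row["Title"], row["CoverageStatement"])
```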
Example 3: Complex ≠ Simple holdings Scenario: FTF shows complex holdings and a simple coverage statement, OR vice versa Disagreement
Example 3: Complex ≠ Simple holdings Detection: • Excel filter: ‘|’ (pipe) in CustomCoverage • Excel filter: does not contain ‘ , ’ (comma) in CoverageStatement And then: Decide what to do about it…
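A scripted version of this check might look like the sketch below. It assumes that multiple date ranges show up as pipe-separated values in CustomCoverageBegin or CustomCoverageEnd, and that a comma in CoverageStatement marks a multi-part statement. The date format in the sample rows is a guess, not taken from a real export.

```python
def complexity_mismatch(row):
    """True when one side looks multi-range and the other looks single-range."""
    statement = row["CoverageStatement"].strip()
    if not statement:
        # CoverageStatement is optional, so a blank one is not a disagreement.
        return False
    structured_complex = (
        "|" in row["CustomCoverageBegin"] or "|" in row["CustomCoverageEnd"]
    )
    statement_complex = "," in statement
    return structured_complex != statement_complex

# Illustrative rows only; the pipe-delimited date format is an assumption.
rows = [
    {"CustomCoverageBegin": "1990-01-01|2000-01-01",
     "CustomCoverageEnd": "1995-12-31|Present",
     "CoverageStatement": "v.1(1990)-"},
    {"CustomCoverageBegin": "1990-01-01",
     "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.1(1990)-v.5(1994),v.10(1999)-"},
    {"CustomCoverageBegin": "1990-01-01",
     "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.1(1990)-"},
]
print([complexity_mismatch(r) for r in rows])
```

The flagged rows still need a human decision about which side is right.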
Example 4: Holdings "to present" but not listed as Retains Current • In our library, most current subscriptions are held in “Current Periodicals” • e.g.: “v.49(2012)-;Retains current volume in Current Periodicals.” • Places where that statement is missing are suspect • Some are legit, but many absences for current subscriptions indicate errors IMPROBABLE
Example 4: Holdings "to present" but not listed as Retains Current Detection: • Excel filter: CustomCoverageEnd = ‘Present’ • Excel filter: Coverage Statement does not contain ‘Retain’ Results: • Some correct • Some withdrawn • Some should have had Current Periodicals statement
Example 4b: Vice Versa “Retains current” but end date does not contain ‘Present’ • Found four total erroneous records • Errors in catalog • Errors in EBSCO ingest
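Examples 4 and 4b are the two directions of one test, so a script could label both in a single pass. This is a sketch under the same assumption as before, that each exported row is a dictionary keyed by column name. The two label strings are invented for illustration.

```python
def retains_current_mismatch(row):
    """Label a row whose end date and 'Retains current' note disagree."""
    ends_present = "present" in row["CustomCoverageEnd"].lower()
    says_retains = "retain" in row["CoverageStatement"].lower()
    if ends_present and not says_retains:
        return "present_without_retains"   # Example 4
    if says_retains and not ends_present:
        return "retains_without_present"   # Example 4b
    return None                            # the two fields agree

# Illustrative rows only.
rows = [
    {"CustomCoverageEnd": "Present",
     "CoverageStatement": "v.49(2012)-;Retains current volume in Current Periodicals."},
    {"CustomCoverageEnd": "Present",
     "CoverageStatement": "v.1(1990)-"},
    {"CustomCoverageEnd": "2011-12-31",
     "CoverageStatement": "v.1(1990)-;Retains current volume in Current Periodicals."},
]
print([retains_current_mismatch(r) for r in rows])
```

A flagged row is only a candidate: as the results above note, some turn out to be correct.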
Example 5: Special Collections to ‘Present’ • We have very few titles in Storage or Special Collections with current subscriptions. • There were 55 questionable titles • Most: Catalog record was out of date • Some: sloppy data ingest (e.g. a single volume or issue was recorded as the beginning of a series: n.10(1938) became n.10(1938)-) IMPROBABLE
Example 5: Special Collections to ‘Present’ Detection: • Limit holdings to PackageName = “THOMAS RARE” • Or one of several other special collections locations • Limit to CustomCoverageEnd contains “Present”
Example 6: Impossible Date Ranges Items with non-sequential holdings • Lib. Has: n.16(1985),n.22(1987),n.24(1988),n.56(1964)-; • Lib. Has: v.21(1896)-v.22(1987),v.28(1900)-v.86(1929) Detection: • Did not find a good way to do this! • Fixed them as we found them IMPOSSIBLE
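One heuristic that might catch some of these, offered as a sketch and not as something used in this project: pull every four-digit year out of the holdings statement in order and flag the statement if any year is earlier than the one before it. It assumes that years fall between 1500 and 2099. It will miss errors that happen to keep the years in order, and it can misfire if a volume or issue number looks like a year.

```python
import re

# Assumption: a year is a standalone four-digit number from 1500 to 2099.
YEAR_RE = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")

def years_in(statement):
    """All year-like numbers in the statement, in the order they appear."""
    return [int(y) for y in YEAR_RE.findall(statement)]

def has_backward_year(statement):
    """True if any year is earlier than the year immediately before it."""
    years = years_in(statement)
    return any(later < earlier for earlier, later in zip(years, years[1:]))

statements = [
    "n.16(1985),n.22(1987),n.24(1988),n.56(1964)-;",
    "v.21(1896)-v.22(1987),v.28(1900)-v.86(1929)",
    "2010-1940",
    "v.49(2012)-;Retains current volume in Current Periodicals.",
]
for s in statements:
    print(has_backward_year(s), s)
```

The first three statements are the impossible examples from these slides. The last is a normal statement and should not be flagged.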
Example 7: Old News We don’t have a science library anymore, but: • Solution: • Create List of Bib Records WHERE CHECKIN has ‘sci’ IMPOSSIBLE
Example 8: LibHas vs. CoverageStmt What if we just look for basic textual disagreement? The LibHas statement (catalog) is textually different from the Coverage Statement in EBSCO Disagreement
Example 8: LibHas vs. CoverageStmt • For each record, compare the catalog record with EBSCO’s URL for the item • Catalog: b10242582 • URL: http://ezra.wittenberg.edu/record=b1024258~S0 • Create List in catalog, export Record #, Title, LibHas. • Export titles from EBSCO, including URL
Example 8: LibHas vs. CoverageStmt • Match based on record # / URL, compare holdings statements • I did this with a PHP script & MySQL database, comparing strings • You could try Excel, something like: =INDEX('ebsco'!$V:$V,MATCH(B2,'ebsco'!$E:$E)) but I had trouble getting this to work (MATCH defaults to an approximate match that expects sorted data; a third argument of 0 requests an exact match) • In the standard EBSCO export format $E:$E is the URL column, $V:$V is the Coverage Statement • In this example, B2 contains a catalog URL to match on
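The match-and-compare step can also be sketched in Python. This is not the PHP/MySQL script described above. It assumes both exports are loaded as lists of dictionaries, that the EBSCO URL column is named URL, and that the 7-digit bib number is the join key. The holdings strings in the sample rows are invented.

```python
import re

# Assumption: "b" + 7 digits identifies a record in both exports.
BIB_RE = re.compile(r"b\d{7}")

def bib_key(text):
    match = BIB_RE.search(text)
    return match.group(0) if match else None

def mismatched_statements(catalog_rows, ebsco_rows):
    """(bib, LibHas, CoverageStatement) for matched records whose text differs."""
    ebsco_by_bib = {}
    for row in ebsco_rows:
        key = bib_key(row["URL"])
        if key is not None:
            ebsco_by_bib[key] = row["CoverageStatement"]
    results = []
    for row in catalog_rows:
        key = bib_key(row["Record #"])
        if key is None or key not in ebsco_by_bib:
            continue  # no counterpart in EBSCO; that is a different check
        if row["LibHas"] != ebsco_by_bib[key]:
            results.append((key, row["LibHas"], ebsco_by_bib[key]))
    return results

# Illustrative rows only.
catalog_rows = [
    {"Record #": "b10242582", "LibHas": "v.1(1990)-v.20(2009)"},
    {"Record #": "b1262517", "LibHas": "v.49(2012)-"},
]
ebsco_rows = [
    {"URL": "http://ezra.wittenberg.edu/record=b1024258~S0",
     "CoverageStatement": "v.1(1990)-"},
    {"URL": "http://ezra.wittenberg.edu/record=b1262517~S0",
     "CoverageStatement": "v.49(2012)-"},
]
print(mismatched_statements(catalog_rows, ebsco_rows))
```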
Example 8: LibHas vs. CoverageStmt Results of this approach • All records in main periodical collection (n = 2029) • LibHas != CoverageStatement (498) • Eliminate blank CoverageStatement (276) • Control for varied spacing and quotation marks (219) • Newly introduced by weeding project (156) • Other errors (63) Limitation: • Only works where CoverageStatement was defined
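The "control for varied spacing and quotation marks" step could be done by normalizing both strings before comparing them. The rules below are one possible choice, not the ones used to produce the counts above: curly quotes become straight quotes, all whitespace is dropped, and wrapping quotes are stripped.

```python
import re

# Map curly quotation marks to their straight equivalents.
QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})

def normalize(statement):
    """Reduce a holdings statement to a form suitable for comparison only."""
    text = statement.translate(QUOTES)
    text = re.sub(r"\s+", "", text)   # ignore every spacing difference
    return text.strip("\"'")          # ignore quotes wrapping the whole value

def same_holdings(lib_has, coverage_statement):
    return normalize(lib_has) == normalize(coverage_statement)

print(same_holdings(
    "\u201cv.49(2012)-; Retains current volume\u201d",
    '"v.49(2012)-;Retains  current volume"',
))
print(same_holdings("v.1(1990)-v.20(2009)", "v.1(1990)-"))
```

The normalized form is only for comparison; the original strings should be kept for display and correction.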
Future directions • Improve staff workflows • Periodic checks for data consistency • Exploring further mechanisms for comparisons/tests