
Periodical Holdings Audit

Identify and correct discrepancies and errors in library catalogs and periodical A-Z lists, especially regarding improbable and incorrect data. Learn methods to resolve issues and improve data management for more accurate holdings statements.

mbroderick

Presentation Transcript


  1. Periodical Holdings Audit: Correcting Discrepancies and Improbabilities in Catalogs and Periodical A-Z Lists

  2. The Problem(s) • Two systems contain periodical holdings information: • The library catalog (III) • The A-Z list (EBSCO Holdings Management/Full Text Finder) • The two systems don’t always agree • Historically maintained by two different departments • Bad data has been copied from one system to another • Data corrupted by EBSCO on original ingest

  3. The Problem(s) Kinds of bad information: • Disagreement between systems • Disagreement between fields within a system • Impossible data (e.g. Holdings: 2010-1940) • Improbable data (e.g. 1869-Present) • Especially improbable for niche publications • Just plain incorrect (hard to detect)

  4. The Solution(s) • Identify known problems • Record problems found in real life • Model the problem in a way detectable by algorithm • “Algorithm”, sadly, does not imply the lack of grunt work • Fix ’em all • Imagine problems you don’t know you have • You probably have those problems too • Fix them too [Image caption: “Ken, you might be projecting…” Photo: Max Halberstadt, Public Domain]

  5. What does “fix” mean? RESEARCH! • Where data imported poorly into EBSCO, sometimes the catalog alone is enough to clarify the correct holdings statement • Often we have to check the print holdings in person

  6. Methods • A lot of Excel: • Filter • Copy filter results to new table • Filter again • PHP & MySQL • If you are or have access to a programmer, almost any scripting language would do: Perl, PHP, Python, etc.

  7. Catalog vs. EBSCO data structure • EBSCO’s format does not allow for volume/issue information in a structured way • Only in the free-text, optional CoverageStatement field

  8. Example 1: Records exist in FTF, not in catalog Scenario: A record was deleted from the catalog after export to FTF but was not deleted from FTF. Process improvement: When records are removed or suppressed from the catalog, change them in FTF too. Disagreement

  9. Example 1: Records exist in FTF, not in catalog Detection: • Export “serlist” records from catalog • Export records from FTF • Trim URLs / bib record numbers to the 7-digit record number, e.g. (b1262517) • Compare using: • “Compare two lists” from MIT Bioinformatics & Research Computing • http://jura.wi.mit.edu/bioc/tools/compare.php • Remove FTF-only titles from FTF

  10. Example 1: Records exist in FTF, not in catalog http://jura.wi.mit.edu/bioc/tools/compare.php
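
The list comparison in Example 1 can also be scripted instead of pasted into the web tool. The following is a minimal sketch, not the method used in the presentation: the helper names `bib_key` and `ftf_only` are hypothetical, and the sample data mixes record numbers shown on the slides with one made-up number (b1999999).

```python
import re

# A 7-digit bib record number such as "b1262517". Catalog exports may carry
# an extra trailing check digit, which this pattern deliberately ignores.
BIB_RE = re.compile(r"b\d{7}")

def bib_key(text):
    """Return the 7-digit bib record number found in text, or None."""
    match = BIB_RE.search(text)
    return match.group(0) if match else None

def ftf_only(catalog_ids, ftf_urls):
    """Return record numbers that appear in FTF but not in the catalog."""
    catalog_keys = {bib_key(c) for c in catalog_ids} - {None}
    ftf_keys = {bib_key(u) for u in ftf_urls} - {None}
    return sorted(ftf_keys - catalog_keys)

# Sample data: b1999999 is invented for illustration.
catalog_ids = ["b12625170", "b10242582"]
ftf_urls = [
    "http://ezra.wittenberg.edu/record=b1024258~S0",
    "http://ezra.wittenberg.edu/record=b1999999~S0",
]
print(ftf_only(catalog_ids, ftf_urls))
```

Records returned by `ftf_only` are candidates for removal from FTF, subject to a manual check.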

  11. Example 2: Coverage “to Present” & End Date • EBSCO’s FTF metadata includes 3 columns related to holdings dates: • CustomCoverageBegin (date only) • CustomCoverageEnd (date only) • CoverageStatement (free text, supports volume #, date, etc.) • The use of multiple fields to cover the same information leads to the potential for discrepancies Disagreement

  12. Example 2: Coverage “to Present” & End Date Detection: • Excel filter: CustomCoverageEnd = ‘Present’ • Excel find: CoverageStatement contains ‘-v’ or ‘- v’
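
The two Excel steps in Example 2 could be combined into one scripted filter. This is a sketch under assumptions: the export is taken to be already loaded as a list of dictionaries keyed by column name, the `Title` column is hypothetical, the sample rows are made up, and the comparison with ‘Present’ is made case-insensitive for robustness.

```python
import re

# Matches "-v" or "- v": a dash followed by another volume number,
# which suggests the run has an explicit end.
CLOSED_RANGE = re.compile(r"- ?v")

def present_but_closed(rows):
    """Return rows whose end date says Present but whose statement looks closed."""
    return [
        row for row in rows
        if row["CustomCoverageEnd"].strip().lower() == "present"
        and CLOSED_RANGE.search(row["CoverageStatement"])
    ]

# Made-up sample rows; "Title" is a hypothetical column name.
rows = [
    {"Title": "Journal A", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.1(1950)-v.30(1979)"},
    {"Title": "Journal B", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.49(2012)-;Retains current volume in Current Periodicals."},
    {"Title": "Journal C", "CustomCoverageEnd": "1979",
     "CoverageStatement": "v.1(1950)-v.30(1979)"},
]
for row in present_but_closed(rows):
    print(row["Title"], row["CoverageStatement"])
```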

  13. Example 3: Complex ≠ Simple holdings Scenario: FTF shows complex holdings and a simple coverage statement, OR vice versa Disagreement

  14. Example 3: Complex ≠ Simple holdings Detection: • Excel filter: ‘|’ (pipe) in CustomCoverage • Excel filter: does not contain ‘ , ’ (comma) in CoverageStatement And then: Decide what to do about it…
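
One way to express the Example 3 check in code is sketched below. It assumes that “CustomCoverage” refers to the CustomCoverageBegin and CustomCoverageEnd columns and that multiple ranges are pipe-separated within them; neither detail is spelled out on the slide. The function also covers the “vice versa” direction, and the sample rows are made up.

```python
def complexity_mismatch(row):
    """True when one field looks complex and the other looks simple.

    Assumption: a pipe in the custom coverage dates marks several ranges,
    and a comma in the free-text statement marks several runs.
    """
    dates = row["CustomCoverageBegin"] + row["CustomCoverageEnd"]
    complex_dates = "|" in dates
    complex_statement = "," in row["CoverageStatement"]
    return complex_dates != complex_statement

# Made-up sample rows for illustration.
rows = [
    {"CustomCoverageBegin": "1950|1985", "CustomCoverageEnd": "1979|Present",
     "CoverageStatement": "v.1(1950)-"},
    {"CustomCoverageBegin": "1950", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.1(1950)-"},
    {"CustomCoverageBegin": "1950", "CustomCoverageEnd": "1990",
     "CoverageStatement": "v.1(1950)-v.30(1979),v.36(1985)-v.41(1990)"},
]
print([complexity_mismatch(row) for row in rows])
```

A flagged row still needs a human decision about which field is right. Rows with a blank CoverageStatement count as “simple” here, so it may be worth filtering those out first.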

  15. Example 4: Holdings "to present" but not listed as Retains Current • In our library, most current subscriptions are held in “Current Periodicals” • e.g.: “v.49(2012)-;Retains current volume in Current Periodicals.” • Places where that statement is missing are suspect • Some are legit, but many absences for current subscriptions indicate errors IMPROBABLE

  16. Example 4: Holdings "to present" but not listed as Retains Current Detection: • Excel filter: CustomCoverageEnd = ‘Present’ • Excel filter: Coverage Statement does not contain ‘Retain’ Results: • Some correct • Some withdrawn • Some should have had Current Periodicals statement

  17. Example 4b: Vice versa: “Retains current” but end date does not contain ‘Present’ • Found four total erroneous records • Errors in catalog • Errors in EBSCO ingest
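
Examples 4 and 4b are two directions of the same test, so a single pass over the export can report both. The sketch below assumes rows loaded as dictionaries keyed by column name; the `Title` column and the sample rows are hypothetical, and the matching is case-insensitive by choice.

```python
def retains_mismatches(rows):
    """Split suspect titles into two lists.

    missing_statement: ends 'Present' but has no 'Retain...' statement (Example 4)
    missing_present:   has a 'Retain...' statement but no 'Present' end (Example 4b)
    """
    missing_statement = []
    missing_present = []
    for row in rows:
        ends_present = "present" in row["CustomCoverageEnd"].lower()
        retains = "retain" in row["CoverageStatement"].lower()
        if ends_present and not retains:
            missing_statement.append(row["Title"])
        elif retains and not ends_present:
            missing_present.append(row["Title"])
    return missing_statement, missing_present

# Made-up sample rows for illustration.
rows = [
    {"Title": "Journal A", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.49(2012)-;Retains current volume in Current Periodicals."},
    {"Title": "Journal B", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.12(1990)-"},
    {"Title": "Journal C", "CustomCoverageEnd": "2005",
     "CoverageStatement": "v.1(1980)-v.26(2005);Retains current volume in Current Periodicals."},
]
print(retains_mismatches(rows))
```

As the slides note, not every flagged title is an error; some are legitimately held elsewhere or have been withdrawn.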

  18. Example 5: Special Collections to ‘Present’ • We have very few titles in Storage or Special Collections with current subscriptions. • There were 55 questionable titles • Most: Catalog record was out of date • Some: sloppy data ingest (e.g. a single volume or issue was recorded as the beginning of a series: n.10(1938) → n.10(1938)-) IMPROBABLE

  19. Example 5: Special Collections to ‘Present’ Detection: • Limit holdings to PackageName = “THOMAS RARE” • Or one of several other special collections locations • Limit to CustomCoverageEnd contains “Present”

  20. Example 6: Impossible Date Ranges Items with non-sequential holdings • Lib. Has: n.16(1985),n.22(1987),n.24(1988),n.56(1964)-; • Lib. Has: v.21(1896)-v.22(1987),v.28(1900)-v.86(1929) Detection: • Did not find a good way to do this! • Fixed them as we found them IMPOSSIBLE
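
The presentation reports no good detection method for Example 6. One possible heuristic, offered only as a sketch and not as something the presenter used, is to read the four-digit years in parentheses from left to right and flag any statement where a year is earlier than the one before it. The function name is hypothetical.

```python
import re

# Four-digit years in parentheses, e.g. "(1985)". Combined years such as
# "(1999/2000)" are not matched and would need a broader pattern.
YEAR_RE = re.compile(r"\((\d{4})\)")

def years_out_of_order(statement):
    """True if any parenthesized year is earlier than the year before it."""
    years = [int(y) for y in YEAR_RE.findall(statement)]
    return any(later < earlier for earlier, later in zip(years, years[1:]))

# The first two statements come from the slide above.
statements = [
    "n.16(1985),n.22(1987),n.24(1988),n.56(1964)-;",
    "v.21(1896)-v.22(1987),v.28(1900)-v.86(1929)",
    "v.49(2012)-;Retains current volume in Current Periodicals.",
]
for s in statements:
    print(years_out_of_order(s), s)
```

This catches both slide examples but only tests ordering. A statement with plausible ordering and a wrong year would pass, so it narrows the list for manual review rather than replacing it.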

  21. Example 7: Old News We don’t have a science library anymore, but: • Solution: • Create List of Bib Records WHERE CHECKIN has ‘sci’ IMPOSSIBLE

  22. Example 8: LibHas vs. CoverageStmt What if we just look for basic textual disagreement? The LibHas statement (catalog) is textually different from the CoverageStatement in EBSCO Disagreement

  23. Example 8: LibHas vs. CoverageStmt • For each record, compare the catalog record with EBSCO’s URL for the item • Catalog: b10242582 • URL: http://ezra.wittenberg.edu/record=b1024258~S0 • Create List in catalog, export Record #, Title, LibHas. • Export titles from EBSCO, including URL

  24. Example 8: LibHas vs. CoverageStmt • Match based on record # / URL, compare holdings statements • I did this with a PHP script & MySQL database, comparing strings • You could try Excel, something like: =INDEX('ebsco'!$V:$V,MATCH(B2,'ebsco'!$E:$E)) but I had trouble getting this to work. (Note that MATCH defaults to approximate matching, which expects sorted data; adding 0 as a third argument requests an exact match.) • In the standard EBSCO export format, $E:$E is the URL column and $V:$V is the Coverage Statement • In this example, B2 contains a catalog URL to match on
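
The match-and-compare step can be sketched in a few lines of Python as an alternative to the PHP and MySQL script, which is not shown in the slides. Everything here is illustrative: the column names `RecordNumber`, `LibHas`, `URL` and `CoverageStatement` are assumed labels for the exported fields, and the holdings strings are made up.

```python
import re

BIB_RE = re.compile(r"b\d{7}")  # 7-digit bib record number, e.g. "b1024258"

def record_key(text):
    """Return the 7-digit bib record number found in text, or None."""
    match = BIB_RE.search(text)
    return match.group(0) if match else None

def find_disagreements(catalog_rows, ebsco_rows):
    """Join the two exports on record number and return differing statements."""
    coverage_by_key = {}
    for row in ebsco_rows:
        key = record_key(row["URL"])
        if key:
            coverage_by_key[key] = row["CoverageStatement"]
    disagreements = []
    for row in catalog_rows:
        key = record_key(row["RecordNumber"])
        if key in coverage_by_key and row["LibHas"] != coverage_by_key[key]:
            disagreements.append((key, row["LibHas"], coverage_by_key[key]))
    return disagreements

# Made-up holdings statements attached to the record number from the slide.
catalog_rows = [{"RecordNumber": "b10242582", "LibHas": "v.1(1950)-v.30(1979)"}]
ebsco_rows = [{"URL": "http://ezra.wittenberg.edu/record=b1024258~S0",
               "CoverageStatement": "v.1(1950)-"}]
print(find_disagreements(catalog_rows, ebsco_rows))
```

Using a dictionary keyed on the record number sidesteps the sorted-lookup behavior that can trip up a spreadsheet formula.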

  25. Example 8: LibHas vs. CoverageStmt Results of this approach • All records in main periodical collection (n = 2029) • LibHas != CoverageStatement (498) • Eliminate blank CoverageStatement (276) • Control for varied spacing and quotation marks (219) • Newly introduced by weeding project (156) • Other errors (63) Limitation: • Only works where CoverageStatement was defined
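
The two filtering steps in these results, dropping blank CoverageStatements and controlling for spacing and quotation marks, might look like the sketch below. The exact normalization used in the original script is not given, so stripping all whitespace and all straight or curly quotes is an assumption, and the function names are hypothetical.

```python
import re

# Delete straight and curly quotation marks.
QUOTES = str.maketrans("", "", "\"'\u2018\u2019\u201c\u201d")

def normalize(statement):
    """Remove quotation marks and all whitespace before comparing."""
    return re.sub(r"\s+", "", statement.translate(QUOTES))

def really_different(lib_has, coverage_statement):
    """True only for non-blank statements that differ after normalization."""
    if not coverage_statement.strip():
        return False  # blank CoverageStatement: nothing to compare against
    return normalize(lib_has) != normalize(coverage_statement)

print(really_different("v.49(2012)-", " v.49 (2012)- "))   # spacing only
print(really_different("v.49(2012)-", "\"v.49(2012)-\""))  # quotes only
print(really_different("v.49(2012)-", "v.48(2011)-"))      # a real difference
```

Removing whitespace entirely is aggressive; collapsing runs of spaces to a single space would be a more conservative choice if spacing is ever meaningful in a holdings statement.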

  26. Future directions • Improve staff workflows • Periodic checks for data consistency • Exploring further mechanisms for comparisons/tests
