
Margaret Briand Wolfe Systems Librarian Boston College ELUNA May 8, 2015

Using Publishing Profiles to dump data out of Alma needed for resource sharing systems such as HathiTrust.


Presentation Transcript


  1. Using Publishing Profiles to dump data out of Alma needed for resource sharing systems such as HathiTrust
     Margaret Briand Wolfe, Systems Librarian, Boston College
     ELUNA, May 8, 2015

  2. When the call for data comes
     • HathiTrust
     • Rapid ILL
     • Browzine
     • Your data extraction headache here
     ELUNA 2015

  3. Frustrations dumping data out of Alma and Analytics
     • 5,000 row Excel export limit in Alma
     • 65,000 row Excel export limit in Analytics
     • Alma Bibliographic Export Processes
        • MARC21 Binary
        • MARC XML
        • Entire MARC is too much data to sift through
     • Alma APIs
        • Too slow for millions of records
        • Daily limit to the number of API calls

  4. Solution: Alma Publishing Profiles
     • Publishing is set based
     • A profile can be published in full only once; subsequent publishing contains only the delta
     • Ex Libris says full re-publish is coming in a future release
     • Need a place for the published files to land, such as an S/FTP server

  5. HathiTrust Files Requirement
     Print holdings in 3 separate files:
     • Single Print Monographs
     • Multi-Part Monographs
     • Serials

  6. BC’s Managed Sets for HathiTrust
     • Sets built for 9 separate libraries for both books and serials using the Advanced Repository Search
        • Physical titles where library = O’Neill and material type = Books
        • Physical titles where library = O’Neill and material type = Issue or Bound Issue
     • Sets can be combined, but a combined set becomes an itemized set instead of a logical set
     • I combined all serials sets into one itemized set and all sub-library book sets into one itemized set
     • O’Neill books stayed in their own logical set

  7. Normalization Rules
     • Publishing Profiles can use normalization rules to determine what data is output
     • See Alma Help and browse to normalization rules if unsure how to add or edit a rule
     • Briefly: Resource Management -> Cataloging -> Metadata Editor -> File -> New -> Normalization Rule
       OR
       Resource Management -> Cataloging -> Metadata Editor -> Rules -> Normalization Rules

  8. Normalization Rules
     • We use a rule that removes all of the MARC fields except:
        • 001 – contains system number (MMS ID) *
        • 035 – contains OCLC number *
        • 022 – contains ISSN. Used when the set is for serials
        • 074 – contains government document number
        • 901 – publishing profile puts item description in 901 subfield a (more on this soon)
     * Required by HathiTrust
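To make the effect of that rule concrete, here is a minimal Python sketch that applies the same keep-list to a MARC XML record outside of Alma. It is not Alma's normalization rule syntax, only an illustration of what the published record should look like afterwards. The function name `keep_only` and the sample record (including its MMS ID and OCLC number) are made up for the example.

```python
import xml.etree.ElementTree as ET

# Tags the slide says survive the normalization rule.
KEEP_TAGS = {"001", "022", "035", "074", "901"}


def keep_only(record, keep=KEEP_TAGS):
    """Drop every control/data field whose tag is not in `keep`.

    Elements without a tag attribute (such as the leader) are left alone.
    Works whether or not the record uses the MARC XML namespace.
    """
    for field in list(record):  # copy, because we remove while iterating
        kind = field.tag.rsplit("}", 1)[-1]
        if kind in ("controlfield", "datafield") and field.get("tag") not in keep:
            record.remove(field)
    return record


# Hypothetical record; all values are invented for illustration.
SAMPLE = """<record>
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">990000000010001</controlfield>
  <controlfield tag="008">150508s2015    mau           000 0 eng d</controlfield>
  <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)12345678</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Sample title</subfield></datafield>
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Sample subject</subfield></datafield>
  <datafield tag="901" ind1=" " ind2=" "><subfield code="a">v.1</subfield></datafield>
</record>"""

if __name__ == "__main__":
    trimmed = keep_only(ET.fromstring(SAMPLE))
    print(ET.tostring(trimmed, encoding="unicode"))
```

Running a few published records through a check like this is one way to confirm that the rule selected in the profile is really the one being applied.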

  9. Publishing Profiles – Profile Details
     • Resource Configuration -> Configuration Menu -> Publishing Profiles -> Add Profile -> General Profile
     • BC ended up with 3 publishing profiles:
        • O’Neill Books – uses logical set
        • All other sub-libraries’ books – uses combined itemized set
        • All serials – uses combined itemized set
     • Under Content -> Publish On: Bibliographic Level
     • Under Publishing Protocol can choose:
        • FTP or OAI. BC uses FTP
        • MARC Output format = MARC21 XML or MARC21 Binary
        • BC uses MARC21 XML, 10,000 records per file
        • Added filename prefix to distinguish files for each of the 3 sets

  10. Publishing Profiles – Profile Details

  11. Publishing Profiles – Data Enrichment
     • Under Bibliographic Normalization – select the normalization rule you created to export only the MARC data you want
     • Under Physical Inventory Enrichment – check Add Items Information if the profile is for books. Set repeatable field = 901, set description subfield = a. This puts the item description/enumeration in the 901 tag, subfield a. This is used to find multi-part monographs.
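As a rough illustration of how the 901 subfield a enrichment can be used downstream, the Python sketch below collects the item descriptions from a published record and treats any record that has at least one as a multi-part monograph. The function names and the sample record are hypothetical, and the sketch assumes that items without a description produce no 901 subfield a at all.

```python
import xml.etree.ElementTree as ET


def item_descriptions(record):
    """Return the non-empty 901 $a values, one per enriched item."""
    found = []
    for field in record:
        if field.get("tag") != "901":
            continue
        for sub in field:
            text = (sub.text or "").strip()
            if sub.get("code") == "a" and text:
                found.append(text)
    return found


def is_multipart(record):
    """Assumption: any item description marks a multi-part monograph."""
    return bool(item_descriptions(record))


# Hypothetical enriched record with two items; values are invented.
MULTI = ET.fromstring("""<record>
  <controlfield tag="001">990000000010001</controlfield>
  <datafield tag="901" ind1=" " ind2=" "><subfield code="a">v.1</subfield></datafield>
  <datafield tag="901" ind1=" " ind2=" "><subfield code="a">v.2</subfield></datafield>
</record>""")

# Hypothetical record whose items carry no description.
SINGLE = ET.fromstring("""<record>
  <controlfield tag="001">990000000020001</controlfield>
</record>""")

if __name__ == "__main__":
    print(item_descriptions(MULTI), is_multipart(MULTI))
    print(item_descriptions(SINGLE), is_multipart(SINGLE))
```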

  12. Publishing Profiles – Data Enrichment

  13. Publishing Profiles – Actions

  14. What to do with all those files
     • Unzip them – I wrote a Perl script to unzip all of the files FTP’d by Alma onto one of our servers
     • Process them – I wrote a Perl script to read each XML file and process each record in the file. To go to HathiTrust each record needed an MMS ID and OCLC number.
        • For Serials files I added the ISSN(s) if present
        • Multi-part monographs could only be identified by the presence of a description field
        • If 074 is present then set Gov Doc indicator = 1
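The scripts described above were written in Perl and are not reproduced in the slides. The following is a separate, minimal Python sketch of the same two steps, under several assumptions that should be checked against your own output: that the published files arrive as `.tar.gz` archives containing `.xml` members, that OCLC numbers in 035 subfield a carry an `(OCoLC)` prefix, and that the dictionary keys used here are placeholders rather than HathiTrust's actual column names.

```python
import tarfile
import xml.etree.ElementTree as ET


def iter_records(archive):
    """Yield each <record> from every .xml member of a tar.gz file object."""
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(".xml"):
                continue
            # iterparse streams the file, so large members stay out of memory.
            for _, elem in ET.iterparse(tar.extractfile(member)):
                if elem.tag.rsplit("}", 1)[-1] == "record":
                    yield elem
                    elem.clear()  # free the record once the caller is done


def subfield_values(record, tag, code):
    """Text of every subfield `code` in every datafield `tag`."""
    return [(sub.text or "").strip()
            for field in record if field.get("tag") == tag
            for sub in field if sub.get("code") == code]


def to_row(record, serial=False):
    """Build one output row, or None if MMS ID or OCLC number is missing."""
    mms_id = next(((f.text or "").strip() for f in record
                   if f.get("tag") == "001"), "")
    oclc = [v[len("(OCoLC)"):].strip()
            for v in subfield_values(record, "035", "a")
            if v.startswith("(OCoLC)")]  # prefix is an assumption
    if not mms_id or not oclc:
        return None
    has_074 = any(f.get("tag") == "074" for f in record)
    row = {"oclc": oclc, "local_id": mms_id, "gov_doc": 1 if has_074 else 0}
    if serial:
        row["issn"] = subfield_values(record, "022", "a")
    return row


def rows(archive, serial=False):
    """Yield a row for every usable record in the archive."""
    for record in iter_records(archive):
        row = to_row(record, serial)
        if row is not None:
            yield row
```

A typical call would open one landed file in binary mode, for example `with open(path, "rb") as fh: for row in rows(fh, serial=True): ...`, where `path` is whatever name your filename prefix produces. Records lacking either required identifier are skipped silently here; logging them instead would make it easier to see what HathiTrust will not receive.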

  15. HathiTrust elements I ignored
     • Holding Status
        • CH – Current Holding
        • WD – Withdrawn
        • LM – Lost or missing
     • Condition
        • BRT – Brittle, damaged and/or deteriorating

  16. Why I ignored them
     • Alma does not distinguish between items that are deleted and items that have been withdrawn.
     • Lost and Missing statuses are stored in the item processing type. This could be added to data enrichment from items.
     • We store brittle or deteriorating condition in the item internal note. This could also be added to data enrichment from items.

  17. Your Turn
     • What have you done?
     • How can we do this better?
     • What should we ask Ex Libris for to make this process easier?

  18. Contact Me
     Margaret Briand Wolfe
     briandwo@bc.edu
