1 / 43

Putting it all together for Digital Assets

Putting it all together for Digital Assets. Jon Morley Beck Locey. LDS Church History Library.

shasta
Download Presentation

Putting it all together for Digital Assets

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Putting it all together for Digital Assets Jon Morley Beck Locey

  2. LDS Church History Library • Our collections consist of manuscripts, books, Church records, photographs, oral histories, architectural drawings, pamphlets, newspapers, periodicals, maps, microforms, and audiovisual materials. The collection continues to grow annually and is a prime resource for the study of Church history. Our collections contains approximately: • 270,000 books, pamphlets, magazines, and newspapers • 240,000 collections of original, unpublished records (journals, diaries, correspondence, minutes, etc.) • 3.5 million blessings for Church members • 13,000 photograph collections • 23,000 audiovisual items

  3. Digitization Objective Patrons–Globalization – To provide access to library content to 14 million church members and the public throughout worldwide. Remote sites for new content. Internal Operations – To create an automated digital pipelinefrom digitizing content to patron consumption. Extending to crowdsourcing in future.

  4. Agenda • The first half of this presentation will focus on “what” we want to accomplish with an automated digital pipeline. • The second half will focused on “how” we built the digital pipeline.

  5. What we want to accomplish Digital Pipeline

  6. Physical Assets Digital Content Rosetta Aleph (Master Record) EAD Tool (Encoded Archival Description) Primo (Discovery) Church History Library 2011 2010

  7. Rosetta Dual Role Two instances of Rosetta • DRPS – Digital Records Preservation System (dark archive) • DCMS – Digital Content Management System (public display)

  8. Digital Pipeline Ingest Collection Metadata Metadata Rosetta Aleph (Master Record) EAD Tool (Encoded Archival Description) Primo PID 555 tag Harvest & Index 855 tag

  9. EAD Tool Interface

  10. Click on Thumbnail

  11. Patron can a item to be digitized.

  12. Organizing and displaying Collections

  13. EAD Tool(Encoded Archival Description)A finding aid that adds a viewable structure to a collection. • Many large, complex collections • Aleph stores only the collection level descriptive metadata • Add / edit / delete component level metadata • Rosetta 3.0 doesn’t have enough functionality for our collections.

  14. EAD Tool - Staff View

  15. EDIT Drag and drop XML or CSV files

  16. Challenges • Configuration is fairly complex within and between systems • Large Batch Ingest has been difficult • Re-ingesting content into Rosetta has been problematic (preservation system) • Collections Management • Built EAD tool to manage collections • Serials? • PDF Content • Progressive PDF download in Adobe Reader X is not fully supported until Rosetta v3 • 2 searches to see text within a PDF

  17. Successes • Ingested over 57,000 Family history books ranging from 1 MB to 800 MB for the Family History department. (books.familysearch.org) • Ingested 700,000 files. Most are linked to collections for Church History library. • Consistent user experiences with large files • Good response times • 75,000 views / Month • Once the pipelines are established, they work well. • Able to create customizable solutions for multiple institutions.

  18. Under Development • Restricted content • Based on IP addresses • authentication • Multiple viewing experiences (Responsive Design) • Reporting capabilities • Monitor usage

  19. How we glued it all together Digital Pipeline

  20. Digital Pipeline Rosetta Aleph (Master Record) EAD Tool (Encoded Archival Description) Primo Church History Library

  21. Catalog in Aleph • Physical assets are cataloged in Aleph (collection level). • Staff assigns a call number and Aleph assigns a BIB number. • While browsing, a patron requests that the asset be digitized. Call #: MS 2877 4 BIB #: 000114027 Aleph

  22. Digitized into Rosetta • The asset(s) from the Aleph collection are digitized. • A custom tool (SIP tool) queries the Aleph SRU server using BIB # to get collection level metadata. • Digitized assets, BIB # (CMS ID) and metadata are ingested into Rosetta. PIDs get assigned. BIB #: 000114027 Call #: MS 2877 4 Title: Parley P. Pratt… Rosetta Query Response Aleph SRU 000114027 MS 2877 4 Parley P. Pratt… +

  23. Rosetta to Aleph / EAD Tool • Every night a custom script (cron job) queries Rosetta using the OAI harvester. • The PIDs and item titles from Rosetta are inserted into the EAD Tool or prepared for Aleph (856 tags). • Every night a custom script (job_list) ingests the metadata into Aleph. Query Response Dublin Core Rosetta OAI EAD Tool Aleph

  24. From Aleph to Primo • Every night Aleph publishes all data sets to Primo (publish-06 in job_list). • Aleph data sets include collection level metadata plus 856 tags (Rosetta PID) or 555 tags (EAD link). • During harvest, Primo creates links from 555 or 856 tags which point to the EAD Tool and Rosetta respectively. Publish MARC XML Primo Aleph

  25. Digital Pipeline Notes • Aleph BIB # provides the linkage between the various systems. • Call # provides the reference • Linux scripting glues it all together. • Cron jobs, job_list, wget and custom logs work together to get data, re-format data, move data and start new jobs. • NFS shares allow us to move data around easily. Rights are a bit of a hassle. • Timing matters. • Pull from Rosetta, update EAD Tool, send to Aleph, publish to Primo, and run Primo pipes.

  26. Backup Slides

  27. Rosetta OAI Harvester • Linux script using wget requests Rosetta data from last 24 hours • Query: by publication set with from / until • Response: Dublin Core Rosetta OAI Query Linux script Response Wget http://<host>.<domain>.org/oaiprovider/request?verb=ListRecords& metadataPrefix=oai_dc&set=<pubset>&from=2012-08-20T20:00:00Z& until=2012-08-21T19:59:59Z

  28. Aleph SRU Server • Aleph SRU server response to Rosetta • Query: by BIB # (CMIS ID) • Response: Dublin Core (or MARC XML) Query Response Aleph SRU Rosetta

  29. Aleph SRU Server URL • http://<your-aleph-host>.<your-domain>.org:5661/ <xyz01>?version=1.1&operation=searchRetrieve... • …&query=dc.callno=“MS 318”&maximumRecords=1 • …&query=rec.id=000082419&maximumRecords=1 • …&query=dc.title=“Book of Mormon”&maximumRecords=5 • …&query=dc.subject=Apostles&maximumRecords=10 • …&query=dc.creator=“John Taylor”&maximumRecords=10

More Related