
An Introduction to the Merritt Curation Repository

UC3 Summer Webinar Series: An Introduction to the Merritt Curation Repository. University of California Curation Center (UC3) Team, California Digital Library, June 9, 2011.


Presentation Transcript


  1. UC3 Summer Webinar Series: An Introduction to the Merritt Curation Repository. University of California Curation Center Team, California Digital Library, June 9, 2011

  2. First, a word about the webinar series… • A forum for timely topics of interest to the UC community • Highlighting projects, services, and developments in the areas of digital preservation, web archiving, and data curation • Intended to raise awareness of issues and provide information on useful resources and services available to the UC community • 2nd and 4th Thursday of the month, and as scheduled, featuring UC3 staff and UC librarians, content managers, and technologists • Teleconference: +1 (866) 740-1260, access code 9879016# • Web conference: http://bit.ly/jdjMAP

  3. First, a word about the webinar series… • Some logistics… • Participant phones will be muted during the formal presentation, but we will be monitoring the online chat • Slides, Q & A, and web and voice recordings will be posted after each presentation • Schedule available at http://www.cdlib.org/uc3/uc3webinars.html • Please suggest additional topics! uc3@ucop.edu • Take the short survey http://www.surveymonkey.com/s/XSGWP8R

  4. Now on with the show… • Today’s topic is an introduction to the Merritt curation repository • Who is it for? • What can it do? • Why use it? • What does it cost? • Next steps? • Q & A

  5. What keeps you up at night? • How much will it cost? • What’s the best strategy to ensure permanent availability? • How do I know my content is safe? • Are there standards or best practices I should be aware of? • How can I transfer my content to an appropriate curation environment? • I have a good discovery platform; how can I add preservation services? • Do I need to create new derivatives just for preservation purposes? • Can I control who can see my content? • How can I get a persistent reference to my content? • What if my content needs to evolve over time?

  6. “There’s an app for that” • How much will it cost? → Storage at $1.04/GB/year • What’s the best strategy to ensure permanent availability? → Automatic replication and high-availability redundancy • How do I know my content is safe? → Periodic fixity audit • Are there standards or best practices I should be aware of? → UC3 consultation • How can I transfer my content to an appropriate curation environment? → Simple submission UI/API; METS “feeder” duplicates existing DPR workflow • I have a good discovery platform; how can I add preservation services? → Modular micro-services “toolkit” • Do I need to create new derivatives just for preservation purposes? → Model free: no packaging, format, or metadata requirements • Can I control who can see my content? → Curator-defined access control rules • How can I get a persistent reference to my content? → Integration with EZID and DataCite • What if my content needs to evolve over time? → Strongly versioned
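The “periodic fixity audit” answer boils down to recomputing each stored file’s digest and comparing it with the value recorded at ingest. A minimal Python sketch, assuming the recorded digests are available as a mapping of file name to (algorithm, hex digest); the function names are hypothetical, and this is not Merritt’s actual audit service:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def compute_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large objects need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(expected: dict[str, tuple[str, str]], root: Path) -> list[str]:
    """Return names of files that are missing or whose digest no longer matches.

    expected maps a file name (relative to root) to (algorithm, recorded hex digest).
    Hypothetical structure for illustration only.
    """
    failures = []
    for name, (algorithm, recorded) in expected.items():
        path = root / name
        if not path.is_file() or compute_digest(path, algorithm) != recorded.lower():
            failures.append(name)
    return failures
```

Reading in chunks keeps memory use flat regardless of file size; a production audit would also log when each check ran and raise an alert on any failure.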

  7. Merritt repository • Merritt is available for use by all members of the UC community: libraries/archives/museums; ORU/MRUs; faculty/staff • Centrally hosted by UC3/CDL on behalf of the UC community: economies of scale; shared experience and expertise • Mediated through campus libraries

  8. Modes of use: dark archive • Proactive preservation, but no expectation of direct end user access • Legacy DPR content contributed by campus libraries • Cultural heritage texts, master images, sound, moving image, data sets • All DPR content will be automatically migrated to Merritt

  9. Modes of use: bright archive • Provide preservation and end user access • NIH Healthy Pathways project on bio-demographics • Multi-institutional: UC Davis, University of Colorado, University of Virginia, Syddansk University (Denmark) • Need to restrict access to project partners initially, with eventual public access

  10. Modes of use: bright archive • Content discovery: search

  11. Modes of use: bright archive • Content discovery: search

  12. Modes of use: bright archive • Content discovery: browse

  13. Modes of use: bright archive • Content discovery: browse

  14. Modes of use: preservation “back end” • Preservation only; content discovery/delivery provided by well-known external systems • Using direct hooks into Merritt to retrieve content: eScholarship (open access publishing); Open Context (archaeological data publishing) • Investigating integration with Islandora/Drupal and Alfresco

  15. Modes of use: distributed data grids • DataONE “Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it”

  16. More information • Online help http://merritt.cdlib.org/help • FAQ http://merritt.cdlib.org/docs/merritt_handout.pdf • User’s guide http://merritt.cdlib.org/docs/merritt_user_guide.pdf • UC3 contact http://www.cdlib.org/uc3/contact.html uc3@ucop.edu

  17. Merritt cost model • UC3 provides technical infrastructure, data center hosting, staff, monitoring, maintenance, enhancements, help, outreach, consultation, etc. • Contributors are charged only for storage used, at the UC3 recovery rate of $1.04/GB/year • Developing an “endowment” model: pay once, preserve forever • Will soon extend the model to non-UC contributors • How does this compare? • Cost of a physical book in RLF†: $4.62/year • Cost of a digital book in HathiTrust‡: $0.15/year • Cost of a digital book in Merritt: $0.06/year † Gary Lawrence (2007), internal analysis, CDL. ‡ Paul Courant and Matthew Nielsen (2010), “On the cost of keeping a book,” HathiTrust.

  18. Average collection sizes and costs • A “cost calculator” spreadsheet is available at http://www.cdlib.org/uc3/docs/Merritt-cost-calculator-v3.xlsx
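Because contributors pay only for storage used at a flat rate, the cost estimate is linear in collection size and retention period. A minimal sketch of that arithmetic, assuming the $1.04/GB/year rate with no tiers or minimum charge and decimal gigabytes (10^9 bytes); the slides do not show the spreadsheet’s actual formulas, and the function names are hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted on the slide (2011); assumed flat, with no tiers or minimum charge.
RATE_PER_GB_YEAR = Decimal("1.04")
# Assumption: decimal gigabytes (10**9 bytes); the slides do not say which GB is meant.
BYTES_PER_GB = 10**9


def bytes_to_gb(size_bytes: int) -> Decimal:
    """Convert a byte count to gigabytes as an exact Decimal."""
    return Decimal(size_bytes) / Decimal(BYTES_PER_GB)


def storage_cost(size_gb: Decimal, years: int = 1, rate: Decimal = RATE_PER_GB_YEAR) -> Decimal:
    """Estimated charge for storing size_gb gigabytes for `years` years, rounded to cents."""
    if size_gb < 0 or years < 0:
        raise ValueError("size and years must be non-negative")
    return (size_gb * rate * years).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(storage_cost(Decimal("1")))                  # a 1 GB dataset for one year
    print(storage_cost(bytes_to_gb(250_000_000), 10))  # 250 MB kept for ten years
```

Using `Decimal` avoids binary floating-point rounding in currency values. At this rate, a dataset under 1 GB (the case for almost 50% of research data, per slide 20) would cost less than $1.04 per year.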

  19. Average ETD (electronic theses and dissertations) size and cost • Based on 2009 holdings in ProQuest • *UCSF figure based on total ETD holdings in Merritt

  20. Average research data size and cost • Almost 50% of all research data is less than 1 GB • Source: Science 331(6018), February 11, 2011, pp. 692–693, doi:10.1126/science.331.6018.692

  21. Next steps • UC3 is working with campus partners to determine ongoing development and collection priorities • New content acquisition

  22. Next steps • In production: model-free objects; submission via UI and API; persistent identifiers; format identification; version provenance; automated replication; automated fixity audit; role-based access control; collections; semantic index and search; object/version/file download • In progress: simplified update; enhanced characterization (JHOVE2); faceted search and browse (XTF); CMS/DAMS-like function (Islandora) • In planning: simplified batch; UCTrust integration; linked data; transformation; notification; annotation; support for NGTS/DLSTF recommendations • We welcome your feedback on needs and priorities! http://www.cdlib.org/uc3/contact.html, uc3@ucop.edu

  23. Simplified update • Variant form of object update requiring the submission of only the changed components • Client-side tools to simplify the creation of batch manifests, for example:

    #%checkm_0.7
    #%profile | http://uc3.cdlib.org/registry/ingest/mani
    #%prefix | mrt: | http://merritt.cdlib.org/terms#
    #%prefix | nfo: | http://www.semanticdesktop.org/onto
    #%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash
    http://merritt.cdlib.org/samples/goldenDragon.jpg | m
    http://merritt.cdlib.org/samples/tumbleBug.jpg | md5
    http://merritt.cdlib.org/samples/generalDrapery.jpg |
    http://merritt.cdlib.org/samples/generalDrapery.jpg |
    #%eof
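A client-side manifest tool could be as simple as a script that hashes each file and writes one row per file. The sketch below assumes the three-column layout of the sample manifest (nfo:fileUrl | nfo:hashAlgorithm | nfo:hash); `build_manifest` is a hypothetical helper, not a Merritt tool. Because the slide’s `#%profile` and `nfo:` prefix URLs are truncated, the caller must supply those header lines from Merritt’s own documentation rather than having them hard-coded here:

```python
from __future__ import annotations

import hashlib
from typing import Iterable


def build_manifest(
    entries: Iterable[tuple[str, bytes]],
    header_lines: Iterable[str] = (),
    algorithm: str = "md5",
) -> str:
    """Return a checkm-style manifest with one 'url | algorithm | digest' row per file.

    entries: (file URL, file content) pairs; the content is hashed locally.
    header_lines: extra '#%' lines (profile, prefix declarations) copied verbatim,
    since their exact values must come from Merritt's documentation.
    """
    lines = ["#%checkm_0.7", *header_lines]
    lines.append("#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash")
    for url, content in entries:
        digest = hashlib.new(algorithm, content).hexdigest()
        lines.append(f"{url} | {algorithm} | {digest}")
    lines.append("#%eof")
    return "\n".join(lines) + "\n"
```

The sketch takes file contents as bytes to stay self-contained; a real tool would read local files (in chunks, for large ones) before hashing.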

  24. Enhanced characterization • JHOVE2: next-generation framework for format-aware characterization, http://jhove2.org/ • Automated extraction and inference of extensive technical metadata significant for preservation analysis and planning, for example (excerpt):

    "Module": {
      "scope": "ICCModule",
      "Header": {
        "scope": "ICCHeader",
        "ProfileSize": { "unit": "byte", "value": 60960 },
        "ProfileVersionNumber": "4.2.0.0",
        "ProfileDeviceClass_raw": "spac",
        "ProfileDeviceClass_descriptive": "ColorSpace Conversion profile",
        "ColourSpace_raw": "RGB ",
        "ColourSpace_descriptive": "rgbData",
        "ProfileConnectionSpace_raw": "Lab ",
        "ProfileConnectionSpace_descriptive": "labData"
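The ICC fields in the JHOVE2 excerpt come from the fixed 128-byte header defined by the ICC profile specification: a big-endian profile size at offset 0, version information in bytes 8 and 9, and four-character signatures for device class (offset 12), data colour space (offset 16), and profile connection space (offset 20). The following is an independent Python sketch of reading those fields, not JHOVE2’s (Java) implementation; the lookup tables are deliberately partial, covering only the signatures shown in the excerpt, and the version is printed as major.minor.bugfix rather than JHOVE2’s four-part “4.2.0.0”:

```python
from __future__ import annotations

import struct

# Partial lookup tables: only the signatures that appear in the JHOVE2 excerpt.
DEVICE_CLASSES = {"spac": "ColorSpace Conversion profile"}
COLOUR_SPACES = {"RGB ": "rgbData", "Lab ": "labData"}

ICC_HEADER_SIZE = 128


def parse_icc_header(data: bytes) -> dict[str, object]:
    """Read a few fields from the fixed 128-byte ICC profile header (big-endian)."""
    if len(data) < ICC_HEADER_SIZE:
        raise ValueError("an ICC profile header is 128 bytes long")
    (profile_size,) = struct.unpack_from(">I", data, 0)  # bytes 0-3
    major, minor_bugfix = data[8], data[9]  # byte 9: minor (high nibble), bug fix (low nibble)
    device_class = data[12:16].decode("ascii")  # bytes 12-15
    colour_space = data[16:20].decode("ascii")  # bytes 16-19
    pcs = data[20:24].decode("ascii")  # bytes 20-23, profile connection space
    return {
        "ProfileSize": profile_size,
        "ProfileVersionNumber": f"{major}.{minor_bugfix >> 4}.{minor_bugfix & 0x0F}",
        "ProfileDeviceClass_raw": device_class,
        "ProfileDeviceClass_descriptive": DEVICE_CLASSES.get(device_class, "unknown"),
        "ColourSpace_raw": colour_space,
        "ColourSpace_descriptive": COLOUR_SPACES.get(colour_space, "unknown"),
        "ProfileConnectionSpace_raw": pcs,
        "ProfileConnectionSpace_descriptive": COLOUR_SPACES.get(pcs, "unknown"),
    }
```

Note that the raw signatures keep their trailing space ("RGB ", "Lab "), which is why the excerpt shows them that way.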

  25. Enhanced discovery via XTF • eXtensible Text Framework http://xtf.cdlib.org/ • CDL developed/supported open source discovery platform • Robust, scalable faceted search and browse

  26. CMS/DAMS-like function • Many campuses are looking for CMS/DAMS solutions • Investigating integration with Islandora to provide a Drupal CMS/DAMS front-end to Merritt http://islandora.ca/ http://drupal.org/

  27. Questions?

  28. Upcoming webinars http://www.cdlib.org/uc3/uc3webinars.html • Please take the webinar survey http://www.surveymonkey.com/s/XSGWP8R

  29. For more information • UC Curation Center: http://www.cdlib.org/uc3, http://www.cdlib.org/uc3/contact.html, uc3@ucop.edu • UC3 team: Stephen Abrams, Lisa Colvin, Patricia Cruse, Scott Fisher, Erik Hetzner, Greg Janée, John Kunze, Margaret Low, David Loy, Mark Reyes, Tracy Seneca, Joan Starr, Marisa Strong, Perry Willett • UC3 webinar series: http://www.cdlib.org/uc3/uc3webinars.html • Merritt repository: http://merritt.cdlib.org/, http://merritt.cdlib.org/help, http://merritt.cdlib.org/docs/merritt_handout.pdf, http://merritt.cdlib.org/docs/merritt_user_guide.pdf
