What is Rosetta?
Rosetta is a complete digital asset management and preservation solution that addresses the ever-growing need to collect, archive, and preserve the digitally born and digitized materials stored at academic institutions, research organizations, and government institutions, ensuring data integrity and access over time.
Agenda
1. The Need
2. The Challenges
3. Rosetta Solutions
4. Data Model
5. Who's Using Rosetta and How
Need for Digital Preservation
Today's world is digital. If a file can't be opened, the likely reasons are:
• Corrupted media
• Missing rendering application
• Unidentified file format
Need for Digital Preservation
All kinds of institutions must preserve and provide long-term access to information:
• Medical Records
• Research Data
• Digitized Collections
• Cultural Heritage
• Website Archives
• Legal Documents
• Audiovisual
• Museums
Challenges
• Active preservation principles:
  1) Ensuring bit integrity
  2) Ensuring content health
    • Format viability
    • Complete metadata
    • Provenance
  3) OAIS-compliant system
Challenge 1: Bit Integrity
• Fixity checks determine whether data has changed or been corrupted
• Basic feature found in asset management as well as preservation solutions
• Does not guarantee data access, only that the data has not changed
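A fixity check of the kind described above can be sketched as follows. This is a minimal illustration, not Rosetta's implementation: the function names `compute_checksum` and `fixity_ok` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def compute_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum, algorithm="sha256"):
    """Return True if the file still matches the checksum recorded earlier (hypothetical helper)."""
    return compute_checksum(path, algorithm) == recorded_checksum.lower()
```

A matching checksum only shows that the bytes are unchanged; it says nothing about whether the format can still be rendered, which is the subject of the next challenge.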
Challenge 2: Content Health
• Formats evolve rapidly and become obsolete
• File access requirements:
  • Positive identification of the format, e.g. PDF
  • Software application, e.g. Acrobat Reader
• Complete metadata:
  • Technical metadata (e.g. size, resolution, compression)
  • Descriptive metadata (e.g. author, title, publisher)
  • Provenance metadata
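Positive identification of a format is commonly based on the leading bytes of a file rather than on its extension. The sketch below illustrates the idea with a handful of well-known signatures; the table and the function name `identify_format` are assumptions for the example and are far simpler than the signature definitions held in a format registry.

```python
# Leading-byte signatures for a few common formats (illustrative subset only).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JP2"),
]


def identify_format(data):
    """Return a format name if the leading bytes match a known signature, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

A result of `None` corresponds to the "unidentified file format" case: the bytes may be intact, but no rendering application can be chosen for them.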
Rosetta Solutions - Key Features
• Community-Driven Knowledge Base
• Active Preservation
• Scalable
• Open & Integrative
• Ready-to-Use Configuration
• Flexible Delivery
Rosetta Solutions - Community Knowledge Base
• Library of formats with metadata and extraction tools
• Based on the PRONOM global library
• Formats are associated with applications and risks
• Supports integration with a global library
• Format library is updated automatically with each software version
Rosetta Solutions - Active Preservation
• Manages the preservation planning process from risk to action
• Allows evaluation and comparison of alternatives
• Based on best practices and recommended workflows
• Community knowledge sharing
[Diagram labels: Identify, Evaluate, Execute; Operational Storage, Migration Action, Permanent Storage]
Rosetta Solutions - Scalable
• Proven scalable architecture capable of ingesting and processing millions of files per day
• Scale wide and dedicate servers to particular roles
• Flexible configuration to allow for growth
• Failures handled gracefully to minimize manual intervention
Rosetta Solutions - Open & Integrative
Rosetta connects to:
• ILS/CMS Systems
• Submission Apps
• Search Engines
• Storage Abstraction
• Plug-ins (validation, migration, enrichment, etc.)
Rosetta Solutions - Submission Applications
• Deposit workflows out of the box
  • Automated (FTP, NFS, etc.)
  • Manual
• SDK (software development kit) with APIs allows building submission tools that interact with the Rosetta deposit module
[Diagram labels: Publisher (e.g. newspaper), Automatic Submission App, Rosetta]
Rosetta Solutions - ILS/CMS Systems
• Synchronization with ILS/CMS systems
• Interface uses integration standards such as SRU and OAI
[Diagram label: Other ILS]
Rosetta Storage Abstraction
• The Rosetta SDK allows plugins to be created to interact with any storage system
[Diagram: Rosetta, Storage Abstraction Layer, three plugins, storage targets NFS, NetApp, IBM]
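The plugin idea can be sketched in a few lines. This is not the Rosetta SDK: every class and method name below (`StoragePlugin`, `LocalDirectoryPlugin`, `StorageAbstractionLayer`, `store`, `retrieve`) is hypothetical and chosen only to show how a common interface hides the storage back end from the caller.

```python
import os
import shutil
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical interface that every storage back end implements."""

    @abstractmethod
    def store(self, source_path, key):
        """Copy the file at source_path into storage under key."""

    @abstractmethod
    def retrieve(self, key):
        """Return the stored bytes for key."""


class LocalDirectoryPlugin(StoragePlugin):
    """Example back end: a directory on a local or NFS-mounted file system."""

    def __init__(self, root):
        self.root = root

    def store(self, source_path, key):
        target = os.path.join(self.root, key)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(source_path, target)

    def retrieve(self, key):
        with open(os.path.join(self.root, key), "rb") as handle:
            return handle.read()


class StorageAbstractionLayer:
    """Routes calls to a named plugin so callers never touch a back end directly."""

    def __init__(self):
        self._plugins = {}

    def register(self, name, plugin):
        self._plugins[name] = plugin

    def store(self, name, source_path, key):
        self._plugins[name].store(source_path, key)

    def retrieve(self, name, key):
        return self._plugins[name].retrieve(key)
```

Supporting another storage system then means writing one more `StoragePlugin` subclass and registering it; the calling code does not change.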
Rosetta Solutions - Search Engines
• Publishing module allows information exchange with external systems
• Allows publishing different object groups in different formats
• Provides a set of APIs and an SDK for access
• OAI interface out of the box
• Search engine agnostic
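An external system typically harvests from an OAI interface by issuing OAI-PMH requests over HTTP. The sketch below only builds a `ListRecords` request URL using standard OAI-PMH parameters (`verb`, `metadataPrefix`, `set`, `from`); the helper name and the base URL in the usage note are placeholders, not Rosetta endpoints.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one published set
    if from_date:
        params["from"] = from_date    # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)
```

For example, `oai_list_records_url("https://repository.example.org/oai", set_spec="theses")` yields a URL that a harvester could fetch; publishing different object groups would map naturally onto different `set` values.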
PREMIS
• PREservation Metadata: Implementation Strategies
• International working group concerned with developing metadata for use in digital preservation
• Metadata for intellectual entities, events, agents, and rights
• The data model consists of several entities:
  • Intellectual entity
  • Representation
  • File
  • Bit-stream
METS
• Ex Libris has a METS profile that will be published and open
• Each Intellectual Entity is one METS document
• Each representation is a file group
• The structure map is at the representation level
• Metadata is stored at all levels: descriptive metadata as DMD and preservation metadata as AMD
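The mapping above (one METS document per Intellectual Entity, one file group and one structure map per representation) can be illustrated with a bare skeleton. This is a sketch under stated assumptions: the function name, the ID scheme, and the empty `dmdSec`/`amdSec` placeholders are invented for the example, and the output has not been validated against the METS schema or the Ex Libris profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_mets(ie_id, representations):
    """Build a skeleton METS tree; representations maps a name to a list of file URLs."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)

    def q(tag):
        return "{%s}%s" % (METS_NS, tag)

    mets = ET.Element(q("mets"), {"OBJID": ie_id})
    ET.SubElement(mets, q("dmdSec"), {"ID": "dmd-ie"})   # descriptive metadata placeholder
    ET.SubElement(mets, q("amdSec"), {"ID": "amd-ie"})   # preservation metadata placeholder
    file_sec = ET.SubElement(mets, q("fileSec"))

    struct_maps = []
    for r, (name, locations) in enumerate(representations.items(), 1):
        group = ET.SubElement(file_sec, q("fileGrp"), {"ID": "rep%d" % r, "USE": name})
        struct_map = ET.Element(q("structMap"), {"ID": "smap-rep%d" % r})
        div = ET.SubElement(struct_map, q("div"), {"LABEL": name})
        for f, href in enumerate(locations, 1):
            file_id = "rep%d-file%d" % (r, f)
            file_el = ET.SubElement(group, q("file"), {"ID": file_id})
            ET.SubElement(file_el, q("FLocat"),
                          {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: href})
            ET.SubElement(div, q("fptr"), {"FILEID": file_id})
        struct_maps.append(struct_map)

    # structMap elements follow fileSec in a METS document.
    mets.extend(struct_maps)
    return mets
```

Calling `ET.tostring(build_mets("IE1", {"Master": ["page1.tif"]}), encoding="unicode")` serializes the tree for inspection.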
Data Model
Intellectual Entity (1:N) Representation (1:N) File (1:N) Bit-Stream
• Intellectual Entity: a coherent set of content that is reasonably described as a unit, for example, a particular book, map, photograph, or database
• Representation: the set of files, including structural metadata, needed for a complete and reasonable rendition of an Intellectual Entity
• File: a named and ordered sequence of bytes that is known by an operating system
• Bit-stream: data within a file that has meaningful common properties for preservation purposes
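The four-level hierarchy and its one-to-many relationships can be expressed as nested data classes. The sketch below is illustrative only: the class and field names are assumptions and do not reflect Rosetta's internal schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BitStream:
    label: str      # e.g. an embedded stream inside a container file
    offset: int     # byte offset within the file
    length: int     # number of bytes


@dataclass
class File:
    name: str
    bitstreams: List[BitStream] = field(default_factory=list)


@dataclass
class Representation:
    usage: str      # e.g. "Master" or "Access Copy"
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    title: str
    representations: List[Representation] = field(default_factory=list)

    def file_count(self):
        """Total number of files across all representations."""
        return sum(len(rep.files) for rep in self.representations)
```

As a usage example with invented file names, a three-page book could be modeled as `IntellectualEntity("A book", [Representation("Master", [File("p1.tif"), File("p2.tif"), File("p3.tif")]), Representation("Access Copy", [File("book.pdf")])])`, for which `file_count()` returns 4.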
Data Model Example - Book
Intellectual Entity
• Representations: Master, Modified Master, Access Copy
• File formats shown: JP2 (three files), TIFF (three files), JPG (three files), PDF
Data Model Example - Image
Intellectual Entity
• Representations: Master, Modified Master, Access Copy
• File formats shown: JP2, TIFF, JPG
Support for Digitization Projects
• Bavarian State Library (BSB): current mass digitization projects
  • Public-private partnership with Google
    • More than 1 million books (in less than 10 years), more than 300 million pages
  • Books printed in the 16th century
    • 37,000 titles; 7,500,000 pages
Preserving and Managing Local Dissertations
• Offering an additional platform for unpublished materials, for example ETH Bibliothek's e-collection
Special Collections
Ex Libris Ltd., 2010 - Internal and Confidential
Preserving Cultural Heritage Collections
• National Library of New Zealand's Royal Ballet photos
Digitally-Born Collections (Websites)
• Ensure the library stays relevant in the digital era
• National Library of New Zealand web site harvest
Selected Rosetta Customers
University
• Background:
  • Zurich, Switzerland
  • Leading technological institution
  • DataCite partners
• Collections in Rosetta:
  • Research data
  • Special collections
  • Dissertations
National Library
• Background:
  • Wellington, New Zealand
  • Development partner
  • Mandate for digital preservation
• Key Areas of Collaboration:
  • Nation's cultural heritage
  • Private collections
  • Websites
Selected Rosetta Customers
University
• Background:
  • Binghamton, NY, USA
  • Part of the SUNY system
  • FTE: ~14K students
  • Staff: 1.5 FTE (not dedicated)
• Collections in Rosetta:
  • Special collections (Edwin A. Link collection)
  • Born-digital newsletters
  • University photographs
State Library
• Background:
  • Munich, Germany
  • Service provider for Bavaria
  • Part of the Google Books project
• Key Areas of Collaboration:
  • Scanned manuscripts and rare books
  • Legal deposit documents
  • Websites
Selected Rosetta Customers
National Archives
• Background:
  • Wellington, New Zealand
  • Merged with the National Library
  • Integrating Archway
• Collections in Rosetta:
  • Legal documents
  • Archival collections
  • Government papers
Service Providers
• Background:
  • Leuven, Belgium
  • LIBIS service provider
  • Replacing DigiTool
  • Integrating with Aleph and Primo
• Collections in Rosetta:
  • Special collections
  • Faculty papers
  • E-mails
  • Video collections
China Rosetta Test Server
• Host: rosetta.cceu.org.cn
• Deposit: http://rosetta.cceu.org.cn:1801/deposit