The Digital Object Management Programme (DOM)
Richard Masters, Programme Manager
PRESERV Partners Meeting, 18th November 2005
DOM Programme Mission and Vision
• Our mission is to enable the United Kingdom to preserve and use its digital output forever
• Our vision is to create a management system for digital objects that will
  • store and preserve any type of digital material in perpetuity
  • provide access to this material to users with appropriate permissions
  • ensure that the material is easy to find
  • ensure that users can view the material with contemporary applications
  • ensure that users can, where possible, experience material with the original look-and-feel
DOM Programme Scope
• Providing a generic and cost-effective infrastructure for the Library’s digital material that will
  • take in material of many types
  • take in material coming from many sources
  • store it all securely for the long term
  • allow controlled access
  • endure
DOM Programme Scope
We already have a wide range of materials to deal with
• Existing voluntary deposit scheme, operational since 2000 (1.5 TB)
• Digitised versions of BL material, from early ’90s onwards (15 to 20 TB)
• Electronic journals (1 TB)
• New digitisation initiatives: newspapers, sound, etc.
• Sound Archive material (150 TB, growing at 30 TB per year)
• Web archiving, cartographic data, picture library, purchased and donated digital materials
We must be prepared for
• Legal deposit legislation for non-print material: royal assent was given in October 2003, but the law needs secondary legislation to bring it into force. The first materials will probably be hand-held (DVDs, CD-ROMs). Our storage planning figure is 300 TB after 5 years.
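The volumes quoted above can be added up as a rough check. The sketch below uses only the figures on the slide; the dictionary keys and function names are illustrative, and the assumption that the Sound Archive keeps growing linearly at 30 TB per year is ours, not a stated forecast. Categories without a quoted size are left out.

```python
# Figures (in TB) copied from the slide; ranges are kept as (low, high) pairs.
CURRENT_HOLDINGS_TB = {
    "voluntary deposit": (1.5, 1.5),
    "digitised BL material": (15.0, 20.0),
    "electronic journals": (1.0, 1.0),
    "sound archive": (150.0, 150.0),
}
SOUND_ARCHIVE_GROWTH_TB_PER_YEAR = 30.0


def current_total_tb():
    """Return the (low, high) sum of the holdings that have a quoted size."""
    low = sum(lo for lo, _ in CURRENT_HOLDINGS_TB.values())
    high = sum(hi for _, hi in CURRENT_HOLDINGS_TB.values())
    return low, high


def sound_archive_after(years):
    """Project the Sound Archive size, assuming linear growth (an assumption)."""
    start = CURRENT_HOLDINGS_TB["sound archive"][0]
    return start + SOUND_ARCHIVE_GROWTH_TB_PER_YEAR * years


if __name__ == "__main__":
    print("Quoted holdings today (TB):", current_total_tb())
    print("Sound Archive after 5 years (TB):", sound_archive_after(5))
```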
DOM System – key features
• Scalability
  • 100s of TBs, millions of objects, millions of users
• Resilience
  • Conventional DR is not adequate
  • Duty of care means we have to have multiple sites
• Integrity and authenticity
  • Identify and repair damaged objects
  • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested
• Rights management
  • Current rights agreements, licences are complex legal documents
  • Separate policy and enforcement
• Representation model
  • Need to deal with complex structured objects
  • e.g. digitised newspaper, OCR text, articles
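One way to read "separate policy and enforcement" is that the decision about whether access is allowed lives in one place, and the code that actually hands over content only consults that decision. The following is a minimal sketch under that reading; `Policy`, `Request`, `decide` and `deliver` are hypothetical names, and the two rules shown (role and on-site access) are invented for illustration, not taken from any DOM licence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, heavily simplified rights policy for one object."""
    allowed_roles: frozenset
    onsite_only: bool = False


@dataclass(frozen=True)
class Request:
    role: str
    onsite: bool


def decide(policy, request):
    """Policy decision: a pure function with no side effects."""
    if request.role not in policy.allowed_roles:
        return False
    if policy.onsite_only and not request.onsite:
        return False
    return True


def deliver(object_id, policy, request, fetch):
    """Enforcement point: asks decide(), then either fetches or refuses."""
    if not decide(policy, request):
        raise PermissionError(f"access to {object_id} denied")
    return fetch(object_id)


if __name__ == "__main__":
    policy = Policy(allowed_roles=frozenset({"reader"}), onsite_only=True)
    store = {"obj-1": b"content"}
    print(deliver("obj-1", policy, Request("reader", onsite=True), store.__getitem__))
```

Keeping `decide` free of side effects means the rules can change, or be tested, without touching the delivery path.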
DOM architecture – 2 key concepts
• Heterogeneous Storage
  • Storage is supplied by several vendors
  • Storage is independent of all vendors
  • ‘Commodity’ storage
  • Avoid paying for unneeded features of high performance and high resilience
• Multiple Sites
  • Same design implemented on several sites
  • But may be different equipment
  • 2 sites at first, aim for 4
  • Dark Archive
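The two concepts above can be illustrated with a small sketch: a vendor-neutral storage interface, and a wrapper that writes every object to several sites. This is an assumption-laden illustration, not the DOM design; `Store`, `InMemoryStore` and `ReplicatedStore` are hypothetical names, and the in-memory backend stands in for whatever equipment a real site would use.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Vendor-neutral interface; each site may implement it on different equipment."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...


class InMemoryStore(Store):
    """Stand-in backend used only for this sketch."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, data):
        self._objects[object_id] = data

    def get(self, object_id):
        return self._objects[object_id]  # raises KeyError if missing


class ReplicatedStore(Store):
    """Writes to every site; reads from the first site that has the object."""

    def __init__(self, sites):
        if not sites:
            raise ValueError("at least one site is required")
        self._sites = list(sites)

    def put(self, object_id, data):
        for site in self._sites:
            site.put(object_id, data)

    def get(self, object_id):
        for site in self._sites:
            try:
                return site.get(object_id)
            except KeyError:
                continue  # try the next site
        raise KeyError(object_id)


if __name__ == "__main__":
    site_a, site_b = InMemoryStore(), InMemoryStore()
    archive = ReplicatedStore([site_a, site_b])
    archive.put("obj-1", b"payload")
    print(archive.get("obj-1"))
```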
DOM architecture – 2 more key concepts
• Integrity
  • System can monitor the object store continuously to detect object corruption
  • It would then initiate object recovery
• Authenticity
  • Long-term assurance that an object when presented is the same as when it was ingested
  • Based on the use of cryptographic signing techniques
  • Each object is ‘signed’ when it is ingested
  • The signature is verified when required
  • The signing mechanism is ‘tightly’ controlled
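The sign-at-ingest, verify-later and detect-then-recover ideas above can be sketched together. The slide does not say which signing technique DOM uses, so this example swaps in HMAC-SHA256 from the Python standard library as a symmetric stand-in; a real deployment would more likely use asymmetric signatures and controlled key storage. The function names and the use of plain dictionaries as object stores are assumptions made for illustration.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """'Sign' an object at ingest (HMAC-SHA256 used as a stand-in technique)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes) -> bool:
    """Check that the object still matches the signature recorded at ingest."""
    return hmac.compare_digest(sign(data, key), signature)


def audit_and_repair(primary: dict, replica: dict, signatures: dict, key: bytes) -> list:
    """Scan the primary store once; restore damaged objects from the replica.

    Returns the ids of the objects that were repaired.
    """
    repaired = []
    for object_id, signature in signatures.items():
        data = primary.get(object_id)
        if data is not None and verify(data, signature, key):
            continue  # object is intact
        candidate = replica.get(object_id)
        if candidate is not None and verify(candidate, signature, key):
            primary[object_id] = candidate
            repaired.append(object_id)
    return repaired


if __name__ == "__main__":
    key = b"demo-key"  # placeholder; key handling is out of scope here
    primary = {"obj-1": b"original bytes"}
    replica = dict(primary)
    signatures = {oid: sign(data, key) for oid, data in primary.items()}
    primary["obj-1"] = b"corrupted bytes"  # simulate damage
    print(audit_and_repair(primary, replica, signatures, key))
```

Running the audit in a loop or on a schedule would approximate the continuous monitoring the slide describes.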
DOM Component Architecture (diagram; only the labels survive)
• Actors: Researchers, Content Providers
• Components: Accession/Ingest, Repository, Digital Preservation, DRM, Resource Discovery/User Interface, Metadata, Information / Technical Operations
• Functions shown: Combined Resource Discovery with other collections, Format Validation, Format Conversion, Request/Rerequest, Metadata Validation/Creation, Storage, Digital Preservation, Continuous Validation, Performance Management, Metadata Management
Published papers
• “The large-scale archival storage of digital objects.”
  • February 2005
  • The 4th in the series of Digital Preservation Coalition Technology Watch reports, available at:
  • http://www.dpconline.org/graphics/reports/
Published papers
• Adam Farquhar et al., “Design for the Long Term: Authenticity and Object Representation”. Presented at the Archiving 2005 conference, April 2005. http://www.bl.uk/about/policies/dom/pdf/archiving2005l.pdf
• Sean Martin, with Mary Baker and Kim Keeton of HP Labs, “Why Traditional Storage Systems Don’t Help Us Save Stuff Forever”. Presented at the 1st IEEE Workshop on Hot Topics in System Dependability on June 30th 2005 in Yokohama, Japan. http://www.stanford.edu/~candea/hotdep/papers/baker_forever.pdf
UK Web Archiving Consortium
• Developing a selective approach to web archiving
• License for PANDAS about to be signed with NLA
• Sub-licenses with consortium partners and contractor to follow
• ITT concluded with Magus Research winning the contract
  • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS)
  • Provide customisation/development of PANDAS
  • Provide help desk and support
International Internet Preservation Consortium
• Developing advanced web archiving technologies
• Smart Crawler
  • Continuous adaptive crawler, adjusting crawl priority on the fly
  • Based on IA Heritrix
  • Working on requirements now
  • Expect to begin tender process in June
• Content Management
• Archival formats
• Framework
• Metrics and Test Bed
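"Adjusting crawl priority on the fly" can be pictured as a crawl frontier whose entries may be re-prioritised while the crawl is running. The sketch below is a generic priority queue with lazy deletion built on Python's `heapq`; it is not Heritrix's API, and `CrawlFrontier`, its methods and the example URLs are hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of URLs whose priorities can change during the crawl."""

    def __init__(self):
        self._heap = []
        self._priority = {}  # url -> current priority
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def add(self, url, priority):
        """Add a URL or change its priority (lower number = crawl sooner)."""
        self._priority[url] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self):
        """Return the most urgent URL, skipping entries made stale by add()."""
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._priority.get(url) == priority:
                del self._priority[url]
                return url
        raise IndexError("frontier is empty")


if __name__ == "__main__":
    frontier = CrawlFrontier()
    frontier.add("http://example.org/news", 5)
    frontier.add("http://example.org/about", 3)
    # The news page turns out to change often, so it is bumped up mid-crawl.
    frontier.add("http://example.org/news", 1)
    print(frontier.pop())
    print(frontier.pop())
```

Old heap entries are left in place and ignored when popped, which avoids searching the heap each time a priority changes.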