British Computer Society North London Branch Major Programmes Richard Boulderstone July 27, 2004. Agenda. The British Library Vision Our Audiences/Customers ILS Digitisation Digital Object Management Web Archiving Collaboration Conclusions. Magna Carta. What Is The British Library ?.
British Computer Society, North London Branch • Major Programmes • Richard Boulderstone • July 27, 2004
Agenda • The British Library • Vision • Our Audiences/Customers • ILS • Digitisation • Digital Object Management • Web Archiving • Collaboration • Conclusions (Image: Magna Carta)
What Is The British Library ? • Created by British Library Act 1972 - commenced 1973 • Merger of British Museum Library (1753), National Reference Library of Science and Invention (1855), National Central Library (1916), and National Lending Library for Science and Technology (1961) • Subsequent incorporation of British National Bibliography in 1974, India Office Library and Records in 1982, and British Institute of Recorded Sound in 1983 • Flagship building at St Pancras - largest public building project in Great Britain in 20th century - opened in 1998
World-Class Research Library: Key Statistics 2002/3 • 150 million items • 8.2 million items consulted or supplied • 408,000 reading room visits • 618,000 catalogue records created • 554,000 items received on legal deposit • 651 km shelf capacity, 92% full, adding 12 km each year • 18.5M Web Site Hits (www.bl.uk) • 2,400 staff • £85.2 million Grant in Aid and £27.0 million trading income in 2001/2 • Annual report - http://www.bl.uk/about/annual/latest.html
Outcome Based Vision • To help people advance knowledge to enrich lives • … by aiding scientific advances • … by adding commercial value for businesses • … by contributing to UK “knowledge economy” • … through the pursuit of academic excellence • … through the stimulation of ideas • … by adding to personal and family history • … through increasing the nation’s cultural wellbeing • … by giving information relevant to their interests • … by helping to find the next medical breakthrough • … by creating a link between the past, present and future • Diagram labels: Pride, Innovation, Relevance, ‘The World’s Knowledge’
Our Audiences/Customers (diagram) • Segments: Education, Public, Business, Researcher, Libraries • Audiences shown: High R+D Industries, Prof. Services, Creative Industries, Publishing Industries, SMEs, School Libraries, Students 11>18, Teachers, Lifelong Learner, Visitors (child + adult), Postgraduate/Undergraduate, Scholars, Commercial Researcher, Librarians, Public Libraries, H.E. Libraries, Broadcasting e.g. BBC, Publishing e.g. OED • Services shown: Resource Discovery, Bespoke Services, Research Services, Document Supply, Reprographics, Innovation Centre, On-site Visits, School Tours, Web Learning, Exhibitions, Events, Tours, Publishing, Reading Rooms, Searching Tools, Training, Best Practice
Major Programmes/1 • Integrated Library System (ILS) Programme (Image: Da Vinci Notebook)
ILS: Development • Data migration • Due to finish in a few days • 16M+ BL records • 10M+ records from other sources • Online ILS software • All online changes made (mainly interfaces) – final tests • Web OPAC configuration – tested by staff, HE, expert • Batch imports / exports • Most done for go-live • Rest to follow in priority order
ILS: Implementation • Training • Courses to end-users well underway • ‘Practice’ system available • ‘Search only’ training also underway • Testing • Functional testing (end to end) nearly complete • Performance poor – OPAC very slow • Automated stress testing (LoadRunner scripts) • eIS trying to find area of problem • Ex Libris experts flying over • Some security ‘hardening’ needed
ILS: Cutover from legacy systems • Now: Temporary Aleph cataloguing • 7 June: Phase 1 – internal processing • Staggered take-on of users to ease cutover problems • Merge ‘temporary’ records • 30 June: Phase 2 – reading rooms • Reading rooms closed for cutover 26-29 June • Mainly brand-new PCs etc rather than XP upgrade • 30 July: Phase 3 – remote users • Could be delayed if major problems arise
Future ILS development (ILS/2) • Current ILS development seen just as the start • Extra records • E.g. Sound archive, Manuscripts, Newspaper issues • Extra functions • E.g. Preservation records • Links to other new BL systems • E.g. Digital Object Management (images, web pages etc) • New releases of Ex Libris packages
Major Programmes/2 • Digitisation Programme (Image: International Dunhuang Project)
Background • Digitisation Is The Process Of Converting Existing Physical Items Into Digital Surrogates. • Digitisation Projects Must Take Into Account Metadata Creation, Optical Character Recognition, Navigation, Display, Archiving, Preservation. • Overall Cost Of Digitisation Projects Is Declining, From £10s/Image To Sub-£1/Image. • Automatic Devices To Digitise Are Becoming Available But Are Expensive (~£150,000). • BL Has Had A Fairly Ad Hoc Approach Driven By • External Funding Opportunities • Curator Interest • Projects Have Generally Created Their Own Approach, IT Resources, Project Management • BL Has Created About 1.5M Digital Images So Far…
Digitisation Strategy • Digitisation Strategy Project Was Formally Initiated On February 2, 2004 • Key objectives for the project are to define: • Selection Criteria • Uniform Approach • Communications Plan • Sustainability • Intellectual Property Rights • External Relationship Management • Funding • Integration with DOMS
Project Status Information • Definitive Register of Projects • 19 Complete • 19 Current • 20 Planning • JISC Sound (3,900 Hours) • JISC Newspapers (2M Pages of 750M Pages) • Chopin (Collaborative Project) • Early English Books Online
Major Programmes/3 • Digital Object Management (DOM) Programme (Image: Gutenberg Bible)
DOM Programme vision • Our mission is to enable the United Kingdom to preserve and use its digital intellectual property forever • Our vision is to create a management system for digital objects that will • store and preserve any type of digital material in perpetuity • provide access to this material to users with appropriate permissions • ensure that the material is easy to find • ensure that users can view the material with contemporary applications • ensure that users can, where possible, experience material with the original look-and-feel
Introduction - history • Digital Library PFI • Mar 1997 – Dec 1998 • Digital Library System • 1999 – early 2002 • Lessons • DOM Report • Nov 2002 • The DOM Programme • Started September 2003
Drivers for the BL DOM Programme • Legal deposit legislation for non-print material was granted royal assent in October 2003 • Existing voluntary deposit scheme operational since 2000 • Storage of digitised masters from early ’90s onwards • New digitisation initiatives: newspapers, sound, etc • Sound archive receives 12T of material per year (with 50 year collection) • Web archiving • Cartography and datasets • Electronic journals, picture library • … and …. • …. and …. • We need a generic and cost-effective approach for the secure long term storage of digital material that is produced by numerous initiatives
DOM – many topics to address • Chart plotting components by estimated size (low to high) against ambiguity/complexity (low to high) • Components shown: ILS, Web Archiving, Digitisation Programme, LDEP, Workflow, Rights Management, Metadata Definition, Technical Requirements, Resource Discovery, File Conversion Utilities, File Format Registry, LDLSE, VDEP, Authentication, Persistent Identifiers, Prototypes, SDM, RADM, Interfaces, Strategy Development • Legend: Started, Planned, Planned co-operation, Non-DOM projects • LDEP: Legal Deposit of Electronic Publications • LDLSE: Legal Deposit Libraries Secure Environment • RADM: Risk Analysis of Digital Materials • SDM: Storage of Digitised Masters • VDEP: Voluntary Deposit of Electronic Publications
Scope - life cycle of objects • Collection • Selection • Acquisition • Accession • Description • Preservation • Storage • Preservation • Access • Resource discovery • Delivery • Rendering
Scope – objects and processes • Preservation store • Preserves the bit stream in perpetuity • Access store • Access versions • Limited formats – in the flavour of the era • Metadata to support resource discovery • Descriptive, Administrative, Links with existing tools e.g. Integrated Library System (ILS) • Workflow • Ingest, e.g. Legal Deposit processing
Diagram labels: ACCESS, Resource Discovery, Delivery, Shared services, Signing, Authentication, DOCUMENT SUPPLY, Metadata, Publishers, Archives, Persistent ID, Non-Serial Store, Grey Literature Archiving, Operational Stores, WEB ARCHIVING, DONATIONS, LEGAL DEPOSIT, DIGITISATION, Legal Deposit Items, St Pancras Studios, LDL Secure Environment, Newspapers, Legal Deposit Processing, NSA, DOM Digital Rights Management, DOM Storage, Ingest
Timeline (2003 to 2005) • R0 (with business case, BC): ET approve Business Case & Timeline Definition • Prototype will provide a basic preservation-quality digital object storage module • R1: Operational Storage Sub System • Consolidate R0 into operational system • Provide preservation-quality digital store for materials received under Voluntary Deposit of Electronic Publications (VDEP) • Integrate it with the existing VDEP front-end • R2: 1st Content Stream ingest • Support ingest for a major content stream • Integrate with core Library systems as required • R3 & R4: LDEP - initial format • Provide functionality for material covered by LDEP secondary legislation • R5+: Open DOM to new projects
DOM: Project definition - 1 • Functional architecture (“what”) • Example issues: digital rights, file formats, etc • Prototyping – assessing market solutions • Logical architecture (“how – overall architecture”) • Allow changes to new suppliers, relationships to ILS, other projects etc • Prototyping – basic functioning architecture • Physical architecture (“how – storage & specifics”) • How do we build it cost-effectively today, supplier selection criteria • Prototyping – principal solutions and options • Business case • Planning – incremental implementation phases • Cross team workshops – reviewing progress, debating detailed technical issues, planning immediate priorities, risk management & way forward
DOM: Project definition - 2 • Approach is to be incremental and not ‘Big Bang’ • We prototype to learn, understand, reduce risk and uncertainty, and demonstrate the basis of a good solution • A principal goal is to define: • An overall long term “logical architecture” • Within which, there will be successive generations of physical architectures • We are building an understanding of the storage marketplace, and we will use that knowledge to manage procurement • We are certain that we will need >500T of storage but we are uncertain when – we thus need flexible scalable procurement
DOM architecture - overview • Systems using DOM: Rights Management, Resource Discovery (ILS, non-cat based RD), LDEP, Doc supply, Others • These systems refer to an object by its DOMID • DOM Storage Service • Compound objects/relations • Atomic objects • Unique persistent identifier (DOMID) • Integrity • Authenticity • DOMID is mapped to node/vol/LRL (LRL: local resource locator) • DOM Physical Storage holds the object
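The DOMID to node/vol/LRL mapping on this slide can be illustrated with a small lookup table. This is only a sketch under assumptions: the slides do not say how the mapping is stored, so the class names `Location` and `DomidResolver`, the example DOMID and the path layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where one physical copy of an object lives (hypothetical layout)."""
    node: str  # storage node holding the copy
    vol: str   # volume on that node
    lrl: str   # local resource locator within the volume


class DomidResolver:
    """Maps a persistent DOMID to its current physical locations.

    Client systems keep only the DOMID, so copies can move between
    nodes or storage generations without the identifier changing.
    """

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, domid: str, location: Location) -> None:
        self._locations.setdefault(domid, []).append(location)

    def resolve(self, domid: str) -> List[Location]:
        if domid not in self._locations:
            raise KeyError(f"unknown DOMID: {domid}")
        return list(self._locations[domid])


resolver = DomidResolver()
resolver.register("domid-0001", Location("node-a", "vol-03", "objects/00/01"))
resolver.register("domid-0001", Location("node-b", "vol-11", "objects/00/01"))
copies = resolver.resolve("domid-0001")
```

Keeping the physical location behind a lookup is what lets the identifier stay persistent while the storage underneath is replaced.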
DOM System (release 3) (diagram) • Labels: Storage subsystem (shown twice), Aleph, Access, Mailroom, Shared services, Administration, Publishers, DOM System
DOM logical architecture – integrity and authenticity • Integrity: • System has capability to continuously monitor the object store to detect object corruption • It would then initiate object recovery • Authenticity: • A process is defined to provide long-term assurance that an object that is re-presented is as it was when it was ingested • Based on the use of cryptographic signing techniques • Each object is signed when it is ingested • The signature is verified when required • The signing mechanism is “tightly” controlled
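The sign-on-ingest and verify-on-demand process above can be sketched as follows. This is an illustration, not the DOM design: the slides do not name the signing technique, so a keyed SHA-256 HMAC from the Python standard library stands in for it, and the key, class and function names are hypothetical. A production system might well use asymmetric signatures and a tightly controlled key store instead.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical key; the slides only say the signing mechanism is tightly controlled.
SIGNING_KEY = b"example-key-held-by-the-signing-service"


def sign_object(data: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a keyed digest of the object's bit stream."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_object(data: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the bit stream still matches the signature made at ingest."""
    return hmac.compare_digest(sign_object(data, key), signature)


class ObjectStore:
    """Toy in-memory store: each object is signed once, when it is ingested."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, str]] = {}

    def ingest(self, domid: str, data: bytes) -> None:
        self._objects[domid] = (data, sign_object(data))

    def scan_for_corruption(self) -> List[str]:
        """Integrity monitoring: list DOMIDs whose data no longer verifies."""
        return [
            domid
            for domid, (data, signature) in self._objects.items()
            if not verify_object(data, signature)
        ]


store = ObjectStore()
store.ingest("domid-0001", b"digitised master")
clean_scan = store.scan_for_corruption()

# Simulate bit rot by altering the stored bytes while keeping the old signature.
_, original_signature = store._objects["domid-0001"]
store._objects["domid-0001"] = (b"digitised mastex", original_signature)
corrupted_scan = store.scan_for_corruption()
```

A scan that reports a DOMID would be the trigger for the object recovery step mentioned on the slide, for example by copying the object back from another cluster.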
Procuring physical storage in volume • A major cost is in physical storage • The market for storage systems is changing rapidly, and this implies that “lock-in” is not sensible • We thus need flexibility to change supplier over time • Cost of storage is reducing by 30-40% per year • Hence procure on rolling basis just ahead of demand • Replace storage on a rolling basis on expiry of warranty • The rolling programmes imply the need to be able to support a heterogeneous product solution • The design of the logical architecture thus supports storage sourced from multiple storage vendors
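The case for buying just ahead of demand can be made concrete with a little arithmetic. The sketch below compares buying all capacity at today's price with buying each year's increment at that year's price. The demand figures and the unit price are invented for illustration; only the 30-40% annual price decline comes from the slide, and replacement on warranty expiry is not modelled.

```python
from typing import Sequence


def upfront_cost(demand_tb: Sequence[float], price_per_tb: float) -> float:
    """Cost of buying every year's capacity now, at today's price."""
    return sum(demand_tb) * price_per_tb


def rolling_cost(
    demand_tb: Sequence[float], price_per_tb: float, annual_decline: float
) -> float:
    """Cost of buying each year's capacity in that year, as prices fall."""
    total = 0.0
    price = price_per_tb
    for tb in demand_tb:
        total += tb * price
        price *= 1.0 - annual_decline  # next year's unit price
    return total


# Hypothetical inputs: 100 TB needed in each of three years, 1000 units per TB today.
demand = [100.0, 100.0, 100.0]
all_now = upfront_cost(demand, 1000.0)
year_by_year = rolling_cost(demand, 1000.0, annual_decline=0.35)
saving_fraction = 1.0 - year_by_year / all_now
```

With these made-up inputs the year-by-year purchases cost roughly 30% less than buying everything at the start, which is the effect the rolling programme relies on.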
Disaster tolerance and the organisation of storage clusters • One can obtain commercial disaster recovery (DR) solutions for common equipment configurations • However one cannot obtain such solutions for systems comprising multi-100 Tb systems • So we must build the need for DR into the design of the system • A single site solution, subject to a common-mode disaster, would suffer considerable loss of availability after a disaster, and so is not acceptable • This implies that we need a multi-site solution • Conventionally these are based on a master-standby arrangement where only 50% of kit is delivering normal service • Our design is based on the use of multiple autonomous independent peer clusters that cross-synchronise, so 100% of the kit delivers normal service
DOM architecture in the context of the storage solution market • The dominant segment of the market focuses on delivering performance within a highly resilient single cluster • However: • Many of our objects will be rarely accessed • so we do not want to pay for “maximised” performance we do not need • We have resilience by using multiple clusters, hence we have a reduced need for resilience within a cluster • so we do not want to pay for “maximised” resilience we do not need • We are using these drivers to design a cost-effective large scale resilient solution
DOM storage subsystem architecture - overview • Diagram shows two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage • DOM Shared Services: Unique ID, Signing, Logging
DOM storage subsystem architecture - access • Normal access/delivery is from local storage cluster • Diagram shows two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage • DOM Shared Services: Unique ID, Signing, Logging • DOM central: Unique ID, Signing, Logging
DOM storage subsystem architecture - access • When a cluster is off-line then access/delivery is from a remote storage cluster • Diagram shows two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage • DOM Shared Services: Unique ID, Signing, Logging • DOM central: Unique ID, Signing, Logging
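The access behaviour on the two slides above (serve from the local cluster, fall back to a remote one when it is off-line) might look like this in outline. The names `StorageCluster` and `fetch` are hypothetical, and in-memory dictionaries stand in for the real storage service.

```python
from typing import Dict, Sequence, Tuple


class StorageCluster:
    """Toy stand-in for one storage cluster behind a DOM Storage gateway."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}


def fetch(
    domid: str, local: StorageCluster, remotes: Sequence[StorageCluster]
) -> Tuple[str, bytes]:
    """Return (serving cluster name, data), preferring the local cluster."""
    for cluster in [local, *remotes]:
        if cluster.online and domid in cluster.objects:
            return cluster.name, cluster.objects[domid]
    raise LookupError(f"{domid} is not available from any online cluster")


site_a = StorageCluster("site-a")
site_b = StorageCluster("site-b")
for site in (site_a, site_b):
    site.objects["domid-0001"] = b"object bytes"

normal = fetch("domid-0001", local=site_a, remotes=[site_b])

site_a.online = False  # local cluster goes off-line
failover = fetch("domid-0001", local=site_a, remotes=[site_b])
```

Because the clusters are peers holding the same objects, the caller sees the same data either way; only the serving cluster changes.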
DOM storage subsystem architecture - ingest • Normal ingest is to the local storage cluster and then the remote cluster is synchronised • Diagram step labels: Signing, Store, Synchronise remote store • Diagram shows two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage • DOM Shared Services: Unique ID, Signing, Logging • DOM central: Unique ID, Signing, Logging
DOM storage subsystem architecture - ingest • When a cluster is off-line then ingest is managed by the remote storage cluster and the local cluster is synchronised later • Diagram step labels: Signing, Store, Synchronise remote store later • Diagram shows two storage clusters, each with a DOM Storage gateway, DOM Storage Service and DOM Physical Storage • DOM Shared Services: Unique ID, Signing, Logging • DOM central: Unique ID, Signing, Logging
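The two ingest cases above (store locally then synchronise the peer, or store on the peer and catch up later) can be sketched with a pending list per cluster. This is an assumption-laden illustration: the names `Cluster`, `ingest` and `resynchronise` are hypothetical, the signing step is left out, and the slides do not describe the actual synchronisation protocol.

```python
from typing import Dict, List


class Cluster:
    """Toy storage cluster that remembers what its peer has not yet received."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.objects: Dict[str, bytes] = {}
        self.pending: List[str] = []  # DOMIDs still to be copied to the peer


def ingest(domid: str, data: bytes, local: Cluster, remote: Cluster) -> str:
    """Store the object on an online cluster and return that cluster's name."""
    primary, peer = (local, remote) if local.online else (remote, local)
    if not primary.online:
        raise RuntimeError("no cluster is available for ingest")
    primary.objects[domid] = data
    if peer.online:
        peer.objects[domid] = data      # synchronise straight away
    else:
        primary.pending.append(domid)   # synchronise later
    return primary.name


def resynchronise(source: Cluster, target: Cluster) -> int:
    """Copy objects that were ingested while the target was off-line."""
    if not target.online:
        return 0
    copied = 0
    for domid in source.pending:
        target.objects[domid] = source.objects[domid]
        copied += 1
    source.pending.clear()
    return copied


local_site = Cluster("local")
remote_site = Cluster("remote")

first = ingest("domid-0001", b"one", local_site, remote_site)   # normal case
local_site.online = False
second = ingest("domid-0002", b"two", local_site, remote_site)  # local off-line
local_site.online = True
caught_up = resynchronise(remote_site, local_site)
```

After the catch-up both clusters hold both objects, which is what allows either one to serve access requests on its own.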
In conclusion • We plan for generations of physical storage • Migration from one generation to the next • Allow changes of supplier • Purchase incrementally in modest quantities • Move quickly when required • Be cost conscious • We provide assurance that an object is held and re-presented as when it was ingested • We are designing a cost-effective large scale resilient solution • In summary: we take a long term view
Major Programmes/4 Web Archiving Programme
Structure of Programme • Web Archiving Programme is a collaborative initiative, roughly implemented across two consortia • UK Web Archiving Consortium • Developing a selective approach to web archiving, procuring a common web archiving infrastructure and software to begin archiving activities at the earliest opportunity • International Internet Preservation Consortium • Developing advanced web archiving technologies for the long-term, large-scale, continuous crawling requirements enabled through legislation
UK Web Archiving Consortium • Developing a selective approach to web archiving • License for PANDAS about to be signed with NLA • Sub-licenses with consortium partners and contractor to follow • ITT concluded with Magus Research winning the contract. • Implement a common web archiving infrastructure (lots of Linux machines + PANDAS) • Provide customisation/development of PANDAS • Provide help desk and support
International Internet Preservation Consortium • Developing advanced web archiving technologies • Smart Crawler • Continuous adaptive crawler, adjusting crawl priority on the fly • Based on IA Heritrix • Working on requirements now • Expect to begin tender process in June • Content Management • Archival formats • Framework • Metrics and Test Bed
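The idea of a continuous crawler that adjusts crawl priority on the fly can be illustrated with a priority queue. This sketch is not based on Heritrix or on the Smart Crawler requirements, which the slides do not detail; the class `CrawlFrontier` and the simple rule of halving or doubling the priority value are assumptions chosen to show the mechanism.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class CrawlFrontier:
    """Queue of URLs to revisit; a lower priority value means sooner."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._current: Dict[str, int] = {}  # latest priority per URL
        self._order = itertools.count()     # tie-breaker: first in, first out

    def schedule(self, url: str, priority: int) -> None:
        self._current[url] = priority
        heapq.heappush(self._heap, (priority, next(self._order), url))

    def next_url(self) -> Optional[str]:
        while self._heap:
            priority, _, url = heapq.heappop(self._heap)
            if self._current.get(url) == priority:  # skip superseded entries
                del self._current[url]
                return url
        return None

    def record_visit(self, url: str, changed: bool, previous_priority: int) -> int:
        """Reschedule a URL: sooner if its content changed, later if it did not."""
        if changed:
            new_priority = max(1, previous_priority // 2)
        else:
            new_priority = previous_priority * 2
        self.schedule(url, new_priority)
        return new_priority


frontier = CrawlFrontier()
frontier.schedule("http://example.org/news", 4)
frontier.schedule("http://example.org/about", 2)

first_visit = frontier.next_url()
demoted = frontier.record_visit(first_visit, changed=False, previous_priority=2)
second_visit = frontier.next_url()
promoted = frontier.record_visit(second_visit, changed=True, previous_priority=4)
third_visit = frontier.next_url()
```

Pages that keep changing drift towards the front of the queue and static pages drift back, so crawl effort follows observed change rather than a fixed schedule.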
Digital Library Collaborations/Partnerships: Current • UK Digital Preservation Coalition • Founder Member • TEL (The European Library Project) • Web Archiving UK • JISC, Wellcome Trust, National Archives, National Library of Scotland & National Library Of Wales • International Internet Preservation Consortium • BNF, Library Of Congress, Internet Archive, National Archives & Library Of Canada, National Library Of Australia, National Library Of Italy, National Libraries Of Nordic Countries • JISC Funded - Digital Curation Centre • Persistent Identifiers • DOI Foundation, European National Libraries (KB & DDB) • Resource Discovery • Union Catalogues (SUNCAT) • Digital Library Federation
Digital Library Collaborations/Partnerships: Potential • Secure Legal Deposit Network • 6 Legal Deposit Libraries • Global Digital Format Registry • Potential Partners (National Archives, DLF) • Patch (Standards & Tools To Extend The OAIS Model For Emulation & Migration) • KB (Netherlands National Library & Other Partners – FP6 Bid) • Digital Rights Management • Potential Partners (Publishers, JISC) • Metadata • Publishers, Others ? • Authentication • JISC ? • Resource Discovery • Search Engine Vendors, Researchers • Others ???
Conclusions • Beautiful Building! • Market & Outcome Focus • Huge IT Agenda • Collaboration Is Critical To Our Success • Can You Work With Us?