A hybrid approach of digital long term preservation to institutional repositories - A case study of DSpace/SRB Integration • Ya-ning Arthur Chen, Feng-chien Chung • Computing Centre, Academia Sinica • 11 April, ISGC 2008
Outline • Background of MAAT • From Website to Institutional Repository • Long Term Preservation & OAIS • The Hybrid Approach • Future
MAAT – Background • The Metadata Architecture & Application Team (MAAT) was established in 2002 to carry out metadata research and provide services in support of the National Digital Archives Program (NDAP) in Taiwan • To date, MAAT has supported over 80 digital library projects of the Taiwan E-Learning & Digital Archive Program (TELDAP, formerly NDAP)
MAAT – Motivation • A number of documents have been created and can be categorized into • questionnaires, • work sheets, • meeting records, • metadata mapping tables, • system specifications, best practices of metadata standards, technical reports, research papers, briefings, and tutorial materials. • Most documents of the MAAT website are arranged in a static manner.
MAAT Website http://www.sinica.edu.tw/~metadata Academia Sinica
MAAT - Consideration1 • Document management and repository • over 1,000 documents and URL links have been arranged and served at the MAAT website. • the MAAT website needs an effective system of document management. • Access control • The MAAT website still lacks access control for document access.
MAAT - Consideration2 • Workflow reengineering • the MAAT website adopts a centralized model to maintain documents and website arrangement. • This model is very complicated and labor-intensive, and the overhead cost is very high. • Usage Statistics Report
MAAT - Challenge • Too many publications, • Too much change (that is, various document versions), • Too many contributors, and • Too many institutions.
Implementation Level • Static Website • Phase 1: from website to IR • Institutional Repository
DSpace - feature • Captures • Digital research material in any format • Directly from creators (e.g. faculty) • Large-scale, stable, managed long-term storage • Describes • Descriptive metadata (Dublin Core) • Technical metadata (file size, format…) • Rights metadata (licenses, creative commons…) • Distributes • Via WWW, with necessary access control • Preserves • Persistent ID and Handle • Bitstream format registry
MAAT – Content1 • Content Type • 支援計畫 (Supported Projects: documents from the projects we support) • 出版與活動 (Publications and Activities: documents of publications and activities) • 計畫管理 (Project Management: restricted documents) • 研究發展 (Research & Development: restricted documents) • 48 communities, 110 collections, 783 items • Document Format • User uploads: 794 PDF files, 446 MS Word files, 59 MS PowerPoint files, 27 XML files, 17 JPEG images, 16 HTML files, 7 MS Excel files, and others • System generated: over 1,900 plain text files (mainly DSpace license files)
MAAT – Content2 • Access Method • DSpace user browse and search interface • Search engines (Google, Yahoo, etc.) • OAI-PMH harvesting
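As a rough illustration of the OAI-PMH harvesting access method, the sketch below builds a `ListRecords` request URL. The verb and the `oai_dc` metadata prefix come from the OAI-PMH protocol; the endpoint URL and the helper name `oai_list_records_url` are assumptions made for the example and do not describe the MAAT installation.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting of one set
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration
url = oai_list_records_url("http://repository.example.org/oai/request")
print(url)
```

A harvester would fetch this URL and, if the response carries a resumption token, repeat the request with that token until the list is complete.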
MAAT DSpace http://pl11.sinica.edu.tw:8080/dspace/index.jsp
DSpace - Consideration • The Need for Extending DSpace Storage Capabilities • The number of documents grows so fast that a very large-scale storage solution is required • The Lack of a Risk Management Mechanism • Reliable backup and disaster recovery systems are not included in the default DSpace installation
Implementation Level • Static Website • Phase 1: from website to IR • Institutional Repository • Phase 2: from IR to Long Term Preservation • Institutional Repository + Grid
DSpace/SRB Approach1 • In 2004, NARA (with NSF/NPACI) funded a project aimed at integrating DSpace and SRB to • allow DSpace to use the data grid as a storage layer • permit the exchange of authentic documents between them • NARA Proposal & Participants • San Diego Supercomputer Center (SDSC) • Member of the National Partnership for Advanced Computational Infrastructure (NPACI), an NSF-sponsored program • MIT Libraries • UC San Diego Libraries (UCSD) • Hewlett Packard Laboratories (HP) • National Archives and Records Administration (NARA)
DSpace/SRB Approach2 • DSpace can have multiple bitstream stores; each store can be either traditional (file system) storage or SRB storage. • Both traditional and SRB storage are specified by configuration parameters. • Both traditional and SRB bitstream stores are configured in dspace.cfg
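To make the idea of numbered bitstream stores concrete, here is a minimal sketch that parses a properties-style fragment and classifies each store as traditional or SRB. The key names are modeled on the `assetstore.*` and `srb.*` entries found in DSpace 1.x `dspace.cfg` files, but they should be checked against the installed version; all values and the helper functions are assumptions for illustration only.

```python
import re

# Illustrative fragment; every value below is a placeholder.
SAMPLE_CFG = """\
# Store 0: traditional file-system storage
assetstore.dir = /dspace/assetstore
# Store 1: SRB-backed storage
srb.host.1 = srb.example.org
srb.port.1 = 5544
srb.mcatzone.1 = examplezone
srb.parentdir.1 = dspace-assetstore
# Store number that receives newly ingested bitstreams
assetstore.incoming = 1
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def bitstream_stores(props):
    """Group settings by store number and label each store's type."""
    stores = {}
    for key, value in props.items():
        local = re.fullmatch(r"assetstore\.dir(?:\.(\d+))?", key)
        if local:
            # An unnumbered assetstore.dir is treated as store 0
            stores[int(local.group(1) or 0)] = {"type": "local", "dir": value}
            continue
        srb = re.fullmatch(r"srb\.(\w+)\.(\d+)", key)
        if srb:
            store = stores.setdefault(int(srb.group(2)), {"type": "srb"})
            store[srb.group(1)] = value
    return stores


props = parse_properties(SAMPLE_CFG)
stores = bitstream_stores(props)
incoming = int(props.get("assetstore.incoming", 0))
print(stores[incoming]["type"])
```

In this fragment new bitstreams would go to the SRB store, while content already held in store 0 stays on the local file system.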
Examination of DSpace/SRB • An Open Archival Information System (OAIS) is intended to preserve information for access and use by a Designated Community
OAIS Functional Model…Again • [Figure: OAIS functional model annotated with the components DSpace RDBMS & SRB MCAT; DSpace Submit Interface; DSpace User Interface; DSpace Ingest; SRB Mass Storage; DSpace Batch Import; DSpace & SRB Administration]
Producer, Management and Consumer • Producer • DSpace may ingest the SIP from the producer and generate the AIP for management and storage • Management • SRB may receive the AIP, store and manage the data, and generate the AIP for access • Consumer • DSpace may process the access request and generate the proper DIP for dissemination • [Figure: package flow SIP, AIP, AIP, DIP across DSpace Submit Interface, DSpace Batch Import, DSpace Ingest, SRB Mass Storage, DSpace RDBMS & SRB MCAT, and DSpace User Interface]
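The division of roles on this slide can be sketched as a small package flow: ingest turns a SIP into an AIP, a storage component keeps and returns AIPs, and access derives a DIP. This is a toy model under stated assumptions; the class and function names are hypothetical and are not DSpace or SRB APIs.

```python
from dataclasses import dataclass


@dataclass
class SIP:  # Submission Information Package, from the producer
    producer: str
    files: dict      # file name -> bytes
    metadata: dict


@dataclass
class AIP:  # Archival Information Package, kept in storage
    identifier: str
    files: dict
    metadata: dict


@dataclass
class DIP:  # Dissemination Information Package, sent to the consumer
    identifier: str
    files: dict
    metadata: dict


class ArchivalStorage:
    """Stand-in for the SRB role: receives, stores, and returns AIPs."""

    def __init__(self):
        self._aips = {}

    def store(self, aip):
        self._aips[aip.identifier] = aip

    def retrieve(self, identifier):
        return self._aips[identifier]


def ingest(sip, identifier):
    """Stand-in for the DSpace ingest role: build an AIP from a SIP."""
    metadata = dict(sip.metadata, producer=sip.producer)
    return AIP(identifier, dict(sip.files), metadata)


def disseminate(aip, wanted):
    """Stand-in for the DSpace access role: build a DIP with requested files."""
    files = {name: data for name, data in aip.files.items() if name in wanted}
    return DIP(aip.identifier, files, dict(aip.metadata))


# Hypothetical submission used only to exercise the flow
sip = SIP("MAAT", {"report.pdf": b"%PDF", "notes.txt": b"draft"}, {"title": "Example report"})
storage = ArchivalStorage()
storage.store(ingest(sip, "example/1"))
dip = disseminate(storage.retrieve("example/1"), {"report.pdf"})
print(sorted(dip.files), dip.metadata["producer"])
```

Keeping the three package types separate mirrors the OAIS distinction between what is submitted, what is preserved, and what is delivered.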
Archives arrangement • Logical Archives structure: • DSpace allows multi-level communities and a single level of collections • Archival principles • Principle of provenance • Principle of respect des fonds • Physical Files Arrangement: • SRB Mass Storage Technology
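The logical structure described here, nested communities with collections only at a single level beneath a community, can be modeled roughly as follows. The classes and the sample names are hypothetical; they illustrate the shape of the hierarchy and are not the DSpace object model.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    items: list = field(default_factory=list)  # collections hold items, not collections


@dataclass
class Community:
    name: str
    subcommunities: list = field(default_factory=list)  # communities may nest
    collections: list = field(default_factory=list)


def count_items(community):
    """Count items in a community and all of its sub-communities."""
    own = sum(len(c.items) for c in community.collections)
    return own + sum(count_items(s) for s in community.subcommunities)


def depth(community):
    """Number of community levels, counting this community as level 1."""
    return 1 + max((depth(s) for s in community.subcommunities), default=0)


# Hypothetical arrangement: one sub-community per supported project,
# so that each project's documents stay together.
project = Community(
    "Example Project",
    collections=[Collection("Meeting records", ["minutes-01", "minutes-02"])],
)
root = Community(
    "Supported Projects",
    subcommunities=[project],
    collections=[Collection("Shared templates", ["worksheet-template"])],
)
print(count_items(root), depth(root))
```

Grouping documents under the community of the project that produced them is one way to reflect the principle of provenance in the logical arrangement.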
Future1 • Best Practice & SOP for DSpace/SRB integration • Deeper Check Against Activities of OAIS • Preservation Planning and policy • Monitor Producer/Management/Consumer’s service requirements and emerging technology, develop archival strategy & migration plan
Future2 • Feasibility Evaluation • Migrate from SRB to other advanced technologies, such as SRM, iRODS… • Adopt a metadata approach to enhance digital preservation, such as PREMIS and METS (e.g. structural map, behavior section…)