Explore the development strategy and future enhancements for PANDORA, a national and potentially distributed infrastructure for capturing, storing, and accessing digital content. Expect exponential growth, improved workflow efficiency, and better resource discovery.
Context • Perceived Wisdom • Accessing information from the Internet is like trying to drink from a fire hose • “It can’t be done” • “It will not scale” • “It is too expensive” • The goal posts keep moving as authors use the browser feature du jour • And the technical challenges are: • Systems/tools for capture, creation, storage, display and access • Metadata support • Access control and rights management • Preservation and ongoing access
Development Strategy • Expect the archive to grow exponentially (at least a factor of two each year) • Develop PANDORA as a national and potentially distributed infrastructure • Develop PANDORA in the context of other collecting strategies, e.g. electronic deposit and whole-of-domain web capture • Buy, not build
What is PANDORA Today? • PANDAS • The ILMS of PANDORA • Systems/tools for capture, creation, storage • Metadata support • Access control • PANDORA’s Box • The Stacks of PANDORA • Large scale storage supporting ongoing access and long term preservation • PANDORA’s Lid • The Reading Room for PANDORA • Controlled public access to the archive using contemporary browsers • Appropriate resource discovery tools
PANDAS • Improve workflow efficiency • Provide more effective quality assurance tools • Develop the ability for publishers to push material into the archive • Keep pace with web publishing technology • Database-driven services • Streaming delivery
PANDORA’s Box • The archive currently holds approximately 1.5 million objects requiring 150 GB of storage • … and growing fast • The Digital Object Storage System (DOSS) • Large scale storage system for Digital Collections • Initial system configuration provides 5 TB of storage • System can be scaled to 25 TB • PANDORA will migrate to DOSS by the end of July
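The growth assumption from the development strategy (at least a factor of two each year) can be set against the DOSS capacities quoted above. The following is a rough sketch only: the function name `years_until_full` is hypothetical, it assumes 1 TB = 1000 GB, and it assumes growth of exactly 2x per year. Since the slides say "at least" a factor of two, the results are best read as upper bounds on how long each configuration would last.

```python
def years_until_full(current_gb: float, capacity_gb: float,
                     growth_factor: float = 2.0) -> int:
    """Return the number of whole years of growth after which the archive
    no longer fits in capacity_gb.

    Assumption (not from the slides): growth is applied once per year at a
    constant factor, starting from current_gb.
    """
    if current_gb <= 0 or growth_factor <= 1:
        raise ValueError("need a positive size and a growth factor above 1")
    years = 0
    size = current_gb
    # Multiply year by year until the archive outgrows the capacity.
    while size <= capacity_gb:
        size *= growth_factor
        years += 1
    return years


# Figures taken from the slide: 150 GB today, 5 TB initial, 25 TB maximum.
ARCHIVE_GB = 150
INITIAL_CAPACITY_GB = 5 * 1000    # assumes decimal terabytes
MAXIMUM_CAPACITY_GB = 25 * 1000   # assumes decimal terabytes

initial_years = years_until_full(ARCHIVE_GB, INITIAL_CAPACITY_GB)
maximum_years = years_until_full(ARCHIVE_GB, MAXIMUM_CAPACITY_GB)
```

Under these assumptions the 150 GB archive outgrows the initial 5 TB configuration in the sixth year and the 25 TB configuration in the eighth, so scaling from 5 TB to 25 TB buys roughly two extra years at a doubling rate.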
DOSS Architecture • [Diagram] Components: DB Server, Web Server, DOMS Server, SAN Switch, Tape Library, Disk Arrays • Link labels: Ethernet 100 Mb/s, SCSI 80 MB/s, Fibre Channel 100 MB/s
PANDORA’s Lid • Initial release will go into production by the end of July, and will support • Automatically generated title entry pages • Access Controls • Improved resource discovery • Browse by title • Browse by subject • Full text search • Metadata search • And it will look better too!
PANDORA’s Lid futures • Better integration with the Library Catalogue • Full metadata support • Facilitate the research use of the archive through the development of appropriate navigation tools • Support more sophisticated rights management • Better browser support
Towards a Distributed National Archive • PANDORA currently supports distributed collection management and access through a central system • The Library, in partnership with other agencies, will explore “more” distributed models • The model currently being discussed gives agencies the choice to maintain local archives and access, with a central metadata repository and access portal • Two possible architectures have been proposed
Distributed Storage • Enhance existing system to allow agencies to have local copies of PANDORA’s Box and their own public access system • Can be done in the short term • Management is central • Gathering may be local or central • Archiving is to a local system • PANDORA’s Lid provides normal functionality
Distributed PANDORA • Each agency would provide local management, gathering, storage and access • National metadata repository and access portal may be real or virtual • Difficulties • Technology • Cost • A packaged hardware and software solution providing a “PANDORA Appliance”
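The idea of local archives served through a national metadata repository and access portal can be illustrated with a minimal sketch. Nothing here reflects an actual PANDORA design: the class names `MetadataRecord` and `MetadataRepository`, the field names, the agency names and the URLs are all hypothetical. The sketch only shows the division of labour the slides describe, where content stays with the agency and the central service holds metadata that points to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical central record describing one locally archived title."""
    title: str
    subjects: tuple[str, ...]
    agency: str      # the agency that holds the archived copy
    local_url: str   # where that agency's own access system serves it


@dataclass
class MetadataRepository:
    """Hypothetical national repository: stores metadata only, not content."""
    records: list[MetadataRecord] = field(default_factory=list)

    def register(self, record: MetadataRecord) -> None:
        # An agency pushes metadata for a title it has archived locally.
        self.records.append(record)

    def browse_by_subject(self, subject: str) -> list[MetadataRecord]:
        # Case-insensitive exact match on any of a record's subjects.
        wanted = subject.casefold()
        return [r for r in self.records
                if any(s.casefold() == wanted for s in r.subjects)]

    def search_titles(self, term: str) -> list[MetadataRecord]:
        # Case-insensitive substring match on the title.
        wanted = term.casefold()
        return [r for r in self.records if wanted in r.title.casefold()]


# Illustrative data only: agencies and URLs are invented for the example.
portal = MetadataRepository()
portal.register(MetadataRecord(
    title="Regional Water Report",
    subjects=("Environment", "Government"),
    agency="Agency A",
    local_url="https://agency-a.example.org/archive/water-report",
))
portal.register(MetadataRecord(
    title="Community Arts Newsletter",
    subjects=("Arts",),
    agency="Agency B",
    local_url="https://agency-b.example.org/archive/arts-newsletter",
))

environment_hits = portal.browse_by_subject("environment")
newsletter_hits = portal.search_titles("newsletter")
```

A reader using the portal would browse or search centrally and then follow `local_url` to the holding agency. A "virtual" repository, as the slide puts it, could expose the same interface while querying each agency at request time instead of storing the records centrally.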