Models, Architectures, and Technologies of Digital Libraries (1) Session 3 LIS 60639 Implementation of Digital Libraries Dr. Yin Zhang
1. Choosing a Repository Architecture Reese & Banerjee (2008): Ch. 3
Why is choosing an architecture important? • Repository functionality is closely tied to the platform • The best architecture for a repository depends on the purpose of the repository and its anticipated use • Some platforms are better suited to centrally managed repositories, while others are based on a community model • Various platforms are usually optimized for specific needs and types of resources • A platform should facilitate the various functions needed for the repository
Factors to consider when choosing an architecture • There are many important factors to consider • Bottom line: • DL systems may not last forever, but all DL materials should be preserved • Current systems should therefore have the ability to migrate resources and metadata to a different platform in the future
Required Features: General • Accepts content in many formats • Ingestion/addition mechanism is not limited to existing formats • Scalable for consistent growth and heavy use • Security and privacy • Proper monitoring of system and resource health • Adequate reporting of metadata, resource, and system activity
Required Features: Interface • Accessible to staff and patrons with disabilities • Browse and search • Display content on the web • Cross-collection search • Services, e.g., reference and search • Cataloging tools that allow the creation of standard metadata
Required Features: Metadata • Descriptive – identifies items • Technical – details requirements for using a resource • Administrative – describes usage restrictions, rights, original source, etc. • Structural – defines relationships with other resources • Must be able to add new metadata fields or schema to accommodate future needs • Automatically records and stores timestamp of creation and modification of resources • Automatically assigns unique identifiers that are independent of location and protocol
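To make these metadata requirements concrete, here is a minimal Python sketch (an illustration, not from the reading) of an item record that groups the four metadata categories and automatically records a location- and protocol-independent identifier plus creation/modification timestamps; the field names are illustrative, not a specific schema.

import uuid
from datetime import datetime, timezone

def new_item_record(descriptive, technical, administrative, structural):
    # Assign a location- and protocol-independent identifier and record
    # creation/modification timestamps automatically; a production repository
    # might mint Handles, DOIs, or ARKs instead of UUIDs.
    now = datetime.now(timezone.utc).isoformat()
    return {
        "identifier": "urn:uuid:" + str(uuid.uuid4()),
        "created": now,
        "modified": now,
        "descriptive": descriptive,        # e.g. {"title": ..., "creator": ...}
        "technical": technical,            # e.g. {"format": "image/tiff"}
        "administrative": administrative,  # e.g. {"rights": ..., "source": ...}
        "structural": structural,          # e.g. {"isPartOf": ...}
    }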
Required Features: Maintenance and data preservation • Content preservation is not dependent on hardware or software • Exports resources • Exports metadata • Ensures data integrity • Has robust backup and restore capability
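As an illustration of the data-integrity requirement, the following Python sketch (an assumption-laden example, not from the reading) records SHA-256 checksums for stored files and later re-verifies them, the kind of fixity audit that supports reliable backup and restore.

import hashlib
import pathlib

def fixity_manifest(directory):
    # Record a SHA-256 checksum for every file so that later audits can
    # detect silent corruption before backups are overwritten.
    manifest = {}
    for path in pathlib.Path(directory).rglob("*"):
        if path.is_file():
            manifest[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest

def verify_fixity(manifest):
    # Return the files whose current checksum no longer matches the stored one.
    return [p for p, digest in manifest.items()
            if hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest() != digest]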
Discussion and Reflection • Issues raised in this reading • How such issues are addressed in your DL case
2. Key principles in assessing repository models Oya Y. Rieger (2007). Select for success: Key principles in assessing repository models. D-Lib Magazine, July/August 2007, Volume 13 Number 7/8.
Background • Many cultural and educational institutions are in the process of selecting or developing repositories to support a wide range of digital curation activities, including • content management, submission, ingest, archiving, publishing, discovery, access, and preservation • There is an increasing emphasis on deploying systems that support content re-purposing and delivery of a wide range of web services.
Processes in Selecting a Repository Model • This article offers strategies to match specific institutional requirements with repository system features and functionalities. • The repository model selection process involves several essential stages: • Identify key stakeholders • Conduct a needs assessment analysis • Identify resource requirements • Understand the existing human landscape
Discussion and Reflection • Issues raised in this reading • One of the challenges in providing an inclusive methodology for selecting and evaluating repository models is the heterogeneous nature of the content types and user communities. • How such issues are addressed in your DL case
3. A framework for building open digital libraries Hussein Suleman and Edward A. Fox (2001). A framework for building open digital libraries. D-Lib Magazine, December 2001, Volume 7 Number 12.
The Problem and Solution Problem: • Networked information systems are constantly evolving to keep pace with Internet innovation • DLs are thus expected to demonstrate the careful management of libraries while supporting standards that evolve at an astonishing pace. • This architectural moving target is a predicament that all DLs face sooner or later in their lifecycle, and one that few manage to deal with effectively. Solution: • Build systems that are interoperable at the levels of data exchange and service collaboration. • Such interoperability requirements necessitated the development of standards such as • the Dublin Core Metadata Element Set, and • the Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH). • These standards have achieved a degree of success in the DL community largely because of their generality and simplicity. • This article proposes a framework that helps in building extensible DLs.
Quick Facts • Dublin Core Metadata Element Set • Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) • In October of 1999 the Open Archives Initiative (OAI) was launched in an attempt to address interoperability issues among the many existing and independent DLs • The focus was on high-level communication among systems and simplicity of protocols. • The OAI has since received much media attention in the DL community and, primarily because of the simplicity of its standards, has attracted many early adopters. • OAI-PMH in essence supports a system of interconnected components, where each component is a DL. • Since the protocol is simple and is becoming widely accepted, it is far from being a custom solution of a single project. • The OAI protocol can be thought of as the glue that binds together components of a larger DL.
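Because OAI-PMH is simply HTTP requests returning XML, a basic harvester can be sketched in a few lines. The Python sketch below issues ListRecords requests and follows resumptionTokens; it is illustrative only, the endpoint URL in the usage comment is hypothetical, and real harvesters add error handling and incremental (from/until) harvesting.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    # Issue ListRecords requests and follow resumptionTokens until the
    # repository signals that the result set is exhausted.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for record in tree.iter(OAI + "record"):
            yield record
        token = tree.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# usage (hypothetical endpoint):
# for record in harvest("https://example.org/oai"):
#     ...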
Open Digital Library (ODL) • Digital Libraries are modeled as networks of extended Open Archives, with each extended Open Archive being a source of data and/or a provider of services. • This network of extended Open Archives, an instance of which is illustrated in Figure 1, is herein called an Open Digital Library (ODL). Figure 1. Example networked architecture of an Open Digital Library.
An Example of ODL: Architecture of the NDLTD ODL system • URL: http://www.ndltd.org/
Discussion and Reflection • Issues raised in this reading • How such issues are addressed in your DL case
4. A scalable architecture for harvest-based digital libraries Xiaoming Liu, Tim Brody, Stevan Harnad, Les Carr, Kurt Maly, Mohammad Zubair, & Michael L. Nelson (2002). A scalable architecture for harvest-based digital libraries: The ODU/Southampton experiments. D-Lib Magazine, November 2002, Volume 8 Number 11.
Elements • Need a common infrastructure to support the requirements of applications based on the Open Archives Initiative (OAI) • Propose a scalable and reliable infrastructure that aims at satisfying these requirements
Challenges faced by OAI-PMH based applications • Data Provider (DP) and Metadata Quality • Not all archives strictly follow the OAI-PMH; many have XML syntax and encoding problems. • There are significant problems with metadata. With OAI-PMH, despite the syntax for metadata being strictly defined (XML schema validation), problems still appear. Some DPs do not strictly check their OAI-PMH implementations. • Server Availability • The stability and service from DPs are difficult to predict. If a large DP is periodically unavailable, this can be a serious problem for harvesting. • Scalability • OAI-PMH harvesting is resource-expensive to DPs because the HTTP responses are dynamically generated, and DPs may need to cache current harvest sessions. • Linking Across Service Providers (SP) • In OAI-PMH, several DPs may be harvested by many SPs, each providing different services for the same records. Cross-service linking and data sharing can be achieved by using unique OAI identifiers. Unique identifiers also allow the detection of record duplication.
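Two of the challenges above, unreliable data providers and record duplication across service providers, lend themselves to small defensive measures. The Python sketch below is illustrative (not from the paper): it retries a flaky DP with increasing pauses rather than aborting the harvest, and uses the unique OAI identifier to detect records already seen.

import time
import urllib.error
import urllib.request

def fetch_with_retries(url, attempts=3, pause=30):
    # Retry an unavailable or flaky data provider with an increasing pause
    # instead of letting one DP failure abort the whole harvest.
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=60) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise
            time.sleep(pause * (attempt + 1))

seen_identifiers = set()

def is_duplicate(oai_identifier):
    # Unique OAI identifiers let a service provider recognise records it has
    # already ingested from another path.
    if oai_identifier in seen_identifiers:
        return True
    seen_identifiers.add(oai_identifier)
    return False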
Basic OAI-PMH model
• Multiple SPs may harvest multiple DPs at the same time.
• If one DP has implementation problems (e.g., XML encoding), all SPs have to address these problems.
• If one DP is unavailable, all SPs have to wait until the DP comes up again, even if some SPs have already cached the data from the DP.
Hierarchical Harvesting Model
• 1.1. An OAI-PMH proxy dynamically forwards requests to DPs from value-added services. 1.2. When transmitting the response, it can dynamically fix common XML encoding errors and translate between different OAI-PMH versions.
• 2.1. An OAI-PMH cache caches metadata, which it can filter and refine before re-exposing the data to SPs. 2.2. As a cache, it reduces the load on source DPs and improves DP availability.
• 3.1. An OAI-PMH gateway can convert the OAI-PMH to other protocols and applications.
• 4. End-user services provide various services, such as search and citation linking.
• Each layer may fetch data from any of its lower layers, depending on availability and service type.
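The cache layer in this hierarchical model can be approximated as follows. This is a hypothetical Python sketch, not the ODU/Southampton implementation: it reuses a harvest() generator like the one sketched earlier, keeps records keyed by OAI identifier, and re-exposes them so that downstream SPs do not hit the source DPs directly.

OAI = "{http://www.openarchives.org/OAI/2.0/}"

class OAICache:
    # Hypothetical cache layer: harvest from source DPs once, keep the records
    # locally, and answer downstream SPs from the local copy, reducing load on
    # the source DPs and masking their downtime.
    def __init__(self, harvest_fn):
        self.harvest_fn = harvest_fn   # e.g. the harvest() generator sketched earlier
        self.records = {}              # OAI identifier -> record element

    def refresh(self, base_url):
        for record in self.harvest_fn(base_url):
            identifier = record.findtext(OAI + "header/" + OAI + "identifier")
            # Filtering, normalisation, or fixing of common encoding problems
            # could be applied here before the record is re-exposed.
            self.records[identifier] = record

    def list_records(self):
        # Re-expose the cached records to service providers.
        return list(self.records.values())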
Discussion and Reflection • Issues raised in this reading • How such issues are addressed in your DL case
5. Building expandable digital libraries Donatella Castelli and Pasquale Pagano (2003). A system for building expandable digital libraries
What is OpenDLib? • Expandability is one of the main requirements of future digital libraries. • OpenDLib is a digital library service system designed to be highly expandable in terms of content, services, and usage. • It is general-purpose software that can be customized to meet the needs of different application frameworks. It enables different user communities to create their own DLs. • The role of OpenDLib is analogous to the role of a database management system (DBMS) for a database, i.e., it supports the creation and maintenance of distributed DLs. • A DL can be created by instantiating OpenDLib and then either loading or harvesting the content to be managed.
The OpenDLib Model • OpenDLib was built as a distributed digital library, according to the notion of individually defined services located anywhere on the Internet. • When combined, these services constitute a digital library. • The functionality of the OpenDLib digital library includes the storage of and access to multimedia and multilingual resources, cross-language search and browsing, user registration and personalized information dissemination of new incoming documents. • The OpenDLib federation of services will communicate through an established protocol. • The figure shows a conceptual model that specifies the notion of "OpenDLib service". • URL: http://opendlib.iei.pi.cnr.it/home.html
Discussion and Reflection • Issues raised in this reading • The authors strongly believe that future DLs will be built by first constructing a “core DL” (i.e. a core content accessible through core services able to satisfy the needs of a core set of users) and then expanding this DL to cover emerging needs and to exploit new opportunities. Expandability will be one of the main requirements of these DL systems. • How such issues are addressed in your DL case
6. A Distributed DL: NSDL Carl Lagoze, Dean B. Krafft, Sandy Payette, & Susan Jesuroga (2005). What is a digital library anymore, anyway? Beyond search and access in the NSDL. D-Lib Magazine, November 2005, Volume 11 Number 11
The Model • The paper presents an information model for digital libraries that • intentionally moves "beyond search and access", without ignoring those basic functions, and • facilitates the creation of collaborative and contextual knowledge environments. • This model is an information network overlay that represents a digital library as a graph of • typed nodes, corresponding to the information units (documents, data, services, agents) within the library, and • semantic edges representing the contextual relationships among those units. • The information model integrates local and distributed information with web services, allowing the creation of rich documents (e.g., learning objects, publications for e-science, etc.). • It expresses the complex relationships among information objects, agents, services, and meta-information (such as ontologies), and thereby represents information resources in context, rather than as the result of stand-alone web access. • It facilitates collaborative activities, closing the loop between users as consumers and users as contributors.
Information network overlay (INO) • The concepts underlying the INO are illustrated in the following figure, with the following layers: • The primary resources or raw data selected for the library are shown at the bottom layer of the illustration. In the NSDL these are the web-accessible STEM resources. But, as noted earlier, these raw materials also consist of data sets, the agents and organizations that contribute to the library, and services. • The information network overlay is the locus for modeling library resources, their descriptions, and the web of information that builds around them. It is first populated with the primary resources, or with metadata references to them, which are shown as red nodes. The association and derivation of these nodes from the primary resource layer is shown by the solid red arrows. The dashed red arrows in the INO indicate initial relations between these nodes, such as the collection/item relations in the NSDL. In the NSDL, the initial populating of the INO is done via metadata harvest from collection providers, essentially duplicating the functionality of the phase I metadata repository. • The Access-Controlled API (Application Program Interface) provides full programmatic access to the INO. This includes read and write access to the components of the data model – data, documents, metadata, agents, relationships, etc. – and searching over the relationships – e.g., "find all resources contributed by DLESE". • The API can then be used by external contributors – e.g., users, services, ontology classification services, and the like – to enhance the information in the INO. These API-channeled requests, indicated by solid green arrows, add both new nodes (such as learning objects that aggregate existing resources) to the INO, indicated in green, and new relationships among the nodes, indicated by green dashed lines. • DLs built on the INO model will reflect expanding communities of knowledge built over the resources in the library.
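The graph-of-typed-nodes idea can be made concrete with a small data structure. The Python sketch below is illustrative only (it is not the NSDL implementation): it stores typed nodes and labelled edges and supports queries in the spirit of "find all resources contributed by DLESE"; the identifiers in the usage comments are hypothetical.

class Node:
    def __init__(self, node_id, node_type, properties=None):
        self.id = node_id
        self.type = node_type              # e.g. "resource", "agent", "service"
        self.properties = properties or {}

class Overlay:
    # Typed nodes plus labelled edges, so a resource is represented in
    # context rather than as an isolated search result.
    def __init__(self):
        self.nodes = {}                    # node_id -> Node
        self.edges = []                    # (subject_id, relation, object_id)

    def add_node(self, node):
        self.nodes[node.id] = node

    def relate(self, subject_id, relation, object_id):
        self.edges.append((subject_id, relation, object_id))

    def find_related(self, node_id, relation=None):
        # Collect every edge that touches the given node, optionally
        # restricted to one relation type.
        return [(s, r, o) for (s, r, o) in self.edges
                if node_id in (s, o) and (relation is None or r == relation)]

# usage (hypothetical identifiers):
# overlay = Overlay()
# overlay.add_node(Node("org:DLESE", "agent"))
# overlay.add_node(Node("res:42", "resource", {"title": "..."}))
# overlay.relate("res:42", "contributedBy", "org:DLESE")
# overlay.find_related("org:DLESE", "contributedBy")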
Discussion and Reflection • Issues raised in this reading • In the age of Google, what is a digital library anymore, anyway? • Search and access over a set of resources, while important to any digital library, are not sufficient. • Digital libraries need to distinguish themselves from web search engines in the manner that they add value to web resources. This added value consists of establishing context around those resources, enriching them with new information and relationships that express the usage patterns and knowledge of the library community. The digital library then becomes a context for information collaboration and accumulation – much more than just a place to find information and access it. • How such issues are addressed in your DL case
7. Design and implementation of networked digital libraries Hussein Suleman, Edward A. Fox, & Devika P. Madalli (2003). Design and implementation of networked digital libraries: Best practices. DRTC workshop on digital libraries: Theory and practice, Bangalore.
Networked Digital Library (NDL) • The concept of a Networked Digital Library (NDL) is intended for resource sharing among digital libraries with similar interests and content. • An NDL allows digital libraries with similar content, clientele, and services to form a network that gives integrated services to users at all nodes, thereby multiplying the benefits and impact of the digital library. • Examples: • National Science Digital Library (NSDL) • Networked Digital Library of Theses and Dissertations (NDLTD) • The Networked Computer Science Technical Reference Library (NCSTRL) • These have integrated several smaller digital library projects to serve communities holistically across geographical and institutional boundaries.
• Data collection and basic services are provided at each institution or organisation, which manages its own DL or DLs. • If there are multiple DLs for specific subject areas or subsections of the organisation, these can be merged into institution-wide archives (e.g., DLA). • Regional archives can then interoperate with the institution-wide archives of each organisation, while subject archives can interoperate with the subject archives (e.g., DLA1) and extract subject-specific data where individual subject archives do not exist (e.g., DLB). • In general, collection of data is done as close to the source as possible, thereby giving the creators of data control over the management of the data. • Services, however, are provided at a sufficiently high level of aggregation so that the data is interesting to users. • This is the model followed by most current large-scale networked DL projects, including NDLTD, NCSTRL, and CITIDEL.
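To illustrate the aggregation levels described above, here is a hedged Python sketch: the endpoints are hypothetical, and a harvest() function like the one sketched earlier is assumed. It merges records from several lower-level archives into one union collection on which higher-level services such as search can then be offered.

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def build_union_archive(endpoints, harvest_fn):
    # Merge records from several lower-level (e.g. institution-wide) archives
    # into one aggregation; duplicates are collapsed by OAI identifier.
    union = {}
    for endpoint in endpoints:
        for record in harvest_fn(endpoint):
            identifier = record.findtext(OAI + "header/" + OAI + "identifier")
            union[identifier] = record
    return union

# usage (hypothetical endpoints):
# union = build_union_archive(["https://dl-a.example.edu/oai",
#                              "https://dl-b.example.edu/oai"], harvest)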
The Networked Computer Science Technical Reference Library (NCSTRL) is a distributed digital library of technical reports published by computer science departments internationally. • Originally, the system was made up of a central site and multiple remote sites either running Dienst (3) or supporting a lightweight FTP-based protocol for metadata transfer (5). • Since the introduction of the Open Archives Initiative's (6) Protocol for Metadata Harvesting (OAI-PMH) (4), the old system has been replaced by components adhering to newer interoperability standards.
Discussion and Reflection • Issues raised in this reading • The planning and implementation of networked digital libraries pose new challenges and involve • policy making regarding members, content, content management, governance, maintenance, and technical know-how • Best practices • How such issues are addressed in your DL case