Part 4. Next Generation Digital Libraries: Supporting Interoperability, Semantics, and Quality. OAI, ODL, DL-in-a-box. Open Archives Initiative since 1999, www.openarchives.org Open Digital Libraries since 2001, from www.dlib.vt.edu with Hussein Suleman (now U. Cape Town) DL-in-a-box
Part 4 Next Generation Digital Libraries: Supporting Interoperability, Semantics, and Quality
OAI, ODL, DL-in-a-box • Open Archives Initiative • since 1999, www.openarchives.org • Open Digital Libraries • since 2001, from www.dlib.vt.edu • with Hussein Suleman (now U. Cape Town) • DL-in-a-box • NSDL support since 2001 • Aimed to help new collections / services projects • http://dlbox.nudl.org
Open Archives Initiative (OAI) • Advocacy for interoperability • Standard for transferring metadata among digital libraries • Protocol for Metadata Harvesting (PMH) • Simplicity • Generality • Extensibility • Support for PMH => Open Archive (OA)
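To illustrate the simplicity goal, a PMH request is an ordinary HTTP GET carrying a verb and a few arguments. The following is a minimal sketch that only builds such a request URL. The base URL is a hypothetical placeholder, and the verb and argument names (`ListRecords`, `metadataPrefix`) are taken from OAI-PMH 2.0, which may differ in detail from the protocol version in use when these slides were written.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "http://archive.example.org/oai"


def build_request(base_url, verb, **args):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments the caller left unset so the URL stays minimal.
    params.update({k: v for k, v in args.items() if v is not None})
    return base_url + "?" + urlencode(params)


# Ask for all records in unqualified Dublin Core, with no set restriction.
url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc", set=None)
print(url)
```

Nothing here contacts a server; a real harvester would also need to handle resumption tokens and error responses.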
OAI = Technical Umbrella for Practical Interoperability… Metadata Harvesting Reference Libraries Museums Publishers E-Print Archives …that can be exploited by different communities
OAI – Repository Perspective • Required: Protocol Set Structure URI Scheme • Required: DC • [Figure: MDO boxes layered above DO boxes]
OAI – Black Box Perspective • [Figure: open archives OA 1 through OA 7]
Tiered Model of Interoperability Mediator services Metadata harvesting Document models
The World According to OAI • Service Providers: Discovery, Current Awareness, Preservation • Metadata harvesting • Data Providers
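The service-provider side of this picture can be sketched as parsing a harvested response into simple records. The example below is an illustration under stated assumptions: the sample XML is invented, the record identifier and title are hypothetical, and the namespaces are those of OAI-PMH 2.0 with unqualified Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response from a hypothetical data provider.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:etd-1</identifier>
        <datestamp>2002-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest(xml_text):
    """Extract identifier, datestamp, and title from each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records


harvested = harvest(SAMPLE_RESPONSE)
print(harvested)
```

A discovery or current-awareness service would then index or sort these records. The sketch stops at extraction.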
Users and digital objects • [Figure: users (?) facing a pool of digital objects: Image, Video, Program, Document]
Digital library • [Figure: a monolithic and/or custom-built web-based application between users and the digital objects]
Componentized digital library • [Figure: separate components between users and the digital objects, with the connections among them marked "?"]
Open digital library • [Figure: components shown as open archives (OA), connected by PMH and XPMH]
Extended OAI-PMH Open Digital Library Protocol Protocol for Metadata Harvesting
Extended OPEN ARCHIVE Open Digital Library Component OPEN ARCHIVE
Open Digital Library Deployments • NDLTD (www.ndltd.org) • Computer Science Teaching Center (www.cstc.org) • Computing and Information Technology Interactive Digital Educational Library (www.citidel.org) • Open Archives Distributed (NSF, DFG) – enhancements to PhysNet • OCKHAM • Open to others through DL-in-a-box
Open Digital Library • Network of Extended Open Archives where each node acts as a provider of data, of services, or of both. • Component = Node • Protocol = Arc
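The "Component = Node, Protocol = Arc" view can be sketched as a small labeled graph. The class, the method names, and the particular wiring below are hypothetical and chosen only for illustration. They are not the actual ODL software.

```python
class OpenDigitalLibrary:
    """A network of components (nodes) linked by protocols (arcs)."""

    def __init__(self):
        # Maps a consumer component to a list of (provider, protocol) arcs.
        self.arcs = {}

    def connect(self, consumer, provider, protocol):
        """Record that `consumer` obtains data from `provider` via `protocol`."""
        self.arcs.setdefault(consumer, []).append((provider, protocol))

    def upstream(self, node):
        """Return every component `node` depends on, directly or indirectly."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for provider, _protocol in self.arcs.get(current, []):
                if provider not in seen:
                    seen.add(provider)
                    stack.append(provider)
        return seen


# Assumed wiring: two collections feed a union, which feeds search.
odl = OpenDigitalLibrary()
odl.connect("Union", "ETD-1", "PMH")
odl.connect("Union", "ETD-2", "PMH")
odl.connect("Search", "Union", "ODLUnion")
odl.connect("UserInterface", "Search", "ODLSearch")

print(sorted(odl.upstream("UserInterface")))
```

Because every arc is a protocol, any node can be replaced by another component that speaks the same protocol without changing the rest of the graph.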
Open Digital Library Components • Running now • XML-File (data provider from file system) • Search: simple or in-memory (Essex) or generalized • Union, browse, recent, filter • E-journal/review, Submit, Edit, Annotation • Recommender, Rating; Mirroring (see JCDL’02) • Working with NCSA: from DB, unstructured text • Others in process • Classification/categorization • Registry (and other connections with web services)
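As a rough sketch of what the union, filter, and recent components might do, the functions below operate on in-memory records. The semantics are assumptions made for illustration (union keeps the newest copy of each identifier, recent sorts by datestamp) and are not taken from the ODL component specifications. All record data is invented.

```python
def union(*archives):
    """Merge records from several archives, keeping the newest copy of each."""
    merged = {}
    for archive in archives:
        for record in archive:
            key = record["identifier"]
            # ISO 8601 date strings compare correctly as plain strings.
            if key not in merged or record["datestamp"] > merged[key]["datestamp"]:
                merged[key] = record
    return list(merged.values())


def filter_records(records, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [r for r in records if predicate(r)]


def recent(records, n):
    """Return the n most recently stamped records, newest first."""
    return sorted(records, key=lambda r: r["datestamp"], reverse=True)[:n]


archive_a = [
    {"identifier": "oai:a:1", "datestamp": "2002-01-10", "title": "Thesis A"},
    {"identifier": "oai:shared:9", "datestamp": "2002-02-01", "title": "Old copy"},
]
archive_b = [
    {"identifier": "oai:shared:9", "datestamp": "2002-03-15", "title": "New copy"},
    {"identifier": "oai:b:2", "datestamp": "2001-12-05", "title": "Report B"},
]

merged = union(archive_a, archive_b)
theses = filter_records(merged, lambda r: "Thesis" in r["title"])
newest = recent(merged, 2)
print([r["title"] for r in newest])
```

In a deployed ODL each of these would be a separate component reached over a protocol rather than a function call.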
Example Open Digital Library • ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org) • [Figure: ETD collections (ETD-1 to ETD-4); components Union, Filter, Browse, Search, Recent; links labeled PMH, ODLUnion, ODLBrowse, ODLSearch, ODLRecent; USER INTERFACE for students and researchers]
Open Digital Library: Extended • As What’s New Service Provider • As Metadata Search Service Provider • As Metadata Browse Service Provider • As Recommend & Rate Service Provider • As Annotation Search Service Provider • [Figure labels: IRDB-1 Search Engine; DBBrowse Browse Engine; Recommend; IRDB-2 Search Engine; What’s New Engine; Rate Engine; DBUnion Archive Merger Component; Annotation Engine; Filter; Submit Archive; Harvest from data providers; XMLFile Coll. & Data Provider 1, 2, 3; OAI-PMH Data Provider; OAIB (NCSA: from RDBMS)]
CITIDEL Technology Features • Component architecture (Open Digital Library) • Re-use and compose re-deployable digital library components. • Built Using Open Standards & Technologies • OAI: Used to collect DL resources and to support DL interoperability • XSL and XML: Interface rendering with multilingual, community-based translation of screens and content (Spanish, …) • Perl: Component Integration • ESSEX: Search Engine Functionality • Very fast, utilizing in-memory processing • Includes snapshots for persistence • Multi-scheming • Integrates multiple classifications / views through maps, closure
OCKHAM Initiative, Contact Info • Supported by DL Federation, Mellon, NSF, … • P2P University Network involving: • Emory, Notre Dame, U. Arizona, Virginia Tech, … • PI: Martin Halbert Phone 404-727-2204 Email: mhalber@emory.edu • OCKHAM URL: http://ockham.library.emory.edu
The Problem • Digital library development is complex and expensive. • Various DL development communities (in the USA at least) are not working together well. • Results exhibit much incompatibility, little common practice, slow progress, and no leverage on investment. • If this continues, we are just going to languish and fester.
Lightweight Protocols • “Lightweight”, or relatively small and simple, protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive. • The success of protocols considered lightweight is illuminating. • Examples: TCP/IP, HTTP, LDAP, and the OAI PMH
Reference Models • Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration • Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement • Explored in CS6604 class project with 2 focus groups: librarians, education experts
Current Focus: Peer-to-Peer (P2P) Lightweight (Protocol) Reference Models • Builds on successful example of the OAI PMH, clearly understood minimalist concept of metadata distribution, implemented in simple protocols (e.g., ODL) • Leads to developing simple reference models of specific subsystems, with associated simple protocols and standards • Testing in NSDL, connecting university libraries to support teaching & learning
OCKHAM Proposed Services • Alerting • Browsing • Cataloging • Conversion • OAI – Z39.50 • Pathfinding • Registry – prototype in CS6604 now • (plus others such as from adapted ODL)
DL Student Research: Gonçalves • 5S as a basis for developing digital libraries • Theory • Syntax, Semantics; Definitions, Relationships • Specification of requirements • Generation of systems • Quality
Motivation for 5S • DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc. • DL construction: difficult, ad-hoc, lacking support for tailoring/customization • Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. • Lack of specific DL models, formalisms, languages
5S Layers Societies Scenarios Spaces Structures Streams
Intra-Model Relationships: Streams • Participant concepts: {text, image, video, audio} • Relations: • contains: video → image, video → audio • Streams define the basic content types over which digital objects are built; digital objects are the ultimate carriers of the information in the DL. • However, some complex types of streams (e.g., video) may themselves be associated with simpler types of streams (e.g., images, audio). • This relation indicates that a video contains an image as one of its frames, or a specific audio recording.
DL Services/Activities Taxonomy (Gonçalves) • Infrastructure Services, Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting • Infrastructure Services, Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format) • Infrastructure Services, Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language) • Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
Services, Definitions, Parameters • In the table each service is characterized by • parameters (input, output) • of the initial and final events • of the scenarios that compose those services and • respective pre- and post-conditions which are represented in terms of rules on DL relations. • All other previous definitions and keys apply here. • That set is complemented with the following definitions:
Services Related Definitions • A query q is the representation of a user interest or information need. • Hyptxt is a hypertext, in which an anchor is a node. • A log_entry is a descriptive metadata specification about an event of a scenario. • Let {doi} = {doi1, doi2, …, doin} be a set of digital objects and Ct = {c1, c2, …, cn} be a set of labels for categories. A classifier classCt: {doi} → 2^Ct is a function that maps a digital object to a set of categories. • A cluster cluk = {do1k, do2k, …, donk} is a subset of a set of digital objects.
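The classifier and cluster definitions can be made concrete with a small sketch. The keyword-matching rule, the category labels, and the sample objects below are all invented for illustration. The only property carried over from the definitions is the shape: a classifier returns a set of categories (an element of the power set of Ct), and a cluster is a subset of the digital objects.

```python
def make_keyword_classifier(keywords_by_category):
    """Build a classifier that maps a digital object to a set of categories."""
    def classify(digital_object):
        text = digital_object["text"].lower()
        # A category applies if any of its keywords occurs in the text.
        return {
            category
            for category, keywords in keywords_by_category.items()
            if any(keyword in text for keyword in keywords)
        }
    return classify


# Hypothetical category labels Ct and their trigger keywords.
classify = make_keyword_classifier({
    "interoperability": ["oai", "harvest"],
    "multimedia": ["video", "image"],
    "metadata": ["metadata"],
})

digital_objects = [
    {"id": "do1", "text": "Harvesting metadata with OAI"},
    {"id": "do2", "text": "Video streams and image frames"},
    {"id": "do3", "text": "Unrelated"},
]

labels = {do["id"]: classify(do) for do in digital_objects}

# A cluster is a subset of the objects; here, those labeled "metadata".
metadata_cluster = [do["id"] for do in digital_objects if "metadata" in classify(do)]
print(labels, metadata_cluster)
```

An object that matches no keyword maps to the empty set, which is still a valid member of the power set.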
DL Services I/O Behavior • Regarding the prior figure, which shows: • Instantiations of the “Services Definition” model • Inputs and outputs of examples of infrastructure and information satisfaction DL services • Key: • CDL = Collection • ICDL = index for collection CDL • {doi} = digital object • Soc = Society
Defining Quality in Digital Libraries • DL Concept: Dimensions of Quality • Digital object: Accessibility, Pertinence (*), Preservability (*), Relevance, Similarity, Significance, Timeliness (*) • Metadata specification: Accuracy, Completeness, Conformance • Collection: Completeness, Impact Factor • Catalog: Completeness, Consistency • Repository: Completeness, Consistency • Structures for Navigation: Navigability (*) • Services: Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability
Completeness of Metadata (1) • Degree of completeness of a metadata specification msx • Completeness(msx) = 1 - (no. of missing attributes in msx/ total attributes of the schema to which msx conforms) • According to 5S definition of conformance
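The completeness formula can be worked through in a few lines. This sketch assumes the schema is the 15-element unqualified Dublin Core set and treats an attribute as missing when it is absent or empty. Both choices are assumptions made for the example, and the sample record is invented.

```python
# The 15 elements of unqualified Dublin Core, used here as the schema.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def completeness(metadata_spec, schema_attributes):
    """Return 1 - (missing attributes / total attributes in the schema)."""
    # Assumption: an absent or empty value counts as missing.
    missing = [a for a in schema_attributes if not metadata_spec.get(a)]
    return 1 - len(missing) / len(schema_attributes)


# Invented record: four filled attributes and one empty one.
record = {
    "title": "A Sample Thesis",
    "creator": "Doe, Jane",
    "date": "2002-05-01",
    "identifier": "oai:example.org:etd-1",
    "subject": "",
}

score = completeness(record, DC_ELEMENTS)
print(round(score, 3))
```

With 11 of the 15 attributes missing, the score is 4/15. A record that fills every attribute scores 1.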