Context

flora

Presentation Transcript


    Slide 2: Context. Fact: the LANL Research Library stores a significant scholarly collection locally (A&I databases, journal articles, …) and builds applications on top of that collection. Initial aDORe motivation: undo the tight integration between data and application, and adopt a uniform approach for ingesting, storing, and disseminating LANL RL data collections. Bigger picture: allow multiple, parallel applications on top of the stored content, and create an environment that provides guarantees regarding the long-term accessibility of stored content.

    Slide 3: aDORe characteristics. Standards-based: MPEG-21 Digital Item Declaration, MPEG-21 Digital Item Identification, URI, info URI, OAI-PMH, NISO OpenURL, SRU, Information Environment Service Registry, Internet Archive ARC file format, OAIS concepts, XML, XML Schema, XQuery. Component-based, highly modular: multiple content repositories, Identifier Locator, Service Registry, Format Registry, Semantic Registry, harvesting front-end, dissemination front-end. Protocol-based: components expose (REST-based) Web services; all “read” services are based on 4 standards (OAI-PMH, NISO OpenURL, SRU, XQuery); interaction between modules is protocol-driven.
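The slide says that all “read” access goes through a handful of standard protocols. The request URLs a client would send make this concrete. The Python sketch below assumes a hypothetical front-end at BASE_URL and a hypothetical "didl" metadataPrefix. The parameter names come from the OAI-PMH 2.0 and SRU 1.1 specifications (verb, identifier, metadataPrefix, operation, version, query, maximumRecords); they are not taken from the aDORe slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical front-end base URL; the slides do not give one.
BASE_URL = "http://adore.example.org"


def oai_get_record(identifier: str, metadata_prefix: str = "didl") -> str:
    """OAI-PMH 2.0 GetRecord request: harvest one XML package by identifier."""
    # "didl" as metadataPrefix is an assumption, not taken from the slides.
    params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    return f"{BASE_URL}/oai?{urlencode(params)}"


def sru_search_retrieve(cql_query: str, maximum_records: int = 10) -> str:
    """SRU 1.1 searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{BASE_URL}/sru?{urlencode(params)}"


if __name__ == "__main__":
    url = sru_search_retrieve('dc.title = "aDORe"')
    print(url)
    print(parse_qs(urlsplit(url).query))  # the decoded request parameters
```

Because both requests are plain HTTP GETs with standardized parameters, any off-the-shelf OAI-PMH harvester or SRU client could issue them. This is the flexibility the protocol-based design aims for.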

    Slide 4: aDORe characteristics. Scalable. Etc.

    Slide 5: aDORe effort. aDORe is 2 things: a standards-based repository federation architecture, and an actual implementation of that architecture at LANL for local storage of digital assets. The prototype version was in production for 2 years; the production version was finalized in June 2007.

    Slide 6: aDORe overview. Representing Digital Objects: MPEG-21 DID & DIDL to represent Digital Objects as XML packages. Identification of Digital Objects, datastreams, and XML Packages. Storing Digital Objects: autonomous distributed repositories with OAI-PMH and OpenURL-based service interfaces. Locating Digital Objects, datastreams, and XML Packages: Identifier Locator. Registries: Service Registry (locating the service interfaces of the autonomous distributed repositories); Format Registry (sharing media type identifiers across the autonomous distributed repositories); Semantic Registry (sharing intellectual content type identifiers across the autonomous distributed repositories). Providing federated access to the autonomous distributed repositories: OAI-PMH Federator (harvesting XML packages); OpenURL Resolver (requesting services pertaining to Digital Objects, datastreams, and XML Packages).
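Here is a toy Python sketch of the Format Registry's role. It hands out one shared identifier per media type, so that independent repositories all refer to, say, "application/pdf" by the same token. The class name and the "info:example/fmt/" prefix are hypothetical. The slides do not describe the real registry's identifier scheme or interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FormatRegistry:
    """Toy Format Registry: one shared identifier per media type.

    The "info:example/fmt/" prefix is hypothetical; the slides do not give the real scheme.
    """

    prefix: str = "info:example/fmt/"
    _by_type: dict[str, str] = field(default_factory=dict, init=False, repr=False)
    _by_id: dict[str, str] = field(default_factory=dict, init=False, repr=False)

    def register(self, media_type: str) -> str:
        """Return the shared identifier for a media type, assigning one on first use."""
        key = media_type.strip().lower()  # type/subtype names are case-insensitive
        if key not in self._by_type:
            identifier = f"{self.prefix}{len(self._by_type) + 1}"
            self._by_type[key] = identifier
            self._by_id[identifier] = key
        return self._by_type[key]

    def lookup(self, identifier: str) -> str | None:
        """Reverse lookup for a repository that only stores the shared identifier."""
        return self._by_id.get(identifier)
```

A Semantic Registry could follow the same pattern, keyed on intellectual content types instead of media types.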

    Slide 7: Representing Digital Objects

    Slide 8: Sample Digital Object. Create an XML-based surrogate for each Digital Object: it glues all components together in a single XML Package; it contains all required metadata (descriptive, technical, identifiers, …) in the XML Package; and the initial access format for all materials is the same (XML), irrespective of their native media type. Assign identifiers to the XML Package, the Digital Object, and the datastreams. Maintain the original identifiers.
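The identification step assigns new identifiers to the XML Package, the Digital Object, and each datastream, while keeping the original identifiers. It might look roughly like the Python sketch below. The sketch uses urn:uuid identifiers purely as a stand-in, since the slides do not specify the scheme aDORe uses. The class and method names are hypothetical.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


def mint() -> str:
    """Stand-in identifier minting: urn:uuid, not the scheme aDORe actually uses."""
    return uuid.uuid4().urn


@dataclass
class Datastream:
    original_id: str | None  # identifier carried over from the source collection, if any
    media_type: str
    identifier: str = field(default_factory=mint)  # newly assigned identifier


@dataclass
class DigitalObject:
    original_id: str | None
    datastreams: list[Datastream]
    identifier: str = field(default_factory=mint)


@dataclass
class XMLPackage:
    """Hypothetical surrogate: one XML Package per Digital Object."""

    digital_object: DigitalObject
    identifier: str = field(default_factory=mint)

    def new_identifiers(self) -> list[str]:
        """Identifiers assigned at ingest: package, object, then each datastream."""
        obj = self.digital_object
        return [self.identifier, obj.identifier] + [ds.identifier for ds in obj.datastreams]

    def original_identifiers(self) -> list[str]:
        """Original identifiers that are maintained alongside the new ones."""
        obj = self.digital_object
        originals = [obj.original_id] + [ds.original_id for ds in obj.datastreams]
        return [o for o in originals if o is not None]
```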

    Slide 9: Representing Digital Objects using MPEG-21 DID & DIDL. An XML Package is available for every Digital Object. The Package is an XML document compliant with the MPEG-21 Digital Item Declaration Language, i.e. a DIDL document. The DIDL document typically contains, by value, the descriptive metadata datastream and ingest/repository-related metadata, and, by reference, all constituent datastreams of the Digital Object. DIDL documents can be created statically, at ingestion time (cf. aDORe Archive), or dynamically, via an add-on capability to an existing content management system (cf. Ghent University eRez add-ons). A new DIDL document is created when a new version of a previously ingested Digital Object is ingested (an update is treated as re-ingestion).
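The Python sketch below roughly illustrates such a DIDL document. It builds one Item that embeds its descriptive metadata by value and includes its datastreams by reference through the Resource element's ref attribute. The namespace URIs are assumed to be those of the first-edition MPEG-21 DIDL and DII schemas. The build_didl function, its arguments, and the placement of descriptive metadata in its own Component are assumptions. They are not the actual aDORe package layout.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (first-edition MPEG-21 DIDL and DII schemas).
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
DII_NS = "urn:mpeg:mpeg21:2002:01-DII-NS"
ET.register_namespace("didl", DIDL_NS)
ET.register_namespace("dii", DII_NS)


def _tag(ns: str, name: str) -> str:
    """Clark notation {namespace}name, as ElementTree expects."""
    return f"{{{ns}}}{name}"


def _add_identifier(parent: ET.Element, identifier: str) -> None:
    """Attach a Descriptor/Statement carrying a dii:Identifier."""
    descriptor = ET.SubElement(parent, _tag(DIDL_NS, "Descriptor"))
    statement = ET.SubElement(descriptor, _tag(DIDL_NS, "Statement"), mimeType="text/xml")
    ET.SubElement(statement, _tag(DII_NS, "Identifier")).text = identifier


def build_didl(object_id: str, descriptive_xml: str, datastreams) -> str:
    """Hypothetical builder for one DIDL document (one Digital Object).

    datastreams: iterable of (identifier, media_type, url) tuples.
    """
    didl = ET.Element(_tag(DIDL_NS, "DIDL"))
    item = ET.SubElement(didl, _tag(DIDL_NS, "Item"))
    _add_identifier(item, object_id)

    # By-value: descriptive metadata is embedded inside the package.
    metadata = ET.SubElement(item, _tag(DIDL_NS, "Component"))
    inline = ET.SubElement(metadata, _tag(DIDL_NS, "Resource"), mimeType="text/xml")
    inline.append(ET.fromstring(descriptive_xml))

    # By-reference: each datastream is pointed to, not copied.
    for ds_id, media_type, url in datastreams:
        component = ET.SubElement(item, _tag(DIDL_NS, "Component"))
        _add_identifier(component, ds_id)
        ET.SubElement(component, _tag(DIDL_NS, "Resource"), mimeType=media_type, ref=url)

    return ET.tostring(didl, encoding="unicode")
```

Keeping datastreams by reference keeps the package small and uniform (always XML), whatever the native media types of the content.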

    Slide 10: Sample Digital Object

    Slide 11: Representing Digital Objects using MPEG-21 DID

    Slide 12: Identification: digital objects, datastreams, DIDL documents

    Slide 13: aDORe DIDLTools. The aDORe DIDLTools software is available from http://african.lanl.gov/aDORe/projects/DIDLTools/

    Slide 14: The aDORe architecture

    Slide 15: The aDORe architecture: 3 layers. Layer 1, the aDORe repositories: networked systems that host digital object content and make that content accessible by exposing core service interfaces. In the LANL implementation, these are XMLtapes and ARCfiles (aDORe Archive). Other Content Management Systems can be turned into an aDORe repository by implementing the core service interfaces. Layer 2, the aDORe federation components: networked systems that help present the aDORe repositories as a single logical repository; these federation components expose core service interfaces that give access to their own content. The federation components are the Identifier Locator, Service Registry, Format Registry, and Semantic Registry. Layer 3, the aDORe front-ends: networked systems that make the digital object content hosted in the many physical aDORe repositories accessible by exposing core service interfaces that present those repositories as a single logical repository. The aDORe front-ends are the OAI-PMH Federator and the OpenURL Resolver.
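The layering can be sketched with a single interface that every layer exposes. In this minimal Python sketch, all names are hypothetical and the "core service interface" is reduced to two calls. An in-memory store becomes a Layer 1 repository simply by implementing the interface. A federation object routes requests through an identifier-to-repository map and exposes the very same interface, so clients see one logical repository.

```python
from __future__ import annotations

from typing import Protocol


class CoreServiceInterface(Protocol):
    """Two-call stand-in for the core service interfaces (hypothetical)."""

    def list_identifiers(self) -> list[str]: ...

    def get_package(self, identifier: str) -> str | None: ...


class InMemoryRepository:
    """Layer 1: any store becomes a repository by implementing the interface."""

    def __init__(self, packages: dict[str, str]) -> None:
        self._packages = dict(packages)

    def list_identifiers(self) -> list[str]:
        return sorted(self._packages)

    def get_package(self, identifier: str) -> str | None:
        return self._packages.get(identifier)


class Federation:
    """Layers 2 and 3 collapsed: an identifier map plus a front-end with the same interface."""

    def __init__(self, repositories: dict[str, CoreServiceInterface]) -> None:
        self._repositories = repositories
        # Layer 2 role: identifier -> repository id. If an identifier occurs in
        # several repositories, the last one loaded wins in this simplified sketch.
        self._locator = {
            identifier: repo_id
            for repo_id, repo in repositories.items()
            for identifier in repo.list_identifiers()
        }

    def list_identifiers(self) -> list[str]:
        return sorted(self._locator)

    def get_package(self, identifier: str) -> str | None:
        repo_id = self._locator.get(identifier)
        if repo_id is None:
            return None
        return self._repositories[repo_id].get_package(identifier)
```

Because Federation satisfies the same Protocol as a repository, the sketch shows why swapping the technology behind an interface does not affect clients. The actual aDORe components communicate over OAI-PMH and OpenURL rather than through in-process calls.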

    Slide 17: The aDORe architecture

    Slide 19: aDORe repositories. Networked systems that host digital object content and have core service interfaces that facilitate access to that content. There are currently 2 types in the LANL implementation: XMLtapes, which concatenate XML Packages, and ARCfiles, which concatenate datastreams. Core service interfaces: a combination of OAI-PMH and OpenURL-based interfaces, and a generic XMLtape XQuery Resolver. Other Content Management Systems can be turned into an aDORe repository by implementing the core service interfaces (cf. Aleph; cf. Ghent University eRez).
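An XMLtape concatenates many XML Packages into one well-formed XML file and keeps a separate index. The Python sketch below is a toy built on several assumptions. The <tape> wrapper element and the (offset, length) index are hypothetical and do not reproduce the actual XMLtape schema. Packages are assumed to be UTF-8 XML fragments without their own XML declaration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical wrapper; the real XMLtape schema is not reproduced here.
TAPE_HEADER = b'<?xml version="1.0" encoding="UTF-8"?>\n<tape>\n'
TAPE_FOOTER = b"</tape>\n"


def write_tape(out, packages):
    """Concatenate XML Packages into one XML file and return a separate index.

    packages: iterable of (identifier, xml_bytes); each xml_bytes is a UTF-8
    XML fragment without an XML declaration.
    Returns {identifier: (byte_offset, byte_length)}.
    """
    index = {}
    out.write(TAPE_HEADER)
    for identifier, xml_bytes in packages:
        index[identifier] = (out.tell(), len(xml_bytes))
        out.write(xml_bytes)
        out.write(b"\n")
    out.write(TAPE_FOOTER)
    return index


def read_package(tape, index, identifier):
    """Random access: seek to the recorded offset instead of parsing the whole tape."""
    offset, length = index[identifier]
    tape.seek(offset)
    return tape.read(length)


def count_packages(tape_bytes):
    """The tape is ordinary XML, so a stock parser can read or validate it."""
    return len(ET.fromstring(tape_bytes))


if __name__ == "__main__":
    buf = io.BytesIO()
    idx = write_tape(buf, [("urn:uuid:a", b"<pkg id='a'/>"), ("urn:uuid:b", b"<pkg id='b'/>")])
    print(read_package(buf, idx, "urn:uuid:b"), count_packages(buf.getvalue()))
```

The index lives outside the tape. It can therefore be discarded and rebuilt by rescanning the file without touching the stored packages.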

    Slide 20: aDORe Archive: XMLtapes

    Slide 21: aDORe Archive: XMLtape XQuery Resolver

    Slide 22: aDORe Archive: ARCfiles

    Slide 23: The aDORe architecture

    Slide 25: Identifier Locator. Stores all identifiers of the aDORe repositories (DIDLDocumentIdentifier, digital object identifier, datastream identifier). It is loaded by retrieving identifiers from the aDORe repositories through their “give me your identifiers” OpenURL service interface, and it stores [identifier, repository identifier] pairs. One OpenURL-based service interface gives access to the Identifier Locator.
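Here is a minimal Python sketch of the Identifier Locator's bookkeeping. It stores [identifier, repository identifier] pairs loaded from each repository's identifier list and answers lookups keyed on an OpenURL rft_id. The class and method names are hypothetical. The OpenURL handling is reduced to the Z39.88-2004 KEV keys url_ver and rft_id, because the slide does not describe the real service interface.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import parse_qs


class IdentifierLocator:
    """Hypothetical sketch: maps every identifier to the repositories that hold it."""

    def __init__(self) -> None:
        self._pairs: defaultdict[str, set[str]] = defaultdict(set)

    def load(self, repository_id: str, identifiers) -> None:
        """Record [identifier, repository identifier] pairs harvested from one repository."""
        for identifier in identifiers:
            self._pairs[identifier].add(repository_id)

    def locate(self, identifier: str) -> list[str]:
        """Repository identifiers holding the identifier (empty if unknown)."""
        return sorted(self._pairs.get(identifier, ()))

    def handle_openurl(self, query_string: str) -> list[str]:
        """Answer a KEV-encoded OpenURL request by looking up its rft_id."""
        params = parse_qs(query_string)
        if params.get("url_ver") != ["Z39.88-2004"]:
            raise ValueError("expected url_ver=Z39.88-2004")
        rft_ids = params.get("rft_id")
        if not rft_ids:
            raise ValueError("missing rft_id")
        return self.locate(rft_ids[0])
```

Storing a set per identifier allows the same identifier to be reported by more than one repository, for example when content is mirrored.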

    Slide 27: Service Registry

    Slide 28: Registries: Service Registry

    Slide 32: Registries: Format Registry

    Slide 33: Registries: Semantic Registry

    Slide 34: The aDORe architecture

    Slide 36: Expose aDORe repositories as a single repository

    Slide 37: OAI-PMH Federator

    Slide 38: OpenURL Resolver

    Slide 40: OpenURL Resolver (a bit more)

    Slide 42: LANL aDORe implementation

    Slide 43: LANL aDORe software. Largely based on off-the-shelf software components: Berkeley DB Java Edition, Heritrix toolkit, MySQL db, OCLC OAICat, OCLC OpenURL software, Ockam IESR service registry. There are plans to “one way or another” make the entire LANL aDORe solution (revised Layer 1, Layer 2, Layer 3) available; the aDORe Archive software (Layer 1: XMLtape & ARCfiles) is already available from http://african.lanl.gov/aDORe/projects/adoreArchive/

    Slide 44: LANL aDORe @ 2 Sep 2007

    Slide 45: LANL aDORe hardware

    Slide 46: LANL aDORe performance

    Slide 47: aDORe ingestion: overview

    Slide 48: Conclusion. aDORe Archive: the file-based approach (XMLtape/ARCfile) is inherently simple and reduces dependency on database systems. The XMLtape approach is inspired by the ARC file format but provides several additional attractive features: off-the-shelf XML tools can be used to parse/validate an XMLtape, and all Digital Object metadata can be stored in the XML Package. Because the indexes are autonomous, the files can be retained over time while the indexes are re-created using other techniques as technologies evolve; all indexes can be thrown out and rebuilt from scratch. Data integrity: each XML Package contains a SHA1 digest for every datastream of the Digital Object it represents, and the SHA1 digest of each XMLtape and ARCfile is stored in the XMLtape Registry and the ARCfile Registry, respectively.
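The data-integrity check described above amounts to recomputing SHA-1 digests and comparing them with the recorded ones. The Python sketch below assumes that a plain dict stands in for the XMLtape/ARCfile Registry, mapping each file name to its hex digest. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import io


def sha1_hex(stream, chunk_size: int = 1 << 16) -> str:
    """SHA-1 of a binary stream, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_bytes(data: bytes) -> str:
    """Convenience wrapper for in-memory content."""
    return sha1_hex(io.BytesIO(data))


def find_corrupted(registry: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """Names whose recomputed SHA-1 differs from the digest recorded in the registry.

    registry: {file name: expected hex digest}  (stand-in for XMLtape/ARCfile Registry)
    files:    {file name: file contents as bytes}
    Files listed in the registry but missing from `files` are reported too.
    """
    bad = []
    for name, expected in sorted(registry.items()):
        data = files.get(name)
        if data is None or sha1_of_bytes(data) != expected.lower():
            bad.append(name)
    return bad
```

The same function applies unchanged to per-datastream digests recorded inside an XML Package; only the source of the expected values differs.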

    Slide 49: Conclusion. aDORe: the protocol-based nature of access increases flexibility in the face of evolving technologies by introducing a layer of abstraction; any technology can be thrown out and the same protocol interface re-implemented using another technology. The protocol-based nature of the solution allows a fully distributed implementation. The component-based nature yields scalability. The standards-based design allows the use of off-the-shelf tools. A standards-based approach typically allows for a less painful migration (to a new standard). All kinds of Content Management Systems can be aDORe-ized.
