Alternative Architecture for Information in Digital Libraries Onno W. Purbo Onno@indo.net.id
Reference • http://www.dlib.org/dlib/february97/cnri/02arms1.html • William Y. Arms, Christophe Blanchi, Edward A. Overly, “An Architecture for Information in Digital Libraries,” Corporation for National Research Initiatives Reston, Virginia, February 1997.
The Structure of Information • Digital data is held in the digital library. • Digital objects • Metadata • Unique identifier (handle). • A group of digital objects forms a set of digital objects. • Different types of material form categories.
Work Flow Example • Search • Z39.50 returns a list of digital objects identified by handle. • Select • Retrieval • Repository Access Protocol (RAP) • Display
Structure of Information in Digital Libraries • Relationships (chapter, index) • Formats (SGML, HTML) • Versions • Rights & Permissions • Computer Systems & Networks (dial-up vs. broadband).
Basic Principles • User and application programs must be flexible. • Collections must be straightforward to manage. • The information architecture must reflect the economic, social & legal frameworks.
Data type, structural metadata • Data type: technical properties of the data, such as format and processing. • Structural metadata: type, version, and relationships of digital material. • Meta-object: a reference to a set of digital objects.
Guidelines for all categories • All data is given an explicit data type • All metadata is encoded explicitly • Handles are given to individual items of intellectual property • Meta-objects are used to aggregate digital objects • Handles are used to identify items listed in meta-objects
An Example of the Use of Meta-objects • Scanned photographs • Digital objects for a scanned photograph • Digital objects for individual versions • Meta-object • Handles for scanned photographs • Depositing a scanned photograph
Digital objects for a scanned photograph • Low resolution “thumbnail” • High resolution “reference” image
Digital objects for individual versions • Key metadata. • Used to manage the object in a networked environment. It includes the handle and the rights and permissions associated with the digital object. • Structural metadata. • Includes fields for description, owner, handle of meta-object, data size, data type (e.g., "jpg"), version number, date deposited, use (e.g., "thumbnail"), and the date of last revision. • Image data. • The scanned image itself.
Meta-object • Key metadata. • Includes the handle and the rights and permissions associated with the digital object. • Structural metadata. • Includes a description, the owner, the number of versions, the date deposited, the use ("meta-object"), and the date of last revision. • Data about each version. • For each of the three scanned versions (e.g., the thumbnail), there is a package of information that includes the handle of the version and the relationship among the versions.
Handles for scanned photographs • Control identifier: 3a16116r.jpg • Replace the control identifiers with handles, which provide a unique, persistent, location-independent name for each item: loc.ndlp.amrlp/3a16116 • Terminology to describe handles: • "loc.ndlp.amrlp" is the naming authority • "3a16116" is a locally unique string • For convenience in processing, the versions use sequence numbers • loc.ndlp.amrlp/3a16116.1 • loc.ndlp.amrlp/3a16116.2
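The handle terminology above can be illustrated with a minimal Python sketch. The `Handle` class and the `parse_handle` and `version_handle` helpers are hypothetical names introduced here for illustration; they are not part of the handle system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Hypothetical value type: a naming authority plus a locally unique string."""
    naming_authority: str  # e.g. "loc.ndlp.amrlp"
    local_name: str        # e.g. "3a16116"

    def __str__(self) -> str:
        return f"{self.naming_authority}/{self.local_name}"


def parse_handle(text: str) -> Handle:
    """Split a handle at the first '/' into naming authority and local string."""
    authority, sep, local = text.partition("/")
    if not sep or not authority or not local:
        raise ValueError(f"not a handle: {text!r}")
    return Handle(authority, local)


def version_handle(base: Handle, sequence: int) -> Handle:
    """Derive the handle of one version by appending a sequence number."""
    return Handle(base.naming_authority, f"{base.local_name}.{sequence}")


meta = parse_handle("loc.ndlp.amrlp/3a16116")
versions = [str(version_handle(meta, n)) for n in (1, 2)]
print(meta.naming_authority, meta.local_name, versions)
```

Keeping the naming authority and the local string as separate fields makes it easy to derive the version handles without string surgery at every call site.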
Depositing a scanned photograph • Human • Machine
Depositing a scanned photograph - human • Selection of the material that will be made into each digital object. • Specification of the metadata for those fields that require judgment.
Depositing a scanned photograph - machine • Creation of the meta-object and the links to other digital objects. • Depositing the digital objects in the repository. • Registering the handles in the handle system.
Access to a scanned photograph • Bibliographic entries in search systems refer to the scanned photograph by the handle of the meta-object. • If a user requests a summary of the photograph, the "thumbnail" image is provided. • If the user requests access to the photograph without specifying which version, the "access" image is provided.
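The access rules above amount to a small lookup from the kind of request to a version. The sketch below assumes the three versions are tagged with the uses "thumbnail", "access" and "reference" (the slides name all three uses but never list them together), and the mapping of sequence numbers to uses is invented for illustration. `select_version` is a hypothetical helper.

```python
from typing import Dict, Optional


def select_version(versions: Dict[str, str], request: Optional[str] = None) -> str:
    """Pick the handle of the version to deliver for a request.

    versions maps a use (taken from the structural metadata) to a version handle.
    """
    if request == "summary":
        use = "thumbnail"   # a summary request is served by the thumbnail
    elif request is None:
        use = "access"      # no version specified: serve the "access" image
    else:
        use = request       # an explicit use was requested
    return versions[use]


# Assumed assignment of sequence numbers to uses, for illustration only.
photo_versions = {
    "thumbnail": "loc.ndlp.amrlp/3a16116.1",
    "access": "loc.ndlp.amrlp/3a16116.2",
    "reference": "loc.ndlp.amrlp/3a16116.3",
}

print(select_version(photo_versions, "summary"))
print(select_version(photo_versions))
```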
Digital Object • Key-metadata • The key-metadata is the information stored in the digital object that is needed to manage the digital object in a networked environment -- for example to store, replicate, or transmit the object without providing access to the content. This includes terms and conditions, and the handle. • Digital material • The digital material (or data) comprises a set of sequences of bits.
Digital Objects Internal Structure • An element is a bit sequence comprising an elementary unit of information. An element has its own ID. • A package is a collection of elements and other packages, with its own ID. • A digital object is a package with key-metadata for use in a networked environment. The ID is a handle.
Data Element • Data element • A data element is any bit-sequence. • Element ID • The element ID is the internal identifier of the element within the digital object. Unlike a handle, which is unique and known publicly, the element ID is of local importance only. • Attributes • Attributes are the information that is needed to process the element. They include: a role, which defines the function of the element (such as "DTD" in the SGML world), and a type, which includes technical information (such as "jpeg").
Packages • Packages are used to group or associate elements and other packages. • A package has a package ID. • If the package is a digital object, the package ID is a handle. Otherwise, it is the internal identifier of the package within the digital object. Unlike a handle, which is unique and known publicly, such a package ID is of local importance only. The content of a package consists of elements and other packages.
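The element, package and digital object definitions can be read as a nested data structure. The following is one possible Python modelling, not the structure used by any actual repository implementation. All class and field names are assumptions, as are the role "image" and the terms string in the usage lines.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union


@dataclass
class Element:
    element_id: str  # internal ID, of local importance only
    role: str        # function of the element, e.g. "DTD"
    type: str        # technical information, e.g. "jpeg"
    data: bytes      # any bit sequence


@dataclass
class Package:
    package_id: str  # a handle if the package is a digital object, else a local ID
    contents: List[Union[Element, "Package"]] = field(default_factory=list)

    def elements(self) -> Iterator[Element]:
        """Yield every element in this package and in nested packages."""
        for item in self.contents:
            if isinstance(item, Package):
                yield from item.elements()
            else:
                yield item


@dataclass
class KeyMetadata:
    handle: str
    terms_and_conditions: str


@dataclass
class DigitalObject(Package):
    """A package with key-metadata; its package ID is the handle."""
    key_metadata: Optional[KeyMetadata] = None

    def __post_init__(self) -> None:
        if self.key_metadata is None:
            raise ValueError("a digital object needs key-metadata")
        if self.key_metadata.handle != self.package_id:
            raise ValueError("the package ID of a digital object must be its handle")


thumb = Element("e1", role="image", type="jpeg", data=b"\xff\xd8")
obj = DigitalObject(
    package_id="loc.ndlp.amrlp/3a16116.1",
    contents=[Package("p1", [thumb])],
    key_metadata=KeyMetadata("loc.ndlp.amrlp/3a16116.1", "example terms"),
)
print([e.element_id for e in obj.elements()])
```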
Handle & Handle System • The digital library is assembled from a great variety of components. They include people, computers, networks, repositories, databases, search systems, Web servers, digital objects, elements of objects, bibliographic records, and many more. Keeping track of these components requires a systematic approach to identification. • http://www.handle.net
Handle System • To resolve a handle is to present a handle to the handle system and receive as a reply information about the item identified. • The handle system is a distributed computer system, with many computers distributed across the world. CNRI manages a global handle registry and there are local handle services operated by other organizations, e.g. http://www.handle.net/
Naming Authority • Handles are created by naming authorities, administrative units that are authorized to create and edit handles.
Structure of a Repository • A repository is a system for network-based storage of and access to digital objects. • All interaction with the repository uses a simple protocol, known as the Repository Access Protocol (RAP). RAP has a small number of fundamental operations, such as "deposit", which stores a digital object in the repository, and "access", which provides access to a digital object. • Thus RAP provides a clearly defined, open interface for the repository that allows others to write clients and higher-level interfaces.
Structure of Repository • Repository shell • The repository shell is the part of the repository that interfaces with the outside world. It implements RAP. • Persistent store • Information in the repository is held in the persistent store. The persistent store is completely hidden from the outside. • Object management layer • The object management layer provides an interface between the services provided by the persistent store and the object-oriented functions required by the repository shell.
The Repository Access Protocol (RAP) • VerifyHandle. Confirm that a handle has been registered in the handle system. • AccessRepoMeta. Access the repository metadata. • Verify_DO. Confirm that a repository stores a digital object with a specified handle. • AccessMeta. Access the metadata for a specified digital object. • Access_DO. Access the digital object. • Deposit_DO. Deposit a digital object in a repository. • Delete_DO. Deletes a digital object from a repository. • MutateMeta. Edit the metadata for a digital object. • Mutate_DO. Edit a digital object.
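To make the operation list concrete, here is a toy in-memory repository with one method per RAP operation that concerns stored objects. The slides give only the operation names, so the method signatures, the return values and the dictionary used as a persistent store are all assumptions. VerifyHandle is omitted because it talks to the handle system rather than to the repository's store.

```python
class InMemoryRepository:
    """Toy stand-in for a repository; not an implementation of the real protocol."""

    def __init__(self, repo_handle: str) -> None:
        self._repo_meta = {"handle": repo_handle}
        # Hidden persistent store: handle -> {"meta": dict, "data": bytes}
        self._store = {}

    def access_repo_meta(self) -> dict:            # AccessRepoMeta
        return dict(self._repo_meta)

    def verify_do(self, handle: str) -> bool:      # Verify_DO
        return handle in self._store

    def deposit_do(self, handle: str, meta: dict, data: bytes) -> None:  # Deposit_DO
        if handle in self._store:
            raise KeyError(f"already deposited: {handle}")
        self._store[handle] = {"meta": dict(meta), "data": data}

    def access_meta(self, handle: str) -> dict:    # AccessMeta
        return dict(self._store[handle]["meta"])

    def access_do(self, handle: str) -> bytes:     # Access_DO
        return self._store[handle]["data"]

    def mutate_meta(self, handle: str, **changes) -> None:  # MutateMeta
        self._store[handle]["meta"].update(changes)

    def mutate_do(self, handle: str, data: bytes) -> None:  # Mutate_DO
        self._store[handle]["data"] = data

    def delete_do(self, handle: str) -> None:      # Delete_DO
        del self._store[handle]


repo = InMemoryRepository("loc/repos1")
repo.deposit_do("loc.ndlp/1234", {"use": "thumbnail"}, b"image-bytes")
repo.mutate_meta("loc.ndlp/1234", version=2)
print(repo.verify_do("loc.ndlp/1234"), repo.access_meta("loc.ndlp/1234"))
```

Clients only ever call the public methods, which mirrors the point that the persistent store stays hidden behind the repository's interface.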
Example RAP Work Flow • The handle "loc.ndlp/1234" is sent to the handle system. It resolves to data type "handle" (HDL), value "loc/repos1". This is interpreted as information that the digital object is stored in the repository identified by the given handle. • The handle "loc/repos1" is sent to the handle system. It resolves to information of type "RAP". This is information that the repository implements RAP. The corresponding data is a reference to a CORBA Object Request Broker (ORB). • The command "Access_DO (loc.ndlp/1234)" is now sent to the repository.
Benefits of Using Handles • Since the digital object is identified by a handle, moving it to another repository requires only one change: the data in the first handle record (the one for the digital object). • Since the repository is also identified by a handle, if the repository is moved to a different computer or otherwise changed while its handle remains the same, altering the single data item in the second handle record (the one for the repository) is the only change needed for all the digital objects stored in the repository.
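The two-step resolution and the relocation benefit can be sketched with a dictionary standing in for the handle system. The record layout, the `locate` helper and the ORB reference strings are placeholders invented for this sketch; only the handles and the record types "HDL" and "RAP" come from the example work flow.

```python
# Stand-in for the handle system: handle -> (record type, value).
handle_records = {
    "loc.ndlp/1234": ("HDL", "loc/repos1"),       # object -> repository handle
    "loc/repos1": ("RAP", "orb-reference-A"),     # repository -> placeholder ORB reference
}


def locate(handle: str, records: dict) -> tuple:
    """Resolve an object handle to (repository handle, repository endpoint)."""
    record_type, repo_handle = records[handle]
    if record_type != "HDL":
        raise ValueError(f"{handle} does not point to a repository handle")
    repo_type, endpoint = records[repo_handle]
    if repo_type != "RAP":
        raise ValueError(f"{repo_handle} does not implement RAP")
    return repo_handle, endpoint


before = locate("loc.ndlp/1234", handle_records)

# The repository moves to another computer: only its own record changes,
# and every object stored in it resolves to the new endpoint.
handle_records["loc/repos1"] = ("RAP", "orb-reference-B")
after = locate("loc.ndlp/1234", handle_records)
print(before, after)
```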
Hierarchies • Level 0: • Contains the digitized image, sound, text, or other data. • Level 1: • Is a parent of digital objects of Level 0. Upon encountering a digital object of this type, the digital object browser extracts the content of all the child Level 0 digital objects and displays them in an indexed list to the user. This type has been used to display indexes of thumbnail images. • Level 2: • Is a parent of digital objects of Level 1.
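The Level 1 browsing behaviour can be sketched as follows. The dictionary layout, the `indexed_list` helper and the "example/..." handles are hypothetical; the sketch only shows how a browser might collect the content of Level 0 children into a numbered list.

```python
def indexed_list(level1: dict, objects: dict) -> list:
    """Return (index, content) pairs for the Level 0 children of a Level 1 object."""
    contents = []
    for child_handle in level1["children"]:
        child = objects[child_handle]
        if child["level"] != 0:
            raise ValueError(f"{child_handle} is not a Level 0 object")
        contents.append(child["content"])
    return list(enumerate(contents, start=1))


# Hypothetical objects keyed by made-up handles.
objects = {
    "example/thumb1": {"level": 0, "content": b"thumbnail-1"},
    "example/thumb2": {"level": 0, "content": b"thumbnail-2"},
    "example/index": {"level": 1, "children": ["example/thumb1", "example/thumb2"]},
}

print(indexed_list(objects["example/index"], objects))
```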