This document discusses the physical and logical views of information, the concept of content, works and expressions, manifestations, items, and different types of object models. It also explores overlay journals and virtual collections, as well as the challenges faced in accessing materials and services in a diverse digital landscape.
Object models, overlay journals, and virtual collections William Y. Arms Corporation for National Research Initiatives March 22, 1999
Object models, overlay journals, and virtual collections William Y. Arms Department of Computer Science, Cornell University March 22, 1999
Physical and Logical Views of Information Physical view: Data structures, files, directories, servers Publishers, libraries, web sites Logical view: Works, expressions, manifestations, items Object models (document models) Overlay journals Virtual collections
What is Content? Works, expressions, manifestations, items
Work The underlying abstraction. Examples: • Homer's The Iliad. • Beethoven's Fifth Symphony. • The Unix operating system.
Expression A work is realized through an expression. Examples: • The Iliad was first expressed orally, then it was written down as a fixed sequence of words. • Beethoven's Fifth Symphony can be expressed as a printed score or by any one of many performances. • The Unix operating system has separate expressions as source code and machine code.
Works and Expressions Many works are realized through a single expression. Examples: • The poem The Road Not Taken by Robert Frost. • The picture: [image not reproduced] In such examples, there is no practical distinction between expression and work.
Manifestations An expression is given form in one or more manifestations. Examples: • The text of The Iliad has been manifest in numerous manuscripts and printed books. • A musical performance can be distributed on CD, or broadcast on television. • Software is manifest as files, which may be stored or transmitted in any digital medium.
Items When many copies are made of a manifestation, each is a separate item. Examples: • A specific copy of a book. • A copy of a computer file.
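The four levels above (work, expression, manifestation, item) can be read as a containment hierarchy. The following is a minimal Python sketch of that reading, not something taken from the talk; the class names, field names, and the library locations are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One specific copy of a manifestation."""
    location: str  # hypothetical field: where this copy is held


@dataclass
class Manifestation:
    """A form given to an expression, e.g. a printed book or a file."""
    form: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. a written text or a performance."""
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The underlying abstraction."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def count_items(self) -> int:
        """Count every individual copy across all expressions."""
        return sum(
            len(m.items) for e in self.expressions for m in e.manifestations
        )


# Illustrative data only: the holdings below are made up.
iliad = Work(
    title="The Iliad",
    expressions=[
        Expression("oral recitation"),
        Expression(
            "written text",
            manifestations=[
                Manifestation("manuscript", [Item("library A")]),
                Manifestation(
                    "printed book", [Item("library A"), Item("library B")]
                ),
            ],
        ),
    ],
)

print(iliad.count_items())
```

A work with a single expression, such as the Frost poem mentioned earlier, would simply have a one-element `expressions` list.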
Beyond Simple Documents • Many digital objects are more than static files of data. • Dynamic objects: What is presented to the user depends upon the execution of computer programs or other external activities. • Complex objects: Objects are made up from many inter-related elements. • Alternate disseminations: Digital objects may offer the user a choice of access methods. • Databases: A database comprises many alternative records, with different records selected each time the database is accessed.
Object Models and Structural Types • Web object • Digitized materials: digitized image; set of digitized page images; marked-up text with page images; digitized audio recording • Sets: set of digital objects; searchable set of digital objects
Web Object: File with URL & Data Type Identifier: http://www.dlib.org/boats/swan56 Data: jpg Metadata
Object Model: Digitized Image Data: several manifestations (thumbnail image, reference image, archival image). Metadata: each manifestation may have its own metadata.
Object Model: Digitized Image Identifier: hdl:loc.ndlp/amrlp.1234567 Data: thumbnail (gif), reference (jpg), archive (jpg) Metadata: object metadata
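As a rough illustration of this object model, the sketch below stores one identifier, object-level metadata, and a set of manifestations keyed by role, each with room for its own metadata. It is an assumption-laden sketch, not the implementation behind the slides: the class and field names are hypothetical, and only the identifier and the three role/type pairs come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ImageManifestation:
    """One rendering of the digitized image, with its own metadata."""
    data_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class DigitizedImage:
    """A digital object: one identifier, several manifestations."""
    identifier: str
    object_metadata: Dict[str, str] = field(default_factory=dict)
    manifestations: Dict[str, ImageManifestation] = field(default_factory=dict)

    def data_type(self, role: str) -> str:
        """Return the data type of the manifestation with the given role."""
        return self.manifestations[role].data_type


# Roles and data types follow the slide; metadata is left empty here.
photo = DigitizedImage(
    identifier="hdl:loc.ndlp/amrlp.1234567",
    manifestations={
        "thumbnail": ImageManifestation("gif"),
        "reference": ImageManifestation("jpg"),
        "archive": ImageManifestation("jpg"),
    },
)

print(photo.data_type("thumbnail"))
```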
Object Model: Set of Digitized Page Images Data Each page: separate image Metadata Structure of work: page sequence page numbers special pages
Object Model: Set of Digitized Page Images Identifier: hdl:loc.ndlp/amrlp.13579 Data: page 1 (gif), page 2 (gif), page 3 (gif) Metadata: page map
Page Map A page map relates the page images to the structure of the information, e.g.: • List of pages • Numbers printed on pages • Blocking of information on pages (columns, figures) • Sequences of information across pages A page map is metadata for a specific manifestation.
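One way to picture a page map is as an ordered list of entries, each tying an image file to the number printed on that page. The sketch below covers only the list of pages, printed numbers, and special pages; blocking and cross-page sequences are left out. The file names, field names, and sample page numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageEntry:
    image_file: str                 # hypothetical file name of the page image
    printed_number: Optional[str]   # number printed on the page, if any
    special: Optional[str] = None   # e.g. "title page"


class PageMap:
    """Metadata for one manifestation: page images in reading order."""

    def __init__(self, entries: List[PageEntry]) -> None:
        self.entries = list(entries)

    def sequence(self) -> List[str]:
        """Return the image files in page sequence."""
        return [e.image_file for e in self.entries]

    def image_for(self, printed_number: str) -> Optional[str]:
        """Find the image that carries a given printed page number."""
        for entry in self.entries:
            if entry.printed_number == printed_number:
                return entry.image_file
        return None


# Illustrative entries only.
page_map = PageMap([
    PageEntry("page1.gif", None, special="title page"),
    PageEntry("page2.gif", "i"),
    PageEntry("page3.gif", "1"),
])

print(page_map.image_for("1"))
```

Keeping the printed number as a string lets the same structure hold roman numerals and unnumbered pages.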
Overlay Journals and Virtual Collections: Logical organization of physically separate works
The NSF SMETE Library Soon, all scientific and engineering information will be available on-line: • Journals, reports, papers, standards, patents • Data sets, instruments, sensors • Computer programs, simulations, designs • Maps, images, films • ... etc., etc., etc.
The Instructor's Wish List To discover materials and services: • Good science • Comprehensible to students -- effective for teaching • Stable -- will not change or disappear Access to collections and services that are provided by many independent organizations: • No uniform catalog or index to everything • Mixture of for-profit and open access information
Conventional Journal Contents Articles
Overlay Journal Articles in Repository B Articles in Repository A Contents
Overlay Journals Articles in Repository B Articles in Repository A Contents of Journal I Contents of Journal II
Overlay Journals with Preprint Servers Preprint server Research Web site Contents of Journal I Cornell CS Reports CoRR Contents of Journal II
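The overlay idea in these slides is that a journal's contents list points at articles held in separate repositories rather than holding the articles itself. A minimal sketch, assuming each repository can be modelled as a lookup table: the repository names follow the slides ("A", "B"), while the article identifiers and titles are made up for illustration.

```python
from typing import Dict, List, Tuple

Repository = Dict[str, str]        # article id -> article (here just a title)
Contents = List[Tuple[str, str]]   # (repository name, article id) pairs

# Hypothetical repositories and holdings.
repositories: Dict[str, Repository] = {
    "A": {"a-001": "First article in A", "a-002": "Second article in A"},
    "B": {"b-001": "First article in B"},
}

# Each journal is only a contents list; the articles stay where they are.
journal_i: Contents = [("A", "a-001"), ("B", "b-001")]
journal_ii: Contents = [("B", "b-001"), ("A", "a-002")]


def resolve(contents: Contents, repos: Dict[str, Repository]) -> List[str]:
    """Follow each contents entry to the article in its repository."""
    return [repos[name][article_id] for name, article_id in contents]


print(resolve(journal_i, repositories))
```

Because a journal is only a list of references, the same article (here `b-001`) can appear in more than one journal without being copied.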
SMETE Library: Physical Sites User CSTR NCSTRL ACM D-Lib CoRR
SMETE Library: Virtual Collections SMETE Links show the members of the virtual collection
Metadata for Virtual Collections Reference linking: identifiers (URLs, URNs, ...); citations and reverse citations. Information discovery: cataloguing and indexing. Object models: structural types; disseminators.
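Reverse citations can be derived from forward citations by inverting the mapping. The sketch below assumes citations are available as a table from a citing identifier to the identifiers it cites; the `urn:example:` identifiers and the function name are placeholders, not real names.

```python
from collections import defaultdict
from typing import Dict, List


def reverse_citations(citations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a 'cites' table into a 'cited by' table."""
    cited_by: Dict[str, List[str]] = defaultdict(list)
    for source, targets in citations.items():
        for target in targets:
            cited_by[target].append(source)
    # Sort for a stable, readable result.
    return {target: sorted(sources) for target, sources in cited_by.items()}


# Placeholder identifiers for illustration only.
citations = {
    "urn:example:paper-1": ["urn:example:paper-2", "urn:example:paper-3"],
    "urn:example:paper-2": ["urn:example:paper-3"],
}

cited_by = reverse_citations(citations)
print(cited_by["urn:example:paper-3"])
```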
Indexing and Cataloguing Conventional cataloguing and indexing: Skilled professionals, following quality guidelines. Web spiders and gatherers: Programs that gather information and build indexes (e.g., Infoseek, Harvest). Metadata in publishing: Addition of metadata by the creator to aid automatic indexing (e.g., Dublin Core). Content extraction: Indexing using structured text, speech recognition, or image content.
The End Physical view: Data structures, files, directories, servers Publishers, libraries, web sites Logical view: Works, expressions, manifestations, items Object models (document models) Overlay journals Virtual collections