CS 502 Computing Methods for Digital Libraries • Cornell University – Computer Science • Herbert Van de Sompel • herbertv@cs.cornell.edu • Lecture 12: Why metadata?
Notes • Carl Lagoze on Wednesday • No Lab on Friday • But Paul Ginsparg on 04/03 • XML Schema & XSLT - later
Content – Data – Metadata: Content refers to digital library materials as information that is of interest to a user. The term data emphasizes the bits and bytes to be processed by a computer. Metadata: data about data.
Metadata – focus on description/discovery • data about data • origins in library cataloguing and abstracting & indexing (A&I) databases • now: an amplification of traditional bibliographic cataloguing practices in an electronic environment • now: any data used to aid the identification, description and location of networked electronic resources • actually, it is more than that
Metadata - broader • descriptive: facilitating resource discovery and identification (record in OPAC system) • administrative: facilitating resource management within a collection (loan record in OPAC system) • structural: binding together the components of complex information objects (series title in record in OPAC system)
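The three kinds of metadata can live side by side in one catalogue entry. The sketch below is a hypothetical model, not an OPAC data structure: the class name OpacEntry, the function metadata_for, and the loan and series values are illustrative placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class OpacEntry:
    """Hypothetical catalogue entry grouping the three kinds of metadata."""
    descriptive: dict = field(default_factory=dict)     # discovery and identification
    administrative: dict = field(default_factory=dict)  # management within the collection
    structural: dict = field(default_factory=dict)      # links between components


def metadata_for(entry, purpose):
    """Pick the part of the entry that serves a given purpose."""
    return {
        "discovery": entry.descriptive,
        "management": entry.administrative,
        "structure": entry.structural,
    }[purpose]


entry = OpacEntry(
    descriptive={"title": "Campus strategies for libraries and electronic information",
                 "editor": "Caroline R. Arms"},
    administrative={"loan_status": "on loan"},           # hypothetical value
    structural={"series_title": "(series title here)"},  # placeholder, not real data
)
```

A search interface would only read metadata_for(entry, "discovery"), while a circulation module would read the administrative part.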
Metadata - evolution [figure: from descriptive/discovery metadata for library objects towards descriptive/discovery, administrative and structural metadata for both library objects and networked resources]
Metadata • Traditionally stored separately from the objects it describes • For digital objects, it is sometimes embedded in the objects (cf. KWF) • Usually the metadata is a set of text fields • Textual metadata can also describe non-textual objects, e.g., software, images, music, …
Metadata – why? Some methods of information discovery search descriptive metadata about the objects. More generally, metadata enables digital library services: • explicitly (discovery metadata) or implicitly (terms and conditions) • helps to impose order on chaos • enables automated discovery and manipulation of objects
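Searching descriptive metadata, rather than the objects themselves, can be sketched as a fielded search over text-field records. The record layout, the function name fielded_search and the second (image) record are hypothetical. The image record shows textual metadata standing in for a non-textual object.

```python
records = [
    {"id": "rec1", "type": "text",
     "title": "Campus strategies for libraries and electronic information",
     "creator": "Arms, Caroline R.", "date": "1990"},
    {"id": "rec2", "type": "image",  # hypothetical non-textual object
     "title": "Photograph of a library reading room", "date": "unknown"},
]


def fielded_search(records, **criteria):
    """Return ids of records where every named field contains its query term (case-insensitive)."""
    return [
        rec["id"]
        for rec in records
        if all(term.lower() in rec.get(name, "").lower() for name, term in criteria.items())
    ]
```

For example, fielded_search(records, creator="arms", date="1990") matches only the first record, and the image is found through its type field although its content is not text.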
Metadata – generation (traditional) [figure: an object, cataloguing rules and reference data combine to produce a metadata record]
Metadata – generation (traditional) • Advantage: • Human expertise leads to high-quality catalogs and indexes • Disadvantages: • Expensive ($50+ per record) • Time consuming • Requires cumbersome cataloguing rules • Slow to adapt to new formats and types of digital objects • Human cataloguing and indexing is too expensive to apply to all but a small proportion of digital objects • => automatic generation of metadata
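One cheap form of automatic generation is harvesting metadata that creators already embed in their documents, such as an HTML title and meta tags. This is a minimal sketch using Python's standard html.parser. The class MetaTagExtractor and the function generate_metadata are hypothetical names, and the output is only as good as what the page author supplied; it does not approach the quality of human cataloguing.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs of a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Normalize the element name; keep the value as written by the author.
            self.fields[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def generate_metadata(html):
    """Return a flat attribute/value record derived from embedded HTML metadata."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    record = dict(parser.fields)
    title = "".join(parser.title_parts).strip()
    if title:
        record["title"] = title
    return record
```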
Metadata – roots (Library cataloguing) Anglo American Cataloguing Rules (AACR2) • rules for what goes into each field of a catalog record MARC format • an exchange format for catalog records "MARC Catalog" • catalog in MARC format, where content of each field follows AACR2
Citation: a monograph -- book! • Citation • Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.
[figure: a sample MARC record annotated with MARC tags, MARC field, MARC subfield code, MARC subfield and MARC indicator]
[figure: a MARC record with its parts labelled: ISBN, title statement, imprint (location, publisher, year), collation, series title]
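The labelled parts of the record correspond to MARC 21 tags and subfield codes. The mapping below is a simplified sketch: indicators are omitted, each part is reduced to one subfield, and the names SLIDE_LABEL_TO_MARC and to_marc_lines are hypothetical. Only the parts the citation actually gives are filled in; ISBN, collation and series are left out because the citation does not state them.

```python
# Simplified mapping from the slide's labels to (MARC tag, subfield code).
SLIDE_LABEL_TO_MARC = {
    "ISBN": ("020", "a"),
    "Title statement": ("245", "a"),
    "Imprint: location": ("260", "a"),
    "Imprint: publisher": ("260", "b"),
    "Imprint: year": ("260", "c"),
    "Collation": ("300", "a"),     # physical description
    "Series title": ("490", "a"),  # series statement
}


def to_marc_lines(citation):
    """Group labelled citation parts into one display line per MARC field."""
    fields = {}
    for label, value in citation.items():
        tag, code = SLIDE_LABEL_TO_MARC[label]
        fields.setdefault(tag, []).append(f"${code} {value}")
    return [f"{tag} {' '.join(parts)}" for tag, parts in sorted(fields.items())]


citation = {
    "Title statement": "Campus strategies for libraries and electronic information",
    "Imprint: location": "Bedford, MA",
    "Imprint: publisher": "Digital Press",
    "Imprint: year": "1990",
}
```

The three imprint parts end up as three subfields of a single 260 field, which is exactly the field/subfield structure the annotated record shows.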
[figure: a MARC record in exchange format, annotated with leader, directory, 001 field and field terminator]
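In exchange format a MARC record follows the ISO 2709 structure: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), the field data, and a record terminator. Fields end with the field terminator (0x1E) and subfields begin with the delimiter 0x1F. The sketch below builds and parses such a record. Only the record length and base address in the leader are computed; the other leader codes and the indicator values are illustrative. The control number "demo0001" is hypothetical, and the data is restricted to ASCII.

```python
# Delimiters defined by the ISO 2709 exchange structure used by MARC
FT = b"\x1e"  # field terminator
SD = b"\x1f"  # subfield delimiter
RT = b"\x1d"  # record terminator


def encode_field(tag, value):
    """Control fields (00X) carry plain data; data fields carry indicators and subfields."""
    if tag < "010":
        body = value.encode("ascii")
    else:
        indicators, subfields = value
        body = indicators.encode("ascii") + b"".join(
            SD + code.encode("ascii") + data.encode("ascii") for code, data in subfields
        )
    return body + FT


def build_record(fields):
    """Assemble leader + directory + field data + record terminator."""
    directory, data = b"", b""
    for tag, value in fields:
        encoded = encode_field(tag, value)
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset
        directory += f"{tag}{len(encoded):04d}{len(data):05d}".encode("ascii")
        data += encoded
    directory += FT
    base = 24 + len(directory)     # base address of data
    length = base + len(data) + 1  # +1 for the record terminator
    leader = f"{length:05d}nam  22{base:05d}   4500".encode("ascii")
    return leader + directory + data + RT


def parse_record(record):
    """Inverse of build_record: recover the leader and the (tag, value) list."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])
    directory = record[24:base - 1]  # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base + start:base + start + length - 1]  # strip field terminator
        if tag < "010":
            fields.append((tag, raw.decode("ascii")))
        else:
            parts = raw[2:].split(SD)[1:]  # text before the first delimiter is empty
            subfields = [(p[:1].decode("ascii"), p[1:].decode("ascii")) for p in parts]
            fields.append((tag, (raw[:2].decode("ascii"), subfields)))
    return leader, fields


fields = [
    ("001", "demo0001"),  # hypothetical control number
    ("245", ("00", [("a", "Campus strategies for libraries and electronic information /"),
                    ("c", "Caroline R. Arms, editor.")])),
    ("260", ("  ", [("a", "Bedford, MA :"), ("b", "Digital Press,"), ("c", "1990.")])),
]
record = build_record(fields)
leader, decoded = parse_record(record)
```

The directory is what makes variable-length and repeated fields possible: a program jumps to any field by offset instead of relying on fixed column positions.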
MARC: the good news • A great achievement: • Developed in 1960s • Magnetic tape exchange format for printing catalog records • The dawn of computing: • mixed upper and lower case • variable length fields, • repeated fields • non-Roman scripts • 100(?) million records with standard content and format • Thousands of trained librarians (millions?)
MARC: the bad news • A great problem: • Not designed for computer algorithms • One record per item (poor links between records) • Tied to traditional materials and traditional practices • Not Unicode • 100 million records at $50+/record • A classic legacy system!
Metadata – simplicity/complexity • Variety of metadata formats for description/discovery: • basic, proprietary records used in global internet search services; • simple attribute/value records such as the ROADS templates used in eLib subject services; • unqualified Dublin Core (15 elements only); • the more structured TEI and MARC formats; • qualified Dublin Core; • detailed formats such as CIMI and EAD, typically applied to archival material.
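At the simple end of this range, an unqualified Dublin Core record is a flat list of elements drawn from 15 names, each optional and repeatable. This sketch serializes such a record as XML in the Dublin Core element namespace using xml.etree.ElementTree. The wrapper element "record" and the function name dc_record are hypothetical, and mapping the editor to dc:contributor is one plausible choice, not a rule.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dc_record(pairs):
    """Serialize (element, value) pairs; elements may repeat, none is required."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in pairs:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml = dc_record([
    ("title", "Campus strategies for libraries and electronic information"),
    ("contributor", "Arms, Caroline R."),
    ("publisher", "Digital Press"),
    ("date", "1990"),
])
```

The simplicity is the point: any community can produce such a record, but subtleties such as the distinction between editor and author disappear.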
Metadata – one-size-fits-all/application profiles • There is an evolution from a “one size fits all” concept for metadata towards: • the use of a specific format depending on the purpose; • the co-existence of formats in relation to an object; • combining metadata elements from various formats; • Choice of format can depend on: • the functional purpose of the metadata: [description/discovery/location]; [administration]; [structuring] • level of detail required to fulfill the purpose • discipline/domain/audience of the objects that are described • legacy issues • interoperability requirements
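An application profile can be read as a declaration of which elements, possibly drawn from several formats, a given community uses and which of them it requires. The sketch below is hypothetical throughout: the class ApplicationProfile, the profile name, and the "loan:" prefix for a local administrative element are illustrations, not an existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationProfile:
    """Hypothetical profile: required and optional namespaced element names."""
    name: str
    required: frozenset
    optional: frozenset

    def check(self, record):
        """Return a list of problems; an empty list means the record conforms."""
        present = set(record)
        problems = [f"missing required element {e}" for e in sorted(self.required - present)]
        allowed = self.required | self.optional
        problems += [f"element {e} not in profile" for e in sorted(present - allowed)]
        return problems


# Mixes Dublin Core elements with a hypothetical local administrative element.
profile = ApplicationProfile(
    name="hypothetical-campus-profile",
    required=frozenset({"dc:title", "dc:identifier"}),
    optional=frozenset({"dc:date", "dc:creator", "loan:status"}),
)
```

A different community could define its own profile over the same element sets, which is how formats co-exist without a single one-size-fits-all schema.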
Metadata – interoperability [figure: metadata communities that need to interoperate: commerce, home pages, geo, library, Internet commons, scientific data, museums, whatever...]
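A common route to interoperability between such communities is a crosswalk that maps elements of a rich format onto a simpler shared one. The table below is a small, simplified subset loosely modeled on published MARC-to-Dublin Core crosswalks; it is not complete or authoritative, and the names MARC_TO_DC and crosswalk are hypothetical. Treating the editor's 700 added entry as dc:contributor is likewise an assumption.

```python
# Simplified subset of a MARC-to-Dublin Core crosswalk: (tag, subfield) -> element
MARC_TO_DC = {
    ("245", "a"): "title",
    ("700", "a"): "contributor",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("020", "a"): "identifier",
    ("650", "a"): "subject",
}


def crosswalk(marc_fields):
    """marc_fields: list of (tag, [(code, value), ...]); returns (dc_element, value) pairs."""
    dc = []
    for tag, subfields in marc_fields:
        for code, value in subfields:
            element = MARC_TO_DC.get((tag, code))
            if element is not None:  # unmapped subfields are silently dropped
                dc.append((element, value))
    return dc


marc_fields = [
    ("245", [("a", "Campus strategies for libraries and electronic information"),
             ("c", "Caroline R. Arms, editor")]),
    ("260", [("a", "Bedford, MA"), ("b", "Digital Press"), ("c", "1990")]),
    ("700", [("a", "Arms, Caroline R.")]),
]
```

Note what is lost: the statement of responsibility (245 $c) and the place of publication (260 $a) have no slot in this mapping, which is the usual price of moving to a simpler common format.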
Metadata – descriptive/other • There is an evolution towards standards for metadata formats that are not related to discovery: • Preservation metadata [NedLib, CEDARS, …] (see OCLC overview document: http://www.oclc.org/digitalpreservation/presmeta_wp.pdf) • "Data Dictionary for Technical Metadata for Digital Still Images" (http://www.niso.org/pdfs/DataDict.pdf) • book e-commerce [ONIX] • resource administration: • NISO Circulation Interchange Protocol (NCIP) standard (see http://www.niso.org/drafts/Z3982v1.html) • Electronic resources (cf. Adam Chandler)
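Unlike descriptive metadata, technical and preservation metadata can often be generated mechanically from the object itself. This sketch reads image dimensions from a PNG file's IHDR header and records a SHA-256 checksum as a fixity value. The function name technical_metadata and its field names are hypothetical; they are not the element names of the NISO data dictionary or any preservation scheme. It parses only the header and does not verify chunk CRCs.

```python
import hashlib
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def technical_metadata(data: bytes) -> dict:
    """Extract basic technical metadata from PNG bytes (header only)."""
    # After the 8-byte signature comes the IHDR chunk: 4-byte length, 4-byte type.
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    # IHDR data starts with width, height (big-endian uint32), bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "format": "image/png",
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity value for later integrity checks
    }
```

Recomputing the checksum later and comparing it with the stored value is a simple way to detect that a preserved file has changed.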