Lifecycle Metadata for Digital Objects

November 6, 2006 • Descriptive Metadata: “Modeling the World”

Descriptive metadata for what? • WWW: now seen as the ONE place to find everything • Descriptive metadata provides: • Unique identification for a resource • Information permitting evaluation/selection of a resource • Information describing all “essential properties”

WWW: How to find things • What does search mean on the WWW? • How to support its multiple purposes? • Failure of search engines to render precise results (problems of scale; is this true?) • Failure of HTML metatags (spamming) • Solution • local expert cataloging (providing access points) • making local cataloging available • remote free-text searching (inferring access points) • Dublin Core and its limitations • Warwick Framework and RDF • Universal Semantic Web

Some metadata examples • Individual objects (Dublin Core and its derivatives) • Multimedia and/or complex objects (METS/MPEG21) • Books and other chunks of information (MARC) • Finding aids (EAD)

Semantic Web • Berners-Lee’s vision for the Web: basically machine-understandable metadata about meaning for everything on the Web • http://www.semaview.com/d/Semweb_Illustrated.pdf

Aside on cataloging • Cataloging systems as relatively static: relationships remained tacit and externally specified • Classification systems • Controlled vocabularies • Name authorities • Note all of these can be represented in XML as specific namespaces (MARC, MODS, etc.) • New methods aren’t that different: ontologies for the Semantic Web are also namespaces--but ones that are much more specific about actions

Ontologies • Like previous classification systems, they are being built by hand • General (Cyc) and domain-specific (especially for B to B, web services) • Ontologies establish a joint terminology between members of a community of interest • Ontologies specify domain knowledge in terms of formal logic that includes actions by and among entities • Ontologies will be used to guide extraction of semantic content from texts (and perhaps automatic generation of metadata)

Topic Maps • Representation of information using topics, associations, and occurrences • Note how this “triples” representation fits well with RDF (entity, relationship, entity) • An XML representation: XTM • An (older) ISO standard: ISO/IEC 13250:2003 • Related to ontologies and mind maps; designed to “map” semantic regions
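A minimal sketch of the “triples” idea, assuming a plain in-memory list rather than any real RDF or XTM library. The identifiers (finding_aid_42, collection_7, letter_0193), the predicate names, and both helper functions are hypothetical, invented only to show how (entity, relationship, entity) statements can be stored and queried:

```python
from collections import namedtuple

# One statement: subject --predicate--> obj (entity, relationship, entity).
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical statements, invented for illustration only.
triples = [
    Triple("finding_aid_42", "describes", "collection_7"),
    Triple("collection_7", "contains", "letter_0193"),
    Triple("letter_0193", "hasCreator", "Jane Doe"),
]


def objects_of(store, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


def topic_view(store, topic):
    """Group all statements about one topic by their relationship."""
    view = {}
    for t in store:
        if t.subject == topic:
            view.setdefault(t.predicate, []).append(t.obj)
    return view
```

Calling objects_of(triples, "collection_7", "contains") walks one relationship; topic_view gathers everything asserted about a single topic, which is roughly what a topic map’s associations provide.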

Web Services • How to provide processing services over the WWW: XML and HTTP infrastructure passing remote procedure calls • UDDI (Universal Description, Discovery, and Integration) is the registry of services • WSDL (Web Services Description Language) allows “advertisement” of services (in XML, of course) in the UDDI registry • SOAP (Simple Object Access Protocol) is the XML wrapper for requests sent to services • Example: DC metadata registry: http://dublincore.org/dcregistry/
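A rough sketch of what “the XML wrapper for requests” looks like, built with Python’s standard library. The envelope namespace is the SOAP 1.1 one; the service namespace (http://example.org/registry), the operation name GetElementDefinition, and its parameter are assumptions made up for illustration and do not describe the DC registry’s actual interface. Nothing is sent over the network here:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SVC_NS = "http://example.org/registry"  # hypothetical service namespace


def build_request(operation, params):
    """Wrap a remote procedure call in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The call itself is one element named after the operation.
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter, for illustration only.
request_xml = build_request("GetElementDefinition", {"element": "creator"})
```

In a real exchange this string would be the body of an HTTP POST to an endpoint that the service’s WSDL description advertises.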

Does what we know fit into this? • DC and derivatives are aimed at the single object (though not always used for it) and are frequently used in WWW contexts (cf. Warwick Framework ≈ RDF namespaces) • EAD describes descriptions of aggregate chunks of information (chunked in terms of “series” or “collections”) but can describe single objects • MARC/MODS describes aggregate chunks of information (chunked in the form of “books”) • METS and MPEG21 are frames for multiple and multimedia objects

Granularity • Granularity governs the level at which metadata can be descriptive • Metadata granularity tends to be finer for digital objects • Digital objects cannot be managed without individual granularity (thank you David Bearman)

EAD: Describing descriptions • What is a finding aid? • Describing a finding aid so it can be searched • Expanding a finding aid to accommodate individual granularity • Is it efficient to drill down through a finding aid to individual objects? • Can EAD be searched from the bottom up?

EAD Schizophrenia • Because it describes finding aids, it has retained concern with look and feel • Mixes granular conceptual description with box/folder lists for physical (and contingent) object arrangements • Lack of granularity is expressed in the possibility of writing narrative with <p> tags everywhere

MARC: Chunked packages • International Standard Bibliographic Description (ISBD) as parent of MARC, TEI • MODS: User-friendly MARC? subset of MARC elements (20), language-based tags • MARC as descriptive metadata • Bibliographical detail for the work • Bibliographical detail for the specific instance of the work (cf. FRBR) • Places the work within one or many classificatory systems (ontologies, controlled vocabularies, authority lists) • But alas! Not consistent!

METS: Multimedia/Multiversion • METS developed to express “archival bond” among objects related to one another as a single work (cf. FRBR, Warwick Framework, RDF) • Reflects concerns of digital librarians who want to make a wide range of versions available • Standard form: • General descriptive metadata for package • Object link • Object type • Specific descriptive metadata set(s) for specific kinds of objects
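The “standard form” above can be sketched as a small data structure. This is only a model of the four parts named on the slide, not the METS schema itself; the class names, field names, file links, and metadata values are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class PackagedObject:
    link: str                # object link (where the file lives)
    object_type: str         # object type, e.g. "audio" or "text"
    specific_metadata: dict = field(default_factory=dict)  # type-specific set


@dataclass
class Package:
    general_metadata: dict   # descriptive metadata for the whole package
    objects: list = field(default_factory=list)

    def versions_of_type(self, object_type):
        """Return links to every packaged object of the given type."""
        return [o.link for o in self.objects if o.object_type == object_type]


# Hypothetical package: one work available in two versions.
package = Package(
    general_metadata={"title": "Oral history interview"},
    objects=[
        PackagedObject("files/speech.wav", "audio", {"duration_s": 1800}),
        PackagedObject("files/speech.txt", "text", {"language": "en"}),
    ],
)
```

The point of the wrapper is that the general description is recorded once, while each version carries only the metadata specific to its kind.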

What about the single object? • Is Dublin Core enough? Outdated? (15 elements) • What about derivatives? • Qualified DC, DC profiles • Australian elements (20) • Why describe the single object? • Who will describe at the object level? • Zillions of archivists? • Authors? • Automatic analysis (ontology-driven)?
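For reference, a minimal sketch of a single-object record restricted to the 15 simple Dublin Core elements, serialized with Python’s standard library. The element names and the namespace are those of the Dublin Core Metadata Element Set 1.1; the wrapper element name “metadata”, the function to_dc_xml, and the sample values are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dc_xml(record):
    """Serialize {element: [values]} to XML; reject non-DC element names."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not simple Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        # Every DC element is optional and repeatable.
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record for one digital object.
record_xml = to_dc_xml({
    "title": ["Letter to the editor"],
    "creator": ["Jane Doe"],
    "date": ["2006-11-06"],
})
```

Anything the 15 elements cannot express (here, any key outside DC_ELEMENTS) is rejected, which is one concrete way to see the limitation that qualified DC and profiles try to address.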

Wisdom of Crowds vs Long Tail • The wisdom of crowds: tagging as democratized subject cataloging • The long tail: specialist cataloging for small niche groups, now visible online
