Metadata for the Web From Discovery to Description

Metadata for the Web From Discovery to Description. CS 502 – 20020226 Carl Lagoze – Cornell University. Co-existing Cost/Functionality Levels. Greater Functionality & Cost. Dublin Core Qualifiers. From fuzzy buckets to more specific description Model of “graceful degradation”

Presentation Transcript


  1. Metadata for the Web: From Discovery to Description CS 502 – 20020226 Carl Lagoze – Cornell University

  2. Co-existing Cost/Functionality Levels Greater Functionality & Cost

  3. Dublin Core Qualifiers • From fuzzy buckets to more specific description • Model of “graceful degradation” • Support both simplicity and specificity • Intra-domain and inter-domain semantics

  4. Resource has property X • implied subject: the resource • implied verb: has • property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...) • property value: an appropriate literal • qualifiers (adjectives): [optional qualifier] [optional qualifier]

  5. Varieties of Qualifiers: Element Refinements • Make the meaning of an element narrower or more specific. • Narrowing implies an "is a" relationship • a "date created" is a "date" • an "is part of relation" is a "relation" • If your software does not understand the qualifier, you can safely ignore it.

  6. Varieties of Qualifiers: Value Encoding Schemes • Says that the value is • a term from a controlled vocabulary (e.g., Library of Congress Subject Headings) • a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5) • Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
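A minimal sketch of what an encoding scheme buys you, assuming the value follows the ISO 8601 year-month-day form: software that knows the scheme can parse the string unambiguously, while software that does not still has a readable literal. The variable names are illustrative only.

```python
from datetime import date

# A date value as it might appear in a DC.Date statement.
value = "2001-05-02"

# Software that understands the scheme reads year, then month, then day.
parsed = date.fromisoformat(value)
print(parsed.year, parsed.month, parsed.day)

# Software that does not know the scheme can still index the raw string.
fallback = value
```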

  7. Resource has Subject "Languages -- Grammar" [LCSH] • Resource has Date "2000-06-13" [ISO8601] [Revised]

  8. Dumb-Down Principle for Qualifiers • The fifteen elements should be usable and understandable with or without the qualifiers • Qualifiers refine meaning (but may be harder to understand) • Nouns can stand on their own without adjectives • If your software encounters an unfamiliar qualifier, look it up -- or just ignore it! • "has a" relations break the model • E.g., a creator has a hair color
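The dumb-down principle can be sketched in a few lines of Python. This is an illustration rather than DCMI-specified behavior: it assumes the dotted `DC.Element.Refinement` naming used in the HTML examples of this deck, and `dumb_down` is a hypothetical helper name.

```python
from typing import Optional

# The fifteen unqualified Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dumb_down(name: str) -> Optional[str]:
    """Drop any refinement: 'DC.Date.Created' becomes 'DC.Date'.

    Returns None when the name is not one of the fifteen elements.
    """
    parts = name.split(".")
    if len(parts) < 2 or parts[0].upper() != "DC":
        return None
    element = parts[1]
    if element.lower() not in DC_ELEMENTS:
        return None
    return "DC." + element

print(dumb_down("DC.Date.Created"))
print(dumb_down("DC.Title"))
```

A consumer that only understands the fifteen elements can apply this to every statement and still end up with correct, if less specific, metadata.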

  9. Test for "good" qualifiers: cover the qualifier and ask: -- Does the statement still make sense? -- Is it still correct? • Resource has Subject "Languages -- Grammar" [LCSH] • Resource has Date "2000-06-13" [ISO8601] [Revised]

  10. “Incorrect” Qualification • Resource has creator “Cornell University” [affiliation] • Resource has subject “pre-schoolers” [audience]

  11. Open questions in this model • Are uncontrolled and unconstrained values really useful for discovery? • Is it possible for an organization (DCMI) to control the evolution of a language? • How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation? • Can DC serve as a lingua franca (mapping template) among more complex models?

  12. Models for Deploying Metadata • Embedded in the resource • low deployment threshold • Limited flexibility, limited model • Linked to from resource • Using xlink • Is there only one source of metadata? • Independent resource referencing resource • Model of accessing the object through its surrogate

  13. Syntax Alternatives: HTML • Advantages: • Simple mechanism – META tags embedded in content • Widely deployed tools and knowledge • Disadvantages: • Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions).

  14. Dublin Core in HTML • http://www.dublincore.org/documents/2000/08/15/dcq-html/ • HTML constructs • <link> to establish pseudo-namespace • <meta> for metadata statements • name attribute for DC element (DC.element.ER) • content attribute for element value • scheme attribute for encoding scheme or controlled vocabulary • lang attribute for language of element value

  15. Dublin Core in HTML example <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"> <meta name="DC.Title" content="Business Unusual"> <meta name="DC.Title" lang="es" content="negocio inusual"> <meta name="DC.Creator" content="Carl Lagoze"> <meta name="DC.Subject" content="bibliographic control web cataloging"> <meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23"> <meta name="DC.Format" content="text/html"> <meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
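To see how a harvester might read such tags, here is a small sketch using Python's standard `html.parser`. The class name `DCMetaParser` and the dictionary layout are assumptions made for illustration; the sample input is a shortened version of the example above.

```python
from html.parser import HTMLParser

class DCMetaParser(HTMLParser):
    """Collect <meta name="DC...."> statements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        if not name.upper().startswith("DC."):
            return
        parts = name.split(".")
        self.statements.append({
            "element": parts[1],
            # Third component, if present, is the element refinement.
            "refinement": parts[2] if len(parts) > 2 else None,
            "scheme": a.get("scheme"),
            "lang": a.get("lang"),
            "value": a.get("content"),
        })

SAMPLE = """
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
"""

parser = DCMetaParser()
parser.feed(SAMPLE)
for statement in parser.statements:
    print(statement)
```

Keeping the refinement and scheme as separate, optional fields means a consumer can ignore either one and still use the element and value.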

  16. Unqualified Dublin Core in XML <?xml version="1.0"?> <!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/2000/12/01-dcmes-xml-dtd.dtd"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/"> <dc:title>Dave Beckett's Home Page</dc:title> <dc:creator>Dave Beckett</dc:creator> <dc:publisher>ILRT, University of Bristol</dc:publisher> <dc:date>2000-06-06</dc:date> </rdf:Description> </rdf:RDF> http://www.dublincore.org/documents/2000/11/dcmes-xml/
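A sketch of reading this kind of record with Python's standard `xml.etree.ElementTree`. The XML declaration and DOCTYPE line are left out of the embedded sample to keep the example self-contained; the `statements` list and its tuple layout are illustrative choices, not part of the DCMI recommendation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2000-06-06</dc:date>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(RECORD)
statements = []
for desc in root.findall(f"{{{RDF_NS}}}Description"):
    # rdf:about names the resource being described.
    about = desc.get(f"{{{RDF_NS}}}about")
    for child in desc:
        prefix = f"{{{DC_NS}}}"
        if child.tag.startswith(prefix):
            element = child.tag[len(prefix):]
            statements.append((about, element, child.text))

for statement in statements:
    print(statement)
```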

  17. Example of Dublin Core Use A map in the United States Library of Congress on-line American Memory Collection

  18. Title The name given to the resource <META name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la">

  19. Creator An entity primarily responsible for making the content of the resource <META name="DC.Creator" content="Nicolaum Visscher">

  20. Subject The topic of the content of the resource <META name="DC.Subject" content="Middle Atlantic States" scheme="LCSH"> <META name="DC.Subject" content="Maps" scheme="LCSH"> <META name="DC.Subject" content="Early works to 1800" scheme="LCSH">

  21. Description An account of the content of the resource <META name="DC.Description.Abstract" content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">

  22. Publisher An entity responsible for making the resource available <META name="DC.Publisher" content="Library of Congress, United States">

  23. Contributor An entity responsible for making contributions to the content of the resource. <META name="DC.Contributor" content="Historic Urban Plans">

  24. Date A date associated with an event in the lifecycle of the resource <META name="DC.Date.Created" content="1996-04-17" scheme="W3C-DTF">

  25. Type The nature or genre of the content of the resource <META name="DC.Type" content="image" scheme="DCMIType">

  26. Format The physical or digital manifestation of the resource <META name="DC.Format.Medium" content="image/gif" scheme="IMT"> <META name="DC.Format.Extent" content="556K">

  27. Identifier An unambiguous reference to the resource in the current context <META name="DC.Identifier" content="http://loc.gov/coll1/img456.jpg" scheme="URI">

  28. Source A reference to a resource from which the present resource is derived. <META name="DC.Source" content="G3715 1685 .V5 1969 (LOC catalog #)">

  29. Language Language of the intellectual content of the object <META name="DC.Language" content="nl" scheme="ISO 639-2">

  30. Relation A reference to a related resource <META name="DC.Relation.isPartOf" content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html" scheme="URI">

  31. Coverage The extent or scope of the content of the resource <META name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN"> <META name="DC.Coverage.Temporal" content="1650" scheme="W3C-DTF">

  32. Rights Information about rights in and over the resource <META name="DC.Rights" content="http://www.loc.gov/rights_statement.htm">

  33. Distributed Content: The Metadata Challenge • From fixed, contained physical artifacts to fluid, distributed digital objects • Need for basis of trust and authenticity in network environment • Decentralization and specialization of resource description and need for mapping formalisms

  34. Multi-entity nature of object description (diagram labels: Photographer, Computer artist, Camera type, Software)

  35. Understanding Metadata based on Query Capabilities • Simple boolean tags? • Creator = “Tom Baker” and Title contains “Dublin Core” • Agent, time, place questions? • Who was responsible for what and when and where?
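The simple boolean case can be sketched as a filter over flat attribute/value records. The records below are hypothetical sample data invented for the example; only the query itself comes from the slide.

```python
# Hypothetical flat records, one dictionary per resource.
records = [
    {"Creator": "Tom Baker", "Title": "An example talk about Dublin Core"},
    {"Creator": "Tom Baker", "Title": "An example talk about something else"},
    {"Creator": "Someone Else", "Title": "Another example on Dublin Core"},
]

def matches(record: dict) -> bool:
    """Creator = "Tom Baker" and Title contains "Dublin Core"."""
    return (
        record.get("Creator") == "Tom Baker"
        and "Dublin Core" in record.get("Title", "")
    )

hits = [r for r in records if matches(r)]
print(hits)
```

Flat records answer this kind of question well. They have no place to record who did what, when, and where, which is the limitation the following slides explore.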

  36. Attribute/Value approaches to metadata… • Statement: The playwright of Hamlet was Shakespeare • Encoding: R1 (subject) dc:title “Hamlet”; R1 dc:creator.playwright (metadata noun plus adjective) “Shakespeare” (literal), with an implied verb • Without the qualifier: Hamlet has a creator Shakespeare

  37. …run into problems for richer descriptions… • Statement: The playwright of Hamlet was Shakespeare, who was born in Stratford • Encoding: R1 dc:creator.playwright “Shakespeare”; R1 dc:creator.birthplace “Stratford” • Without the qualifier “birthplace”: Hamlet has a creator Stratford

  38. …because of their failure to model entity distinctions • R1 title “Hamlet” • R1 creator R2 • R2 name “Shakespeare” • R2 birthplace “Stratford”
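The Hamlet example can be replayed in Python to show where the flat approach goes wrong and how a separate creator entity fixes it. The tuple representation and the helper names `strip_qualifiers` and `objects` are assumptions for illustration; the identifiers R1 and R2 and the property names follow the slides.

```python
# Flat attribute/value statements: everything hangs off the resource R1.
flat = [
    ("R1", "dc:title", "Hamlet"),
    ("R1", "dc:creator.playwright", "Shakespeare"),
    ("R1", "dc:creator.birthplace", "Stratford"),
]

def strip_qualifiers(triples):
    """Dumb down by dropping everything after the first dot in the property."""
    return [(s, p.split(".")[0], o) for (s, p, o) in triples]

dumbed = strip_qualifiers(flat)
# The third statement now claims that Hamlet has a creator "Stratford".
print(dumbed)

# Entity model: the creator is its own node (R2) with its own properties.
graph = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]

def objects(triples, subject, prop):
    """Return every object of (subject, prop, ?) in the triple list."""
    return [o for (s, p, o) in triples if s == subject and p == prop]

creator = objects(graph, "R1", "creator")[0]
print(objects(graph, creator, "name"), objects(graph, creator, "birthplace"))
```

In the entity model the birthplace belongs to R2, so no statement about R1 can be misread as naming Stratford as a creator.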

  39. Applying a Model-Centric Approach • Formally define common entities and relationships underlying multiple metadata vocabularies • Describe them (and their inter-relationships) in a simple logical model • Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.

  40. Events are key to understanding metadata relationships? • Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles. • Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.

  41. ABC/Harmony Event-aware metadata ontology • Recognizing inherent lifecycle aspects of description (esp. of digital content) • Modeling incorporates time (events and situations) as first-class objects • Supplies clear attachment points for agents, roles, existential properties • Resource description as a “story-telling” activity

  42. Resource-centric Metadata?

  43. Diagram labels: “Orest Vereisky” “Leo Tolstoy” “Margaret Wettlin” “Moscow” “illustrator” “author” “translator” “1828” “1877” “1978” “creation” “translation” “Russian” “English” “Tragic adultery and the search for meaningful love” “Anna Karenina”

  44. Queries over complex descriptive graphs • Ability to ask questions like “show me all the translations of War and Peace between 1980 and 1990”
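A rough sketch of such a query, treating events as first-class objects. The `Event` class and `find_events` function are hypothetical and are not the ABC/Harmony model itself. The sample data pairs labels from the Anna Karenina diagram in the way they appear to belong together, which is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str    # e.g. "creation" or "translation"
    work: str    # title of the work the event concerns
    agent: str   # who took part
    role: str    # the agent's role in this event
    year: int    # when the event happened

# Assumed pairing of the diagram labels into events.
events = [
    Event("creation", "Anna Karenina", "Leo Tolstoy", "author", 1877),
    Event("translation", "Anna Karenina", "Margaret Wettlin", "translator", 1978),
]

def find_events(events, kind, work, start, end):
    """Return events of one kind for one work within [start, end]."""
    return [
        e for e in events
        if e.kind == kind and e.work == work and start <= e.year <= end
    ]

# "Show me all the translations of <work> between <start> and <end>."
found = find_events(events, "translation", "Anna Karenina", 1970, 1980)
for e in found:
    print(e.agent, e.role, e.year)
```

Because agent, role, and time attach to the event rather than to the resource, the same structure answers "who was responsible for what, and when" without inventing new qualifiers.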
