Dublin Core and Emerging Conventions for a Semantic Web Thomas Baker Fraunhofer-Gesellschaft, Bonn ELPUB 2003, Guimaraes, Portugal 26 June 2003
A particular set of metadata terms • Dublin Core as a simple and semantically generic lingua franca • Fifteen “core” elements: Subject, Description, Title… • A metadata "pidgin" for "digital tourists" on a culturally diverse global Web • Limited grammar, easy to learn and use • Enough "as is" for many needs • 33 "element refinements" and 17 "encoding schemes" to qualify the elements for specialized purposes • A small set of 12 resource types for use with dc:type
A simple data model (resource with properties) • 1996-1998: Collective realization that machine-processability requires a coherent data model • 1996: “Warwick Framework” proposed at DC-2 workshop: DC as one specialized module (“resource discovery”) • 1997: “Qualifiers” proposed for specifying meanings • Some early adopters took this to unintended extremes: “DC.Creator.telephone-number” • 1998: DCMI involvement in emerging Resource Description Framework, clarification of simple data model • 2000: First set of qualifiers approved
A typology of metadata terms ("grammar") • Elements • (core) properties of resources • Element Refinements • properties that semantically refine elements • Encoding Schemes • give context to a metadata value • Vocabulary Terms • constitute controlled lists of possible values
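The four term types can be pictured as a small data structure. What follows is a minimal Python sketch, not a DCMI artifact: the class and field names are invented for illustration, and the four sample entries simply show one term of each type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TermType(Enum):
    ELEMENT = "element"
    ELEMENT_REFINEMENT = "element refinement"
    ENCODING_SCHEME = "encoding scheme"
    VOCABULARY_TERM = "vocabulary term"


@dataclass(frozen=True)
class Term:
    name: str
    term_type: TermType
    refines: Optional[str] = None  # only meaningful for element refinements


# Illustrative sample, one entry per term type (not a complete listing).
TERMS = [
    Term("date", TermType.ELEMENT),
    Term("created", TermType.ELEMENT_REFINEMENT, refines="date"),
    Term("W3CDTF", TermType.ENCODING_SCHEME),
    Term("Image", TermType.VOCABULARY_TERM),
]


def terms_of_type(term_type):
    """Return the names of all sample terms of the given type."""
    return [t.name for t in TERMS if t.term_type is term_type]
```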
An emergent approach to "structured values" • Implementers sometimes "shoehorn" complex sets of information into a single value • Creator: "name=Tom, affiliation=FHG, shoesize=47" • In practice, a large variety of "structured values" • Labelled strings • Unlabelled strings • Marked-up strings (e.g., LaTeX, HTML) • Secondary resource descriptions (as above) • Post-processing ad-hoc constructs is messy and does not scale • Andy Powell's model: • Elements can have string values (Simple DC) • A further requirement to point to linked metadata?
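To see why post-processing such values is messy, consider a minimal Python sketch of a parser for the labelled-string example above. The function name and the "key=value, key=value" convention are assumptions made for illustration; real-world values follow no single convention, which is the point of the slide.

```python
def parse_labelled_string(value):
    """Split 'key=value, key=value' into a dict.

    Returns None when any comma-separated part lacks an '=' sign,
    i.e. when the value does not follow the assumed convention.
    """
    result = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if not sep:
            return None
        result[key.strip()] = val.strip()
    return result


labelled = parse_labelled_string("name=Tom, affiliation=FHG, shoesize=47")
unlabelled = parse_labelled_string("Baker, Thomas")
```

The first call yields a dictionary with three keys. The second returns None, because an unlabelled string such as "Baker, Thomas" happens to contain the same delimiter. Every additional convention would need its own special case.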
A process for community standardization [10] • 1995-1999: open workshops, unruly but stimulating meetings of minds, rough consensus • 2000: qualifier vote: circa 25 voting members of an ad-hoc "Usage Committee" • 2001: smaller Usage Board • Codification of formal process for editorial control • Two two-day face-to-face meetings per year • Mandate and responsibility to maintain standard, approve extensions and clarifications
...based editorial review by a Usage Board • Term set must evolve as implementors coin new terms and usage patterns emerge • Working groups propose new terms or clarifications • Evaluate in light of grammatical principle, usefulness, clarity of definition, overlap with existing terms • Review application profiles based on Dublin Core • Tiered model of approval status: conforming, recommended, obsolete, registered • Meeting materials, mailing lists, and decisions archived and accessible on the open Web • DCMI as maintenance agency for ISO 15836
A bias towards simple and generic • DCMI Usage Board bias • Strength and value of DC lies in simplicity and generic applicability • Keep the core standard small, generic, and lightweight • Resist temptation to "complexify": people want and need distinctions, but not in a "small standard" • DCMI Type Vocabulary has just 12 terms: user communities should invent or re-use their own more specific sub-types
A bias towards cooperation and re-use • Help user communities define and use their own extensions • Cooperate with maintainers of specialized vocabularies on forms of mutual recognition • Provide a model for re-use
"Good neighbor" policies • MARC Relators (roles such as "adapter", "artist") • DCMI: "use MARC Relators to refine dc:contributor" • LoC's RDF schema: "MARC Relators (identified with URIs) are sub-properties of dc:contributor" • Encoding Schemes • DCMI term designates Library of Congress Subject Headings (http://purl.org/dc/terms/LCSH) • If LoC coins own term, DCMI should promote its use
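Declaring a relator as a sub-property of dc:contributor means that an application which knows only Dublin Core can still read the statement in a more general form. A minimal Python sketch follows, assuming a hand-written mapping table: the "marcrel:" names are hypothetical shorthand, not actual Library of Congress identifiers, and the sample record is invented.

```python
# Sub-property declarations: refined property -> the broader property it refines.
# The "marcrel:" names are hypothetical shorthand for illustration only.
SUB_PROPERTY_OF = {
    "marcrel:artist": "dc:contributor",
    "marcrel:adapter": "dc:contributor",
}


def generalize(statements):
    """Replace each refined property with its broader property, if one is declared."""
    return [(SUB_PROPERTY_OF.get(prop, prop), value) for prop, value in statements]


record = [("marcrel:artist", "A. Painter"), ("dc:title", "Sketches")]
simple = generalize(record)
```

Here the artist statement becomes a dc:contributor statement, while dc:title passes through unchanged.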
A "namespace policy" [20] • All DCMI metadata terms are given unique identity within three namespaces: • http://purl.org/dc/elements/1.1/ - the core elements • http://purl.org/dc/terms/ - all other elements/qualifiers • http://purl.org/dc/dcmitype/ - a Type vocabulary • Example: http://purl.org/dc/elements/1.1/title • Policy on long-term stability of namespace URIs • Changes not substantially “semantic” (i.e., corrections) will not result in change of namespace URIs • “Semantic” changes must trigger a change of name • Version turnover of a “document management” nature will have no effect on namespace URIs
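A term URI is simply the namespace URI followed by the term name. A minimal Python sketch of that construction follows. The three namespace URIs come from the slide; the short prefixes used as dictionary keys and the function names are assumptions chosen for illustration.

```python
# Namespace URIs as listed above; the prefix keys are illustrative labels.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}


def term_uri(prefix, local_name):
    """Build a term URI by appending the term name to its namespace URI."""
    return NAMESPACES[prefix] + local_name


def namespace_of(uri):
    """Return the prefix of the DCMI namespace a URI falls in, or None."""
    for prefix, namespace in NAMESPACES.items():
        if uri.startswith(namespace):
            return prefix
    return None
```

For example, term_uri("dc", "title") reproduces the example URI on the slide.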
A typology of metadata vocabularies • Term declarations • Declare a unique set of elements and definitions • Each DCMI term is identified with a URI • Documented in HTML pages, formally declared as RDF schemas • Application profiles • Declare how an application uses which terms in its metadata • May mix-and-match from multiple namespaces
Why application profiles? • People want them! • Most standards have them: IEEE/LOM, MARC, DOI... • As focus of dialogue and semantic negotiation • Deep human need to resist total standardization? • To identify emerging semantics "at the edges" of a standard • To know how colleagues and peers are designing metadata – and avoid "reinventing the wheel" • To harmonize metadata usage within domains: • User communities (DC-Libraries, DC-Government) • Subject gateways (Renardus)
Dublin Core application profiles • Declaration specifying which metadata terms an information provider uses in metadata • Identifies source of terms used • May provide additional documentation • Designed to promote interoperability within constraints of Dublin Core model • Draft guidelines sponsored by European Standardization Committee (CEN) to be progressed through DCMI process • http://www.cenorm.be/isss/Workshop/MMI-DC/application-profile-for-comment.pdf • Caution: a documentary format cannot itself guarantee interoperability
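An application profile can be read as a declared set of term URIs drawn from one or more namespaces. The Python sketch below is a minimal illustration under that assumption. The profile, the example.org terms, and the function name are hypothetical, and, as the caution above notes, passing such a check says nothing about whether two applications use a term in the same way.

```python
# A hypothetical profile mixing terms from two DCMI namespaces and one local one.
PROFILE = {
    "name": "Example profile (hypothetical)",
    "terms": {
        "http://purl.org/dc/elements/1.1/title",
        "http://purl.org/dc/elements/1.1/contributor",
        "http://purl.org/dc/terms/audience",
        "http://example.org/terms/shelfmark",  # hypothetical local term
    },
}


def undeclared_terms(record, profile):
    """Return the term URIs used in a record that the profile does not declare."""
    return sorted(set(record) - profile["terms"])


record = {
    "http://purl.org/dc/elements/1.1/title": "A sample title",
    "http://example.org/terms/colour": "red",  # not declared in the profile
}
problems = undeclared_terms(record, PROFILE)
```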
A set of encoding practices • Guidelines for encoding metadata records (or embedded metadata) in HTML, XML, RDF • Use of rdfs:label and rdf:value allows nesting of secondary resource descriptions • A model for declaring terms "machine-processably" in RDF • Namespace Policy mandates this, though not specifically RDF • Work item: a model for declaring application profiles machine-processably
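As an illustration of the HTML case, the Python sketch below embeds simple Dublin Core statements as meta elements. It follows the "DC." name prefix and "schema.DC" link convention of the DCMI HTML guidelines as commonly practised; the function name and the sample record are assumptions, and the guidelines themselves remain the authority on details.

```python
from html import escape

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def to_html_meta(record):
    """Render (element, value) pairs as HTML <link> and <meta> elements."""
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, value in record:
        # escape() protects the attribute values against quotes and ampersands.
        lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return "\n".join(lines)


markup = to_html_meta([("title", "Dublin Core & the Web"), ("creator", "Baker, Thomas")])
```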
Shared conventions for declaring namespaces? [30] • Cross-community consensus-building • W3C metadata standards and URIs as a basis for interoperability among different standards? • EU CORES Project (2002-2003) • Identify and explore areas of possible agreement among major standards initiatives • Interoperability Forum meeting in Brussels, November 2002
CORES Resolution on Identifying Metadata Elements • http://www.cores-eu.net/interoperability/cores-resolution/ • Whereas • Our metadata standards have “elements” – units of meaning comparable and mappable to elements of other standards, • We agree: • To assign Uniform Resource Identifiers to our elements; • To articulate and publish specific policies regarding the stability, persistence, and maintenance of the URIs assigned to the elements.
Clarifications to the CORES Resolution • URIs not necessarily used in applications "as is" • In metadata records, maybe dc:contributor instead of http://purl.org/dc/elements/1.1/contributor • Signatories decide what to identify with URIs • An individual element? An entire set of elements? A specific historical version of an element? • No implication that URIs will "resolve" to anything • URIs may "get" something with HTTP on Web – or not! • E.g., resolve to a database query? • Resolve to an RDF schema? • Or even resolve to nothing at all ("file not found")!!
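The abbreviated form and the full URI identify the same element; converting between them is string manipulation and involves no HTTP request, so nothing depends on whether the URI resolves. A minimal Python sketch, in which the prefix table and the function names are assumptions:

```python
# Prefix table for abbreviating element URIs (illustrative; only "dc" is listed).
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand(qname):
    """Turn an abbreviated name such as 'dc:contributor' into a full URI."""
    prefix, sep, local_name = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return PREFIXES[prefix] + local_name


def compact(uri):
    """Abbreviate a URI with a known prefix; return it unchanged otherwise."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri
```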
Signatories • Eliot Christian, USGS, for GILS • Brian Green, EDItEUR, for ONIX • Rebecca Guenther, Library of Congress, for MARC21 • Keith Jeffery, EuroCRIS, for CERIF • Norman Paskin, Int’l DOI Foundation, for DOI • Robby Robson, IEEE LTSC, for IEEE/LOM • Stuart Weibel, DCMI, for Dublin Core
Signatories’ Action Plan • Action plan, November 2002 – May 2003: • Define and publish URI assignment mechanisms • Assign URIs to elements • Publish URI persistence policies • Article on follow-up scheduled for D-Lib Magazine in July 2003 issue • Taken as a whole, corpus of good-practice policies for others to discuss and emulate
Beyond the CORES Resolution [40] • Benefits for signatories: • Important first step towards future interoperability applications (e.g., mapping, conversion) • Improve "citability" of elements between standards • Potential areas of further work: • Provide persistent URIs for terms in taxonomies and ontologies • Shared conventions on declaring URIs in machine-processable forms • Shared conventions for application profiles and mapping constructs • Shared ontologies as targets for mapping
What exactly is being identified? • Is a particular term the same when used in different contexts? • A single term in a flat namespace? • http://ltsc.ieee.org/LOM/Identifier • Or two terms in a flat namespace? • http://ltsc.ieee.org/LOM/GeneralIdentifier • http://ltsc.ieee.org/LOM/MetadataIdentifier • Or two terms in a hierarchical namespace? • http://ltsc.ieee.org/LOM/General/Identifier • http://ltsc.ieee.org/LOM/Metadata/Identifier
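The three options differ only in how the context of use enters the identifier. The Python sketch below generates the URIs shown above from a (category, name) pair; the function names are invented for illustration, and nothing here implies which option any standards body has adopted.

```python
BASE = "http://ltsc.ieee.org/LOM/"


def single_flat(category, name):
    """One term regardless of context: the category is ignored."""
    return BASE + name


def distinct_flat(category, name):
    """Two terms in a flat namespace: the category is fused into the name."""
    return BASE + category + name


def hierarchical(category, name):
    """Two terms in a hierarchical namespace: the category is a path segment."""
    return f"{BASE}{category}/{name}"
```

Under the first option, "General" and "Metadata" identifiers collapse into a single URI; under the other two they stay distinct.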
What exactly is being identified? • For purposes of identification, is a term "the same" through successive versions? • At first, DC reflected version in the URI: • http://purl.org/dc/elements/1.1/title • Then decided to keep URIs stable and define the limits of change in the Namespace Policy • http://purl.org/dc/terms/audience • URIs for DC 1.1 kept for legacy reasons • URIs for successive versions of a term used "behind the scenes" for tracking changes
A method for maintaining (and versioning) a vocabulary • Assume that vocabularies must evolve: • Anticipate need to understand discrete states of the standard • All documents, decisions, and term declarations must evolve • Versioning to support future automated methods for processing legacy metadata • Numbered decisions linked to: • A specific historical version of a term • Supporting documentation for the decision • Historical record of the Usage Board meeting
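One way to picture this method is a term with a stable URI and a list of versions, each linked to a numbered decision. The Python sketch below is an assumption-laden illustration, not DCMI's implementation: the class names, decision numbers, and definition texts are all placeholders. It also encodes the namespace-policy rule that a semantic change requires a new term name rather than a new version.

```python
from dataclasses import dataclass, field


@dataclass
class TermVersion:
    decision: str                   # placeholder decision number
    definition: str                 # placeholder definition text
    semantic_change: bool = False


@dataclass
class TermHistory:
    uri: str                        # stays the same across all versions
    versions: list = field(default_factory=list)

    def revise(self, version):
        """Record a new version; refuse semantic changes to an existing term."""
        if self.versions and version.semantic_change:
            raise ValueError("semantic change: coin a new term name instead")
        self.versions.append(version)

    def current(self):
        """Return the most recently recorded version."""
        return self.versions[-1]


audience = TermHistory("http://purl.org/dc/terms/audience")
audience.revise(TermVersion("decision-001", "Initial wording (placeholder)."))
audience.revise(TermVersion("decision-002", "Corrected wording (placeholder)."))
```

Both versions remain available for interpreting legacy metadata, while the URI used in records never changes.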
Modes for publishing a vocabulary • Multiple publication formats needed • Web pages for human use • RDF schemas for expressing relationships between terms in machine-processable form • OWL ontologies and rules languages will improve expressivity of these constructs • Future schemas may need to express versioning machine-processably • Workflow • Web pages and schemas from a common source • XML data + XSLT scripts – simple, effective
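The workflow above uses XML data and XSLT scripts. As a stand-in, the Python sketch below shows the same idea with plain string templates: one source list of terms rendered both as a Web page fragment and as a small RDF schema. The function names and the single sample term are assumptions, and the output is deliberately minimal.

```python
from html import escape

# Common source: one entry per term (a single sample term for illustration).
TERMS = [
    {
        "uri": "http://purl.org/dc/elements/1.1/title",
        "label": "Title",
        "comment": "A name given to the resource.",
    },
]


def to_html(terms):
    """Render the terms as an HTML definition list for human readers."""
    rows = "".join(
        f"<dt>{escape(t['label'])}</dt><dd>{escape(t['comment'])}</dd>" for t in terms
    )
    return f"<dl>{rows}</dl>"


def to_rdf_schema(terms):
    """Render the same terms as RDF/XML property declarations."""
    props = "".join(
        f'<rdf:Property rdf:about="{escape(t["uri"])}">'
        f"<rdfs:label>{escape(t['label'])}</rdfs:label>"
        f"<rdfs:comment>{escape(t['comment'])}</rdfs:comment>"
        "</rdf:Property>"
        for t in terms
    )
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">'
        f"{props}</rdf:RDF>"
    )
```

Because both outputs derive from the same list, a correction to a definition is made once and appears in both forms.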
A searchable "registry" of terms [50] • DCMI Registry • Searchable database of metadata terms • Terms translated into various languages • Goal: application interface for Web services • Goal: harvest schemas directly from their maintainers • An ecology of registries? • Harvest and merge element sets, vocabularies, profiles • For general overviews: SCHEMAS, CORES • Specific domains: MEG, GEM (education), FAO (agriculture) • Publication environment for information models • Tool for harmonization, mapping, conversion, merging
The Web as a new social context • Something new in history • Not just an historical set of technologies (HTTP, URLs, HTML) • Platform for historically unprecedented forms of social and intellectual interaction • Metadata as language for the Web • A language for statements about Web resources • Statements created and used both by humans and by machines • "Semantic Web" is about describing how resources relate to each other
Scale and automation • The Web is too big to control • Metadata statements are expensive to make and maintain • Shift away from the metaphor of "library"? • NSF workshop on "Post Digital Library Futures" • http://www.sis.pitt.edu/~dlwkshop/ • Automated resource discovery (e.g. Google) • Using contextual information (e.g., URL structures) to infer "aboutness" • Natural-language technology, e.g. summarization
An evolving role for metadata • Balance between human and machine • Automated methods to generate metadata • "Let Google do it" versus expert intervention • Granularity of metadata • Describe each item or entire collections? • How much metadata is "enough" to improve discovery? • Semantic precision or tolerance of fuzziness?
Which aspects of Dublin Core will prove most useful over time? • The elements and related sets of terms • Open processes for community standardization • Editorial review by a Usage Board • A bias toward simple and generic metadata • A bias toward cooperative re-use of vocabularies • The etiquette of mutual recognition • A namespace policy for using URIs • A typology of vocabularies (e.g. application profiles) • A set of encoding practices (HTML, XML, RDF) • Methods for maintaining and versioning a vocabulary • Publishing a vocabulary for humans and machines • Searchable registries of metadata terms