Metadata: an introduction
Michael Day
UKOLN, University of Bath
m.day@ukoln.ac.uk
Managing Networks: Understanding New Technologies, Birmingham, 13 September 2001
Presentation overview
• Defining “metadata”
• Dublin Core:
  • Background
  • Exercise 1
  • Semantics
  • Syntax
  • Content Rules
  • Exercise 2
Managing Networks, Birmingham, 13 September 2001
Metadata (1)
• Some definitions:
  • “data about data”
  • “Internet-age term for structured data about data” - Joint NSF-EU Working Group on Metadata (1998)
  • “... Machine understandable information about web resources or other things” - Berners-Lee (W3C)
• Functional definition:
  • structured data about resources that can be used to help support a wide range of operations
Metadata (2)
• These operations may include:
  • resource discovery and access
  • rights management
  • e-commerce
  • authentication
  • collection management
  • preservation
Metadata (3)
• Resource discovery metadata:
  • Provides support for:
    • searching
    • location
    • retrieval (delivery)
    • description
  • May help enable:
    • Semantic interoperability
Metadata (4)
• Where is metadata stored?
• Different models of metadata-resource association:
  • embedded within the resource
  • tightly coupled to the resource using protocols or identifiers
  • held in separate database(s)
Metadata formats (1)
• Diversity of metadata formats and frameworks
• How many have you heard of?
Metadata formats (1)
• Diversity of metadata formats and frameworks, e.g.:
  • Dublin Core
  • EAD, CIMI, TEI
  • PICS, RDF
  • MARC
  • GILS, FGDC
  • ROADS
• http://www.ukoln.ac.uk/metadata/glossary/
Metadata formats (2)
• SCHEMAS Forum project “Metadata Watch” has already identified:
  • Over 200 implementation activities
  • Around 90 standardisation activities
  • Very different levels of information about the various initiatives
Metadata formats (3)
• USMARC:
  245 00 Worldnews online $h [computer file].
  246 3  World news online
  256    Computer online service.
  260    Washington, D.C. : $b Worldnews Online, $c [1995-
  538    Mode of access: Internet.
  500    Title from title frame.
  520    “WorldNews OnLine is a service ...”
  650 0  Newspapers $x Databases.
  856 7  $u http://worldnews.net $2 http
Metadata formats (4)
• TEI header:
  • <teiHeader type="aacr2"><fileDesc><titleStmt>
  • <title type="245">Rubaiyat of Omar Khayyam : the astronomer poet of Persia / rendered into English verse by Edward Fitzgerald ; with drawings by Florence Lundborg</title>
  • <title type="gmd">[electronic resource]</title>
  • <author>Omar Khayyam</author>
  • <respStmt>
  • <resp>Conversion to TEI.2-conformant markup:</resp>
  • <name>University of Virginia Library Electronic Text Center</name>
  • </respStmt>
Metadata formats (5)
• ROADS/IAFA template:
  • Template-Type: SERVICE
  • Handle: 871473886-23884
  • Title: Wellcome Unit for the History of Medicine
  • URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html
  • Admin-Email-v1: wuhmo@wuhmo.ox.ac.uk
  • Publisher-Name-v1: Wellcome Unit for the History of Medicine
  • Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE
  • Publisher-City-v1: Oxford
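The ROADS/IAFA template above is a list of "Attribute: value" lines, so it is easy to read into a lookup table. The following is a minimal sketch in Python, not the ROADS software itself: the function name `parse_template` is hypothetical, and the sketch assumes one attribute per line, ignoring any continuation-line or variant handling a real parser may need.

```python
def parse_template(text: str) -> dict[str, str]:
    """Read 'Attribute: value' lines into a dict (hypothetical helper)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so URLs in the value stay intact.
        name, value = line.split(":", 1)
        record[name.strip()] = value.strip()
    return record


# A shortened copy of the record shown on the slide.
SAMPLE = """\
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html
Admin-Email-v1: wuhmo@wuhmo.ox.ac.uk
"""

record = parse_template(SAMPLE)
print(record["Template-Type"])
print(record["URI-v1"])
```

Splitting on the first colon only is the one design choice that matters here: the URI value itself contains a colon.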
A metadata typology
• [Diagram: metadata formats placed on a continuum from Simple to Rich]
• Based on: Dempsey and Heery (1998)
Who creates metadata?
• Resource creators:
  • authors
  • webmasters
  • institutions
• Service providers:
  • search services
  • third parties
  • commercial publishers
• hand crafted
• robot/database generated
Metadata creation tools
• DC-dot:
  • http://www.ukoln.ac.uk/metadata/dcdot/
• Nordic Metadata Project Metadata Template:
  • http://www.lub.lu.se/cgi-bin/nmdc.pl
• Reggie Metadata Editor:
  • http://metadata.net/dstc/
Aspects of metadata
• Syntax
  • related to the technical implementation - e.g. MARC, XML
• Semantics
  • the basic meaning of elements
• Rules for content
  • e.g., cataloguing rules
Dublin Core (1)
• What is it?
  • 15 element metadata set
  • based on international consensus
• Some initial assumptions:
  • simple set for untrained creators
  • basic set for semantic interoperability or resource discovery
  • primarily for Web-based document-like objects
• http://www.dublincore.org/
Dublin Core (2)
• Dublin Core Metadata Initiative
  • Workshop series
    • first workshop hosted by OCLC in Dublin, Ohio (1995)
    • 9th workshop (DC2001) will be held in October (Tokyo)
  • Working Groups
    • for DC issues (e.g. Architecture, Registry, Standards, Tools, etc.)
    • for specific user communities (e.g. Libraries, Education, Government, etc.)
  • open e-mail discussion lists
Dublin Core (3)
• Dublin Core Metadata Element Set:
  • Version 1.0 (RFC 2413, 1998)
  • Version 1.1 (1999)
  • approved (Z39.85) by the US National Information Standards Organization (NISO) as a Draft American National Standard (July 2001)
• Dublin Core Qualifiers:
  • DCMI Recommendation (2000)
DC exercise 1
• The Dublin Core Metadata Element Set consists of 15 elements, designed for simple resource discovery.
• What elements do you think should be part of such a metadata element set?
• Think about the type of resources that need to be described:
  • Web pages
  • Document-like objects
  • Images, sound resources, etc.
  • Multimedia resources
DC semantics (1)
• 15 element core metadata set:
  • Title, Subject, Description, Creator, Publisher
  • Contributor, Date, Type, Format, Identifier
  • Source, Language, Relation, Coverage, Rights
DC semantics (2)
• An example:
  • Name: Description
  • Identifier: Description
  • Definition: An account of the content of the resource.
  • Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
DC semantics (3)
• Qualifiers:
  • DC semantics are defined very broadly
  • Possible to add qualifiers to some elements:
    • Element refinement(s):
      • Relation.IsPartOf
      • Date.Created
    • Encoding scheme(s):
      • Subject (scheme=DDC)
      • Date (scheme=ISO8601)
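The two kinds of qualifier can be told apart mechanically. The sketch below, in Python, splits the notations used on this slide ("Element.Refinement" and "Element (scheme=NAME)") into their parts and checks the element against the 15-element set. The names `QualifiedName`, `parse_qualified` and `base_element` are hypothetical, and the pattern only covers the slide's notation, not every encoding of qualified Dublin Core.

```python
import re
from typing import NamedTuple, Optional

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "Title", "Subject", "Description", "Creator", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


class QualifiedName(NamedTuple):
    element: str
    refinement: Optional[str]  # e.g. "IsPartOf" in Relation.IsPartOf
    scheme: Optional[str]      # e.g. "DDC" in Subject (scheme=DDC)


# Assumed notation: Element[.Refinement][ (scheme=NAME)]
_PATTERN = re.compile(
    r"^(?P<element>\w+)"
    r"(?:\.(?P<refinement>\w+))?"
    r"(?:\s*\(scheme=(?P<scheme>[\w-]+)\))?$"
)


def parse_qualified(name: str) -> QualifiedName:
    """Split a qualified name into element, refinement and scheme."""
    match = _PATTERN.match(name.strip())
    if match is None or match["element"] not in DC_ELEMENTS:
        raise ValueError(f"not a recognised Dublin Core name: {name!r}")
    return QualifiedName(match["element"], match["refinement"], match["scheme"])


def base_element(name: str) -> str:
    """Ignore any qualifier and return the unqualified element."""
    return parse_qualified(name).element


for example in ("Relation.IsPartOf", "Date.Created",
                "Subject (scheme=DDC)", "Date (scheme=ISO8601)"):
    print(example, "->", parse_qualified(example))
```

A service that does not understand a given qualifier can still call `base_element` and treat the value as belonging to the broad, unqualified element.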
DC syntax (1)
• Can be embedded into HTML Web pages:
  • <META> tag
  • limited functionality
  • the data can be “harvested” by metadata-aware search engines (but not many do this)
  • note that this is just one way of implementing the DC element set
DC syntax (2)
• An example of embedding DC metadata in HTML 4.0:
  • <html><head>
  • <title>UKOLN Home Page</title>
  • <meta name="DC.Title" content="UKOLN">
  • <meta name="DC.Description" content="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">
  • <meta name="DC.Creator" content="UKOLN Information Services Group">
  • </head>
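To show what “harvesting” embedded metadata might look like, here is a minimal sketch using only Python's standard `html.parser` module. It collects every `<meta>` tag whose name starts with "DC." and ignores the rest. The class name `DCMetaHarvester` is hypothetical, the page is a shortened copy of the example above, and the "keywords" tag was added purely to show that non-DC tags are skipped.

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect (element, value) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META> works too.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            self.records.append((name[3:], content))  # drop the "DC." prefix


PAGE = """<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN">
<meta name="DC.Creator" content="UKOLN Information Services Group">
<meta name="keywords" content="not Dublin Core">
</head></html>"""

harvester = DCMetaHarvester()
harvester.feed(PAGE)
for element, value in harvester.records:
    print(f"{element}: {value}")
```

The harvester keeps the pairs as a list rather than a dict because a Dublin Core element may legitimately appear more than once on a page.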
DC content rules
• Not part of DCMI:
  • No content rules (cataloguing rules) defined as part of Dublin Core Metadata Element Set
  • May be important where there are expectations of consistent cross-searching across related services, e.g.:
    • ROADS Cataloguing Guidelines
    • Resource Discovery Network (RDN) Cataloguing Guidelines
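A content rule governs how a value is written, not which element it goes in. As an illustration only, the Python sketch below applies one hypothetical rule, entering personal names in inverted form, so that two services would store the same Creator string. The function `invert_personal_name` is an assumption made for this example and is not taken from the ROADS or RDN guidelines. It is deliberately naive: it treats the last word as the surname, which is wrong for many real names.

```python
def invert_personal_name(name: str) -> str:
    """Return 'Surname, Forenames' for a simple 'Forenames Surname' string.

    Hypothetical rule for illustration: the last word is assumed to be
    the surname. Names that are already inverted, or that consist of a
    single word, are returned unchanged.
    """
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name or " " not in name:
        return name
    forenames, surname = name.rsplit(" ", 1)
    return f"{surname}, {forenames}"


for raw in ("Michael Day", "Day, Michael", "UKOLN"):
    print(repr(raw), "->", repr(invert_personal_name(raw)))
```

Without an agreed rule of this kind, "Michael Day" and "Day, Michael" would be indexed as different creators when records from related services are cross-searched.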
DC exercise 2
• Go to the Nordic Metadata Template at:
  • http://www.lub.lu.se/cgi-bin/nmdc.pl
• And try to create some metadata for a Web page that you know reasonably well
• Reflect on:
  • Which bits are difficult to fill in
  • Which parts relate to semantics, which to content rules (e.g. inverted forms of names)
Acknowledgements
• UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
• http://www.ukoln.ac.uk/