Approaches To The Validation Of Dublin Core Metadata Embedded In (X)HTML Documents

The Problem

Many (X)HTML document creators limit their "validation" to checking the presentation of their documents in Web browsers. Even where authors do use (X)HTML syntax validators, such tools do not check that embedded metadata conforms to the conventions recommended by DCMI. Furthermore, to be really useful to the metadata creator, a validation process should check the metadata against the specific requirements of the service that will use that metadata (an "application profile").

Background

The Dublin Core Metadata Element Set is a simple set of metadata elements used for resource discovery. It has been widely adopted in digital library applications. One simple mechanism for deploying DC metadata is to embed it in (X)HTML documents, following conventions recommended by DCMI.
A Simple Approach To Validation

• Use of DC-dot
  • DC-dot is a popular Web-based tool for creating and managing Dublin Core metadata. DC-dot can also be used to carry out simple validation of Dublin Core metadata embedded in HTML resources.
• Limitations of DC-dot
  • DC-dot has some limitations:
    • It was not designed primarily as a validation tool
    • It performs only basic validation
    • It validates against a single set of rules

(Figure: The DC-dot Tool)

• Survey Findings
  • Use of DC-dot across a digital library programme showed that the entry points contained various errors in the representation of Dublin Core:
    • Use of DC.Author rather than DC.Creator
    • Incorrect format of date field
    • Incorrect use of delimiters
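The first two kinds of error found in the survey can be detected with a few lines of code. The following is a minimal Python sketch, not part of DC-dot: the names `DCMetaParser`, `check_dc_meta` and `DATE_RE` are hypothetical, and the date rule assumes that dates are expected in the W3CDTF-style form YYYY, YYYY-MM or YYYY-MM-DD. The delimiter check is left out because the survey does not say which delimiters were misused.

```python
from html.parser import HTMLParser
import re


class DCMetaParser(HTMLParser):
    """Collect (name, content, scheme) for <meta> elements whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> elements.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.records.append(
                (name, attributes.get("content") or "", attributes.get("scheme"))
            )


# Assumed date rule: YYYY, YYYY-MM or YYYY-MM-DD (time parts are not handled here).
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def check_dc_meta(html):
    """Return a list of human-readable problems found in the embedded DC metadata."""
    parser = DCMetaParser()
    parser.feed(html)
    problems = []
    for name, content, _scheme in parser.records:
        if name.lower() == "dc.author":
            problems.append("DC.Author used; expected DC.Creator")
        if name.lower() == "dc.date" and not DATE_RE.match(content):
            problems.append(f"DC.Date value {content!r} is not in YYYY[-MM[-DD]] form")
    return problems


sample = (
    '<html><head>'
    '<meta name="DC.Author" content="A. Person">'
    '<meta name="DC.Date" content="12/03/2003">'
    '<meta name="DC.Title" content="Example page">'
    '</head></html>'
)
for problem in check_dc_meta(sample):
    print(problem)
```

A check like this only covers a single fixed set of rules, which is the same limitation noted for DC-dot above.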
Using An RDF Validator

• Use of An RDF Validator
  • An alternative approach used W3C's online Dublin Core to RDF XSLT transformation service together with the RDF validator. Several online services were chained together:
    • Tidy, to convert the project home page to XHTML format
    • The Dublin Core to RDF XSLT transformation service, to convert embedded Dublin Core elements to RDF/XML
    • The RDF validation service, to validate the RDF/XML

(Figure: The RDF Validator Tool)

• Comments
  • This approach helped by providing a visual display of the Dublin Core metadata. It was noticed, for example, that one page contained an invalid identifier: <http:/www.foo.ac.uk/...> rather than <http://www.foo.ac.uk/...>
  • However, since the RDF validation service has no understanding of the semantics of the Dublin Core metadata, this approach has its limitations.
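The malformed identifier noted above was spotted by eye in the visual display, but the same slip can be caught mechanically. The following is a minimal Python sketch, assuming identifiers are expected to be absolute HTTP URIs; the function name `looks_like_http_uri` is hypothetical, and the example values use only the host from the poster because the original paths are elided.

```python
from urllib.parse import urlparse


def looks_like_http_uri(value):
    """True if value has an http(s) scheme and a non-empty host part."""
    parts = urlparse(value)
    # With a single slash after the scheme, the host ends up in the path
    # and netloc is empty, so the check fails.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_http_uri("http:/www.foo.ac.uk/"))   # single slash
print(looks_like_http_uri("http://www.foo.ac.uk/"))  # well formed
```

This remains a purely syntactic check: like the RDF validator, it says nothing about whether the identifier is the right one for the resource.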
dcmeta: An XSLT Approach

• Use of XSLT
  • We have employed XSLT to provide validation of Dublin Core metadata embedded in (X)HTML resources.
  • The dcmeta XSLT stylesheet:
    • Creates a report on the embedded DC metadata
    • Checks that general conventions for DC metadata are followed
    • Checks the metadata against a specified "application profile" of the DC Metadata Element Set
  • The profile is a set of rules which specify:
    • Permitted DC properties (e.g. only the 15 DC elements are allowed)
    • Minimum/maximum permitted occurrences of a specified property (e.g. only one occurrence of DC.Title permitted)
    • Permitted encoding schemes (e.g. DC.Subject properties should have the scheme "LCSH")
    • Permitted values (e.g. DC.Publisher must have the value "UKOLN")
  • These rules are described in a secondary XML document read by the stylesheet.

(Figure: The dcmeta Tool)
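The four kinds of profile rule listed above can be illustrated outside XSLT. The following is a minimal Python sketch of profile-driven checking; it is not the dcmeta stylesheet, and the `PROFILE` dictionary is a hypothetical stand-in for the XML profile document, whose real structure is described at the dcmeta URL rather than here. The function name `validate_against_profile` and the record layout (name, content, scheme) are also assumptions. Property names are compared exactly as written.

```python
from collections import Counter

# Hypothetical profile: one entry per permitted property, with optional rules.
PROFILE = {
    "DC.Title": {"min": 1, "max": 1},        # exactly one title
    "DC.Subject": {"schemes": {"LCSH"}},     # permitted encoding schemes
    "DC.Publisher": {"values": {"UKOLN"}},   # permitted values
    "DC.Creator": {},                        # permitted, no further rules
}


def validate_against_profile(records, profile):
    """Check (name, content, scheme) records against the profile; return problems."""
    problems = []
    counts = Counter(name for name, _content, _scheme in records)

    for name, content, scheme in records:
        rule = profile.get(name)
        if rule is None:
            problems.append(f"{name}: property not permitted by the profile")
            continue
        if "schemes" in rule and scheme not in rule["schemes"]:
            problems.append(f"{name}: scheme {scheme!r} not permitted")
        if "values" in rule and content not in rule["values"]:
            problems.append(f"{name}: value {content!r} not permitted")

    # Occurrence limits are checked once per property, not once per record.
    for name, rule in profile.items():
        found = counts[name]
        if found < rule.get("min", 0):
            problems.append(f"{name}: {found} occurrence(s), at least {rule['min']} required")
        if "max" in rule and found > rule["max"]:
            problems.append(f"{name}: {found} occurrence(s), at most {rule['max']} permitted")

    return problems


records = [
    ("DC.Title", "First title", None),
    ("DC.Title", "Second title", None),
    ("DC.Subject", "Metadata", "DDC"),
    ("DC.Publisher", "Another publisher", None),
    ("DC.Author", "A. Person", None),
]
for problem in validate_against_profile(records, PROFILE):
    print(problem)
```

Keeping the rules in data rather than in code mirrors the design described above, where the stylesheet stays fixed and each service supplies its own profile document.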
Conclusions

• Deployment
  • The stylesheet can be deployed using any XSLT engine, e.g.:
    • Using a JavaScript bookmarklet to apply the transformation in a browser with a built-in XSLT engine (e.g. IE/MSXML)
    • As an online service using a server-side transformation
    • Run from the command line
• Summary
  • This poster summarises a number of approaches to validating Dublin Core metadata embedded in HTML resources.
  • The poster reports on initial work in the development of an XSLT-based tool which can be used for validation of Dublin Core metadata.
• Further Details
  • The stylesheet is available, together with details of the structure of the "profile" document, at <http://www.ukoln.ac.uk/metadata/dcmeta/>
  • For further information please contact Pete Johnston at the email address <P.Johnston@ukoln.ac.uk>