Update on Relevant Standards for Terminology Interoperability Alistair Miles ECOTERM Berlin 2005
Themes • Identity • Construction • Data Model • Publication • Services • Subtle network of dependency between choices from different themes
1. Identity • Theme 1: Identity
1. Identity - Relevance • N.B. ECOTERM members are a community within a global information environment • When choosing standards and designing systems for interoperability, we should anticipate and allow for … • Unlimited expansion • Novel interaction • A standard mechanism for identifying things independent of context is key to scalable interoperability
1. Identity - URI • Principles of good URI use • Architecture of the WWW Volume One • W3C Recommendation • http://www.w3.org/TR/webarch/ • Managing URIs • SWBPD-WG Vocabulary Management TF • To be published soon (hopefully)
2. Construction • Theme 2: Construction
2. Construction • Standards for building a ‘good’ terminological/conceptual thing … • Relevant because if two things are constructed according to the same basic principles, interoperability between them is likely to improve • Two new proposed standards … • BS 8723 • ANSI Z39.19 • Also … • Ontology Engineering Patterns TF (SWBPD-WG)
BS 8723 • BS 8723 “Structured vocabularies for information retrieval” • Intended to replace • ISO 2788-1986 Guidelines for the establishment and development of monolingual thesauri = BS 5723:1987 • ISO 5964-1985 Guidelines for the establishment and development of multilingual thesauri = BS 6723:1985 • Parts relevant to construction … • Part 2: thesauri • Part 3: other types of vocabulary • (Other parts …) • Part 1: glossary • Part 4: mapping • Part 5: “interoperability with applications”
BS 8723 • BS 8723 status • Parts 1 & 2 public drafts (hope to finalise < 3 months) • Part 4 circulated • Part 3 & 5 TODO
ANSI Z39.19 • ANSI Z39.19-200x • “Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies” • Ballot period April 11 – May 25, 2005
Abstract: This Standard presents guidelines and conventions for the contents, display, construction, testing, maintenance, and management of monolingual controlled vocabularies. This Standard focuses on controlled vocabularies that are used for the representation of content objects in knowledge organization systems including lists, synonym rings, taxonomies, and thesauri. This Standard should be regarded as a set of recommendations based on preferred techniques and procedures. Optional procedures are, however, sometimes described, e.g., for the display of terms in a controlled vocabulary. The primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval. Vocabulary control is accomplished by three principal methods: defining the scope, or meaning, of terms; using the equivalence relationship to link synonymous and nearly synonymous terms; and distinguishing among homographs.
OEP • Ontology Engineering Patterns TF • OWL • Published various documents: • Classes as values • N-ary relations • Value partitions/value sets • Simple part-whole relations • Qualified cardinality constraints http://www.w3.org/2001/sw/BestPractices/OEP/
Wikis • Construction methodology … • Wiki based, e.g. GEMET? • Wikipedia • Decentralised development paradigm • SkosWiki
3. Data Model • Theme 3: Data Model
3. Data Models - Relevance • Data model influences design of • data publication formats • programmatic interfaces • Data model also determines management and maintenance strategies and procedures • Data model often implicit in a standard, not formally specified … • E.g. BS 8723, ANSI Z39.19 • Some ‘standards’ to consider … • ANSI Z39.19, BS 8723 • SKOS Core • OWL • TMF • Topic Maps
3. Data Model • ‘Traditional thesaurus’ … • I.e. the model described in prose by BS 8723, ANSI Z39.19 • Term-oriented: only terms are reified, concepts are implicit
3. Data Model • SKOS Core • Explicitly concept-oriented data model • Specified formally using RDFS/OWL
3. Data Model • Terminology Markup Framework (TMF, ISO 16642) • (ISO TC 37) • Concept-oriented • Specified using UML • Really a framework for designing XML formats
3. Data Model • OWL • Class/instance-oriented model (“ontologies”) • Formally specified via model-theoretic semantics
3. Data Model • ISO Topic Maps • Topics, Associations, Occurrences, Roles …
3. Data Models - Comments • ANSI Z39.19, BS 8723, SKOS Core, TMF broadly aligned … • (I.e. term-oriented (T-O) vs. concept-oriented (C-O) models are not necessarily at odds) • … however N.B. the choice of a T-O vs. C-O model as foundation has implications for management/maintenance and URI use. • Relationship to ontologies (OWL) … ? • Topic maps … ?
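To make the term-oriented vs. concept-oriented distinction concrete, here is a minimal sketch in Python. The class names `Term` and `Concept`, the helper `to_concepts`, and the `example.org` base URI are all hypothetical, introduced only for illustration; none of them come from BS 8723, ANSI Z39.19 or SKOS Core. It shows how a term-oriented record set can be mapped onto concept records, which is one reading of "broadly aligned".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """Term-oriented record: the concept is only implicit in USE/UF links."""
    label: str
    use: Optional[str] = None                           # set on non-preferred terms
    used_for: List[str] = field(default_factory=list)   # UF: non-preferred synonyms
    broader: List[str] = field(default_factory=list)    # BT: labels of broader terms


@dataclass
class Concept:
    """Concept-oriented record: the concept is the identified thing."""
    uri: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)    # URIs of broader concepts


def to_concepts(terms: List[Term], base_uri: str) -> Dict[str, Concept]:
    """Map each preferred term to a Concept; non-preferred terms become alt labels."""

    def mint(label: str) -> str:
        # Hypothetical URI scheme, for illustration only. Deriving URIs from
        # labels ties identity to wording, which a real policy may want to avoid.
        return base_uri + label.lower().replace(" ", "-")

    concepts: Dict[str, Concept] = {}
    for term in terms:
        if term.use is None:  # only preferred terms stand for a concept
            concepts[term.label] = Concept(
                uri=mint(term.label),
                pref_label=term.label,
                alt_labels=list(term.used_for),
                broader=[mint(b) for b in term.broader],
            )
    return concepts
```

The label-derived URI in `mint` illustrates the point about URI use: in a term-oriented model a change of wording threatens the identifier, whereas a concept-oriented model can keep the URI stable and simply change the label.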
4. Publication • Theme 4: Publication
4. Publication • General principle: publish data • XML • Thesaurus tradition (e.g. Zthes …) • Terminology (i.e. TC 37) tradition (e.g. TBX, an implementation of TMF) • Topic maps tradition … XTM • RDF • SKOS Core • RDFS/OWL
OWL • OWL • Based on RDF • Ontologies for the web • W3C recommendation
SKOS Core • SKOS Core • Based on RDF • ‘Concept schemes’ for the web • working draft
4. Publication - comments • N.B. XTM and anything based on RDF (i.e. SKOS Core, OWL) are designed for a distributed information environment • I.e. they use URIs and support data linking and merging • The others are not (they are designed for point-to-point transfer)
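A small sketch of why URI-based formats support merging: if data are held as (subject, predicate, object) statements keyed by URI, combining two independently published sources is a set union. The function names and the `example.org` URIs below are hypothetical; the SKOS Core namespace URI is the real one. The sketch deliberately ignores RDF blank nodes, which need extra care when merging real graphs.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def merge_graphs(*graphs):
    """Union of triple sets; statements about the same URI combine automatically."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def describe(graph, subject):
    """Return all (predicate, object) pairs known about one subject URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


# Two sources that describe the same concept URI (illustrative data only).
source_a = {
    ("http://example.org/concept/pollution", SKOS + "prefLabel", "pollution"),
}
source_b = {
    ("http://example.org/concept/pollution", SKOS + "altLabel", "contamination"),
    ("http://example.org/concept/air-pollution", SKOS + "broader",
     "http://example.org/concept/pollution"),
}

merged = merge_graphs(source_a, source_b)
```

After the merge, `describe(merged, "http://example.org/concept/pollution")` returns statements from both sources, with no format-specific mapping step between them.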
5. Services • Theme 5: Services
5. Services • Service Oriented Architecture (SOA) • Important software engineering paradigm • Web services • WSDL, SOAP (W3C) • Web service interface to conceptual/terminological resource … • Lots of individual projects (EOS, UNEP.Net …) • Standardisation initiatives … • Some recent work on implementing the SKOS API … • … but still plenty of issues. • OMG lexicon query service? • ANSI terminology services API? • XMDR? • Other attempts to build consensus?
5. Services - comments • N.B. • SOA & WS: an architecture for distributing programmatic components • Semantic Web machinery: an architecture for distributing data • The two are complementary • E.g. a service that provides efficient programmatic access to an aggregation of data harvested from multiple published sources
Presenter’s suggestions … Steps towards interoperability: • Use URIs • Use URIs well (consistency, persistence, articulated management & maintenance policies etc.) • Be clear about what a URI identifies: concepts? classes? something else? • Publish RDF descriptions of the things you are identifying • Use ‘well-established’ RDF vocabularies as far as possible, extend as appropriate • Make commitments to maintaining RDF descriptions, and publish maintenance policies and procedures • I.e. establish a semantic web of terminological / conceptual / ontological data relating to the environment.
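As an illustration of "publish RDF descriptions of the things you are identifying", the following sketch writes a description of one concept in the N-Triples syntax using only the Python standard library. The helper names and the `example.org` concept URIs are hypothetical; the RDF and SKOS Core namespace URIs are real. In practice an RDF library would normally handle serialisation; this is only meant to show how little is needed for a minimal description.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang=None):
    """Quote a plain literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def concept_ntriples(concept_uri, pref_labels, broader=()):
    """Serialise one concept: its type, labels per language, broader links."""
    subject = uri(concept_uri)
    lines = [f"{subject} {uri(RDF_TYPE)} {uri(SKOS + 'Concept')} ."]
    for lang, label in sorted(pref_labels.items()):
        lines.append(f"{subject} {uri(SKOS + 'prefLabel')} {literal(label, lang)} .")
    for broader_uri in broader:
        lines.append(f"{subject} {uri(SKOS + 'broader')} {uri(broader_uri)} .")
    return "\n".join(lines) + "\n"


document = concept_ntriples(
    "http://example.org/concept/air-pollution",        # hypothetical URI
    {"en": "air pollution", "de": "Luftverschmutzung"},
    broader=["http://example.org/concept/pollution"],  # hypothetical URI
)
```

The resulting `document` string has one statement per line and could be served at, or linked from, the concept's own URI.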
Presenter’s suggestions … Services? • With a web of data in place, the role of services becomes extremely clear: to provide efficient, convenient, programmatic interaction with subsets of that web of data. • Build a service-oriented architecture on top of a web of data • Steps to achieving this: • Analyse and publish functional requirements for service types (i.e. what do I want a service to do?) • Study functional requirements within this community, identify and publish sets of common requirements (basis for standardisation work) • Take a look at SPARQL services & protocol (very cheap) • N.B. true standardisation of a service interface requires a (lightweight, informal, open) community-driven process for developing a web-service specification and building consensus
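To suggest why a SPARQL service can be a cheap starting point, here is a sketch of a "narrower concepts" lookup expressed as a SPARQL query and encoded as an HTTP GET URL with the query in a `query` parameter, the form used by the SPARQL protocol. The endpoint `http://example.org/sparql`, the concept URI, and both function names are assumptions for illustration; no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def narrower_query(concept_uri):
    """Build a SPARQL SELECT for concepts whose skos:broader is the given URI."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?narrower ?label WHERE {\n"
        f"  ?narrower skos:broader <{concept_uri}> ;\n"
        "            skos:prefLabel ?label .\n"
        "}"
    )


def sparql_get_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = narrower_query("http://example.org/concept/pollution")  # hypothetical URI
url = sparql_get_url("http://example.org/sparql", query)        # hypothetical endpoint
```

A functional requirement such as "give me the narrower concepts of X" then becomes a query template over published data, and a dedicated service interface is only needed where this generic route is too slow or too inconvenient.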