Crossing the boundaries: interoperability between vocabularies Stella G Dextre Clarke Senior Metadata Consultant, Bridgeman Art Library; Independent Consultant
Summary • Interoperability: • At the metadata schema level • At the vocabulary level • Practicalities of vocabulary mapping • Interoperability at the data exchange level • Standards to help us through the maze
In a networked world, interoperability is all the rage • CIDOC-CRM • Web 2.0 • Mash-ups • Semantic Web (well, not quite with us yet, but said to be coming shortly) … and it’s not just about Museum A sharing with Gallery B
How to achieve interoperability? • Step 1: apply a metadata schema consistently to all your records and export via a standard metadata format • Step 2: implement a metadata cross-walk e.g. The Getty crosswalk at http://www.getty.edu/research/conducting_research/standards/intrometadata/metadata_element_sets.html • So far so good – it’s not so difficult
But interoperability needs to apply at two levels • Between metadata schemas, e.g.: Artist → Creator → Maker; Location → Place → Coverage.spatial; Keywords → Subject • Between vocabulary terms, e.g.: rowing boats → rowboats → pulling boats; gramophone records → phonograph records; garments → clothes → clothing
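The two levels can be sketched in a few lines of Python. This is only an illustration: the element and term pairs are taken from the slide, while the dictionary layout, the function name `translate_record` and the sample record are assumptions made for the example.

```python
# Hypothetical crosswalk tables. The pairs come from the slide; the
# dictionary layout is an assumption made for illustration only.
SCHEMA_CROSSWALK = {"Artist": "Creator", "Location": "Place", "Keywords": "Subject"}
TERM_CROSSWALK = {
    "rowing boats": "rowboats",
    "gramophone records": "phonograph records",
    "garments": "clothes",
}


def translate_record(record):
    """Rename elements via the schema crosswalk, then map values via the term crosswalk."""
    translated = {}
    for element, values in record.items():
        # Level 1: metadata schema (element names). Unknown elements pass through.
        target_element = SCHEMA_CROSSWALK.get(element, element)
        # Level 2: vocabulary terms (element values). Unknown terms pass through.
        translated[target_element] = [TERM_CROSSWALK.get(v, v) for v in values]
    return translated


# A made-up record, not real catalogue data.
record = {"Artist": ["Example Artist"], "Keywords": ["rowing boats", "garments"]}
print(translate_record(record))
```

Mapping only the element names would leave "rowing boats" untouched under "Subject", which is why the second level matters.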
How to achieve interoperability at the vocabulary level? • Step 1: apply a controlled vocabulary consistently to all your records • Step 2: implement a vocabulary cross-walk (a.k.a. set of mappings) • But ready-made crosswalks are not so easy to find; you may have to build your own, and it can be a long job…
Building the mappings – an easy example • Vocabulary A: Churches • Vocabulary B: Churches
Look a little closer. Is it so easy? • Vocabulary A: Churches NT Byzantine churches; Gothic churches; Norman churches • Vocabulary B: Churches NT Anglican church; Protestant church; Roman catholic church
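One way to catch this kind of false friend is to compare the narrower terms (NT) under each shared label. The sketch below assumes that Vocabulary A lists the architectural styles and Vocabulary B the denominations, which is one reading of the slide; the data structure and function name are hypothetical.

```python
# Each vocabulary is modelled as {term: set of narrower terms}. The split of
# narrower terms between A and B is an assumed reading of the slide.
VOCAB_A = {"Churches": {"Byzantine churches", "Gothic churches", "Norman churches"}}
VOCAB_B = {"Churches": {"Anglican church", "Protestant church", "Roman catholic church"}}


def shared_labels_with_different_children(vocab_a, vocab_b):
    """Return labels present in both vocabularies whose narrower terms differ."""
    flagged = []
    for label in sorted(vocab_a.keys() & vocab_b.keys()):
        if vocab_a[label] != vocab_b[label]:
            flagged.append(label)
    return flagged


print(shared_labels_with_different_children(VOCAB_A, VOCAB_B))
```

A flagged label is not necessarily a bad match; it is a prompt for a person to look at the hierarchy before accepting the mapping.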
Another example: compare 5 different vocabularies Look for the concept “schools” in the following: • IPSV (UK public sector) • AAT (art/architecture) • GEMET (environmental) • ERIC (education) • MeSH (medical)
URLs for those vocabularies • IPSV http://www.esd.org.uk/standards/ipsv/ • AAT http://www.getty.edu/research/conducting_research/vocabularies/aat/ • GEMET http://www.eionet.europa.eu/gemet • ERIC http://www.eric.ed.gov/ • MeSH http://www.nlm.nih.gov/mesh/
Typical differences between vocabularies • Different term for the same concept (and same term can signify a different concept) • Hierarchical structure around the concept • Scope note, definition, synonyms and other attributes of a term/concept • Concepts designated by terms or by codes or notation • Language of access (e.g. French, German) • Layout and format
More practicalities: two-way versus one-way mappings [Diagram: mappings among the terms Poultry, Parrots, Chickens, Canaries, Birds, Ducks, Budgies and Geese, drawn from Vocabulary 1, Vocabulary 2 and Vocabulary 3]
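The difference between one-way and two-way mappings can be illustrated with a small sketch. It assumes, for the sake of the example, that the specific bird terms on the slide each map to the broader term "Birds"; the original diagram may group them differently, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed one-way mappings from specific terms to the broader term "Birds".
ONE_WAY = {
    "Chickens": "Birds", "Ducks": "Birds", "Geese": "Birds",
    "Parrots": "Birds", "Canaries": "Birds", "Budgies": "Birds",
}


def invert(mapping):
    """Reverse a mapping, collecting every source term that shares a target."""
    reverse = defaultdict(list)
    for source, target in mapping.items():
        reverse[target].append(source)
    return dict(reverse)


def is_two_way_safe(mapping):
    """A mapping can be reversed safely only if each target has exactly one source."""
    return all(len(sources) == 1 for sources in invert(mapping).values())


print(invert(ONE_WAY)["Birds"])   # six candidate terms for a single target
print(is_two_way_safe(ONE_WAY))
```

Going from "Chickens" to "Birds" loses detail but is still usable for retrieval; going back from "Birds" offers six candidates and no basis for choosing one, so the mapping works in one direction only.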
More practicalities – planning the architecture [Diagram: nodes A, B, F, C, D, H, E, G]
Or some people do chain mapping… [Diagram: nodes A, B, F, C, D, H, E, G, P, Q, R, S]
But what happens with chain mapping? • buses → coaches; coaches → trainers; trainers → training shoes • Job vacancies → jobs; Jobs → posts; Posts → post; post → mail • Timber → wood; Wood → woods; Woods → forests • Firewood → logs; Logs → records; Records → archives Any one of the mappings could be OK in one context, but not when chained. Most howlers can be avoided, but only if you check carefully.
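The drift is easy to reproduce. The sketch below treats each step of the first chain as a separate crosswalk (an assumption about how the steps would be stored) and follows a term through them; `follow_chain` is a hypothetical helper.

```python
# Each dictionary stands for one crosswalk between a pair of vocabularies.
# The term pairs come from the slide; storing them this way is an assumption.
CHAIN = [
    {"buses": "coaches"},
    {"coaches": "trainers"},
    {"trainers": "training shoes"},
]


def follow_chain(term, crosswalks):
    """Apply each crosswalk in turn and return the full path the term takes."""
    path = [term]
    for crosswalk in crosswalks:
        if path[-1] not in crosswalk:
            break  # no mapping at this step, so the chain stops here
        path.append(crosswalk[path[-1]])
    return path


print(" -> ".join(follow_chain("buses", CHAIN)))
# buses -> coaches -> trainers -> training shoes
```

Every individual step is defensible, yet the composed result maps vehicles to footwear. Keeping the full path, as the helper does, at least makes each hop visible to a human checker.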
So best avoided… [Diagram: nodes A, B, F, C, D, H, E, G, P, Q, R, S]
A bit of practical reasoning • You can’t rely on a computer to do the matching • But it’s such a huge job, you can’t do it without a computer! • Ergo, use a computer to suggest matches, but do a human check on each one
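A minimal sketch of that division of labour, using fuzzy string matching from Python's standard `difflib` module to propose candidates and a callback to stand in for the human check. The function names, the `approve` callback and the sample terms are assumptions for illustration; string similarity is only one possible way to generate suggestions.

```python
import difflib


def suggest_matches(source_terms, target_terms, cutoff=0.6):
    """Machine step: propose up to three similar-looking target terms per source term."""
    suggestions = {}
    for term in source_terms:
        suggestions[term] = difflib.get_close_matches(
            term, target_terms, n=3, cutoff=cutoff
        )
    return suggestions


def review(suggestions, approve):
    """Human step: keep the first candidate that the reviewer approves, if any."""
    accepted = {}
    for term, candidates in suggestions.items():
        for candidate in candidates:
            if approve(term, candidate):
                accepted[term] = candidate
                break
    return accepted


suggestions = suggest_matches(["churches", "logs"], ["churches", "records"])
print(suggestions)

# Stand-in for a person: here every suggestion is rejected pending a real check.
print(review(suggestions, approve=lambda source, candidate: False))
```

Note that the machine step happily proposes "churches" for "churches", and finds nothing for "logs"; both outcomes still need a person, since identical strings can name different concepts and dissimilar strings can name the same one.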
One more practical need for interoperability • Data exchange between vocabularies and the computer applications that exploit them • Either for importing a vocabulary into an application (e.g. into a search engine or a cataloguing package) • Or to allow online interrogation of a vocabulary by a searching or indexing application • What we need are standard formats and protocols
So what standards do we have? • ISO 2788, ISO 5964 and national equivalents • ANSI/NISO Z39.19 • SKOS, Zthes, ADL, MARC, SRW/SRU • BS 8723 • ISO NP 25964
Vocabulary construction and management • ISO 2788-1986 Guidelines for the establishment and development of monolingual thesauri = BS 5723:1987 and other national standards • ISO 5964-1985 Guidelines for the establishment and development of multilingual thesauri = BS 6723:1985 and other national standards • ANSI/NISO Z39.19-2005 Guidelines for the construction, format and management of monolingual controlled vocabularies
Vocabulary data formats only • Simple Knowledge Organization System (SKOS) format is in RDF/XML and intended for the Semantic Web. http://www.w3.org/2004/02/skos/ • Zthes – an application profile of Z39.50, for exchange of thesaurus data. http://zthes.z3950.org/ • MARC has a format for “authority records”, suitable for library applications. http://www.loc.gov/marc/authority/
Vocabulary data protocols only • SKOS API designed for live querying of vocabularies on the Web. http://www.w3.org/2001/sw/Europe/reports/thes/skosapi.html • ADL Thesaurus Protocol for querying and navigation around monolingual thesauri on the Web. http://www.alexandria.ucsb.edu/thesaurus/specification.html • SRW/SRU (Search/Retrieve Web Service and Search/Retrieve via URL) is for a variety of search types, not just vocabularies. http://www.loc.gov/standards/sru/
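To give a flavour of what online interrogation looks like, here is a sketch that builds an SRU `searchRetrieve` request URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) are those of SRU 1.1; the base URL is a placeholder, and whether a given vocabulary server accepts such a query is an assumption.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, term, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a simple one-term query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": term,  # a bare term is the simplest form of CQL query
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


# example.org is a placeholder, not a real vocabulary service.
print(build_sru_url("http://example.org/sru", "schools"))
```

The server would reply with an XML document of matching records, which the searching or indexing application then has to parse.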
Vocabulary construction and management + interoperability BS 8723: Structured vocabularies for information retrieval – Guide • Part 1: Definitions, symbols and abbreviations • Part 2: Thesauri • Part 3: Vocabularies other than thesauri • Part 4: Interoperability between vocabularies • Part 5: Exchange formats and protocols for interoperability Motivation throughout is “interoperability”
ISO NP 25964 (adoption of BS 8723 as an ISO standard) • The proposal to revise ISO 2788 and ISO 5964, basing the work on BS 8723, was submitted to ISO TC 46/SC 9 members in April 2007 • Project now approved • At least 9 countries participating: France, Germany, Canada, Finland, New Zealand, Sweden, UK, Ukraine, USA
In conclusion • In a networked world, we need interoperability at the vocabulary level • Building the mappings is a job for people, not computers (but computer support is vital) • Mapping may not be easy, but it’s fun… for the person with the right mindset • We need to apply standards to all aspects of vocabulary work, data exchange as well as construction and maintenance