In pursuit of interoperability: Can we standardize mapping types? • Stella G Dextre Clarke, Project Leader, ISO NP 25964
Overview • Compare mapping types used in some well-known projects: MACS; CrissCross; RENARDUS; KoMoHe • And in Doerr’s well-cited paper on Semantic problems of thesaurus mapping • And in three standards: BS 8723-4, SKOS and the forthcoming ISO 25964-2 • Ask how feasible it is to achieve standardization
MACS Project • Context: enabling multilingual access to collections indexed with different vocabularies • Vocabularies are all subject heading schemes • All mappings are treated as equivalence mappings • Equivalence can be simple or compound • Two types of compound equivalence: • Heading A = Heading B OR Heading C • Heading A = Heading B AND Heading C
CrissCross Project • Context: improving access to vocabularies and heterogeneously indexed collections (in one natural language) • One-way mappings • From a subject headings scheme to a classification scheme • Many mappings from one keyword • “Degrees of determinacy” rather than distinct mapping types – D1, D2, D3, D4
RENARDUS Project • Context: search/browse across gateways using different classification schemes • One-way mappings, from DDC to local schemes • Five mapping types: • fully equivalent • broader equivalent • narrower equivalent • major overlap • minor overlap
GESIS/KoMoHe • Context: distributed search across systems using 25 different vocabularies (thesauri and classification schemes) • (Separate) mappings in both directions • Three basic mapping types: • Equivalence • Hierarchical • Associative • Also there is an explicit “null relationship” • Any mapping can be one-to-one or one-to-many • Every mapping can have a “relevance rating” of high, medium or low.
Doerr’s findings (see http://journals.tdl.org/jodi/article/view/31/32) • Context: query transformation is assumed to be the main application of mappings • All the vocabularies discussed are thesauri, applied to documents and/or museum collections • Basic types of mapping are: • exact equivalence • inexact equivalence • broader equivalence • narrower equivalence • Exact, broader and narrower equivalence can be simple or compound • Compound equivalence means a Boolean expression of target terms using AND, OR or NOT (but in practice no examples are given using NOT).
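A minimal sketch of query transformation with compound mappings, assuming a mapping target can be held as a small Boolean expression tree. The class names, the `MAPPINGS` table and `transform_query` are hypothetical, not taken from Doerr's paper; the example terms are borrowed from the ISO 25964-2 slides in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str


@dataclass(frozen=True)
class And:
    operands: tuple


@dataclass(frozen=True)
class Or:
    operands: tuple


@dataclass(frozen=True)
class Not:
    operand: object


def render(expr):
    """Render a Boolean expression of target terms as a query string."""
    if isinstance(expr, Term):
        return f'"{expr.label}"'
    if isinstance(expr, Not):
        return f"NOT {render(expr.operand)}"
    joiner = " AND " if isinstance(expr, And) else " OR "
    return "(" + joiner.join(render(op) for op in expr.operands) + ")"


# Hypothetical mapping table: source term -> expression over target terms.
MAPPINGS = {
    "Laptop computers": Term("Notebook computers"),              # simple
    "Women executives": And((Term("Women"), Term("Executives"))),  # compound, AND
    "Inland waterways": Or((Term("Rivers"), Term("Canals"))),      # compound, OR
}


def transform_query(source_term):
    """Translate one source search term; None means no mapping is recorded."""
    target = MAPPINGS.get(source_term)
    return None if target is None else render(target)
```

For instance, `transform_query("Women executives")` yields `("Women" AND "Executives")`. The `Not` node is included only because the paper allows it in principle.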
BS 8723-4 • Provides for mapping search terms or index terms • Emphasis on thesauri, although other vocabulary types are taken into account • Basic mapping types: equivalence, hierarchical, associative • Hierarchical subdivides into broader/narrower • Equivalence subdivides into simple/compound • Degrees of equivalence (such as exact, inexact, partial) are discussed but not formalised as distinct types other than those described above.
SKOS (Simple Knowledge Organization System) data model • Context is sharing/linking KOSs via the Web • SKOS development began with thesauri, but has extended to classification schemes, subject heading schemes, etc. • Basic mapping “properties” (skos:mappingRelation): • skos:closeMatch (symmetric) • skos:exactMatch (symmetric, transitive) • skos:relatedMatch (symmetric) • skos:broadMatch (inverse of skos:narrowMatch) • skos:narrowMatch (inverse of skos:broadMatch) • No provision for compound mappings
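The property characteristics listed above (symmetric, transitive, inverse) determine which extra mappings follow from the ones you assert. A rough sketch of that inference over plain (subject, property, object) tuples follows; the `entail` function and the concept identifiers in the usage note are illustrative only, and a real application would more likely rely on an RDF toolkit.

```python
# Characteristics of the SKOS mapping properties, as listed on the slide.
SYMMETRIC = {"skos:closeMatch", "skos:exactMatch", "skos:relatedMatch"}
TRANSITIVE = {"skos:exactMatch"}
INVERSE = {
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def entail(asserted):
    """Return the asserted triples plus those implied by the characteristics."""
    triples = set(asserted)
    while True:
        new = set()
        for s, p, o in triples:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
            if p in TRANSITIVE:
                # Chain s -> o -> o2, skipping trivial self-mappings.
                for s2, p2, o2 in triples:
                    if p2 == p and s2 == o and o2 != s:
                        new.add((s, p, o2))
        if new <= triples:  # nothing new was derived: fixed point reached
            return triples
        triples |= new
```

Given `("a:1", "skos:exactMatch", "b:1")` and `("b:1", "skos:exactMatch", "c:1")`, the result also contains `("a:1", "skos:exactMatch", "c:1")`. A chain of skos:closeMatch statements is not extended in the same way, because closeMatch is not transitive.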
ISO 25964-2 (still in draft) • A revision of ISO 2788 and ISO 5964 as well as BS 8723 • Provides for mapping search terms or index terms • Emphasis on thesauri, although other vocabulary types are taken into account • Basic mapping types: • Equivalence: Laptop computers EQ Notebook computers • Hierarchical: Roads NM Streets; Streets BM Roads • Associative: Journals RM Magazines • “Inexact” can apply to any mapping, but most commonly to equivalence: Horticulture ~EQ Gardening
ISO 25964-2 mapping types • Basic mapping types: • Equivalence • Hierarchical • Associative • “Inexact” can apply to any mapping, but most commonly to equivalence
ISO 25964-2 mapping types in more detail • Basic mapping types: • Equivalence: simple or compound; compound is either intersecting compound equivalence or cumulative compound equivalence • Hierarchical: broader or narrower • Associative • “Inexact” can apply to any mapping, but most commonly to equivalence, including compound equivalence
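One way to hold these types in software is sketched below. The tags EQ, BM, NM, RM, the "~" prefix and the "+" and "|" separators follow the examples in this deck; the class and field names are hypothetical, and the rule that only equivalence mappings may be compound is an assumption read off the outline above rather than a quotation from the draft standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MappingType(Enum):
    EQUIVALENCE = "EQ"
    BROADER = "BM"
    NARROWER = "NM"
    ASSOCIATIVE = "RM"


class Compound(Enum):
    INTERSECTING = " + "  # all target terms apply together
    CUMULATIVE = " | "    # the target terms jointly cover the source


@dataclass(frozen=True)
class Mapping:
    source: str
    targets: tuple
    type: MappingType
    inexact: bool = False  # "inexact" is a flag on any type, not a type itself
    compound: Optional[Compound] = None

    def __post_init__(self):
        if not self.targets:
            raise ValueError("a mapping needs at least one target term")
        if len(self.targets) > 1:
            # Assumption: only equivalence mappings may be compound.
            if self.type is not MappingType.EQUIVALENCE or self.compound is None:
                raise ValueError("several targets require a compound equivalence")

    def notation(self):
        """Write the mapping in the shorthand used on the slides."""
        tag = ("~" if self.inexact else "") + self.type.value
        joiner = self.compound.value if self.compound else " "
        return f"{self.source} {tag} {joiner.join(self.targets)}"
```

For example, `Mapping("Horticulture", ("Gardening",), MappingType.EQUIVALENCE, inexact=True).notation()` gives `Horticulture ~EQ Gardening`.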
ISO 25964-2 equivalence mappings in more detail • Simple: Laptop computers EQ Notebook computers • Compound • Intersecting compound equivalence: Women executives EQ Women + Executives • Cumulative compound equivalence: Inland waterways EQ rivers | canals
Intersecting versus cumulative equivalence • [Figure: two diagrams, one labelled “women”, “executives” and “women executives”, the other labelled “rivers”, “canals” and “inland waterways”] • Women executives EQ Women + Executives • Inland waterways EQ rivers | canals
Some key messages re compound equivalence • If you use mappings for conversion of index terms, you implement intersecting equivalents quite differently from cumulative equivalents. • With simple equivalence (exact or inexact) and with hierarchical or associative mappings, two-way conversions are usually OK; but compound equivalence typically works in one direction only.
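The first message can be sketched for the index-term conversion case. The code assumes that an intersecting equivalence lets the converter assign every target term automatically, whereas a cumulative equivalence cannot say which target terms suit a given document, so they are set aside as candidates for review. That handling, like the names `CompoundEquivalence` and `convert_index_terms`, is an assumption made for illustration and is not prescribed by the draft standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundEquivalence:
    source: str
    targets: frozenset
    kind: str  # "intersecting" or "cumulative"


# Example mappings taken from the slides.
MAPPINGS = [
    CompoundEquivalence("Women executives", frozenset({"Women", "Executives"}), "intersecting"),
    CompoundEquivalence("Inland waterways", frozenset({"rivers", "canals"}), "cumulative"),
]


def convert_index_terms(source_terms, mappings):
    """Convert one document's index terms from the source vocabulary.

    Returns (assigned, review): terms that can be assigned automatically,
    and candidate terms per source term that need a decision.
    """
    assigned = set()
    review = {}
    for m in mappings:
        if m.source not in source_terms:
            continue
        if m.kind == "intersecting":
            # The source concept is the combination of all the targets.
            assigned |= m.targets
        else:
            # Assumption: the mapping alone cannot tell which targets fit.
            review[m.source] = set(m.targets)
    return assigned, review
```

The sketch offers no reverse conversion on purpose: a document indexed only with "Women" in the target vocabulary gives no grounds for assigning "Women executives", which is the one-direction limitation noted in the second message.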
Inexact: another complication for equivalence mappings • Simple: Laptop computers EQ Notebook computers • Compound • Intersecting compound equivalence: Women executives EQ Women + Executives • Cumulative compound equivalence: Inland waterways EQ rivers | canals • Inexact simple equivalence: Lawns ~EQ Turf • Inexact compound equivalence: Women executives ~EQ Females + Managers
Major/minor overlap: yet another complication • Found useful in the RENARDUS project • Is there a parallel with the KoMoHe “relevance rating”? • Earlier versions of SKOS allowed “majorMatch” and “minorMatch”; these were subsequently deprecated • It would apply to inexact equivalence; maybe also to hierarchical and associative mappings? • How would you judge it in cases of compound equivalence? • A recent draft of ISO 25964 admits major/minor as an optional attribute of inexact equivalence, in the context of a particular application.
Now we come to the crunch: Can we standardize these mapping types? • We can certainly write them in a standards document, but can we make them stick? • Will real users implement them according to the guidance rules in the standard?
To make a standard stick: • Keep it simple • Address a real need • Adopt rules that are already broadly accepted in the user community • Keep it within the implementation range of available software • Make the standard available easily and free – or at least at a low price • Commit to lifelong maintenance
Want a copy of ISO 25964-2? • A draft is due to appear in January 2011, “ISO DIS 25964-2”, with the hope of attracting comments from potential users • The official way to get it is through your national standards body (e.g. DIN) • Distribution policies vary from one country to another; last time round we found a way to make the draft available online free of charge and free of passwords, on the BSI site. • Send me an email and I’ll alert you when the DIS is released: stella@lukehouse.org
References (abbreviated) • MACS: Landry, Patrice. Multilingual subject access: the linking approach of MACS. Cataloging & Classification Quarterly. 2004; 37(3/4):177-191 • CrissCross: http://linux2.fbi.fh-koeln.de/crisscross/swd-ddc-mapping_en.html • RENARDUS: http://www.mpdl.mpg.de/staff/tkoch/publ/preifla-final.html • KoMoHe: http://www.gesis.org/en/research/programs-and-projects/knowledge-technologies/project-overview/komohe/ • Doerr: http://journals.tdl.org/jodi/article/view/31/32 • SKOS: http://www.w3.org/TR/skos-reference/ • BS 8723-4:2007 Structured vocabularies for information retrieval - Guide - Interoperability between vocabularies. British Standards Institution • ISO 25964-2 (still in draft). Thesauri and interoperability with other vocabularies – Part 2: Interoperability with other vocabularies