A land of milk and ice cream. Building better controlled vocabularies: Guidance from ISO 25964. Jutta Lindenthal, Detlev Balzer. ISKO UK Conference, 2013.
Quality criteria for a thesaurus
• A thesaurus is an information retrieval instrument. Its quality is determined by the extent to which it serves this purpose.
• It is expected to support the indexing process, among other things through
  • unambiguous descriptors -> duplicate control and qualifiers
  • directions for indexers -> compound equivalence
• It supports the retrieval process, among other things by providing
  • navigation paths for browsing -> concept groups
  • a framework for facet-based retrieval -> facets and node labels
  • prerequisites for exploding search -> truly generic relationships
• and a lot more that we cannot cover in this talk.
Examples drawn from: a guideline. Linked Heritage is a 30-month EU project, started on 1st April 2011. “Your terminology as a part of the semantic web: recommendations for design and management” (PDF, 2.5MB). Source: http://www.linkedheritage.eu/ [2013-06-29]
Examples drawn from: a thesaurus Source: http://www.getty.edu/research/tools/vocabularies/aat/
Examples drawn from: another thesaurus Source: http://agclass.nal.usda.gov/agt.shtml
Transitive hierarchies
Building a thesaurus the hop way ;-)
EXPLODE?
. animal material
... <animal material by form or function>
..... <excretions and secretions>
....... milk
......... cream (milk)
......... ice cream
......... cheese
........... cheese cake
............. cheese cake lifter
............... Käsekuchentortenhebergriff
......... quark (cheese)
........... Quark mit Soße (a bunch of baloney)
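To make the "EXPLODE?" question concrete, here is a minimal Python sketch of an exploding search over the hierarchy on this slide. The names `NARROWER` and `explode` are illustrative assumptions, not part of any thesaurus software; the links are copied from the slide, and every link is treated as transitive, which is exactly the assumption under scrutiny.

```python
# Narrower-term links copied from the slide's hierarchy.
# Assumption: every link is treated as if it were transitive.
NARROWER = {
    "animal material": ["<animal material by form or function>"],
    "<animal material by form or function>": ["<excretions and secretions>"],
    "<excretions and secretions>": ["milk"],
    "milk": ["cream (milk)", "ice cream", "cheese"],
    "cheese": ["cheese cake", "quark (cheese)"],
    "cheese cake": ["cheese cake lifter"],
    "cheese cake lifter": ["Käsekuchentortenhebergriff"],
    "quark (cheese)": ["Quark mit Soße"],
}


def explode(term, narrower):
    """Return the term followed by everything reachable via narrower links."""
    result = []
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return result


if __name__ == "__main__":
    for hit in explode("animal material", NARROWER):
        print(hit)
```

Exploding "animal material" in this sketch also returns "cheese cake lifter", a piece of cutlery, which illustrates why exploding is only safe along truly generic relationships.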
Where transitivity fails
Let's perform an all-and-some test: “All ice cream is a kind of excretion and secretion, and this in turn is a kind of animal material.”
Note: White or yellowish white fluid secreted by the mammary glands of female mammals.
Note that this test cannot be performed by machines.
Source: AAT Online, Hierarchy of <excretions and secretions> [2013-06-26]
Distinguishing hierarchical relationships
• The standard defines the following ways of expressing hierarchies:
  • unspecified hierarchical relationship of broader and narrower terms. This can only be tested for cycles.
  • specified hierarchical relationships
    • the generic relationship; each pair of concepts must pass the all-and-some test
    • the hierarchical whole-part relationship; is transitive if ISO 25964-1 is followed
    • the instance relationship; not transitive.
• Transitivity does not hold when generic and whole-part relationships are mixed.
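The one check that is available for unspecified BT/NT relationships, the test for cycles, can be sketched in a few lines of Python. This is an illustrative depth-first search, not a procedure prescribed by the standard; the function name `find_cycle` and the `broader` mapping (term -> list of broader terms) are assumptions made for the example.

```python
def find_cycle(broader):
    """Return one cycle in the BT graph as a list of terms, or None.

    `broader` maps each term to the list of its broader terms.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(term):
        colour[term] = GREY
        path.append(term)
        for parent in broader.get(term, []):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[term] = BLACK
        return None

    for term in list(broader):
        if colour.get(term, WHITE) == WHITE:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    sound = {"ice cream": ["milk"], "milk": ["animal material"]}
    broken = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(find_cycle(sound))
    print(find_cycle(broken))
```

A clean result only shows that the hierarchy is free of loops. Whether each link passes the all-and-some test still has to be judged by a person.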
If it isn’t generic, then don’t say so. Generic hierarchical relationships. Facets and node labels. Source: http://www.getty.edu/vow/AATFullDisplay?find=ice+cream&logic=AND&note=&subjectid=300266767 [2013-07-02]
Associative relationships “If two terms or concepts already have one of the basic relationships, no other basic relationship between the same terms or concepts is admissible.” This example is debatable. Source: AAT hierarchy for plates (dishes)
Grouping by dimension Node label introducing a facet BT How does a machine know that “(products)” can be used as a facet? Node label showing a characteristic of division BT Source: ISO 25964-1:2011, Figures 4 and 6
Facet names
Facets are not modelled explicitly in the standard; instead the standard enumerates three options to represent facet names:
• Facet names included as preferred terms and treated as top terms, under which complete hierarchies may be shown (12.2.4 Hierarchical display, Figure 6, p. 75)
• Facet names appear only in node labels, and there is no explicit display of complete facets (11 Facet analysis, Figure 4, p. 69)
• Facet names appear as the names of concept groups
(objects) equipment (people) people agricultural industries (people) farm managers products
Art & Architecture Thesaurus Abusing hasTopConcept Following these guidelines verbatim will look like this:
Misunderstanding splitting of compounds “If there are compound terms in your terminology, try as much as possible to decompose them in order to get to a simple form.” Source: http://www.linkedheritage.eu/, p. 69 [2013-06-29]
Retrieving split compounds
• Example from the ISO standard:
  • coal mining
  • USE+ coal AND mining
• Assumes that a retrieval system is aware of compound equivalences and either
  • prompts the user to rewrite the query as suggested by the thesaurus, or
  • rewrites the query transparently by evaluating the USE+ relationships
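The second option, transparent rewriting, might look like the following Python sketch. The lookup table `USE_PLUS` and the function `rewrite_query` are hypothetical names for illustration; a real retrieval system would read the compound equivalences from the thesaurus data rather than from a hard-coded dictionary.

```python
# Hypothetical compound-equivalence table:
# non-preferred compound term -> preferred terms to combine.
USE_PLUS = {
    "coal mining": ["coal", "mining"],
}


def rewrite_query(query, use_plus):
    """Replace a split compound by a Boolean AND of its preferred terms.

    Queries that are not listed as split compounds are returned unchanged.
    """
    parts = use_plus.get(query.strip().lower())
    if parts is None:
        return query
    return " AND ".join(parts)


if __name__ == "__main__":
    print(rewrite_query("coal mining", USE_PLUS))  # coal AND mining
    print(rewrite_query("mining", USE_PLUS))       # mining
```

The first option, prompting the user, would use the same lookup but display the suggested query instead of substituting it.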
Compound equivalence The ISO data model defines references between a compound term and two or more preferred terms. From a linguistic point of view a compound usually denotes an intersection of two or more constituent concepts which would evaluate to a narrower concept. Our preliminary explorations have shown that a concept-based modelling of compound equivalence not only fulfills the requirements addressed by the term-based model, but also has distinct advantages such as minimising the number of relationship types and permitting a straightforward implementation of multilingualism.
Qualifiers Misunderstanding disambiguation Source: http://de.slideshare.net/EuropeanaLocal/roxanne-wyns-belgium-2009
Unambiguous concepts
There should be no duplicate terms for the same language.
A qualifier should be added to each homographic term.
Hut USE MonumentHut
cranes (birds)
cranes (lifting equipment)
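The rule "no duplicate terms for the same language" lends itself to an automated check. The Python sketch below assumes, purely for illustration, that labels are available as `(concept_id, language, label)` tuples; the concept identifiers and the function name `find_duplicate_labels` are made up for the example.

```python
from collections import defaultdict


def find_duplicate_labels(labels):
    """Group concepts that share the same label in the same language.

    `labels` is an iterable of (concept_id, language, label) tuples.
    Returns {(language, normalised_label): [concept_ids]} for clashes only.
    """
    seen = defaultdict(set)
    for concept_id, language, label in labels:
        # Compare labels case-insensitively and ignore surrounding blanks.
        seen[(language, label.strip().casefold())].add(concept_id)
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical concept identifiers; the labels follow the slide's example.
    unqualified = [("c1", "en", "cranes"), ("c2", "en", "cranes")]
    qualified = [
        ("c1", "en", "cranes (birds)"),
        ("c2", "en", "cranes (lifting equipment)"),
    ]
    print(find_duplicate_labels(unqualified))
    print(find_duplicate_labels(qualified))
```

Such a check only finds the clash. Choosing a suitable qualifier for each homograph remains an editorial decision.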
Fuzzy clarification
Denkmale | Denkmäler
Geografie | Geographie
karolingisch-ottonisch | karolingisch, ottonisch
Musikinstrument | Musikinstument
Flügel (Instrument) | Flügel (Musikinstrument)
Controlling the qualifier vocabulary
buildings
single built works
structures
symbols
visual works
Assigned top concept Manually assigning a top concept to individual nodes in the hierarchy is likely to produce errors that, even though they can be detected algorithmically, cannot be resolved without human intervention. Thus, a TT relationship (or hasTopConcept relationship in the ISO data model) should never be asserted explicitly, but always inferred by following the BT axis within the hierarchy tree.
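Inferring the top concept by following the BT axis can be sketched as below. This is a minimal Python illustration under the assumption that the hierarchy is available as a mapping from each term to its broader terms; `BROADER` and `inferred_top_concepts` are names invented for the example, and the sample links are taken from the milk hierarchy shown earlier in the talk.

```python
def inferred_top_concepts(term, broader):
    """Follow BT links upward and return all terms without a broader term."""
    tops = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles and repeated polyhierarchy paths
        seen.add(current)
        parents = broader.get(current, [])
        if not parents:
            tops.add(current)
        stack.extend(parents)
    return tops


# Illustrative BT links (term -> broader terms).
BROADER = {
    "ice cream": ["milk"],
    "milk": ["<excretions and secretions>"],
    "<excretions and secretions>": ["<animal material by form or function>"],
    "<animal material by form or function>": ["animal material"],
}

if __name__ == "__main__":
    print(inferred_top_concepts("ice cream", BROADER))  # {'animal material'}
```

Because the result is computed from the BT links, it cannot contradict them, which is the advantage over an explicitly asserted hasTopConcept value.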
A case for asserting top concepts Displaying the inferred top concept (or the entire hierarchy chain) next to any node in the hierarchy tree can greatly facilitate intellectual plausibility checks. Asserting a set of top-level nodes in advance, i.e. before the concept hierarchies are fully worked out, can also be useful in guiding the vocabulary development process. In this case, however, the property of being a top concept should only be taken as a declaration of intent, by which the actual outcome of thesaurus construction can be measured.
What’s in a term (without a definition)? Mandating a definition may have prevented the flaw in this tiny RDA element vocabulary.
References
http://www.niso.org/schemas/iso25964/iso25964-1_v1.4.xsd
http://www.niso.org/schemas/iso25964/example_multi_lingual_08-09T15-21.xml
http://www.niso.org/schemas/iso25964/schema-intro/
http://www.niso.org/schemas/iso25964/Model_2011-06-02.jpg
https://github.com/cmader/qSKOS/wiki/Quality-Issues