
A land of milk and ice cream

Building better controlled vocabularies: Guidance from ISO 25964. Jutta Lindenthal, Detlev Balzer. ISKO UK Conference, 2013.


Presentation Transcript


  1. A land of milk and ice cream

  2. Building better controlled vocabularies: Guidance from ISO 25964. Jutta Lindenthal, Detlev Balzer. ISKO UK Conference, 2013

  3. Quality criteria for a thesaurus
     • A thesaurus is an information retrieval instrument. Its quality is determined by the extent to which it serves this purpose.
     • It is expected to support the indexing process, among other things, through
       • unambiguous descriptors -> duplicate control and qualifiers
       • directions for indexers -> compound equivalence
     • It supports the retrieval process, among other things, by providing
       • navigation paths for browsing -> concept groups
       • a framework for facet-based retrieval -> facets and node labels
       • prerequisites for exploding search -> truly generic relationships
     • and a lot more that we cannot cover in this talk.

  4. Examples drawn from: a guideline. Linked Heritage is a 30-month EU project, started on 1 April 2011. "Your terminology as a part of the semantic web: recommendations for design and management" (PDF, 2.5MB). Source: http://www.linkedheritage.eu/ [2013-06-29]

  5. Examples drawn from: a thesaurus Source: http://www.getty.edu/research/tools/vocabularies/aat/

  6. Examples drawn from: another thesaurus Source: http://agclass.nal.usda.gov/agt.shtml

  7. Transitive hierarchies: building a thesaurus the hop way ;-) EXPLODE?
     animal material
       <animal material by form or function>
         <excretions and secretions>
           milk
             cream (milk)
             ice cream
             cheese
               cheese cake
                 cheese cake lifter
                   Käsekuchentortenhebergriff
             quark (cheese)
               Quark mit Soße (a bunch of baloney)
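An exploding search expands a query concept to everything reachable below it. The following minimal Python sketch transcribes the narrower-term links from the hierarchy on this slide and follows them transitively. The function name `explode` and the dictionary layout are illustrative assumptions, not part of ISO 25964 or the AAT.

```python
# NT (narrower term) links transcribed from the hierarchy on this slide.
# The dictionary layout is an assumption made for illustration only.
NARROWER = {
    "animal material": ["<animal material by form or function>"],
    "<animal material by form or function>": ["<excretions and secretions>"],
    "<excretions and secretions>": ["milk"],
    "milk": ["cream (milk)", "ice cream", "cheese", "quark (cheese)"],
    "cheese": ["cheese cake"],
    "cheese cake": ["cheese cake lifter"],
    "cheese cake lifter": ["Käsekuchentortenhebergriff"],
    "quark (cheese)": ["Quark mit Soße"],
}


def explode(concept, narrower):
    """Return the concept plus everything reachable via NT links (depth-first)."""
    result, seen, stack = [], set(), [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return result


if __name__ == "__main__":
    for hit in explode("milk", NARROWER):
        print(hit)
```

Exploding "milk" in this hierarchy also returns "cheese cake lifter", which is the kind of false hit the slide warns about: the expansion is only meaningful when every link on the path is truly generic.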

  8. Where transitivity fails. Let's perform an all-and-some test: "All ice cream is a kind of excretion and secretion, and this in turn is a kind of animal material." Note: "White or yellowish white fluid secreted by the mammary glands of female mammals." Note that this test cannot be done by machines. Source: AAT Online, Hierarchy of <excretions and secretions> [2013-06-26]

  9. Distinguishing hierarchical relationships
     • The standard defines the following ways of expressing hierarchies:
       • unspecified hierarchical relationship of broader and narrower terms. This can only be tested for cycles.
       • specified hierarchical relationships
         • the generic relationship; each pair of concepts must pass the all-and-some test
         • the hierarchical whole-part relationship; is transitive if ISO 25964-1 is followed
         • the instance relationship; not transitive.
     • Transitivity does not hold when generic and whole-part relationships are mixed.
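The two machine-checkable parts of this slide can be sketched in Python: a cycle test that works for any broader-term link, and a check that a chain of links is safe for transitive inference. The tags BTG, BTP and BTI follow the ISO 25964 notation for generic, whole-part and instance links; the function names and the sample data are hypothetical. The all-and-some test itself is not implemented, since it needs human judgement.

```python
# Link types: unspecified, generic, whole-part, instance.
UNSPECIFIED, GENERIC, WHOLE_PART, INSTANCE = "BT", "BTG", "BTP", "BTI"


def has_cycle(broader):
    """Detect a cycle; broader maps concept -> list of (broader concept, link type)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for parent, _link_type in broader.get(node, []):
            parent_state = state.get(parent, WHITE)
            if parent_state == GREY:  # we came back to a node on the current path
                return True
            if parent_state == WHITE and visit(parent):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(broader))


def chain_is_transitive(link_types):
    """A chain may be treated as transitive only if it is purely generic or
    purely whole-part; mixed, instance, or unspecified chains are rejected."""
    kinds = set(link_types)
    return len(kinds) == 1 and kinds <= {GENERIC, WHOLE_PART}


# Hypothetical sample data, not taken from any published thesaurus.
SAMPLE = {
    "wheels": [("bicycles", WHOLE_PART)],
    "bicycles": [("vehicles", GENERIC)],
}

if __name__ == "__main__":
    print(has_cycle(SAMPLE))
    print(chain_is_transitive([WHOLE_PART, GENERIC]))
```

In the sample, the chain from "wheels" to "vehicles" mixes a whole-part link with a generic link, so the check rejects it: wheels are not a kind of vehicle.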

  10. If it isn’t generic, then don’t say so. Generic hierarchical relationships; facets and node labels. Source: http://www.getty.edu/vow/AATFullDisplay?find=ice+cream&logic=AND&note=&subjectid=300266767 [2013-07-02]

  11. Associative relationships. “If two terms or concepts already have one of the basic relationships, no other basic relationship between the same terms or concepts is admissible.” This example is debatable. Source: AAT hierarchy for plates (dishes)

  12. Grouping by dimension. Node label introducing a facet (BT). How does a machine know that "(products)" can be used as a facet? Node label showing a characteristic of division (BT). Source: ISO 25964-1:2011, Figures 4 and 6

  13. Facet names
      Facets are not modelled explicitly in the standard; instead, the standard enumerates three options for representing facet names:
      • Facet names are included as preferred terms and treated as top terms, under which complete hierarchies may be shown (12.2.4 Hierarchical display, Figure 6, p. 75).
      • Facet names appear only in node labels, and there is no explicit display of complete facets (11 Facet analysis, Figure 4, p. 69).
      • Facet names appear as the names of concept groups.
      Labels shown in the figures: (objects), equipment, (people), people, agricultural industries, farm managers, products.

  14. Art & Architecture Thesaurus: abusing hasTopConcept. Following these guidelines verbatim will look like this:

  15. Misunderstanding splitting of compounds “If there are compound terms in your terminology, try as much as possible to decompose them in order to get to a simple form.” Source: http://www.linkedheritage.eu/, p. 69 [2013-06-29]

  16. Example of compound equivalence

  17. Retrieving split compounds
      • Example from the ISO standard:
        coal mining
        USE+ coal AND mining
      • Assumes that a retrieval system is aware of compound equivalences and either
        • prompts the user to rewrite the query as suggested by the thesaurus, or
        • rewrites the query transparently by evaluating the USE+ relationships
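The transparent rewriting option can be sketched as a lookup over USE+ relationships. This minimal Python example uses the "coal mining" case from the slide; the table `COMPOUND_EQUIVALENCE` and the function `rewrite_query` are hypothetical names, and a real retrieval system would need proper query parsing rather than whole-string matching.

```python
# USE+ relationships: non-preferred compound term -> preferred terms to combine.
# Only the "coal mining" entry comes from the slide; the structure is an assumption.
COMPOUND_EQUIVALENCE = {
    "coal mining": ["coal", "mining"],
}


def rewrite_query(query, equivalences):
    """Replace a split compound with a Boolean AND of its preferred terms.

    Queries without a USE+ entry are returned unchanged.
    """
    parts = equivalences.get(query.strip().lower())
    if parts is None:
        return query
    return " AND ".join(parts)


if __name__ == "__main__":
    print(rewrite_query("Coal mining", COMPOUND_EQUIVALENCE))
```

The other option named on the slide, prompting the user, would use the same lookup but show the suggestion instead of applying it.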

  18. Compound equivalence. The ISO data model defines references between a compound term and two or more preferred terms. From a linguistic point of view, a compound usually denotes an intersection of two or more constituent concepts, which would evaluate to a narrower concept. Our preliminary explorations have shown that a concept-based modelling of compound equivalence not only fulfills the requirements addressed by the term-based model, but also has distinct advantages, such as minimising the number of relationship types and permitting a straightforward implementation of multilingualism.

  19. Qualifiers Misunderstanding disambiguation Source: http://de.slideshare.net/EuropeanaLocal/roxanne-wyns-belgium-2009

  20. Unambiguous concepts
      • There should be no duplicate terms for the same language.
      • A qualifier should be added to each homographic term.
      Examples: Hut USE MonumentHut; cranes (birds); cranes (lifting equipment)
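Duplicate control of this kind lends itself to an automated check. The Python sketch below groups terms by language and label and reports any label that is attached to more than one concept. The concept identifiers `c1` and `c2` and the function name `find_homographs` are hypothetical; the "cranes" labels come from the slide.

```python
from collections import defaultdict


def find_homographs(terms):
    """Report labels shared by several concepts within one language.

    terms is an iterable of (concept_id, language, label) tuples.
    Returns {(language, casefolded label): sorted list of concept ids}.
    """
    by_label = defaultdict(set)
    for concept_id, language, label in terms:
        by_label[(language, label.casefold())].add(concept_id)
    return {key: sorted(ids) for key, ids in by_label.items() if len(ids) > 1}


# Unqualified labels collide; the concept ids are made up for illustration.
UNQUALIFIED = [("c1", "en", "cranes"), ("c2", "en", "cranes")]
# Qualified labels, as on the slide, are unambiguous.
QUALIFIED = [("c1", "en", "cranes (birds)"), ("c2", "en", "cranes (lifting equipment)")]

if __name__ == "__main__":
    print(find_homographs(UNQUALIFIED))
    print(find_homographs(QUALIFIED))
```

The same label in two different languages is not reported, in line with the rule that duplicates are only a problem within one language.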

  21. Fuzzy clarification
      Denkmale / Denkmäler
      Geografie / Geographie
      karolingisch-ottonisch / karolingisch, ottonisch
      Musikinstrument / Musikinstument
      Flügel (Instrument) / Flügel (Musikinstrument)
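Near-duplicates like these are not caught by exact duplicate control, but they can be flagged for human review with approximate string matching. The following Python sketch uses `difflib.SequenceMatcher` from the standard library; the threshold of 0.8 and the function name `near_duplicates` are assumptions chosen for illustration, not a recommendation from the talk.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(labels, threshold=0.8):
    """Return label pairs whose similarity ratio reaches the threshold.

    The threshold is an arbitrary illustrative value; a lower one produces
    more candidates for review and more false alarms.
    """
    pairs = []
    for first, second in combinations(labels, 2):
        ratio = SequenceMatcher(None, first.casefold(), second.casefold()).ratio()
        if ratio >= threshold:
            pairs.append((first, second))
    return pairs


# Labels taken from the slide.
LABELS = [
    "Denkmale", "Denkmäler",
    "Geografie", "Geographie",
    "Musikinstrument", "Musikinstument",
]

if __name__ == "__main__":
    for first, second in near_duplicates(LABELS):
        print(first, "<->", second)
```

Such a check only produces candidates. Deciding whether two spellings denote the same concept remains an editorial task.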

  22. Controlling the qualifier vocabulary buildings single built works structures symbols visual works

  23. Assigned top concept Manually assigning a top concept to individual nodes in the hierarchy is likely to produce errors that, even though they can be detected algorithmically, cannot be resolved without human intervention. Thus, a TT relationship (or hasTopConcept relationship in the ISO data model) should never be asserted explicitly, but always inferred by following the BT axis within the hierarchy tree.
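Inferring the top concept by following the BT axis can be sketched as a simple upward traversal. The Python example below reuses broader-term links from the milk hierarchy shown earlier in the talk; the function names `infer_top_concepts` and `check_asserted_top` are hypothetical. A set is returned because a polyhierarchy can lead to more than one top concept.

```python
# BT (broader term) links, taken from the milk hierarchy shown earlier in the talk.
BROADER = {
    "ice cream": ["milk"],
    "milk": ["<excretions and secretions>"],
    "<excretions and secretions>": ["<animal material by form or function>"],
    "<animal material by form or function>": ["animal material"],
}


def infer_top_concepts(concept, broader):
    """Follow BT links upwards and collect every concept that has no broader concept."""
    tops, seen, stack = set(), set(), [concept]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles and repeated paths
            continue
        seen.add(current)
        parents = broader.get(current, [])
        if not parents:
            tops.add(current)
        stack.extend(parents)
    return tops


def check_asserted_top(concept, asserted_top, broader):
    """Return True if an explicitly asserted top concept matches the inferred one."""
    return asserted_top in infer_top_concepts(concept, broader)


if __name__ == "__main__":
    print(infer_top_concepts("ice cream", BROADER))
    print(check_asserted_top("ice cream", "milk", BROADER))
```

The second function shows how an asserted top concept that contradicts the hierarchy can be detected algorithmically. Resolving the contradiction still requires a human decision.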

  24. A case for asserting top concepts Displaying the inferred top concept (or the entire hierarchy chain) next to any node in the hierarchy tree can greatly facilitate intellectual plausibility checks. Asserting a set of top-level nodes in advance, i.e. before the concept hierarchies are fully worked out, can also be useful in guiding the vocabulary development process. In this case, however, the property of being a top concept should only be taken as a declaration of intent, by which the actual outcome of thesaurus construction can be measured.

  25. What’s in a term (without a definition)? Mandating a definition may have prevented the flaw in this tiny RDA element vocabulary.

  26. References
      http://www.niso.org/schemas/iso25964/iso25964-1_v1.4.xsd
      http://www.niso.org/schemas/iso25964/example_multi_lingual_08-09T15-21.xml
      http://www.niso.org/schemas/iso25964/schema-intro/
      http://www.niso.org/schemas/iso25964/Model_2011-06-02.jpg
      https://github.com/cmader/qSKOS/wiki/Quality-Issues
