Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel

Vocabulary Matching for Book Indexing Suggestion in Linked Libraries – A Prototype Implementation & Evaluation • Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel

Problem: subject indexing • Describing subjects of books • Using concepts from vocabularies (e.g. thesauri)

Problem: re-indexing • Describing a book that has already been described • With a new vocabulary • Fitting a different context (e.g., different libraries)

Why re-indexing at KB? • The Dutch National Library (KB) holds many books that are also in other Dutch public libraries • KB deposit uses Brinkman thesaurus for indexing • Public Libraries use Biblion thesaurus

A wider issue • KB shares books with many other libraries • All having their own description practices

Room for improvement? • Libraries devote large resources to indexing • 20 people at KB • About 20,000 books per year • Leveraging already existing descriptions for re-indexing can be beneficial for both sides

Alignment and re-indexing • STITCH project • Tackling semantic interoperability in Cultural Heritage • Using ontology alignment • Mappings between concepts from different vocabularies can be used for re-indexing • Basic idea: replace concepts in descriptions by conceptually equivalent concepts
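The basic idea can be sketched in a few lines of Python. This is only an illustration, not the STITCH implementation: the `MAPPING` table and the `reindex` helper are hypothetical names, and treating the "Persian language" pair (an example quoted later in this deck) as an equivalence mapping is an assumption.

```python
# Hypothetical one-to-one mapping from Biblion labels to Brinkman labels.
# The single pair below is borrowed from an example in this deck; using it
# as an equivalence mapping is an assumption made for illustration.
MAPPING = {
    "Persian language": "Iranian language and literature",
}


def reindex(subjects, mapping):
    """Replace each source concept by its mapped target concept.

    Concepts without a mapping are dropped, since no equivalent is known.
    """
    return [mapping[s] for s in subjects if s in mapping]


# A book indexed with one mapped and one unmapped Biblion subject.
suggested = reindex(["Persian language", "Unmapped subject"], MAPPING)
```

A plain dictionary lookup like this covers only single concepts; the requirements on the next slide explain why the prototype needs richer rules.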

Goal: a re-indexing prototype • Past: preliminary experiments with KB data • Now: building a prototype and • plugging it into the KB production system • having it evaluated by its potential users (indexers) • Prototype case: Dutch public libraries / KB • Suggesting Brinkman subjects based on Biblion ones

Alignment and re-indexing: requirements • Subjects can be complex • Mappings between groups of concepts • "Travel guides" + "Spain" → "Spain; travel guides" • Concepts are used in descriptions • Mappings taking into account extensional semantics • "Building engineering" → "Learning material ; building engineering"
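One way to read the first requirement is as rules whose left-hand side is a set of source concepts rather than a single concept. The sketch below assumes that reading; `RULES` and `suggest` are hypothetical names, and only the two example rules come from the slide.

```python
# Each rule maps a group of source (Biblion) concepts to one target
# (Brinkman) subject. Both rules are the examples quoted on the slide.
RULES = [
    (frozenset({"Travel guides", "Spain"}), "Spain; travel guides"),
    (frozenset({"Building engineering"}),
     "Learning material ; building engineering"),
]


def suggest(book_subjects, rules):
    """Return the targets of all rules whose source group is fully present."""
    subjects = set(book_subjects)
    # A rule fires only when every concept in its group describes the book.
    return [target for group, target in rules if group <= subjects]


fired = suggest(["Spain", "Travel guides", "Maps"], RULES)
partial = suggest(["Spain"], RULES)
```

With "Spain" alone, the travel guide rule does not fire, which is the point of mapping groups instead of individual concepts.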

Obtaining re-indexing rules • Lexical alignments are not good enough • Probabilistic rules are calculated • Using extension of concepts: existing indexing • Simple probabilities, with ad hoc adjustment • "Travel guides", "Spain" → "Spain; travel guides", 0.982 • Not only based on Biblion subjects • AUT – main authors of books • KAR – “characteristic” • DGP – intellectual level/target group
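A rough sketch of how such probabilistic rules could be derived from books that already carry both descriptions. It assumes the "simple probability" is the conditional frequency of a target subject given a group of source features; the ad hoc adjustment mentioned above is not specified in the slides and is left out. The function name `learn_rules`, the `max_size` parameter, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_rules(books, max_size=2):
    """Estimate P(target subject | group of source features).

    books: iterable of (source_features, target_subjects) pairs taken from
    books indexed in both vocabularies. Source features could be Biblion
    subjects, or other fields such as AUT, KAR or DGP values.
    """
    group_count = Counter()   # books containing a given feature group
    joint_count = Counter()   # books containing the group and the target
    for source, targets in books:
        features = sorted(set(source))
        for size in range(1, max_size + 1):
            for group in combinations(features, size):
                group_count[group] += 1
                for target in set(targets):
                    joint_count[(group, target)] += 1
    return {
        (group, target): n / group_count[group]
        for (group, target), n in joint_count.items()
    }


# Toy data, invented for illustration only.
toy_books = [
    (["Spain", "Travel guides"], ["Spain; travel guides"]),
    (["Spain", "Travel guides"], ["Spain; travel guides"]),
    (["Spain"], ["Spain; history"]),
]
rules = learn_rules(toy_books)
```

In this toy data the pair ("Spain", "Travel guides") predicts "Spain; travel guides" with a higher score than "Spain" alone, so sorting suggestions by score would rank the more specific rule first.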

Demo Doesn't work?

User study • Quantitative aspect • How well does the tool compare to human subject indexing? • Qualitative aspect • User satisfaction • Suggestions for improvement

Evaluation setting • 6 indexers • 6 weeks • 284 books • Evaluation integrated in daily indexing work • Pre-evaluation briefing • Questionnaire during evaluation • Post-evaluation de-briefing & questionnaire

User study results • Top ranked mappings are indeed much better • Individual book satisfaction level > 70%

User study results (1) • But the general satisfaction is lower • Only two out of six would use the tool as such • Quality of suggestions • Lower-level suggestions are often not meaningful • Perception of suggestions' quality • Long lists with wrong suggestions at the end are bad • Ranking is appreciated, but it is not enough

User study results (2) • Suggestions were found promising • Bridging the indexing gap between collections • Different indexing strategies • "Persian language" (Biblion) vs. "Iranian language and literature" (Brinkman) • Lots of suggestions for improvement • More re-indexing! • Suggesting concepts from other vocabularies • More context metadata as input

Conclusions • Shows the potential of re-using data in a library network • Alignment approach fitting indexing practice • Concrete demonstration, in KB production environment • Technology transfer: KB wants to continue efforts • Flexibility: architecture ready to exploit other vocabularies • Linked data & SKOS

Prototype components

Linked libraries?

Thank you! • Questions?

Screenshots

WinIBW production tool

STITCH suggestion tool

Original metadata

Concept suggestions

Comparing with human re-indexing

Complement: lexical alignments

Adding subjects using thesaurus access

Concept suggestions

Saving and back to WinIBW

