Vocabulary Matching for Book Indexing Suggestion in Linked Libraries – A Prototype Implementation & Evaluation Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel
Problem: subject indexing • Describing subjects of books • Using concepts from vocabularies (e.g. thesauri)
Problem: re-indexing • Describing a book that has already been described • With a new vocabulary • Fitting a different context (e.g., different libraries)
Why re-indexing at KB? • The Dutch National Library (KB) holds many books that are also in other Dutch public libraries • KB deposit uses Brinkman thesaurus for indexing • Public Libraries use Biblion thesaurus
A wider issue • KB shares books with many other libraries • All having their own description practices
Room for improvement? • Libraries devote large resources to indexing • 20 people at KB • About 20,000 books per year • Leveraging already existing descriptions for re-indexing can be beneficial for both sides
Alignment and re-indexing • STITCH project • Tackling semantic interoperability in Cultural Heritage • Using ontology alignment • Mappings between concepts from different vocabularies can be used for re-indexing • Basic idea: replace concepts in descriptions by conceptually equivalent concepts
Goal: a re-indexing prototype • Past: preliminary experiments with KB data • Now: building a prototype and • plugging it into the KB production system • having it evaluated by its potential users (indexers) • Prototype case: Dutch public libraries / KB • Suggesting Brinkman subjects based on Biblion ones
Alignment and re-indexing: requirements • Subjects can be complex • Mappings between groups of concepts: "Travel guides" + "Spain" → "Spain; travel guides" • Concepts are used in descriptions • Mappings taking into account extensional semantics: "Building engineering" → "Learning material ; building engineering"
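The two requirements above can be illustrated with a minimal Python sketch. It assumes that a rule pairs a set of Biblion concepts with one Brinkman subject and that a rule fires when all of its source concepts appear on the book. The names `RULES` and `suggest` are hypothetical and are not taken from the prototype; only the two example rules from this slide are included.

```python
# Hypothetical rule table: each rule maps a *set* of Biblion concepts
# to one Brinkman subject (labels copied from the slide's examples).
RULES = [
    (frozenset({"Travel guides", "Spain"}), "Spain; travel guides"),
    (frozenset({"Building engineering"}), "Learning material ; building engineering"),
]


def suggest(biblion_subjects):
    """Return the Brinkman subject of every rule whose source set is
    fully contained in the book's Biblion subjects."""
    subjects = set(biblion_subjects)
    return [target for source, target in RULES if source <= subjects]


print(suggest(["Spain", "Travel guides"]))  # ['Spain; travel guides']
print(suggest(["Spain"]))                   # []
```

Matching on subsets rather than single concepts is what lets "Travel guides" and "Spain" jointly produce one compound subject, while "Spain" alone produces nothing under these two rules.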
Obtaining re-indexing rules • Lexical alignments are not good enough • Probabilistic rules are calculated • Using extension of concepts: existing indexing • Simple probabilities, with ad hoc adjustment: "Travel guides", "Spain" → "Spain; travel guides", 0.982 • Not only based on Biblion subjects • AUT – main authors of books • KAR – “characteristic” • DGP – intellectual level/target group
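One way to read "simple probabilities" computed from existing indexing is sketched below in Python. It assumes a rule's probability is the fraction of dually indexed books carrying the source set that also carry the target subject. The slide does not specify the ad hoc adjustment, so it is left out. The function name `mine_rules`, the `max_size` limit, and the toy data are all hypothetical. Other metadata such as AUT, KAR, or DGP values could be passed as extra feature strings in the same way.

```python
from collections import Counter
from itertools import combinations


def mine_rules(books, max_size=2):
    """books: list of (source_features, target_subjects) pairs, one per
    book indexed with both vocabularies.
    Returns {(source_set, target): estimated probability}."""
    source_count = Counter()  # books carrying a given source set
    joint_count = Counter()   # books carrying source set AND target
    for features, targets in books:
        feats = sorted(set(features))
        for size in range(1, max_size + 1):
            for combo in combinations(feats, size):
                key = frozenset(combo)
                source_count[key] += 1
                for target in set(targets):
                    joint_count[(key, target)] += 1
    return {
        (src, tgt): n / source_count[src]
        for (src, tgt), n in joint_count.items()
    }


# Toy data, invented for illustration only (not KB data).
books = [
    (["Travel guides", "Spain"], ["Spain; travel guides"]),
    (["Travel guides", "Spain"], ["Spain; travel guides"]),
    (["Travel guides", "France"], ["France; travel guides"]),
]
rules = mine_rules(books)
pair = (frozenset({"Travel guides", "Spain"}), "Spain; travel guides")
single = (frozenset({"Travel guides"}), "Spain; travel guides")
print(rules[pair])    # 1.0: both books with the pair carry the target
print(rules[single])  # two of the three "Travel guides" books carry it
```

In this toy data the concept pair predicts the target more reliably than "Travel guides" alone, which is the kind of difference that ranking suggestions by rule probability relies on.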
Demo Doesn't work?
User study • Quantitative aspect • How well does the tool compare to human subject indexing? • Qualitative aspect • User satisfaction • Suggestions for improvement
Evaluation setting • 6 indexers • 6 weeks • 284 books • Evaluation integrated in daily indexing work • Pre-evaluation briefing • Questionnaire during evaluation • Post-evaluation de-briefing & questionnaire
User study results • Top ranked mappings are indeed much better • Individual book satisfaction level > 70%
User study results (1) • But the general satisfaction is lower • Only two out of six would use the tool as such • Quality of suggestions • Lower-level suggestions are often not meaningful • Perception of suggestions' quality • Long lists with wrong suggestions at the end are bad • Ranking is appreciated, but it is not enough
User study results (2) Suggestions were found promising • Bridging the indexing gap between collections • Different indexing strategies "Persian language" (Biblion) vs. "Iranian language and literature" (Brinkman) Lots of suggestions for improvement • More re-indexing! • Suggesting concepts from other vocabularies • More context metadata as input
Conclusions • Shows the potential of re-using data in a library network • Alignment approach fitting indexing practice • Concrete demonstration, in KB production environment • Technology transfer: KB wants to continue efforts • Flexibility: architecture ready to exploit other vocabularies • Linked data & SKOS
Thank you! • Questions?
Screenshots • Back