Interoperability in the Cultural Heritage Domain
Lourens van der Meij, VU Amsterdam – KB (part of the slides by A. Isaac)
October 3rd, 2008

Background
• CATCH (NWO): Continuous Access To Cultural Heritage
  • Computer science research projects
  • Applied to cultural heritage (libraries, museums)
• STITCH: SemanTic Interoperability To access Cultural Heritage
• Interoperability of metadata:
  • Exchanging (standardization)
  • Integrating (translating, linking)

Intention
Show through example applications that the following are important in the Cultural Heritage domain:
• Integration of data, collections, and services
• Interoperability:
  • Data standardized so that it can be used across different applications
  • Functionality reusable via services
• Creating mappings: semantic links between data from different sources

First
• Illustrate integrated access to collections in the CH domain by looking at a use case:
  • Introduction of the use case
  • About vocabularies
  • The collections that will be integrated
  • Faceted browsing
  • What we want ->
  • Demo
  • Requirements, details

(Integrated) Access to collections
• Collections: (records of) books, works of art, …
• Electronic access via a web portal
• STITCH focuses on semantics: structured access using the available knowledge sources, not full-text search
• Records: metadata, i.e. information about the object:
  • Author
  • Date
  • Subject
• CH institutes often maintain knowledge organization systems (KOS), i.e. vocabularies, to facilitate storage, access and maintenance
• Subject metadata and access through KOS are the focus of STITCH

Vocabularies (knowledge structures, KOS)
• Thesauri and classification systems, used to structure collections and to describe the content, form and other aspects of collection items
• Many vocabularies. STITCH is a cooperation between VU Amsterdam (KRR group), the National Library (KB) and MPI Nijmegen. Within the KB, on the order of 10 vocabularies are maintained internally, and 20 or more external vocabularies play a role. Why?
  • History
  • Specialized collections, particular views on the collection, and theories about how access should be provided
• Examples of vocabularies appear in the demos

Vocabularies
• Many different (kinds of) vocabularies
• Many different representations, data formats and methods of access
• Integrated access requires:
  • Standardized representation of vocabularies and collections
  • Standardized access => services
  • Links between elements of vocabularies, i.e. alignment of vocabularies
• Next: an example of integration

Illustration: STITCH use case
• Integrated access to two collections:
  • KB: illuminated manuscripts (geïllumineerde manuscripten)
  • BnF: Mandragore, illuminated manuscripts (manuscrits enluminés)
• STITCH focus:
  • Integration
  • Alignment: techniques (and standards)
  • Interoperability: RDF, SKOS
• These aspects will be discussed after the first demo.

KB Illuminated Manuscripts

KB Illuminated Manuscripts: Iconclass

Faceted browsing
• Access the collection using the structure of the vocabularies
• Different dimensions: subject, author, …
• Use the hierarchy of a vocabulary, where there is one, to group objects together
  • Lions, giraffes, zebras -> animals: they can be distinguished as a group

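The grouping step can be sketched as follows. This is a minimal illustration, not the STITCH browser's code: the toy vocabulary, the record ids and the helper names `ancestors` and `facet_groups` are all hypothetical. Each record is counted under its own concept and under every broader concept, so selecting "animals" shows lions, giraffes and zebras together.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: concept -> its broader concept (None at the top level).
BROADER = {
    "animals": None,
    "lions": "animals",
    "giraffes": "animals",
    "zebras": "animals",
    "plants": None,
    "wheat": "plants",
}

# Hypothetical records: record id -> concepts it is indexed with.
RECORDS = {
    "ms-1": {"lions"},
    "ms-2": {"giraffes", "wheat"},
    "ms-3": {"zebras"},
    "ms-4": {"wheat"},
}


def ancestors(concept):
    """The concept itself followed by all of its broader concepts (assumes no cycles)."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = BROADER[concept]
    return chain


def facet_groups(records):
    """Map each concept to the records indexed with it or with any narrower concept."""
    groups = defaultdict(set)
    for record_id, concepts in records.items():
        for concept in concepts:
            for c in ancestors(concept):
                groups[c].add(record_id)
    return groups


groups = facet_groups(RECORDS)
print(sorted(groups["animals"]))  # ['ms-1', 'ms-2', 'ms-3']
```

Precomputing the groups once keeps facet selection cheap; with a polyhierarchy (several broader concepts per concept) `BROADER` would map to a list and `ancestors` would need a traversal with a visited set.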
What we have

What we want

Demo
• KB Illuminated Manuscripts
• BnF Mandragore Manuscripts
• http://galjas.cs.vu.nl:33333/MANDRA-SV-ICE-mandraNewNONE
• Amphibians
• Wheat

Integrated access
• Integrated semantic access requires:
  • Standardized representation of vocabularies and collections
  • Standardized access => services
  • Links between elements of vocabularies

Standardized representation
• Use of Semantic Web techniques (RDF):
  • “Things” are represented as resources identified by URIs, shared across applications and data sets
  • Values are simple strings or numbers (literals), or URIs
  • Properties are typed, named links from URIs to other URIs or to literals
  • Underlying theory and reasoning methods
• This gives interoperability, through some standardization
• Standardization is still needed on how to represent CH objects (XML, Dublin Core), vocabularies (SKOS), and links between elements of vocabularies

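To make the resource/literal/property distinction concrete, here is a minimal sketch that stores a collection record as a set of triples using plain Python rather than an RDF library. Assumptions: the record URI, its title and date are invented for illustration; the property URIs are the Dublin Core element set namespace; the subject URI is the Iconclass concept from the SKOS example that follows.

```python
from dataclasses import dataclass
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


@dataclass(frozen=True)
class Literal:
    """A plain value (string or number), optionally with a language tag."""
    value: object
    lang: Optional[str] = None


# Resources are plain URI strings. The record URI and its title/date are hypothetical;
# the subject URI is the Iconclass concept used in the SKOS example of these slides.
record = "http://example.org/collection/record/1"
graph = {
    (record, DC + "title", Literal("Book of hours", "en")),
    (record, DC + "date", Literal(1450)),
    (record, DC + "subject", "http://www.iconclass.nl/s_11F"),
}


def objects(graph, subject, predicate):
    """All values linked from `subject` by `predicate`: URIs or Literals."""
    return [o for s, p, o in graph if s == subject and p == predicate]


print(objects(graph, record, DC + "subject"))  # ['http://www.iconclass.nl/s_11F']
```

Because the subject value is a URI rather than a string, the same identifier can be looked up in the vocabulary, which is what lets collections and vocabularies be linked.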
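SKOS: example
[Figure: an Iconclass concept as an RDF graph]
• <http://www.iconclass.nl/s_11F> rdf:type skos:Concept
• <http://www.iconclass.nl/s_11F> skos:inScheme <http://www.iconclass.nl/>
• <http://www.iconclass.nl/> rdf:type skos:ConceptScheme
• <http://www.iconclass.nl/s_11F> skos:prefLabel "the Virgin Mary"@en
• <http://www.iconclass.nl/s_11F> skos:prefLabel "la Vierge Marie"@fr
• <http://www.iconclass.nl/s_11F> skos:broader <http://www.iconclass.nl/s_11>
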
SKOS (Simple Knowledge Organization System)
• SKOS offers building blocks to represent KOSs in RDF:
  • Objects: Concept and ConceptScheme
  • Lexical properties (multilingual): prefLabel, altLabel
  • Semantic relations: broader, narrower, related
  • Notes: scopeNote, definition, …

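A minimal sketch of these building blocks as a Python data structure, assuming one prefLabel per language and deriving narrower as the inverse of broader. The class and function names are hypothetical and this is not an existing SKOS library; the Dutch altLabel is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """A skos:Concept with its lexical and semantic properties."""
    uri: str
    pref_labels: Dict[str, str] = field(default_factory=dict)       # lang -> skos:prefLabel (one per language)
    alt_labels: Dict[str, List[str]] = field(default_factory=dict)  # lang -> skos:altLabel values
    broader: List[str] = field(default_factory=list)                # URIs of skos:broader concepts
    in_scheme: Optional[str] = None                                 # skos:inScheme


def narrower_index(concepts):
    """Derive skos:narrower links as the inverse of skos:broader."""
    index: Dict[str, List[str]] = {c.uri: [] for c in concepts}
    for c in concepts:
        for parent in c.broader:
            index.setdefault(parent, []).append(c.uri)
    return index


def find_by_label(concepts, text):
    """URIs of concepts with a prefLabel or altLabel equal to `text` (any language, case-insensitive)."""
    wanted = text.casefold()
    hits = []
    for c in concepts:
        labels = list(c.pref_labels.values()) + [a for alts in c.alt_labels.values() for a in alts]
        if any(label.casefold() == wanted for label in labels):
            hits.append(c.uri)
    return hits


virgin_mary = Concept(
    uri="http://www.iconclass.nl/s_11F",
    pref_labels={"en": "the Virgin Mary", "fr": "la Vierge Marie"},
    alt_labels={"nl": ["maagd Maria"]},  # hypothetical altLabel, for illustration only
    broader=["http://www.iconclass.nl/s_11"],
    in_scheme="http://www.iconclass.nl/",
)
parent = Concept(uri="http://www.iconclass.nl/s_11", in_scheme="http://www.iconclass.nl/")
concepts = [virgin_mary, parent]

print(narrower_index(concepts)["http://www.iconclass.nl/s_11"])  # ['http://www.iconclass.nl/s_11F']
print(find_by_label(concepts, "La Vierge Marie"))                 # ['http://www.iconclass.nl/s_11F']
```

Storing only broader and computing narrower avoids keeping two copies of the hierarchy in sync; multilingual label lookup is what makes the labels usable for access and, later, for lexical alignment.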
Vocabulary alignment
• Aim: finding semantic correspondences between vocabulary elements, e.g.:
  • “klassieke ruïnes” (classical ruins) ≈ “landschap met ruïnes” (landscape with ruins)
  • “maagd Maria” (Virgin Mary) = “Heilige Moeder” (Holy Mother)
• Doing it (semi-)automatically, because:
  • Vocabularies are big (tens of thousands of concepts)
  • They change

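One plausible way to record such correspondences is as RDF links using the SKOS mapping properties, reading "=" as skos:exactMatch and "≈" as skos:closeMatch. This is a sketch under that assumption; the concept identifiers for the two vocabularies (A and B) are hypothetical, and the slides do not prescribe this encoding.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# "=" and "≈" from the slide, encoded with SKOS mapping properties (one plausible choice).
RELATION = {"=": SKOS + "exactMatch", "≈": SKOS + "closeMatch"}

# Hypothetical concept identifiers in two vocabularies, A and B.
correspondences = [
    ("A:klassieke_ruines", "≈", "B:landschap_met_ruines"),
    ("A:maagd_Maria", "=", "B:Heilige_Moeder"),
]


def to_triples(pairs):
    """Turn (source, relation symbol, target) correspondences into RDF-style triples."""
    return [(a, RELATION[symbol], b) for a, symbol, b in pairs]


def with_inverses(triples):
    """exactMatch and closeMatch are symmetric, so each link also holds in the other direction."""
    return sorted(set(triples) | {(o, p, s) for s, p, o in triples})


for triple in with_inverses(to_triples(correspondences)):
    print(triple)
```

Keeping the relation type explicit, rather than a single "matches" link, lets each application decide which kinds of links it trusts.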
Automatic alignment techniques
[Figure: example labels “céréale, grain, blé” and “blé”]
• Lexical: labels of entities and textual definitions
• Structural: structure of the vocabularies
• Background knowledge: using a shared conceptual reference to find links
• Extensional: object information (e.g. book indexing)

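The lexical technique can be sketched as exact matching on normalized labels (case-folded, diacritics removed). This is a deliberately simple stand-in, not the matcher used in STITCH; the concept identifiers are hypothetical, while the labels are those shown in the figure.

```python
import unicodedata
from collections import defaultdict


def normalize(label):
    """Case-fold and strip diacritics so that e.g. 'ruïnes' and 'ruines' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip()


def lexical_matches(vocab_a, vocab_b):
    """Candidate pairs (concept_a, concept_b) that share at least one normalized label.

    vocab_a, vocab_b: concept id -> list of labels (prefLabels and altLabels).
    """
    index = defaultdict(set)
    for concept, labels in vocab_b.items():
        for label in labels:
            index[normalize(label)].add(concept)
    pairs = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for other in index.get(normalize(label), ()):
                pairs.add((concept, other))
    return sorted(pairs)


# Hypothetical concepts; the labels are the ones shown in the figure.
vocab_a = {"a:cereals": ["céréale", "grain", "blé"]}
vocab_b = {"b:wheat": ["blé"], "b:barley": ["orge"]}
print(lexical_matches(vocab_a, vocab_b))  # [('a:cereals', 'b:wheat')]
```

Note that the shared label "blé" links a concept that also means "cereal, grain" to one that only means "wheat": a lexical match suggests a link but does not say whether it is equivalence or broader/narrower, which is one reason the other techniques are needed.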
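Extensional statistical alignment
• Uses object information (e.g. book indexing): two concepts from different thesauri are likely related when they are used to index largely the same books
[Figure: “Dutch Literature” (Thesaurus 1) and “Dutch” (Thesaurus 2) linked through a shared collection of books]
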
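A minimal sketch of the extensional idea, using the Jaccard overlap of book sets as the score. The slides do not say which statistical measure produced the results that follow (their scores are not Jaccard values, which lie between 0 and 1), so Jaccard is a stand-in; the indexing data and function names are hypothetical.

```python
from itertools import product


def jaccard(books_a, books_b):
    """Share of books indexed with both concepts among books indexed with either."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


def extensional_alignment(index_a, index_b, min_score=0.0):
    """Rank concept pairs from two thesauri by the overlap of the books they index.

    index_a, index_b: concept -> set of ids of books indexed with that concept,
    taken from the same (doubly indexed) collection.
    """
    scored = []
    for (ca, books_a), (cb, books_b) in product(index_a.items(), index_b.items()):
        score = jaccard(books_a, books_b)
        if score > min_score:
            scored.append((score, ca, cb))
    return sorted(scored, reverse=True)


# Hypothetical indexing data.
thesaurus_1 = {"Dutch Literature": {1, 2, 3, 4}}
thesaurus_2 = {"Dutch": {2, 3, 4, 5}, "Chemistry": {6}}
print(extensional_alignment(thesaurus_1, thesaurus_2))  # [(0.6, 'Dutch Literature', 'Dutch')]
```

Comparing every pair is quadratic in vocabulary size; for vocabularies with tens of thousands of concepts one would instead iterate over the books and count only pairs that actually co-occur.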
Results
Rank  Score   Counts            Term pair
1     9132.9  (1704 3479 976)   Schilderijen (paintings) - schilderkunst (painting, the art)
2     8088.5  (1204 2330 767)   Kwaliteitszorg (quality assurance) - kwaliteitsmanagement (quality management)
3     6232.7  (820 1572 543)    Personeelsmanagement (personnel management) - personeelsbeleid (personnel policy)
4     5392.1  (1399 3271 622)   Beeldende kunsten (visual arts) - beeldende kunst (visual art)
5     5063.1  (4951 1152 613)   Nederlands (Dutch) - Nederlandse taalkunde (Dutch linguistics)
17    3421.8  (280 714 243)     Diabetes mellitus - suikerziekte (diabetes, lay term)

Alignment: no trivial solution
• Current techniques are not reliable as the sole source of alignment knowledge
• What is a good alignment?
  • Evaluation criteria?
  • => What will it be used for? Usage scenarios:
    • Integrated search
    • Reindexing
    • Thesaurus merging
    • Navigation => faceted browsing

What next
• Evaluation, lessons learned
• What next ->
• Second use case: reindexing
• (Vocabulary service)
• Conclusion

Why usage scenarios
• Evaluation of an alignment depends on its use
• Real-world applications provide a test of the quality of alignments
• Requirements on alignments depend on their use: what kinds of links should be distinguished?
• Optional demo evaluation:
  • http://localhost:33344/logineval
  • http://kits.cs.vu.nl:33344/logineval
• Next: reindexing, the scenario nearest to a real-world application

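When evaluators judge a sample of correspondences for a given scenario, a basic quality figure is precision, optionally over only the top-ranked links. The sketch below assumes three verdicts ("correct", "incorrect", "unsure") and leaves "unsure" out of the denominator; both the verdict scheme and the judgments are hypothetical, not those of the evaluation tool linked above.

```python
def precision(judgments):
    """Fraction of evaluated correspondences judged correct.

    judgments: (concept_a, concept_b, verdict) with verdict "correct", "incorrect" or "unsure".
    "unsure" verdicts are left out of the denominator (one possible convention).
    """
    decided = [verdict for _, _, verdict in judgments if verdict != "unsure"]
    if not decided:
        return 0.0
    return sum(verdict == "correct" for verdict in decided) / len(decided)


def precision_at_k(ranked_judgments, k):
    """Precision over the k highest-ranked correspondences only."""
    return precision(ranked_judgments[:k])


# Hypothetical judgments, already in the order the alignment tool ranked them.
judgments = [
    ("a1", "b1", "correct"),
    ("a2", "b2", "correct"),
    ("a3", "b3", "incorrect"),
    ("a4", "b4", "unsure"),
]
print(precision(judgments))          # 0.6666666666666666
print(precision_at_k(judgments, 2))  # 1.0
```

Because what counts as "correct" depends on the scenario (a link good enough for browsing may be too loose for reindexing), the same alignment can score differently under different usage scenarios.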
Situation at Dutch libraries and the National Library (KB)
• KB: two large collections:
  • Depot (deposit collection: all Dutch-language publications)
  • Own scientific collection
• Subject indexing uses two completely different indexing systems: Brinkman and GOO
• Common automation system for the Netherlands and Europe (OCLC-Pica)
• The metadata of a book contains many fields
• A book or publication is given metadata by several different libraries, using many different vocabularies.

Reindexing
• About 20 people at the KB index books daily; about 20,000 books are indexed per year
• Indexing (adding keywords and classification information to books) follows different vocabularies, even internally
• Some books arrive with indexing already done by other libraries (public libraries, Biblion)
• If Biblion index terms, or combinations of them, could be translated into KB (Brinkman) index terms, the KB would have less indexing work

WinIBW
• OCLC (PICA): library automation system used in the Netherlands, and also elsewhere in Europe
• Online Public Access Catalogue (OPAC)
• WinIBW: internet access to the Pica system (local and central) for adding records, adding metadata and searching records
• Demo, closest to a real-world application

Reindexing
• Biblion -> Brinkman
• Fietstochten (bicycle tours), Kapellen (chapels), Beesel, Heiligenbeelden (statues of saints), … -> Brinkman?
• Use the alignment:
  • Bibl:Fietstochten -> Brinkman?
  • Bibl:Kapellen -> Brinkman?
• DEMO (example: z sel 3-10-2008 gd? 79)

Reindexing
• Under evaluation
• Improvements:
  • Use other metadata
  • Adapt the scenario (pass records with 95% confidence)
• Many other uses

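The reindexing scenario can be sketched as: look up each Biblion term of a record in the alignment, keep the best confidence per suggested Brinkman term, and let suggestions at or above 0.95 pass while the rest go to an indexer. Assumptions: the alignment entries, Brinkman term ids and confidences are hypothetical, and the slides apply the 95% threshold to records, whereas this sketch applies it per suggested term.

```python
from typing import Dict, List, Tuple

# Hypothetical alignment: Biblion term -> list of (Brinkman term, confidence in [0, 1]).
ALIGNMENT: Dict[str, List[Tuple[str, float]]] = {
    "Fietstochten": [("brinkman:T1", 0.97)],
    "Kapellen": [("brinkman:T2", 0.80), ("brinkman:T3", 0.40)],
}


def suggest(biblion_terms, alignment, threshold=0.95):
    """Translate a record's Biblion terms into Brinkman suggestions.

    Returns (auto, review): suggestions at or above `threshold` can pass without
    manual checking; the rest go to an indexer, highest confidence first.
    """
    best: Dict[str, float] = {}
    for term in biblion_terms:
        for target, conf in alignment.get(term, []):
            best[target] = max(conf, best.get(target, 0.0))
    auto = sorted(t for t, c in best.items() if c >= threshold)
    review = sorted((t for t, c in best.items() if c < threshold), key=lambda t: -best[t])
    return auto, review


auto, review = suggest(["Fietstochten", "Kapellen", "Beesel"], ALIGNMENT)
print(auto)    # ['brinkman:T1']
print(review)  # ['brinkman:T2', 'brinkman:T3']
```

A record-level variant would pass the whole record only when `review` is empty; terms with no alignment entry (here "Beesel") simply produce no suggestion and still need manual indexing.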
Sketch: vocabularies of importance to the KB

Integrated access
• Services over the internet
• Protocols: SOAP, REST, …
• Collection access?
• Vocabulary access, alignment access:
  • http://eculture.cs.vu.nl:38080/vocreptags
  • http://localhost:8080/vocreptags

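A REST-style vocabulary service can be very small. The sketch below is a plain WSGI application answering `GET /concept?uri=<concept URI>` with the concept as JSON. It is a hypothetical illustration: the path, the query parameter and the JSON layout are assumptions and are not the interface of the vocreptags prototype; the in-memory data reuses the Iconclass example.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory vocabulary; a real service would load SKOS data.
CONCEPTS = {
    "http://www.iconclass.nl/s_11F": {
        "prefLabel": {"en": "the Virgin Mary", "fr": "la Vierge Marie"},
        "broader": ["http://www.iconclass.nl/s_11"],
    },
}

JSON_HEADERS = [("Content-Type", "application/json; charset=utf-8")]


def vocabulary_app(environ, start_response):
    """Minimal WSGI app: GET /concept?uri=<concept URI> returns the concept as JSON."""
    if environ.get("REQUEST_METHOD", "GET") != "GET" or environ.get("PATH_INFO") != "/concept":
        start_response("404 Not Found", JSON_HEADERS)
        return [b'{"error": "not found"}']
    uri = parse_qs(environ.get("QUERY_STRING", "")).get("uri", [""])[0]
    concept = CONCEPTS.get(uri)
    if concept is None:
        start_response("404 Not Found", JSON_HEADERS)
        return [json.dumps({"error": "unknown concept", "uri": uri}).encode("utf-8")]
    body = json.dumps({"uri": uri, **concept}, ensure_ascii=False).encode("utf-8")
    start_response("200 OK", JSON_HEADERS)
    return [body]
```

For local experiments the app can be served with `wsgiref.simple_server.make_server("localhost", 8000, vocabulary_app).serve_forever()` and queried with a percent-encoded URI, e.g. `/concept?uri=http%3A%2F%2Fwww.iconclass.nl%2Fs_11F`. Returning URIs for broader concepts lets a client walk the hierarchy with further requests.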
Lessons
• Using Semantic Web techniques, interoperability and integration of collections can be made easier.
• Aligning vocabularies is useful in different situations. The alignment methods need to be fine-tuned to the application they are meant for.
• When introducing new techniques, interaction between the CH field and scientific institutes is very valuable.
• Standardization of access to collections and vocabularies should be addressed (a prototype has been developed).

Terms
• An ontology, in both computer science and information science, is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain.
• Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media. An item of metadata may describe an individual datum or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.

Terms (continued)
• A library classification is a system of coding and organizing library materials (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to their subject and allocating a call number to each information resource. Like classification systems used in biology, bibliographic classification systems group similar entities together, typically in a hierarchical tree structure.
• In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of artificial intelligence, a thesaurus may sometimes be referred to as an ontology.