Semantic annotation and search of large virtual heritage collections. Guus Schreiber, Free University Amsterdam. Overview • A non-technical view on the Semantic Web • Work on Semantic-Web deployment • SKOS, RDFa • Semantic annotation and search in virtual collections: the E-Culture example
The Web: resources and links [Figure: two URLs connected by a Web link]
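The Semantic Web: typed resources and links [Figure: the painting “Femme au chapeau” (SFMOMA) is linked through the Dublin Core property “creator” to the ULAN entry for Henri Matisse; underneath, the same Web link still connects two URLs]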
Principle 1: semantic annotation • Description of web objects with “concepts” from a shared vocabulary
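A minimal sketch of what such an annotation could look like as data, assuming a simple triple representation. The painting and artist URIs are hypothetical placeholders (not real SFMOMA or ULAN identifiers); only the Dublin Core creator property URI is a real one.

```python
# Real Dublin Core property URI.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical URIs, used for illustration only.
PAINTING = "http://example.org/sfmoma/femme-au-chapeau"
MATISSE = "http://example.org/ulan/henri-matisse"

# An annotation is a (subject, property, object) triple whose object
# is a concept taken from a shared vocabulary.
annotations = {(PAINTING, DC_CREATOR, MATISSE)}


def objects_annotated_with(concept, triples):
    """Return all web objects whose annotation points at the given concept."""
    return sorted(s for (s, p, o) in triples if o == concept)


print(objects_annotated_with(MATISSE, annotations))
```

Because the artist is a shared concept rather than a free-text string, any collection that uses the same identifier can be searched with the same query.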
Principle 2: semantic search • Search for objects which are linked via concepts (semantic link) • Use the type of semantic link to provide meaningful presentation of the search results [Figure labels: ape, great ape, orang-utan, orange]
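The idea can be sketched as a walk over concept links. This is only an illustration: it assumes that the labels on the slide form a "broader" chain (orang-utan, great ape, ape), and the object identifiers are hypothetical.

```python
# Assumed concept hierarchy: each concept maps to its broader concept.
BROADER = {"orang-utan": "great ape", "great ape": "ape"}

# Hypothetical collection objects and the concept each is annotated with.
ANNOTATIONS = {"photo-17": "orang-utan", "photo-42": "great ape"}


def semantic_search(query):
    """Find objects whose concept equals the query or lies below it.

    Returns a dict mapping each matching object to the chain of concepts
    that connects it to the query, so a user interface can explain
    why the object was returned.
    """
    results = {}
    for obj, concept in ANNOTATIONS.items():
        path = [concept]
        # Follow "broader" links upwards until the query is reached
        # or the top of the hierarchy is hit.
        while path[-1] != query and path[-1] in BROADER:
            path.append(BROADER[path[-1]])
        if path[-1] == query:
            results[obj] = path
    return results


print(semantic_search("ape"))
```

Returning the path, and not just the hit, is what allows the type of semantic link to drive the presentation of results.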
Principle 3: multiple vocabularies, or: the myth of a unified vocabulary • In large virtual collections there are always multiple vocabularies • In multiple languages • Every vocabulary has its own perspective • You can’t just merge them • But you can use vocabularies jointly by defining a limited set of links • “Vocabulary alignment” • It is surprising what you can do with just a few links
Example: “Tokugawa” [Figure labels: AAT style/period: Edo (Japanese period), Tokugawa; SVCN period: Edo] • SVCN is a local in-house thesaurus
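One way to read this example is sketched below, under stated assumptions: "Tokugawa" is taken to be an alternative label of the AAT concept "Edo (Japanese period)", and a single alignment link connects that concept to the SVCN period "Edo". The concept identifiers, the extra SVCN concept, and the object names are all hypothetical.

```python
# Assumed AAT concept with its labels (identifiers are made up).
AAT_LABELS = {"aat:edo": ["Edo (Japanese period)", "Tokugawa"]}

# A single alignment link between the two vocabularies.
ALIGNMENT = {("aat:edo", "svcn:edo")}

# Hypothetical museum objects annotated with the in-house SVCN thesaurus.
SVCN_ANNOTATIONS = {"object-1": "svcn:edo", "object-2": "svcn:other-period"}


def search(term):
    """Resolve a term in AAT, cross the alignment link, return SVCN-annotated objects."""
    aat_concepts = {c for c, labels in AAT_LABELS.items() if term in labels}
    svcn_concepts = {b for (a, b) in ALIGNMENT if a in aat_concepts}
    return sorted(o for o, c in SVCN_ANNOTATIONS.items() if c in svcn_concepts)


print(search("Tokugawa"))
```

The two vocabularies stay separate; one link is enough for a query phrased in AAT terms to reach objects described only with SVCN terms.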
RDF/OWL language constructs • classes and individuals • subclasses • properties • subproperties • domain/range of properties • XML Schema datatypes • equality, inequality • inverse, transitive, symmetric, functional properties • property constraints: cardinality, allValuesFrom, someValuesFrom • conjunction, disjunction, negation of classes • hasValue, enumerated type
How useful are RDF and OWL? • RDF: basic level of interoperability • Some constructs of OWL are key: • Logical characteristics of properties: symmetric, transitive, inverse • Identity: sameAs • OWL pitfalls • Bad: if it is written in OWL it is an ontology • Worse: if it is not in OWL, then it is not an ontology
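To illustrate why the logical characteristics of properties are useful, here is a naive sketch of the statements that follow from declaring a property transitive or symmetric. It is not how an OWL reasoner is implemented, and the place names and the "part of" and "border" relations are hypothetical examples.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def symmetric_closure(pairs):
    """Add (b, a) for every (a, b)."""
    return set(pairs) | {(b, a) for (a, b) in pairs}


# Hypothetical "part of" statements for a transitive property.
part_of = {("Amsterdam", "Netherlands"), ("Netherlands", "Europe")}
print(sorted(transitive_closure(part_of)))

# Hypothetical "borders" statement for a symmetric property.
borders = {("Netherlands", "Belgium")}
print(sorted(symmetric_closure(borders)))
```

A search for objects from "Europe" can then also return objects annotated with "Amsterdam", without anyone having stated that link explicitly.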
W3C Semantic Web Deployment Working Group: making vocabularies/thesauri/ontologies available on the Web • Schema for interoperable RDF/OWL representation of vocabularies • SKOS • Publication guidelines: • URI management, representation of versions • Embedding RDF in (X)HTML pages • RDFa
SKOS: pattern for thesaurus modeling • Based on ISO standard • RDF representation • Documentation: http://www.w3.org/TR/swbp-skos-core-guide/ • Base class: SKOS Concept
Semantic relation: broader and narrower • No subclass semantics assumed!
Indexing a resource with a SKOS concept • primarySubject is defined as subproperty
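The effect of a subproperty declaration can be sketched as follows. The sketch assumes that primarySubject is a subproperty of a more general subject property, as in the SKOS Core guide of that period; the prefixed names and the document and concept identifiers are placeholders.

```python
# Assumed subproperty declaration (child property -> parent property).
SUBPROPERTY_OF = {"skos:primarySubject": "skos:subject"}


def infer_superproperties(triples):
    """Return the triples plus every statement implied by subproperty links."""
    inferred = set(triples)
    for (s, p, o) in triples:
        # Walk up the subproperty chain, restating the triple at each level.
        while p in SUBPROPERTY_OF:
            p = SUBPROPERTY_OF[p]
            inferred.add((s, p, o))
    return inferred


# Hypothetical document indexed with a hypothetical concept.
indexed = {("ex:doc1", "skos:primarySubject", "ex:concept-painting")}
print(sorted(infer_superproperties(indexed)))
```

A query on the general subject property then also finds resources that were indexed with the more specific primarySubject.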
Adding semantics • Adding OWL statements • Interpretations of thesaurus relations such as narrower as subclass-of are often imprecise (but can still be useful) • Learning relations between thesauri is an important form of additional semantics • Example: AAT contains styles; ULAN contains artists, but there is no link • Availability of this kind of alignment knowledge is extremely useful
W3C standardization process • Input: draft specification • Collect use cases • Derive requirements • Create issues list: requirements that cannot be handled by the draft spec • Propose resolutions for issues • Continuously: ask for public feedback/comments • Get consensus on amended spec • Find two independent implementations for each feature in the spec
Example issue: relationships between lexical labels • In the draft SKOS spec, lexical labels of concepts are represented as datatype properties • Use cases require relations between labels, e.g. “AAT” is an acronym of “Art & Architecture Thesaurus” • This is a problem because literals have no URI (so they cannot be the subject of an RDF property) • Possible resolutions: • Labels/terms as classes • Relaxing constraints on label property • …..
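The problem, and the general shape of a resolution in which labels become resources of their own, can be sketched as below. This is an illustration only, not the resolution adopted in the spec: the classes, the "ex:" property names, and the label URIs are hypothetical, and blank nodes are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def make_triple(subject, prop, obj):
    """Build a triple, rejecting literals in subject position as RDF does."""
    if not isinstance(subject, URI):
        raise TypeError("the subject of an RDF statement cannot be a literal")
    return (subject, prop, obj)


# Labels modelled as resources with their own (hypothetical) URIs.
aat_label = URI("http://example.org/label/aat")
full_label = URI("http://example.org/label/art-and-architecture-thesaurus")

label_triples = [
    make_triple(aat_label, "ex:literalForm", Literal("AAT")),
    make_triple(full_label, "ex:literalForm", Literal("Art & Architecture Thesaurus")),
    # Now the relation between the two labels can be stated.
    make_triple(aat_label, "ex:acronymOf", full_label),
]
print(len(label_triples))
```

With plain literals, the third statement would be rejected, which is exactly the limitation the use cases ran into.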
Recipes for vocabulary URIs • Simplified rule: • Use the “hash” variant for vocabularies that are relatively small and require frequent access, e.g. http://www.w3.org/2004/02/skos/core#Concept • Use the “slash” variant for large vocabularies, where you do not always want the whole vocabulary to be retrieved, e.g. http://xmlns.com/foaf/0.1/Person • For more information and other recipes, see: http://www.w3.org/TR/swbp-vocab-pub/
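The practical difference between the two variants comes from how HTTP clients treat the fragment: the part after "#" is not sent to the server, so every term of a hash vocabulary resolves to the same document. A small sketch using the two URIs from the slide (the helper name is made up for illustration):

```python
from urllib.parse import urldefrag


def document_to_fetch(term_uri):
    """Return the URL a client would actually request for a vocabulary term."""
    # The fragment identifier is stripped before the HTTP request is made.
    return urldefrag(term_uri).url


hash_term = "http://www.w3.org/2004/02/skos/core#Concept"
slash_term = "http://xmlns.com/foaf/0.1/Person"

# Hash variant: every term shares one document, the whole vocabulary.
print(document_to_fetch(hash_term))

# Slash variant: the term URI is itself the requested URL, so the server
# can answer with a description of just that term.
print(document_to_fetch(slash_term))
```

This is why the hash variant suits small vocabularies (one retrieval fetches everything) and the slash variant suits large ones (each term can be served separately).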
Query for WordNet URI returns “concept-bounded description”
RDFa: embedding RDF metadata in an (X)HTML file [Figure panels: regular HTML; HTML with RDFa; resulting RDF statements]
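The three stages can be imitated with a toy extractor. This is a deliberately simplified sketch, not a conforming RDFa processor: it only understands the "about" and "property" attributes, treats element text as the literal value, and never resets the subject. The HTML fragment and the painting URI are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: ordinary HTML with RDFa attributes added.
PAGE = (
    '<div about="http://example.org/painting/1">'
    '<span property="dc:title">Femme au chapeau</span> by '
    '<span property="dc:creator">Henri Matisse</span>'
    "</div>"
)


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, literal) statements from a page."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.current_property = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            # "about" names the resource that later properties describe.
            self.subject = attrs["about"]
        self.current_property = attrs.get("property")

    def handle_data(self, data):
        # Text inside an element with a "property" attribute is the value.
        if self.subject and self.current_property and data.strip():
            self.triples.append((self.subject, self.current_property, data.strip()))

    def handle_endtag(self, tag):
        self.current_property = None


extractor = TinyRDFaExtractor()
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```

The visible text of the page is unchanged for human readers; the extra attributes are what let software recover the statements.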
E-Culture demonstrator • Part of large Dutch knowledge-economy project MultimediaN • Partners: VU, CWI, UvA, DEN, ICN • People: • Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga • Artchive.com, ICN: Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)
Use case: painting style • Find paintings of a similar style • KLIMT, Gustav: Portrait of Adele Bloch-Bauer I, 1907, oil and gold on canvas, 138 x 138 cm, Austrian Gallery, Vienna
How can we find this other ‘Art nouveau’ painting? • MUNCH, Edvard: The Scream, 1893, oil, tempera and pastel on cardboard, 91 x 73.5 cm, National Gallery, Oslo
Issues w.r.t. the use case • Parse annotation to find matches with thesaurus terms • E.g. match artists to ULAN individuals • Artist-style links • AAT contains styles; ULAN contains artists, but there is no link • Learn link from corpora • Derive it from other annotations • Domain-specific rules/reasoning needed • see example in SWRL doc • Painters may have painted in multiple styles
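The option "derive it from other annotations" could be sketched as a simple co-occurrence count. This is a guess at one possible approach, not the method used in the demonstrator, and the annotation records below are invented toy data. Keeping a ranked list per artist reflects the point that painters may have worked in several styles.

```python
from collections import Counter, defaultdict

# Invented toy data: (artist, style) pairs taken from already-annotated works.
annotated_works = [
    ("Klimt", "Art Nouveau"),
    ("Klimt", "Art Nouveau"),
    ("Klimt", "Symbolism"),
    ("Munch", "Expressionism"),
]


def candidate_styles(works, min_count=1):
    """Suggest artist-style links, most frequent style first.

    A style is kept only if it occurs at least min_count times for the artist.
    """
    counts = defaultdict(Counter)
    for artist, style in works:
        counts[artist][style] += 1
    return {
        artist: [s for s, n in counter.most_common() if n >= min_count]
        for artist, counter in counts.items()
    }


print(candidate_styles(annotated_works))
print(candidate_styles(annotated_works, min_count=2))
```

The suggested links are candidates only; raising min_count trades coverage for confidence, and a domain expert would still need to confirm them.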
Example enrichment • Learning relations between art styles in AAT and artists in ULAN through NLP of art-historical texts • But don’t learn things that already exist!
Perspectives • Basic Semantic Web technology is ready for deployment • in open knowledge-rich domains • Important research issues: scalability, vocabulary alignment, metadata extraction • Web 2.0 features: • Involving community experts in annotation • Personalization, myArt • Social barriers have to be overcome! • “open door” policy • Involvement of general public => issues of “quality” • Importance of using open standards • Away from custom-made flashy web sites