Antoine Isaac Europeana – VU University Amsterdam Dagstuhl Multilingual Semantic Web seminar
Europeana
“A digital library that is a single, direct and multilingual access point to the European cultural heritage.” (European Parliament)
• 24 million objects (images, text, sound and video)
• From over 2,200 libraries, museums, archives
• From 33 countries
• For everyone
Dimensions of multilingual access
• Interface
• Search (query translation or document translation)
• Result presentation
• Browsing
Europeana's efforts
• Interface translated into 26 languages
• Query translation: prototype only
• Query result filtering by country/language
• Document translation (user-enabled)
• Semantic contextualization of objects
• Multilingual enrichment/annotation of metadata
Current metadata in Europeana
• Simple object records
• Flat (text values)
• Without language tags!
• The only language-related information about the metadata is at collection level
• Can be "mul" (the ISO 639-2 code for multiple languages)
Need to change!
• A new Europeana Data Model (EDM)
"Semantic layer" of contextual resources (concepts, persons, places, events...)
[Diagram labels: Cultural artefact; Building; Sculpture; Painting]
Networked objects
• Exploiting semantic relations
• e.g. “broader concept”, “place of birth”, “involved person”…
Fetching already available linked data E.g., from libraries http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset/
Interoperability • Encouraging the use of RDF + common and simple elements
Interoperability
• Encouraging the use of common and simple data elements

    <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2308">
      <skos:prefLabel xml:lang="fr">Piano carré</skos:prefLabel>
      <skos:prefLabel xml:lang="it">Pianoforte a tavolino</skos:prefLabel>
      <skos:prefLabel xml:lang="en">Square pianoforte</skos:prefLabel>
      <skos:prefLabel xml:lang="de">Tafelklavier</skos:prefLabel>
      <skos:prefLabel xml:lang="nl">Tafelpiano</skos:prefLabel>
      <skos:prefLabel xml:lang="sv">Taffel</skos:prefLabel>
      <skos:broader>
        <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2273">
          <skos:prefLabel xml:lang="en">Pianofortes</skos:prefLabel>
        </skos:Concept>
      </skos:broader>
    </skos:Concept>
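To make the value of language-tagged labels concrete, here is a minimal sketch of how a consumer might read the record above using only the Python standard library. The `rdf:RDF` wrapper with its namespace declarations and the helper name `pref_labels` are assumptions added for illustration; the slide shows only the bare `skos:Concept` element.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The record from the slide, wrapped in an rdf:RDF root so that the
# prefixes are declared (the wrapper is an assumption, not on the slide).
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2308">
    <skos:prefLabel xml:lang="fr">Piano carré</skos:prefLabel>
    <skos:prefLabel xml:lang="it">Pianoforte a tavolino</skos:prefLabel>
    <skos:prefLabel xml:lang="en">Square pianoforte</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Tafelklavier</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">Tafelpiano</skos:prefLabel>
    <skos:prefLabel xml:lang="sv">Taffel</skos:prefLabel>
    <skos:broader>
      <skos:Concept rdf:about="http://www.mimo-db.eu/InstrumentsKeywords/2273">
        <skos:prefLabel xml:lang="en">Pianofortes</skos:prefLabel>
      </skos:Concept>
    </skos:broader>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(concept):
    """Map language tag -> preferred label, looking at direct children only."""
    return {
        el.get(XML_LANG): el.text
        for el in concept.findall(f"{{{SKOS_NS}}}prefLabel")
    }


root = ET.fromstring(RECORD)
concept = root.find(f"{{{SKOS_NS}}}Concept")
labels = pref_labels(concept)

# Follow the skos:broader relation to the nested parent concept.
broader = concept.find(f"{{{SKOS_NS}}}broader/{{{SKOS_NS}}}Concept")
broader_uri = broader.get(f"{{{RDF_NS}}}about")

print(labels["de"])   # label to show in a German interface
print(broader_uri)    # URI of the broader concept
```

Because every label carries its own `xml:lang`, the same concept URI can be displayed or searched in any of the six languages without guessing. A real pipeline would more likely use a full RDF parser than raw XML handling; the standard library is used here only to keep the sketch self-contained.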
Interoperability
• Mixed nature of eligible contextual resources: dictionaries, synonym/translation lists, thesauri, authority lists, gazetteers…
• Interplay: “semantic” data next to multilingual data
Simultaneous approaches
• Getting richer semantic/multilingual metadata from providers
• Fetching third-party contextual data and linking it to “un-contextualized” objects
• Linking an institution's contextual data to another, more general or more commonly used contextual dataset
• DBpedia.org, VIAF.org…
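As a rough illustration of the second approach, linking third-party contextual data to "un-contextualized" objects, the sketch below matches flat, untagged metadata values against the labels of a vocabulary. Everything here is hypothetical: the `example.org` URIs, the two-concept vocabulary, and the function names are invented for the example and do not describe Europeana's actual pipeline.

```python
from collections import defaultdict

# Hypothetical mini vocabulary: concept URI -> {language tag: label}.
VOCABULARY = {
    "http://example.org/concept/violin": {"en": "violin", "fr": "violon"},
    "http://example.org/concept/piano": {"en": "piano", "fr": "piano", "nl": "piano"},
}


def build_label_index(vocabulary):
    """Index case-folded labels to the (uri, language) pairs that carry them."""
    index = defaultdict(set)
    for uri, labels in vocabulary.items():
        for lang, label in labels.items():
            index[label.casefold()].add((uri, lang))
    return index


def link_value(value, index):
    """Return sorted candidate (uri, language) pairs for an untagged text value.

    This is plain exact matching after case folding: no stemming, no
    disambiguation, no language detection.
    """
    return sorted(index.get(value.strip().casefold(), set()))


index = build_label_index(VOCABULARY)

print(link_value("Violon", index))   # one candidate, found via the French label
print(link_value("piano", index))    # same concept, but three possible languages
print(link_value("pianos", index))   # no match: plural form is not a label
```

The second call shows why missing language tags matter: the value links to the right concept, but the match alone cannot tell which language the record was written in. The third call shows how exact matching misses trivially inflected forms.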
Current status
• All this is work in progress and will take time
R&D prototypes (EuropeanaConnect) show the challenges of gathering appropriate multilingual tools and data
• First tests of simple techniques in the production portal: GeoNames (places) and GEMET (concepts)
Encouraging, but they illustrate the issues with overly naive approaches (no NLP) and incomplete data
• Cheval
• Poison
http://www.europeana.eu
Problems & requirements
For providers & Europeana
• Continue work on metadata
• Benchmarking (cf. CHiC lab @ CLEF)
• Positioning as consumers and contributors of data (cf. Asun’s slides)
data.europeana.eu
For language-intensive tools and resources
• Availability: open resources
• Interoperability
• Simplicity
• But not always! E.g., not only “first hit” translations
• Scale: scalability of tools, number and scope of datasets
• Many languages, some lesser-resourced (compared with English)
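The point about “first hit” translations can be sketched with a toy example. The English-to-French lexicon below is invented for illustration and the function names are hypothetical; the sketch only contrasts keeping the first dictionary entry with keeping every candidate when expanding a query.

```python
# Toy English -> French lexicon, invented for this example.
# Each entry lists translations for different senses of the word.
LEXICON = {
    "bank": ["banque", "rive"],                     # financial bank / river bank
    "spring": ["printemps", "ressort", "source"],   # season / coil / water source
}


def first_hit(term):
    """Keep only the first listed translation (the overly simple option)."""
    return LEXICON.get(term.lower(), [])[:1]


def all_candidates(term):
    """Keep every listed translation so no sense is silently dropped."""
    return list(LEXICON.get(term.lower(), []))


def expand_query(term, translate):
    """Return the original term followed by its translations (an OR-query)."""
    return [term] + translate(term)


print(expand_query("bank", first_hit))       # misses the river-bank sense
print(expand_query("bank", all_candidates))  # covers both senses
```

Keeping all candidates improves recall at the cost of precision, which is one reason such tools cannot always be kept simple.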
Another illustration: VOICES project
Something entirely different but not completely unrelated
Voice-based community-centric mobile services for social development
• Easing communication on agricultural trade
• Listing of products/prices via phone/radio
• Pilot in Mali
Challenges
• Data-centric project, but language technology plays a crucial role
• Objects should be provided with textual and audio labels (text-to-speech system) in different languages
• Local languages: e.g., Bambara
• Lack of resources: need for low-cost, easy-to-adapt solutions
Victor de Boer, VU Amsterdam (v.de.boer@cs.vu.nl)
Thank you
aisaac@few.vu.nl
http://www.few.vu.nl/~aisaac/
Some slides based on Marlies Olensky and Juliane Stiller - Multilingual Web Workshop, June 11, 2012, Dublin