EHRI Vocabularies and Linked Open Data: An enrichment? CONNECTING COLLECTIONS Annelies van Nispen 15/05/2018
The EHRI Portal connecting archives and users Online inventory of institutions and collections about the Holocaust • Making sources visible in a systematic fashion in order to counteract the fragmentation of the sources • Reveal interconnections (e.g. through a multilingual thesaurus; collation of authority files; relationships between originals and copies) • EHRI focuses on collection descriptions
EHRI Vocabularies • EHRI Thesaurus (subject terms) • Camps • Ghettos • Administrative districts • Places (GeoNames) • Persons • Corporate bodies
EHRI Vocabularies • Main tool for multilingual information • Retrieval & search functionality • Cataloguing and integration tool for incoming data • Holocaust-related knowledge base, useful for further developments, e.g. NER, LOD or …
EHRI Vocabularies and Linked Open Data Experiments with EHRI Vocabularies and LOD • Places – GeoNames • Persons – VIAF • Camps & Ghettos – Wikidata Aim: Enrich EHRI's vocabularies and, where possible, publish them as LOD
GeoNames Reconciliation - problematic cases • Places not listed in GeoNames (e.g. Altreich) • Places listed in GeoNames but missing spelling variants (e.g. Babyn Iar) • More than one location per place name, e.g. "Berlin" from "1(Berlin, sowjetischer Sektor)" mapped to 176 different locations • Access points which are difficult to disambiguate without context (e.g. "Bauer" can be the German word for "peasant", a German family name, or a German town)
GeoNames: More issues • Access points with typos not clustered by OpenRefine (e.g. "Aushwitz" instead of "Auschwitz") • Access points wrongly filtered out as person names (e.g. "Amsterdam, Landsmeer") • Common nouns sometimes give false positives, e.g. "Artillerie" from "1(Artillerie)" mapped to a part of town in New Caledonia • Problem: historical states, such as Yugoslavia or Czechoslovakia, are not properly linked to parents / children in the GeoNames dataset
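One possible reading of the typo issue above, sketched in Python. The slides do not say which OpenRefine clustering method was used; this sketch assumes a key-collision method of the "fingerprint" kind, which only groups strings that normalise to the same key. The functions `fingerprint` and `levenshtein` are illustrative helpers written for this example, not OpenRefine's own code.

```python
import string
import unicodedata


def fingerprint(value: str) -> str:
    """Simplified key in the spirit of fingerprint keying (an assumption,
    not OpenRefine's exact implementation)."""
    # Fold accents to plain ASCII, lowercase, and drop punctuation.
    folded = unicodedata.normalize("NFKD", value.strip().lower())
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    cleaned = ascii_only.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate tokens so that word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Word order and punctuation are absorbed by the key...
same_key = fingerprint("Landsmeer, Amsterdam") == fingerprint("Amsterdam, Landsmeer")
# ...but a missing letter produces a different key, so no cluster is formed.
typo_key_differs = fingerprint("Aushwitz") != fingerprint("Auschwitz")
# An edit-distance comparison would still see the two spellings as close.
typo_distance = levenshtein("Aushwitz", "Auschwitz")
```

Under these assumptions, a nearest-neighbour comparison with a small distance threshold would be one way to surface such typos for manual review, at the cost of more candidate pairs to check.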
EHRI Personalities and VIAF • Experiment with automatic matching to VIAF of persons data from Yad Vashem, CDEC and Cegesoma, with a manual quality check on the matching results • Issues: • Many people carry the same name • Not enough information on birth/death dates, places or profession to distinguish individuals • Spelling variants/mistakes
Outcome of experiment • Of 100 Yad Vashem (YV) names, 68 were matched against entries in VIAF. High ambiguity in matching: a total of 234 matches, i.e. each matched name had 3.44 matches on average • Of the 68 matched names, 31 were correct and 37 were false positives. The ambiguity in cases of a correct match was sometimes higher, e.g. the correct one in a set of 5 or 6 matches • Cegesoma and CDEC data give similar results, with CDEC data showing even more false positives
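The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported for the Yad Vashem sample; the variable names are illustrative, and "precision" here means the share of matched names whose match was judged correct in the manual check.

```python
# Figures reported for the Yad Vashem (YV) sample.
names_checked = 100      # names sent to VIAF matching
names_matched = 68       # names with at least one VIAF match
candidate_matches = 234  # total VIAF matches returned for those names
correct = 31             # matched names judged correct
false_positives = 37     # matched names judged wrong

match_rate = names_matched / names_checked
matches_per_name = candidate_matches / names_matched
precision = correct / names_matched

print(f"match rate:        {match_rate:.0%}")        # 68%
print(f"matches per name:  {matches_per_name:.2f}")  # 3.44
print(f"precision:         {precision:.1%}")         # 45.6%
```

So fewer than half of the matched names were matched correctly, and each matched name still required choosing among more than three VIAF entries on average.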
Import Ghettos into Wikidata • Name of the ghetto in different languages • Unique EHRI identifier for the ghetto • Associated place name and its unique identifier in Wikidata • Coordinates from Yad Vashem and/or USHMM • Unique identifiers from online resources, including The Yad Vashem Encyclopedia of the Ghettos During the Holocaust and the USHMM Holocaust Encyclopedia • Added statement qualifying the entry as a “ghetto in Nazi-occupied Europe”
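The fields listed above could be held in a small record type before import. This is a minimal sketch, not EHRI's actual data model: the class `GhettoRecord`, its field names, and all example values are hypothetical placeholders, and no real Wikidata property or item identifiers are assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GhettoRecord:
    """Hypothetical container for one ghetto entry prepared for import."""
    ehri_id: str                      # unique EHRI identifier
    labels: Dict[str, str]            # language code -> ghetto name
    place_label: str                  # associated place name
    place_wikidata_id: str            # Wikidata identifier of that place
    coordinates: Optional[Tuple[float, float]] = None  # (lat, lon), if known
    external_ids: Dict[str, str] = field(default_factory=dict)  # source -> id
    instance_of: str = "ghetto in Nazi-occupied Europe"

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty, for a pre-import check."""
        missing = []
        if not self.labels:
            missing.append("labels")
        if self.coordinates is None:
            missing.append("coordinates")
        if not self.external_ids:
            missing.append("external_ids")
        return missing


# Placeholder values only; none of these identifiers are real.
record = GhettoRecord(
    ehri_id="EHRI-ID-PLACEHOLDER",
    labels={"en": "Example Ghetto", "de": "Beispiel-Ghetto"},
    place_label="Example Town",
    place_wikidata_id="Q-PLACEHOLDER",
    external_ids={"yad_vashem_encyclopedia": "ID-PLACEHOLDER"},
)
print(record.missing_fields())  # ['coordinates']
```

A check like `missing_fields` is one way to flag entries that lack coordinates or external identifiers before they are pushed to Wikidata.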
Wikidata to EHRI Portal • English name of the ghetto • Place where the ghetto was located • Coordinates for the location • EHRI-assigned unique identifier for the ghetto • Associated unique identifiers from online resources • Multilingual labels generated from the name of the places
EHRI Vocabularies & LOD: An enrichment? Mixed results • GeoNames set has problems, but we will use it for further development • Personalities: too many errors, and a sensitive vocabulary • Ghettos, Camps and Wikidata: a positive experience
CONNECTING KNOWLEDGE • NIOD Institute for War, Holocaust and Genocide Studies (NL) • CEGESOMA Centre for Historical Research and Documentation on War and Contemporary Society (BE) • Jewish Museum in Prague (CZ) • Center for Holocaust Studies at the Institute for Contemporary History in Munich (DE) • YAD VASHEM The Holocaust Martyrs’ and Heroes’ Remembrance Authority (IL) • United States Holocaust Memorial Museum (USA) • Bundesarchiv (DE) • The Wiener Library Institute for the Study of the Holocaust & Genocide (UK) • Holocaust Documentation Centre (SK) • Polish Center for Holocaust Research (PL) • The Jewish Museum of Greece (GR) • Jewish Historical Institute (PL) • King’s College London (UK) • Ontotext AD (BG) • Elie Wiesel National Institute for the Study of Holocaust in Romania (RO) • DANS Data Archiving and Networked Services (NL) • Shoah Memorial, Museum, Center for Contemporary Jewish Documentation (FR) • ITS International Tracing Service (DE) • Hungarian Jewish Archives (HU) • INRIA Institute for Research in Computer Science and Automation (FR) • Vilna Gaon State Jewish Museum (LT) • VWI Vienna Wiesenthal Institute for Holocaust Studies (AT) • Foundation Jewish Contemporary Documentation Center (IT) EHRI is funded by the European Union