Towards a Similarity-Based Identity Assumption Service for Historical Places. Establishing Meaningful Links Krzysztof Janowicz; Muenster Semantic Interoperability Lab (MUSIL). Outline. Motivation Scenario Annotation Theory Further Work.
Towards a Similarity-Based Identity Assumption Service for Historical Places: Establishing Meaningful Links
Krzysztof Janowicz; Muenster Semantic Interoperability Lab (MUSIL)
Outline
• Motivation
• Scenario
• Annotation
• Theory
• Further Work
Image from: http://de.wikipedia.org/wiki/HMS_Victory (Bleiglass, 1998)
Motivation: For the cultural heritage community
• Incomplete and vague knowledge
• Interchange between external sources is necessary to answer complex scientific questions and to clean up local knowledge
• Local versus global identifiers
• Accessible service-based infrastructure!
Motivation: For semantic similarity research
• Application of similarity in a real-world domain
• Similarity as part of the identity assumption puzzle
• Combination of similarity and classical reasoning
• Using a stable upper-level ontology (CIDOC CRM)
• Theory of similarity assumptions for historical places
Motivation: For an identity assumption service
• To run queries against multiple sources, one has to make sure that they refer to the same real-world phenomena; a common language alone is not enough!
• Non-unique place names (even within the same area)
• Place names refer to cities, rivers, valleys, mountains, ...
• Misinterpreted place names (e.g. 'Al Wahat' Oasis)
• Names also refer to varying geopolitical units (e.g. nomads) or prominent (artificial) landmarks (e.g. telegraph stations)
• Outdated place or even country names (e.g. USSR)
Gazetteers can only partially solve these problems.
(From discussions with Dr. Karl-Heinz Lampe; ZFMK)
Battle of Trafalgar - Scenario
• Took place at Cape Trafalgar (Province Cádiz) in 1805
• British victory under the command of Horatio Nelson
• HMS Victory was Nelson's flagship
• Nelson was shot during the battle and died afterwards
Should be easy to annotate!?
• Place names: Cabo Trafalgar, Taraf al-Gharb, رأس الطرف الأغر
• HMS Victory: Which one?!
• Vice-Admiral Horatio Nelson, 1st Viscount Nelson? Also in a historical source from a French perspective?
• Spatial relation between the naval battleground and the terrestrial cape, Province Cádiz, ...?
• Temporal relations?
Image from: http://en.wikipedia.org/wiki/Horatio_Nelson (painted by Nicholas Pocock)
From: http://en.wikipedia.org/wiki/Image:Trafalgar_aufstellung.jpg
Annotation of Historical Knowledge
• CIDOC conceptual reference model (CRM) as upper-level ontology for the cultural heritage domain
• Specifies an abstract and interrelated vocabulary instead of concrete definitions (e.g. for kinds of exhibits): heterogeneous domain!
• Describes historical knowledge by relations between places, events, actors and objects
• RDF(S)-based representation
• ISO standard (ISO/PRF 21127)
Annotation Examples (RDF Triples)
• P89F.falls_within(E53.Place(Cape Trafalgar), E53.Place(Province Cádiz))
Subject-Predicate-Object: The place Cape Trafalgar falls within a place called Province Cádiz
• P8F.took_place_at(E7.Activity(Battle of Trafalgar), E53.Place(Cape Trafalgar))
• P117F.occurs_during(E7.Activity(Battle of Trafalgar), E5.Event(Trafalgar Campaign))
• P14F.carried_out_by(E7.Activity(Battle of Trafalgar), E21.Person(Nelson))
• P2F.has_type(E53.Place(Andalusia), E55.Type(regions))
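The annotations on this slide can be held in memory as plain subject-predicate-object records. The following is a minimal sketch, assuming that simplified strings stand in for real CIDOC CRM resources; the `Triple` type and the `describe` helper are hypothetical names introduced only for illustration and are not part of the CRM or of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (illustrative, not an RDF API)."""
    subject: str
    predicate: str
    obj: str


# The five annotations from the slide, rewritten as subject-predicate-object.
annotations = [
    Triple("E53.Place(Cape Trafalgar)", "P89F.falls_within", "E53.Place(Province Cádiz)"),
    Triple("E7.Activity(Battle of Trafalgar)", "P8F.took_place_at", "E53.Place(Cape Trafalgar)"),
    Triple("E7.Activity(Battle of Trafalgar)", "P117F.occurs_during", "E5.Event(Trafalgar Campaign)"),
    Triple("E7.Activity(Battle of Trafalgar)", "P14F.carried_out_by", "E21.Person(Nelson)"),
    Triple("E53.Place(Andalusia)", "P2F.has_type", "E55.Type(regions)"),
]


def describe(entity, triples):
    """Return every triple in which the entity appears as subject or object."""
    return [t for t in triples if entity in (t.subject, t.obj)]


# The description of Cape Trafalgar is the set of links that touch it.
cape = describe("E53.Place(Cape Trafalgar)", annotations)
for t in cape:
    print(t.subject, t.predicate, t.obj)
```

Collecting all triples that touch an entity gives the "description" that the later comparison steps would work on.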
Theory
• In practice, semi-automatic disambiguation via gazetteers and other global authorities (such as for historical figures) is often difficult, expensive and error-prone (especially for subordinate geopolitical units, events, actors, ...)
• Use the links established via the CIDOC CRM annotation between places, actors, objects and events as additional reference points!
Theory
Use thematic information as support for spatiotemporal reference
Geoinformation = < x, z >
[Figure labels: interpretation; Spatiotemporal Reference Systems; Semantic Reference Systems; CIDOC CRM + Reasoning + Similarity; Mike Goodchild: Geographic Reality]
Theory: Framework
Comparing Place Descriptions
• Spatiotemporal & Subsumption Reasoning: extract new triples out of existing ones
• Semantic Similarity Measurement: compute overlap between source and target triples
• Syntactic Identifier Matching: compare remaining labels & identifiers
• Identity Assumption: how probable is it that the compared places correspond?
Theory: Reasoning
• Entities are described by sets of RDF triples
• Inference rules to generate new triples
• Make local knowledge explicit!
• More comparable information about entities
• Example: spatial & temporal inference rules
• Be careful - names are ambiguous!
[Figure labels: HMS XYZ (1805) ? HMS XYZ (1804)]
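A spatial inference rule of the kind mentioned on this slide could be sketched as follows. This is only an illustration under stated assumptions: the two rules (containment is transitive; an event located at a place is also located at every containing place) are chosen for the example and are not taken from the talk, predicate names are shortened, and the fact that Province Cádiz falls within Andalusia is added here to give the rules something to work on.

```python
def infer(triples):
    """Apply two illustrative spatial rules until no new triple appears."""
    known = set(triples)
    while True:
        new = set()
        for (s1, p1, o1) in known:
            for (s2, p2, o2) in known:
                # Both rules chain a statement about o1 with "o1 falls_within o2".
                if o1 != s2 or p2 != "falls_within":
                    continue
                # Rule 1: falls_within is transitive.
                # Rule 2: took_place_at propagates to the containing place.
                if p1 in ("falls_within", "took_place_at"):
                    new.add((s1, p1, o2))
        if new <= known:  # fixed point reached, nothing new was derived
            return known
        known |= new


facts = [
    ("Cape Trafalgar", "falls_within", "Province Cádiz"),
    ("Province Cádiz", "falls_within", "Andalusia"),  # added for illustration
    ("Battle of Trafalgar", "took_place_at", "Cape Trafalgar"),
]

closure = infer(facts)
for triple in sorted(closure - set(facts)):
    print("inferred:", triple)
```

Making such implicit knowledge explicit gives the later similarity step more triples to compare. The warning on the slide still applies: the rules match on names, so they are only safe once ambiguous names have been resolved to distinct identifiers.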
Theory: Similarity
[Figure labels, source description: Cape Trafalgar, falls within, Province Cádiz, Nelson, performed, Napoleonic Wars]
[Figure labels, target description: Cape Trafalgar, falls within, overlaps with, Province Cádiz, Nelson, died in]
[Figure labels, similarity values: sims, simp]
Theory: Network Approach to Similarity
• For all tuples from the source entity: find equal or similar tuples within the target entity description
• Define meaningful notions of similarity for given predicates (relations): spatial, temporal, thematic
• Define a meaningful notion of similarity for all objects that are not subjects of other triples themselves (e.g. ADL Feature Types)
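The tuple-matching idea on this slide can be sketched as a best-match search over the target description. Everything numeric here is an assumption made for illustration: the predicate similarity table, the value 0.5 for "falls_within" versus "overlaps_with", the exact-match object similarity, the use of a product to combine both, and the averaging over source tuples. The predicate label "witnessed" is likewise just an example label.

```python
# Hypothetical similarities between distinct predicates (placeholder values).
PREDICATE_SIM = {
    frozenset(["falls_within", "overlaps_with"]): 0.5,
}


def sim_predicate(p, q):
    """1.0 for equal predicates, a table lookup otherwise, 0.0 if unknown."""
    if p == q:
        return 1.0
    return PREDICATE_SIM.get(frozenset([p, q]), 0.0)


def sim_object(a, b):
    """Placeholder object similarity: exact match only."""
    return 1.0 if a == b else 0.0


def description_similarity(source, target):
    """Average, over source tuples, of the best-matching target tuple."""
    if not source:
        return 0.0
    total = 0.0
    for (p, o) in source:
        best = max(
            (sim_predicate(p, q) * sim_object(o, r) for (q, r) in target),
            default=0.0,
        )
        total += best
    return total / len(source)


# (predicate, object) tuples describing one place in two different sources.
source = [("falls_within", "Province Cádiz"), ("witnessed", "Battle of Trafalgar")]
target = [("overlaps_with", "Province Cádiz"), ("witnessed", "Battle of Trafalgar")]

score = description_similarity(source, target)
print(score)
```

In a fuller version, `sim_object` would recurse into the description of the object whenever that object is itself the subject of further triples, and fall back to a feature-type or label comparison otherwise.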
Theory: Neighborhoods & Hierarchies
[Figure labels: spatial, temporal, thematic; Egenhofer & Al-Taha 1992]
Different similarity measures for neighborhoods & hierarchies
Theory: Syntactic Matching
• After recursively applying (semantic) similarity measurements, only labels, vague appellations and identifiers are left
Requires syntactic matching / measuring
[Figure labels: (Getty Thesaurus) ID: 7008751, ID: 7008750; Cape Trafalgar, Wrexham; image found at: www.gwjokes.com]
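One possible way to measure the syntactic closeness of the remaining labels is a normalized string comparison. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in measure; the talk does not say which measure is intended, so this choice, the normalization steps and the function names are assumptions. Identifiers are compared for exact equality only, since two IDs that differ in a single digit can denote unrelated places.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(label):
    """Strip diacritics, fold case and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def label_similarity(a, b):
    """Similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def identifier_match(a, b):
    """Identifiers either match exactly or not at all."""
    return a.strip() == b.strip()


print(label_similarity("Cape Trafalgar", "Cabo Trafalgar"))
print(label_similarity("Cape Trafalgar", "Wrexham"))
print(identifier_match("7008751", "7008750"))
```

A character-based measure like this cannot relate transliterations or translations (for example an Arabic and a Spanish name of the same cape), which is one reason the syntactic step comes last, after the semantic comparison.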
Theory: Identity Assumptions
• Two place descriptions probably refer to the same (real-world) place if they are linked via equal or similar relations to equal or similar events, actors, objects, ...
• Similar position within a network of historical facts
• Stepwise applying new restrictions to the set of compared historical places
Number of compared tuples is a critical issue!
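The stepwise restriction of the candidate set could look roughly like the following sketch. It is a simplification under stated assumptions: candidates and their facts are invented placeholders (`place_A`, `place_B`, `place_C`), facts are matched exactly rather than by similarity, and a restriction that would eliminate every candidate is skipped instead of applied.

```python
def narrow_candidates(source_facts, candidates):
    """Apply one source fact at a time and keep only candidates that share it."""
    remaining = dict(candidates)
    for fact in source_facts:
        matching = {name: facts for name, facts in remaining.items() if fact in facts}
        if matching:
            # Only apply restrictions that leave at least one candidate.
            remaining = matching
        if len(remaining) == 1:
            break  # a single candidate is left, stop comparing tuples
    return sorted(remaining)


# What the source says about the place to be identified.
source_facts = [
    ("falls_within", "Province Cádiz"),
    ("witnessed", "Battle of Trafalgar"),
]

# Hypothetical target descriptions, keyed by placeholder names.
candidates = {
    "place_A": {("falls_within", "Province Cádiz"), ("witnessed", "Battle of Trafalgar")},
    "place_B": {("falls_within", "Province Cádiz")},
    "place_C": {("falls_within", "London")},
}

result = narrow_candidates(source_facts, candidates)
print(result)
```

Stopping as soon as one candidate remains keeps the number of compared tuples low, which the slide flags as a critical issue. Replacing the exact `fact in facts` test with a thresholded similarity would bring the sketch closer to the "equal or similar" wording above.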
Further Work & Evidence
• Similarity is only one part of the puzzle!
• Other parts: trust, contradictions & consistency, ...
• Which inference rules may lead to difficulties?
• How to handle complementary knowledge?
• Connections to Time Map and ECAI
• Evidence! Battle of Trafalgar Scenario?
• Develop an identity assumption pilot
• Combination of similarity measurement with itineraries
• Based on real-world data from ZFMK, Bonn (biodiversity museum)
Questions
• Thank You!
• Special thanks to
• Martin Doerr, Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science, Heraklion, Crete, Greece
• Karl-Heinz Lampe, Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, Germany
• Any Questions?
'Real World' Place?
From: http://de.wikipedia.org/wiki/Bild:Atlantis_map_kircher.gif
Gazetteer Feature Types
[Figure labels: Andalucía; ADLG; Getty Thesaurus]