Evaluating semantic similarity using GML in Geographic Information Systems. Fernando Ferri 1, Anna Formica 2, Patrizia Grifoni 1, and Maurizio Rafanelli 2. 1 IRPPS-CNR, via Nizza 128, 00198 Roma, Italy, fernando.ferri@irpps.cnr.it, patrizia.grifoni@irpps.cnr.it. 2 IASI-CNR, viale Manzoni 30, 00185 Roma, Italy, formica@iasi.cnr.it, rafanelli@iasi.cnr.
Summary • Motivation • Related works • Coding a Part-of Hierarchy using GML • Similarity evaluation • Conclusion
Motivation (1) • In Geographic Information Systems (GISs), semantic similarity plays an important role, as it supports the identification of objects that are conceptually close but not identical. • GML (Geography Markup Language) is emerging as the dominant standard for exchanging geographic data across the Internet. • A semantic similarity model facilitates the comparison of entities and allows information retrieval and integration to handle semantically similar concepts. The goal of a similarity model is to obtain more flexible and better matches between user-expected and system-retrieved information.
Motivation (2) • Given the relevance of the Is-in relationship in the geographic context, we focus on GML elements organized according to Part-of (meronymic) hierarchies. • The semantics essentially concerns parts which are similar to and inseparable from the whole.
Related works (1) • Similarity of hierarchically related concepts has been widely investigated in the literature [Resnik] [Rodriguez, Egenhofer]. • From the various proposals, we followed the probabilistic approach of Lin, which is based on the notion of information content and overcomes the drawbacks of the traditional edge-counting approach.
Related works (2) • Resnik proposes algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguities. • Lin builds on Resnik's work and also takes into account the information content of the concepts being compared.
Coding a Part-of Hierarchy with GML (1) • The real world in the geographic domain can be represented as a set of features, and AbstractFeatureType codifies a geographic feature in GML. • The geometry type of a feature is an important property: it is given in the reference coordinate system and describes the extent, position or relative location of the represented concept.
Coding a Part-of Hierarchy with GML (2) • The geometric types defined in GML provide the framework for modelling all the geographical concepts. • By means of this framework it is possible to model, for example, the concepts composing a communication ways network, such as roads, rivers, canals and other communication infrastructures.
Coding a Part-of Hierarchy with GML (3) • This figure shows an example of a type hierarchy that introduces concepts concerning communication infrastructures starting from the GML geometric types. [Figure: type hierarchy with nodes AbstractFeatureType, ..., MultiLineStringType, MultiPolygonType, ComWayType, RoadType, RiverType, CanalType, NavSegmentType, NNavSegmentType]
Coding a Part-of Hierarchy with GML (4) • As mentioned in the motivation, due to the relevance of the Is-in relationship in the geographic context, the paper focuses on GML elements organized according to Part-of (meronymic) hierarchies. • For instance, in our example a Part-of relationship exists between communication ways (ComWay) and roads, rivers and canals.
Coding a Part-of Hierarchy with GML (5) • In the literature, Part-of hierarchies are usually modelled in XML using “sequences of elements”, and a similar approach could be followed in GML. • However, this approach does not make it possible to distinguish between elements of the Part-of hierarchy and other elements that may be defined outside it, such as Kind and Country. [Figure: element hierarchy with nodes ComWay, Canal, Road, River, NavRiver, NNavRiver, NavCanal, NNavCanal, Kind, Country]
Coding a Part-of Hierarchy with GML (6) • To highlight meronymic relationships within the GML element hierarchy, a Part-of hierarchy can be modelled by introducing special geographic types such as PartOfWayType, PartOfRivType and PartOfCanType. • Each special type models a Part-of relationship between a geographic concept and its component concepts. [Figure: element hierarchy with nodes ComWay, Country, PartOfWay, Kind, Road, River, Canal, PartOfRiv, PartOfCan, NavRiver, NNavRiver, NavCanal, NNavCanal]
Coding a Part-of Hierarchy with GML (7) This GML code shows how to highlight a meronymic relationship within the GML element hierarchy by introducing a special geographic type such as PartOfWayType:
<element name="ComWay" type="ComWayType"/>
<element name="Road" type="RoadType"/>
<element name="River" type="RiverType"/>
<element name="Canal" type="CanalType"/>
<element name="NavRiver" type="NavSegmentType"/>
<element name="NNavRiver" type="NNavSegmentType"/>
<element name="NavCanal" type="NavSegmentType"/>
<element name="NNavCanal" type="NNavSegmentType"/>
<complexType name="ComWayType">
  <sequence>
    <element name="kind" type="string"/>
    <element name="country" type="string"/>
    <element name="PartOfWay" type="PartOfWayType"/>
  </sequence>
  <attribute name="label" type="string"/>
  <attribute name="length" type="integer"/>
</complexType>
<complexType name="PartOfWayType">
  <sequence>
    <element name="Road" type="RoadType"/>
    <element name="River" type="RiverType"/>
    <element name="Canal" type="CanalType"/>
  </sequence>
</complexType>
<complexType name="RoadType">
  <attribute name="label" type="string"/>
  <attribute name="length" type="integer"/>
  <attribute name="maxspeed" type="integer"/>
</complexType>
…………………………..
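The benefit of the PartOf wrapper types is that the meronymic structure can be read off the schema mechanically. The following is a minimal Python sketch of that idea, not the authors' tooling: it assumes a simplified, namespace-free schema fragment wrapped in a hypothetical <schema> root, and the helper name part_of_children and the "PartOf" prefix test are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Simplified fragment of the schema from the slide. The <schema> root and the
# absence of XML namespaces are assumptions made to keep the sketch short.
SCHEMA = """
<schema>
  <complexType name="ComWayType">
    <sequence>
      <element name="kind" type="string"/>
      <element name="country" type="string"/>
      <element name="PartOfWay" type="PartOfWayType"/>
    </sequence>
  </complexType>
  <complexType name="PartOfWayType">
    <sequence>
      <element name="Road" type="RoadType"/>
      <element name="River" type="RiverType"/>
      <element name="Canal" type="CanalType"/>
    </sequence>
  </complexType>
</schema>
"""


def part_of_children(schema_xml, prefix="PartOf"):
    """Map each complexType to the parts listed in its PartOf* wrapper type."""
    root = ET.fromstring(schema_xml)
    types = {ct.get("name"): ct for ct in root.iter("complexType")}
    parts = {}
    for name, ctype in types.items():
        if name.startswith(prefix):
            continue  # wrapper types are not concepts themselves
        for el in ctype.iter("element"):
            type_name = el.get("type") or ""
            wrapper = types.get(type_name)
            if wrapper is not None and type_name.startswith(prefix):
                parts[name] = [c.get("name") for c in wrapper.iter("element")]
    return parts


print(part_of_children(SCHEMA))
```

Elements such as kind and country are skipped because their types are not PartOf wrappers, which is exactly the distinction the plain "sequence of elements" encoding cannot express.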
Evaluating similarity (1) For evaluating concept similarity this paper combines and revisits: • the information content approach [Lin98], • a proposal inspired by the maximum weighted matching problem in bipartite graphs [FM02].
Evaluating similarity (2) • The starting assumption is that associating probabilities with the Part-of taxonomy allows us to introduce the notion of a weighted element hierarchy. In particular, in our example the probabilities have been estimated in line with WordNet 2.0. • For instance, the concepts Road and River are defined below, with the related frequencies (the numbers in parentheses). (95) Road – an open way (generally public) for travel and transportation (55) River – a large natural stream of water (larger than a creek)
Evaluating similarity (3) The probability of a concept • The probability of a concept c is defined as: p(c) = freq(c)/N where freq(c) is the frequency of the concept c in the taxonomy, and N is the total number of concepts. • In the example, probabilities have been assigned according to WordNet.
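The definition p(c) = freq(c)/N can be sketched in a few lines of Python. Only the frequencies of Road (95) and River (55) come from the slides; the value of N used here is a made-up placeholder, since the deck does not state it, and the function name is illustrative.

```python
def concept_probabilities(freq, n):
    """Return p(c) = freq(c) / N for every concept c in `freq`."""
    if n <= 0:
        raise ValueError("N must be positive")
    return {concept: count / n for concept, count in freq.items()}


# Frequencies quoted on the slide.
freq = {"Road": 95, "River": 55}

# Hypothetical total: the slides do not give N, so 1000 is only a placeholder.
N = 1000

probs = concept_probabilities(freq, N)
for concept, p in probs.items():
    print(f"p({concept}) = {p}")
```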
Evaluating similarity (4) Example: Weighted Concept Hierarchy
Evaluating similarity (5) Following the standard approach of information theory [Ross76], the information content of a concept c can be quantified as: -log p(c); that is, as the probability increases, the informativeness decreases.
Evaluating similarity (6) The information content similarity (ics) of two concepts such as River and Canal is defined as: ics(River, Canal) = 2 log p(ComWay)/(log p(River) + log p(Canal)) = 0.72 where ComWay is the concept representing the maximum information content shared by River and Canal. According to Lin's approach, the more information two concepts share, the more similar they are.
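The two formulas above, -log p(c) and the ics ratio, can be sketched in Python as follows. The probabilities in the usage lines are hypothetical placeholders: the deck's weighted hierarchy figure is not reproduced in this text, so this sketch does not attempt to recompute the 0.72 reported for River and Canal.

```python
import math


def information_content(p):
    """Information content of a concept with probability p: -log p."""
    return -math.log(p)


def ics(p_shared, p_a, p_b):
    """Lin-style similarity: 2 log p(shared) / (log p(a) + log p(b)).

    `p_shared` is the probability of the concept carrying the maximum
    information content shared by the two compared concepts.
    """
    return 2 * math.log(p_shared) / (math.log(p_a) + math.log(p_b))


# Hypothetical probabilities, chosen only to exercise the formula.
p_comway, p_river, p_canal = 0.10, 0.05, 0.03

print(information_content(p_river))
print(ics(p_comway, p_river, p_canal))
```

Because the shared concept is at least as probable as either of its parts, the ratio stays between 0 and 1, and a concept compared with itself scores 1.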
Evaluating similarity (7) Structural similarity (asim) Inspired by the maximum weighted matching problem in bipartite graphs, we have to identify the set of pairs of typed attributes such that the sum of the products of the information content similarity of the attributes and that of the related types is maximal.
Evaluating similarity (8) Example
RiverType: label:string, length:integer, flow:integer, deepness:integer
CanalType: label:string, profundity:integer, capacity:integer, length:integer
Evaluating similarity (9) In the previous example the set of pairs of attributes that maximizes the sum of the related information content similarity is the following: {(label,label), (length,length), (flow,capacity), (deepness,profundity)}
Evaluating similarity (10) In fact, by assuming that deepness and profundity are synonyms, we have: ics(label,label)=ics(length,length)= ics(deepness,profundity)= 1 and ics(flow,capacity)= 0.
Evaluating similarity (11) The similarity of the sets of attributes of complexTypes (asim) is therefore defined by the above maximum sum divided by the greatest of the cardinalities of the sets of attributes of the types compared. In the case of RiverType and CanalType we have: asim(RiverType,CanalType) = ¾ = 0.75
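The asim computation for RiverType and CanalType can be sketched in Python. This is a brute-force illustration that tries every pairing rather than a proper maximum weighted matching algorithm, which is adequate only for small attribute sets. The similarity functions are assumptions: attribute names score 1 when identical or listed as synonyms and 0 otherwise (as in the example above), and types score 1 only when identical, since the slides give no type similarity values.

```python
from itertools import permutations

# Typed attributes from the example slide.
RIVER = {"label": "string", "length": "integer",
         "flow": "integer", "deepness": "integer"}
CANAL = {"label": "string", "profundity": "integer",
         "capacity": "integer", "length": "integer"}

# Synonym pairs assumed in the slides.
SYNONYMS = {frozenset({"deepness", "profundity"})}


def name_sim(a, b):
    """1.0 for identical or synonymous attribute names, else 0.0."""
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


def type_sim(a, b):
    """Assumption: types are similar only when identical."""
    return 1.0 if a == b else 0.0


def asim(attrs_a, attrs_b):
    """Best matching score divided by the larger attribute-set size."""
    small, large = list(attrs_a.items()), list(attrs_b.items())
    if not small or not large:
        return 0.0
    if len(small) > len(large):
        small, large = large, small  # both similarity functions are symmetric
    best = 0.0
    # Try every one-to-one assignment of the smaller set into the larger one.
    for candidate in permutations(large, len(small)):
        total = sum(name_sim(na, nb) * type_sim(ta, tb)
                    for (na, ta), (nb, tb) in zip(small, candidate))
        best = max(best, total)
    return best / len(large)


print(asim(RIVER, CANAL))  # 0.75
```

The best pairing matches label, length and deepness/profundity with score 1 each and flow/capacity with score 0, giving 3/4.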
Evaluating similarity (12) Concept Similarity (Gsim) The similarity (Gsim) of the concepts River and Canal is defined as: Gsim(River, Canal) = (ics(River, Canal) * w + asim(River, Canal) * (1 - w)) * Bt(RiverType, CanalType) where: • ics(River, Canal) is the information content similarity • asim(River, Canal) is the structural similarity • w is a weight, s.t. 0 <= w <= 1 • Bt is a Boolean function that, given two complexTypes, returns 0 if their least upper bound in the type hierarchy is AbstractFeatureType, otherwise it returns 1.
Evaluating similarity (13) • In particular, if we assume w = 0.5: Gsim(River, Canal) = (ics(River, Canal) * w + asim(River, Canal) * (1 - w)) * Bt(RiverType, CanalType) Gsim(River, Canal) = 0.5 * (0.72 + 0.75) * 1 = 0.735 ≈ 0.74
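The final combination can be sketched in Python, including the Boolean function Bt. The parent map below is a hypothetical simplification of the type hierarchy: the figure elides the levels between AbstractFeatureType and the geometry types, and the edges shown here are assumed rather than taken from the slides. The ics and asim values are the ones reported in the deck.

```python
# Hypothetical child -> parent edges; the real hierarchy has elided levels.
PARENT = {
    "MultiLineStringType": "AbstractFeatureType",
    "MultiPolygonType": "AbstractFeatureType",
    "ComWayType": "MultiLineStringType",
    "RoadType": "ComWayType",
    "RiverType": "ComWayType",
    "CanalType": "ComWayType",
}


def least_upper_bound(parent, t1, t2):
    """Return the closest common ancestor of t1 and t2, or None."""
    ancestors = []
    node = t1
    while node is not None:
        ancestors.append(node)
        node = parent.get(node)
    node = t2
    while node is not None:
        if node in ancestors:
            return node
        node = parent.get(node)
    return None


def bt(parent, t1, t2, root="AbstractFeatureType"):
    """0 if the types only meet at the root, otherwise 1.

    Treating types with no common ancestor as 0 is an assumption.
    """
    return 0 if least_upper_bound(parent, t1, t2) in (root, None) else 1


def gsim(ics_value, asim_value, w, bt_value):
    """Gsim = (ics * w + asim * (1 - w)) * Bt, with 0 <= w <= 1."""
    if not 0 <= w <= 1:
        raise ValueError("w must lie in [0, 1]")
    return (ics_value * w + asim_value * (1 - w)) * bt_value


b = bt(PARENT, "RiverType", "CanalType")
print(gsim(0.72, 0.75, 0.5, b))
```

With w = 0.5 this gives approximately 0.735, which the slide rounds to 0.74. Under the assumed hierarchy, comparing RiverType with MultiPolygonType would yield Bt = 0 and therefore a Gsim of 0.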
Conclusion Thank you