Multi-layered approach to aligning heterogeneous ontologies. Ph.D. dissertation defense. William Sunna, The University of Illinois at Chicago. Advisor: Professor Isabel F. Cruz
Problem • Humans model the same aspects of reality in different ways, according to their knowledge, life experiences, culture, and many other factors. • As a result, data heterogeneity arises in various domains of knowledge, especially in distributed systems. • In the Semantic Web area of computer science, ontologies are used to describe various aspects of reality in a given domain of knowledge.
Ontologies • In our case studies, we consider databases represented by ontologies. • What is an ontology? • An ontology is an explicit specification of a conceptualization. In computer science, an ontology is a description of the concepts being specified and the relationships that can exist between them [1].
Ontology alignment • Ontology alignment is the process of finding the correspondences between semantically related entities (concepts) that belong to heterogeneous ontologies in a given domain of knowledge [2]. • Ontologies can be heterogeneous in their structure, attributes, or even the values of their attributes. • Correspondences are established based on one or several “matching” criteria. • When a relationship between the corresponding concepts is defined, then we consider the concepts as being “mapped”. • Ontology alignment is useful for many tasks such as ontology merging, query processing, data translation, or navigation on the semantic web [2].
Ontology alignment techniques • According to Shvaiko and Euzenat [5], ontology alignment techniques can be classified into: • Element level: • String based. • Language based. • Alignment reuse based. • Structure level: • Graph based. • Taxonomy based. • Model based.
Centralized integrated system • In a centralized integrated system, a global ontology is introduced. • Each distributed database that wishes to participate in the integrated system must have its ontology aligned with the global ontology. • We refer to the ontologies of the distributed databases as distributed or local ontologies. • The global ontology can be designed: • To encompass as much as possible of the information contained in the distributed ontologies. • To become a standard for that domain. [Figure: Centralized integrated system]
Peer-to-peer integrated system • In this architecture, the ontology of every peer system is aligned with other peers in the network. • A query posed on one of the databases (the target database) is propagated to the others in the network. [Figure: Five databases participating in a peer-to-peer integrated system]
Geospatial ontologies • For a centralized architecture, we consider an example of land use codes for the state of Wisconsin: • The distributed ontologies were developed separately and are quite dissimilar. • For a peer-to-peer architecture, we concentrate on wetland classifications: • In particular, we concentrate on an established standard, the "Cowardin" Wetland Classification [3]. • We use this standard together with its variant, the South African Wetland Classification Inventory [4], and establish their alignment.
Case study of data heterogeneity in the geospatial domain • We take the state of Wisconsin land use system (WLIS) as an example of database heterogeneity. Each county (local authority) in the state of Wisconsin is divided into parcels, and each parcel is given a code that describes its land use. • There are 72 counties and hundreds of cities and towns in the state; each may have its own system for classifying land use codes.
Case study of data heterogeneity in the geospatial domain [Figure: Heterogeneity in the values of the land use codes between the City of Madison and Fitchburg township.]
Case study of data heterogeneity in the geospatial domain [Figure: Heterogeneity in both the attributes and the values of the land use codes of Dane County, City of Madison, and Eau Claire County.]
Real world problem • Consider this query: “Find all the land parcels used for rail transportation purposes in the state of Wisconsin.” • In order to answer this query, we need to propagate it to all the local land use databases of the various counties and municipalities in the state. • Since the databases are heterogeneous, it is impossible to use the same query for all of them. Therefore, the query has to be “rewritten” for each database.
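To make the idea of query rewriting concrete, here is a minimal sketch of how one query over a global concept could be turned into one query per local database. Everything specific in it is hypothetical and used only for illustration: the database names, table and column names, land use codes, and the `rewrite_query` helper do not come from the slides or from WLIS.

```python
# Hypothetical alignment results: for each local database, the global concept
# maps to a (table, column, local codes) triple. All values are made up.
AGREEMENTS = {
    "county_a": {"RailTransportation": ("parcels", "lu_code", ["4110", "4120"])},
    "city_b": {"RailTransportation": ("land_parcels", "use_class", ["RR"])},
    "county_c": {},  # no concept aligned with RailTransportation
}

def rewrite_query(global_concept, agreements):
    """Return one SQL string per local database that has a mapping for the concept."""
    queries = {}
    for db, mapping in agreements.items():
        local = mapping.get(global_concept)
        if local is None:
            continue  # nothing to ask this database
        table, column, codes = local
        values = ", ".join(f"'{code}'" for code in codes)
        queries[db] = f"SELECT * FROM {table} WHERE {column} IN ({values})"
    return queries

for db, sql in rewrite_query("RailTransportation", AGREEMENTS).items():
    print(db, "->", sql)
```

The point of the sketch is only that both the attribute name and the code values differ per database, so the same query text cannot be reused.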
Heterogeneity in Wetland Classifications • Organizations monitoring the wetlands data inventory have an interest in sharing data. • The lack of a standard classification has long been identified as an obstacle to the development, implementation, and monitoring of wetland conservation strategies [4]. • In defining wetlands, the United States adopts the "Cowardin" Wetland Classification System [3]. • In contrast, European nations use the International Ramsar Convention Definition (http://www.ramsar.org), and South Africa uses the National Wetland Classification Inventory [4].
Heterogeneity in Wetland Classifications [Figures: "Cowardin" Wetland Classification System; South African National Wetland Inventory]
Multi-Layered Approach • We present a multi-layered approach to ontology alignment. In our approach, the concepts in the ontologies can be related to each other using several matching criteria. • We introduce four layers of mappings; each layer is based on a single matching criterion. Three of these layers are automatic. • We also created a software tool that implements our multi-layered approach, the AgreementMaker, which helps domain experts align two heterogeneous ontologies.
Alignment layers 1. The automatic definition layer: The domain expert invokes an automatic procedure that compares each concept in the source ontology to each concept in the target ontology according to their definitions as provided by a dictionary. In this layer we employ three mapping techniques: one is element-based and two are structure-based. 2. The manual mapping layer: The user matches concepts with each other manually, according to the user’s knowledge of the domains represented by the ontologies.
Alignment layers 3. Context (semi-automatic): A bottom-up method that uses deduction rules. It may require manual intervention when the process stops [7]. 4. Consolidation (semi-automatic): Domain experts rank the layers. An automatic procedure then uses the mappings from the previous layers according to that ranking.
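The consolidation step can be sketched as a priority merge. The slides do not say how conflicts between layers are resolved, so this sketch assumes the simplest rule: for each source concept, keep the mapping proposed by the highest-ranked layer that offers one. The layer names, concept names, and the `consolidate` helper are hypothetical.

```python
def consolidate(layer_mappings, ranking):
    """Merge per-layer mappings, giving priority to layers earlier in `ranking`.

    layer_mappings: {layer_name: {source_concept: target_concept}}
    ranking: layer names ordered by the domain expert, highest priority first
    Returns {source_concept: (target_concept, layer_that_supplied_it)}.
    """
    consolidated = {}
    for layer in ranking:
        for source, target in layer_mappings.get(layer, {}).items():
            # setdefault keeps the mapping from the higher-ranked layer (assumed rule)
            consolidated.setdefault(source, (target, layer))
    return consolidated

# Hypothetical mappings produced by three layers.
layers = {
    "definition": {"Rail": "Railway", "Road": "Street"},
    "manual": {"Road": "Highway"},
    "context": {"Airport": "AirTransport"},
}
print(consolidate(layers, ["manual", "definition", "context"]))
```

With the ranking shown, the manual mapping for `Road` wins over the one from the definition layer, while the other concepts keep the only mapping they have.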
The AgreementMaker • The AgreementMaker is a visual ontology alignment tool that implements our multi-layered approach.
System Architecture
Application of our approach: Integration of heterogeneous database systems [Figure: architecture diagram with the components End user, Query Processor, Global Ontology, Agreement document I, Agreement document II, two Domain experts, Local Ontology I, and Local Ontology II]
Structure-based Methods • Part of the definition layer (fully automatic). • Descendant’s Similarity Inheritance (DSI). • Sibling’s Similarity Contribution (SSC). • Both DSI and SSC methods utilize the results generated by the Base Similarity method (element-based).
Base Similarity Method • Let C be a concept in S (source ontology) and C′ be a concept in T (target ontology). • The labels (and in some cases the definitions) of C and C′ are compared using a dictionary. • We use a function base_sim(C, C′) that returns a similarity measure M, such that 0 ≤ M ≤ 1. • Parameter TH is a threshold value such that C′ is matched with C when base_sim(C, C′) ≥ TH.
Base Similarity Method • The Base Similarity method compares: • Concept labels. • Associated descriptions, if available. • Definitions of concepts from the dictionary, if this feature is requested by the user. • All stop words are removed. • All remaining words are stemmed. • All hyphens, punctuation marks, etc. are removed. • Attached words are separated, for example: • air-to-air-missile → air to air missile. • ServerSoftware → Server Software.
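The preprocessing steps listed above can be sketched as follows. This is an illustration only: the stop-word list and the `toy_stem` suffix stripper are hypothetical stand-ins, since the slides do not name the stop-word list or the stemmer that was used.

```python
import re

# Hypothetical stop-word list (assumption, not from the slides).
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "for"}

def split_attached(label):
    """Split hyphenated and camelCase labels into lowercase words."""
    # Insert a space at each lower-to-upper boundary: ServerSoftware -> Server Software
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    # Turn hyphens, underscores, and punctuation into spaces
    spaced = re.sub(r"[^A-Za-z0-9]+", " ", spaced)
    return spaced.lower().split()

def toy_stem(word):
    """Very rough suffix stripper, standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(label):
    """Separate attached words, drop stop words, then stem what remains."""
    words = [w for w in split_attached(label) if w not in STOP_WORDS]
    return [toy_stem(w) for w in words]

print(split_attached("air-to-air-missile"))
print(preprocess("ServerSoftware"))
```

A production version would replace `toy_stem` with an established stemming algorithm; the order of the steps here follows the order on the slide.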
Base Similarity Method • The similarity between the processed strings representing the source concept S and the target concept T is determined as follows: • Let len(S) be the number of words in S. • Let len(T) be the number of words in T. • Let common(S, T) be the number of common words between S and T. • The similarity between S and T is calculated as follows: [Formula image not preserved.]
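Because the formula itself did not survive in the slide text, the sketch below uses one plausible combination of the three quantities defined above: a Dice-style ratio, 2 × common(S, T) / (len(S) + len(T)), which stays between 0 and 1 as required of base_sim. That choice, the treatment of each string as a set of distinct words, and the threshold value 0.6 are all assumptions, not the author's stated definition.

```python
def base_sim(source_words, target_words):
    """Assumed Dice-style similarity over distinct words; returns a value in [0, 1]."""
    s, t = set(source_words), set(target_words)
    if not s or not t:
        return 0.0
    common = len(s & t)  # number of words shared by both strings
    return 2.0 * common / (len(s) + len(t))

def is_match(source_words, target_words, th=0.6):
    """C' is matched with C when base_sim(C, C') >= TH (TH value is hypothetical)."""
    return base_sim(source_words, target_words) >= th

print(base_sim(["air", "missile"], ["air", "defense", "missile", "system"]))
```

In this example two of the words are shared, the strings have two and four distinct words, and the ratio is therefore 4/6.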
Descendant's Similarity Inheritance (DSI) Method • Modifies the base similarity between two concepts by considering the similarity between their ancestors, by defining a new function DSI_sim(C, C′) computed as follows: [Formula image not preserved.] where: • MCP is the main contribution percentage (a value of 0.75 was found to work well). • parent_i(C), i ≥ 0, is the ancestor A of C such that there is a path of length i (number of edges) between C and A.
DSI Method Example: DSI_sim(C, C′) = 0.75 × base_sim(C, C′) + 0.17 × base_sim(B, B′) + 0.08 × base_sim(A, A′)
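The coefficients in the example (0.75, 0.17, 0.08) are consistent with a scheme in which the remaining 1 − MCP is spread over the n ancestor levels with linearly decreasing weights, 2(1 − MCP)(n + 1 − i) / (n(n + 1)) for level i. The sketch below implements that scheme. It is a reconstruction under that assumption, and it further assumes that B is the parent and A the grandparent of C, that only levels present on both sides are compared, and that the base similarity is returned unchanged when there are no ancestors.

```python
def dsi_weights(n, mcp=0.75):
    """Assumed weights for ancestor levels 1..n; together they sum to 1 - mcp."""
    return [2 * (1 - mcp) * (n + 1 - i) / (n * (n + 1)) for i in range(1, n + 1)]

def dsi_sim(path, path_prime, base_sim, mcp=0.75):
    """path = [C, parent of C, grandparent of C, ...]; path_prime likewise for C'."""
    n = min(len(path), len(path_prime)) - 1  # ancestor levels available on both sides
    if n <= 0:
        return base_sim(path[0], path_prime[0])  # assumption: nothing to inherit
    score = mcp * base_sim(path[0], path_prime[0])
    for i, weight in enumerate(dsi_weights(n, mcp), start=1):
        score += weight * base_sim(path[i], path_prime[i])
    return score

# With two ancestor levels the weights round to the 0.17 and 0.08 of the example.
print([round(w, 2) for w in dsi_weights(2)])
```

Closer ancestors contribute more than distant ones, which matches the larger coefficient on the parent pair in the example.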
Sibling’s Similarity Contribution (SSC) Method • Modifies the base similarity between two concepts by considering the similarity between their siblings, by defining a new function SSC_sim(C, C′) computed as follows: [Formula image not preserved.] where: • MCP is the main contribution percentage (a value of 0.75 was found to work well). • N is the number of siblings of concept C. • M is the number of siblings of concept C′.
SSC Method Example: SSC_sim(C, C′) = 0.75 × base_sim(C, C′) + (0.25 / 2) × [max(base_sim(D, D′), base_sim(D, E′), base_sim(D, F′)) + max(base_sim(E, D′), base_sim(E, E′), base_sim(E, F′))]
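The example suggests the following reading of SSC: each sibling of C contributes its best base similarity against the siblings of C′, and the remaining 1 − MCP is divided evenly among the N siblings of C. The sketch below implements that reading. It is an assumption drawn from the example rather than the slide's formula, and the fallback to the plain base similarity when either concept has no siblings is also assumed. The similarity values in the usage part are made up.

```python
def ssc_sim(c, c_prime, siblings, siblings_prime, base_sim, mcp=0.75):
    """Assumed SSC: mcp * base_sim(C, C') + (1 - mcp) / N * sum of best sibling matches."""
    if not siblings or not siblings_prime:
        return base_sim(c, c_prime)  # assumption: no siblings, nothing to contribute
    best_matches = sum(
        max(base_sim(s, sp) for sp in siblings_prime)  # best partner for sibling s
        for s in siblings
    )
    return mcp * base_sim(c, c_prime) + (1 - mcp) / len(siblings) * best_matches

# Hypothetical base similarities for the configuration in the example:
# C has siblings D and E; C' has siblings D', E', and F'.
TABLE = {
    ("C", "C'"): 0.8,
    ("D", "D'"): 1.0, ("D", "E'"): 0.2, ("D", "F'"): 0.0,
    ("E", "D'"): 0.1, ("E", "E'"): 0.6, ("E", "F'"): 0.3,
}
lookup = lambda a, b: TABLE[(a, b)]
print(ssc_sim("C", "C'", ["D", "E"], ["D'", "E'", "F'"], lookup))
```

Here the best matches are 1.0 for D and 0.6 for E, so the sibling term is 0.125 × 1.6 on top of 0.75 × 0.8.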
Similarity Flooding Technique • Melnik et al.'s similarity flooding technique [6]: • Structure-based matching technique. • Initial similarity measures between concepts are calculated using element-level matching techniques. • Iteratively, similarities between concepts propagate (flood) to their adjacent concepts until a stable state is reached (no change in similarities occurs after the last iteration).
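The propagation loop described above can be illustrated with a deliberately simplified sketch. It is not the exact algorithm of [6]: it assumes a single edge type (parent to child), lets every concept pair share its similarity evenly among its neighbouring pairs, normalizes by the maximum value, and stops when the largest change falls below a tolerance. The function name, parameters, and sample data are hypothetical.

```python
from itertools import product

def similarity_flooding(edges_s, edges_t, sigma0, max_iter=100, eps=1e-4):
    """Simplified fixpoint propagation of similarities between concept pairs.

    edges_s, edges_t: sets of (parent, child) edges of the two ontologies
    sigma0: non-empty dict {(source_concept, target_concept): initial similarity}
    """
    # Two pairs are neighbours when both ontologies have an edge linking their members.
    neighbours = {pair: set() for pair in sigma0}
    for (a, a2), (b, b2) in product(edges_s, edges_t):
        if (a, b) in neighbours and (a2, b2) in neighbours:
            neighbours[(a, b)].add((a2, b2))
            neighbours[(a2, b2)].add((a, b))  # similarity flows in both directions
    sigma = dict(sigma0)
    for _ in range(max_iter):
        nxt = {}
        for pair in sigma:
            # each neighbour splits its current similarity evenly among its own neighbours
            incoming = sum(sigma[nb] / len(neighbours[nb]) for nb in neighbours[pair])
            nxt[pair] = sigma[pair] + incoming
        top = max(nxt.values()) or 1.0
        nxt = {pair: value / top for pair, value in nxt.items()}  # keep values in [0, 1]
        delta = max(abs(nxt[pair] - sigma[pair]) for pair in sigma)
        sigma = nxt
        if delta < eps:  # stable state reached
            break
    return sigma

# Two tiny ontologies with one edge each; all pairs start with equal similarity.
start = {(s, t): 0.5 for s in ("A", "B") for t in ("A2", "B2")}
flooded = similarity_flooding({("A", "B")}, {("A2", "B2")}, start)
print(flooded)
```

In the sample run, the pairs (A, A2) and (B, B2) reinforce each other through their shared edge, while the structurally unsupported pairs decay toward zero.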
Experiments • We compared the DSI method, the SSC method, and the Similarity Flooding (SF) algorithm [6] using geospatial ontology sets and OAEI ontology sets. • We ran extensive experiments comparing precision, recall, and execution time for DSI, SSC, and SF: • DSI or SSC does at least as well as SF. • Sometimes DSI is the best of the three and sometimes SSC is, but DSI was the winner more often.
Ontology sets [Table: Characteristics of the ontology sets used in our experiments]
Experimental Results for Wetland Ontologies [Figure: Applying the Base Similarity, DSI, SSC, and Similarity Flooding algorithms on the geospatial wetland ontologies]
Experimental Results for Other Ontologies [Charts, one per ontology set: Computer Networks Ontology, Weapons Ontology, People and Pets Ontology, Russia Ontology. Ontology sets available from the OAEI campaign.]
Runtime Performance • In addition to gathering precision and recall measures, we also gathered runtime figures (in milliseconds) from our experimental test cases. [Table: runtime figures not preserved.]
Ontology Alignment Evaluation Initiative (OAEI) 2007 • Annual international campaign that aims to evaluate ontology alignment techniques. • The campaign includes multiple competition tracks in which alignment systems participate. • We participated in the medical track to align two ontologies using our DSI method: • Mouse adult anatomy (2744 concepts). • Human anatomy published by the National Cancer Institute (NCI) (3304 concepts).
OAEI 2007 • The track we participated in is considered a blind test, which means that the expected results of the alignment are unknown to the participants. • A total of seven alignment systems, including ours, participated in this track. Like ours, none of the other participating systems is domain specific. • The track contained three test cases, which aim to: • Optimize F-measure. • Optimize precision. • Optimize recall. • We placed third, fourth, and third in these three test cases, respectively.
OAEI 2007 • The campaign allows researchers to participate with only one ontology alignment technique. • The alignment technique must be fully automatic; it cannot be manual or semi-automatic. • The campaign provides an API that can be used for the purposes of the competition.
OAEI 2007 • The API allows competitors to embed their alignment techniques to be tested. • The API generates a formatted alignment results file, which can be automatically compared against a file containing the expected alignment results. • The results of the evaluation are reported as percentages for recall, precision, and F-measure.
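For reference, the three reported measures can be computed from a produced alignment and an expected alignment as shown in this minimal sketch. It uses the standard definitions of precision, recall, and F-measure over sets of concept pairs; it does not use the campaign's API, and the `evaluate` helper and the sample pairs are hypothetical.

```python
def evaluate(found, expected):
    """Return (precision, recall, f_measure) as fractions; multiply by 100 for percentages."""
    found, expected = set(found), set(expected)
    correct = len(found & expected)  # mappings that are both produced and expected
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical alignments: three produced mappings, four expected ones, two in common.
produced = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
print(evaluate(produced, reference))
```

F-measure is the harmonic mean of precision and recall, so it rewards systems that balance the two rather than maximizing one at the expense of the other.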
OAEI 2007 results
OAEI 2007 results • We note that we competed for the first time, while several of the other teams have competed in previous years. • It has been observed that repeated participation leads to increasingly better outcomes. • The alignment results for the anatomy track are still kept confidential. Therefore, we did not have the flexibility to test how other methods in our framework perform.
OAEI 2007 results • The campaign does not fully evaluate our multi-layered alignment approach, which is a significant part of our research, for the following reasons: • The multi-layered approach is semi-automatic and therefore cannot be fully validated by the campaign. • The mapping layers in the multi-layered approach are meant to complement one another to achieve optimal results. • The framework is meant to allow for the addition of other mapping layers to enhance the alignment process.
Controlling precision and recall • Our framework lets the user control precision and recall by adjusting the similarity threshold. • In general, increasing the threshold causes the precision to climb while causing the recall to drop.
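The trade-off can be seen on a small, made-up example. In the sketch below, candidate mappings carry similarity scores, and only those at or above the threshold are kept. The scores, the expected alignment, and the `precision_recall_at` helper are hypothetical and chosen only to show the general trend.

```python
def precision_recall_at(scores, expected, threshold):
    """Keep candidate pairs scoring >= threshold and measure them against `expected`."""
    selected = {pair for pair, score in scores.items() if score >= threshold}
    correct = len(selected & expected)
    precision = correct / len(selected) if selected else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall

# Hypothetical similarity scores for five candidate mappings.
SCORES = {
    ("a", "a2"): 0.9,
    ("b", "b2"): 0.7,
    ("c", "x"): 0.6,   # wrong mapping with a fairly high score
    ("d", "d2"): 0.4,  # correct mapping with a low score
    ("e", "y"): 0.3,
}
EXPECTED = {("a", "a2"), ("b", "b2"), ("d", "d2")}

for th in (0.2, 0.5, 0.8):
    p, r = precision_recall_at(SCORES, EXPECTED, th)
    print(f"threshold={th:.1f} precision={p:.2f} recall={r:.2f}")
```

On this data, raising the threshold removes wrong mappings first, which raises precision, but it also discards the correct low-scoring mapping, which lowers recall.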
The impact of changing similarity threshold on precision and recall [Charts, one per ontology set: Weapons, People, Computer networks, Russia]
Performance tuning • Having achieved very satisfactory results in OAEI 2007, we shifted our focus to tuning the performance of the alignment methods. • Performance is improved by reducing the number of comparisons: only concepts in matching sub-trees are compared. • After the contest, we reduced the run time by 30% on average. • The precision dropped by around 3% in some cases and climbed by 1% in other cases. • The recall dropped by around 12%.
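One way to read "only comparing concepts of matching sub-trees" is a top-down traversal that descends into a pair of sub-trees only after their roots have been matched. The sketch below follows that reading. It is an assumption, since the slides do not give the procedure, and the tree contents, the exact-match similarity function, and the threshold are hypothetical.

```python
def align_matching_subtrees(s_node, t_node, children_s, children_t, sim, th):
    """Compare children of two matched nodes and recurse only into matched pairs."""
    matches = []
    for cs in children_s.get(s_node, []):
        for ct in children_t.get(t_node, []):
            if sim(cs, ct) >= th:
                matches.append((cs, ct))
                # descend only below pairs that matched (assumed pruning rule)
                matches.extend(
                    align_matching_subtrees(cs, ct, children_s, children_t, sim, th)
                )
    return matches

# Hypothetical ontologies given as {node: [children]}.
SOURCE = {"root": ["Transport", "Housing"], "Transport": ["Rail", "Road"], "Housing": ["Apartment"]}
TARGET = {"root": ["Transport", "Residential"], "Transport": ["Rail", "Air"], "Residential": ["Apartment"]}
exact = lambda a, b: 1.0 if a == b else 0.0  # toy stand-in for a real similarity function

print(align_matching_subtrees("root", "root", SOURCE, TARGET, exact, 0.5))
```

The two `Apartment` concepts are never compared because their parents did not match. Pruning of this kind saves comparisons but can miss correct mappings, which is consistent with the reported drop in recall.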
Performance tuning [Figure: Performance tuning via sub-tree matching.]
Performance tuning: Impact on precision and recall when using Base Similarity in aligning ontologies [Charts: Time, Recall, Precision]
Performance tuning: Impact on precision and recall when using DSI in aligning ontologies [Charts: Time, Recall, Precision]
Performance tuning: Impact on precision and recall when using SSC in aligning ontologies [Charts: Time, Recall, Precision]
Conclusions • We introduced a multi-layered approach to ontology alignment. Our approach includes multiple mapping layers that aim to complement one another to produce high-quality alignment results. • Each mapping layer includes one or several alignment methods. Our approach currently has four layers, one manual and three automatic. • Our approach is extensible; new mapping layers can be added to further enrich the alignment experience. • We have built a visual alignment tool that implements our multi-layered approach. This tool has been demonstrated at several conferences. • We have introduced two structure-based alignment methods, DSI and SSC. Both methods performed well in comparison with other existing alignment methods such as the similarity flooding method [6].