Ontology Mapping Tool for Diabetes, by Madhuri Gopal
Topics covered: • Project overview • Design Principles • Technology Stack • Approach and Methodology • Execution Framework • Modules Covered • Results
Project Overview: Background • The aim of the project is to overcome semantic heterogeneity on the WWW by using ontology mapping techniques that find the semantic correspondences between similar elements of two ontologies. • We aim to map ontologies created from standard documents in the diabetes medical domain. • Our approach will enable better decision support for queries on these documents. Challenges in the existing systems • Identifying a safer drug regimen requires searching through a space of indicated regimens that outnumbers the pages Google searches by 1000 to 1. • A single criterion is insufficient to guide the selection of a safer regimen. • Clinical data is gathered and stored in a fragmented way. • Clinical data lacks a formal, standardized knowledge representation.
Design Principles Open/Closed Principle: Software entities such as classes, modules and functions should be open for extension but closed for modification. Dependency Inversion Principle: a) High-level modules should not depend on low-level modules. Both should depend on abstractions. b) Abstractions should not depend on details. Details should depend on abstractions. Interface Segregation Principle: Clients should not be forced to depend upon interfaces that they don't use.
Design Principles contd… Single Responsibility Principle: A class should have only one reason to change. Liskov Substitution Principle: Derived types must be completely substitutable for their base types.
Technology Stack The architecture is a two-tier architecture. Front end: Java. Back end: Ontology (.owl files).
Development Hardware Processor: Intel(R) Core™ 2 Duo CPU T6400 @ 2.00 GHz. Memory (RAM): 4 GB. System type: 32-bit operating system. Tools used Protégé: ontology creation (Stanford open source tool). PDPTools: neural network simulator (Stanford open source tool).
Approach and Methodology • Incremental software prototyping is the development methodology. • The final product is built as separate prototypes. • At the end, the separate prototypes are merged into an overall design. • Steps are: a) Identification of basic requirements b) Development of the initial prototype c) Review of the prototype d) Revision and enhancement of the prototype
Execution Framework • Eclipse IDE is used as the execution framework. • All the required plugins (jar files) from protégé/plugins/edu.stanford.smi.protegex.owl and the OWL API (an open source API) are included in the build path of the Java project to access the ontology built using Protégé (a Stanford open source tool). • The IAC neural network is implemented using the PDPTools suite of neural network software (a Stanford tool for Parallel Distributed Processing), which runs in MATLAB. All required inputs are passed from the Java environment through a connection between Eclipse and MATLAB.
Overall Architecture
Modules covered 1) Creation of diabetes ontologies from the American Association of Clinical Endocrinologists (benchmark document) and from Wikipedia. 2) Name similarity matrix calculated for all terms in both ontologies using the Levenshtein distance formula (dynamic programming technique). 3) Profile similarity matrix calculated using term frequency-inverse document frequency (the tf-idf statistical data mining algorithm). 4) Conversion of ontology terms to a vector space model and computation of the cosine similarity matrix.
Modules covered contd… 5) Structural similarity matrix for the structural similarity between ontologies, computed from basic structural features such as depth from root, number of children and number of instances. 6) Similarity aggregator for aggregating the name, profile and structural similarities. 7) Harmony function estimation for keeping the most useful similarities and eliminating erroneous ones. 8) IAC neural network algorithm that solves a constraint satisfaction problem to improve the mapping between the two ontologies.
Ontology Mapping
Input: 2 homogeneous ontologies O1 and O2 expressed in a formal ontology language (OWL/RDF).
Output: a 4-tuple M(e1i, e2j, r, s), where:
M is the mapping
e1i is an element in O1
e2j is an element in O2
r is the mapping relation between e1i and e2j
s is the confidence measure of the mapping, normalized to [0, 1]
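The 4-tuple above could be held in a small immutable Java class. This is only a sketch: the class name `Mapping`, the use of plain strings for elements and for the relation, and the sample values in `main` are all assumptions made for illustration, not part of the original tool.

```java
// Hypothetical holder for one mapping M(e1i, e2j, r, s).
final class Mapping {
    final String e1;          // element of ontology O1 (its name/ID, assumed to be a string)
    final String e2;          // element of ontology O2
    final String relation;    // r: relation between e1 and e2 (e.g. "equivalence", an assumed label)
    final double confidence;  // s: confidence, expected to lie in [0, 1]

    Mapping(String e1, String e2, String relation, double confidence) {
        // Written this way so that NaN is rejected as well.
        if (!(confidence >= 0.0 && confidence <= 1.0)) {
            throw new IllegalArgumentException("confidence must be in [0, 1]: " + confidence);
        }
        this.e1 = e1;
        this.e2 = e2;
        this.relation = relation;
        this.confidence = confidence;
    }

    @Override
    public String toString() {
        return "M(" + e1 + ", " + e2 + ", " + relation + ", " + confidence + ")";
    }

    public static void main(String[] args) {
        // Made-up element names, purely for demonstration.
        System.out.println(new Mapping("Insulin", "InsulinTherapy", "equivalence", 0.82));
    }
}
```

Rejecting out-of-range confidences in the constructor keeps the "normalized to [0, 1]" condition true for every mapping that exists.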
IR-Based Similarity Generator Input: Ontologies O1, O2 Output: 3 similarity matrices that contain similarity scores for each pair of elements in the ontologies. Similarity matrices: • Name Similarity • Profile Similarity • Structural Similarity
Name Similarity
This is calculated from the edit distance between the names (IDs) of the elements:
NameSim(e1i, e2j) = 1 - EditDist(e1i, e2j) / max(l(e1i), l(e2j))
where:
EditDist is the Levenshtein distance between the element names
l(e1i) and l(e2j) are the lengths of the strings e1i and e2j
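A minimal Java sketch of the name similarity follows, using the standard dynamic-programming Levenshtein distance. The class and method names are illustrative, and returning 1.0 for two empty names is an assumption, since the slide does not cover that case.

```java
// Sketch of NameSim(e1i, e2j) = 1 - EditDist / max(length).
final class NameSimilarity {

    /** Levenshtein distance via dynamic programming, keeping only two rows of the table. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Name similarity in [0, 1]; two empty names are treated as identical (assumption). */
    static double nameSim(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        // Made-up term names, purely for demonstration.
        System.out.println(nameSim("Hyperglycemia", "Hypoglycemia"));
    }
}
```

Whether names are lower-cased or otherwise normalized before comparison is not stated in the slides, so the sketch compares them exactly as given.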
Profile Similarity: The profile similarity is defined in 3 steps: • Profile Enrichment • Profile Propagation • Profile Mapping
Profile Enrichment and Propagation • Profile of a class = Class ID + comments + property profiles + instance profiles • Profile of a property = Property ID + property domain + property range • Profile of an instance = Instance ID + descriptive information
Profile Mapping
• The cosine similarity between the profiles of the 2 elements e1i and e2j is calculated in a vector space model:
ProfileSim(e1i, e2j) = (Ve1i · Ve2j) / (|Ve1i| |Ve2j|)
where:
Ve1i and Ve2j are the 2 vectors representing the profiles of elements e1i and e2j respectively
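The cosine formula can be sketched in Java as below. The sketch assumes the two profile vectors have already been built over the same vocabulary (for example with tf-idf weights) and are passed in as plain arrays. Returning 0 when either vector is all zeros is an assumption, and the class name is illustrative.

```java
// Sketch of ProfileSim as the cosine of the angle between two profile vectors.
final class ProfileSimilarity {

    static double cosine(double[] v1, double[] v2) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("vectors must share the same vocabulary");
        }
        double dot = 0.0;
        double norm1 = 0.0;
        double norm2 = 0.0;
        for (int k = 0; k < v1.length; k++) {
            dot += v1[k] * v2[k];
            norm1 += v1[k] * v1[k];
            norm2 += v2[k] * v2[k];
        }
        if (norm1 == 0.0 || norm2 == 0.0) {
            return 0.0; // an empty profile is treated as similar to nothing (assumption)
        }
        return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }

    public static void main(String[] args) {
        // Made-up term weights, purely for demonstration.
        double[] profileA = {0.5, 0.0, 1.2};
        double[] profileB = {0.4, 0.3, 0.9};
        System.out.println(cosine(profileA, profileB));
    }
}
```

With non-negative weights such as tf-idf, the result lies in [0, 1], which matches the range of the other similarity scores.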
Structural similarity
• This is applicable to classes alone, as only they carry hierarchical information.
StructSim(e1i, e2j) = ∑k (1 - diffk(e1i, e2j)) / N
where:
e1i, e2j are 2 class elements in the ontologies O1 and O2 respectively
N is the total number of structural features
diffk(e1i, e2j) denotes the difference for feature k:
diffk(e1i, e2j) = |sfk(e1i) - sfk(e2j)| / max(sfk(e1i), sfk(e2j))
where:
sfk(e1i), sfk(e2j) denote the value of structural feature k of each element
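A short Java sketch of the structural similarity is given below. It makes three assumptions: feature values are non-negative counts (such as depth from root, number of children, number of instances), the difference is taken as an absolute value so that each term stays in [0, 1], and two features that are both 0 are treated as identical. The class name and sample values are illustrative.

```java
// Sketch of StructSim as the average of (1 - diff_k) over N structural features.
final class StructuralSimilarity {

    /** Relative difference of one feature; 0 when both values are 0 (assumption). */
    static double diff(double a, double b) {
        double max = Math.max(a, b);
        if (max == 0.0) {
            return 0.0;
        }
        return Math.abs(a - b) / max;
    }

    /** features1[k] and features2[k] hold the value of feature k for each class. */
    static double structSim(double[] features1, double[] features2) {
        if (features1.length == 0 || features1.length != features2.length) {
            throw new IllegalArgumentException("both classes need the same non-empty feature list");
        }
        double sum = 0.0;
        for (int k = 0; k < features1.length; k++) {
            sum += 1.0 - diff(features1[k], features2[k]);
        }
        return sum / features1.length;
    }

    public static void main(String[] args) {
        // {depth from root, number of children, number of instances}, made-up values.
        double[] classInO1 = {2, 4, 10};
        double[] classInO2 = {3, 4, 7};
        System.out.println(structSim(classInO1, classInO2));
    }
}
```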
Harmony
• Harmony estimates the importance and reliability of the different similarities.
Harmony (h) = #s_max / min(#e1, #e2)
where:
#s_max is the number of element pairs whose similarity is the highest in both their row and their column of the similarity matrix
#ei is the number of elements of ontology Oi
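The harmony of one similarity matrix could be computed as in the Java sketch below. The slide does not say how ties are handled, so the sketch assumes a pair counts only when its score is strictly greater than every other score in its row and column. This keeps h within [0, 1]. The class name and the rectangular-matrix requirement are also assumptions.

```java
// Sketch of h = #s_max / min(#e1, #e2) for a rectangular similarity matrix.
final class Harmony {

    /** True if sim[i][j] strictly exceeds every other cell in row i and in column j. */
    static boolean isRowAndColumnMax(double[][] sim, int i, int j) {
        double value = sim[i][j];
        for (int c = 0; c < sim[i].length; c++) {
            if (c != j && sim[i][c] >= value) {
                return false;
            }
        }
        for (int r = 0; r < sim.length; r++) {
            if (r != i && sim[r][j] >= value) {
                return false;
            }
        }
        return true;
    }

    /** Rows index the elements of O1, columns the elements of O2. */
    static double harmony(double[][] sim) {
        int rows = sim.length;
        int cols = rows == 0 ? 0 : sim[0].length;
        if (rows == 0 || cols == 0) {
            return 0.0; // no elements to compare (assumption)
        }
        int count = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (isRowAndColumnMax(sim, i, j)) {
                    count++;
                }
            }
        }
        return (double) count / Math.min(rows, cols);
    }

    public static void main(String[] args) {
        // Made-up scores, purely for demonstration.
        double[][] sim = {{0.9, 0.1, 0.3}, {0.2, 0.8, 0.1}};
        System.out.println(harmony(sim));
    }
}
```

A matrix whose best scores line up one-to-one gets a harmony near 1, while a matrix in which many rows compete for the same column gets a lower value.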
Adaptive Similarity Aggregator
Input: Individual similarity matrices
Output: Aggregated similarity matrix
FinalSim(e1i, e2j) = (∑k hk * Simk(e1i, e2j)) / n
where:
hk is the harmony of the kth similarity matrix
n is the total number of similarity matrices
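The aggregation formula can be sketched in Java as follows. The sketch takes the harmony values as given inputs and assumes all similarity matrices have the same dimensions. The class name, method signature and sample numbers are illustrative only.

```java
// Sketch of FinalSim = (sum over k of h_k * Sim_k) / n, applied cell by cell.
final class SimilarityAggregator {

    /**
     * sims[k] is the kth similarity matrix (e.g. name, profile, structural);
     * harmonies[k] is the harmony h_k of that matrix.
     */
    static double[][] aggregate(double[][][] sims, double[] harmonies) {
        if (sims.length == 0 || sims.length != harmonies.length) {
            throw new IllegalArgumentException("need one harmony value per similarity matrix");
        }
        int n = sims.length;
        int rows = sims[0].length;
        int cols = rows == 0 ? 0 : sims[0][0].length;
        double[][] result = new double[rows][cols];
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    result[i][j] += harmonies[k] * sims[k][i][j];
                }
            }
        }
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                result[i][j] /= n;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up 1x2 matrices and harmonies, purely for demonstration.
        double[][] nameSim = {{1.0, 0.2}};
        double[][] profileSim = {{0.5, 0.4}};
        double[][] finalSim = aggregate(new double[][][]{nameSim, profileSim}, new double[]{0.8, 0.4});
        System.out.println(java.util.Arrays.deepToString(finalSim));
    }
}
```

Because each matrix is weighted by its own harmony, a similarity measure that produces clear one-to-one matches contributes more to the final score than one that does not.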
IAC Neural Network with Constraint Satisfaction
Architecture (figure): node rows H11, H12, …, H1n; H21, H22, …, H2n; H31, H32, …, H3n, joined by SYNAPSIS 1 and SYNAPSIS 2.