Inexact Matching of Ontology Graphs Using Expectation-Maximization

Prashant Doshi, Christopher Thomas
LSDIS Lab, Dept. of Computer Science, University of Georgia

Motivating Example
[Figure: Weapons ontology 1 (model), Weapons ontology 2 (data), and a candidate ontology match between them]

Ontology Matching
• Problem: match the nodes and edges (if labeled) of different ontologies
• Essential step in ontology engineering

Types of Match
• Exact matches
 – Isomorphisms with edge consistency
 – Bijection
 – E.g. GLUE (Doan02), BayesOWL (Ding05), FALCON-AO (Hu05), OMEN (Mitra05)
• Inexact matches
 – Homomorphisms with edge consistency
 – Many-one or many-many
 – E.g. this approach (many-one)
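To make the exact/inexact distinction concrete, here is a minimal Python sketch, assuming the textbook definitions: a match is a dictionary from data nodes to model nodes, it is edge-consistent if every data edge (a, b) lands on a model edge, and it is exact only if it is also a bijection whose image edges equal the model edges. The function names and the toy weapons nodes are hypothetical, not taken from the slides.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]
Match = Dict[str, str]  # data node -> model node


def is_edge_consistent(match: Match, data_edges: Set[Edge], model_edges: Set[Edge]) -> bool:
    """Every data edge (a, b) must map onto the model edge (match[a], match[b])."""
    return all((match[a], match[b]) in model_edges for a, b in data_edges)


def is_bijection(match: Match, model_nodes: Set[str]) -> bool:
    """One-to-one and onto: no two data nodes share a model node, and every model node is used."""
    images = list(match.values())
    return len(set(images)) == len(images) and set(images) == model_nodes


def match_type(match: Match, data_edges: Set[Edge], model_edges: Set[Edge], model_nodes: Set[str]) -> str:
    if not is_edge_consistent(match, data_edges, model_edges):
        return "inconsistent"
    image_edges = {(match[a], match[b]) for a, b in data_edges}
    # A bijection whose image edges are exactly the model edges is an isomorphism.
    if is_bijection(match, model_nodes) and image_edges == model_edges:
        return "exact"
    # Otherwise it is an edge-consistent homomorphism (e.g. many-one).
    return "inexact"


# Hypothetical many-one example: two data subclasses collapse onto one model class.
data_edges = {("Frigate", "Warship"), ("Destroyer", "Warship")}
model_edges = {("Ship", "NavalVessel")}
many_one = {"Frigate": "Ship", "Destroyer": "Ship", "Warship": "NavalVessel"}
print(match_type(many_one, data_edges, model_edges, {"Ship", "NavalVessel"}))  # inexact
```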

Overview of Our Approach
• Exploit structural and lexical similarity
 – Graph structure
 – Node and edge labels
• Formulation within the iterative Expectation-Maximization (EM) scheme
• Directly suited to taxonomies; edge-labeled ontologies can be handled after reification
[Figure: match quality over the space of matches; EM may converge to a local maximum]

Edge-Labeled Ontology Graphs
Reification
• Reified bipartite graph (Hayes&Gutierrez04)
• Each distinct edge label becomes a node
• Dummy nodes are introduced to preserve the relations
[Figure: an edge-labeled graph and its reified bipartite form]
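A minimal Python sketch of the reification step, assuming the bipartite form described above: each edge (triple) gets its own dummy node, each distinct edge label becomes one shared node, and the dummy node is linked to the subject, the object and the label. The `_stmt{i}` and `label:` naming scheme and the example triples are hypothetical choices for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, edge label, object)


def reify(triples: List[Triple]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Turn an edge-labeled graph into an unlabeled bipartite graph.

    One side holds a dummy node per edge; the other holds the concept nodes
    plus one node per distinct edge label.
    """
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for i, (subj, label, obj) in enumerate(triples):
        dummy = f"_stmt{i}"            # hypothetical dummy-node name, one per edge
        label_node = f"label:{label}"  # shared by every edge carrying this label
        nodes.update({subj, obj, dummy, label_node})
        # The dummy node ties subject, object and label together, preserving the relation.
        edges.update({(subj, dummy), (dummy, obj), (dummy, label_node)})
    return nodes, edges


triples = [("Frigate", "partOf", "Fleet"), ("Destroyer", "partOf", "Fleet")]
nodes, edges = reify(triples)
print(len(nodes), len(edges))  # 6 6
```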

Background: EM
• Developed by Dempster, Laird and Rubin (1977)
• Computes a maximum-likelihood estimate of an underlying model from observed data (X) in the presence of missing values (Y)
• E-step
 – Evaluate the expected log-likelihood of candidate models (M^(n+1)) given the current seed model (M^n)
• M-step
 – Choose the best model and use it in the next iteration
• Generalized M-step
 – Select any model that is better than the current one
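In standard EM notation, with X the observed data, Y the missing values and M^n the current model, the quantity evaluated in the E-step and the two variants of the M-step can be written as follows. This is the textbook formulation, not the paper's graph-specific expression.

```latex
Q(M^{n+1} \mid M^n) = \mathbb{E}_{Y \mid X, M^n}\left[\log \Pr(X, Y \mid M^{n+1})\right]
                    = \sum_{Y} \Pr(Y \mid X, M^n)\, \log \Pr(X, Y \mid M^{n+1})

\text{M-step:} \quad M^{n+1} = \arg\max_{M} \; Q(M \mid M^n)

\text{Generalized M-step:} \quad \text{any } M^{n+1} \text{ with } Q(M^{n+1} \mid M^n) > Q(M^n \mid M^n)
```

Either choice never decreases the observed-data likelihood Pr(X | M), so the generalized variant remains a valid EM scheme; like EM, it may stop at a local maximum.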

Graph Matching Using GEM
• Treat the match assignments as the model
• Mixture model
 – Given a data node, its correspondence with some model node is a hidden variable
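Treating each data node's correspondence as a hidden mixture component, the E-step needs the posterior probability of each candidate model node. A minimal Python sketch, assuming Bayes' rule with mixing proportions `prior` and a per-pair likelihood table; the table values and node names below are hypothetical placeholders, not the paper's likelihood model.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]


def correspondence_posterior(likelihood: Table, prior: Dict[str, float]) -> Table:
    """Pr(y_a = alpha | x_a, M^n) is proportional to prior[alpha] * likelihood[a][alpha]."""
    posterior: Table = {}
    for a, row in likelihood.items():
        weighted = {alpha: prior[alpha] * p for alpha, p in row.items()}
        total = sum(weighted.values())
        if total == 0.0:
            # No evidence for any model node: fall back to a uniform distribution.
            posterior[a] = {alpha: 1.0 / len(row) for alpha in row}
        else:
            posterior[a] = {alpha: w / total for alpha, w in weighted.items()}
    return posterior


# Hypothetical likelihoods for one data node against two model nodes.
likelihood = {"Warship": {"Ship": 0.6, "Weapon": 0.2}}
prior = {"Ship": 0.5, "Weapon": 0.5}
print(correspondence_posterior(likelihood, prior))
```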

E-Step
• The E-step expression simplifies considerably
• It involves computing the lexical similarity between the node labels
• We use the generalized M-step
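The control flow of a generalized-EM matcher can be sketched in Python as below, under stated assumptions: `q(candidate, current)` stands in for Q(M^{n+1} | M^n), `propose` stands in for the model sampler, and the toy objective (shared words between labels) and node names are hypothetical; the paper's actual Q combines structural and lexical terms. Following the generalized M-step, the loop accepts the first candidate that beats the current model rather than searching for the maximum.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Match = Dict[str, str]  # data node -> model node (many-one)


def generalized_em(
    seed: Match,
    q: Callable[[Match, Match], float],  # q(candidate, current) ~ Q(M^{n+1} | M^n)
    propose: Callable[[Match], Iterable[Match]],
    max_iters: int = 100,
) -> Tuple[Match, int]:
    current = seed
    for iteration in range(max_iters):
        baseline = q(current, current)
        # Generalized M-step: take the first candidate that improves on the current model.
        better = next((c for c in propose(current) if q(c, current) > baseline), None)
        if better is None:
            return current, iteration  # no improving candidate: stop (possibly at a local maximum)
        current = better
    return current, max_iters


MODEL_NODES: List[str] = ["ship", "gun", "weapon"]  # hypothetical model labels


def toy_q(candidate: Match, _current: Match) -> float:
    """Hypothetical objective: count shared words between each data label and its model label."""
    return sum(len(set(a.split()) & set(alpha.split())) for a, alpha in candidate.items())


def single_reassignments(current: Match) -> Iterable[Match]:
    """Propose every match that differs from `current` in exactly one data node."""
    for a in current:
        for alpha in MODEL_NODES:
            if alpha != current[a]:
                yield {**current, a: alpha}


match, iters = generalized_em({"naval ship": "weapon", "naval gun": "ship"}, toy_q, single_reassignments)
print(match, iters)  # {'naval ship': 'ship', 'naval gun': 'gun'} 2
```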

String Similarity Measures
• String distance metrics (Cohen et al. 03):
 – Exact string match
 – Substring match
 – N-gram score
 – Sequence alignment score (Smith&Waterman81)
[Figure: alignment of S1 = "Modern Naval Ship" with S2 = "Naval Warship"; matched characters marked 1: Modern 000000, Naval 11111, Warship 0001111]
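Simple versions of the four measures can be sketched in Python as follows. The choices are assumptions for illustration: labels are lower-cased, the n-gram score is the Dice coefficient over character trigrams, and the Smith-Waterman scoring uses match = +2, mismatch = -1 and a linear gap penalty of -1; the slides do not state which parameters or normalizations were used.

```python
def exact_match(s1: str, s2: str) -> float:
    return float(s1.lower() == s2.lower())


def substring_match(s1: str, s2: str) -> float:
    a, b = s1.lower(), s2.lower()
    return float(a in b or b in a)


def ngrams(s: str, n: int = 3) -> set:
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_score(s1: str, s2: str, n: int = 3) -> float:
    """Dice coefficient over character n-grams (one common choice of n-gram score)."""
    a, b = ngrams(s1, n), ngrams(s2, n)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def smith_waterman(s1: str, s2: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty."""
    s1, s2 = s1.lower(), s2.lower()
    h = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    best = 0
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            diag = h[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            # Local alignment: scores never drop below zero, so an alignment can restart anywhere.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman("Modern Naval Ship", "Naval Warship"))
```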

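Model Sampling
• The model space is large: each data node may map to any model node, giving up to |Vm|^|Vd| many-one matches
• Random sampling from the model space
• Combine sampling with intuitive heuristics
[Figure: Map-Parent heuristic generating candidate models M^(n+1) from the current model M^n]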
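One way to combine random sampling with a heuristic is sketched below in Python. The map-parent rule shown here (if data node a maps to model node alpha, also map a's parent to alpha's parent) is an interpretation of the heuristic's name offered as an assumption, as are the function names, the parent dictionaries and the sample size `k`.

```python
import random
from typing import Dict, List, Sequence

Match = Dict[str, str]  # data node -> model node (many-one)


def random_match(data_nodes: Sequence[str], model_nodes: Sequence[str], rng: random.Random) -> Match:
    """Uniformly sample one of the |Vm|^|Vd| many-one matches."""
    return {a: rng.choice(model_nodes) for a in data_nodes}


def map_parent(match: Match, data_parent: Dict[str, str], model_parent: Dict[str, str]) -> Match:
    """Hypothetical map-parent heuristic: if a -> alpha, propose parent(a) -> parent(alpha)."""
    proposal = dict(match)
    for a, alpha in match.items():
        if a in data_parent and alpha in model_parent:
            # If several children disagree, the last one visited wins in this sketch.
            proposal[data_parent[a]] = model_parent[alpha]
    return proposal


def sample_candidates(current: Match, data_nodes: Sequence[str], model_nodes: Sequence[str],
                      data_parent: Dict[str, str], model_parent: Dict[str, str],
                      k: int, rng: random.Random) -> List[Match]:
    """One heuristic candidate plus k - 1 random samples."""
    heuristic = map_parent(current, data_parent, model_parent)
    return [heuristic] + [random_match(data_nodes, model_nodes, rng) for _ in range(k - 1)]


data_nodes = ["Frigate", "Warship"]
model_nodes = ["NavalShip", "Ship", "Gun"]
current = {"Frigate": "NavalShip", "Warship": "Gun"}
cands = sample_candidates(current, data_nodes, model_nodes,
                          {"Frigate": "Warship"}, {"NavalShip": "Ship"}, k=4, rng=random.Random(0))
print(cands[0])  # {'Frigate': 'NavalShip', 'Warship': 'Ship'}
```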

Simple Example
[Figure: seed model M0 and two candidate models M1 and M1'; Q(M1|M0) = 51.57, Q(M1'|M0) = 52.56]

Computational Complexity
• Complexity of the E-step is O((|Vd||Vm|)^2)
• In the M-step, if we generate K samples within a sample set, the worst-case complexity is O(K(|Vd||Vm|)^2)
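A quick calculation shows how fast these bounds grow. The Python sketch below simply evaluates the two expressions; the ontology sizes and the value of K are hypothetical, and the results are bound-level estimates rather than measured operation counts.

```python
def e_step_terms(n_data: int, n_model: int) -> int:
    """Size of the E-step bound (|Vd| * |Vm|)^2."""
    return (n_data * n_model) ** 2


def m_step_terms(n_data: int, n_model: int, k: int) -> int:
    """Worst-case M-step bound K * (|Vd| * |Vm|)^2 for K sampled models."""
    return k * e_step_terms(n_data, n_model)


print(e_step_terms(50, 50))       # 6250000
print(m_step_terms(50, 50, 100))  # 625000000
```

Doubling the size of both ontologies multiplies both bounds by 16.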

Performance
• Weapons ontologies from the I3CON repository
• Matching heuristics speed up convergence

Lexical Match: Recall = 77.8%, Precision = 63.6%

GEM Match: Recall = 100%, Precision = 90%
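For reference, precision is the fraction of proposed correspondences that are correct, and recall is the fraction of reference correspondences that were found. A minimal Python sketch with a hypothetical toy alignment (the node pairs are invented for illustration, not the I3CON data):

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (data node, model node)


def precision_recall(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


reference = {("Frigate", "Ship"), ("Destroyer", "Ship"), ("Cannon", "Gun"), ("Missile", "Weapon")}
predicted = {("Frigate", "Ship"), ("Destroyer", "Ship"), ("Cannon", "Weapon")}
p, r = precision_recall(predicted, reference)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```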

Discussion
• A principled technique for inexact matching of ontology schemas using generalized EM
• Considers both structural and label similarity
• Produces the most likely match found (EM may converge to a local maximum)
• Many-one correspondence allows mapping between clusters of different semantic granularity
• Computational complexity is an issue
• Need more efficient ways to cover the model space

Thank you. Questions?
