Inexact Matching of Ontology Graphs Using Expectation-Maximization
Prashant Doshi, Christopher Thomas
LSDIS Lab, Dept. of Computer Science, University of Georgia
Motivating Example
[Figure: Weapons ontology 1 (model), Weapons ontology 2 (data), and a candidate ontology match]
Ontology Matching
• Problem: Match nodes and edges (if labeled) of different ontologies
• Essential step in ontology engineering

Types of Match
• Exact matches: Isomorphisms with edge consistency
  • Bijection
  • E.g. GLUE (Doan02), BayesOWL (Ding05), FALCON-AO (Hu05), OMEN (Mitra05)
• Inexact matches: Homomorphisms with edge consistency
  • Many-one or Many-Many
  • E.g. This approach (Many-one)
Overview of Our Approach
• Exploit structural and lexical similarity
  • Graph structure
  • Node and edge labels
• Formulation within the iterative Expectation-Maximization (EM) scheme
• Suitable for taxonomies but can be used for edge-labeled ontologies using reification
[Figure: match quality plotted over the space of matches; the search may converge to local maxima]
Edge-Labeled Ontology Graphs: Reification
• Reified bipartite graph (Hayes&Gutierrez04)
• Each distinct edge label becomes a node
• Dummy nodes are introduced to preserve the relations
[Figure: edge-labeled graph and its reified form]
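The reification step can be illustrated with a minimal sketch. It assumes the ontology is given as (subject, label, object) triples and that each triple gets one dummy "statement" node linked to its subject, its label node, and its object. The function name `reify`, the `_stmt` naming scheme, and the edge directions are illustrative assumptions, not details taken from the slides.

```python
def reify(triples):
    """Turn labeled edges (subject, label, object) into an unlabeled bipartite graph.

    Assumption: one dummy node per triple; edge directions are illustrative only.
    """
    value_nodes = set()      # original nodes plus one node per distinct edge label
    statement_nodes = []     # dummy nodes, one per original labeled edge
    edges = []               # unlabeled edges of the reified graph
    for i, (subj, label, obj) in enumerate(triples):
        stmt = f"_stmt{i}"   # hypothetical naming scheme for dummy nodes
        statement_nodes.append(stmt)
        value_nodes.update((subj, label, obj))
        edges.append((subj, stmt))    # subject -> dummy
        edges.append((stmt, label))   # dummy -> edge-label node
        edges.append((stmt, obj))     # dummy -> object
    return value_nodes, statement_nodes, edges


triples = [("Tank", "isA", "Vehicle"), ("Jeep", "isA", "Vehicle")]
value_nodes, statement_nodes, edges = reify(triples)
```

Every edge in the result joins a dummy node to a value node, so the graph is bipartite and no longer needs edge labels; the label `isA` appears once as a node even though two relations use it.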
Background: EM
• Developed by Dempster, Laird and Rubin (1977)
• Maximum likelihood estimate of an underlying model from observed data (X) in the presence of missing values (Y)
• E-step
  • Evaluate the likelihood of different models (Mn+1) given a seed model (Mn)
• M-step
  • Choose the best model and use it in the next iteration
• Generalized M-step
  • Select a model that is better than the current one
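In symbols, the steps above correspond to the textbook EM formulation, written here with the slide's X, Y, and M^n. The notation Q(· | ·) follows the usage later in the deck; the exact form used in the talk is not shown in this transcript, so treat this as the standard definition rather than the authors' equation.

```latex
% E-step: expected complete-data log-likelihood under the seed model M^n
Q(M^{n+1} \mid M^{n}) = \mathbb{E}_{Y}\!\left[\, \log P(X, Y \mid M^{n+1}) \;\middle|\; X, M^{n} \right]

% M-step: pick the maximizer
M^{n+1}_{*} = \arg\max_{M^{n+1}} \; Q(M^{n+1} \mid M^{n})

% Generalized M-step: accept any model that improves on the current one
Q(M^{n+1} \mid M^{n}) > Q(M^{n} \mid M^{n})
```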
Graph Matching Using GEM
• Treat the match assignments as the model
• Mixture model
• Given a data node, the correspondence with some model node is a hidden variable
E-Step
• [Equation not preserved in this transcript]
• The E-step equation simplifies considerably
• Involves finding the lexical similarity between the node labels
• We use the generalized M-step
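For orientation, the generic expected log-likelihood of a mixture model with hidden correspondences has the form below. This is the standard mixture-model expression, not necessarily the equation shown on the slide; the symbols x_a (data node), y_α (model node), and π_α (mixing prior) are assumed notation, while V_d and V_m are the data and model node sets used in the complexity slide.

```latex
Q(M^{n+1} \mid M^{n}) =
  \sum_{a=1}^{|V_d|} \sum_{\alpha=1}^{|V_m|}
  P(y_\alpha \mid x_a, M^{n}) \,
  \log\!\left[ \pi_\alpha^{\,n+1} \, P(x_a \mid y_\alpha, M^{n+1}) \right]
```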
String Similarity Measures
• String distance metrics (Cohen et al. 03):
  • Exact string match
  • Substring match
  • N-gram score
  • Sequence alignment score (Smith&Waterman81)
Alignment example:
S1: Modern Naval Ship
S2: Naval Warship
Match mask: 000000 11111 0001111
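The four measures listed above can be sketched in a few lines of Python. The specific choices here (case folding, character trigrams scored with the Dice coefficient, and Smith-Waterman scoring of +1 match, -1 mismatch, -1 gap) are assumptions for illustration; the slides do not state which variants or parameters were used.

```python
def exact_match(a, b):
    """1.0 if the labels are identical after case folding, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def substring_match(a, b):
    """1.0 if one label contains the other, else 0.0."""
    a, b = a.lower(), b.lower()
    return 1.0 if a in b or b in a else 0.0


def ngrams(s, n=3):
    """Set of character n-grams of a case-folded string."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_score(a, b, n=3):
    """Dice coefficient over character n-grams (one common n-gram score)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty."""
    a, b = a.lower(), b.lower()
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


s1, s2 = "Modern Naval Ship", "Naval Warship"
scores = (exact_match(s1, s2), substring_match(s1, s2),
          ngram_score(s1, s2), smith_waterman(s1, s2))
```

For the slide's example pair, exact and substring matching both fail, while the n-gram and alignment scores still pick up the shared "Naval" and "ship" fragments, which is why softer measures are useful for inexact matching.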
Model Sampling
• Model space is large
• Random sampling from the model space
• Combine sampling with intuitive heuristics
  • Map-Parent Heuristic
[Figure: candidate models Mn+1 sampled from the current model Mn]
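A generalized M-step driven by random sampling might look like the sketch below. It assumes a many-one match stored as a dict from data nodes to model nodes, a caller-supplied scoring function `q(candidate, current)` standing in for Q(Mn+1|Mn), and candidates produced by remapping one data node at a time. All names (`sample_models`, `gem_search`, `toy_q`) and the stopping rule are hypothetical, and the Map-Parent heuristic is not implemented here.

```python
import random


def sample_models(model, model_nodes, k, rng):
    """Draw k candidate models, each remapping one randomly chosen data node."""
    data_nodes = list(model)
    candidates = []
    for _ in range(k):
        cand = dict(model)
        cand[rng.choice(data_nodes)] = rng.choice(model_nodes)
        candidates.append(cand)
    return candidates


def gem_search(seed, model_nodes, q, k=20, max_iters=100, rng=None):
    """Iterate a generalized M-step: move to a sampled model only if it scores higher."""
    rng = rng or random.Random(0)
    current = dict(seed)
    for _ in range(max_iters):
        base = q(current, current)
        scored = [(q(c, current), c)
                  for c in sample_models(current, model_nodes, k, rng)]
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score <= base:
            break  # no sampled model improves on the current one
        current = best
    return current


def toy_q(candidate, current):
    """Toy stand-in for Q: counts data nodes mapped to a same-named model node."""
    return sum(1 for d, m in candidate.items() if d.lower() == m.lower())


seed = {"A": "a", "B": "a", "C": "a"}
result = gem_search(seed, ["a", "b", "c"], toy_q)
```

Because the loop stops as soon as a sample set contains no improvement, it can settle on a local maximum; a larger `k` or heuristic-guided sampling trades extra computation for better coverage of the model space.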
Simple Example
• Q(M1'|M0) = 52.56
• Q(M1|M0) = 51.57
[Figure: seed model M0 and candidate models M1 and M1']
Computational Complexity
• Complexity of the E-step is O((|Vd||Vm|)^2)
• In the M-step, if we generate K samples within a sample set, the worst-case complexity is O(K(|Vd||Vm|)^2)
Performance
• Weapons ontologies from the I3CON repository
• Matching heuristics speed up convergence
Lexical Match: Recall = 77.8%, Precision = 63.6%
GEM Match: Recall = 100%, Precision = 90%
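As a reminder of how the two figures are defined, the sketch below computes precision and recall from match counts. The counts are hypothetical: they were chosen only because they reproduce the reported percentages, and the slides do not give the actual numbers of correct, spurious, or missed correspondences.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts consistent with the reported percentages (not from the slides).
lexical_p, lexical_r = precision_recall(true_pos=7, false_pos=4, false_neg=2)
gem_p, gem_r = precision_recall(true_pos=9, false_pos=1, false_neg=0)
```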
Discussion
• A principled technique for inexact matching of ontology schemas using Generalized EM
• Considers structural and label similarity
• Produces the most likely match
• Many-one correspondence allows mapping between clusters of different semantic granularity
• Computational complexity is an issue
  • More efficient ways to cover the model space are needed
Thank you Questions