
Discovering Missing Background Knowledge in Ontology Matching

This paper explores the problem of missing background knowledge in ontology matching and proposes a method to discover and incorporate this knowledge. It discusses the steps involved in semantic matching and introduces a preprocessing and matching phase. The paper also highlights the issue of low recall in matching and identifies the lack of knowledge as a major contributing factor. The proposed method aims to improve the quality of ontology matching results.


Presentation Transcript


  1. Discovering Missing Background Knowledge in Ontology Matching
     Pavel Shvaiko, joint work with Fausto Giunchiglia and Mikalai Yatskevich
     17th European Conference on Artificial Intelligence (ECAI’06), 30 August 2006, Riva del Garda, Italy

  2. Outline
     • Introduction
     • Semantic Matching
     • Lack of Knowledge
     • Iterative Semantic Matching
     • Evaluation
     • Conclusions and Future Work

  3. Introduction
     • Information sources (e.g., ontologies) can be viewed as graph-like structures containing terms and their inter-relationships
     • Matching takes two graph-like structures and produces a mapping between the nodes of the graphs that correspond semantically to each other

  4. Semantic Matching

  5. Semantic matching
     Semantic matching: given two graphs G1 and G2, for any node n1i ∈ G1, find the strongest semantic relation R’ holding with node n2j ∈ G2.
     Computed R’s, listed in decreasing binding strength order:
     • equivalence { = }
     • more general/specific { ⊒, ⊑ }
     • disjointness { ⊥ }
     • I don’t know { idk }
     We compute semantic relations by analyzing the meaning (concepts, not labels) which is codified in the elements and the structures of ontologies.
     Technically, labels at nodes written in natural language are translated into propositional logical formulas which explicitly codify the labels’ intended meaning. This allows us to codify the matching problem into a propositional validity problem.

  6. Concept of a label & concept of a node
     [Figure: example tree with numbered nodes (1 to 5) labeled Top, Entertainment, Hobbies and Interests, Music, and Books]
     • Concept of a label is the propositional formula which stands for the set of documents that one would classify under a label it encodes
     • Concept at a node is the propositional formula which represents the set of documents which one would classify under a node, given that it has a certain label and that it is in a certain position in a tree

  7. Four macro steps
     Given two labeled trees T1 and T2, do:
     1. For all labels in T1 and T2, compute concepts at labels
     2. For all nodes in T1 and T2, compute concepts at nodes
     3. For all pairs of labels in T1 and T2, compute relations between concepts at labels (background knowledge)
     4. For all pairs of nodes in T1 and T2, compute relations between concepts at nodes
     Steps 1 and 2 constitute the preprocessing phase, and are executed once and then again each time the ontology is changed (OFF-LINE part).
     Steps 3 and 4 constitute the matching phase, and are executed every time two ontologies are to be matched (ON-LINE part).

  8. Step 1: compute concepts at labels
     The idea
     • Translate labels at nodes written in natural language into propositional logical formulas which explicitly codify the labels’ intended meaning
     Preprocessing
     • Tokenization. Labels are parsed into tokens according to punctuation, spaces, etc. E.g., Hobbies and Interests → <Hobbies, and, Interests>
     • Lemmatization. Tokens are further morphologically analyzed in order to find all their possible basic forms. E.g., Hobbies → Hobby
     • Building atomic concepts. An oracle (WordNet) is used to extract senses of lemmas. E.g., Hobby has 3 senses
     • Building complex concepts. Prepositions and conjunctions are translated into logical connectives and used to build complex concepts out of the atomic concepts. E.g., C_Hobbies_and_Interests = <Hobby, U(WN_Hobby)> ∨ <Interest, U(WN_Interest)>, where U is a union of the senses that WordNet attaches to lemmas
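
The preprocessing on slide 8 can be sketched as follows. This is a minimal illustration, not the S-Match implementation: the `LEMMAS` and `SENSES` dictionaries are hypothetical stand-ins for a lemmatizer and for WordNet, the sense identifiers are invented, and mapping the word "and" to a logical disjunction is an assumption made for the example.

```python
import re

# Hypothetical stand-ins for a lemmatizer and for the WordNet oracle.
LEMMAS = {"hobbies": "hobby", "interests": "interest"}
SENSES = {
    "hobby": ("hobby#1", "hobby#2", "hobby#3"),  # the slide states 3 senses
    "interest": ("interest#1", "interest#2"),    # count is illustrative only
}
# Assumption: coordinating words in a label map to logical disjunction.
CONNECTIVES = {"and": "or", "or": "or"}


def tokenize(label):
    """Split a label into word tokens on punctuation and whitespace."""
    return re.findall(r"[A-Za-z]+", label)


def lemmatize(token):
    """Return the base form of a token (toy dictionary lookup)."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)


def concept_at_label(label):
    """Build a nested-tuple formula: atoms joined by the label's connective."""
    atoms, connective = [], "and"
    for token in tokenize(label):
        if token.lower() in CONNECTIVES:
            connective = CONNECTIVES[token.lower()]
            continue
        lemma = lemmatize(token)
        # An atomic concept pairs a lemma with the senses attached to it.
        atoms.append(("atom", lemma, SENSES.get(lemma, ())))
    return atoms[0] if len(atoms) == 1 else (connective, *atoms)
```

Calling `concept_at_label("Hobbies and Interests")` yields a disjunction of two atomic concepts, each carrying the senses of its lemma. A real system would replace the two dictionaries with calls to an actual lemmatizer and lexical database.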

  9. Step 2: compute concepts at nodes
     The idea
     • Extend concepts at labels by capturing the knowledge residing in the structure of a tree, in order to define the context in which the given concept at a label occurs
     Computation
     • The concept at a node n is computed as the conjunction of the concepts at labels located above the given node, including the node itself
     Example
     [Figure: example tree with numbered nodes labeled Top, Entertainment, Hobbies and Interests, and Books]
     • C2 = C_Top ∧ C_Entertainment
     • C4 = C_Top ∧ (C_Hobbies ∨ C_Interests) ∧ C_Books
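
The computation on slide 9 amounts to walking from a node up to the root and conjoining the concepts at labels found on the way. The sketch below assumes a hypothetical parent-pointer encoding of the example tree and represents formulas as plain strings (`&` for conjunction, `|` for disjunction); neither choice comes from the slides.

```python
# Hypothetical encoding of the example tree: child label -> parent label.
PARENT = {
    "Entertainment": "Top",
    "Hobbies and Interests": "Top",
    "Books": "Hobbies and Interests",
}

# Concepts at labels, written as strings for readability.
C_LABEL = {
    "Top": "C_Top",
    "Entertainment": "C_Entertainment",
    "Hobbies and Interests": "(C_Hobbies | C_Interests)",
    "Books": "C_Books",
}


def concept_at_node(node):
    """Conjoin the concepts at labels on the path from the root to `node`."""
    path = []
    while node is not None:
        path.append(C_LABEL[node])
        node = PARENT.get(node)  # None once the root has been processed
    return " & ".join(reversed(path))
```

With this encoding, `concept_at_node("Books")` reproduces the C4 formula from the slide and `concept_at_node("Entertainment")` reproduces C2.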

  10. Step 3: compute relations between (atomic) concepts at labels
      The idea
      • Exploit a priori knowledge, e.g., lexical, domain knowledge, with the help of element level semantic matchers

  11. Step 3: Element level semantic matchers
      • Sense-based matchers have two WordNet senses in input and produce semantic relations exploiting (direct) lexical relations of WordNet
      • String-based matchers have two labels in input and produce semantic relations exploiting string comparison techniques
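
As a rough illustration of a string-based matcher, the sketch below compares two labels with the standard library's `difflib.SequenceMatcher` and returns equivalence when they are similar enough. The choice of similarity measure and the `threshold` value of 0.8 are assumptions made for this example; the slides do not say which string comparison techniques are used.

```python
from difflib import SequenceMatcher


def string_matcher(label1, label2, threshold=0.8):
    """Return "=" if the labels are similar enough as strings, else "idk"."""
    # ratio() is in [0, 1]; 1.0 means the two strings are identical.
    ratio = SequenceMatcher(None, label1.lower(), label2.lower()).ratio()
    return "=" if ratio >= threshold else "idk"
```

A matcher of this kind only sees surface forms, so it cannot relate labels such as games and entertainment; that gap is what sense-based matchers and background knowledge are meant to cover.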

  12. Step 4: compute relations between concepts at nodes
      The idea
      • Decompose the graph (tree) matching problem into the set of node matching problems
      • Translate each node matching problem, namely pairs of nodes with possible relations between them, into a propositional formula
      • Check the propositional formula for validity
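
The validity check in Step 4 can be illustrated with a brute-force truth-table test. This is a sketch under stated assumptions: formulas are modelled as Python functions over a variable assignment, validity is decided by enumeration rather than by a SAT solver, and the function names are hypothetical. Relations are tried in the binding strength order given on slide 5.

```python
from itertools import product


def implies(a, b):
    """Material implication on booleans."""
    return (not a) or b


def is_valid(formula, variables):
    """True if `formula` holds under every truth assignment."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def strongest_relation(axioms, c1, c2, variables):
    """Return the first relation, in binding strength order, that is valid."""
    candidates = [
        ("equivalence", lambda v: c1(v) == c2(v)),
        ("more general", lambda v: implies(c2(v), c1(v))),
        ("less general", lambda v: implies(c1(v), c2(v))),
        ("disjointness", lambda v: not (c1(v) and c2(v))),
    ]
    for name, relation in candidates:
        # The node matching formula has the shape: axioms -> relation.
        if is_valid(lambda v, r=relation: implies(axioms(v), r(v)), variables):
            return name
    return "idk"


# Usage: two node concepts and one background axiom (games -> entertainment).
VARS = ["top", "games", "entertainment"]
c_games = lambda v: v["top"] and v["games"]
c_entertainment = lambda v: v["top"] and v["entertainment"]
with_axiom = lambda v: implies(v["games"], v["entertainment"])
no_axioms = lambda v: True
```

Here `strongest_relation(with_axiom, c_games, c_entertainment, VARS)` returns `"less general"`, while the same call with `no_axioms` returns `"idk"`, which shows how a missing axiom turns into a missed correspondence.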

  13. Step 4: Example of a node matching task

  14. Lack of Knowledge

  15. Problem of low recall (incompleteness) - I
      Facts
      • Matching has two components: element level matching and structure level matching
      • Contrary to many other systems, the S-Match structure level algorithm is correct and complete
      • Still, the quality of results is not very good
      Why? ... the problem of lack of knowledge
      Example

  16. Problem of low recall (incompleteness) - II
      Preliminary (analytical) evaluation
      Dataset [Avesani et al., ISWC’05]

  17. On increasing the recall: an overview
      Multiple strategies
      • Strengthen element level matchers
      • Reuse of previous match results from the same domain of interest
        • PO = Purchase Order
      • Use general knowledge sources (unlikely to help)
        • WWW
      • Use, if available (!), domain specific sources of knowledge
        • UMLS

  18. Iterative Semantic Matching

  19. Iterative semantic matching (ISM)
      The idea
      • Repeat Step 3 and Step 4 of the matching algorithm for some critical (hard) matching tasks
      ISM macro steps
      • Discover critical points in the matching process
      • Generate candidate missing axiom(s)
      • Re-run SAT solver on a critical task taking into account the new axiom(s)
      • If SAT returns false, save the newly discovered axiom(s) for future reuse
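
The last two ISM macro steps can be sketched as a small loop. Several things here are assumptions made for illustration: the critical task is taken as given (how critical points are discovered is not modelled), the target relation is fixed to equivalence, satisfiability is decided by truth-table enumeration instead of a real SAT solver, and all names are hypothetical.

```python
from itertools import product


def satisfiable(formula, variables):
    """True if some truth assignment makes `formula` true."""
    return any(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def iterate_on_task(task, candidate_axioms, variables):
    """Re-run the check per candidate axiom and keep the ones that help.

    `task` is a pair of node concepts for which equivalence is being tested.
    An axiom is saved when the negated problem becomes unsatisfiable,
    i.e. when "SAT returns false" in the wording of the slide.
    """
    c1, c2 = task
    saved = []
    for name, axiom in candidate_axioms:
        # Negated problem: the axiom holds, yet the two concepts differ.
        negated = lambda v, a=axiom: a(v) and (c1(v) != c2(v))
        if not satisfiable(negated, variables):
            saved.append(name)  # keep the axiom for future reuse
    return saved


# Usage: one helpful candidate axiom and one that adds no knowledge.
VARS = ["games", "entertainment"]
task = (lambda v: v["games"], lambda v: v["entertainment"])
candidates = [
    ("games = entertainment", lambda v: v["games"] == v["entertainment"]),
    ("trivial", lambda v: True),
]
```

Running `iterate_on_task(task, candidates, VARS)` returns `["games = entertainment"]`: only the axiom that makes the negated problem unsatisfiable is retained.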

  20. ISM: Discovering critical points - Example
      [Figure: Google (T1) and Looksmart (T2) trees, with cLabsMatrix (result of Step 3) and cNodesMatrix (result of Step 4)]

  21. ISM: Generating candidate axioms
      • Sense-based matchers have two WordNet senses in input and produce semantic relations exploiting structural properties of WordNet hierarchies
      • Gloss-based matchers have two WordNet senses as input and produce relations exploiting gloss comparison techniques

  22. ISM: Generating candidate axioms - Hierarchy distance
      • Hierarchy distance returns the equivalence relation if the distance between two input senses in the WordNet hierarchy is less than a given threshold value (e.g., 3), and idk otherwise
      • There is no direct relation between games and entertainment in WordNet
      • Distance between these concepts is 2 (1 more general link and 1 less general). Thus, we can conclude that games and entertainment are close in their meaning and return the equivalence relation
      [Figure: WordNet fragment with diversion, entertainment, and games]
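
The hierarchy distance matcher can be sketched as a shortest-path search over hypernym links. The `HYPERNYMS` table below is a hypothetical two-edge fragment that mirrors the slide's example, not real WordNet data, and using breadth-first search over undirected links is an assumption about how the distance is counted.

```python
from collections import deque

# Hypothetical fragment: sense -> direct, more general senses.
HYPERNYMS = {
    "games": ["diversion"],
    "entertainment": ["diversion"],
}


def hierarchy_distance(sense_a, sense_b):
    """Number of links on the shortest path between two senses, or None."""
    # Links can be followed upward (more general) or downward (less general).
    neighbours = {}
    for child, parents in HYPERNYMS.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)
    seen, queue = {sense_a}, deque([(sense_a, 0)])
    while queue:
        sense, distance = queue.popleft()
        if sense == sense_b:
            return distance
        for nxt in neighbours.get(sense, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None  # the two senses are not connected in this fragment


def hierarchy_matcher(sense_a, sense_b, threshold=3):
    """Return "=" if the senses are closer than `threshold`, else "idk"."""
    distance = hierarchy_distance(sense_a, sense_b)
    return "=" if distance is not None and distance < threshold else "idk"
```

For the slide's example, `hierarchy_distance("games", "entertainment")` is 2 (one link up to diversion, one link down), so with the threshold of 3 the matcher returns equivalence.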

  23. Evaluation

  24. Testing methodology
      Dataset [Avesani et al., ISWC’05]
      Measuring match quality
      • Indicators: Precision, [0,1]; Recall, [0,1]
      • By construction, the reference mappings in that dataset represent only true positives, thus allowing us to estimate only recall
      • Higher values of recall can be obtained at the expense of lower values of precision
      • Additional tests to ensure that precision does not decrease
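
For reference, the two indicators can be computed from sets of correspondences as in the sketch below. The function name and the example pairs are hypothetical. Note that the precision value is only meaningful when the reference is complete; with a reference that lists only true positives, as in the dataset above, only the recall figure can be trusted.

```python
def precision_recall(system_matches, reference_matches):
    """Compute (precision, recall) from two sets of correspondences."""
    tp = len(system_matches & reference_matches)   # found and correct
    fp = len(system_matches - reference_matches)   # found, not in reference
    fn = len(reference_matches - system_matches)   # in reference, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Usage with made-up correspondences between node identifiers.
system = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
precision, recall = precision_recall(system, reference)
```

In this made-up case two of the four reference correspondences are found, so recall is 0.5, and two of the three system correspondences are in the reference, so precision is 2/3.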

  25. Experimental results

  26. Conclusions and Future Work

  27. Conclusions
      • The problem of missing domain knowledge is a major problem of all (!) matching systems
      • This problem is very hard on industrial-size matching tasks
      • We have investigated it by examples of lightweight ontologies, such as Google and Yahoo
      • Partial solution by applying semantic matching iteratively

  28. Future work
      • Iterative semantic matching
        • New element level matchers
      • Interactive semantic matching
        • GUI
        • Customizing technology
      • Extensive evaluation
        • Testing methodology
        • Industry-strength tasks

  29. References
      • Project website - KNOWDIVE: http://www.dit.unitn.it/~knowdive/
      • F. Giunchiglia, P. Shvaiko: Semantic matching. Knowledge Engineering Review Journal, 18(3), 2003.
      • F. Giunchiglia, P. Shvaiko, M. Yatskevich: Semantic schema matching. In Proceedings of CoopIS’05.
      • P. Bouquet, L. Serafini, S. Zanobini: Semantic coordination: a new approach and an application. In Proceedings of ISWC, 2003.
      • P. Avesani, F. Giunchiglia, M. Yatskevich: A large scale taxonomy mapping evaluation. In Proceedings of ISWC, 2005.
      • C. Ghidini, F. Giunchiglia: Local models semantics, or contextual reasoning = locality + compatibility. Artificial Intelligence Journal, 127(3), 2001.
      • Ontology Matching: http://www.OntologyMatching.org
      • P. Shvaiko and J. Euzenat: A survey of schema-based matching approaches. Journal on Data Semantics, IV, 2005.

  30. Thank you!

  31. [Figure: System Matches and Reference Matches, partitioned into TP, FN, FP, and TN]
      • TP: True positives
      • FN: False negatives
      • FP: False positives
      • TN: True negatives
