
Balancing Lexicographic and Ontological Considerations in Ontology Development

First International Workshop on Ontological Analysis, Trento, IT, 16-20 July 2012. Amanda Hicks, University at Buffalo, aellenhicks@gmail.com.

Presentation Transcript


  1. Balancing Lexicographic and Ontological Considerations in Ontology Development
     First International Workshop on Ontological Analysis, Trento, IT, 16-20 July 2012
     Amanda Hicks, University at Buffalo, aellenhicks@gmail.com

  2. Ontologies vs. Wordnets
     Wordnets represent
     • how we use language
     • the word ‘cat’ in context
     Ontologies represent
     • what it is to be a cat
     • e.g., whether being a cat is a rigid property

  3. Overview of some ontologies

  4. 3 Layers of Ontologies
     • Upper: most abstract
     • Middle: intermediately abstract
     • Domain: specific to a domain or application

  5. Domain Ontologies
     • are often developed by domain experts.
     • model highly specific, technical information.
     • are often intended for use in a particular community of researchers, technicians, etc.
     • Examples: Gene Ontology, KYOTO domain ontology, Protein Ontology

  6. Middle Ontologies
     • are developed by ontologists or other information technologists.
     • model concepts that are often part of the ordinary spoken and written vocabulary and sit at an intermediate level of abstraction.
     • connect upper-level ontologies with the domain ontology.
     • Examples: KYOTO Middle, Information Artifact Ontology

  7. Upper Ontologies
     • are developed by ontologists.
     • model highly abstract concepts, e.g., endurant vs. perdurant, quality vs. substance.
     • Because the axioms at this level are inherited all the way down, we need to be especially careful here!

  8. Upper Ontologies, some examples
     • BFO - http://www.ifomis.org/bfo
     • SUMO - http://www.ontologyportal.org
     • DOLCE - http://www.loa.istc.cnr.it/DOLCE.html

  9. BFO
     • is a relatively shallow top ontology
     • 36 classes
     • 6 layers deep
     • The BFO consortium coordinates many biomedical domain ontologies, users, and developers.

  10. DOLCE
      • DOLCE-Lite: 37 classes, depth of 6
      • DOLCE-Lite Plus: 208 classes, depth of 13

  11. The KYOTO Project
      • EU 7th Framework Programme project, 2007-2010
      • facilitates data mining and sharing from texts in the domain of ecology across seven languages
      • WWF & ECNC are domain users
      • www.kyoto-project.eu

  12. The KYOTO Ontology
      • Three layers: Top, Middle, Domain
      • Seven wordnets mapped to the KYOTO Ontology to facilitate data extraction and management: English, Spanish, Basque, Italian, Dutch, Japanese, Chinese

  13. The KYOTO Ontology
      • KYOTO 3: three layers (Top, Middle, Domain)
      • Wordnets mapped to the KYOTO Ontology to facilitate data extraction and sound inference: English, Spanish, Basque, Italian, Dutch, Japanese, Chinese
      • Use Protégé 4.0 or older. KYOTO is not written in OWL2.

  14. KYOTO Top
      Based on DOLCE-Lite Plus (DLP)
      • In DLP, qualities are modeled according to the kinds of entities that bear the quality, e.g., size is a physical quality since it inheres in a physical object.
      • KYOTO Top extends the physical-quality hierarchy: amount-of-matter-quality, feature-quality, physical-object-quality
      • Added quality types: dispositional, relational

  15. KYOTO Top
      KYOTO Top extends the role hierarchy. Roles are arranged according to the kind of entity that bears that role.
      • A physical-object-role is played by a physical object.
      • In the domain layer, offspring is a subclass of organism-role, since organisms are the kinds of things that play the role of offspring.

  16. KYOTO Middle
      Includes:
      • Base Concepts (BCs) from WordNet
      • nouns
      • Units of measurement, e.g., length, and other qualities
      • 72 new perdurants (processes and states)
      • 123 new endurant terms (objects and substances)
      • qualities that model adjectives

  17. Base Concepts in KYOTO
      Synsets from WordNet 3.0 (Fellbaum, 1998)
      • for each path from leaf to root: the first node with at least 50 hyponyms
      • roughly: a cheap (and inadequate?) computational model for basic level concepts
      • CAREFUL: the set depends on the structure and coverage of WordNet, which is idiosyncratic
      • cake
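
The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the KYOTO implementation: the taxonomy `PARENT` is a hypothetical toy tree rather than WordNet data, hyponyms are assumed to be counted transitively, and each node is assumed to have a single hypernym (real WordNet synsets can have several).

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (child -> parent, None marks the root).
# These nodes are illustrative only; they are not WordNet 3.0 data.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "food": "entity",
    "baked_goods": "food",
    "cake": "baked_goods",
    "cheesecake": "cake",
    "sponge_cake": "cake",
    "bread": "baked_goods",
    "artifact": "entity",
    "book": "artifact",
}


def children_of(parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the child -> parent map into parent -> list of children."""
    kids: Dict[str, List[str]] = {node: [] for node in parent}
    for child, par in parent.items():
        if par is not None:
            kids[par].append(child)
    return kids


def hyponym_count(node: str, kids: Dict[str, List[str]]) -> int:
    """Count all descendants of a node (assumption: transitive hyponyms)."""
    return sum(1 + hyponym_count(kid, kids) for kid in kids[node])


def base_concepts(parent: Dict[str, Optional[str]], threshold: int = 50) -> Set[str]:
    """For each leaf-to-root path, pick the first node with >= threshold hyponyms."""
    kids = children_of(parent)
    selected: Set[str] = set()
    for leaf in (n for n in parent if not kids[n]):
        node: Optional[str] = leaf
        while node is not None:
            if hyponym_count(node, kids) >= threshold:
                selected.add(node)
                break
            node = parent[node]
    return selected


# The slide's threshold is 50; the toy taxonomy is tiny, so 2 is used here.
print(sorted(base_concepts(PARENT, threshold=2)))
```

In this toy tree the path starting at `book` only reaches the threshold at the root, which illustrates the slide's warning: what gets selected depends on how densely a given region of the hierarchy happens to be populated.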

  18. Base Concepts in KYOTO
      BCs facilitate mapping wordnets onto the ontology in KYOTO.
      • WordNet is mapped onto the ontology via BCs.
      • BC equivalents in other languages are indirectly mapped onto the ontology via mappings to WordNet’s BCs.
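
The indirect mapping described here amounts to composing two lookups. The sketch below assumes both mappings are plain dictionaries; every identifier in it (`es:libro`, `en:book`, `kyoto:book`, and so on) is hypothetical and chosen only for illustration, not taken from the KYOTO resources.

```python
from typing import Dict, Optional

# Hypothetical mapping from English WordNet BCs to ontology classes.
EN_BC_TO_ONTOLOGY: Dict[str, str] = {
    "en:food": "kyoto:food",
    "en:book": "kyoto:book",
}

# Hypothetical equivalence links from Spanish synsets to English BCs.
ES_TO_EN: Dict[str, str] = {
    "es:alimento": "en:food",
    "es:libro": "en:book",
}


def ontology_class(
    synset: str,
    to_english: Dict[str, str],
    bc_to_ontology: Dict[str, str],
) -> Optional[str]:
    """Resolve a synset to an ontology class, going through the English BC.

    English synsets are looked up directly; synsets of other languages are
    first translated to their English BC equivalent.
    """
    english = to_english.get(synset, synset)
    return bc_to_ontology.get(english)


print(ontology_class("es:libro", ES_TO_EN, EN_BC_TO_ONTOLOGY))
```

Under this design only the English BCs need a direct link to the ontology; each additional wordnet only needs equivalence links to those BCs.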

  19. Base Concepts in KYOTO
      • 297 BCs from the noun hierarchy
      • 578 BCs from the verb hierarchy
      • need work, in Domain layer
      • group names such as verb_change still appear, although they are not ontological (Izquierdo et al., 2007).

  20. Sample BCs in KYOTO’s Middle Layer
      • unit-of-measurement
      • number
      • color
      • change
      • book
      • message
      • food

  21. BCs and KYOTO
      In this case, the lexicon, in conjunction with considerations of the application, informed the population of the Middle and Domain layers of KYOTO.

  22. KYOTO Domain

  23. KYOTO Domain
      Sample concepts from user scenarios
      • fish family
      • coast
      • soil
      • water
      • breed
      • biodiversity

  24. The Lexicon & The Ontology

  25. is-a

  26. “is-a”: The Problem
      “Is-a” is ambiguous between individuals and subclasses. This can lead to confusion. For example, species terms can be confused.
      Kermit is-a_i Leptopelis vermiculatus.
      Leptopelis vermiculatus is-a_c species.
      Therefore, Kermit is a species.

  27. “is-a”: The Rule
      The Rule: Every property of a class belongs to every instance of that class. Check for all inherited properties.
      Species comprise many organisms that can successfully produce fertile offspring.
      Does Kermit comprise many organisms that can successfully produce fertile offspring?

  28. “is-a”: KYOTO’s Solution
      Model species terms twice!
      • Species in the sense of a group are modeled as physical pluralities. This Leptopelis vermiculatus is an instance, NOT a subclass.
      • ‘Leptopelis vermiculatus’ can also refer to a class. This is a type of organism.
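
The difference between the two readings of “is-a” can be made concrete by keeping instance-of and subclass-of as separate relations. The following is a minimal sketch, not KYOTO's OWL modeling: the relation tables and the individual name `leptopelis-vermiculatus-group` (standing for the species in the group sense) are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

# Class reading: the species term names a type of organism.
SUBCLASS_OF: Set[Pair] = {
    ("leptopelis-vermiculatus", "organism"),
}

# Instance links. The group reading is a separate individual (a physical
# plurality) that is an instance of 'species'; its name is hypothetical.
INSTANCE_OF: Set[Pair] = {
    ("kermit", "leptopelis-vermiculatus"),
    ("leptopelis-vermiculatus-group", "species"),
}


def superclasses(cls: str, subclass_of: Set[Pair]) -> Set[str]:
    """All classes reachable from cls by following subclass-of links."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def is_instance_of(individual: str, cls: str,
                   instance_of: Set[Pair], subclass_of: Set[Pair]) -> bool:
    """instance-of composes with subclass-of, but never with instance-of."""
    return any(
        ind == individual and (direct == cls or cls in superclasses(direct, subclass_of))
        for ind, direct in instance_of
    )


print(is_instance_of("kermit", "organism", INSTANCE_OF, SUBCLASS_OF))
print(is_instance_of("kermit", "species", INSTANCE_OF, SUBCLASS_OF))
```

Because membership is only propagated along subclass-of links, Kermit comes out as an organism but not as a species, which blocks the faulty syllogism from the problem slide.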

  29. Rigid & Non-Rigid Terms

  30. Rigidity: The Problem
      In ontologies and WordNet, the subsumption relations are determined according to different criteria.
      • WordNet: hypernymy, based on psycholinguistic data (native speakers agree on word use).
      • Ontology: subclass, based on the extension of a term (every x is a y).

  31. Transitivity of Subsumption
      BE CAREFUL! WordNet’s hypernymy can lead to unsound inferences.
      Conclusion, if ‘pet’ is taken to subsume ‘cat’: if every pet has an owner, then every cat has an owner.

  32. Rigidity: KYOTO’s Solution
      • Distinguish rigid and non-rigid terms in the wordnet.
      • This distinction comes from OntoClean (Guarino and Welty, 2004).
      • Distinguish between roles and types in the ontology.
      • Map synsets to the ontology using different mapping relations.

  33. Rigidity
      • “Cat” is a rigid concept.
      • “Pet” is a non-rigid concept.
      • A concept is rigid if it is essential to all of its instances.
      • Permanence: Fluffy is always a cat, but not always a pet.
      • Necessity: Fluffy cannot stop being a cat; Fluffy can stop being a pet.

  34. The Rule of Thumb
      (See Giancarlo’s slides for a more nuanced view.)
      Non-rigid terms should not subsume rigid terms.
      or
      Roles should not subsume types.

  35. A Jumbled Hierarchy
      amount of matter
        -R drug
        +R antibiotic
        +R chemical compound
        +R oil
        -R nutriment (a source of material to nourish the body)

  36. Clean Hierarchies
      amount-of-matter
        +R antibiotic
        +R chemical compound
        +R oil
      substance-role (role played-by some amount-of-matter)
        -R drug
        -R nutriment
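
The rule of thumb lends itself to a mechanical check over subsumption edges. The sketch below is an illustration, not part of KYOTO or OntoClean tooling. It rests on assumptions: the slide's indentation is lost in this transcript, so the edge `antibiotic` under `drug` in `JUMBLED` is an assumed reading, and the rigidity values for the untagged terms `amount-of-matter` and `substance-role` are likewise assumed.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (subsumed term, subsuming term)

# +R / -R tags as given on the slides. 'amount-of-matter' and
# 'substance-role' carry no tag there, so their values are assumptions.
RIGID: Dict[str, bool] = {
    "amount-of-matter": True,
    "substance-role": False,
    "drug": False,
    "antibiotic": True,
    "chemical-compound": True,
    "oil": True,
    "nutriment": False,
}

# Assumed reading of the jumbled hierarchy (nesting is not recoverable
# from the transcript): antibiotic sits under drug.
JUMBLED: List[Edge] = [
    ("drug", "amount-of-matter"),
    ("antibiotic", "drug"),
    ("chemical-compound", "amount-of-matter"),
    ("oil", "amount-of-matter"),
    ("nutriment", "amount-of-matter"),
]

# The clean hierarchy: types under amount-of-matter, roles under substance-role.
CLEAN: List[Edge] = [
    ("antibiotic", "amount-of-matter"),
    ("chemical-compound", "amount-of-matter"),
    ("oil", "amount-of-matter"),
    ("drug", "substance-role"),
    ("nutriment", "substance-role"),
]


def rigidity_violations(edges: List[Edge], rigid: Dict[str, bool]) -> List[Edge]:
    """Return edges where a non-rigid term subsumes a rigid term."""
    return [(child, parent) for child, parent in edges
            if rigid[child] and not rigid[parent]]


print(rigidity_violations(JUMBLED, RIGID))
print(rigidity_violations(CLEAN, RIGID))
```

The check only flags the direction the rule of thumb forbids; a rigid term subsuming a non-rigid one is left alone, so it is a coarse filter rather than a full OntoClean analysis.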

  37. Mapping Synsets

  38. Adjectives

  39. Adjectives: General Strategy in KYOTO
      Qualities are easily modeled according to the kinds of entities in which they inhere. For example, amounts of matter are the kinds of things that have pH levels.

  40. Adjectives: General Strategy in KYOTO
      The values for specific qualities like pH levels are located in regions.

  41. Adjectives: The Problem
      pH levels are easy because
      • they are measurable, i.e., there are objective criteria.
      • they are confined to one kind of entity, namely, amounts of matter.

  42. Adjectives: The Problem
      How should we model concepts like “beneficial” or “important”?
      • Subjective component
      • Not necessarily “out there” in the world
      • Not typically quantifiable
      • Criteria are context dependent
      • Many kinds of entities can be beneficial or important.

  43. Adjectives: KYOTO’s Solution
      The middle layer has a region, evaluative-region, to accommodate adjectives like ‘beneficial’ or ‘worthless’.

  44. Adjectives: KYOTO’s Solution
      Concepts like “beneficial” and “important” are
      • not in the domain-specific layer, since they are general concepts.
      • not in the upper layer, since they are “subjective”.
      • not in a strictly realist ontology like BFO.
      • modeled orthogonally to “real” qualities.

  45. Adjectives: KYOTO’s Solution
      What kind of restriction can you write for length? long or 2m.
      • Indefinite qualities (long)
      • Definite qualities (2m)
      length q-located-in (length-measurement-unit or indefinite-quality-region)
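
The restriction on length is a union of two admissible regions. As a rough sketch of what that restriction permits, assuming each value simply records the region it is located in (the `QualityValue` class and the function name are hypothetical; only the region names come from the slides):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityValue:
    """A value of a quality, tagged with the region it is located in."""
    label: str
    region: str


# length q-located-in (length-measurement-unit or indefinite-quality-region)
LENGTH_REGIONS = frozenset({"length-measurement-unit", "indefinite-quality-region"})


def satisfies_length_restriction(value: QualityValue) -> bool:
    """True if the value lies in one of the regions admitted for length."""
    return value.region in LENGTH_REGIONS


two_metres = QualityValue("2m", "length-measurement-unit")    # definite
long_value = QualityValue("long", "indefinite-quality-region")  # indefinite
beneficial = QualityValue("beneficial", "evaluative-region")    # not a length

for value in (two_metres, long_value, beneficial):
    print(value.label, satisfies_length_restriction(value))
```

Allowing both regions lets a single quality accept a measured value such as 2m and a lexicalized value such as long, while evaluative values stay outside it.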

  46. In Conclusion
      • Procurement - BCs influenced the concepts included in the KYOTO ontology.
      • Hierarchy - subsumption relations must be carefully distinguished in order to avoid influence from the lexicon that might lead to unsound inferences.
      • Qualities - Lexicalized adjectives that may not have a realist correlate need to be modeled in an orthogonal way.

  47. Bibliography
      • Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press.
      • Guarino, N., and Welty, C. (2004). An Overview of OntoClean. In Handbook on Ontologies, ed. S. Staab and R. Studer, pp. 151-172.
      • Herold, A., Hicks, A., Rigau, G., and Laparra, E. (2009). Kyoto Deliverable D6.2: Central Ontology Version - 1. www.kyoto-project.eu.
      • Hicks, A., and Rigau, G. (2010). Kyoto Deliverable D8.3: Domain Extension of the Central Ontology. www.kyoto-project.eu.
      • Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the automatic selection of basic level concepts. In Proceedings of the International Conference on Recent Advances on Natural Language Processing (RANLP'07), Borovetz, Bulgaria.
      • Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A., and Schneider, L. (2002). Wonderweb Deliverable D17. The Wonderweb Library of Foundational Ontologies and the Dolce Ontology.
      • Smith, B. (2004). Beyond Concepts: Ontology as Reality Representation. In Proceedings of FOIS 2004, International Conference on Formal Ontology and Information Systems.
      • Vossen, P., et al. (2008). KYOTO: A system for Mining, Structuring and Distributing Knowledge Across Languages and Cultures. In Proceedings of LREC 2008, Marrakech, Morocco, May 28-30, 2008.
