Balancing Lexicographic and Ontological Considerations in Ontology Development
First International Workshop on Ontological Analysis, Trento, IT, 16-20 July 2012
Amanda Hicks, University at Buffalo, aellenhicks@gmail.com

Ontologies vs. Wordnets
Wordnets represent
• how we use language
• the word ‘cat’ in context
Ontologies represent
• what it is to be a cat
• e.g., whether being a cat is a rigid property

Overview of some ontologies

3 Layers of Ontologies
• Upper: most abstract
• Middle: intermediately abstract
• Domain: specific to a domain or application

Domain Ontologies
• are often developed by domain experts.
• model highly specific, technical information.
• are often intended for use in a particular community of researchers, technicians, etc.
• Examples:
  • Gene Ontology
  • KYOTO domain ontology
  • Protein Ontology

Middle Ontologies
• are developed by ontologists or other information technologists.
• model concepts that are often part of a normal spoken and written vocabulary and are of an intermediate level of abstraction.
• connect upper-level ontologies with the domain ontology.
• Examples:
  • KYOTO Middle
  • Information Artifact Ontology

Upper Ontologies
• are developed by ontologists.
• model highly abstract concepts, e.g.,
  • endurant vs. perdurant
  • quality vs. substance
• Because the axioms at this level are inherited all the way down, we need to be really careful here!

Upper Ontologies, some examples
BFO - http://www.ifomis.org/bfo
SUMO - http://www.ontologyportal.org
DOLCE - http://www.loa.istc.cnr.it/DOLCE.html

BFO
• is a relatively shallow top ontology
  • 36 classes
  • 6 layers deep
• The BFO consortium coordinates many biomedical domain ontologies, users, and developers.

DOLCE
• DOLCE-Lite
  • 37 classes
  • depth of 6
• DOLCE-Lite Plus
  • 208 classes
  • depth of 13

The KYOTO Project
• EU 7th Framework Programme project, 2007-2010
• facilitates data mining and sharing from texts in the domain of ecology across seven languages
• WWF & ECNC are domain users
• www.kyoto-project.eu

The KYOTO Ontology
• Three layers: Top, Middle, Domain
• Seven wordnets mapped to the KYOTO Ontology to facilitate data extraction and management
  • English
  • Spanish
  • Basque
  • Italian
  • Dutch
  • Japanese
  • Chinese

The KYOTO Ontology
KYOTO 3 - three layers: Top, Middle, Domain
Wordnets mapped to the KYOTO Ontology to facilitate data extraction and sound inference
• English
• Spanish
• Basque
• Italian
• Dutch
• Japanese
• Chinese
Use Protégé 4.0 or older. KYOTO is not written in OWL2.

KYOTO Top
Based on DOLCE-Lite Plus (DLP)
• In DLP qualities are modeled according to the kinds of entities that bear the quality.
  • e.g., size is a physical quality since it inheres in a physical object
• KYOTO Top extends the physical-quality hierarchy
  • amount-of-matter-quality
  • feature-quality
  • physical-object-quality
• Added quality types
  • dispositional
  • relational

KYOTO Top
KYOTO Top extends the role hierarchy. Roles are arranged according to the kind of entity that bears that role.
• A physical-object-role is played by a physical object.
• In the domain layer, offspring is a subclass of organism-role, since organisms are the kinds of things that play the role of offspring.

KYOTO Middle
Includes:
• Base Concepts (BCs) from WordNet
  • nouns
• Units of measurement, e.g., length, and other qualities
• 72 new perdurants (processes and states)
• 123 new endurant terms (objects and substances)
• qualities that model adjectives

Base Concepts in KYOTO
Synsets from WordNet-3.0 (Fellbaum (1998))
• for each path from leaf to root: the first node with at least 50 hyponyms
• roughly: a cheap (and inadequate?) computational model of basic-level concepts
• CAREFUL: the set depends on the structure and coverage of WordNet, which is idiosyncratic
• cake

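The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the KYOTO or Izquierdo et al. implementation: the toy hierarchy, the names `count_hyponyms` and `base_concepts`, and the single-hypernym simplification (WordNet allows several hypernyms per synset) are all assumptions made for the example.

```python
from collections import defaultdict


def count_hyponyms(children, node, cache):
    """Count all direct and indirect hyponyms (descendants) of node."""
    if node not in cache:
        cache[node] = sum(
            1 + count_hyponyms(children, child, cache)
            for child in children.get(node, [])
        )
    return cache[node]


def base_concepts(hypernym_of, threshold=50):
    """For each leaf, walk towards the root and keep the first node
    that has at least `threshold` hyponyms."""
    children = defaultdict(list)
    for child, parent in hypernym_of.items():
        children[parent].append(child)

    nodes = set(hypernym_of) | set(hypernym_of.values())
    leaves = [n for n in nodes if n not in children]

    cache = {}
    selected = set()
    for leaf in leaves:
        node = leaf
        while node is not None:
            if count_hyponyms(children, node, cache) >= threshold:
                selected.add(node)
                break
            node = hypernym_of.get(node)  # None once we pass the root
    return selected


# Hypothetical toy hierarchy (child -> hypernym), far smaller than WordNet.
toy = {
    "tabby": "cat", "siamese": "cat", "cat": "animal",
    "poodle": "dog", "dog": "animal", "animal": "entity",
    "cake": "food", "food": "entity",
}

# A threshold of 2 stands in for the slide's 50 on this tiny example.
print(sorted(base_concepts(toy, threshold=2)))  # ['animal', 'cat', 'entity']
```

The toy run also shows why the slide says CAREFUL: which nodes are selected depends entirely on how densely each region of the hierarchy happens to be populated.
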
Base Concepts in KYOTO
BCs facilitate mapping wordnets onto the ontology in KYOTO.
• WordNet is mapped onto the ontology via BCs.
• BC equivalents in other languages are indirectly mapped onto the ontology via mappings to WordNet’s BCs.

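The indirect mapping described above amounts to composing two lookups. The following is a minimal sketch under stated assumptions: the synset identifiers (`comida.n`, `food.n`, and so on), the dictionaries, and the function name `map_via_base_concepts` are hypothetical and do not reflect KYOTO's actual data format.

```python
def map_via_base_concepts(local_to_english_bc, english_bc_to_ontology):
    """Compose the two mappings; synsets whose English BC has no
    ontology mapping are skipped."""
    return {
        local: english_bc_to_ontology[bc]
        for local, bc in local_to_english_bc.items()
        if bc in english_bc_to_ontology
    }


# Hypothetical mapping from English WordNet BCs to ontology classes.
english_bc_to_ontology = {"food.n": "food", "book.n": "book"}

# Hypothetical mapping from Spanish synsets to English WordNet BCs.
spanish_to_english_bc = {
    "comida.n": "food.n",
    "libro.n": "book.n",
    "gato.n": "cat.n",  # no ontology mapping for this BC in the toy data
}

spanish_to_ontology = map_via_base_concepts(
    spanish_to_english_bc, english_bc_to_ontology
)
print(spanish_to_ontology)  # {'comida.n': 'food', 'libro.n': 'book'}
```

Only the English mapping to the ontology has to be maintained by hand in this arrangement; every other wordnet reaches the ontology through its existing links to WordNet.
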
Base Concepts in KYOTO
• 297 BCs from the noun hierarchy and
• 578 BCs from the verb hierarchy
  • need work, in Domain layer
  • group names such as verb_change still appear, though they are not ontological (Izquierdo et al. (2007)).

Sample BCs in KYOTO’s Middle Layer
• unit-of-measurement
• number
• color
• change
• book
• message
• food

BCs and KYOTO
In this case, the lexicon in conjunction with considerations of the application informed the population of the Middle and Domain layers of KYOTO.

KYOTO Domain

KYOTO Domain
Sample concepts from user scenarios
• fish family
• coast
• soil
• water
• breed
• biodiversity

The Lexicon & The Ontology

is-a

“is-a”: The Problem
“Is-a” is ambiguous between a relation on individuals (instance-of) and a relation on classes (subclass-of). This can lead to confusion. For example, species terms can be confused:
Kermit is-a_i Leptopelis vermiculatus.
Leptopelis vermiculatus is-a_c species.
Therefore, Kermit is a species.

“is-a”: The Rule
The Rule: Every property of a class belongs to every instance of that class.
Check for all inherited properties.
A species comprises many organisms that can successfully produce fertile offspring.
Does Kermit comprise many organisms that can successfully produce fertile offspring?

“is-a”: KYOTO’s Solution
Model species terms twice!
• Species in the sense of a group are modeled as physical pluralities. This Leptopelis vermiculatus is an instance of species, NOT a subclass.
• ‘Leptopelis vermiculatus’ can also refer to a class. This is a type of organism.

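The inheritance check behind the rule can be illustrated with a small Python sketch. The dictionaries, the property strings, and the function name `inherited_properties` are assumptions made for this example; they are not taken from the KYOTO ontology files.

```python
def inherited_properties(individual, instance_of, subclass_of, class_properties):
    """Collect the properties an individual inherits from its class
    and from every superclass of that class."""
    properties = set()
    cls = instance_of.get(individual)
    while cls is not None:
        properties |= class_properties.get(cls, set())
        cls = subclass_of.get(cls)
    return properties


# Hypothetical class-level properties.
CLASS_PROPERTIES = {
    "organism": {"is a living thing"},
    "species": {"comprises many organisms"},
}

# Conflated reading: the species term is treated as a subclass of 'species'.
conflated_subclass_of = {"leptopelis_vermiculatus": "species"}
conflated_instance_of = {"kermit": "leptopelis_vermiculatus"}

# Two-fold modeling: the class is a type of organism, and the group
# (a physical plurality) is a separate individual that instantiates 'species'.
clean_subclass_of = {"leptopelis_vermiculatus": "organism"}
clean_instance_of = {
    "kermit": "leptopelis_vermiculatus",
    "leptopelis_vermiculatus_group": "species",
}

conflated = inherited_properties(
    "kermit", conflated_instance_of, conflated_subclass_of, CLASS_PROPERTIES
)
clean = inherited_properties(
    "kermit", clean_instance_of, clean_subclass_of, CLASS_PROPERTIES
)
print(conflated)  # {'comprises many organisms'}
print(clean)      # {'is a living thing'}
```

Under the conflated reading Kermit inherits a property that only makes sense for a group, which is exactly the failure the rule is meant to expose.
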
Rigid & Non-Rigid Terms

Rigidity: The Problem
In ontologies and WordNet the subsumption relations are determined according to different criteria.
• WordNet
  • Hypernymy
  • Based on psycholinguistic data; native speakers agree with the word use.
• Ontology
  • Subclass
  • Based on the extension of a term: every x is a y.

Transitivity of Subsumption
BE CAREFUL! WordNet’s hypernymy can lead to unsound inferences.
Conclusion: If every pet has an owner, then every cat has an owner.

Rigidity: KYOTO’s Solution
• Distinguish rigid and non-rigid terms in the wordnet.
  • This distinction comes from OntoClean (Guarino and Welty)
• Distinguish between roles and types in the ontology.
• Map synsets to the ontology using different mapping relations.

Rigidity
• “Cat” is a rigid concept.
• “Pet” is a non-rigid concept.
• A concept is rigid if it is essential to all of its instances.
  • Permanence: Fluffy is always a cat, not always a pet.
  • Necessity: Fluffy cannot stop being a cat; Fluffy can stop being a pet.

The Rule of Thumb
(See Giancarlo’s slides for a more nuanced view.)
Non-rigid terms should not subsume rigid terms.
or
Roles should not subsume types.

A Jumbled Hierarchy
amount of matter
-R drug
+R antibiotic
+R chemical compound
+R oil
-R nutriment (a source of material to nourish the body)

Clean Hierarchies
amount-of-matter
  +R antibiotic
  +R chemical compound
  +R oil
substance-role (role played-by some amount-of-matter)
  -R drug
  -R nutriment

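The rule of thumb can be checked mechanically once every term carries a rigidity tag. The sketch below is illustrative only. The indentation of the jumbled hierarchy did not survive in this transcript, so the parent links used for it (in particular antibiotic under drug) are an assumption, as are the +R tag on the root term and the function names.

```python
def ancestors(term, parent_of):
    """Yield every term that subsumes `term`, nearest first."""
    while term in parent_of:
        term = parent_of[term]
        yield term


def rigidity_violations(parent_of, is_rigid):
    """Return (non_rigid_ancestor, rigid_term) pairs that break the rule
    'non-rigid terms should not subsume rigid terms'."""
    return sorted(
        (ancestor, term)
        for term in parent_of
        if is_rigid[term]
        for ancestor in ancestors(term, parent_of)
        if not is_rigid[ancestor]
    )


# Rigidity tags from the slides (+R = True, -R = False).
# The tags for the two root terms are assumptions.
is_rigid = {
    "amount-of-matter": True, "substance-role": False,
    "drug": False, "nutriment": False,
    "antibiotic": True, "chemical-compound": True, "oil": True,
}

# Assumed reading of the jumbled hierarchy (child -> parent).
jumbled = {
    "drug": "amount-of-matter", "antibiotic": "drug",
    "chemical-compound": "amount-of-matter", "oil": "amount-of-matter",
    "nutriment": "amount-of-matter",
}

# The clean hierarchies: types under amount-of-matter, roles under substance-role.
clean = {
    "antibiotic": "amount-of-matter", "chemical-compound": "amount-of-matter",
    "oil": "amount-of-matter",
    "drug": "substance-role", "nutriment": "substance-role",
}

print(rigidity_violations(jumbled, is_rigid))  # [('drug', 'antibiotic')]
print(rigidity_violations(clean, is_rigid))    # []
```

The check walks all ancestors rather than only the direct parent, because subsumption is transitive and a non-rigid term higher up is just as problematic.
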
Mapping Synsets

Adjectives

Adjectives: General Strategy in KYOTO
Qualities are easily modeled according to the kinds of entities in which they inhere. For example, amounts of matter are the kinds of things that have pH levels.

Adjectives: General Strategy in KYOTO
The values for specific qualities like pH levels are located in regions.

Adjectives: The Problem
pH levels are easy because
• they are measurable, i.e., there are objective criteria.
• they are confined to one kind of entity, namely, amounts of matter.

Adjectives: The Problem
How should we model concepts like “beneficial” or “important”?
• Subjective component
• Not necessarily “out there” in the world
• Not typically quantifiable
• Criteria are context dependent
• Many kinds of entities can be beneficial or important.

Adjectives: KYOTO’s Solution
The middle layer has a region evaluative-region to accommodate adjectives like ‘beneficial’ or ‘worthless’.

Adjectives: KYOTO’s Solution
Concepts like “beneficial” and “important” are
• not in the domain specific layer since they are general concepts.
• not in the upper layer since they are “subjective”.
• not in a strictly realist ontology like BFO.
• modeled orthogonally to “real” qualities

Adjectives: KYOTO’s Solution
What kind of restriction can you write for length?
• long (indefinite quality)
• 2m (definite quality)
length q-located-in (length-measurement-unit or indefinite-quality-region)

In Conclusion
• Procurement - BCs influenced the concepts included in the KYOTO ontology.
• Hierarchy - subsumption relations must be carefully distinguished in order to avoid influence from the lexicon that might lead to unsound inferences.
• Qualities - Lexicalized adjectives that may not have a realist correlate need to be modeled in an orthogonal way.

Bibliography
Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press.
Guarino, N., and Welty, C. (2004). An Overview of OntoClean. In Handbook on Ontologies, ed. S. Staab and R. Studer, pp. 151-172.
Herold, A., Hicks, A., Rigau, G., and Laparra, E. (2009). Kyoto Deliverable D6.2: Central Ontology Version - 1. www.kyoto-project.eu.
Hicks, A., and Rigau, G. (2010). Kyoto Deliverable D8.3: Domain Extension of the Central Ontology. www.kyoto-project.eu.
Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the automatic selection of basic level concepts. In Proceedings of the International Conference on Recent Advances on Natural Language Processing (RANLP'07), Borovets, Bulgaria.
Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A., and Schneider, L. (2002). Wonderweb Deliverable D17: The Wonderweb Library of Foundational Ontologies and the Dolce Ontology.
Smith, B. (2004). Beyond Concepts: Ontology as Reality Representation. In Proceedings of FOIS 2004, International Conference on Formal Ontology and Information Systems.
Vossen, P., et al. (2008). KYOTO: A System for Mining, Structuring and Distributing Knowledge Across Languages and Cultures. In Proceedings of LREC 2008, Marrakech, Morocco, May 28-30, 2008.