Logics for Data and Knowledge Representation
The DERA methodology
Outline
• Overview: the semantic heterogeneity problem
• Ontologies
• The DERA methodology
• Use cases
• Projects
The semantic heterogeneity problem
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
The difficulty of establishing a certain level of connectivity between people, software agents or IT systems [Uschold & Gruninger, 2004], with the purpose of enabling each of the parties to appropriately understand the exchanged information [Pollock, 2002].
Early solutions
• Physical connectivity relies on the presence of a stable communication channel between the parties, for instance ODBC data gateways and software adapters.
• Syntactic connectivity is established by instituting a common vocabulary of terms to be used by the parties, or by point-to-point bridges that translate messages written in one vocabulary into messages in the other vocabulary.
This rigidity and lack of explicit meaning cause very high maintenance costs (up to 95% of the overall ownership costs) as well as integration failures (up to 88% of the projects) [Pollock, 2002].
The semantic interoperability solution
The solution in three points:
• Semantic mediation: the use of an ontology that provides a shared vocabulary of terms with explicit meaning.
• Semantic mapping: the establishment, using the ontology, of a mapping consisting of a set of correspondences between semantically similar data elements independently maintained by the parties.
• Context sensitivity: the mapping has contextual validity, i.e. it has to be used taking into account the conditions and the purposes for which it was generated.
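The three points above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two party vocabularies, the concept names, the `Correspondence` type and the `build_mapping` helper are hypothetical and do not come from the slides. Each party annotates its own terms with concepts of the shared ontology (mediation), correspondences are derived where two terms point to the same concept (mapping), and every correspondence records the context in which it was generated (context sensitivity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    source_term: str   # term used by the first party
    target_term: str   # term used by the second party
    concept: str       # shared ontology concept that links the two terms
    context: str       # purpose/conditions under which the mapping is valid


def build_mapping(source_vocab, target_vocab, context):
    """Pair up terms of two vocabularies that denote the same shared concept."""
    by_concept = {}
    for term, concept in target_vocab.items():
        by_concept.setdefault(concept, []).append(term)
    return [
        Correspondence(term, other, concept, context)
        for term, concept in source_vocab.items()
        for other in by_concept.get(concept, [])
    ]


# Hypothetical vocabularies: local term -> concept of the shared ontology.
PARTY_A = {"fiume": "River", "lago": "Lake", "ghiacciaio": "Glacier"}
PARTY_B = {"river": "River", "lake": "Lake", "canal": "Canal"}

MAPPING = build_mapping(PARTY_A, PARTY_B, context="geo-catalogue search")
for c in MAPPING:
    print(f"{c.source_term} <-> {c.target_term} via {c.concept} [{c.context}]")
```

Terms without a counterpart in the other vocabulary (here "ghiacciaio" and "canal") simply produce no correspondence. A real matcher would also consider more general and more specific concepts, which this sketch leaves out.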
Ontologies
[Figure: example ontology drawn as a graph. Nodes: Animal, Bird, Mammal, Head, Body, Chicken, Predator, Herbivore, Cat, Tiger, Goat. Edge labels: Is-a, Part-of, Eats.]
• An ontology is an explicit specification of a shared conceptualization [Gruber, 1993]
• Ontologies are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts
• By providing a common formal terminology and understanding of a given domain of interest, an ontology allows for automation (logical inference), supports reuse and favors interoperability across applications and people.
• Ontologies differ according to their purpose and their semantics
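The directed-graph view of an ontology can be sketched in a few lines of Python. The node names are those of the example figure, but the individual edges are an assumed reading of it, since only the node and edge labels are given (the three "eats" edges in particular are a guess). The helper names `build_graph` and `ancestors` are hypothetical.

```python
from collections import defaultdict

# Labelled directed edges: (source concept, relation, target concept).
# ASSUMPTION: edge endpoints are a plausible reading of the figure, not a
# faithful copy of it.
EDGES = [
    ("Bird", "is-a", "Animal"),
    ("Mammal", "is-a", "Animal"),
    ("Head", "part-of", "Animal"),
    ("Body", "part-of", "Animal"),
    ("Chicken", "is-a", "Bird"),
    ("Predator", "is-a", "Mammal"),
    ("Herbivore", "is-a", "Mammal"),
    ("Cat", "is-a", "Predator"),
    ("Tiger", "is-a", "Predator"),
    ("Goat", "is-a", "Herbivore"),
    ("Cat", "eats", "Chicken"),
    ("Tiger", "eats", "Goat"),
    ("Tiger", "eats", "Chicken"),
]


def build_graph(edges):
    """Index the edges by source node: node -> [(relation, target), ...]."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph


def ancestors(graph, concept):
    """Return every concept reachable from `concept` by following is-a edges."""
    found = []
    stack = [concept]
    while stack:
        node = stack.pop()
        for relation, target in graph.get(node, []):
            if relation == "is-a" and target not in found:
                found.append(target)
                stack.append(target)
    return found


GRAPH = build_graph(EDGES)
print(ancestors(GRAPH, "Tiger"))  # ['Predator', 'Mammal', 'Animal']
```

Following only the is-a edges is a very small instance of the logical inference mentioned above: nothing states directly that a tiger is an animal, yet it follows from the graph.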
Definition of ontology in detail
• An ontology is an explicit specification of a shared conceptualization [Gruber, 1993]
• Conceptualization: an abstract model of how people theorize (part of) the world in terms of basic cognitive units called concepts. Concepts represent the intension, i.e. the set of properties that distinguish the concept from others, and summarize the extension, i.e. the set of objects having such properties.
• Explicit specification: the abstract model is made explicit by providing names and definitions for the concepts, i.e. the name and the definition of a concept provide a specification of its meaning in relation to other concepts.
• Formal specification: the specification is formal when it is written in a language with formal syntax and formal semantics, i.e. in a logic-based language.
• Shared conceptualization: it captures knowledge that is common to a community of people and therefore concretely represents the level of agreement reached in that community.
Kinds of ontologies [Uschold and Gruninger, 2004]
Classification vs. Descriptive Ontologies
• Classification ontologies
  They are used to classify things such as books, documents and web pages; the purpose is to provide domain-specific terminology and to organize individuals accordingly. Such ontologies usually take the form of classifications with or without explicit relations (BT/NT/RT, i.e. broader term, narrower term, related term).
• Descriptive ontologies
  They are used to describe a part of the world, as the Gene ontology or the Industry ontology do; the purpose is to offer an unambiguous description of the world. Relations are typically explicit (e.g. is-a) and can be of any kind.
See [Giunchiglia et al., 2009]
Classification ontologies
Descriptive ontologies
Classification vs. Real World semantics
• Classification ontologies are in classification semantics
  In classification ontologies, the extension of each concept (label of a node) is the set of documents about the entities or individual objects described by the label of the concept. For example, the extension of the concept animal is "the set of documents about animals" of any kind.
• Descriptive ontologies are in real world semantics
  In descriptive ontologies, concepts represent real world entities. For example, the extension of the concept animal is the set of real world animals, which can be connected via relations of the proper kind.
How to build a robust ontology?
We answer this question with DERA [Giunchiglia et al., 2013]
Domains
• A domain is any area of knowledge or field of study that we are interested in or that we are communicating about, and that deals with specific kinds of entities:
  • conventional fields of study (e.g. physics, mathematics)
  • applications of pure disciplines (e.g. engineering, agriculture)
  • any aggregate of such fields (e.g. physical sciences, social sciences)
  • areas capturing knowledge about our everyday lives (e.g. music, movies, sport, recipes, tourism)
• Domains are the main means by which the diversity of the world is captured, in terms of language, knowledge and personal experience.
How to build domains?
• Several methodologies have been developed for the construction and maintenance of vocabularies in specific domains
• Among them, the faceted approach [Ranganathan, 1967] is known to have great benefits in terms of quality and scalability of the developed resources
• BUT the faceted approach, and Knowledge Organization (KO) in general, aims at the development of controlled vocabularies built as classification ontologies, while Knowledge Representation (KR) needs descriptive ontologies
The faceted approach as from KO
• The faceted approach is a well-established methodology used in library & information science (LIS) for the organization of knowledge in libraries [Ranganathan, 1967]
• It is based on the fundamental notions of domain and facets, which allow capturing the different aspects of a domain and, at the same time, allow for an incremental growth of knowledge.
• Originally facets were of 5 types (PMEST): Personality, Matter, Energy, Space, Time.
• For instance, in the medicine domain the important facets are the body parts, the diseases that affect them and the treatments to cure or prevent them.
From KO to KR: DERA
• How to build high quality and scalable descriptive ontologies?
• DERA is faceted, as it is inspired by the principles and canons of Ranganathan's faceted approach
• DERA is a KR approach, as it models the entities of a domain (D) by their entity classes (E), relations (R) and attributes (A)
[Figure: the DERA components Domain (D), Entity classes (E), Relations (R) and Attributes (A), shown alongside the labels ARRAY, CONCEPT, CATEGORY and FACET.]
Elements of DERA
A DERA domain is a triple D = <E, R, A> where:
• E (for Entity) is a set of facets grouping terms denoting entity classes, whose instances (the entities) have either perceptual or conceptual existence. Terms in these hierarchies are explicitly connected by is-a or part-of relations.
• R (for Relation) is a set of facets grouping terms denoting relations between entities. Terms in these hierarchies are connected by is-a relations.
• A (for Attribute) is a set of facets grouping terms denoting qualitative/quantitative or descriptive attributes of the entities. We differentiate between attribute names and attribute values, such that each attribute name is associated with its corresponding values. Attribute names are connected by is-a relations, while attribute values are connected to the corresponding attribute names by value-of relations.
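The triple D = <E, R, A> can be pictured as a small data structure. This is only a sketch under stated assumptions: the class names `Facet` and `Domain`, the `ALLOWED` table and the sample terms (including the attribute value "shallow" and the relation terms) are hypothetical. The one thing taken from the definition above is which hierarchical relations each kind of facet may use.

```python
from dataclasses import dataclass, field

# Hierarchical relations permitted in each component, as listed above.
ALLOWED = {
    "E": {"is-a", "part-of"},
    "R": {"is-a"},
    "A": {"is-a", "value-of"},
}


@dataclass
class Facet:
    kind: str                                   # "E", "R" or "A"
    root: str                                   # top term of the hierarchy
    links: dict = field(default_factory=dict)   # term -> (relation, parent)

    def add(self, term, relation, parent):
        """Attach `term` below `parent`, rejecting relations the kind forbids."""
        if relation not in ALLOWED[self.kind]:
            raise ValueError(f"{relation!r} is not allowed in {self.kind} facets")
        self.links[term] = (relation, parent)
        return self


@dataclass
class Domain:
    name: str
    facets: list = field(default_factory=list)

    def component(self, kind):
        """Return the facets belonging to E, R or A."""
        return [facet for facet in self.facets if facet.kind == kind]


# Illustrative content only (hypothetical terms).
water = Facet("E", "body of water").add("lake", "is-a", "body of water")
near = Facet("R", "spatial relation").add("near", "is-a", "spatial relation")
depth = (
    Facet("A", "depth")
    .add("deep", "value-of", "depth")
    .add("shallow", "value-of", "depth")
)
space = Domain("Space", [water, near, depth])
print([facet.root for facet in space.component("A")])  # ['depth']
```

Keeping the allowed relations in one table makes the constraint explicit: trying to add a value-of link to an entity facet raises an error instead of silently producing an ill-formed hierarchy.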
Mapping DERA to Description Logic (DL)
• is-a, part-of and value-of relations form the backbone of facets; they are assumed to be transitive and asymmetric, and hence are said to be hierarchical.
• Other relations, whenever defined, do not have such properties; they are said to be associative and connect terms in different facets.
• Taken together, the facets constitute the TBox of a descriptive ontology.
The mapping of E/R/A above to DL is direct:
• Classes correspond to concepts
• Relations and Attributes correspond to roles
• is-a and part-of correspond to subsumption (⊑) between classes and between roles
• value-of corresponds to value restrictions
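For reference, the two properties required of a hierarchical relation can be written out explicitly. These are the standard definitions of transitivity and asymmetry for a binary relation R, followed by one instance that uses terms from the body-of-water facet shown later in the deck.

```latex
\begin{align*}
\text{transitivity:} \quad & \forall x\,\forall y\,\forall z\;
    \bigl(R(x,y) \land R(y,z) \rightarrow R(x,z)\bigr) \\
\text{asymmetry:} \quad & \forall x\,\forall y\;
    \bigl(R(x,y) \rightarrow \lnot R(y,x)\bigr) \\[4pt]
\text{example:} \quad & \mathit{Brook} \sqsubseteq \mathit{Stream},\;
    \mathit{Stream} \sqsubseteq \mathit{FlowingBodyOfWater}
    \;\Rightarrow\; \mathit{Brook} \sqsubseteq \mathit{FlowingBodyOfWater}
\end{align*}
```

Asymmetry also implies irreflexivity, so a term is never placed below itself in a facet.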
An example of DERA domain
Steps in the DERA approach
• Step 1: Identification of the atomic concepts
  (E) watercourse, stream: a natural body of running water flowing on or under the earth
• Step 2: Analysis
  • a body of water
  • a flowing body of water
  • no fixed boundary
  • confined within a bed and stream banks
  • larger than a brook
Steps in the DERA approach (cont.)
• Step 3: Synthesis
  (E) Body of water
    (is-a) Flowing body of water
      (is-a) Stream
        (is-a) Brook
        (is-a) River
    (is-a) Still body of water
      (is-a) Pond
      (is-a) Lake
• Step 4: Standardization
  (E) stream, watercourse: a natural body of running water flowing on or under the earth
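The facet produced by the synthesis step can be held as a nested structure and queried. The sketch below assumes the nesting Body of water > Flowing body of water > Stream > Brook/River and Body of water > Still body of water > Pond/Lake; the helper names `path_to` and `render` are hypothetical.

```python
# Facet from Step 3 as a nested dict: term -> sub-terms (all links are is-a).
FACET = {
    "Body of water": {
        "Flowing body of water": {
            "Stream": {"Brook": {}, "River": {}},
        },
        "Still body of water": {"Pond": {}, "Lake": {}},
    }
}


def path_to(tree, term, trail=()):
    """Return the chain of terms from the facet root down to `term`, or None."""
    for node, children in tree.items():
        here = trail + (node,)
        if node == term:
            return here
        found = path_to(children, term, here)
        if found:
            return found
    return None


def render(tree, depth=0):
    """Produce the indented (E)/(is-a) listing used on the slide."""
    lines = []
    for node, children in tree.items():
        prefix = "(is-a) " if depth else "(E) "
        lines.append("  " * depth + prefix + node)
        lines.extend(render(children, depth + 1))
    return lines


print(" > ".join(path_to(FACET, "Brook")))
print("\n".join(render(FACET)))
```

The path from the root to a term spells out everything the facet says about that term: a brook is a stream, which is a flowing body of water, which is a body of water.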
Steps in the DERA approach (cont.)
• Step 5: Ordering
  Terms and concepts in the facets are ordered
• Step 6: Formalization
  Descriptive ontologies are translated into Description Logic formal ontologies, e.g.:
  Lake ⊑ Body-of-Water
  Lake(Garda Lake)
  Depth(Garda Lake, deep)
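To make the formalization step concrete, here is a minimal Python sketch of the three statements above together with a toy reasoner. It is not a DL reasoner. It assumes a TBox made only of atomic subsumptions and computes their reflexive-transitive closure. The axiom Lake ⊑ Body-of-Water and the two assertions come from the slide, the River/Stream axioms are taken from the Step 3 facet, and the function names are hypothetical.

```python
# TBox: atomic subsumptions C ⊑ D, stored as C -> {D, ...}.
SUBSUMPTIONS = {
    "Lake": {"Body-of-Water"},                   # from the slide
    "River": {"Stream"},                         # from the Step 3 facet
    "Stream": {"Flowing-Body-of-Water"},         # from the Step 3 facet
    "Flowing-Body-of-Water": {"Body-of-Water"},  # from the Step 3 facet
}

# ABox: concept assertions C(a) and role assertions R(a, v).
CONCEPT_ASSERTIONS = [("Lake", "Garda Lake")]
ROLE_ASSERTIONS = [("Depth", "Garda Lake", "deep")]


def subsumers(concept):
    """All D such that concept ⊑ D follows from the TBox (including itself)."""
    seen = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in SUBSUMPTIONS.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def instances_of(concept):
    """Individuals that belong to `concept` directly or through subsumption."""
    return sorted(
        individual
        for asserted, individual in CONCEPT_ASSERTIONS
        if concept in subsumers(asserted)
    )


def search(concept, role, value):
    """Individuals of `concept` whose `role` is asserted to have `value`."""
    members = set(instances_of(concept))
    return sorted(
        subject
        for name, subject, filler in ROLE_ASSERTIONS
        if name == role and filler == value and subject in members
    )


print(instances_of("Body-of-Water"))            # ['Garda Lake']
print(search("Body-of-Water", "Depth", "deep")) # ['Garda Lake']
```

Garda Lake is only asserted to be a Lake, yet it is returned for Body-of-Water because the subsumption is applied at query time. The `search` function combines that inference with an attribute filter, which is a small instance of searching by entity property.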
Guiding principles [Dutta et al., 2011]
Advantages of DERA
• DERA facets have explicit semantics and are modeled as descriptive ontologies
• DERA facets inherit all the important properties of the faceted approach, such as robustness and scalability
• DERA allows for automated reasoning via the formalization into Description Logic ontologies. In particular, DERA allows for a very expressive search by any entity property
Concrete use-case of the application of DERA principles
Problems with WordNet
• The position of nodes is driven by syntax
• Glosses exhibit space and time bias
• Some concepts are too similar in meaning
• Some concepts are actually individuals
Reorganizing WordNet with DERA
Educational Institution
<by level of complexity> Preschool, School, Primary school, Secondary school, Post-secondary school
<by programme orientation> Training school, Vocational school, Technical school, Graduate school, College, University
Reorganizing WordNet with DERA
Preschool (an educational institution for children too young for primary school)
  Nursery school, kindergarten (a preschool where children below the age of compulsory education play and learn)
  Playschool (a part-time preschool where children meet for half-day sessions)
School (an educational institution designed for the teaching of students (or "pupils") under the direction of teachers)
  Primary school (a school for children where they receive the first stage of compulsory education)
    Infant school (a primary school for very young children where they learn basic reading and writing skills)
    Junior school (a primary school for young children where they learn about some simple basic disciplines like mathematics, geography and history)
  Secondary school (a school for students intermediate between primary school and tertiary school)
    Junior high school (a secondary school where students learn about complex aspects of basic disciplines)
    Senior high school (a secondary school where students can opt for their field of study)
…
Concrete projects with DERA
The space ontology [Giunchiglia et al., 2012]
• Knowledge is extracted from GeoNames and the Getty Thesaurus of Geographic Names
• Terms are collected, categorized into classes, entities, relations and attributes, and synsets are generated
• Synsets are mapped to and integrated with WordNet
• Synsets are analyzed and arranged into facets
• Terms are standardized and ordered
Landform
  Natural depression
    Oceanic depression
      Oceanic valley
      Oceanic trough
    Continental depression
      Trough
      Valley
  Natural elevation
    Oceanic elevation
      Seamount
      Submarine hill
    Continental elevation
      Hill
      Mountain
Body of water
  Flowing body of water
    Stream
      River
      Brook
  Stagnant body of water
    Lake
    Pond
The semantic-geo catalogue [Farazi et al., 2012]
• Knowledge is extracted from the geographical dataset of the Province of Trento
• The faceted ontology was built in English and Italian
• Usage of the ontology
  • The ontology is used in combination with S-Match within the search component of the geo-catalogue to improve search
  • The evaluation shows that at the price of a drop in precision of 0.16% we double recall
Body of water: Lake, Group of lakes, Stream, River, Rivulet, Spring, Waterfall, Cascade, Canal
Natural elevation: Highland, Hill, Mountain, Mountain range, Peak, Chain of peaks, Glacier
Natural depression: Valley, Mountain pass
References
• Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5 (2), 199–220.
• Maltese, V. (2012). Dealing with semantic heterogeneity in classifications. PhD thesis: http://eprints-phd.biblio.unitn.it/700/ (see chapters 1.1 and 2.1)
• Uschold, M., Gruninger, M. (2004). Ontologies and semantics for seamless connectivity. SIGMOD Rec., 33 (4), 58–64.
• Pollock, J. (2002). Integration's Dirty Little Secret: It's a Matter of Semantics. Whitepaper, The Interoperability Company.
• Giunchiglia, F., Dutta, B., Maltese, V. (2009). Faceted lightweight ontologies. In "Conceptual Modeling: Foundations and Applications", LNCS 5600, Springer.
• Giunchiglia, F., Dutta, B., Maltese, V. (2013). From Knowledge Organization to Knowledge Representation. ISKO UK conference.
• Ranganathan, S. R. (1967). Prolegomena to library classification. Asia Publishing House.
• Dutta, B., Giunchiglia, F., Maltese, V. (2011). A Facet-based Methodology for Geo-Spatial Modeling. GEOS conference, 6631, pp. 133–150.
• Farazi, F., Maltese, V., Dutta, B., Ivanyukovich, A., Rizzi, V. (2013). A semantic geo-catalogue for a local administration. AI Review Journal, 40 (2), 193–212.
• Giunchiglia, F., Dutta, B., Maltese, V., Farazi, F. (2012). A facet-based methodology for the construction of a large-scale geospatial ontology. Journal on Data Semantics, 1 (1), pp. 57–73.