This guide explores lightweight ontologies, converting classifications, real-world semantics, and practical applications in query-answering and document classification.
Logics for Data and Knowledge Representation
Applications of ClassL: Lightweight Ontologies
Outline
• Ontologies
• Descriptive and classification ontologies
• Real world and classification semantics
• Lightweight Ontologies
• Converting classifications into Lightweight Ontologies
• Applications on Lightweight Ontologies
• Document Classification
• Query-answering
• Semantic Matching
Ontologies
ONTOLOGIES :: LIGHTWEIGHT ONTOLOGIES :: CLASSIFICATIONS :: APPLICATIONS ON LIGHTWEIGHT ONTOLOGIES
• Ontologies are explicit specifications of conceptualizations [Gruber, 1993]
• They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts
[Figure: example ontology graph with the concepts Animal, Bird, Mammal, Head, Body, Chicken, Predator, Herbivore, Cat, Tiger and Goat, connected by ‘is-a’, ‘part-of’ and ‘eats’ edges]
Concepts and Relations between them
• CONCEPT: it represents a set of objects or individuals
• EXTENSION: the set above is called the concept extension or the concept interpretation
• Concepts are often lexically defined, i.e. they have natural language names which are used to describe the concept extensions (e.g. Animal, Lion, Rome), often with an additional description (gloss)
• RELATION: a link from the source concept to the target concept
• The backbone structure of an ontology graph is a taxonomy in which the relations are ‘is-a’, ‘part-of’ and ‘instance-of’, whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like ‘located-in’, ‘eats’, ‘ant’, etc. In Library Science, the backbone relations are called hierarchical (BT/NT) and the auxiliary ones associative (RT).
Ontology as a graph: a mathematical definition
• An ontology is an ordered pair O = <V, E>
• V is the set of vertices describing the concepts
• E is the set of edges describing relations
[Figure: the example animal ontology graph with ‘is-a’, ‘part-of’ and ‘eats’ edges]
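The pair O = <V, E> can be sketched directly as data. The snippet below is only an illustration: the ‘is-a’ and ‘part-of’ edges are read off the flattened animal figure, the single ‘eats’ edge is an assumption (the figure's ‘eats’ endpoints cannot be recovered from the text), and the helper names `vertices` and `relations` are hypothetical.

```python
# Edges as (source, relation, target), read off the animal example figure.
EDGES = [
    ("Bird", "is-a", "Animal"),
    ("Mammal", "is-a", "Animal"),
    ("Head", "part-of", "Animal"),
    ("Body", "part-of", "Animal"),
    ("Chicken", "is-a", "Bird"),
    ("Predator", "is-a", "Mammal"),
    ("Herbivore", "is-a", "Mammal"),
    ("Cat", "is-a", "Predator"),
    ("Tiger", "is-a", "Predator"),
    ("Goat", "is-a", "Herbivore"),
    # Assumption: illustrative auxiliary edge; the real 'eats' endpoints are unknown.
    ("Tiger", "eats", "Goat"),
]


def vertices(edges):
    """V: every concept that appears as the source or target of some edge."""
    return {node for source, _, target in edges for node in (source, target)}


def relations(edges):
    """The set of relation names used as edge labels."""
    return {relation for _, relation, _ in edges}


V = vertices(EDGES)
E = set(EDGES)
print(len(V), "concepts,", len(E), "edges, relations:", sorted(relations(EDGES)))
```

Storing each edge as a labelled triple keeps the relation name next to its endpoints, which makes it easy to filter the graph by relation type later.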
Tree-like Ontologies
• Take the ontology in the previous slide and remove those auxiliary relations…
• … we get a tree-like ontology consisting of its backbone structure with ‘is-a’ and ‘part-of’ relations (*), that is, an informal lightweight ontology.
(*) Notice that in some cases we can obtain more complex structures, such as DAGs or even graphs with cycles
[Figure: the animal ontology graph with the ‘eats’ edges removed, leaving only ‘is-a’ and ‘part-of’ edges]
Descriptive vs. Classification Ontologies
• Descriptive ontologies: They are used to describe a part of the world, such as the Gene ontology, Industry ontology, etc. The purpose is to give a clear description of the world. This is usually the first idea that comes to mind when people talk about ontologies.
• Classification ontologies: They are used to classify things, such as books, documents, web pages, etc. The aim is to provide domain-specific categories under which individuals can be organized. Such ontologies usually take the form of classifications with or without explicit meaningful links.
Real world and Classification semantics
• Real world semantics: In descriptive ontologies, concepts represent real world entities. For example, the extension of the concept animal is the set of real world animals, which can be connected via relations of the proper kind.
• Classification semantics: In classification ontologies, the extension of each concept (label of a node) is the set of documents about the entities or individual objects described by the label of the concept. For example, the extension of the concept animal is “the set of documents about animals” of any kind.
Why ‘Lightweight’ Ontologies?
• The majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., hierarchically organized categories used to classify resources.
• Ontologies with arbitrary relations do exist, but no intuitive and efficient reasoning techniques support such ontologies in general.
… so we need ‘lightweight’ ontologies.
Lightweight Ontologies
• A (formal) lightweight ontology is a triple O = <N, E, C> where:
• N is a finite set of nodes,
• E is a set of edges on N, such that <N, E> is a rooted tree,
• C is a finite set of concepts expressed in a formal language F, such that for any node n_i ∈ N, there is one and only one concept c_i ∈ C, and, if n_i is the parent node for n_j, then c_j ⊑ c_i.
NOTE: lightweight ontologies are interpreted under classification semantics
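The three conditions of the definition can be checked mechanically. The sketch below is a simplification, not the full ClassL machinery: it assumes each concept is a conjunction of atomic propositions stored as a `frozenset`, so that c ⊑ d holds exactly when every atom of d also occurs in c. The function names and the sample nodes are hypothetical.

```python
def subsumed_by(c, d):
    """c ⊑ d for conjunctions of atoms: every atom of d must occur in c."""
    return d <= c


def is_rooted_tree(nodes, parent):
    """<N, E> is a rooted tree: one root, and every node reaches it without a cycle."""
    if not set(parent) <= nodes or not set(parent.values()) <= nodes:
        return False
    roots = nodes - set(parent)  # nodes without a parent
    if len(roots) != 1:
        return False
    for start in nodes:
        seen = set()
        node = start
        while node in parent:
            if node in seen:  # walked into a cycle
                return False
            seen.add(node)
            node = parent[node]
    return True


def is_lightweight_ontology(nodes, parent, concept):
    """Check the triple <N, E, C>; E is given as a child -> parent mapping."""
    if set(concept) != nodes:  # exactly one concept per node
        return False
    if not is_rooted_tree(nodes, parent):
        return False
    # Every child concept must be subsumed by its parent's concept.
    return all(subsumed_by(concept[c], concept[p]) for c, p in parent.items())


NODES = {"n1", "n2", "n3"}
PARENT = {"n2": "n1", "n3": "n2"}
CONCEPT = {
    "n1": frozenset({"animal"}),
    "n2": frozenset({"animal", "bird"}),
    "n3": frozenset({"animal", "bird", "chicken"}),
}
print(is_lightweight_ontology(NODES, PARENT, CONCEPT))
```

Representing E as a child-to-parent mapping guarantees that no node has two parents, so the tree check only has to look for a single root and for cycles.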
Converting tree-like structures into LOs
• For a descriptive ontology, the backbone taxonomy of ‘is-a’ and ‘instance-of’ intuitively coincides with the subsumption (‘⊑’) relation in LOs.
NOTE: ‘part-of’ relations correspond to subsumption only if they are transitive. For instance, the following chain cannot be translated: handle part-of door part-of school part-of school system
• For a classification ontology, the extension of each node is the set of documents (books, websites, etc.) that should be classified under the node. Therefore, the links have to be interpreted as ‘subset’ relations and can be transformed directly into subsumption in the target LOs.
Descriptive and classification ontologies
[Figure: two descriptive ontologies. (a) Animal, Vertebrate, Invertebrate, Bird and Mammal, linked by ‘is-a’; (b) World, Europe, Asia, France, Italy and Rome, linked by ‘part-of’]
• (a) and (b) are two descriptive ontologies. The corresponding classification ontologies are obtained by substituting all the relations with ‘subset’.
• (a) and (b) can be converted into lightweight ontologies by replacing the relations with subsumptions. However, the semantics changes from real world to classification semantics.
Populated (Lightweight) Ontologies
• In Information Retrieval, the term classification is seen as the process of arranging a set of objects (e.g., documents) into a set of categories or classes.
• A classification ontology is said to be populated if a set of objects has been classified under ‘proper’ nodes.
• Thus a populated (lightweight) ontology includes (explicit or implicit) ‘instance-of’ relations
Example of a Populated Ontology
[Figure: the animal ontology with all links replaced by ⊑, populated through ‘instance-of’ links with documents such as ‘Chicken Soup’, ‘How to Raise Chicken’, ‘Tom and Jerry’, ‘www.protectTiger.org’, …]
Lightweight Ontologies in ClassL: TBox
• Subsumption terminologies. Recall: ‘… C is a finite set of concepts expressed in a formal language F, such that for any node n_i ∈ N, there is one and only one concept c_i ∈ C, and, if n_i is the parent node for n_j, then c_j ⊑ c_i.’
• Bird ⊑ Animal
• Mammal ⊑ Animal
• Chicken ⊑ Bird
• Cat ⊑ Predator
• …
NOTE: a tree-like ontology can be transformed into a lightweight ontology, but not vice versa. This is because we lose information during the translation.
Populated LOs in ClassL: TBox + ABox
• ‘instance-of’ links are encoded into ‘concept assertions’:
• Chicken(ChickenSoup)
• Cat(TomAndJerry)
• …
• Instances are the elements of the domain, namely the documents classified in the categories.
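A populated lightweight ontology can be sketched as two small tables, one for the TBox and one for the ABox. The code below is an illustration under assumptions: it handles only atomic subsumption axioms with a single parent per concept, the axiom Predator ⊑ Mammal is taken from the animal figure rather than from the list on the TBox slide, and the function names are hypothetical.

```python
# TBox: child concept -> parent concept, i.e. child ⊑ parent.
TBOX = {
    "Bird": "Animal",
    "Mammal": "Animal",
    "Chicken": "Bird",
    "Cat": "Predator",
    "Predator": "Mammal",  # from the figure, not from the slide's axiom list
}

# ABox: concept assertions C(a), stored as (concept, individual).
ABOX = [
    ("Chicken", "ChickenSoup"),
    ("Cat", "TomAndJerry"),
]


def subsumers(concept):
    """The concept itself plus every concept above it in the TBox."""
    chain = [concept]
    while concept in TBOX:
        concept = TBOX[concept]
        chain.append(concept)
    return chain


def instances_of(concept):
    """Documents asserted under the concept or under any of its descendants."""
    return {doc for asserted, doc in ABOX if concept in subsumers(asserted)}


print(sorted(instances_of("Animal")))
print(sorted(instances_of("Bird")))
```

Retrieval follows the subsumption chain upwards, so a document asserted under Chicken is also returned for Bird and Animal, which matches the classification reading of ⊑ as ‘subset’.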
Classifications are:
• Easy to use for humans
• Pervasive (Google, Yahoo, Amazon, our PC directories, email folders, address book, etc.).
• Largely used in commercial applications (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.).
• Studied for a very long time (e.g., Dewey Decimal Classification system - DDC, Library of Congress Classification system - LCC, etc.).
Classification Example: Yahoo! Directory
Classification Example: Email Folders
Classification Example: E-Commerce Category
Label Semantics
[Figure: fragment of a classification tree with nodes labelled Subjects, Computers and Internet, Programming, Java Language and Java Beans at increasing depth (levels 0 to 4)]
• Natural language words are often ambiguous. E.g., Java (an island, a beverage, a programming language)
• When used with other words in a label, improper senses can be pruned. E.g., “Java Language”: only the 3rd sense of Java is preserved
• We translate node labels into unambiguous propositions in ClassL in classification semantics
• This can be done by using NLP (Natural Language Processing) techniques
Link semantics
[Figure: parent node 1 (label A) and child node 2 (label B), with the two candidate readings (a) and (b) of the link between them]
• Get-specific principle: Child nodes in a classification are always considered in the context of their parent nodes. As a consequence they specialize the meaning of the parent nodes.
• Subsumption relation (a): the extension of the child node is a proper subset of the extension of the parent node. The meaning of node 2 is B.
• General intersection relation (b): the extension of the child node is a subset of the extension of the parent node. The meaning of node 2 is C = A ⊓ B.
• We generalize to (b). The meaning of the node is what we call the concept at node.
Concept at node
[Figure: classification with nodes 1 Europe, 2 Pictures, 3 Wine and Cheese, 4 Italy, 5 Austria]
In ClassL: C_4 = C_europe ⊓ C_pictures ⊓ C_italy
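The concept at node is the conjunction of the label concepts on the path from the root to that node. The sketch below reproduces the C_4 example under assumptions: only nodes 1, 2 and 4 are included, because the formula for C_4 fixes their path while the parents of nodes 3 and 5 cannot be recovered from the flattened figure; `LABEL`, `PARENT` and the function names are hypothetical.

```python
# Node id -> label concept, for the nodes on the path used in the C_4 example.
LABEL = {1: "europe", 2: "pictures", 4: "italy"}

# Child node -> parent node, as implied by C_4 = C_europe ⊓ C_pictures ⊓ C_italy.
PARENT = {2: 1, 4: 2}


def concept_at_node(node):
    """Label concepts from the root down to the node; their conjunction is the concept at node."""
    path = [LABEL[node]]
    while node in PARENT:
        node = PARENT[node]
        path.append(LABEL[node])
    path.reverse()  # root first
    return path


def render(atoms):
    """Pretty-print a conjunction in the notation used on the slide."""
    return " ⊓ ".join(f"C_{atom}" for atom in atoms)


print("C_4 =", render(concept_at_node(4)))
```

Because a conjunction of atoms does not depend on order, the same result could be stored as a set; the list is kept only so that the printed formula follows the path from the root.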
Document Classification
• Document concept: each document d in a classification is assigned a proposition C_d in ClassL, built from d in two steps:
• keywords are retrieved from d by using standard text mining techniques.
• keywords are converted into propositions by using the methodology discussed above to translate node labels.
• Automatic classification: for any given document d and its concept C_d, we classify d in each node n_i such that:
• ⊨ C_d ⊑ C_i,
• and there is no node n_j (j ≠ i) for which ⊨ C_j ⊑ C_i and ⊨ C_d ⊑ C_j.
In other words, we always classify in the node with the most specific concept.
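The two classification conditions translate almost literally into code. This is a minimal sketch under a simplifying assumption: document and node concepts are conjunctions of atoms stored as `frozenset`s, so ⊨ C ⊑ D reduces to a superset test, and distinct nodes carry distinct concepts. The node table and function names are hypothetical, and the keyword extraction step is skipped by giving the document atoms directly.

```python
def subsumed_by(c, d):
    """C ⊑ D for conjunctions of atoms: every atom of D occurs in C."""
    return d <= c


def classify(doc_concept, node_concepts):
    """Nodes n_i with C_d ⊑ C_i and no other node n_j with C_j ⊑ C_i and C_d ⊑ C_j."""
    candidates = [n for n, c in node_concepts.items() if subsumed_by(doc_concept, c)]
    return {
        i
        for i in candidates
        if not any(
            j != i and subsumed_by(node_concepts[j], node_concepts[i])
            for j in candidates  # candidates already satisfy C_d ⊑ C_j
        )
    }


# Concepts at node for a small Europe / Pictures / Italy classification.
NODE_CONCEPTS = {
    1: frozenset({"europe"}),
    2: frozenset({"europe", "pictures"}),
    4: frozenset({"europe", "pictures", "italy"}),
}

doc = frozenset({"europe", "pictures", "italy", "rome"})
print(classify(doc, NODE_CONCEPTS))
```

A document about pictures of Austria would stop at node 2, since no more specific node subsumes it, and a document sharing no node concept is left unclassified.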
Query-answering
• Query-answering on a hierarchy of documents, based on a query q given as a set of keywords, is defined in two steps:
• The ClassL proposition C_q is built from q by converting q’s keywords as described above.
• The set of answers (retrieval set) to q is defined as a set of subsumption checking problems in ClassL: A_q = {d ∈ document | T ⊨ C_d ⊑ C_q}
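The retrieval set A_q can be sketched as a filter over document concepts. The code assumes, for illustration only, that the TBox T is empty and that C_d and C_q are conjunctions of atoms, so T ⊨ C_d ⊑ C_q reduces to checking that every query atom occurs in the document concept. The document table and the function name `answer` are hypothetical.

```python
# Document id -> document concept C_d, as a conjunction of atoms.
DOCUMENTS = {
    "d1": frozenset({"europe", "pictures", "italy"}),
    "d2": frozenset({"europe", "wine"}),
    "d3": frozenset({"asia", "pictures"}),
}


def answer(query_keywords, documents):
    """A_q: documents whose concept C_d is subsumed by the query concept C_q."""
    c_q = frozenset(query_keywords)
    # With an empty TBox, C_d ⊑ C_q holds iff every atom of C_q is in C_d.
    return {doc for doc, c_d in documents.items() if c_q <= c_d}


print(sorted(answer({"europe"}, DOCUMENTS)))
print(sorted(answer({"europe", "pictures"}, DOCUMENTS)))
```

Adding keywords to the query makes C_q more specific and can only shrink the retrieval set, which is the behaviour expected from subsumption-based retrieval.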
Semantic Matching: Why?
• Much knowledge can be represented as graphs. The heterogeneity between knowledge graphs calls for making the relations between them explicit, such as semantic equivalence.
• Some common situations that can be modeled as a matching problem are:
• Concept matching in semantic networks.
• Schema matching in distributed databases.
• Ontology matching (ontology “alignment”) in the Semantic Web.
The Matching Problem
• Matching Problem: given two finite graphs, find all nodes in the two graphs that syntactically or semantically correspond to each other.
• Given two graph-like structures (e.g., classifications, XML and database schemas, ontologies), a matching operator produces a mapping between the nodes of the graphs.
• Solution: a possible solution [Giunchiglia & Shvaiko, 2003] consists in converting the two input graphs into lightweight ontologies and then matching them semantically.
A Matching Problem
[Figure: two graphs whose candidate node correspondences are marked with ‘?’]