Logics for Data and Knowledge Representation: Application of (Ground) ClassL
Outline • Ontologies • Lightweight Ontologies • Classifications • Optimization of Classifications • Document Classification in LOs • Query-answering in LOs • Semantic Matching
Ontology • Ontologies are explicit specifications of conceptualizations. • They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts. [Figure: example ontology graph rooted at Animal, with nodes Bird, Mammal, Head, Body, Chicken, Predator, Herbivore, Cat, Tiger and Goat, and edges labeled Is-a, Part-of and Eats.]
Concept • The notion of concept is understood as defined in Knowledge Representation, i.e., as a set of objects or individuals. • This set is called the concept extension or the concept interpretation. • Concepts are often lexically defined, i.e. they have natural language names which are used to describe the concept extensions.
Relation • The notion of relation is understood as a set of ordered pairs, with the two items of the pair from the source concept and the target concept respectively. • The backbone structure of the ontology graph is a taxonomy in which the relations are ‘is-a’, ‘part-of’ and ‘instance-of’ whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like ‘located-in’, ‘sibling-of’, ‘ant’, etc.
Ontology as a graph • A mathematical definition borrows from graph theory: an ontology is an ordered pair O = <V, E>, in which V is the set of vertices describing the concepts and E is the set of edges describing the relations.
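The pair O = <V, E> can be sketched as plain Python data. This is only an illustration: the edge triples are read off the labels of the Animal example, the endpoints of the 'Eats' edges cannot be recovered from the slide and are therefore left out, and the helper names `is_well_formed` and `backbone` are hypothetical.

```python
# Vertices V: the concepts named in the Animal example.
V = {
    "Animal", "Bird", "Mammal", "Head", "Body", "Chicken",
    "Predator", "Herbivore", "Cat", "Tiger", "Goat",
}

# Edges E as (source, relation, target) triples.
# Assumption: endpoints are read off the example figure; 'Eats' edges are omitted.
E = {
    ("Bird", "is-a", "Animal"),
    ("Mammal", "is-a", "Animal"),
    ("Head", "part-of", "Animal"),
    ("Body", "part-of", "Animal"),
    ("Chicken", "is-a", "Bird"),
    ("Predator", "is-a", "Mammal"),
    ("Herbivore", "is-a", "Mammal"),
    ("Cat", "is-a", "Predator"),
    ("Tiger", "is-a", "Predator"),
    ("Goat", "is-a", "Herbivore"),
}

# Relations that form the backbone taxonomy.
BACKBONE_RELATIONS = {"is-a", "part-of", "instance-of"}


def is_well_formed(vertices, edges):
    """Every edge must connect two known vertices."""
    return all(s in vertices and t in vertices for s, _, t in edges)


def backbone(edges):
    """Keep only the backbone edges, dropping auxiliary relations."""
    return {e for e in edges if e[1] in BACKBONE_RELATIONS}


ontology = (V, E)
```

Calling `backbone` on a graph that also contains auxiliary edges (for example a hypothetical `("Cat", "eats", "Chicken")`) returns only the taxonomy edges.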
Tree-like Ontologies • Take the ontology on the previous slide and remove the auxiliary relations… • … and we get a tree-like ontology consisting of the backbone structure with ‘is-a’, ‘part-of’ and even ‘instance-of’ relations. • Such tree-like ontologies are informal Lightweight Ontologies. [Figure: the Animal ontology graph from the previous slide, showing its Is-a, Part-of and Eats edges.]
Descriptive vs. Classification Ontologies • Some ontologies are used to describe a piece of the world, such as the Gene ontology, Industry ontology, etc. The purpose is to give a clear description of the world. This is usually the first idea that comes to mind when people talk about ontologies. • Other ontologies are used to classify things, such as books, documents, web pages, etc. The aim is to provide domain-specific categories under which individuals are organized. Such ontologies usually take the form of classifications, with or without explicit meaningful links. • We will see the difference more clearly in the transformation into formal Lightweight Ontologies.
Why ‘Lightweight’ Ontologies? Two observations: • The majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., categories used to classify resources. • Ontologies with arbitrary relations do exist, but in general no intuitive reasoning techniques support them. … so we need ‘lightweight’ ontologies.
Outline • Ontologies • Lightweight Ontologies • Classifications • Optimization of Classifications • Document Classification in LOs • Query-answering in LOs • Semantic Matching
Lightweight Ontologies • A (formal) lightweight ontology is a triple O = <N, E, C>, where • N is a finite set of nodes, • E is a set of edges on N, such that <N, E> is a rooted tree, • and C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci.
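The three conditions of the definition can be checked mechanically. The sketch below is a simplification and not part of the original slides: it assumes each concept is a conjunction of atomic propositions stored as a frozenset, so that cj ⊑ ci holds exactly when the atoms of ci are a subset of the atoms of cj. All function names are hypothetical.

```python
def subsumed_by(c_child, c_parent):
    """c_child ⊑ c_parent under the assumed conjunction-of-atoms encoding."""
    return c_parent <= c_child


def is_rooted_tree(nodes, edges):
    """edges are (parent, child) pairs; check single root, single parent, no cycles."""
    parent_of = {}
    for parent, child in edges:
        if parent not in nodes or child not in nodes or child in parent_of:
            return False  # unknown node, or a node with two parents
        parent_of[child] = parent
    roots = [n for n in nodes if n not in parent_of]
    if len(roots) != 1:
        return False
    for start in nodes:  # walking up from any node must terminate
        seen, n = set(), start
        while n in parent_of:
            if n in seen:
                return False  # cycle
            seen.add(n)
            n = parent_of[n]
    return True


def is_lightweight_ontology(nodes, edges, concepts):
    """concepts maps every node to exactly one concept (a frozenset of atoms)."""
    if set(concepts) != set(nodes) or not is_rooted_tree(nodes, edges):
        return False
    return all(subsumed_by(concepts[c], concepts[p]) for p, c in edges)


N = {"n1", "n2", "n3"}
E = {("n1", "n2"), ("n1", "n3")}
C = {
    "n1": frozenset({"Animal"}),
    "n2": frozenset({"Animal", "Bird"}),    # Animal ⊓ Bird
    "n3": frozenset({"Animal", "Mammal"}),  # Animal ⊓ Mammal
}
```

With a general formal language F the subset test would have to be replaced by a call to a subsumption reasoner.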
From Tree-like Ontologies to LOs [Figure: the tree-like Animal ontology next to its lightweight ontology; every Is-a edge becomes ⊑, while the Part-of edges to Head and Body remain Part-of.]
In Classification Semantics… [Figure: the tree-like Animal ontology next to its lightweight ontology under classification semantics; both Is-a and Part-of edges become ⊑.]
From Tree-like Ontologies to LOs (cont.) • For a descriptive tree-like ontology, the backbone taxonomy of ‘is-a’ intuitively coincides with the ‘subsumption’ relation in LOs. But ‘part-of’ relations have to be modeled as a new kind of binary relation in order to preserve the semantics. • For a classification ontology, the semantics behind the label of a node is its extension, i.e. the documents (books, websites, etc.) that should be classified under the node. Therefore, the ‘part-of’ relation also follows the intuition of ‘subsumption’ and can be transformed directly into ‘⊑’ in the target LOs.
Populated (Lightweight) Ontologies • In Information Retrieval, the term classification is seen as the process of arranging a set of objects (e.g., documents) into categories or classes. • A classification ontology is said to be populated if a set of objects has been classified under ‘proper’ nodes. • Thus a populated (Lightweight) Ontology contains a new type of link: instance-of.
Example of a Populated Ontology [Figure: the Animal lightweight ontology with ⊑ edges, populated through Instance-of links with the documents ‘Chicken Soup’, ‘How to Raise Chicken’, ‘Tom and Jerry’, ‘www.protectTiger.org’, …]
Lightweight Ontologies in ClassL: TBox • Subsumption terminologies: ‘… C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N, there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci.’ • Bird ⊑ Animal • Mammal ⊑ Animal • Chicken ⊑ Bird • Cat ⊑ Predator • … Observation: a tree-like ontology can be transformed into a lightweight ontology, but not vice versa.
Populated LOs in ClassL: TBox+ABox • Subsumption terminologies: ‘… cj ⊑ ci.’ • … • ‘Instance-of’ links: concept assertions! • Chicken(ChickenSoup) • Cat(TomAndJerry) • …
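A populated LO can be sketched as two small collections, one for the TBox and one for the ABox. The axioms and assertions below are the ones listed on the slides, plus Predator ⊑ Mammal taken from the Animal tree; the helper names and the closure-based instance retrieval are illustrative assumptions, not a ClassL reasoner.

```python
# TBox: (sub, super) pairs, each read as sub ⊑ super.
TBOX = [
    ("Bird", "Animal"),
    ("Mammal", "Animal"),
    ("Chicken", "Bird"),
    ("Predator", "Mammal"),  # taken from the Animal tree
    ("Cat", "Predator"),
]

# ABox: (concept, individual) pairs, each read as concept(individual).
ABOX = [
    ("Chicken", "ChickenSoup"),
    ("Cat", "TomAndJerry"),
]


def subsumers(concept, tbox):
    """All D with concept ⊑ D, by reflexive-transitive closure over atomic axioms."""
    found, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in tbox:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def instances_of(concept, tbox, abox):
    """Individuals asserted under the concept or under any concept it subsumes."""
    return {ind for asserted, ind in abox if concept in subsumers(asserted, tbox)}
```

For example, `instances_of("Animal", TBOX, ABOX)` collects both documents, because Chicken and Cat are both subsumed by Animal.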
Outline • Ontologies • Lightweight Ontologies • Classifications • Optimization of Classifications • Document Classification in LOs • Query-answering in LOs • Semantic Matching
Classifications… • Classification hierarchies are easy to use... ... for humans. • Classification hierarchies are pervasive (Google, Yahoo, Amazon, our PC directories, email folders, address books, etc.). • Classification hierarchies are widely used in industry (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.). • Classification hierarchies have been studied for a very long time (e.g., the Dewey Decimal Classification system, DDC, and the Library of Congress Classification system, LCC).
Classifications… more • Classification hierarchies are lightweight (no roles, trees or simple DAGs, …). • Classification hierarchies are a kind of concept hierarchy. • Labels are natural language sentences; useful but hard to deal with in an automated way. • Links are of the kind “child-of” (e.g. “economy child-of Europe”), where in an ontology you would have {instance-of} links, roles, or {is-a} links. • There is no clear semantics for either the labels at nodes or the links. How can such informal information be used?
Recall: Lightweight Ontologies • A (formal) lightweight ontology is a triple O = <N, E, C>, where • N is a finite set of nodes, • E is a set of edges on N, such that <N, E> is a rooted tree, • and C is a finite set of concepts expressed in a formal language F, such that for any node ni ∈ N there is one and only one concept ci ∈ C, and, if ni is the parent node of nj, then cj ⊑ ci. • Slide annotations: ‘A classification already has’ (the tree <N, E>) and ‘To be fixed’ (the concepts C).
What do LOs Bring? • We know that a lightweight ontology is a formal conceptualization of a domain in terms of concepts and {is-a, instance-of} relationships. • Lightweight ontologies (LOs) add a formal semantics and {instance-of} relationships to classification hierarchies. • In short: LOs make classifications formal!
LOs and Ground Class Logic • Ground ClassL provides a formal language (syntax + semantics) to model lightweight ontologies, where: • concepts are modeled by propositions and formulas; • the ‘is-a’ relationship is modeled by subsumption (⊑); • and the ‘is-instance-of’ relationship is modeled by individual assertions (i.e., wffs like P(a)).
Label Semantics • Natural language words are often ambiguous. • E.g. Java (an island, a beverage, an OO programming language) • When used with other words in a label, improper senses can be pruned. • E.g., “Java Language”: only the 3rd sense of Java is preserved. [Figure: one path of a classification by level: level 0 Subjects (1), level 1 Computers and Internet (3), level 2 Programming (5), level 3 Java Language (7), level 4 Java Beans (8).]
From NL Labels to Labels in Class Logic • There are several approaches to rewriting a natural language label into a ClassL proposition. • Following (Giunchiglia et al., 2007), we may distinguish four steps: • Tokenization (get distinct words): ‘Italian Pictures’ → ‘Italian’, ‘Pictures’ • Word stemming (get the basic form): Pictures → picture • Rewrite each word into its proposition, the disjunction of its senses: picture → picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2 • Prune inconsistent senses: picture-noun-1 ⊔ picture-noun-2 ⊔ … ⊔ picture-verb-2 → picture-noun-1
Class Logic Label Examples • E.g. 1: “Java” becomes the proposition Java#1 ⊔ Java#2 ⊔ Java#3, where Java#i is a propositional variable representing the i-th sense of the word “Java” according to a dictionary (e.g., WordNet). • E.g. 2: “Java Beans” becomes: (Java#1 ⊔ Java#2 ⊔ Java#3) ⊓ (Bean#1 ⊔ Bean#2)
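The four steps (tokenize, stem, rewrite into senses, prune) can be sketched on a toy scale. Everything below is an assumption made for illustration: `SENSES` is a hand-written stand-in for a dictionary such as WordNet, `COMPATIBLE` is a hypothetical table of sense pairs that fit together, the stemmer only strips a plural "s", and the pruning rule (keep the senses compatible with some sense of another word, otherwise keep all) is a simplification of the linguistic analysis in the cited work.

```python
import re

# Hypothetical toy sense inventory standing in for WordNet.
SENSES = {
    "java": ["Java#1", "Java#2", "Java#3"],  # island, beverage, OO language
    "language": ["Language#1"],
    "bean": ["Bean#1", "Bean#2"],
}

# Hypothetical pruning knowledge: pairs of senses that fit together.
COMPATIBLE = {("Java#3", "Language#1")}


def tokenize(label):
    return re.findall(r"[a-z]+", label.lower())


def stem(word):
    """Naive stemming: strip a plural 's' when that yields a known word."""
    if word not in SENSES and word.endswith("s") and word[:-1] in SENSES:
        return word[:-1]
    return word


def compatible(a, b):
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE


def prune(sense_lists):
    """Keep senses compatible with another word's sense; keep all if none is."""
    pruned = []
    for i, senses in enumerate(sense_lists):
        others = [s for j, lst in enumerate(sense_lists) if j != i for s in lst]
        kept = [s for s in senses if any(compatible(s, o) for o in others)]
        pruned.append(kept or senses)
    return pruned


def label_to_formula(label):
    words = [stem(t) for t in tokenize(label)]
    # Unknown words fall back to a single made-up sense.
    sense_lists = prune([SENSES.get(w, [w.capitalize() + "#1"]) for w in words])
    if len(sense_lists) == 1:
        return " ⊔ ".join(sense_lists[0])
    parts = [s[0] if len(s) == 1 else "(" + " ⊔ ".join(s) + ")" for s in sense_lists]
    return " ⊓ ".join(parts)
```

On this toy inventory, `label_to_formula("Java Language")` keeps only the third sense of Java, while "Java Beans" keeps every sense of both words because no pruning knowledge applies.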
Advantages of Propositions • NL labels are ambiguous, propositions are NOT! • Extensional semantics of propositions naturally maps nodes to real world objects. • Labels as propositions allow us to deal with the standard problems in classification (e.g., document classification, query-answering, and matching) by means of ClassL’s reasoning, mainly the SAT problem.
Formalizing the Meaning of Links (1) • Child nodes in a classification are always considered in the context of their parent nodes. • Child nodes therefore specialize the meaning of the parent nodes. • This is the contextuality property of classifications.
Formalizing the Meaning of Links (2) • General intersection relationship (a): can be used to represent facets. The meaning of node 2 is C = A ⊓ B. • Subsumption relationship (b): child nodes are a specific case of the parent nodes. The meaning of node 2 is B. [Figure: node 1 labeled A with child node 2 labeled B, whose meaning is in question; diagram (a) shows A and B overlapping in C, diagram (b) shows B contained in A.]
General Intersection Example [Figure: labels l1 = “Subjects”, l3 = “Computers and Internet”, l5 = “Programming”, with regions labeled computer programming; scheduling, planning; hardware; software; networking; …]
Concept at a Node • Parental contextuality is formalized in ClassL by the notion of “concept at a node.” • A concept Cr at the root node r is the class proposition (label) used to denote the node. • A concept Ci at a node ni is the conjunction of a proposition Pi (label of ni) and the concept Cj at node nj parent to ni (if it has any parents). In ClassL: Pi⊓ Cj.
Concept at a Node • A concept at a node ni can be computed as the conjunction of all the labels from the root of the classification hierarchy to ni. • Concepts at nodes capture the classification semantics by using the meaning of labels (propositions defined by using WordNet and a linguistic analysis) and the nodes' position.
Concept at a Node: Example • In ClassL: C4 = Ceurope ⊓ Cpictures ⊓ Citaly [Figure: classification tree with nodes 1 Europe, 2 Pictures, 3 Wine and Cheese, 4 Italy, 5 Austria.]
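Computing a concept at a node amounts to walking from the node up to the root and conjoining the labels on the way. A minimal sketch on the Europe example follows; the tree shape is partly an assumption (Italy under Pictures follows from C4, Austria is assumed to sit under Pictures as well), and the name `Cwineandcheese` is made up for the third label.

```python
# Parent of each node (None for the root). Austria's parent is an assumption.
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}

# Label proposition of each node, written as an opaque name.
LABEL = {
    1: "Ceurope",
    2: "Cpictures",
    3: "Cwineandcheese",  # hypothetical name for the "Wine and Cheese" label
    4: "Citaly",
    5: "Caustria",
}


def concept_at_node(node):
    """Conjunction of all labels on the path from the root down to the node."""
    path = []
    while node is not None:
        path.append(LABEL[node])
        node = PARENT[node]
    return " ⊓ ".join(reversed(path))
```

For node 4 this yields `Ceurope ⊓ Cpictures ⊓ Citaly`, matching the slide.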
What have we done? • We calculated, for every node, its concept at label and its concept at node. • In which format? ClassL, e.g. Java#1 ⊔ Java#2 ⊔ Java#3, Ceurope ⊓ Cpictures ⊓ Citaly, … • We have built the ClassL formulas for each node!
Distinctions Among Ontology, LO and CLS [Figure: an Ontology (nodes A to E with Is-a, Instance-of, Part-of, Likes and Locate-in edges) is reduced to its Backbone Taxonomy, a Tree-like Ontology. Descriptive Ontologies map Is-a and Instance-of to ⊑; Classification Ontologies, under Classification Semantics, also map Part-of to ⊑. A Classification with Child-of links over A, B, C, D, E (the most common format) is turned by Formalization into a Formal Lightweight Ontology with concepts A, A⊓B, A⊓C, A⊓B⊓D, A⊓B⊓E.]
Outline • Ontologies • Lightweight Ontologies • Classifications • Optimization of Classifications • Document Classification in LOs • Query-answering in LOs • Semantic Matching
Rational LOs • LOs may not be perfect… • Reconstruct an LO based on the “most specific subsumer” relation. • Nodes get the parents that describe them most specifically while still being more general. • The new structure is called a Rational LO (RLO). • NOTE: the classification semantics does not change. [Figure: an LO over the nodes EU, Italy, Schengen States, Germany, France and Pictures, shown before and after the reconstruction.]
Optimization of Classifications • Problem: find ‘the most specific subsumer’ of a given node. • Suppose we have, for all nodes in the LO, the concepts at label in ClassL, i.e. the wffs obtained after NLP. • Then we can use the ‘subsumption’ reasoning service to find, among the subsumers of the node, the minimal one with respect to the ordering ‘⊑’. • E.g.: Italy ⊑ EU, SchengenState ⊑ EU, Italy ⊑ SchengenState…
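The search for the most specific subsumer can be sketched over the three example axioms. This is an illustration under assumptions: subsumption is decided by the transitive closure of the listed atomic axioms rather than by a ClassL reasoner, no two distinct concepts are equivalent, and the function names are hypothetical.

```python
# Example axioms, each pair read as sub ⊑ super.
AXIOMS = {
    ("Italy", "EU"),
    ("SchengenState", "EU"),
    ("Italy", "SchengenState"),
}


def strict_subsumers(concept, axioms):
    """All D different from concept with concept ⊑ D (transitive closure)."""
    found, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sub == current and sup not in found and sup != concept:
                found.add(sup)
                frontier.append(sup)
    return found


def most_specific_subsumers(concept, axioms):
    """Subsumers that are minimal w.r.t. ⊑, i.e. not above another subsumer."""
    candidates = strict_subsumers(concept, axioms)
    return {
        c
        for c in candidates
        if not any(
            c in strict_subsumers(other, axioms)
            for other in candidates
            if other != c
        )
    }
```

For Italy the candidates are EU and SchengenState; EU is discarded because SchengenState ⊑ EU, so Italy would be re-attached under SchengenState in the rational LO.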
Outline • Ontologies • Lightweight Ontologies • Classifications • Optimization of Classifications • Document Classification in LOs • Query-answering in LOs • Semantic Matching
Document Classification • Each document d in a classification is assigned a proposition Cd in ClassL. • Cd is called the document concept. • Cd is built from d in two steps: • keywords are retrieved from d by using standard text mining techniques; • keywords are converted into propositions by using the methodology discussed above.
“Get specific” Rule • For any given document d and its concept Cd, we classify d in each node ni such that: • ⊨ Cd ⊑ Ci (i.e., the concept at node ni is more general than Cd); • and there is no node nj (j ≠ i) whose concept Cj is more specific than Ci and more general than Cd, i.e., such that ⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj. • Both conditions are checked with the subsumption reasoning of ClassL.
Example • Suppose we need to classify “Professional Java, JDK-5th Edition” by W. Clay Richardson et al. • The document concept of this document d is: Cd = Java#3 ⊓ Programming#2. • Node 7 is the only node which conforms to the “get specific” rule. [Figure: classification tree by level: level 0 Subjects (1); level 1 Business and Investing (2), Computers and Internet (3); level 2 Small Business and Entrepreneurship (4), Programming (5); level 3 New Business Enterprises (6), Java Language (7); level 4 Java Beans (8).]
Example (cont.) • Suppose we need to classify “Visual Basic.Net Programming for Business” by Philip A. Koneman. • The document concept of this document d is: Cd = VisualBasicNet#1 ⊓ Programming#2 ⊓ Business#1. • Nodes 2 and 5 conform to the “get specific” rule. [Figure: the same classification tree, with nodes 1 Subjects, 2 Business and Investing, 3 Computers and Internet, 4 Small Business and Entrepreneurship, 5 Programming, 6 New Business Enterprises, 7 Java Language, 8 Java Beans.]
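The “get specific” rule and the two examples can be replayed with a small sketch. It rests on several assumptions that are not in the slides: concepts are conjunctions of atoms stored as frozensets (so C ⊑ D iff the atoms of D are contained in those of C), the node labels are hand-simplified to one made-up atom each with “Subjects” treated as the top concept, one piece of background knowledge (Programming#2 implies Computer#1) is added to each document concept, and “more specific” is read strictly so that equivalent concepts do not block each other.

```python
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 7}

# Hypothetical, hand-simplified label atoms for the eight nodes.
LABEL = {
    1: frozenset(),                       # Subjects, treated as the top concept
    2: frozenset({"Business#1"}),         # Business and Investing
    3: frozenset({"Computer#1"}),         # Computers and Internet
    4: frozenset({"SmallBusiness#1"}),    # Small Business and Entrepreneurship
    5: frozenset({"Programming#2"}),      # Programming
    6: frozenset({"NewEnterprise#1"}),    # New Business Enterprises
    7: frozenset({"Java#3"}),             # Java Language
    8: frozenset({"Bean#2"}),             # Java Beans
}

# Hypothetical background knowledge, applied in a single step.
BACKGROUND = {"Programming#2": {"Computer#1"}}


def concept_at_node(node):
    """Union of label atoms from the node up to the root (a conjunction)."""
    atoms = set()
    while node is not None:
        atoms |= LABEL[node]
        node = PARENT[node]
    return frozenset(atoms)


def document_concept(atoms):
    """Document atoms plus the atoms implied by the background knowledge."""
    closed = set(atoms)
    for atom in atoms:
        closed |= BACKGROUND.get(atom, set())
    return frozenset(closed)


def subsumed_by(c, d):
    """C ⊑ D under the conjunction-of-atoms encoding."""
    return d <= c


NODE_CONCEPTS = {n: concept_at_node(n) for n in PARENT}


def get_specific(cd, node_concepts):
    """Nodes that subsume cd and have no strictly more specific subsuming node."""
    candidates = [n for n, c in node_concepts.items() if subsumed_by(cd, c)]
    result = []
    for i in candidates:
        ci = node_concepts[i]
        blocked = any(
            j != i
            and subsumed_by(node_concepts[j], ci)
            and not subsumed_by(ci, node_concepts[j])
            for j in candidates
        )
        if not blocked:
            result.append(i)
    return sorted(result)


java_book = document_concept({"Java#3", "Programming#2"})
vb_book = document_concept({"VisualBasicNet#1", "Programming#2", "Business#1"})
```

Under these assumptions the Java book lands in node 7 only and the Visual Basic book in nodes 2 and 5, as on the slides; with real label formulas the subset test would be replaced by a ClassL subsumption check.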
What have we done so far? • We classified documents. • How? With the “get specific” algorithm! • But how do we implement the algorithm? With ClassL! We are reasoning with the ‘Concept Realization’ service of ClassL (with an empty ABox): ⊨ Cj ⊑ Ci and ⊨ Cd ⊑ Cj.
Outline • Ontologies • Lightweight Ontologies • Classifications • Optimization of Classifications • Document Classification in LOs • Query-answering in LOs • Semantic Matching
Intuitive Query-answering • Query-answering on a hierarchy of documents, based on a query q given as a set of keywords, is defined in two steps: • The ClassL proposition Cq is built from q by converting q’s keywords as described above. • The set of answers (retrieval set) to q is defined through a set of subsumption checking problems in Ground ClassL: Aq = {d ∈ document | T ⊨ Cd ⊑ Cq}.
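The retrieval set Aq can be sketched with a brute-force truth-table check, relying on the fact that T ⊨ Cd ⊑ Cq fails exactly when some truth assignment respects every axiom of T, makes Cd true and makes Cq false. The formula encoding (strings for atoms, tuples for "and", "or", "not"), the function names, the document d3 and the example axiom are all assumptions made for illustration; a real system would call a SAT solver instead of enumerating assignments.

```python
from itertools import product


def atoms(formula):
    """Set of propositional variables occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    result = set()
    for sub in formula[1:]:
        result |= atoms(sub)
    return result


def holds(formula, valuation):
    """Evaluate a formula under a truth assignment."""
    if isinstance(formula, str):
        return valuation[formula]
    op, *args = formula
    if op == "not":
        return not holds(args[0], valuation)
    if op == "and":
        return all(holds(a, valuation) for a in args)
    if op == "or":
        return any(holds(a, valuation) for a in args)
    raise ValueError(f"unknown connective: {op}")


def entails_subsumption(tbox, c, d):
    """T ⊨ C ⊑ D: no assignment respecting T makes C true and D false."""
    names = atoms(c) | atoms(d)
    for sub, sup in tbox:
        names |= atoms(sub) | atoms(sup)
    names = sorted(names)
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        respects_tbox = all(not holds(a, v) or holds(b, v) for a, b in tbox)
        if respects_tbox and holds(c, v) and not holds(d, v):
            return False
    return True


def answer(tbox, documents, cq):
    """Aq = {d | T ⊨ Cd ⊑ Cq}."""
    return {d for d, cd in documents.items() if entails_subsumption(tbox, cd, cq)}


DOCUMENTS = {
    "d1": ("and", "Java#3", "Programming#2"),
    "d2": ("and", "VisualBasicNet#1", "Programming#2", "Business#1"),
    "d3": "Java#3",  # hypothetical extra document
}
```

With an empty TBox the query `"Programming#2"` retrieves d1 and d2; adding the hypothetical axiom `("Java#3", "Programming#2")` also retrieves d3.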