Comprehensive Guide to Ontology Learning: Methods, Tools, and Applications

Ontology Learning Μπαλάφα Κάσσυ Πλασταρά Κατερίνα

Contents • Introduction – Ontologies, Ontology learning • Technical description • Ontology learning in the Semantic Information description • Ontology Learning – Process • Ontology Learning - Architecture • Ontology Learning data sources • Methods used in ontology learning • Tools of ontology learning • Uses of ontology learning

Ontologies • Provide a formal, explicit specification of a shared conceptualization of a domain that can be communicated between people and heterogeneous, widely spread application systems. • They have been developed in Artificial Intelligence and Machine Learning to facilitate knowledge sharing and reuse. • Unlike knowledge bases, ontologies have “all in one”: • formal or machine-readable representation • full and explicitly described vocabulary • full model of some domain • consensus knowledge: common understanding of a domain • easy to share and reuse

Ontology learning - General • Machine learning of ontologies • Main task: to automatically learn complicated domain ontologies • Explores the application of knowledge discovery techniques to different data sources (HTML documents, dictionaries, free text, legacy ontologies, etc.) in order to support the task of engineering and maintaining ontologies

Ontology learning – Technical description • The manual building of ontologies is a tedious task, which can easily result in a knowledge acquisition bottleneck. In addition, manual modeling by human experts is biased, error-prone and expensive • Fully automatic machine knowledge acquisition remains in the distant future • Most systems are semi-automatic and require human (expert) intervention and balanced cooperative modeling for constructing ontologies

Semantic Information Integration

Ontology Engineering

Ontology learning – Process (1/2)

Ontology learning – Process (2/2) • Stages analysis: • Merging existing structures or defining mapping rules between these structures allows importing and reusing existing ontologies • Ontology extraction models major parts of the target ontology, with learning support fed from various input sources • The target ontology’s rough outline, which results from import, reuse and extraction, is pruned to better fit the ontology to its primary purpose • Ontology refinement profits from the pruned ontology but completes the ontology at a fine granularity (in contrast to extraction) • The target application serves as a measure for validating the resulting ontology • The ontology engineer can begin this cycle again, for example to include new domains in the constructed ontology or to maintain and update its scope

Ontology learning – Architecture (1/5)

Ontology learning – Architecture (2/5) • Ontology Engineering Workbench: A sophisticated means for manual modeling and refining of the final ontology. The ontology engineer can browse the resulting ontology from the ontology learning process and decide to follow, delete or modify the proposals as the task requires.

Ontology learning – Architecture (3/5) • Management component: The ontology engineer uses the management component to select input data, that is, relevant resources such as HTML and XML documents, DTDs, databases or existing ontologies that the discovery process can further exploit. Then, using the management component, the engineer chooses from a set of resource-processing methods available in the resource-processing component and from a set of algorithms available in the algorithm library.

Ontology learning – Architecture (4/5) • Resource processing component: Depending on the available data, the engineer can choose various strategies for resource processing: • Index and reduce HTML documents to free text • Transform semi-structured documents such as dictionaries into a predefined relational structure • Handle semi-structured and structured schema data by following different strategies for import • Process free natural text. After preprocessing the data according to one of these or similar strategies, the resource processing module transforms the data into an algorithm-specific relational representation.
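The first strategy, reducing HTML documents to free text, can be illustrated with a minimal Python sketch. It uses only the standard library's `html.parser`; the class name `TextExtractor`, the helper `html_to_text` and the sample page are hypothetical and are not part of any tool described in this presentation.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Reduce an HTML document to whitespace-joined free text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page, for illustration only.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Hotels</h1><p>A hotel offers <b>rooms</b>.</p></body></html>"
)
plain = html_to_text(sample)
```

The resulting free text could then be handed to whatever natural-language processing step the engineer selects; indexing is left out of this sketch.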

Ontology learning – Architecture (5/5) • Algorithm library: A collection of various algorithms that work on the ontology definition and the preprocessed input data. Although specific algorithms can vary greatly from one type of input to the next, a considerable overlap exists for underlying learning approaches such as association rules, formal concept analysis or clustering.

Ontology Learning from Natural Language • Natural language texts exhibit morphological, syntactic, semantic, pragmatic and conceptual constraints that interact in order to convey a particular meaning to the reader. Thus, the text transports information to the reader and the reader embeds this information into his background knowledge • Through the understanding of the text, data is associated with conceptual structures and new conceptual structures are learned from the interacting constraints given through language • Tools that learn ontologies from natural language exploit the interacting constraints on the various language levels (from morphology to pragmatics and background knowledge) in order to discover new concepts and stipulate relationships between concepts

Ontology Learning from Semi-structured Data • HTML data, XML data, XML DTDs, XML Schemata and the like add more or less expressive semantic information to documents • A number of approaches understand ontologies as a common generalizing level that may mediate between the various data types and data descriptions. Ontologies play a major role in allowing semantic access to these vast resources of semi-structured data • Learning ontologies from these data and data descriptions may considerably promote the application of ontologies and thus facilitate access to these data

Ontology Learning from Structured Data • The learning of ontologies from metadata, such as database schemata, in order to derive a common high-level abstraction of underlying data descriptions can be an important precondition for data warehousing or intelligent information agents
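As a rough illustration of deriving a high-level abstraction from a database schema, the following Python sketch reads an SQLite schema and proposes one candidate concept per table, its columns as attributes, and foreign keys as candidate relations. The mapping rule (table to concept, foreign key to relation) is an assumption made for this example, and the `hotel`/`room` schema is invented.

```python
import sqlite3


def schema_to_ontology(conn):
    """Map tables to candidate concepts and foreign keys to candidate relations."""
    concepts = {}
    relations = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        concepts[table] = [col[1] for col in columns]  # col[1] is the column name
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            # fk[2] is the referenced table, fk[3] is the referencing column
            relations.append((table, fk[3], fk[2]))
    return concepts, relations


# Invented toy schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE room (
        id INTEGER PRIMARY KEY,
        price REAL,
        hotel_id INTEGER REFERENCES hotel(id)
    );
""")
concepts, relations = schema_to_ontology(conn)
```

Here `relations` contains a single candidate link from `room` to `hotel` via `hotel_id`; an ontology engineer would still have to name and validate such a relation.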

Methods for learning ontologies (1/8) • Clustering • The elaboration of any clustering method involves the definition of two main elements: a distance metric and a classification algorithm • The Mo’K workbench supports the development of conceptual clustering methods for the (semi-)automatic construction of ontologies of a conceptual hierarchy type from parsed corpora

Methods for learning ontologies (2/8) • Clustering • Ontologies are organized as multiple hierarchies that form an acyclic graph where nodes are term categories described by intension and links represent inclusion. • Learning through hierarchical classification of a set of objects can be performed in two main ways: top-down, by incremental specialization of classes, and bottom-up, by incremental generalization
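The two elements named above, a distance metric and a classification algorithm, can be made concrete with a small bottom-up example in Python. This is only a sketch under assumptions: Jaccard distance over sets of context words stands in for the metric, a greedy closest-pair merge stands in for the algorithm, and the terms and their contexts are invented. It is not the method implemented in Mo’K.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance metric: 1 minus the overlap ratio of two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def bottom_up_hierarchy(term_features):
    """Classification algorithm: repeatedly merge the two closest clusters.

    Returns the list of merges (left, right, distance) from most specific
    to most general; each merge corresponds to a candidate parent class.
    """
    clusters = {(term,): set(features) for term, features in term_features.items()}
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: jaccard_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = jaccard_distance(clusters[left], clusters[right])
        merged = tuple(sorted(left + right))
        # The merged cluster is described by the union of its members' features.
        clusters[merged] = clusters.pop(left) | clusters.pop(right)
        merges.append((left, right, round(distance, 3)))
    return merges


# Invented terms with the verbs they co-occur with, for illustration only.
terms = {
    "car": {"drive", "park", "repair"},
    "truck": {"drive", "park", "load"},
    "apple": {"eat", "peel"},
    "pear": {"eat", "peel", "pick"},
}
merges = bottom_up_hierarchy(terms)
```

With this toy data the fruit terms are generalized first, then the vehicle terms, and finally both groups are joined under a single root. A top-down variant would instead start from one class and split it incrementally.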

Methods for learning ontologies (3/8) • Information Extraction Rules

Methods for learning ontologies (4/8) • Information Extraction Rules • We start with: • An initial hand-crafted seed ontology of reasonable quality which already contains the relevant types of relationships between ontology concepts in the given domain • An initial set of documents which (informally) exemplify substantial parts of the knowledge represented in the seed ontology

Methods for learning ontologies (5/8) • Information Extraction Rules • Compared to other ontology learning approaches, this technique is not restricted to learning taxonomy relationships but can learn arbitrary relationships in an application domain. • A project that uses this technique is the FRODO project.

Methods for learning ontologies (6/8) • Association Rules • Association-rule-learning algorithms are used for prototypical applications of data mining and for finding associations that occur between items in order to construct ontologies (extraction stage) • ‘Classes’ are expressed by the expert as a free text conclusion to a rule. Relations between these ‘classes’ may be discovered from existing knowledge bases and a model of the classes is constructed (ontology) based on user-selected patterns in the class relations • This approach is useful for solving classification problems by creating classification taxonomies (ontologies) from rules
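To make the idea of finding associations between items concrete, here is a minimal Python sketch that computes support and confidence for pairwise rules. The transactions (sets of terms that co-occur, for example in one sentence), the thresholds and the function name `association_rules` are assumptions for illustration; a real system would use a full algorithm such as Apriori over larger item sets.

```python
from itertools import permutations


def association_rules(transactions, min_support, min_confidence):
    """Return pairwise rules (antecedent, consequent, support, confidence)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for antecedent, consequent in permutations(items, 2):
        both = sum(1 for t in transactions if antecedent in t and consequent in t)
        ante = sum(1 for t in transactions if antecedent in t)
        support = both / n                          # how often the pair occurs
        confidence = both / ante if ante else 0.0   # P(consequent | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return rules


# Invented co-occurrence data, for illustration only.
transactions = [
    {"hotel", "room", "price"},
    {"hotel", "room"},
    {"hotel", "city"},
    {"room", "price"},
]
rules = association_rules(transactions, min_support=0.5, min_confidence=0.75)
```

With these thresholds only the rule from `price` to `room` survives, which an engineer might read as a hint that the two concepts are related in the domain.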

Methods for learning ontologies (7/8) • Association Rules – Example • A classification knowledge-based system with experimental results based on medical data (Suryanto & Compton, Australia) • Ripple Down Rules (RDR) were used to describe classes and their attributes, for example the conclusion “Satisfactory lipid profile previous raised LDL noted” with the condition ((LDL <= 3.4) AND (Triglyceride is NORMAL) AND (Max(LDL) > 3.4)) OR ((LDL is NORMAL) AND (Triglyceride is NORMAL) AND (Max(LDL) is HIGH)) • Experts were allowed to modify or add conclusions in order to correct errors • The conclusions of the rules formed the classes of the classification ontology

Methods for learning ontologies (8/8) • Association Rules – Example • Ontology learning methodology used: • Firstly, class relations between rules were discovered. There were three basic relations: subsumption/intersection, mutual exclusivity and similarity • Secondly, more compound relations that appeared interesting were specified using the three basic relations • Finally, instances of these compound relations or patterns were extracted and the class model was assembled • Problems that occurred: • Very similar conclusions were sometimes identified as mutually exclusive in cases where there were different values for the same attribute • The method did not consider any other information about the classes themselves
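The following Python sketch is a deliberately simplified illustration of how basic relations between rule-defined classes might be detected. It is not the procedure used by Suryanto and Compton: each rule is reduced to a flat attribute-to-value dictionary, the class names and rule contents are invented, and the tests for each relation are assumptions. It does reproduce the first problem listed above, where two otherwise similar classes are flagged as mutually exclusive because one attribute differs.

```python
def relation(conditions_a, conditions_b):
    """Classify the relation between two classes from their rule conditions."""
    shared_attrs = conditions_a.keys() & conditions_b.keys()
    # Assumption: a conflicting value on any shared attribute means exclusivity.
    if any(conditions_a[attr] != conditions_b[attr] for attr in shared_attrs):
        return "mutually exclusive"
    # Fewer conditions means a more general class.
    if conditions_a.items() <= conditions_b.items():
        return "subsumes"
    if conditions_b.items() <= conditions_a.items():
        return "subsumed by"
    if shared_attrs:
        return "intersects"
    return "unrelated"


# Invented toy rules using attribute names from the example slide.
rules = {
    "normal profile": {"LDL": "NORMAL", "Triglyceride": "NORMAL"},
    "normal profile, LDL previously raised": {
        "LDL": "NORMAL", "Triglyceride": "NORMAL", "Max(LDL)": "HIGH"},
    "raised LDL": {"LDL": "HIGH", "Triglyceride": "NORMAL"},
}

general_vs_specific = relation(
    rules["normal profile"], rules["normal profile, LDL previously raised"])
similar_but_conflicting = relation(rules["normal profile"], rules["raised LDL"])
```

In this toy setting `general_vs_specific` is "subsumes", while `similar_but_conflicting` is "mutually exclusive" even though the two classes agree on everything except the LDL value.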

Ontology learning tools – ASIUM (1/8) • Acronym for "Acquisition of Semantic knowledge Using Machine learning method" • The main aim of Asium is to help the expert in the acquisition of semantic knowledge from texts and to generalize the knowledge of the corpus • Asium provides the expert with an interface which first helps him or her to explore the texts and then to learn knowledge that is not explicitly in the texts • During the learning step, Asium helps the expert to acquire semantic knowledge from the texts, such as subcategorization frames and an ontology. The ontology represents an acyclic graph of the concepts of the studied domain. The subcategorization frames represent the use of the verbs in these texts

Ontology learning tools – ASIUM (2/8) • Methodology: The input for Asium is syntactically parsed text from a specific domain. Asium then extracts triplets of the form: verb, preposition/function (the function if there is no preposition), lemmatized head noun of the complement. Next, using factorization, Asium groups together all the head nouns occurring with the same (verb, preposition/function) pair. These lists of nouns are called basic clusters. They are linked with the (verb, preposition/function) pairs they come from.
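The factorization step described above can be sketched in a few lines of Python. The triplets are invented sample data and the function name `basic_clusters` is hypothetical; the sketch also ignores occurrence frequencies, which a real system would probably keep.

```python
from collections import defaultdict


def basic_clusters(triples):
    """Group head nouns by the (verb, preposition/function) pair they occur with."""
    clusters = defaultdict(set)
    for verb, prep_or_function, head_noun in triples:
        clusters[(verb, prep_or_function)].add(head_noun)
    return dict(clusters)


# Invented (verb, preposition/function, head noun) triplets, for illustration only.
triples = [
    ("travel", "by", "car"),
    ("travel", "by", "train"),
    ("drive", "object", "car"),
    ("drive", "object", "truck"),
    ("travel", "by", "car"),   # repeated occurrence, collapsed by the set
]
clusters = basic_clusters(triples)
```

Each key of `clusters` is a (verb, preposition/function) pair and each value is the basic cluster of nouns linked to it.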

Ontology learning tools – ASIUM (3/8) • Methodology: Asium then computes the similarity between all pairs of basic clusters. The nearest ones are aggregated, and this aggregation is suggested to the expert for creating a new concept. The expert defines a minimum threshold for gathering clusters into concepts. A learned concept can contain noise (e.g. mistakes in the parsing), may include sub-concepts the expert wants to identify, or may be over-general because of the aggregations, so the expert’s contribution is necessary.
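A minimal Python sketch of this proposal step follows. Jaccard overlap is used here as a stand-in similarity and is not Asium's own measure; the clusters, the threshold value and the function names are likewise assumptions for illustration. The expert's validation is left outside the code.

```python
from itertools import combinations


def overlap_similarity(a, b):
    """Stand-in similarity: shared nouns divided by all nouns (Jaccard)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_aggregations(clusters, threshold):
    """Suggest cluster pairs whose similarity reaches the expert's threshold."""
    proposals = []
    for key_a, key_b in combinations(sorted(clusters), 2):
        score = overlap_similarity(clusters[key_a], clusters[key_b])
        if score >= threshold:
            candidate_concept = clusters[key_a] | clusters[key_b]
            proposals.append((score, key_a, key_b, candidate_concept))
    # Nearest (most similar) pairs first, so the expert sees them first.
    proposals.sort(key=lambda p: p[0], reverse=True)
    return proposals


# Invented basic clusters, for illustration only.
clusters = {
    ("travel", "by"): {"car", "train", "bus"},
    ("drive", "object"): {"car", "bus", "truck"},
    ("eat", "object"): {"apple", "pear"},
}
proposals = propose_aggregations(clusters, threshold=0.4)
```

Here a single aggregation is proposed, joining the nouns seen with "travel by" and "drive"; the expert could accept it as a vehicle-like concept, split it, or reject it as noise.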

Ontology learning tools – ASIUM (4/8) • Methodology: After this, Asium will have learned the first level of the ontology. Asium then computes similarity again, this time among all the clusters, both old and new, in order to learn the next level of the ontology. The cooperative process runs until no more aggregations are possible. The output of the learning process is an ontology and subcategorization frames. The ontology represents an acyclic graph of the concepts of the studied domain. The subcategorization frames represent the use of the verbs in these texts.

Ontology learning tools – ASIUM (5/8) • Methodology • The advantages of this method are twofold: • First, the similarity measure identifies all concepts of the domain and the expert can validate or split them. The learning process is then based in part on these new concepts and suggests more relevant and more general concepts. • Second, the similarity measure offers the expert aggregations between already validated concepts and new basic clusters in order to get more knowledge from the corpus.

Ontology learning tools – ASIUM (6/8) • The interface: This window allows the expert to validate the concepts learned by Asium.

Ontology learning tools – ASIUM (7/8) • The interface: This window displays the list of all the examples covered by the learned concept. This display allows the expert to see all the sentences that will be allowed if this class is validated.

Ontology learning tools – ASIUM (8/8) • The interface: This window displays the ontology as it actually is in memory, i.e. learned concepts and concepts to be proposed for a level (each blue circle represents a class).

Ontology learning tools – TEXT-TO-ONTO (1/8) • It supports semi-automatic ontology learning from text • It tries to overcome the knowledge acquisition bottleneck • It is based on a general architecture for discovering conceptual structures and engineering ontologies from text

Ontology learning tools – TEXT-TO-ONTO (2/8)

Ontology learning tools – TEXT-TO-ONTO (3/8) • Architecture

Ontology learning tools – TEXT-TO-ONTO (4/8) • Architecture - Main components • Text & Processing Management Component • The ontology engineer uses this component to select the domain texts exploited in the further discovery process. The engineer can choose among a set of text (pre-)processing methods available on the Text Processing Server and among a set of algorithms available in the Learning & Discovering component. The Text Processing Server returns text annotated with XML, and this XML-tagged text is fed to the Learning & Discovering component

Ontology learning tools – TEXT-TO-ONTO (5/8) • Architecture - Main components • Text Processing Server • It contains a shallow text processor based on the core system SMES. SMES is a system that performs syntactic analysis on natural language documents • It is organized in modules, such as a tokenizer, morphological and lexical processing, and chunk parsing, that use lexical resources to produce mixed syntactic/semantic information • The results are stored as annotations in XML-tagged text
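To give a feel for what storing results as XML-tagged text can look like, here is a small Python sketch that tokenizes a sentence and serializes the tokens as XML with the standard library. It does not reproduce SMES or its annotation format: the element and attribute names (`sentence`, `token`, `type`, `lower`) and the function `annotate` are invented for this example, and no morphological analysis or chunk parsing is attempted.

```python
import re
import xml.etree.ElementTree as ET


def annotate(sentence):
    """Tokenize a sentence and return it as XML-tagged text."""
    root = ET.Element("sentence")
    # Words are runs of word characters; every other non-space symbol is its own token.
    for token in re.findall(r"\w+|[^\w\s]", sentence):
        kind = "word" if token[0].isalnum() else "punct"
        element = ET.SubElement(root, "token", {"type": kind, "lower": token.lower()})
        element.text = token
    return ET.tostring(root, encoding="unicode")


xml_text = annotate("Hotels offer rooms.")
```

A downstream learning component could parse `xml_text` again and read the `type` and `lower` attributes instead of re-tokenizing the raw text.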

Ontology learning tools – TEXT-TO-ONTO (6/8) • Architecture - Main components • Lexical DB & Domain Lexicon • SMES accesses a lexical database with more than 120,000 stem entries and more than 12,000 subcategorization frames that are used for lexical analysis and chunk parsing • The domain-specific part of the lexicon associates word stems with concepts available in the concept taxonomy and links syntactic information with semantic knowledge that may be further refined in the ontology

Ontology learning tools – TEXT-TO-ONTO (7/8) • Architecture - Main components • Learning & Discovering component • Uses various discovery methods on the annotated texts, e.g. term extraction methods for concept acquisition.
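Term extraction for concept acquisition can be approximated, in its simplest form, by counting content words across a corpus. The Python sketch below is only that simplest form, assuming a tiny hand-written stopword list, plain frequency as the ranking and invented sample texts; it is not the extraction method used in TEXT-TO-ONTO.

```python
import re
from collections import Counter

# Assumed minimal stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "is", "are",
             "to", "for", "with", "each", "has"}


def candidate_terms(texts, min_count=2):
    """Return (term, count) pairs for content words seen at least min_count times."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Invented domain texts, for illustration only.
texts = [
    "The hotel has a room with a balcony.",
    "Each room in the hotel is cleaned daily.",
    "The hotel is in the city.",
]
terms = candidate_terms(texts)
```

The frequent terms, here `hotel` and `room`, would be offered to the ontology engineer as candidate concepts rather than added automatically.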

Ontology learning tools – TEXT-TO-ONTO (8/8) • Architecture - Main components • Ontology Engineering Environment - ONTOEDIT • Supports the ontology engineer in semi-automatically adding newly discovered conceptual structures to the ontology • Internally stores modeled ontologies using an XML serialization

Uses of ontology learning – Knowledge sharing (1/2) • Identifying candidate relations between expressive, diverse ontologies using concept cluster integration in multi-agent systems • Agents with diverse ontologies should be able to share knowledge by automated learning methods and agent communication strategies • Agents that do not know the relationships of their concepts to each other need to be able to teach each other these relationships (ontology learning)
