Marko Grobelnik, Janez Brank, Blaž Fortuna, Igor Mozetič Contextualizing Ontologies With Ontolight: A Pragmatic Approach
Outline • Ontology • Ontolight • Definition • Grounding • Population • Applications • Integration in OntoGen • Demo
What is an ontology? • An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. • Generally it consists of • Classes: sets, collections, or types of objects • Instances: the basic or "ground level" objects • Relations: ways that objects can be related to one another • It can be used • … as a schema for a knowledge management system, • … to reason about the objects within that domain, • etc.
Examples of Real-world Ontologies • AgroVoc • Multilingual thesaurus for the fields of agriculture, forestry, fisheries, food security and related domains • Consists of • terms in different languages, • thesaurus relationships between terms • Broader, narrower, related • ASFA • Thesaurus used for annotating bibliographic records of aquatic science literature • EuroVoc • Multilingual thesaurus used by European institutions • The Acquis Communautaire corpus is annotated with EuroVoc • Cyc • Knowledge base, a formalization of fundamental human knowledge • Dmoz – The Open Directory Project • World's largest directory of the Web, maintained by volunteer editors
What is Ontolight? • Simple model covering most of the well-known light-weight ontologies • Stores an ontology as a rich graph • Defined as: • List of languages used for lexical terms (covers multilinguality) • List of class-types (types of nodes in the graph) • List of classes (nodes in the graph) • List of relation types (types of links in the graph) • List of relations (links in the graph) • Grounding model • A function which proposes a set of classes for a given instance • Corresponds to classification in machine learning
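The definition above can be sketched as a small data structure. This is only an illustration of the listed components, not Ontolight's actual storage format: all class, field, and method names (`Ontolight`, `OntoClass`, `Relation`, `grounding`, `neighbors`) and the toy thesaurus entries are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class OntoClass:
    """A node in the graph (hypothetical layout)."""
    class_id: str
    class_type: str
    terms: Dict[str, str]  # language code -> lexical term


@dataclass
class Relation:
    """A typed, directed link between two classes."""
    relation_type: str
    source: str
    target: str


@dataclass
class Ontolight:
    languages: List[str]
    class_types: List[str]
    relation_types: List[str]
    classes: Dict[str, OntoClass] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)
    # Grounding model: maps an instance (here, raw text) to suggested class ids.
    grounding: Optional[Callable[[str], List[str]]] = None

    def add_class(self, cls: OntoClass) -> None:
        if cls.class_type not in self.class_types:
            raise ValueError(f"unknown class type: {cls.class_type}")
        unknown = set(cls.terms) - set(self.languages)
        if unknown:
            raise ValueError(f"unknown languages: {sorted(unknown)}")
        self.classes[cls.class_id] = cls

    def add_relation(self, rel: Relation) -> None:
        if rel.relation_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {rel.relation_type}")
        if rel.source not in self.classes or rel.target not in self.classes:
            raise ValueError("both endpoints must be existing classes")
        self.relations.append(rel)

    def neighbors(self, class_id: str, relation_type: str) -> List[str]:
        """Targets reachable from class_id over links of the given type."""
        return [r.target for r in self.relations
                if r.source == class_id and r.relation_type == relation_type]


# Toy thesaurus with made-up entries (not taken from AgroVoc or EuroVoc).
onto = Ontolight(languages=["en", "de"],
                 class_types=["descriptor"],
                 relation_types=["broader", "narrower", "related"])
onto.add_class(OntoClass("agriculture", "descriptor",
                         {"en": "agriculture", "de": "Landwirtschaft"}))
onto.add_class(OntoClass("fisheries", "descriptor",
                         {"en": "fisheries", "de": "Fischerei"}))
onto.add_relation(Relation("broader", "fisheries", "agriculture"))
print(onto.neighbors("fisheries", "broader"))
```

Keeping class types and relation types as explicit lists lets one container hold thesauri, directories, and knowledge bases alike, since each source only differs in which types it declares.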
Grounding • Multiclass classification model trained on the instances of the ontology • For Dmoz: web pages • For EuroVoc: EU legislation • We used a centroid-based classifier • Calculates a centroid vector for each class • Uses knowledge of the hierarchy • Classification is performed by the kNN algorithm • Highly scalable: can handle hundreds of thousands of classes
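A minimal sketch of a centroid-based grounding model follows. The slides do not say how the hierarchy is used or how kNN is applied, so two assumptions are made here: each training instance also contributes to the centroids of its ancestor classes, and classification returns the k nearest centroids by cosine similarity. The function names, the whitespace tokenizer, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def bag_of_words(text):
    """Very crude tokenizer: lowercase and split on whitespace."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    return dict(counts)


def normalize(vec):
    """Scale a sparse vector to unit length (empty if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def train_centroids(instances, parent):
    """instances: list of (text, class_id); parent: child -> parent map.

    Assumption: an instance counts for its own class and every ancestor.
    The parent map must be acyclic (a tree or forest).
    """
    sums = defaultdict(lambda: defaultdict(float))
    for text, class_id in instances:
        vec = normalize(bag_of_words(text))
        node = class_id
        while node is not None:
            for k, v in vec.items():
                sums[node][k] += v
            node = parent.get(node)
    return {cls: normalize(s) for cls, s in sums.items()}


def ground(text, centroids, k=3):
    """Return up to k class ids whose centroids are nearest to the text."""
    query = normalize(bag_of_words(text))
    scored = sorted(((cosine(query, c), cls) for cls, c in centroids.items()),
                    reverse=True)
    return [cls for score, cls in scored[:k] if score > 0]


# Made-up training data for illustration only.
instances = [
    ("fishing boats catch fish at sea", "fisheries"),
    ("fish farming and fishing quotas", "fisheries"),
    ("wheat and corn crop harvest", "crops"),
    ("banks lend money at interest", "finance"),
]
parent = {"fisheries": "agriculture", "crops": "agriculture"}
centroids = train_centroids(instances, parent)
print(ground("fishing industry quotas", centroids, k=2))
```

Because only one vector is stored per class, scoring an instance costs one sparse dot product per class, which is one plausible reason such a model scales to very large class sets.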
Population • Takes an instance as input • Outputs a list of suggested classes • Example from EuroVoc • Instance: “Slovenia and Croatia are having a fishing industry” • Output:
OntoGen • Ontology construction and learning • Semi-Automatic: • Text-mining methods provide suggestions and insights into the domain • The user can interact with the parameters of the text-mining methods • All final decisions are taken by the user • Data-Driven: • Most of the aid provided by the system is based on underlying data provided by the user • Instances are described by features extracted from the data (e.g. bag-of-words vectors) • [Screenshots of the OntoGen interface; labeled parts: ontology visualization, concept hierarchy, selected concept, selected instance, concept's details, list of suggested sub-concepts, keywords, concept's instance management]
Contextualized ontology generation • Ontolight is integrated with OntoGen • Helps generate a new ontology by reusing existing ontologies • The user loads an Ontolight ontology into OntoGen at startup • Suggestion methods: • Concept suggestion • Offers concepts from the loaded Ontolight as possible sub-concepts • Name suggestion • Offers names of concepts from Ontolight as possible concept names • All suggestions are integrated in a semi-automatic manner
Concept suggestion • The user selects a concept • The user selects an Ontolight ontology • OntoGen classifies each document into the context (the selected Ontolight ontology) • The concepts that receive the most documents are offered to the user as suggestions
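The steps above amount to classifying every document and counting how often each context concept is proposed. A minimal sketch, assuming the grounding model is available as a function from a document to a list of concept ids; `suggest_concepts`, `toy_classify`, and the sample documents are hypothetical and do not reflect OntoGen's actual API.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def suggest_concepts(documents: Iterable[str],
                     classify: Callable[[str], List[str]],
                     top_n: int = 5) -> List[Tuple[str, int]]:
    """Rank context concepts by the number of documents classified into them."""
    counts = Counter()
    for doc in documents:
        # A document counts at most once per concept.
        counts.update(set(classify(doc)))
    return counts.most_common(top_n)


def toy_classify(doc: str) -> List[str]:
    """Stand-in for a real grounding model: simple keyword lookup."""
    keywords = {"fish": "fisheries", "bank": "banking", "crop": "crops"}
    text = doc.lower()
    return [concept for word, concept in keywords.items() if word in text]


docs = [
    "Fish exports rise",
    "Bank reports fish-farm loans",
    "Crop prices fall",
]
suggestions = suggest_concepts(docs, toy_classify, top_n=2)
print(suggestions)
```

The counts are returned with the concept ids so that the user, who makes the final decision, can see how strongly each suggestion is supported.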
Name suggestion • The user selects a concept • OntoGen classifies each document into the context (the loaded Ontolight ontologies) • The names of the concepts that receive the most classified documents are offered to the user as suggestions
Demo • AgroVoc and EuroVoc applied to Yahoo Finance data