Semi-Automatic Data-Driven Ontology Construction System
Blaz Fortuna, Marko Grobelnik, Dunja Mladenic
Jozef Stefan Institute
Main features of OntoGen
• Semi-Automatic
  • Text-mining methods provide suggestions and insights into the domain
  • The user can adjust the parameters of the text-mining methods
  • All final decisions are made by the user
• Data-Driven
  • Most of the aid provided by the system is based on underlying data provided by the user
  • Instances are described by features extracted from the data (e.g. bag-of-words vectors)
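The bag-of-words representation mentioned above can be sketched as follows. This is a minimal illustration and not OntoGen's own code: the tokenizer, the function names `tokenize` and `bag_of_words`, and the two sample documents are assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, vectors): one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors

# Hypothetical sample documents, used only for illustration.
docs = ["Oil prices rise", "Oil exports fall as prices fall"]
vocabulary, vectors = bag_of_words(docs)
```

Each document becomes a vector of term counts over a shared vocabulary, so word order is discarded and only term frequencies remain.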
OntoGen v1.0
• Designed for the construction of topic ontologies
• Clustering algorithms are used for topic suggestion
• Keyword extraction methods help the user name the concepts
• Interactive user interface
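A rough sketch of how clustering can suggest topics and how keywords can suggest concept names. The slides do not name the algorithms, so everything here is an assumption: k-means with cosine similarity stands in for the clustering step, the highest-weighted centroid terms stand in for keyword extraction, and the documents are made up.

```python
import math
from collections import Counter

def unit(vec):
    """Scale a sparse term-weight vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def centroid(vectors):
    """Unit-length sum of the given sparse vectors."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return unit(total)

def kmeans(vectors, k, iterations=10):
    """Spherical k-means, seeded with the first k vectors for determinism."""
    centroids = list(vectors[:k])
    assignment = []
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

def keywords(centroid_vec, n=2):
    """The n highest-weighted terms of a centroid, as name suggestions."""
    ranked = sorted(centroid_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]

# Hypothetical documents, used only for illustration.
docs = ["oil crude barrel oil", "wheat grain harvest wheat",
        "crude oil exports", "grain wheat shipment"]
vectors = [unit(Counter(d.split())) for d in docs]
assignment, centroids = kmeans(vectors, k=2)
suggestions = [keywords(c) for c in centroids]
```

Each cluster is a candidate sub-concept and its keyword list is a candidate name. In a semi-automatic setting the user would accept, rename, or discard each suggestion.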
OntoGen v2.0
• Improved user interface
  • Based on the feedback from users
• New features:
  • Active Learning
    • Learning new concepts based on user queries and user classification of carefully selected documents
  • Simultaneous Ontologies
    • Optimization of similarity measure based on provided document categories
  • Concept’s Instances Visualization
    • Integration of Document Atlas visualization
  • Ontology Population
    • Interactive classification of new instances into ontology
[Screenshot. Labelled panels: Concept hierarchy; Sub-Concept suggestion; Ontology visualization]
[Screenshot. Labelled panels: Concept hierarchy; Concept’s documents management; Selected concept’s details]
Active Learning
• Active learning algorithm based on distance to the SVM hyperplane
• The first few labelled documents are obtained (bootstrapped) from a user query and nearest-neighbour search
• In each step, the unlabelled document closest to the hyperplane is chosen for classification by the user
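The loop described above can be sketched as follows. It is a simplified illustration under stated assumptions, not OntoGen's implementation: the SVM is trained with plain subgradient descent on the hinge loss, the callback `ask_user` stands in for the user's yes/no answer, and the bootstrap rule (label nearest neighbours of the query until both classes appear) is one possible reading of the slide.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # example violates the margin
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def most_uncertain(w, b, X, candidates):
    """Index of the candidate closest to the hyperplane w.x + b = 0."""
    return min(candidates, key=lambda i: abs(dot(w, X[i]) + b))

def active_learning(X, query, ask_user, steps):
    """ask_user(i) is a stand-in for the user and returns +1 or -1."""
    # Bootstrap (assumed rule): label the query's nearest neighbours
    # until both a positive and a negative example have been seen.
    ranked = sorted(range(len(X)), key=lambda i: -cosine(query, X[i]))
    labels = {}
    for i in ranked:
        labels[i] = ask_user(i)
        if len(set(labels.values())) == 2:
            break
    for _ in range(steps):
        unlabeled = [i for i in range(len(X)) if i not in labels]
        if not unlabeled:
            break
        idx = sorted(labels)
        w, b = train_linear_svm([X[i] for i in idx], [labels[i] for i in idx])
        pick = most_uncertain(w, b, X, unlabeled)  # the next question
        labels[pick] = ask_user(pick)
    return labels

# Hypothetical 2-D document vectors and a simulated user.
X = [[3, 0], [2, 0], [2, 1], [1, 2], [0, 2], [0, 3]]
truth = [1, 1, 1, -1, -1, -1]
labels = active_learning(X, query=[1, 0], ask_user=lambda i: truth[i], steps=1)
```

Asking about the document nearest the hyperplane targets the example the current classifier is least sure about, which is why a few answers can be enough to define a new concept.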
Simultaneous Ontologies
• Data: Reuters news articles
• Each news article is assigned two different sets of categories:
  • Topics
  • Countries
• Each set of categories offers a different view of the data
[Figure. Labels: Topics view; Countries view; Documents]
Ontology Population
• One-vs-all linear SVMs are used for classification
• Interactive user interface in which the user can finalize the classifications
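A minimal sketch of one-vs-all classification with linear SVMs, assuming one binary classifier per concept and a ranking by classifier score. The trainer (subgradient descent on the hinge loss), the names `train_one_vs_all` and `rank_concepts`, the concept names, and the data are all assumptions made for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # example violates the margin
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def train_one_vs_all(X, labels):
    """One binary SVM per concept: that concept (+1) vs all others (-1)."""
    models = {}
    for concept in sorted(set(labels)):
        y = [1 if label == concept else -1 for label in labels]
        models[concept] = train_linear_svm(X, y)
    return models

def rank_concepts(models, x):
    """(concept, score) pairs for document x, highest score first."""
    scores = {c: dot(w, x) + b for c, (w, b) in models.items()}
    return sorted(scores.items(), key=lambda cs: -cs[1])

# Hypothetical documents already placed in the ontology.
X = [[2, 0, 0], [3, 0, 0], [0, 2, 0], [0, 3, 0], [0, 0, 2], [0, 0, 3]]
labels = ["oil", "oil", "wheat", "wheat", "gold", "gold"]
models = train_one_vs_all(X, labels)

# A new document: the ranking is a suggestion the user can confirm or override.
ranking = rank_concepts(models, [2, 0, 1])
suggested = ranking[0][0]
```

Returning the full ranking rather than a single label fits the interactive step: the user sees the candidate concepts in order and makes the final assignment.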
[Screenshot. Labelled panels: New documents; Classification of the selected document; Selected document]