An Ontology Creation Methodology: A Phased Approach
Jon Atle Gulla, Norwegian University of Science and Technology, Norway, jag@idi.ntnu.no
Vijay Sugumaran, Oakland University, USA, sugumara@oakland.edu
Agenda • Ontology development • Traditional ontology learning • Limitations of ontology learning • A phased approach to ontology learning
The Challenge • How to develop large complex ontologies? • How to keep ontologies updated in dynamic domains?
Ontology Modeling vs. Learning
Traditional ontology engineering approach: • Project: Form team of ontology and domain experts • Ontology & domain experts: Collaborative manual modeling process • Domain experts: Verify ontology against domain knowledge • Ontology experts: Verify ontology against syntactic and semantic quality measures • Expensive and time-consuming approach • Stable domains assumed
Ontology learning approach: • Domain experts: Find representative domain text • Tool: Extract candidate classes, individuals and properties automatically from domain texts • Ontology & domain experts: Verify candidate structures and complete ontology • Can also be used to verify domain quality of existing ontology • Cost-effective approach • Not unproblematic in dynamic domains
Ontology Learning Basis • People communicate using domain-specific concepts • People document using domain-specific concepts • Ontology learning: Extract ontology structures from written documentation • Requirements: • Documents are representative of the domain terminology • Documents cover all the terminology • Terminology is well defined and used consistently in the domain
[Figure labels: Realm of ontology engineering; Ontology discussions; Realm of ontology learning; Ontology in use]
Levels of Ontology Learning (listed from highest to lowest degree of difficulty)
• Rules: ∀x,y (manager(x,y) → report(y,x))
• Relations: FINANCE(ag:SPONSOR, go:PROJECT)
• Concept hierarchies: is_a(MANAGER, EMPLOYEE)
• Concepts: PROJECT
• Synonyms: (leader, manager, lead)
• Terms: sponsors, costs, charter
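As a rough illustration of what each level contributes, the sketch below stores the slide's examples in a small Python structure and checks the rule-level example against a few facts. The class name `OntologySketch`, the helper `rule_holds`, the mapping of the three synonyms to a MANAGER concept, and the individuals "ann" and "bob" are all assumptions made for this example; the rule is read here as universally quantified over x and y.

```python
from dataclasses import dataclass, field


@dataclass
class OntologySketch:
    """Hypothetical container with one field per learning level."""
    terms: set = field(default_factory=set)          # raw extracted terms
    synonyms: dict = field(default_factory=dict)     # term -> concept it names
    concepts: set = field(default_factory=set)       # concept identifiers
    is_a: set = field(default_factory=set)           # (sub, super) pairs
    relations: set = field(default_factory=set)      # (name, domain, range)


def rule_holds(manager_facts, report_facts):
    """Check manager(x, y) -> report(y, x) for every known manager fact."""
    return all((y, x) in report_facts for (x, y) in manager_facts)


onto = OntologySketch(
    terms={"sponsors", "costs", "charter"},
    # Assumption: the three synonyms all name a MANAGER concept.
    synonyms={"leader": "MANAGER", "manager": "MANAGER", "lead": "MANAGER"},
    concepts={"PROJECT", "MANAGER", "EMPLOYEE", "SPONSOR"},
    is_a={("MANAGER", "EMPLOYEE")},
    relations={("FINANCE", "SPONSOR", "PROJECT")},
)

# Hypothetical individuals used only to exercise the rule.
manager_facts = {("ann", "bob")}
report_facts = {("bob", "ann")}
print(rule_holds(manager_facts, report_facts))
```

Each level builds on the one below it: synonyms group terms, concepts name those groups, and hierarchies, relations and rules are stated over concepts.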
Ontology Learning Strategies
• Term extraction: Linguistic analysis, Statistical analysis
• Synonyms: Classification-based techniques, Distribution-based techniques
• Concept formation: Structure recognition, Keyphrase generation, Instance learning
• Concept hierarchy: Clustering, Lexico-syntactic patterns, Head-modifier approaches, Subsumption approaches, Classification-based techniques
• Relations: Association rules, Concept vectors
• Rules: Structure recognition for meta-property recognition, Dependency trees and path similarities
Ontology Learning Process
[Figure: process diagram. Labels: PMBOK; Domain text; Automatic extraction of concept and relationship candidates; Concept candidates; Reference set; Manual selection of candidates and completion of model; Search ontology. Example elements shown: Scope management, WBS, Business need, Constituent components, Product description, ..., Abstract elements, Constraints, Properties, Rules]
Ex 1. Learning Concept/Individual Candidates
Input: Scope planning is the process of progressively elaborating and documenting the project work (project scope) that produces the product of the project.
• POS tagging: Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN the/DT project/NN ./.
• Stopword removal (571 words) and lemmatization/stemming (POS tags not shown): Scope plan process progress elaborate document project work project scope produce product project
• Select consecutive nouns as candidate phrases: {scope planning, process, project work, project scope, product, project}
• Calculate tf.idf score for phrases: {(scope planning, 0.0097), (project scope, 0.0047), (product, 0.0043), (project work, 0.0008), (project, 0.0001), (process, 0.0000)}
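The candidate-extraction steps in Ex 1 can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the project: it starts from the POS-tagged sentence on the slide, skips explicit stopword removal and lemmatization, and scores phrases with a plain tf.idf over a tiny made-up background corpus. The function names (`parse_tagged`, `noun_phrases`, `tf_idf`) and the two background documents are assumptions, so the scores will not match the slide's values.

```python
import math
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

TAGGED = (
    "Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB "
    "elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( "
    "project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN "
    "the/DT project/NN ./."
)


def parse_tagged(text):
    """Split 'word/TAG' tokens into lowercase (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def noun_phrases(pairs):
    """Collect maximal runs of consecutive noun tokens as candidate phrases."""
    phrases, run = [], []
    for word, tag in pairs:
        if tag in NOUN_TAGS:
            run.append(word)
        else:
            if run:
                phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


def tf_idf(doc_phrases, corpus):
    """Score phrases of one document; corpus is a list of phrase lists."""
    counts = Counter(doc_phrases)
    scores = {}
    for phrase, count in counts.items():
        df = sum(1 for doc in corpus if phrase in doc)
        scores[phrase] = (count / len(doc_phrases)) * math.log(len(corpus) / df)
    return scores


candidates = noun_phrases(parse_tagged(TAGGED))
# Hypothetical background documents, used only to make idf non-trivial.
corpus = [candidates, ["project", "process", "cost"], ["process", "schedule"]]
scores = tf_idf(candidates, corpus)
for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{phrase}: {score:.4f}")
```

As on the slide, a phrase that occurs in every document (here "process") receives a score of zero, while phrases specific to the analyzed text rank highest.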
Classes Relevant to the Drama Genre • Data sources: IMDB, Wikipedia, Videoload • Keyphrase extraction technique • Noun phrases ranked according to various statistical measures
Ex 2. Learning Relationship Candidates
[Figure: processing pipeline. GATE components: Sentence splitter, Tagger, Lemmatizer, Noun phrase extractor. Lucene components: Document indexer, Paragraph indexer, Sentence indexer. Other components: Tokenizer, Light stemmer, Noun phrase indexer, Concept profile builder, Concept similarity calculation, Association rules miner, Relationship merger. Intermediate results: Concept profiles, Association rules]
Relationships Relevant to Drama Genre • Association rules on extracted concepts
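To make the association-rule idea concrete, here is a minimal sketch that treats each sentence as a set of extracted concepts and keeps rules A → B whose support and confidence exceed given thresholds. It is not the miner used in the project; the function name `association_rules`, the threshold values, and the four example sentences are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import permutations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for concept pairs.

    support    = share of transactions containing both concepts
    confidence = share of the antecedent's transactions that also contain
                 the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        item_counts.update(items)
        # Ordered pairs, so A -> B and B -> A are counted separately.
        pair_counts.update(permutations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical concept sets, one per sentence.
sentences = [
    {"actor", "director", "film"},
    {"actor", "film"},
    {"director", "film"},
    {"actor", "award"},
]
rules = association_rules(sentences)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

A rule such as director → film only suggests that some relationship exists between the two concepts; naming and typing the relationship is left to the manual verification step.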
Limitations of Ontology Learning • Different techniques produce different results • Different data sources produce different results • Lost control over process • Extensive verification of final ontology needed • New data hard to combine with old data
Ontology Learning for Entertainment Domain • Ontology evolution for Deutsche Telekom’s Videoload download service • What does Brangelina mean? • Should Pitt be Brad Pitt or Michael Pitt? • Actor vs. Schauspieler? • All movies of Brad Pitt? • Last movie of Pitt?
Ontology Learning Project • Duration: Nov 2007 – Nov 2009 • Domain: movie download service • Ontology analysis and creation based on indexed noun phrases from movie documents • Ontology used for search and navigation on top of FAST search platform • Ontology learning challenges: • Domain changes from one day to another • No consistent domain terminology • No professional domain terminology • Multiple languages • Movies about anything... unlimited domain • Ontology needs to be up to date to support search
Ontology Workbench • 3 phases that are carried out independently • Crawling into Lucene indices • Supervised extraction of candidates • Combining candidates into ontology structures
Interactive Ontology Development • Expandable indices • Subset of data source • Focus of analysis • List of techniques • Partial results • Stored results • Set operations for combining results
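The idea of combining stored partial results with set operations can be sketched as follows. This is only an illustration of the principle, not the workbench's implementation: the variable names and the second candidate set are assumptions, and the first set reuses phrases from Ex 1.

```python
# Candidate sets as they might be stored after running two techniques
# over the same subset of the data source (contents are illustrative).
tfidf_candidates = {"scope planning", "project scope", "product", "project work"}
keyphrase_candidates = {"scope planning", "project scope", "work breakdown structure"}

# Intersection: candidates both techniques agree on.
agreed = tfidf_candidates & keyphrase_candidates

# Union: every candidate proposed by either technique.
all_candidates = tfidf_candidates | keyphrase_candidates

# Difference: candidates only one technique found, worth a closer look.
only_tfidf = tfidf_candidates - keyphrase_candidates

print(sorted(agreed))
print(sorted(all_candidates))
print(sorted(only_tfidf))
```

Keeping each technique's output as a separate stored set lets the modeler decide per analysis whether agreement (intersection) or coverage (union) matters more, which addresses the concern that different techniques produce different results.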