Ontology Generation -- surveys. Yihong Ding CS652 Spring 2004. Three Papers. Mariano Fernández-López. Overview of Methodologies for Building Ontologies . In IJCAI-99 Workshop on Ontologies and Problem-solving Methods , 1999.
Ontology Generation -- surveys Yihong Ding CS652 Spring 2004
Three Papers • Mariano Fernández-López. Overview of Methodologies for Building Ontologies. In IJCAI-99 Workshop on Ontologies and Problem-solving Methods, 1999. • Borys Omelayenko. Learning of Ontologies for the Web: the Analysis of Existent Approaches. In International Workshop on Web Dynamics held in conj. with the 8th International Conference on Database Theory (ICDT'01), 2001. • Ying Ding and Schubert Foo. Ontology research and development. Part 1: A review of ontology generation. In Journal of Information Science, 2002.
Mariano Fernández-López, 1999 • Proposes guidelines for manual ontology development based on IEEE Standard 1074-1995 • Examines the methodologies of five different projects • Uschold and King, 1995 • Grüninger and Fox, 1995 • Bernaras et al., 1996 • METHONTOLOGY, 1996 • SENSUS, 1997
IEEE Standard 1074-1995 The standard for developing software life cycle processes • Software life cycle model processes (identify and select a life cycle) • Project management processes (create the framework of the project) • Software development-oriented processes • Pre-development processes (study the environment) • Development processes • Requirements process (develop the software requirements specification) • Design process (develop a software representation that meets the requirements) • Implementation process (transform the representation into a programming language) • Post-development processes (installation, operation, support, and maintenance) • Integral processes (ensure completion and quality)
Criteria for Analyzing Methodologies • C1. Inheritance from Knowledge Engineering • C2. Detail of the methodology • C3. Recommendation for knowledge formalization • C4. Strategy for building ontologies • Application dependent, semi-dependent, or independent • C5. Strategy for identifying concepts • Bottom-up, top-down, or middle-out • C6. Recommended life cycle • C7. Differences between the methodology and IEEE 1074-1995 • C8. Recommended techniques • C9. Ontology and system built
Uschold and King • Description: developing the Enterprise Ontology for enterprise modeling processes • Building process (middle-out) • Ontology capture • Identify key concepts and relationships • Produce precise, unambiguous text definitions • Identify other terms that refer to the identified concepts and relationships • Coding • Integrating existing ontologies
Uschold and King: Analysis of Methodology • C1. partial: identifies acquisition, coding, and evaluation stages, but has no feasibility study or prototyping • C2. very little • C4. application-independent • C5. middle-out: the most important concepts are identified first, and the others are reached by generalization and specialization • C7. • Processes missing: management, pre-development, post-development, and design • Activities missing: environment study, feasibility study, training, and configuration management • C8. technical details are unclear
Grüninger and Fox • Description: developing the TOVE (TOronto Virtual Enterprise) project ontology within the domain of business process and activity modeling • Building process (middle-out) • Capture of motivating scenarios • Motivating scenarios: problems or examples that are not adequately addressed by existing ontologies • Each motivating scenario provides possible solutions • The solutions provide an informal intended semantics for the objects and relations • Formulation of informal competency questions • Based on the motivating scenarios • Serve as constraints rather than determining a particular design • Evaluate ontological commitment • Specification of the terminology of the ontology within a formal language • Getting informal terminology: terms extracted from the questions • Specification of formal terminology: formalizing the terms • Formulation of formal competency questions using the terminology of the ontology • Specification of axioms and definitions for the terms in the ontology within the formal language • Establish conditions for characterizing the completeness of the ontology
Grüninger and Fox: Analysis of Methodology • C1. small: this is an approach driven by question-answer pairs, with little involvement in knowledge-based system development • C2. little • C3. logic • C4. application-semidependent (scenarios) • C5. middle-out • C7. • Processes missing: management, pre-development, post-development, and design • Activities missing: training and configuration management • C8. technical details are unclear
Bernaras et al. • Description: developed in the Esprit KACTUS project, which investigates the feasibility of knowledge reuse in complex technical systems and the role of ontologies in supporting it • Building process (top-down) • Specification of the application • Preliminary design based on relevant top-level ontological categories • This involves searching ontologies developed for other applications, which are refined and extended for use in the new application • Ontology refinement and structuring
Bernaras et al.: Analysis of Methodology • C1. big: follows the tradition of knowledge engineering • C2. very little • C4. application-dependent • C5. top-down • C7. • Processes missing: management, pre-development, and post-development • Activities missing: training, documentation, configuration management, verification, and validation • C8. technical details are unclear
METHONTOLOGY • Description • Enabling the construction of ontologies at the knowledge level • Supported by the Ontology Design Environment (ODE) • Including • Identification of the ontology development process • A life cycle based on evolving prototypes • Particular techniques for carrying out each activity • Ontologies developed • CHEMICALS • Environment pollutants ontologies • The Reference-Ontology • The restructured version of the (KA)2 ontology • Building process (middle-out): refers to which activities are carried out • Project management activities • Planning: identify tasks • Control: guarantee that planned tasks are completed as intended • Quality assurance: assure the quality of outputs • Development-oriented activities • Specification, conceptualization, formalization, and implementation • Support activities • Knowledge acquisition, evaluation, integration, documentation, and configuration management
METHONTOLOGY: Analysis of Methodology • C1. big: it has its roots in a methodology for developing knowledge-based systems • C2. a lot • C3. flexible • C4. application-independent • C5. middle-out: most relevant concepts are identified first • C6. evolving prototypes • C7. • Processes missing: software life cycle model, and pre-development • Activities missing: project initiation, installation, support, retirement, and training • C8. technical details are unclear
SENSUS • Description • Developed for natural language processing • Content obtained by extracting and merging information from various electronic sources of knowledge • PENMAN Upper Model, ONTOS, manually built semantic categories, WordNet, Spanish and Japanese lexical entries • Including • More than 50,000 concepts organized in a hierarchy • Both high and medium levels of abstraction • Generally does not cover terms from specific domains • Building process (bottom-up) • Take a series of seed terms, linked to SENSUS by hand • Specify paths from the seed terms to the root • Add more relevant terms • Prune any irrelevant terms
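The "paths to the root" and "prune" steps of the building process can be illustrated with a small sketch. It assumes the hierarchy is available as a child-to-parent map; the function name `prune_to_seeds` and the toy hierarchy are hypothetical, and the "add more relevant terms" step is not covered.

```python
def prune_to_seeds(parent, seeds):
    """Keep only the nodes lying on a path from a seed term up to the root."""
    keep = set()
    for term in seeds:
        node = term
        # Walk upward until we pass the root or reach a node already kept.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent.get(node)
    return keep


# Hypothetical toy hierarchy (child -> parent); the root has parent None.
parent = {
    "entity": None,
    "object": "entity",
    "event": "entity",
    "vehicle": "object",
    "aircraft": "vehicle",
    "jet": "aircraft",
    "furniture": "object",
    "chair": "furniture",
    "flight": "event",
}

kept = prune_to_seeds(parent, ["jet", "flight"])
pruned = set(parent) - kept  # terms that are on no seed-to-root path
```

Here the seed terms "jet" and "flight" keep their ancestors, while the "furniture" branch falls outside every path and is pruned.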
SENSUS: Analysis of Methodology • C1. none: based on adding terms into an existing ontology • C2. medium: not very detailed • C3. semantic networks • C4. application-semidependent • C5. bottom-up • C7. • Processes missing: management, pre-development, post-development, and design • Activities missing: training, documentation, configuration management, verification, and validation • C8. technical details are unclear
Summary • None of the methodologies is fully mature compared with the IEEE standard • The proposals are not unified • SENSUS is completely different from the others • This suggests adopting several widely accepted methodologies rather than one standardized methodology • Interpretability between systems is allowed
Borys Omelayenko, 2001 • Learning-based ontology development • Examines eleven different approaches • Bisson et al., 2000 • Faure and Poibeau, 2000 • Agirre et al., 2000 • Junker et al., 1999 • Craven et al., 2000 • Bowers et al., 2000 • Taylor et al., 1997 • Webb, Wells, Zheng, 1999 • Soderland et al., 1995 • Maedche and Staab, 2000 • Suryanto and Compton, 2000
Ontological Components • Natural language ontologies (horizontal) • Contain lexical relations between language concepts • Large in size and do not require frequent updates • Used to expand user queries • Capture concepts but do not provide detailed descriptions • Domain ontologies (vertical) • Capture knowledge of a particular domain • Provide detailed descriptions of the domain • Ontology instances (dot) • Main piece of knowledge presented in the future Semantic Web • Serve for Web pages • Contain links to other instances
Ontology Learning Tasks • Ontology acquisition • Ontology creation • Ontology schema extraction • Extraction of ontology instances • Ontology maintenance • Ontology integration and navigation • Ontology update • Ontology enrichment
Machine Learning Techniques • Ontology representation requires symbolic learning methods • Skip neural networks, genetic algorithms, and the family of ‘lazy learners’ • Methods studied in this paper • Propositional rule learning (zero-order logic) • First-order logic rule learning • Bayesian learning • Clustering algorithms
ML vs. Manually • Modeling primitives • ML: simple and limited (usually simple rules) • Man: rich (frames, subclasses, rules with rich set of operations, functions, etc.) • Knowledge base structure • ML: flat and homogeneous • Man: hierarchical, consisting of various components with subclass-of, part-of, and other relations • Tasks • ML: categorize objects into a limited and unstructured set of classes • Man: classify objects into a tree of structured classes • Problem-solving methods • ML: very primitive, based on simple search strategies • Man: complicated, inference over a knowledge base with rich structure • Solution space • ML: non-extensible, fixed set of class labels • Man: extensible set of primitive and compound solutions • Readability of the knowledge bases to a human • ML: not required • Man: required
Requirements for OL • Aim: automatically construct ontologies with the properties of manually constructed ontologies • Requirements • Ability to interact with a human • Readability of internal and external results of the learner • Ability to use complex modeling primitives • Ability to deal with complex solution spaces
Requirements for Ontological Components • NLO • Hierarchical clustering of language concepts • Limited set of relations • Ability to link to specific domain ontologies • ML focus: enrichment based on domain texts is popular • Do not require frequent or automatic updates • DO • Use the whole set of modeling primitives • Complex in structure • ML focus: discovering statistically valid patterns for creation • Require more updates • OI • Concepts mark-up of the underlying domain ontology in Web pages • ML focus: IE and annotation • Require frequent updates
Learning of NLO Bisson et al., 2000 (Mo’K tool) • Human-assisted bottom-up clustering of conceptual hierarchies from corpora • Human selects input examples and attributes, level of pruning, and distance evaluation functions • Group ‘similar’ objects to create the classes • Group ‘similar’ classes to form the hierarchy • No human interaction during the clustering process • Further study on integrating NLO enrichment with the Web search of relevant texts
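Bottom-up clustering with a user-chosen distance function can be sketched as follows. This is a generic agglomerative procedure, not the Mo’K implementation; the Jaccard distance, the function names, and the toy attribute sets are assumptions made for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two attribute sets (one possible choice)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_bottom_up(objects, distance=jaccard_distance):
    """objects: name -> set of attributes. Returns the merges in order."""
    clusters = [((name,), frozenset(attrs))
                for name, attrs in sorted(objects.items())]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters under the chosen distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (m1, a1), (m2, a2) = clusters[i], clusters[j]
        merges.append((m1, m2))
        # A merged class is described by the union of its members' attributes.
        merged = (m1 + m2, a1 | a2)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical objects described by the verbs they co-occur with.
objects = {
    "car": {"drive", "park", "repair"},
    "truck": {"drive", "park", "load"},
    "apple": {"eat", "peel"},
    "pear": {"eat", "peel", "pick"},
}
merges = cluster_bottom_up(objects)
```

The first merges group similar objects into classes, and later merges group classes into the upper levels of the hierarchy. Swapping the `distance` argument corresponds to the human choosing a different distance evaluation function.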
Learning of NLO Agirre et al., 2000 • Enrich WordNet by exploiting texts from the Web • Construct lists (topic signatures) of topically related words (with weight/strength) for each concept in WordNet • Each word sense has one associated list of related words • Related Web pages are retrieved from the AltaVista search engine by specifying particular queries • Each query refers to one particular sense but not the others • Example: waiter AND (restaurant OR menu) AND NOT (station OR airport)
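A sense-specific query of this shape can be assembled mechanically from cue words for the target sense and cue words for the competing senses. The sketch below is only an illustration; the function name `sense_query` and its parameters are hypothetical.

```python
def sense_query(word, cue_words, other_sense_cues):
    """Build a boolean query that targets one sense of `word`."""
    query = f"{word} AND ({' OR '.join(cue_words)})"
    if other_sense_cues:
        # Exclude pages that mention cues of the competing senses.
        query += f" AND NOT ({' OR '.join(other_sense_cues)})"
    return query


q = sense_query("waiter", ["restaurant", "menu"], ["station", "airport"])
print(q)
```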
Learning of NLO Faure and Poibeau, 2000 (Asium) • Creating domain-specific NLO by unsupervised domain-specific clustering of texts from corpora • Generate the syntactic structure of texts with Sylex • Cooperative learning of semantic knowledge from parsed texts • Bottom-up, breadth-first clustering to form the hierarchy • Experts validate and label concepts
Learning of DO Maedche and Staab, 2000 • Semiautomatic ontology learning from texts • Input: a set of transactions • Transaction: contains a set of items appearing together • Association rule: sets of items that appear together sufficiently often • ML: discover generalized association rules • Final: present the rules to the knowledge engineer
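To make the transaction and association-rule vocabulary concrete, here is a minimal sketch of generalized association rule mining over item pairs. It assumes a concept taxonomy given as a child-to-parent map and uses the standard support and confidence measures; the function names, thresholds, and toy data are hypothetical and not taken from the paper.

```python
from itertools import permutations


def ancestors(item, parent):
    """All ancestors of `item` in the taxonomy (child -> parent map)."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found


def generalize(transaction, parent):
    """Extend a transaction with the ancestors of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def association_rules(transactions, parent, min_support, min_confidence):
    extended = [generalize(t, parent) for t in transactions]
    n = len(extended)
    items = sorted(set().union(*extended))
    rules = []
    for a, b in permutations(items, 2):
        # Skip trivial rules between an item and its own ancestor.
        if a in ancestors(b, parent) or b in ancestors(a, parent):
            continue
        has_a = sum(1 for t in extended if a in t)
        both = sum(1 for t in extended if a in t and b in t)
        support = both / n
        confidence = both / has_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical concept pairs that co-occur in text, plus a tiny taxonomy.
parent = {"hotel": "accommodation", "hostel": "accommodation"}
transactions = [
    {"hotel", "address"},
    {"hotel", "price"},
    {"hostel", "address"},
    {"hotel", "address"},
]
rules = association_rules(transactions, parent, 0.5, 0.7)
rule_pairs = {(a, b) for a, b, _, _ in rules}
```

In this toy data the rule from "hotel" to "address" misses the confidence threshold, but the generalized rule from "accommodation" to "address" passes it, which is the kind of candidate relation that would be shown to the knowledge engineer.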
Learning of DO Suryanto and Compton, 2000 • First attempt at using ML to discover hierarchical relations between textually described classes • Discover class relations between classification rules • Three basic relations: intersection, mutual exclusion, similarity • A measure of degree is defined for each of the three basic relations
Learning of DO Taylor et al., 1997 • Ontology-based induction of high-level classification rules • Ontologies are used not only to explain rules but also to guide the learning algorithm • Algorithm generates queries for an external learner, ParkaDB • The DO and the input data are used to check the consistency of queries • Consistent queries become classification rules • Query generation continues until the set of rules covers the whole data set
Learning of DO Webb, Wells, Zheng, 1999 • ML plus knowledge acquisition from experts improves the accuracy of the developed domain ontology and reduces development time • Three types of knowledge acquisition systems • Manual, based on experts • ML systems • Integrated systems • ML method: C4.5 decision tree
Learning of OI Bowers et al., 2000 • Replacing the attribute-value dictionary with a more expressive one that consists of simple data types, tuples, sets, and graphs • Using a modified C4.5 learner
Learning of OI Soderland et al., 1995 (CRYSTAL) • Formalize ontology instances from text and generate a concept hierarchy from the instances • Given a domain model as input • Use a richer set of modeling primitives • Generalize the semantic mark-up of manually marked-up training corpora • Formalize the instance level of the hierarchy • Search-based generalization of concept nodes
Learning of OI Craven et al., 2000 (Web-KB) • Systematic study of the extraction of OI from Web documents • Takes an ontology of an academic web site and populates it with actual instances and relations from CS departments’ web sites • Three learning tasks • Recognize class instances from hypertext documents guided by the ontology • Recognize relation instances from chains of hyperlinks • Recognize class and relation instances from pieces of hypertext • Two supervised learning methods • Naïve Bayes learner • Modified FOIL (first-order rule learner) • Automatically create a mapping between the manually constructed domain ontology and the Web pages by generalizing from the training instances
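The first learning task (recognizing class instances from page text) can be illustrated with a small multinomial Naïve Bayes classifier. This is a textbook sketch with Laplace smoothing, not the Web-KB code; the class labels, training pages, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_pages):
    """labelled_pages: list of (class_label, list_of_words)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, words in labelled_pages:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(words, model):
    """Return the ontology class with the highest posterior log-score."""
    class_counts, word_counts, vocabulary = model
    total_pages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_pages)
        for w in words:
            if w not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training pages, already tokenized.
pages = [
    ("student", ["advisor", "courses", "homework"]),
    ("student", ["thesis", "advisor"]),
    ("faculty", ["professor", "research", "publications"]),
    ("faculty", ["professor", "teaching", "students"]),
    ("course", ["syllabus", "homework", "lecture"]),
]
model = train_naive_bayes(pages)
predicted = classify(["professor", "publications", "teaching"], model)
```

Each prediction assigns a page to a class of the ontology, which is one way to populate the ontology with instances; relation instances would need a separate learner such as the first-order rule learner mentioned above.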
Summary • Main problem of OL: the learned structure is flat and homogeneous • Learning of NLO • General-purpose NLOs exist • Mainly enrichment • Most popular ML algorithm: clustering • Learning of DO • Human-guided learning • Learning plays only a minor role in knowledge acquisition • Most popular ML algorithm: propositional learning • Learning of OI • The structure of OI is too rich to be adequately captured by propositional rules • Several different ML algorithms are applied
Ying Ding and Schubert Foo, 2002 • Methods used and problems encountered in many recent ontology generation approaches • Examines seven main collections of approaches • InfoSleuth (MCC) • SKC (Stanford) • Ontology Learning (AIFB) • ECAI2000 • Inductive logic programming (UT) • Library Science and Ontology • Others
InfoSleuth • A research project at MCC (Microelectronics and Computer Technology Corporation) • Develop and deploy new technologies for finding information available both in corporate networks and external networks • Description • Locating, evaluating, retrieving, and merging information in a frequently updating environment • Build up an ontology-based agent architecture • Been successfully implemented in • Knowledge management • Business intelligence • Logistics • Crisis management • Genome mapping • Environment data exchange network
InfoSleuth: method • Input resources • Human expert feeds the system a small set of seedwords (high-level concepts) • IR engine feeds relevant documents (with or without POS tags) automatically • System process • Parse documents • Extract phrases with seedwords • Generate concept terms • Place them into the ontology • Collect candidate seedwords for the next round of processing • Relationship retrieving • is-a, part-of, manufactured-by, owned-by, etc. • assoc-with is used for relations other than is-a • Use linguistic properties to identify relations • Human experts evaluate and adjust results • Special features • Expand the ontology with new concepts and alert the human expert to update • Discover attributes associated with certain concepts • Index documents for future retrieval • Allow users to decide between precision and completeness by browsing
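The "extract phrases with seedwords" and "place them into the ontology" steps can be sketched very roughly as follows. The sketch assumes plain lowercase tokens and a small stopword list instead of real parsing or POS tags; the function name, the stopword list, and the sample sentence are hypothetical.

```python
import re

# Hypothetical stopword list; a real system would rely on POS tags instead.
STOPWORDS = {"a", "an", "the", "of", "and", "for", "with", "uses", "is"}


def extract_concepts(text, seedwords, max_modifiers=2):
    """Return {seedword: set of phrases ending with that seedword}."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = {seed: set() for seed in seedwords}
    for i, token in enumerate(tokens):
        if token not in found:
            continue
        # Grow the phrase leftwards, one modifier at a time.
        for k in range(1, max_modifiers + 1):
            start = i - k
            if start < 0 or tokens[start] in STOPWORDS:
                break
            found[token].add(" ".join(tokens[start:i + 1]))
    return found


text = "The lab uses image processing software and a flat panel display."
concepts = extract_concepts(text, ["software", "display"])

# Place each extracted phrase under its seedword with an is-a link.
ontology = sorted(
    (phrase, "is-a", seed)
    for seed, phrases in concepts.items()
    for phrase in phrases
)
```

Note that the sketch keeps both "processing software" and "image processing software" without deciding how the modifiers attach, so a human expert would still need to review the result.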
InfoSleuth: problems • Syntactic structure ambiguity (concept token identification) • image process software • Different phrases refer to the same concept • Word sense disambiguation • Proper attachment of adjective modifier may help avoid non-concepts • Heterogeneous resources (inconsistent terminologies) • Automatically constructed ontology can be too prolific and deficient at the same time (because of the seedwords)
SKC (Scalable Knowledge Composition) • A research project at Stanford • Resolve semantic heterogeneity in information systems • Description • Derive general methods for ontology integration • Application-independent • Develop an ontology algebra • Convert Webster’s dictionary to a graph structure • Funded by • AFOSR, DARPA, HPKB
SKC: method • Details of the concept graph technique are unknown • Use a novel algebraic extraction technique to generate the graph structure and create thesaurus entries for all words, including some stopwords • Idea from the PageRank algorithm • ArcRank algorithm to extract relations • Basic hypothesis: structural relationships between terms are relevant to their meaning • Pattern/relation extraction algorithm • Compute a set of nodes that contain arcs comparable to the seed arc set • Threshold them according to ArcRank value • Extend the seed arc set when nodes contain further commonality • If the node set has grown, repeat from the first step • The algorithm is self-limiting via the threshold and distinguishes senses
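Since the slide only says that the idea comes from PageRank, the sketch below shows plain PageRank by power iteration on a small dictionary graph (head word pointing to the words in its definition). It does not reproduce ArcRank; the graph, damping factor, and iteration count are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: node -> list of nodes it points to. Returns node -> rank."""
    nodes = sorted(set(graph) | {t for ts in graph.values() for t in ts})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for source in nodes:
            targets = graph.get(source, [])
            if targets:
                # Split the source's rank evenly over its outgoing arcs.
                share = damping * rank[source] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank over all nodes.
                share = damping * rank[source] / len(nodes)
                for n in nodes:
                    new_rank[n] += share
        rank = new_rank
    return rank


# Hypothetical dictionary graph: head word -> words used in its definition.
graph = {
    "dog": ["animal"],
    "cat": ["animal"],
    "horse": ["animal"],
    "animal": ["organism"],
    "organism": ["animal"],
}
rank = pagerank(graph)
top_word = max(rank, key=rank.get)
```

Words that appear in many definitions accumulate rank, which fits the basic hypothesis that the structure of the graph carries information about meaning.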
SKC: problems • Syllable and accent markers in head words • Misspelled head words • Mis-tagged fields • Stemming and irregular verbs • Common abbreviations in definitions • Undefined words with common prefixes • Multi-word head words • Undefined hyphenated and compound words
Ontology Learning • A project in AIFB (Institute of Applied Informatics and Formal Description Methods, University of Karlsruhe, Germany) • Extract ontology from domain data • Description • To learn both taxonomic and non-taxonomic relations for ontologies
OL: method • Shallow text processing • Implement on top of SMES (text process for German) • Use weighted finite state transducers to process phrasal and sentential patterns • Output dependency relations • Learning algorithm • Input dependency relations • Select the set of documents • Define association rules • Determine confidence for the rules • Output association rules exceeding the user-defined confidence
OL: problems • Lightweight ontology contains too much noisy data • Word sense problem generates a lot of ambiguity • Refinement of the lightweight ontologies is a tricky issue (needs future work) • Relationship learning is not trivial
ECAI 2000 • Ontology Learning Workshop of ECAI 2000 (European Conference on Artificial Intelligence) • Description • Use NLP techniques • Extract important (high frequency) words or phrases to define concepts • Use general top-level ontology (WordNet, SENSUS) to assist disambiguation • Problem: relation extraction
Inductive Logic Programming • WOLFIE (WOrd Learning From Interpreted Examples) at the Machine Learning Group, University of Texas at Austin • Description • Learn a semantic lexicon from a corpus of sentences • Learned lexicon • Consists of words with meanings • Allows synonymy and polysemy • Ultimate goal: learn to parse novel sentences into their meaning representations • Has the potential to be a workbench for ontological concept extraction and relation detection • Problem: how to deploy their methods for ontology concept and rule learning to make the workbench work
Library Science and Ontology • Digital Library + Semantic Web • Digital libraries use various forms of vocabularies instead of formal ontologies • Kwasnik (1999) converts a controlled vocabulary scheme into an ontology • Higher levels of conception of descriptive vocabulary • Deeper semantics for class/subclass and cross-class relationships • Ability to express concepts and relationships in a description language • Reusability and shareability of the ontological constructs • Strong inference and reasoning functions • Problems • Different ways of modeling knowledge (shallow or deeper semantics) • Different ways of representing knowledge (lexical-flavored or mathematical and logical-flavored) • Merging the two fields or creating a common standard for them is still a long way off
Others • Borgo 1997 • Use lexical semantic graphs to create ontology • Based on WordNet • Yamaguchi 1999 • Construct domain ontologies • Based on a machine-readable dictionary • Kashyap 1999 • Construct ontology for IR • Based on database schema
Ontology Learning (Research Location Index) [34] • Europe • France (7) • Germany (5) • Spain (3) • Others: Italy (2), Austria, Greece, Netherlands, Portugal, Switzerland, UK • *European Union (2): • OntoWeb: University of Karlsruhe • On-To-Knowledge: many countries • USA • Stanford (2) • Austin (2): UT, MCC • Dallas (2): UT, Southern Methodist University • Other: UC Berkeley, Mississippi State University, BYU, UW • Others • Australia, Canada, Israel, Japan, Taiwan (China)