
Ontology Generation -- surveys

Ontology Generation -- surveys. Yihong Ding CS652 Spring 2004. Three Papers. Mariano Fernández-López. Overview of Methodologies for Building Ontologies . In IJCAI-99 Workshop on Ontologies and Problem-solving Methods , 1999.


Presentation Transcript


  1. Ontology Generation -- surveys. Yihong Ding, CS652, Spring 2004

  2. Three Papers • Mariano Fernández-López. Overview of Methodologies for Building Ontologies. In IJCAI-99 Workshop on Ontologies and Problem-solving Methods, 1999. • Borys Omelayenko. Learning of Ontologies for the Web: the Analysis of Existent Approaches. In International Workshop on Web Dynamics held in conj. with the 8th International Conference on Database Theory (ICDT'01), 2001. • Ying Ding and Schubert Foo. Ontology research and development. Part 1: A review of ontology generation. In Journal of Information Science, 2002.

  3. Mariano Fernández-López, 1999 • Proposes many guidelines, based on IEEE Standard 1074-1995, for manual ontology development • Examines the methodologies of five different projects • Uschold and King, 1995 • Grüninger and Fox, 1995 • Bernaras et al., 1996 • METHONTOLOGY, 1996 • SENSUS, 1997

  4. IEEE Standard 1074-1995 The standard for developing software life cycle processes • Software life cycle model processes (identify and select life cycle) • Project management processes (create framework of project) • Software development-oriented processes • Pre-development processes (study the environment) • Development processes • Requirements process (develop software requirements specification) • Design process (develop software representation that meets the requirements) • Implementation process (transform representation to programming language) • Post-development processes (install, operate, support, and maintain) • Integral processes (ensure the completion and quality)

  5. Criteria for Analyzing Methodologies • C1. Inheritance from Knowledge Engineering • C2. Detail of the methodology • C3. Recommendation for knowledge formalization • C4. Strategy for building ontologies • Application dependent, semi-dependent, or independent • C5. Strategy for identifying concepts • Bottom-up, top-down, or middle-out • C6. Recommended life cycle • C7. Differences between the methodology and IEEE 1074-1995 • C8. Recommended techniques • C9. Ontology and system built

  6. Uschold and King • Description: developing the Enterprise Ontology for enterprise modeling processes • Building process (middle-out) • Ontology capture • Identify key concepts and relationships • Produce precise, unambiguous text definitions • Identify other terms that refer to the identified concepts and relationships • Coding • Integrating existing ontologies

  7. Uschold and King: Analysis of Methodology • C1. partial: identifies an acquisition, coding and evaluation stage, but without feasibility study and prototyping • C2. very little • C4. application-independent • C5. middle-out: from most important to less important, the others from generalization and specialization • C7. • Processes missing: management, pre-development, post-development, and design • Activities missing: environment study, feasibility study, training and configuration management • C8. technical details are unclear

  8. Grüninger and Fox • Description: developing the TOVE (TOronto Virtual Enterprise) project ontology within the domain of business process and activity modeling • Building process (middle-out) • Capture of motivating scenarios • Motivating scenarios: problems or examples which are not adequately addressed by existing ontologies • A motivating scenario provides possible solutions • Solutions provide an informal intended semantics for the objects and relations • Formulation of informal competency questions • Based on the motivating scenarios • Serve as constraints rather than determining a particular design • Evaluate ontological commitment • Specification of the terminology of the ontology within a formal language • Getting informal terminology: terms extracted from the questions • Specification of formal terminology: formalizing terms • Formulation of formal competency questions using the terminology of the ontology • Specification of axioms and definitions for the terms in the ontology within the formal language • Establish conditions for characterizing the completeness of the ontology

  9. Grüninger and Fox: Analysis of Methodology • C1. small: this is a question-answer-pair driven approach, not very much involved in knowledge-based system development • C2. little • C3. logic • C4. application-semidependent (scenarios) • C5. middle-out • C7. • Processes missing: management, pre-development, post-development, and design • Activities missing: training and configuration management • C8. technical details are unclear

  10. Bernaras et al. • Description: developed within the Esprit KACTUS project to investigate the feasibility of knowledge reuse in complex technical systems and the role of ontologies in supporting it • Building process (top-down) • Specification of the application • Preliminary design based on relevant top-level ontological categories • It involves searching ontologies developed for other applications, which are refined and extended for use in the new application • Ontology refinement and structuring

  11. Bernaras et al.: Analysis of Methodology • C1. big: follows the tradition of knowledge engineering • C2. very little • C4. application-dependent • C5. top-down • C7. • Processes missing: management, pre-development, and post-development • Activities missing: training, documentation, configuration management, verification, and validation • C8. technical details are unclear

  12. METHONTOLOGY • Description • Enabling the construction of ontologies at the knowledge level • Supported by Ontology Design Environment (ODE) • Including • Identification of the ontology development process • A life cycle based on evolving prototypes • Particular techniques for carrying out each activity • Ontologies developed • CHEMICALS • Environment pollutants ontologies • The Reference-Ontology • The restructured version of (KA)2 ontology • Building process (middle-out): refers to which activities are carried out • Project management activities • Planning: identify tasks • Control: guarantee planned tasks are completed when intended • Quality Assurance: assure the quality of outputs • Development-oriented activities • Specification, conceptualization, formalization, and implementation • Support activities • Knowledge acquisition, evaluation, integration, documentation, and configuration management

  13. METHONTOLOGY: Analysis of Methodology • C1. big: it has its roots in a methodology for developing knowledge-based systems • C2. a lot • C3. flexible • C4. application-independent • C5. middle-out: most relevant concepts are identified first • C6. evolving prototypes • C7. • Processes missing: software life cycle model, and pre-development • Activities missing: project initiation, installation, support, retirement, and training • C8. technical details are unclear

  14. SENSUS • Description • Developed for natural language processing • Content obtained by extracting and merging information from various electronic sources of knowledge • PENMAN Upper Model, ONTOS, manually built semantic categories, WordNet, Spanish and Japanese lexical entries • Including • More than 50,000 concepts organized in a hierarchy • Both high and medium levels of abstraction • Generally does not cover terms from specific domains • Building process (bottom-up) • Take a series of seed terms, linked to SENSUS by hand • Specify paths from the seed terms to the root • Add more relevant terms • Prune any irrelevant terms
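
The seed-and-prune steps of the SENSUS building process can be illustrated with a small sketch. It is not the SENSUS implementation: the function name `prune_to_seeds`, the child-to-parent dictionary, and the toy hierarchy are all assumptions made for illustration, and the "add more relevant terms" step is left out.

```python
def prune_to_seeds(parent, seeds):
    """Keep only concepts lying on a path from a seed term to the root.

    `parent` maps each concept to its parent (the root maps to None).
    This is a hypothetical data layout, not the SENSUS format.
    """
    keep = set()
    for term in seeds:
        node = term
        # Walk upward until the root, or until we join a path already kept.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent.get(node)
    # Everything not on a seed-to-root path is pruned.
    return {c: p for c, p in parent.items() if c in keep}


# Toy hierarchy (made up for the example).
hierarchy = {
    "entity": None,
    "object": "entity",
    "vehicle": "object",
    "car": "vehicle",
    "animal": "object",
    "dog": "animal",
    "event": "entity",
}
domain = prune_to_seeds(hierarchy, ["car"])
print(sorted(domain))
```

With the single seed `car`, only the chain from `car` up to `entity` survives; `animal`, `dog`, and `event` are pruned.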

  15. SENSUS: Analysis of Methodology • C1. none: based on adding terms into an existing ontology • C2. medium: not very detailed • C3. semantic networks • C4. application-semidependent • C5. bottom-up • C7. • Processes missing: management, pre-development, post-development, and design • Activities missing: training, documentation, configuration management, verification, and validation • C8. technical details are unclear

  16. Summary • None of the methodologies is fully mature compared with the IEEE standard • The proposals are not unified • SENSUS is completely different from the others • It suggests we adopt several widely accepted methodologies rather than a single standardized one • Interpretability between systems is allowed

  17. Borys Omelayenko, 2001 • Learning-based ontology development • Examines eleven different approaches • Bisson et al., 2000 • Faure and Poibeau, 2000 • Agirre et al., 2000 • Junker et al., 1999 • Craven et al., 2000 • Bowers et al., 2000 • Taylor et al., 1997 • Webb, Wells, Zheng, 1999 • Soderland et al., 1995 • Maedche and Staab, 2000 • Suryanto and Compton, 2000

  18. Semantic Querying over the Web

  19. Ontological Components • Natural language ontologies (horizontal) • Contain lexical relations between language concepts • Large in size and do not require frequent updates • Used to expand user queries • Capture concepts but do not provide detailed descriptions • Domain ontologies (vertical) • Capture knowledge of a particular domain • Provide detailed descriptions of the domain • Ontology instances (dot) • Main piece of knowledge presented in the future Semantic Web • Serve for Web pages • Contain links to other instances

  20. Ontology Learning Tasks • Ontology acquisition • Ontology creation • Ontology schema extraction • Extraction of ontology instances • Ontology maintenance • Ontology integration and navigation • Ontology update • Ontology enrichment

  21. Machine Learning Techniques • Ontology representation requires symbolic learning methods • Skip neural networks, genetic algorithms, and the family of ‘lazy learners’ • Methods studied in this paper • Propositional rule learning (zero-order logic) • First-order logic rule learning • Bayesian learning • Clustering algorithms

  22. ML vs. Manually • Modeling primitives • ML: simple and limited (usually simple rules) • Man: rich (frames, subclasses, rules with rich set of operations, functions, etc.) • Knowledge base structure • ML: flat and homogeneous • Man: hierarchical, consisting of various components with subclass-of, part-of, and other relations • Tasks • ML: categorize objects into a limited and unstructured set of classes • Man: classify objects into a tree of structured classes • Problem-solving methods • ML: very primitive, based on simple search strategies • Man: complicated, inference over a knowledge base with rich structure • Solution space • ML: non-extensible, fixed set of class labels • Man: extensible set of primitive and compound solutions • Readability of the knowledge bases to a human • ML: not required • Man: required

  23. Requirements for OL • Aim: automatically construct ontologies with the properties of manually constructed ontologies • Requirements • Ability to interact with a human • Readability of internal and external results of the learner • Ability to use complex modeling primitives • Ability to deal with complex solution spaces

  24. Requirements for Ontological Components • NLO • Hierarchical clustering of language concepts • Limited set of relations • Ability to link to specific domain ontologies • ML focus: enrichment based on domain texts is popular • Do not require frequent or automatic updates • DO • Use the whole set of modeling primitives • Complex in structure • ML focus: discovering statistically valid patterns for creation • Require more updates • OI • Concepts mark-up of the underlying domain ontology in Web pages • ML focus: IE and annotation • Require frequent updates

  25. Learning of NLO Bisson et al., 2000 (Mo’K tool) • Human-assisted bottom-up clustering of conceptual hierarchies from corpora • Human selects input examples and attributes, level of pruning, and distance evaluation functions • Group ‘similar’ objects to create the classes • Group ‘similar’ classes to form the hierarchy • No human interaction during clustering process • Further study on integrating NLO enrichment with the Web search of relevant texts
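
Bottom-up clustering with a user-chosen distance function can be sketched as follows. This is not the Mo’K tool: the Jaccard distance, the merge threshold, the function names, and the toy data are assumptions, and the sketch stops at flat classes instead of building the full hierarchy.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two attribute sets (one possible choice of many)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_bottom_up(objects, max_distance):
    """Repeatedly merge the two closest clusters until none is close enough.

    `objects` maps an object name to its set of attributes.
    """
    clusters = [({name}, set(attrs)) for name, attrs in objects.items()]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest attribute distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard_distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard_distance(clusters[i][1], clusters[j][1]) > max_distance:
            break
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [members for members, _ in clusters]


# Toy input: nouns described by the verbs they occur with (made up).
nouns = {
    "car": {"drive", "park", "repair"},
    "truck": {"drive", "park", "load"},
    "apple": {"eat", "peel"},
    "pear": {"eat", "peel", "pick"},
}
classes = cluster_bottom_up(nouns, max_distance=0.6)
print(classes)
```

The human choices named on the slide correspond to the arguments here: the attributes in `nouns`, the distance function, and the threshold that controls how far merging goes.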

  26. Learning of NLO Agirre et al., 2000 • Enrich WordNet by exploiting texts from the Web • Construct lists (topic signatures) of topically related words (with weight/strength) for each concept in WordNet • Each word sense has one associated list of related words • Retrieve related Web pages from the AltaVista search engine by specifying particular queries • Each query refers to one particular sense but not the others • Example: waiter AND (restaurant OR menu) AND NOT (station OR airport)
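
The sense-specific query on this slide follows a simple pattern that can be sketched as below. The function name `sense_query` and its parameters are hypothetical; how the cue words are chosen from WordNet is not covered here.

```python
def sense_query(word, sense_cues, other_sense_cues):
    """Build a boolean query that targets one sense of `word`.

    `sense_cues` are words associated with the intended sense;
    `other_sense_cues` are words associated with the competing senses.
    """
    wanted = " OR ".join(sense_cues)
    unwanted = " OR ".join(other_sense_cues)
    return f"{word} AND ({wanted}) AND NOT ({unwanted})"


query = sense_query("waiter", ["restaurant", "menu"], ["station", "airport"])
print(query)
```

Pages returned for such a query would then be the raw material from which the weighted topic signature for that sense is built.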

  27. Learning of NLO Faure and Poibeau, 2000 (Asium) • Creating domain-specific NLO by unsupervised domain-specific clustering of texts from corpora • Generate syntactic structure of texts with Sylex • Cooperative learning of semantic knowledge from parsed texts • Bottom-up, breadth-first clustering to form the hierarchy • Experts validate and label concepts

  28. Learning of DO Maedche and Staab, 2000 • Semiautomatic ontology learning from texts • Input: a set of transactions • Transaction: contains a set of items appearing together • Association rule: sets of items that appear together sufficiently often • ML: discover generalized association rules • Final: present the rules to the knowledge engineer
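
The notion of items that "appear together sufficiently often" is usually made precise with support and confidence. The sketch below mines plain pairwise rules, not the generalized association rules of the paper; the function name, thresholds, and transactions are assumptions for illustration.

```python
from itertools import permutations


def association_rules(transactions, min_support, min_confidence):
    """Return rules (a, b, support, confidence) meaning "a implies b".

    support    = fraction of transactions containing both a and b
    confidence = fraction of transactions containing a that also contain b
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy transactions: concepts co-occurring in the same sentence (made up).
sentences = [
    {"hotel", "room"},
    {"hotel", "room", "price"},
    {"hotel", "price"},
    {"room", "bed"},
]
for a, b, support, confidence in association_rules(sentences, 0.5, 0.9):
    print(f"{a} -> {b} (support={support:.2f}, confidence={confidence:.2f})")
```

Rules that pass both thresholds are only candidates; as the slide notes, they are presented to the knowledge engineer for a final decision.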

  29. Learning of DO Suryanto and Compton, 2000 • First attempt at using ML to discover hierarchical relations between textually described classes • Discover class relations between classification rules • Three basic relations: intersection, mutual exclusion, similarity • A measure of degree is defined for each of the three basic relations

  30. Learning of DO Taylor et al., 1997 • Ontology-based induction of high-level classification rules • Ontologies are used not only to explain rules but also to guide the learning algorithm • Algorithm generates queries for an external learner, ParkaDB • DO and input data are used to check the consistency of queries • Consistent queries become classification rules • Query generation continues until the set of rules covers the whole data set

  31. Learning of DO Webb, Wells, Zheng, 1999 • ML plus knowledge acquisition from experts improves the accuracy of the developed domain ontology and reduces development time • Three types of knowledge acquisition systems • Manual, based on experts • ML systems • Integrated system • ML method: C4.5 decision tree

  32. Learning of OI Bowers et al., 2000 • Replacing the attribute-value dictionary with a more expressive one that consists of simple data types, tuples, sets and graphs • Using a modified C4.5 learner

  33. Learning of OI Soderland et al., 1995 (CRYSTAL) • Formalize ontology instances from text and generate a concept hierarchy from the instances • Given domain model as input • Use a richer set of modeling primitives • Generalize semantic mark-up of the manually marked-up training corpora • Formalize the instance level of hierarchy • Search-based generalization of concept nodes

  34. Learning of OI Craven et al., 2000 (Web-KB) • Systematic study of the extraction of OI from Web documents • Takes an ontology of an academic web site and populates it with actual instances and relations from CS departments’ web sites • Three learning tasks • Recognize class instances from hypertext documents guided by the ontology • Recognize relation instances from the chains of hyperlinks • Recognize class and relation instances from the pieces of hypertext • Two supervised learning methods • Naïve Bayes learner • Modified FOIL (first-order rule learner) • Automatically create mapping between the manually constructed domain ontology and the Web pages by generalizing from the training instances

  35. Summary • Main problem of OL: flat and homogeneous structure learned • Learning of NLO • General-purpose NLO exists • Mainly enrichment • Most popular ML algorithm: clustering • Learning of DO • Human-guided learning • Learning plays only a minor role in knowledge acquisition • Most popular ML algorithm: propositional learning • Learning of OI • The structure of OI is too rich to be adequately captured by propositional rules • Multiple different ML algorithms are applied

  36. Ying Ding and Schubert Foo, 2002 • Methods used and problems encountered in many recent ontology generation approaches • Examines seven main collections of approaches • InfoSleuth (MCC) • SKC (Stanford) • Ontology Learning (AIFB) • ECAI2000 • Inductive logic programming (UT) • Library Science and Ontology • Others

  37. InfoSleuth • A research project at MCC (Microelectronics and Computer Technology Corporation) • Develop and deploy new technologies for finding information available both in corporate networks and external networks • Description • Locating, evaluating, retrieving, and merging information in a frequently updated environment • Build up an ontology-based agent architecture • Has been successfully applied in • Knowledge management • Business intelligence • Logistics • Crisis management • Genome mapping • Environment data exchange network

  38. InfoSleuth: method • Input resources • Human expert feeds the system a small set of seedwords (high-level concepts) • IR engine feeds relevant documents (with or without POS tags) automatically • System process • Parse documents • Extract phrases with seedwords • Generate concept terms • Place them into the ontology • Collect candidate seedwords for the next round of processing • Relationship retrieving • is-a, part-of, manufactured-by, owned-by, etc. • assoc-with is used to define relations except is-a • Use linguistic properties to identify relations • Human experts evaluate and adjust results • Special features • Expand ontology with new concepts and alert human expert to update • Discover attributes associated with certain concepts • Index documents for future retrieval • Allow users to decide between precision and completeness by browsing
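
One round of the seedword-driven process can be sketched in a few lines. This is a toy heuristic, not the InfoSleuth system: it treats the word directly before a seedword as a modifier and files the resulting phrase under that seedword. The stopword list, the function name, and the sample sentences are assumptions.

```python
import re

# Minimal stopword list, assumed for this example only.
STOPWORDS = {"the", "a", "an", "of", "and", "this", "that"}


def extract_concepts(documents, seedwords):
    """Collect "<modifier> <seedword>" phrases as sub-concepts of each seedword."""
    ontology = {seed: set() for seed in seedwords}
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Look at each adjacent word pair; keep pairs that end in a seedword.
        for prev, word in zip(tokens, tokens[1:]):
            if word in ontology and prev not in STOPWORDS:
                ontology[word].add(f"{prev} {word}")
    return ontology


docs = [
    "The flat panel display is thin.",
    "A video display and a panel display were tested.",
]
found = extract_concepts(docs, ["display"])
print(found)
```

In the process described on the slide, newly found terms would also feed the candidate seedwords for the next round, and a human expert would review the result.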

  39. InfoSleuth: problems • Syntactic structure ambiguity (concept token identification) • image process software • Different phrases refer to the same concept • Word sense disambiguation • Proper attachment of adjective modifier may help avoid non-concepts • Heterogeneous resources (inconsistent terminologies) • Automatically constructed ontology can be too prolific and deficient at the same time (because of the seedwords)

  40. SKC (Scalable Knowledge Composition) • A research project at Stanford • Resolve semantic heterogeneity in information systems • Description • Derive general methods for ontology integration • Application-independent • Develop an ontology algebra • Convert Webster’s dictionary to a graph structure • Funded by • AFOSR, DARPA, HPKB

  41. SKC: method • Details of the concept graph technique are unknown • Use a novel algebraic extraction technique to generate the graph structure and create thesaurus entries for all words, including some stopwords • Idea from the PageRank algorithm • ArcRank algorithm to extract relations • Basic hypothesis: structural relationships between terms are relevant to their meaning • Pattern/Relation extraction algorithm • Compute a set of nodes that contain arcs comparable to the seed arc set • Threshold them according to ArcRank value • Extend the seed arc set when nodes contain further commonality • If the node set increased in size, repeat from the first step • The algorithm is self-limiting via the threshold and distinguishes senses

  42. SKC: problems • Syllable and accent markers in head words • Misspelled head words • Mis-tagged fields • Stemming and irregular verbs • Common abbreviations in definitions • Undefined words with common prefixes • Multi-word head words • Undefined hyphenated and compound words

  43. Ontology Learning • A project in AIFB (Institute of Applied Informatics and Formal Description Methods, University of Karlsruhe, Germany) • Extract ontology from domain data • Description • To learn both taxonomic and non-taxonomic relations for ontologies

  44. OL: method • Shallow text processing • Implement on top of SMES (text process for German) • Use weighted finite state transducers to process phrasal and sentential patterns • Output dependency relations • Learning algorithm • Input dependency relations • Select the set of documents • Define association rules • Determine confidence for the rules • Output association rules exceeding the user-defined confidence

  45. OL: problems • Lightweight ontology contains too much noisy data • Word sense problem generates lots of ambiguity • Refinement of the lightweight ontologies is a tricky issue (needs future work) • Relationship learning is not trivial

  46. ECAI 2000 • Ontology Learning Workshop of ECAI 2000 (European Conference on Artificial Intelligence) • Description • Use NLP techniques • Extract important (high frequency) words or phrases to define concepts • Use general top-level ontology (WordNet, SENSUS) to assist disambiguation • Problem: relation extraction

  47. Inductive Logic Programming • WOLFIE (WOrd Learning From Interpreted Examples) at the Machine Learning Group, University of Texas at Austin • Description • Learn semantic lexicon from a corpus of sentences • Learned lexicon • Consists of words with meaning • Allows synonymy and polysemy • Ultimate goal: learn to parse novel sentences into their meaning representations • Has the potential to be a workbench for ontological concept extraction and relation detection • Problem: how to deploy their methods for ontology concept and rule learning to make the workbench work

  48. Library Science and Ontology • Digital Library + Semantic Web • Digital libraries use various forms of vocabularies instead of formal ontologies • Kwasnik (1999) converted a controlled vocabulary scheme into an ontology • Higher levels of conception of descriptive vocabulary • Deeper semantics for class/subclass and cross-class relationships • Ability to express concepts and relationships in a description language • Reusability and shareability of the ontological constructs • Strong inference and reasoning functions • Problems • Different ways of modeling knowledge (shallow or deeper semantics) • Different ways of representing knowledge (lexical-flavored or mathematical and logical-flavored) • Merging the two fields or creating a common standard for them is still a long way off

  49. Others • Borgo 1997 • Use lexical semantic graphs to create ontology • Based on WordNet • Yamaguchi 1999 • Construct domain ontologies • Based on a machine-readable dictionary • Kashyap 1999 • Construct ontology for IR • Based on database schema

  50. Ontology Learning (Research Location Index) [34] • Europe • France (7) • Germany (5) • Spain (3) • Others: Italy (2), Austria, Greece, Netherlands, Portugal, Switzerland, UK • *European Union (2): • OntoWeb: University of Karlsruhe • On-To-Knowledge: many countries • USA • Stanford (2) • Austin (2): UT, MCC • Dallas (2): UT, Southern Methodist University • Other: UC Berkeley, Mississippi State University, BYU, UW • Others • Australia, Canada, Israel, Japan, Taiwan (China)
