Ontology • An introduction and overview
The concept of ontology • Ontology has to do with meaning • It is used within databases and artificial intelligence. • Often called “semantics”, since it deals with the meaning of linguistic expressions • The two terms are used synonymously here.
Sources • Main source for these slides: Dieter Fensel: Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce • Interesting articles about semantics and ontology: http://w3.msi.vxu.se/~per/IVC743/Semantics.html • There are also reports from some courses in Växjö dealing with various aspects of ontology
Definition (from AI) • In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation.
More definitions • Fensel: shared and common understanding of a domain that can be communicated between people and heterogeneous and widely spread application systems. • Fensel again: ontologies describe the static domain knowledge of a knowledge-based system.
More about ontologies • A language for defining ontologies is syntactically and semantically richer than common approaches for databases. • The information that is described by an ontology consists of semi-structured natural language texts and not tabular information. • An ontology must be a shared and consensual terminology because it is used for information sharing and exchange. • An ontology provides a domain theory and not the structure of a data container.
Schedule - syntax • Simple table structure • Database schema
Schedule - instance • A row in the schedule • Is in fact a representation of a fact.
Schedule - Data Description • Is a meta-meta description in relation to the fact • The number of the week, according to standard ISO 321-543-432-645.a • The day expressed as weekday, number and month
Ontology for IVC743 • Name: Schedule at VXU • Purpose: Temporal relation between the following entities: Room, Person and Activity. • Temporal expression: Week, day and time in sep-oct • Room-domain: All lecture rooms at Växjö university with a capacity of at least 35 persons • Person-domain: PF, RL, Inge Andersson, Olle Dahlborg, Carina Hallqvist, Bertil Ekdahl, Anna Wingkvist, Gunnar Mosnik, Jonas Richardson • Activity-domain: {a description of what is going to be dealt with in each lecture}
Purpose • The ontology can be used in many cases, not only for our schedule • It says something about the content, not the form • If you see an instance with a value not belonging to the ontology, you know something is incorrect • If you are familiar with the ontology, no further explanation is needed in order to understand the meaning.
Course ontology • This generic ontology has the following syntax: • Name:<string> • Purpose: <Temporal/causal/result…> relation between <list of entities> • Temporal expression: <ddd dd mmm hh - hh> • Room-domain: <list of lecture rooms> • Person-domain: <list of teachers> • Activity-domain: <list of activities>
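The generic syntax above can be sketched in code to show how an instance with a value outside the ontology is detected. This is only an illustration under assumptions: the class name `CourseOntology`, the method `violations`, and the room names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CourseOntology:
    """A cut-down course ontology: a name plus two of the value domains."""
    name: str
    rooms: frozenset    # Room-domain
    persons: frozenset  # Person-domain

    def violations(self, instance: dict) -> list:
        """Return the fields of a schedule row whose values fall outside the ontology."""
        problems = []
        if instance.get("room") not in self.rooms:
            problems.append("room")
        if instance.get("person") not in self.persons:
            problems.append("person")
        return problems


schedule = CourseOntology(
    name="Schedule at VXU",
    rooms=frozenset({"Room A", "Room B"}),  # placeholder room names (assumption)
    persons=frozenset({"PF", "RL", "Anna Wingkvist"}),
)

ok_row = {"room": "Room A", "person": "PF", "activity": "Introduction"}
bad_row = {"room": "Room Z", "person": "PF", "activity": "Introduction"}

print(schedule.violations(ok_row))
print(schedule.violations(bad_row))
```

A row that passes the check only tells us its values belong to the agreed domains; whether the scheduled fact is true is a separate question.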
Initiatives • Resource Description Framework (RDF) • Semantic Web • XML Schema, a standard for describing the structure and part of the semantics of data. • XSL, describing mappings between different presentation sheets.
Intranets and semantics • In a fast-changing world, knowledge becomes increasingly important • Maintaining and accessing knowledge (organisational memory) is thus important. • The knowledge is often weakly structured, stored in intranets and in different formats (picture, sound etc.) • Knowledge management, which turns information into useful knowledge, is thus badly needed. • Knowledge = Content
Document management • Key-word based retrieval provides lots of irrelevant information out of context. • Extracting information requires human attention, both for extracting and integrating • Maintaining weakly structured sources is time-consuming
Semantic possibilities • Search for content, not keywords • Query answering instead of information retrieval • Correct exchange of structured or semi-structured information, for instance via XSL. • Define views on documents or sets of documents (information fusion)
Example: Shop-bots • Agents that find the best shopping opportunity. • [Figure: a shop-bot reaching several B2C sites through a wrapper]
Problems with current shop-bots • A wrapper is needed for each place and type of bot. • No flexibility in retrieving the information • Information at the B2C-site must be provided in a structured form. • Usually this information is also provided in natural language, which inevitably causes inconsistency problems
Solution • Using various XML-techniques provides better possibilities for translation between the bot and the site. • However, they must share the same ontology. • An ontology describes the various products and can be used to navigate and search automatically for the required information.
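The idea that the bot and the sites must share one ontology can be sketched as follows. All site data, field names, and the helpers `to_shared` and `best_offer` are hypothetical; the sketch uses plain dictionaries rather than any XML technique, purely to show the translation step.

```python
# Shared product ontology: the attribute names all parties agree on (assumption).
SHARED_TERMS = {"product", "price_eur"}

# Each hypothetical site uses its own field names and publishes a mapping
# from those names to the shared terms.
SITE_A = {
    "mapping": {"title": "product", "cost": "price_eur"},
    "offers": [{"title": "camera", "cost": 199.0}],
}
SITE_B = {
    "mapping": {"item": "product", "eur": "price_eur"},
    "offers": [{"item": "camera", "eur": 179.0}, {"item": "tripod", "eur": 25.0}],
}


def to_shared(site):
    """Translate a site's offers into the shared vocabulary."""
    translated_offers = []
    for offer in site["offers"]:
        translated = {site["mapping"][key]: value for key, value in offer.items()}
        if set(translated) != SHARED_TERMS:
            raise ValueError("offer does not fit the shared ontology")
        translated_offers.append(translated)
    return translated_offers


def best_offer(product, sites):
    """Return the cheapest offer for `product` across all sites, or None."""
    candidates = [o for s in sites for o in to_shared(s) if o["product"] == product]
    return min(candidates, key=lambda o: o["price_eur"], default=None)


print(best_offer("camera", [SITE_A, SITE_B]))
```

With a shared vocabulary, adding a new site means supplying one mapping instead of writing a new wrapper for each bot.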
Electronic commerce B2B • Standard techniques, such as EDIFACT, are cumbersome and error-prone and not integrated with other documents. • The XML family of techniques can be used for describing the syntax and semantics of data, but not for the business processes and the products. • Standard ontologies in combination with XSL-based translation services are thus needed.
About ontologies • A brief introduction
Definition • An ontology is a formal, explicit specification of a shared conceptualisation. • A conceptualisation refers to an abstract model of some phenomenon in the world which identifies the relevant concepts of that phenomenon. • Explicit means that the type of concepts used and the constraints on their use are explicitly defined. • Formal refers to the fact that the ontology should be machine readable.
Role of an ontology • Facilitate the construction of a domain model by providing a vocabulary of terms and relations. • Still, the problem of translating between different ontologies persists. • Also, when you go outside the target domain of the ontology you are lost. The ontology might conserve a certain way of thinking.
Types of ontologies • Domain ontologies capture the knowledge valid for a particular type of domain • Metadata ontologies like Dublin Core provide a vocabulary for describing the content of on-line information sources (Libraries). • Generic or common sense ontologies aim at capturing general knowledge about the world, providing basic notions and concepts for things like time, space, state, event etc.
More types • Representational ontologies provide representational entities without stating what should be represented. A well-known representational ontology is the Frame Ontology which defines concepts such as frames, slots, and slot constraints allowing the expression of knowledge in an object-oriented or frame-based way. • Method and task ontologies provide a reasoning point of view on domain knowledge such as hypothesis, cause-effect statements etc.
Constructing ontologies • Prerequisite: ontologies are small modules with a high internal coherence and a limited amount of interaction between the modules. • Constructing a new ontology is a matter of assembling existing ones. • Inclusion • Restriction • Polymorphic refinement
Formal languages • Various kinds of formal languages are used for representing ontologies, among others • Description logics • Frame Logic • First-order predicate logic extended with meta-capabilities to reason about relations.
The Sensus system • The basic idea is to use so-called seed elements, which represent the most important domain concepts, for identifying the relevant parts of a top-level ontology. • The selected parts are then used as starting points for extending the ontology with further domain-specific concepts.
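One plausible reading of the seed idea can be sketched in code: keep each seed together with every concept on its path up to the root of the top-level ontology. This selection rule is an assumption made for illustration, and the toy hierarchy and the function `relevant_part` are hypothetical.

```python
# Toy top-level ontology as child -> parent links (all concept names invented).
PARENT = {
    "thing": None,
    "physical-object": "thing",
    "event": "thing",
    "vehicle": "physical-object",
    "aircraft": "vehicle",
    "furniture": "physical-object",
    "meeting": "event",
}


def relevant_part(seeds, parent=PARENT):
    """Collect every seed plus all concepts on its path up to the root."""
    keep = set()
    for seed in seeds:
        node = seed
        # Stop early when we reach a concept that is already selected.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent[node]
    return keep


print(sorted(relevant_part(["aircraft"])))
```

The selected concepts would then serve as the starting points to which further domain-specific concepts are attached.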
WordNet • An on-line lexical reference system. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different semantic relations link the synonym sets. • WordNet contains around 100,000 word meanings organized in a taxonomy. • http://www.cogsci.princeton.edu/~wn/
Semantic relationships • Synonymy: Similarity in meaning of words. • Antonymy: Dichotomy in meaning of words. • Hyponymy: Is-a relationship between concepts. This is-a hierarchy ensures the inheritance of properties from superconcepts to subconcepts. • Meronymy: Part-of relationship between concepts. • Morphological relations, which are used to reduce word forms.
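Two of these relations, hyponymy with property inheritance and meronymy, can be illustrated with a toy lexicon. The concepts, properties, and function names below are invented for the sketch and are not taken from WordNet itself.

```python
# Toy lexicon; concepts and properties are invented for illustration.
IS_A = {"dog": "mammal", "mammal": "animal", "animal": None}  # hyponymy
PROPERTIES = {"animal": {"alive"}, "mammal": {"warm-blooded"}, "dog": {"barks"}}
PART_OF = {"paw": "dog", "tail": "dog"}  # meronymy


def inherited_properties(concept):
    """Union of a concept's own properties and those of all its superconcepts."""
    props = set()
    while concept is not None:
        props |= PROPERTIES.get(concept, set())
        concept = IS_A.get(concept)
    return props


def parts(whole):
    """Concepts linked to `whole` by the part-of relation, in alphabetical order."""
    return sorted(part for part, w in PART_OF.items() if w == whole)


print(sorted(inherited_properties("dog")))
print(parts("dog"))
```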
Features of WordNet • Free of charge • A multilingual European version also exists (http://www.let.uva.nl/~ewn ) • Its large size (i.e., number of concepts) • Its domain-independence • Its low level of formalization • The definitions are vague, which limits the possibility of automatic reasoning support
CYC http://www.cyc.com/ • Comes from AI • Humans decide based on their common sense knowledge what to learn and what not to learn from their observations. • CYC started as an approach to formalize this knowledge and provide it with a formal and executable semantics. • Hundreds of thousands of concepts have been formalized with millions of logical axioms, rules, and other assertions.
Features of CYC • The upper-level ontology of CYC with 3000 concepts has been made publicly available. • Most of the more specific concepts are kept secret • CYC groups concepts into microtheories to structure the overall ontology. They are a means to express context dependency i.e., what is right in one context may be wrong in another • CycL, a variant of predicate logic, is used as language for expressing these theories.
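The context dependency expressed by microtheories can be illustrated with a very small sketch. It is not CycL and does not reflect how CYC stores its assertions; the microtheory names, the statement, and the function `holds` are all hypothetical.

```python
# Each microtheory (context) holds its own assertions; contents are invented.
MICROTHEORIES = {
    "EarthSurfaceMt": {("falls-when-dropped", "apple"): True},
    "OrbitMt": {("falls-when-dropped", "apple"): False},
}


def holds(statement, context):
    """Truth value of `statement` in `context`, or None if the context is silent."""
    return MICROTHEORIES[context].get(statement)


statement = ("falls-when-dropped", "apple")
print(holds(statement, "EarthSurfaceMt"))
print(holds(statement, "OrbitMt"))
```

The same statement gets different answers depending on the microtheory consulted, which is the point of grouping concepts by context.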
TOVE (TOronto Virtual Enterprise) • Task and domain-specific ontology. • The ontology supports enterprise integration, providing a shareable representation of knowledge in a generic, reusable data model • TOVE provides a reusable representation (i.e., ontology) of industrial concepts. • http://www.eil.utoronto.ca/tove/toveont.html
Characteristics • It provides a shared terminology for the enterprise that each agent can jointly understand and use • It defines the meaning of each term in as precise and unambiguous a manner as possible • It implements the semantics in a set of axioms that will enable TOVE to automatically deduce the answer to many “common sense” questions about the enterprise • It defines a symbology for depicting a term or the concept constructed thereof in a graphical context
(KA)2 – a case study on • Knowledge Annotation Initiative of Knowledge Acquisition Community • http://www.aifb.uni-karlsruhe.de/WBS/broker/KA2.html • The process of developing an ontology for a heterogeneous and world-wide (research) community • The use of the ontology for providing semantic access to on-line information sources of this community.
Example in (KA)2 • Class: research-topic • Attributes: • Name: <string> • Description: <text> • Approaches: <set-of keyword> • Research-groups: <set-of research-group> • Researchers: <set-of researcher> • Related-topics: <set-of research-topic> • Subtopics: <set-of research-topic> • Events: <set-of events> • Journals: <set-of journal> • Projects: <set-of project> • Application-areas: <text> • Products: <set-of product> • Bibliographies: <set-of HTML-link> • Mailing-lists: <set-of mailing-list> • Webpages: <set-of HTML-link> • International-funding-agencies: <funding-agency> • National-funding-agencies: <funding-agency> • Author-of-ontology: <set-of researcher> • Date-of-last-modification: <date>
Procedure • Many instances of the schema were developed and published on the home page • Examples: • specification languages • knowledge acquisition methodologies • agent-oriented approaches • knowledge acquisition from natural language • knowledge management
Knowledge management • An application
Knowledge management deals with • Acquiring • Maintaining • Accessing the knowledge of an organization. • Here we will apply it to the Internet and concentrate on the last issue: searching for knowledge.
Search engines • They have typically three parts: A webcrawler for downloading, an indexer for finding key terms and a query interface that retrieves answers to the proposed questions. • They are all based on keywords. The indexing process of the web-pages is thus crucial for the retrieval.
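The indexer and the query interface can be sketched with a small inverted index. The crawler is replaced by a hard-coded dictionary of invented pages, and the functions `build_index` and `query` are hypothetical names, so this shows the keyword principle only.

```python
from collections import defaultdict

# Stand-in for what a web crawler would download; the pages are invented.
PAGES = {
    "page1": "ontology describes shared concepts",
    "page2": "search engines index keywords",
    "page3": "ontology based search",
}


def build_index(pages):
    """Indexer: map each keyword to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def query(index, terms):
    """Query interface: pages that contain every query term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(PAGES)
print(query(index, ["ontology", "search"]))
```

A query for a word that never occurs literally in a page returns nothing, even if the page is about that topic, which is why the indexing step is so decisive.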
Search domain • Consists of about 300 million fixed documents, but this is only about 20% of what is available in total. The rest (80%) is dynamically generated (example: Aftonbladet) • AltaVista returns everything, Google sorts according to the documents pointing at the actual document, and Yahoo uses human intervention.
Dimensions in searching • Precision: how many retrieved documents are really relevant? • Recall: have I found all relevant information? • Time: for the humans to find the desired information among the retrieved. • Scattering: The information might be scattered over several pages with only implicit relations between them
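Precision and recall are commonly computed as the share of retrieved documents that are relevant and the share of relevant documents that were retrieved. A minimal sketch, with invented document identifiers and a hypothetical function name:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant in total, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

In this example half of what was retrieved is relevant, while two of the three relevant documents were found.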
Ontobroker • Define an ontology • Use it to annotate/structure/wrap your web documents • Somebody else can make use of Ontobroker’s advanced query and inference services to consult your knowledge. • To achieve this goal, Ontobroker provides three interleaved languages and two tools.