Expansion of Ontologies: Ontologies in Information Systems. L.A. Kalinichenko (IPI RAN). Symposium "Ontological Modeling", KFU, October 11-12, 2010.
Outline • Ontologies as conceptual schemas • Development of description-logic languages in the context of databases (DB) and information systems (IS), before the W3C stack appeared • W3C language stacks • OWL QL and the DL-Lite family • Ontology-based data access (OBDA) systems • Ontology-based database integration
Expansion of ontologies into the IS and DB context. Ontologies are actively expanding into the DB and IS area: • The use of ontology languages based on description logics in IS and DB is being investigated • In particular, the use of description logics as conceptual modeling languages and as data definition languages • Ontologies are applied as database schemas and conceptual schemas • Architectural solutions for ontology-based data access systems are being discussed • The expansion of the ontological approach toward IS and DB is backed by corresponding definitions. The aim of this talk is to analyze the content of this work, its level, its novelty, and its influence on the DB and IS field.
Evolution of the notion of "ontology". In the early 1990s, an effort to create interoperability standards identified the ontology as a standard component of knowledge systems. According to Gruber, an ontology is a specification of a conceptualization, i.e., a formal description of the concepts and their relations for a "universe of discourse". Gruber 2008: In the context of computer and information sciences, an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes (or sets), attributes (or properties), and relationships (or relations among class members). In the context of database systems, ontology can be viewed as a level of abstraction of data models, analogous to hierarchical and relational models, but intended for modeling knowledge about individuals, their attributes, and their relationships to other individuals. Ontologies are said to be at the "semantic" level, whereas database schemas are models of data at the "logical" or "physical" level. Due to their independence from lower-level data models, ontologies are used for integrating heterogeneous databases, enabling interoperability among disparate systems, and specifying interfaces to independent, knowledge-based services.
Ontologies: levels • An ontology is a representation scheme that describes a formal conceptualization of a domain of interest (D. Calvanese) • The specification of an ontology usually comprises two distinct levels: • Intensional level: specifies a set of conceptual elements and of rules to describe the conceptual structures of the domain (compare with the IDB in deductive databases). • Extensional level: specifies a set of instances of the conceptual elements described at the intensional level (compare with the EDB in deductive databases). • Note: an ontology may also specify a meta-level, which defines a set of modeling categories of which the conceptual elements are instances.
Intensional level • An ontology language for expressing the intensional level usually includes: • Concepts (vs. entity types in IS) • Properties of concepts • Relationships between concepts, and their properties • Axioms • Queries • Ontologies are typically rendered as diagrams (e.g., Semantic Networks, Entity-Relationship schemas, UML Class Diagrams).
Ontologies represented as UML class diagrams • Concepts (classes), also called entity types or frames • Properties, also called attributes, slots, or data properties • Relationships, also called associations, relationship types, object properties, or roles
Extensional level. At the extensional level we have individuals and facts: • An instance represents an individual (or object) in the extension of a concept, e.g., domenico is an instance of Employee • A fact represents a relationship holding between instances, e.g., worksFor(domenico, tones)
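The two levels can be sketched as a small data structure. This is only an illustrative sketch: the class name `Ontology`, its methods, and the assumption that `tones` is an instance of a concept `Project` are hypothetical and do not come from the slides. Axioms and queries are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # Intensional level: the vocabulary (axioms are omitted in this sketch).
    concepts: set = field(default_factory=set)
    roles: set = field(default_factory=set)
    # Extensional level: instance assertions and facts.
    instances: set = field(default_factory=set)  # (concept, individual)
    facts: set = field(default_factory=set)      # (role, subject, object)

    def assert_instance(self, concept, individual):
        # An instance may only be asserted for a declared concept.
        if concept not in self.concepts:
            raise ValueError(f"unknown concept: {concept}")
        self.instances.add((concept, individual))

    def assert_fact(self, role, subject, obj):
        # A fact may only be asserted for a declared role.
        if role not in self.roles:
            raise ValueError(f"unknown role: {role}")
        self.facts.add((role, subject, obj))

    def extension(self, concept):
        # The explicitly asserted extension of a concept.
        return {ind for c, ind in self.instances if c == concept}


onto = Ontology(concepts={"Employee", "Project"}, roles={"worksFor"})
onto.assert_instance("Employee", "domenico")
onto.assert_instance("Project", "tones")  # assumption: tones is a Project
onto.assert_fact("worksFor", "domenico", "tones")
```

Keeping the vocabulary separate from the assertions mirrors the IDB/EDB split of deductive databases mentioned earlier in the deck.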
Comparison with other languages • Ontology languages vs. knowledge representation languages: • Ontologies are knowledge representation schemas. • Ontology vs. logic: • Logic is the tool for assigning semantics to ontology languages. • Ontology languages vs. conceptual data models: • Conceptual schemas are special ontologies, suited for conceptualizing a single logical model (database). • Ontology languages vs. programming languages: • Class definitions are special ontologies, suited for conceptualizing a single structure for computation.
Classes of ontology languages • Graph-based • Semantic networks • Conceptual graphs • UML class diagrams, Entity-Relationship schemas • Frame-based • Frame Systems • OKBC, XOL • Logic-based • Description Logics (e.g., SHOIQ, DLR, DL-Lite, OWL, ...) • Rules (e.g., RuleML, LP/Prolog, F-Logic) • First Order Logic (e.g., KIF) • Non-classical logics (e.g., non-monotonic, probabilistic)
Horrocks: OBIS Ontology-based Information Systems – View of data that is independent of logical/physical schema – Queries use terms familiar to users – Answers reflect schema & data, e.g.: “Patients suffering from Vascular Disease” – Query expansion/navigation/refinement – Incomplete and semi-structured data – Integration of heterogeneous sources
What is new? • One gets the impression that ontologies open the way to new directions after decades of development and research in conceptual data models, heterogeneous database integration systems, semantic interoperability, and deductive databases • The necessary theoretical foundation and concrete high-level models already exist • These results are now being rediscovered, as it were, and declared achievements of ontology research • In reality, what is at issue is research into the capabilities of ontology languages for conceptual modeling of IS and DB • Databases, their schemas, information systems, and conceptual schemas do not cease to be such when particular language facilities are used • One needs to discard the terminological husk and filter out what is genuinely new that ontological means have brought to the theory and practice of DB and IS
What is new? • The ontological models considered in publications on their use in DB and IS systems are based on first-order predicate logic, most often on its subsets, the description logics. • In essence, in the DB and IS context ontology languages (in particular, description-logic languages) play the role of [conceptual] data models, and nothing more • Thus, one should focus on studying the features of description-logic languages and the novelty they bring to the DB and IS context. What can description-logic data models offer in comparison with relational, object, and other data models?
Conceptual modeling • Conceptual modeling provides abstract, semantic modeling of the application domain (defining the classes of domain objects, their relationships, and constraints) that is independent of implementation and serves as a means of producing a reference specification reflecting consensus within a community that includes developers, IS users, and the information systems themselves • Conceptual schemas are also used as global schemas when integrating information resources (databases) • Conceptual schemas are used both during IS design and at run time
Conceptual modeling (2) • A conceptual schema describes the structure of the application domain, whereas an ontology is oriented mainly toward defining the concepts used in the domain • Conceptual database schemas, besides describing classes of domain objects and constraints, contain descriptions of object behavior (methods, functions, processes), which ontologies do not contain (yet). • An ontological model of the domain provides definitions of concepts, which are used to annotate the corresponding names of conceptual-schema definitions; this is an example of where ontologies could prove themselves in an original way
Development of description-logic languages in the DB and IS context before the W3C stack appeared
Guarino, 1998 • Considers the idea of building ontology-driven information systems. • Discusses, in general terms, possible approaches to using ontologies during IS design and at run time. • In essence, it presents well-known concepts of semantically interoperable systems and of information resource integration systems • In the scenarios considered, ontologies play the role of resource-independent specifications • The paper is a restatement of known approaches in ontology terms.
Description logics of this period • CLASSIC [Borgida, 1989], used by different systems including OBSERVER [1996] • GRAIL [1997], LOOM [1991], SIMS [1996], OIL [2000], used for terminology integration • These languages support almost all common concept-forming operators. An exception is CLASSIC, which does not allow disjunction and negation in concept definitions • OIL can also be used to define instances • LOOM provides reasoning support for the ABox and the TBox, but it cannot guarantee soundness and completeness • The terminological axioms that seem to be important are equality and disjointness.
Description logics with rules • CARIN [1999]: a description logic extended with function-free Horn rules (used in the DWQ project by Calvanese) • AL-log [1998]: a combination of a simple description logic with Datalog • DLR: a description logic with n-ary relations, used by Calvanese for information integration [2001] • Integrating description logics with rule-based reasoning makes it necessary to restrict the expressive power of the terminological part of the language in order to remain decidable
Classical frame-based/logic-based languages • Ontolingua [1993] • OKBC [1998] • F-Logic [1995], used in Ontobroker [1998] and COIN [1997]. • These languages provide common elements for the definition of concepts and relations, such as typing, default values, and cardinalities. Further, compared to the description logic languages, the FOL- and frame-based languages have a larger variety of options for capturing terminological knowledge. This is mainly a result of the possibility of defining first-order axioms in ontology specifications.
SSA (single-stack architecture): ontology-related languages. Languages in the stack that are directly related to ontology specification: • RDF is a simple language for expressing data, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in XML syntax. • RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized-hierarchies of such properties and classes. • OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g., disjointness), cardinality (e.g., "exactly one"), equality, richer typing of properties, characteristics of properties (e.g., symmetry), and enumerated classes.
Why not SSA • It is a vain hope that a single upward-compatible language, developed in the Semantic Web's infancy, will suffice for all future semantic developments on the Web • Every technology, including language design, eventually becomes obsolete, and no technology can address all problems • A more realistic architecture must allow multiple technological stacks to exist side by side • E.g., SWRL is a technology that extends OWL-DL. However, rule-based technology (not based on description logic) is mature, with decades of theoretical development and practical and commercial use: logic programming and nonmonotonic reasoning (LPNMR)
DLP layer • The Description Logic Programs (DLP) layer is the set of all statements in Description Logic that are translatable into Horn rules (FOL) • At least this layer should be assumed in order to ensure upward compatibility: the semantics of DLP in the OWL stack and in the rules stack are the same • Logic languages that are based on pure first-order logic (such as OWL) do not support constraints and have no notion of their violation. Instead, they provide restrictions (statements about the desired state of the world) • They produce inferences: if a person is said to have at most one spouse and the knowledge base records that John has two, Mary and Ann, then OWL would conclude that Mary and Ann are the same person. In contrast, a rule base with nonmonotonic semantics will view such a database as inconsistent.
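The spouse example can be made concrete with a small sketch. It is not an OWL reasoner or a rule engine: the two functions below are hypothetical helpers that only mimic the two readings of "at most one spouse" over a list of `hasSpouse` pairs. The constraint reading assumes that distinct names denote distinct individuals.

```python
from collections import defaultdict

# hasSpouse(subject, object) facts recorded in the knowledge base.
has_spouse = [("john", "mary"), ("john", "ann")]


def restriction_inferences(pairs):
    """Restriction reading: 'at most one' forces the objects to be equal.

    Returns the set of inferred equalities as unordered pairs.
    """
    by_subject = defaultdict(list)
    for subject, obj in pairs:
        by_subject[subject].append(obj)
    same_as = set()
    for objects in by_subject.values():
        first = objects[0]
        for other in objects[1:]:
            if other != first:
                same_as.add(frozenset((first, other)))
    return same_as


def constraint_violations(pairs):
    """Constraint reading: two distinct names violate 'at most one'.

    Returns the subjects for which the database is inconsistent.
    """
    by_subject = defaultdict(set)
    for subject, obj in pairs:
        by_subject[subject].add(obj)
    return sorted(s for s, objects in by_subject.items() if len(objects) > 1)


inferred = restriction_inferences(has_spouse)   # mary and ann are equated
violations = constraint_violations(has_spouse)  # john violates the constraint
```

The same data thus yields a new equality under one semantics and an error report under the other, which is the difference between restrictions and constraints.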
Interoperability in MSA (multi-stack architecture) through language extension: problems • It is claimed that it is incorrect to say that Datalog is an extension of the DLP layer because, given a single fact such as knows(pat, jo), DLP and Datalog give different answers to the question of whether pat knows exactly one person. • Under the OWL semantics the answer is "unknown", since it is not possible to either prove or disprove that pat knows exactly one person; under the rules semantics the answer is "yes". • Both answers are right! A user who chooses to write an application using the rules stack does so because of a desire to use the language and semantics of that stack. Otherwise, a competent user should choose OWL and SWRL.
Databases and MSA • MSA is extensible, and additional stacks can be added to it as long as they can be made interoperable (e.g., OWL QL, OWL RL) • Each layer is a syntactic and semantic extension of the previous layer • Application of the W3C description logics in DB and IS: • Straightforward: OWL + RDFS, RDF • As an extension above large (probably relational) databases, e.g., OWL QL • However, it seems that OWL cannot be considered a pure extension of the relational layer, at least because of the difference between constraints and restrictions mentioned above. It would be worth analyzing OWL QL in this respect.
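A minimal sketch of the two answers for the single fact knows(pat, jo). The function names and the `functional` and `different` parameters are hypothetical. The open-world function is a simplification that returns `None` for "unknown", and it does not detect the inconsistent case where a functional role has two provably different values.

```python
knows = [("pat", "jo")]


def exactly_one_closed_world(facts, subject):
    """Rules (closed-world) reading: what is not recorded does not hold."""
    return len({o for s, o in facts if s == subject}) == 1


def exactly_one_open_world(facts, subject, functional=False, different=frozenset()):
    """Open-world reading: True, False, or None when neither is provable.

    functional: whether an 'at most one' axiom is asserted for the role.
    different:  set of frozenset pairs asserted to be distinct individuals.
    """
    known = sorted({o for s, o in facts if s == subject})
    # Disproved if two recorded values are known to be different individuals.
    two_distinct = any(
        frozenset((a, b)) in different
        for i, a in enumerate(known)
        for b in known[i + 1:]
    )
    if two_distinct:
        return False
    # Proved only if at least one value is recorded and at most one is allowed.
    if known and functional:
        return True
    return None


closed_answer = exactly_one_closed_world(knows, "pat")
open_answer = exactly_one_open_world(knows, "pat")
```

With an additional "at most one" axiom (`functional=True`), the open-world answer also becomes "yes", which shows that the difference lies in what must be stated explicitly.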
DL-Lite Family • DL-Lite objectives: to capture basic ontology languages while keeping the complexity of reasoning low (reasoning here also includes answering unions of conjunctive queries over the instance level, the ABox) • DL-Lite reasoning tasks are polynomial in the size of the TBox, and query answering is in LogSpace in the size of the ABox • DL-Lite allows for a separation between TBox and ABox reasoning during query evaluation: the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine • The logics of the DL-Lite family are the maximal DLs supporting efficient query answering over large amounts of instances.
DL-Lite family formation • DL-Litecore allows for expressing ISA assertions on concepts, disjointness between concepts, role typing, and participation constraints • DL-LiteF adds to the core the possibility of expressing functionality restrictions on roles • DL-LiteR adds ISA and disjointness assertions between roles. OWL 2 QL is based on DL-LiteR • DL-LiteA adds the possibility of using role inclusion assertions and functionality assertions together, and so on • Very simple DLs like DL-LiteR are suitable for supporting basic ontology languages, conceptual data models (e.g., Entity-Relationship), and object-oriented formalisms (e.g., UML class diagrams)
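The separation between TBox and ABox reasoning can be illustrated with a deliberately restricted sketch. It handles only inclusions between atomic concepts, which is far less than DL-Lite supports. The names `rewrite` and `answer`, and the sample concepts, are hypothetical.

```python
# TBox: inclusions between atomic concepts, written as (sub, super).
tbox = [("Manager", "Employee"), ("Director", "Manager")]

# ABox: explicitly asserted instances per concept.
abox = {
    "Employee": {"ann"},
    "Manager": {"bob"},
    "Director": {"eve"},
}


def rewrite(concept, inclusions):
    """TBox-only step: collect every concept subsumed by `concept`.

    The result is a union of atomic queries and never looks at the ABox.
    """
    result, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in inclusions:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def answer(concept, inclusions, assertions):
    """ABox-only step: evaluate the rewritten union over the stored data."""
    answers = set()
    for atomic in rewrite(concept, inclusions):
        answers |= assertions.get(atomic, set())
    return answers


employees = answer("Employee", tbox, abox)
```

In this sketch the second step is a plain union of lookups, so it could equally be delegated to an SQL engine, as the slide describes.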
OWL axioms supported by DL-LiteR • subclass axioms (SubClassOf) • class expression equivalence (EquivalentClasses) • class expression disjointness (DisjointClasses) • inverse object properties (InverseObjectProperties) • property inclusion (SubObjectPropertyOf not involving property chains and SubDataPropertyOf) • property equivalence (EquivalentObjectProperties and EquivalentDataProperties) • property domain (ObjectPropertyDomain and DataPropertyDomain) • property range (ObjectPropertyRange and DataPropertyRange) • disjoint properties (DisjointObjectProperties and DisjointDataProperties) • symmetric properties (SymmetricObjectProperty) • assertions other than the equality assertions (DifferentIndividuals, ClassAssertion, ObjectPropertyAssertion, and DataPropertyAssertion)
OWL axioms not supported by DL-LiteR (1) • existential quantification to a class expression or a data range (ObjectSomeValuesFrom and DataSomeValuesFrom in the subclass position) • self-restriction (ObjectExistsSelf) • existential quantification to an individual or a literal (ObjectHasValue, DataHasValue) • nominals (ObjectOneOf, DataOneOf) • universal quantification to a class expression or a data range (ObjectAllValuesFrom, DataAllValuesFrom) • cardinality restrictions (ObjectMaxCardinality, ObjectMinCardinality, ObjectExactCardinality, DataMaxCardinality, DataMinCardinality, DataExactCardinality) • disjunction (ObjectUnionOf, DisjointUnion)
OWL axioms not supported by DL-LiteR (2) • property inclusions (SubObjectPropertyOf involving property chains) • functional and inverse-functional properties (FunctionalObjectProperty, InverseFunctionalObjectProperty, and FunctionalDataProperty) • transitive properties (TransitiveObjectProperty) • reflexive properties (ReflexiveObjectProperty) • irreflexive properties (IrreflexiveObjectProperty) • asymmetric properties (AsymmetricObjectProperty) • keys (HasKey)
Ontologies at the core of information systems. Use of information resources based on conceptualization of application domains. Data access mediated by an ontology (a conceptual view of the data)
Linking data to DL-LiteR • Relational databases store values, whereas instances of concepts are objects, and each object should be denoted by an ad hoc identifier (the impedance mismatch) • This idea traces back to work on deductive object-oriented databases, which provided Skolem functions that take values as arguments and return OIDs • Mapping relational schemas into concept definitions is straightforward • Through a mapping we associate a conjunctive query over atomic concepts, domains, roles, attributes, and role attributes (generically referred to as predicates in the following) with a first-order (more precisely, SQL) query of the appropriate arity over the database. • Formally, a mapping assertion is an assertion of the form φ ⤳ ψ, where φ is an arbitrary SQL query of arity n > 0 over DB, and ψ is a conjunctive query over T of arity n′ > 0 without non-distinguished variables
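A mapping assertion φ ⤳ ψ can be sketched with an in-memory SQLite database. Everything specific here is an assumption made for illustration: the table `emp`, the Skolem-style function names `pers` and `proj`, and the helper `materialize`. Real OBDA systems keep the ABox virtual; it is materialized here only to make the result visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT, name TEXT, proj TEXT)")
conn.execute("INSERT INTO emp VALUES ('123', 'Domenico', 'tones')")

# One mapping assertion: an SQL query (phi) and the atoms of psi.
# Each argument is (function name, column index): the function name builds
# an object term from a database value, which bridges values and objects.
mappings = [
    (
        "SELECT ssn, proj FROM emp WHERE proj IS NOT NULL",
        [
            ("Employee", (("pers", 0),)),
            ("worksFor", (("pers", 0), ("proj", 1))),
        ],
    ),
]


def materialize(connection, mapping_assertions):
    """Build the ABox induced by the mappings (illustration only)."""
    abox = set()
    for sql, atoms in mapping_assertions:
        for row in connection.execute(sql):
            for predicate, args in atoms:
                terms = tuple(f"{fn}({row[i]})" for fn, i in args)
                abox.add((predicate, terms))
    return abox


abox = materialize(conn, mappings)
```

The object identifier `pers(123)` is built from the stored value `123`, in the spirit of the Skolem functions mentioned above.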
QA applying DL-LiteR • An ontology with mappings is denoted (T, M, DB), where DB is a database as defined above, T is a DL-LiteR TBox, and M is a set of mapping assertions between DB and T. An interpretation I is a model of (T, M, DB) if it is a model of T and satisfies all mapping assertions in M with respect to DB. • The ontology conveys only incomplete information about the domain of interest, and we want to guarantee that the answers to a query are certain, that is, that they hold independently of how this incomplete information is completed. • For query answering (QA), we split each mapping assertion φ ⤳ ψ into several assertions of the form φ ⤳ p, one for each atom p in ψ • We unify the atoms in the query q to be evaluated with the right-hand-side atoms of the mappings, thus obtaining a UCQ • Then we unfold each atom with the corresponding left-hand-side mapping query. Observe that, after unfolding, we obtain an SQL query.
The QuOnto system • QuOnto is a tool for representing and reasoning over ontologies of the DL-Lite family. The basic functionalities it offers are: • Ontology satisfiability check • Intensional reasoning services: concept/property subsumption and disjunction, concept/property satisfiability • Query answering of UCQs • Reasoning services are optimized • Can be used with internal and external DBMSs (includes drivers for Oracle, DB2, IBM Information Integrator, SQL Server, MySQL, etc.) • Implemented in Java; APIs are available for selected projects upon request.
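The unfolding step can be sketched for the simplest case, a UCQ whose disjuncts are single unary atoms. This is a restricted illustration, not the full algorithm: joins between atoms and unification of terms are left out. The tables `emp` and `mgr`, the `pers(...)` object terms, and the function `unfold` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT)")
conn.execute("CREATE TABLE mgr (ssn TEXT)")
conn.execute("INSERT INTO emp VALUES ('123')")
conn.execute("INSERT INTO mgr VALUES ('456')")

# Split mappings phi ~> p: one SQL query per atom p, indexed by predicate.
# The object term pers(ssn) is built inside SQL by string concatenation.
split_mappings = {
    "Employee": ["SELECT 'pers(' || ssn || ')' AS x FROM emp"],
    "Manager": ["SELECT 'pers(' || ssn || ')' AS x FROM mgr"],
}


def unfold(ucq, mappings):
    """Replace each atom by its mapping queries and union the results.

    ucq: list of predicate names, each standing for a query q(x) :- P(x).
    Returns an SQL string, or None when no atom has a mapping.
    """
    parts = [sql for predicate in ucq for sql in mappings.get(predicate, [])]
    return " UNION ".join(parts) if parts else None


# Assumed result of rewriting Employee(x) with an axiom Manager ISA Employee.
ucq = ["Employee", "Manager"]
sql = unfold(ucq, split_mappings)
answers = {row[0] for row in conn.execute(sql)}
```

After unfolding, the whole query is a single SQL statement, so evaluation is left entirely to the database engine.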
Ontology-based database integration
A DL-Lite-based solution • The (federated) source database is external to and independent of the conceptual view (the ontology). • Mappings relate information in the sources to the ontology, yielding a kind of virtual ABox. The GAV (global-as-view) approach is used: the mappings are such that the result of an (arbitrary) SQL query on the source database is considered a (partial) extension of a concept/role. • The distinction between objects and values in DL-LiteA is used to deal with the impedance mismatch problem
MASTRO-I: the OBDI system. MASTRO-I is based on the QuOnto system, a reasoner for DL-LiteA, and is coupled with a data federation tool: • the global schema is expressed in terms of a DL-LiteA TBox • the mapping language allows for expressing sound GAV mappings between the sources and the global schema (the source schema in MASTRO-I is assumed to be ONE flat relational database schema obtained from the federated DB) • the mapping language has specific mechanisms for addressing the impedance mismatch problem (values in the sources vs. the instances of concepts in the ontology, which are objects) • answering unions of conjunctive queries can be done through a very efficient technique (LOGSPACE with respect to data complexity) that reduces this task to standard SQL query evaluation.