
Ontology-based Data Integration & Modeling


Presentation Transcript


  1. Ontology-based Data Integration & Modeling

  2. Outline • Introduction • Applications • Challenges • Ontologies • Knowledge representation • Case Studies

  3. Outline • Introduction • Applications • Challenges • Solutions • Ontologies • Knowledge representation • Case Studies

  4. Integration of Geospatial Data Connecting users • Applications require integration of information from • multiple disciplines • Geospatial information is inherently distributed • Users count on one another for portions of the data

  5. Information integration Decision making involves deriving information from diverse heterogeneous data sets. This requires • Knowledge of domain for information/knowledge extraction from raw data • Knowledge of geospatial data processing/formats • Computer hardware and software resources As a result, the use of geospatial data is currently laborious and expensive

  6. Global Spatial Data Infrastructure (GSDI)

  7. National Spatial Data Infrastructure (NSDI)

  8. National Resource Database (NRDB)-DOS initiative

  9. A huge volume of geospatial data has been created under different projects by different organizations over the last 20 years. • Each project has a different purpose and is carried out for different areas. • Different projects utilize different vocabularies, taxonomies and content standards, e.g., NNRMS, NRIS, NRDB, NRDMS (National Projects, India). • This includes data on several themes at different scales. • Data is available in different formats.

  10. Products Current data and information systems provide on-line ordering and access of data / standard products but not user-specific information and knowledge. • If a product that the user wants does not exist, the system has no way to create it on demand. • Rich in geospatial data but poor in up-to-date geospatial information and knowledge.

  11. Outline • Introduction • Applications • Challenges • Ontologies • Knowledge representation • Case Studies

  12. Data and Knowledge • Knowledge vs. information vs. data • Data is observable bits and bytes • Information is related data • Knowledge enables decisions • Data → Knowledge • Make the data accessible … in an extensible way • Make the data “smart” … data relations and constraints • Present the data in the terms of the user • Deck of Cards as Example • Cards are data • Sequences / same color or suit are information • Which game and rules are knowledge
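
The deck-of-cards example on this slide can be illustrated with a minimal Python sketch. The sample hand and the function names are assumptions made for illustration, and the "flush" rule is borrowed from poker only as one possible example of game knowledge; the slide does not name a game.

```python
from collections import Counter

# Data: raw observations with no interpretation attached (hypothetical hand).
hand = [("K", "hearts"), ("7", "hearts"), ("2", "hearts"),
        ("9", "hearts"), ("J", "hearts")]

def suit_counts(cards):
    """Information: relate the data by counting cards that share a suit."""
    return Counter(suit for _, suit in cards)

def is_flush(cards):
    """Knowledge: a game rule that turns the information into a decision.

    The rule (five cards, all of one suit) is a poker rule used here
    purely as an illustration.
    """
    return len(cards) == 5 and len(suit_counts(cards)) == 1

print(suit_counts(hand))
print(is_flush(hand))
```

The same cards support different decisions under different rules, which is the sense in which knowledge sits above information.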

  13. Data Integration Efficient and maximal use of data by integrating data across disparate databases while retaining the integrity and essence of the source data

  14. Interoperability Resolving interoperability issues like : • Syntactic heterogeneity • Schematic heterogeneity • Semantic heterogeneity • Systemic heterogeneity

  15. Policy Enforced Interoperable Communities of Interest [Diagram labels: Semantic Broker; Info Region (PACOM); Community 1; Community 2; Community 3]

  16. Users - Data Providers Data creators can get the following advantages: • Provide smart querying • Better services to user with available data • Efficient use of available data • Quality control – Verification, consistency, relationships • Data correctness and interpretation • Data Reuse • Long-term data preservation

  17. Users - Data Consumers Data may be distributed and heterogeneous. However, the user gets the following advantages: • Single uniform interface • Consistent visualization • Smart Query • Integrated analysis & modeling • Use of data without the need for knowledge of schemas, vocabularies, formats etc. • Better decision making

  18. Better decision making Decision making can be improved by integrated analysis and modeling of geospatial data of multiple themes for many applications: • Sustainable development • Disaster management • Determining risk zones for hazards • Urban planning • Ground water prospecting A ground water prospect map prepared by integrating lithological, structural, geomorphological and hydrological information provides a better understanding of the ground water regime than a conventional hydrogeological map.

  19. Application Integration [Diagram labels: User; Query & results; (Reasoning/Inference) Engine; Data Sources; Domain ontologies; Task specific ontologies; Data ontologies]

  20. Solve complex geospatial problems using distributed geospatial services: The Geo-object and Geo-tree Concepts [Diagram labels: Without service; With service; Modeling and virtual data services; User request; User received; Archived data (geo-object); User (requested) geo-object; Intermediate geo-object; High level services (classification, modeling, etc.); Data transformation services (format transformation, reprojection, etc.); Data access services]

  21. Integrating different classifications

  22. Information Integration DEM Spatial Data

  23. Uniform interface

  24. End result of Integration Text Table HTML

  25. Uses of semantic data integration • Systems that interoperate (even in the face of change) • Compatible data, data models, services, and applications. • Collaboration • Cross-domain information sharing • Information sharing capabilities that allow effective information exchange across multiple communities of interest. • Mining complex data • Data mining of distributed, complex scientific data, including exploratory analysis and visualization • Data fusion • Long-term Data Preservation • Developing tools to make data relevant for longer periods of time

  26. Outline • Introduction • Applications • Challenges • Ontologies • Knowledge representation • Case Studies

  27. Geospatial Data Repository [Diagram labels: Distributed Data repository; Multiple Sources (SOI, RRSSC, NRSA, OTHER); Multiple Themes & Scales; Multiple Standards (FGDC, NNRMS, OGC); Multiple Formats (Shapefile, NRIS)]

  28. National Projects Projects are carried out at national, regional and local levels through different DOS centres • National (Natural) Resources Information System (NRIS) • Database for 30 districts in 17 states with a few case studies relevant to these nodes • Rajiv Gandhi National Drinking Water Mission (RGNDWM) • Nationwide Groundwater Prospects Mapping at 1:50,000 scale • Generation of digital database pertaining to ground water regime such as lithology, geomorphology, geological structures and hydrology • Integrated Resources Information System for Desert Areas (IRIS-DA) • Preparation of land and water resource management and utilization plans to aid State and district / block level officials in planning development works • 83 desert/drought-prone blocks in 18 districts of 4 states (total 76,527 sq. km) • National Agriculture Technology Project (NATP) • Development of regional scale watershed plans and improving crop productivity • 8 watersheds in different states • National Resources Census • Covers natural resources like land, water, soils, forests etc. • Conducted on a repeat cycle to depict changes and modifications and provide a snapshot of the country’s natural resources.

  29. Data integration is a challenging task due to differences in • Vocabulary • Schemas • Data formats • Scale of data • Classification • Representations

  30. Types of heterogeneity Data from multiple sources are characterized by multiple types of heterogeneity: • Syntactic heterogeneity: results from differences in the representation format of data • Schematic heterogeneity: results from different classification schemes in spatial data and different schemas in structured data and databases • Semantic heterogeneity: results from differences in the interpretation of the 'meaning' of data • System heterogeneity: results from the use of different operating systems and hardware platforms

  31. Syntactic & Semantic heterogeneity

  32. Schematic heterogeneity NRIS RGNDWM Part of classification schemes of two projects

  33. Semantic heterogeneity • Semantically equivalent • Igneous extrusive rocks and volcanic rocks • Semantically unrelated • Disaster (floods etc.) and disaster in sports • Semantically related • One system classifies “person” as “male” and “female” and other system as “student” and “professor”
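
The three cases on this slide can be sketched in Python as lookups against a shared set of concepts. Everything named here (the concept identifiers, the context labels, the `BROADER` table) is hypothetical and invented for illustration; it is a minimal sketch of the idea, not the method used in the presentation.

```python
# Hypothetical mapping from (context, source term) to a shared concept ID.
TERM_TO_CONCEPT = {
    ("geology", "igneous extrusive rocks"): "geo:VolcanicRock",
    ("geology", "volcanic rocks"): "geo:VolcanicRock",
    ("hazards", "disaster"): "haz:NaturalDisaster",
    ("sports", "disaster"): "sport:HeavyDefeat",
    ("registry", "male"): "reg:Male",
    ("university", "student"): "uni:Student",
}

# Hypothetical "broader concept" links used to detect related terms.
BROADER = {
    "reg:Male": "common:Person",
    "reg:Female": "common:Person",
    "uni:Student": "common:Person",
    "uni:Professor": "common:Person",
}

def concept_of(context, term):
    """Look up the shared concept for a term as used in a given context."""
    return TERM_TO_CONCEPT.get((context, term.lower()))

def classify(a, b):
    """Return 'equivalent', 'related' or 'unrelated' for two (context, term) pairs."""
    ca, cb = concept_of(*a), concept_of(*b)
    if ca is None or cb is None:
        return "unrelated"
    if ca == cb:
        return "equivalent"
    if BROADER.get(ca) is not None and BROADER.get(ca) == BROADER.get(cb):
        return "related"
    return "unrelated"

print(classify(("geology", "Igneous extrusive rocks"), ("geology", "volcanic rocks")))
print(classify(("hazards", "disaster"), ("sports", "disaster")))
print(classify(("registry", "male"), ("university", "student")))
```

Keying the lookup on context as well as on the term is what separates the two meanings of "disaster": identical strings do not imply identical concepts.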

  34. Heterogeneity of classifications, attribute names and values

  35. Outline • Introduction • Applications • Challenges • Solutions • Ontologies • Knowledge representation • Case Studies

  36. Standardization • Consistency must be maintained within an organization, or at least within a project • Vocabulary must be defined in unambiguous terms (terms such as "dense forest" or "large hill" need precise definitions) • Schemas must be standardized • Entity representations must be specified explicitly • Use a common data model for storage, like XML • Provide proper and complete metadata for each dataset

  37. Build an intelligent system • Store all assumptions , relations and other implicit information explicitly • Store all relevant information in a machine-processable format • Build scalable solutions • Verify consistency

  38. What is an Ontology? • An Ontology is an explicit description of a domain OR ‘a specification of a conceptualization’ • Concepts • Properties and attributes of concepts • Constraints on properties and attributes • Individuals (often, but not always) • An ontology can link to other ontologies to form larger ontologies

  39. Evolution • More sophisticated semantic technologies exploit ontologies and • Provide scalability and flexibility • Handle all types of data (unstructured, semi-structured, structured) • Create SmartData – enhancing raw data with context and relationships • Accommodate SmartQuerying – flexible, intelligent querying • Enable powerful enterprise decision making

  40. The Role of Ontologies • Ontologies enable • unambiguous identification of entities in heterogeneous information systems • assertion of applicable named relationships that connect these entities together. Ontologies define - a common vocabulary - shared understanding

  41. Role of Ontologies Specifically, ontologies play the following roles: • Content Explication • Enables accurate interpretation of data from multiple sources through explicit definition of terms and relationships. • Query Model • In some systems, the query is formulated using the ontology as a global query schema. • Verification • Verifies mappings used to integrate data from multiple sources. These mappings may either be user specified or generated by a system.
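
The verification role can be sketched as a check that every source-to-ontology mapping points at a known class and that no source field is mapped inconsistently. The class names echo themes mentioned earlier in the deck, but the field names and the `verify_mappings` function are assumptions made for this Python sketch.

```python
# Classes assumed to exist in the integration ontology (illustrative).
ONTOLOGY_CLASSES = {"Lithology", "Geomorphology", "Structure", "Hydrology"}

def verify_mappings(mappings, ontology_classes):
    """Return a list of problems found in (source_field, ontology_class) mappings."""
    problems = []
    seen = {}  # source_field -> first valid class it was mapped to
    for source_field, target in mappings:
        if target not in ontology_classes:
            problems.append(f"{source_field}: unknown class {target!r}")
            continue
        previous = seen.setdefault(source_field, target)
        if previous != target:
            problems.append(
                f"{source_field}: mapped to both {previous!r} and {target!r}"
            )
    return problems

# Hypothetical mappings, whether user specified or system generated.
mappings = [
    ("rock_type", "Lithology"),
    ("landform", "Geomorphology"),
    ("rock_type", "Hydrology"),   # conflicts with the first mapping
    ("soil", "Pedology"),         # class not present in the ontology
]

for problem in verify_mappings(mappings, ONTOLOGY_CLASSES):
    print(problem)
```

A fuller check would also use ontology axioms, such as disjointness between classes, which this sketch leaves out.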

  42. Ontology-driven Information System Lifecycle • Building a scalable and high performance system with support for: • Ontology creation and maintenance • Ontology-driven Semantic Metadata Extraction • Utilizing semantic metadata and ontology • Semantic search/querying/browsing • Information and application integration • Analysis/Mining/Discovery – relationships [Diagram labels: Schema Creation; Ontology API; Analytic Application Creation; Ontology Population; MB; KB; Metadata; Semantic Server]

  43. Architecture There are three main architectures for ontology based data integration : • Single Ontology approach - • A single ontology is used as a global reference model in the system. • Simplest but difficult to scale. • Multiple Ontologies - • Each individual data source is modeled by an ontology. • Multiple ontologies are used in combination for integration. • Flexible but requires creation of mappings between the multiple ontologies. • Hybrid approaches – • Involves use of multiple ontologies that subscribe to a common, top-level vocabulary which defines the basic terms of the domain. • Makes it easier to use multiple ontologies for integration in presence of the common vocabulary
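
The hybrid approach can be sketched in Python as follows: each source keeps its own local ontology, and every local term subscribes to a term in a common top-level vocabulary. The source names, local class codes and shared terms are all hypothetical; this is a minimal illustration rather than a description of any particular system.

```python
# Common top-level vocabulary shared by all sources (hypothetical terms).
SHARED_VOCABULARY = {"Forest", "WaterBody", "Settlement"}

# Each source ontology maps its local class names onto the shared vocabulary.
SOURCE_ONTOLOGIES = {
    "source_a": {"dense_forest": "Forest", "open_forest": "Forest", "tank": "WaterBody"},
    "source_b": {"FOR": "Forest", "WAT": "WaterBody", "URB": "Settlement"},
}

def subscribes_to_shared(ontology):
    """Check that every local term maps to a term in the shared vocabulary."""
    return all(shared in SHARED_VOCABULARY for shared in ontology.values())

def translate(term, from_source, to_source):
    """Translate a local term between sources by way of the shared vocabulary."""
    shared = SOURCE_ONTOLOGIES[from_source].get(term)
    if shared is None:
        return []
    target = SOURCE_ONTOLOGIES[to_source]
    return sorted(local for local, s in target.items() if s == shared)

print(translate("FOR", "source_b", "source_a"))
print(translate("tank", "source_a", "source_b"))
```

With n sources, each one needs only a mapping to the shared vocabulary instead of a pairwise mapping to every other source, which is the practical advantage over the pure multiple-ontology approach.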

  44. Types of Ontologies • Upper ontologies: modeling of time, space, process, etc • General purpose ontology/nomenclatures: WordNet • Domain-specific or Industry specific ontologies • News: politics, sports, business, entertainment • Financial Market (C) • Terrorism (L/G) • Biology • Application Specific and Task specific ontologies • Risk/Anti-money laundering (C), Equity Research (C), Repertoire Management (C) • P= Public, G=Government, L=Limited Availability, C=Commercial

  45. Outline • Introduction • Applications • Challenges • Solutions • Ontologies • Knowledge representation • Case Studies

  46. Understand completely • Understand the domain from domain experts • Many assumptions are implicit • Define terms and vocabulary unambiguously • Understand the data • Schemas • Assumptions • Representations • Understand the task • Scope must be clear • Process must be defined clearly • All links / mappings must be established • Test with use cases and scenarios

  47. Web Ontology Language (OWL) There are three sub-languages of OWL that are distinguished by their expressiveness. • OWL Lite • simple class hierarchy • simple constraints • OWL DL • more expressive • amenable to automated reasoning / inference • OWL Full • most expressive • complex

  48. Components of OWL • Individuals – instances of classes • Properties – binary relations on individuals - functional, transitive, symmetric - partitions • Classes - interpreted as sets that contain individuals and described using formal descriptions that state precisely the requirements for membership of the class. - hierarchy, disjoint - restrictions
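
The three components listed on this slide can be mimicked in plain Python to show what a reasoner infers from them. This is not OWL syntax or any OWL library's API; the class names, the individual and the `LOCATED_IN` property are hypothetical, and the sketch covers only subclass inheritance and one transitive property.

```python
# Classes: a hierarchy stored as child -> parent.
SUBCLASS_OF = {"VolcanicRock": "IgneousRock", "IgneousRock": "Rock"}

# Individuals: instance -> asserted class.
INDIVIDUALS = {"basalt_sample_1": "VolcanicRock"}

# Property: a binary relation on individuals, declared transitive here.
LOCATED_IN = {("village_x", "block_y"), ("block_y", "district_z")}

def superclasses(cls):
    """Walk up the hierarchy and collect every ancestor class."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

def is_instance_of(individual, cls):
    """An individual belongs to its asserted class and to all of its ancestors."""
    asserted = INDIVIDUALS[individual]
    return cls == asserted or cls in superclasses(asserted)

def transitive_closure(pairs):
    """Infer every pair implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

print(is_instance_of("basalt_sample_1", "Rock"))
print(("village_x", "district_z") in transitive_closure(LOCATED_IN))
```

Neither printed fact is asserted directly; both follow from the hierarchy and the transitivity declaration, which is the kind of inference the deck attributes to OWL DL.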

  49. Outline • Introduction • Applications • Challenges • Solutions • Ontologies • Knowledge representation • Case Studies

  50. Spatial Data Integration Required for effective use of a repository of multiple datasets created under different projects from heterogeneous sources Scope and purpose of project: Building an ontology driven information system for a data repository with the following functions: • Discovery • Query • Translation
