This research focuses on semi-automatic ontology alignment for integrating geospatial data. Topics covered include motivation, mapping types, user interface, propagation rules, ontology merging, challenges, and conclusions. The application discussed is WLIS (Wisconsin Land Information System) and a case study on land use codes. The paper delves into issues of data heterogeneity and alignment, with a focus on semantic, schematic, and syntactic heterogeneities. The importance of ontology-based data integration, XML documents, and agreement documentation is highlighted. Concepts of ontology alignment, different mapping types, and the significance of full vs. partial mappings are discussed, along with propagation rules and user interfaces for alignment. The text also covers ontology merging implications and its role in guiding the merging process for global and local ontologies.
Semi-Automatic Ontology Alignment for Geospatial Data Integration* Isabel F. Cruz William G. Sunna Anjli Chaudhry Department of Computer Science University of Illinois at Chicago *This research was supported in part by the National Science Foundation under Awards EIA-0091489 and ITR IIS-0326284.
Overview • Motivation: application, data heterogeneity, data integration • Alignment • Mapping types • User interface • Semi-automatic alignment • Propagation rules • User interface • Ontology merging • Challenges and discussion • Conclusions and future work
Application • WLIS (Wisconsin Land Information System): web-based system linking data from distributed, heterogeneous data sources • Case study: land use codes • Sample query: “Find all the agricultural lands in Dane and Racine counties.” • Different authorities use different land use coding systems, leading to syntactic, schematic, and semantic heterogeneities
Heterogeneity “Find all the agricultural lands in Dane and Racine counties.” Parcel-based example Each highlighted parcel has its own land use classification code
Land Use Code Heterogeneity in WLIS [Figure: four data sources, each labeled “Land Use Code”] There are 72 counties and hundreds of cities and towns in the state; each may have its own system of classifying land use codes
Racine County Dane County Commercial Commercial Retail Sales and Services Retail Sales Retail Services Land Under Development Intensive Nonintensive Classification: Semantic Issue
Land Use Codes Synonyms
Land Use Codes Synonyms Value heterogeneity
Ontology-Based Data Integration Application Query Ontology does not change (Local as View approach) Mediator Ontology Wrapper Wrapper Wrapper Local Ontology Local Ontology Local Ontology Source Source Source [Fonseca & Egenhofer 99]
Ontology-Based Data Integration Application Query Concepts in the Ontology and Local Ontology need to be aligned Mediator Ontology Wrapper Wrapper Wrapper Local Ontology Local Ontology Local Ontology Source Source Source [Fonseca & Egenhofer 99]
Agreement Document • XML document that acts as a wrapper layer for the underlying local data source • Stores information about how entities in the global ontology map to the entities in the local data source • Uses XML to capture the hierarchical ordering of entities and their mappings • Supports query operations using XPath/XSLT to hide details of how data is structured in the local data source • Minimizes the need for programmer intervention and maintenance, as it is declaratively specified
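To make the idea of an agreement document concrete, here is a minimal sketch in Python. The element names, attribute names, and land use codes are all hypothetical (the slides do not show the WLIS schema); the sketch only illustrates how nested XML can hold the concept hierarchy and how an XPath-style lookup can hide the local structure from the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreement document for one local source. The tag names,
# attributes, and codes are illustrative assumptions, not the WLIS schema.
AGREEMENT_XML = """
<agreement source="dane">
  <concept name="Agriculture">
    <mapping type="exact" local="Agricultural" code="8"/>
    <concept name="Cropland">
      <mapping type="exact" local="Cropland and Pasture" code="81"/>
    </concept>
  </concept>
  <concept name="Commercial">
    <mapping type="superset" local="Commercial Sales" code="5"/>
  </concept>
</agreement>
"""

root = ET.fromstring(AGREEMENT_XML.strip())


def local_codes(agreement_root, global_concept):
    """Return the local codes mapped to a global concept and its descendants."""
    # ElementTree supports a limited XPath subset, including attribute predicates.
    node = agreement_root.find(f".//concept[@name='{global_concept}']")
    if node is None:
        return []
    # iter() walks the concept's subtree in document order, so mappings of
    # child concepts (more specific land uses) are included as well.
    return [m.get("code") for m in node.iter("mapping")]


print(local_codes(root, "Agriculture"))
```

A query for a global concept such as "Agriculture" can then be rewritten into a filter on the returned local codes without the application knowing how the local source is organized.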
Ontology Alignment • Alignment is the process of mapping concepts from one ontology to concepts of another ontology • Concepts are mapped based on how “similar” they are to each other • Similarity takes different forms: • Similarity in definition • For example, automobile and car have very similar definitions in any given dictionary • Similarity in text • For example, agriculture and agricultural share a long common prefix (“agricultur”)
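Similarity in text can be approximated with standard string measures. The sketch below is one possible choice, not the measure used by the authors: it computes the common prefix of two concept names and a character-level similarity ratio from Python's standard library. The function names are illustrative.

```python
import os
from difflib import SequenceMatcher


def common_prefix(a, b):
    """Longest shared prefix of two concept names, ignoring case."""
    return os.path.commonprefix([a.lower(), b.lower()])


def text_similarity(a, b):
    """Character-level similarity in [0, 1] based on matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(common_prefix("agriculture", "agricultural"))
print(round(text_similarity("agriculture", "agricultural"), 2))
print(round(text_similarity("automobile", "car"), 2))
```

Note that "automobile" and "car" score low here even though they are close in definition, which is why textual similarity alone is not enough and a dictionary-based measure is also needed.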
Mapping types • Exact: the connected vertices are equivalent in meaning • Subset: the vertex in the global ontology is a subset of the vertex in the local ontology, i.e., less general in meaning • Superset: the vertex in the global ontology is a superset of the vertex in the local ontology, i.e., more general in meaning • Approximate: the connected vertices are close in meaning (e.g., they intersect in some properties) but are not equivalent in definition • Null: the vertex in the global ontology does not have an equivalent vertex in definition in the local ontology
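One way to read the five mapping types is as set relations. The sketch below assumes, purely for illustration, that each vertex can be described by a set of properties or covered land uses; under that assumption the mapping type follows from comparing the two sets. The `MappingType` and `classify` names and the example sets are hypothetical.

```python
from enum import Enum


class MappingType(Enum):
    EXACT = "exact"
    SUBSET = "subset"
    SUPERSET = "superset"
    APPROXIMATE = "approximate"
    NULL = "null"


def classify(global_props, local_props):
    """Mapping type of a global vertex relative to a local vertex."""
    g, l = set(global_props), set(local_props)
    if g == l:
        return MappingType.EXACT
    if g < l:                      # global is strictly contained in local
        return MappingType.SUBSET
    if g > l:                      # global strictly contains local
        return MappingType.SUPERSET
    if g & l:                      # partial overlap only
        return MappingType.APPROXIMATE
    return MappingType.NULL        # nothing in common


# Illustrative property sets, not taken from the WLIS data.
print(classify({"retail sales", "retail services"}, {"retail sales"}))
```

In practice the user supplies many of these judgments directly; the set view is only a compact way to see how the types relate to one another.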
Mapping Types Exact Industry Industry Exact Mining Manufacturing Production Mining Exact Rubber Construction Material Electrical Supplies Rubber and Glass Superset Subset
Agreement Maker • Visual interface for creating agreements • Existing mappings displayed to the user • Displayed list of mappings updated as user identifies more mappings
Semi-automatic Alignment • Framework that defines the values associated with the vertices of the ontology as functions of the: • values of the children vertices, or • user input • User (or system) establishes some mapping types • System propagates the mapping types along the ontologies (bottom-up) as much as possible
Full vs. Partial Mappings Superset a d b e c f Exact Superset Full Mapping
Full vs. Partial Mappings Subset a d b e c f g Exact Exact Partial Mapping
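The bottom-up propagation in the two examples above can be sketched as a function from the children's mappings to the parent's mapping. The slides do not spell out the propagation rules, so the rule below is an assumption chosen only so that it reproduces the full-mapping and partial-mapping examples; the function name and data layout are hypothetical.

```python
def propagate(global_children, local_children, child_mappings):
    """Derive the parent mapping type from the children's mappings.

    child_mappings maps a global child to (local child, mapping type),
    with types "exact", "subset", "superset", or "approximate".
    Assumed rule, not the authors' published one.
    """
    if not child_mappings:
        return "null"
    types = {t for _, t in child_mappings.values()}
    mapped_local = {loc for loc, _ in child_mappings.values()}
    unmapped_global = set(global_children) - set(child_mappings)
    unmapped_local = set(local_children) - mapped_local

    # Evidence that the global parent is broader or narrower than the local one.
    broader = "superset" in types or bool(unmapped_global)
    narrower = "subset" in types or bool(unmapped_local)

    if "approximate" in types or (broader and narrower):
        return "approximate"
    if broader:
        return "superset"
    if narrower:
        return "subset"
    return "exact"


# Full mapping: b->e exact, c->f superset, so a->d is superset.
print(propagate(["b", "c"], ["e", "f"],
                {"b": ("e", "exact"), "c": ("f", "superset")}))
# Partial mapping: local child g is not covered, so a->d is subset.
print(propagate(["b", "c"], ["e", "f", "g"],
                {"b": ("e", "exact"), "c": ("f", "exact")}))
```

Applying such a function at each level, starting from the leaves, lets the system fill in mappings for inner vertices once the user has fixed some of the lower ones.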
Ontology Merging • Alignment can be used to: • Determine the need for merging ontologies • Null alignment values in the mappings of the global ontology to a local ontology lead to loss of resolution in the query process • Guide the merging process • Combining the local ontologies with the global ontology
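As a rough illustration of how Null mappings can signal the need for merging, the sketch below lists the global concepts that map to Null and flags an alignment once their number exceeds a threshold. The threshold and function names are assumptions made for this example; the concept names follow the Commerce example in this deck.

```python
def null_concepts(alignment):
    """Global concepts with no counterpart in the local ontology."""
    return sorted(c for c, mapping in alignment.items() if mapping == "null")


def needs_merging(alignment, max_nulls=0):
    """Flag an alignment whose Null mappings exceed a (hypothetical) limit."""
    return len(null_concepts(alignment)) > max_nulls


# Global ontology aligned with one local ontology: the global children
# Sales and Services have no local counterpart.
alignment = {
    "Commerce": "approximate",
    "Sales": "null",
    "Services": "null",
}

print(null_concepts(alignment))
print(needs_merging(alignment))
```

A query on "Sales" cannot be answered from this local source, which is the loss of resolution that merging the local ontology into the global one is meant to address.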
Ontology Merging Approximate Commerce Commercial Sector User-defined Sales Services Non- Intensive Intensive NULL NULL Commerce Commercial sector Non- Intensive Intensive Function Scale Exact Exact Sales Services Non- Intensive Intensive NULL NULL
Ontology Merging Approximate Commerce Commercial Sector User-defined Sales Services Non- Intensive Intensive NULL NULL Commerce Commercial sector NULL Non- Intensive Intensive Function Scale User-defined Exact Exact Sales Services Non- Intensive Intensive NULL NULL
Ontology Merging Approximate Commerce Commercial Sector User-defined Sales Services Non- Intensive Intensive NULL NULL Superset Commerce Commercial sector System-defined NULL Non- Intensive Intensive Function Scale User-defined Exact Exact Sales Services Non- Intensive Intensive NULL NULL
Challenges in Semi-Automatic Alignment Superset User-defined Local Global Subset Residential Buildings Apartment Buildings System-defined One-family Residence One-family Residence Two-family Residence Multi-family Residence Exact Subset
Challenges in Semi-Automatic Alignment Superset • Specialization of the node Residential Buildings in the global ontology is not total, thus causing inconsistency • Solution: • Introduction of other children • Introduction of a virtual node User-defined Local Global Subset Residential Buildings Apartment Buildings System-defined One-family Residence One-family Residence Two-family Residence Multi-family Residence Exact Subset
Monotonic Mappings A F Superset Superset Subset Exact B G Subset Superset Superset Subset Superset Exact C H Superset Superset Superset Superset Superset D I E J
Discussion Local Global ??? Residential Buildings Superset Subset Apartment Buildings Apartment Buildings Virtual node One-family Residence One-family Residence Two-family Residence Multi-family Residence Exact Subset
Conclusions • Real-world GIS problems • Semantic heterogeneities among non-spatial data • Ontologies are hierarchies of concepts, with “atomic nodes” • Data integration • Ontology-based • Alignment: mappings between ontologies are declaratively specified • Semi-automatic alignment • Automatic propagation of mappings along the ontologies, following well-defined rules • Building block for a more comprehensive approach that will integrate other approaches, e.g., similarity • Identified some challenges to be addressed
Future Work • Explore layers of similarities between concepts and define new mapping types • Assign similarity scores that measure how closely concepts match each other • Use multiple dictionaries to further automate the mapping process by establishing mappings between concepts that are similar in meaning • Upgrade the Agreement Maker to incorporate new ways of automating the alignment process • Test with “large” ontologies: • Wetland ontologies • Rock classification systems • Integration with querying • Data integration across several themes