Approaches to Geospatial Database Integration. Steffen Volz, GIS Research Group, Institute for Photogrammetry, University of Stuttgart. Dagstuhl Seminar, March 8th 2006.
Geospatial Data Infrastructures • Applications • Geospatial Data Infrastructure • Federated • Open • http://www.nexus.uni-stuttgart.de • Many different geospatial databases
Levels of Integration • Intensional level: Ontology A, Ontology B • Schema level: local schema A1, local schema B1 • Data/Extensional level: data A11, data B12
Integration Process • The integration processes on the schema level and on the data level share the same structure: • Identify corresponding elements • Merge them into one consolidated result
Schema Level Integration • Schema Matching and Integration: • Schema Matching: between local schema A and local schema B • Schema Integration: into the global schema of the integration platform
Data-Driven Schema-Matching • Basic Idea: Derive correspondences between different conceptual schemas by analyzing the relations between corresponding data representations [Figure: matches between the data of local schema A (GDF) and local schema B (ATKIS) lead to schema correspondences between classes such as Street and Road]
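The basic idea can be illustrated with a minimal sketch: count how the instance-level matches distribute over pairs of object classes. The function name, the identifiers and the toy data are hypothetical, and whether the percentages reported in this talk are computed in exactly this way is an assumption.

```python
from collections import Counter


def class_match_shares(matches, class_of_a, class_of_b):
    """Share of the matched instances of each A-class that pair with each B-class.

    matches    : iterable of (a_id, b_id) instance matches
    class_of_a : dict mapping an instance id of data set A to its object class
    class_of_b : dict mapping an instance id of data set B to its object class
    """
    pair_counts = Counter()
    class_totals = Counter()
    for a_id, b_id in matches:
        class_a, class_b = class_of_a[a_id], class_of_b[b_id]
        pair_counts[(class_a, class_b)] += 1
        class_totals[class_a] += 1
    return {pair: n / class_totals[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical toy data, not taken from the ATKIS or GDF data sets.
class_of_a = {"a1": "Street", "a2": "Street", "a3": "Way", "a4": "Street"}
class_of_b = {"g1": "Road Element", "g2": "Road Element",
              "g3": "Road Element", "g4": "Road"}
matches = [("a1", "g1"), ("a2", "g2"), ("a3", "g3"), ("a4", "g4")]

shares = class_match_shares(matches, class_of_a, class_of_b)
for (class_a, class_b), share in sorted(shares.items()):
    print(f"{class_a} -> {class_b}: {share:.2%}")
```

Running the same function with the two data sets swapped gives the shares in the opposite direction, which is needed before any set relation between two classes can be judged.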
Results of Schema Matching [Figure: match percentages between the ATKIS and GDF object classes Street, Way, Lane, Road Element, Road and Intersection; values shown: 94.86%, 81.98%, 100%, 13.85%, 14.65%, 60.75%, 1.99%, 15.88%, 73.28%, 2.14%, 3.15%, 86.15%, 12.07%, 39.25%]
Set Relations between Object Classes • Semantic relationships between object classes: • Equivalence (Class_ATKIS ≡ Class_GDF): not found • Inclusion (Class_ATKIS ⊂ / ⊃ Class_GDF): Way and Road Element • Overlap (Class_ATKIS ∩ Class_GDF ≠ ∅): e.g. Lane and Road • Difference (Class_ATKIS ≠ Class_GDF): e.g. Way and Road
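As a minimal sketch, the four relations can be expressed with plain set operations on class extents. It assumes that matched instances from both data sets have already been mapped to shared object identifiers (hypothetical integers here) and that "difference" means the two extents have no common object; noisy real data would likely need tolerances, which this sketch leaves out.

```python
def set_relation(extent_a, extent_b):
    """Classify the relation between two class extents given as sets of object ids."""
    a, b = set(extent_a), set(extent_b)
    if a == b:
        return "equivalence"
    if a < b or b < a:  # one extent is a proper subset of the other
        return "inclusion"
    if a & b:           # some, but not all, objects are shared
        return "overlap"
    return "difference"  # no shared objects


# Hypothetical extents, not derived from the ATKIS or GDF data.
print(set_relation({1, 2, 3}, {1, 2, 3}))  # equivalence
print(set_relation({1, 2}, {1, 2, 3}))     # inclusion
print(set_relation({1, 2}, {2, 3}))        # overlap
print(set_relation({1}, {2}))              # difference
```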
Schema Integration Approach • Schema integration according to the Upward Inheritance principle (see CONRAD 02) [Figure: the integrated schema derived for each type of schema correspondence between Class_ATKIS and Class_GDF]
Instance Level Integration • Data Matching and Integration: [Figure: for query area Q, data set A and data set B pass through data matching and are merged into a conflated result]
Instance Matching • Matching: Identification of corresponding objects based on • Object properties: • Geometry • Thematics • Object relations: • Topological relations • Semantic relations • The different similarity measures are combined in a weighted sum and normalized to the interval from 0 (no similarity) to 100 (maximum similarity)
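The weighted combination can be sketched as follows. The measure names, the weights and the assumption that each individual measure already lies in [0, 1] are illustrative choices, not values taken from the talk.

```python
def combined_similarity(scores, weights):
    """Combine individual similarity measures into one value between 0 and 100.

    scores  : dict mapping a measure name to a similarity in [0, 1]
    weights : dict mapping the same measure names to non-negative weights
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one measure needs a positive weight")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    # Dividing by the total weight normalizes the sum; the factor 100 rescales it.
    return 100.0 * weighted_sum / total_weight


# Hypothetical measures and weights for one candidate pair of objects.
scores = {"geometry": 0.9, "thematics": 0.5, "topology": 1.0}
weights = {"geometry": 2.0, "thematics": 1.0, "topology": 1.0}
similarity = combined_similarity(scores, weights)
print(round(similarity, 1))  # 82.5
```

Normalizing by the total weight keeps the result in the 0 to 100 range regardless of how many measures are available for a given pair.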
Pre-Processing Step • Topological Splitting • Usually, many n:m matches between street data • Computationally expensive • Idea: mutual splitting of objects to achieve as many 1:1 matches as possible [Figure: before splitting; edges g1, g2 with nodes gn1, gn2, gn3 and edges a1, a2 with nodes an1, an2, an3]
Pre-Processing Step • Topological Splitting [Figure: after splitting; labels gn1, g1, gn2', gn2, g2', gn3, g4, a1, a3, a2, an1, an3, an2, an1']
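A simplified sketch of the splitting idea, assuming straight-line street segments in planar coordinates: a segment of one data set is cut wherever a node of the other data set projects onto it within a distance tolerance. The function names, the tolerance and the `min_t` guard against cuts at the segment ends are hypothetical, and the procedure used in the talk may differ.

```python
import math


def project(p, a, b):
    """Project point p onto segment a-b; return (t, distance) with t clamped to [0, 1]."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return 0.0, math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return t, math.hypot(px - qx, py - qy)


def split_segment(a, b, other_nodes, tolerance, min_t=0.05):
    """Split segment a-b at the projections of nearby nodes of the other data set."""
    cuts = sorted(
        t
        for t, dist in (project(p, a, b) for p in other_nodes)
        if dist <= tolerance and min_t < t < 1.0 - min_t
    )
    points = [a]
    points += [(a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])) for t in cuts]
    points.append(b)
    return list(zip(points[:-1], points[1:]))


# Hypothetical example: only the first node lies close to the interior of the segment.
pieces = split_segment((0, 0), (10, 0), [(4, 0.5), (10, 0.2), (5, 8)], tolerance=1.0)
print(pieces)
```

For mutual splitting, the same function would be applied in both directions: the segments of data set A are split at the nodes of data set B and vice versa.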
Iterative Instance Matching Approach • Multiple iterations of the algorithm [Figure: Stage 1: seed nodes; Stage 2: after 2 iterations; Stage 3: result]
Algorithms on Matched Data • Shortest path analysis in MRep street databases [Figure: a route from Start to End across the GDF network and the ATKIS network, which are linked by matches]
Shortest Path Search in MRep Data Sets • Graphs in multiple representations [Figure: Graph A and Graph B with nodes A to K and weighted edges, linked at MRep nodes by transition edges; a path is searched from Start to End] • Transition edges: C_TE = (Sim_max - Sim_TE) *
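A minimal sketch of such a search: both graphs are stored in one adjacency structure, matched nodes are connected by transition edges, and a standard Dijkstra search runs over the combined graph. Reading C_TE as the cost of a transition edge is an assumption, and because the factor after the multiplication sign is not recoverable from the slide, the `scale` parameter below is a hypothetical placeholder. Node names and edge costs are invented for illustration.

```python
import heapq


def transition_cost(similarity, sim_max=100.0, scale=0.1):
    """Cost of changing representation: low for highly similar matched nodes.

    `scale` is a hypothetical stand-in for the factor missing from the slide.
    """
    return (sim_max - similarity) * scale


def add_edge(graph, u, v, cost):
    """Insert an undirected edge into the adjacency dict."""
    graph.setdefault(u, []).append((v, cost))
    graph.setdefault(v, []).append((u, cost))


def shortest_path(graph, start, end):
    """Dijkstra search; returns (total cost, list of nodes) or (inf, [])."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == end:
            break
        if d > dist.get(node, float("inf")):
            continue  # outdated queue entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    if end not in dist:
        return float("inf"), []
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[end], path[::-1]


graph = {}
add_edge(graph, "A1", "A2", 3)    # edges of the first representation
add_edge(graph, "A2", "A3", 12)
add_edge(graph, "B1", "B2", 4)    # edge of the second representation
add_edge(graph, "A2", "B1", transition_cost(90))  # transition edges between matched nodes
add_edge(graph, "A3", "B2", transition_cost(80))

cost, path = shortest_path(graph, "A1", "A3")
print(cost, path)
```

In this toy network the route that switches to the second representation and back is cheaper than staying in the first one, because the transition edges between well-matched nodes cost little.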
Instance Level Integration • Data Matching and Integration: [Figure: street data of ATKIS origin and of GDF origin]
Instance Level Integration • Data Matching
Instance Level Integration • Data Merging (Conflation)
Open/Future Issues • Schema Level • How can we increase the quality of schema level integration (by using more than extensional information)? • How do we measure/express the information loss during Schema Mapping? How can we express the quality of Schema Mapping? • Data Level • How do we measure the inconsistency of features without using thresholds? • How do we express the quality of the conflation process? • How can we use semantic aspects in the matching process? • How do we deal with inconsistencies between different layers?