Finding Spatial Equivalences Across Multiple RDF Datasets Juan Salas, Andreas Harth
Outline Motivation NeoGeo Vocabularies Geospatial Datasets Integration Challenges Finding Geometric Equivalences Conclusion
Motivation • Geodata is becoming increasingly relevant. • Location-based services • Mobile applications • Ever-increasing amount of sensor data (phones, satellites) • Different sources. • Many formats: • GML, KML, Shapefile, GPX, WKT, RDF?… Applications require integrated access to geodata.
NeoGeo Vocabularies • Geometry Vocabulary– http://geovocab.org/geometry • Representation of georeferenced geometric shapes. • Spatial Ontology– http://geovocab.org/spatial • Representation and reasoning on topological relations based on the Region Connection Calculus (RCC).
Geospatial Datasets • GADM-RDF– http://gadm.geovocab.org • RDF representation of the administrative regions of the GADM project: http://gadm.org • NUTS-RDF– http://nuts.geovocab.org • RDF representation of Eurostat's NUTS nomenclature. They serve as: • New geospatial information on the Semantic Web. • Bridges between already published spatial datasets. • Proof-of-concept platforms.
Integration Challenges • Vocabularies – http://geovocab.org/doc/survey.html • Survey of several well-known Linked Data datasets (Ordnance Survey, GeoLinkedData.es, LinkedGeoData.org, GeoNames, DBpedia). • Identified properties and classes mapped to the NeoGeo vocabularies published at GeoVocab.org • Instances • Finding equivalences between regions across multiple datasets at the geometry level.
Finding Geometric Equivalences Geometric shapes will not be vertex by vertex equivalent. A sensible criterion for finding geometric equivalences is needed. • NUTS-RDF and GADM-RDF have different: • Sampling values • Scales • Starting points • Rounding effects
Algorithm Overview [Diagram: sample data in WGS-84 with Plate Carrée projection; Hausdorff distance; spatial:EQ links]
1. Retrieve sample data • The algorithm requires: • WGS-84 coordinate reference system. • Plate Carrée projection: X = longitude, Y = latitude. • Coordinates are treated as Cartesian. • This distorts all parameters (area, shape, distance, direction). • Geometric shapes are equally distorted in both datasets. • Local reprojections (e.g. UTM) are avoided. • Units are given in decimal degrees.
2. Similarity threshold function The Hausdorff Distance provides a measure of similarity between geometric shapes. Intuitively, it is the largest of the distances from each point of one shape to the closest point of the other shape.
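The intuitive definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it only compares vertices (a discrete approximation that ignores points along the edges), and the two vertex lists are hypothetical shapes given as (longitude, latitude) pairs treated as Cartesian X/Y.

```python
from math import hypot

def directed_hausdorff(a, b):
    """Largest distance from any vertex in `a` to its nearest vertex in `b`."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)

def hausdorff(a, b):
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical shapes: a unit square and the same square shifted 0.1 degrees east.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(0.1, 0.0), (1.1, 0.0), (1.1, 1.0), (0.1, 1.0)]
distance = hausdorff(square, shifted)
```

Taking the maximum of both directions matters: a small shape lying inside a large one has a small directed distance one way but a large one the other way.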
2. Similarity threshold function Smaller regions need a lower Hausdorff Distance threshold than larger regions.
2. Similarity threshold function We calculate the midpoint value between the Hausdorff Distances for a correct guess and the lowest wrong guess.
2. Similarity threshold function We perform regression on the midpoint values to obtain the Hausdorff Distance threshold function.
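The midpoint and regression steps might look like the sketch below. The slides do not state the regression model or the independent variable, so two assumptions are made here: the threshold is modelled as a function of region area, and the fit is an ordinary least-squares line. The sample values are invented purely for illustration.

```python
def midpoint(correct_dist, lowest_wrong_dist):
    """Midpoint between the correct match and the lowest wrong match."""
    return (correct_dist + lowest_wrong_dist) / 2.0

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical samples (assumed, not from the slides):
# (region area, Hausdorff distance of correct guess, distance of lowest wrong guess)
samples = [(1.0, 0.1, 0.5), (2.0, 0.2, 0.8), (4.0, 0.4, 1.4)]
areas = [s[0] for s in samples]
midpoints = [midpoint(s[1], s[2]) for s in samples]
slope, intercept = fit_line(areas, midpoints)

def threshold(area):
    """Assumed linear threshold function: candidates below it count as matches."""
    return slope * area + intercept
```

A candidate pair would then be accepted when its Hausdorff distance falls below `threshold(area)` for the region in question.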
Poor Geospatial Information Sometimes a location is approximated as a single point. This can lead to false assertions when calculating containment relations. <http://dbpedia.org/resource/Germany> geo:lat 52.516666; geo:long 13.383333 . <http://nuts.geovocab.org/id/DE30_geometry> rdf:type ngeo:Polygon . The point lies inside the Berlin polygon, but Germany is not contained in Berlin. Other properties (e.g. rdf:type) must be considered when calculating containment relations. Other spatial relations (e.g. spatial:EQ) cannot be calculated.
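To see how the false assertion arises, the sketch below runs a standard ray-casting point-in-polygon test on the coordinates from the DBpedia triple. The rectangle is a hypothetical stand-in for the DE30 geometry, not the real NUTS polygon.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray going east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Point taken from the triple above, as (geo:long, geo:lat).
germany_point = (13.383333, 52.516666)
# Hypothetical rectangle roughly around Berlin (assumed, for illustration only).
berlin_box = [(13.0, 52.3), (13.8, 52.3), (13.8, 52.7), (13.0, 52.7)]
naive_containment = point_in_polygon(germany_point, berlin_box)
```

A purely geometric test reports the point as contained, which is why additional properties such as rdf:type are needed before asserting containment between the resources themselves.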
Optimizations The cost of calculating the Hausdorff distance depends on the number of vertices. The Ramer-Douglas-Peucker algorithm simplifies geometric shapes by removing vertices that deviate from the simplified outline by less than a chosen maximum separation.
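A minimal recursive sketch of the Ramer-Douglas-Peucker idea follows. It works on an open sequence of vertices, and the sample line and the tolerance of 0.5 are hypothetical values chosen for illustration.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)  # degenerate segment
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon):
    """Drop vertices closer than epsilon to the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]
    # Keep the farthest vertex and simplify both halves around it.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right

# Hypothetical jagged outline with one real corner at (4, 0).
line = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 0.1),
        (4.0, 0.0), (4.0, 1.0), (4.1, 2.0), (4.0, 3.0)]
simplified = rdp(line, 0.5)
```

Here the eight input vertices reduce to the two end points plus the corner, so a subsequent Hausdorff computation has far fewer vertex pairs to compare.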
Spatial Databases • The algorithm also works well with spatial databases (e.g. PostgreSQL / PostGIS):
SELECT g.gadm_id, n.nuts_id
FROM nuts n
-- Cheap pre-filter: only pairs whose bounding boxes overlap.
INNER JOIN gadm g ON (n.geometry && g.geometry)
-- Keep pairs whose areas differ by at most 10%.
WHERE n.shape_area BETWEEN (g.shape_area * 0.9) AND (g.shape_area * 1.1)
  -- Compare simplified shapes against the per-region threshold.
  AND ST_HausdorffDistance(
        ST_SimplifyPreserveTopology(n.geometry, 0.5),
        ST_SimplifyPreserveTopology(g.geometry, 0.5)
      ) < g.max_hausdorff_dist;
Evaluation [Figure: GADM 2_13988 Leicestershire compared with NUTS UKF2 Leicestershire, Rutland and Northamptonshire] • Not every NUTS region matches a GADM region. • Many NUTS regions represent parts or aggregations of GADM administrative boundaries. • 1,671 NUTS regions yielded 965 matches and 13 false positives.
Conclusion • NeoGeo vocabularies: • Survey and mappings to other vocabularies. • NUTS-RDF and GADM-RDF datasets: • GADM-RDF links to DBpedia, UK Ordnance Survey and NUTS-RDF. • Linked Data Services for accessing/querying spatial indices (withinRegion, boundingBox). • Work on spatial similarity metrics: • Promising results
Future Work • NeoGeo vocabularies. • Temporal context. • Datasets: • More Earth and space science data. • Add more instance mappings. • Spatial similarity: • Improve precision. • Develop tools to support the mapping process. • More experiments: • Querying of integrated data and reasoning.
Acknowledgements European Commission's Seventh Framework Programme FP7/2007-2013 (PlanetData, Grant 257641)