Source Description-based Approach for the Modeling of Spatial Information Integration Yoshiharu Ishikawa and Hiroyuki Kitagawa University of Tsukuba {ishikawa,kitagawa}@is.tsukuba.ac.jp
Outline • Background • Our Objective and Approach • Motivating Example • Data Model • Query Specification and Source Description • Query Processing • Conclusions and Future Work
Background: Spatial Information Sources (1) • Spatial information sources: emerging new information sources on the Internet • information sources that provide region- or location-oriented information • some of them support mobile users with GPSs and hand-held devices
Background: Spatial Information Sources (2) • Need for the technology to integrate spatial information sources • description of spatial information sources by taking their contents into consideration • efficient and effective query planning and processing Spatial Information Integration
Background: Spatial Information Sources (3) • Standardization Efforts of Spatial Technologies • OpenGIS [5]: standardization of GIS systems • POIX [6]: language for location-oriented information exchange • G-XML [7]: XML vocabulary for geographic information description • RWML [8]: road information description language • Spatial Information Services • Digital City [10], citysearch.com [11]: location-oriented information services • Ekimae Tanken Club [12]: provides local information near a specified rail station • MONET system [13]: provides information for car drivers
Background: Heterogeneous Information Integration (1) • Popular approach for information integration • well-known wrapper-mediator approach • Wrapper • encapsulates the detail of each information source • provides abstract uniform view of the source • Mediator • selects appropriate information sources for a given query • query planning and processing
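The division of labor between wrappers and the mediator can be illustrated with a minimal Python sketch. The names `ListWrapper`, `Mediator`, `covers`, and `fetch` are hypothetical and do not come from the slides; the sketch only assumes that each wrapper exposes one global relation and that the mediator picks the wrappers relevant to a query.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class ListWrapper:
    """Hypothetical wrapper: hides an in-memory source behind a uniform view."""

    def __init__(self, relation: str, rows: List[Row]) -> None:
        self.relation = relation
        self.rows = rows

    def covers(self, relation: str) -> bool:
        # True if this source can contribute tuples to the global relation.
        return relation == self.relation

    def fetch(self, relation: str) -> Iterable[Row]:
        return iter(self.rows)


class Mediator:
    """Hypothetical mediator: selects sources, then filters their results."""

    def __init__(self, wrappers: Iterable[ListWrapper]) -> None:
        self.wrappers = list(wrappers)

    def query(self, relation: str, predicate: Callable[[Row], bool]) -> List[Row]:
        results: List[Row] = []
        for wrapper in self.wrappers:
            if wrapper.covers(relation):  # source selection
                results.extend(r for r in wrapper.fetch(relation) if predicate(r))
        return results
```

In this sketch all filtering happens in the mediator; the rest of the talk is about moving as much of that work as possible down into the sources.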
Background: Heterogeneous Information Integration (2) [Figure: architecture of a heterogeneous information integration system. A mediator offers unified access to the integrated information; below it, four wrappers each front one of Information Sources A, B, C, and D.]
Our Objective • Development of a spatial information integration framework for location-aware information services • integration of heterogeneous spatial information sources • heterogeneity of the contents of the sources • heterogeneity of the capabilities of the sources • provide useful location-oriented information service to mobile users • selection of neighborhood geometric features
Our Approach • Development of a description method to represent spatial information sources • based on the source description framework: describes the contents and the service of the source • introduction of spatial data types and spatial operators: based on OpenGIS standard • Development of query planning and processing methods that effectively utilize source descriptions • selection of appropriate information sources for a given query • effective use of the query processing power of each information source
Motivating Example (1) [Figure: the heterogeneous information integration system used in the example: a mediator on top of four wrappers, one each for Information Sources A, B, C, and D.]
Motivating Example (2) • Global Schema • based on the relational model • represents a virtual database schema • each information source is (partially) mapped to the global schema
  relation Restaurant { name string; category string; address string; location point; };
  relation Evaluation { name string; score real; };
Motivating Example (3) • Query issued by the user: show the 20 nearest restaurants that are within 1000 meters of the current position p and whose score is at least 2.5 stars
[Figure: numbered points 1 to 7 plotted around the current position p, with a circle of radius 1000m centered at p]
SQL representation:
  SELECT r.name, r.address
  FROM Restaurant AS r, Evaluation AS e
  WHERE r.name = e.name AND e.score >= 2.5
    AND Distance(r.location, p) <= 1000
  ORDER BY Distance(r.location, p)
  STOP AFTER 20
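As a reference for what the query is meant to return, here is a minimal Python sketch that evaluates it over in-memory rows. It assumes planar coordinates in meters, Euclidean distance, and at most one evaluation per restaurant name; the function name `nearest_rated` and the dictionary layout are illustrative, not part of the slides.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearest_rated(
    restaurants: List[Dict],
    evaluations: List[Dict],
    p: Point,
    radius: float = 1000.0,
    min_score: float = 2.5,
    limit: int = 20,
) -> List[Tuple[str, str]]:
    """Return (name, address) of the nearest well-rated restaurants around p."""
    # Join on name: assumes one evaluation per restaurant name.
    score_by_name = {e["name"]: e["score"] for e in evaluations}
    hits = []
    for r in restaurants:
        d = math.dist(r["location"], p)
        score = score_by_name.get(r["name"])
        if score is not None and score >= min_score and d <= radius:
            hits.append((d, r["name"], r["address"]))
    hits.sort()  # ORDER BY distance (ties broken by name)
    return [(name, address) for _, name, address in hits[:limit]]  # STOP AFTER
```

A mediator cannot run this directly, because no single source holds both relations; the sketch only fixes the expected answer that any distributed plan should reproduce.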
Motivating Example (4) • Information Source A: provides restaurant information for a specific area • Contents: information on restaurants within the rectangular area r • Capability: given a name or an address, it returns the matching restaurants
Motivating Example (5) • Information Source B: supports spatial conditions for querying restaurant information • Contents: information about restaurants • Capability: returns restaurants within the specified circular area; accepts an additional condition on the restaurant category (e.g., category = “Chinese”)
Motivating Example (6) • Information Source C: supports spatial conditions for querying restaurant information • Contents: information about restaurants • Capability: returns restaurants that match the specified name (e.g., name like “%Sushi”); if an optional polygon is given, it returns only restaurants within the specified polygonal region
Motivating Example (7) • Information Source D: provides restaurant evaluation scores • Capability: given a restaurant name, it returns the evaluation score • Example: select * from Source-D where name like “%Sushi” returns
  score  name
  3.0    Tokyo Sushi
  2.7    Edo Sushi
Data Model for Integration • The relational model enhanced with spatial data types and spatial operations • Spatial data types and spatial operations are based on the OpenGIS proposal [5] • A wrapper for each spatial information source wraps the operations of the source and exposes them as OpenGIS-conformant operations • A wrapper provides only a subset of the OpenGIS operations, depending on the capability of its source
Spatial Data Types • Based on the OpenGIS proposal • To simplify the problem, we consider only the Point, LineString, and Polygon types [Figure: OpenGIS geometry class hierarchy. Geometry has the subtypes Point, Curve, Surface, and GeometryCollection; LineString is a Curve, Polygon is a Surface, and MultiPoint, MultiCurve, and MultiSurface are GeometryCollections. Point, LineString, and Polygon are marked “Our Target”.]
Spatial Operations (1) • Spatial Predicates of OpenGIS
  predicate            semantics
  equals(g1, g2)       g1 and g2 are equal
  disjoint(g1, g2)     g1 and g2 do not have any overlap
  intersects(g1, g2)   g1 and g2 have intersections
  touches(g1, g2)      g1 and g2 touch at one or more points
  crosses(g1, g2)      g1 and g2 cross (they intersect, but neither contains the other)
  within(g1, g2)       g1 is contained in g2
  contains(g1, g2)     g1 contains g2
  overlaps(g1, g2)     g1 and g2 have one or more overlaps
Spatial Operations (2) • Spatial Functions of OpenGIS
  name                   return type   semantics
  envelope(g)            Geometry      minimum bounding box (MBB) of g
  distance(g1, g2)       Double        minimum distance between g1 and g2
  isempty(g)             Integer       whether g is empty
  intersection(g1, g2)   Geometry      intersection of g1 and g2
  union(g1, g2)          Geometry      unified region of g1 and g2
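To make a few of these operations concrete, the following Python sketch implements them for points and axis-aligned rectangles only. It is a simplification under stated assumptions: `Rect` is a hypothetical stand-in for a rectangular Polygon, `envelope` takes a collection of points rather than an arbitrary geometry, and `contains` treats boundary points as contained, which is looser than the OpenGIS definition.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical stand-in for a Polygon)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def envelope(points: Iterable[Point]) -> Rect:
    """Minimum bounding box of a non-empty collection of points."""
    xs, ys = zip(*points)
    return Rect(min(xs), min(ys), max(xs), max(ys))


def contains(r: Rect, p: Point) -> bool:
    """True if p lies in r (boundary included, a simplification)."""
    return r.xmin <= p[0] <= r.xmax and r.ymin <= p[1] <= r.ymax


def within(p: Point, r: Rect) -> bool:
    """within(g1, g2) is contains(g2, g1) with the arguments swapped."""
    return contains(r, p)


def distance(p1: Point, p2: Point) -> float:
    """Euclidean distance between two points."""
    return math.dist(p1, p2)
```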
Source Description Framework • Source Description Framework: a formal framework to specify meta information for an information source • proposed by Information Manifold [3] • A source description consists of: • Contents Description: describes the contents of the source in terms of the global schema • Capability Description: describes the types of queries which the source can support • We extend the source description approach by considering OpenGIS data types and operations
Query Description (1) • Query Description • An extension of a conjunctive query: it can contain • spatial predicates (e.g., intersects, contains) • spatial functions (e.g., envelope, distance) • additional comparison operators (e.g., ≤) • General form of a conjunctive query:
  ans(u) ← R1(u1), …, Rn(un), c1, …, cm
  where R1, …, Rn are global relations; u, u1, …, un are sequences of variables; and c1, …, cm (m ≥ 0) are conditions
Query Description (2) • Show restaurants that are within 1000 meters of the current position and whose scores are at least 2.5 stars
  SELECT r.name, r.address
  FROM Restaurant AS r, Evaluation AS e
  WHERE r.name = e.name AND e.score >= 2.5
    AND Distance(r.location, p) <= 1000
  As a conjunctive query:
  ans(n, a) ← Restaurant(n, c, a, l), Evaluation(e, s), n = e, s ≥ 2.5, distance(l, p) ≤ 1000
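One way to hold such a query inside a mediator is a small immutable data structure. The sketch below is an assumption about representation, not the authors' implementation: `Atom` and `ConjunctiveQuery` are hypothetical names, conditions are kept as plain strings, and the `is_safe` check (every head variable occurs in some relation atom) is the usual safety condition for conjunctive queries rather than something stated on the slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    """A relation atom such as Restaurant(n, c, a, l)."""
    relation: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.relation}({', '.join(self.args)})"


@dataclass(frozen=True)
class ConjunctiveQuery:
    """ans(head) <- atoms, conditions."""
    head: Tuple[str, ...]
    atoms: Tuple[Atom, ...]
    conditions: Tuple[str, ...] = ()

    def is_safe(self) -> bool:
        # Every answer variable must be bound by some relation atom.
        bound = {v for atom in self.atoms for v in atom.args}
        return all(v in bound for v in self.head)

    def __str__(self) -> str:
        body = [str(a) for a in self.atoms] + list(self.conditions)
        return f"ans({', '.join(self.head)}) <- " + ", ".join(body)


QUERY = ConjunctiveQuery(
    head=("n", "a"),
    atoms=(
        Atom("Restaurant", ("n", "c", "a", "l")),
        Atom("Evaluation", ("e", "s")),
    ),
    conditions=("n = e", "s >= 2.5", "distance(l, p) <= 1000"),
)
```

Printing `QUERY` reproduces the conjunctive form of the example query, with `<-` standing in for the arrow.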
Spatial Query Conditions • For spatial query conditions, we allow the following spatial range restriction predicates (x is a geometry-valued variable, g is a geometric constant) • equals(x, g) and equals(g, x) • within(x, g) • contains(g, x) • We also allow distance-based range restriction conditions (g is a Geometry object, d is a real constant, θ is < or ≤) • distance(x, g) θ d
Source Descriptions (1) • A source description consists of • contents description • capability description
  contents: S(u) ⊆ R(u), c1, …, cn
    example: S(n, c, a, l) ⊆ Restaurant(n, c, a, l), c = “Italian”
  filters: pat → out
    pat: mandatory input arguments (input pattern)
    out: the condition issued to the underlying source when the input arguments (pat) are given
Information Source A • Provides restaurant information for a specific area • Contents: information on restaurants within the rectangular area r • Capability: given a name or an address, it returns the matching restaurants
Source Description for A • Source A provides restaurant information • provides information within r • also allows retrieval by restaurant name and address
  Source A
    contents: SA(n, c, a, l) ⊆ Restaurant(n, c, a, l), contains(r, l)
    filters: <n′: string> → n = n′, <a′: string> → a = a′
  (primed names denote input arguments)
Information Source B • Supports spatial conditions for querying restaurant information • Contents: information about restaurants • Capability: returns restaurants within the specified circular area; accepts an additional condition on the restaurant category (e.g., category = “Chinese”)
Source Description for B • Source B provides restaurant information • inputs are a query point (p) and a distance threshold (d) • allows an additional filtering condition based on the restaurant category (c)
  Source B
    contents: SB(n, c, a, l) ⊆ Restaurant(n, c, a, l)
    filters: <p: Point, d: real> → distance(l, p) ≤ d, <c′: string> → c = c′
  (c′ denotes the input argument)
Information Source C • Supports spatial conditions for querying restaurant information • Contents: information about restaurants • Capability: returns restaurants that match the specified name (e.g., name like “%Sushi”); if an optional polygon is given, it returns only restaurants within the specified polygonal region
Source Description for C • Source C provides restaurant information • returns restaurants that match the specified name (n) • allows an additional filtering condition based on a polygonal region (g)
  Source C
    contents: SC(n, c, a, l) ⊆ Restaurant(n, c, a, l)
    filters: <n′: string> → n = n′, <g: Polygon> → contains(g, l)
  (n′ denotes the input argument)
Information Source D • Provides restaurant evaluation scores • Capability: given a restaurant name, it returns the evaluation score • Example: select * from Source-D where name like “%Sushi” returns
  score  name
  3.0    Tokyo Sushi
  2.7    Edo Sushi
Source Description for D • Source D provides restaurant evaluation scores • allows retrieval by restaurant name and/or evaluation score
  Source D
    contents: SD(n, s) ⊆ Evaluation(n, s)
    filters: <n′: string> → n = n′, <s′: real> → s θ s′ (θ in {=, ≠, <, >, ≤, ≥})
  (primed names denote input arguments)
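The four source descriptions can be written down as plain data, which is roughly what a mediator would need to consult during planning. The sketch below is one possible encoding, not the authors' format: `Filter`, `SourceDescription`, and `usable_filters` are hypothetical names, conditions are kept as strings, and a trailing apostrophe marks an input argument.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Filter:
    inputs: Tuple[str, ...]  # pat: mandatory input arguments
    condition: str           # out: condition issued to the source


@dataclass(frozen=True)
class SourceDescription:
    name: str
    relation: str                # global relation the source maps to
    contents: Tuple[str, ...]    # extra conditions on the contents
    filters: Tuple[Filter, ...]  # capability description

    def usable_filters(self, available: Set[str]) -> List[Filter]:
        """Filters whose mandatory inputs are all available."""
        return [f for f in self.filters if set(f.inputs) <= available]


SOURCE_A = SourceDescription(
    "A", "Restaurant", ("contains(r, l)",),
    (Filter(("n'",), "n = n'"), Filter(("a'",), "a = a'")),
)
SOURCE_B = SourceDescription(
    "B", "Restaurant", (),
    (Filter(("p", "d"), "distance(l, p) <= d"), Filter(("c'",), "c = c'")),
)
SOURCE_C = SourceDescription(
    "C", "Restaurant", (),
    (Filter(("n'",), "n = n'"), Filter(("g",), "contains(g, l)")),
)
SOURCE_D = SourceDescription(
    "D", "Evaluation", (),
    (Filter(("n'",), "n = n'"), Filter(("s'",), "s θ s'")),
)
```

For example, with only a query point and a distance threshold available, `SOURCE_B.usable_filters({"p", "d"})` yields just the distance filter.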
Overview of Query Processing (1) • Query Plan Construction
  1. Preprocessing
     - validation of the given query against the global schema
     - deletion of redundant variables
     - simplification of expressions
  2. Selection of useful information sources based on contents descriptions
  3. Pushing query conditions into the underlying information sources as far as possible
  4. Generation of the integrated query plan
Overview of Query Processing (2) [Figure: query processing flow. The mediator checks query validity and simplifies the query, selects sources based on contents descriptions, and pushes subqueries through the wrappers to Sources A, B, C, and D; it then receives the partial results and integrates the subquery results into the query result.]
Usage of Source Descriptions • Contents Description • used to select useful information sources to process the given query • also used to eliminate redundant join conditions • Capability Description • used to decide whether a wrapper on a source can process the given query condition using its query processing capability • also used to generate a subquery to an information source
Selection of Information Source (1) • Unifies the given query condition with the contents description of an information source
  Query: ans(u) ← R1(u1), …, Rn(un), c1, …, cm
  Contents description: SR(v) ⊆ Ri(v), e1, …, en
  • Possibility condition for an information source to fulfill the given query condition:
  ∃x1 … ∃xn (c1 ∧ … ∧ cm ∧ e1 ∧ … ∧ en) = true
Selection of Information Source (2) • Example
  Query over the global schema: ans(n) ← Restaurant(n, c, a, l), distance(l, p) ≤ 1000
  Source description for E: SE(n, c, a, l) ⊆ Restaurant(n, c, a, l), c = “Italian”, contains(r, l)
  • Source E can possibly satisfy the subquery if:
  ∃c, l (c = “Italian” ∧ contains(r, l) ∧ distance(l, p) ≤ 1000) = true
Selection of Information Source (3) • Simplification of the possibility condition:
  ∃l (contains(r, l) ∧ distance(l, p) ≤ 1000) = true
  ⇔ intersects(r, circle(p, 1000)) = true
[Figure: the area r supported by source E overlapping the query region, a circle of radius 1000m centered at p]
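For a rectangular source region, the simplified condition intersects(r, circle(p, 1000)) reduces to a point-to-rectangle distance test. The Python sketch below assumes an axis-aligned rectangle and planar Euclidean distance; `Rect`, `rect_point_distance`, and `source_may_answer` are illustrative names, not part of the slides.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle describing the area a source covers."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def rect_point_distance(r: Rect, p: Point) -> float:
    """Minimum distance from p to r (0 if p lies inside r)."""
    dx = max(r.xmin - p[0], 0.0, p[0] - r.xmax)
    dy = max(r.ymin - p[1], 0.0, p[1] - r.ymax)
    return math.hypot(dx, dy)


def source_may_answer(region: Rect, p: Point, radius: float) -> bool:
    """True if the circle (p, radius) intersects the source region."""
    return rect_point_distance(region, p) <= radius
```

A source whose region fails this test cannot contain any location within the query radius, so the mediator can skip it without contacting it.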
Elimination of Redundant Joins • Example
  Query over the global schema: ans(n, m) ← Restaurant(n, c, a, l), BusStop(m, p), distance(l, p) ≤ 200
  Contents descriptions for sources F and G:
    SF(n, c, a, l) ⊆ Restaurant(n, c, a, l), contains(r, l)
    SG(m, p) ⊆ BusStop(m, p), contains(s, p)
  • F and G may satisfy the query only if distance(r, s) ≤ 200
[Figure: two source regions (labeled “region of A” and “region of E” on the slide) separated by 200m]
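When both regions are axis-aligned rectangles, the test distance(r, s) ≤ 200 is a minimum rectangle-to-rectangle distance. The following Python sketch works under that assumption; `Rect`, `rect_rect_distance`, and `join_may_match` are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle describing the area a source covers."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def rect_rect_distance(a: Rect, b: Rect) -> float:
    """Minimum distance between two rectangles (0 if they overlap or touch)."""
    dx = max(b.xmin - a.xmax, a.xmin - b.xmax, 0.0)
    dy = max(b.ymin - a.ymax, a.ymin - b.ymax, 0.0)
    return math.hypot(dx, dy)


def join_may_match(r: Rect, s: Rect, threshold: float) -> bool:
    """True if some pair of locations, one per region, can be within threshold."""
    return rect_rect_distance(r, s) <= threshold
```

If `join_may_match` is false for a pair of sources, the join between them is guaranteed to be empty and can be dropped from the plan.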
Pushing Query Conditions (1) • Check whether the given query condition can be processed by the source • If the query condition is equivalent to a filtering condition supported by the source • push it directly • If there is no equivalent condition but the source supports a more general one • transform the query condition into the more general condition, then push it to the source • an additional step is needed to check that the retrieved results exactly satisfy the original query condition
Pushing Query Conditions (2)
  Query: ans(n) ← Restaurant(n, c, a, l), contains(r, l)
  Capability description of the source:
    Source C
      contents: SC(n, c, a, l) ⊆ Restaurant(n, c, a, l)
      filters: <n′: string> → n = n′, <g: Polygon> → contains(g, l)
  • push contains(r, l) to source C (a direct push: it matches the filter contains(g, l) with g = r)
Pushing Query Conditions (3)
  Query: ans(n) ← Restaurant(n, c, a, l), distance(l, p) ≤ 1000
  Source description for the source:
    Source H
      contents: SH(n, c, a, l) ⊆ Restaurant(n, c, a, l)
      filters: <n′: string> → n = n′, <g: Polygon> → intersects(l, g)
  • push the more general condition intersects(l, envelope(circle(p, 1000))), then examine distance(l, p) ≤ 1000 for the retrieved data
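The push-then-refine strategy for Source H can be sketched in Python as follows. The source is simulated by a function that accepts only a bounding box, standing in for the Polygon filter; `circle_envelope`, `source_h`, and `within_distance` are hypothetical names, and coordinates are assumed to be planar.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def circle_envelope(p: Point, radius: float) -> Box:
    """Bounding box of the circle centered at p."""
    return (p[0] - radius, p[1] - radius, p[0] + radius, p[1] + radius)


def source_h(rows: List[Dict], box: Box) -> List[Dict]:
    """Simulated source: supports only intersects(l, g) for a box g."""
    xmin, ymin, xmax, ymax = box
    return [
        r for r in rows
        if xmin <= r["location"][0] <= xmax and ymin <= r["location"][1] <= ymax
    ]


def within_distance(rows: List[Dict], p: Point, radius: float) -> List[Dict]:
    # Step 1: push the more general condition (the circle's envelope).
    candidates = source_h(rows, circle_envelope(p, radius))
    # Step 2: refine in the mediator with the exact distance condition.
    return [r for r in candidates if math.dist(r["location"], p) <= radius]
```

The envelope is a superset of the circle, so the pushed condition never loses answers; the refinement step removes the false positives near the corners of the box.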
Conclusions and Future Work • Conclusions • Proposal of a framework for heterogeneous spatial information sources • based on source description framework • contents description • capability description • use of data types and operations of OpenGIS proposal • query processing strategies • source selection • pushing query conditions • Future Work • investigation of source selection and query planning strategies • more formal framework (e.g., constraint-based approach)